Motivation: Next-generation sequencing generates large amounts of data affected by errors in the form of substitutions, insertions or deletions of bases. Error correction based on high-coverage information typically improves de novo assembly. Most existing tools can correct substitution errors only; some support insertions and deletions, but accuracy in many cases is low. Results: We present Karect, a novel error correction technique based on multiple alignment. Our approach supports substitution, insertion and deletion errors. It can handle non-uniform coverage as well as moderately covered areas of the sequenced genome. Experiments with data from Illumina, 454 FLX and Ion Torrent sequencing machines demonstrate that Karect is more accurate than previous methods, both in terms of correcting individual-base errors (up to 10% increase in accuracy gain) and post de novo assembly quality (up to 10% increase in NGA50). We also introduce an improved framework for evaluating the quality of error correction.
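The idea of correcting reads from a multiple alignment can be illustrated with a deliberately simplified sketch. It is not Karect's algorithm, which the abstract does not describe in detail. The sketch assumes the reads are already aligned to equal length (with '-' marking gaps) and takes a per-column majority vote; the function name `column_consensus` and the `min_support` parameter are hypothetical.

```python
from collections import Counter


def column_consensus(aligned_reads, min_support=0.5):
    """Return the per-column majority base of pre-aligned reads.

    aligned_reads: equal-length strings over A/C/G/T, with '-' for gaps.
    A column whose most frequent symbol does not exceed min_support
    (as a fraction of reads) is reported as 'N' (assumed convention).
    """
    if not aligned_reads:
        raise ValueError("at least one read is required")
    length = len(aligned_reads[0])
    if any(len(read) != length for read in aligned_reads):
        raise ValueError("reads must be aligned to the same length")

    consensus = []
    for column in zip(*aligned_reads):
        base, count = Counter(column).most_common(1)[0]
        consensus.append(base if count / len(column) > min_support else "N")
    return "".join(consensus)


reads = ["ACGT", "ACGT", "AGGT"]  # the third read carries one substitution
print(column_consensus(reads))
```

A real corrector would also have to build the alignment, weigh base qualities, and avoid erasing true variants in low-coverage regions; none of that is attempted here.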
Rieneck, Klaus; Bak, Mads; Jønson, Lars
the feasibility of predicting the fetal KEL1 phenotype using next-generation sequencing (NGS) technology. STUDY DESIGN AND METHODS: The KEL1/2 single-nucleotide polymorphism was polymerase chain reaction (PCR) amplified with one adjoining base, and the PCR product was sequenced using a genome analyzer (GAIIx......, Illumina); several millions of PCR sequences were analyzed. RESULTS: The results demonstrated the feasibility of diagnosing the fetal KEL1 or KEL2 blood group from cell-free DNA purified from maternal plasma. CONCLUSION: This method requires only one primer pair, and the large amount of sequence...
Forster, Michael; Forster, Peter; Elsharawy, Abdou; Hemmrich, Georg; Kreck, Benjamin; Wittig, Michael; Thomsen, Ingo; Stade, Björn; Barann, Matthias; Ellinghaus, David; Petersen, Britt-Sabina; May, Sandra; Melum, Espen; Schilhabel, Markus B.; Keller, Andreas; Schreiber, Stefan; Rosenstiel, Philip; Franke, Andre
Scientists working with single-nucleotide variants (SNVs), inferred by next-generation sequencing software, often need further information regarding true variants, artifacts and sequence coverage gaps. In clinical diagnostics, for example, SNVs must usually be validated by visual inspection or several independent SNV-callers. We here demonstrate that 0.5–60% of relevant SNVs might not be detected due to coverage gaps, or might be misidentified. Even low error rates can overwhelm the true biological signal, especially in clinical diagnostics, in research comparing healthy with affected cells, in archaeogenetic dating or in forensics. For these reasons, we have developed a package called pibase, which is applicable to diploid and haploid genome, exome or targeted enrichment data. pibase extracts details on nucleotides from alignment files at user-specified coordinates and identifies reproducible genotypes, if present. In test cases pibase identifies genotypes at 99.98% specificity, 10-fold better than other tools. pibase also provides pair-wise comparisons between healthy and affected cells using nucleotide signals (10-fold more accurately than a genotype-based approach, as we show in our case study of monozygotic twins). This comparison tool also solves the problem of detecting allelic imbalance within heterozygous SNVs in copy number variation loci, or in heterogeneous tumor sequences. PMID:22965131
Zhang, Yuan; Sun, Yanni; Cole, James R
Gene assembly, which recovers gene segments from short reads, is an important step in functional analysis of next-generation sequencing data. Lacking quality reference genomes, de novo assembly is commonly used for RNA-Seq data of non-model organisms and metagenomic data. However, heterogeneous sequence coverage caused by heterogeneous expression or species abundance, similarity between isoforms or homologous genes, and large data size all pose challenges to de novo assembly. As a result, existing assembly tools tend to output fragmented contigs or chimeric contigs, or have high memory footprint. In this work, we introduce a targeted gene assembly program SAT-Assembler, which aims to recover gene families of particular interest to biologists. It addresses the above challenges by conducting family-specific homology search, homology-guided overlap graph construction, and careful graph traversal. It can be applied to both RNA-Seq and metagenomic data. Our experimental results on an Arabidopsis RNA-Seq data set and two metagenomic data sets show that SAT-Assembler has smaller memory usage, comparable or better gene coverage, and lower chimera rate for assembling a set of genes from one or multiple pathways compared with other assembly tools. Moreover, the family-specific design and rapid homology search allow SAT-Assembler to be naturally compatible with parallel computing platforms. The source code of SAT-Assembler is available at https://sourceforge.net/projects/sat-assembler/. The data sets and experimental settings can be found in supplementary material.
Scholtalbers, J.; Rossler, J.; Sorn, P.; Graaf, J. de; Boisguerin, V.; Castle, J.; Sahin, U.
SUMMARY: We have developed a laboratory information management system (LIMS) for a next-generation sequencing (NGS) laboratory within the existing Galaxy platform. The system provides lab technicians standard and customizable sample information forms, barcoded submission forms, tracking of input
Chitty, Lyn S; Mason, Sarah; Barrett, Angela N; McKay, Fiona; Lench, Nicholas; Daley, Rebecca; Jenkins, Lucy A
Objective: Accurate prenatal diagnosis of genetic conditions can be challenging and usually requires invasive testing. Here, we demonstrate the potential of next-generation sequencing (NGS) for the analysis of cell-free DNA in maternal blood to transform prenatal diagnosis of monogenic disorders. Methods: Analysis of cell-free DNA using a PCR and restriction enzyme digest (PCR–RED) was compared with a novel NGS assay in pregnancies at risk of achondroplasia and thanatophoric dysplasia. Results: PCR–RED was performed in 72 cases and was correct in 88.6%, inconclusive in 7% with one false negative. NGS was performed in 47 cases and was accurate in 96.2% with no inconclusives. Both approaches were used in 27 cases, with NGS giving the correct result in the two cases inconclusive with PCR–RED. Conclusion: NGS provides an accurate, flexible approach to non-invasive prenatal diagnosis of de novo and paternally inherited mutations. It is more sensitive than PCR–RED and is ideal when screening a gene with multiple potential pathogenic mutations. These findings highlight the value of NGS in the development of non-invasive prenatal diagnosis for other monogenic disorders. © 2015 The Authors. Prenatal Diagnosis published by John Wiley & Sons, Ltd. What's already known about this topic? Non-invasive prenatal diagnosis (NIPD) using PCR-based methods has been reported for the detection or exclusion of individual paternally inherited or de novo alleles in maternal plasma. What does this study add? NIPD using next generation sequencing provides an accurate, more sensitive approach which can be used to detect multiple mutations in a single assay and so is ideal when screening a gene with multiple potential pathogenic mutations. Next generation sequencing thus provides a flexible approach to non-invasive prenatal diagnosis ideal for use in a busy service laboratory. PMID:25728633
Rasmussen, Maria; Sunde, Lone; Nielsen, Marlene Louise
Aim and Introduction: Identification of abnormal kidneys in the fetus may lead to termination of the pregnancy and raises questions about the underlying cause and recurrence risk in future pregnancies. In this study, we investigate the effectiveness of targeted next generation sequencing in fetuses...... postmortem examination. The approximately 110 genes included in the targeted panel were chosen on the basis of their potential involvement in embryonic kidney development, cystic kidney disease, or the renin-angiotensin system. DNA was extracted from fetal tissue samples or cultured chorion villus cells...... no mutations were identified, have been selected for exome sequencing in order to uncover novel genes associated to fetal kidney anomalies.
Autoinflammatory diseases form one group of primary immunodeficiency diseases that are generally thought to be caused by mutations in genes responsible for innate immunity, rather than acquired immunity. Mutations related to autoinflammatory diseases occur in 12 genes. For example, low-level somatic mosaic NLRP3 mutations underlie chronic infantile neurologic, cutaneous, articular syndrome (CINCA), also known as neonatal-onset multisystem inflammatory disease (NOMID). In current clinical practice, clinical genetic testing plays an important role in providing patients with quick, definite diagnoses. To increase the availability of such testing, low-cost high-throughput gene-analysis systems are required, ones that not only have the sensitivity to detect even low-level somatic mosaic mutations, but also can operate simply in a clinical setting. To this end, we developed a simple method that employs two-step tailed PCR and an NGS system, the MiSeq platform, to detect mutations in all coding exons of the 12 genes responsible for autoinflammatory diseases. Using this amplicon sequencing system, we amplified a total of 234 amplicons derived from the 12 genes with multiplex PCR. This was done simultaneously and in one test tube. Each sample was distinguished by an index sequence of second PCR primers following PCR amplification. With our procedure and tips for reducing PCR amplification bias, we were able to analyze 12 genes from 25 clinical samples in one MiSeq run. Moreover, with the certified primers designed by our short program—which detects and avoids common SNPs in gene-specific PCR primers—we used this system for routine genetic testing. Our optimized procedure uses a simple protocol, which can easily be followed by virtually any office medical staff. Because of the small PCR amplification bias, we can analyze several clinical DNA samples simultaneously at low cost and can obtain sufficient read numbers to detect a low level of
Carvill, Gemma L.; Mefford, Heather C.
Next-generation sequencing technologies have revolutionized gene discovery in patients with intellectual disability (ID) and led to an unprecedented expansion in the number of genes implicated in this disorder. We discuss the strategies that have been used to identify these novel genes for both syndromic and nonsyndromic ID and highlight the phenotypic and genetic heterogeneity that underpin this condition. Finally, we discuss the future of defining the genetic etiology of ID, including the r...
Lőrinc S Pongor
Next generation sequencing (NGS) of PCR amplicons is a standard approach to detect genetic variations in personalized medicine such as cancer diagnostics. Computer programs used in the NGS community often miss insertions and deletions (indels) that constitute a large part of known human mutations. We have developed HeurAA, an open source, heuristic amplicon aligner program. We tested the program on simulated datasets as well as experimental data from multiplex sequencing of 40 amplicons in 12 oncogenes collected on a 454 Genome Sequencer from lung cancer cell lines. We found that HeurAA can accurately detect all indels, and is more than an order of magnitude faster than previous programs. HeurAA can compare reads and reference sequences up to several thousand base pairs in length, and it can evaluate data from complex mixtures containing reads of different gene-segments from different samples. HeurAA is written in C and Perl for Linux operating systems; the code and the documentation are available for research applications at http://sourceforge.net/projects/heuraa/
Next Generation Sequencing (NGS) is the latest high throughput technology to revolutionize genomic research. NGS generates massive genomic datasets that play a key role in the big data phenomenon that surrounds us today. To extract signals from high-dimensional NGS data and make valid statistical inferences and predictions, novel data analytic and statistical techniques are needed. This book contains 20 chapters written by prominent statisticians working with NGS data. The topics range from basic preprocessing and analysis with NGS data to more complex genomic applications such as copy number variation and isoform expression detection. Research statisticians who want to learn about this growing and exciting area will find this book useful. In addition, many chapters from this book could be included in graduate-level classes in statistical bioinformatics for training future biostatisticians who will be expected to deal with genomic data in basic biomedical research, genomic clinical trials and personalized med...
Efthymiou, S; Manole, A; Houlden, H
Purpose of review: Neuromuscular diseases are clinically and genetically heterogeneous and probably contain a greater proportion of causative Mendelian defects than any other group of conditions. These disorders affect muscle and/or nerves with neonatal, childhood or adulthood onset, with significant disability and early mortality. Along with heterogeneity, unidentified and often very large genes require complementary and comprehensive methods in routine molecular diagnosis. Inevitably this leads to increased diagnostic delays and challenges in the interpretation of genetic variants. Recent findings: The application of next-generation sequencing as a research and diagnostic strategy has made significant progress toward solving many of these problems. The analysis of these data is by no means simple, and clinical input is essential to interpret results. Summary: In this review, we describe, using examples, the recent advances in the genetic diagnosis of neuromuscular disorders, in research and clinical practice, and the latest developments that are underway in NGS. We also discuss the latest collaborative initiatives such as the Genomics England genome sequencing project that combine rare disease clinical phenotyping with genomics, with the aim of defining the vast majority of rare disease genes in patients as well as modifying risks and pharmacogenomics factors. PMID:27588584
McDaniel, Andrew S.; Stall, Jennifer N.; Hovelson, Daniel H.; Cani, Andi K.; Liu, Chia-Jen; Tomlins, Scott A.; Cho, Kathleen R.
Importance: High-grade serous carcinoma (HGSC) is the most prevalent and lethal form of ovarian cancer. HGSCs frequently arise in the distal fallopian tubes rather than the ovary, developing from small precursor lesions called serous tubal intraepithelial carcinomas (TICs or more specifically STICs). While STICs have been reported to harbor TP53 mutations, detailed molecular characterizations of these lesions are lacking. Observations: We performed targeted next generation sequencing (NGS) on formalin-fixed, paraffin-embedded tissue from four women, two with HGSC and two with uterine endometrioid carcinoma (UEC) who were diagnosed with synchronous STICs. We detected concordant mutations in both HGSCs with synchronous STICs, including TP53 mutations as well as assumed germline BRCA1/2 alterations, confirming a clonal relationship between these lesions. NGS confirmed the presence of a STIC clonally unrelated to one case of UEC. NGS of the other tubal lesion diagnosed as a STIC unexpectedly supported the lesion as a micrometastasis from the associated UEC. Conclusions and Relevance: We demonstrate that targeted NGS can identify genetic lesions in minute lesions such as TICs, and confirm TP53 mutations as early driving events for HGSC. NGS also demonstrated unexpected relationships between presumed STICs and synchronous carcinomas, suggesting potential diagnostic and translational research applications. PMID:26181193
Larsen, Martin Jakob; Burton, Mark; Thomassen, Mads
Accurate mutation detection is essential in clinical genetic diagnostics of monogenic hereditary diseases. Targeted next generation sequencing (NGS) provides a promising and cost-effective alternative to Sanger sequencing and MLPA analysis currently used in most diagnostic laboratories. One advantage of targeted NGS is that multiple disease-specific genes can easily be sequenced simultaneously, which is favorable in genetically heterogeneous diseases. Prior to implementation in our diagnostic setting, we aimed to assess the sensitivity and specificity of targeted NGS by sequencing a collection......, respectively. For diagnostics, the sequencing coverage is essential, wherefore a minimum coverage of 30x per nucleotide in the coding regions was used as our primary quality criterion. For the majority of the included genes, we obtained adequate gene coverage, in which we were able to detect 100% of the known......
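A quality criterion such as a minimum coverage of 30x per nucleotide can be checked with a few lines of code. The following is a minimal sketch, assuming per-base depths for one coding region are already available as a list of integers; the function name `coverage_summary` and its return shape are hypothetical and not taken from the study.

```python
def coverage_summary(depths, min_depth=30):
    """Summarise per-base coverage against a minimum depth criterion.

    depths: per-nucleotide read depths for one region (assumed input).
    Returns (fraction of positions at or above min_depth,
             True if every position meets the criterion).
    """
    if not depths:
        raise ValueError("depths must not be empty")
    covered = sum(1 for depth in depths if depth >= min_depth)
    return covered / len(depths), covered == len(depths)


fraction, passes = coverage_summary([30, 45, 29, 100])
print(f"{fraction:.0%} of positions at >=30x; region passes: {passes}")
```

Reporting the fraction alongside the pass/fail flag makes it easier to see whether a failing region misses the criterion at one position or across most of its length.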
Penelope K Lindeque
BACKGROUND: Zooplankton play an important role in our oceans, in biogeochemical cycling and providing a food source for commercially important fish larvae. However, difficulties in correctly identifying zooplankton hinder our understanding of their roles in marine ecosystem functioning, and can prevent detection of long term changes in their community structure. The advent of massively parallel next generation sequencing technology allows DNA sequence data to be recovered directly from whole community samples. Here we assess the ability of such sequencing to quantify richness and diversity of a mixed zooplankton assemblage from a productive time series site in the Western English Channel. METHODOLOGY/PRINCIPAL FINDINGS: Plankton net hauls (200 µm) were taken at the Western Channel Observatory station L4 in September 2010 and January 2011. These samples were analysed by microscopy and metagenetic analysis of the 18S nuclear small subunit ribosomal RNA gene using the 454 pyrosequencing platform. Following quality control a total of 419,041 sequences were obtained for all samples. The sequences clustered into 205 operational taxonomic units using a 97% similarity cut-off. Allocation of taxonomy by comparison with the National Centre for Biotechnology Information database identified 135 OTUs to species level, 11 to genus level and 1 to order; <2.5% of sequences were classified as unknowns. By comparison, a skilled microscopic analyst was able to routinely enumerate only 58 taxonomic groups. CONCLUSIONS: Metagenetics reveals a previously hidden taxonomic richness, especially for Copepoda and hard-to-identify meroplankton such as Bivalvia, Gastropoda and Polychaeta. It also reveals rare species and parasites. We conclude that Next Generation Sequencing of 18S amplicons is a powerful tool for elucidating the true diversity and species richness of zooplankton communities. While this approach allows for broad diversity assessments of plankton it may
Qi, Yuan; Liu, Xiuping; Liu, Chang-gong; Wang, Bailing; Hess, Kenneth R.; Symmans, W. Fraser; Shi, Weiwei; Pusztai, Lajos
Nucleotide alterations detected by next generation sequencing are not always true biological changes but could represent sequencing errors. Even highly accurate methods can yield substantial error rates when applied to millions of nucleotides. In this study, we examined the reproducibility of nucleotide variant calls in replicate sequencing experiments of the same genomic DNA. We performed targeted sequencing of all known human protein kinase genes (kinome) (~3.2 Mb) using the SOLiD v4 platform. Seventeen breast cancer samples were sequenced in duplicate (n=14) or triplicate (n=3) to assess concordance of all calls and single nucleotide variant (SNV) calls. The concordance rates over the entire sequenced region were >99.99%, while the concordance rates for SNVs were 54.3-75.5%. There was substantial variation in basic sequencing metrics from experiment to experiment. The type of nucleotide substitution and genomic location of the variant had little impact on concordance but concordance increased with coverage level, variant allele count (VAC), variant allele frequency (VAF), variant allele quality and p-value of SNV-call. The most important determinants of concordance were VAC and VAF. Even using the highest stringency of QC metrics the reproducibility of SNV calls was around 80% suggesting that erroneous variant calling can be as high as 20-40% in a single experiment. The sequence data have been deposited into the European Genome-phenome Archive (EGA) with accession number EGAS00001000826. PMID:26136146
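Concordance between replicate SNV call sets can be made concrete with a small sketch. The abstract does not state the exact formula the authors used, so the example below assumes one common definition: the number of variants called in both replicates divided by the number called in either. The function name `snv_concordance` and the tuple layout `(chrom, pos, ref, alt)` are illustrative choices.

```python
def snv_concordance(calls_a, calls_b):
    """Fraction of SNV calls shared by two replicates (shared / union).

    Each call is a hashable key such as (chrom, pos, ref, alt).
    Returns None when neither replicate has any call, since the
    ratio is undefined in that case.
    """
    a, b = set(calls_a), set(calls_b)
    union = a | b
    if not union:
        return None
    return len(a & b) / len(union)


replicate_1 = [("chr1", 100, "A", "G"), ("chr2", 200, "C", "T"), ("chr3", 300, "G", "A")]
replicate_2 = [("chr1", 100, "A", "G"), ("chr2", 200, "C", "T"), ("chr4", 400, "T", "C")]
print(snv_concordance(replicate_1, replicate_2))
```

Under this definition a site called in only one replicate lowers concordance no matter which replicate is wrong, which is why filtering on allele count or allele frequency before comparison can change the figure substantially.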
Faber, Pieter [University of Chicago
The University of Chicago Genomics Core provides University of Chicago investigators (and external clients) access to State-of-the-Art genomics capabilities: next generation sequencing, Sanger sequencing / genotyping and micro-arrays (gene expression, genotyping, and methylation). The current presentation will highlight our capabilities in the area of ultra-high throughput sequencing analysis.
Hollegaard, Mads Vilhelm; Grauholm, Jonas; Nielsen, Ronni; Grove, Jakob; Mandrup, Susanne; Hougaard, David Michael
Dried blood spot samples (DBSS) have been collected and stored for decades as part of newborn screening programmes worldwide. Representing almost an entire population under a certain age and collected with virtually no bias, the Newborn Screening Biobanks are of immense value in medical studies, for example, to examine the genetics of various disorders. We have previously demonstrated that DNA extracted from a fraction (2×3.2mm discs) of an archived DBSS can be whole genome amplified (wgaDNA) and used for accurate array genotyping. However, until now, it has been uncertain whether wgaDNA from DBSS can be used for accurate whole genome sequencing (WGS) and exome sequencing (WES). This study examined two individuals represented by three different types of samples each: whole-blood (reference samples), 3-year-old DBSS spotted with reference material (refDBSS), and 27- to 29-year-old archived neonatal DBSS (neoDBSS) stored at -20°C in the Danish Newborn Screening Biobank. The reference samples were genotyped using an Illumina Omni2.5M array, and all samples were sequenced on a HiSeq 2000 Paired-End flow cell. First, we compared the array single nucleotide polymorphism (SNP) genotype data to the single nucleotide variation (SNV) calls from WGS and WES. We also compared the WGS and WES reference sample SNV calls to the DBSS SNV calls. The overall performance of the archived DBSS was similar to the whole blood reference sample. Plotting the error rates relative to coverage revealed that the error rates of DBSS were similar to those of their reference samples. SNVs called with a coverage <8× had error rates between 1.5 and 35%, whereas the error rates of SNVs called with a coverage ≥8× were <1.5%. In conclusion, the wgaDNA amplified from both new and old neonatal DBSS performs as well as their whole-blood reference samples with regard to error rates, strongly indicating that neonatal DBSS collected shortly after birth and stored for decades comprise an
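The comparison of error rates below and above a coverage of 8 can be sketched as follows. This is an illustration only: it assumes each SNV call has already been compared with the array genotype and is supplied as a `(coverage, is_error)` pair, and the function name `error_rate_by_coverage` is hypothetical.

```python
def error_rate_by_coverage(calls, threshold=8):
    """Split SNV calls at a coverage threshold and return both error rates.

    calls: iterable of (coverage, is_error) pairs, where is_error is True
    when the call disagrees with the reference genotype (assumed input).
    Returns (rate below threshold, rate at or above threshold); a rate is
    None when its group is empty.
    """
    low, high = [], []
    for coverage, is_error in calls:
        (low if coverage < threshold else high).append(bool(is_error))

    def rate(flags):
        return sum(flags) / len(flags) if flags else None

    return rate(low), rate(high)


calls = [(3, True), (5, False), (6, False), (7, False),
         (8, False), (10, False), (20, False), (30, False)]
print(error_rate_by_coverage(calls))
```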
Epilepsy is a neurological disorder characterized by an increased predisposition for seizures. Although this definition suggests that it is a single disorder, epilepsy encompasses a group of disorders with diverse aetiologies and outcomes. A genetic basis for epilepsy syndromes has been postulated for several decades, with several mutations in specific genes identified that have increased our understanding of the genetic influence on epilepsies. With 70-80% of epilepsy cases identified to have a genetic cause, there are now hundreds of genes identified to be associated with epilepsy syndromes which can be analyzed using next generation sequencing (NGS) techniques such as targeted gene panels, whole exome sequencing (WES) and whole genome sequencing (WGS). For effective use of these methodologies, diagnostic laboratories and clinicians require information on the relevant workflows including analysis and sequencing depth to understand the specific clinical application and diagnostic capabilities of these gene sequencing techniques. As epilepsy is a complex disorder, the differences associated with each technique influence the ability to form a diagnosis along with an accurate detection of the genetic etiology of the disorder. In addition, for diagnostic testing, an important parameter is the cost-effectiveness and the specific diagnostic outcome of each technique. Here, we review these commonly used NGS techniques to determine their suitability for application to epilepsy genetic diagnostic testing.
Houtgast, E.J.; Sima, V.M.; Bertels, K.L.M.; Al-Ars, Z.
We are rapidly entering the era of genomics. The dramatic cost reduction of DNA sequencing due to the introduction of Next Generation Sequencing (NGS) techniques has resulted in an exponential growth of genetics data. The amount of data generated, and its associated processing into useful
Zoll, Jan; Snelders, Eveline; Verweij, Paul E; Melchers, Willem J G
New state-of-the-art techniques in sequencing offer valuable tools in both detection of mycobiota and in understanding of the molecular mechanisms of resistance against antifungal compounds and virulence. The introduction of new sequencing platforms with enhanced capacity and reduced costs for sequence analysis provides a potentially powerful tool in mycological diagnosis and research. In this review, we summarize the applications of next-generation sequencing techniques in mycology.
pathogenic bacteria in water. Recent technological advances in next-generation sequencing (NGS) offer better prospects for detection of pathogenic microorganisms and investigating their diversity. NGS can quickly generate huge amounts of DNA reads, and the technique is affordable. In this study, the pathogens ...
Until recently, the focus in dental research has been on studying a small fraction of the oral microbiome—so-called opportunistic pathogens. With the advent of next-generation sequencing (NGS) technologies, researchers now have the tools that allow for profiling of the microbiomes and metagenomes at
Skotte, Line; Korneliussen, Thorfinn Sand; Albrechtsen, Anders
Inference of population structure and individual ancestry is important both for population genetics and for association studies. With next generation sequencing technologies it is possible to obtain genetic data for all accessible genetic variations in the genome. Existing methods for admixture...
Møller, Rikke S.; Dahl, Hans A.; Helbig, Ingo
to as epileptic encephalopathies. The increased knowledge about causative genetic variants has had a major impact on diagnosis of genetic epilepsies and has already been translated into treatment recommendations for a few genes. This article provides an overview of how next generation sequencing has advanced our...
Masser, Dustin R; Stanford, David R; Freeman, Willard M
The role of epigenetic processes in the control of gene expression has been known for a number of years. DNA methylation at cytosine residues is of particular interest for epigenetic studies as it has been demonstrated to be both a long lasting and a dynamic regulator of gene expression. Efforts to examine epigenetic changes in health and disease have been hindered by the lack of high-throughput, quantitatively accurate methods. With the advent and popularization of next-generation sequencing (NGS) technologies, these tools are now being applied to epigenomics in addition to existing genomic and transcriptomic methodologies. For epigenetic investigations of cytosine methylation where regions of interest, such as specific gene promoters or CpG islands, have been identified and there is a need to examine significant numbers of samples with high quantitative accuracy, we have developed a method called Bisulfite Amplicon Sequencing (BSAS). This method combines bisulfite conversion with targeted amplification of regions of interest, transposome-mediated library construction and benchtop NGS. BSAS offers a rapid and efficient method for analysis of up to 10 kb of targeted regions in up to 96 samples at a time that can be performed by most research groups with basic molecular biology skills. The results provide absolute quantitation of cytosine methylation with base specificity. BSAS can be applied to any genomic region from any DNA source. This method is useful for hypothesis testing studies of target regions of interest as well as confirmation of regions identified in genome-wide methylation analyses such as whole genome bisulfite sequencing, reduced representation bisulfite sequencing, and methylated DNA immunoprecipitation sequencing.
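The quantitation step described above can be illustrated with a minimal sketch. It assumes the standard bisulfite readout, in which an unmethylated cytosine is read as T after conversion and a methylated cytosine is still read as C; the function names and the toy base calls are hypothetical and are not taken from the BSAS protocol itself.

```python
def methylation_percent(c_reads: int, t_reads: int) -> float:
    """Percent methylation at one cytosine position.

    c_reads: reads still showing C after bisulfite conversion (methylated).
    t_reads: reads showing T after conversion (unmethylated).
    """
    total = c_reads + t_reads
    if total == 0:
        raise ValueError("no informative reads at this site")
    return 100.0 * c_reads / total


def site_methylation(base_calls: str) -> float:
    """Percent methylation from the pile of base calls covering one site.

    Calls other than C or T (for example sequencing errors) are ignored.
    """
    return methylation_percent(base_calls.count("C"), base_calls.count("T"))


if __name__ == "__main__":
    # Hypothetical pileup: 3 reads call C, 7 reads call T.
    print(site_methylation("CCCTTTTTTT"))
```

Because each read contributes one call per site, the estimate becomes more precise as read depth per amplicon grows, which is why deep targeted sequencing suits this kind of measurement.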
Bräutigam, Andrea; Gowik, Udo
Next generation sequencing (NGS) technologies have opened fascinating opportunities for the analysis of plants with and without a sequenced genome on a genomic scale. During the last few years, NGS methods have become widely available and cost effective. They can be applied to a wide variety of biological questions, from the sequencing of complete eukaryotic genomes and transcriptomes, to the genome-scale analysis of DNA-protein interactions. In this review, we focus on the use of NGS for pla...
Overballe-Petersen, Søren; Orlando, Ludovic Antoine Alexandre; Willerslev, Eske
The processes underlying DNA degradation are central to various disciplines, including cancer research, forensics and archaeology. The sequencing of ancient DNA molecules on next-generation sequencing platforms provides direct measurements of cytosine deamination, depurination and fragmentation rates that previously were obtained only from extrapolations of results from in vitro kinetic experiments performed over short timescales. For example, recent next-generation sequencing of ancient DNA reveals purine bases as one of the main targets of postmortem hydrolytic damage, through base elimination and strand breakage. It also shows substantially increased rates of DNA base-loss at guanosine. In this review, we argue that the latter results from an electron resonance structure unique to guanosine rather than adenosine having an extra resonance structure over guanosine as previously suggested.
Within just a few years, the new methods for high-throughput next-generation sequencing have generated completely novel insights into the heritability and pathophysiology of human disease. In this review, we wish to highlight the benefits of the current state-of-the-art sequencing technologies for genetic and epigenetic research. We illustrate how these technologies help to constantly improve our understanding of genetic mechanisms in biological systems and summarize the progress made so far. This can be exemplified by the case of heritable heart muscle diseases, so-called cardiomyopathies. Here, next-generation sequencing is able to identify novel disease genes, and first clinical applications demonstrate the successful translation of this technology into personalized patient care.
Amitrano, Sara; Marozza, Annabella; Somma, Serena; Imperatore, Valentina; Hadjistilianou, Theodora; De Francesco, Sonia; Toti, Paolo; Galimberti, Daniela; Meloni, Ilaria; Cetta, Francesco; Piu, Pietro; Di Marco, Chiara; Dosa, Laura; Lo Rizzo, Caterina; Carignani, Giulia; Mencarelli, Maria Antonietta; Mari, Francesca; Renieri, Alessandra; Ariani, Francesca
In about 50% of sporadic cases of retinoblastoma, no constitutive RB1 mutations are detected by conventional methods. However, recent research suggests that, at least in some of these cases, there is somatic mosaicism with respect to RB1 normal and mutant alleles. The increased availability of next generation sequencing improves our ability to detect the exact percentage of patients with mosaicism. Using this technology, we re-tested a series of 40 patients with sporadic retinoblastoma: 10 of them had been previously classified as constitutional heterozygotes, whereas in 30 no RB1 mutations had been found in lymphocytes. In 3 of these 30 patients, we have now identified low-level mosaic variants, varying in frequency between 8 and 24%. In 7 out of the 10 cases previously classified as heterozygous from testing blood cells, we were able to test additional tissues (ocular tissues, urine and/or oral mucosa): in three of them, next generation sequencing has revealed mosaicism. Present results thus confirm that a significant fraction (6/40; 15%) of sporadic retinoblastoma cases are due to postzygotic events and that deep sequencing is an efficient method to unambiguously distinguish mosaics. Re-testing of retinoblastoma patients through next generation sequencing can thus provide new information that may have important implications with respect to genetic counseling and family care.
Parker, Jayme; Murphy, Molly; Hueffer, Karsten; Chen, Jack
Canine parvovirus (CPV) outbreaks can have a devastating effect in communities with dense dog populations. The interior region of Alaska experienced a CPV outbreak in the winter of 2016 leading to the further investigation of the virus due to reports of increased morbidity and mortality occurring at dog mushing kennels in the area. Twelve rectal-swab specimens from dogs displaying clinical signs consistent with parvoviral-associated disease were processed using next-generation sequencing (NGS...
McDaniel, Andrew S; Stall, Jennifer N; Hovelson, Daniel H; Cani, Andi K; Liu, Chia-Jen; Tomlins, Scott A; Cho, Kathleen R
High-grade serous carcinoma (HGSC) is the most prevalent and lethal form of ovarian cancer. HGSCs frequently arise in the distal fallopian tubes rather than the ovary, developing from small precursor lesions called serous tubal intraepithelial carcinomas (TICs, or more specifically, STICs). While STICs have been reported to harbor TP53 mutations, detailed molecular characterizations of these lesions are lacking. We performed targeted next-generation sequencing (NGS) on formalin-fixed, paraffin-embedded tissue from 4 women, 2 with HGSC and 2 with uterine endometrioid carcinoma (UEC) who were diagnosed as having synchronous STICs. We detected concordant mutations in both HGSCs with synchronous STICs, including TP53 mutations as well as assumed germline BRCA1/2 alterations, confirming a clonal association between these lesions. Next-generation sequencing confirmed the presence of a STIC clonally unrelated to 1 case of UEC, and NGS of the other tubal lesion diagnosed as a STIC unexpectedly supported the lesion as a micrometastasis from the associated UEC. We demonstrate that targeted NGS can identify genetic alterations in minute lesions, such as TICs, and confirm TP53 mutations as early driving events for HGSC. Next-generation sequencing also demonstrated unexpected associations between presumed STICs and synchronous carcinomas, providing evidence that some TICs are actually metastases rather than HGSC precursors.
Suter, Bernhard; Zhang, Xinmin; Pesce, C Gustavo; Mendelsohn, Andrew R; Dinesh-Kumar, Savithramma P; Mao, Jian-Hua
The yeast two-hybrid (Y2H) system exploits host cell genetics in order to display binary protein-protein interactions (PPIs) via defined and selectable phenotypes. Numerous improvements have been made to this method, adapting the screening principle for diverse applications, including drug discovery and the scale-up for proteome wide interaction screens in human and other organisms. Here we discuss a systematic workflow and analysis scheme for screening data generated by Y2H and related assays that includes high-throughput selection procedures, readout of comprehensive results via next-generation sequencing (NGS), and the interpretation of interaction data via quantitative statistics. The novel assays and tools will serve the broader scientific community to harness the power of NGS technology to address PPI networks in health and disease. We discuss examples of how this next-generation platform can be applied to address specific questions in diverse fields of biology and medicine.
Endrullat, Christoph; Glökler, Jörn; Franke, Philipp; Frohme, Marcus
DNA sequencing continues to evolve quickly even after more than 30 years. Many new platforms have appeared suddenly, and formerly established systems have vanished in almost the same manner. Since the establishment of next-generation sequencing devices, this progress has gained momentum due to the continually growing demand for higher throughput, lower costs and better data quality. As a consequence of this rapid development, standardized procedures and data formats as well as comprehensive quality management considerations are still scarce. Here, we list and summarize current standardization efforts and quality management initiatives from companies, organizations and societies in the form of published studies and ongoing projects. These comprise, on the one hand, quality documentation issues such as technical notes, accreditation checklists and guidelines for validation of sequencing workflows. On the other hand, general standard proposals and quality metrics are developed and applied to the sequencing workflow steps, with the main focus on upstream processes. Finally, certain standard developments for downstream pipeline data handling, processing and storage are discussed in brief. These standardization approaches represent a first basis for continuing work toward implementing next-generation sequencing in important areas such as clinical diagnostics, where reliable results and fast processing are crucial. Additionally, these efforts will exert a decisive influence on traceability and reproducibility of sequence data.
Next-generation sequencing (NGS) is the catch-all term used to describe several different modern sequencing technologies that allow nucleic acids to be sequenced much more rapidly and cheaply than the formerly used Sanger sequencing, and that have as such revolutionized the study of molecular biology and genomics with excellent resolution and accuracy. Over the past years, many companies and academic institutions have continued to make technological advances to expand NGS applications from research to the clinic. In this review, the performance and technical features of current NGS platforms are described. Furthermore, advances in applying NGS technologies to clinical molecular diagnostics are emphasized. General advantages and disadvantages of each sequencing system are summarized and compared to guide the selection of NGS platforms for specific research aims.
I. M. Barkhatov
The review covers the basic principles and technologies of next-generation sequencing (NGS), as well as its applications for detection of gene mutations in leukemic cells. We discuss some novel data concerning the NGS approach to studies of genetic heterogeneity in myeloproliferative disorders, detection of high-risk genes, including drug resistance mutations, epigenomic changes associated with leukemias, as well as molecular aspects of clonal evolution. A special section concerns basic problems with bioinformatics and adequate analysis of the large digital databases obtained with the NGS approach. Optimal choice of appropriate software is of utmost importance for adequate retrieval and interpretation of the NGS data.
Tabatabaeifar, Siavosh; Kruse, Torben A; Thomassen, Mads
Background: Oral cavity cancer is a subgroup of head and neck cancer, which is the world’s 6th most common form of cancer. Oral squamous cell carcinomas (OSCC) constitute almost all oral cavity cancers, and OSCC are primarily attributed to excessive alcohol consumption and tobacco exposure... of tumour cells exists. Conclusions: Use of next generation sequencing in oral cavity cancer can give valuable insight into the biology of the disease. By investigating intra tumour heterogeneity we see that the different tumour specimens in each patient are quite homogenous, but evidence of heterogeneous...
Blankenberg, Daniel; Hillman-Jackson, Jennifer
The extraordinary throughput of next-generation sequencing (NGS) technology is outpacing our ability to analyze and interpret the data. This chapter will focus on practical informatics methods, strategies, and software tools for transforming NGS data into usable information through the use of a web-based platform, Galaxy. The Galaxy interface is explored through several different types of example analyses. Instructions for running one's own Galaxy server on local hardware or on cloud computing resources are provided. Installing new tools into a personal Galaxy instance is also demonstrated.
Alkhateeb, Abedalrhman; Rueda, Luis
Next-generation sequencing technology generates a huge number of reads (short sequences), which contain a vast amount of genomic data. The sequencing process, however, comes with artifacts. Preprocessing of sequences is mandatory for further downstream analysis. We present Zseq, a linear method that identifies the most informative genomic sequences and reduces the number of biased sequences, sequence duplications, and ambiguous nucleotides. Zseq finds the complexity of the sequences by counting the number of unique k-mers in each sequence as its corresponding score and also takes into account other factors such as ambiguous nucleotides or high GC-content percentage in k-mers. Based on a z-score threshold, Zseq sweeps through the sequences again and filters those with a z-score less than the user-defined threshold. The Zseq algorithm is able to provide a better mapping rate; it reduces the number of ambiguous bases significantly in comparison with other methods. Evaluation of the filtered reads has been conducted by aligning the reads and assembling the transcripts using the reference genome as well as de novo assembly. The assembled transcripts show a better discriminative ability to separate cancer and normal samples in comparison with another state-of-the-art method. Moreover, de novo assembled transcripts from the reads filtered by Zseq have longer genomic sequences than other tested methods. Estimation of the cutoff threshold using labeling rules is also introduced, with promising results.
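The scoring and filtering idea in this abstract can be sketched in a few lines. This is a simplified illustration and not the Zseq implementation: it scores each read by its number of distinct k-mers, skips k-mers containing the ambiguous base N, and drops reads whose z-score falls below a threshold. The GC-content factor mentioned in the abstract is omitted, and the function names, the value of k and the threshold are assumptions chosen for the example.

```python
from statistics import mean, pstdev


def unique_kmer_count(seq: str, k: int) -> int:
    """Count distinct k-mers in seq, ignoring k-mers that contain N."""
    kmers = {seq[i:i + k] for i in range(len(seq) - k + 1)}
    return sum(1 for kmer in kmers if "N" not in kmer)


def filter_by_zscore(reads, k=4, threshold=-1.0):
    """Keep reads whose complexity z-score is at least `threshold`.

    First pass: score every read. Second pass: standardize the scores
    against the whole read set and drop the low-scoring reads.
    """
    scores = [unique_kmer_count(read, k) for read in reads]
    mu = mean(scores)
    sigma = pstdev(scores)
    if sigma == 0:
        # All reads are equally complex, so nothing can be filtered.
        return list(reads)
    return [r for r, s in zip(reads, scores) if (s - mu) / sigma >= threshold]


if __name__ == "__main__":
    reads = [
        "ACGTACGTAGCTAGCTAGGA",
        "AAAAAAAAAAAAAAAAAAAA",  # low complexity
        "ACGTTGCAAGGCTTACGATC",
        "NNNNNNNNNNNNNNNNNNNN",  # ambiguous bases only
        "TTGACCGATAGGCATCAGTA",
    ]
    for read in filter_by_zscore(reads):
        print(read)
```

Both passes touch each read once, so the sketch stays linear in the total number of bases for a fixed k, in keeping with the abstract's description of a linear method.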
The discovery of genetic factors behind an increasing number of human diseases and growing public education in genetics are rapidly increasing the demand for genetic testing. However, traditional genetic testing methods cannot meet all of these requirements. Next generation sequencing (NGS), featuring high throughput and low sequencing cost, is developing fast, especially with improvements in read length and read accuracy and the emergence of small-sized machines, making it a powerful genetic testing tool. In this study, we applied NGS to develop novel genetic... developed a targeted sequencing based preimplantation genetic diagnosis (PGD) method for monogenic diseases and tested it in a family suffering from β-thalassaemia major undergoing PGD. Moreover, we developed a method which can achieve detection of point mutation and copy number variation simultaneously...
Cancer will cause 13 million deaths by 2030, ranking as the second leading cause of death worldwide. Previous studies indicate that most cancers originate from cells that acquired somatic mutations and evolved according to Darwinian theory. Ten biological insights of cancer have been summarized recently. Cutting-edge technologies like next generation sequencing (NGS) enable exploring the cancer genome and its evolution much more efficiently. However, integrated cancer genome sequencing studies showed great inter-/intra-tumoral heterogeneity (ITH) and complex evolution patterns beyond the cancer biological knowledge we previously had. There is very limited knowledge of the East Asian lung cancer genome except enrichment of EGFR mutations and lack of KRAS mutations. We carried out integrated genomic, transcriptomic and methylomic analysis of 335 primary Chinese lung adenocarcinomas (LUAD) and 35 corresponding...
Next-generation sequencing (NGS) has been applied to plant virology since 2009. NGS provides highly efficient, rapid, low-cost high-throughput DNA or RNA sequencing of the genomes of plant viruses and viroids and of the specific small RNAs generated during the infection process. These small RNAs, which frequently cover the whole genome of the infectious agent, are 21-24 nt long and are known as vsRNAs for viruses and vd-sRNAs for viroids. NGS has been used in a number of studies in plant virology including, but not limited to, discovery of novel viruses and viroids as well as detection and identification of those pathogens already known, analysis of genome diversity and evolution, and study of pathogen epidemiology. The genome editing method based on the clustered regularly interspaced short palindromic repeats (CRISPR)-Cas9 system has been successfully used recently to engineer resistance to DNA geminiviruses (family Geminiviridae) by targeting different viral genome sequences in infected Nicotiana benthamiana or Arabidopsis plants. The DNA viruses targeted include tomato yellow leaf curl virus and merremia mosaic virus (begomovirus); beet curly top virus and beet severe curly top virus (curtovirus); and bean yellow dwarf virus (mastrevirus). The technique has also been used against the RNA viruses zucchini yellow mosaic virus, papaya ringspot virus and turnip mosaic virus (potyvirus) and cucumber vein yellowing virus (ipomovirus, family Potyviridae) by targeting the translation initiation genes eIF4E in cucumber or Arabidopsis plants. From these recent advances of major importance, it is expected that NGS and CRISPR-Cas technologies will play a significant role in the very near future in advancing the field of plant virology and connecting it with other related fields of biology. Keywords: Next-generation sequencing, NGS, plant virology, plant viruses, viroids, resistance to plant viruses by CRISPR-Cas9
Li, Bingshan; Zhan, Xiaowei; Wing, Mary-Kate; Anderson, Paul; Kang, Hyun Min; Abecasis, Goncalo R
Next generation sequencing (NGS) is being widely used to identify genetic variants associated with human disease. Although the approach is cost effective, the underlying data are susceptible to many types of error. Importantly, since NGS technologies and protocols are rapidly evolving, with constantly changing steps ranging from sample preparation to data processing software updates, it is important to enable researchers to routinely assess the quality of sequencing and alignment data prior to downstream analyses. Here we describe QPLOT, an automated tool that can facilitate the quality assessment of sequencing run performance. Taking standard sequence alignments as input, QPLOT generates a series of diagnostic metrics summarizing run quality and produces convenient graphical summaries for these metrics. QPLOT is computationally efficient, generates webpages for interactive exploration of detailed results, and can handle the joint output of many sequencing runs. QPLOT is an automated tool that facilitates assessment of sequence run quality. We routinely apply QPLOT to ensure quick detection and diagnosis of sequencing run problems. We hope that QPLOT will be useful to the community as well.
Soltis Douglas E
Background: We have developed a simulation approach to help determine the optimal mixture of sequencing methods for the most complete and cost-effective transcriptome sequencing. We compared simulation results for traditional capillary sequencing with "Next Generation" (NG) ultra high-throughput technologies. The simulation model was parameterized using mappings of 130,000 cDNA sequence reads to the Arabidopsis genome (NCBI Accession SRA008180.19). We also generated 454-GS20 sequences and de novo assemblies for the basal eudicot California poppy (Eschscholzia californica) and the magnoliid avocado (Persea americana) using a variety of methods for cDNA synthesis. Results: The Arabidopsis reads tagged more than 15,000 genes, including new splice variants and extended UTR regions. Of the total 134,791 reads (13.8 MB), 119,518 (88.7%) mapped exactly to known exons, while 1,117 (0.8%) mapped to introns, 11,524 (8.6%) spanned annotated intron/exon boundaries, and 3,066 (2.3%) extended beyond the end of annotated UTRs. Sequence-based inference of relative gene expression levels correlated significantly with microarray data. As expected, NG sequencing of normalized libraries tagged more genes than non-normalized libraries, although non-normalized libraries yielded more full-length cDNA sequences. The Arabidopsis data were used to simulate additional rounds of NG and traditional EST sequencing, and various combinations of each. Our simulations suggest a combination of FLX and Solexa sequencing for optimal transcriptome coverage at modest cost. We have also developed ESTcalc (http://fgp.huck.psu.edu/NG_Sims/ngsim.pl), an online webtool, which allows users to explore the results of this study by specifying individualized costs and sequencing characteristics. Conclusion: NG sequencing technologies are a highly flexible set of platforms that can be scaled to suit different project goals. In terms of sequence coverage alone, the NG sequencing is a dramatic advance
Henry, Robert J
Next generation Sequencing (NGS) provides a powerful tool for discovery of domestication genes in crop plants and their wild relatives. The accelerated domestication of new plant species as crops may be facilitated by this knowledge. Re-sequencing of domesticated genotypes can identify regions of low diversity associated with domestication. Species-specific data can be obtained from related wild species by whole-genome shot-gun sequencing. This sequence data can be used to design species specific polymerase chain reaction (PCR) primers. Sequencing of the products of PCR amplification of target genes can be used to explore genetic variation in large numbers of genes and gene families. Novel allelic variation in close or distant relatives can be characterized by NGS. Examples of recent applications of NGS to capture of genetic diversity for crop improvement include rice, sugarcane and Eucalypts. Populations of large numbers of individuals can be screened rapidly. NGS supports the rapid domestication of new plant species and the efficient identification and capture of novel genetic variation from related species.
Mammadov, Jafar; Ye, Liang; Soe, Khaing; Richey, Kimberly; Cruse, James; Zhuang, Meibao; Gao, Zhifang; Evans, Clive; Rounsley, Steve; Kumpatla, Siva P.
Demand for the commercial use of genetically modified (GM) crops has been increasing in light of the projected growth of world population to nine billion by 2050. A prerequisite of paramount importance for regulatory submissions is the rigorous safety assessment of GM crops. One of the components of safety assessment is molecular characterization at the DNA level, which helps to determine the copy number, integrity and stability of a transgene; characterize the integration site within a host genome; and confirm the absence of vector DNA. Historically, molecular characterization has been carried out using Southern blot analysis coupled with Sanger sequencing. While this is a robust approach to characterizing transgenic crops, it is both time- and resource-consuming. The emergence of next-generation sequencing (NGS) technologies has provided a highly sensitive and cost- and labor-effective alternative for molecular characterization compared to traditional Southern blot analysis. Herein, we have demonstrated the successful application of both whole genome sequencing and target capture sequencing approaches for the characterization of single and stacked transgenic events and compared the results and inferences with the traditional method with respect to key criteria required for regulatory submissions. PMID:26908260
Desmedt, Christine; Voet, Thierry; Sotiriou, Christos; Campbell, Peter J
Purpose of the review: We are currently on the threshold of a revolution in breast cancer research thanks to the emergence of novel technologies based on next generation sequencing (NGS). In this review, we will describe the different sequencing technologies and platforms, and summarize the main findings from the latest sequencing papers in breast cancer. Recent findings: First, the sequencing of a few hundred breast tumors has revealed new cancer genes. Although these were not frequently mutated, mutated genes from different patients could be grouped into the deregulation of similar pathways. Second, NGS allowed further exploration of intratumor heterogeneity and revealed that although subclonal mutations were present in all tumors, there was always a dominant clone which comprised at least 50% of the tumor cells. Finally, tumor-specific DNA rearrangements could be detected in the patient’s plasma, suggesting that NGS could be used to personalize the monitoring of the disease. Summary: The application of NGS to breast cancer has been associated with tremendous advances and promises for increasing the understanding of the disease. However, there still remain many unanswered questions, such as the role of structural changes of tumor genomes in cancer progression and treatment response/resistance. PMID:23014189
Kapgate, S S; Barbuddhe, S B; Kumanan, K
Increased globalisation, climatic changes and the wildlife-livestock interface have led to the emergence of novel viral pathogens or zoonoses that have become a serious concern for avian, animal and human health. High biodiversity and bird migration facilitate the spread of pathogens and provide reservoirs for emerging infectious diseases. Current classical diagnostic methods, which are designed to be virus-specific or limited to a group of viral agents, hinder the identification of novel viruses or viral variants. Recently developed approaches of next-generation sequencing (NGS) provide culture-independent methods that are useful for understanding viral diversity and discovering novel viruses, thereby enabling better diagnosis and disease control. This review discusses the different possible steps of an NGS study utilizing sequence-independent amplification, high-throughput sequencing and bioinformatics approaches to identify novel avian viruses and their diversity. NGS has led to the identification of a wide range of new viruses such as picobirnavirus, picornavirus, orthoreovirus and avian gamma coronavirus associated with fulminating disease in guinea fowl, and is also used in describing viral diversity among avian species. The review also briefly discusses areas of viral-host interaction and disease associated causalities with newly identified avian viruses.
Lohmann, Katja; Klein, Christine
The introduction of next generation sequencing (NGS) has led to an exponential increase of elucidated genetic causes in both extremely rare diseases and common but heterogeneous disorders. It can be applied to the whole or to selected parts of the genome (genome or exome sequencing, gene panels). NGS is not only useful in large extended families with linkage information, but may also be applied to detect de novo mutations or mosaicism in sporadic patients without a prior hypothesis about the mutated gene. Currently, NGS is applied in both research and clinical settings, and there is a rapid transition of research findings to diagnostic applications. These developments may greatly help to minimize the "diagnostic odyssey" for patients as whole-genome analysis can be performed in a few days at reasonable costs compared with gene-by-gene analysis based on Sanger sequencing following diverse clinical tests. Despite the enthusiasm about NGS, one has to keep in mind its limitations, such as limits on coverage and accuracy, and the need to define standards for NGS with respect to run quality and variant interpretation, as well as mechanisms of quality control. Further, there are ethical challenges including incidental findings and how to guide unaffected probands seeking direct-to-customer testing. However, taken together, the application of NGS in research and diagnostics provides a tremendous opportunity to better serve our patients.
Robin, Jérôme D; Ludlow, Andrew T; LaRanger, Ryan; Wright, Woodring E; Shay, Jerry W
Next Generation Sequencing (NGS) is a powerful tool that depends on loading a precise amount of DNA onto a flowcell. NGS strategies have expanded our ability to investigate genomic phenomena by referencing mutations in cancer and diseases through large-scale genotyping, developing methods to map rare chromatin interactions (4C; 5C and Hi-C) and identifying chromatin features associated with regulatory elements (ChIP-seq, Bis-Seq, ChIA-PET). While many methods are available for DNA library quantification, there is no unambiguous gold standard. Most techniques use PCR to amplify DNA libraries to obtain sufficient quantities for optical density measurement. However, increased PCR cycles can distort the library's heterogeneity and prevent the detection of rare variants. In this analysis, we compared new digital PCR technologies (droplet digital PCR; ddPCR, ddPCR-Tail) with standard methods for the titration of NGS libraries. DdPCR-Tail is comparable to qPCR and fluorometry (QuBit) and allows sensitive quantification by analysis of barcode repartition after sequencing of multiplexed samples. This study provides a direct comparison between quantification methods throughout a complete sequencing experiment and provides the impetus to use ddPCR-based quantification for improvement of NGS quality.
Background: The throughput of next-generation sequencing machines has increased dramatically over the last few years; yet the cost and time for library preparation have not changed proportionally, thus representing the main bottleneck for sequencing large numbers of samples. Here we present an economical, high-throughput library preparation method for the Illumina platform, comprising a 96-well based method for DNA isolation for yeast cells, a low-cost DNA shearing alternative, and adapter ligation using heat inactivation of enzymes instead of bead cleanups. Results: Up to 384 whole-genome libraries can be prepared from yeast cells in one week using this method, for less than 15 euros per sample. We demonstrate the robustness of this protocol by sequencing over 1000 yeast genomes at ~30x coverage. The sequence information from 768 yeast segregants derived from two divergent S. cerevisiae strains was used to generate a meiotic recombination map at unprecedented resolution. Comparisons to other datasets indicate a high conservation of recombination at a chromosome-wide scale, but differences at the local scale. Additionally, we detected a high degree of aneuploidy (3.6%) by examining the sequencing coverage in these segregants. Differences in allele frequency allowed us to attribute instances of aneuploidy to gains of chromosomes during meiosis or mitosis, both of which showed a strong tendency to missegregate specific chromosomes. Conclusions: Here we present a high-throughput workflow to sequence the genomes of large numbers of yeast strains at a low price. We have used this workflow to obtain recombination and aneuploidy data from hundreds of segregants, which can serve as a foundation for future studies of linkage, recombination, and chromosomal aberrations in yeast and higher eukaryotes.
Ellis, Jeremy E; Missan, Dara S; Shabilla, Matthew; Martinez, Delyn; Fry, Stephen E
Currently, there is a critical need to rapidly identify infectious organisms in clinical samples. Next-Generation Sequencing (NGS) could surmount the deficiencies of culture-based methods; however, there are no standardized, automated programs to process NGS data. To address this deficiency, we developed the Rapid Infectious Disease Identification (RIDI™) system. The system requires minimal guidance, which reduces operator errors. The system is compatible with the three major NGS platforms. It automatically interfaces with the sequencing system, detects their data format, configures the analysis type, applies appropriate quality control, and analyzes the results. Sequence information is characterized using both the NCBI database and RIDI™ specific databases. RIDI™ was designed to identify high probability sequence matches and more divergent matches that could represent different or novel species. We challenged the system using defined American Type Culture Collection (ATCC) reference standards of 27 species, both individually and in varying combinations. The system was able to rapidly detect known organisms in DNA sequence reads at the genus-level and 75.3% at the species-level in reference standards. It has a limit of detection of 146 cells/ml in simulated clinical samples, and is also able to identify the components of polymicrobial samples with 16.9% discrepancy at the genus-level and 31.2% at the species-level. Thus, the system's effectiveness may exceed current methods, especially in situations where culture methods could produce false negatives or where rapid results would influence patient outcomes. Copyright © 2016 Elsevier B.V. All rights reserved.
Hegele, Robert A; Ban, Matthew R; Cao, Henian; McIntyre, Adam D; Robinson, John F; Wang, Jian
To evaluate the potential clinical translation of high-throughput next-generation sequencing (NGS) methods in diagnosis and management of dyslipidemia. Recent NGS experiments indicate that most causative genes for monogenic dyslipidemias are already known. Thus, monogenic dyslipidemias can now be diagnosed using targeted NGS. Targeting of dyslipidemia genes can be achieved by either: designing custom reagents for a dyslipidemia-specific NGS panel; or performing genome-wide NGS and focusing on genes of interest. Advantages of the former approach are lower cost and limited potential to detect incidental pathogenic variants unrelated to dyslipidemia. However, the latter approach is more flexible because masking criteria can be altered as knowledge advances, with no need for re-design of reagents or follow-up sequencing runs. Also, the cost of genome-wide analysis is decreasing and ethical concerns can likely be mitigated. DNA-based diagnosis is already part of the clinical diagnostic algorithms for familial hypercholesterolemia. Furthermore, DNA-based diagnosis is supplanting traditional biochemical methods to diagnose chylomicronemia caused by deficiency of lipoprotein lipase or its co-factors. The increasing availability and decreasing cost of clinical NGS for dyslipidemia means that its potential benefits can now be evaluated on a larger scale.
Zhang, Jun; Chiodini, Rod; Badr, Ahmed; Zhang, Genfa
This article reviews basic concepts, general applications, and the potential impact of next-generation sequencing (NGS) technologies on genomics, with particular reference to currently available and possible future platforms and bioinformatics. NGS technologies have demonstrated the capacity to sequence DNA at unprecedented speed, thereby enabling previously unimaginable scientific achievements and novel biological applications. But, the massive data produced by NGS also presents a significant challenge for data storage, analyses, and management solutions. Advanced bioinformatic tools are essential for the successful application of NGS technology. As evidenced throughout this review, NGS technologies will have a striking impact on genomic research and the entire biological field. With its ability to tackle the unsolved challenges unconquered by previous genomic technologies, NGS is likely to unravel the complexity of the human genome in terms of genetic variations, some of which may be confined to susceptible loci for some common human conditions. The impact of NGS technologies on genomics will be far reaching and likely change the field for years to come. Copyright © 2011. Published by Elsevier Ltd.
Connor, Ashton A; Gallinger, Steven
Pancreatic ductal adenocarcinoma (PDAC) has the highest mortality rate of all epithelial malignancies and a paradoxically rising incidence rate. Clinical translation of next generation sequencing (NGS) of tumour and germline samples may ameliorate outcomes by identifying prognostic and predictive genomic and transcriptomic features in appreciable fractions of patients, facilitating enrolment in biomarker-matched trials. Areas covered: The literature on precision oncology is reviewed. It is found that outcomes may be improved across various malignancies, and it is suggested that current issues of adequate tissue acquisition, turnaround times, analytic expertise and clinical trial accessibility may lessen as experience accrues. Also reviewed are PDAC genomic and transcriptomic NGS studies, emphasizing discoveries of promising biomarkers, though these require validation, and the fraction of patients that will benefit from these outside of the research setting is currently unknown. Expert commentary: Clinical use of NGS with PDAC should be used in investigational contexts in centers with multidisciplinary expertise in cancer sequencing and pancreatic cancer management. Biomarker directed studies will improve our understanding of actionable genomic variation in PDAC, and improve outcomes for this challenging disease.
Rossello, Fernando J.; Tothill, Richard W.; Britt, Kara; Marini, Kieren D.; Falzon, Jeanette; Thomas, David M.; Peacock, Craig D.; Marchionni, Luigi; Li, Jason; Bennett, Samara; Tantoso, Erwin; Brown, Tracey; Chan, Philip; Martelotto, Luciano G.; Watkins, D. Neil
Next-generation sequencing (NGS) studies in cancer are limited by the amount, quality and purity of tissue samples. In this situation, primary xenografts have proven useful preclinical models. However, the presence of mouse-derived stromal cells represents a technical challenge to their use in NGS studies. We examined this problem in an established primary xenograft model of small cell lung cancer (SCLC), a malignancy often diagnosed from small biopsy or needle aspirate samples. Using an in silico strategy that assigns reads according to species-of-origin, we prospectively compared NGS data from primary xenograft models with matched cell lines and with published datasets. We show here that low-coverage whole-genome analysis demonstrated remarkable concordance between published genome data and internal controls, despite the presence of mouse genomic DNA. Exome capture sequencing revealed that this enrichment procedure was highly species-specific, with less than 4% of reads aligning to the mouse genome. Human-specific expression profiling with RNA-Seq replicated array-based gene expression experiments, whereas mouse-specific transcript profiles correlated with published datasets from human cancer stroma. We conclude that primary xenografts represent a useful platform for complex NGS analysis in cancer research for tumours with limited sample resources, or those with prominent stromal cell populations. PMID:24086345
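The abstract above does not spell out how reads were assigned to a species of origin. As a rough illustration only, one common idea is to align each read to both the human and the mouse reference and keep the better-scoring assignment. The sketch below assumes hypothetical per-read alignment scores (higher is better, `None` for no alignment); the function names and the `min_margin` parameter are invented for this example and are not taken from the study.

```python
from collections import Counter


def assign_species(human_score, mouse_score, min_margin=1):
    """Label one read by the reference it aligns to better.

    Scores are hypothetical alignment scores (higher is better);
    None means the read did not align to that reference.
    """
    if human_score is None and mouse_score is None:
        return "unaligned"
    if mouse_score is None:
        return "human"
    if human_score is None:
        return "mouse"
    if human_score - mouse_score >= min_margin:
        return "human"
    if mouse_score - human_score >= min_margin:
        return "mouse"
    # Scores too close to call: keep the read out of both species bins.
    return "ambiguous"


def species_fractions(score_pairs):
    """Return the fraction of reads per label for (human, mouse) score pairs."""
    counts = Counter(assign_species(h, m) for h, m in score_pairs)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}
```

For example, `species_fractions([(60, 30), (None, 40), (50, 50), (60, 10)])` labels two reads as human, one as mouse and one as ambiguous. A real pipeline would read these scores from alignment files rather than from tuples.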
Simbolo, Michele; Gottardi, Marisa; Corbo, Vincenzo; Fassan, Matteo; Mafficini, Andrea; Malpeli, Giorgio; Lawlor, Rita T.; Scarpa, Aldo
Histopathological samples are a treasure-trove of DNA for clinical research. However, the quality of DNA can vary depending on the source or extraction method applied. Thus a standardized and cost-effective workflow for the qualification of DNA preparations is essential to guarantee interlaboratory reproducible results. The qualification process consists of the quantification of double strand DNA (dsDNA) and the assessment of its suitability for downstream applications, such as high-throughput next-generation sequencing. We tested the two most frequently used instrumentations to define their role in this process: NanoDrop, based on UV spectroscopy, and Qubit 2.0, which uses fluorochromes specifically binding dsDNA. Quantitative PCR (qPCR) was used as the reference technique as it simultaneously assesses DNA concentration and suitability for PCR amplification. We used 17 genomic DNAs from 6 fresh-frozen (FF) tissues, 6 formalin-fixed paraffin-embedded (FFPE) tissues, 3 cell lines, and 2 commercial preparations. Intra- and inter-operator variability was negligible, and intra-methodology variability was minimal, while consistent inter-methodology divergences were observed. In fact, NanoDrop measured DNA concentrations higher than Qubit and its consistency with dsDNA quantification by qPCR was limited to high molecular weight DNA from FF samples and cell lines, where total DNA and dsDNA quantity virtually coincide. In partially degraded DNA from FFPE samples, only Qubit proved highly reproducible and consistent with qPCR measurements. Multiplex PCR amplifying 191 regions of 46 cancer-related genes was designated the downstream application, using 40 ng dsDNA from FFPE samples calculated by Qubit. All but one sample produced amplicon libraries suitable for next-generation sequencing. NanoDrop UV-spectrum verified contamination of the unsuccessful sample. In conclusion, as qPCR has high costs and is labor intensive, an alternative effective standard workflow for
Gelderman, Grant; Contreras, Lydia M
Next generation sequencing (NGS) has revolutionized the way by which we engineer metabolism by radically altering the path to genome-wide inquiries. This is due to the fact that NGS approaches offer several powerful advantages over traditional methods that include the ability to fully sequence hundreds to thousands of genes in a single experiment and simultaneously detect homozygous and heterozygous deletions, alterations in gene copy number, insertions, translocations, and exome-wide substitutions that include "hot-spot mutations." This chapter describes the use of these technologies as a sequencing technique for transcriptome analysis and discovery of regulatory RNA elements in the context of three main platforms: Illumina HiSeq, 454 pyrosequencing, and SOLiD sequencing. Specifically, this chapter focuses on the use of Illumina HiSeq, since it is the most widely used platform for RNA discovery and transcriptome analysis. Regulatory RNAs have now been found in all branches of life. In bacteria, noncoding small RNAs (sRNAs) are involved in highly sophisticated regulatory circuits that include quorum sensing, carbon metabolism, stress responses, and virulence (Gorke and Vogel, Gene Dev 22:2914-2925, 2008; Gottesman, Trends Genet 21:399-404, 2005; Romby et al., Curr Opin Microbiol 9:229-236, 2006). Further characterization of the underlying regulation of gene expression remains poorly understood given that it is estimated that over 60% of all predicted genes remain hypothetical and the 5' and 3' untranslated regions are unknown for more than 90% of the genes (Siegel et al., Trends Parasitol 27:434-441, 2011). Importantly, manipulation of the posttranscriptional regulation that occurs at the level of RNA stability and export, trans-splicing, polyadenylation, protein translation, and protein stability via untranslated regions (Clayton, EMBO J 21:1881-1888, 2002; Haile and Papadopoulou, Curr Opin Microbiol 10:569-577, 2007) could be highly beneficial to metabolic
Background: Life-science laboratories make increasing use of Next Generation Sequencing (NGS) for studying bio-macromolecules and their interactions. Array-based methods for measuring gene expression or protein-DNA interactions are being replaced by RNA-Seq and ChIP-Seq. Sequencing is generally performed by specialized facilities that have to keep track of sequencing requests, trace samples, ensure quality and make data available according to predefined privileges. An integrated tool helps to troubleshoot problems, to maintain a high quality standard, and to reduce time and costs. Commercial and non-commercial tools called LIMS (Laboratory Information Management Systems) are available for this purpose. However, they often come at prohibitive cost and/or lack the flexibility and scalability needed to adjust seamlessly to the frequently changing protocols employed. In order to manage the flow of sequencing data produced at the Genomic Unit of the Italian Institute of Technology (IIT), we developed SMITH (Sequencing Machine Information Tracking and Handling). Methods: SMITH is a web application with a MySQL server at the backend. SMITH was developed by wet-lab scientists of the Centre for Genomic Science and database experts from the Politecnico of Milan in the context of a Genomic Data Model Project. The database schema stores all the information of an NGS experiment, including the descriptions of all protocols and algorithms used in the process. Notably, an attribute-value table allows associating an unconstrained textual description to each sample and all the data produced afterwards. This method permits the creation of metadata that can be used to search the database for specific files as well as for statistical analyses. Results: SMITH runs automatically and limits direct human interaction mainly to administrative tasks. SMITH data-delivery procedures were standardized, making it easier for biologists and analysts to navigate the data. Automation also helps save time. The
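The attribute-value table mentioned in the Methods can be illustrated with a minimal sketch. It uses Python's built-in `sqlite3` module as a stand-in for the MySQL backend, and the table names, column names and helper functions are hypothetical; the actual SMITH schema is not given in the abstract.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a sample table and an
    attribute-value table (hypothetical names, not the SMITH schema)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE sample (
            id   INTEGER PRIMARY KEY,
            name TEXT NOT NULL
        );
        CREATE TABLE sample_attribute (
            sample_id INTEGER NOT NULL REFERENCES sample(id),
            attribute TEXT NOT NULL,
            value     TEXT NOT NULL
        );
        """
    )
    return conn


def add_sample(conn, name, attributes):
    """Insert a sample plus any number of free-text attribute/value pairs."""
    cur = conn.execute("INSERT INTO sample (name) VALUES (?)", (name,))
    sample_id = cur.lastrowid
    conn.executemany(
        "INSERT INTO sample_attribute (sample_id, attribute, value) VALUES (?, ?, ?)",
        [(sample_id, key, val) for key, val in attributes.items()],
    )
    return sample_id


def find_samples(conn, attribute, value):
    """Search samples by metadata without needing a dedicated column."""
    rows = conn.execute(
        "SELECT s.name FROM sample s "
        "JOIN sample_attribute a ON a.sample_id = s.id "
        "WHERE a.attribute = ? AND a.value = ? ORDER BY s.name",
        (attribute, value),
    )
    return [row[0] for row in rows]
```

The design trade-off is that new kinds of metadata need no schema change, at the cost of weaker type checking and joins on every metadata query.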
Ivanova, Natalia V; Kuzmina, Maria L; Braukmann, Thomas W A; Borisenko, Alex V; Zakharov, Evgeny V
DNA-based testing has been gaining acceptance as a tool for authentication of a wide range of food products; however, its applicability for testing of herbal supplements remains contentious. We utilized Sanger and Next-Generation Sequencing (NGS) for taxonomic authentication of fifteen herbal supplements representing three different producers from five medicinal plants: Echinacea purpurea, Valeriana officinalis, Ginkgo biloba, Hypericum perforatum and Trigonella foenum-graecum. Experimental design included three modifications of DNA extraction, two lysate dilutions, Internal Amplification Control, and multiple negative controls to exclude background contamination. Ginkgo supplements were also analyzed using HPLC-MS for the presence of active medicinal components. All supplements yielded DNA from multiple species, rendering Sanger sequencing results for rbcL and ITS2 regions either uninterpretable or non-reproducible between the experimental replicates. Overall, DNA from the manufacturer-listed medicinal plants was successfully detected in seven out of eight dry herb form supplements; however, low or poor DNA recovery due to degradation was observed in most plant extracts (none detected by Sanger; three out of seven by NGS). NGS also revealed a diverse community of fungi, known to be associated with live plant material and/or the fermentation process used in the production of plant extracts. HPLC-MS testing demonstrated that Ginkgo supplements with degraded DNA contained ten key medicinal components. Quality control of herbal supplements should utilize a synergetic approach targeting both DNA and bioactive components, especially for standardized extracts with degraded DNA. The NGS workflow developed in this study enables reliable detection of plant and fungal DNA and can be utilized by manufacturers for quality assurance of raw plant materials, contamination control during the production process, and the final product. Interpretation of results should involve an
Cseke Leland J
Background: Mycorrhizae, symbiotic interactions between soil fungi and tree roots, are ubiquitous in terrestrial ecosystems. The fungi contribute phosphorus, nitrogen and mobilized nutrients from organic matter in the soil and in return the fungus receives photosynthetically-derived carbohydrates. This union of plant and fungal metabolisms is the mycorrhizal metabolome. Understanding this symbiotic relationship at a molecular level provides important contributions to the understanding of forest ecosystems and global carbon cycling. Results: We generated next generation short-read transcriptomic sequencing data from fully-formed ectomycorrhizae between Laccaria bicolor and aspen (Populus tremuloides) roots. The transcriptomic data was used to identify statistically significantly expressed gene models using a bootstrap-style approach, and these expressed genes were mapped to specific metabolic pathways. Integration of expressed genes that code for metabolic enzymes and the set of expressed membrane transporters generates a predictive model of the ectomycorrhizal metabolome. The generated model of mycorrhizal metabolome predicts that the specific compounds glycine, glutamate, and allantoin are synthesized by L. bicolor and that these compounds or their metabolites may be used for the benefit of aspen in exchange for the photosynthetically-derived sugars fructose and glucose. Conclusions: The analysis illustrates an approach to generate testable biological hypotheses to investigate the complex molecular interactions that drive ectomycorrhizal symbiosis. These models are consistent with experimental environmental data and provide insight into the molecular exchange processes for organisms in this complex ecosystem. The method used here for predicting metabolomic models of mycorrhizal systems from deep RNA sequencing data can be generalized and is broadly applicable to transcriptomic data derived from complex systems.
Zhang, Ran; Yin, Yinliang; Zhang, Yujun; Li, Kexin; Zhu, Hongxia; Gong, Qin; Wang, Jianwu; Hu, Xiaoxiang; Li, Ning
As the number of transgenic livestock increases, reliable detection and molecular characterization of transgene integration sites and copy number are crucial not only for interpreting the relationship between the integration site and the specific phenotype but also for commercial and economic demands. However, the ability of conventional PCR techniques to detect incomplete and multiple integration events is limited, making it technically challenging to characterize transgenes. Next-generation sequencing has enabled cost-effective, routine and widespread high-throughput genomic analysis. Here, we demonstrate the use of next-generation sequencing to extensively characterize cattle harboring a 150-kb human lactoferrin transgene that was initially analyzed by chromosome walking without success. Using this approach, the sites upstream and downstream of the target gene integration site in the host genome were identified at the single nucleotide level. The sequencing result was verified by event-specific PCR for the integration sites and FISH for the chromosomal location. Sequencing depth analysis revealed that multiple copies of the incomplete target gene and the vector backbone were present in the host genome. Upon integration, complex recombination was also observed between the target gene and the vector backbone. These findings indicate that next-generation sequencing is a reliable and accurate approach for the molecular characterization of the transgene sequence, integration sites and copy number in transgenic species.
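The abstract reports that sequencing depth analysis revealed multiple transgene copies but gives no formula. A common back-of-the-envelope approach, shown here purely as an assumption-laden sketch, compares mean read depth over the region of interest with mean depth over host regions of known copy number. The function name, the parameter `baseline_copies` and the example depths are hypothetical.

```python
from statistics import mean


def estimate_copies_per_cell(region_depths, baseline_depths, baseline_copies=2):
    """Estimate copy number of a region from read depth.

    region_depths:   per-position depths over the region of interest
    baseline_depths: per-position depths over host regions assumed to be
                     present at `baseline_copies` copies per cell
    """
    baseline = mean(baseline_depths)
    if baseline <= 0:
        raise ValueError("baseline depth must be positive")
    # Depth scales roughly linearly with copy number under uniform coverage.
    return baseline_copies * mean(region_depths) / baseline
```

With these made-up numbers, `estimate_copies_per_cell([45, 45, 45], [30, 30, 30])` gives 3.0 copies per cell. Real data would also need correction for GC content, mappability and partial (incomplete) copies, which this sketch ignores.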
Milicchio, Franco; Rose, Rebecca; Bian, Jiang; Min, Jae; Prosperi, Mattia
High-throughput or next-generation sequencing (NGS) technologies have become an established and affordable experimental framework in biological and medical sciences for all basic and translational research. Processing and analyzing NGS data is challenging. NGS data are big, heterogeneous, sparse, and error prone. Although a plethora of tools for NGS data analysis has emerged in the past decade, (i) software development is still lagging behind data generation capabilities, and (ii) there is a 'cultural' gap between the end user and the developer. Generic software template libraries specifically developed for NGS can help in dealing with the former problem, whilst coupling template libraries with visual programming may help with the latter. Here we scrutinize the state-of-the-art low-level software libraries implemented specifically for NGS and graphical tools for NGS analytics. An ideal developing environment for NGS should be modular (with a native library interface), scalable in computational methods (i.e. serial, multithread, distributed), transparent (platform-independent), interoperable (with external software interface), and usable (via an intuitive graphical user interface). These characteristics should facilitate both the run of standardized NGS pipelines and the development of new workflows based on technological advancements or users' needs. We discuss in detail the potential of a computational framework blending generic template programming and visual programming that addresses all of the current limitations. In the long term, a proper, well-developed (although not necessarily unique) software framework will bridge the current gap between data generation and hypothesis testing. This will eventually facilitate the development of novel diagnostic tools embedded in routine healthcare.
Introduction: It has become increasingly difficult to attain high resolution HLA typing without ambiguities when employing SBT and PCR-SSP. NGS is well-suited for HLA typing as it delivers highly accurate and unambiguous results. We validated an NGS protocol developed for use with the Illumina MiSeq. Methods and Materials: Without any prior NGS experience, we implemented a protocol consisting of LR-PCR of 5 loci (HLA-A, B, C, DRB1, and DQB1) and library prep with two different indexing strategies: a) locus-specific indexing and b) sample-specific indexing. Sequencing was paired-end 250 bp sequencing, and sequence analysis was performed with Twin HLA (Omixon). Results: Two sequencing runs of the Omixon Holotype X4 kit yielded an average output of 5.3 Gb per run. Analysis was limited to 20K reads for all the indexes resulting in an average consensus coverage of 332 for locus specific indexing and 59...
Lloyd Rhiannon E
-coding genes were shown to be under strong negative (purifying) selection, with genes under the strongest pressure (Complex 4) also being the most highly expressed, highlighting their potentially crucial functions in the mitochondrial respiratory chain. Conclusions: Next generation sequencing of long-PCR amplicons using single taxon or multi-taxon approaches enabled two new species of Xenopus mtDNA to be fully characterized. We anticipate our complete mitochondrial genome amplification methods to be applicable to other amphibians, helpful for identifying the most appropriate markers for differentiating species, populations and resolving phylogenies, a pressing need since amphibians are undergoing drastic global decline. Our mtDNAs also provide templates for conserved primer design and the assembly of RNA and DNA reads following high throughput “omic” techniques such as RNA- and ChIP-seq. These could help us better understand how processes such as mitochondrial replication and gene expression influence Xenopus growth and development, as well as how they evolved and are regulated.
Joensen, Katrine Grimstrup; Engsbro, A L Ø; Lukjancenko, Oksana
The accurate microbiological diagnosis of diarrhoea involves numerous laboratory tests and, often, the pathogen is not identified in time to guide clinical management. With next-generation sequencing (NGS) becoming cheaper, it has huge potential in routine diagnostics. The aim of this study was to evaluate the potential of NGS-based diagnostics through direct sequencing of faecal samples. Fifty-eight clinical faecal samples were obtained from patients with diarrhoea as part of the routine diagnostics at Hvidovre University Hospital, Denmark. Ten samples from healthy individuals were also included...
Tabatabaeifar, Siavosh; Kruse, Torben A; Thomassen, Mads
Head and neck squamous cell carcinoma (HNSCC) can primarily be attributed to alcohol consumption, tobacco use and infection with human papilloma virus. The heterogeneous nature of HNSCC has exposed a lack of tools for clinicians to provide more accurate prognosis. There is a need for biomarkers that can characterise the diversity of the cancer, and perhaps in the future, some of these biomarkers can point to targets for use in targeted and personalised medicine. The introduction of next generation sequencing (NGS) has allowed researchers to sequence thousands of genes at a time through fast...
Mende, Daniel R; Waller, Alison S; Sunagawa, Shinichi
Due to the complexity of the protocols and a limited knowledge of the nature of microbial communities, simulating metagenomic sequences plays an important role in testing the performance of existing tools and data analysis methods with metagenomic data. We developed metagenomic read simulators with platform-specific (Sanger, pyrosequencing, Illumina) base-error models, and simulated metagenomes of differing community complexities. We first evaluated the effect of rigorous quality control on Illumina data. Although quality filtering removed a large proportion of the data, it greatly improved the accuracy and contig lengths of resulting assemblies. We then compared the quality-trimmed Illumina assemblies to those from Sanger and pyrosequencing. For the simple community (10 genomes) all sequencing technologies assembled a similar amount and accurately represented the expected functional composition...
Cabanski Christopher R
Background: Next-generation sequencing technologies have become important tools for genome-wide studies. However, the quality scores that are assigned to each base have been shown to be inaccurate. If the quality scores are used in downstream analyses, these inaccuracies can have a significant impact on the results. Results: Here we present ReQON, a tool that recalibrates the base quality scores from an input BAM file of aligned sequencing data using logistic regression. ReQON also generates diagnostic plots showing the effectiveness of the recalibration. We show that ReQON produces quality scores that are both more accurate, in the sense that they more closely correspond to the probability of a sequencing error, and do a better job of discriminating between sequencing errors and non-errors than the original quality scores. We also compare ReQON to other available recalibration tools and show that ReQON is less biased and performs favorably in terms of quality score accuracy. Conclusion: ReQON is an open source software package, written in R and available through Bioconductor, for recalibrating base quality scores for next-generation sequencing data. ReQON produces a new BAM file with more accurate quality scores, which can improve the results of downstream analysis, and produces several diagnostic plots showing the effectiveness of the recalibration.
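To make the notion of an "accurate" quality score concrete, the following is a minimal Python sketch of the standard Phred relation Q = -10 log10(p) between a quality score and the probability of a base-calling error, together with an empirical quality computed from observed mismatches. It is not ReQON's logistic-regression model; the function names and the add-one smoothing are assumptions made for illustration only.

```python
import math


def phred_to_error_prob(q):
    """Error probability implied by a Phred quality score q."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p):
    """Phred quality score implied by an error probability p (0 < p <= 1)."""
    return -10 * math.log10(p)


def empirical_quality(mismatches, total):
    """Phred score of the observed error rate among `total` aligned bases.

    Add-one smoothing (an illustrative assumption) avoids log10(0) when no
    mismatches are observed.
    """
    rate = (mismatches + 1) / (total + 2)
    return error_prob_to_phred(rate)


if __name__ == "__main__":
    reported_q = 30
    observed_q = empirical_quality(mismatches=5, total=1000)
    # A large gap between reported and observed quality suggests that the
    # reported scores would benefit from recalibration.
    print(reported_q, round(observed_q, 2))
```

A recalibration tool would, in effect, replace each reported score with a value closer to the empirical one, using covariates beyond the reported score itself.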
Ballester, Leomar Y; Luthra, Rajyalakshmi; Kanagal-Shamanna, Rashmi; Singh, Rajesh R
The huge parallel sequencing capabilities of next generation sequencing technologies have made them the tools of choice to characterize genomic aberrations for research and diagnostic purposes. For clinical applications, screening the whole genome or exome is challenging owing to the large genomic area to be sequenced, associated costs, complexity of data, and lack of known clinical significance of all genes. Consequently, routine screening involves limited markers with established clinical relevance. This process, referred to as targeted genome sequencing, requires selective enrichment of the genomic areas comprising these markers via one of several primer or probe-based enrichment strategies, followed by sequencing of the enriched genomic areas. Here, the authors review current target enrichment approaches and next generation sequencing platforms, focusing on the underlying principles, capabilities, and limitations of each technology along with validation and implementation for clinical testing.
Decoding DNA symbols using next-generation sequencers was a major breakthrough in genomic research. Despite the many advantages of next-generation sequencers, e.g., the high-throughput sequencing rate and relatively low cost of sequencing, the assembly of the reads produced by these sequencers still remains a major challenge. In this review, we address the basic framework of next-generation genome sequence assemblers, which comprises four basic stages: preprocessing filtering, a graph construction process, a graph simplification process, and postprocessing filtering. Here we discuss them as a framework of four stages for data analysis and processing and survey a variety of techniques, algorithms, and software tools used during each stage. We also discuss the challenges that face current assemblers in the next-generation environment to determine the current state-of-the-art. We recommend a layered architecture approach for constructing a general assembler that can handle the sequences generated by different sequencing platforms.
Next generation sequencing (NGS) instruments produce an unprecedented amount of sequence data at contained costs. This gives researchers the possibility of designing studies with adequate power to identify rare variants at a fraction of the economic and labor resources required by individual Sanger sequencing. As of today, only three research groups working in plant sciences have exploited this potentiality. They showed that pooled NGS can provide results in excellent agreement with those obtained by individual Sanger sequencing. The aim of this review is to convey to the reader the general ideas underlying the use of pooled NGS for the identification of rare variants. To facilitate a thorough understanding of the possibilities of the method we will explain in detail the variations in study design and discuss their advantages and disadvantages. We will show that information on allele frequency obtained by pooled next generation sequencing can be used to accurately compute basic population genetics indexes such as allele frequency, nucleotide diversity and Tajima’s D. Finally we will discuss applications and future perspectives of the multiplexed NGS approach.
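As an illustration of how allele frequencies feed into such indexes, the sketch below computes per-site expected heterozygosity with the usual n/(n-1) sample-size correction and averages it over a region to obtain nucleotide diversity. This is the classic estimator for individually sampled chromosomes; it deliberately ignores the additional read-depth sampling noise that estimators designed for pooled data correct for. The function names and inputs are hypothetical and not taken from the review.

```python
def site_heterozygosity(allele_freqs, n_chromosomes):
    """Unbiased expected heterozygosity at one site.

    allele_freqs: frequencies of the alleles at the site (summing to 1).
    n_chromosomes: number of chromosomes sampled (must be > 1).
    """
    homozygosity = sum(p * p for p in allele_freqs)
    return n_chromosomes / (n_chromosomes - 1) * (1.0 - homozygosity)


def nucleotide_diversity(variable_sites, n_chromosomes, region_length):
    """Average per-site diversity over a region.

    variable_sites: one allele-frequency list per polymorphic site; sites
    that are not listed are monomorphic and contribute zero.
    """
    total = sum(site_heterozygosity(f, n_chromosomes) for f in variable_sites)
    return total / region_length


if __name__ == "__main__":
    sites = [[0.6, 0.4], [0.9, 0.1]]
    print(nucleotide_diversity(sites, n_chromosomes=10, region_length=100))
```

Tajima's D additionally compares this quantity with an estimate based on the number of segregating sites, which is omitted here for brevity.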
generation sequencing (NGS) has numerous advantages compared with conventional typing techniques. ..... SSP = sequence-specific primers; bold font highlights the differences observed between conventional techniques and MR and HR.
Fumagalli, Matteo; Garrett Vieira, Filipe Jorge; Korneliussen, Thorfinn Sand
Over the last few years, new high-throughput DNA sequencing technologies have dramatically increased speed and reduced sequencing costs. However, the use of these sequencing technologies is often challenged by errors and biases associated with the bioinformatical methods used for analyzing the da...
Over 70 different Charcot-Marie-Tooth disease (CMT)-associated genes have now been discovered and their number is growing. Conventional genetic testing for all CMT genes is cumbersome, expensive, and impractical in an individual patient. Next-generation sequencing (NGS) technology allows cost-effective sequencing of large-scale DNA, even the entire exome (coding sequences) or whole genome, and thus NGS platforms can be employed to effectively target a large number or all CMT-related genes for accurate diagnosis. This overview discusses how NGS can be strategically used for genetic diagnosis in patients with CMT or unexplained neuropathy. A comment is made on combining a simple clinical and electrophysiological algorithm to assign patients to major CMT subtypes and then employing NGS to screen for all known mutations in the subtype-specific CMT gene panel.
Warton, Kristina; Lin, Vita; Navin, Tina; Armstrong, Nicola J; Kaplan, Warren; Ying, Kevin; Gloss, Brian; Mangs, Helena; Nair, Shalima S; Hacker, Neville F; Sutherland, Robert L; Clark, Susan J; Samimi, Goli
Free circulating DNA (fcDNA) has many potential clinical applications, due to the non-invasive way in which it is collected. However, because of the low concentration of fcDNA in blood, genome-wide analysis carries many technical challenges that must be overcome before fcDNA studies can reach their full potential. There are currently no definitive standards for fcDNA collection, processing and whole-genome sequencing. We report novel detailed methodology for the capture of high-quality methylated fcDNA, library preparation and downstream genome-wide Next-Generation Sequencing. We also describe the effects of sample storage, processing and scaling on fcDNA recovery and quality. Use of serum versus plasma, and storage of blood prior to separation resulted in genomic DNA contamination, likely due to leukocyte lysis. Methylated fcDNA fragments were isolated from 5 donors using a methyl-binding protein-based protocol and appear as a discrete band of ~180 bases. This discrete band allows minimal sample loss at the size restriction step in library preparation for Next-Generation Sequencing, allowing for high-quality sequencing from minimal amounts of fcDNA. Following sequencing, we obtained 37 × 10^6 to 86 × 10^6 unique mappable reads, representing more than 50% of total mappable reads. The methylation status of 9 genomic regions as determined by DNA capture and sequencing was independently validated by clonal bisulphite sequencing. Our optimized methods provide high-quality methylated fcDNA suitable for whole-genome sequencing, and allow good library complexity and accurate sequencing, despite using less than half of the recommended minimum input DNA.
Vuyisich, Momchilo [Los Alamos National Laboratory]
NGS technology overview: (1) NGS library preparation - Nucleic acids extraction, Sample quality control, RNA conversion to cDNA, Addition of sequencing adapters, Quality control of library; (2) Sequencing - Clonal amplification of library fragments (except PacBio), Sequencing by synthesis, Data output (reads and quality); and (3) Data analysis - Read mapping, Genome assembly, Gene expression, Operon structure, sRNA discovery, and Epigenetic analyses.
Technological developments in DNA sequencing are an excellent example of how major advances in scientific techniques can lead to numerous conceptual discoveries in the life sciences. Starting in the 1970s with the development of DNA sequencing by chain termination methods, through the introduction
Novel DNA sequencing techniques, referred to as “next-generation” sequencing (NGS), provide high speed and throughput that can produce an enormous volume of sequences with many possible applications in research and diagnostic settings. In this article, we provide an overview of the many applications of NGS in diagnostic virology. NGS techniques have been used for high-throughput whole viral genome sequencing, such as sequencing of new influenza viruses, for detection of viral genome variability and evolution within the host, such as investigation of human immunodeficiency virus and human hepatitis C virus quasispecies, and monitoring of low-abundance antiviral drug-resistance mutations. NGS techniques have been applied to metagenomics-based strategies for the detection of unexpected disease-associated viruses and for the discovery of novel human viruses, including cancer-related viruses. Finally, the human virome in healthy and disease conditions has been described by NGS-based metagenomics.
Wain, John; Keddy, Karen H.; Hendriksen, Rene S.
The publication of studies using next generation sequencing to analyse large numbers of bacterial isolates from global epidemics is transforming microbiology, epidemiology and public health. The emergence of multidrug resistant Salmonella Typhimurium ST313 is one example. While the epidemiology...
There are nearly 7000 rare diseases that have been reported in the world. Although most of them occur with a frequency of less than one in 2000, in total about 6% of the population suffers from rare diseases. These rare diseases are often caused by changes in genes, and effective treatment is currently lacking. The rapid development of next generation sequencing technology boosts the discovery of new causative genes for these rare diseases, as well as genetic diagnosis in clinical practice. Carrier screening, prenatal diagnosis and newborn screening are widely used in the world to prevent... a sensitivity and specificity of over 99%, which can provide accurate and reliable results and thus avoid most invasive procedures compared with standard prenatal tests. Moreover, we also designed probes for genes related to monogenetic disorders and conducted target region sequencing for parents, proband...
Hussing, Christian; Kampmann, Marie-Louise; Mogensen, Helle Smidt
To ensure efficient sequencing, the DNA of next-generation sequencing (NGS) libraries must be quantified correctly. Therefore, an accurate, sensitive and stable method for DNA quantification is crucial. In this study, seven different methods for DNA quantification were compared to each other by quantifying NGS libraries for the Ion Torrent™ and Illumina® platforms as well as dsDNA oligos with known DNA concentrations. Rather large variations in library concentration estimates were observed. The differences between the highest and lowest concentration estimates varied by a factor of 5–100 depending on the library concentration. The Bioanalyzer, TapeStation and Qubit® instruments gave concentrations closest to the expected when quantifying dsDNA oligos. At very low concentrations (2–4 pg/µl) only the Bioanalyzer could reliably quantify the dsDNA oligos...
incapacitating illness, lack of adequate control measures, and the ease of production of large quantities of virus. Characterisation by sequencing is...ability to induce a fatal or seriously incapacitating illness, the lack of adequate control measures, and the ease of production of large...Location Family Neutralising antibody detected Culex annulirostris (mosquito) DPP1163 1987 Darwin, NT Rhabdoviridae Cattle, buffalo
Sep 14, 2016 ... Rolling circle amplification is a simple approach of enriching populations of single-stranded DNA plant begomovirus ... sequencing by enriching it using rolling circle amplification then determination of the diversity of the cassava mosaic .... pair ended reads while 4 libraries had less than 0.06 ng/ul and had ...
Mertens, F.; El-Sharawy, A.; Sauer, S.; Van Helvoort, J.; Van der Zaag, P.J.; Franke, A.; Nilsson, M.; Lehrach, H.; Brookes, A.
In this review we discuss the latest targeted enrichment methods, and aspects of their utilization along with second generation sequencing for complex genome analysis. In doing so we provide an overview of issues involved in detecting genetic variation, for which targeted enrichment has become a
LaButti, Kurt; Kuo, Alan; Grigoriev, Igor; Copeland, Alex
Repetitive organisms pose a challenge for short read assembly, and typically only unique regions and repeat regions shorter than the read length, can be accurately assembled. Recently, we have been investigating the use of Pacific Biosciences reads for de novo fungal assembly. We will present an assessment of the quality and degree of repeat reconstruction possible in a fungal genome using long read technology. We will also compare differences in assembly of repeat content using short read and long read technology.
Quantitative and systems biology approaches benefit from the unprecedented depth of next-generation sequencing. A typical experiment yields millions of short reads, which oftentimes carry particular sequence tags. These tags may be (a) specific to the sequencing platform and library construction method (e.g., adapter sequences); (b) introduced by experimental design (e.g., sample barcodes); or (c) a biological signal (e.g., splice leader sequences in nematodes). Our software FLEXBAR enables accurate recognition, sorting and trimming of sequence tags with maximal flexibility, based on exact overlap sequence alignment. The software supports data formats from all current sequencing platforms, including color-space reads. FLEXBAR maintains read pairings and processes separate barcode reads on demand. Our software facilitates the fine-grained adjustment of sequence tag detection parameters and search regions. FLEXBAR is a multi-threaded software and combines speed with precision. Even complex read processing scenarios might be executed with a single command line call. We demonstrate the utility of the software in terms of read mapping applications, library demultiplexing and splice leader detection. FLEXBAR and additional information are available for academic use from the website: http://sourceforge.net/projects/flexbar/.
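The general idea of trimming a 3' adapter by overlap can be sketched in a few lines of Python. The version below is a deliberate simplification that scans for the leftmost position at which the rest of the read matches a prefix of the adapter within a mismatch budget; it handles no insertions or deletions and is not FLEXBAR's alignment algorithm. The function name, parameters, and default thresholds are assumptions chosen for illustration.

```python
def trim_3prime_adapter(read, adapter, min_overlap=3, max_error_rate=0.1):
    """Remove a full or partial 3' adapter from `read`.

    For each start position (left to right), the read suffix is compared with
    the adapter prefix of equal length. The first start whose mismatch count
    is within max_error_rate * overlap marks the beginning of the adapter.
    """
    for start in range(len(read)):
        overlap = min(len(adapter), len(read) - start)
        if overlap < min_overlap:
            break  # remaining suffix is too short to call reliably
        window = read[start:start + overlap]
        mismatches = sum(1 for a, b in zip(window, adapter) if a != b)
        if mismatches <= max_error_rate * overlap:
            return read[:start]
    return read


if __name__ == "__main__":
    adapter = "AGATCGGAAGAGC"  # example adapter prefix, used for illustration
    print(trim_3prime_adapter("ACGTACGTTTAGATCGGAAG", adapter))
    print(trim_3prime_adapter("ACGTACGTTT", adapter))
```

The minimum overlap guards against trimming reads whose last base or two match the adapter by chance; lowering it increases sensitivity at the cost of more spurious trimming.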
Børsting, Claus; Morling, Niels
matured during the last 10 years, and the quality of the sequences has reached a level where NGS is used in clinical diagnostics of humans. Forensic genetic laboratories have also explored NGS technologies and especially in the last year, there has been a small explosion in the number of scientific articles and presentations at conferences with forensic aspects of NGS. These contributions have demonstrated that NGS offers new possibilities for forensic genetic case work. More information may be obtained from unique samples in a single experiment by analyzing combinations of markers (STRs, SNPs, insertion/deletions, mRNA) that cannot be analyzed simultaneously with the standard PCR-CE methods used today. The true variation in core forensic STR loci has been uncovered, and previously unknown STR alleles have been discovered. The detailed sequence information may aid mixture interpretation...
Hidajat, Rachmat; Nickols, Brian [Medigen, Inc., 8420 Gas House Pike, Suite S, Frederick, MD 21701 (United States)]; Forrester, Naomi [Institute for Human Infections and Immunity, Sealy Center for Vaccine Development and Department of Pathology, University of Texas Medical Branch, GNL, 301 University Blvd., Galveston, TX 77555 (United States)]; Tretyakova, Irina [Medigen, Inc., 8420 Gas House Pike, Suite S, Frederick, MD 21701 (United States)]; Weaver, Scott [Institute for Human Infections and Immunity, Sealy Center for Vaccine Development and Department of Pathology, University of Texas Medical Branch, GNL, 301 University Blvd., Galveston, TX 77555 (United States)]; Pushko, Peter [Medigen, Inc., 8420 Gas House Pike, Suite S, Frederick, MD 21701 (United States)]
Chikungunya virus (CHIKV) represents a pandemic threat with no approved vaccine available. Recently, we described a novel vaccination strategy based on iDNA® infectious clone designed to launch a live-attenuated CHIKV vaccine from plasmid DNA in vitro or in vivo. As a proof of concept, we prepared iDNA plasmid pCHIKV-7 encoding the full-length cDNA of the 181/25 vaccine. The DNA-launched CHIKV-7 virus was prepared and compared to the 181/25 virus. Illumina HiSeq2000 sequencing revealed that with the exception of the 3′ untranslated region, CHIKV-7 viral RNA consistently showed a lower frequency of single-nucleotide polymorphisms than the 181/25 RNA including at the E2-12 and E2-82 residues previously identified as attenuating mutations. In the CHIKV-7, frequencies of reversions at E2-12 and E2-82 were 0.064% and 0.086%, while in the 181/25, frequencies were 0.179% and 0.133%, respectively. We conclude that the DNA-launched virus has a reduced probability of reversion mutations, thereby enhancing vaccine safety. - Highlights: • Chikungunya virus (CHIKV) is an emerging pandemic threat. • In vivo DNA-launched attenuated CHIKV is a novel vaccine technology. • DNA-launched virus was sequenced using HiSeq2000 and compared to the 181/25 virus. • DNA-launched virus has lower frequency of SNPs at E2-12 and E2-82 attenuation loci.
Svensen, Nina; Peersen, Olve B; Jaffrey, Samie R
Methods for displaying large numbers of peptides on solid surfaces are essential for high-throughput characterization of peptide function and binding properties. Here we describe a method for converting the >10^7 flow cell-bound clusters of identical DNA strands generated by the Illumina DNA sequencing technology into clusters of complementary RNA, and subsequently peptide clusters. We modified the flow-cell-bound primers with ribonucleotides thus enabling them to be used by poliovirus polymerase 3Dpol. The primers hybridize to the clustered DNA thus leading to RNA clusters. The RNAs fold into functional protein- or small molecule-binding aptamers. We used the mRNA-display approach to synthesize flow-cell-tethered peptides from these RNA clusters. The peptides showed selective binding to cognate antibodies. The methods described here provide an approach for using DNA clusters to template peptide synthesis on an Illumina flow cell, thus providing new opportunities for massively parallel peptide-based assays. © 2016 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Nagarajan, Rakesh; Bartley, Angela N; Bridge, Julia A; Jennings, Lawrence J; Kamel-Reid, Suzanne; Kim, Annette; Lazar, Alexander J; Lindeman, Neal I; Moncur, Joel; Rai, Alex J; Routbort, Mark J; Vasalos, Patricia; Merker, Jason D
- Detection of acquired variants in cancer is a paradigm of precision medicine, yet little has been reported about clinical laboratory practices across a broad range of laboratories. - To use College of American Pathologists proficiency testing survey results to report on the results from surveys on next-generation sequencing-based oncology testing practices. - College of American Pathologists proficiency testing survey results from more than 250 laboratories currently performing molecular oncology testing were used to determine laboratory trends in next-generation sequencing-based oncology testing. - These presented data provide key information about the number of laboratories that currently offer or are planning to offer next-generation sequencing-based oncology testing. Furthermore, we present data from 60 laboratories performing next-generation sequencing-based oncology testing regarding specimen requirements and assay characteristics. The findings indicate that most laboratories are performing tumor-only targeted sequencing to detect single-nucleotide variants and small insertions and deletions, using desktop sequencers and predesigned commercial kits. Despite these trends, a diversity of approaches to testing exists. - This information should be useful to further inform a variety of topics, including national discussions involving clinical laboratory quality systems, regulation and oversight of next-generation sequencing-based oncology testing, and precision oncology efforts in a data-driven manner.
Deurenberg, Ruud H.; Bathoorn, Erik; Chlebowicz, Monika A.; Couto, Natacha; Ferdous, Mithila; Garcia-Cobos, Silvia; Kooistra-Smid, Anna M. D.; Raangs, Erwin C.; Rosema, Sigrid; Veloo, Alida C. M.; Zhou, Kai; Friedrich, Alexander W.; Rossen, John W. A.
Current molecular diagnostics of human pathogens provide limited information that is often not sufficient for outbreak and transmission investigation. Next generation sequencing (NGS) determines the DNA sequence of a complete bacterial genome in a single sequence run, and from these data,
Elingaramil, Sauli; Li, Xiaolong; He, Nongyue
Next-generation sequencing technologies, microarrays and advances in bionanotechnology have had an enormous impact on research within a short time frame. This impact appears certain to increase further as many biomedical institutions are now acquiring these prevailing new technologies. Beyond conventional sampling of genome content, wide-ranging applications are rapidly evolving for next-generation sequencing, microarrays and nanotechnology. To date, these technologies have been applied in a variety of contexts, including whole-genome sequencing, targeted resequencing and discovery of transcription factor binding sites, noncoding RNA expression profiling and molecular diagnostics. This paper thus discusses current applications of nanotechnology, next-generation sequencing technologies and microarrays in biomedical research and highlights the transforming potential these technologies offer.
Patel, Nirali M; Michelini, Vanessa V; Snell, Jeff M; Balu, Saianand; Hoyle, Alan P; Parker, Joel S; Hayward, Michele C; Eberhard, David A; Salazar, Ashley H; McNeillie, Patrick; Xu, Jia; Huettner, Claudia S; Koyama, Takahiko; Utro, Filippo; Rhrissorrakrai, Kahn; Norel, Raquel; Bilal, Erhan; Royyuru, Ajay; Parida, Laxmi; Earp, H Shelton; Grilley-Olson, Juneko E; Hayes, D Neil; Harvey, Stephen J; Sharpless, Norman E; Kim, William Y
Using next-generation sequencing (NGS) to guide cancer therapy has created challenges in analyzing and reporting large volumes of genomic data to patients and caregivers. Specifically, providing current, accurate information on newly approved therapies and open clinical trials requires considerable manual curation performed mainly by human "molecular tumor boards" (MTBs). The purpose of this study was to determine the utility of cognitive computing as performed by Watson for Genomics (WfG) compared with a human MTB. One thousand eighteen patient cases that previously underwent targeted exon sequencing at the University of North Carolina (UNC) and subsequent analysis by the UNCseq informatics pipeline and the UNC MTB between November 7, 2011, and May 12, 2015, were analyzed with WfG, a cognitive computing technology for genomic analysis. Using a WfG-curated actionable gene list, we identified additional genomic events of potential significance (not discovered by traditional MTB curation) in 323 (32%) patients. The majority of these additional genomic events were considered actionable based upon their ability to qualify patients for biomarker-selected clinical trials. Indeed, the opening of a relevant clinical trial within 1 month prior to WfG analysis provided the rationale for identification of a new actionable event in nearly a quarter of the 323 patients. This automated analysis took ... potentially improve patient care by providing a rapid, comprehensive approach for data analysis and consideration of up-to-date availability of clinical trials. The results of this study demonstrate that the interpretation and actionability of somatic next-generation sequencing results are evolving too rapidly to rely solely on human curation. Molecular tumor boards empowered by cognitive computing can significantly improve patient care by providing a fast, cost-effective, and comprehensive approach for data analysis in the delivery of precision medicine. Patients and physicians who...
Loka, Tobias P; Tausch, Simon H; Dabrowski, Piotr Wojciech; Radonic, Aleksandar; Nitsche, Andreas; Renard, Bernhard Y
In Next Generation Sequencing (NGS), re-identification of individuals and other privacy-breaching strategies can be applied even for anonymized data. This also holds true for applications in which human DNA is acquired as a by-product, e.g. for viral or metagenomic samples from a human host. Conventional data protection strategies including cryptography and post-hoc filtering are only appropriate for the final and processed sequencing data. This can result in an insufficient level of data protection and a considerable time delay in the further analysis workflow. We present PriLive, a novel tool for the automated removal of sensitive data while the sequencing machine is running. Thereby, human sequence information can be detected and removed before being completely produced. This facilitates compliance with strict data protection regulations. The unique characteristic of causing almost no time delay for further analyses is also a clear benefit for applications other than data protection. Especially if the sequencing data are dominated by known background signals, PriLive considerably accelerates subsequent analyses by passing on only fractions of the input data. Besides these conceptual advantages, PriLive achieves filtering results at least as accurate as conventional post-hoc filtering tools. PriLive is open-source software available at https://gitlab.com/rki_bioinformatics/PriLive. Supplementary data are available at Bioinformatics online.
Fahnøe, Ulrik; Orton, Richard; Höper, Dirk
Next Generation Sequencing (NGS) has rapidly become the preferred technology in nucleotide sequencing, and can be applied to unravel molecular adaptation of RNA viruses such as Classical Swine Fever Virus (CSFV). However, the detection of low frequency variants within viral populations by NGS...
Quek, Kelly; Nones, Katia; Patch, Ann-Marie; Fink, J Lynn; Newell, Felicity; Cloonan, Nicole; Miller, David; Fadlullah, Muhammad Z H; Kassahn, Karin; Christ, Angelika N; Bruxner, Timothy J C; Manning, Suzanne; Harliwong, Ivon; Idrisoglu, Senel; Nourse, Craig; Nourbakhsh, Ehsan; Wani, Shivangi; Steptoe, Anita; Anderson, Matthew; Holmes, Oliver; Leonard, Conrad; Taylor, Darrin; Wood, Scott; Xu, Qinying; Wilson, Peter; Biankin, Andrew V; Pearson, John V; Waddell, Nic; Grimmond, Sean M
Somatic rearrangements, which are commonly found in human cancer genomes, contribute to the progression and maintenance of cancers. Conventionally, the verification of somatic rearrangements comprises many manual steps and Sanger sequencing. This is labor intensive when verifying a large number of rearrangements in a large cohort. To increase the verification throughput, we devised a high-throughput workflow that utilizes benchtop next-generation sequencing and in-house bioinformatics tools to link the laboratory processes. In the proposed workflow, primers are automatically designed. PCR and an optional gel electrophoresis step to confirm the somatic nature of the rearrangements are performed. PCR products of somatic events are pooled for Ion Torrent PGM and/or Illumina MiSeq sequencing, the resulting sequence reads are assembled into consensus contigs by a consensus assembler, and an automated BLAT is used to resolve the breakpoints to base level. We compared sequences and breakpoints of verified somatic rearrangements between the conventional and high-throughput workflow. The results showed that next-generation sequencing methods are comparable to conventional Sanger sequencing. The identified breakpoints obtained from next-generation sequencing methods were highly accurate and reproducible. Furthermore, the proposed workflow allows hundreds of events to be processed in a shorter time frame compared with the conventional workflow.
Accurate identification of DNA polymorphisms using next-generation sequencing technology is challenging because of a high rate of sequencing error and incorrect mapping of reads to reference genomes. Currently available short read aligners and DNA variant callers suffer from these problems. We developed the Coval software to improve the quality of short read alignments. Coval is designed to minimize the incidence of spurious alignment of short reads, by filtering mismatched reads that remained in alignments after local realignment and error correction of mismatched reads. The error correction is executed based on the base quality and allele frequency at the non-reference positions for an individual or pooled sample. We demonstrated the utility of Coval by applying it to simulated genomes and experimentally obtained short-read data of rice, nematode, and mouse. Moreover, we found an unexpectedly large number of incorrectly mapped reads in 'targeted' alignments, where the whole genome sequencing reads had been aligned to a local genomic segment, and showed that Coval effectively eliminated such spurious alignments. We conclude that Coval significantly improves the quality of short-read sequence alignments, thereby increasing the calling accuracy of currently available tools for SNP and indel identification. Coval is available at http://sourceforge.net/projects/coval105/.
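The mismatch-filtering idea described in this abstract can be illustrated with a minimal sketch. It is not Coval's implementation: the `Alignment` record, the `max_mismatches` threshold and both function names are hypothetical, and the toy model ignores indels, base quality and allele frequency.

```python
from dataclasses import dataclass


@dataclass
class Alignment:
    """Toy gapless alignment (hypothetical record, not a Coval data structure)."""
    name: str
    read_seq: str  # read bases over the aligned region
    ref_seq: str   # reference bases at the same positions


def count_mismatches(aln: Alignment) -> int:
    """Count positions where the read base differs from the reference base."""
    return sum(1 for r, g in zip(aln.read_seq, aln.ref_seq) if r != g)


def filter_alignments(alns, max_mismatches=2):
    """Keep only alignments with at most `max_mismatches` mismatches.

    The default threshold is an illustrative assumption.
    """
    return [a for a in alns if count_mismatches(a) <= max_mismatches]


if __name__ == "__main__":
    alns = [
        Alignment("good", "ACGTACGT", "ACGTACGT"),
        Alignment("noisy", "ACGTTTTT", "ACGTACGT"),
    ]
    for a in filter_alignments(alns):
        print(a.name, count_mismatches(a))
```

A real filter would operate on aligner output and would also weigh base quality and allele frequency, as the abstract describes; the sketch only shows the thresholding step.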
Camerlengo, Terry; Ozer, Hatice Gulcin; Onti-Srinivasan, Raghuram; Yan, Pearlly; Huang, Tim; Parvin, Jeffrey; Huang, Kun
Next Generation Sequencing is highly resource intensive. NGS tasks related to data processing, management and analysis require high-end computing servers or even clusters. Additionally, processing NGS experiments requires suitable storage space and significant manual interaction. At The Ohio State University's Biomedical Informatics Shared Resource, we designed and implemented a scalable architecture to address the challenges associated with the resource intensive nature of NGS secondary analysis built around Illumina Genome Analyzer II sequencers and Illumina's Gerald data processing pipeline. The software infrastructure includes a distributed computing platform consisting of a LIMS called QUEST (http://bisr.osumc.edu), an Automation Server, a computer cluster for processing NGS pipelines, and a network attached storage device expandable up to 40TB. The system has been architected to scale to multiple sequencers without requiring additional computing or labor resources. This platform demonstrates how to manage and automate NGS experiments in an institutional or core facility setting.
den Dunnen Johan T; van Ommen Gertjan; Ariyurek Yavuz; Buermans Henk PJ; 't Hoen Peter AC
Background: MicroRNAs are small non-coding RNA transcripts that regulate post-transcriptional gene expression. The millions of short sequence reads generated by next generation sequencing technologies make this technique explicitly suitable for profiling of known and novel microRNAs. A modification to the small-RNA expression kit (SREK, Ambion) library preparation method for the SOLiD sequencing platform is described to generate microRNA sequencing libraries that are compatible with t...
Sana, Maria Elena
Since the early 1990s, Sanger method has been the gold standard methodology for sequencing analysis of DNA. Next-generation sequencing (NGS) approaches revolutionized the field of genomics over the last 5 years. These new sequencing technologies make feasible the direct and cost-effective sequencing of genomes at unprecedented scale and speed. Furthermore, the applications of these technologies are wide-spread and have been developed to explore the complex biological systems, among which RNA ...
Background: Next generation sequencing of BACs is a viable option for deciphering the sequence of even large and highly repetitive genomes. In order to optimize this strategy, we examined the influence of read length on the quality of Roche/454 sequence assemblies, to what extent Illumina/Solexa mate pairs (MPs) improve the assemblies by scaffolding and whether barcoding of BACs is dispensable. Results: Sequencing four BACs with both FLX and Titanium technologies revealed similar sequencing accuracy, but showed that the longer Titanium reads produce considerably fewer misassemblies and gaps. The 454 assemblies of 96 barcoded BACs were improved by scaffolding 79% of the total contig length with MPs from a non-barcoded library. Assembly of the unmasked 454 sequences without separation by barcodes revealed chimeric contig formation to be a major problem, encompassing 47% of the total contig length. Masking the sequences reduced this fraction to 24%. Conclusion: Optimal BAC pool sequencing should be based on the longest available reads, with barcoding essential for a comprehensive assessment of both repetitive and non-repetitive sequence information. When interest is restricted to non-repetitive regions and repeats are masked prior to assembly, barcoding is non-essential. In any case, the assemblies can be improved considerably by scaffolding with non-barcoded BAC pool MPs.
Daniel, R; Santos, C; Phillips, C; Fondevila, M; van Oorschot, R A H; Carracedo, A; Lareu, M V; McNevin, D
Forensic phenotyping can provide useful intelligence regarding the biogeographical ancestry (BGA) and externally visible characteristics (EVCs) of the donor of an evidentiary sample. Currently, single nucleotide polymorphism (SNP) based inference of BGA and EVCs is performed most commonly using SNaPshot(®), a single base extension (SBE) assay. However, a single SNaPshot multiplex PCR is limited to 30-40 SNPs. Next generation sequencing (NGS) offers the potential to genotype hundreds to thousands of SNPs from multiple samples in a single experimental run. The PCR multiplexes from five SNaPshot assays (SNPforID 52plex, SNPforID 34plex, Eurasiaplex, IrisPlex and an unpublished BGA assay) were applied to three different DNA template amounts (0.1, 0.2 and 0.3 ng) in three samples (9947A and 007 control DNAs and a male donor). The pooled PCR amplicons containing 136 unique SNPs were sequenced using Life Technologies' Ion Torrent™ PGM system. Approximately 72 Mb of sequence was generated from two 10 Mb Ion 314™ v1 chips. Accurate genotypes were readily obtained from all three template amounts. Of a total of 408 genotypes, 395 (97%) were fully concordant with SNaPshot across all three template amounts. Of those genotypes discordant with SNaPshot, six Ion Torrent sequences (1.5%) were fully concordant with Sanger sequencing across the three template amounts. Seven SNPs (1.7%) were either discordant between template amounts or discordant with Sanger sequencing. Sequence coverage observed in the negative control, and, allele coverage variation for heterozygous genotypes highlights the need to establish a threshold for background levels of sequence output and heterozygous balance. This preliminary study of the Ion Torrent PGM system has demonstrated considerable potential for use in forensic DNA analyses as a low to medium throughput NGS platform using established SNaPshot assays. Copyright © 2014 Elsevier Ireland Ltd. All rights reserved.
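The concordance percentages reported in this abstract can be rechecked with a few lines of arithmetic. The sketch below assumes, as an inference rather than a statement from the abstract, that the 408 genotypes arise from 136 SNPs typed in each of three samples; the variable and function names are illustrative.

```python
# Counts taken from the abstract; the 136 x 3 decomposition is an assumption.
SNPS = 136
SAMPLES = 3
TOTAL_GENOTYPES = SNPS * SAMPLES  # 408

CATEGORY_COUNTS = {
    "concordant_with_snapshot": 395,
    "ion_torrent_concordant_with_sanger": 6,
    "discordant": 7,
}


def percentage(count: int, total: int) -> float:
    """Return count as a percentage of total, rounded to one decimal place."""
    return round(100 * count / total, 1)


if __name__ == "__main__":
    for label, count in CATEGORY_COUNTS.items():
        print(label, count, percentage(count, TOTAL_GENOTYPES))
    # prints 96.8, 1.5 and 1.7 for the three categories
```

The three categories sum to 408, and 96.8% rounds to the 97% quoted in the abstract.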
Simon H Tausch
The assembly of viral or endosymbiont genomes from Next Generation Sequencing (NGS) data is often hampered by the predominant abundance of reads originating from the host organism. These reads increase the memory and CPU time usage of the assembler and can lead to misassemblies. We developed RAMBO-K (Read Assignment Method Based On K-mers), a tool which allows rapid and sensitive removal of unwanted host sequences from NGS datasets. Reaching a speed of 10 Megabases/s on 4 CPU cores and a standard hard drive, RAMBO-K is faster than any tool we tested, while showing a consistently high sensitivity and specificity across different datasets. RAMBO-K rapidly and reliably separates reads from different species without data preprocessing. It is suitable as a straightforward standard solution for workflows dealing with mixed datasets. Binaries and source code (Java and Python) are available from http://sourceforge.net/projects/rambok/.
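To make the idea of k-mer based read assignment concrete, the following is a deliberately simplified sketch and not RAMBO-K's actual scoring scheme. It assigns a read to whichever reference set shares more k-mers with it; all function names and the tie-handling rule are assumptions made for illustration.

```python
def kmers(seq: str, k: int) -> set:
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_index(references, k: int) -> set:
    """Collect the k-mers of every reference sequence into one set."""
    index = set()
    for ref in references:
        index |= kmers(ref, k)
    return index


def assign_read(read: str, host_index: set, target_index: set, k: int) -> str:
    """Label a read 'host', 'target' or 'ambiguous' by shared k-mer counts."""
    read_kmers = kmers(read, k)
    host_hits = len(read_kmers & host_index)
    target_hits = len(read_kmers & target_index)
    if host_hits > target_hits:
        return "host"
    if target_hits > host_hits:
        return "target"
    return "ambiguous"  # equal counts, including zero hits on both sides


if __name__ == "__main__":
    k = 4
    host = build_index(["ACGTACGTACGT"], k)
    target = build_index(["TTTTGGGGCCCC"], k)
    for read in ["ACGTACG", "TTGGGGC", "AAAAAAA"]:
        print(read, assign_read(read, host, target, k))
```

Exact set lookups keep the sketch short; a production tool would need a memory-efficient index and a statistical score rather than a raw hit count.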
Van Amerongen, Rosa A.; Retèl, Valesca P.; Coupé, Veerle M.H.; Nederlof, Petra M.; Vogel, Maartje J.; Van Harten, Wim H.
Next-generation sequencing (NGS) has reached the molecular diagnostic laboratories. Although the NGS technology aims to improve the effectiveness of therapies by selecting the most promising therapy, concerns are that NGS testing is expensive and that the 'benefits' are not yet in relation to these
Eiler, A.; Drakare, S.; Bertilsson, S.; Pernthaler, J.; Peura, S.; Rofner, C.; Šimek, Karel; Yang, Y.; Znachor, Petr; Lindström, E.S.
Vol. 8, No. 1 (2013), e53516. E-ISSN 1932-6203. R&D Projects: GA ČR(CZ) GA206/08/0015. Institutional support: RVO:60077344. Keywords: phytoplankton; next generation sequencing; diversity. Subject RIV: EE - Microbiology, Virology. Impact factor: 3.534 (year: 2013).
Our objective was to evaluate the use of fish larvae for early detection of non-native fishes, comparing traditional and molecular taxonomy based on next-generation DNA sequencing to investigate potential efficiencies. Our approach was to intensively sample a Great Lakes non-nati...
Willenbrock, Hanni; Salomon, Jesper; Søkilde, Rolf
Recently, next-generation sequencing has been introduced as a promising, new platform for assessing the copy number of transcripts, while the existing microarray technology is considered less reliable for absolute, quantitative expression measurements. Nonetheless, so far, results from the two te...
Luo, Hong; Mocoeur, Anne Raymonde Joelle; Jing, Hai-Chun
The invention and application of Next-Generation Sequencing (NGS) technologies have revolutionized the study of genetics and genomics. Many studies that previously would not even have been considered are nowadays being executed routinely in many laboratories. In this chapter, we introduce the currently available...
Bowling, Bethany; Zimmer, Erin; Pyatt, Robert E.
Although the development of next-generation (NextGen) sequencing technologies has revolutionized genomic research and medicine, the incorporation of these topics into the classroom is challenging, given an implied high degree of technical complexity. We developed an easy-to-implement, interactive classroom activity investigating the similarities…
Currás-Freixes, Maria; Piñeiro-Yañez, Elena; Montero-Conde, Cristina; Apellániz-Ruiz, María; Calsina, Bruna; Mancikova, Veronika; Remacha, Laura; Richter, Susan; Ercolino, Tonino; Rogowski-Lehmann, Natalie; Deutschbein, Timo; Calatayud, María; Guadalix, Sonsoles; Álvarez-Escolá, Cristina; Lamas, Cristina; Aller, Javier; Sastre-Marcos, Julia; Lázaro, Conxi; Galofré, Juan C.; Patiño-García, Ana; Meoro-Avilés, Amparo; Balmaña-Gelpi, Judith; De Miguel-Novoa, Paz; Balbín, Milagros; Matías-Guiu, Xavier; Letón, Rocío; Inglada-Pérez, Lucía; Torres-Pérez, Rafael; Roldán-Romero, Juan M.; Rodríguez-Antona, Cristina; Fliedner, Stephanie M J; Opocher, Giuseppe; Pacak, Karel; Korpershoek, Esther; de Krijger, Ronald R.; Vroonen, Laurent; Mannelli, Massimo; Fassnacht, Martin; Beuschlein, Felix; Eisenhofer, Graeme; Cascón, Alberto; Al-Shahrour, Fátima; Robledo, Mercedes
Genetic diagnosis is recommended for all pheochromocytoma and paraganglioma (PPGL) cases, as driver mutations are identified in approximately 80% of the cases. As the list of related genes expands, genetic diagnosis becomes more time-consuming, and targeted next-generation sequencing (NGS) has
Calabria, Inés; Pedrola, Laia; Berlanga, Pablo; Aparisi, María José; Sánchez-Izquierdo, Dolors; Cañete, Adela; Cervera, José; Millán, José María; Castel, Victoria
Precision Medicine is an emerging approach for the diagnosis, treatment and prognosis of genetic diseases that enables clinicians to more accurately predict which treatment strategy will be optimal in a patient. The aim of Precision Medicine in Oncology is to integrate clinical, histological, and molecular data in order to obtain a deeper knowledge about the biology and genetics of an individual's tumour. Over the last few years, the implementation of new NGS (Next Generation Sequencing) technologies into clinical practice has been essential. There is a wide variety of NGS techniques that can be used in this context. The correct interpretation of molecular changes detected by these techniques is paramount for their appropriate use. In this review, a discussion is presented on the main NGS sequencing technologies that can be used to improve the diagnosis, prognosis, and treatment of oncology patients. Copyright © 2016 Asociación Española de Pediatría. Publicado por Elsevier España, S.L.U. All rights reserved.
Recent advances in sequencing technology have enabled comprehensive profiling of genetic alterations in cancer. We have established a targeted sequencing platform using next-generation sequencing (NGS) technology for clinical use, which can provide mutation and copy number variation data. NGS was performed with a paired-end library enriched with exons of 183 cancer-related genes. Normal and tumor tissue pairs of 60 colorectal adenocarcinomas were used to test feasibility. Somatic mutation and copy number alteration were analyzed. A total of 526 somatic non-synonymous sequence variations were found in 113 genes. Among these, 278 single nucleotide variations (SNVs) represented 232 different somatic point mutations, 216 SNVs represented 79 known single nucleotide polymorphisms in dbSNP, and 32 indels represented 28 different indel mutations. The median number of mutated genes per tumor was 4 (range 0-23). Copy number gain (>X2 fold) was found in 65 genes in 40 patients, whereas copy number loss (
Khandelwal, Garima; Girotti, María Romina; Smowton, Christopher; Taylor, Sam; Wirth, Christopher; Dynowski, Marek; Frese, Kristopher K; Brady, Ged; Dive, Caroline; Marais, Richard; Miller, Crispin
Patient-derived xenograft (PDX) and circulating tumor cell-derived explant (CDX) models are powerful methods for the study of human disease. In cancer research, these methods have been applied to multiple questions, including the study of metastatic progression, genetic evolution, and therapeutic drug responses. As PDX and CDX models can recapitulate the highly heterogeneous characteristics of a patient tumor, as well as their response to chemotherapy, there is considerable interest in combining them with next-generation sequencing to monitor the genomic, transcriptional, and epigenetic changes that accompany oncogenesis. When used for this purpose, their reliability is highly dependent on being able to accurately distinguish between sequencing reads that originate from the host, and those that arise from the xenograft itself. Here, we demonstrate that failure to correctly identify contaminating host reads when analyzing DNA- and RNA-sequencing (DNA-Seq and RNA-Seq) data from PDX and CDX models is a major confounding factor that can lead to incorrect mutation calls and a failure to identify canonical mutation signatures associated with tumorigenicity. In addition, a highly sensitive algorithm and open source software tool for identifying and removing contaminating host sequences is described. Importantly, when applied to PDX and CDX models of melanoma, these data demonstrate its utility as a sensitive and selective tool for the correction of PDX- and CDX-derived whole-exome and RNA-Seq data. Implications: This study describes a sensitive method to identify contaminating host reads in xenograft and explant DNA- and RNA-Seq data and is applicable to other forms of deep sequencing. Mol Cancer Res; 15(8); 1012-6. ©2017 AACR . ©2017 American Association for Cancer Research.
Sijmons, Steven; Van Ranst, Marc; Maes, Piet
The complete genome of human cytomegalovirus (HCMV) was elucidated almost 25 years ago using a traditional cloning and Sanger sequencing approach. Analysis of the genetic content of additional laboratory and clinical isolates has led to a better, albeit still incomplete, definition of the coding potential and diversity of wild-type HCMV strains. The introduction of a new generation of massively parallel sequencing technologies, collectively called next-generation sequencing, has profoundly increased the throughput and resolution of the genomics field. These increased possibilities are already leading to a better understanding of the circulating diversity of HCMV clinical isolates. The higher resolution of next-generation sequencing provides new opportunities in the study of intrahost viral population structures. Furthermore, deep sequencing enables novel diagnostic applications for sensitive drug resistance mutation detection. RNA-seq applications have changed the picture of the HCMV transcriptome, which resulted in proof of a vast amount of splicing events and alternative transcripts. This review discusses the application of next-generation sequencing technologies, which has provided a clearer picture of the intricate nature of the HCMV genome. The continuing development and application of novel sequencing technologies will further augment our understanding of this ubiquitous, but elusive, herpesvirus. PMID:24603756
Escobar-Gutiérrez, Alejandro; Vazquez-Pichardo, Mauricio; Cruz-Rivera, Mayra; Rivera-Osorio, Pilar; Carpio-Pedroza, Juan Carlos; Ruíz-Pacheco, Juan Alberto; Ruiz-Tovar, Karina
Here, we describe a transmission event of hepatitis C virus (HCV) among injection drug users. Next-generation sequencing (NGS) was used to assess the intrahost viral genetic variation. Deep amplicon sequencing of HCV hypervariable region 1 allowed for a detailed analysis of the structure of the viral population. Establishment of the genetic relatedness between cases was accomplished by phylogenetic analysis. NGS is a powerful tool with applications in molecular epidemiology studies and outbreak investigations. PMID:22301026
Escobar-Gutiérrez, Alejandro; Vazquez-Pichardo, Mauricio; Cruz-Rivera, Mayra; Rivera-Osorio, Pilar; Carpio-Pedroza, Juan Carlos; Ruíz-Pacheco, Juan Alberto; Ruiz-Tovar, Karina; Vaughan, Gilberto
Here, we describe a transmission event of hepatitis C virus (HCV) among injection drug users. Next-generation sequencing (NGS) was used to assess the intrahost viral genetic variation. Deep amplicon sequencing of HCV hypervariable region 1 allowed for a detailed analysis of the structure of the viral population. Establishment of the genetic relatedness between cases was accomplished by phylogenetic analysis. NGS is a powerful tool with applications in molecular epidemiology studies and outbre...
Background: Transcriptome sequencing using next-generation sequencing platforms will soon be competing with DNA microarray technologies for global gene expression analysis. As a preliminary evaluation of these promising technologies, we performed deep sequencing of cDNA synthesized from the Microarray Quality Control (MAQC) reference RNA samples using Roche's 454 Genome Sequencer FLX. Results: We generated more than 3.6 million sequence reads of average length 250 bp for the MAQC A and B samples and introduced a data analysis pipeline for translating cDNA read counts into gene expression levels. Using BLAST, 90% of the reads mapped to the human genome and 64% of the reads mapped to the RefSeq database of well annotated genes with e-values ≤ 10^-20. We measured gene expression levels in the A and B samples by counting the numbers of reads that mapped to individual RefSeq genes in multiple sequencing runs to evaluate the MAQC quality metrics for reproducibility, sensitivity, specificity, and accuracy and compared the results with DNA microarrays and Quantitative RT-PCR (QRTPCR) from the MAQC studies. In addition, 88% of the reads were successfully aligned directly to the human genome using the AceView alignment programs with an average 90% sequence similarity to identify 137,899 unique exon junctions, including 22,193 new exon junctions not yet contained in the RefSeq database. Conclusion: Using the MAQC metrics for evaluating the performance of gene expression platforms, the ExpressSeq results for gene expression levels showed excellent reproducibility, sensitivity, and specificity that improved systematically with increasing shotgun sequencing depth, and quantitative accuracy that was comparable to DNA microarrays and QRTPCR. In addition, a careful mapping of the reads to the genome using the AceView alignment programs shed new light on the complexity of the human transcriptome including the discovery of thousands of new splice variants.
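Translating read counts into expression levels, as described in this abstract, can be sketched as follows. The abstract does not state which normalization the authors used, so the counts-per-million scaling here is an assumption, as are the function names and the convention that unmapped reads carry `None`.

```python
from collections import Counter


def count_reads_per_gene(read_to_gene: dict) -> Counter:
    """Count reads per gene; reads mapped to no gene (None) are skipped."""
    return Counter(gene for gene in read_to_gene.values() if gene is not None)


def counts_per_million(counts: Counter) -> dict:
    """Scale raw counts so that they sum to one million mapped reads."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {gene: count * 1_000_000 / total for gene, count in counts.items()}


if __name__ == "__main__":
    # Hypothetical read-to-gene assignments, e.g. derived from BLAST hits.
    mapping = {"r1": "GAPDH", "r2": "GAPDH", "r3": "ACTB", "r4": None}
    raw = count_reads_per_gene(mapping)
    for gene, cpm in counts_per_million(raw).items():
        print(gene, raw[gene], round(cpm, 1))
```

Scaling by the total number of mapped reads makes runs of different sequencing depth comparable, which matters when results from multiple sequencing runs are pooled or compared.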
Nicholas A Be
Bacillus anthracis is the potentially lethal etiologic agent of anthrax disease, and is a significant concern in the realm of biodefense. One of the cornerstones of an effective biodefense strategy is the ability to detect infectious agents with a high degree of sensitivity and specificity in the context of a complex sample background. The nature of the B. anthracis genome, however, renders specific detection difficult, due to close homology with B. cereus and B. thuringiensis. We therefore elected to determine the efficacy of next-generation sequencing analysis and microarrays for detection of B. anthracis in an environmental background. We applied next-generation sequencing to titrated genome copy numbers of B. anthracis in the presence of background nucleic acid extracted from aerosol and soil samples. We found next-generation sequencing to be capable of detecting as few as 10 genomic equivalents of B. anthracis DNA per nanogram of background nucleic acid. Detection was accomplished by mapping reads to either a defined subset of reference genomes or to the full GenBank database. Moreover, sequence data obtained from B. anthracis could be reliably distinguished from sequence data mapping to either B. cereus or B. thuringiensis. We also demonstrated the efficacy of a microbial census microarray in detecting B. anthracis in the same samples, representing a cost-effective and high-throughput approach, complementary to next-generation sequencing. Our results, in combination with the capacity of sequencing for providing insights into the genomic characteristics of complex and novel organisms, suggest that these platforms should be considered important components of a biosurveillance strategy.
Liu, Biao; Conroy, Jeffrey M.; Morrison, Carl D.; Odunsi, Adekunle O.; Qin, Maochun; Wei, Lei; Trump, Donald L.; Johnson, Candace S.; Liu, Song; Wang, Jianmin
Somatic Structural Variations (SVs) are a complex collection of chromosomal mutations that could directly contribute to carcinogenesis. Next Generation Sequencing (NGS) technology has emerged as the primary means of interrogating the SVs of the cancer genome in recent investigations. Sophisticated computational methods are required to accurately identify the SV events and delineate their breakpoints from the massive amounts of reads generated by a NGS experiment. In this review, we provide an overview of current analytic tools used for SV detection in NGS-based cancer studies. We summarize the features of common SV groups and the primary types of NGS signatures that can be used in SV detection methods. We discuss the principles and key similarities and differences of existing computational programs and comment on unresolved issues related to this research field. The aim of this article is to provide a practical guide of relevant concepts, computational methods, software tools and important factors for analyzing and interpreting NGS data for the detection of SVs in the cancer genome. PMID:25849937
Many viruses, including the clinically relevant RNA viruses HIV and HCV, exist in large populations and display high genetic heterogeneity within and between infected hosts. Assessing intra-patient viral genetic diversity is essential for understanding the evolutionary dynamics of viruses, for designing effective vaccines, and for the success of antiviral therapy. Next-generation sequencing technologies allow the rapid and cost-effective acquisition of thousands to millions of short DNA sequences from a single sample. However, this approach entails several challenges in experimental design and computational data analysis. Here, we review the entire process of inferring viral diversity from sample collection to computing measures of genetic diversity. We discuss sample preparation, including reverse transcription and amplification, and the effect of experimental conditions on diversity estimates due to in vitro base substitutions, insertions, deletions, and recombination. The use of different next-generation sequencing platforms and their sequencing error profiles are compared in the context of various applications of diversity estimation, ranging from the detection of single nucleotide variants to the reconstruction of whole-genome haplotypes. We describe the statistical and computational challenges arising from these technical artifacts, and we review existing approaches, including available software, for their solution. Finally, we discuss open problems, and highlight successful biomedical applications and potential future clinical use of next-generation sequencing to estimate viral diversity.
Jonathan B Puritz
The field of phylogeography has long since realized the need and utility of incorporating nuclear DNA (nDNA) sequences into analyses. However, the use of nDNA sequence data, at the population level, has been hindered by technical laboratory difficulty, sequencing costs, and problematic analytical methods dealing with genotypic sequence data, especially in non-model organisms. Here, we present a method utilizing the 454 GS-FLX Titanium pyrosequencing platform with the capacity to simultaneously sequence two species of sea star (Meridiastra calcar and Parvulastra exigua) at five different nDNA loci across 16 different populations of 20 individuals each per species. We compare results from 3 populations with traditional Sanger sequencing based methods, and demonstrate that this next-generation sequencing platform is more time and cost effective and more sensitive to rare variants than Sanger based sequencing. A crucial advantage is that the high coverage of clonally amplified sequences simplifies haplotype determination, even in highly polymorphic species. This targeted next-generation approach can greatly increase the use of nDNA sequence loci in phylogeographic and population genetic studies by mitigating many of the time, cost, and analytical issues associated with highly polymorphic, diploid sequence markers.
Fernandes, Gustavo S; Marques, Daniel F; Girardi, Daniel M; Braghiroli, Maria Ignez F; Coudry, Renata A; Meireles, Sibele I; Katz, Artur; Hoff, Paulo M
With the development of next-generation sequencing (NGS) technologies, DNA sequencing has been increasingly utilized in clinical practice. Our goal was to investigate the impact of genomic evaluation on treatment decisions for heavily pretreated patients with metastatic cancer. We analyzed metastatic cancer patients from a single institution whose cancers had progressed after all available standard-of-care therapies and whose tumors underwent next-generation sequencing analysis. We determined the percentage of patients who received any therapy directed by the test, and its efficacy. From July 2013 to December 2015, 185 consecutive patients were tested using a commercially available next-generation sequencing-based test, and 157 patients were eligible. Sixty-six patients (42.0%) were female, and 91 (58.0%) were male. The mean age at diagnosis was 52.2 years, and the mean number of pre-test lines of systemic treatment was 2.7. One hundred and seventy-seven patients (95.6%) had at least one identified gene alteration. Twenty-four patients (15.2%) underwent systemic treatment directed by the test result. Of these, one patient had a complete response, four (16.7%) had partial responses, two (8.3%) had stable disease, and 17 (70.8%) had disease progression as the best result. The median progression-free survival time with matched therapy was 1.6 months, and the median overall survival was 10 months. We identified a high prevalence of gene alterations using a next-generation sequencing test. Although some benefit was associated with the matched therapy, most of the patients had disease progression as the best response, indicating the limited biological potential and unclear clinical relevance of this practice.
Jan G A M L Uitdewilligen
Assessment of genomic DNA sequence variation and genotype calling in autotetraploids implies the ability to distinguish among five possible alternative allele copy number states. This study demonstrates the accuracy of genotyping-by-sequencing (GBS) of a large collection of autotetraploid potato cultivars using next-generation sequencing. It is still costly to reach sufficient read depths on a genome wide scale, across the cultivated gene pool. Therefore, we enriched cultivar-specific DNA sequencing libraries using an in-solution hybridisation method (SureSelect). This complexity reduction allowed us to confine our study to 807 target genes distributed across the genomes of 83 tetraploid cultivars and one reference (DM 1-3 511). Indexed sequencing libraries were paired-end sequenced in 7 pools of 12 samples using Illumina HiSeq2000. After filtering and processing the raw sequence data, 12.4 Gigabases of high-quality sequence data was obtained, which mapped to 2.1 Mb of the potato reference genome, with a median average read depth of 63× per cultivar. We detected 129,156 sequence variants and genotyped the allele copy number of each variant for every cultivar. In this cultivar panel a variant density of 1 SNP/24 bp in exons and 1 SNP/15 bp in introns was obtained. The average minor allele frequency (MAF) of a variant was 0.14. Potato germplasm displayed a large number of relatively rare variants and/or haplotypes, with 61% of the variants having a MAF below 0.05. A very high average nucleotide diversity (π = 0.0107) was observed. Nucleotide diversity varied among potato chromosomes. Several genes under selection were identified. Genotyping-by-sequencing results, with allele copy number estimates, were validated with a KASP genotyping assay. This validation showed that read depths of ∼60-80× can be used as a lower boundary for reliable assessment of allele copy number of sequence variants in autotetraploids. Genotypic data were associated with
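The five allele copy number states mentioned in the abstract above (0 to 4 copies of the alternative allele) can be illustrated with a small likelihood calculation. The following is only a minimal sketch, not the method used in the study: the function name `call_dosage`, the binomial read model and the 1% error rate are all assumptions made for illustration.

```python
import math

def call_dosage(ref_reads, alt_reads, ploidy=4, error=0.01):
    """Return the most likely alternative-allele copy number (0..ploidy).

    Hypothetical helper: assumes reads are drawn binomially with an expected
    alternative-allele fraction of dosage/ploidy, blurred by a symmetric
    per-read error rate (an assumed value, not taken from the abstract).
    """
    if ref_reads + alt_reads == 0:
        return None  # no reads, no call
    best_dosage, best_loglik = None, -math.inf
    for dosage in range(ploidy + 1):
        p_alt = dosage / ploidy
        # Probability of observing an alt read once errors are allowed for.
        p_obs = p_alt * (1 - error) + (1 - p_alt) * error
        loglik = alt_reads * math.log(p_obs) + ref_reads * math.log(1 - p_obs)
        if loglik > best_loglik:
            best_dosage, best_loglik = dosage, loglik
    return best_dosage

# At about 60x depth, a 45:15 split is best explained by one alt copy (simplex).
print(call_dosage(45, 15))
```

With fewer reads the neighbouring dosage classes become harder to separate, which fits the abstract's remark that read depths of roughly 60-80× serve as a lower boundary for reliable copy number assessment.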
Andersen, Jeppe D; Pereira, Vania; Pietroni, Carlotta
The simultaneous sequencing of samples from multiple individuals increases the efficiency of next-generation sequencing (NGS) while also reducing costs. Here we describe a novel and simple approach for sequencing DNA from multiple individuals per barcode. Our strategy relies on the endonuclease d...
Graciet, Emmanuelle; O'Maoiléidigh, Diarmuid Seosamh; Wellmer, Frank
Over the past 20 years, classic genetic approaches have shown that the developmental program underlying flower formation involves a large number of transcriptional regulators. However, the target genes of these transcription factors, as well as the gene regulatory networks they control, remain largely unknown. Chromatin immunoprecipitation coupled to next-generation sequencing (ChIP-Seq), which allows the identification of transcription factor binding sites on a genome-wide scale, has been successfully applied to a number of transcription factors in Arabidopsis. The ChIP-Seq procedure involves chemical cross-linking of proteins to DNA, followed by chromatin fragmentation and immunoprecipitation of specific protein-DNA complexes. The regions of the genome bound by a specific transcription factor can then be identified after next-generation sequencing.
Classification of pediatric brain tumors with unusual histologic and clinical features may be a diagnostic challenge to the pathologist. We present a case of a 12-year-old girl with a primary intracranial tumor. The tumor classification was not certain initially, and the site of origin and clinical behavior were unusual. Genomic characterization of the tumor using a Clinical Laboratory Improvement Amendment (CLIA)-certified next-generation sequencing assay assisted in the diagnosis and translated into patient benefit, albeit transient. Our case argues that next generation sequencing may play a role in the pathological classification of pediatric brain cancers and guiding targeted therapy, supporting additional studies of genetically targeted therapeutics.
Ravi, Rupesh Kanchi; Walton, Kendra; Khosroheidari, Mahdieh
MiSeq, Illumina's integrated next generation sequencing instrument, uses reversible-terminator sequencing-by-synthesis technology to provide end-to-end sequencing solutions. The MiSeq instrument is one of the smallest benchtop sequencers that can perform onboard cluster generation, amplification, genomic DNA sequencing, and data analysis, including base calling, alignment and variant calling, in a single run. It performs both single- and paired-end runs with adjustable read lengths from 1 × 36 base pairs to 2 × 300 base pairs. A single run can produce output data of up to 15 Gb in as little as 4 h of runtime and can output up to 25 M single reads and 50 M paired-end reads. Thus, MiSeq provides an ideal platform for rapid turnaround time. MiSeq is also a cost-effective tool for various analyses focused on targeted gene sequencing (amplicon sequencing and target enrichment), metagenomics, and gene expression studies. For these reasons, MiSeq has become one of the most widely used next generation sequencing platforms. Here, we provide a protocol to prepare libraries for sequencing using the MiSeq instrument and basic guidelines for analysis of output data from the MiSeq sequencing run.
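The throughput figures quoted in the abstract above fit together arithmetically: 25 M clusters read as 2 × 300 bp paired ends give 15 Gb. The sketch below works through that calculation and a derived mean coverage. The function names and the 5 Mb target size are illustrative assumptions, not part of the MiSeq software.

```python
def run_yield_bases(clusters, read_length, paired_end=True):
    """Total bases from a run: clusters x read length x reads per cluster."""
    reads_per_cluster = 2 if paired_end else 1
    return clusters * read_length * reads_per_cluster

def mean_coverage(total_bases, target_size_bp):
    """Average depth if all bases mapped evenly onto the target."""
    return total_bases / target_size_bp

# 25 M clusters at 2 x 300 bp: 15,000,000,000 bases, i.e. 15 Gb.
total = run_yield_bases(25_000_000, 300, paired_end=True)
print(total)
# Spread over a hypothetical 5 Mb bacterial genome (assumed size).
print(mean_coverage(total, 5_000_000))
```

In practice the usable yield is lower after quality filtering, so these numbers are best read as upper bounds.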
Kroll, Jose E.; Kim, Jihoon; Ohno-Machado, Lucila; de Souza, Sandro J.
Motivation. Alternative splicing events (ASEs) are prevalent in the transcriptome of eukaryotic species and are known to influence many biological phenomena. The identification and quantification of these events are crucial for a better understanding of biological processes. Next-generation DNA sequencing technologies have allowed deep characterization of transcriptomes and made it possible to address these issues. ASEs analysis, however, represents a challenging task especially when many dif...
Ji, Boyang; Nielsen, Jens
Changes in the human gut microbiome are associated with altered human metabolism and health, yet the mechanisms of interactions between microbial species and human metabolism have not been clearly elucidated. Next-generation sequencing has revolutionized human gut microbiome research, but most current applications concentrate on studying the microbial diversity of communities and have at best provided associations between specific gut bacteria and human health. However, little is known ab...
Shen, Kang-Ning; Chang, Chih-Wei; Chen, Ching-Hung; Hsiao, Chung-Der
In this study, the complete mitogenome sequence of the Regal angelfish, Pygoplites diacanthus (Perciformes: Pomacanthidae), has been sequenced by the next-generation sequencing method. The assembled mitogenome, consisting of 16,784 bp, includes 13 protein coding genes, 22 transfer RNAs, and two ribosomal RNA genes. The overall base composition of Regal angelfish is 28.5% for A, 28.9% for C, 16.3% for G, 26.4% for T, and shows 85% identity to flame angelfish Centropyge loricula. The complete mitogenome of the Regal angelfish provides essential and important DNA molecular data for further phylogeography and evolutionary analysis for marine angelfish phylogeny.
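The two summary statistics reported in mitogenome announcements like the one above, base composition and percent identity to a related species, are simple to compute. This is a minimal sketch with hypothetical helper names; it assumes the two sequences passed to `percent_identity` are already aligned to equal length, with `-` marking gaps.

```python
from collections import Counter

def base_composition(seq):
    """Percentage of A, C, G and T among the unambiguous bases of seq."""
    counts = Counter(base for base in seq.upper() if base in "ACGT")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    return {base: round(100 * counts[base] / total, 1) for base in "ACGT"}

def percent_identity(aligned_a, aligned_b):
    """Percent of matching columns, ignoring columns with a gap in either row."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(x, y) for x, y in zip(aligned_a.upper(), aligned_b.upper())
             if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no gap-free columns to compare")
    matches = sum(1 for x, y in pairs if x == y)
    return round(100 * matches / len(pairs), 1)

print(base_composition("AACGT"))
print(percent_identity("ACGT-A", "ACCTTA"))
```

Whether gapped columns count toward identity varies between tools, so a reported figure such as 85% depends on the aligner and convention used.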
Shen, Kang-Ning; Chang, Chih-Wei; Chen, Ching-Hung; Chassaing, Alexandre; Hsiao, Chung-Der
In this study, the complete mitogenome sequence of the Japanese angelfish, Centropyge interrupta (Perciformes: Pomacanthidae), has been sequenced by the next-generation sequencing method. The assembled mitogenome, consisting of 16,595 bp, includes 13 protein coding genes, 22 transfer RNAs, and two ribosomal RNA genes. The overall base composition of Japanese angelfish is 27.5% for A, 29.3% for C, 17.3% for G, 25.9% for T, and shows 85% identity to flame angelfish Centropyge loriculus. The complete mitogenome of the Japanese angelfish provides essential and important DNA molecular data for further phylogeography and evolutionary analysis for marine angelfish phylogeny.
Shen, Kang-Ning; Chang, Chih-Wei; Loh, Kar-Hoe; Chen, Ching-Hung; Hsiao, Chung-Der
In this study, the complete mitogenome sequence of the Clarion angelfish, Holacanthus clarionensis (Perciformes: Pomacanthidae), has been sequenced by the next-generation sequencing method. The length of the assembled mitogenome is 16,615 bp, including 13 protein coding genes, 22 transfer RNAs, and two ribosomal RNA genes. The overall base composition of Clarion angelfish is 28.3% for A, 29.3% for C, 16.5% for G, 25.9% for T, and shows 85% identity to flame angelfish Centropyge loriculus. The complete mitogenome of the Clarion angelfish provides essential and important DNA molecular data for further phylogeography and evolutionary analysis for marine angelfish phylogeny.
Jespersen, Jakob S.; Petersen, Bent; Seguin-Orlando, Andaine
..., encoded by ~60 highly variable 'var' genes per haploid genome. PfEMP1 is exported to the surface of infected erythrocytes and is thought to be fundamental to immune evasion by adhesion to host and parasite factors. The highly variable nature has constituted a roadblock in var expression studies aimed at identifying PfEMP1 features associated with high virulence. Here we present the first effective method for sequence analysis of var genes expressed in field samples: a sequential PCR and next generation sequencing based technique applied on expressed var sequence tags and subsequently on long range PCR...
The analysis of next-generation sequence (NGS) data is often a fragmented step-wise process. For example, multiple pieces of software are typically needed to map NGS reads, extract variant sites, and construct a DNA sequence matrix containing only single nucleotide polymorphisms (i.e., a SNP matrix) for a set of individuals. The management and chaining of these software pieces and their outputs can often be a cumbersome and difficult task. Here, we present CFSAN SNP Pipeline, which combines into a single package the mapping of NGS reads to a reference genome with Bowtie2, processing of those mapping (BAM) files using SAMtools, identification of variant sites using VarScan, and production of a SNP matrix using custom Python scripts. We also introduce a Python package (CFSAN SNP Mutator) that, when given a reference genome, will generate variants of known position against which we validate our pipeline. We created 1,000 simulated Salmonella enterica sp. enterica Serovar Agona genomes at 100× and 20× coverage, each containing 500 SNPs, 20 single-base insertions and 20 single-base deletions. For the 100× dataset, the CFSAN SNP Pipeline recovered 98.9% of the introduced SNPs and had a false positive rate of 1.04 × 10^−6; for the 20× dataset 98.8% of SNPs were recovered and the false positive rate was 8.34 × 10^−7. Based on these results, CFSAN SNP Pipeline is a robust and accurate tool that is among the first to combine into a single executable the myriad steps required to produce a SNP matrix from NGS data. Such a tool is useful to those working in an applied setting (e.g., food safety traceback investigations) as well as for those interested in evolutionary questions.
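The last step described in the abstract above, turning per-sample variant calls into a SNP matrix, and the validation against simulated variants of known position, can be sketched in a few lines. This is not the CFSAN SNP Pipeline's own code: `build_snp_matrix` and `evaluate_calls` are hypothetical names, and the input is assumed to be already-parsed calls (0-based position mapped to the called base) rather than real VarScan output.

```python
def build_snp_matrix(reference, sample_calls):
    """Collect every variant position seen in any sample and emit one row per
    sample, filling in the reference base where a sample has no call."""
    positions = sorted({pos for calls in sample_calls.values() for pos in calls})
    matrix = {
        sample: "".join(calls.get(pos, reference[pos]) for pos in positions)
        for sample, calls in sample_calls.items()
    }
    return positions, matrix

def evaluate_calls(true_positions, called_positions, genome_length):
    """Recall over the introduced SNPs and false positive rate over all
    non-mutated reference positions (one plausible definition, assumed here)."""
    true_positives = len(true_positions & called_positions)
    false_positives = len(called_positions - true_positions)
    negatives = genome_length - len(true_positions)
    recall = true_positives / len(true_positions) if true_positions else 0.0
    fpr = false_positives / negatives if negatives else 0.0
    return recall, fpr

reference = "ACGTACGT"
calls = {"s1": {1: "T", 5: "A"}, "s2": {5: "A"}}
print(build_snp_matrix(reference, calls))
print(evaluate_calls({1, 5, 7}, {1, 5, 2}, len(reference)))
```

Treating a missing call as the reference base is a simplification. A production pipeline would also have to distinguish "no variant" from "no coverage" at each position.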
Azam, Sarwar; Rathore, Abhishek; Shah, Trushar M; Telluri, Mohan; Amindala, BhanuPrakash; Ruperao, Pradeep; Katta, Mohan A V S K; Varshney, Rajeev K
Open source single nucleotide polymorphism (SNP) discovery pipelines for next generation sequencing data commonly require working knowledge of the command line interface, massive computational resources and expertise, which is a daunting task for biologists. Further, the SNP information generated may not be readily used for downstream processes such as genotyping. Hence, a comprehensive pipeline has been developed by integrating several open source next generation sequencing (NGS) tools along with a graphical user interface called Integrated SNP Mining and Utilization (ISMU) for SNP discovery and their utilization by developing genotyping assays. The pipeline features functionalities such as pre-processing of raw data, integration of open source alignment tools (Bowtie2, BWA, Maq, NovoAlign and SOAP2), SNP prediction (SAMtools/SOAPsnp/CNS2snp and CbCC) methods and interfaces for developing genotyping assays. The pipeline outputs a list of high quality SNPs between all pairwise combinations of genotypes analyzed, in addition to the reference genome/sequence. Visualization tools (Tablet and Flapjack) integrated into the pipeline enable inspection of the alignment and errors, if any. The pipeline also provides a confidence score or polymorphism information content value with flanking sequences for identified SNPs in the standard format required for developing marker genotyping (KASP and Golden Gate) assays. The pipeline enables users to process a range of NGS datasets such as whole genome re-sequencing, restriction site associated DNA sequencing and transcriptome sequencing data at a fast speed. The pipeline is very useful for the plant genetics and breeding community with no computational expertise in order to discover SNPs and utilize them in genomics, genetics and breeding studies. The pipeline has been parallelized to process huge datasets of next generation sequencing. It has been developed in Java language and is available at http://hpc.icrisat.cgiar.org/ISMU as a standalone
Keller, A; Danner, N; Grimmer, G; Ankenbrand, M; von der Ohe, K; von der Ohe, W; Rost, S; Härtel, S; Steffan-Dewenter, I
The identification of pollen plays an important role in ecology, palaeo-climatology, honey quality control and other areas. Currently, expert knowledge and reference collections are essential to identify pollen origin through light microscopy. Pollen identification through molecular sequencing and DNA barcoding has been proposed as an alternative approach, but the assessment of mixed pollen samples originating from multiple plant species is still a tedious and error-prone task. Next-generation sequencing has been proposed to avoid this hindrance. In this study we assessed mixed pollen probes through next-generation sequencing of amplicons from the highly variable, species-specific internal transcribed spacer 2 region of nuclear ribosomal DNA. Further, we developed a bioinformatic workflow to analyse these high-throughput data with a newly created reference database. To evaluate the feasibility, we compared results from classical identification based on light microscopy from the same samples with our sequencing results. We assessed in total 16 mixed pollen samples, 14 originated from honeybee colonies and two from solitary bee nests. The sequencing technique resulted in higher taxon richness (deeper assignments and more identified taxa) compared to light microscopy. Abundance estimations from sequencing data were significantly correlated with counted abundances through light microscopy. Simulation analyses of taxon specificity and sensitivity indicate that 96% of taxa present in the database are correctly identifiable at the genus level and 70% at the species level. Next-generation sequencing thus presents a useful and efficient workflow to identify pollen at the genus and species level without requiring specialised palynological expert knowledge. © 2014 German Botanical Society and The Royal Botanical Society of the Netherlands.
Maria Ximena Sosa
We describe methods for rapid sequencing of the entire human mitochondrial genome (mtgenome), which involve long-range PCR for specific amplification of the mtgenome, pyrosequencing, quantitative mapping of sequence reads to identify sequence variants and heteroplasmy, as well as de novo sequence assembly. These methods have been used to study 40 publicly available HapMap samples of European (CEU) and African (YRI) ancestry to demonstrate a sequencing error rate <5.63×10^-4, nucleotide diversity of 1.6×10^-3 for CEU and 3.7×10^-3 for YRI, patterns of sequence variation consistent with earlier studies, but a higher rate of heteroplasmy varying between 10% and 50%. These results demonstrate that next-generation sequencing technologies allow interrogation of the mitochondrial genome in greater depth than previously possible which may be of value in biology and medicine.
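Identifying heteroplasmy from mapped reads, as mentioned in the abstract above, comes down to asking whether a second base is present at a position at a frequency well above the sequencing error rate. The sketch below is only an illustration of that idea: the function names, the 10% minimum fraction and the 100-read minimum depth are assumptions, not the thresholds used by the authors.

```python
def minor_allele_fraction(base_counts):
    """Fraction of reads carrying the second most common base at a position."""
    total = sum(base_counts.values())
    if total == 0:
        return None
    ordered = sorted(base_counts.values(), reverse=True)
    minor = ordered[1] if len(ordered) > 1 else 0
    return minor / total

def is_heteroplasmic(base_counts, min_fraction=0.10, min_depth=100):
    """Flag a position when depth is adequate and the minor base is frequent.

    Both thresholds are assumed values chosen to sit far above a per-base
    error rate on the order of 10^-4 to 10^-3.
    """
    if sum(base_counts.values()) < min_depth:
        return False
    return minor_allele_fraction(base_counts) >= min_fraction

print(is_heteroplasmic({"A": 90, "G": 10}))   # minor base at 10% of 100 reads
print(is_heteroplasmic({"A": 995, "G": 5}))   # minor base at 0.5%, likely noise
```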
The application of next-generation sequencing (NGS) technologies for the development of simple sequence repeat (SSR) or microsatellite loci for genetic research in the botanical sciences is described. The major advantage of using NGS methods to isolate SSR loci is their ability to quickly and cost-e...
Rockenbauer, Eszter; Hansen, Stine; Mikkelsen, Martin
We sequenced the D21S11 locus in 77 individuals from Danish paternity cases using 454 FLX next generation sequencing (NGS) technology. All samples were also typed with the AmpFlSTR® Profiler Plus® or the AmpFlSTR® Identifiler® PCR Amplification kits as part of paternity investigations...
The soil transmitted helminths are a group of parasitic worms responsible for extensive morbidity in many of the world's most economically depressed locations. With growing emphasis on disease mapping and eradication, the availability of accurate and cost-effective diagnostic measures is of paramount importance to global control and elimination efforts. While real-time PCR-based molecular detection assays have shown great promise, to date, these assays have utilized sub-optimal targets. By performing next-generation sequencing-based repeat analyses, we have identified high copy-number, non-coding DNA sequences from a series of soil transmitted pathogens. We have used these repetitive DNA elements as targets in the development of novel, multi-parallel, PCR-based diagnostic assays. Utilizing next-generation sequencing and the Galaxy-based RepeatExplorer web server, we performed repeat DNA analysis on five species of soil transmitted helminths (Necator americanus, Ancylostoma duodenale, Trichuris trichiura, Ascaris lumbricoides, and Strongyloides stercoralis). Employing high copy-number, non-coding repeat DNA sequences as targets, novel real-time PCR assays were designed, and assays were tested against established molecular detection methods. Each assay provided consistent detection of genomic DNA at quantities of 2 fg or less, demonstrated species-specificity, and showed an improved limit of detection over the existing, proven PCR-based assay. The utilization of next-generation sequencing-based repeat DNA analysis methodologies for the identification of molecular diagnostic targets has the ability to improve assay species-specificity and limits of detection. By exploiting such high copy-number repeat sequences, the assays described here will facilitate soil transmitted helminth diagnostic efforts. We recommend similar analyses when designing PCR-based diagnostic tests for the detection of other eukaryotic pathogens.
Ralf, Arwin; Montiel González, Diego; Zhong, Kaiyin; Kayser, Manfred
Next generation sequencing (NGS) technologies offer immense possibilities given the large genomic data they simultaneously deliver. The human Y chromosome serves as a good example of how NGS benefits various applications in evolution, anthropology, genealogy and forensics. Prior to NGS, the Y-chromosome phylogenetic tree consisted of a few hundred branches; based on NGS data, it now contains many thousands. The complexity of both the Y tree and NGS data provides challenges for haplogroup assignment. For effective analysis and interpretation of Y-chromosome NGS data, we present Yleaf, a publicly available, automated, user-friendly software for high-resolution Y-chromosome haplogroup inference independently of library and sequencing methods.
Li, Zhigang; Breitwieser, Florian P; Lu, Jennifer; Jun, Albert S; Asnaghi, Laura; Salzberg, Steven L; Eberhart, Charles G
We test the ability of next-generation sequencing, combined with computational analysis, to identify a range of organisms causing infectious keratitis. This retrospective study evaluated 16 cases of infectious keratitis and four control corneas in formalin-fixed tissues from the pathology laboratory. Infectious cases also were analyzed in the microbiology laboratory using culture, polymerase chain reaction, and direct staining. Classified sequence reads were analyzed with two different metagenomics classification engines, Kraken and Centrifuge, and visualized using the Pavian software tool. Sequencing generated 20 to 46 million reads per sample. On average, 96% of the reads were classified as human, 0.3% corresponded to known vectors or contaminant sequences, 1.7% represented microbial sequences, and 2.4% could not be classified. The two computational strategies successfully identified the fungal, bacterial, and amoebal pathogens in most patients, including all four bacterial and mycobacterial cases, five of six fungal cases, three of three Acanthamoeba cases, and one of three herpetic keratitis cases. In several cases, additional potential pathogens also were identified. In one case with cytomegalovirus identified by Kraken and Centrifuge, the virus was confirmed by direct testing, while two where Staphylococcus aureus or cytomegalovirus were identified by Centrifuge but not Kraken could not be confirmed. Confirmation was not attempted for an additional three potential pathogens identified by Kraken and 11 identified by Centrifuge. Next generation sequencing combined with computational analysis can identify a wide range of pathogens in formalin-fixed corneal specimens, with potential applications in clinical diagnostics and research.
Microsatellites, or simple sequence repeats (SSRs), are one of the most informative and multi-purpose genetic markers exploited in plant functional genomics. However, the discovery of SSRs and development using traditional methods are laborious, time-consuming, and costly. Recently, the availability of high-throughput sequencing technologies has enabled researchers to identify a substantial number of microsatellites at less cost and effort than traditional approaches. Illumina is a noteworthy transcriptome sequencing technology that is currently used in SSR marker development. Although 454 pyrosequencing datasets can be used for SSR development, this type of sequencing is no longer supported. This review aims to present an overview of the next generation sequencing, with a focus on the efficient use of de novo transcriptome sequencing (RNA-Seq) and related tools for mining and development of microsatellites in plants.
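Mining microsatellites from assembled transcripts, the task this review addresses, amounts to scanning sequences for short motifs repeated in tandem. The sketch below shows the idea using a regular expression with a backreference. It is not one of the tools covered by the review: the function name `find_ssrs` and the minimum repeat counts (6 for dinucleotides, 5 for tri- and tetranucleotides) are assumed values of the kind commonly configured in SSR finders.

```python
import re

# Assumed thresholds: motif length -> minimum number of tandem copies.
MIN_REPEATS = {2: 6, 3: 5, 4: 5}

def is_primitive(motif):
    """False if the motif is itself a repeat of a shorter unit (e.g. ATAT, AA)."""
    k = len(motif)
    return not any(k % d == 0 and motif == motif[:d] * (k // d)
                   for d in range(1, k))

def find_ssrs(seq):
    """Return (start, motif, copies) for each perfect tandem repeat found."""
    seq = seq.upper()
    hits = []
    for k, min_copies in MIN_REPEATS.items():
        # ([ACGT]{k}) captures a motif; \1{n,} requires n further tandem copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_copies - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            if is_primitive(motif):
                hits.append((match.start(), motif, len(match.group(0)) // k))
    return sorted(hits)

example = "GGC" + "AT" * 7 + "CCG" + "GAA" * 5 + "TT"
print(find_ssrs(example))
```

This only finds perfect repeats. Dedicated tools also handle compound and interrupted microsatellites and go on to design flanking primers, which a sketch like this does not attempt.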
Daoud, Hussein; Luco, Stephanie M.; Li, Rui; Bareke, Eric; Beaulieu, Chandree; Jarinova, Olga; Carson, Nancy; Nikkel, Sarah M.; Graham, Gail E.; Richer, Julie; Armour, Christine; Bulman, Dennis E.; Chakraborty, Pranesh; Geraghty, Michael; Lines, Matthew A.; Lacaze-Masmonteil, Thierry; Majewski, Jacek; Boycott, Kym M.; Dyment, David A.
Background: Rare diseases often present in the first days and weeks of life and may require complex management in the setting of a neonatal intensive care unit (NICU). Exhaustive consultations and traditional genetic or metabolic investigations are costly and often fail to arrive at a final diagnosis when no recognizable syndrome is suspected. For this pilot project, we assessed the feasibility of next-generation sequencing as a tool to improve the diagnosis of rare diseases in newborns in the NICU. Methods: We retrospectively identified and prospectively recruited newborns and infants admitted to the NICU of the Children’s Hospital of Eastern Ontario and the Ottawa Hospital, General Campus, who had been referred to the medical genetics or metabolics inpatient consult service and had features suggesting an underlying genetic or metabolic condition. DNA from the newborns and parents was enriched for a panel of clinically relevant genes and sequenced on a MiSeq sequencing platform (Illumina Inc.). The data were interpreted with a standard informatics pipeline and reported to care providers, who assessed the importance of genotype–phenotype correlations. Results: Of 20 newborns studied, 8 received a diagnosis on the basis of next-generation sequencing (diagnostic rate 40%). The diagnoses were renal tubular dysgenesis, SCN1A-related encephalopathy syndrome, myotubular myopathy, FTO deficiency syndrome, cranioectodermal dysplasia, congenital myasthenic syndrome, autosomal dominant intellectual disability syndrome type 7 and Denys–Drash syndrome. Interpretation: This pilot study highlighted the potential of next-generation sequencing to deliver molecular diagnoses rapidly with a high success rate. With broader use, this approach has the potential to alter health care delivery in the NICU. PMID:27241786
Daoud, Hussein; Luco, Stephanie M; Li, Rui; Bareke, Eric; Beaulieu, Chandree; Jarinova, Olga; Carson, Nancy; Nikkel, Sarah M; Graham, Gail E; Richer, Julie; Armour, Christine; Bulman, Dennis E; Chakraborty, Pranesh; Geraghty, Michael; Lines, Matthew A; Lacaze-Masmonteil, Thierry; Majewski, Jacek; Boycott, Kym M; Dyment, David A
Rare diseases often present in the first days and weeks of life and may require complex management in the setting of a neonatal intensive care unit (NICU). Exhaustive consultations and traditional genetic or metabolic investigations are costly and often fail to arrive at a final diagnosis when no recognizable syndrome is suspected. For this pilot project, we assessed the feasibility of next-generation sequencing as a tool to improve the diagnosis of rare diseases in newborns in the NICU. We retrospectively identified and prospectively recruited newborns and infants admitted to the NICU of the Children's Hospital of Eastern Ontario and the Ottawa Hospital, General Campus, who had been referred to the medical genetics or metabolics inpatient consult service and had features suggesting an underlying genetic or metabolic condition. DNA from the newborns and parents was enriched for a panel of clinically relevant genes and sequenced on a MiSeq sequencing platform (Illumina Inc.). The data were interpreted with a standard informatics pipeline and reported to care providers, who assessed the importance of genotype-phenotype correlations. Of 20 newborns studied, 8 received a diagnosis on the basis of next-generation sequencing (diagnostic rate 40%). The diagnoses were renal tubular dysgenesis, SCN1A-related encephalopathy syndrome, myotubular myopathy, FTO deficiency syndrome, cranioectodermal dysplasia, congenital myasthenic syndrome, autosomal dominant intellectual disability syndrome type 7 and Denys-Drash syndrome. This pilot study highlighted the potential of next-generation sequencing to deliver molecular diagnoses rapidly with a high success rate. With broader use, this approach has the potential to alter health care delivery in the NICU. © 2016 Canadian Medical Association or its licensors.
Allard Marc W
Background: Next-Generation Sequencing (NGS) is increasingly being used as a molecular epidemiologic tool for discerning ancestry and traceback of the most complicated, difficult to resolve bacterial pathogens. Making a linkage between possible food sources and clinical isolates requires distinguishing the suspected pathogen from an environmental background and placing the variation observed into the wider context of variation occurring within a serovar and among other closely related foodborne pathogens. Equally important is the need to validate these high resolution molecular tools for use in molecular epidemiologic traceback. Such efforts include the examination of strain cluster stability as well as the cumulative genetic effects of sub-culturing on these clusters. Numerous isolates of S. Montevideo were shot-gun sequenced including diverse lineage representatives as well as numerous replicate clones to determine how much variability is due to bias, sequencing error, and/or the culturing of isolates. All new draft genomes were compared to 34 S. Montevideo isolates previously published during an NGS-based molecular epidemiological case study. Results: Intraserovar lineages of S. Montevideo differ by thousands of SNPs, only slightly fewer than the number of SNPs observed between S. Montevideo and other distinct serovars. Much less variability was discovered within an individual S. Montevideo clade implicated in a recent foodborne outbreak as well as among individual NGS replicates. These findings were similar to previous reports documenting homopolymeric and deletion error rates with the Roche 454 GS Titanium technology. In no case, however, did variability associated with sequencing methods or sample preparations create inconsistencies with our current phylogenetic results or the subsequent molecular epidemiological evidence gleaned from these data. Conclusions: Implementation of a validated pipeline for NGS data acquisition and
Gullapalli, Rama R; Desai, Ketaki V; Santana-Santos, Lucas; Kant, Jeffrey A; Becich, Michael J
The Human Genome Project (HGP) provided the initial draft of mankind's DNA sequence in 2001. The HGP was produced by 23 collaborating laboratories using Sanger sequencing of mapped regions as well as shotgun sequencing techniques in a process that occupied 13 years at a cost of ~$3 billion. Today, Next Generation Sequencing (NGS) techniques represent the next phase in the evolution of DNA sequencing technology at dramatically reduced cost compared to traditional Sanger sequencing. A single laboratory today can sequence the entire human genome in a few days for a few thousand dollars in reagents and staff time. Routine whole exome or even whole genome sequencing of clinical patients is well within the realm of affordability for many academic institutions across the country. This paper reviews current sequencing technology methods and upcoming advancements in sequencing technology as well as challenges associated with data generation, data manipulation and data storage. Implementation of routine NGS data in cancer genomics is discussed along with potential pitfalls in the interpretation of the NGS data. The overarching importance of bioinformatics in the clinical implementation of NGS is emphasized. We also review the issue of physician education which also is an important consideration for the successful implementation of NGS in the clinical workplace. NGS technologies represent a golden opportunity for the next generation of pathologists to be at the leading edge of the personalized medicine approaches coming our way. Often under-emphasized issues of data access and control as well as potential ethical implications of whole genome NGS sequencing are also discussed. Despite some challenges, it's hard not to be optimistic about the future of personalized genome sequencing and its potential impact on patient care and the advancement of knowledge of human biology and disease in the near future.
Xia, Junfeng; Wang, Qingguo; Jia, Peilin; Wang, Bing; Pao, William; Zhao, Zhongming
Next generation sequencing (NGS) technologies have been rapidly applied in biomedical and biological research since their advent only a few years ago, and they are expected to advance at an unprecedented pace in the following years. To provide the research community with a comprehensive NGS resource, we have developed the database Next Generation Sequencing Catalog (NGS Catalog, http://bioinfo.mc.vanderbilt.edu/NGS/index.html), a continually updated database that collects, curates and manages available human NGS data obtained from published literature. NGS Catalog deposits publication information of NGS studies and their mutation characteristics (SNVs, small insertions/deletions, copy number variations, and structural variants), as well as mutated genes and gene fusions detected by NGS. Other functions include user data upload, NGS general analysis pipelines, and NGS software. NGS Catalog is particularly useful for investigators who are new to NGS but would like to take advantage of these powerful technologies for their own research. Finally, based on the data deposited in NGS Catalog, we summarized features and findings from whole exome sequencing, whole genome sequencing, and transcriptome sequencing studies for human diseases or traits. © 2012 Wiley Periodicals, Inc.
Yun, Sajung; Yun, Sijung
Next generation sequencing produces base calls with low quality scores that can affect the accuracy of identifying simple nucleotide variation calls, including single nucleotide polymorphisms and small insertions and deletions. Here we compare the effectiveness of two data preprocessing methods, masking and trimming, and the accuracy of simple nucleotide variation calls on whole-genome sequence data from Caenorhabditis elegans. Masking substitutes low quality base calls with 'N's (undetermined bases), whereas trimming removes low quality bases, which results in shorter read lengths. We demonstrate that masking is more effective than trimming in reducing the false-positive rate in single nucleotide polymorphism (SNP) calling. However, neither preprocessing method affected the false-negative rate in SNP calling with statistical significance compared to data analysis without preprocessing. The false-positive and false-negative rates for small insertions and deletions did not differ between masking and trimming. We recommend masking over trimming as a more effective preprocessing method for next generation sequencing data analysis, since masking reduces the false-positive rate in SNP calling without sacrificing the false-negative rate, although trimming is currently more commonly used in the field. The Perl script for masking is available at http://code.google.com/p/subn/. The sequencing data used in the study were deposited in the Sequence Read Archive (SRX450968 and SRX451773).
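The difference between the two preprocessing methods can be illustrated with a minimal Python sketch. The function names, the Phred threshold of 20, and the rule of trimming only from the read ends are assumptions chosen for illustration; they are not taken from the authors' Perl script.

```python
def mask_low_quality(seq, quals, threshold=20):
    """Masking: replace each base whose quality is below the
    (hypothetical) threshold with 'N'. Read length is unchanged."""
    return "".join(b if q >= threshold else "N" for b, q in zip(seq, quals))


def trim_low_quality(seq, quals, threshold=20):
    """Trimming (assumed variant): drop low-quality bases from both
    ends of the read until a base meeting the threshold is found.
    The read becomes shorter; internal low-quality bases remain."""
    start, end = 0, len(seq)
    while start < end and quals[start] < threshold:
        start += 1
    while end > start and quals[end - 1] < threshold:
        end -= 1
    return seq[start:end]


read = "ACGTACGT"
phred = [10, 30, 30, 5, 30, 30, 30, 8]
masked = mask_low_quality(read, phred)    # "NCGNACGN"
trimmed = trim_low_quality(read, phred)   # "CGTACG"
```

In this toy read, masking keeps all eight positions and hides the three unreliable calls, while trimming shortens the read to six bases and leaves the low-quality internal base in place.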
Marosy, Beth A.; Craig, Brian D.; Hetrick, Kurt N.; Witmer, P. Dane; Ling, Hua; Griffith, Sean M.; Myers, Ben; Ostrander, Elaine A.; Stanford, Janet L.; Brody, Lawrence C.; Doheny, Kimberly F.
This unit describes a protocol for generating exome-enriched sequencing libraries using DNA extracted from Formalin Fixed Paraffin Embedded (FFPE) samples. Utilizing commercially available kits, we present a low-input FFPE workflow starting with 50 ng of DNA. This procedure includes a repair step to address damage caused by FFPE preservation that improves sequence quality. Subsequently, libraries undergo an in-solution targeted selection for exons, followed by sequencing using the Illumina next generation short read sequencing platform. PMID:28075488
Kawahara-Miki, Ryouka; Sano, Satoshi; Nunome, Mitsuo; Shimmura, Tsuyoshi; Kuwayama, Takehito; Takahashi, Shinji; Kawashima, Takaharu; Matsuda, Yoichi; Yoshimura, Takashi; Kono, Tomohiro
The Japanese quail has several advantages as a laboratory animal for biological and biomedical investigations. In this study, the draft genome of the Japanese quail was sequenced and assembled using next-generation sequencing technology. To improve the quality of the assembly, the sequence reads from the Japanese quail were aligned against the reference genome of the chicken. The final draft assembly consisted of 1.75 Gbp with an N50 contig length of 11,409 bp. On the basis of the draft genome sequence obtained, we developed 100 microsatellite markers and used these markers to evaluate the genetic variability and diversity of 11 lines of Japanese quail. Furthermore, we identified Japanese quail orthologs of spermatogenesis markers and analyzed their expression using in situ hybridization. The Japanese quail genome sequence obtained in the present study could enhance the value of this species as a model animal. Copyright © 2013 Elsevier Inc. All rights reserved.
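For readers unfamiliar with the N50 contig length reported in this abstract, the following Python sketch shows the usual definition: the length of the contig at which the running total of contig lengths, sorted from longest to shortest, first reaches half of the total assembly size. The function name and the toy contig lengths are illustrative assumptions, not data from the study.

```python
def n50(contig_lengths):
    """Return the N50 of a list of contig lengths (standard definition)."""
    ordered = sorted(contig_lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("contig_lengths must not be empty")


# Toy assembly: total length 290, half is 145.
# Running sums: 80, 150 -> the second contig (70) crosses the halfway mark.
toy_n50 = n50([80, 70, 50, 40, 30, 20])
```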
Rico, Ciro; Normandeau, Eric; Dion-Côté, Anne-Marie; Rico, María Inés; Côté, Guillaume; Bernatchez, Louis
Next-generation sequencing (NGS) is revolutionising marker development and the rapidly increasing amount of transcriptomes published across a wide variety of taxa is providing valuable sequence databases for the identification of genetic markers without the need to generate new sequences. Microsatellites are still the most important source of polymorphic markers in ecology and evolution. Motivated by our long-term interest in the adaptive radiation of a non-model species complex of whitefishes (Coregonus spp.), in this study, we focus on microsatellite characterisation and multiplex optimisation using transcriptome sequences generated by Illumina® and Roche-454, as well as online databases of Expressed Sequence Tags (EST) for the study of whitefish evolution and demographic history. We identified and optimised 40 polymorphic loci in multiplex PCR reactions and validated the robustness of our analyses by testing several population genetics and phylogeographic predictions using 494 fish from five lakes and 2 distinct ecotypes. PMID:24296905
Beránek, Martin; Sirák, Igor; Vošmik, Milan; Petera, Jiří; Drastíková, Monika; Palička, Vladimír
The aims of the study were: i) to compare circulating tumor DNA (ctDNA) yields obtained by different manual extraction procedures, ii) to evaluate the addition of various carrier molecules into the plasma to improve ctDNA extraction recovery, and iii) to use next generation sequencing (NGS) technology to analyze KRAS, BRAF, and NRAS somatic mutations in ctDNA from patients with metastatic colorectal cancer. Venous blood was obtained from patients who suffered from metastatic colorectal carcinoma. For plasma ctDNA extraction, the following carriers were tested: carrier RNA, polyadenylic acid, glycogen, linear acrylamide, yeast tRNA, salmon sperm DNA, and herring sperm DNA. Each extract was characterized by quantitative real-time PCR and next generation sequencing. The addition of polyadenylic acid had a significant positive effect on the amount of ctDNA eluted. The sequencing data revealed five cases of ctDNA mutated in KRAS and one patient with a BRAF mutation. An agreement of 86% was found between tumor tissues and ctDNA. Testing somatic mutations in ctDNA seems to be a promising tool to monitor dynamically changing genotypes of tumor cells circulating in the body. The optimized process of ctDNA extraction should help to obtain more reliable sequencing data in patients with metastatic colorectal cancer.
Michael V Zaragoza
Mutations in mitochondrial DNA (mtDNA) may cause maternally inherited cardiomyopathy and heart failure. In homoplasmy all mtDNA copies contain the mutation. In heteroplasmy there is a mixture of normal and mutant copies of mtDNA. The clinical phenotype of an affected individual depends on the type of genetic defect and the ratios of mutant and normal mtDNA in affected tissues. We aimed to determine the sensitivity of next-generation sequencing compared to Sanger sequencing for mutation detection in patients with mitochondrial cardiomyopathy. We studied 18 patients with mitochondrial cardiomyopathy and two with suspected mitochondrial disease. We "shotgun" sequenced PCR-amplified mtDNA and multiplexed using a single run on Roche's 454 Genome Sequencer. By mapping to the reference sequence, we obtained 1,300x average coverage per case and identified high-confidence variants. By comparing these to >400 mtDNA substitution variants detected by Sanger, we found 98% concordance in variant detection. Simulation studies showed that >95% of the homoplasmic variants were detected at a minimum sequence coverage of 20x while heteroplasmic variants required >200x coverage. Several Sanger "misses" were detected by 454 sequencing. These included the novel heteroplasmic 7501T>C in tRNA serine 1 in a patient with sudden cardiac death. These results support a potential role of next-generation sequencing in the discovery of novel mtDNA variants with heteroplasmy below the level reliably detected with Sanger sequencing. We hope that this will assist in the identification of mtDNA mutations and key genetic determinants for cardiomyopathy and mitochondrial disease.
Hoffman, Jodi D; Greger, Valerie; Strovel, Erin T; Blitzer, Miriam G; Umbarger, Mark A; Kennedy, Caleb; Bishop, Brian; Saunders, Patrick; Porreca, Gregory J; Schienda, Jaclyn; Davie, Jocelyn; Hallam, Stephanie; Towne, Charles
Tay-Sachs disease (TSD) is the prototype for ethnic-based carrier screening, with a carrier rate of ∼1/27 in Ashkenazi Jews and French Canadians. HexA enzyme analysis is the current gold standard for TSD carrier screening (detection rate ∼98%), but has technical limitations. We compared DNA analysis by next-generation DNA sequencing (NGS) plus an assay for the 7.6 kb deletion to enzyme analysis for TSD carrier screening using 74 samples collected from participants at a TSD family conference. ...
Sims, David; Mendes-Pereira, Ana M; Frankum, Jessica; Burgess, Darren; Cerone, Maria-Antonietta; Lombardelli, Cristina; Mitsopoulos, Costas; Hakas, Jarle; Murugaesu, Nirupa; Isacke, Clare M; Fenwick, Kerry; Assiotis, Ioannis; Kozarewa, Iwanka; Zvelebil, Marketa; Ashworth, Alan; Lord, Christopher J
RNA interference (RNAi) screening is a state-of-the-art technology that enables the dissection of biological processes and disease-related phenotypes. The commercial availability of genome-wide, short hairpin RNA (shRNA) libraries has fueled interest in this area but the generation and analysis of these complex data remain a challenge. Here, we describe complete experimental protocols and novel open source computational methodologies, shALIGN and shRNAseq, that allow RNAi screens to be rapidly deconvoluted using next generation sequencing. Our computational pipeline offers efficient screen analysis and the flexibility and scalability to quickly incorporate future developments in shRNA library technology.
Boyle, Michael D
Genomics and bioinformatics are dynamic fields well-suited for capturing the imagination of undergraduates in both research laboratories and classrooms. Currently, raw nucleotide sequence is being provided, as part of several genomics research initiatives, for undergraduate research and teaching. These initiatives could be easily extended and much more effective if the source of the sequenced material and the subsequent focus of the data analysis were aligned with the research interests of individual faculty at undergraduate institutions. By judicious use of surplus capacity in existing nucleotide sequencing cores, raw sequence data could be generated to support ongoing research efforts involving undergraduates. This would allow these students to participate actively in discovery research, with a goal of making novel contributions to their field through original research while nurturing the next generation of talented research scientists.
Yigit, Erbay; Feehery, George R; Langhorst, Bradley W; Stewart, Fiona J; Dimalanta, Eileen T; Pradhan, Sriharsa; Slatko, Barton; Gardner, Andrew F; McFarland, James; Sumner, Christine; Davis, Theodore B
"Microbiome" is used to describe the communities of microorganisms and their genes in a particular environment, including communities in association with a eukaryotic host or part of a host. One challenge in microbiome analysis concerns the presence of host DNA in samples. Removal of host DNA before sequencing results in greater sequence depth of the intended microbiome target population. This unit describes a novel method of microbial DNA enrichment in which methylated host DNA such as human genomic DNA is selectively bound and separated from microbial DNA before next-generation sequencing (NGS) library construction. This microbiome enrichment technique yields a higher fraction of microbial sequencing reads and improved read quality, resulting in a reduced cost of downstream data generation and analysis. © 2016 by John Wiley & Sons, Inc.
Kuśmirek, Wiktor; Nowak, Robert M.; Neumann, Łukasz
Next generation sequencing techniques produce a large amount of sequencing data. Some parts of the genome are composed of repetitive DNA sequences, which are very problematic for existing genome assemblers. We propose a modification of the algorithm for DNA assembly, which uses the relative frequency of reads to properly reconstruct repetitive sequences. The new approach was implemented and tested; as a demonstration of the capability of our software, we present some results for model organisms. The new implementation uses a three-layer software architecture in which the presentation layer, data processing layer, and data storage layer are kept separate. Source code, as well as a demo application with a web interface and additional data, is available at the project web page: http://dnaasm.sourceforge.net.
Sucher, Nikolaus J; Hennell, James R; Carles, Maria C
DNA fingerprinting of plants has become an invaluable tool in forensic, scientific, and industrial laboratories all over the world. PCR has become part of virtually every variation of the plethora of approaches used for DNA fingerprinting today. DNA sequencing is increasingly used either in combination with or as a replacement for traditional DNA fingerprinting techniques. A prime example is the use of short, standardized regions of the genome as taxon barcodes for biological identification of plants. Rapid advances in "next generation sequencing" (NGS) technology are driving down the cost of sequencing and bringing large-scale sequencing projects into the reach of individual investigators. We present an overview of recent publications that demonstrate the use of "NGS" technology for DNA fingerprinting and DNA barcoding applications.
High-throughput next-generation sequencing (NGS) technology produces a tremendous amount of raw sequence data. The challenges for researchers are to process the raw data, to map the sequences to the genome, to discover variants that are different from the reference genome, and to prioritize/rank the variants for the question of interest. The recent development of many computational algorithms and programs has vastly improved the ability to translate sequence data into valuable information for disease gene identification. However, NGS data analysis is complex and could be overwhelming for researchers who are not familiar with the process. Here, we outline the analysis pipeline and describe some of the most commonly used principles and tools for analyzing NGS data for disease gene identification.
Verma, Renu; Sharma, Prakash C
Gastric cancer (GC) is one of the leading causes of cancer-related mortality in the world. Because the disease is asymptomatic until an advanced stage, diagnosis of gastric cancer is difficult in its early stages. The onset and progression of gastric cancer have been attributed to multiple factors including genetic alterations, epigenetic modifications, Helicobacter pylori and Epstein-Barr Virus (EBV) infection, and dietary habits. Next Generation Sequencing (NGS) based approaches, viz. Whole Genome Sequencing (WGS), Whole Exome Sequencing (WES), RNA-Seq, and targeted sequencing, have expanded the knowledge base of molecular pathogenesis of gastric cancer. In this review, we highlight recent NGS-based advances covering various genetic alterations (Microsatellite Instability, Single Nucleotide Variations, and Copy Number Variations), epigenetic changes (DNA methylation, histone modification, microRNAs) and differential gene expression during gastric tumorigenesis. We also briefly discuss the current and future potential biomarkers, drugs and therapeutic approaches available for the management of gastric cancer.
Structural variants are genomic rearrangements larger than 50 bp accounting for around 1% of the variation among human genomes. They impact on phenotypic diversity and play a role in various diseases including neurological/neurocognitive disorders and cancer development and progression. Dissecting structural variants from next-generation sequencing data presents several challenges and a number of approaches have been proposed in the literature. In this mini review we describe and summarise the latest tools – and their underlying algorithms – designed for the analysis of whole-genome sequencing, whole-exome sequencing, custom captures and amplicon sequencing data, pointing out the major advantages/drawbacks. We also report a summary of the most recent applications of third-generation sequencing platforms. This assessment provides a guided indication – with particular emphasis on human genetics and copy number variants – for researchers involved in the investigation of these genomic events.
Qiu, Biyuan; Ma, Tao; Peng, Chunyan; Zheng, Xiaoqin; Yang, Jiyun
The diagnosis of oculocutaneous albinism (OCA) is established using clinical signs and symptoms. OCA is, however, a highly genetically heterogeneous disease with mutations identified in at least nineteen unique genes, many of which produce overlapping phenotypic traits. Thus, differentiating genetic OCA subtypes for diagnoses and genetic counseling is challenging, based on clinical presentation alone, and would benefit from a comprehensive molecular diagnostic. To develop and validate a more comprehensive, targeted, next-generation-sequencing-based diagnostic for the identification of OCA-causing variants. The genomic DNA samples from 28 OCA probands were analyzed by targeted next-generation sequencing (NGS), and the candidate variants were confirmed through Sanger sequencing. We observed mutations in the TYR, OCA2, and SLC45A2 genes in 25/28 (89%) patients with OCA. We identified 38 pathogenic variants among these three genes, including 5 novel variants: c.1970G>T (p.Gly657Val), c.1669A>C (p.Thr557Pro), c.2339-2A>C, and c.1349C>G (p.Thr450Arg) in OCA2; c.459_470delTTTTGCTGCCGA (p.Ala155_Phe158del) in SLC45A2. Our findings expand the mutational spectrum of OCA in the Chinese population, and the assay we developed should be broadly useful as a molecular diagnostic and as an aid for genetic counseling for OCA patients.
Sandmann, Sarah; de Graaf, Aniek O.; Karimi, Mohsen; van der Reijden, Bert A.; Hellström-Lindberg, Eva; Jansen, Joop H.; Dugas, Martin
Valid variant calling results are crucial for the use of next-generation sequencing in clinical routine. However, there are numerous variant calling tools that usually differ in algorithms, filtering strategies, recommendations and thus also in the output. We evaluated eight open-source tools regarding their ability to call single nucleotide variants and short indels with allelic frequencies as low as 1% in non-matched next-generation sequencing data: GATK HaplotypeCaller, Platypus, VarScan, LoFreq, FreeBayes, SNVer, SAMtools and VarDict. We analysed two real datasets from patients with myelodysplastic syndrome, covering 54 Illumina HiSeq samples and 111 Illumina NextSeq samples. Mutations were validated by re-sequencing on the same platform, on a different platform and by expert-based review. In addition we considered two simulated datasets with varying coverage and error profiles, covering 50 samples each. In all cases an identical target region consisting of 19 genes (42,322 bp) was analysed. Altogether, no tool succeeded in calling all mutations. High sensitivity was always accompanied by low precision. The influence of varying coverage and background noise on variant calling was generally low. Taking everything into account, VarDict performed best. However, our results indicate that there is a need to improve reproducibility of the results in the context of multithreading.
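The trade-off between sensitivity and precision mentioned in this abstract can be made concrete with a short Python sketch using the standard definitions (sensitivity = TP / (TP + FN), precision = TP / (TP + FP)). The function name and the toy variant identifiers are hypothetical and are not drawn from the study's datasets.

```python
def sensitivity_and_precision(called, validated):
    """Compare a set of called variants against a set of validated ones.

    called    -- variants reported by a caller (any hashable identifiers)
    validated -- variants confirmed as true mutations
    """
    called, validated = set(called), set(validated)
    true_pos = len(called & validated)    # called and real
    false_pos = len(called - validated)   # called but not real
    false_neg = len(validated - called)   # real but missed
    sensitivity = true_pos / (true_pos + false_neg) if validated else 0.0
    precision = true_pos / (true_pos + false_pos) if called else 0.0
    return sensitivity, precision


# Toy example: 4 validated variants, a caller reports 3, of which 2 are real.
truth = {"chr1:100A>T", "chr1:250G>C", "chr2:75delA", "chr3:10C>G"}
calls = {"chr1:100A>T", "chr1:250G>C", "chr4:5T>A"}
sens, prec = sensitivity_and_precision(calls, truth)  # 0.5 and 2/3
```

A caller that reports more candidate variants will tend to raise the first number while lowering the second, which is the pattern the abstract describes.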
Identification of driver mutations in lung adenocarcinoma has led to development of targeted agents that are already approved for clinical use or are in clinical trials. Therefore, the number of biomarkers that will need to be assessed is expected to rapidly increase. This calls for the implementation of methods probing the mutational status of multiple genes for inoperable cases, for which limited cytological or bioptic material is available. Cytology specimens from 38 lung adenocarcinomas were subjected to the simultaneous assessment of 504 mutational hotspots of 22 lung cancer-associated genes using 10 nanograms of DNA and Ion Torrent PGM next-generation sequencing. Thirty-six cases were successfully sequenced (95%). In 24/36 cases (67%) at least one mutated gene was observed, including EGFR, KRAS, PIK3CA, BRAF, TP53, PTEN, MET, SMAD4, FGFR3, STK11, MAP2K1. EGFR and KRAS mutations, respectively found in 6/36 (16%) and 10/36 (28%) cases, were mutually exclusive. Nine samples (25%) showed concurrent alterations in different genes. The next-generation sequencing test used is superior to current standard methodologies, as it interrogates multiple genes and requires limited amounts of DNA. Its applicability to routine cytology samples might allow a significant increase in the fraction of lung cancer patients eligible for personalized therapy.
Balliu, Brunilda; Uh, Hae-Won; Tsonaka, Roula; Boehringer, Stefan; Helmer, Quinta; Houwing-Duistermaat, Jeanine J
In this analysis, we investigate the contributions that linkage-based methods, such as identical-by-descent mapping, can make to association mapping to identify rare variants in next-generation sequencing data. First, we identify regions in which cases share more segments identical-by-descent around a putative causal variant than do controls. Second, we use a two-stage mixed-effect model approach to summarize the single-nucleotide polymorphism data within each region and include them as covariates in the model for the phenotype. We assess the impact of linkage disequilibrium in determining identical-by-descent states between individuals by using markers with and without linkage disequilibrium for the first part and the impact of imputation in testing for association by using imputed genome-wide association studies or raw sequence markers for the second part. We apply the method to next-generation sequencing longitudinal family data from Genetic Association Workshop 18 and identify a significant region at chromosome 3: 40249244-41025167 (p-value = 2.3 × 10⁻³).
Roy, Somak; LaFramboise, William A; Nikiforov, Yuri E; Nikiforova, Marina N; Routbort, Mark J; Pfeifer, John; Nagarajan, Rakesh; Carter, Alexis B; Pantanowitz, Liron
Next-generation sequencing (NGS) is revolutionizing the discipline of laboratory medicine, with a deep and direct impact on patient care. Although it empowers clinical laboratories with unprecedented genomic sequencing capability, NGS has brought along obvious and obtrusive informatics challenges. Bioinformatics and clinical informatics are separate disciplines with typically a small degree of overlap, but they have been brought together by the enthusiastic adoption of NGS in clinical laboratories. The result has been a collaborative environment for the development of novel informatics solutions. Sustaining NGS-based testing in a regulated clinical environment requires institutional support to build and maintain a practical, robust, scalable, secure, and cost-effective informatics infrastructure. To discuss the novel NGS informatics challenges facing pathology laboratories today and offer solutions and future developments to address these obstacles. The published literature pertaining to NGS informatics was reviewed. The coauthors, experts in the fields of molecular pathology, precision medicine, and pathology informatics, also contributed their experiences. The boundary between bioinformatics and clinical informatics has significantly blurred with the introduction of NGS into clinical molecular laboratories. Next-generation sequencing technology and the data derived from these tests, if managed well in the clinical laboratory, will redefine the practice of medicine. In order to sustain this progress, adoption of smart computing technology will be essential. Computational pathologists will be expected to play a major role in rendering diagnostic and theranostic services by leveraging "Big Data" and modern computing tools.
Sturk-Andreaggi, Kimberly; Peck, Michelle A; Boysen, Cecilie; Dekker, Patrick; McMahon, Timothy P; Marshall, Charla K
The feasibility of generating mitochondrial DNA (mtDNA) data has expanded considerably with the advent of next-generation sequencing (NGS), specifically in the generation of entire mtDNA genome (mitogenome) sequences. However, the analysis of these data has emerged as the greatest challenge to implementation in forensics. To address this need, a custom toolkit for use in the CLC Genomics Workbench (QIAGEN, Hilden, Germany) was developed through a collaborative effort between the Armed Forces Medical Examiner System - Armed Forces DNA Identification Laboratory (AFMES-AFDIL) and QIAGEN Bioinformatics. The AFDIL-QIAGEN mtDNA Expert, or AQME, generates an editable mtDNA profile that employs forensic conventions and includes the interpretation range required for mtDNA data reporting. AQME also integrates an mtDNA haplogroup estimate into the analysis workflow, which provides the analyst with phylogenetic nomenclature guidance and a profile quality check without the use of an external tool. Supplemental AQME outputs such as nucleotide-per-position metrics, configurable export files, and an audit trail are produced to assist the analyst during review. AQME is applied to standard CLC outputs and thus can be incorporated into any mtDNA bioinformatics pipeline within CLC regardless of sample type, library preparation or NGS platform. An evaluation of AQME was performed to demonstrate its functionality and reliability for the analysis of mitogenome NGS data. The study analyzed Illumina mitogenome data from 21 samples (including associated controls) of varying quality and sample preparations with the AQME toolkit. A total of 211 tool edits were automatically applied to 130 of the 698 total variants reported in an effort to adhere to forensic nomenclature. Although additional manual edits were required for three samples, supplemental tools such as mtDNA haplogroup estimation assisted in identifying and guiding these necessary modifications to the AQME-generated profile. Along
Accessory, supernumerary, or, most simply, B chromosomes are found in many eukaryotic karyotypes. These small chromosomes do not follow the usual pattern of segregation, but rather are transmitted in a higher than expected frequency. As increasingly being demonstrated by next-generation sequencing (NGS), their structure comprises fragments of standard (A) chromosomes, although in some plant species, their sequence also includes contributions from organellar genomes. Transcriptomic analyses of various animal and plant species have revealed that, contrary to what used to be the common belief, some of the B chromosome DNA is protein-encoding. This review summarizes the progress in understanding B chromosome biology enabled by the application of next-generation sequencing technology and state-of-the-art bioinformatics. In particular, a contrast is drawn between a direct sequencing approach and a strategy based on comparative genomics as alternative routes that can be taken towards the identification of B chromosome sequences.
Thompson, Rose; Drew, Cheney J G; Thomas, Rhys H
There has been an academic "gold rush" with researchers mining the deep seams of whole-exome and whole-genome sequencing since 2008. Although undoubtedly a major advance initially for identifying new disease-associated genes for rare monogenetic disorders--more recently, common and complex conditions have been successfully studied using these techniques. With great power comes great responsibility, however, and we must not forget that next generation sequencing produces unique ethical conundrums and validation challenges. We review the progression of published papers using whole-exome sequencing from a clinical and technical viewpoint before then reflecting on the key arguments that need to be fully understood before these tools can become a routine part of clinical practice and we ask what may be the role for the biomedical scientists? Copyright © 2012 Elsevier Inc. All rights reserved.
Van Borm, S; Wang, J; Granberg, F; Colling, A
Recent advancements in DNA sequencing methodologies and sequence data analysis have revolutionised research in many areas of biology and medicine, including veterinary infection biology. New technology is poised to bridge the gap between the research and diagnostic laboratory. This paper defines the potential diagnostic value and purposes of next-generation sequencing (NGS) applications in veterinary infection biology and explores their compatibility with the existing validation principles and methods of the World Organisation for Animal Health. Critical parameters for validation and quality control (quality metrics) are suggested, with reference to established validation and quality assurance guidelines for NGS-based methods of diagnosing human heritable diseases. Although most currently described NGS applications in veterinary infection biology are not primary diagnostic tests that directly result in control measures, this critical reflection on the advantages and remaining challenges of NGS technology should stimulate discussion on its diagnostic value and on the potential to validate NGS methods and monitor their diagnostic performance.
Lyu, Yuqiang; Huang, Jing; Zhang, Kaihui; Liu, Guohua; Gao, Min; Gai, Zhongtao; Liu, Yi
To explore the clinical and genetic features of a Chinese boy with oculocutaneous albinism. The clinical features of the patient were analyzed. The DNA of the patient and his parents was extracted and sequenced by next generation exome capture sequencing. The nature and impact of detected mutation were predicted and validated. The child has displayed strabismus, poor vision, nystagmus and brown hair. DNA sequencing showed that the patient has carried compound heterozygous mutations of the TYRP1 gene, namely c.1214C>A (p.T405N) and c.1333dupG, which were inherited from his mother and father, respectively. Neither mutation was reported previously. The child has suffered from oculocutaneous albinism type Ⅲ caused by mutations of the TYRP1 gene.
Bolz, Hanno Jörn
Within a few years, high-throughput sequencing (next-generation sequencing, NGS) has become a routine method in genetic diagnostics and has largely replaced conventional Sanger sequencing. The complexity of NGS data requires sound bioinformatic analysis: pinpointing the disease-causing variants may be difficult, and erroneous interpretations must be avoided. When looking at the group of retinal dystrophies as an example of eye disorders with extensive genetic heterogeneity, one can clearly say that NGS-based diagnostics yield important information for most patients and physicians, and that it has furthered our knowledge significantly. Furthermore, NGS has accelerated ophthalmogenetic research aimed at the identification of novel eye disease genes. Georg Thieme Verlag KG Stuttgart · New York.
Skotte, Line; Korneliussen, Thorfinn Sand; Albrechtsen, Anders
computationally feasible due to the use of score statistics. As part of the joint likelihood, we model the distribution of the phenotypes using a generalized linear model framework, which works for both quantitative and discrete phenotypes. Thus, the method presented here is applicable to case-control studies...... of genotype calls into account have been proposed; most require numerical optimization which for large-scale data is not always computationally feasible. We show that using a score statistic for the joint likelihood of observed phenotypes and observed sequencing data provides an attractive approach...... to association testing for next-generation sequencing data. The joint model accounts for the genotype classification uncertainty via the posterior probabilities of the genotypes given the observed sequencing data, which gives the approach higher power than methods based on called genotypes. This strategy remains...
Precone, Vincenza; Del Monaco, Valentina; Esposito, Maria Valeria; De Palma, Fatima Domenica Elisa; Ruocco, Anna; Salvatore, Francesco; D'Argenio, Valeria
Next-generation sequencing (NGS) technologies have greatly impacted on every field of molecular research mainly because they reduce costs and increase throughput of DNA sequencing. These features, together with the technology's flexibility, have opened the way to a variety of applications including the study of the molecular basis of human diseases. Several analytical approaches have been developed to selectively enrich regions of interest from the whole genome in order to identify germinal and/or somatic sequence variants and to study DNA methylation. These approaches are now widely used in research, and they are already being used in routine molecular diagnostics. However, some issues are still controversial, namely, standardization of methods, data analysis and storage, and ethical aspects. Besides providing an overview of the NGS-based approaches most frequently used to study the molecular basis of human diseases at DNA level, we discuss the principal challenges and applications of NGS in the field of human genomics.
Fonager, Jannik; Larsson, Jonas T; Hussing, Christian
BACKGROUND: The current widely applied standard method to screen for HIV-1 genotypic resistance is based on Sanger population sequencing (Sseq), which does not allow for the identification of minority variants (MVs) below the limit of detection for the Sseq-method in patients receiving integrase...... strand-transfer inhibitors (INSTI). Next generation sequencing (NGS) has facilitated the detection of MVs at a much deeper level than Sseq. OBJECTIVES: Here, we compared Illumina MiSeq and Sseq approaches to evaluate the detection of MVs involved in resistance to the three commonly used INSTI......: raltegravir (RAL), elvitegravir (EVG) and dolutegravir (DTG). STUDY DESIGN: NGS and Sseq were used to analyze RT-PCR products of the HIV-1 integrase coding region from six patients and in serial samples from two patients. NGS sequences were assembled and analyzed using the low frequency variant detection...
Kwok, Hin; Chiang, Alan Kwok Shing
Genomic sequences of Epstein-Barr virus (EBV) have been of interest because the virus is associated with cancers, such as nasopharyngeal carcinoma, and conditions such as infectious mononucleosis. The progress of whole-genome EBV sequencing has been limited by the inefficiency and cost of the first-generation sequencing technology. With the advancement of next-generation sequencing (NGS) and target enrichment strategies, an increasing number of EBV genomes have been published. These genomes were sequenced using different approaches, either with or without EBV DNA enrichment. This review provides an overview of the EBV genomes published to date, and a description of the sequencing technology and bioinformatic analyses employed in generating these sequences. We further explore ways in which the quality of sequencing data can be improved, such as using DNA oligos for capture hybridization, and longer insert size and read length in the sequencing runs. These advances will enable large-scale genomic sequencing of EBV, which will facilitate a better understanding of the genetic variations of EBV in different geographic regions and the discovery of potentially pathogenic variants in specific diseases.
Ghoneim, Dalia H; Myers, Jason R; Tuttle, Emily; Paciorkowski, Alex R
Insertions/deletions (indels) are the second most common type of genomic variant and the most common type of structural variant. Identification of indels in next generation sequencing data is a challenge, and algorithms commonly used for indel detection have not been compared on a research cohort of human subject genomic data. Guidelines for the optimal detection of biologically significant indels are limited. We analyzed three sets of human next generation sequencing data (48 samples of a 200 gene target exon sequencing, 45 samples of whole exome sequencing, and 2 samples of whole genome sequencing) using three algorithms for indel detection (Pindel, Genome Analysis Tool Kit's UnifiedGenotyper and HaplotypeCaller). We observed variation in indel calls across the three algorithms. The intersection of the three tools comprised only 5.70% of targeted exon, 19.52% of whole exome, and 14.25% of whole genome indel calls. The majority of the discordant indels were of lower read depth and likely to be false positives. When software parameters were kept consistent across the three targets, HaplotypeCaller produced the most reliable results. Pindel results did not validate well without adjustments to parameters to account for varied read depth and number of samples per run. Adjustments to Pindel's M (minimum support for event) parameter improved both concordance and validation rates. Pindel was able to identify large deletions that surpassed the length capabilities of the GATK algorithms. Despite the observed variability in indel identification, we discerned strengths among the individual algorithms on specific data sets. This allowed us to suggest best practices for indel calling. Pindel's low validation rate of indel calls made in targeted exon sequencing suggests that HaplotypeCaller is better suited for short indels and multi-sample runs in targets with very high read depth. Pindel allows for optimization of minimum support for events and is best used for detection of
Garone, Caterina; Bordoni, Andreina; Gutierrez Rios, Purificacion; Calvo, Sarah E.; Ripolone, Michela; Ranieri, Michela; Rizzuti, Mafalda; Villa, Luisa; Magri, Francesca; Corti, Stefania; Bresolin, Nereo; Mootha, Vamsi K.; Moggio, Maurizio; DiMauro, Salvatore; Comi, Giacomo P.; Sciacco, Monica
The molecular diagnosis of mitochondrial disorders still remains elusive in a large proportion of patients, but advances in next generation sequencing are significantly improving our chances to detect mutations even in sporadic patients. Syndromes associated with mitochondrial DNA multiple deletions are caused by different molecular defects resulting in a wide spectrum of predominantly adult-onset clinical presentations, ranging from progressive external ophthalmoplegia to multi-systemic disorders of variable severity. The mutations underlying these conditions remain undisclosed in half of the affected subjects. We applied next-generation sequencing of known mitochondrial targets (MitoExome) to probands presenting with adult-onset mitochondrial myopathy and harbouring mitochondrial DNA multiple deletions in skeletal muscle. We identified autosomal recessive mutations in the DGUOK gene (encoding mitochondrial deoxyguanosine kinase), which has previously been associated with an infantile hepatocerebral form of mitochondrial DNA depletion. Mutations in DGUOK occurred in five independent subjects, representing 5.6% of our cohort of patients with mitochondrial DNA multiple deletions, and impaired both muscle DGUOK activity and protein stability. Clinical presentations were variable, including mitochondrial myopathy with or without progressive external ophthalmoplegia, recurrent rhabdomyolysis in a young female who had received a liver transplant at 9 months of age and adult-onset lower motor neuron syndrome with mild cognitive impairment. These findings reinforce the concept that mutations in genes involved in deoxyribonucleotide metabolism can cause diverse clinical phenotypes and suggest that DGUOK should be screened in patients harbouring mitochondrial DNA deletions in skeletal muscle. PMID:23043144
Brhelova, Eva; Antonova, Mariya; Pardy, Filip; Kocmanova, Iva; Mayer, Jiri; Racil, Zdenek; Lengerova, Martina
Rapid identification and characterization of multidrug-resistant Klebsiella pneumoniae strains is necessary due to the increasing frequency of severe infections in patients. The decreasing cost of next-generation sequencing enables us to obtain a comprehensive overview of genetic information in one step. The aim of this study is to demonstrate and evaluate the utility and scope of the application of web-based databases to next-generation sequenced (NGS) data. The whole genomes of 11 clinical Klebsiella pneumoniae isolates were sequenced using Illumina MiSeq. Selected web-based tools were used to identify a variety of genetic characteristics, such as acquired antimicrobial resistance genes, multilocus sequence types, plasmid replicons, and identify virulence factors, such as virulence genes, cps clusters, urease-nickel clusters and efflux systems. Using web-based tools hosted by the Center for Genomic Epidemiology, we detected resistance to 8 main antimicrobial groups with at least 11 acquired resistance genes. The isolates were divided into eight sequence types (ST11, 23, 37, 323, 433, 495 and 562, and a new one, ST1646). All of the isolates carried replicons of large plasmids. Capsular types, virulence factors and genes coding AcrAB and OqxAB efflux pumps were detected using BIGSdb-Kp, whereas the selected virulence genes, identified in almost all of the isolates, were detected using CLC Genomic Workbench software. Applying appropriate web-based online tools to NGS data enables the rapid extraction of comprehensive information that can be used for more efficient diagnosis and treatment of patients, while data processing is free of charge, easy and time-efficient.
Meiring, Tracy L; Salimo, Anna T; Coetzee, Beatrix; Maree, Hans J; Moodley, Jennifer; Hitzeroth, Inga I; Freeborough, Michael-John; Rybicki, Ed P; Williamson, Anna-Lise
Human papillomavirus (HPV) is the aetiological agent for cervical cancer and genital warts. Concurrent HPV and HIV infection in the South African population is high. HIV positive (+) women are often infected with multiple, rare and undetermined HPV types. Data on HPV incidence and genotype distribution are based on commercial HPV detection kits, but these kits may not detect all HPV types in HIV + women. The objectives of this study were to (i) identify the HPV types not detected by commercial genotyping kits present in a cervical specimen from an HIV positive South African woman using next generation sequencing, and (ii) determine if these types were prevalent in a cohort of HIV-infected South African women. Total DNA was isolated from 109 cervical specimens from South African HIV + women. A specimen within this cohort representing a complex multiple HPV infection, with 12 HPV genotypes detected by the Roche Linear Array HPV genotyping (LA) kit, was selected for next generation sequencing analysis. All HPV types present in this cervical specimen were identified by Illumina sequencing of the extracted DNA following rolling circle amplification. The prevalence of the HPV types identified by sequencing, but not included in the Roche LA, was then determined in the 109 HIV positive South African women by type-specific PCR. Illumina sequencing identified a total of 16 HPV genotypes in the selected specimen, with four genotypes (HPV-30, 74, 86 and 90) not included in the commercial kit. The prevalences of HPV-30, 74, 86 and 90 in 109 HIV positive South African women were found to be 14.6%, 12.8%, 4.6% and 8.3% respectively. Our results indicate that there are HPV types, with substantial prevalence, in HIV positive women not being detected in molecular epidemiology studies using commercial kits. The significance of these types in relation to cervical disease remains to be investigated.
Stephanie M Willerth
With an estimated 38 million people worldwide currently infected with human immunodeficiency virus (HIV), and an additional 4.1 million people becoming infected each year, it is important to understand how this virus mutates and develops resistance in order to design successful therapies. We report a novel experimental method for amplifying full-length HIV genomes without the use of sequence-specific primers for high throughput DNA sequencing, followed by assembly of full length viral genome sequences from the resulting large dataset. Illumina was chosen for sequencing due to its ability to provide greater coverage of the HIV genome compared to prior methods, allowing for more comprehensive characterization of the heterogeneity present in the HIV samples analyzed. Our novel amplification method in combination with Illumina sequencing was used to analyze two HIV populations: a homogenous HIV population based on the canonical NL4-3 strain and a heterogeneous viral population obtained from a HIV patient's infected T cells. In addition, the resulting sequence was analyzed using a new computational approach to obtain a consensus sequence and several metrics of diversity. This study demonstrates how a lower bias amplification method in combination with next generation DNA sequencing provides in-depth, complete coverage of the HIV genome, enabling a stronger characterization of the quasispecies present in a clinically relevant HIV population as well as future study of how HIV mutates in response to a selective pressure.
Yanagisawa, Hironobu; Tomita, Reiko; Katsu, Koji; Uehara, Takuya; Atsumi, Go; Tateda, Chika; Kobayashi, Kappei; Sekine, Ken-Taro
The presence of high molecular weight double-stranded RNA (dsRNA) within plant cells is an indicator of infection with RNA viruses as these possess genomic or replicative dsRNA. DECS (dsRNA isolation, exhaustive amplification, cloning, and sequencing) analysis has been shown to be capable of detecting unknown viruses. We postulated that a combination of DECS analysis and next-generation sequencing (NGS) would improve detection efficiency and usability of the technique. Here, we describe a model case in which we efficiently detected the presumed genome sequence of Blueberry shoestring virus (BSSV), a member of the genus Sobemovirus, which has not so far been reported. dsRNAs were isolated from BSSV-infected blueberry plants using the dsRNA-binding protein, reverse-transcribed, amplified, and sequenced using NGS. A contig of 4,020 nucleotides (nt) that shared similarities with sequences from other Sobemovirus species was obtained as a candidate of the BSSV genomic sequence. Reverse transcription (RT)-PCR primer sets based on sequences from this contig enabled the detection of BSSV in all BSSV-infected plants tested but not in healthy controls. A recombinant protein encoded by the putative coat protein gene was bound by the BSSV-antibody, indicating that the candidate sequence was that of BSSV itself. Our results suggest that a combination of DECS analysis and NGS, designated here as "DECS-C," is a powerful method for detecting novel plant viruses.
Groisberg, Roman; Roszik, Jason; Conley, Anthony; Patel, Shreyaskumar R; Subbiah, Vivek
Sarcomas are a rare, heterogeneous group of soft tissue and bone tumors. Precise diagnosis of specific subtypes is challenging using conventional methods. Herein, we review the role of next-generation sequencing (NGS) technology that is used for rapid sequencing of DNA and RNA. Recent sarcoma-specific studies recommend that molecular genetic testing should be added at diagnosis for appropriate clinical management in addition to diagnosis by expert pathologists. NGS has already been used to identify potentially actionable mutations, copy number alterations, and gene fusions. Rationally choosing a drug based on an individual patient profile, also known as "precision oncology", has so far been limited to a few case reports in sarcomas. As we improve our ability to deliver personalized medicine using all modalities including conventional therapy, more patients may eventually benefit. As the cost and capacity of NGS outpace Moore's law, so does the probability of success.
Zhu, Ting; Feng, Shaoshu; Liu, Xin; Li, Qingwei
Fundulus heteroclitus (Actinopteri, Cyprinodontiformes, Fundulidae), with a remarkable tolerance to osmotic stress and water temperatures, is regarded as a significant evolutionary model. Herein, we report the assembled complete sequence of the mummichog mitochondrial genome based on next-generation sequencing data. The mitogenome is determined to be 16 528 bp in length and shows an organization typical of vertebrate mitochondrial genomes, including 13 protein-coding genes, 2 ribosomal RNA genes, 22 transfer RNA genes, and 1 control region (D-loop). Overall GC content of the genome is 39.72%. Using Oryzias latipes as the outgroup, the phylogenetic analysis of 16 complete mitochondrial genomes from Cyprinodontiformes showed that F. heteroclitus, together with three other Fundulus species, forms a cluster with strong bootstrap support. The genus Fundulus is closely related to the genus Xenotoca.
Zhao, Yue; Zhang, Hong; Xia, Xue-shan
Inherited cardiomyopathy is the most common hereditary cardiac disease. It also causes a significant proportion of sudden cardiac deaths in young adults and athletes. So far, approximately one hundred genes have been reported to be involved in cardiomyopathies through different mechanisms. Therefore, the identification of the genetic basis and disease mechanisms of cardiomyopathies are important for establishing a clinical diagnosis and genetic testing. Next-generation semiconductor sequencing (NGSS) technology platform is a high-throughput sequencer capable of analyzing clinically derived genomes with high productivity, sensitivity and specificity. It was launched in 2010 by Life Technologies of USA, and it is based on a high density semiconductor chip, which was covered with tens of thousands of wells. NGSS has been successfully used in candidate gene mutation screening to identify hereditary disease. In this review, we summarize these genetic variations, challenge and application of NGSS in inherited cardiomyopathy, and its value in disease diagnosis, prevention and treatment.
Bian, Jiawen; Zhou, Xiaobo
The rapid development of next generation sequencing (NGS) technology provides a novel avenue for genomic exploration and research. Hidden Markov models (HMMs) have wide applications in pattern recognition as well as in bioinformatics, such as the detection of transcription factor binding sites and cis-regulatory modules. This chapter introduces an application of HMMs that has emerged with the ongoing development of NGS. Single nucleotide variants (SNVs) inferred from NGS are expected to reveal gene mutations in cancer. However, NGS has lower sequence coverage and poor SNV detection capability in the regulatory regions of the genome. A specific HMM is developed for this purpose to infer the genotype for each position on the genome by incorporating the mapping quality of each read and the corresponding base quality on the reads into the emission probability of the HMM. The procedure and the implementation of the algorithm are presented in detail for understanding and programming.
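The idea of folding mapping quality and base quality into an HMM emission probability can be illustrated with a minimal Python sketch. It is not the chapter's algorithm: the three-genotype state space, the way the two Phred-scaled qualities are combined into one error probability, and the "sticky" transition probability `stay` are all assumptions made here for illustration.

```python
import math

# Hypothetical state space: hom-ref, het, hom-alt (R = reference, A = alternate).
GENOTYPES = ("RR", "RA", "AA")

def phred_to_prob(q):
    """Convert a Phred-scaled quality into an error probability."""
    return 10.0 ** (-q / 10.0)

def read_error(base_q, map_q):
    """Assumed combination: a base is right only if mapping AND base call are right."""
    e = 1.0 - (1.0 - phred_to_prob(base_q)) * (1.0 - phred_to_prob(map_q))
    return min(e, 0.75)  # at 0.75 a base carries no information; also avoids log(0)

def log_emission(genotype, observations):
    """Log-likelihood of the reads at one site given a genotype.

    observations: list of (allele, base_q, map_q) with allele in {"R", "A"}.
    """
    total = 0.0
    for allele, base_q, map_q in observations:
        e = read_error(base_q, map_q)
        p = sum(0.5 * ((1.0 - e) if allele == g else e / 3.0) for g in genotype)
        total += math.log(p)
    return total

def viterbi(sites, stay=0.99):
    """Most likely genotype path; `stay` is an illustrative transition parameter."""
    n = len(GENOTYPES)
    switch = (1.0 - stay) / (n - 1)
    log_t = [[math.log(stay if i == j else switch) for j in range(n)] for i in range(n)]
    score = [math.log(1.0 / n) + log_emission(g, sites[0]) for g in GENOTYPES]
    back = []
    for obs in sites[1:]:
        new_score, pointers = [], []
        for j, g in enumerate(GENOTYPES):
            best_i = max(range(n), key=lambda i: score[i] + log_t[i][j])
            pointers.append(best_i)
            new_score.append(score[best_i] + log_t[best_i][j] + log_emission(g, obs))
        score = new_score
        back.append(pointers)
    state = max(range(n), key=lambda i: score[i])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return [GENOTYPES[s] for s in path]
```

Calling `viterbi` on a list of sites, each a list of `(allele, base_q, map_q)` tuples, returns one genotype label per site; reads with low base or mapping quality contribute less to the decision because their per-read error probability is higher.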
Noguera-Julian, Marc; Edgil, Dianna; Harrigan, P Richard; Sandstrom, Paul; Godfrey, Catherine; Paredes, Roger
High-quality, simplified, and low-cost human immunodeficiency virus (HIV) drug resistance tests that are able to provide timely actionable HIV resistance data at individual, population, and programmatic levels are needed to confront the emerging drug-resistant HIV epidemic. Next-generation sequencing technologies embedded in automated cloud-computing analysis environments are ideally suited for such endeavor. Whereas NGS can reduce costs over Sanger sequencing, automated analysis pipelines make NGS accessible to molecular laboratories regardless of the available bioinformatic skills. They can also produce highly structured, high-quality data that could be examined by healthcare officials and program managers on a real-time basis to allow timely public health action. Here we discuss the opportunities and challenges of such an approach. © The Author 2017. Published by Oxford University Press for the Infectious Diseases Society of America. All rights reserved.
Andersen, Jeppe D; Pereira, Vania; Pietroni, Carlotta; Mikkelsen, Martin; Johansen, Peter; Børsting, Claus; Morling, Niels
The simultaneous sequencing of samples from multiple individuals increases the efficiency of next-generation sequencing (NGS) while also reducing costs. Here we describe a novel and simple approach for sequencing DNA from multiple individuals per barcode. Our strategy relies on the endonuclease digestion of PCR amplicons prior to library preparation, creating a specific fragment pattern for each individual that can be resolved after sequencing. By using both barcodes and restriction fragment patterns, we demonstrate the ability to sequence the human melanocortin 1 receptor (MC1R) genes from 72 individuals using only 24 barcoded libraries.
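The arithmetic behind the design above (24 barcodes, each shared by 3 individuals distinguished by their restriction fragment pattern, giving 72 individuals) can be sketched in Python. The cut-site coordinates and the rule of assigning a read by its start position are hypothetical placeholders; the abstract does not state which enzymes or positions were used.

```python
from collections import defaultdict

N_BARCODES = 24
INDIVIDUALS_PER_BARCODE = 3

# Hypothetical cut positions within the amplicon, one per digestion pattern.
# A read starting exactly at one of these positions identifies the pattern.
CUT_SITES = {120: 0, 340: 1, 610: 2}

def individual_id(barcode, slot):
    """Map (barcode index, digestion-pattern slot) to an individual number."""
    return barcode * INDIVIDUALS_PER_BARCODE + slot

def demultiplex(reads):
    """Assign reads to individuals.

    reads: iterable of (read_name, barcode_index, alignment_start).
    Returns (assigned, ambiguous): a dict of individual -> read names, and
    the names of reads whose start matches no pattern-specific cut site.
    """
    assigned = defaultdict(list)
    ambiguous = []
    for name, barcode, start in reads:
        slot = CUT_SITES.get(start)
        if slot is None:
            ambiguous.append(name)
        else:
            assigned[individual_id(barcode, slot)].append(name)
    return dict(assigned), ambiguous
```

Reads that do not begin at a pattern-specific cut site are kept aside rather than guessed, since they could come from any individual sharing that barcode.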
Bokulich, Nicholas A.; Joseph, C. M. Lucy; Allen, Greg; Benson, Andrew K.; Mills, David A.
While wine fermentation has long been known to involve complex microbial communities, the composition and role of bacteria other than a select set of lactic acid bacteria (LAB) has often been assumed either negligible or detrimental. This study served as a pilot study for using barcoded amplicon next-generation sequencing to profile bacterial community structure in wines and grape musts, comparing the taxonomic depth achieved by sequencing two different domains of prokaryotic 16S rDNA (V4 and V5). This study was designed to serve two goals: 1) to empirically determine the most taxonomically informative 16S rDNA target region for barcoded amplicon sequencing of wine, comparing V4 and V5 domains of bacterial 16S rDNA to terminal restriction fragment length polymorphism (TRFLP) of LAB communities; and 2) to explore the bacterial communities of wine fermentation to better understand the biodiversity of wine at a depth previously unattainable using other techniques. Analysis of amplicons from the V4 and V5 provided similar views of the bacterial communities of botrytized wine fermentations, revealing a broad diversity of low-abundance taxa not traditionally associated with wine, as well as atypical LAB communities initially detected by TRFLP. The V4 domain was determined as the more suitable read for wine ecology studies, as it provided greater taxonomic depth for profiling LAB communities. In addition, targeted enrichment was used to isolate two species of Alphaproteobacteria from a finished fermentation. Significant differences in diversity between inoculated and uninoculated samples suggest that Saccharomyces inoculation exerts selective pressure on bacterial diversity in these fermentations, most notably suppressing abundance of acetic acid bacteria. 
These results determine the bacterial diversity of botrytized wines to be far higher than previously realized, providing further insight into the fermentation dynamics of these wines, and demonstrate the utility of next-generation
Qing-Xuan Wang, En-Dong Chen, Ye-Feng Cai, Yi-Li Zhou, Zhou-Ci Zheng, Ying-Hao Wang, Yi-Xiang Jin, Wen-Xu Jin, Xiao-Hua Zhang, Ou-Chen Wang (Department of Oncology, The First Affiliated Hospital of Wenzhou Medical University, Wenzhou, Zhejiang Province, China)
Purpose: Thyroid cancer is the most frequent malignancy of the endocrine system, and it has become the fastest growing type of cancer worldwide. Much still remains unknown about the molecular mechanisms of thyroid cancer. Studies have found a relationship between ARAP3 and human cancer. However, the role of ARAP3 in thyroid cancer has not been well explained. This study aimed to investigate the role of the ARAP3 gene in papillary thyroid carcinoma. Methods: Whole-exon and whole-genome sequencing of primary papillary thyroid carcinoma (PTC) samples and matched adjacent normal thyroid tissue samples was performed, and then bioinformatics analysis was carried out. PTC cell lines (TPC1, BCPAP, and KTC-1) transfected with small interfering RNA were used to investigate the functions of the ARAP3 gene, including cell proliferation assay, colony formation assay, migration assay, and invasion assay. Results: Using next-generation sequencing and bioinformatics analysis, we found that the ARAP3 gene may play an important role in thyroid cancer. Downregulation of ARAP3 significantly suppressed cell proliferation, colony formation, migration, and invasion in PTC cell lines (TPC1, BCPAP, and KTC-1). Conclusion: This study indicated that the ARAP3 gene has important biological implications and may act as a potentially druggable target in PTC. Keywords: papillary thyroid carcinoma, next-generation sequence, ARAP3, oncogene
To assess the clinical utility of targeted Next-Generation Sequencing (NGS) for the diagnosis of Inherited Retinal Dystrophies (IRDs), a total of 109 subjects were enrolled in the study, including 88 IRD affected probands and 21 healthy relatives. Clinical diagnoses included Retinitis Pigmentosa (RP), Leber Congenital Amaurosis (LCA), Stargardt Disease (STGD), Best Macular Dystrophy (BMD), Usher Syndrome (USH), and other IRDs with undefined clinical diagnosis. Participants underwent a complete ophthalmologic examination followed by genetic counseling. A custom AmpliSeq™ panel of 72 IRD-related genes was designed for the analysis and tested using Ion semiconductor Next-Generation Sequencing (NGS). Potential disease-causing mutations were identified in 59.1% of probands, comprising mutations in 16 genes. The highest diagnostic yields were achieved for BMD, LCA, USH, and STGD patients, whereas RP confirmed its high genetic heterogeneity. Causative mutations were identified in 17.6% of probands with undefined diagnosis. Revision of the initial diagnosis was performed for 9.6% of genetically diagnosed patients. This study demonstrates that NGS represents a comprehensive, cost-effective approach for the molecular diagnosis of IRDs. The identification of the genetic alterations underlying the phenotype enabled the clinicians to achieve a more accurate diagnosis. The results emphasize the importance of molecular diagnosis coupled with clinical information to unravel the extensive phenotypic heterogeneity of these diseases.
BACKGROUND: Metagenomics can reveal the vast majority of microbes that have been missed by traditional cultivation-based methods. Due to its extremely wide range of application areas, fast metagenome sequencing simulation systems with high fidelity are in great demand to facilitate the development and comparison of metagenomics analysis tools. RESULTS: We present here a customizable metagenome simulation system: NeSSM (Next-generation Sequencing Simulator for Metagenomics). Combining complete genomes currently available, a community composition table, and sequencing parameters, it can simulate metagenome sequencing better than existing systems. Sequencing error models based on the explicit distribution of errors at each base and sequencing coverage bias are incorporated in the simulation. In order to improve the fidelity of simulation, NeSSM provides tools to estimate the sequencing error models, sequencing coverage bias and the community composition directly from existing metagenome sequencing data. Currently, NeSSM supports single-end and paired-end sequencing for both 454 and Illumina platforms. In addition, a GPU (graphics processing unit) version of NeSSM has been developed to accelerate the simulation. By comparing the simulated sequencing data from NeSSM with experimental metagenome sequencing data, we have demonstrated that NeSSM performs better in many aspects than existing popular metagenome simulators, such as MetaSim, GemSIM and Grinder. The GPU version of NeSSM is more than one order of magnitude faster than MetaSim. CONCLUSIONS: NeSSM is a fast simulation system for high-throughput metagenome sequencing. It can be helpful for developing tools and evaluating strategies for metagenomics analysis, and it is freely available for academic users at http://cbb.sjtu.edu.cn/~ccwei/pub/software/NeSSM.php.
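The simulation idea described in this abstract (genomes, a community composition table, and a per-base error model) can be illustrated with a minimal Python sketch. This is not NeSSM's code or interface: the function names, the substitution-only error model, and the `error_by_pos` list of per-position error probabilities are all assumptions made for illustration, and coverage bias, indels and paired-end reads are left out.

```python
import random

BASES = "ACGT"

def simulate_read(genome, start, length, error_by_pos, rng):
    """Copy a read from the genome and inject substitution errors.

    error_by_pos[i] is the (assumed) probability of a substitution at read position i.
    """
    read = []
    for i in range(length):
        true_base = genome[start + i]
        if rng.random() < error_by_pos[i]:
            # Replace the true base with one of the three other bases.
            read.append(rng.choice([b for b in BASES if b != true_base]))
        else:
            read.append(true_base)
    return "".join(read)

def simulate_community(genomes, abundances, n_reads, length, error_by_pos, seed=0):
    """Draw reads from several genomes in proportion to a composition table.

    genomes: dict name -> sequence; abundances: dict name -> relative weight.
    Returns a list of (genome_name, start, read_sequence) tuples.
    """
    rng = random.Random(seed)
    names = list(genomes)
    weights = [abundances[name] for name in names]
    reads = []
    for _ in range(n_reads):
        name = rng.choices(names, weights=weights, k=1)[0]
        genome = genomes[name]
        start = rng.randrange(len(genome) - length + 1)
        reads.append((name, start, simulate_read(genome, start, length, error_by_pos, rng)))
    return reads
```

Setting higher probabilities toward the end of `error_by_pos` would mimic the common pattern of read quality dropping along the read, which is the kind of position-dependent behaviour a per-base error model is meant to capture.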
Wilkinson, Mike J; Szabo, Claudia; Ford, Caroline S; Yarom, Yuval; Croxford, Adam E; Camp, Amanda; Gooding, Paul
We estimate the global BOLD Systems database holds core DNA barcodes (rbcL + matK) for about 15% of land plant species and that comprehensive species coverage is still many decades away. Interim performance of the resource is compromised by variable sequence overlap and modest information content within each barcode. Our model predicts that the proportion of species-unique barcodes reduces as the database grows and that 'false' species-unique barcodes remain >5% until the database is almost complete. We conclude the current rbcL + matK barcode is unfit for purpose. Genome skimming and supplementary barcodes could improve diagnostic power but would slow new barcode acquisition. We therefore present two novel Next Generation Sequencing protocols (with freeware) capable of accurate, massively parallel de novo assembly of high quality DNA barcodes of >1400 bp. We explore how these capabilities could enhance species diagnosis in the coming decades.
Vidaki, Athina; Ballard, David; Aliferi, Anastasia; Miller, Thomas H; Barron, Leon P; Syndercombe Court, Denise
The ability to estimate the age of the donor from recovered biological material at a crime scene can be of substantial value in forensic investigations. Aging can be complex and is associated with various molecular modifications in cells that accumulate over a person's lifetime, including epigenetic patterns. The aim of this study was to use age-specific DNA methylation patterns to generate an accurate model for the prediction of chronological age using data from whole blood. In total, 45 age-associated CpG sites were selected based on their reported age coefficients in a previous extensive study and investigated using publicly available methylation data obtained from 1156 whole blood samples (aged 2-90 years) analysed with Illumina's genome-wide methylation platforms (27K/450K). Applying stepwise regression for variable selection, 23 of these CpG sites were identified that could significantly contribute to age prediction modelling, and multiple regression analysis carried out with these markers provided an accurate prediction of age (R² = 0.92, mean absolute error (MAE) = 4.6 years). However, applying machine learning, and more specifically a generalised regression neural network model, the age prediction significantly improved (R² = 0.96) with a MAE = 3.3 years for the training set and 4.4 years for a blind test set of 231 cases. The machine learning approach used 16 CpG sites, located in 16 different genomic regions, with the top 3 predictors of age belonging to the genes NHLRC1, SCGN and CSNK1D. The proposed model was further tested using independent cohorts of 53 monozygotic twins (MAE = 7.1 years) and a cohort of 1011 disease state individuals (MAE = 7.2 years). Furthermore, we highlighted the age markers' potential applicability in samples other than blood by predicting age with similar accuracy in 265 saliva samples (R² = 0.96) with a MAE = 3.2 years (training set) and 4.0 years (blind test). In an attempt to create a sensitive and accurate age prediction test, a next
Background: Estimation of allele frequency is of fundamental importance in population genetic analyses and in association mapping. In most studies using next-generation sequencing, a cost effective approach is to use medium or low-coverage data (e.g., X. However, SNP calling and allele frequency estimation in such studies is associated with substantial statistical uncertainty because of varying coverage and high error rates. Results: We evaluate a new maximum likelihood method for estimating allele frequencies in low and medium coverage next-generation sequencing data. The method is based on integrating over uncertainty in the data for each individual rather than first calling genotypes. This method can be applied to directly test for associations in case/control studies. We use simulations to compare the likelihood method to methods based on genotype calling, and show that the likelihood method outperforms the genotype calling methods in terms of: (1) accuracy of allele frequency estimation, (2) accuracy of the estimation of the distribution of allele frequencies across neutrally evolving sites, and (3) statistical power in association mapping studies. Using real re-sequencing data from 200 individuals obtained from an exon-capture experiment, we show that the patterns observed in the simulations are also found in real data. Conclusions: Overall, our results suggest that association mapping and estimation of allele frequencies should not be based on genotype calling in low to medium coverage data. Furthermore, if genotype calling methods are used, it is usually better not to filter genotypes based on the call confidence score.
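The idea of integrating over genotype uncertainty instead of calling genotypes first can be sketched as a small expectation-maximisation loop. The sketch below is one common way to do this and is not taken from the paper: it assumes a single biallelic site, Hardy-Weinberg genotype proportions, and per-individual genotype likelihoods supplied as triples for 0, 1 and 2 copies of the alternative allele. The function name and all parameters are hypothetical.

```python
def estimate_allele_frequency(genotype_likelihoods, f_init=0.2, tol=1e-8, max_iter=500):
    """EM estimate of the alternative-allele frequency at one biallelic site.

    genotype_likelihoods: list of (L0, L1, L2), the likelihood of each
    individual's read data given 0, 1 or 2 alternative alleles.
    Assumes Hardy-Weinberg proportions (an assumption of this sketch).
    """
    f = f_init
    n = len(genotype_likelihoods)
    for _ in range(max_iter):
        expected_alt = 0.0
        # Genotype prior under Hardy-Weinberg for the current frequency.
        prior = ((1 - f) ** 2, 2 * f * (1 - f), f ** 2)
        for lik in genotype_likelihoods:
            # E-step: posterior weight of each genotype for this individual.
            post = [l * p for l, p in zip(lik, prior)]
            total = sum(post)
            expected_alt += (post[1] + 2 * post[2]) / total
        # M-step: expected alternative alleles divided by total alleles.
        f_new = expected_alt / (2 * n)
        if abs(f_new - f) < tol:
            return f_new
        f = f_new
    return f
```

With sharply peaked likelihoods this reduces to simple allele counting, while flat likelihoods (no reads) leave the estimate unchanged, so individuals contribute in proportion to how informative their data are rather than being forced into a hard genotype call.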
Martínez, Francisco; Caro-Llopis, Alfonso; Roselló, Mónica; Oltra, Silvestre; Mayo, Sonia; Monfort, Sandra; Orellana, Carmen
Intellectual disability is a very complex condition in which more than 600 genes have been implicated. Due to this extraordinary heterogeneity, a large proportion of patients remain without a specific diagnosis and genetic counselling. New methodological strategies that can detect a greater number of mutations in multiple genes are therefore crucial. In this work, we screened a large panel of 1256 genes (646 pathogenic, 610 candidate) by next-generation sequencing to determine the molecular aetiology of syndromic intellectual disability. A total of 92 patients, negative for previous genetic analyses, were studied together with their parents. Clinically relevant variants were validated by conventional sequencing. A definitive diagnosis was achieved in 29 families by testing the 646 known pathogenic genes. Mutations were found in 25 different genes, where only the genes KMT2D, KMT2A and MED13L were found mutated in more than one patient. A preponderance of de novo mutations was noted even among the X-linked conditions. Additionally, seven de novo probably pathogenic mutations were found in the candidate genes AGO1, JARID2, SIN3B, FBXO11, MAP3K7, HDAC2 and SMARCC2. Altogether, this means a diagnostic yield of 39% of the cases (95% CI 30% to 49%). The developed panel proved to be efficient and suitable for the genetic diagnosis of syndromic intellectual disability in a clinical setting. Next-generation sequencing has the potential for high-throughput identification of genetic variations, although the challenges of an adequate clinical interpretation of these variants and the knowledge on further unknown genes causing intellectual disability remain to be solved. Published by the BMJ Publishing Group Limited.
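The reported yield can be checked with a short calculation. The sketch below assumes that the 39% figure combines the 29 definitive diagnoses with the 7 probably pathogenic candidate-gene findings (36 of 92 patients) and that the interval is a Wilson score interval; the abstract states neither, so both are assumptions, although the numbers they produce agree with the reported 30% to 49%.

```python
import math

def wilson_interval(successes, n, z=1.96):
    """Wilson score confidence interval for a binomial proportion."""
    p = successes / n
    denom = 1 + z ** 2 / n
    center = (p + z ** 2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return center - half, center + half

solved = 29 + 7          # assumed breakdown: definitive plus probable diagnoses
patients = 92
low, high = wilson_interval(solved, patients)
print(round(100 * solved / patients), round(100 * low), round(100 * high))  # 39 30 49
```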
Casaril, Aline Etelvina; de Oliveira, Liliane Prado; Alonso, Diego Peres; de Oliveira, Everton Falcão; Gomes Barrios, Suellem Petilim; de Oliveira Moura Infran, Jucelei; Fernandes, Wagner de Souza; Oshiro, Elisa Teruya; Ferreira, Alda Maria Teixeira; Ribolla, Paulo Eduardo Martins; de Oliveira, Alessandra Gutierrez
Standardization of the methods for extraction of DNA from sand flies is essential for obtaining high efficiency during subsequent molecular analyses, such as the new sequencing methods. Information obtained using these methods may contribute substantially to taxonomic, evolutionary, and eco-epidemiological studies. The aim of the present study was to standardize and compare two methods for the extraction of genomic DNA from sand flies for obtaining DNA in sufficient quantities for next-generation sequencing. Sand flies were collected from the municipalities of Campo Grande, Camapuã, Corumbá and Miranda, state of Mato Grosso do Sul, Brazil. Three protocols using a silica column-based commercial kit (ReliaPrep™ Blood gDNA Miniprep System kit, Promega®), and three protocols based on the classical phenol-chloroform extraction method (Uliana et al., 1991), were compared with respect to the yield and quality of the extracted DNA. DNA was quantified using a Qubit 2.0 fluorometer. The presence of sand fly DNA was confirmed by PCR amplification of the IVS6 region (constitutive gene), followed by electrophoresis on a 1.5% agarose gel. A total of 144 male specimens were analyzed, 72 per method. Significant differences were observed between the two methods tested. Protocols 2 and 3 of phenol-chloroform extraction presented significantly better performance than all commercial kit extraction protocols tested. For phenol-chloroform extraction, protocol 3 presented significantly better performance than protocols 1 and 2. The IVS6 region was detected in 70 of 72 (97.22%) samples extracted with phenol, including all samples for protocols 2 and 3. This is the first study on the standardization of methods for the extraction of DNA from sand flies for application to next-generation sequencing, which is a promising tool for entomological and molecular studies of sand flies. Copyright © 2017 Elsevier Inc. All rights reserved.
Jose E. Kroll
Motivation. Alternative splicing events (ASEs) are prevalent in the transcriptome of eukaryotic species and are known to influence many biological phenomena. The identification and quantification of these events are crucial for a better understanding of biological processes. Next-generation DNA sequencing technologies have allowed deep characterization of transcriptomes and made it possible to address these issues. ASE analysis, however, is a challenging task, especially when many different samples need to be compared. Some popular tools for the analysis of ASEs are known to report thousands of events without annotations and/or graphical representations. A new tool for the identification and visualization of ASEs is described here, which can be used by biologists without a solid bioinformatics background. Results. A software suite named Splicing Express was created to perform ASE analysis from transcriptome sequencing data derived from next-generation DNA sequencing platforms. Its major goal is to serve the needs of biomedical researchers who do not have bioinformatics skills. Splicing Express performs automatic annotation of transcriptome data (GTF files) using gene coordinates available from the UCSC genome browser and allows the analysis of data from all available species. The identification of ASEs is done by a known algorithm previously implemented in another tool named Splooce. As a final result, Splicing Express creates a set of HTML files composed of graphics and tables designed to describe the expression profile of ASEs among all analyzed samples. By using RNA-Seq data from the Illumina Human Body Map and the Rat Body Map, we show that Splicing Express is able to perform all tasks in a straightforward way, identifying well-known specific events. Availability and Implementation. Splicing Express is written in Perl and runs only on UNIX-like systems.
More details can be found at: http://www.bioinformatics-brazil.org/splicingexpress.
Background: The complex genome of rapeseed (Brassica napus) is not well understood despite the economic importance of the species. Good knowledge of sequence variation is needed for genetics approaches and breeding purposes. We used a diversity set of B. napus representing eight different germplasm types to sequence genome-wide distributed restriction-site associated DNA (RAD) fragments for polymorphism detection and genotyping. Results: More than 113,000 RAD clusters with more than 20,000 single nucleotide polymorphisms (SNPs) and 125 insertions/deletions were detected and characterized. About one third of the RAD clusters and polymorphisms mapped to the Brassica rapa reference sequence. An even distribution of RAD clusters and polymorphisms was observed across the B. rapa chromosomes, which suggests that there might be an equal distribution over the Brassica oleracea chromosomes, too. The representation of Gene Ontology (GO) terms for unigenes with RAD clusters and polymorphisms revealed no signature of selection with respect to the distribution of polymorphisms within genes belonging to a specific GO category. Conclusions: Considering the decreasing costs for next-generation sequencing, the results of our study suggest that RAD sequencing is not only a simple and cost-effective method for high-density polymorphism detection but also an alternative to SNP genotyping from transcriptome sequencing or SNP arrays, even for species with complex genomes such as B. napus.
Primary immunodeficiencies (PIDs) are genetic disorders impairing host immunity, leading to life-threatening infections, autoimmunity, and/or malignancies. Genomic technologies have been critical for expediting the discovery of novel genetic defects underlying PIDs, expanding our knowledge of the complex clinical phenotypes associated with PIDs, and shifting paradigms of PID pathogenesis. Once considered Mendelian, monogenic, and completely penetrant disorders, PIDs have been redefined by genomic studies as a heterogeneous group of diseases found in the global population that may arise through multigenic defects and non-germline transmission, and with variable penetrance. This review examines the uses of next-generation DNA sequencing (NGS) in the diagnosis of PIDs. While whole genome sequencing identifies variants throughout the genome, whole exome sequencing sequences only the protein-coding regions within a genome, and targeted gene panels sequence only a specific cohort of genes. The advantages and limitations of each sequencing approach are compared. The complexities of variant interpretation and variant validation remain the major challenge in widespread implementation of these technologies. Lastly, the roles of NGS in newborn screening and precision therapeutics for individuals with PID are also addressed.
Background: Next-generation sequencing allows the analysis of an unprecedented number of viral sequence variants from infected patients, presenting a novel opportunity for understanding virus evolution, drug resistance and immune escape. However, sequencing in bulk is error prone. Thus, the generated data require error identification and correction. Most error-correction methods to date are not optimized for amplicon analysis and assume that the error rate is randomly distributed. Recent quality assessment of amplicon sequences obtained using 454-sequencing showed that the error rate is strongly linked to the presence and size of homopolymers, position in the sequence and length of the amplicon. All these parameters are strongly sequence specific and should be incorporated into the calibration of error-correction algorithms designed for amplicon sequencing. Results: In this paper, we present two new efficient error correction algorithms optimized for viral amplicons: (i) k-mer-based error correction (KEC) and (ii) empirical frequency threshold (ET). Both were compared to a previously published clustering algorithm (SHORAH), in order to evaluate their relative performance on 24 experimental datasets obtained by 454-sequencing of amplicons with known sequences. All three algorithms show similar accuracy in finding true haplotypes. However, KEC and ET were significantly more efficient than SHORAH in removing false haplotypes and estimating the frequency of true ones. Conclusions: Both algorithms, KEC and ET, are highly suitable for rapid recovery of error-free haplotypes obtained by 454-sequencing of amplicons from heterogeneous viruses. The implementations of the algorithms and data sets used for their testing are available at: http://alan.cs.gsu.edu/NGS/?q=content/pyrosequencing-error-correction-algorithm
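The general principle behind k-mer-based correction can be shown with a toy Python sketch: k-mers that occur rarely across all reads are treated as likely errors, and a read is repaired if a single substitution makes all of its k-mers frequent. This is a generic illustration and not the KEC algorithm itself; the function names, the fixed `threshold`, and the restriction to single substitutions (no homopolymer indels, which matter for 454 data) are assumptions of the sketch.

```python
from collections import Counter

def count_kmers(reads, k):
    """Count every k-mer across all reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts

def weak_kmers(read, counts, k, threshold):
    """Number of k-mers in the read seen fewer than `threshold` times overall."""
    return sum(1 for i in range(len(read) - k + 1) if counts[read[i:i + k]] < threshold)

def correct_read(read, counts, k, threshold):
    """Return the read, or a single-substitution variant with no weak k-mers."""
    if weak_kmers(read, counts, k, threshold) == 0:
        return read
    for pos in range(len(read)):
        for base in "ACGT":
            if base == read[pos]:
                continue
            candidate = read[:pos] + base + read[pos + 1:]
            if weak_kmers(candidate, counts, k, threshold) == 0:
                return candidate
    return read  # No single substitution helps; leave the read unchanged.
```

In a viral amplicon setting the threshold matters a great deal, because a genuine low-frequency haplotype also produces rare k-mers; a fixed cutoff like the one above would erase such variants, which is one reason calibrated, sequence-specific thresholds are emphasised in the abstract.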
Marosy, Beth A; Craig, Brian D; Hetrick, Kurt N; Witmer, P Dane; Ling, Hua; Griffith, Sean M; Myers, Benjamin; Ostrander, Elaine A; Stanford, Janet L; Brody, Lawrence C; Doheny, Kimberly F
This unit describes a technique for generating exome-enriched sequencing libraries using DNA extracted from formalin-fixed paraffin-embedded (FFPE) samples. Utilizing commercially available kits, we present a low-input FFPE workflow starting with 50 ng of DNA. This procedure includes a repair step to address damage caused by FFPE preservation that improves sequence quality. Subsequently, libraries undergo an in-solution-targeted selection for exons, followed by sequencing using the Illumina next-generation short-read sequencing platform. © 2017 by John Wiley & Sons, Inc.
Rieneck, Klaus; Clausen, Frederik Banch; Dziegiel, Morten Hanefeld
Hemolytic disease of the fetus and newborn (HDFN) is a condition characterized by a decreased lifespan of fetal red blood cells caused by maternally produced allospecific antibodies transferred to the fetus during pregnancy. The antibodies bind to the corresponding blood group antigens on fetal red...... blood cells and induce hemolysis. Cell-free DNA derived from the conceptus circulates in maternal blood. Using next-generation sequencing (NGS), it can be determined if this cell-free fetal DNA encodes the corresponding blood group antigen that is the target of the maternal allospecific antibodies....... This determination carries no risk to the fetus. It is important to determine if the fetus is at risk of hemolysis to enable timely intervention. Many tests for blood groups are based solely on the presence or absence of a single nucleotide polymorphism (SNP). Antenatal determination of fetal blood group by...
Pluess-Li, Ying; Bongiovanni, Sandrine; Oakeley, Edward J; Johnson, Keith J; Staedtler, Frank
External access to scientific technology plays an increasingly important part in pharmaceutical R&D. One advantage of accessing technology externally is the avoidance of costs associated with purchase and the reduced time required for developing new methods; in addition, access to external scientific expertise can be beneficial. However, few conceptual frameworks exist for achieving an optimal mix of internal and external technology access. In this review, we describe the virtuous technology cycle (VTC) concept and exemplify its application to next-generation sequencing (NGS). Based on selected examples, we show that the VTC concept can greatly enhance the number of technologies accessed and thus significantly increase flexibility and efficiency in drug discovery. We also discuss the challenges of externally accessing NGS technologies. Copyright © 2012 Elsevier Ltd. All rights reserved.
Jørgensen, Johannes Ravn; Carstensen, Jens Michael; Søren, Knudsen
, Pyrenophora, Epicoccum, Didymella, Alternaria, Bipolaris and Microdochium. The fungal composition and quantities on each seed varied significantly. Some were infected mainly by a single fungus and some were infected by multiple fungi. All seeds were prior to this evaluated by multispectral imaging...... on the dorsal and ventral sides by the VideometerLab multispectral imaging system (Videometer A/S, Hørsholm, Denmark). This system is an instrument equipped with 19 different light emitting diodes at wavelengths ranging from 375 to 970nm (ultraviolet, visual and lower wavelength of the near-infrared region...... for fungal contamination of barley on the fungal species level was investigated by comparing results from the next generation sequencing and multispectral imaging....
Pak, Theodore R; Kasarskis, Andrew
Recent reviews have examined the extent to which routine next-generation sequencing (NGS) on clinical specimens will improve the capabilities of clinical microbiology laboratories in the short term, but do not explore integrating NGS with clinical data from electronic medical records (EMRs), immune profiling data, and other rich datasets to create multiscale predictive models. This review introduces a range of "omics" and patient data sources relevant to managing infections and proposes 3 potentially disruptive applications for these data in the clinical workflow. The combined threats of healthcare-associated infections and multidrug-resistant organisms may be addressed by multiscale analysis of NGS and EMR data that is ideally updated and refined over time within each healthcare organization. Such data and analysis should form the cornerstone of future learning health systems for infectious disease. © The Author 2015. Published by Oxford University Press on behalf of the Infectious Diseases Society of America.
Niemenmaa, Matti; Kallio, Aleksi; Schumacher, André; Klemelä, Petri; Korpelainen, Eija; Heljanko, Keijo
Hadoop-BAM is a novel library for the scalable manipulation of aligned next-generation sequencing data in the Hadoop distributed computing framework. It acts as an integration layer between analysis applications and BAM files that are processed using Hadoop. Hadoop-BAM solves the issues related to BAM data access by presenting a convenient API for implementing map and reduce functions that can directly operate on BAM records. It builds on top of the Picard SAM JDK, so tools that rely on the Picard API are expected to be easily convertible to support large-scale distributed processing. In this article we demonstrate the use of Hadoop-BAM by building a coverage summarizing tool for the Chipster genome browser. Our results show that Hadoop offers good scalability, and one should avoid moving data in and out of Hadoop between analysis steps.
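The map and reduce roles mentioned in this abstract can be illustrated with a plain Python sketch of a binned coverage summary. It does not use Hadoop, Hadoop-BAM or Picard, and it is not the authors' tool: alignments are represented as hypothetical `(chromosome, start, length)` tuples, and the function names and `bin_size` parameter are invented for the example.

```python
from collections import defaultdict

def map_record(record, bin_size):
    """Map step: emit ((chrom, bin_index), covered_bases) pairs for one alignment."""
    chrom, start, length = record
    end = start + length  # half-open interval [start, end)
    pos = start
    while pos < end:
        bin_index = pos // bin_size
        bin_end = (bin_index + 1) * bin_size
        covered = min(end, bin_end) - pos
        yield (chrom, bin_index), covered
        pos += covered

def reduce_coverage(pairs, bin_size):
    """Reduce step: sum covered bases per bin and convert to mean depth."""
    totals = defaultdict(int)
    for key, covered in pairs:
        totals[key] += covered
    return {key: total / bin_size for key, total in totals.items()}

def summarize(records, bin_size=100):
    """Run the map step over all records, then reduce by (chrom, bin)."""
    pairs = (pair for rec in records for pair in map_record(rec, bin_size))
    return reduce_coverage(pairs, bin_size)

print(summarize([("chr1", 50, 100), ("chr1", 0, 100)]))
# {('chr1', 0): 1.5, ('chr1', 1): 0.5}
```

Because each record is mapped independently and the reduce step only sums per key, the same logic can be spread across many workers, which is the property a distributed framework exploits.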
Jakobsen, M. A.; Dellgren, C.; Sheppard, C.
Objectives: Next-generation sequencing (NGS) for the determination of rare blood group genotypes was tested in 72 individuals from different ethnicities. Background: Traditional serological-based antigen detection methods, as well as genotyping based on specific single nucleotide polymorphisms...... (SNPs) or single nucleotide variants (SNVs), are limited to detecting only a limited number of known antigens or alleles. NGS methods do not have this limitation. Methods: NGS using the Ion Torrent Personal Genome Machine (PGM) was performed with a customised AmpliSeq panel targeting 15 different blood...... genotypes using commercial SNP assays. However, particularly for the Kidd, Duffy and Lutheran blood group systems, several SNVs were detected by the NGS assay that revealed additional coding information compared to other methods. Furthermore, the NGS assay allowed for the detection of genotypes related...
Transcripts are known to be incorporated in particles of DNA viruses belonging to the families of Herpesviridae and Mimiviridae, but the presence of transcripts in other DNA viruses, such as poxviruses, has not been analyzed yet. Therefore, we first established a next-generation-sequencing (NGS)-based protocol, enabling the unbiased identification of transcripts in virus particles. Subsequently, we applied our protocol to analyze RNA in an emerging zoonotic member of the Poxviridae family, namely Cowpox virus. Our results revealed the incorporation of 19 viral transcripts, while host identifications were restricted to ribosomal and mitochondrial RNA. Most viral transcripts had an unknown or immunomodulatory function, suggesting that transcript incorporation may be beneficial for poxvirus immune evasion. Notably, the most abundant transcript originated from the D5L/I1R gene that encodes a viral inhibitor of the host cytoplasmic DNA sensing machinery.
Ebrahimzadeh-Vesal, Reza; Teymoori, Atieh; Azimi-Nezhad, Mohsen; Hosseini, Forough Sadat
Duchenne Muscular Dystrophy (DMD; MIM 310200) is one of the most common and severe types of hereditary muscular dystrophy. The disease is caused by mutations in the dystrophin gene, which is associated with X-linked recessive Duchenne and Becker muscular dystrophy. The disease occurs almost exclusively in males. Clinical symptoms of muscle weakness usually begin in childhood, and the main symptom of the disorder is gradually progressing muscular weakness. Affected patients become unable to stand up and walk. Death is usually due to respiratory infection or cardiomyopathy. In this article, we report the discovery of a new nonsense mutation that creates an abnormal stop codon in the dystrophin gene. This mutation was detected using Next Generation Sequencing (NGS). The subject was a 17-year-old male with muscular dystrophy who was suspected of having DMD. He was referred to the Hakim medical genetics center of Neyshabur, Iran. Copyright © 2017. Published by Elsevier B.V.
Lee, Yujung; Kim, Changshin; Park, YoungJoon; Pyun, Jung-A; Kwack, KyuBum
Premature ovarian failure (POF) is characterized by heterogeneous genetic causes such as chromosomal abnormalities and variants in causal genes. Recent technical developments have made it possible to use next generation sequencing (NGS) to detect genome-wide variants, including chromosomal abnormalities. Among 37 Korean POF patients, an XY karyotype with distal deletions of the Y chromosome (Yp11.32-31 and the Yp12 end part) was observed in two patients through NGS. Six deleterious variants in POF genes were also detected, which might explain the pathogenesis of POF with abnormalities in the sex chromosomes. Additionally, the two POF patients had no mutation in SRY, but three non-synonymous variants were detected in genes related to sex reversal. These findings suggest candidate causes of POF and sex reversal and show the suitability of NGS for approaching the heterogeneous pathogenesis of POF. Copyright © 2016 Elsevier Inc. All rights reserved.
Chen, W S; Zhao, G; Jian, S G; Wang, Z F
Our objective was to develop microsatellite markers for use in assessing genetic variation in the small shrub or tree species Suriana maritima (Surianaceae). In China, this species is found only as a few fragmented populations and individuals on the Paracel Islands. Using next-generation genome sequencing methodology, we developed 17 novel microsatellite markers for S. maritima. Fifty-four individuals from six populations of S. maritima were examined for polymorphisms; only one allele was detected for each of the markers. Microsatellite loci developed indicate a complete absence of genetic diversity for S. maritima on the Paracel Islands in China. These markers will be useful for examining genetic variation among S. maritima populations in other areas of the world.
Stargardt Disease (STGD) is the commonest genetic form of juvenile or early adult onset macular degeneration, which is a genetically heterogeneous disease. Molecular diagnosis of STGD remains a challenge in a significant proportion of cases. To address this, seven patients from five putative STGD families were recruited. We performed capture next generation sequencing (CNGS) of the probands and searched for potentially disease-causing genetic variants in previously identified retinal or macular dystrophy genes. Seven disease-causing mutations in ABCA4 and two in PROM1 were identified by CNGS, which provides a confident genetic diagnosis in these five families. We also provided a genetic basis to explain the differences among putative STGD due to various mutations in different genes. Meanwhile, we show for the first time that compound heterozygous mutations in the PROM1 gene could cause cone-rod dystrophy. Our findings support the enormous potential of CNGS in putative STGD molecular diagnosis.
Schulz, Wade L; Tormey, Christopher A; Torres, Richard
Next generation sequencing (NGS) has become a common technology in the clinical laboratory, particularly for the analysis of malignant neoplasms. However, most mutations identified by NGS are variants of unknown clinical significance (VOUS). Although the approach to define these variants differs by institution, software algorithms that predict variant effect on protein function may be used. However, these algorithms commonly generate conflicting results, potentially adding uncertainty to interpretation. In this review, we examine several computational tools used to predict whether a variant has clinical significance. In addition to describing the role of these tools in clinical diagnostics, we assess their efficacy in analyzing known pathogenic and benign variants in hematologic malignancies. Copyright© by the American Society for Clinical Pathology (ASCP).
Gargis, Amy S; Kalman, Lisa; Lubin, Ira M
Clinical microbiology and public health laboratories are beginning to utilize next-generation sequencing (NGS) for a range of applications. This technology has the potential to transform the field by providing approaches that will complement, or even replace, many conventional laboratory tests. While the benefits of NGS are significant, the complexities of these assays require an evolving set of standards to ensure testing quality. Regulatory and accreditation requirements, professional guidelines, and best practices that help ensure the quality of NGS-based tests are emerging. This review highlights currently available standards and guidelines for the implementation of NGS in the clinical and public health laboratory setting, and it includes considerations for NGS test validation, quality control procedures, proficiency testing, and reference materials. Copyright © 2016, American Society for Microbiology. All Rights Reserved.
Blomstrøm, Monica Marie
Metastatic breast cancer remains an incurable disease accounting for the vast majority of deaths from breast cancer. Understanding the molecular mechanisms for metastatic spread is important to improve diagnosis and for generating starting points for novel treatment strategies. Inhibition...... advantage of mutations is that they are most likely stable in the metastatic cancer cell population, whereas miRNA, mRNA and protein expression profiles may change substantially prior to, throughout, or after the complex metastatic process as well as between subpopulations such as cancer stem cells (CSCs......) and non-CSCs. The main goal of this project was to functionally characterize a set of candidate genes recovered from next-generation sequencing analysis for their role in breast cancer metastasis formation. The starting gene set comprised 104 gene variants; i.e. 57 wildtype and 47 mutated variants. During...
Burgos, Kasandra L; Van Keuren-Jensen, Kendall
There are a number of considerations when choosing protocols both upstream and downstream of Next-Generation Sequencing experiments. On the front end, purification methods, additives, and residuum can often inhibit the sensitive chemistries by which sequencing-by-synthesis is performed. On the back end, data handling, analysis software packages, and pipelines can also impact sequencing outcomes. The current chapter will describe stepwise how acellular biofluid samples are prepared for small RNA sequencing. With regard to purification methods, we found that small RNA yield can be improved considerably by following the total RNA isolation protocol included with Ambion's mirVana PARIS Kit but modifying the organic extraction step. Specifically, after transferring the upper aqueous phase to a fresh tube, water is added to the residual material (interphase and lower organic layer) and again phase-separated. In contrast, all the protocols provided with the commercially available kits at the time of this chapter publication require only one organic extraction. This simple yet, as it turns out, quite useful modification allows access to previously inaccessible material. Potential benefits from these changes are a more comprehensive sample profiling of small RNA, as well as wider access to small volume samples, such as is typically available for acellular biofluids, which now can be prepared for small RNA sequencing on the Illumina platform.
Kim, Hanyoup; Jebrail, Mais J; Sinha, Anupama; Bent, Zachary W; Solberg, Owen D; Williams, Kelly P; Langevin, Stanley A; Renzi, Ronald F; Van De Vreugde, James L; Meagher, Robert J; Schoeniger, Joseph S; Lane, Todd W; Branda, Steven S; Bartsch, Michael S; Patel, Kamlesh D
Next-generation sequencing (NGS) is emerging as a powerful tool for elucidating genetic information for a wide range of applications. Unfortunately, the surging popularity of NGS has not yet been accompanied by an improvement in automated techniques for preparing formatted sequencing libraries. To address this challenge, we have developed a prototype microfluidic system for preparing sequencer-ready DNA libraries for analysis by Illumina sequencing. Our system combines droplet-based digital microfluidic (DMF) sample handling with peripheral modules to create a fully-integrated, sample-in library-out platform. In this report, we use our automated system to prepare NGS libraries from samples of human and bacterial genomic DNA. E. coli libraries prepared on-device from 5 ng of total DNA yielded excellent sequence coverage over the entire bacterial genome, with >99% alignment to the reference genome, even genome coverage, and good quality scores. Furthermore, we produced a de novo assembly on a previously unsequenced multi-drug resistant Klebsiella pneumoniae strain BAA-2146 (KpnNDM). The new method described here is fast, robust, scalable, and automated. Our device for library preparation will assist in the integration of NGS technology into a wide variety of laboratories, including small research laboratories and clinical laboratories.
Teasdale, M D; van Doorn, N L; Fiddyment, S; Webb, C C; O'Connor, T; Hofreiter, M; Collins, M J; Bradley, D G
Parchment represents an invaluable cultural reservoir. Retrieving an additional layer of information from these abundant, dated livestock-skins via the use of ancient DNA (aDNA) sequencing has been mooted by a number of researchers. However, prior PCR-based work has indicated that this may be challenged by cross-individual and cross-species contamination, perhaps from the bulk parchment preparation process. Here we apply next generation sequencing to two parchments of seventeenth and eighteenth century northern English provenance. Following alignment to the published sheep, goat, cow and human genomes, it is clear that the only genome displaying substantial unique homology is sheep, and this species identification is confirmed by collagen peptide mass spectrometry. Only 4% of sequence reads align preferentially to a different species, indicating low contamination across species. Moreover, mitochondrial DNA sequences suggest an upper bound of contamination at 5%. Over 45% of reads aligned to the sheep genome, and even this limited sequencing exercise yielded 9 and 7% of each sampled sheep genome post filtering, allowing the mapping of genetic affinity to modern British sheep breeds. We conclude that parchment represents an excellent substrate for genomic analyses of historical livestock.
Next generation sequencing (NGS) technologies have impressively accelerated research in biological science in recent years by enabling the production of large volumes of sequence data at a drastically lower price per base than traditional sequencing methods. The recent and ongoing developments in the field allow addressing research questions in plant-microbe biology that were not conceivable just a few years ago. The present review provides an overview of NGS technologies and their usefulness for the analysis of microorganisms that live in association with plants. Possible limitations of the different sequencing systems, in particular sources of errors and bias, are critically discussed, and methods are disclosed that help to overcome these shortcomings. A focus will be on the application of NGS methods in metagenomic studies, including the analysis of microbial communities by amplicon sequencing, which can be considered a targeted metagenomic approach. Different applications of NGS technologies are exemplified by selected research articles that address the biology of the plant-associated microbiota to demonstrate the worth of the new methods.
Holt, Carson; Losic, Bojan; Pai, Deepa; Zhao, Zhen; Trinh, Quang; Syam, Sujata; Arshadi, Niloofar; Jang, Gun Ho; Ali, Johar; Beck, Tim; McPherson, John; Muthuswamy, Lakshmi B
Copy number variations (CNVs) are a major source of genomic variability and are especially significant in cancer. Until recently, microarray technologies have been used to characterize CNVs in genomes. However, advances in next-generation sequencing technology offer significant opportunities to deduce copy number directly from genome sequencing data. Unfortunately, cancer genomes differ from normal genomes in several aspects that make them far less amenable to copy number detection. For example, cancer genomes are often aneuploid and an admixture of diploid/non-tumor cell fractions. Also, patient-derived xenograft models can be laden with mouse contamination that strongly affects accurate assignment of copy number. Hence, there is a need to develop analytical tools that can take into account cancer-specific parameters for detecting CNVs directly from genome sequencing data. We have developed WaveCNV, a software package to identify copy number alterations by detecting breakpoints of CNVs using translation-invariant discrete wavelet transforms and assign digitized copy numbers to each event using next-generation sequencing data. We also assign alleles specifying the chromosomal ratio following duplication/loss. We verified copy number calls using both microarray (correlation coefficient 0.97) and quantitative polymerase chain reaction (correlation coefficient 0.94) and found them to be highly concordant. We demonstrate its utility in pancreatic primary and xenograft sequencing data. Source code and executables are available at https://github.com/WaveCNV. The segmentation algorithm is implemented in MATLAB, and copy number assignment is implemented in Perl. Supplementary data are available at Bioinformatics online.
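To illustrate the idea of locating CNV breakpoints with a translation-invariant wavelet transform, the following is a minimal Python sketch and not the WaveCNV implementation (which the abstract states is written in MATLAB and Perl). It assumes a per-bin copy-ratio signal and uses a Haar-style detail coefficient computed at every position; the function names, the `scale` and the `threshold` are hypothetical choices made for illustration.

```python
def haar_detail(signal, scale):
    """Undecimated Haar-style detail: mean of the next `scale` values
    minus the mean of the previous `scale` values, at every position."""
    n = len(signal)
    detail = [0.0] * n
    for i in range(scale, n - scale + 1):
        left = sum(signal[i - scale:i]) / scale
        right = sum(signal[i:i + scale]) / scale
        detail[i] = right - left
    return detail


def breakpoints(signal, scale=4, threshold=0.5):
    """Report positions where |detail| is a local maximum above threshold.
    (scale and threshold are illustrative assumptions, not WaveCNV defaults.)"""
    d = haar_detail(signal, scale)
    hits = []
    for i in range(1, len(d) - 1):
        a = abs(d[i])
        if a >= threshold and a >= abs(d[i - 1]) and a > abs(d[i + 1]):
            hits.append(i)
    return hits


# Toy signal: 20 bins at copy number 2 followed by 20 bins at copy number 3.
toy = [2.0] * 20 + [3.0] * 20
print(breakpoints(toy))
```

Because the coefficient is evaluated at every position rather than on a decimated grid, shifting the signal shifts the reported breakpoint by the same amount, which is the property "translation-invariant" refers to.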
BACKGROUND: Transcriptome profiling of patterns of RNA expression is a powerful approach to identify networks of genes that play a role in disease. To date, most mRNA profiling of tissues has been accomplished using microarrays, but next-generation sequencing can offer a richer and more comprehensive picture. METHODOLOGY/PRINCIPAL FINDINGS: ECO is a rare multi-system developmental disorder caused by a homozygous mutation in ICK encoding intestinal cell kinase. We performed gene expression profiling using both cDNA microarrays and next-generation mRNA sequencing (mRNA-seq) of skin fibroblasts from ECO-affected subjects. We then validated a subset of differentially expressed transcripts identified by each method using quantitative reverse transcription-polymerase chain reaction (qRT-PCR). Finally, we used gene ontology (GO) to identify critical pathways and processes that were abnormal according to each technical platform. Methodologically, mRNA-seq identifies a much larger number of differentially expressed genes with much better correlation to qRT-PCR results than the microarray (r² = 0.794 and 0.137, respectively). Biologically, cDNA microarray identified functional pathways focused on anatomical structure and development, while the mRNA-seq platform identified a higher proportion of genes involved in cell division and DNA replication pathways. CONCLUSIONS/SIGNIFICANCE: Transcriptome profiling with mRNA-seq had greater sensitivity, range and accuracy than the microarray. The two platforms generated different but complementary hypotheses for further evaluation.
Lo, Chien-Chi; Chain, Patrick S G
Next generation sequencing (NGS) technologies that parallelize the sequencing process and produce thousands to millions, or even hundreds of millions of sequences in a single sequencing run, have revolutionized genomic and genetic research. Because of the vagaries of any platform's sequencing chemistry, the experimental processing, machine failure, and so on, the quality of sequencing reads is never perfect, and often declines as the read is extended. These errors invariably affect downstream analysis/application and should therefore be identified early on to mitigate any unforeseen effects. Here we present a novel FastQ Quality Control Software (FaQCs) that can rapidly process large volumes of data, and which improves upon previous solutions to monitor the quality and remove poor quality data from sequencing runs. Both the speed of processing and the memory footprint of storing all required information have been optimized via algorithmic and parallel processing solutions. The trimmed output compared side-by-side with the original data is part of the automated PDF output. We show how this tool can help data analysis by providing a few examples, including an increased percentage of reads recruited to references, improved single nucleotide polymorphism identification as well as de novo sequence assembly metrics. FaQCs combines several features of currently available applications into a single, user-friendly process, and includes additional unique capabilities such as filtering the PhiX control sequences, conversion of FASTQ formats, and multi-threading. The original data and trimmed summaries are reported within a variety of graphics and reports, providing a simple way to do data quality control and assurance.
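The kind of quality-based trimming described above can be sketched as follows. This is a minimal illustration in Python and not FaQCs's actual algorithm: it assumes Phred+33-encoded FASTQ quality strings and a simple rule that removes bases from the 3' end until a base with quality of at least `min_q` is reached. The function names and the default threshold of 20 are hypothetical.

```python
def phred33_scores(quality_line):
    """Decode a Phred+33 quality string into integer scores."""
    return [ord(ch) - 33 for ch in quality_line]


def trim_3prime(sequence, quality_line, min_q=20):
    """Drop bases from the 3' end until one with score >= min_q is found.
    (min_q=20 is an illustrative assumption, not a FaQCs default.)"""
    scores = phred33_scores(quality_line)
    end = len(scores)
    while end > 0 and scores[end - 1] < min_q:
        end -= 1
    return sequence[:end], quality_line[:end]


# 'I' encodes Q40 and '#' encodes Q2 in Phred+33.
print(trim_3prime("ACGTAC", "IIII##"))
```

Trimming only from the 3' end reflects the observation in the abstract that read quality often declines as the read is extended; a production tool would typically also handle the 5' end, minimum read length, and paired reads.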
Yohe, Sophia; Hauge, Adam; Bunjer, Kari; Kemmer, Teresa; Bower, Matthew; Schomaker, Matthew; Onsongo, Getiria; Wilson, Jon; Erdmann, Jesse; Zhou, Yi; Deshpande, Archana; Spears, Michael D; Beckman, Kenneth; Silverstein, Kevin A T; Thyagarajan, Bharat
Although next-generation sequencing (NGS) can revolutionize molecular diagnostics, several hurdles remain in the implementation of this technology in clinical laboratories. Our objective was to validate and implement an NGS panel for genetic diagnosis of more than 100 inherited diseases, such as neurologic conditions, congenital hearing loss and eye disorders, developmental disorders, nonmalignant diseases treated by hematopoietic cell transplantation, familial cancers, connective tissue disorders, metabolic disorders, disorders of sexual development, and cardiac disorders. The diagnostic gene panels ranged from 1 to 54 genes, with most panels containing 10 genes or fewer. We used a liquid hybridization-based, target-enrichment strategy to enrich 10 067 exons in 568 genes, followed by NGS with a HiSeq 2000 sequencing system (Illumina, San Diego, California). We successfully sequenced 97.6% (9825 of 10 067) of the targeted exons to obtain a minimum coverage of 20× at all bases. We demonstrated 100% concordance in detecting 19 pathogenic single-nucleotide variations and 11 pathogenic insertion-deletion mutations ranging in size from 1 to 18 base pairs across 18 samples that were previously characterized by Sanger sequencing. Using 4 pairs of blinded, duplicate samples, we demonstrated a high degree of concordance (>99%) among the blinded, duplicate pairs. We have successfully demonstrated the feasibility of using the NGS platform to multiplex genetic tests for several rare diseases and the use of cloud computing for bioinformatics analysis as a relatively low-cost solution for implementing NGS in clinical laboratories.
Angelica gigas Nakai is an important medicinal herb, widely utilized in Asian countries, especially in Korea, Japan, and China. Although it is a vital medicinal herb, the lack of sequencing data and efficient molecular markers has limited the application of a genetic approach for horticultural improvements. Simple sequence repeats (SSRs) are universally accepted molecular markers for population structure study. In this study, we found over 130,000 SSRs, ranging from di- to deca-nucleotide motifs, using the genome sequence of the Manchu variety (MV) of A. gigas, derived from next generation sequencing (NGS). From the putative SSR regions identified, a total of 16,496 primer sets were successfully designed. Among them, we selected 848 SSR markers that showed polymorphism from in silico analysis and contained tri- to hexa-nucleotide motifs. We tested 36 SSR primer sets for polymorphism in 16 A. gigas accessions. The average polymorphism information content (PIC) was 0.69; the average observed heterozygosity (HO) and expected heterozygosity (HE) values were 0.53 and 0.73, respectively. These newly developed SSR markers would be useful tools for molecular genetics, genotype identification, genetic mapping, molecular breeding, and studying species relationships of the Angelica genus.
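For readers unfamiliar with the marker statistics reported above, the sketch below computes expected heterozygosity and PIC for a single locus from its allele frequencies. It assumes the commonly used definitions, HE = 1 − Σpᵢ² and PIC = 1 − Σpᵢ² − Σ_{i<j} 2pᵢ²pⱼ²; the abstract does not state which estimator or software the authors used, so this is an illustration rather than their procedure, and the function names are hypothetical.

```python
from itertools import combinations


def expected_heterozygosity(freqs):
    """HE = 1 - sum of squared allele frequencies."""
    return 1.0 - sum(p * p for p in freqs)


def pic(freqs):
    """PIC = 1 - sum(p_i^2) - sum over i<j of 2 * p_i^2 * p_j^2."""
    homozygous = sum(p * p for p in freqs)
    cross = sum(2 * (p * p) * (q * q) for p, q in combinations(freqs, 2))
    return 1.0 - homozygous - cross


# A locus with two equally frequent alleles.
print(expected_heterozygosity([0.5, 0.5]), pic([0.5, 0.5]))
```

PIC is never larger than HE for the same frequencies, which is consistent with the reported averages (0.69 versus 0.73).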
Wei, Lijuan; Xiao, Meili; Hayward, Alice; Fu, Donghui
Next-generation sequencing (NGS) produces numerous (often millions) short DNA sequence reads, typically varying between 25 and 400 bp in length, at a relatively low cost and in a short time. This revolutionary technology is being increasingly applied in whole-genome, transcriptome, epigenome and small RNA sequencing, molecular marker and gene discovery, comparative and evolutionary genomics, and association studies. The Brassica genus comprises some of the most agro-economically important crops, providing abundant vegetables, condiments, fodder, oil and medicinal products. Many Brassica species have undergone the process of polyploidization, which makes their genomes exceptionally complex and can create difficulties in genomics research. NGS injects new vigor into Brassica research, yet also faces specific challenges in the analysis of complex crop genomes and traits. In this article, we review the advantages and limitations of different NGS technologies and their applications and challenges, using Brassica as an advanced model system for agronomically important, polyploid crops. Specifically, we focus on the use of NGS for genome resequencing, transcriptome sequencing, development of single-nucleotide polymorphism markers, and identification of novel microRNAs and their targets. We present trends and advances in NGS technology in relation to Brassica crop improvement, with wide application for sophisticated genomics research into agronomically important polyploid crops.
Peng, Xu; Wu, Jingyi; Brunmeir, Reinhard; Kim, Sun-Yee; Zhang, Qiongyi; Ding, Chunming; Han, Weiping; Xie, Wei; Xu, Feng
Next-generation sequencing has been widely used for the genome-wide profiling of histone modifications, transcription factor binding and gene expression through chromatin immunoprecipitated DNA sequencing (ChIP-seq) and cDNA sequencing (RNA-seq). Here, we describe a versatile library construction method that can be applied to both ChIP-seq and RNA-seq on the widely used Illumina platforms. Standard methods for ChIP-seq library construction require nanograms of starting DNA, substantially limiting its application to rare cell types or limited clinical samples. By minimizing the DNA purification steps that cause major sample loss, our method achieved a high sensitivity in ChIP-seq library preparation. Using this method, we achieved the following: (i) generated high-quality epigenomic and transcription factor-binding maps using ChIP-seq for murine adipocytes; (ii) successfully prepared a ChIP-seq library from as little as 25 pg of starting DNA; (iii) achieved paired-end sequencing of the ChIP-seq libraries; (iv) systematically profiled gene expression dynamics during murine adipogenesis using RNA-seq and (v) preserved the strand specificity of the transcripts in RNA-seq. Given its sensitivity and versatility in both double-stranded and single-stranded DNA library construction, this method has wide applications in genomic, epigenomic, transcriptomic and interactomic studies. PMID:25223787
Mora-Castilla, Sergio; To, Cuong; Vaezeslami, Soheila; Morey, Robert; Srinivasan, Srimeenakshi; Dumdie, Jennifer N; Cook-Andersen, Heidi; Jenkins, Joby; Laurent, Louise C
As the cost of next-generation sequencing has decreased, library preparation costs have become a more significant proportion of the total cost, especially for high-throughput applications such as single-cell RNA profiling. Here, we have applied novel technologies to scale down reaction volumes for library preparation. Our system consisted of in vitro differentiated human embryonic stem cells representing two stages of pancreatic differentiation, for which we prepared multiple biological and technical replicates. We used the Fluidigm (San Francisco, CA) C1 single-cell Autoprep System for single-cell complementary DNA (cDNA) generation and an enzyme-based tagmentation system (Nextera XT; Illumina, San Diego, CA) with a nanoliter liquid handler (mosquito HTS; TTP Labtech, Royston, UK) for library preparation, reducing the reaction volume down to 2 µL and using as little as 20 pg of input cDNA. The resulting sequencing data were bioinformatically analyzed and correlated among the different library reaction volumes. Our results showed that decreasing the reaction volume did not interfere with the quality or the reproducibility of the sequencing data, and the transcriptional data from the scaled-down libraries allowed us to distinguish between single cells. Thus, we have developed a process to enable efficient and cost-effective high-throughput single-cell transcriptome sequencing. © 2016 Society for Laboratory Automation and Screening.
Desmedt, Christine; Voet, Thierry; Sotiriou, Christos; Campbell, Peter J
We are currently on the threshold of a revolution in breast cancer research, thanks to the emergence of novel technologies based on next-generation sequencing (NGS). In this review, we will describe the different sequencing technologies and platforms, and summarize the main findings from the latest sequencing articles in breast cancer. Firstly, the sequencing of a few hundreds of breast tumors has revealed new cancer genes. Although these were not frequently mutated, mutated genes from different patients could be grouped into the deregulation of similar pathways. Secondly, NGS allowed further exploration of intratumor heterogeneity and revealed that although subclonal mutations were present in all tumors, there was always a dominant clone, which comprised at least 50% of the tumor cells. Finally, tumor-specific DNA rearrangements could be detected in the patient's plasma, suggesting that NGS could be used to personalize the monitoring of the disease. The application of NGS to breast cancer has been associated with tremendous advances and promises for increasing the understanding of the disease. However, there still remain many unanswered questions, such as the role of structural changes of tumor genomes in cancer progression and treatment response/resistance.
Liu, Ya-Jun; Zhang, Feng; Liu, Hong-de; Sun, Xiao
The mechanism of transcriptional regulation has been the focus of many studies in the post-genomic era. The development of sequencing-based technologies for chromatin profiling enables current researchers to experimentally measure chromatin properties. Moreover, many studies aim at annotating the state of the chromatin into broad categories based on observed chromatin features and/or DNA sequences, then associating the resultant distal regulatory regions with the correct target genes based on DNA sequences, and predicting the dependence of epigenetic features on genetic variation. Stem cell biology has many applications in the areas of regenerative medicine and tumorigenesis. In this review, we summarize recent research progress on the application of next-generation sequencing techniques in studying transcriptional regulation in embryonic stem cells. This review mainly focuses on four areas: (1) microarray or RNA-seq; (2) chromatin immunoprecipitation (ChIP); (3) DNase I hypersensitive sites (DHSs); (4) high-throughput chromosome conformation capture (Hi-C). These technologies have been utilized in studying chromatin on three levels, i.e., gene expression, transcription factor binding and three-dimensional genome structure. We especially emphasize three master transcription factors of pluripotency: Oct4, Sox2 and Nanog. We aim to track the frontier of stem cell transcriptional regulation research and share important progress in this field.
Cahill, Matt J.
Background: There are a growing number of next-generation sequencing technologies. At present, the most cost-effective options also produce the shortest reads. However, even for prokaryotes, there is uncertainty concerning the utility of these technologies for the de novo assembly of complete genomes. This reflects an expectation that short reads will be unable to resolve small, but presumably abundant, repeats. Methodology/Principal Findings: Using a simple model of repeat assembly, we develop and test a technique that, for any read length, can estimate the occurrence of unresolvable repeats in a genome, and thus predict the number of gaps that would need to be closed to produce a complete sequence. We apply this technique to 818 prokaryote genome sequences. This provides a quantitative assessment of the relative performance of various lengths. Notably, unpaired reads of only 150nt can reconstruct approximately 50% of the analysed genomes with fewer than 96 repeat-induced gaps. Nonetheless, there is considerable variation amongst prokaryotes. Some genomes can be assembled to near contiguity using very short reads while others require much longer reads. Conclusions: Given the diversity of prokaryote genomes, a sequencing strategy should be tailored to the organism under study. Our results will provide researchers with a practical resource to guide the selection of the appropriate read length. 2010 Cahill et al.
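The abstract does not spell out its repeat model, so the following Python sketch is only a rough stand-in for the idea. It assumes, as a simplification, that an exact repeat at least as long as the read cannot be spanned by any single read and therefore cannot be resolved, and it only considers one strand. The function name and the toy sequence are hypothetical.

```python
from collections import Counter


def repeated_kmers(genome, k):
    """Return the set of length-k substrings occurring more than once.
    With k set to the read length, each one marks a repeat that no single
    read can span (a simplifying assumption, not the paper's model)."""
    counts = Counter(genome[i:i + k] for i in range(len(genome) - k + 1))
    return {kmer for kmer, n in counts.items() if n > 1}


toy_genome = "ACGTACGTTT"
for read_length in (4, 5):
    print(read_length, sorted(repeated_kmers(toy_genome, read_length)))
```

In the toy sequence, reads of length 4 leave one ambiguous repeat while reads of length 5 leave none, mirroring the abstract's point that the number of repeat-induced gaps depends on both read length and the genome in question.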
Kiyosawa, Hidenori; Okumura, Akio; Okui, Saya; Ushida, Chisato; Kawai, Gota
In order to find novel structured small RNAs, next-generation sequencing was applied to small RNA fractions with lengths ranging from 40 to 140 nt and secondary structure-based clustering was performed. Sequences of structured RNAs were effectively clustered and analyzed by secondary structure. Although more than 99% of the obtained sequences were known RNAs, 16 candidate mouse structured small non-coding RNAs (MsncRs) were isolated. Based on these results, the merits of secondary structure-based analysis are discussed. Copyright © 2015 Elsevier Inc. All rights reserved.
Bertolini, Francesca; Ghionda, Marco Ciro; D'Alessandro, Enrico; Geraci, Claudia; Chiofalo, Vincenzo; Fontanesi, Luca
The identification of the species of origin of meat and meat products is an important issue for preventing and detecting fraud that might have economic, ethical and health implications. In this paper we evaluated the potential of the next generation semiconductor-based sequencing technology (Ion Torrent Personal Genome Machine) for the identification of DNA from meat species (pig, horse, cattle, sheep, rabbit, chicken, turkey, pheasant, duck, goose and pigeon) as well as from human and rat in DNA mixtures through the sequencing of PCR products obtained from different pairs of universal primers that amplify the 12S and 16S rRNA mitochondrial DNA genes. Six libraries were produced, including PCR products obtained separately from 13 species or from DNA mixtures containing DNA from all species or only avian or only mammalian species at equimolar concentration or at 1:10 or 1:50 ratios for pig and horse DNA. Sequencing obtained a total of 33,294,511 called nucleotides, of which 29,109,688 had Q20 (87.43%), in a total of 215,944 reads. Different alignment algorithms were used to assign the species based on sequence data. The error rate, calculated after confirmation of the obtained sequences by Sanger sequencing, ranged from 0.0003 to 0.02 for the different species. The correlation of the number of reads per species between different libraries was high for mammalian species (0.97) and lower for avian species (0.70). PCR competition limited the efficiency of amplification and sequencing for avian species for some primer pairs. Detection of low levels of pig and horse DNA was possible with reads obtained from different primer pairs. The sequencing of the products obtained from different universal PCR primers could be a useful strategy to overcome potential problems of amplification. Based on these results, the Ion Torrent technology can be applied for the identification of meat species in DNA mixtures.
Liu, Biao; Morrison, Carl D.; Johnson, Candace S.; Trump, Donald L.; Qin, Maochun; Conroy, Jeffrey C.; Wang, Jianmin; Liu, Song
Accurate detection of somatic copy number variations (CNVs) is an essential part of cancer genome analysis, and plays an important role in oncotarget identifications. Next generation sequencing (NGS) holds the promise to revolutionize somatic CNV detection. In this review, we provide an overview of current analytic tools used for CNV detection in NGS-based cancer studies. We summarize the NGS data types used for CNV detection, decipher the principles for data preprocessing, segmentation, and interpretation, and discuss the challenges in somatic CNV detection. This review aims to provide a guide to the analytic tools used in NGS-based cancer CNV studies, and to discuss the important factors that researchers need to consider when analyzing NGS data for somatic CNV detections. PMID:24240121
Miller, Marisa E; Liberatore, Katie L; Kianian, Shahryar F
Plant organellar genomes contain large, repetitive elements that may undergo pairing or recombination to form complex structures and/or sub-genomic fragments. Organellar genomes also exist in admixtures within a given cell or tissue type (heteroplasmy), and an abundance of subtypes may change throughout development or when under stress (sub-stoichiometric shifting). Next-generation sequencing (NGS) technologies are required to obtain deeper understanding of organellar genome structure and function. Traditional sequencing studies use several methods to obtain organellar DNA: (1) If a large amount of starting tissue is used, it is homogenized and subjected to differential centrifugation and/or gradient purification. (2) If a smaller amount of tissue is used (i.e., if seeds, material, or space is limited), the same process is performed as in (1), followed by whole-genome amplification to obtain sufficient DNA. (3) Bioinformatics analysis can be used to sequence the total genomic DNA and to parse out organellar reads. All these methods have inherent challenges and tradeoffs. In (1), it may be difficult to obtain such a large amount of starting tissue; in (2), whole-genome amplification could introduce a sequencing bias; and in (3), homology between nuclear and organellar genomes could interfere with assembly and analysis. In plants with large nuclear genomes, it is advantageous to enrich for organellar DNA to reduce sequencing costs and sequence complexity for bioinformatics analyses. Here, we compare a traditional differential centrifugation method with a fourth method, an adapted CpG-methyl pulldown approach, to separate the total genomic DNA into nuclear and organellar fractions. Both methods yield sufficient DNA for NGS, DNA that is highly enriched for organellar sequences, albeit at different ratios in mitochondria and chloroplasts. We present the optimization of these methods for wheat leaf tissue and discuss major advantages and disadvantages of each approach in
Pareja-Tobes, Pablo; Manrique, Marina; Pareja-Tobes, Eduardo; Pareja, Eduardo; Tobes, Raquel
BG7 is a new system for de novo bacterial, archaeal and viral genome annotation based on a new approach specifically designed for annotating genomes sequenced with next generation sequencing technologies. The system is versatile and able to annotate genes even at the stage of preliminary assembly of the genome. It is especially efficient at detecting unexpected genes horizontally acquired from distant bacterial or archaeal genomes, phages, plasmids, and mobile elements. From the initial phases of the gene annotation process, BG7 exploits the massive availability of annotated protein sequences in databases. BG7 predicts ORFs and infers their function based on protein similarity with a wide set of reference proteins, integrating ORF prediction and functional annotation phases in just one step. BG7 is especially tolerant to sequencing errors in start and stop codons, to frameshifts, and to assembly or scaffolding errors. The system is also tolerant to the high level of gene fragmentation which is frequently found in not fully assembled genomes. The current version of BG7, which is developed in Java, takes advantage of Amazon Web Services (AWS) cloud computing features, but it can also be run locally on any operating system. BG7 is a fast, automated and scalable system that can cope with the challenge of analyzing the huge number of genomes that are being sequenced with NGS technologies. Its capabilities and efficiency were demonstrated in the 2011 EHEC Germany outbreak, in which BG7 was used to get the first annotations the day after the first entero-hemorrhagic E. coli genome sequences were made publicly available. The suitability of BG7 for genome annotation has been proved for Illumina, 454, Ion Torrent, and PacBio sequencing technologies. Besides, thanks to its plasticity, our system could be very easily adapted to work with new technologies in the future. PMID:23185310
George J. Burghel
Detection of clinically actionable mutations in diagnostic tumour specimens aids in the selection of targeted therapeutics. With an ever increasing number of clinically significant mutations identified, tumour genetic diagnostics is moving from single to multigene analysis. As it is still not feasible for routine diagnostic laboratories to perform sequencing of the entire cancer genome, our approach was to undertake targeted mutation detection. To optimise our diagnostic workflow, we evaluated three target enrichment strategies using two next-generation sequencing (NGS) platforms (Illumina MiSeq and Ion PGM). The target enrichment strategies were a Fluidigm Access Array custom amplicon panel including 13 genes (MiSeq sequencing), the Oxford Gene Technologies (OGT) SureSeq Solid Tumour hybridisation panel including 60 genes (MiSeq sequencing), and an Ion AmpliSeq Cancer Hotspot Panel including 50 genes (Ion PGM sequencing). DNA extracted from formalin-fixed paraffin-embedded (FFPE) blocks of eight previously characterised cancer cell lines was tested using the three panels. Matching genomic DNA from fresh cultures of these cell lines was also tested using the custom Fluidigm panel and the OGT SureSeq Solid Tumour panel. Each panel allowed mutation detection of core cancer genes including KRAS, BRAF, and EGFR. Our results indicate that the panels enable accurate variant detection despite sequencing from FFPE DNA.
Aziz, Nazneen; Zhao, Qin; Bry, Lynn; Driscoll, Denise K; Funke, Birgit; Gibson, Jane S; Grody, Wayne W; Hegde, Madhuri R; Hoeltge, Gerald A; Leonard, Debra G B; Merker, Jason D; Nagarajan, Rakesh; Palicki, Linda A; Robetorye, Ryan S; Schrijver, Iris; Weck, Karen E; Voelkerding, Karl V
The higher throughput and lower per-base cost of next-generation sequencing (NGS) as compared to Sanger sequencing has led to its rapid adoption in clinical testing. The number of laboratories offering NGS-based tests has also grown considerably in the past few years, despite the fact that specific Clinical Laboratory Improvement Amendments of 1988/College of American Pathologists (CAP) laboratory standards had not yet been developed to regulate this technology. The objective was to develop a checklist for clinical testing using NGS technology that sets standards for the analytic wet bench process and for bioinformatics or "dry bench" analyses. As NGS-based clinical tests are new to diagnostic testing and are of much greater complexity than traditional Sanger sequencing-based tests, there is an urgent need to develop new regulatory standards for laboratories offering these tests. To develop the necessary regulatory framework for NGS and to facilitate appropriate adoption of this technology for clinical testing, CAP formed a committee in 2011, the NGS Work Group, to deliberate upon the contents to be included in the checklist. Results: A total of 18 laboratory accreditation checklist requirements for the analytic wet bench process and bioinformatics analysis processes have been included within CAP's molecular pathology checklist (MOL). This report describes the important issues considered by the CAP committee during the development of the new checklist requirements, which address documentation, validation, quality assurance, confirmatory testing, exception logs, monitoring of upgrades, variant interpretation and reporting, incidental findings, data storage, version traceability, and data transfer confidentiality.
The advent of next-generation sequencing technologies has been accompanied by the development of many whole-genome sequence assembly methods and software, especially for de novo fragment assembly. Because little is known about the applicability and performance of these software tools, choosing a suitable assembler is a difficult task. Here, we provide information on the adaptivity of each program and, above all, compare the performance of eight distinct tools against eight groups of simulated datasets from the Solexa sequencing platform. Considering the computational time, maximum random access memory (RAM) occupancy, assembly accuracy and integrity, our study indicates that string-based assemblers and overlap-layout-consensus (OLC) assemblers are well-suited for very short reads and for longer reads of small genomes, respectively. For large datasets of more than a hundred million short reads, De Bruijn graph-based assemblers would be more appropriate. In terms of software implementation, string-based assemblers are superior to graph-based ones, of which SOAPdenovo is complex with respect to the creation of its configuration file. Our comparison study will assist researchers in selecting a well-suited assembler and offer essential information for the improvement of existing assemblers or the development of novel assemblers.
Weiss, Glen J; Liang, Winnie S; Demeure, Michael J; Kiefer, Jeff A; Hostetter, Galen; Izatt, Tyler; Sinari, Shripad; Christoforides, Alexis; Aldrich, Jessica; Kurdoglu, Ahmet; Phillips, Lori; Benson, Hollie; Reiman, Rebecca; Baker, Angela; Marsh, Vickie; Von Hoff, Daniel D; Carpten, John D; Craig, David W
New anticancer agents that target a single cell surface receptor, up-regulated or amplified gene product, or mutated gene, have met with some success in treating advanced cancers. However, patients' tumors still eventually progress on these therapies. If it were possible to identify a larger number of targetable vulnerabilities in an individual's tumor, multiple targets could be exploited with the use of specific therapeutic agents, thus possibly giving the patient viable therapeutic alternatives. In this exploratory study, we used next-generation sequencing technologies (NGS) including whole genome sequencing (WGS), and where feasible, whole transcriptome sequencing (WTS) to identify genomic events and associated expression changes in advanced cancer patients. WGS on paired tumor and normal samples from nine advanced cancer patients and WTS on six of these patients' tumors was completed. One patient's treatment was based on targets and pathways identified by NGS and the patient had a short-lived PET/CT response with a significant reduction in his tumor-related pain. To design treatment plans based on information garnered from NGS, several challenges were encountered: NGS reporting delays, communication of results to out-of-state participants and their treating oncologists, and chain of custody handling for fresh biopsy samples for Clinical Laboratory Improvement Amendments (CLIA) target validation. While the initial effort was a slower process than anticipated due to a variety of issues, we demonstrate the feasibility of using NGS in advanced cancer patients so that treatments for patients with progressing tumors may be improved.
Background: In humans, copies of the Long Interspersed Nuclear Element 1 (LINE-1) retrotransposon comprise 21% of the reference genome, and have been shown to modulate expression and produce novel splice isoforms of transcripts from genes that span or neighbor the LINE-1 insertion site. Results: In this work, newly released pilot data from the 1000 Genomes Project is analyzed to detect previously unreported full length insertions of the retrotransposon LINE-1. By direct analysis of the sequence data, we have identified 22 previously unreported LINE-1 insertion sites within the sequence data reported for a mother/father/daughter trio. Conclusions: It is demonstrated here that next generation sequencing data, as well as emerging high quality datasets from individual genome projects, allow us to assess the amount of heterogeneity with respect to the LINE-1 retrotransposon amongst humans, and provide us with a wealth of testable hypotheses as to the impact that this diversity may have on the health of individuals and populations.
Wu, Yi-Chung; Chang, Chia-Hua; Hung, Jui-Hung; Yang, Chia-Hsiang
Next-generation sequencing (NGS) enables high-throughput sequencing, in which short DNA fragments can be sequenced in a massively parallel fashion. However, the essential algorithm behind the succeeding NGS data analysis, DNA mapping, is still excessively time consuming. DNA mapping can be partitioned into two parts: suffix array (SA) sorting and backward searching. Dedicated hardware designs for the less-complex backward searching have been proposed, but feasible hardware for the most complicated part, SA sorting, has never been explored. Based on the memory-efficient sBWT algorithm, this work is the first integrated NGS data processor for the entire DNA mapping. The -ordered Ferragina and Manzini index used in the sBWT algorithm is leveraged to improve storage capacity and reduce hardware complexity. The proposed NGS data processor realizes the sBWT algorithm through bucket sorting, suffix grouping, and suffix sorting circuits. Key design parameters are analyzed to achieve the optimal performance with respect to hardware cost and execution time. Fabricated in 40-nm CMOS, the NGS data processor dissipates 135 mW at 200 MHz from a 0.9-V supply. With 1-GB external memory, the chip can analyze human DNA within 10 min. This work achieves 43 065× and 8 971× [3208× and 402×] higher energy efficiency (throughput-to-area ratio) than the high-end CPU and GPU solutions, respectively.
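The two parts of DNA mapping named in this abstract, suffix array sorting and backward searching, can be sketched in software as follows. This is a minimal illustration of a textbook FM-index over a toy string, not the sBWT algorithm or the hardware design described above; the function names `build_fm_index` and `backward_search` are hypothetical, and the naive suffix sort is only practical for short inputs.

```python
def build_fm_index(text):
    """Build a toy FM-index for `text`; a '$' sentinel is appended."""
    text += "$"
    # Suffix array (SA) sorting: naive comparison sort of all suffixes.
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    # Burrows-Wheeler transform: the character preceding each sorted suffix.
    bwt = "".join(text[i - 1] for i in sa)
    counts = {}
    for ch in text:
        counts[ch] = counts.get(ch, 0) + 1
    # c_table[c]: number of characters in text that sort strictly before c.
    c_table, total = {}, 0
    for ch in sorted(counts):
        c_table[ch] = total
        total += counts[ch]
    # occ[c][i]: number of occurrences of c in bwt[:i].
    occ = {ch: [0] * (len(bwt) + 1) for ch in counts}
    for i, b in enumerate(bwt):
        for ch in occ:
            occ[ch][i + 1] = occ[ch][i] + (1 if b == ch else 0)
    return sa, c_table, occ


def backward_search(pattern, sa, c_table, occ):
    """Return sorted start positions of exact matches of `pattern`."""
    lo, hi = 0, len(sa)  # half-open interval of matching SA rows
    for ch in reversed(pattern):
        if ch not in c_table:
            return []
        lo = c_table[ch] + occ[ch][lo]
        hi = c_table[ch] + occ[ch][hi]
        if lo >= hi:
            return []
    return sorted(sa[lo:hi])


if __name__ == "__main__":
    sa, c_table, occ = build_fm_index("GATTACAGATTACA")
    print(backward_search("TTAC", sa, c_table, occ))
```

The sketch makes the cost asymmetry visible: each backward-search step is two table lookups per pattern character, whereas building the suffix array requires sorting every suffix of the reference.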
Pipelines for the analysis of Next-Generation Sequencing (NGS) data are generally composed of a set of different publicly available software tools, configured together in order to map short reads of a genome and call variants. The fidelity of pipelines is variable. We have developed ArtificialFastqGenerator, which takes a reference genome sequence as input and outputs artificial paired-end FASTQ files containing Phred quality scores. Since these artificial FASTQs are derived from the reference genome, they provide a gold standard for read alignment and variant calling, thereby enabling the performance of any NGS pipeline to be evaluated. The user can customise DNA template/read length, the modelling of coverage based on GC content, whether to use real Phred base quality scores taken from existing FASTQ files, and whether to simulate sequencing errors. Detailed coverage and error summary statistics are output. Here we describe ArtificialFastqGenerator and illustrate its implementation in evaluating a typical bespoke NGS analysis pipeline under different experimental conditions. ArtificialFastqGenerator was released in January 2012. Source code, example files and binaries are freely available under the terms of the GNU General Public License v3.0 from https://sourceforge.net/projects/artfastqgen/.
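The idea of deriving artificial reads with Phred quality scores from a reference can be illustrated with a short sketch. It is not the ArtificialFastqGenerator implementation: the function names are hypothetical, only single-end reads and substitution errors are modelled, and the error model simply assumes that each base is wrong with the probability implied by its Phred score, 10^(-Q/10), encoded with the common Phred+33 ASCII offset.

```python
import random


def phred_to_error_prob(q):
    """A Phred score Q corresponds to an error probability of 10^(-Q/10)."""
    return 10 ** (-q / 10)


def encode_phred33(quals):
    """Encode integer Phred scores as a Phred+33 ASCII quality string."""
    return "".join(chr(q + 33) for q in quals)


def simulate_read(reference, start, quals, rng):
    """Copy len(quals) bases from the reference starting at `start`.

    Each base is replaced by a different base with the probability
    implied by its quality score (assumed error model).
    """
    read = []
    for offset, q in enumerate(quals):
        base = reference[start + offset]
        if rng.random() < phred_to_error_prob(q):
            base = rng.choice([b for b in "ACGT" if b != base])
        read.append(base)
    return "".join(read)


def fastq_record(name, seq, quals):
    """Format one four-line FASTQ record."""
    return f"@{name}\n{seq}\n+\n{encode_phred33(quals)}\n"


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the output is reproducible
    ref = "ACGTACGTACGTACGTACGT"
    quals = [40] * 10
    read = simulate_read(ref, 5, quals, rng)
    print(fastq_record("read1", read, quals), end="")
```

Because the true origin of every simulated read is known (`start` in the sketch), alignments and variant calls produced by a pipeline can be scored against it.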
Ji, Yuan; Si, Yue; McMillin, Gwendolyn A; Lyon, Elaine
The rapid development and dramatic decrease in cost of sequencing techniques have ushered in the implementation of genomic testing in patient care. Next generation DNA sequencing (NGS) techniques have been used increasingly in clinical laboratories to scan the whole or part of the human genome in order to facilitate diagnosis and/or prognosis of genetic disease. Despite many hurdles and debates, pharmacogenomics (PGx) is believed to be an area of genomic medicine where precision medicine could have an impact in the near future. Areas covered: This review focuses on lessons learned through early attempts at clinically implementing PGx testing, and on the challenges and opportunities that PGx testing brings to precision medicine in the era of NGS. Expert commentary: Replacing the targeted analysis approach with NGS for PGx testing is currently neither technically feasible nor necessary, owing to several technical limitations and the difficulty of interpreting variants of uncertain significance among PGx variants. However, reporting PGx variants from clinical whole exome or whole genome sequencing (WES/WGS) might represent an additional benefit for patients who are tested by WES/WGS.
Broman, M; Kleinschnitz, I; Bach, J E; Rost, S; Islander, G; Müller, C R
Malignant hyperthermia (MH)-related mutations have been identified in the ryanodine receptor type 1 gene (RYR1) and in the dihydropyridine receptor gene (CACNA1S), but about half of the patients do not have causative mutations in these genes. We wanted to study the contribution of other muscle genes to the RYR1 phenotypes. We designed a gene panel for sequence enrichment targeting 64 genes of proteins involved in the homeostasis of the striated muscle cell. Next-generation sequencing (NGS) resulted in >50,000 sequence variants, which were further analyzed by software filtering criteria to identify causative variants. In four of five patients we identified previously reported RYR1 mutations, while the fifth patient did not show any candidate variant in any of the genes investigated. In two patients pathogenic variants were found in other genes known to cause muscle disorders. All but one patient carried likely benign rare polymorphisms. The NGS technique proved convenient in identifying variants in RYR1. However, with a clinically variable phenotype like MH, the pre-selection of genes poses problems in variant interpretation. © 2014 John Wiley & Sons A/S. Published by John Wiley & Sons Ltd.
Sadananda, Singh N; Foo, Jia Nee; Toh, Meng Tiak; Cermakova, Lubomira; Trigueros-Motos, Laia; Chan, Teddy; Liany, Herty; Collins, Jennifer A; Gerami, Sima; Singaraja, Roshni R; Hayden, Michael R; Francis, Gordon A; Frohlich, Jiri; Khor, Chiea Chuen; Brunham, Liam R
A low level of HDL cholesterol (HDL-C) is a common clinical scenario and an important marker for increased cardiovascular risk. Many patients with very low or very high HDL-C have a rare mutation in one of several genes, but identification of the molecular abnormality in patients with extreme HDL-C is rarely performed in clinical practice. We investigated the accuracy and diagnostic yield of a targeted next-generation sequencing (NGS) assay for extreme levels of HDL-C. We developed a targeted NGS panel to capture the exons, intron/exon boundaries, and untranslated regions of 26 genes with highly penetrant effects on plasma lipid levels. We sequenced 141 patients with extreme HDL-C levels and prioritized variants in accordance with medical genetics guidelines. We identified 35 pathogenic and probably pathogenic variants in HDL genes, including 21 novel variants, and performed functional validation on a subset of these. Overall, a molecular diagnosis was established in 35.9% of patients with low HDL-C and 5.2% with high HDL-C, and all prioritized variants identified by NGS were confirmed by Sanger sequencing. Our results suggest that a molecular diagnosis can be identified in a substantial proportion of patients with low HDL-C using targeted NGS. Copyright © 2015 by the American Society for Biochemistry and Molecular Biology, Inc.
Pawełkowicz, Magdalena; Zieliński, Konrad; Zielińska, Dorota; Pląder, Wojciech; Yagi, Kouhei; Wojcieszek, Michał; Siedlecka, Ewa; Bartoszewski, Grzegorz; Skarzyńska, Agnieszka; Przybecki, Zbigniew
In the post-genomic era the availability of genomic tools and resources is leading us to a new generation of methods in plant breeding, as they facilitate the study of the genotype and its relationship with the phenotype, in particular for complex traits. In this study we have concentrated mainly on Cucumis sativus and, to a much lesser extent, on several other important vegetable crops of the Cucurbitaceae family. There are many reports on research conducted in Cucurbitaceae plant breeding programs on the ripening process, phloem transport, disease resistance, cold tolerance and fruit quality traits. This paper presents the role played by new omic technologies in creating knowledge of the mechanisms that form breeding traits. The analysis of next-generation sequencing (NGS) data allows the discovery of new genes and regulatory sequences and their positions, and makes available large collections of molecular markers. Genome-wide expression studies provide breeders with an understanding of the molecular basis of complex traits. First, a high-density map should be created for the reference genome; then each re-sequencing dataset can be mapped and new markers brought into breeding populations. The paper also presents methods that could be used in the future for the creation of variability and genomic modification of the species in question. The state of chloroplastomic and mitochondriomic studies and their usefulness in breeding are also shown. Copyright © 2015 Elsevier Ireland Ltd. All rights reserved.
Nureyev F. Rodrigues
Organellar RNA editing involves the modification of nucleotide sequences to maintain conserved protein functions, mainly by reverting non-neutral codon mutations. The loss of plastid editing events, resulting from mutations in RNA editing factors or through stress interference, leads to developmental, physiological and photosynthetic alterations. Recently, next generation sequencing technology has generated the massive discovery of sRNA sequences and expanded the number of sRNA data. Here, we present a method to screen chloroplast RNA editing using public sRNA libraries from Arabidopsis, soybean and rice. We mapped the sRNAs against the nuclear, mitochondrial and plastid genomes to confirm predicted cytosine to uracil (C-to-U) editing events and identify new editing sites in plastids. Among the predicted editing sites, 40.57, 34.78, and 25.31% were confirmed using sRNAs from Arabidopsis, soybean and rice, respectively. SNP analysis revealed 58.2, 43.9, and 37.5% new C-to-U changes in the respective species and identified known and new putative adenosine to inosine (A-to-I) RNA editing in tRNAs. The present method and data reveal the potential of sRNA as a reliable source to identify new and confirm known editing sites.
Allen, Jonathan E.; Brown, Trevor S.; Gardner, Shea N.; McLoughlin, Kevin S.; Forsberg, Jonathan A.; Kirkup, Benjamin C.; Chromy, Brett A.; Luciw, Paul A.; Elster, Eric A.
Combat wound healing and resolution are highly affected by the resident microbial flora. We therefore sought to achieve comprehensive detection of microbial populations in wounds using novel genomic technologies and bioinformatics analyses. We employed a microarray capable of detecting all sequenced pathogens for interrogation of 124 wound samples from extremity injuries in combat-injured U.S. service members. A subset of samples was also processed via next-generation sequencing and metagenomic analysis. Array analysis detected microbial targets in 51% of all wound samples, with Acinetobacter baumannii being the most frequently detected species. Multiple Pseudomonas species were also detected in tissue biopsy specimens. Detection of the Acinetobacter plasmid pRAY correlated significantly with wound failure, while detection of enteric-associated bacteria was associated significantly with successful healing. Whole-genome sequencing revealed broad microbial biodiversity between samples. The total wound bioburden did not associate significantly with wound outcome, although temporal shifts were observed over the course of treatment. Given that standard microbiological methods do not detect the full range of microbes in each wound, these data emphasize the importance of supplementation with molecular techniques for thorough characterization of wound-associated microbes. Future application of genomic protocols for assessing microbial content could allow application of specialized care through early and rapid identification and management of critical patterns in wound bioburden. PMID:24829242
Risk assessment of tick-borne and zoonotic disease emergence necessitates sound knowledge of the particular microorganisms circulating within the communities of these major vectors. Assessment of pathogens carried by wild ticks must be performed without a priori assumptions, to allow for the detection of new or unexpected agents. We evaluated the potential of Next-Generation Sequencing (NGS) techniques to produce an inventory of parasites carried by questing ticks. Sequences corresponding to parasites from two distinct genera were recovered in Ixodes ricinus ticks collected in Eastern France: Babesia spp. and Theileria spp. Four Babesia species were identified, three of which were zoonotic: B. divergens, Babesia sp. EU1 and B. microti; and one which infects cattle, B. major. This is the first time that these last two species have been identified in France. This approach also identified new sequences corresponding to as-yet unknown organisms similar to tropical Theileria species. Our findings demonstrate the capability of NGS to produce an inventory of live tick-borne parasites, which could potentially be transmitted by the ticks, and uncover unexpected parasites in Western Europe.
Retinal dystrophies (RD) constitute a group of blinding diseases that are characterized by clinical variability and pronounced genetic heterogeneity. The different nonsyndromic and syndromic forms of RD can be attributed to mutations in more than 200 genes. Consequently, next generation sequencing (NGS) technologies are among the most promising approaches to identify mutations in RD. We screened a large cohort of patients comprising 89 independent cases and families with various subforms of RD applying different NGS platforms. While mutation screening in 50 cases was performed using a RD gene capture panel, 47 cases were analyzed using whole exome sequencing. One family was analyzed using whole genome sequencing. A detection rate of 61% was achieved including mutations in 34 known and two novel RD genes. A total of 69 distinct mutations were identified, including 39 novel mutations. Notably, genetic findings in several families were not consistent with the initial clinical diagnosis. Clinical reassessment resulted in refinement of the clinical diagnosis in some of these families and confirmed the broad clinical spectrum associated with mutations in RD genes.
Ramirez-Gonzalez, Ricardo H; Leggett, Richard M; Waite, Darren; Thanki, Anil; Drou, Nizar; Caccamo, Mario; Davey, Robert
Modern sequencing platforms generate enormous quantities of data in ever-decreasing amounts of time. Additionally, techniques such as multiplex sequencing allow one run to contain hundreds of different samples. With such data comes a significant challenge to understand its quality and to understand how the quality and yield are changing across instruments and over time. As well as the desire to understand historical data, sequencing centres often have a duty to provide clear summaries of individual run performance to collaborators or customers. We present StatsDB, an open-source software package for storage and analysis of next generation sequencing run metrics. The system has been designed for incorporation into a primary analysis pipeline, either at the programmatic level or via integration into existing user interfaces. Statistics are stored in an SQL database and APIs provide the ability to store and access the data while abstracting the underlying database design. This abstraction allows simpler, wider querying across multiple fields than is possible by the manual steps and calculation required to dissect individual reports, e.g. "provide metrics about nucleotide bias in libraries using adaptor barcode X, across all runs on sequencer A, within the last month". The software is supplied with modules for storage of statistics from FastQC, a commonly used tool for analysis of sequence reads, but the open nature of the database schema means it can be easily adapted to other tools. Currently at The Genome Analysis Centre (TGAC), reports are accessed through our LIMS system or through a standalone GUI tool, but the API and supplied examples make it easy to develop custom reports and to interface with other packages.
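The kind of cross-run query quoted in this abstract ("metrics about nucleotide bias in libraries using adaptor barcode X, across all runs on sequencer A, within the last month") can be sketched against an SQL database as below. The table name, column names and helper functions are all hypothetical and chosen only for illustration; they are not the StatsDB schema or API, which the abstract does not describe.

```python
import sqlite3


def make_db():
    """Create an in-memory database with a hypothetical run-metric table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE run_metric (
               run_id    TEXT,
               sequencer TEXT,
               barcode   TEXT,
               run_date  TEXT,   -- ISO 8601 date, so string order is date order
               metric    TEXT,
               value     REAL
           )"""
    )
    return conn


def store_metric(conn, run_id, sequencer, barcode, run_date, metric, value):
    """Store one metric value for one library in one run."""
    conn.execute(
        "INSERT INTO run_metric VALUES (?, ?, ?, ?, ?, ?)",
        (run_id, sequencer, barcode, run_date, metric, value),
    )


def mean_metric(conn, metric, sequencer, barcode, since):
    """Mean of a metric for one barcode on one sequencer since a date.

    Returns None when no rows match.
    """
    row = conn.execute(
        """SELECT AVG(value) FROM run_metric
           WHERE metric = ? AND sequencer = ? AND barcode = ?
             AND run_date >= ?""",
        (metric, sequencer, barcode, since),
    ).fetchone()
    return row[0]


if __name__ == "__main__":
    conn = make_db()
    store_metric(conn, "run1", "A", "X", "2013-05-02", "gc_fraction", 0.50)
    store_metric(conn, "run2", "A", "X", "2013-05-20", "gc_fraction", 0.70)
    store_metric(conn, "run3", "B", "X", "2013-05-21", "gc_fraction", 0.90)
    print(mean_metric(conn, "gc_fraction", "A", "X", "2013-05-01"))
```

Wrapping the SQL in small functions mirrors the abstraction the abstract describes: callers ask for a metric by sequencer, barcode and date range without needing to know how the tables are laid out.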
Blanca, Jose M; Pascual, Laura; Ziarsolo, Peio; Nuez, Fernando; Cañizares, Joaquin
The possibilities offered by next generation sequencing (NGS) platforms are revolutionizing biotechnological laboratories. Moreover, the combination of NGS sequencing and affordable high-throughput genotyping technologies is facilitating the rapid discovery and use of SNPs in non-model species. However, this abundance of sequences and polymorphisms creates new software needs. To fulfill these needs, we have developed a powerful, yet easy-to-use application. The ngs_backbone software is a parallel pipeline capable of analyzing Sanger, 454, Illumina and SOLiD (Sequencing by Oligonucleotide Ligation and Detection) sequence reads. Its main supported analyses are: read cleaning, transcriptome assembly and annotation, read mapping and single nucleotide polymorphism (SNP) calling and selection. In order to build a truly useful tool, the software development was paired with a laboratory experiment. All public tomato Sanger EST reads plus 14.2 million Illumina reads were employed to test the tool and predict polymorphism in tomato. The cleaned reads were mapped to the SGN tomato transcriptome obtaining a coverage of 4.2 for Sanger and 8.5 for Illumina. 23,360 single nucleotide variations (SNVs) were predicted. A total of 76 SNVs were experimentally validated, and 85% were found to be real. ngs_backbone is a new software package capable of analyzing sequences produced by NGS technologies and predicting SNVs with great accuracy. In our tomato example, we created a highly polymorphic collection of SNVs that will be a useful resource for tomato researchers and breeders. The software developed along with its documentation is freely available under the AGPL license and can be downloaded from http://bioinf.comav.upv.es/ngs_backbone/ or http://github.com/JoseBlanca/franklin.
Serrao, Erik; Cherepanov, Peter; Engelman, Alan N
Retroviruses exhibit signature integration preferences on both the local and global scales. Here, we present a detailed protocol for (1) generation of diverse libraries of retroviral integration sites using ligation-mediated PCR (LM-PCR) amplification and next-generation sequencing (NGS), (2) mapping the genomic location of each virus-host junction using BEDTools, and (3) analyzing the data for statistical relevance. Genomic DNA extracted from infected cells is fragmented by digestion with restriction enzymes or by sonication. After suitable DNA end-repair, double-stranded linkers are ligated onto the DNA ends, and semi-nested PCR is conducted using primers complementary to both the long terminal repeat (LTR) end of the virus and the ligated linker DNA. The PCR primers carry sequences required for DNA clustering during NGS, negating the requirement for separate adapter ligation. Quality control (QC) is conducted to assess DNA fragment size distribution and adapter DNA incorporation prior to NGS. Sequence output files are filtered for LTR-containing reads, and the sequences defining the LTR and the linker are cropped away. Trimmed host cell sequences are mapped to a reference genome using BLAT and are filtered for minimally 97% identity to a unique point in the reference genome. Unique integration sites are scrutinized for adjacent nucleotide (nt) sequence and distribution relative to various genomic features. Using this protocol, integration site libraries of high complexity can be constructed from genomic DNA in three days. The entire protocol that encompasses exogenous viral infection of susceptible tissue culture cells to integration site analysis can therefore be conducted in approximately one to two weeks. Recent applications of this technology pertain to longitudinal analysis of integration sites from HIV-infected patients.
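The filtering step in this protocol (keeping host sequences with at least 97% identity to a unique point in the reference genome) can be sketched as follows. The tuple layout of the hits, the identity formula (matches divided by aligned length) and the rule that a read is dropped when its passing hits disagree on location are assumptions made for illustration; the abstract does not specify them, and this is not a parser for BLAT output.

```python
from collections import defaultdict


def percent_identity(matches, aligned_length):
    """Identity as matching bases over aligned length (assumed definition)."""
    return 100.0 * matches / aligned_length


def unique_integration_sites(hits, min_identity=97.0):
    """Return sorted unique (chrom, pos, strand) sites.

    `hits` is an iterable of tuples
    (read_id, chrom, pos, strand, matches, aligned_length).
    A read contributes a site only if all of its hits that pass the
    identity threshold point to one and the same location.
    """
    by_read = defaultdict(set)
    for read_id, chrom, pos, strand, matches, aligned_length in hits:
        if aligned_length <= 0:
            continue
        if percent_identity(matches, aligned_length) >= min_identity:
            by_read[read_id].add((chrom, pos, strand))
    sites = set()
    for locations in by_read.values():
        if len(locations) == 1:  # uniquely mapped read
            sites.update(locations)
    return sorted(sites)


if __name__ == "__main__":
    hits = [
        ("r1", "chr1", 100, "+", 98, 100),
        ("r2", "chr1", 100, "+", 99, 100),   # same site as r1, collapsed
        ("r3", "chr2", 500, "-", 100, 100),  # r3 maps to two places
        ("r3", "chr3", 900, "+", 100, 100),
        ("r4", "chrX", 5, "+", 90, 100),     # below the identity threshold
    ]
    print(unique_integration_sites(hits))
```

Collapsing reads that share a site into one entry is what turns a list of junction reads into a list of unique integration sites, which is the unit the downstream distribution analysis works on.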
Sikkema-Raddatz, B.; Johansson, L.F.; de Boer, E.N.; Almomani, R.; Boven, L.G.; van den Berg, M.P.; van Spaendonck-Zwarts, K.Y.; van Tintelen, J.P.; Sijmons, R.H.; Jongbloed, J.D.H.; Sinke, R.J.
Mutation detection through exome sequencing allows simultaneous analysis of all coding sequences of genes. However, it cannot yet replace Sanger sequencing (SS) in diagnostics because of incomplete representation and coverage of exons leading to missing clinically relevant mutations. Targeted
Ustek, Duran; Sirma, Sema; Gumus, Ergun; Arikan, Muzaffer; Cakiris, Aris; Abaci, Neslihan; Mathew, Jaicy; Emrence, Zeliha; Azakli, Hulya; Cosan, Fulya; Cakar, Atilla; Parlak, Mahmut; Kursun, Olcay
One application of next-generation sequencing (NGS) is the targeted resequencing of genes of interest, which has not previously been used in viral integration site analysis for gene therapy applications. Here, we combined a targeted sequence capture array with next-generation sequencing to profile viral integration sites across the whole genome. Human 293T and K562 cells were transduced with an HIV-1-derived vector. A custom-made DNA probe set targeting the pLVTHM vector was used to capture lentiviral vector/human genome junctions. The captured DNA was sequenced using the GS FLX platform. Seven thousand four hundred and eighty-four human genome sequences flanking the long terminal repeats (LTR) of pLVTHM fragment sequences met the criteria of at least 98% identity and a minimum of 50 bp in both cells. In total, 203 unique integration sites were identified. The integrations in both cell lines were distant from CpG islands and transcription start sites and were preferentially located in introns. A comparison between the two cell lines showed that the lentiviral-transduced DNA does not have the same preferred regions in the two different cell lines. Copyright © 2012 Elsevier B.V. All rights reserved.
Smith, David J.; Burton, Aaron; Castro-Wallace, Sarah; John, Kristen; Stahl, Sarah E.; Dworkin, Jason Peter; Lupisella, Mark L.
On the International Space Station (ISS), technologies capable of rapid microbial identification and disease diagnostics are not currently available. NASA still relies upon sample return for comprehensive, molecular-based sample characterization. Next-generation DNA sequencing is a powerful approach for identifying microorganisms in air, water, and surfaces onboard spacecraft. The Biomolecule Sequencer payload, manifested to SpaceX-9 and scheduled on the Increment 4748 research plan (June 2016), will assess the functionality of a commercially available next-generation DNA sequencer in the microgravity environment of the ISS. The MinION device from Oxford Nanopore Technologies (Oxford, UK) measures picoamp changes in electrical current that depend on the nucleotide sequence of the DNA strand migrating through nanopores in the system. The hardware is exceptionally small (9.5 x 3.2 x 1.6 cm), lightweight (120 grams), and powered only by a USB connection. For the ISS technology demonstration, the Biomolecule Sequencer will be powered by a Microsoft Surface Pro3. Ground-prepared samples containing lambda bacteriophage, Escherichia coli, and mouse genomic DNA will be launched and stored frozen on the ISS until experiment initiation. Immediately prior to sequencing, a crew member will collect and thaw frozen DNA samples, connect the sequencer to the Surface Pro3, inject thawed samples into a MinION flow cell, and initiate sequencing. At the completion of the sequencing run, data will be downlinked for ground analysis. Identical, synchronous ground controls will be used for data comparisons to determine sequencer functionality, run-time sequence, current dynamics, and overall accuracy. We will present our latest results from the ISS flight experiment (the first time DNA has ever been sequenced in space) and discuss the many potential applications of the Biomolecule Sequencer for environmental monitoring, medical diagnostics, higher fidelity and more adaptable Space Biology Human
Sie, Daoud; Snijders, Peter J F; Meijer, Gerrit A; Doeleman, Marije W; van Moorsel, Marinda I H; van Essen, Hendrik F; Eijk, Paul P; Grünberg, Katrien; van Grieken, Nicole C T; Thunnissen, Erik; Verheul, Henk M; Smit, Egbert F; Ylstra, Bauke; Heideman, Daniëlle A M
Next generation DNA sequencing (NGS) holds promise for diagnostic applications, yet implementation in routine molecular pathology practice requires performance evaluation on DNA derived from routine formalin-fixed paraffin-embedded (FFPE) tissue specimens. The current study presents a comprehensive analysis of TruSeq Amplicon Cancer Panel-based NGS using a MiSeq Personal sequencer (TSACP-MiSeq-NGS) for somatic mutation profiling. TSACP-MiSeq-NGS (testing 212 hotspot mutation amplicons of 48 genes) and a data analysis pipeline were evaluated in a retrospective learning/test set approach (n = 58/n = 45 FFPE-tumor DNA samples) against 'gold standard' high-resolution-melting (HRM)-sequencing for the genes KRAS, EGFR, BRAF and PIK3CA. Next, the performance of the validated test algorithm was assessed in an independent, prospective cohort of FFPE-tumor DNA samples (n = 75). In the learning set, a number of minimum parameter settings was defined to decide whether a FFPE-DNA sample is qualified for TSACP-MiSeq-NGS and for calling mutations. The resulting test algorithm revealed 82% (37/45) compliance to the quality criteria and 95% (35/37) concordant assay findings for KRAS, EGFR, BRAF and PIK3CA with HRM-sequencing (kappa = 0.92; 95% CI = 0.81-1.03) in the test set. Subsequent application of the validated test algorithm to the prospective cohort yielded a success rate of 84% (63/75), and a high concordance with HRM-sequencing (95% (60/63); kappa = 0.92; 95% CI = 0.84-1.01). TSACP-MiSeq-NGS detected 77 mutations in 29 additional genes. TSACP-MiSeq-NGS is suitable for diagnostic gene mutation profiling in oncopathology.
Roux, Camille; Pannell, John R
Many eukaryote organisms are polyploid. However, despite their importance, evolutionary inference of polyploid origins and modes of inheritance has been limited by a need for analyses of allele segregation at multiple loci using crosses. The increasing availability of sequence data for nonmodel species now allows the application of established approaches for the analysis of genomic data in polyploids. Here, we ask whether approximate Bayesian computation (ABC), applied to realistic traditional and next-generation sequence data, allows correct inference of the evolutionary and demographic history of polyploids. Using simulations, we evaluate the robustness of evolutionary inference by ABC for tetraploid species as a function of the number of individuals and loci sampled, and the presence or absence of an outgroup. We find that ABC adequately retrieves the recent evolutionary history of polyploid species on the basis of both old and new sequencing technologies. The application of ABC to sequence data from diploid and polyploid species of the plant genus Capsella confirms its utility. Our analysis strongly supports an allopolyploid origin of C. bursa-pastoris about 80 000 years ago. This conclusion runs contrary to previous findings based on the same data set but using an alternative approach and is in agreement with recent findings based on whole-genome sequencing. Our results indicate that ABC is a promising and powerful method for revealing the evolution of polyploid species, without the need to attribute alleles to a homeologous chromosome pair. The approach can readily be extended to more complex scenarios involving higher ploidy levels. © 2015 John Wiley & Sons Ltd.
Suzuki, Takako; Kawada, Jun-Ichi; Okuno, Yusuke; Hayano, Satoshi; Horiba, Kazuhiro; Torii, Yuka; Takahashi, Yoshiyuki; Umetsu, Syuichiro; Sogo, Tsuyoshi; Inui, Ayano; Ito, Yoshinori
Pediatric acute liver failure (PALF) is a rare and severe syndrome that frequently requires liver transplantation. Viruses are one of the most frequent causes of this disease; however, the pathogenic virus is not identified in many patients. Recently, next-generation sequencing (NGS) has been applied to comprehensively detect pathogens of infectious diseases of unknown etiology. This study aimed to evaluate an NGS-based approach for detecting pathogenic viruses in patients with PALF or acute hepatitis of unknown etiology. To detect virus-derived DNA and RNA sequences in sera/plasma from patients, both DNA and RNA sequencing were performed. First, we validated the ability of NGS to detect viral pathogens in clinical serum/plasma samples and compared different commercial RNA library preparation methods. Then, serum/plasma samples from fourteen patients with PALF or acute hepatitis of unknown etiology were evaluated using NGS. Among three RNA library preparation methods, Ovation RNA-Seq System V2 had the highest sensitivity for detecting RNA viral sequences. Among the fourteen patients, sequence reads of torque teno virus, adeno-associated virus, and stealth virus were found in the sera of one patient each; however, the pathophysiological role of these three viruses was not clarified. Significant virus reads were not detected in the remaining 11 patients. This finding might be due to a low virus titer in blood at the time of referral, or a non-infectious cause might be more frequent. These results suggest that an NGS-based approach has the potential to detect viral pathogens in clinical samples and would contribute to clarifying the etiology of PALF. Copyright © 2017 Elsevier B.V. All rights reserved.
Bhat, Javaid A; Ali, Sajad; Salgotra, Romesh K; Mir, Zahoor A; Dutta, Sutapa; Jadon, Vasudha; Tyagi, Anshika; Mushtaq, Muntazir; Jain, Neelu; Singh, Pradeep K; Singh, Gyanendra P; Prabhu, K V
Genomic selection (GS) is a promising approach that exploits molecular genetic markers to design novel breeding programs and to develop new marker-based models for genetic evaluation. In plant breeding, it provides opportunities to increase the genetic gain of complex traits per unit time and cost. The cost-benefit balance was an important consideration for GS to work in crop plants. The most important factor for its successful and effective implementation in crop species was the availability of genome-wide, high-throughput, cost-effective and flexible markers that have low ascertainment bias and are suitable for large population sizes as well as for both model and non-model crop species, with or without a reference genome sequence. These requirements were the major limitations of earlier marker systems, viz. SSR and array-based, and meeting them was unimaginable before the availability of next-generation sequencing (NGS) technologies, which have provided novel SNP genotyping platforms, especially genotyping by sequencing. These marker technologies have changed the entire scenario of marker applications and made the use of GS routine for crop improvement in both model and non-model crop species. NGS-based genotyping has increased genomic-estimated breeding value prediction accuracies over other established marker platforms in cereals and other crop species, and has made GS a reality in crop breeding. But to harness the true benefits of GS, these marker technologies will need to be combined with high-throughput phenotyping to achieve valuable genetic gain for complex traits. Moreover, the continuous decline in sequencing cost will make whole-genome sequencing (WGS) feasible and cost-effective for GS in the near future. Until then, targeted sequencing seems to be the more cost-effective option for large-scale marker discovery and GS, particularly for large and un-decoded genomes.
Results from numerous linkage and association studies have greatly deepened scientists' understanding of the genetic basis of many human diseases, yet some important questions remain unanswered. For example, although a large number of disease-associated loci have been identified from genome-wide association studies (GWAS) in the past 10 years, it is challenging to interpret these results, as most disease-associated markers have no clear functional roles in disease etiology, and all the identified genomic factors only explain a small portion of disease heritability. With the help of next-generation sequencing (NGS), diverse types of genomic and epigenetic variations can be detected with high accuracy. More importantly, instead of using linkage disequilibrium to detect association signals based on a set of pre-set probes, NGS allows researchers to directly study all the variants in each individual, and therefore promises opportunities for identifying functional variants and a more comprehensive dissection of disease heritability. Although the current scale of NGS studies is still limited due to the high cost, the success of several recent studies suggests the great potential for applying NGS in genomic epidemiology, especially as the cost of sequencing continues to drop. In this review, we discuss several pioneer applications of NGS, summarize scientific discoveries for rare and complex diseases, and compare various study designs including targeted sequencing and whole-genome sequencing using population-based and family-based cohorts. Finally, we highlight recent advancements in statistical methods proposed for sequencing analysis, including group-based association tests, meta-analysis techniques, and annotation tools for variant prioritization.
Mathias, Patrick C; Turner, Emily H; Scroggins, Sheena M; Salipante, Stephen J; Hoffman, Noah G; Pritchard, Colin C; Shirts, Brian H
To apply techniques for ancestry and sex computation from next-generation sequencing (NGS) data as an approach to confirm sample identity and detect sample processing errors. We combined a principal component analysis method with k-nearest neighbors classification to compute the ancestry of patients undergoing NGS testing. By combining this calculation with X chromosome copy number data, we determined the sex and ancestry of patients for comparison with self-report. We also modeled the sensitivity of this technique in detecting sample processing errors. We applied this technique to 859 patient samples with reliable self-report data. Our k-nearest neighbors ancestry screen had an accuracy of 98.7% for patients reporting a single ancestry. Visual inspection of principal component plots was consistent with self-report in 99.6% of single-ancestry and mixed-ancestry patients. Our model demonstrates that approximately two-thirds of potential sample swaps could be detected in our patient population using this technique. Patient ancestry can be estimated from NGS data incidentally sequenced in targeted panels, enabling an inexpensive quality control method when coupled with patient self-report. © American Society for Clinical Pathology, 2016. All rights reserved.
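The classification step in this abstract (k-nearest neighbors applied to principal component coordinates, then comparison with self-report) might look roughly like the sketch below. It assumes the principal components have already been computed elsewhere; the reference panel, the labels "A" and "B", and both function names are hypothetical and are not taken from the study.

```python
import math
from collections import Counter

def knn_ancestry(query_pc, reference, k=3):
    """Predict a label for query_pc by majority vote of its k nearest
    reference samples. reference is a list of (pc_coords, label) pairs."""
    ranked = sorted(reference, key=lambda item: math.dist(query_pc, item[0]))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

def possible_sample_error(computed_label, self_reported_label):
    """Flag a sample when computed ancestry disagrees with self-report."""
    return computed_label != self_reported_label

# Hypothetical reference panel: two well-separated clusters in PC space.
reference = [
    ((0.0, 0.0), "A"), ((0.1, 0.0), "A"), ((0.0, 0.1), "A"),
    ((5.0, 5.0), "B"), ((5.1, 5.0), "B"), ((5.0, 5.1), "B"),
]
predicted = knn_ancestry((0.05, 0.05), reference, k=3)
flagged = possible_sample_error(predicted, "B")
```

A flag raised this way is only a prompt for review; the abstract itself notes that roughly two-thirds of potential swaps would be detectable in the studied population.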
Willems, Sander; Fraiture, Marie-Alice; Deforce, Dieter; De Keersmaecker, Sigrid C J; De Loose, Marc; Ruttink, Tom; Herman, Philippe; Van Nieuwerburgh, Filip; Roosens, Nancy
Because the number and diversity of genetically modified (GM) crops has significantly increased, their analysis based on real-time PCR (qPCR) methods is becoming increasingly complex and laborious. While several pioneers already investigated Next Generation Sequencing (NGS) as an alternative to qPCR, its practical use has not been assessed for routine analysis. In this study a statistical framework was developed to predict the number of NGS reads needed to detect transgene sequences, to prove their integration into the host genome and to identify the specific transgene event in a sample with known composition. This framework was validated by applying it to experimental data from food matrices composed of pure GM rice, processed GM rice (noodles) or a 10% GM/non-GM rice mixture, revealing some influential factors. Finally, feasibility of NGS for routine analysis of GM crops was investigated by applying the framework to samples commonly encountered in routine analysis of GM crops. Copyright © 2015 The Authors. Published by Elsevier Ltd.. All rights reserved.
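To give a feel for what predicting the number of reads needed can involve, here is a deliberately simple sketch. It is not the statistical framework developed in the study: it assumes reads are sampled uniformly and independently from the genome, ignores read length and edge effects, and only asks how many reads are needed to observe at least one transgene-derived read with a given confidence. All names and numbers are illustrative assumptions.

```python
import math

def transgene_read_fraction(gm_fraction, transgene_length, genome_length):
    """Approximate probability that a random read comes from the transgene
    (assumes uniform sampling; an illustrative simplification)."""
    return gm_fraction * transgene_length / genome_length

def reads_needed(p, confidence=0.95):
    """Smallest n with P(at least one transgene read) >= confidence,
    i.e. 1 - (1 - p)**n >= confidence, for 0 < p < 1."""
    if not 0.0 < p < 1.0 or not 0.0 < confidence < 1.0:
        raise ValueError("p and confidence must lie strictly between 0 and 1")
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p))

# Hypothetical example: 10% GM material, 4 kb transgene, 400 Mb genome.
p = transgene_read_fraction(0.10, 4_000, 400_000_000)
n = reads_needed(p, confidence=0.95)
```

Proving integration into the host genome or identifying a specific event would require reads spanning the junction, which makes the effective target much smaller than the whole transgene and the required read count correspondingly larger.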
Alic, Andy S; Blanquer, Ignacio
Usually, the information known a priori about a newly sequenced organism is limited. Even resequencing the same organism can generate unpredictable output. We introduce MuffinInfo, a FastQ/Fasta/SAM information extractor implemented in HTML5 capable of offering insights into next-generation sequencing (NGS) data. Our new tool can run on any software or hardware environment, in command line or graphically, and in browser or standalone. It presents information such as average length, base distribution, quality scores distribution, k-mer histogram, and homopolymers analysis. MuffinInfo improves upon the existing extractors by adding the ability to save and then reload the results obtained after a run as a navigable file (also supporting saving pictures of the charts), by supporting custom statistics implemented by the user, and by offering user-adjustable parameters involved in the processing, all in one software. At the moment, the extractor works with all base space technologies such as Illumina, Roche, Ion Torrent, Pacific Biosciences, and Oxford Nanopore. Owing to HTML5, our software demonstrates the readiness of web technologies for mild intensive tasks encountered in bioinformatics.
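The kinds of statistics listed in this abstract (average length, base distribution, k-mer histogram, homopolymer analysis) can be illustrated with a short sketch. MuffinInfo itself is implemented in HTML5; the Python below is only an assumed, simplified illustration of the computations for well-formed four-line FASTQ records, and none of the function names come from the tool.

```python
from collections import Counter
from itertools import groupby

def read_fastq(lines):
    """Yield (sequence, quality) from well-formed 4-line FASTQ records."""
    lines = [ln.rstrip("\n") for ln in lines]
    for i in range(0, len(lines) - 3, 4):
        yield lines[i + 1], lines[i + 3]

def summarize(seqs, k=3):
    """Compute simple per-dataset statistics from an iterable of sequences."""
    seqs = list(seqs)
    base_counts = Counter()
    kmer_counts = Counter()
    longest_homopolymer = 0
    for s in seqs:
        base_counts.update(s)
        kmer_counts.update(s[i:i + k] for i in range(len(s) - k + 1))
        for _, run in groupby(s):  # consecutive identical bases
            longest_homopolymer = max(longest_homopolymer, len(list(run)))
    average_length = sum(map(len, seqs)) / len(seqs) if seqs else 0.0
    return {
        "average_length": average_length,
        "base_counts": base_counts,
        "kmer_counts": kmer_counts,
        "longest_homopolymer": longest_homopolymer,
    }

fastq = ["@r1", "ACGTAAAA", "+", "IIIIIIII", "@r2", "ACGT", "+", "IIII"]
stats = summarize(seq for seq, _ in read_fastq(fastq))
```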
Roy-Chowdhuri, Sinchita; Roy, Somak; Monaco, Sara E; Routbort, Mark J; Pantanowitz, Liron
The rapid adoption of next-generation sequencing (NGS) in clinical molecular laboratories has redefined the practice of cytopathology. Instead of simply being used as a diagnostic tool, cytopathology has evolved into a practice providing important genomic information that guides clinical management. The recent emphasis on maximizing limited-volume cytology samples for ancillary molecular studies, including NGS, requires cytopathologists not only to be more involved in specimen collection and processing techniques but also to be aware of downstream testing and informatics issues. For the integration of molecular informatics into the clinical workflow, it is important to understand the computational components of the NGS workflow by which raw sequence data are transformed into clinically actionable genomic information and to address the challenges of having a robust and sustainable informatics infrastructure for NGS-based testing in a clinical environment. Adapting to needs ranging from specimen procurement to report delivery is crucial for the optimal utilization of cytology specimens to accommodate requests from clinicians to improve patient care. This review presents a broad overview of the various aspects of informatics in the context of NGS-based testing of cytology specimens. Cancer Cytopathol 2017;125:236-244. © 2016 American Cancer Society.
Patro, Jennifer N; Ramachandran, Padmini; Barnaba, Tammy; Mammel, Mark K; Lewis, Jada L; Elkins, Christopher A
be accurate and true. Those products containing live microbials report both identity and viability on most product labels. This study used next-generation sequencing technology as an analytical tool in conjunction with classic culture methods to examine the validity of the labels on supplement products containing live microbials found in the United States marketplace. Our results show the importance of testing these products for identity, viability, and potential contaminants, as well as introduce a new culture-independent diagnostic approach for testing these products.
Cottrell, Catherine E; Al-Kateb, Hussam; Bredemeyer, Andrew J; Duncavage, Eric J; Spencer, David H; Abel, Haley J; Lockwood, Christina M; Hagemann, Ian S; O'Guin, Stephanie M; Burcea, Lauren C; Sawyer, Christopher S; Oschwald, Dayna M; Stratman, Jennifer L; Sher, Dorie A; Johnson, Mark R; Brown, Justin T; Cliften, Paul F; George, Bijoy; McIntosh, Leslie D; Shrivastava, Savita; Nguyen, Tudung T; Payton, Jacqueline E; Watson, Mark A; Crosby, Seth D; Head, Richard D; Mitra, Robi D; Nagarajan, Rakesh; Kulkarni, Shashikant; Seibert, Karen; Virgin, Herbert W; Milbrandt, Jeffrey; Pfeifer, John D
Currently, oncology testing includes molecular studies and cytogenetic analysis to detect genetic aberrations of clinical significance. Next-generation sequencing (NGS) allows rapid analysis of multiple genes for clinically actionable somatic variants. The WUCaMP assay uses targeted capture for NGS analysis of 25 cancer-associated genes to detect mutations at actionable loci. We present clinical validation of the assay and a detailed framework for design and validation of similar clinical assays. Deep sequencing of 78 tumor specimens (≥ 1000× average unique coverage across the capture region) achieved high sensitivity for detecting somatic variants at low allele fraction (AF). Validation revealed sensitivities and specificities of 100% for detection of single-nucleotide variants (SNVs) within coding regions, compared with SNP array sequence data (95% CI = 83.4-100.0 for sensitivity and 94.2-100.0 for specificity) or whole-genome sequencing (95% CI = 89.1-100.0 for sensitivity and 99.9-100.0 for specificity) of HapMap samples. Sensitivity for detecting variants at an observed 10% AF was 100% (95% CI = 93.2-100.0) in HapMap mixes. Analysis of 15 masked specimens harboring clinically reported variants yielded concordant calls for 13/13 variants at AF of ≥ 15%. The WUCaMP assay is a robust and sensitive method to detect somatic variants of clinical significance in molecular oncology laboratories, with reduced time and cost of genetic analysis allowing for strategic patient management. Copyright © 2014 American Society for Investigative Pathology and the Association for Molecular Pathology. Published by Elsevier Inc. All rights reserved.
RNA-sequencing is a powerful tool in studying RNomics. However, highly abundant ribosomal RNAs (rRNA) and transfer RNAs (tRNA) predominate in the sequencing reads, thereby hindering the study of lowly expressed genes. Therefore, rRNA depletion prior to sequencing is often performed in order to preserve subtle alterations in gene expression, especially those at relatively low expression levels. One of the commercially available methods is to use DNA or RNA probes to hybridize to the target RNAs. However, there is always a concern about non-specific binding and unintended removal of messenger RNA (mRNA) when the same set of probes is applied to different organisms. The degree of such unintended mRNA removal varies among organisms due to organism-specific genomic variation. We developed a computer-based method to design probes to deplete rRNA in an organism-specific manner. Based on the computation results, biotinylated RNA probes were produced by in vitro transcription and were used to perform rRNA depletion with subtractive hybridization. We demonstrated that the designed probes for 16S rRNAs and 23S rRNAs can efficiently remove rRNAs from Mycobacterium smegmatis. In comparison with a commercial subtractive hybridization-based rRNA removal kit, using organism-specific probes is better at preserving RNA integrity and abundance. We believe the computer-based design approach can be used as a generic method in preparing RNA from any organism for next-generation sequencing, particularly for the transcriptome analysis of microbes.
Nimwegen, K.J.M. van; Soest, R.A.; Veltman, J.A.; Nelen, M.R.; Wilt, G.J. van der; Peart-Vissers, L.E.L.M.; Grutters, J.P.C.
BACKGROUND: The substantial technological advancements in next-generation sequencing (NGS), combined with dropping costs, have allowed for a swift diffusion of NGS applications in clinical settings. Although several commercial parties report to have broken the $1000 barrier for sequencing an entire
mmi.oregonstate.edu/ccgl LONG - TERM GOALS We are developing next-generation sequencing and digital (d)PCR methodology for detection and species...ubiquitous DNA sequencing for surveys of biodiversity more efficient and affordable in the near future. RELATED PROJECTS None to date.
Ploem, Corrette; Dondorp, Wybo; de Wert, Guido; Hennekam, Raoul
Next-generation sequencing (NGS) involves the laying down of the sequence of the entire genome or exome at one time. This technique is expected to become one of the approaches in diagnostic testing. The genetically determined vulnerability of individuals to disorder and their response to treatment
Application of next-generation sequencing (NGS) technology to routine clinical practice has enabled characterization of personalized cancer genomes to identify patients likely to have a response to targeted therapy. The proper selection of tumor sample for downstream NGS-based mutational analysis is critical to generate accurate results and to guide therapeutic intervention. However, multiple pre-analytic factors come into play in determining the success of NGS testing. In this review, we discuss pre-analytic requirements for AmpliSeq PCR-based sequencing using the Ion Torrent Personal Genome Machine (PGM) (Life Technologies), an NGS platform that is often used by clinical laboratories for sequencing solid tumors because of its low input DNA requirement from formalin-fixed, paraffin-embedded tissue. The success of NGS mutational analysis is affected not only by the input DNA quantity but also by several other factors, including the specimen type, the DNA quality, and the tumor cellularity. Here, we review tissue requirements for solid tumor NGS-based mutational analysis, including procedure types, tissue types, tumor volume and fraction, decalcification, and treatment effects.
With the prevalence of chromatin immunoprecipitation (ChIP) with sequencing (ChIP-Seq) technology, a massive amount of ChIP-Seq data has accumulated. ChIP-Seq technology measures the genome-wide occupancy of DNA-binding proteins in vivo. It is well known that different DNA-binding protein occupancies may result in a gene being regulated differently under different conditions (e.g. different cell types). To fully understand a gene's function, it is essential to develop probabilistic models on multiple ChIP-Seq profiles for deciphering the gene transcription causalities. In this work, we propose and describe two probabilistic models. Assuming the conditional independence of different DNA-binding proteins' occupancies, the first method (SignalRanker) is developed as an intuitive method for ChIP-Seq genome-wide signal profile inference. Unfortunately, such an assumption may not always hold in some gene regulation cases. Thus, we propose and describe another method (FullSignalRanker) which does not make the conditional independence assumption. The proposed methods are compared with other existing methods on ENCODE ChIP-Seq datasets, demonstrating their regression and classification ability. The results suggest that FullSignalRanker is the best-performing method for recovering the signal ranks on the promoter and enhancer regions. In addition, FullSignalRanker is also the best-performing method for peak sequence classification. We envision that SignalRanker and FullSignalRanker will become important in the era of next-generation sequencing. The FullSignalRanker program is available on the following website: http://www.cs.toronto.edu/∼wkc/FullSignalRanker/ © 2015 IEEE.
Gómez, Juan; Gil-Peña, Helena; Santos, Fernando; Coto, Eliecer; Arango, Ana; Hernandez, Olaya; Rodríguez, Julián; Nadal, Inmaculada; Cantos, Virginia; Chocrón, Sara; Vergara, Inés; Madrid, Álvaro; Vazquez, Carlos; González, Luz E; Blanco, Fiona
Primary distal renal tubular acidosis (DRTA) is a rare disease caused by loss-of-function mutations in at least three genes (ATP6V0A4, ATP6V1B1, and SLC4A1) involved in urinary distal acidification. The next-generation sequencing (NGS) technique facilitates the search for mutations in DRTA patients and helps to characterize the genetic and clinical spectrum of the disease. Ten DRTA patients were studied. They had normal serum anion gap (AG), metabolic acidosis with simultaneous positive urinary AG, and inability to maximally acidify the urine. The exons of the three genes were sequenced in two pools by ultrasequencing. Putative mutations were confirmed by corresponding Sanger sequencing of each exon. We found 13 mutations in nine patients. ATP6V0A4: Intron16+2insA; p.R807Q; p.Q276fs; p.P395fs; Intron7-2T>C. ATP6V1B1: p.I386fs; p.R394Q. SLC4A1: p.V245M; p.R589C; p.R589H; p.G609A. One case was a compound heterozygous with a known mutation in ATP6V1B1 (p.G609R) and a pathogenic variation at SLC4A1 (p.E508K). One patient was negative for mutations. This study evidences that NGS is labor and cost effective for the analysis of DRTA genes. Our results show for the first time SLC4A1 gene mutations in Spanish patients and disclose that compound heterozygosity at two different genes can be responsible for DRTA.
Glen J Weiss
New anticancer agents that target a single cell surface receptor, up-regulated or amplified gene product, or mutated gene have met with some success in treating advanced cancers. However, patients' tumors still eventually progress on these therapies. If it were possible to identify a larger number of targetable vulnerabilities in an individual's tumor, multiple targets could be exploited with the use of specific therapeutic agents, thus possibly giving the patient viable therapeutic alternatives. In this exploratory study, we used next-generation sequencing (NGS) technologies, including whole genome sequencing (WGS) and, where feasible, whole transcriptome sequencing (WTS), to identify genomic events and associated expression changes in advanced cancer patients. WGS on paired tumor and normal samples from nine advanced cancer patients and WTS on six of these patients' tumors was completed. One patient's treatment was based on targets and pathways identified by NGS, and the patient had a short-lived PET/CT response with a significant reduction in his tumor-related pain. In designing treatment plans based on information garnered from NGS, several challenges were encountered: NGS reporting delays, communication of results to out-of-state participants and their treating oncologists, and chain of custody handling for fresh biopsy samples for Clinical Laboratory Improvement Amendments (CLIA) target validation. While the initial effort was a slower process than anticipated due to a variety of issues, we demonstrate the feasibility of using NGS in advanced cancer patients so that treatments for patients with progressing tumors may be improved.
Juanchich, Amelie; Bardou, Philippe; Rué, Olivier; Gabillard, Jean-Charles; Gaspin, Christine; Bobe, Julien; Guiguen, Yann
MicroRNAs (miRNAs) have emerged as important post-transcriptional regulators of gene expression in a wide variety of physiological processes. They can control both temporal and spatial gene expression and are believed to regulate 30 to 70% of the genes. Data are, however, limited for fish species, with only 9 out of the 30,000 fish species present in miRBase. The aim of the current study was to discover and characterize rainbow trout (Oncorhynchus mykiss) miRNAs in a large number of tissues using next-generation sequencing in order to provide an extensive repertoire of rainbow trout miRNAs. A total of 38 different samples corresponding to 16 different tissues or organs were individually sequenced and analyzed independently in order to identify a large number of miRNAs with high confidence. This led to the identification of 2946 miRNA loci in the rainbow trout genome, including 445 already known miRNAs. Differential expression analysis was performed in order to identify miRNAs exhibiting specific or preferential expression among the 16 analyzed tissues. In most cases, miRNAs exhibit a specific pattern of expression in only a few tissues. The expression data from sRNA sequencing were confirmed by RT-qPCR. In addition, novel miRNAs are described in rainbow trout that had not been previously reported in other species. This study represents the first characterization of the rainbow trout miRNA transcriptome from a wide variety of tissues and establishes an extensive repertoire of rainbow trout miRNAs. It provides a starting point for future studies aimed at understanding the roles of miRNAs in major physiological processes such as growth, reproduction or adaptation to stress. This rainbow trout miRNA repertoire provides a novel resource to advance genomic research in salmonid species.
In functional metagenomics, BLAST homology search is a common method to classify metagenomic reads into protein/domain sequence families such as Clusters of Orthologous Groups of proteins (COGs) in order to quantify the abundance of each COG in the community. The resulting functional profile of the community is then used in downstream analysis to correlate the change in abundance with environmental perturbation, clinical variation, and so on. However, the short read length associated with next-generation sequencing technologies poses a barrier to this approach, essentially because similarity significance cannot be discerned by searching with short reads. Consequently, artificial functional families are produced, and those with a large number of assigned reads dramatically decrease the accuracy of the functional profile. No method has been available to address this problem, and this paper aims to fill that gap. We revealed that BLAST similarity scores of homologues for short reads from the coding sequences of COG protein members are distributed differently from the scores of those derived elsewhere. We showed that, by choosing an appropriate score cut-off, we are able to filter out most artificial families while preserving sufficient information to build the functional profile. We also showed that, by combining BLAST and RPS-BLAST, some artificial families with large read counts can be further identified after the score cut-off filtration. In an evaluation on three experimental metagenomic datasets with different coverages, we found that the proposed method is robust against read coverage and consistently outperforms the other E-value cut-off methods currently used in the literature.
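The score cut-off idea described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' method: hits are given as hypothetical `(read_id, cog_id, bit_score)` tuples, each read is assigned to its best-scoring COG, and the cut-off value is an arbitrary placeholder rather than one derived from the score distributions the abstract refers to.

```python
from collections import Counter

def functional_profile(hits, score_cutoff):
    """Count reads per COG, keeping each read's best hit at or above the cutoff.

    hits: iterable of (read_id, cog_id, bit_score); the layout is assumed
    for illustration and is not a BLAST output format.
    """
    best = {}
    for read_id, cog_id, score in hits:
        if score < score_cutoff:
            continue  # discard low-scoring, likely spurious assignments
        if read_id not in best or score > best[read_id][1]:
            best[read_id] = (cog_id, score)
    return Counter(cog for cog, _ in best.values())

hits = [
    ("r1", "COG0001", 55.0),
    ("r1", "COG0002", 40.0),  # weaker hit for the same read
    ("r2", "COG0001", 30.0),  # below the cutoff
    ("r3", "COG0003", 60.0),
]
profile = functional_profile(hits, score_cutoff=45.0)
```

Lowering the cut-off keeps more reads but admits more artificial families, which is the trade-off the abstract describes.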
Khairat, Rabab; Ball, Markus; Chang, Chun-Chi Hsieh; Bianucci, Raffaella; Nerlich, Andreas G; Trautmann, Martin; Ismail, Somaia; Shanab, Gamila M L; Karim, Amr M; Gad, Yehia Z; Pusch, Carsten M
We applied, for the first time, next-generation sequencing (NGS) technology on Egyptian mummies. Seven NGS datasets obtained from five randomly selected Third Intermediate to Graeco-Roman Egyptian mummies (806 BC-124AD) and two unearthed pre-contact Bolivian lowland skeletons were generated and characterised. The datasets were contrasted to three recently published NGS datasets obtained from cold-climate regions, i.e. the Saqqaq, the Denisova hominid and the Alpine Iceman. Analysis was done using one million reads of each newly generated or published dataset. Blastn and megablast results were analysed using MEGAN software. Distinct NGS results were replicated by specific and sensitive polymerase chain reaction (PCR) protocols in ancient DNA dedicated laboratories. Here, we provide unambiguous identification of authentic DNA in Egyptian mummies. The NGS datasets showed variable contents of endogenous DNA harboured in tissues. Three of five mummies displayed a human DNA proportion comparable to the human read count of the Saqqaq permafrost-preserved specimen. Furthermore, a metagenomic signature unique to mummies was displayed. By applying a "bacterial fingerprint", discrimination among mummies and other remains from warm areas outside Egypt was possible. Due to the absence of an adequate environment monitoring, a bacterial bloom was identified when analysing different biopsies from the same mummies taken after a lapse of time of 1.5 years. Plant kingdom representation in all mummy datasets was unique and could be partially associated with their use in embalming materials. Finally, NGS data showed the presence of Plasmodium falciparum and Toxoplasma gondii DNA sequences, indicating malaria and toxoplasmosis in these mummies. We demonstrate that endogenous ancient DNA can be extracted from mummies and serve as a proper template for the NGS technique, thus, opening new pathways of investigation for future genome sequencing of ancient Egyptian individuals.
Wecker, Thomas; Hoffmeier, Klaus; Plötner, Anne; Grüning, Björn Andreas; Horres, Ralf; Backofen, Rolf; Reinhard, Thomas; Schlunck, Günther
Extracellular microRNAs (miRNAs) in aqueous humor were suggested to have a role in transcellular signaling and may serve as disease biomarkers. The authors adopted next-generation sequencing (NGS) techniques to further characterize the miRNA profile in single samples of 60 to 80 μL human aqueous humor. Samples were obtained at the outset of cataract surgery in nine independent, otherwise healthy eyes. Four samples were used to extract RNA and generate sequencing libraries, followed by an adapter-driven amplification step, electrophoretic size selection, sequencing, and data analysis. Five samples were used for quantitative PCR (qPCR) validation of NGS results. Published NGS data on circulating miRNAs in blood were analyzed in comparison. One hundred fifty-eight miRNAs were consistently detected by NGS in all four samples; an additional 59 miRNAs were present in at least three samples. The aqueous humor miRNA profile shows some overlap with published NGS-derived inventories of circulating miRNAs in blood plasma with high prevalence of human miR-451a, -21, and -16. In contrast to blood, miR-184, -4448, -30a, -29a, -29c, -19a, -30d, -205, -24, -22, and -3074 were detected among the 20 most prevalent miRNAs in aqueous humor. Relative expression patterns of miR-451a, -202, and -144 suggested by NGS were confirmed by qPCR. Our data illustrate the feasibility of miRNA analysis by NGS in small individual aqueous humor samples. Intraocular cells as well as blood plasma contribute to the extracellular aqueous humor miRNome. The data suggest possible roles of miRNA in intraocular cell adhesion and signaling by TGF-β and Wnt, which are important in intraocular pressure regulation and glaucoma.
Tawari, Nilesh R; Seow, Justine Jia Wen; Dharuman, Perumal; Ow, Jack L; Ang, Shimin; Devasia, Arun George; Ng, Pauline C
ChronQC is a quality control (QC) tracking system for clinical implementation of next-generation sequencing (NGS). ChronQC generates time series plots for various QC metrics to allow comparison of current runs to historical runs. ChronQC has multiple features for tracking QC data including Westgard rules for clinical validity, laboratory-defined thresholds, and historical observations within a specified time period. Users can record their notes and corrective actions directly onto the plots for long-term recordkeeping. ChronQC facilitates regular monitoring of clinical NGS to enable adherence to high quality clinical standards. ChronQC is freely available on GitHub (https://github.com/nilesh-tawari/ChronQC), Docker (https://hub.docker.com/r/nileshtawari/chronqc/) and the Python Package Index. ChronQC is implemented in Python and runs on all common operating systems (Windows, Linux, and Mac OS X). © The Author (2017). Published by Oxford University Press. All rights reserved.
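As an illustration of how Westgard-style rules compare a current run to historical runs, the following minimal sketch checks two commonly used rules: 1_3s (one value more than 3 standard deviations from the historical mean) and 2_2s (two consecutive values more than 2 standard deviations away on the same side). This is not ChronQC code and does not use its API; the function name `westgard_flags` and the toy metric values are hypothetical.

```python
from statistics import mean, stdev


def westgard_flags(history, current_runs):
    """Return (index, rule) pairs for 1_3s and 2_2s violations.

    history: QC metric values from past runs, used to estimate mean and SD.
    current_runs: recent values of the same metric, in chronological order.
    """
    mu = mean(history)
    sd = stdev(history)
    z = [(x - mu) / sd for x in current_runs]
    flags = []
    for i, zi in enumerate(z):
        # 1_3s: a single observation beyond 3 SD from the historical mean.
        if abs(zi) > 3:
            flags.append((i, "1_3s"))
        # 2_2s: two consecutive observations beyond 2 SD on the same side.
        if i > 0 and ((zi > 2 and z[i - 1] > 2) or (zi < -2 and z[i - 1] < -2)):
            flags.append((i, "2_2s"))
    return flags


# Hypothetical metric values (for example, a per-run quality percentage).
history = [98, 100, 102, 100, 99, 101, 100, 100]
current = [100.5, 104.0, 99.0, 102.6, 102.7]
flags = westgard_flags(history, current)
```

With these toy values, the second run trips the 1_3s rule and the last run trips the 2_2s rule because it follows another run that is also more than 2 SD above the mean.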
Profaizer, T; Lázár-Molnár, E; Close, D W; Delgado, J C; Kumánovics, A
Implementation of human leukocyte antigen (HLA) genotyping by next-generation sequencing (NGS) in the clinical lab brings new challenges to the laboratories performing this testing. With the advent of commercially available HLA-NGS typing kits, labs must make numerous decisions concerning capital equipment and address labor considerations. Therefore, careful and unbiased evaluation of available methods is imperative. In this report, we compared our in-house developed HLA NGS typing with two commercially available kits from Illumina and Omixon using 10 International Histocompatibility Working Group (IHWG) and 36 clinical samples. Although all three methods employ long range polymerase chain reaction (PCR) and have been developed on the Illumina MiSeq platform, the methodologies for library preparation show significant variations. There was 100% typing concordance between all three methods at the first field when a HLA type could be assigned. Overall, HLA typing by NGS using in-house or commercially available methods is now feasible in clinical laboratories. However, technical variables such as hands-on time and indexing strategies are sufficiently different among these approaches to impact the workflow of the clinical laboratory. © 2016 John Wiley & Sons A/S. Published by John Wiley & Sons Ltd.
Viviana Cobos Jiménez
Macrophages are important for mounting inflammatory responses to tissue damage or infection by invading pathogens, and therefore modulation of their cellular functions is essential for the success of the immune system as well as for maintaining tissue homeostasis. Small non-coding RNAs are important regulatory elements of gene expression and microRNAs are the most widely known to be fundamental for the proper development of cells of the immune system. Macrophages can exhibit different phenotypes, depending on the cytokine environment they encounter in the affected tissues. We have analyzed the microRNA expression profiles during maturation of human primary monocytes into macrophages and polarization by pro- or anti-inflammatory cytokines. Here we describe the analysis of next-generation sequencing data deposited in EMBL–EBI ArrayExpress under accession number E-MTAB-1969 and associated with the study published by Cobos Jiménez and collaborators in Physiological Genomics in 2014 (1). The data presented here contribute to our understanding of microRNA expression profiles in human monocytes and macrophages and will also serve as a resource for novel microRNAs and other small RNA species expressed in these cells.
Sun, Jun; Meng, Zhefeng; Wu, Kaiqi; Liu, Biao; Zhang, Sufang; Liu, Yudan; Wang, Yuezhu; Zheng, Huajun; Huang, Jian; Zhou, Pingyu
Syphilis is a systemic sexually transmitted disease caused by Treponema pallidum ssp. pallidum (TPA). The origin and genetic background of Chinese TPA strains remain unclear. We identified a total of 329 single-nucleotide variants (SNVs) in eight Chinese TPA strains using next-generation sequencing. All of the TPA strains were clustered into three lineages, and Chinese TPA strains were grouped in Lineage 2 based on phylogenetic analysis. The phylogeographical data showed that TPA strains originated earlier than did T. pallidum ssp. pertenue (TPE) and T. pallidum ssp. endemicum (TPN) strains and that Chinese TPA strains might be derived from recombination between Lineage 1 and Lineage 3. Moreover, we found through a homology modeling analysis that a nonsynonymous substitution (I415F) in the PBP3 protein might affect the structural flexibility of PBP3 and the binding constant for substrates based on its possible association with penicillin resistance in T. pallidum. Our findings provide new insight into the molecular foundation of the evolutionary origin of TPA and support the development of novel diagnostic/therapeutic technology for syphilis.
Spencer, Thomas E.; Palmarini, Massimo
Endogenous retroviruses (ERVs) are present in the genome of all vertebrates and are remnants of ancient exogenous retroviral infections of the host germline transmitted vertically from generation to generation. The sheep genome contains 27 JSRV-related endogenous betaretroviruses (enJSRVs) related to the pathogenic Jaagsiekte sheep retrovirus (JSRV) that have been integrating in the host genome for the last 5 to 7 million years. The exogenous JSRV is a causative agent of a transmissible lung cancer in sheep, and enJSRVs are able to protect the host against JSRV infection. In sheep, the enJSRVs are most abundantly expressed in the uterine epithelia as well as in the conceptus (embryo and associated extraembryonic membranes) trophectoderm. Sixteen of the 27 enJSRV loci contain an envelope (env) gene with an intact open reading frame, and in utero loss-of-function experiments found the enJSRVs Env to be essential for trophoblast outgrowth and conceptus elongation. Collectively, available evidence supports the ideas that genes captured from ancestral retroviruses were pivotal in the acquisition of new, important functions in mammalian evolution and were positively selected for biological roles in genome plasticity, protection of the host against infection of related pathogenic and exogenous retroviruses, and a convergent physiological role in placental morphogenesis and thus mammalian reproduction. The discovery of ERVs in mammals was initially based on molecular cloning discovery techniques and will be boosted forward by next generation sequencing technologies and in silico discovery techniques. PMID:22951118
Hastreiter, Maximilian; Jeske, Tim; Hoser, Jonathan; Kluge, Michael; Ahomaa, Kaarin; Friedl, Marie-Sophie; Kopetzky, Sebastian J; Quell, Jan-Dominik; Werner Mewes, H-; Küffner, Robert
Analysis of Next Generation Sequencing (NGS) data requires the processing of large datasets by chaining various tools with complex input and output formats. In order to automate data analysis, we propose to standardize NGS tasks into modular workflows. This simplifies reliable handling and processing of NGS data, and corresponding solutions become substantially more reproducible and easier to maintain. Here, we present a documented, Linux-based toolbox of 42 processing modules that are combined to construct workflows facilitating a variety of tasks such as DNAseq and RNAseq analysis. We also describe important technical extensions. The high throughput executor (HTE) helps to increase the reliability and to reduce manual interventions when processing complex datasets. We also provide a dedicated binary manager that assists users in obtaining the modules' executables and keeping them up to date. As basis for this actively developed toolbox we use the workflow management software KNIME. See http://ibisngs.github.io/knime4ngs for nodes and user manual (GPLv3 license). © The Author (2017). Published by Oxford University Press. All rights reserved.
Arjun K. Mishra
Affinity maturation is the process whereby the immune system generates antibodies of higher affinities during a response to antigen. It is unique in being the only evolutionary mechanism known to operate on a molecule in an organism’s own body. Deciphering the structural mechanisms through which somatic mutations in antibody genes increase affinity is critical to understanding the evolution of immune repertoires. Next-generation sequencing (NGS) has allowed the reconstruction of antibody clonal lineages in response to viral pathogens, such as HIV-1, which was not possible in earlier studies of affinity maturation. Crystal structures of antibodies from these lineages bound to their target antigens have revealed, at the atomic level, how antibodies evolve to penetrate the glycan shield of envelope glycoproteins, and how viruses in turn evolve to escape neutralization. Collectively, structural studies of affinity maturation have shown that increased antibody affinity can arise from any one or any combination of multiple diverse mechanisms, including improved shape complementarity at the interface with antigen, increased buried surface area upon complex formation, additional interfacial polar or hydrophobic interactions, and preorganization or rigidification of the antigen-binding site.
Ariede, Raquel B; Freitas, Milena V; Hata, Milene E; Matrochirico-Filho, Vito A; Utsunomia, Ricardo; Mendonça, Fernando F; Foresti, Fausto; Porto-Foresti, Fábio; Hashimoto, Diogo T
Tambaqui (Colossoma macropomum) is a fish species from the Amazon and Orinoco Rivers, with favorable characteristics to the cultivation system and great market acceptance in South America. However, the construction of a genetic map for the genetic improvement of this species is limited by the low number of molecular markers currently described. Thus, this study aimed to validate gene-associated and anonymous (non-genic) microsatellites obtained by next generation sequencing (RNA-seq and whole genome shotgun-WGS, respectively), for future construction of a genetic map and search for quantitative trait loci (QTL) in this species. In the RNA-seq data, the observed and expected heterozygosity (H o and H e ) ranged from 0.09 to 0.73, and 0.09 to 0.85, respectively. In the WGS data, H o and H e ranged from 0.33 to 0.95, and 0.28 to 0.92, respectively. In general, the evaluation of 200 markers resulted in 45 polymorphic loci, of which 14 were gene-associated (RNA-Seq) and 31 were anonymous (WGS). Moreover, some markers were related to genes of the immune system, biological regulation/control and biogenesis. This study contributes to increase the number of molecular markers available for genetic studies in C. macropomum, which will allow the development of breeding programs assisted by molecular markers.
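The observed and expected heterozygosity values (H o and H e) reported above can be computed per locus as sketched below. This is a minimal illustration, not the software used in the study: the function name and the toy genotypes are hypothetical, and the expected value uses the simple estimator 1 minus the sum of squared allele frequencies, without the small-sample correction that some population genetics packages apply.

```python
from collections import Counter


def heterozygosity(genotypes):
    """Return (observed, expected) heterozygosity for one diploid locus.

    genotypes: list of (allele_a, allele_b) tuples, one per individual,
    for example microsatellite allele sizes in base pairs.
    """
    n = len(genotypes)
    # Observed: fraction of individuals carrying two different alleles.
    observed = sum(1 for a, b in genotypes if a != b) / n
    # Expected under Hardy-Weinberg: 1 - sum of squared allele frequencies.
    allele_counts = Counter(allele for g in genotypes for allele in g)
    total_alleles = 2 * n
    expected = 1.0 - sum((c / total_alleles) ** 2 for c in allele_counts.values())
    return observed, expected


# Four hypothetical individuals genotyped at one microsatellite locus.
genotypes = [(150, 154), (150, 150), (154, 154), (150, 154)]
ho, he = heterozygosity(genotypes)
```

In this toy locus, two of four individuals are heterozygous and both alleles have frequency 0.5, so both values equal 0.5.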
Asai, Sneha; Ianora, Adrianna; Lauritano, Chiara; Lindeque, Penelope K; Carotenuto, Ylenia
Despite the ecological importance of copepods, few Next Generation Sequencing studies (NGS) have been performed on small crustaceans, and a standard method for RNA extraction is lacking. In this study, we compared three commonly-used methods: TRIzol®, Aurum Total RNA Mini Kit and Qiagen RNeasy Micro Kit, in combination with preservation reagents TRIzol® or RNAlater®, to obtain high-quality and quantity of RNA from copepods for NGS. Total RNA was extracted from the copepods Calanus helgolandicus, Centropages typicus and Temora stylifera and its quantity and quality were evaluated using NanoDrop, agarose gel electrophoresis and Agilent Bioanalyzer. Our results demonstrate that preservation of copepods in RNAlater® and extraction with Qiagen RNeasy Micro Kit were the optimal isolation method for high-quality and quantity of RNA for NGS studies of C. helgolandicus. Intriguingly, C. helgolandicus 28S rRNA is formed by two subunits that separate after heat-denaturation and migrate along with 18S rRNA. This unique property of protostome RNA has never been reported in copepods. Overall, our comparative study on RNA extraction protocols will help increase gene expression studies on copepods using high-throughput applications, such as RNA-Seq and microarrays. Copyright © 2014 Elsevier B.V. All rights reserved.
There is an increasing need to calibrate microbial community profiles obtained through next generation sequencing (NGS) with relevant taxonomic identities of the microbes, and to further associate these identities with phenotypic attributes. Phenotype Microarray (PM) techniques provide a semi-high throughput assay for characterizing and monitoring microbial cellular phenotypes. Here, we present detailed descriptions of two different PM protocols used in our recent studies on fungal endophytes of forest trees, and highlight the benefits and limitations of this technique. We found that the PM approach enables effective screening of substrate utilization by endophytes. However, the technical limitations are multifaceted and the interpretation of the PM data is challenging. For the best result, we recommend that the growth conditions for the fungi are carefully standardized. In addition, rigorous replication and control strategies should be employed whether using pre-configured, commercial microwell-plates or in-house designed PM plates for targeted substrate analyses. With these precautions, the PM technique is a valuable tool to characterize the metabolic capabilities of individual endophyte isolates, or successional endophyte communities identified by NGS, allowing a functional interpretation of the taxonomic data. Thus, PM approaches can provide valuable complementary information for NGS studies of fungal endophytes in forest trees.
Ruiz Salas, Amalio; Peña Hernández, José; Medina Palomo, Carmen; Barrera Cordero, Alberto; Cabrera Bueno, Fernando; García Pinilla, José Manuel; Guijarro, Ana; Morcillo-Hidalgo, Luis; Jiménez Navarro, Manuel; Gómez Doblas, Juan José; de Teresa, Eduardo; Alzueta, Javier
Arrhythmogenic right ventricular cardiomyopathy (ARVC) is an inherited cardiomyopathy characterized by progressive fibrofatty replacement of predominantly right ventricular myocardium. This cardiomyopathy is a frequent cause of sudden cardiac death in young people and athletes. The aim of our study was to determine the incidence of pathological or likely pathological desmosomal mutations in patients with high-risk definite ARVC. This was an observational, retrospective cohort study, which included 36 patients diagnosed with high-risk ARVC in our hospital between January 1998 and January 2015. Genetic analysis was performed using next-generation sequencing. Most patients were male (28 patients, 78%) with a mean age at diagnosis of 45 ± 18 years. A pathogenic or probably pathogenic desmosomal mutation was detected in 26 of the 35 index cases (74%): 5 nonsense, 14 frameshift, 1 splice, and 6 missense. Novel mutations were found in 15 patients (71%). The presence or absence of desmosomal mutations causing the disease and the type of mutation were not associated with specific electrocardiographic, clinical, arrhythmic, anatomic, or prognostic characteristics. The incidence of pathological or likely pathological desmosomal mutations in ARVC is very high, with most mutations causing truncation. The presence of desmosomal mutations was not associated with prognosis. Copyright © 2017 Sociedad Española de Cardiología. Published by Elsevier España, S.L.U. All rights reserved.
Mathieu, Ghislaine; Groisman, Iris Jaitovich; Godard, Beatrice
The use of next generation sequencing (NGS) technologies in psychiatric genetics research and its potential to generate individual research results will likely have far reaching implications for predictive and diagnostic practices. The extent of this impact may not be easily understood by psychiatric research participants during the consent process. The traditional consent process for studies involving human subjects does not address critical issues specific to NGS research, such as the return of results. We examined which type of research findings should be communicated, how this information should be conveyed during the consent process and what guidance is required by researchers and IRBs to help psychiatric research participants understand the peculiarities, the limits and the impact of NGS. Strong standards are needed to ensure appropriate use of data generated by NGS, to meet participants' expectations and needs, and to clarify researchers' duties regarding the disclosure of data and their subsequent management. In the short term, researchers and IRBs need to be proactive in revising current consent processes that deal with the disclosure of research findings.
Du, Xuefei; Jiang, Xiao; Ye, Yanhua; Guo, Baofu; Wang, Wei; Ding, Jie; Xie, Guoxiang
Salmonella Schwarzengrund is most frequently isolated from poultry meat and can cause human infections. S. Schwarzengrund was isolated from diarrheal patients in a food poisoning event in Nanjing, China. Three strains isolated from patients were microbiologically confirmed as S. Schwarzengrund. Salmonella strains from spiced donkey meat were also confirmed as S. Schwarzengrund. Epidemiological investigation showed evidence of a correlation between the consumption of spiced donkey meat and those cases. Pulsed field gel electrophoresis (PFGE), antibiotic susceptibility testing and next generation sequencing (NGS) were employed to investigate this food poisoning event. The 3 strains isolated from patients and the strain isolated from the spiced donkey meat showed the same results in PFGE and antibiotic susceptibility testing, and no SNPs were observed between these 4 strains in NGS analysis. NGS data could be used in the confirmation of an outbreak and in the tracing of contamination. However, the standard for defining an outbreak with NGS remains a challenge in practice, and NGS data should be used in combination with other data in epidemiological investigation. Copyright © 2017. Published by Elsevier B.V.
BRCA germline mutations are the most common predisposing factor in familial breast-ovarian cancer syndrome families. However, many screened patients are identified as harboring BRCA variants of uncertain significance (VUS), rather than carrying deleterious germline mutations [Calo et al.: Cancers 2010; 2:1644–1660]. While such VUSs are typically reclassified as benign polymorphisms, this may occur years after the VUS is first identified [Murray et al.: Genet Med 2011; 13; 998–1005]. Loss of heterozygosity (LOH) of BRCA is nearly always the gatekeeper event in inherited BRCA-related breast cancer and LOH of BRCA is rare in sporadic cancers [Osorio et al.: Int J Cancer 2002; 99:305–309]. Here, we describe a patient identified as carrying a germline BRCA VUS. Tumor next-generation sequencing (NGS) demonstrated a very high mutation allelic frequency for that BRCA VUS, consistent with LOH. This case illustrates that since BRCA LOH is the typical mechanism of transformation in inherited BRCA-related breast cancers, NGS might be used to suggest that the BRCA VUS is actually cancer predisposing in a particular family. As a result, this may help patients make more informed decisions regarding screening and prophylactic therapy, long before official reclassification of the VUS occurs.
Iacocca, Michael A; Wang, Jian; Dron, Jacqueline S; Robinson, John F; McIntyre, Adam D; Cao, Henian; Hegele, Robert A
Familial hypercholesterolemia (FH) is a heritable condition of severely elevated LDL cholesterol, caused predominantly by autosomal codominant mutations in the LDL receptor gene ( LDLR ). In providing a molecular diagnosis for FH, the current procedure often includes targeted next-generation sequencing (NGS) panels for the detection of small-scale DNA variants, followed by multiplex ligation-dependent probe amplification (MLPA) in LDLR for the detection of whole-exon copy number variants (CNVs). The latter is essential because ∼10% of FH cases are attributed to CNVs in LDLR ; accounting for them decreases false negative findings. Here, we determined the potential of replacing MLPA with bioinformatic analysis applied to NGS data, which uses depth-of-coverage analysis as its principal method to identify whole-exon CNV events. In analysis of 388 FH patient samples, there was 100% concordance in LDLR CNV detection between these two methods: 38 reported CNVs identified by MLPA were also successfully detected by our NGS method, while 350 samples negative for CNVs by MLPA were also negative by NGS. This result suggests that MLPA can be removed from the routine diagnostic screening for FH, significantly reducing associated costs, resources, and analysis time, while promoting more widespread assessment of this important class of mutations across diagnostic laboratories. Copyright © 2017 by the American Society for Biochemistry and Molecular Biology, Inc.
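Depth-of-coverage analysis for whole-exon CNVs can be sketched as follows. This is a simplified illustration, not the authors' bioinformatic method: the function name, the median-based normalisation, the exon labels, and the ratio thresholds of 0.75 and 1.25 are all assumptions. The underlying idea is that a heterozygous whole-exon deletion is expected to give roughly half the normalised depth of a reference set, and a duplication roughly 1.5 times that depth.

```python
from statistics import median


def call_exon_cnvs(sample_depth, reference_depths, del_ratio=0.75, dup_ratio=1.25):
    """Classify each exon as 'deletion', 'duplication' or 'normal'.

    sample_depth: dict mapping exon name to mean read depth in the test sample.
    reference_depths: list of such dicts from samples assumed to be CNV-free.
    """
    def normalise(depths):
        # Divide by the sample's median exon depth to remove run-to-run
        # differences in overall sequencing yield.
        centre = median(depths.values())
        return {exon: d / centre for exon, d in depths.items()}

    sample_norm = normalise(sample_depth)
    ref_norms = [normalise(r) for r in reference_depths]
    calls = {}
    for exon, value in sample_norm.items():
        baseline = median(r[exon] for r in ref_norms)
        ratio = value / baseline
        if ratio < del_ratio:
            calls[exon] = "deletion"
        elif ratio > dup_ratio:
            calls[exon] = "duplication"
        else:
            calls[exon] = "normal"
    return calls


exons = ["exon1", "exon2", "exon3", "exon4"]
references = [
    {e: 800 for e in exons},
    {e: 1000 for e in exons},
    {e: 1200 for e in exons},
]
# Hypothetical sample with about half the expected depth over exon2.
sample = {"exon1": 1000, "exon2": 500, "exon3": 1000, "exon4": 1000}
calls = call_exon_cnvs(sample, references)
```

Real data would need more care than this sketch provides, for example handling of GC content, capture efficiency per exon, and exons with low baseline coverage.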
Qian, Xiaoqin; Hou, Jiayi; Wang, Zheng; Ye, Yi; Lang, Min; Gao, Tianzhen; Liu, Jing; Hou, Yiping
There is high demand for forensic pedigree searches with Y-chromosome short tandem repeat (Y-STR) profiling in large-scale crime investigations. However, when two Y-STR haplotypes have a few mismatched loci, it is difficult to determine if they are from the same male lineage because of the high mutation rate of Y-STRs. Here we design a new strategy to handle cases in which none of the pedigree samples shares an identical Y-STR haplotype. We combine next generation sequencing (NGS), capillary electrophoresis and pyrosequencing under the term 'NGS+' for typing Y-STRs and Y-chromosomal single nucleotide polymorphisms (Y-SNPs). The high-resolution Y-SNP haplogroup and Y-STR haplotype can be obtained with NGS+. We further developed a new data-driven decision rule, FSindex, for estimating the likelihood for each retrieved pedigree. Our approach enables positive identification of a pedigree from mismatched Y-STR haplotypes. It is envisaged that NGS+ will revolutionize forensic pedigree searches, especially when the person of interest is not recorded in the forensic DNA database.
Vianna, Juliana A.; Noll, Daly; Mura-Jornet, Isidora; Valenzuela-Guerra, Paulina; González-Acuña, Daniel; Navarro, Cristell; Loyola, David E.; Dantas, Gisele P. M.
Microsatellites are valuable molecular markers for evolutionary and ecological studies. Next generation sequencing is responsible for the increasing number of microsatellites for non-model species. The Pygoscelis genus comprises three penguin species: Adélie (P. adeliae), Chinstrap (P. antarcticus) and Gentoo penguin (P. papua), all distributed around Antarctica and the sub-Antarctic. The species have been affected differently by climate change, and the use of microsatellite markers will be crucial to monitor population dynamics. We characterized a large set of genome-wide microsatellites and evaluated polymorphisms in all three species. SOLiD reads were generated from the libraries of each species, identifying a large number of microsatellite loci: 33,677, 35,265 and 42,057 for P. adeliae, P. antarcticus and P. papua, respectively. A large number of dinucleotide (66,139), trinucleotide (29,490) and tetranucleotide (11,849) microsatellites are described. Microsatellite abundance, diversity and orthology were characterized in penguin genomes. We evaluated polymorphisms in 170 tetranucleotide loci, obtaining 34 polymorphic loci in at least one species and 15 polymorphic loci in all three species, which allow comparative studies. Polymorphic markers presented here enable a number of ecological, population, individual identification, parentage and evolutionary studies of Pygoscelis, with potential use in other penguin species. PMID:28898354
Black, Michael; Wang, Wenzhi; Wang, Wei
Stroke is a major cause of mortality and morbidity in both the developed and developing world. Next generation sequencing (NGS) and multi-omics integrative biology research offer new opportunities in the way we research and understand stroke. These biotechnologies also signal a shift from genetics to genomics of stroke, which is highlighted in this review. Stroke is a focal neurological deficit resulting from disruption of the cerebral blood supply. There are two main types of common stroke, ischemic stroke (IS), which comprises 80% of cases, and hemorrhagic stroke (HS) that accounts for about 20% of cases. IS is a complex multi-factorial disease with multiple environmental and genomic determinants. We discuss here IS from genomics and bioinformatics perspectives, including the highlights of the genome wide association studies (GWAS), NGS progress to date, and exome studies. While both 'common variant, common disease' and 'rare variant, common disease' approaches need to be assessed in tandem, future studies into IS omics should also consider pedigree and/or community based sampling to take account of the complex diversity of IS genetics. We conclude by presenting an example of such community genomics research from China in an extended pedigree sample, and the ways in which the intersection of genomics and global society can usefully inform our understanding of IS pathophysiology and potential preventive medicine interventions in the future.
Katalin Komlosi MD, PhD
Next-generation sequencing (NGS) panels are used widely in clinical diagnostics to identify genetic causes of various monogenic disease groups including neurometabolic disorders and, more recently, lysosomal storage disorders (LSDs). Many new challenges have been introduced through these new technologies, both at the laboratory level and at the bioinformatics level, with consequences including new requirements for interpretation of results, and for genetic counseling. We review some recent examples of the application of NGS technologies, with purely diagnostic and with both diagnostic and research aims, for establishing a rapid genetic diagnosis in LSDs. Given that NGS can be applied in a way that takes into account the many issues raised by international consensus guidelines, it can have a significant role even early in the course of the diagnostic process, in combination with biochemical and clinical data. Besides decreasing the delay in diagnosis for many patients, a precise molecular diagnosis is extremely important as new therapies are becoming available within the LSD spectrum for patients who share specific types of mutations. A genetic diagnosis is also the prerequisite for genetic counseling, family planning, and the individual choice of reproductive options in affected families.
Yu, Hui; Zhang, Victor Wei; Stray-Pedersen, Asbjørg; Hanson, Imelda Celine; Forbes, Lisa R; de la Morena, M Teresa; Chinn, Ivan K; Gorman, Elizabeth; Mendelsohn, Nancy J; Pozos, Tamara; Wiszniewski, Wojciech; Nicholas, Sarah K; Yates, Anne B; Moore, Lindsey E; Berge, Knut Erik; Sorte, Hanne; Bayer, Diana K; ALZahrani, Daifulah; Geha, Raif S; Feng, Yanming; Wang, Guoli; Orange, Jordan S; Lupski, James R; Wang, Jing; Wong, Lee-Jun
Primary immunodeficiency diseases (PIDDs) are inherited disorders of the immune system. The most severe form, severe combined immunodeficiency (SCID), presents with profound deficiencies of T cells, B cells, or both at birth. If not treated promptly, affected patients usually do not live beyond infancy because of infections. Genetic heterogeneity of SCID frequently delays the diagnosis; a specific diagnosis is crucial for life-saving treatment and optimal management. We developed a next-generation sequencing (NGS)-based multigene-targeted panel for SCID and other severe PIDDs requiring rapid therapeutic actions in a clinical laboratory setting. The target gene capture/NGS assay provides an average read depth of approximately 1000×. The deep coverage facilitates simultaneous detection of single nucleotide variants and exonic copy number variants in one comprehensive assessment. Exons with insufficient coverage (diagnostic yield of severe primary immunodeficiency. Establishing a molecular diagnosis enables early immune reconstitution through prompt therapeutic intervention and guides management for improved long-term quality of life. Copyright © 2016 American Academy of Allergy, Asthma & Immunology. Published by Elsevier Inc. All rights reserved.
While Next-Generation Sequencing (NGS) can now be considered an established analysis technology for research applications across the life sciences, the analysis workflows still require substantial bioinformatics expertise. Typical challenges include the appropriate selection of analytical software tools, the speedup of the overall procedure using HPC parallelization and acceleration technology, the development of automation strategies, data storage solutions and finally the development of methods for full exploitation of the analysis results across multiple experimental conditions. Recently, NGS has begun to expand into clinical environments, where it facilitates diagnostics enabling personalized therapeutic approaches, but is also accompanied by new technological, legal and ethical challenges. There are probably as many overall concepts for the analysis of the data as there are academic research institutions. Among these concepts are, for instance, complex IT architectures developed in-house, ready-to-use technologies installed on-site as well as comprehensive Everything as a Service (XaaS) solutions. In this mini-review, we summarize the key points to consider in the setup of the analysis architectures, mostly for scientific rather than diagnostic purposes, and provide an overview of the current state of the art and challenges of the field.
Li, Heng; Homer, Nils
Rapidly evolving sequencing technologies produce data on an unparalleled scale. A central challenge to the analysis of this data is sequence alignment, whereby sequence reads must be compared to a reference. A wide variety of alignment algorithms and software have been subsequently developed over the past two years. In this article, we will systematically review the current development of these algorithms and introduce their practical applications on different types of experimental data. We come to the conclusion that short-read alignment is no longer the bottleneck of data analyses. We also consider future development of alignment algorithms with respect to emerging long sequence reads and the prospect of cloud computing.
Korneliussen, Thorfinn Sand; Moltke, Ida; Albrechtsen, Anders
A number of different statistics are used for detecting natural selection using DNA sequencing data, including statistics that are summaries of the frequency spectrum, such as Tajima's D. These statistics are now often being applied in the analysis of Next Generation Sequencing (NGS) data. However......, estimates of frequency spectra from NGS data are strongly affected by low sequencing coverage; the inherent technology dependent variation in sequencing depth causes systematic differences in the value of the statistic among genomic regions....
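As an illustration of the kind of frequency-spectrum summary mentioned above, the following Python sketch computes Tajima's D from its standard textbook definition. It is not the estimator proposed in the abstract: it assumes fully called genotypes with no missing data (exactly the setting that low-coverage NGS violates), and the function names `nucleotide_diversity` and `tajimas_d` are hypothetical.

```python
import math


def nucleotide_diversity(alt_counts, n):
    """Mean number of pairwise differences (pi).

    alt_counts holds, for each segregating site, how many of the n
    sampled sequences carry the alternative allele.
    """
    pairs = n * (n - 1) / 2
    return sum(k * (n - k) for k in alt_counts) / pairs


def tajimas_d(n, num_segregating, pi):
    """Tajima's D for n sequences, S segregating sites and diversity pi."""
    # For n < 4 the variance term is identically zero, so D is undefined.
    if n < 4 or num_segregating < 1:
        raise ValueError("need n >= 4 sequences and at least one segregating site")
    s = num_segregating
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    # Numerator: difference between the two estimators of theta.
    return (pi - s / a1) / math.sqrt(e1 * s + e2 * s * (s - 1))


# Usage: nine singleton sites in ten sequences (an excess of rare alleles).
counts = [1] * 9
d = tajimas_d(10, len(counts), nucleotide_diversity(counts, 10))
```

An excess of rare alleles pulls pi below S/a1 and gives a negative D, which is why depth-dependent miscalling of rare variants can shift the statistic between genomic regions.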
Liao, Peizhou; Satten, Glen A; Hu, Yi-Juan
Inferring population structure is important for both population genetics and genetic epidemiology. Principal components analysis (PCA) has been effective in ascertaining population structure with array genotype data but can be difficult to use with sequencing data, especially when low depth leads to uncertainty in called genotypes. Because PCA is sensitive to differences in variability, PCA using sequencing data can result in components that correspond to differences in sequencing quality (read depth and error rate), rather than differences in population structure. We demonstrate that even existing methods for PCA specifically designed for sequencing data can still yield biased conclusions when used with data having sequencing properties that are systematically different across different groups of samples (i.e. sequencing groups). This situation can arise in population genetics when combining sequencing data from different studies, or in genetic epidemiology when using historical controls such as samples from the 1000 Genomes Project. To allow inference on population structure using PCA in these situations, we provide an approach that is based on using sequencing reads directly without calling genotypes. Our approach is to adjust the data from different sequencing groups to have the same read depth and error rate so that PCA does not generate spurious components representing sequencing quality. To accomplish this, we have developed a subsampling procedure to match the depth distributions in different sequencing groups, and a read-flipping procedure to match the error rates. We average over subsamples and read flips to minimize loss of information. We demonstrate the utility of our approach using two datasets from 1000 Genomes, and further evaluate it using simulation studies. TASER-PC software is publicly available at http://web1.sph.emory.edu/users/yhu30/software.html. Supplementary data are available at Bioinformatics online.
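The two adjustments named in this abstract, subsampling reads to a common depth and flipping reads to raise the error rate, can be illustrated with a small Python sketch. This is not the TASER-PC implementation: it assumes biallelic sites with reads coded 0 (reference) or 1 (alternative), a single fixed target depth rather than a matched depth distribution, and hypothetical function names.

```python
import random


def subsample_reads(reads, target_depth, rng):
    """Draw at most target_depth reads at one site, without replacement."""
    if len(reads) <= target_depth:
        return list(reads)
    return rng.sample(reads, target_depth)


def flip_reads(reads, flip_prob, rng):
    """Flip each 0/1 read call with probability flip_prob.

    Applied to the lower-error sequencing group, this raises its
    effective error rate toward that of the noisier group.
    """
    return [1 - r if rng.random() < flip_prob else r for r in reads]


# Usage: bring a 40x site down to 8x, then add about 1% extra error.
rng = random.Random(0)
site_reads = [0] * 30 + [1] * 10
adjusted = flip_reads(subsample_reads(site_reads, 8, rng), 0.01, rng)
```

Because both steps are random, repeating them and averaging over the results, as the abstract describes, limits the information lost to any single draw.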
Cheng, Lihua; Lu, Wei; Kulkarni, Bhushan; Pejovic, Tanja; Yan, Xiaowei; Chiang, Jung-Hsien; Hood, Leroy; Odunsi, Kunle; Lin, Biaoyang
To understand the chemotherapy response program in ovarian cancer cells at deep transcript sequencing levels. Two next-generation sequencing technologies--MPSS (massively parallel signature sequencing) and SBS (sequencing by synthesis)--were used to sequence the transcripts of IGROV1 and IGROV1-CP cells, and to sequence the transcripts of a highly chemotherapy responsive and a highly chemotherapy resistant ovarian cancer tissue. We identified 3422 signatures (2957 genes) that are significantly different between IGROV1 and IGROV1-CP cells (P<0.001). Gene Ontology (GO) term GO:0001837 (epithelial-to-mesenchymal transition) and GO:0034330 (cell junction assembly and maintenance) are enriched in genes that are over expressed in IGROV1-CP cells while apoptosis-related GO terms are enriched in genes over expressed in IGROV1 cells. We identified 1187 tags (corresponding to 1040 genes) that are differentially expressed between the chemotherapy responsive and the persistently chemotherapy resistant ovarian cancer tissues. GO term GO:0050673 (epithelial cell proliferation) and GO:0050678 (regulation of epithelial cell proliferation) are enriched in the genes over expressed in the chemotherapy resistant tissue while the GO:0007229 (integrin-mediated signaling pathway) is enriched in the genes over expressed in the chemotherapy sensitive tissue. An integrative analysis identified 111 common differentially expressed genes including two bone morphogenetic proteins (BMP4 and BMP7), six solute carrier proteins (SLC10A3, SLC16A3, SLC25A1, SLC35B3, SLC7A5 and SLC7A7), transcription factor POU5F1 (POU class 5 homeobox 1), and KLK10 (kallikrein-related peptidase 10). A network analysis revealed a subnetwork with three genes BMP7, NR2F2 and AP2B1 that were consistently over expressed in the chemoresistant tissue or cells compared to the chemosensitive tissue or cells. Our database offers the first comprehensive view of the digital transcriptomes of ovarian cancer cell lines and tissues
Next-Generation Sequencing in Gynaecological Tumours : The Prognostic and Predictive Value of the Most Common Mutations Found in Ovarian, Endometrial, and Cervical Tumours: Literature Review and the University Medical Centre Utrecht Next-Generation Sequencing Data
van Winkel, Eline; de Leng, Wendy W.J.; Witteveen, Petronella O.; Jonges, Trudy G N; Willems, Stefan M.; Langenberg, Marlies H.G.
Objective: To investigate whether next-generation sequencing (NGS) in ovarian and endometrial tumours can discover mutations with a relevant prognostic or predictive value. Methods: After a literature search, selected studies were critically appraised using the Quality in Prognostic Studies tool.
van Amerongen, Rosa A; Retèl, Valesca P; Coupé, Veerle MH; Nederlof, Petra M; Vogel, Maartje J; van Harten, Wim H
Next-generation sequencing (NGS) has reached the molecular diagnostic laboratories. Although the NGS technology aims to improve the effectiveness of therapies by selecting the most promising therapy, concerns are that NGS testing is expensive and that the ‘benefits’ are not yet in relation to these costs. In this study, we give an estimation of the costs and an institutional and national budget impact of various types of NGS tests in non-small-cell lung cancer (NSCLC) and melanoma patients within The Netherlands. First, an activity-based costing (ABC) analysis has been conducted on the costs of two examples of NGS panels (small- and medium-targeted gene panel (TGP)) based on data of The Netherlands Cancer Institute (NKI). Second, we performed a budget impact analysis (BIA) to estimate the current (2015) and future (2020) budget impact of NGS on molecular diagnostics for NSCLC and melanoma patients in The Netherlands. Literature, expert opinions, and a data set of patients within the NKI (n = 172) have been included in the BIA. Based on our analysis, we expect that the NGS test cost concerns will be limited. In the current situation, NGS can indeed result in higher diagnostic test costs, which is mainly related to required additional tests besides the small TGP. However, in the future, we expect that the use of whole-genome sequencing (WGS) will increase, for which it is expected that additional tests can be (partly) avoided. Although the current clinical benefits are expected to be limited, the research potentials of NGS are already an important advantage. PMID:27899957
Danielle Mercatante Carrick
Next Generation Sequencing (NGS) technologies are used to detect somatic mutations in tumors and study germ line variation. Most NGS studies use DNA isolated from whole blood or fresh frozen tissue. However, formalin-fixed paraffin-embedded (FFPE) tissues are one of the most widely available clinical specimens. Their potential utility as a source of DNA for NGS would greatly enhance population-based cancer studies. While preliminary studies suggest FFPE tissue may be used for NGS, the feasibility of using archived FFPE specimens in population-based studies and the effect of storage time on these specimens need to be determined. We conducted a study to determine whether DNA in archived FFPE high-grade ovarian serous adenocarcinomas from Surveillance, Epidemiology and End Results (SEER) registries Residual Tissue Repositories (RTR) was present in sufficient quantity and quality for NGS assays. Fifty-nine FFPE tissues, stored from 3 to 32 years, were obtained from three SEER RTR sites. DNA was extracted, quantified, quality assessed, and subjected to whole exome sequencing (WES). Following DNA extraction, 58 of 59 specimens (98%) yielded DNA and moved on to the library generation step followed by WES. Specimens stored for longer periods of time had significantly lower coverage of the target region (6% lower per 10 years, 95% CI: 3-10%) and lower average read depth (40x lower per 10 years, 95% CI: 18-60), although sufficient quality and quantity of WES data was obtained for data mining. Overall, 90% (53/59) of specimens provided usable NGS data regardless of storage time. This feasibility study demonstrates that FFPE specimens acquired from SEER registries after varying lengths of storage time and under varying storage conditions are a promising source of DNA for NGS.
Hurley, C K; Hou, L; Lazaro, A; Gerfen, J; Enriquez, E; Galarza, P; Rodriguez Cardozo, M B; Halagan, M; Maiers, M; Behm, D; Ng, J
Next generation DNA sequencing is used to determine the HLA-A, -B, -C, -DRB1, and -DQB1 assignments of 1472 unrelated volunteers for the unrelated donor registry in Argentina. The analysis characterized all HLA exons and introns for class I alleles; at least exons 2, 3 for HLA-DRB1; and exons 2 to 6 for HLA-DQB1. Of the distinct alleles present, there are 330 class I and 98 class II. The majority (~98%) of the cumulative allele frequency at each locus is contributed by alleles that appear at a frequency of at least 1 in 1000. Fourteen (18.2%) of the 77 novel class I and II alleles carry nonsynonymous variation within their exons; 52 (75.4%) class I novel alleles carry only single, apparently random, nucleotide variation within their introns/untranslated regions. Alleles encoding protein variation not usually detected by typing focused only on the exons encoding the antigen recognition domain are 1.0% of the class I assignments and 7.3% of the class II assignments (predominantly DQB1*02:02:01, DQB1*03:19:01, and DRB1*14:54:01). Updates to the common and well documented list of alleles include 10 alleles previously thought to be uncommon but that are found at least 30 times. Five locus haplotypes estimated using the expectation-maximization algorithm as present 3 or more times total 187. While the known HLA diversity continues to increase, the conservation of known allele sequences is remarkable. Overall, the HLA diversity observed in the Argentinian population reflects its European and Native American ancestry. © 2018 John Wiley & Sons A/S. Published by John Wiley & Sons Ltd.
Background Familial hypercholesterolaemia (FH) is a common Mendelian condition which, untreated, results in premature coronary heart disease. An estimated 88% of FH cases are undiagnosed in the UK. We previously validated a method for FH mutation detection in a lipid clinic population using next generation sequencing (NGS), but this did not address the challenge of identifying index cases in primary care where most undiagnosed patients receive healthcare. Here, we evaluate the targeted use of NGS as a potential route to diagnosis of FH in a primary care population subset selected for hypercholesterolaemia. Methods We used microfluidics-based PCR amplification coupled with NGS and multiplex ligation-dependent probe amplification (MLPA) to detect mutations in LDLR, APOB and PCSK9 in three phenotypic groups within the Generation Scotland: Scottish Family Health Study including 193 individuals with high total cholesterol, 232 with moderately high total cholesterol despite cholesterol-lowering therapy, and 192 normocholesterolaemic controls. Results Pathogenic mutations were found in 2.1% of hypercholesterolaemic individuals, in 2.2% of subjects on cholesterol-lowering therapy and in 42% of their available first-degree relatives. In addition, variants of uncertain clinical significance (VUCS) were detected in 1.4% of the hypercholesterolaemic and cholesterol-lowering therapy groups. No pathogenic variants or VUCS were detected in controls. Conclusions We demonstrated that population-based genetic testing using these protocols is able to deliver definitive molecular diagnoses of FH in individuals with high cholesterol or on cholesterol-lowering therapy. The lower cost and labour associated with NGS-based testing may increase the attractiveness of a population-based approach to FH detection compared to genetic testing with conventional sequencing. This could provide one route to increasing the present low percentage of FH cases with a genetic diagnosis. PMID:24956927
Weiß, Clemens L; Pais, Marina; Cano, Liliana M; Kamoun, Sophien; Burbano, Hernán A
Intraspecific variation in ploidy occurs in a wide range of species including pathogenic and nonpathogenic eukaryotes such as yeasts and oomycetes. Ploidy can be inferred indirectly - without measuring DNA content - from experiments using next-generation sequencing (NGS). We present nQuire, a statistical framework that distinguishes between diploids, triploids and tetraploids using NGS. The command-line tool models the distribution of base frequencies at variable sites using a Gaussian Mixture Model, and uses maximum likelihood to select the most plausible ploidy model. nQuire handles large genomes at high coverage efficiently and uses standard input file formats. We demonstrate the utility of nQuire analyzing individual samples of the pathogenic oomycete Phytophthora infestans and the baker's yeast Saccharomyces cerevisiae. Using these organisms we show the dependence between reliability of the ploidy assignment and sequencing depth. Additionally, we employ normalized maximized log-likelihoods generated by nQuire to ascertain ploidy level in a population of samples with ploidy heterogeneity. Using these normalized values we cluster samples in three dimensions using multivariate Gaussian mixtures. The cluster assignments retrieved from a S. cerevisiae population recovered the true ploidy level in over 96% of samples. Finally, we show that nQuire can be used regionally to identify chromosomal aneuploidies. nQuire provides a statistical framework to study organisms with intraspecific variation in ploidy. nQuire is likely to be useful in epidemiological studies of pathogens, artificial selection experiments, and for historical or ancient samples where intact nuclei are not preserved. It is implemented as a stand-alone Linux command line tool in the C programming language and is available at https://github.com/clwgg/nQuire under the MIT license.
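The model-selection idea described here can be sketched in Python as follows. This is a simplified illustration, not nQuire's actual code or interface: it assumes fixed mixture means at the allele fractions expected for each ploidy (1/2 for diploids; 1/3 and 2/3 for triploids; 1/4, 1/2 and 3/4 for tetraploids), equal mixture weights and a fixed standard deviation, and all names are hypothetical.

```python
import math

# Expected allele fractions at biallelic variable sites per ploidy level.
# Equal weights and a shared fixed standard deviation are assumptions
# made to keep the sketch short.
PLOIDY_MEANS = {
    "diploid": [0.5],
    "triploid": [1 / 3, 2 / 3],
    "tetraploid": [0.25, 0.5, 0.75],
}


def gaussian_pdf(x, mean, sd):
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2 * math.pi))


def log_likelihood(freqs, means, sd=0.05):
    """Log-likelihood of observed base frequencies under one fixed mixture."""
    weight = 1.0 / len(means)
    total = 0.0
    for f in freqs:
        density = sum(weight * gaussian_pdf(f, m, sd) for m in means)
        total += math.log(max(density, 1e-300))  # guard against log(0)
    return total


def select_ploidy(freqs, sd=0.05):
    """Return the ploidy model with the highest log-likelihood, plus all scores."""
    scores = {name: log_likelihood(freqs, means, sd)
              for name, means in PLOIDY_MEANS.items()}
    return max(scores, key=scores.get), scores


# Usage: base frequencies clustered near 1/3 and 2/3.
best, scores = select_ploidy([0.33, 0.34, 0.66, 0.67, 0.32, 0.68])
```

With fewer reads per site the observed frequencies scatter more widely around the expected fractions, which is one way to see why the reliability of the assignment depends on sequencing depth.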
The molecular mechanisms underlying thoracic aortic aneurysm (TAA) in patients with bicuspid aortic valve (BAV) are incompletely characterized. MicroRNAs (miRNAs) may play a major role in the different pathogenesis of aortopathy. We sought to employ next-generation sequencing to analyze the entire miRNome in TAA tissue from patients with BAV and tricuspid aortic valve (TAV). In the discovery stage, small RNA sequencing was performed using the Illumina MiSeq platform in 13 TAA tissue samples (seven patients with BAV and six with TAV). Gene ontology (GO) and KEGG pathway analysis were used to identify key pathways and biological functions. Validation analysis was performed by qRT-PCR in an independent cohort of 30 patients with BAV (26 males; 59.5 ± 12 years) and 30 patients with TAV (16 males; 68.5 ± 9.5 years). Bioinformatic analysis identified a total of 489 known mature miRNAs and five novel miRNAs. Compared to TAV samples, 12 known miRNAs were found to be differentially expressed in BAV, including two up-regulated and 10 down-regulated (FDR-adjusted p-value ≤ 0.05 and fold change ≥ 1.5). GO and KEGG pathway enrichment analysis (FDR-adjusted p-value < 0.05) identified different target genes and pathways linked to BAV and aneurysm formation, including Hippo signaling pathway, ErbB signaling, TGF-beta signaling and focal adhesion. Validation analysis of selected miRNAs confirmed the significant down-regulation of miR-424-3p (p = 0.01) and miR-3688-3p (p = 0.03) in BAV patients as compared to TAV patients. Our study provided the first in-depth screening of the whole miRNome in TAA specimens and identified specific dysregulated miRNAs in BAV patients.
Zacher, Angela; Kaulich, Kerstin; Stepanow, Stefanie; Wolter, Marietta; Köhrer, Karl; Felsberg, Jörg; Malzkorn, Bastian; Reifenberger, Guido
Current classification of gliomas is based on histological criteria according to the World Health Organization (WHO) classification of tumors of the central nervous system. Over the past years, characteristic genetic profiles have been identified in various glioma types. These can refine tumor diagnostics and provide important prognostic and predictive information. We report on the establishment and validation of gene panel next generation sequencing (NGS) for the molecular diagnostics of gliomas. We designed a glioma-tailored gene panel covering 660 amplicons derived from 20 genes frequently aberrant in different glioma types. Sensitivity and specificity of glioma gene panel NGS for detection of DNA sequence variants and copy number changes were validated by single gene analyses. NGS-based mutation detection was optimized for application on formalin-fixed paraffin-embedded tissue specimens including small stereotactic biopsy samples. NGS data obtained in a retrospective analysis of 121 gliomas allowed for their molecular classification into distinct biological groups, including (i) isocitrate dehydrogenase gene (IDH) 1 or 2 mutant astrocytic gliomas with frequent α-thalassemia/mental retardation syndrome X-linked (ATRX) and tumor protein p53 (TP53) gene mutations, (ii) IDH mutant oligodendroglial tumors with 1p/19q codeletion, telomerase reverse transcriptase (TERT) promoter mutation and frequent Drosophila homolog of capicua (CIC) gene mutation, as well as (iii) IDH wildtype glioblastomas with frequent TERT promoter mutation, phosphatase and tensin homolog (PTEN) mutation and/or epidermal growth factor receptor (EGFR) amplification. Oligoastrocytic gliomas were genetically assigned to either of these groups. Our findings implicate gene panel NGS as a promising diagnostic technique that may facilitate integrated histological and molecular glioma classification. © 2016 International Society of Neuropathology.
Tindall Elizabeth A
Background High-throughput custom designed genotyping arrays are a valuable resource for biologically focused research studies and increasingly for validation of variation predicted by next-generation sequencing (NGS) technologies. We investigate the Illumina GoldenGate chemistry using custom designed VeraCode and sentrix array matrix (SAM) assays for each of these applications, respectively. We highlight applications for interpretation of Illumina generated genotype cluster plots to maximise data inclusion and reduce genotyping errors. Findings We illustrate the dramatic effect of outliers in genotype calling and data interpretation, as well as suggest simple means to avoid genotyping errors. Furthermore we present this platform as a successful method for two-cluster rare or non-autosomal variant calling. The success of high-throughput technologies to accurately call rare variants will become an essential feature for future association studies. Finally, we highlight additional advantages of the Illumina GoldenGate chemistry in generating unusually segregated cluster plots that identify potential NGS generated sequencing error resulting from minimal coverage. Conclusions We demonstrate the importance of visually inspecting genotype cluster plots generated by the Illumina software and issue warnings regarding commonly accepted quality control parameters. In addition to suggesting applications to minimise data exclusion, we propose that the Illumina cluster plots may be helpful in identifying potential input sequence errors, particularly important for studies to validate NGS generated variation.
Kyrochristos, Ioannis D; Glantzounis, Georgios K; Ziogas, Demosthenes E; Gizas, Ioannis; Schizas, Dimitrios; Lykoudis, Efstathios G; Felekouras, Evangelos; Machairas, Anastasios; Katsios, Christos; Liakakos, Theodoros; Cho, William C; Roukos, Dimitrios H
Hepatobiliary and pancreatic (HBP) cancers are associated with high cancer-related death rates. Surgery aiming for complete tumor resection (R0) remains the cornerstone of the treatment for HBP cancers. The current progress in the adjuvant treatment is quite slow, with gemcitabine chemotherapy available only for pancreatic ductal adenocarcinoma (PDA). In the advanced and metastatic setting, only two targeted drugs have been approved by the Food & Drug Administration (FDA), which are sorafenib for hepatocellular carcinoma and erlotinib for PDA. It is a pity that multiple Phase III randomized control trials testing the efficacy of targeted agents have negative results. Failure in the development of effective drugs probably reflects the poor understanding of genome-wide alterations and molecular mechanisms orchestrating therapeutic resistance and recurrence. In the post-ENCODE (Encyclopedia of DNA Elements) era, cancer is referred to as a highly heterogeneous and systemic disease of the genome. The unprecedented potential of next-generation sequencing (NGS) technologies to accurately identify genetic and genomic variations has attracted major research and clinical interest. The applications of NGS include targeted NGS with potential clinical implications, while whole-exome and whole-genome sequencing focus on the discovery of both novel cancer driver genes and therapeutic targets. These advances dictate new designs for clinical trials to validate biomarkers and drugs. This review discusses the findings of available NGS studies on HBP cancers and the limitations of genome sequencing analysis to translate genome-based biomarkers and drugs into patient care in the clinic.
Carapito, Raphael; Radosavljevic, Mirjana; Bahram, Seiamak
The human Major Histocompatibility Complex, known as the "Human Leukocyte Antigen (HLA)", could be defined as a "super locus" (historically called "supergene") governing the adaptive immune system in vertebrates. It also harbors genes involved in innate immunity. HLA is the most gene-dense, polymorphic and disease-associated region of the human genome. It is of critical medical relevance given its involvement in the fate of transplanted organs/tissues and its association with more than 100 diseases. However, despite these important roles, comprehensive sequence analysis of the 4 megabase HLA locus has been limited due to technological challenges. Thanks to recent improvements in Next-Generation Sequencing (NGS) technologies, however, it is now possible to handle the peculiarities of the MHC, notably the tight linkage disequilibrium between genes as well as their high degree of polymorphism (and hence heterozygosity). Increased read lengths, throughput and accuracy, as well as the development of new bioinformatics tools, now make it possible to efficiently generate complete and accurate full-length HLA haplotypes without phase ambiguities. The present report reviews current NGS approaches to capture, sequence and analyze HLA genes and loci. The impact of these new methodologies on various applications including HLA typing, population genetics and disease association studies is discussed. Copyright © 2016 American Society for Histocompatibility and Immunogenetics. Published by Elsevier Inc. All rights reserved.
Vrancken, Bram; Trovão, Nídia Sequeira; Baele, Guy; van Wijngaerden, Eric; Vandamme, Anne-Mieke; van Laethem, Kristel; Lemey, Philippe
Genetic analyses play a central role in infectious disease research. Massively parallelized "mechanical cloning" and sequencing technologies were quickly adopted by HIV researchers in order to broaden the understanding of the clinical importance of minor drug-resistant variants. These efforts have, however, remained largely limited to small genomic regions. The growing need to monitor multiple genome regions for drug resistance testing, as well as the obvious benefit for studying evolutionary and epidemic processes makes complete genome sequencing an important goal in viral research. In addition, a major drawback for NGS applications to RNA viruses is the need for large quantities of input DNA. Here, we use a generic overlapping amplicon-based near full-genome amplification protocol to compare low-input enzymatic fragmentation (Nextera™) with conventional mechanical shearing for Roche 454 sequencing. We find that the fragmentation method has only a modest impact on the characterization of the population composition and that for reliable results, the variation introduced at all steps of the procedure--from nucleic acid extraction to sequencing--should be taken into account, a finding that is also relevant for NGS technologies that are now more commonly used. Furthermore, by applying our protocol to deep sequence a number of pre-therapy plasma and PBMC samples, we illustrate the potential benefits of a near complete genome sequencing approach in routine genotyping.
Fahnøe, Ulrik; Pedersen, Anders Gorm; Höper, Dirk
to the consensus sequence. Additionally, we got an average sequence depth for the genome of 4000 for the Ion Torrent PGM and 400 for the FLX platform making the mapping suitable for single nucleotide variant (SNV) detection. The analysis revealed a single non-silent SNV A10665G leading to the amino acid change D......3431G in the RNA-dependent RNA polymerase NS5B. This SNV was present at 100% frequency in the 12th passage and only at 55% in the 4th passage, which could explain the difference in growth kinetics between the passages....
Piednoël, M.; Aberer, A.J.; Schneeweiss, G. M.; Macas, Jiří; Novák, Petr; Gundlach, H.; Temsch, E.M.; Renner, S.S.
Vol. 29, No. 11 (2012), pp. 3601-3611. ISSN 0737-4038. Institutional research plan: CEZ:AV0Z50510513. Institutional support: RVO:60077344. Keywords: next-generation sequencing; polyploidy; genome size; Ty3/Gypsy; transposable elements. Subject RIV: EB - Genetics; Molecular Biology. Impact factor: 10.353 (2012).
Giefing, M; Wierzbicka, M; Szyfter, K
of the discovery and functional impact of recurrent genetic lesions that are likely to influence the management of this disease in the near future. This manuscript integrates genetic data from publicly available array comparative genome hybridization (aCGH) and next-generation sequencing genetics databases...
Łopacińska-Jørgensen, Joanna M; Pedersen, Jonas Nyvold; Bak, Mads
Next-generation sequencing (NGS) has caused a revolution, yet left a gap: long-range genetic information from native, non-amplified DNA fragments is unavailable. It might be obtained by optical mapping of megabase-sized DNA molecules. Frequently only a specific genomic region is of interest, so h...
Mellmann, Alexander; Andersen, Paal Skytt; Bletz, Stefan; Friedrich, Alexander W.; Kohl, Thomas A.; Lilje, Berit; Niemann, Stefan; Prior, Karola; Rossen, John W.; Harmsen, Dag
Today, next-generation whole-genome sequencing (WGS) is increasingly used to determine the genetic relationships of bacteria on a nearly whole-genome level for infection control purposes and molecular surveillance. Here, we conducted a multicenter ring trial comprising five laboratories to determine
Advances in Next Generation Sequencing (NGS) allow for rapid development of genomics resources needed to generate molecular diagnostics assays for infectious agents. NGS approaches are particularly helpful for organisms that cannot be cultured, such as the downy mildew pathogens, a group of biotrop...
Pawlowski, Jan; Esling, Philippe; Lejzerowicz, Franck
This report presents the study of foraminiferal and metazoan benthic community based on next-generation sequencing (NGS) of environmental DNA and RNA (eDNA/RNA). The objective of this study was to test the application of NGS assays for benthic monitoring of salmon farms in Norway, in order to ove...
Talseth-Palmer, Bente A; Bauer, Denis C; Sjursen, Wenche; Evans, Tiffany J; McPhillips, Mary; Proietto, Anthony; Otton, Geoffrey; Spigelman, Allan D; Scott, Rodney J
Causative germline mutations in mismatch repair (MMR) genes can only be identified in ~50% of families with a clinical diagnosis of the inherited colorectal cancer (CRC) syndrome hereditary nonpolyposis colorectal cancer (HNPCC)/Lynch syndrome (LS). Identification of these patients is critical as they are at substantially increased risk of developing multiple primary tumors, mainly colorectal and endometrial cancer (EC), occurring at a young age. This demonstrates the need to develop new and/or more thorough mutation detection approaches. Next-generation sequencing (NGS) was used to screen 22 genes involved in the DNA MMR pathway in constitutional DNA from 14 HNPCC and 12 sporadic EC patients, plus 2 positive controls. Several software packages were used for analysis and functional annotation. We identified 5 exonic indel variants, 42 exonic nonsynonymous single-nucleotide variants (SNVs) and 1 intronic variant of significance. Three of these variants were class 5 (pathogenic) or class 4 (likely pathogenic), 5 were class 3 (uncertain clinical relevance) and 40 were classified as variants of unknown clinical significance. In conclusion, we have identified two LS families from the sporadic EC patients, one without a family history of cancer, supporting the notion of universal MMR screening of EC patients. In addition, we have detected three novel class 3 variants in EC cases. We have also discovered a polygenic interaction which is the most likely cause of cancer development in an HNPCC patient and which could explain previous inconsistent results reported on an intronic EXO1 variant. © 2016 The Authors. Cancer Medicine published by John Wiley & Sons Ltd.
Xiao, Yuan; Yuan, Wentao; Yu, Bo; Guo, Yan; Xu, Xu; Wang, Xinqiong; Yu, Yi; Yu, Yi; Gong, Biao; Xu, Chundi
To identify causal mutations in certain genes in children with acute recurrent pancreatitis (ARP) or chronic pancreatitis (CP). After patients were enrolled (CP, 55; ARP, 14) and their clinical characteristics were investigated, we performed next-generation sequencing to detect nucleotide variations among the following 10 genes: cationic trypsinogen protease serine 1 (PRSS1), serine protease inhibitor, Kazal type 1 (SPINK1), cystic fibrosis transmembrane conductance regulator gene (CFTR), chymotrypsin C (CTRC), calcium-sensing receptor (CASR), cathepsin B (CTSB), keratin 8 (KRT8), CLAUDIN 2 (CLDN2), carboxypeptidase A1 (CPA1), and ATPase type 8B member 1 (ATP8B1). Mutations were searched against online databases to obtain information on the cause of the diseases. Certain novel mutations were analyzed using the SIFT2 and Polyphen-2 to predict the effect on protein function. There were 45 patients with CP and 10 patients with ARP who harbored 1 or more mutations in these genes; 45 patients had at least 1 mutation related to pancreatitis. Mutations were observed in the PRSS1, SPINK1, and CFTR genes in 17 patients, the CASR gene in 5 patients, and the CTSB, CTRC, and KRT8 genes in 1 patient. Mutations were not found in the CLDN, CPA1, or ATP8B1 genes. We found that mutations in SPINK1 may increase the risk of pancreatic duct stones (OR, 11.07; P = .003). The patients with CFTR mutations had a higher level of serum amylase (316.0 U/L vs 92.5 U/L; P = .026). Mutations, especially those in PRSS1, SPINK1, and CFTR, accounted for the major etiologies in Chinese children with CP or ARP. Children presenting mutations in the SPINK1 gene may have a higher risk of developing pancreatic duct stones. Copyright © 2017 Elsevier Inc. All rights reserved.
At least 12 genes (FH, HIF2A, MAX, NF1, RET, SDHA, SDHB, SDHC, SDHD, SDHAF2, TMEM127, and VHL) have been implicated in inherited predisposition to phaeochromocytoma (PCC), paraganglioma (PGL), or head and neck paraganglioma (HNPGL), and a germline mutation may be detected in more than 30% of cases. Knowledge of somatic mutations contributing to PCC/PGL/HNPGL pathogenesis has received less attention, though mutations in HRAS, HIF2A, NF1, RET, and VHL have been reported. To further elucidate the role of somatic mutation in PCC/PGL/HNPGL tumourigenesis, we employed a next generation sequencing strategy to analyse “mutation hotspots” in 50 human cancer genes. Mutations were identified for HRAS (c.37G>C; p.G13R and c.182A>G; p.Q61R) in 7.1% (6/85); for BRAF (c.1799T>A; p.V600E) in 1.2% (1/85) of tumours; and for TP53 (c.1010G>A; p.R337H) in 2.35% (2/85) of cases. Twenty-one tumours harboured mutations in inherited PCC/PGL/HNPGL genes and no HRAS, BRAF, or TP53 mutations occurred in this group. Combining our data with previous reports of HRAS mutations in PCC/PGL, we find that the mean frequency of HRAS/BRAF mutations in sporadic PCC/PGL is 8.9% (24/269) and in PCC/PGL with an inherited gene mutation 0% (0/148), suggesting that HRAS/BRAF mutations and inherited PCC/PGL gene mutations might be mutually exclusive. We report the first evidence for BRAF mutations in the pathogenesis of PCC/PGL/HNPGL.
Watson-Haigh, Nathan S; Shang, Catherine A; Haimel, Matthias; Kostadima, Myrto; Loos, Remco; Deshpande, Nandan; Duesing, Konsta; Li, Xi; McGrath, Annette; McWilliam, Sean; Michnowicz, Simon; Moolhuijzen, Paula; Quenette, Steve; Revote, Jerico Nico De Leon; Tyagi, Sonika; Schneider, Maria V
The widespread adoption of high-throughput next-generation sequencing (NGS) technology among the Australian life science research community is highlighting an urgent need to up-skill biologists in the tools required for handling and analysing their NGS data. There is currently a shortage of cutting-edge bioinformatics training courses in Australia as a consequence of a scarcity of skilled trainers with time and funding to develop and deliver training courses. To address this, a consortium of Australian research organizations, including Bioplatforms Australia, the Commonwealth Scientific and Industrial Research Organisation and the Australian Bioinformatics Network, has been collaborating with the EMBL-EBI training team. A group of Australian bioinformaticians attended the train-the-trainer workshop to improve training skills in developing and delivering bioinformatics workshop curriculum. A 2-day NGS workshop was jointly developed to provide hands-on knowledge and understanding of typical NGS data analysis workflows. The road show-style workshop was successfully delivered at five geographically distant venues in Australia using the newly established Australian NeCTAR Research Cloud. We highlight the challenges we had to overcome at different stages from design to delivery, including the establishment of an Australian bioinformatics training network and the computing infrastructure and resource development. A virtual machine image, workshop materials and scripts for configuring a machine with workshop contents have all been made available under a Creative Commons Attribution 3.0 Unported License. This means participants continue to have convenient access to an environment with which they have become familiar, and bioinformatics trainers are able to access and reuse these resources.
Buonuomo, Paola Sabrina; Iughetti, Lorenzo; Pisciotta, Livia; Rabacchi, Claudio; Papadia, Francesco; Bruzzi, Patrizia; Tummolo, Albina; Bartuli, Andrea; Cortese, Claudio; Bertolini, Stefano; Calandra, Sebastiano
Severe hypercholesterolemia associated or not with xanthomas in a child may suggest the diagnosis of homozygous autosomal dominant hypercholesterolemia (ADH), autosomal recessive hypercholesterolemia (ARH) or sitosterolemia, depending on the transmission of hypercholesterolemia in the patient's family. Sitosterolemia is a recessive disorder characterized by high plasma levels of cholesterol and plant sterols due to mutations in the ABCG5 or the ABCG8 gene, leading to a loss of function of the ATP-binding cassette (ABC) heterodimer transporter G5-G8. We aimed to perform the molecular characterization of two children with severe primary hypercholesterolemia. Case #1 was a 2-year-old girl with high LDL-cholesterol (690 mg/dl) and tuberous and intertriginous xanthomas. Case #2 was a 7-year-old boy with elevated LDL-C (432 mg/dl) but no xanthomas. In both cases, at least one parent had elevated LDL-cholesterol levels. For the molecular diagnosis, we applied targeted next generation sequencing (NGS), which unexpectedly revealed that both patients were compound heterozygous for nonsense mutations: Case #1 in ABCG5 gene [p.(Gln251*)/p.(Arg446*)] and Case #2 in ABCG8 gene [p.(Ser107*)/p.(Trp361*)]. Both children had extremely high serum sitosterol and campesterol levels, thus confirming the diagnosis of sitosterolemia. A low-fat/low-sterol diet was promptly adopted with and without the addition of ezetimibe for Case #1 and Case #2, respectively. In both patients, serum total and LDL-cholesterol decreased dramatically in two months and progressively normalized. Targeted NGS allows the rapid diagnosis of sitosterolemia in children with severe hypercholesterolemia, even though their family history does not unequivocally suggest a recessive transmission of hypercholesterolemia. A timely diagnosis is crucial to avoid delays in treatment. Copyright © 2017 Elsevier B.V. All rights reserved.
Dunn, Joshua G; Weissman, Jonathan S
Next-generation sequencing (NGS) informs many biological questions with unprecedented depth and nucleotide resolution. These assays have created a need for analytical tools that enable users to manipulate data nucleotide-by-nucleotide robustly and easily. Furthermore, because many NGS assays encode information jointly within multiple properties of read alignments - for example, in ribosome profiling, the locations of ribosomes are jointly encoded in alignment coordinates and length - analytical tools are often required to extract the biological meaning from the alignments before analysis. Many assay-specific pipelines exist for this purpose, but there remains a need for user-friendly, generalized, nucleotide-resolution tools that are not limited to specific experimental regimes or analytical workflows. Plastid is a Python library designed specifically for nucleotide-resolution analysis of genomics and NGS data. As such, Plastid is designed to extract assay-specific information from read alignments while retaining generality and extensibility to novel NGS assays. Plastid represents NGS and other biological data as arrays of values associated with genomic or transcriptomic positions, and contains configurable tools to convert data from a variety of sources to such arrays. Plastid also includes numerous tools to manipulate even discontinuous genomic features, such as spliced transcripts, with nucleotide precision. Plastid automatically handles conversion between genomic and feature-centric coordinates, accounting for splicing and strand, freeing users of burdensome accounting. Finally, Plastid's data models use consistent and familiar biological idioms, enabling even beginners to develop sophisticated analytical workflows with minimal effort. Plastid is a versatile toolkit that has been used to analyze data from multiple NGS assays, including RNA-seq, ribosome profiling, and DMS-seq. It forms the genomic engine of our ORF annotation tool, ORF-RATER, and is readily
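The coordinate bookkeeping that the Plastid abstract describes (converting between feature-centric and genomic positions while accounting for splicing and strand) can be illustrated without any library. The following is a minimal sketch in plain Python and is not Plastid's API: the function name `transcript_to_genomic`, the exon-list representation and the example coordinates are all assumptions made for illustration.

```python
def transcript_to_genomic(exons, strand, offset):
    """Map a 0-based transcript offset to a 0-based genomic coordinate.

    exons  -- list of (start, end) half-open genomic intervals (hypothetical layout)
    strand -- "+" or "-"
    offset -- position counted from the 5' end of the spliced transcript
    """
    if strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")
    if offset < 0:
        raise IndexError("offset must be non-negative")
    # Walk exons in transcription order: ascending for "+", descending for "-".
    ordered = sorted(exons, reverse=(strand == "-"))
    remaining = offset
    for start, end in ordered:
        length = end - start
        if remaining < length:
            # Inside this exon: count from its 5' end on the given strand.
            return start + remaining if strand == "+" else end - 1 - remaining
        remaining -= length  # skip this exon; introns are never counted
    raise IndexError("offset lies beyond the end of the transcript")


# Hypothetical two-exon transcript with exons at [100, 110) and [200, 205).
exons = [(100, 110), (200, 205)]
print(transcript_to_genomic(exons, "+", 10))  # first base of the second exon
print(transcript_to_genomic(exons, "-", 0))   # 5' end on the minus strand
```

On the minus strand the walk starts at the highest genomic coordinate, so the same exon list yields a different mapping; this is the kind of accounting the abstract says the library automates.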
Background/Aims: As MCF-7 and MDA-MB-231 cells are the typical cell lines of two clinical breast tumour subtypes, the aim of the present study was to elucidate the transcriptome differences between MCF-7 and MDA-MB-231 breast cancer cell lines. Methods: The mRNA, miRNA (microRNA) and lncRNA (long non-coding RNA) expression profiles were examined using the NGS (next generation sequencing) instrument Illumina HiSeq-2500. GO (Gene Ontology) and KEGG (Kyoto Encyclopedia of Genes and Genomes) pathway analyses were performed to identify the biological functions of differentially expressed coding RNAs. Subsequently, we constructed an mRNA-ncRNA (non-coding RNA) targeting regulatory network. Finally, we performed RT-qPCR (real-time quantitative PCR) to confirm the NGS results. Results: There are sharp distinctions between the coding and non-coding RNA profiles of MCF-7 and MDA-MB-231 cell lines. Among the mRNAs and ncRNAs with the most differential expression, SLPI, SOD2, miR-7, miR-143 and miR-145 were highly expressed in MCF-7 cells, while CD55, KRT17, miR-21, miR-10b, miR-9, NEAT1 and PICSAR were over-expressed in MDA-MB-231 cells. Differentially expressed mRNAs are primarily involved in biological processes of locomotion, biological adhesion, ECM-receptor interaction pathway and focal adhesion. In the targeting regulatory network of differentially expressed RNAs, mRNAs and miRNAs are primarily associated with tumour metastasis, but the functions of lncRNAs remain uncharacterized. Conclusion: These results provide a basis for future studies of breast cancer metastasis and drug resistance.
Cheng, Huan-Chen; Liu, Sheng-Wei; Liu, Yu; Zhao, Xue-Fei; Li, Wei; Qiu, Lin; Ma, Jun
To detect mutations of AML/MDS-related genes using next generation sequencing (NGS), to analyze the mutation levels of each gene in AML/MDS and the sensitivity of NGS, and to evaluate the feasibility of using gene mutations for monitoring MRD and predicting disease progression. Specimens were collected from patients with primary AML (68 cases) and MDS (57 cases) from August 2015 to June 2016 at the Harbin Institute of Hematology and Oncology. Mutations of 22 related genes were detected using AML/MDS-NGS chips. The TET2 gene showed the highest mutation rate in both AML (55.9%) and MDS (56.1%). Other gene mutation rates were as follows: CEBPA (11.8%), DNMT3A (7.4%), C-KIT (7.4%) and FLT3-ITD (7.4%) in AML, and U2AF1 (10.5%) and SRSF2 (10.5%) in MDS. All the genes had specific mutation sites except TP53 and CEBPA. The mutations of FLT3, C-KIT and CEBPA became negative in the 5 AML patients in remission when compared with those at initial onset, but the mutation rate of the TET2 gene did not change obviously, and the mutation rate in the 5 MDS patients did not change significantly. New gene mutations appeared in 3 MDS patients with disease progression, but the mutation rate did not change significantly during disease progression. The gene mutation rate still had not changed significantly even after remission. Both AML and MDS have their own specific mutated genes and sites. Some gene mutations, such as CEBPA, can be used as effective indicators for monitoring MRD in AML patients, but in MDS patients they can be used only for evaluating disease progression and prognosis.
Although diabetes mellitus (DM) causes cardiomyopathy and exacerbates heart failure, the underlying molecular mechanisms for diabetic cardiomyopathy/heart failure are poorly understood. Insulin2 mutant (Ins2+/-) Akita is a mouse model of T1DM, which manifests cardiac dysfunction. However, molecular changes at cardiac transcriptome level that lead to cardiomyopathy remain unclear. To understand the molecular changes in the heart of diabetic Akita mice, we profiled cardiac transcriptome of Ins2+/- Akita and Ins2+/+ control mice using next generation sequencing (NGS) and microarray, and determined the implications of differentially expressed genes on various heart failure signaling pathways using Ingenuity pathway (IPA) analysis. First, we validated hyperglycemia, increased cardiac fibrosis, and cardiac dysfunction in twelve-week male diabetic Akita. Then, we analyzed the transcriptome levels in the heart. NGS analyses on Akita heart revealed 137 differentially expressed transcripts, where Bone Morphogenic Protein-10 (BMP10) was the most upregulated and hairy and enhancer of split-related (HELT) was the most downregulated gene. Moreover, twelve long non-coding RNAs (lncRNAs) were upregulated. The microarray analyses on Akita heart showed 351 differentially expressed transcripts, where vomeronasal-1 receptor-180 (Vmn1r180) was the most upregulated and WD Repeat Domain 83 Opposite Strand (WDR83OS) was the most downregulated gene. Further, miR-101c and H19 lncRNA were upregulated but Neat1 lncRNA was downregulated in Akita heart. Eleven common genes were upregulated in Akita heart in both NGS and microarray analyses. IPA analyses revealed the role of these differentially expressed genes in key signaling pathways involved in diabetic cardiomyopathy. Our results provide a platform to initiate focused future studies by targeting these genes and/or non-coding RNAs, which are differentially expressed in Akita hearts and are involved in diabetic cardiomyopathy.
Quesada, Andrés E.; Hu, Zhihong; Routbort, Mark J.; Patel, Keyur P.; Luthra, Rajyalakshmi; Loghavi, Sanam; Zuo, Zhuang; Yin, C. Cameron; Kanagal-Shamanna, Rashmi; Wang, Sa A.; Jorgensen, Jeffrey L.; Medeiros, L. Jeffrey; Ok, Chi Young
Mixed phenotype acute leukemia (MPAL) is an uncommon manifestation of acute leukemia. The aim of this study is to further characterize the genetic landscape of de novo cases of MPAL that fulfill the 2016 World Health Organization (WHO) classification criteria for this entity. We identified 14 cases examined by next generation sequencing (NGS) using 28 (n = 10), 53 (n = 3) or 81 (n = 1) gene panels: 7 cases with a B-cell/myeloid (B/My) immunophenotype, 6 with a T-cell/myeloid (T/My) immunophenotype, and 1 with a B-cell/T-cell (B/T) immunophenotype. A total of 25 distinct mutations were identified in 15 different genes in 9/14 (64%) patients. FLT3-ITD was the only recurrent mutation, found in 2 patients. B/My MPAL cases less commonly harbored mutations compared with T/My MPAL cases (43% vs. 100%, p = 0.07). In contrast, B/My MPALs more commonly showed a complex karyotype compared to T/My MPALs (71% vs. 17%, p = 0.1). With NGS and karyotype combined, most (93%) MPAL cases had mutations or cytogenetic abnormalities. With a median follow-up of 12.5 months, there were no significant differences in median overall survival (OS) between patients with B/My or T/My MPAL (17.8 and 6.5 months, respectively, p = 0.81) or between patients with MPAL with versus without gene mutations (6.5 and 13.3 months, respectively, p = 0.86). Our data suggest that distinguishing cases of MPAL according to immunophenotype has value because the underlying mechanisms of leukemogenesis might differ between B/My and T/My MPAL. PMID:29492206
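The two group comparisons in the MPAL abstract (43% vs. 100% and 71% vs. 17%) involve very small groups, which is the setting where an exact test on a 2x2 table is typically used. The sketch below implements a two-sided Fisher's exact test with the standard library only. Two things are assumptions rather than statements from the abstract: the choice of test, which the abstract does not name, and the cell counts (3 of 7 and 6 of 6; 5 of 7 and 1 of 6), which are inferred from the reported percentages and group sizes.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2

    def weight(x):
        # Unnormalised hypergeometric probability of x in the top-left cell;
        # integer arithmetic keeps the comparison below exact.
        return comb(row1, x) * comb(row2, col1 - x)

    observed = weight(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum every table that is no more probable than the observed one.
    extreme = sum(weight(x) for x in range(lo, hi + 1) if weight(x) <= observed)
    return extreme / comb(n, col1)


# Inferred counts (assumption): mutated vs. unmutated, B/My (n = 7) vs. T/My (n = 6).
p_mutation = fisher_exact_two_sided(3, 4, 6, 0)
# Inferred counts (assumption): complex vs. non-complex karyotype, B/My vs. T/My.
p_karyotype = fisher_exact_two_sided(5, 2, 1, 5)
print(f"{p_mutation:.3f} {p_karyotype:.3f}")
```

Under these assumptions the two values come out at roughly 0.07 and 0.10, which is in line with the p-values quoted in the abstract, although that agreement does not confirm which test the authors actually used.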
Groves, Ian J; Coleman, Nicholas
Human papillomavirus (HPV) infection is associated with ∼5% of all human cancers, including a range of squamous cell carcinomas. Persistent infection by high-risk HPVs (HRHPVs) is associated with the integration of virus genomes (which are usually stably maintained as extrachromosomal episomes) into host chromosomes. Although HRHPV integration rates differ across human sites of infection, this process appears to be an important event in HPV-associated neoplastic progression, leading to deregulation of virus oncogene expression, host gene expression modulation, and further genomic instability. However, the mechanisms by which HRHPV integration occurs and by which the subsequent gene expression changes take place are incompletely understood. The advent of next-generation sequencing (NGS) of both RNA and DNA has allowed powerful interrogation of the association of HRHPVs with human disease, including precise determination of the sites of integration and the genomic rearrangements at integration loci. In turn, these data have indicated that integration occurs through two main mechanisms: looping integration and direct insertion. Improved understanding of integration sites is allowing further investigation of the factors that provide a competitive advantage to some integrants during disease progression. Furthermore, advanced approaches to the generation of genome-wide samples have given novel insights into the three-dimensional interactions within the nucleus, which could act as another layer of epigenetic control of both virus and host transcription. It is hoped that further advances in NGS techniques and analysis will not only allow the examination of further unanswered questions regarding HPV infection, but also direct new approaches to treating HPV-associated human disease. Copyright © 2018 Pathological Society of Great Britain and Ireland. Published by John Wiley & Sons, Ltd.
Marks, Michael; Fookes, Maria; Wagner, Josef; Butcher, Robert; Ghinai, Rosanna; Sokana, Oliver; Sarkodie, Yaw-Adu; Lukehart, Sheila A; Solomon, Anthony W; Mabey, David C W; Thomson, Nicholas
Background: Yaws-like chronic ulcers can be caused by Treponema pallidum subspecies pertenue, Haemophilus ducreyi, or other, still-undefined bacteria. To permit accurate evaluation of yaws elimination efforts, programmatic use of molecular diagnostics is required. The accuracy and sensitivity of current tools remain unclear because our understanding of T. pallidum diversity is limited by the low number of sequenced genomes. Methods: We tested samples from patients with suspected yaws collected in the Solomon Islands and Ghana. All samples were from patients whose lesions had previously tested negative using the Centers for Disease Control and Prevention (CDC) diagnostic assay in widespread use. However, some of these patients had positive serological assays for yaws on blood. We used direct whole-genome sequencing to identify T. pallidum subsp pertenue strains missed by the current assay. Results: From 45 Solomon Islands and 27 Ghanaian samples, 11 were positive for T. pallidum DNA using the species-wide quantitative polymerase chain reaction (PCR) assay, from which we obtained 6 previously undetected T. pallidum subsp pertenue whole-genome sequences. These show that Solomon Islands sequences represent distinct T. pallidum subsp pertenue clades. These isolates were invisible to the CDC diagnostic PCR assay, due to sequence variation in the primer binding site. Conclusions: Our data double the number of published T. pallidum subsp pertenue genomes. We show that Solomon Islands strains are undetectable by the PCR used in many studies and by health ministries. This assay is therefore not adequate for the eradication program. Next-generation genome sequence data are essential for these efforts. PMID:29045605
The impact of natural killer (NK) cell alloreactivity on hematopoietic stem cell transplantation (HSCT) outcome is still debated due to the complexity of graft parameters, HLA class I environment, the nature of killer cell immunoglobulin-like receptor (KIR)/KIR ligand genetic combinations studied, and KIR+ NK cell repertoire size. KIR genes are known to be polymorphic in terms of gene content, copy number variation, and number of alleles. These allelic polymorphisms may impact both the phenotype and function of KIR+ NK cells. We, therefore, speculate that polymorphisms may alter donor KIR+ NK cell phenotype/function, thus modulating post-HSCT KIR+ NK cell alloreactivity. To investigate KIR allele polymorphisms of all KIR genes, we developed a next-generation sequencing (NGS) technology on a MiSeq platform. To ensure the reliability and specificity of our method, genomic DNA from well-characterized cell lines was used; the high-resolution KIR typing results obtained were then compared to those previously reported. Two different bioinformatic pipelines were used, allowing the attribution of sequencing reads to specific KIR genes and the assignment of KIR alleles for each KIR gene. Our results demonstrated successful long-range KIR gene amplifications of all reference samples using intergenic KIR primers. The alignment of reads to the human genome reference (hg19) using the BiRD pipeline or visualization of data using Profiler software demonstrated that all KIR genes were completely sequenced with a sufficient read depth (mean 317× for all loci) and a high percentage of mapping (mean 93% for all loci). Comparison of the high-resolution KIR typing obtained with published data generated using exome capture resulted in a reported concordance rate of 95% for centromeric and telomeric KIR genes. Overall, our results suggest that NGS can be used to investigate the broad KIR allelic polymorphism. Hence, these data improve our knowledge, not only on KIR+ NK cell alloreactivity in
Thakur, Radhe Shyam; Bandopadhyay, Rajib; Chaudhary, Bratati; Chatterjee, Sourav
Advances in the field of sequencing techniques have resulted in the greatly accelerated production of huge sequence datasets. This presents immediate challenges in database maintenance at datacenters. It provides additional computational challenges in data mining and sequence analysis. Together these represent a significant overburden on traditional stand-alone computer resources, and to reach effective conclusions quickly and efficiently, the virtualization of the resources and computation on a pay-as-you-go concept (together termed "cloud computing") has recently appeared. The collective resources of the datacenter, including both hardware and software, can be available publicly, being then termed a public cloud, the resources being provided in a virtual mode to the clients who pay according to the resources they employ. Examples of public companies providing these resources include Amazon, Google, and Joyent. The computational workload is shifted to the provider, which also implements required hardware and software upgrades over time. A virtual environment is created in the cloud corresponding to the computational and data storage needs of the user via the internet. The task is then performed, the results transmitted to the user, and the environment finally deleted after all tasks are completed. In this discussion, we focus on the basics of cloud computing, and go on to analyze the prerequisites and overall working of clouds. Finally, the applications of cloud computing in biological systems, particularly in comparative genomics, genome informatics, and SNP detection are discussed with reference to traditional workflows.
Radhe Shyam Thakur
Advancements in the field of sequencing techniques have resulted in huge amounts of sequence data being produced at an ever faster rate. It is becoming cumbersome for datacenters to maintain the databases. Data mining and sequence analysis approaches need to analyze the databases several times to reach any efficient conclusion. To cope with this overburden on computer resources and to reach efficient and effective conclusions quickly, the virtualization of resources and computation on a pay-as-you-go basis was introduced and termed cloud computing. The datacenter’s hardware and software are collectively known as a cloud, which when available publicly is termed a public cloud. The datacenter’s resources are provided in a virtual mode to the clients via a service provider such as Amazon, Google or Joyent, which charges on a pay-as-you-go basis. The workload is shifted to the provider, which maintains it through the required hardware and software upgrades. The service provider manages this by upgrading the requirements in the virtual mode. Basically, a virtual environment is created according to the need of the user by taking permission from the datacenter via the internet, the task is performed, and the environment is deleted after the task is over. In this discussion, we focus on the basics of cloud computing, and on the prerequisites and overall working of clouds. Furthermore, the applications of cloud computing in biological systems, especially in comparative genomics, genome informatics and SNP detection, are briefly discussed with reference to the traditional workflow.
Wang, Jing; Yang, Xue; Chen, Haofeng; Wang, Xuewei; Wang, Xiangyu; Fang, Yi; Jia, Zhenyu; Gao, Jidong
RNA in formalin-fixed and paraffin-embedded (FFPE) tissues provides a large amount of information indicating disease stages, histological tumor types and grades, as well as clinical outcomes. However, detection of RNA expression levels in formalin-fixed and paraffin-embedded samples is extremely difficult due to poor RNA quality. Here we developed a high-throughput method, Reverse Transcription-Multiple Ligation-dependent Probe Sequencing (RT-MLPSeq), to determine expression levels of multiple transcripts in FFPE samples. By combining the Reverse Transcription-Multiple Ligation-dependent Amplification method and next generation sequencing technology, RT-MLPSeq overcomes the limit of probe length in the multiplex ligation-dependent probe amplification assay and thus could detect expression levels of transcripts without quantitative limitations. We showed that different RT-MLPSeq probes targeting the same transcripts give highly consistent results and that the starting RNA/cDNA input could be as little as 1 ng. RT-MLPSeq also gave relative RNA levels for 13 selected genes that were consistent with reverse transcription quantitative PCR. Finally, we demonstrated the application of the new RT-MLPSeq method by measuring the mRNA expression levels of 21 genes which can be used for accurate calculation of the breast cancer recurrence score, an index that has been widely used for managing breast cancer patients.
Jakaitiene, Audrone; Avino, Mariano; Guarracino, Mario Rosario
Despite diminishing costs, next-generation sequencing (NGS) still remains expensive for studies with a large number of individuals. To save costs, the genomes of pools containing multiple samples might be sequenced. Currently, many software packages are available for the detection of single-nucleotide polymorphisms (SNPs). Sensitivity and specificity depend on the model used and the data analyzed, indicating that all software has room for improvement. We use a beta-binomial model to detect rare mutations in untagged pooled NGS experiments. We propose a multireference framework for pooled data with the ability to be specific for up to two patients affected by neuromuscular disorders (NMD). We assessed the results by comparing them with The Genome Analysis Toolkit (GATK), CRISP, SNVer, and FreeBayes. Our results show that the multireference approach applying the beta-binomial model is accurate in predicting rare mutations at a fraction of 0.01. Finally, we explored the concordance of mutations between the model and the software, checking their involvement in any NMD-related gene. We detected seven novel SNPs, for which the functional analysis produced enriched terms related to locomotion and musculature.
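To make the beta-binomial idea concrete, the sketch below computes the probability of seeing at least k variant-supporting reads out of n under a beta-binomial background model, using only the standard library. It is a generic illustration and not the authors' multireference framework: the abstract gives no parameterisation, so the shape parameters `alpha` and `beta`, the depth and the read count used here are hypothetical.

```python
from math import exp, lgamma


def log_beta(a, b):
    """Natural log of the Beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def betabinom_pmf(k, n, alpha, beta):
    """P(X = k) for X ~ BetaBinomial(n, alpha, beta)."""
    log_choose = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
    return exp(log_choose + log_beta(k + alpha, n - k + beta) - log_beta(alpha, beta))


def betabinom_sf(k, n, alpha, beta):
    """Upper tail P(X >= k): how surprising k variant reads are under the model."""
    return sum(betabinom_pmf(i, n, alpha, beta) for i in range(k, n + 1))


# Hypothetical background model with mean error rate alpha / (alpha + beta) = 0.001.
alpha, beta = 1.0, 999.0
depth, variant_reads = 1000, 10  # a pooled site with a 0.01 variant fraction
tail = betabinom_sf(variant_reads, depth, alpha, beta)
print(f"P(X >= {variant_reads}) = {tail:.3g}")
```

A small tail probability would argue that the reads reflect a real rare variant rather than background error; the beta prior lets the per-site error rate vary, which is the usual reason for preferring this model over a plain binomial.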
Piskorz, A M; Ennis, D; Macintyre, G; Goranova, T E; Eldridge, M; Segui-Gracia, N; Valganon, M; Hoyle, A; Orange, C; Moore, L; Jimenez-Linan, M; Millan, D; McNeish, I A; Brenton, J D
Next-generation sequencing (NGS) of tumour samples is a critical component of personalised cancer treatment, but it requires high-quality DNA samples. Routine neutral-buffered formalin (NBF) fixation has detrimental effects on nucleic acids, causing low yields, as well as fragmentation and DNA base changes, leading to significant artefacts. We have carried out a detailed comparison of DNA quality from matched samples isolated from high-grade serous ovarian cancers from 16 patients fixed in methanol and NBF. These experiments use tumour fragments and mock biopsies to simulate routine practice, ensuring that results are applicable to standard clinical biopsies. Using matched snap-frozen tissue as gold standard comparator, we show that methanol-based fixation has significant benefits over NBF, with greater DNA yield, longer fragment size and more accurate copy-number calling using shallow whole-genome sequencing (WGS). These data also provide a new approach to understand and quantify artefactual effects of fixation using non-negative matrix factorisation to analyse mutational spectra from targeted and WGS data. We strongly recommend the adoption of methanol fixation for sample collection strategies in new clinical trials. This approach is immediately available, is logistically simple and can offer cheaper and more reliable mutation calling than traditional NBF fixation. © The Author 2015. Published by Oxford University Press on behalf of the European Society for Medical Oncology.
Wang, Yan; Yang, Yao; Liu, Jing; Chen, Xiao-Chun; Liu, Xin; Wang, Chun-Zhi; He, Xi-Yu
Duchenne/Becker muscular dystrophies are the most frequent inherited neuromuscular diseases caused by mutations of the dystrophin gene. However, approximately 30% of patients with the disease do not receive a molecular diagnosis because of the complex mutational spectrum and the large size of the gene. The introduction and use of next-generation sequencing have advanced clinical genetic research and might be a suitable method for the detection of various types of mutations in the dystrophin gene. To identify the mutational spectrum using a single platform, whole dystrophin gene sequencing was performed using next-generation sequencing. The entire dystrophin gene, including all exons, introns and promoter regions, was target enriched using a DMD whole gene enrichment kit. The enrichment libraries were sequenced on an Illumina HiSeq 2000 sequencer using paired read 100 bp sequencing. We studied 26 patients: 21 had known large deletion/duplications and 5 did not have detectable large deletion/duplications by multiplex ligation-dependent probe amplification technology (MLPA). We applied whole dystrophin gene analysis by next-generation sequencing to the five patients who did not have detectable large deletion/duplications and to five randomly chosen patients from the 21 who did have large deletion/duplications. The sequencing data covered almost 100% of the exonic region of the dystrophin gene by ≥10 reads with a mean read depth of 147. Five small mutations were identified in the first five patients, of which four variants were unreported in the dmd.nl database. The deleted or duplicated exons and the breakpoints in the five large deletion/duplication patients were precisely identified. Whole dystrophin gene sequencing by next-generation sequencing may be a useful tool for the genetic diagnosis of Duchenne and Becker muscular dystrophies.
Chen, Zhao; Wang, Jun-Ling; Tang, Bei-Sha; Sun, Zhan-Fang; Shi, Yu-Ting; Shen, Lu; Lei, Li-Fang; Wei, Xiao-Ming; Xiao, Jing-Jing; Hu, Zheng-Mao; Pan, Qian; Xia, Kun; Zhang, Qing-Yan; Dai, Mei-Zhi; Liu, Yu; Ashizawa, Tetsuo; Jiang, Hong
Next-generation sequencing was used to investigate 9 Chinese pedigrees with rare autosomal recessive neurologic Mendelian disorders. Five probands with ataxia-telangiectasia and 1 proband with chorea-acanthocytosis were analyzed by targeted gene sequencing. Whole-exome sequencing was used to investigate 3 affected individuals with Joubert syndrome, nemaline myopathy, or spastic ataxia Charlevoix-Saguenay type. A list of known and novel candidate variants was identified for each causative gene. All variants were genetically verified by Sanger sequencing or quantitative polymerase chain reaction with the strategy of disease segregation in related pedigrees and healthy controls. The advantages of using next-generation sequencing to diagnose rare autosomal recessive neurologic Mendelian disorders characterized by genetic and phenotypic heterogeneity are demonstrated. A genetic diagnostic strategy combining the use of targeted gene sequencing and whole-exome sequencing with the aid of next-generation sequencing platforms has shown great promise for improving the diagnosis of neurologic Mendelian disorders. Copyright © 2013 Elsevier Inc. All rights reserved.
Lusk Tina S
have been influenced by the enrichment process. This study is the first to define Latin-style cheese microflora using Next-Generation Sequencing. These valuable preliminary data will direct selective tailoring of agar formulations to improve culture-based detection of pathogens in Latin-style cheese.
Emerging evidence has demonstrated that miRNA sequences can regulate skeletal myogenesis by controlling the process of myoblast proliferation and differentiation. However, a deep analysis of miRNA expression in control and FSHD myoblasts during differentiation has not yet been performed. To close this gap, we used a next-generation sequencing (NGS) approach applied to in vitro myogenesis. Furthermore, to minimize sample genetic heterogeneity and muscle-type specific patterns of gene expression, miRNA profiling from NGS data was filtered with FC ≥ 4 (log2 FC ≥ 2) and p-value < 0.05, and validated by qRT-PCR on myoblasts from seven muscle districts. In particular, control myogenesis showed the modulation of 38 miRNAs, the majority of which (34 out of 38) were up-regulated, including myomiRs (miR-1, -133a, -133b and -206). Approximately one third of the modulated miRNAs were not previously reported to be involved in muscle differentiation, and interestingly some of these (i.e. miR-874, -1290, -95 and -146a) were previously shown to regulate cell proliferation and differentiation. FSHD myogenesis showed fewer modulated miRNAs than healthy muscle cells. The two processes shared nine miRNAs, including myomiRs, although with FC values lower in FSHD than in control cells. In addition, FSHD cells showed the modulation of six miRNAs (miR-1268, -1268b, -1908, -4258, -4508 and -4516) that were not seen in control cells and could therefore be considered FSHD-specific, as well as three novel miRNAs that seem to be specifically expressed in FSHD myotubes. These data further clarify the impact of miRNA regulation during control myogenesis and strongly suggest that a complex dysregulation of miRNA expression characterizes FSHD, impairing two important features of myogenesis: cell cycle and muscle development. The derived miRNA profiling could represent a novel molecular signature for FSHD that includes diagnostic biomarkers and
Bettoni, Fabiana; Koyama, Fernanda Christtanini; de Avelar Carpinetti, Paola; Galante, Pedro Alexandre Favoretto; Camargo, Anamaria Aranha; Asprino, Paula Fontes
Next generation sequencing (NGS) has become an informative tool to guide cancer treatment and support a personalized approach in oncology. The biopsy collected for pathologic analysis is usually stored as formalin-fixed paraffin-embedded (FFPE) blocks and later used for molecular diagnostics, resulting in DNA molecules that are invariably fragmented and chemically modified. In an attempt to improve NGS-based diagnostics in oncology, we developed a straightforward DNA integrity assessment assay based on qPCR, defining clear parameters for deciding whether NGS results are accurate or should be analyzed with caution. We performed DNA extraction from 12 tumor samples from diverse tissues and assessed DNA integrity by straightforward qPCR assays. To perform cancer panel NGS, DNA library preparation was performed using RNA capture baits. Reads were aligned to the reference human genome and mutation calls were further validated by Sanger sequencing. Results obtained by the DNA integrity assays correlated with the efficiency of the pre-capture library preparation, with coefficients of up to 0.94 (Pearson's test). Moreover, sequencing results showed that poor-integrity DNA leads to high rates of false positive mutation calls, especially C:G>T:A and C:G>A:T. Poor quality FFPE DNA samples are prone to generating false positive mutation calls. These are especially perilous in cases in which subclonal populations are expected, such as in advanced disease, since they could lead clinicians to erroneous conclusions and misguided clinical conduct. Copyright © 2017 The Authors. Published by Elsevier Inc. All rights reserved.
Jensen, Taylor J; Dzakula, Zeljko; Deciu, Cosmin; van den Boom, Dirk; Ehrich, Mathias
Efforts have been undertaken recently to assess the fetal genome through analysis of circulating cell-free (ccf) fetal DNA obtained from maternal plasma. Sequencing analysis of such ccf DNA has been shown to enable accurate prenatal detection of fetal aneuploidies, including trisomies of chromosomes 21, 18, and 13. We sought to extend these analyses to examine subchromosomal copy number variants through the sequencing of ccf DNA. We examined a clinically relevant genomic region, chromosome 22q11.2, the location of a series of well-characterized deletion anomalies that cause 22q11.2 deletion syndrome. We sequenced ccf DNA isolated from maternal plasma samples obtained from 2 patients with confirmed 22q11.2 deletion syndrome and from 14 women at low risk for fetal chromosomal abnormalities. The latter samples were used as controls, and the mean genomic coverage was 3.83-fold. Data were aligned to the human genome, repetitive regions were removed, the remaining data were normalized for GC content, and z scores were calculated for the affected region. The median fetal DNA contribution for all samples was 18%, with the affected samples containing 17%-18% fetal DNA. Using a technique similar to that used for sequencing-based fetal aneuploidy detection from maternal plasma, we detected a statistically significant loss of representation of a portion of chromosome 22q11.2 in both of the affected fetal samples. No such loss was detected in any of the control samples. Noninvasive prenatal diagnosis of subchromosomal fetal genomic anomalies is feasible with next-generation sequencing.
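The z-score step described in the abstract above can be sketched as follows. This is a minimal illustration rather than the study's pipeline: it assumes that read counts have already been aligned, filtered for repeats and GC-normalized, and that each sample is summarized by the fraction of its reads falling in the region of interest. The function name and the example cutoff of -3 are assumptions.

```python
from statistics import mean, stdev


def region_z_score(test_fraction, control_fractions):
    """Z-score of a sample's regional read fraction against control samples.

    test_fraction: normalized fraction of reads in the region for the test sample.
    control_fractions: the same quantity for euploid control samples.
    """
    mu = mean(control_fractions)
    sigma = stdev(control_fractions)  # sample standard deviation
    if sigma == 0:
        raise ValueError("control fractions have zero variance")
    return (test_fraction - mu) / sigma


def shows_loss(test_fraction, control_fractions, cutoff=-3.0):
    """True when the region is under-represented beyond the (hypothetical) cutoff."""
    return region_z_score(test_fraction, control_fractions) < cutoff
```

A strongly negative z-score indicates fewer reads than expected from the region, which is the pattern a heterozygous fetal deletion would produce once diluted by maternal DNA.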
Weisschuh, Nicole; Mayer, Anja K; Strom, Tim M
Retinal dystrophies (RD) constitute a group of blinding diseases that are characterized by clinical variability and pronounced genetic heterogeneity. The different nonsyndromic and syndromic forms of RD can be attributed to mutations in more than 200 genes. Consequently, next generation sequencin...
McPherson, Hannah; van der Merwe, Marlien; Delaney, Sven K; Edwards, Mark A; Henry, Robert J; McIntosh, Emma; Rymer, Paul D; Milner, Melita L; Siow, Juelian; Rossetto, Maurizio
Background: With high quantity and quality data production and low cost, next generation sequencing has the potential to provide new opportunities for plant phylogeographic studies on single and multiple species. Here we present an approach for in silico chloroplast DNA assembly and single nucleotide polymorphism detection from short-read shotgun sequencing. The approach is simple and effective and can be implemented using standard bioinformatic tools. Results: The chloroplast genome of Toona ...
Gürtler, Nicolas; Röthlisberger, Benno; Ludin, Katja; Schlegel, Christoph; Lalwani, Anil K
Identification of the causative mutation using next-generation sequencing in autosomal-dominant hereditary hearing impairment, as mutation analysis in hereditary hearing impairment by classic genetic methods, is hindered by the high heterogeneity of the disease. Two Swiss families with autosomal-dominant hereditary hearing impairment. Amplified DNA libraries for next-generation sequencing were constructed from extracted genomic DNA, derived from peripheral blood, and enriched by a custom-made sequence capture library. Validated, pooled libraries were sequenced on an Illumina MiSeq instrument, 300 cycles and paired-end sequencing. Technical data analysis was performed with SeqMonk, variant analysis with GeneTalk or VariantStudio. The detection of mutations in genes related to hearing loss by next-generation sequencing was subsequently confirmed using specific polymerase-chain-reaction and Sanger sequencing. Mutation detection in hearing-loss-related genes. The first family harbored the mutation c.5383+5delGTGA in the TECTA-gene. In the second family, a novel mutation c.2614-2625delCATGGCGCCGTG in the WFS1-gene and a second mutation TCOF1-c.1028G>A were identified. Next-generation sequencing successfully identified the causative mutation in families with autosomal-dominant hereditary hearing impairment. The results helped to clarify the pathogenic role of a known mutation and led to the detection of a novel one. NGS represents a feasible approach with great potential future in the diagnostics of hereditary hearing impairment, even in smaller labs.
BoonFei Tan; Charmaine Marie Ng; Jean Pierre Nshimyimana; Lay-Leng Loh; Karina Yew-Hoong Gin; Janelle Renee Thompson
Water quality is an emergent property of a complex system comprised of interacting microbial populations and introduced microbial and chemical contaminants. Studies leveraging next-generation sequencing (NGS) technologies are providing new insights into the ecology of microbially mediated processes that influence fresh water quality such as algal blooms, contaminant biodegradation, and pathogen dissemination. In addition, sequencing methods targeting small subunit (SSU) rRNA hypervariable reg...
Quaynor, SD; Bosley, ME; Duckworth, CG; Porter, KR; Kim, S-H; Kim, H-G; Chorich, LP; Sullivan, ME; Choi, J-H; Cameron, RS; Layman, LC
The genetic basis is unknown for ∼60% of normosmic hypogonadotropic hypogonadism (nHH)/Kallmann syndrome (KS). DNAs from (17 male and 31 female) nHH/KS patients were analyzed by targeted next generation sequencing (NGS) of 261 genes involved in hypothalamic, pituitary, and/or olfactory pathways, or suggested by chromosome rearrangements. Selected variants were subjected to Sanger DNA sequencing, the gold standard. The frequency of Sanger-confirmed variants was determined using the ExAC databa...
Christopher H Stuart
Breast cancer (BC) results in ≃40,000 deaths each year in the United States and even among survivors treatment of the disease may have devastating consequences, including increased risk for heart disease and cognitive impairment resulting from the toxic effects of chemotherapy. Aptamer-mediated drug delivery can contribute to improved treatment outcomes through the selective delivery of chemotherapy to BC cells, provided suitable cancer-specific antigens can be identified. We report here the use of capillary electrophoresis in conjunction with next generation sequencing to develop VBA-01 (Kd = 405 nmol/l), the first aptamer to vitronectin (VN), a protein that plays an important role in wound healing and that is present at elevated levels in BC tissue and in the blood of BC patients relative to the corresponding nonmalignant tissues. We used VBA-01 to develop DVBA-01, a dimeric aptamer complex, and conjugated doxorubicin (Dox) to DVBA-01 (7:1 ratio) using pH-sensitive, covalent linkages. Dox conjugation enhanced the thermal stability of the complex (60.2 versus 46.5 °C) and did not decrease affinity for the VN target. The resulting DVBA-01-Dox complex displayed increased cytotoxicity to MDA-MB-231 BC cells that were cultured on plasticware coated with VN (1.8 × 10⁻⁶ mol/l) relative to uncoated plates (2.4 × 10⁻⁶ mol/l), or plates coated with the related protein fibronectin (2.1 × 10⁻⁶ mol/l). The VBA-01 aptamer was evaluated for binding to human BC tissue using immunohistochemistry and displayed tissue-specific binding and apparent association with BC cells. In contrast, a monoclonal antibody that preferentially binds to multimeric VN primarily stained extracellular matrix and vessel walls of BC tissue. Our results indicate a strong potential for using VN-targeting aptamers to improve drug delivery to treat BC.
Lurier, Emily B; Dalton, Donald; Dampier, Will; Raman, Pichai; Nassiri, Sina; Ferraro, Nicole M; Rajagopalan, Ramakrishan; Sarmady, Mahdi; Spiller, Kara L
Alternatively activated "M2" macrophages are believed to function during late stages of wound healing, behaving in an anti-inflammatory manner to mediate the resolution of the pro-inflammatory response caused by "M1" macrophages. However, the differences between two main subtypes of M2 macrophages, namely interleukin-4 (IL-4)-stimulated "M2a" macrophages and IL-10-stimulated "M2c" macrophages, are not well understood. M2a macrophages are characterized by their ability to inhibit inflammation and contribute to the stabilization of angiogenesis. However, the role and temporal profile of M2c macrophages in wound healing are not known. Therefore, we performed next generation sequencing (RNA-seq) to identify biological functions and gene expression signatures of macrophages polarized in vitro with IL-10 to the M2c phenotype in comparison to M1 and M2a macrophages and an unactivated control (M0). We then explored the expression of these gene signatures in a publicly available data set of human wound healing. RNA-seq analysis showed that hundreds of genes were upregulated in M2c macrophages compared to the M0 control, with thousands of alternative splicing events. Following validation by Nanostring, 39 genes were found to be upregulated by M2c macrophages compared to the M0 control, and 17 genes were significantly upregulated relative to the M0, M1, and M2a phenotypes (using an adjusted p-value cutoff of 0.05 and fold change cutoff of 1.5). Many of the identified M2c-specific genes are associated with angiogenesis, matrix remodeling, and phagocytosis, including CD163, MMP8, TIMP1, VCAN, SERPINA1, MARCO, PLOD2, PCOCLE2 and F5. Analysis of the macrophage-conditioned media for secretion of matrix-remodeling proteins showed that M2c macrophages secreted higher levels of MMP7, MMP8, and TIMP1 compared to the other phenotypes. Interestingly, temporal gene expression analysis of a publicly available microarray data set of human wound healing showed that M2c-related genes were
Penile cancer (PeCa) is a relatively rare tumor entity but has higher morbidity and mortality rates, especially in developing countries. To date, the specific pathogenic signaling pathways and core machinery involved in tumorigenesis and progression of PeCa remain to be elucidated. Several studies have suggested that miRNAs, which modulate gene expression at the posttranscriptional level, are frequently mis-regulated and aberrantly expressed in human cancers. However, the miRNA profile in human PeCa has not been reported before. In the present study, the miRNA profile was obtained from 10 fresh penile cancerous tissues and matched adjacent non-cancerous tissues via next-generation sequencing. A total of 751 and 806 annotated miRNAs were identified in normal and cancerous penile tissues, respectively, among which 56 miRNAs had significantly different expression levels between paired tissues. Subsequently, several annotated miRNAs were selected randomly and validated using quantitative real-time PCR. Compared with previous publications on altered miRNA expression in various cancers, especially genitourinary (prostate, bladder, kidney, testis) cancers, the great majority of deregulated miRNAs showed a similar expression pattern in penile cancer. Moreover, the bioinformatics analyses suggested that the putative target genes of miRNAs differentially expressed between cancerous and matched normal penile tissues were tightly associated with cell junction, proliferation, growth and genomic instability, through modulation of the Wnt, MAPK, p53, PI3K-Akt, Notch and TGF-β signaling pathways, all of which are well established to participate in cancer initiation and progression. Our work presents a global view of the differentially expressed miRNAs and potentially regulatory networks of their target genes for clarifying the pathogenic transformation of normal penis to PeCa, which research resource also
Maji, Ranjan Kumar; Sarkar, Arijita; Khatua, Sunirmal; Dasgupta, Subhasis; Ghosh, Zhumur
High-throughput Next-Generation Sequencing (NGS) techniques are advancing genomics and molecular biology research. This technology generates very large volumes of data, which poses a major challenge for scientists seeking an efficient, cost- and time-effective way to analyse them. Further, across the different types of NGS data, certain challenging analysis steps are common. Spliced alignment is one such fundamental step in NGS data analysis, and it is extremely computationally intensive as well as time consuming. Serious problems exist even with the most widely used spliced alignment tools. TopHat is one such widely used spliced alignment tool which, although it supports multithreading, does not efficiently utilize computational resources in terms of CPU utilization and memory. Here we introduce PVT (Pipelined Version of TopHat), in which we take a modular approach by breaking TopHat's serial execution into a pipeline of multiple stages, thereby increasing the degree of parallelization and computational resource utilization. Thus we address the inefficiencies in TopHat so as to analyze large NGS data efficiently. We analysed the SRA datasets SRX026839 and SRX026838, consisting of single-end reads, and the SRA dataset SRR1027730, consisting of paired-end reads. We used TopHat v2.0.8 to analyse these datasets and noted the CPU usage, memory footprint and execution time during spliced alignment. With this basic information, we designed PVT, a pipelined version of TopHat that removes the redundant computational steps during 'spliced alignment' and breaks the job into a pipeline of multiple stages (each comprising different step(s)) to improve its resource utilization, thus reducing the execution time. PVT provides an improvement over TopHat for spliced alignment in NGS data analysis. PVT reduced the execution time to ~23% for the single-end read dataset. Further, PVT designed for paired end reads showed an
Xue, J J; Xue, J F; Xue, H Q; Guo, Y Y; Liu, Y; Ouyang, N
Albinism is a diverse group of hypopigmentary disorders caused by multiple-genetic defects. The genetic diagnosis of patients affected with albinism by Sanger sequencing is often complex, expensive, and time-consuming. In this study, we performed targeted next-generation sequencing to screen for 16 genes in a patient with albinism, and identified 21 genetic variants, including 19 known single nucleotide polymorphisms, one novel missense mutation (c.1456 G>A), and one disease-causing mutation (c.478 G>C). The novel mutation was not observed in 100 controls, and was predicted to be a damaging mutation by SIFT and Polyphen. Thus, we identified a novel mutation in SLC45A2 in a Chinese family, expanding the mutational spectrum of albinism. Our results also demonstrate that targeted next-generation sequencing is an effective genetic test for albinism.
Vanni, Irene; Coco, Simona; Truini, Anna; Rusmini, Marta; Dal Bello, Maria Giovanna; Alama, Angela; Banelli, Barbara; Mora, Marco; Rijavec, Erika; Barletta, Giulia; Genova, Carlo; Biello, Federica; Maggioni, Claudia; Grossi, Francesco
Next-generation sequencing (NGS) is a cost-effective technology capable of screening several genes simultaneously; however, its application in a clinical context requires an established workflow to acquire reliable sequencing results. Here, we report an optimized NGS workflow analyzing 22 lung cancer-related genes to sequence critical samples such as DNA from formalin-fixed paraffin-embedded (FFPE) blocks and circulating free DNA (cfDNA). Snap frozen and matched FFPE gDNA from 12 non-small cell lung cancer (NSCLC) patients, whose gDNA fragmentation status was previously evaluated using a multiplex PCR-based quality control, were successfully sequenced with Ion Torrent PGM™. The robust bioinformatic pipeline allowed us to correctly call both Single Nucleotide Variants (SNVs) and indels with a detection limit of 5%, achieving 100% specificity and 96% sensitivity. This workflow was also validated in 13 FFPE NSCLC biopsies. Furthermore, a specific protocol for low input gDNA capable of producing good sequencing data with high coverage, high uniformity, and a low error rate was also optimized. In conclusion, we demonstrate the feasibility of obtaining gDNA from FFPE samples suitable for NGS by performing appropriate quality controls. The optimized workflow, capable of screening low input gDNA, highlights NGS as a potential tool in the detection, disease monitoring, and treatment of NSCLC.
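The performance figures quoted above (detection limit, sensitivity, specificity) follow from standard definitions, which the short sketch below makes explicit. The function names and the counts used in the usage note are illustrative assumptions and are not taken from the study.

```python
def sensitivity(true_pos, false_neg):
    """Fraction of truly present variants that the pipeline called."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg, false_pos):
    """Fraction of truly absent variants that the pipeline correctly left uncalled."""
    return true_neg / (true_neg + false_pos)


def passes_detection_limit(alt_reads, depth, limit=0.05):
    """True when the variant allele fraction reaches the detection limit.

    limit=0.05 mirrors the 5% detection limit mentioned in the abstract.
    """
    if depth <= 0:
        return False
    return alt_reads / depth >= limit
```

For example, with hypothetical counts of 24 true positives and 1 false negative, sensitivity(24, 1) gives 0.96, and with no false positives specificity is 1.0.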
Liu, Jing; Wang, Hua; Xi, Hui; Jia, Zhengjun; Zhou, Yuchun; Wu, Lingqian
To explore the value of next-generation sequencing for the non-invasive prenatal testing of fetal chromosomal aneuploidies. Plasma from 4004 women with singleton pregnancy at a gestational age between 12-35(+5) weeks was collected prior to amniocentesis between April 19th 2011 and December 31st 2013. The samples were divided into three groups: (1) High risk for Down syndrome by biochemical screening; (2) Advanced maternal age; (3) Abnormalities by ultrasound or other methods. Plasma DNA extracted from the above samples was sequenced at low coverage. Positive results were verified against the karyotypes of the fetuses. For those with negative results, the fetuses were followed up by telephone call for at least six months after birth. Among 4003 samples subjected to non-invasive prenatal diagnosis, 66 (1.65%) had a positive result. In group 1, 22 cases of trisomy 21 (T21), 3 cases of trisomy 18 (T18), 1 case of trisomy 13 (T13), 8 cases of 45,X and 2 cases of other chromosomal abnormality were detected. In group 2, 13 cases of T21, 2 cases of T18, 1 case of T13, 5 cases of 45,X, 2 cases of 47,XXN and 1 case of other chromosomal abnormality were detected. In group 3, 1 case of T21, 1 case of T18, 1 case of T13, and 3 cases of 47,XXN were detected. Among the 55 samples that underwent prenatal diagnosis, 30 cases of T21 and 4 cases of T18 were discovered, which was consistent with the results of non-invasive prenatal diagnosis. Of the 13 cases indicated as 45,X, 3 were verified by karyotype analysis, 2 were verified as mosaicism (45,X/46,XN), and 8 were 46,XN (false positives). Of the 5 cases indicated as 47,XXN, 2 were verified by karyotype analysis and the other 3 were 46,XN (false positives). Karyotypes of the 3 cases suspected of other chromosomal abnormalities were all verified as 46,XN (false positives). Until May 1st 2014, telephone follow-up for those with negative screening results only identified a boy with facial abnormalities and developmental delay, which was similar to his older
Matthew C Hiemenz
Next-generation sequencing (NGS) is a powerful platform for identifying cancer mutations. Routine clinical adoption of NGS requires optimized quality control metrics to ensure accurate results. To assess the robustness of our clinical NGS pipeline, we analyzed the results of 304 solid tumor and hematologic malignancy specimens tested simultaneously by NGS and one or more targeted single-gene tests (EGFR, KRAS, BRAF, NPM1, FLT3, and JAK2). For samples that passed our validated tumor percentage and DNA quality and quantity thresholds, there was perfect concordance between NGS and targeted single-gene tests with the exception of two FLT3 internal tandem duplications that fell below the stringent pre-established reporting threshold but were readily detected by manual inspection. In addition, NGS identified clinically significant mutations not covered by single-gene tests. These findings confirm NGS as a reliable platform for routine clinical use when appropriate quality control metrics, such as tumor percentage and DNA quality cutoffs, are in place. Based on our findings, we suggest a simple workflow that should facilitate adoption of clinical oncologic NGS services at other institutions.
Shang, Xuan; Peng, Zhiyu; Ye, Yuhua; Asan; Zhang, Xinhua; Chen, Yan; Zhu, Baosheng; Cai, Wangwei; Chen, Shaoke; Cai, Ren; Guo, Xiaoling; Zhang, Chonglin; Zhou, Yuqiu; Huang, Shuodan; Liu, Yanhui; Chen, Biyan; Yan, Shanhuo; Chen, Yajun; Ding, Hongmei; Yin, Xiaolin; Wu, Liusong; He, Jing; Huang, Dongai; He, Sheng; Yan, Tizhen; Fan, Xin; Zhou, Yuehong; Wei, Xiaofeng; Zhao, Sumin; Cai, Decheng; Guo, Fengyu; Zhang, Qianqian; Li, Yun; Zhang, Xuelian; Lu, Haorong; Huang, Huajie; Guo, Junfu; Zhu, Fei; Yuan, Yuan; Zhang, Li; Liu, Na; Li, Zhiming; Jiang, Hui; Zhang, Qiang; Zhang, Yijia; Juhari, Wan Khairunnisa Wan; Hanafi, Sarifah; Zhou, Wanjun; Xiong, Fu; Yang, Huanming; Wang, Jian; Zilfalil, Bin Alwi; Qi, Ming; Yang, Yaping; Yin, Ye; Mao, Mao; Xu, Xiangmin
Hemoglobinopathies are among the most common autosomal-recessive disorders worldwide. A comprehensive next-generation sequencing (NGS) test would greatly facilitate screening and diagnosis of these disorders. An NGS panel targeting the coding regions of hemoglobin genes and four modifier genes was designed. We validated the assay by using 2522 subjects affected with hemoglobinopathies and applied it to carrier testing in a cohort of 10,111 couples who were also screened through traditional methods. In the clinical genotyping analysis of 1182 β-thalassemia subjects, we identified a group of additional variants that can be used for accurate diagnosis. In the molecular screening analysis of the 10,111 couples, we detected 4180 individuals in total who carried 4840 mutant alleles, and identified 186 couples at risk of having affected offspring. Of the pathogenic or likely pathogenic variants identified by our NGS assay, 12.1% were undetectable by traditional methods. Compared with the traditional methods, our assay identified an additional 35 at-risk couples. We describe a comprehensive NGS-based test that offers advantages over the traditional screening/molecular testing methods. To our knowledge, this is among the first large-scale population studies to systematically evaluate the application of an NGS technique in carrier screening and molecular diagnosis of hemoglobinopathies. Copyright © 2017 The Authors. Published by Elsevier B.V. All rights reserved.
Hiemenz, Matthew C; Kadauke, Stephan; Lieberman, David B; Roth, David B; Zhao, Jianhua; Watt, Christopher D; Daber, Robert D; Morrissette, Jennifer J D
Next-generation sequencing (NGS) is a powerful platform for identifying cancer mutations. Routine clinical adoption of NGS requires optimized quality control metrics to ensure accurate results. To assess the robustness of our clinical NGS pipeline, we analyzed the results of 304 solid tumor and hematologic malignancy specimens tested simultaneously by NGS and one or more targeted single-gene tests (EGFR, KRAS, BRAF, NPM1, FLT3, and JAK2). For samples that passed our validated tumor percentage and DNA quality and quantity thresholds, there was perfect concordance between NGS and targeted single-gene tests with the exception of two FLT3 internal tandem duplications that fell below the stringent pre-established reporting threshold but were readily detected by manual inspection. In addition, NGS identified clinically significant mutations not covered by single-gene tests. These findings confirm NGS as a reliable platform for routine clinical use when appropriate quality control metrics, such as tumor percentage and DNA quality cutoffs, are in place. Based on our findings, we suggest a simple workflow that should facilitate adoption of clinical oncologic NGS services at other institutions.
Dario de Biase
The use of endoscopic ultrasonography has allowed for improved detection and pathologic analysis of fine needle aspirate material for pancreatic lesion diagnosis. The molecular analysis of KRAS has further improved the clinical sensitivity of preoperative analysis. For this reason, the use of molecular tests with high analytical sensitivity and specificity in the analysis of fine needle aspirate specimens has become of great importance. In the present study, 60 specimens from endoscopic ultrasonography fine needle aspirates were analyzed for KRAS exon 2 and exon 3 mutations, using three different techniques: Sanger sequencing, allele-specific locked nucleic acid PCR and next-generation sequencing (454 GS-Junior, Roche). Moreover, KRAS was also tested in wild-type samples, starting from DNA obtained from cytological smears after pathological evaluation. The clinical sensitivity for detection of KRAS mutations was 42.1% for Sanger sequencing, 52.8% for allele-specific locked nucleic acid PCR and 73.7% for next-generation sequencing. In two wild-type cases, re-sequencing starting from selected material detected a KRAS mutation, increasing the clinical sensitivity of next-generation sequencing to 78.95%. The present study demonstrated that the performance of molecular analysis could be improved by using techniques with high analytical sensitivity. Next-generation sequencing increased the clinical sensitivity of the test without decreasing the specificity of the analysis. Moreover, we observed that it could be useful to repeat the analysis starting from selectable material, such as cytological smears, to avoid false negative results.
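The clinical sensitivity and specificity figures quoted in this abstract follow the usual definitions. The sketch below is only an illustration of that arithmetic: the function names and the counts are hypothetical and are not taken from the study.

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """Fraction of truly mutated cases that the assay reports as mutated."""
    total = true_pos + false_neg
    if total == 0:
        raise ValueError("no mutated cases to evaluate")
    return true_pos / total


def specificity(true_neg: int, false_pos: int) -> float:
    """Fraction of truly wild-type cases that the assay reports as wild-type."""
    total = true_neg + false_pos
    if total == 0:
        raise ValueError("no wild-type cases to evaluate")
    return true_neg / total


# Hypothetical counts (NOT from the study): re-testing rescues two
# false negatives, which raises sensitivity while specificity is unchanged.
before = sensitivity(true_pos=30, false_neg=10)
after = sensitivity(true_pos=32, false_neg=8)
spec = specificity(true_neg=20, false_pos=0)
print(f"sensitivity {before:.1%} -> {after:.1%}, specificity {spec:.1%}")
```

Converting a false negative into a true positive changes only the numerator of the sensitivity, which is why repeating the analysis on selected material can raise sensitivity without affecting specificity.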
Boga, Hamadi Iddi; Anami, Sylvester Elikana; Mehari, Tadesse; Budambula, Nancy L. M.
Human pathogens can survive and grow in hot springs. For water quality assessment, Escherichia coli or Enterococci are the main thermotolerant enteric bacteria commonly used to estimate the load of pathogenic bacteria in water. However, most environmental bacteria are unculturable, so culture methods may bias the detection of most pathogens. Illumina sequencing can provide a more comprehensive and accurate insight into environmental bacterial pathogens, which can be used to develop better risk assessment methods and promote public health awareness. In this study, high-throughput Illumina sequencing was used to identify bacterial pathogens from five hot springs in Eritrea: Maiwooi, Akwar, Garbanabra, Elegedi and Gelti. Water samples were collected from the five hot springs. Total community DNA was extracted from samples using the phenol-chloroform method. The 16S rRNA gene variable region (V4-V7) of the extracted DNA was amplified and library construction was done according to the Illumina sequencing protocol. The sequence reads (length >200 bp) from Illumina sequencing libraries ranged from 22,091 sequences in the wet sediment sample from Garbanabra to 155,789 sequences in the mat sample from Elegedi. Taxonomy was assigned to each OTU using BLASTn against a curated database derived from GreenGenes, RDPII, SILVA SSU Reference 119 and NCBI. The proportion of potential pathogens from the water samples was highest in Maiwooi (17.8%), followed by Gelti (16.7%), Akwar (13.6%) and Garbanabra (10.9%). Although the number of DNA sequence reads from Illumina sequencing was very high for Elegedi (104,328), the corresponding proportion of potential pathogens was very low (3.6%). Most of the potential pathogenic bacterial sequences identified were from Proteobacteria and Firmicutes. Legionella and Clostridium were the most commonly detected genera, with different species. Most of the potential pathogens were detected from the water samples. However, sequences belonging to
Liu, Guodong; Li, Zhihua; Lin, Yuefeng; John, Bino
We developed NameMyGene, a web tool and a stand-alone program to easily generate putative family-based names for small RNA sequences so that laboratories can easily organize, analyze, and observe patterns in the massive amount of data generated by next-generation sequencers. NameMyGene, also applicable to other emerging methods such as RNA-Seq and ChIP-Seq, solely uses the input small RNA sequence and does not require any additional data such as other sequence data sets. The web server an...
Ampofo, Krow; Pavia, Andrew; Blaschke, Anne J; Schlaberg, Robert
Background: Species-specific polymerase chain reaction (PCR) testing of pleural fluid (PF) from children with parapneumonic effusion (PPE) has increased pathogen identification in pediatric PPE. However, a pathogen is not detected in 25–35% of cases. Hypothesis-free, next-generation sequencing (NGS) provides a more comprehensive alternative and has led to pathogen detection in PCR-negative samples. However, the utility of NGS in the evaluation of PF from children with PPE is unknown. Methods: Archived PF (n = 20) from children younger than 18 years with PPE who were hospitalized at Primary Children’s Hospital, Utah, in 2015 and previously tested by PCR were evaluated. Ten PCR-negative and 10 PCR-positive PF specimens were tested using RNA-seq at an average depth of 7.7 × 10^6 sequencing reads per sample. NGS data were analyzed with Taxonomer. We compared pathogens detected by blood and PF culture, PCR, and NGS. Results: Overall, compared with blood/PF culture, PCR and NGS testing of PF increased bacterial identification from 15% to 50% (P < 0.05) and 65% (P = 0.003), respectively. Pathogen detection in PF by PCR and NGS was comparable (50 vs. 65%, P = NS) (Table). However, compared with PF PCR, NGS significantly increased detection of S. pyogenes (20% vs. 55%; P < 0.05), with 100% concordance when detected by PCR and culture. Detection of Fusobacterium spp. (10 vs. 10%) by PF NGS and PF PCR was comparable. In contrast, there was no detection of S. pneumoniae (15 vs. 0%) by PF NGS compared with PF PCR. Conclusion: PF NGS testing significantly improves bacterial identification and is comparable to PF PCR testing, which can help inform antimicrobial selection. However, there were differences in detection of S. pneumoniae and S. pyogenes. Further studies of NGS testing of PF of children with PPE are needed to assess its potential in the evaluation of PPE in children.
Microdeletions at exon 19 are the most frequent genetic alterations affecting the Epidermal Growth Factor Receptor (EGFR) gene in non-small cell lung cancer (NSCLC) and they are strongly associated with response to treatment with tyrosine kinase inhibitors. A series of 116 NSCLC DNA samples investigated by Sanger Sequencing (SS), including 106 samples carrying exon 19 EGFR deletions and 10 without deletions (control samples), were subjected to deep next generation sequencing (NGS). All samples with deletions at SS showed deletions with NGS. No deletions were seen in control cases. In 93 (88%) cases, deletions detected by NGS corresponded exactly to those identified by SS. In 13 cases (12%) NGS resolved deletions not accurately characterized by SS. In 21 (20%) cases NGS showed the presence of complex (double/multiple) frameshift deletions producing a net in-frame change. In 5 of these cases SS could not define the exact sequence of mutant alleles; in the other 16 cases the results obtained by SS were conventionally considered as deletions plus insertions. Different interpretative hypotheses for complex mutations are discussed. In 46 (43%) tumors deep NGS showed, for the first time to our knowledge, subpopulations of DNA molecules carrying EGFR deletions different from the main one. Each of these subpopulations accounted for 0.1% to 17% of the genomic DNA in the different tumors investigated. Our findings suggest that a region in exon 19 is highly unstable in a large proportion of patients carrying EGFR deletions. As a corollary to this study, NGS data were compared with those obtained by immunohistochemistry using the 6B6 anti-mutant EGFR antibody. The immunoreaction was E746-A750del specific. In conclusion, NGS analysis of EGFR exon 19 in NSCLCs allowed us to formulate a new interpretative hypothesis for complex mutations and revealed the presence of subpopulations of deletions with potential pathogenetic and clinical impact.
Purpose: To show early, rapid and accurate molecular diagnosis of occult macular dystrophy (OMD) in a four-generation Chinese family with inherited macular dystrophy. Methods: In the current study, we comprehensively screened 130 genes involved in common inherited non-syndromic eye diseases with next-generation sequencing-based target capture sequencing of the proband of a four-generation Chinese family that has suffered from maculopathy without a definitive diagnosis for over 10 years. Variants were filtered and analyzed to identify possible disease-causing variants before validation by Sanger sequencing. Results: Two heterozygous mutations, RP1L1 c.133C>T (p.Arg45Trp), which is a hot spot for OMD, and ABCA4 c.6119G>A (p.Arg2040Gln), which was identified in Stargardt’s disease, were found in three patients, but neither of the mutations was found in the unaffected, phenotypically normal individuals in the same family or in the normal control volunteers. Conclusion: These results not only confirm the diagnosis of OMD in the proband, but also provide presymptomatic diagnosis of the proband’s children before the onset of visual acuity impairment and guidance regarding the prognosis and management of these patients. Heterozygous mutations of RP1L1 c.133C>T (p.Arg45Trp) and ABCA4 c.6119G>A (p.Arg2040Gln) are likely responsible for OMD. Our results further extend our current understanding of the genetic basis of OMD, and emphasize the importance of molecular diagnosis and genetic counseling for OMD.
Sheema Abdul Aziz
There is an urgent need to identify and understand the ecosystem services of pollination and seed dispersal provided by threatened mammals such as flying foxes. The first step towards this is to obtain comprehensive data on their diet. However, the volant and nocturnal nature of bats presents a particularly challenging situation, and conventional microhistological approaches to studying their diet can be laborious and time-consuming, and provide incomplete information. We used Illumina Next-Generation Sequencing (NGS) as a novel, non-invasive method for analysing the diet of the island flying fox (Pteropus hypomelanus) on Tioman Island, Peninsular Malaysia. Through DNA metabarcoding of plants in flying fox droppings, using primers targeting the rbcL gene, we identified at least 29 Operational Taxonomic Units (OTUs) comprising the diet of this giant pteropodid. OTU sequences matched at least four genera and 14 plant families from online reference databases based on a conservative Least Common Ancestor approach, and eight species from our site-specific plant reference collection. NGS was just as successful as conventional microhistological analysis in detecting plant taxa from droppings, but also uncovered six additional plant taxa. The island flying fox’s diet appeared to be dominated by figs (Ficus sp.), which was the most abundant plant taxon detected in the droppings every single month. Our study has shown that NGS can add value to the conventional microhistological approach in identifying food plant species from flying fox droppings. At this point in time, more accurate genus- and species-level identification of OTUs not only requires support from databases with more representative sequences of relevant plant DNA, but probably necessitates in situ collection of plant specimens to create a reference collection. Although this method cannot be used to quantify true abundance or proportion of plant species, nor plant parts consumed, it ultimately
Quail Michael A
Background: Next generation sequencing (NGS) technology has revolutionized genomic and genetic research. The pace of change in this area is rapid with three major new sequencing platforms having been released in 2011: Ion Torrent’s PGM, Pacific Biosciences’ RS and the Illumina MiSeq. Here we compare the results obtained with those platforms to the performance of the Illumina HiSeq, the current market leader. In order to compare these platforms, and get sufficient coverage depth to allow meaningful analysis, we have sequenced a set of 4 microbial genomes with mean GC content ranging from 19.3 to 67.7%. Together, these represent a comprehensive range of genome content. Here we report our analysis of that sequence data in terms of coverage distribution, bias, GC distribution, variant detection and accuracy. Results: Sequence generated by Ion Torrent, MiSeq and Pacific Biosciences technologies displays near perfect coverage behaviour on GC-rich, neutral and moderately AT-rich genomes, but a profound bias was observed upon sequencing the extremely AT-rich genome of Plasmodium falciparum on the PGM, resulting in no coverage for approximately 30% of the genome. We analysed the ability to call variants from each platform and found that we could call slightly more variants from Ion Torrent data compared to MiSeq data, but at the expense of a higher false positive rate. Variant calling from Pacific Biosciences data was possible but higher coverage depth was required. Context specific errors were observed in both PGM and MiSeq data, but not in that from the Pacific Biosciences platform. Conclusions: All three fast turnaround sequencers evaluated here were able to generate usable sequence. However, there are key differences between the quality of that data and the applications it will support.
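The platform comparison above is organized around genome GC content (19.3 to 67.7%). As a minimal sketch of how that quantity is computed for a single sequence, assuming plain A/C/G/T strings and ignoring ambiguous bases (the function name and the example sequences are hypothetical, not from the study):

```python
def gc_content(sequence: str) -> float:
    """Return the percentage of G and C among the unambiguous bases (A, C, G, T)."""
    seq = sequence.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    if gc + at == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    return 100.0 * gc / (gc + at)


# Toy sequences chosen for illustration only.
at_rich = gc_content("ATATATATGC")   # 2 of 10 bases are G or C
gc_rich = gc_content("GGCCGGCCAT")   # 8 of 10 bases are G or C
print(at_rich, gc_rich)
```

Ambiguity codes such as N are excluded from the denominator here; that is a design choice of this sketch, and other tools may count them differently.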
Niba, Emma Tabe Eko; Tran, Van Khanh; Tuan-Pham, Le Anh; Vu, Dung Chi; Nguyen, Ngoc Khanh; Ta, Van Thanh; Tran, Thinh Huy; Lee, Tomoko; Takeshima, Yasuhiro; Matsuo, Masafumi
Duchenne muscular dystrophy (DMD) is the most common inherited muscular disease and is caused by mutations in the DMD gene on the X-chromosome. Multiplex ligation-dependent probe amplification (MLPA) is recognized as a convenient and reliable technique to detect exon deletion/duplication mutations in the DMD gene. Here, we applied targeted semiconductor next-generation sequencing to clarify the cause of ambiguous MLPA results. Targeted semiconductor next-generation sequencing was carried out using the Inherited Disease Panel (IDP) on the Ion Torrent Personal Genome Machine (PGM). MLPA analysis disclosed an unclassifiable relative peak ratio of exon 18 in a boy with DMD. His female cousin was indicated to have an exon 18 deletion in one allele. To resolve these incompatible results, targeted next-generation sequencing was conducted. A nucleotide change, c.2227C>T, creating a premature stop codon, was found in exon 18. Concomitantly, both C and T nucleotides were identified in his cousin's genome. The ambiguous values of the relative peak ratio in MLPA were considered to be due to the one-nucleotide mismatch between the genomic sequence and the probe used in MLPA. Analysis using IDP on PGM disclosed a nonsense mutation in the DMD gene as the cause of the ambiguous MLPA results. Copyright © 2014 Elsevier B.V. All rights reserved.
Xu, Yijuan; Thomsen, Trine Rolighed; Lorenzen, Jan
implant-related infection is believed to be linked to pedicle screw loosening after spine surgery. Low-grade bacterial infection can be hard to diagnose and may be undetected by conventional culture based methods. Next generation sequencing (NGS) could help to uncover hidden bacterial infections...... (v.1.20).” Results: “Clinically there were no signs of local or general infection. Serum parameters were normal (C-reactive protein 0.7 mg/L, WBC 6.2 Gpt/L) at revision surgery. No other infectious foci were noticed. Histology showed no signs of infection. Routine microbial culturing was negative......Title: Use of next generation sequencing to detect biofilm bacteria in a patient with pedicle screw loosening after spine surgery: a case report Yijuan Xu1, Trine Rolighed Thomsen1,2, Jan Lorenzen1, Kathrin Chamaon3, Per Trobisch4, Steffen Drange3 1. Danish Technological Institute, Aarhus, Denmark...
Hansen, Nancy F; Gartner, Jared J; Mei, Lan; Samuels, Yardena; Mullikin, James C
Extensive DNA sequencing of tumor and matched normal samples using exome and whole-genome sequencing technologies has enabled the discovery of recurrent genetic alterations in cancer cells, but variability in stromal contamination and subclonal heterogeneity still present a severe challenge to available detection algorithms. Here, we describe publicly available software, Shimmer, which accurately detects somatic single-nucleotide variants using statistical hypothesis testing with multiple testing correction. This program produces somatic single-nucleotide variant predictions with significantly higher sensitivity and accuracy than other available software when run on highly contaminated or heterogeneous samples, and it gives comparable sensitivity and accuracy when run on samples of high purity. http://www.github.com/nhansen/Shimmer
Amr TM Saeb; Mohamed Abouelhoda; Manojkumar Selvaraju; Sahar I Althawadi; Maysoon Mutabagani; Mohammad Adil; Abdullah Al Hokail; Hamsa T Tayeb
Clostridium haemolyticum is the causal agent of bacillary hemoglobinuria in cattle, goats, sheep, and other ruminants. In this study, we report the first recorded human-infecting C. haemolyticum strain collected from an 18-year-old woman diagnosed with acute lymphoblastic leukemia. After failure of traditional techniques, only next-generation sequencing (NGS) technology in combination with bioinformatics, phylogenetic, and pathogenomics analyses revealed that our King Faisal Specialist Hospital and ...
Yang, Lei; Naylor, Gavin J P
We determined the complete mitochondrial genome sequence (16,760 bp) of the peacock skate Pavoraja nitida using a long-PCR based next generation sequencing method. It has 13 protein-coding genes, 22 tRNA genes, 2 rRNA genes, and 1 control region in the typical vertebrate arrangement. Primers, protocols, and procedures used to obtain this mitogenome are provided. We anticipate that this approach will facilitate rapid collection of mitogenome sequences for studies on phylogenetic relationships, population genetics, and conservation of cartilaginous fishes.
Stadler, Zsofia K.; Battaglin, Francesca; Middha, Sumit; Hechtman, Jaclyn F.; Tran, Christina; Cercek, Andrea; Yaeger, Rona; Segal, Neil H.; Varghese, Anna M.; Reidy-Lagunes, Diane L.; Kemeny, Nancy E.; Salo-Mullen, Erin E.; Ashraf, Asad; Weiser, Martin R.; Garcia-Aguilar, Julio; Robson, Mark E.; Offit, Kenneth; Arcila, Maria E.; Berger, Michael F.; Shia, Jinru; Solit, David B.
Purpose: Tumor screening for Lynch syndrome is recommended in all or most patients with colorectal cancer (CRC). In metastatic CRC, sequencing of RAS/BRAF is necessary to guide clinical management. We hypothesized that a next-generation sequencing (NGS) panel that identifies RAS/BRAF and other actionable mutations could also reliably identify tumors with DNA mismatch repair protein deficiency (MMR-D) on the basis of increased mutational load. Methods: We identified all CRCs that underwent genomic mutation profiling with a custom NGS assay (MSK-IMPACT) between March 2014 and July 2015. Tumor mutational load, with exclusion of copy number changes, was determined for each case and compared with MMR status as determined by routine immunohistochemistry. Results: Tumors from 224 patients with unique CRC analyzed for MMR status also underwent MSK-IMPACT. Thirteen percent (n = 28) exhibited MMR-D by immunohistochemistry. Using the 341-gene assay, 100% of the 193 tumors with < 20 mutations were MMR-proficient. Of 31 tumors with ≥ 20 mutations, 28 (90%) were MMR-D. The three remaining tumors were easily identified as being distinct from the MMR-D tumors, with > 150 mutations each. Each of these tumors harbored the P286R hotspot POLE mutation consistent with the ultramutator phenotype. Among MMR-D tumors, the median number of mutations was 50 (range, 20 to 90) compared with six (range, 0 to 17) in MMR-proficient/POLE wild-type tumors (P < .001). With a mutational load cutoff of ≥ 20 and < 150 for MMR-D detection, sensitivity and specificity were both 1.0 (95% CI, 0.93 to 1.0). Conclusion: A cutoff for mutational load can be identified via multigene NGS tumor profiling, which provides a highly accurate means of screening for MMR-D in the same assay that is used for tumor genotyping. PMID:27022117
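The mutational load cutoff reported above (≥ 20 and < 150 mutations for MMR-D detection) amounts to a simple range check. The following is a hedged sketch of that rule only: the function name and the category labels are assumptions made for illustration, and this is not the authors' pipeline.

```python
MMR_D_MIN = 20    # lower bound stated in the abstract (>= 20 mutations)
MMR_D_MAX = 150   # upper bound stated in the abstract (< 150 mutations)


def classify_by_mutational_load(n_mutations: int) -> str:
    """Place a tumor's mutation count relative to the reported MMR-D window.

    The returned labels are hypothetical names for this sketch.
    """
    if n_mutations < 0:
        raise ValueError("mutation count cannot be negative")
    if n_mutations < MMR_D_MIN:
        return "below MMR-D range"
    if n_mutations < MMR_D_MAX:
        return "MMR-D range"
    return "above MMR-D range"


# 6 and 50 are the medians quoted in the abstract; 200 is a made-up value.
for count in (6, 50, 200):
    print(count, classify_by_mutational_load(count))
```

The half-open interval keeps the boundaries unambiguous: a count of exactly 20 falls inside the window and a count of exactly 150 falls outside it.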
Zhou, Wei; Hu, Yiyi; Sui, Zhenghong; Fu, Feng; Wang, Jinguo; Chang, Lianpeng; Guo, Weihua; Li, Binbin
Gracilariopsis lemaneiformis has a high economic value and is one of the most important aquaculture species in China. Despite its economic importance, it has remained largely unstudied at the genomic level. In this study, we conducted a genome survey of Gp. lemaneiformis using next-generation sequencing (NGS) technologies. In total, 18.70 Gb of high-quality sequence data with an estimated genome size of 97 Mb were obtained by HiSeq 2000 sequencing for Gp. lemaneiformis. These reads were assembled into 160,390 contigs with an N50 length of 3.64 kb, which were further assembled into 125,685 scaffolds with a total length of 81.17 Mb. Genome analysis predicted 3490 genes and a GC content of 48%. The identified genes have an average transcript length of 1,429 bp, an average coding sequence size of 1,369 bp, 1.36 exons per gene, exon length of 1,008 bp, and intron length of 191 bp. From the initial assembled scaffold, transposable elements constituted 54.64% (44.35 Mb) of the genome, and 7737 simple sequence repeats (SSRs) were identified. Among these SSRs, the trinucleotide repeat type was the most abundant (up to 73.20% of total SSRs), followed by the di- (17.41%), tetra- (5.49%), hexa- (2.90%), and penta- (1.00%) nucleotide repeat types. These characteristics suggest that Gp. lemaneiformis is a model organism for genetic study. This is the first report of genome-wide characterization within this taxon. PMID:23875008
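The contig N50 quoted above is a standard assembly statistic: the contig length at which the running total of lengths, taken from longest to shortest, first reaches half of the assembly. A minimal sketch of that calculation follows; the function name and the toy lengths are hypothetical and unrelated to the Gp. lemaneiformis assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths."""
    if not lengths:
        raise ValueError("at least one length is required")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first contig that pushes the running total to half the assembly
        # defines the N50.
        if running >= half_total:
            return length


# Toy example: total = 290, half = 145; 80 + 70 = 150 reaches it at length 70.
toy_contigs = [80, 70, 50, 40, 30, 20]
print(n50(toy_contigs))
```

Because N50 is weighted by length, a few long contigs can dominate it, which is why it is usually reported together with the contig count and total assembly length, as in the abstract.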
Jeffrey W Koehler
A detailed understanding of the circulating pathogens in a particular geographic location aids in effectively utilizing targeted, rapid diagnostic assays, thus allowing for appropriate therapeutic and containment procedures. This is especially important in regions where highly pathogenic viruses co-circulate with other endemic pathogens such as the malaria parasite. The importance of biosurveillance is highlighted by the ongoing Ebola virus disease outbreak in West Africa. For example, a more comprehensive assessment of the regional pathogens could have identified the risk of a filovirus disease outbreak earlier and led to an improved diagnostic and response capacity in the region. In this context, being able to rapidly screen a single sample for multiple pathogens in a single tube reaction could improve both diagnostics and pathogen surveillance. Here, probes were designed to capture identifying filovirus sequence for the ebolaviruses Sudan, Ebola, Reston, Taï Forest, and Bundibugyo and the Marburg virus variants Musoke, Ci67, and Angola. These probes were combined into a single probe panel, and the captured filovirus sequence was successfully identified using the MiSeq next-generation sequencing platform. This panel was then used to identify the specific filovirus from nonhuman primates experimentally infected with Ebola virus as well as Bundibugyo virus in human sera samples from the Democratic Republic of the Congo, thus demonstrating the utility for pathogen detection using clinical samples. While not as sensitive and rapid as real-time PCR, this panel, along with additional sequence capture probe panels, could be used for broad pathogen screening and biosurveillance.
Wang, Nian; Fang, Linchuan; Xin, Haiping; Wang, Lijun; Li, Shaohua
Genetic mapping and QTL detection are powerful methodologies in plant improvement and breeding. Construction of a high-density and high-quality genetic map would be of great benefit in the production of superior grapes to meet human demand. High throughput and low cost of the recently developed next generation sequencing (NGS) technology have resulted in its wide application in genome research. Sequencing restriction-site associated DNA (RAD) might be an efficient strategy to simplify genotyping. Combining NGS with RAD has proven to be powerful for single nucleotide polymorphism (SNP) marker development. An F1 population of 100 individual plants was developed. In-silico digestion-site prediction was used to select an appropriate restriction enzyme for construction of a RAD sequencing library. Next generation RAD sequencing was applied to genotype the F1 population and its parents. Applying a cluster strategy for SNP modulation, a total of 1,814 high-quality SNP markers were developed: 1,121 of these were mapped to the female genetic map, 759 to the male map, and 1,646 to the integrated map. A comparison of the genetic maps to the published Vitis vinifera genome revealed both conservation and variations. The applicability of next generation RAD sequencing for genotyping a grape F1 population was demonstrated, leading to the successful development of a genetic map with high density and quality using our designed SNP markers. Detailed analysis revealed that this newly developed genetic map can be used for a variety of genome investigations, such as QTL detection, sequence assembly and genome comparison.
Cardiofaciocutaneous syndrome (CFCS) belongs to a group of developmental disorders, named RASopathies, that are due to defects in the Ras/Mitogen-Activated Protein Kinase (RAS/MAPK) signaling pathway. While postnatal presentation of these disorders is well known, the prenatal and neonatal characteristics are less recognized. Noonan syndrome, Costello syndrome, and CFCS diagnosis should be considered in pregnancies with a normal karyotype and in the case of ultrasound findings such as increased nuchal translucency, polyhydramnios, macrosomia and cardiac defect. Because all the RASopathies share similar clinical features, their molecular characterization is complex, time consuming and expensive. Here we report a case of CFCS prenatally diagnosed through Next Generation Prenatal Diagnosis (NGPD), a new targeted approach that allows us to concurrently investigate all the genes involved in the RASopathies.
Sahoo, Malaya K.; Tan, Susanna K.; Chen, Sharon F.; Kapusinszky, Beatrix; Concepcion, Katherine R.; Kjelson, Lynn; Mallempati, Kalyan; Farina, Heidi M.; Fernández-Viña, Marcelo; Tyan, Dolly; Grimm, Paul C.; Anderson, Matthew W.; Concepcion, Waldo
BK virus (BKV) infection causing end-organ disease remains a formidable challenge to the hematopoietic cell transplant (HCT) and kidney transplant fields. As BKV-specific treatments are limited, immunologic-based therapies may be a promising and novel therapeutic option for transplant recipients with persistent BKV infection. Here, we describe a whole-genome, deep-sequencing methodology and bioinformatics pipeline that identify BKV variants across the genome and at BKV-specific HLA-A2-, HLA-B0702-, and HLA-B08-restricted CD8 T-cell epitopes. BKV whole genomes were amplified using long-range PCR with four inverse primer sets, and fragmentation libraries were sequenced on the Ion Torrent Personal Genome Machine (PGM). An error model and variant-calling algorithm were developed to accurately identify rare variants. A total of 65 samples from 18 pediatric HCT and kidney recipients with quantifiable BKV DNAemia underwent whole-genome sequencing. Limited genetic variation was observed. The median number of amino acid variants identified per sample was 8 (range, 2 to 37; interquartile range, 10), with the majority of variants (77%) detected at a frequency of <5%. When normalized for length, there was no statistical difference in the median number of variants across all genes. Similarly, the predominant virus population within samples harbored T-cell epitopes similar to the reference BKV strain that was matched for the BKV genotype. Despite the conservation of epitopes, low-level variants in T-cell epitopes were detected in 77.7% (14/18) of patients. Understanding epitope variation across the whole genome provides insight into the virus-immune interface and may help guide the development of protocols for novel immunologic-based therapies. PMID:26202116
Structure and diversity of microbial communities are an important research topic in biology, since microbes play essential roles in the ecology of various environments. Different DNA isolation protocols can lead to data bias and can affect results of next-generation sequencing. To evaluate the impact of protocols for DNA isolation from soil samples and also the influence of individual handling of samples, we compared results obtained by two researchers (R and T) using two different DNA extraction kits: (1) MO BIO PowerSoil® DNA Isolation kit (MO_R and MO_T) and (2) NucleoSpin® Soil kit (MN_R and MN_T). Samples were collected from six different sites on Okinawa Island, Japan. For all sites, differences in the results of microbial composition analyses (bacteria, archaea, fungi, and other eukaryotes), obtained by the two researchers using the two kits, were analyzed. For both researchers, the MN kit gave significantly higher yields of genomic DNA at all sites compared to the MO kit (ANOVA; P < 0.006). In addition, operational taxonomic units for some phyla and classes were missed in some cases: Micrarchaea were detected only in the MN_T and MO_R analyses; the bacterial phylum Armatimonadetes was detected only in MO_R and MO_T; and WIM5 of the phylum Amoebozoa of eukaryotes was found only in the MO_T analysis. Our results suggest the possibility of handling bias; therefore, it is crucial that replicated DNA extraction be performed by at least two technicians for thorough microbial analyses and to obtain accurate estimates of microbial diversity.
Zhou, B; Xin, L; Xu, L; Liu, Y H; Zhang, M M; Jing, R L; Liang, X Y; Cao, S B
Objective: To explore the utility of circulating tumor DNA detection in early breast cancer by using next-generation sequencing. Methods: This exploratory study of circulating tumor DNA detection included early invasive breast cancer patients treated at the Breast Disease Center, Peking University First Hospital, from December 2015 to July 2016. Plasma samples were collected and used to isolate plasma cell-free DNA. Exons or hotspots of 247 cancer-related genes were sequenced by next-generation sequencing. Mutations and their correlation with clinic-pathological factors were analyzed. The correlation between mutations and clinic-pathological factors was evaluated by χ² test or Fisher's exact test. Results: Seventy-five patients were enrolled in this study. All patients were female and aged from 31 to 88 years, with a median age of 58 years. All patients' clinic-pathological records were complete. Sixty-four mutations in 18 genes (ALK, BCR, ERBB2, ROS1, PDGFRA, EGFR, FGFR2, CYP1B1, CALR, CASP7, BRAF, FGFR1, FGFR3, MET, NRAS, PTEN, KIT, SOD2) were detected in 47 (62.7%) of the 75 patients. Exons were captured in 10 genes, and mutations in 2 of 3 genes analyzed were clustered. Gene mutations were not correlated with menopausal status, histological type, primary tumor (T), regional lymph nodes (N), TNM stage, histological grade, estrogen receptor status, progesterone receptor status, human epidermal growth factor receptor 2 status, Ki-67 and molecular subtype (all P > 0.05). Conclusion: Circulating tumor DNA sequencing by next-generation sequencing was useful for detecting breast cancer-related mutations.
Catherine Campbell on "Finishing and Special Motifs: Lessons learned from CRISPR analysis using next-generation draft sequences" at the 2012 Sequencing, Finishing, Analysis in the Future Meeting held June 5-7, 2012 in Santa Fe, New Mexico.
Uyaguari-Diaz, Miguel I; Slobodan, Jared R; Nesbitt, Matthew J; Croxen, Matthew A; Isaac-Renton, Judith; Prystajecky, Natalie A; Tang, Patrick
Next-generation sequencing of environmental samples can be challenging because of the variable DNA quantity and quality in these samples. High quality DNA libraries are needed for optimal results from next-generation sequencing. Environmental samples such as water may have low quality and quantities of DNA as well as contaminants that co-precipitate with DNA. The mechanical and enzymatic processes involved in extraction and library preparation may further damage the DNA. Gel size selection enables purification and recovery of DNA fragments of a defined size for sequencing applications. Nevertheless, this task is one of the most time-consuming steps in the DNA library preparation workflow. The protocol described here enables complete automation of agarose gel loading, electrophoretic analysis, and recovery of targeted DNA fragments. In this study, we describe a high-throughput approach to prepare high quality DNA libraries from freshwater samples that can be applied also to other environmental samples. We used an indirect approach to concentrate bacterial cells from environmental freshwater samples; DNA was extracted using a commercially available DNA extraction kit, and DNA libraries were prepared using a commercial transposon-based protocol. DNA fragments of 500 to 800 bp were gel size selected using Ranger Technology, an automated electrophoresis workstation. Sequencing of the size-selected DNA libraries demonstrated significant improvements to read length and quality of the sequencing reads.
Hung, Chih-Ming; Lin, Rong-Chien; Chu, Jui-Hua; Yeh, Chia-Fen; Yao, Chiou-Ju; Li, Shou-Hsien
The information from ancient DNA (aDNA) provides an unparalleled opportunity to infer phylogenetic relationships and population history of extinct species and to investigate genetic evolution directly. However, the degraded and fragmented nature of aDNA has posed technical challenges for studies based on conventional PCR amplification. In this study, we present an approach based on next generation sequencing to efficiently sequence the complete mitochondrial genome (mitogenome) of two extinct passenger pigeons (Ectopistes migratorius) using de novo assembly of massive short (90 bp), paired-end or single-end reads. Although varying levels of human contamination and low levels of postmortem nucleotide lesion were observed, they did not impact sequencing accuracy. Our results demonstrated that the de novo assembly of shotgun sequence reads could be a potent approach to sequence mitogenomes, and offered an efficient way to infer evolutionary history of extinct species. PMID:23437111
Khandeparker, L.; Kuchi, N.; Kale, D.; Anil, A.C.
Microbial community structure was analyzed from tropical monsoon influenced Mandovi-Zuari (Ma-Zu) estuarine sediment by means of Next Gen Sequencing (NGS) approach using Ion Torrent PGM™. The sequencing generated 80,282 raw sequence reads. Barcoding...
Kim, Y Y; Hwang, J; Kim, H-S; Kwon, H J; Kim, S; Lee, J H; Lee, J H
Mesiodens is the most common type of supernumerary tooth, with a population prevalence of 0.15%-1.9%. Alongside evidence that the condition is heritable, mutations in single genes have been reported in a few human supernumerary tooth cases. Traditional gene sequencing methods are time-consuming and labor-intensive, whereas next-generation sequencing and bioinformatics are cost-effective for large samples and target sizes. We describe the application of a targeted next-generation sequencing (NGS) and bioinformatics approach to samples from 17 mesiodens patients. Subjects were diagnosed on the basis of panoramic radiographs. A total of 101 custom-captured candidate genes were sequenced on the Illumina HiSeq 2500. Multistep bioinformatics processing was performed, including variant identification, base calling, and in silico analysis of putative disease-causing variants. Targeted capture identified 88 non-synonymous, rare, exonic variants involving 42 of the 101 candidate genes. Moreover, we investigated gene co-occurrence relationships between the genomic alterations and identified 88 significant relationships among the 18 most recurrent driver alterations. Our search for co-occurring genetic alterations revealed that such alterations interact cooperatively to drive mesiodens. We discovered a gene co-occurrence network in mesiodens patients with functionally enriched gene groups in the sonic hedgehog (SHH), bone morphogenetic proteins (BMP), and wingless integrated (WNT) signaling pathways. © 2017 John Wiley & Sons A/S. Published by John Wiley & Sons Ltd. All rights reserved.
Ji, Yun; Abrams, Natalie; Zhu, Wei; Salinas, Eddie; Yu, Zhiya; Palmer, Douglas C; Jailwala, Parthav; Franco, Zulmarie; Roychoudhuri, Rahul; Stahlberg, Eric; Gattinoni, Luca; Restifo, Nicholas P
The pmel-1 T cell receptor transgenic mouse has been extensively employed as an ideal model system to study the mechanisms of tumor immunology, CD8+ T cell differentiation, autoimmunity and adoptive immunotherapy. The 'zygosity' of the transgene affects the transgene expression levels and may compromise optimal breeding scheme design. However, the integration sites for the pmel-1 mouse have remained uncharacterized. This is also true for many other commonly used transgenic mice created before the modern era of rapid and inexpensive next-generation sequencing. Here, we show that whole genome sequencing can be used to determine the exact pmel-1 genomic integration site, even with relatively 'shallow' (8X) coverage. The results were used to develop a validated polymerase chain reaction-based genotyping assay. For the first time, we provide a quick and convenient polymerase chain reaction method to determine the dosage of the pmel-1 transgene for this freely and publicly available mouse resource. We also demonstrate that next-generation sequencing provides a feasible approach for mapping foreign DNA integration sites, even when the original vector sequences are only partially known.
Schmidt, Ane Y; Hansen, Thomas V O; Ahlborn, Lise B
Genetic testing of BRCA1/2 includes screening for single nucleotide variants and small insertions/deletions and for larger copy number variations (CNVs), primarily by Sanger sequencing and multiplex ligation-dependent probe amplification (MLPA). With the advent of next-generation sequencing (NGS), it has become feasible to provide CNV information and sequence data using a single platform. We report the use of NGS gene panel sequencing on the Illumina MiSeq platform and JSI SeqPilot SeqNext software to call germline CNVs in BRCA1 and BRCA2. For validation 18 different BRCA1/BRCA2 CNVs previously identified by MLPA in 48 Danish breast and/or ovarian cancer families were analyzed. Moreover, 120 patient samples previously determined as negative for BRCA1/BRCA2 CNVs by MLPA were included in the analysis. Comparison of the NGS data with the data from MLPA revealed that the sensitivity was 100%, whereas...
Bashtrykov, Pavel; Jeltsch, Albert
Methylation of cytosine bases in DNA is one of the main epigenetic signals regulating gene expression and chromatin structure. The distribution of DNA methylation in the genome has a cell-type-specific pattern and can be modulated by internal or external stimuli. One of the most powerful approaches to investigate DNA methylation patterns is bisulfite conversion of the DNA followed by DNA sequencing, which allows methylation patterns to be determined at single-cytosine resolution. Here, we present a protocol for bisulfite DNA methylation analysis of targeted genomic regions using amplicon-based next-generation sequencing (NGS) on an Illumina sequencing system. We use a PCR-free library generation approach and implement a nested strategy for double molecular barcoding of samples (combining indexing of adapters and in-line barcoding of individual amplicons), which allows highly multiplexed sequencing. We also discuss the main limitations of this technology, in particular in relation to clonal DNA amplification and other PCR artifacts.
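The nested double-barcoding idea in the abstract above (an adapter index combined with an in-line barcode at the start of each amplicon read) can be illustrated with a minimal sketch. Everything named here is a hypothetical illustration rather than the authors' protocol or software: the `demultiplex` function, the sample-sheet layout, the fixed barcode length, and the requirement of exact barcode matches are all assumptions.

```python
from collections import defaultdict

def demultiplex(reads, sample_sheet, barcode_len):
    """Assign reads to samples by (adapter index, in-line barcode).

    reads        -- iterable of (adapter_index, sequence) pairs (assumed layout)
    sample_sheet -- dict mapping (adapter_index, inline_barcode) to a sample name
    barcode_len  -- length of the in-line barcode at the 5' end of each read
    """
    bins = defaultdict(list)
    for adapter_index, sequence in reads:
        inline = sequence[:barcode_len]
        sample = sample_sheet.get((adapter_index, inline))
        if sample is None:
            # Unknown combination: keep the untrimmed read for inspection.
            bins["unassigned"].append(sequence)
        else:
            # Trim the in-line barcode before downstream analysis.
            bins[sample].append(sequence[barcode_len:])
    return dict(bins)

if __name__ == "__main__":
    sheet = {("i1", "ACGT"): "s1", ("i1", "TGCA"): "s2", ("i2", "ACGT"): "s3"}
    reads = [("i1", "ACGTTTGG"), ("i1", "TGCATTGG"),
             ("i2", "ACGTCCAA"), ("i2", "GGGGCCAA")]
    print(demultiplex(reads, sheet, 4))
```

Because the two barcode levels are combined into one key, the same in-line barcode can be reused under different adapter indices, which is what makes the nested scheme multiplex well. A practical pipeline would likely also tolerate sequencing errors in the barcodes, which this sketch does not attempt.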
Ferrario, Chiara; Lugli, Gabriele Andrea; Ossiprandi, Maria Cristina; Turroni, Francesca; Milani, Christian; Duranti, Sabrina; Mancabelli, Leonardo; Mangifesta, Marta; Alessandri, Giulia; van Sinderen, Douwe; Ventura, Marco
Contamination of food by chemicals or pathogenic bacteria may cause particular illnesses that are linked to food consumption, commonly referred to as foodborne diseases. Bacteria are present in/on various foods products, such as fruits, vegetables and ready-to-eat products. Bacteria that cause foodborne diseases are known as foodborne pathogens (FBPs). Accurate detection methods that are able to reveal the presence of FBPs in food matrices are in constant demand, in order to ensure safe foods with a minimal risk of causing foodborne diseases. Here, a multiplex PCR-based Illumina sequencing method for FBP detection in food matrices was developed. Starting from 25 bacterial targets and 49 selected PCR primer pairs, a primer collection called foodborne pathogen - panel (FPP) consisting of 12 oligonucleotide pairs was developed. The FPP allows a more rapid and reliable identification of FBPs compared to classical cultivation methods. Furthermore, FPP permits sensitive and specific FBP detection in about two days from food sample acquisition to bioinformatics-based identification. The FPP is able to simultaneously identify eight different bacterial pathogens, i.e. Listeria monocytogenes, Campylobacter jejuni, Campylobacter coli, Salmonella enterica subsp. enterica serovar enteritidis, Escherichia coli, Shigella sonnei, Staphylococcus aureus and Yersinia enterocolitica, in a given food matrix at a threshold contamination level of 10 1 cell/g. Moreover, this novel detection method may represent an alternative and/or a complementary approach to PCR-based techniques, which are routinely used for FBP detection, and could be implemented in (parts of) the food chain as a quality check. Copyright © 2017 Elsevier B.V. All rights reserved.
Bradysia odoriphaga (Diptera: Sciaridae) is the most important pest of Chinese chive. Insecticides are used widely and frequently to control B. odoriphaga in China. However, the performance of the insecticides chlorpyrifos and clothianidin in controlling the Chinese chive maggot is quite different. Using next generation sequencing technology, differentially expressed unigenes (DEUs) in B. odoriphaga were detected after treatment with chlorpyrifos and clothianidin for 6 and 48 h in comparison with control. The number of DEUs ranged between 703 and 1161 after insecticide treatment. In these DEUs, 370–863 unigenes can be classified into 41–46 categories of gene ontology (GO), and 354–658 DEUs can be mapped into 987–1623 Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways. The expression of DEUs related to insecticide metabolism was analyzed. The cytochrome P450-like unigene group was the largest group in DEUs. Most glutathione S-transferase-like unigenes were down-regulated and most sodium channel-like unigenes were up-regulated after insecticide treatment. Finally, 14 insecticide-metabolism-related unigenes were chosen to confirm the relative expression in each treatment by quantitative Real Time Polymerase Chain Reaction (qRT-PCR). The results of qRT-PCR and RNA Sequencing (RNA-Seq) are fairly well-established. Our results demonstrate that a next-generation sequencing tool facilitates the identification of insecticide-metabolism-related genes and the illustration of the insecticide mechanisms of chlorpyrifos and clothianidin.
BACKGROUND: The concept of the utilization of rearranged ends for development of personalized biomarkers has attracted much attention owing to its clinical applicability. Although targeted next-generation sequencing (NGS) for recurrent rearrangements has been successful in hematologic malignancies, its application to solid tumors is problematic due to the paucity of recurrent translocations. However, copy-number breakpoints (CNBs), which are abundant in solid tumors, can be utilized for identification of rearranged ends. METHOD: As a proof of concept, we performed targeted next-generation sequencing at copy-number breakpoints (TNGS-CNB) in nine colon cancer cases including seven primary cancers and two cell lines, COLO205 and SW620. For deduction of CNBs, we developed a novel competitive single-nucleotide polymorphism (cSNP) microarray method entailing CNB-region refinement by competitor DNA. RESULT: Using TNGS-CNB, 19 specific rearrangements out of 91 CNBs (20.9%) were identified, and two polymerase chain reaction (PCR)-amplifiable rearrangements were obtained in six cases (66.7%). And significantly, TNGS-CNB, with its high positive identification rate (82.6%) of PCR-amplifiable rearrangements at candidate sites (19/23), just from filtering of aligned sequences, requires little effort for validation. CONCLUSION: Our results indicate that TNGS-CNB, with its utility for identification of rearrangements in solid tumors, can be successfully applied in the clinical laboratory for cancer-relapse and therapy-response monitoring.
Tan, BoonFei; Ng, Charmaine; Nshimyimana, Jean Pierre; Loh, Lay Leng; Gin, Karina Y-H; Thompson, Janelle R
Water quality is an emergent property of a complex system comprised of interacting microbial populations and introduced microbial and chemical contaminants. Studies leveraging next-generation sequencing (NGS) technologies are providing new insights into the ecology of microbially mediated processes that influence fresh water quality such as algal blooms, contaminant biodegradation, and pathogen dissemination. In addition, sequencing methods targeting small subunit (SSU) rRNA hypervariable regions have allowed identification of signature microbial species that serve as bioindicators for sewage contamination in these environments. Beyond amplicon sequencing, metagenomic and metatranscriptomic analyses of microbial communities in fresh water environments reveal the genetic capabilities and interplay of waterborne microorganisms, shedding light on the mechanisms for production and biodegradation of toxins and other contaminants. This review discusses the challenges and benefits of applying NGS-based methods to water quality research and assessment. We will consider the suitability and biases inherent in the application of NGS as a screening tool for assessment of biological risks and discuss the potential and limitations for direct quantitative interpretation of NGS data. Secondly, we will examine case studies from recent literature where NGS based methods have been applied to topics in water quality assessment, including development of bioindicators for sewage pollution and microbial source tracking, characterizing the distribution of toxin and antibiotic resistance genes in water samples, and investigating mechanisms of biodegradation of harmful pollutants that threaten water quality. Finally, we provide a short review of emerging NGS platforms and their potential applications to the next generation of water quality assessment tools.
De Bellis, Fabien; Malapa, Roger; Kagy, Valérie; Lebegin, Stéphane; Billot, Claire; Labouisse, Jean-Pierre
Premise of the study: Using next-generation sequencing technology, new microsatellite loci were characterized in Artocarpus altilis (Moraceae) and two congeners to increase the number of available markers for genotyping breadfruit cultivars. Methods and Results: A total of 47,607 simple sequence repeat loci were obtained by sequencing a library of breadfruit genomic DNA with an Illumina MiSeq system. Among them, 50 single-locus markers were selected and assessed using 41 samples (39 A. altilis, one A. camansi, and one A. heterophyllus). All loci were polymorphic in A. altilis, 44 in A. camansi, and 21 in A. heterophyllus. The number of alleles per locus ranged from two to 19. Conclusions: The new markers will be useful for assessing the identity and genetic diversity of breadfruit cultivars on a small geographical scale, gaining a better understanding of farmer management practices, and will help to optimize breadfruit genebank management. PMID:27610273
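As a rough illustration of how simple sequence repeat (microsatellite) loci can be located in sequence data, the sketch below scans a DNA string for perfect tandem repeats with a regular expression. It is a generic technique, not the pipeline used in the study; the function name `find_ssrs` and the thresholds (motifs of 2 to 6 bases, at least 4 copies) are assumptions chosen for the example.

```python
import re

def find_ssrs(seq, min_unit=2, max_unit=6, min_repeats=4):
    """Return (start, motif, copies) for each perfect tandem repeat in seq.

    Thresholds are illustrative defaults, not values taken from the study.
    """
    seq = seq.upper()
    # Lazy motif capture so the shortest repeating unit is reported,
    # followed by at least (min_repeats - 1) further copies of it.
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_unit, max_unit, min_repeats - 1)
    )
    hits = []
    for match in pattern.finditer(seq):
        motif = match.group(1)
        if len(set(motif)) == 1:
            # Skip homopolymer runs such as AAAAAAAA.
            continue
        copies = len(match.group(0)) // len(motif)
        hits.append((match.start(), motif, copies))
    return hits

if __name__ == "__main__":
    print(find_ssrs("TTGACACACACACAGGT"))
```

Candidate loci found this way would still need flanking sequence for primer design and wet-lab testing for polymorphism, as the abstract describes for the 50 selected markers.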
Zagordi, Osvaldo; Bhattacharya, Arnab; Eriksson, Nicholas; Beerenwinkel, Niko
With next-generation sequencing technologies, experiments that were considered prohibitive only a few years ago are now possible. However, while these technologies have the ability to produce enormous volumes of data, the sequence reads are prone to error. This poses fundamental hurdles when genetic diversity is investigated. We developed ShoRAH, a computational method for quantifying genetic diversity in a mixed sample and for identifying the individual clones in the population, while accounting for sequencing errors. The software was run on simulated data and on real data obtained in wet lab experiments to assess its reliability. ShoRAH is implemented in C++, Python, and Perl and has been tested under Linux and Mac OS X. Source code is available under the GNU General Public License at http://www.cbg.ethz.ch/software/shorah.
Bijwaard, Karen; Dickey, Jennifer S; Kelm, Kellie; Težak, Živana
The rapid emergence and clinical translation of novel high-throughput sequencing technologies created a need to clarify the regulatory pathway for the evaluation and authorization of these unique technologies. Recently, the US FDA authorized for marketing four next generation sequencing (NGS)-based diagnostic devices which consisted of two heritable disease-specific assays, library preparation reagents and a NGS platform that are intended for human germline targeted sequencing from whole blood. These first authorizations can serve as a case study in how different types of NGS-based technology are reviewed by the FDA. In this manuscript we describe challenges associated with the evaluation of these novel technologies and provide an overview of what was reviewed. Besides making validated NGS-based devices available for in vitro diagnostic use, these first authorizations create a regulatory path for similar future instruments and assays.
Xiao, Jianping; Guo, Xueqin; Wang, Yong
Purpose: To identify disease-causing mutations in a Chinese patient with retinitis pigmentosa (RP). Methods: A detailed clinical examination was performed on the proband. Targeted next-generation sequencing (NGS) combined with bioinformatics analysis was performed on the proband to detect candidate disease-causing mutations. Sanger sequencing was performed on all subjects to confirm the candidate mutations and assess cosegregation within the family. Results: Clinical examinations of the proband showed typical characteristics of RP. Three candidate heterozygous mutations in 3 genes associated with RP were detected in the proband by targeted NGS. The 3 mutations were confirmed by Sanger sequencing and the deletion (c.357_358delAA) in PRPF31 was shown to cosegregate with RP phenotype in 7 affected family members, but not in 3 unaffected family members. Conclusions: The deletion (c.357_358del...
Mollerup, Sarah; Friis-Nielsen, Jens; Vinner, Lasse
Propionibacterium acnes is the most abundant bacterium on human skin, particularly in sebaceous areas. P. acnes is suggested to be an opportunistic pathogen involved in the development of diverse medical conditions, but is also a proven contaminant of human samples and surgical wounds. Its significance as a pathogen is consequently a matter of debate. In the present study we investigated the presence of P. acnes DNA in 250 next generation sequencing datasets generated from 180 samples of 20 different sample types, mostly of cancerous origin. The samples were either subjected to microbial enrichment, involving nuclease treatment to reduce the amount of host nucleic acids, or shotgun-sequenced. We detected high proportions of P. acnes in enriched samples, particularly skin derived and other tissue samples, with levels being higher in enriched compared to shotgun-sequenced samples. P. acnes...
Riman, Sarah; Kiesler, Kevin M; Borsuk, Lisa A; Vallone, Peter M
Standard Reference Materials SRM 2392 and 2392-I are intended to provide quality control when amplifying and sequencing human mitochondrial genome sequences. The National Institute of Standards and Technology (NIST) offers these SRMs to laboratories performing DNA-based forensic human identification, molecular diagnosis of mitochondrial diseases, mutation detection, evolutionary anthropology, and genetic genealogy. The entire mtGenome (∼16569bp) of SRM 2392 and 2392-I have previously been characterized at NIST by Sanger sequencing. Herein, we used the sensitivity, specificity, and accuracy offered by next generation sequencing (NGS) to: (1) re-sequence the certified values of the SRM 2392 and 2392-I; (2) confirm Sanger data with a high coverage new sequencing technology; (3) detect lower level heteroplasmies (sequencing communities in the adoption of NGS methods. To obtain a consensus sequence for the SRMs as well as identify and control any bias, sequencing was performed using two NGS platforms and data was analyzed using different bioinformatics pipelines. Our results confirm five low level heteroplasmy sites that were not previously observed with Sanger sequencing: three sites in the GM09947A template in SRM 2392 and two sites in the HL-60 template in SRM 2392-I. Copyright © 2017 Elsevier B.V. All rights reserved.
Annalisa Altimari,1,* Dario de Biase,2,* Giovanna De Maglio,3 Elisa Gruppioni,1 Elisa Capizzi,1 Alessio Degiovanni,1 Antonia D'Errico,1 Annalisa Pession,2 Stefano Pizzolitto,3 Michelangelo Fiorentino,1,# Giovanni Tallini2,#
1Laboratory of Molecular Oncologic and Transplantation Pathology, S. Orsola-Malpighi Hospital, Bologna, 2Laboratory of Molecular Pathology, Anatomic Pathology, Bellaria Hospital, Bologna, 3Department of Pathology, S. Maria della Misericordia Hospital, Udine, Italy
*These authors contributed equally to this work. #These authors share senior authorship.
Abstract: Detection of KRAS mutations in archival pathology samples is critical for therapeutic appropriateness of anti-EGFR monoclonal antibodies in colorectal cancer. We compared the sensitivity, specificity, and accuracy of Sanger sequencing, ARMS-Scorpion (TheraScreen®) real-time polymerase chain reaction (PCR), pyrosequencing, chip array hybridization, and 454 next-generation sequencing to assess KRAS codon 12 and 13 mutations in 60 nonconsecutive selected cases of colorectal cancer. Twenty of the 60 cases were detected as wild-type KRAS by all methods with 100% specificity. Among the 40 mutated cases, 13 were discrepant with at least one method. The sensitivity was 85%, 90%, 93%, and 92%, and the accuracy was 90%, 93%, 95%, and 95% for Sanger sequencing, TheraScreen real-time PCR, pyrosequencing, and chip array hybridization, respectively. The main limitation of Sanger sequencing was its low analytical sensitivity, whereas TheraScreen real-time PCR, pyrosequencing, and chip array hybridization showed higher sensitivity but suffered from the limitations of predesigned assays. Concordance between the methods was k = 0.79 for Sanger sequencing and k > 0.85 for the other techniques. Tumor cell enrichment correlated significantly with the abundance of KRAS-mutated deoxyribonucleic acid (DNA), evaluated as ΔCt for TheraScreen real-time PCR (P = 0.03), percentage of mutation for
Chen, Huapu; Che, Zhiwei; Li, Jiantao; Dai, Mingli; Xiang, Ling; Deng, Siping; Zhu, Chunhua; Huang, Hai; Li, Guangli
Using Illumina next-generation sequencing (NGS), the complete mitochondrial genome of Psenopsis anomala was sequenced in the present study. The mitochondrial genome of P. anomala is 16,528 bp long and consists of 13 protein-coding genes, 2 rRNA genes, 22 tRNA genes, and a control region. The gene order and composition of the P. anomala mitochondrial genome are similar to those of most other vertebrates. The nucleotide composition of the light strand, in descending order, is 29.18% T, 27.97% G, 27.06% A, and 15.79% C. With the exception of the NADH dehydrogenase subunit 6 (ND6) and eight tRNA genes, the mitochondrial genes are encoded on the heavy strand. Phylogenetic analysis by the maximum-likelihood (ML) method showed that Psenopsis anomala is most closely related to Peprilus triacanthus.
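Nucleotide composition figures like those reported above are simple base frequencies over a strand. The sketch below shows one way to compute them; the function name `base_composition` and the choice to ignore ambiguous bases such as N are assumptions made for illustration, not details from the study.

```python
from collections import Counter

def base_composition(seq):
    """Return the percentage of A, C, G and T in seq, rounded to 2 decimals.

    Characters other than A, C, G, T (for example N) are excluded from the
    denominator; this is an assumption of the sketch.
    """
    counts = Counter(seq.upper())
    total = sum(counts[base] for base in "ACGT")
    if total == 0:
        raise ValueError("sequence contains no A, C, G or T")
    return {base: round(100.0 * counts[base] / total, 2) for base in "ACGT"}

if __name__ == "__main__":
    print(base_composition("AACGTTTT"))
```

As a consistency check on the abstract itself, the four reported percentages (29.18, 27.97, 27.06, and 15.79) sum to exactly 100.00.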
Shore, Sabrina; Henderson, Jordana M; Lebedev, Alexandre; Salcedo, Michelle P; Zon, Gerald; McCaffrey, Anton P; Paul, Natasha; Hogrefe, Richard I
For most sample types, the automation of RNA and DNA sample preparation workflows enables high throughput next-generation sequencing (NGS) library preparation. Greater adoption of small RNA (sRNA) sequencing has been hindered by high sample input requirements and inherent ligation side products formed during library preparation. These side products, known as adapter dimer, are very similar in size to the tagged library. Most sRNA library preparation strategies thus employ a gel purification step to isolate tagged library from adapter dimer contaminants. At very low sample inputs, adapter dimer side products dominate the reaction and limit the sensitivity of this technique. Here we address the need for improved specificity of sRNA library preparation workflows with a novel library preparation approach that uses modified adapters to suppress adapter dimer formation. This workflow allows for lower sample inputs and elimination of the gel purification step, which in turn allows for an automatable sRNA library preparation protocol.
Jeck, William R; Parker, Joel; Carson, Craig C; Shields, Janiel M; Sambade, Maria J; Peters, Eldon C; Burd, Christin E; Thomas, Nancy E; Chiang, Derek Y; Liu, Wenjin; Eberhard, David A; Ollila, David; Grilley-Olson, Juneko; Moschos, Stergios; Neil Hayes, D; Sharpless, Norman E
Somatic sequencing of cancers has produced new insight into tumorigenesis, tumor heterogeneity, and disease progression, but the vast majority of genetic events identified are of indeterminate clinical significance. Here, we describe a NextGen sequencing approach to fully analyzing 248 genes, including all those of known clinical significance in melanoma. This strategy features solution capture of DNA followed by multiplexed, high-throughput sequencing and was evaluated in 31 melanoma cell lines and 18 tumor tissues from patients with metastatic melanoma. Mutations in melanoma cell lines correlated with their sensitivity to corresponding small molecule inhibitors, confirming, for example, lapatinib sensitivity in ERBB4 mutant lines and identifying a novel activating mutation of BRAF. The latter event would not have been identified by clinical sequencing and was associated with responsiveness to a BRAF kinase inhibitor. This approach identified focal copy number changes of PTEN not found by standard methods, such as comparative genomic hybridization (CGH). Actionable mutations were found in 89% of the tumor tissues analyzed, 56% of which would not be identified by standard-of-care approaches. This work shows that targeted sequencing is an attractive approach for clinical use in melanoma. © 2014 John Wiley & Sons A/S. Published by John Wiley & Sons Ltd.
Tsangaras, Kyriakos; Mayer, Jens; Alquezar-Planas, David E.; Greenwood, Alex D.
Transcriptome analysis of polar bear (Ursus maritimus) tissues identified sequences with similarity to Porcine Endogenous Retroviruses (PERV). Based on these sequences, four proviral copies and 15 solo long terminal repeats (LTRs) of a newly described endogenous retrovirus were characterized from the polar bear draft genome sequence. Closely related sequences were identified by PCR analysis of brown bear (Ursus arctos) and black bear (Ursus americanus) but were absent in non-Ursinae bear species. The virus was therefore designated UrsusERV. Two distinct groups of LTRs were observed including a recombinant ERV that contained one LTR belonging to each group indicating that genomic invasions by at least two UrsusERV variants have recently occurred. Age estimates based on proviral LTR divergence and conservation of integration sites among ursids suggest the viral group is only a few million years old. The youngest provirus was polar bear specific, had intact open reading frames (ORFs) and could potentially encode functional proteins. Phylogenetic analyses of UrsusERV consensus protein sequences suggest that it is part of a pig, gibbon and koala retrovirus clade. The young age estimates and lineage specificity of the virus suggests UrsusERV is a recent cross species transmission from an unknown reservoir and places the viral group among the youngest of ERVs identified in mammals. PMID:26610552
Hye Suck An
Mytilus coruscus (family Mytilidae) is one of the most important marine shellfish species in Korea. During the past few decades, this species has become endangered due to the loss of habitats and overfishing. Despite this species’ importance, information on its genetic background is scarce. In this study, we developed microsatellite markers for M. coruscus using next-generation sequencing. A total of 263,900 raw reads were obtained from a quarter-plate run on the 454 GS-FLX titanium platform, and 176,327 unique sequences were generated with an average length of 381 bp; 2569 (1.45%) sequences contained a minimum of five di- to tetra-nucleotide repeat motifs. Of the 51 loci screened, 46 were amplified successfully, and 22 were polymorphic among 30 individuals, including seven with trinucleotide repeats and three with tetranucleotide repeats. All loci exhibited high genetic variability, with an average of 17.32 alleles per locus, and the mean observed and expected heterozygosities were 0.67 and 0.90, respectively. In addition, cross-amplification was tested for all 22 loci in a congener, M. galloprovincialis. None of the primer pairs resulted in effective amplification, which might be due to their high mutation rates. Our work demonstrated the utility of next-generation 454 sequencing as a method for the rapid and cost-effective identification of microsatellites. The high degree of polymorphism exhibited by the 22 newly developed microsatellites will be useful in future conservation genetic studies of this species.
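The screening criterion above (at least five tandem copies of a di- to tetra-nucleotide motif) can be illustrated with a regular expression. This is a minimal sketch under that reading of the criterion, not the software the authors used; the function name and the test read are hypothetical.

```python
import re

# A motif of 2 to 4 bases (shortest tried first) followed by at least
# four more copies of itself, i.e. five or more tandem repeats in total.
SSR_PATTERN = re.compile(r"([ACGT]{2,4}?)\1{4,}")


def find_microsatellites(seq):
    """Return (start, motif, repeat_count) for each simple sequence repeat found."""
    hits = []
    for match in SSR_PATTERN.finditer(seq.upper()):
        motif = match.group(1)
        if len(set(motif)) == 1:
            continue  # a run of one base is a homopolymer, not a di- to tetra-nucleotide SSR
        hits.append((match.start(), motif, len(match.group(0)) // len(motif)))
    return hits


if __name__ == "__main__":
    read = "GG" + "AC" * 6 + "TTG" + "AGC" * 5 + "T"  # toy read
    print(find_microsatellites(read))
```

Because the shortest motif is tried first, a stretch such as ATATATATAT is reported as a dinucleotide repeat rather than as a longer compound motif. Imperfect or interrupted repeats are not detected by this simple pattern.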
Fiszer, Dorota; Shaw, Marie-Anne; Fisher, Nickla A.; Carr, Ian M.; Gupta, Pawan K.; Watkins, Elizabeth J.; de Sa, Daniel Roiz; Kim, Jerry H.; Hopkins, Philip M.
Background: Variants in RYR1 are associated with the majority of cases of malignant hyperthermia (MH), a form of heat illness pharmacogenetically triggered by general anesthetics, and they have also been associated with exertional heat illness. CACNA1S has also been implicated in MH. We applied a targeted next generation sequencing approach to identify variants in RYR1 and CACNA1S in a cohort of unrelated patients diagnosed with MH susceptibility. We also provide the first comprehensive report of sequencing of these two genes in a cohort of survivors of exertional heat illness. Methods: DNA extracted from blood was genotyped using a “long” polymerase chain reaction technique, with sequencing on the Illumina GAII® or MiSeq® platforms (Illumina Inc., San Diego, CA). Variants were assessed for pathogenicity using bioinformatic approaches. For further follow-up, DNA from additional family members and up to 211 MH normal and 556 MH susceptible unrelated individuals was tested. Results: In 29 MH patients we identified three pathogenic and four novel RYR1 variants, with a further five RYR1 variants previously reported in association with MH. Three novel RYR1 variants were found in the exertional heat illness cohort (n = 28) along with two more previously reported in association with MH. Two other variants were reported previously associated with centronuclear myopathy. We found one and three rare variants of unknown significance in CACNA1S in the MH and exertional heat illness cohorts respectively. Conclusion: Targeted next generation sequencing proved efficient at identifying diagnostically useful and potentially implicated variants in RYR1 and CACNA1S in MH and exertional heat illness. PMID:25658027
Fiszer, Dorota; Shaw, Marie-Anne; Fisher, Nickla A; Carr, Ian M; Gupta, Pawan K; Watkins, Elizabeth J; Roiz de Sa, Daniel; Kim, Jerry H; Hopkins, Philip M
Variants in RYR1 are associated with the majority of cases of malignant hyperthermia (MH), a form of heat illness pharmacogenetically triggered by general anesthetics, and they have also been associated with exertional heat illness (EHI). CACNA1S has also been implicated in MH. The authors applied a targeted next-generation sequencing approach to identify variants in RYR1 and CACNA1S in a cohort of unrelated patients diagnosed with MH susceptibility. They also provide the first comprehensive report of sequencing of these two genes in a cohort of survivors of EHI. DNA extracted from blood was genotyped using a "long" polymerase chain reaction technique, with sequencing on the Illumina GAII or MiSeq platforms (Illumina Inc., USA). Variants were assessed for pathogenicity using bioinformatic approaches. For further follow-up, DNA from additional family members and up to 211 MH normal and 556 MH-susceptible unrelated individuals was tested. In 29 MH patients, the authors identified three pathogenic and four novel RYR1 variants, with a further five RYR1 variants previously reported in association with MH. Three novel RYR1 variants were found in the EHI cohort (n = 28) along with two more previously reported in association with MH. Two other variants were reported previously associated with centronuclear myopathy. The authors found one and three rare variants of unknown significance in CACNA1S in the MH and EHI cohorts, respectively. Targeted next-generation sequencing proved efficient at identifying diagnostically useful and potentially implicated variants in RYR1 and CACNA1S in MH and EHI.
Mouatt, Julia Thidamarth Vilstrup
The sequencing of ancient DNA provides perspectives on the genetic history of past populations and extinct species. However, ancient DNA research presents specific limitations mostly due to DNA survival, damage and contamination. Yet with stringent laboratory procedures, the sensitivity of target...... enrichment methods and the massive throughput and latest advances within DNA sequencing, the field of ancient DNA has flourished in later years. Those advances have even enabled the sequencing of complete genomes from the past, moving the field into genomic sciences. In this thesis we have used these latest...... ial genomes of extant and extinct taxa, and dated major radiation events within Equidae. We have also revealed the phylogenetic origins of hutias, a group of capromyid rodents from the West Indies using museum specimens and a museomic approach, and at the other end of the spectrum, characterized...
Christensen, Rikke; Væth, Signe; Thorsen, Kasper
Background: Charcot-Marie-Tooth Disease (CMT) is one of the most common inherited neurological diseases. Today, more than 70 CMT-related genes are known to cause inherited neuropathy. The diagnostic strategy in most laboratories is based on Sanger sequencing of a few genes. In our patient cohort......, Sanger sequencing of 4 genes has led to a diagnosis in approximately 30% of the patients. Aims: 1) Development of a targeted NGS platform containing 63 genes that currently are found to be associated with CMT. 2) Analysis of the increased diagnostic yield using this platform to analyze 200 CMT samples...... previously analyzed using Sanger sequencing without identification of a disease-causing mutation. Materials and Methods: Libraries for 200 patient samples obtained for CMT diagnostics were prepared using Illumina TruSeq and target enrichment using SeqCap EZ Choice Library (Nimblegen). The libraries were...
Fadista, João; Bendixen, Christian
extensively studied in other organisms, its analysis in pig has been hampered by the lack of a complete pig genome assembly. By measuring the depth of coverage of Illumina whole-genome shotgun sequencing reads of the Tabasco animal aligned to the latest pig genome assembly (Sus scrofa 10 – based also...... on Tabasco), led us to the detection of a high-resolution map of segmental duplications in the pig genome. Comparing these segments with four other Duroc animals sequenced at our institute, supplied the resources needed to describe the first genome-wide and systematic analysis of segmental duplications...
Sønstebø, J. H.; Gielly, L.; Brysting, A. K.
Palaeoenvironments and former climates are typically inferred from pollen and macrofossil records. This approach is time-consuming and suffers from low taxonomic resolution and biased taxon sampling. Here, we test an alternative DNA-based approach utilizing the P6 loop in the chloroplast trnL (UAA) intron; a short (13-158 bp) and variable region with highly conserved flanking sequences. For taxonomic reference, a whole trnL intron sequence database was constructed from recently collected material of 842 species, representing all widespread and/or ecologically important taxa of the species......
As many nonmodel plants lack genomic sequences, the combination of molecular technologies and the next-generation sequencing (NGS) platform has led to a new approach to studying the genetic variation of these plants. Software such as GATK, SOAPsnp, and samtools is often used to process NGS data. In this study, BLAST was applied to call SNPs from the mixed sequence data of 16 functional genes of polyploid wheat. In total, 1.2 million reads were obtained, with an average of 7500 reads per gene. To obtain accurate information, 390,992 read pairs were successfully assembled before alignment to the functional genes. Standalone BLAST tools were used to map the assembled sequences to each functional gene. Polynomial fitting was applied to find a suitable minor allele frequency (MAF) threshold, 6%, for the assembled reads of each functional gene. SNP accuracy was compared among assembled reads, pretrimmed reads, and original reads, which showed that SNPs mined from the assembled reads were more reliable than the others. It was also demonstrated that sequencing mixed samples by NGS and analyzing them with BLAST is an effective, low-cost, and accurate way to mine SNPs in nonmodel species. Assembled reads and a polynomial-fitted threshold are recommended for more accurate SNP detection.
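To make the role of the MAF threshold concrete, the sketch below applies a 6% cutoff to the bases observed at one aligned position. It only illustrates the thresholding step; the polynomial fitting used to choose the cutoff and the BLAST mapping are not reproduced, and the function names and pileup strings are hypothetical.

```python
from collections import Counter


def minor_allele_frequency(bases):
    """Frequency of the second most common base among the A/C/G/T calls at one position."""
    counts = Counter(base for base in bases.upper() if base in "ACGT")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no A/C/G/T calls at this position")
    ranked = counts.most_common(2)
    if len(ranked) < 2:
        return 0.0  # only one allele observed
    return ranked[1][1] / total


def is_candidate_snp(bases, maf_threshold=0.06):
    """Flag a position whose minor allele frequency reaches the threshold (6% in the abstract)."""
    return minor_allele_frequency(bases) >= maf_threshold


if __name__ == "__main__":
    print(is_candidate_snp("A" * 90 + "G" * 10))  # minor allele at 10%
    print(is_candidate_snp("A" * 97 + "G" * 3))   # minor allele at 3%
```

A threshold of this kind trades sensitivity for specificity: positions whose minor allele falls below it are treated as sequencing error rather than variation.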
Lee, Sejoon; Lee, Soohyun; Ouellette, Scott; Park, Woong-Yang; Lee, Eunjung A; Park, Peter J
In many next-generation sequencing (NGS) studies, multiple samples or data types are profiled for each individual. An important quality control (QC) step in these studies is to ensure that datasets from the same subject are properly paired. Given the heterogeneity of data types, file types and sequencing depths in a multi-dimensional study, a robust program that provides a standardized metric for genotype comparisons would be useful. Here, we describe NGSCheckMate, a user-friendly software package for verifying sample identities from FASTQ, BAM or VCF files. This tool uses a model-based method to compare allele read fractions at known single-nucleotide polymorphisms, considering depth-dependent behavior of similarity metrics for identical and unrelated samples. Our evaluation shows that NGSCheckMate is effective for a variety of data types, including exome sequencing, whole-genome sequencing, RNA-seq, ChIP-seq, targeted sequencing and single-cell whole-genome sequencing, with a minimal requirement for sequencing depth (>0.5X). An alignment-free module can be run directly on FASTQ files for a quick initial check. We recommend using this software as a QC step in NGS studies. https://github.com/parklab/NGSCheckMate. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.
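The core idea of comparing allele read fractions at known SNPs can be sketched as a correlation between two samples. This is a simplified illustration and not NGSCheckMate's code: the depth-dependent model described above is omitted, and the data layout (SNP id mapped to reference and alternate read counts) and function names are assumptions.

```python
from math import sqrt


def allele_fractions(sample):
    """Map SNP id -> alternate allele read fraction, skipping sites with no coverage."""
    return {
        snp: alt / (ref + alt)
        for snp, (ref, alt) in sample.items()
        if ref + alt > 0
    }


def vaf_correlation(sample_a, sample_b):
    """Pearson correlation of allele fractions over SNPs covered in both samples."""
    vaf_a, vaf_b = allele_fractions(sample_a), allele_fractions(sample_b)
    shared = sorted(set(vaf_a) & set(vaf_b))
    if len(shared) < 2:
        raise ValueError("need at least two shared SNPs")
    x = [vaf_a[s] for s in shared]
    y = [vaf_b[s] for s in shared]
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("allele fractions are constant in one sample")
    return cov / sqrt(var_x * var_y)


if __name__ == "__main__":
    tumor = {"rs1": (10, 0), "rs2": (5, 5), "rs3": (0, 8), "rs4": (7, 7)}
    normal = {"rs1": (20, 1), "rs2": (9, 11), "rs3": (1, 15), "rs4": (6, 6)}
    print(round(vaf_correlation(tumor, normal), 3))
```

A high correlation suggests that two files come from the same individual; at low depth the expected correlation for matched samples drops, which is why a depth-aware model is needed in practice.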
Background: Polyploidy is important from a phylogenetic perspective because of its immense past impact on evolution and its potential future impact on diversification, survival and adaptation, especially in plants. Molecular population genetics studies of polyploid organisms have been difficult because of problems in sequencing multiple-copy nuclear genes using Sanger sequencing. This paper describes a method for sequencing a barcoded mixture of targeted gene regions using next-generation sequencing methods to overcome these problems. Results: Using 64 3-bp barcodes, we successfully sequenced three chloroplast and two nuclear gene regions (each of which contained two gene copies with up to two alleles per individual) in a total of 60 individuals across 11 species of Australian Poa grasses. This method had high replicability, a low sequencing error rate (after appropriate quality control) and a low rate of missing data. Eighty-eight percent of the 320 gene/individual combinations produced sequence reads, and >80% of individuals produced sufficient reads to detect all four possible nuclear alleles of the homeologous nuclear loci with 95% probability. We applied this method to a group of sympatric Australian alpine Poa species, which we discovered to share an allopolyploid ancestor with a group of American Poa species. All markers revealed extensive allele sharing among the Australian species and so we recommend that the current taxonomy be re-examined. We also detected hypermutation in the trnH-psbA marker, suggesting it should not be used as a land plant barcode region. Some markers indicated differentiation between Tasmanian and mainland samples. Significant positive spatial genetic structure was detected at ... Conclusions: Our results demonstrate that 454 sequencing of barcoded amplicon mixtures can be used to reliably sample all alleles of homeologous loci in polyploid species and successfully investigate phylogenetic relationships among
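The statement about detecting all four nuclear alleles with 95% probability can be illustrated with a small calculation. The sketch below assumes every read is drawn independently and with equal probability from each allele, which is an idealization and not necessarily the model the authors used; the function names are hypothetical.

```python
from math import comb


def prob_all_alleles_seen(n_reads, n_alleles=4):
    """Probability that n_reads equiprobable reads include every allele (inclusion-exclusion)."""
    k = n_alleles
    return sum(
        (-1) ** j * comb(k, j) * ((k - j) / k) ** n_reads
        for j in range(k + 1)
    )


def min_reads_for_detection(target=0.95, n_alleles=4):
    """Smallest read count at which all alleles are seen with at least the target probability."""
    n_reads = n_alleles
    while prob_all_alleles_seen(n_reads, n_alleles) < target:
        n_reads += 1
    return n_reads


if __name__ == "__main__":
    print(min_reads_for_detection(0.95, 4))
```

Under the equal-sampling assumption the sketch gives 16 reads per individual and locus for four alleles; unequal amplification of alleles or homeologous copies would raise that number.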
Taboada, Eduardo N; Graham, Morag R; Carriço, João A; Van Domselaar, Gary
Public health labs and food regulatory agencies globally are embracing whole genome sequencing (WGS) as a revolutionary new method that is positioned to replace numerous existing diagnostic and microbial typing technologies with a single new target: the microbial draft genome. The ability to cheaply generate large amounts of microbial genome sequence data, combined with emerging policies of food regulatory and public health institutions making their microbial sequences increasingly available and public, has served to open up the field to the general scientific community. This open data access policy shift has resulted in a proliferation of data being deposited into sequence repositories and of novel bioinformatics software designed to analyze these vast datasets. There also has been a more recent drive for improved data sharing to achieve more effective global surveillance, public health and food safety. Such developments have heightened the need for enhanced analytical systems in order to process and interpret this new type of data in a timely fashion. In this review we outline the emergence of genomics, bioinformatics and open data in the context of food safety. We also survey major efforts to translate genomics and bioinformatics technologies out of the research lab and into routine use in modern food safety labs. We conclude by discussing the challenges and opportunities that remain, including those expected to play a major role in the future of food safety science.
Alana Alexander; Debbie Steel; Beth Slikas; Kendra Hoekzema; Colm Carraher; Matthew Parks; Richard Cronn; C. Scott Baker
Large population sizes and global distributions generally associate with high mitochondrial DNA control region (CR) diversity. The sperm whale (Physeter macrocephalus) is an exception, showing low CR diversity relative to other cetaceans; however, diversity levels throughout the remainder of the sperm whale mitogenome are unknown. We sequenced 20...
Jul 26, 2017 ... condition as diverse as epilepsy at a low cost compared to traditional Sanger sequencing (Lemke et al. 2012; Németh et al. 2013). Our 377 gene epilepsy NGS test was developed to include genes known to cause or have published association with epilepsy and seizure-related disorders. Given the scale of ...
Massively parallel, tag-based sequencing systems, such as the SOLiD system, hold the promise of revolutionizing the study of whole genome gene expression due to the number of data points that can be generated in a simple and cost-effective manner. We describe the development of a 5'-end transcriptome workflow for the SOLiD system and demonstrate the advantages in sensitivity and dynamic range offered by this tag-based application over traditional approaches for the study of whole genome gene expression. 5'-end transcriptome analysis was used to study whole genome gene expression within a colon cancer cell line, HT-29, treated with the DNA methyltransferase inhibitor, 5-aza-2'-deoxycytidine (5Aza). More than 20 million 25-base 5'-end tags were obtained from untreated and 5Aza-treated cells and matched to sequences within the human genome. Seventy-three percent of the mapped unique tags were associated with RefSeq cDNA sequences, corresponding to approximately 14,000 different protein-coding genes in this single cell type. The level of expression of these genes ranged from 0.02 to 4,704 transcripts per cell. The sensitivity of a single sequence run of the SOLiD platform was 100- to 1,000-fold greater than that observed from 5'-end SAGE data generated from the analysis of 70,000 tags obtained by Sanger sequencing. The high-resolution 5'-end gene expression profiling presented in this study will not only provide novel insight into the transcriptional machinery but should also serve as a basis for a better understanding of cell biology.
Jørgensen, Johannes Ravn; Carstensen, Jens Michael; Søren, Knudsen
Seeds of barley (Hordeum vulgare) are infected by a high number of fungi, including pathogens such as Fusarium graminearum, F. culmorum, F. poae, F. avenaceum and Pyrenophora teres. Fusarium spp. are widely distributed fungi causing yield reduction in a range of agricultural crops and many...... species in the genus produce mycotoxins responsible for serious quality deterioration. In malting barley, Fusarium also has a negative effect by causing gushing in beer. A number of barley seeds (approx. 200) assumed to be infected by fungi, from different origins and years of cultivation, were tested by NGS...... sequencing the ITS (Internal Transcribed Spacer) region from total DNA. Approximately 2-4000 sequences were obtained from each seed and these were subsequently identified to species level in order to give an exact identification of fungal genera on each seed. The main fungal genera identified were Fusarium...
Bozan, Mahir; Akyol, Çağrı; Ince, Orhan; Aydin, Sevcan; Ince, Bahar
The anaerobic digestion of lignocellulosic wastes is considered an efficient method for managing the world's energy shortages and resolving contemporary environmental problems. However, the recalcitrance of lignocellulosic biomass represents a barrier to maximizing biogas production. The purpose of this review is to examine the extent to which sequencing methods can be employed to monitor such biofuel conversion processes. From a microbial perspective, we present a detailed insight into anaerobic digesters that utilize lignocellulosic biomass and discuss some benefits and disadvantages associated with the microbial sequencing techniques that are typically applied. We further evaluate the extent to which a hybrid approach incorporating a variation of existing methods can be utilized to develop a more in-depth understanding of microbial communities. It is hoped that this deeper knowledge will enhance the reliability and extent of research findings with the end objective of improving the stability of anaerobic digesters that manage lignocellulosic biomass.
Tanaka, T.; Kobayashi, F.; Joshi, G.P.; Šimková, Hana; Nasuda, S.; Doležel, Jaroslav; Handa, H.
Vol. 21, No. 2 (2014), pp. 103-114 ISSN 1340-2838 R&D Projects: GA ČR GBP501/12/G090 Grant - others: GA MŠk(CZ) ED0007/01/01 Program: ED Institutional support: RVO:61389030 Keywords: wheat * chromosome 6B * genome sequencing Subject RIV: EB - Genetics; Molecular Biology Impact factor: 5.477, year: 2014
Archer, J.; Weber, Jan; Henry, K.; Winner, D.; Gibson, R.; Lee, L.; Paxinos, E.; Arts, E. J.; Robertson, D. L.; Mimms, L.; Quinones-Mateu, M. E.
Vol. 7, No. 11 (2012), e49602/1-e49602/17 E-ISSN 1932-6203 R&D Projects: GA MŠk(CZ) LK11207 Institutional research plan: CEZ:AV0Z40550506 Keywords: HIV-1 tropism * V3 region * deep sequencing Subject RIV: EE - Microbiology, Virology Impact factor: 3.730, year: 2012 http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0049602
Gobet, Angélique; Boetius, Antje; Ramette, Alban
Changes in richness and bacterial community structure obtained via 454 Massively Parallel Tag Sequencing (MPTS) and Automated Ribosomal Intergenic Analysis (ARISA) were systematically compared to determine whether and how the ecological knowledge obtained from both molecular techniques could be combined. We evaluated community changes over time and depth in marine coastal sands at different levels of taxonomic resolutions, sequence corrections and sequence abundances. Although richness over depth layers or sampling dates greatly varied [∼ 30% and 70-80% new operational taxonomic units (OTU) between two samples with ARISA and MPTS respectively], overall patterns of community variations were similar with both approaches. Alpha-diversity estimated by ARISA-derived OTU was most similar to that obtained from MPTS-derived OTU defined at the order level. Similar patterns of OTU replacement were also found with MPTS at the family level and with 20-25% rare types removed. Using ARISA or MPTS datasets with lower resolution, such as those containing only resident OTU, yielded a similar set of significant contextual variables explaining bacterial community changes. Hence, ARISA as a rapid and low-cost fingerprinting technique represents a valid starting point for more in-depth exploration of community composition when complemented by the detailed taxonomic description offered by MPTS. © 2013 The Authors. Environmental Microbiology published by Society for Applied Microbiology and John Wiley & Sons Ltd.
Trombetta, John J; Gennert, David; Lu, Diana; Satija, Rahul; Shalek, Alex K; Regev, Aviv
For the past several decades, due to technical limitations, the field of transcriptomics has focused on population-level measurements that can mask significant differences between individual cells. With the advent of single-cell RNA-Seq, it is now possible to profile the responses of individual cells at unprecedented depth and thereby uncover, transcriptome-wide, the heterogeneity that exists within these populations. This unit describes a method that merges several important technologies to produce, in high-throughput, single-cell RNA-Seq libraries. Complementary DNA (cDNA) is made from full-length mRNA transcripts using a reverse transcriptase that has terminal transferase activity. This, when combined with a second "template-switch" primer, allows for cDNAs to be constructed that have two universal priming sequences. Following preamplification from these common sequences, Nextera XT is used to prepare a pool of 96 uniquely indexed samples ready for Illumina sequencing. Copyright © 2014 John Wiley & Sons, Inc.
Ruiz-Ruano, Francisco J; Cuadrado, Ángeles; Montiel, Eugenia E; Camacho, Juan Pedro M; López-León, María Dolores
Simple sequence repeats (SSRs), also known as microsatellites, are one of the prominent DNA sequences shaping the repeated fraction of eukaryotic genomes. In spite of their profuse use as molecular markers for a variety of genetic and evolutionary studies, their genomic location, distribution, and function are not yet well understood. Here we report the first thorough joint analysis of microsatellite motifs at both genomic and chromosomal levels in animal species, by a combination of 454 sequencing and fluorescent in situ hybridization (FISH) techniques performed on two grasshopper species. The in silico analysis of the 454 reads suggested that microsatellite expansion is not driving size increase of these genomes, as SSR abundance was higher in the species showing the smallest genome. However, the two species showed the same uneven and nonrandom location of SSRs, with clear predominance of dinucleotide motifs and association with several types of repetitive elements, mostly histone gene spacers, ribosomal DNA intergenic spacers (IGS), and transposable elements (TEs). The FISH analysis showed a dispersed chromosome distribution of microsatellite motifs in euchromatic regions, in coincidence with chromosome location patterns previously observed for many mobile elements in these species. However, some SSR motifs were clustered, especially those located in the histone gene cluster.
Massaia, Andrea; Xue, Yali
The human Y chromosome provides a fertile ground for structural rearrangements owing to its haploidy and high content of repeated sequences. The methodologies used for copy number variation (CNV) studies have developed over the years. Low-throughput techniques based on direct observation of rearrangements were developed early on, and are still used, often to complement array-based or sequencing approaches which have limited power in regions with high repeat content and specifically in the presence of long, identical repeats, such as those found in human sex chromosomes. Some specific rearrangements have been investigated for decades; because of their effects on fertility, or their outstanding evolutionary features, the interest in these has not diminished. However, following the flourishing of large-scale genomics, several studies have investigated CNVs across the whole chromosome. These studies sometimes employ data generated within large genomic projects such as the DDD study or the 1000 Genomes Project, and often survey large samples of healthy individuals without any prior selection. Novel technologies based on sequencing long molecules, and combinations of technologies, promise to stimulate the study of Y-CNVs in the immediate future.
Allum, Fiona; Shao, Xiaojian; Guénard, Frédéric; Simon, Marie-Michelle; Busche, Stephan; Caron, Maxime; Lambourne, John; Lessard, Julie; Tandre, Karolina; Hedman, Åsa K; Kwan, Tony; Ge, Bing; Rönnblom, Lars; McCarthy, Mark I; Deloukas, Panos; Richmond, Todd; Burgess, Daniel; Spector, Timothy D; Tchernof, André; Marceau, Simon; Lathrop, Mark; Vohl, Marie-Claude; Pastinen, Tomi; Grundberg, Elin
Most genome-wide methylation studies (EWAS) of multifactorial disease traits use targeted arrays or enrichment methodologies preferentially covering CpG-dense regions, to characterize sufficiently large samples. To overcome this limitation, we present here a new customizable, cost-effective approach, methylC-capture sequencing (MCC-Seq), for sequencing functional methylomes, while simultaneously providing genetic variation information. To illustrate MCC-Seq, we use whole-genome bisulfite sequencing on adipose tissue (AT) samples and public databases to design AT-specific panels. We establish its efficiency for high-density interrogation of methylome variability by systematic comparisons with other approaches and demonstrate its applicability by identifying novel methylation variation within enhancers strongly correlated to plasma triglyceride and HDL-cholesterol, including at CD36. Our more comprehensive AT panel assesses tissue methylation and genotypes in parallel at ∼4 and ∼3 M sites, respectively. Our study demonstrates that MCC-Seq provides comparable accuracy to alternative approaches but enables more efficient cataloguing of functional and disease-relevant epigenetic and genetic variants for large-scale EWAS.
Water quality is an emergent property of a complex system comprised of interacting microbial populations and introduced microbial and chemical contaminants. Studies leveraging next-generation sequencing (NGS) technologies are providing new insights into the ecology of microbially mediated processes that influence fresh water quality such as algal blooms, contaminant biodegradation, and pathogen dissemination. In addition, sequencing methods targeting small subunit (SSU) rRNA hypervariable regions have allowed identification of signature microbial species that serve as bioindicators for sewage contamination in these environments. Beyond amplicon sequencing, metagenomic and metatranscriptomic analyses of microbial communities in fresh water environments reveal the genetic capabilities and interplay of waterborne microorganisms, shedding light on the mechanisms for production and biodegradation of toxins and other contaminants. This review discusses the challenges and benefits of applying NGS-based methods to water quality research and assessment. First, we will consider the suitability and biases inherent in the application of NGS as a screening tool for assessment of biological risks and discuss the potential and limitations for direct quantitative interpretation of NGS data. Secondly, we will examine case studies from recent literature where NGS-based methods have been applied to topics in water quality assessment, including development of bioindicators for sewage pollution and microbial source tracking, characterizing the distribution of toxin and antibiotic resistance genes in water samples, and investigating mechanisms of biodegradation of harmful pollutants that threaten water quality. Finally, we provide a short review of emerging NGS platforms and their potential applications to the next generation of water quality assessment tools.
Okubo, Mariko; Minami, Narihiro; Goto, Kanako; Goto, Yuichi; Noguchi, Satoru; Mitsuhashi, Satomi; Nishino, Ichizo
Duchenne and Becker muscular dystrophies (DMD/BMD) are the most common inherited neuromuscular diseases. The genetic diagnosis is not easily made because of the large size of the dystrophin gene, the complex mutational spectrum and the high number of tests patients undergo for diagnosis. Multiplex ligation-dependent probe amplification (MLPA) has been used as the initial diagnostic test of choice. Although MLPA can diagnose the 70% of DMD/BMD patients having deletions/duplications, the remaining 30% of patients with small mutations require further analysis, such as Sanger sequencing. We applied a high-throughput method using Ion Torrent next-generation sequencing technology and diagnosed 92% of patients with DMD/BMD in a single analysis. We designed a multiplex primer pool for DMD and sequenced 67 cases having different mutations: 37 with deletions/duplications and 30 with small mutations or short insertions/deletions in DMD, using an Ion PGM sequencer. The results were compared with those from MLPA or Sanger sequencing. All deletions were detected. In contrast, only 50% of the duplications identified by MLPA were correctly detected. Small insertions in consecutive bases could not be detected. We estimated that Ion Torrent sequencing could diagnose ~92% of DMD/BMD patients according to the mutational spectrum of our cohort. Our results clearly indicate that this method is suitable for routine clinical practice, providing novel insights into comprehensive genetic information for future molecular therapy.
McPherson, Hannah; van der Merwe, Marlien; Delaney, Sven K; Edwards, Mark A; Henry, Robert J; McIntosh, Emma; Rymer, Paul D; Milner, Melita L; Siow, Juelian; Rossetto, Maurizio
With high quantity and quality data production and low cost, next generation sequencing has the potential to provide new opportunities for plant phylogeographic studies on single and multiple species. Here we present an approach for in silico chloroplast DNA assembly and single nucleotide polymorphism detection from short-read shotgun sequencing. The approach is simple and effective and can be implemented using standard bioinformatic tools. The chloroplast genome of Toona ciliata (Meliaceae), 159,514 base pairs long, was assembled from shotgun sequencing on the Illumina platform using de novo assembly of contigs. To evaluate its practicality, value and quality, we compared the short read assembly with an assembly completed using 454 data obtained after chloroplast DNA isolation. Sanger sequence verifications indicated that the Illumina dataset outperformed the longer read 454 data. Pooling of several individuals during preparation of the shotgun library enabled detection of informative chloroplast SNP markers. Following validation, we used the identified SNPs for a preliminary phylogeographic study of T. ciliata in Australia and to confirm low diversity across the distribution. Our approach provides a simple method for construction of whole chloroplast genomes from shotgun sequencing of whole genomic DNA using short-read data and no available closely related reference genome (e.g. from the same species or genus). The high coverage of Illumina sequence data also renders this method appropriate for multiplexing and SNP discovery and therefore a useful approach for landscape level studies of evolutionary ecology.
Hiroshi Ikeda,1 Kazuya Ishiguro,1 Tetsuyuki Igarashi,1 Yuka Aoki,1 Toshiaki Hayashi,1 Tadao Ishida,1 Yasushi Sasaki,1,2 Takashi Tokino,2 Yasuhisa Shinomura1 1Department of Gastroenterology, Rheumatology and Clinical Immunology, 2Medical Genome Sciences, Research Institute for Frontier Medicine, Sapporo Medical University, Sapporo, Japan Abstract: A 69-year-old man was diagnosed with IgG λ-type multiple myeloma (MM), Stage II, in October 2010. He was treated with one cycle of high-dose dexamethasone. After three cycles of bortezomib, the patient exhibited slow elevations in the free light-chain levels and developed a significant new increase of serum M protein. Bone marrow cytogenetic analysis revealed a complex karyotype characteristic of malignant plasma cells. To better understand the molecular pathogenesis of this patient, we sequenced for mutations in t