Penelope K Lindeque
BACKGROUND: Zooplankton play an important role in our oceans, in biogeochemical cycling and providing a food source for commercially important fish larvae. However, difficulties in correctly identifying zooplankton hinder our understanding of their roles in marine ecosystem functioning, and can prevent detection of long term changes in their community structure. The advent of massively parallel next generation sequencing technology allows DNA sequence data to be recovered directly from whole community samples. Here we assess the ability of such sequencing to quantify richness and diversity of a mixed zooplankton assemblage from a productive time series site in the Western English Channel. METHODOLOGY/PRINCIPAL FINDINGS: Plankton net hauls (200 µm) were taken at the Western Channel Observatory station L4 in September 2010 and January 2011. These samples were analysed by microscopy and metagenetic analysis of the 18S nuclear small subunit ribosomal RNA gene using the 454 pyrosequencing platform. Following quality control a total of 419,041 sequences were obtained for all samples. The sequences clustered into 205 operational taxonomic units (OTUs) using a 97% similarity cut-off. Allocation of taxonomy by comparison with the National Centre for Biotechnology Information database identified 135 OTUs to species level, 11 to genus level and 1 to order; <2.5% of sequences were classified as unknowns. By comparison a skilled microscopic analyst was able to routinely enumerate only 58 taxonomic groups. CONCLUSIONS: Metagenetics reveals a previously hidden taxonomic richness, especially for Copepoda and hard-to-identify meroplankton such as Bivalvia, Gastropoda and Polychaeta. It also reveals rare species and parasites. We conclude that Next Generation Sequencing of 18S amplicons is a powerful tool for elucidating the true diversity and species richness of zooplankton communities. While this approach allows for broad diversity assessments of plankton it may
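The clustering step in the abstract above (grouping sequences into operational taxonomic units at a 97% similarity cut-off) can be illustrated with a minimal sketch. This is not the pipeline the authors used: it assumes reads that are already aligned to equal length, measures similarity as the fraction of identical positions, and uses a simple greedy rule in which the first member of each OTU serves as its seed. The function names are hypothetical.

```python
from typing import List


def identity(a: str, b: str) -> float:
    """Fraction of identical positions between two pre-aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be pre-aligned to equal length")
    matches = sum(x == y for x, y in zip(a, b))
    return matches / len(a)


def greedy_otus(seqs: List[str], cutoff: float = 0.97) -> List[List[str]]:
    """Greedy clustering: each sequence joins the first OTU whose seed
    (its first member) it matches at >= cutoff; otherwise it founds a new OTU."""
    otus: List[List[str]] = []
    for s in seqs:
        for otu in otus:
            if identity(s, otu[0]) >= cutoff:
                otu.append(s)
                break
        else:
            # No existing seed was similar enough, so start a new OTU.
            otus.append([s])
    return otus
```

Under these assumptions, OTU richness is simply len(greedy_otus(reads)). Real pipelines typically sort reads by abundance first and use alignment-based identity, so the resulting OTU counts would differ from this sketch.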
Overballe-Petersen, Søren; Orlando, Ludovic Antoine Alexandre; Willerslev, Eske
The processes underlying DNA degradation are central to various disciplines, including cancer research, forensics and archaeology. The sequencing of ancient DNA molecules on next-generation sequencing platforms provides direct measurements of cytosine deamination, depurination and fragmentation rates that previously were obtained only from extrapolations of results from in vitro kinetic experiments performed over short timescales. For example, recent next-generation sequencing of ancient DNA reveals purine bases as one of the main targets of postmortem hydrolytic damage, through base elimination and strand breakage. It also shows substantially increased rates of DNA base-loss at guanosine. In this review, we argue that the latter results from an electron resonance structure unique to guanosine rather than adenosine having an extra resonance structure over guanosine as previously suggested.
Mardis, Elaine R.
Automated DNA sequencing instruments embody an elegant interplay among chemistry, engineering, software, and molecular biology and have built upon Sanger's founding discovery of dideoxynucleotide sequencing to perform once-unfathomable tasks. Combined with innovative physical mapping approaches that helped to establish long-range relationships between cloned stretches of genomic DNA, fluorescent DNA sequencers produced reference genome sequences for model organisms and for the reference human genome. New types of sequencing instruments that permit amazing acceleration of data-collection rates for DNA sequencing have been developed. The ability to generate genome-scale data sets is now transforming the nature of biological inquiry. Here, I provide an historical perspective of the field, focusing on the fundamental developments that predated the advent of next-generation sequencing instruments and providing information about how these instruments work, their application to biological research, and the newest types of sequencers that can extract data from single DNA molecules.
Next Generation Sequencing (NGS) refers to technologies that do not rely on traditional dideoxy-nucleotide (Sanger) sequencing where labeled DNA fragments are physically resolved by electrophoresis. These new technologies rely on different strategies, but essentially all of them make use of real-time data collection of a base level incorporation event across a massive number of reactions (on the order of millions versus 96 for capillary electrophoresis for instance). The major commercial NGS platforms available to researchers are the 454 Genome Sequencer (Roche), Illumina (formerly Solexa) Genome Analyzer, the SOLiD system (Applied Biosystems/Life Technologies) and the Heliscope (Helicos Corporation). The techniques and different strategies utilized by these platforms are reviewed in a number of the papers in this special issue. These technologies are enabling new applications that take advantage of the massive data produced by this next generation of sequencing instruments. [...
Scholtalbers, J.; Rossler, J.; Sorn, P.; Graaf, J. de; Boisguerin, V.; Castle, J.; Sahin, U.
SUMMARY: We have developed a laboratory information management system (LIMS) for a next-generation sequencing (NGS) laboratory within the existing Galaxy platform. The system provides lab technicians standard and customizable sample information forms, barcoded submission forms, tracking of input
McDaniel, Andrew S; Stall, Jennifer N; Hovelson, Daniel H; Cani, Andi K; Liu, Chia-Jen; Tomlins, Scott A; Cho, Kathleen R
High-grade serous carcinoma (HGSC) is the most prevalent and lethal form of ovarian cancer. HGSCs frequently arise in the distal fallopian tubes rather than the ovary, developing from small precursor lesions called serous tubal intraepithelial carcinomas (TICs, or more specifically, STICs). While STICs have been reported to harbor TP53 mutations, detailed molecular characterizations of these lesions are lacking. We performed targeted next-generation sequencing (NGS) on formalin-fixed, paraffin-embedded tissue from 4 women, 2 with HGSC and 2 with uterine endometrioid carcinoma (UEC) who were diagnosed as having synchronous STICs. We detected concordant mutations in both HGSCs with synchronous STICs, including TP53 mutations as well as assumed germline BRCA1/2 alterations, confirming a clonal association between these lesions. Next-generation sequencing confirmed the presence of a STIC clonally unrelated to 1 case of UEC, and NGS of the other tubal lesion diagnosed as a STIC unexpectedly supported the lesion as a micrometastasis from the associated UEC. We demonstrate that targeted NGS can identify genetic alterations in minute lesions, such as TICs, and confirm TP53 mutations as early driving events for HGSC. Next-generation sequencing also demonstrated unexpected associations between presumed STICs and synchronous carcinomas, providing evidence that some TICs are actually metastases rather than HGSC precursors.
Zoll, Jan; Snelders, Eveline; Verweij, Paul E; Melchers, Willem J G
New state-of-the-art techniques in sequencing offer valuable tools both for the detection of mycobiota and for understanding the molecular mechanisms of virulence and of resistance against antifungal compounds. The introduction of new sequencing platforms with enhanced capacity and reduced costs for sequence analysis provides a potentially powerful tool in mycological diagnosis and research. In this review, we summarize the applications of next-generation sequencing techniques in mycology.
The existing techniques have contributed significantly to our current knowledge of allelic diversity. At present, sequence-based typing (SBT) methods, in particular next-generation sequencing (NGS), provide the highest possible resolution. NGS platforms were initially only used for genomic sequencing, but also showed.
Background: Recently, a growing number of novel genetic defects underlying primary immunodeficiencies (PID) have been identified, increasing the number of PID up to more than 250 well-defined forms. Next-generation sequencing (NGS) technologies and proper filtering strategies greatly contributed to this rapid evolution, providing the possibility to rapidly and simultaneously analyze large numbers of genes or the whole exome. Objective: To evaluate the role of targeted next-generation sequencing and whole exome sequencing in the diagnosis of a case series, characterized by complex or atypical clinical features suggesting a PID, difficult to diagnose using the current diagnostic procedures. Methods: We retrospectively analyzed genetic variants identified through targeted next-generation sequencing or whole exome sequencing in 45 patients with complex PID of unknown etiology. Results: 40 variants were identified using targeted next-generation sequencing, while 5 were identified using whole exome sequencing. Newly identified genetic variants were classified into 4 groups: I variations associated with a well-defined PID; II variations associated with atypical features of a well-defined PID; III functionally relevant variations potentially involved in the immunological features; IV non-diagnostic genotype, in whom the link with phenotype is missing. We reached a conclusive genetic diagnosis in 7/45 patients (~16%). Among them, 4 patients presented with a typical well-defined PID. In the remaining 3 cases, mutations were associated with unexpected clinical features, expanding the phenotypic spectrum of typical PIDs. In addition, we identified 31 variants in 10 patients with complex phenotype, individually not causative per se of the disorder. Conclusion: NGS technologies represent a cost-effective and rapid first-line genetic approach for the evaluation of complex PIDs. Whole exome sequencing, despite a moderately higher cost compared to targeted, is
Nagarajan, Rakesh; Bartley, Angela N; Bridge, Julia A; Jennings, Lawrence J; Kamel-Reid, Suzanne; Kim, Annette; Lazar, Alexander J; Lindeman, Neal I; Moncur, Joel; Rai, Alex J; Routbort, Mark J; Vasalos, Patricia; Merker, Jason D
Context: Detection of acquired variants in cancer is a paradigm of precision medicine, yet little has been reported about clinical laboratory practices across a broad range of laboratories. Objective: To use College of American Pathologists proficiency testing survey results to report on the results from surveys on next-generation sequencing-based oncology testing practices. Design: College of American Pathologists proficiency testing survey results from more than 250 laboratories currently performing molecular oncology testing were used to determine laboratory trends in next-generation sequencing-based oncology testing. Results: These presented data provide key information about the number of laboratories that currently offer or are planning to offer next-generation sequencing-based oncology testing. Furthermore, we present data from 60 laboratories performing next-generation sequencing-based oncology testing regarding specimen requirements and assay characteristics. The findings indicate that most laboratories are performing tumor-only targeted sequencing to detect single-nucleotide variants and small insertions and deletions, using desktop sequencers and predesigned commercial kits. Despite these trends, a diversity of approaches to testing exists. Conclusions: This information should be useful to further inform a variety of topics, including national discussions involving clinical laboratory quality systems, regulation and oversight of next-generation sequencing-based oncology testing, and precision oncology efforts in a data-driven manner.
Background: Next generation sequencing of BACs is a viable option for deciphering the sequence of even large and highly repetitive genomes. In order to optimize this strategy, we examined the influence of read length on the quality of Roche/454 sequence assemblies, to what extent Illumina/Solexa mate pairs (MPs) improve the assemblies by scaffolding and whether barcoding of BACs is dispensable. Results: Sequencing four BACs with both FLX and Titanium technologies revealed similar sequencing accuracy, but showed that the longer Titanium reads produce considerably fewer misassemblies and gaps. The 454 assemblies of 96 barcoded BACs were improved by scaffolding 79% of the total contig length with MPs from a non-barcoded library. Assembly of the unmasked 454 sequences without separation by barcodes revealed chimeric contig formation to be a major problem, encompassing 47% of the total contig length. Masking the sequences reduced this fraction to 24%. Conclusion: Optimal BAC pool sequencing should be based on the longest available reads, with barcoding essential for a comprehensive assessment of both repetitive and non-repetitive sequence information. When interest is restricted to non-repetitive regions and repeats are masked prior to assembly, barcoding is non-essential. In any case, the assemblies can be improved considerably by scaffolding with non-barcoded BAC pool MPs.
Elingaramil, Sauli; Li, Xiaolong; He, Nongyue
Next-generation sequencing technologies, microarrays and advances in bionanotechnology have had an enormous impact on research within a short time frame. This impact appears certain to increase further as many biomedical institutions are now acquiring these prevailing new technologies. Beyond conventional sampling of genome content, wide-ranging applications are rapidly evolving for next-generation sequencing, microarrays and nanotechnology. To date, these technologies have been applied in a variety of contexts, including whole-genome sequencing, targeted resequencing and discovery of transcription factor binding sites, noncoding RNA expression profiling and molecular diagnostics. This paper thus discusses current applications of nanotechnology, next-generation sequencing technologies and microarrays in biomedical research and highlights the transforming potential these technologies offer.
The invention of next-generation-sequencing has revolutionized almost all fields of genetics, but few have profited from it as much as the field of ancient DNA research. From its beginnings as an interesting but rather marginal discipline, ancient DNA research is now on its way into the centre of evolutionary biology. In less than a year from its invention next-generation-sequencing had increased the amount of DNA sequence data available from extinct organisms by several orders of magnitude. Ancient DNA research is now not only adding a temporal aspect to evolutionary studies and allowing for the observation of evolution in real time, it also provides important data to help understand the origins of our own species. Here we review progress that has been made in next-generation-sequencing of ancient DNA over the past five years and evaluate sequencing strategies and future directions.
Within just a few years, the new methods for high-throughput next-generation sequencing have generated completely novel insights into the heritability and pathophysiology of human disease. In this review, we wish to highlight the benefits of the current state-of-the-art sequencing technologies for genetic and epigenetic research. We illustrate how these technologies help to constantly improve our understanding of genetic mechanisms in biological systems and summarize the progress made so far. This can be exemplified by the case of heritable heart muscle diseases, so-called cardiomyopathies. Here, next-generation sequencing is able to identify novel disease genes, and first clinical applications demonstrate the successful translation of this technology into personalized patient care.
Accessory, supernumerary, or, most simply, B chromosomes are found in many eukaryotic karyotypes. These small chromosomes do not follow the usual pattern of segregation, but rather are transmitted at a higher than expected frequency. As is increasingly being demonstrated by next-generation sequencing (NGS), their structure comprises fragments of standard (A) chromosomes, although in some plant species, their sequence also includes contributions from organellar genomes. Transcriptomic analyses of various animal and plant species have revealed that, contrary to what used to be the common belief, some of the B chromosome DNA is protein-encoding. This review summarizes the progress in understanding B chromosome biology enabled by the application of next-generation sequencing technology and state-of-the-art bioinformatics. In particular, a contrast is drawn between a direct sequencing approach and a strategy based on comparative genomics as alternative routes that can be taken towards the identification of B chromosome sequences.
Bräutigam, Andrea; Gowik, Udo
Next generation sequencing (NGS) technologies have opened fascinating opportunities for the analysis of plants with and without a sequenced genome on a genomic scale. During the last few years, NGS methods have become widely available and cost effective. They can be applied to a wide variety of biological questions, from the sequencing of complete eukaryotic genomes and transcriptomes, to the genome-scale analysis of DNA-protein interactions. In this review, we focus on the use of NGS for pla...
Fahnøe, Ulrik; Pedersen, Anders Gorm; Höper, Dirk
Next Generation Sequencing (NGS) is being increasingly adopted in viral research and will be the preferred technology in the years to come. We have recently sequenced several strains of Classical Swine Fever Virus (CSFV) by NGS on both Genome Sequencer FLX (GS FLX) and Ion Torrent PGM platforms... to the consensus sequence. Additionally, we obtained an average sequence depth for the genome of 4000 for the Ion Torrent PGM and 400 for the FLX platform, making the mapping suitable for single nucleotide variant (SNV) detection. The analysis revealed a single non-silent SNV A10665G leading to the amino acid change D...
Møller, Rikke S.; Dahl, Hans A.; Helbig, Ingo
During the last decade, next generation sequencing technologies such as targeted gene panels, whole exome sequencing and whole genome sequencing have led to an explosion of gene identifications in monogenic epilepsies including both familial epilepsies and severe epilepsies, often referred to as ...
The emergence of next-generation sequencing (NGS) platforms imposes increasing demands on statistical methods and bioinformatic tools for the analysis and the management of the huge amounts of data generated by these technologies. Even at the early stages of their commercial availability, a large number of software tools already exist for analyzing NGS data. These tools fall into several general categories, including alignment of sequence reads to a reference, base-calling and/or polymorphism detection, de novo assembly from paired or unpaired reads, structural variant detection and genome browsing. This manuscript aims to guide readers in the choice of the available computational tools that can be used for the several steps of the data analysis workflow.
Piednoël, M.; Aberer, A.J.; Schneeweiss, G. M.; Macas, Jiří; Novák, Petr; Gundlach, H.; Temsch, E.M.; Renner, S.S.
Vol. 29, No. 11 (2012), pp. 3601-3611. ISSN 0737-4038. Institutional research plan: CEZ:AV0Z50510513. Institutional support: RVO:60077344. Keywords: next-generation sequencing; polyploidy; genome size; Ty3/Gypsy; transposable elements. Subject RIV: EB - Genetics; Molecular Biology. Impact factor: 10.353 (year: 2012).
Endrullat, Christoph; Glökler, Jörn; Franke, Philipp; Frohme, Marcus
DNA sequencing continues to evolve quickly even after more than 30 years. Many new platforms have appeared suddenly, and formerly established systems have vanished just as quickly. Since the establishment of next-generation sequencing devices, this progress has gained momentum due to the continually growing demand for higher throughput, lower costs and better quality of data. As a consequence of this rapid development, standardized procedures and data formats as well as comprehensive quality management considerations are still scarce. Here, we list and summarize current standardization efforts and quality management initiatives from companies, organizations and societies in the form of published studies and ongoing projects. These comprise, on the one hand, quality documentation issues like technical notes, accreditation checklists and guidelines for validation of sequencing workflows. On the other hand, general standard proposals and quality metrics are developed and applied to the sequencing workflow steps, with the main focus on upstream processes. Finally, certain standard developments for downstream pipeline data handling, processing and storage are discussed in brief. These standardization approaches represent a first basis for continuing work in order to prospectively implement next-generation sequencing in important areas such as clinical diagnostics, where reliable results and fast processing are crucial. Additionally, these efforts will exert a decisive influence on traceability and reproducibility of sequence data.
Yang, Ye; Liu, Juan
We developed a program, JVM (Java Visual Mapping), for mapping next generation sequencing reads to a reference sequence. The program is implemented in Java and is designed to align millions of short reads generated by Illumina sequencing technology. It employs a seed index strategy and octal encoding operations for sequence alignments. JVM is useful for DNA-Seq and RNA-Seq when dealing with single-end resequencing. JVM is a desktop application that supports read data sets from 1 MB to 10 GB.
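The seed index strategy mentioned in the abstract above can be sketched in general terms. The following is not the JVM program's implementation (which is written in Java and also uses octal encoding, not reproduced here); it is a minimal Python illustration, with hypothetical function names, of the usual seed-and-verify idea: index every k-mer of the reference, look up each k-mer of a read to propose candidate positions, then verify each candidate by counting mismatches over the full read without gaps.

```python
from collections import defaultdict
from typing import Dict, List


def build_seed_index(reference: str, k: int) -> Dict[str, List[int]]:
    """Map every k-mer (seed) in the reference to the positions where it starts."""
    index: Dict[str, List[int]] = defaultdict(list)
    for i in range(len(reference) - k + 1):
        index[reference[i:i + k]].append(i)
    return index


def map_read(read: str, reference: str, index: Dict[str, List[int]],
             k: int, max_mismatches: int = 2) -> List[int]:
    """Return sorted reference start positions where the read aligns
    (ungapped) with at most max_mismatches mismatches."""
    candidates = set()
    for offset in range(len(read) - k + 1):
        seed = read[offset:offset + k]
        for pos in index.get(seed, []):
            start = pos - offset  # implied start of the whole read
            if 0 <= start <= len(reference) - len(read):
                candidates.add(start)
    hits = []
    for start in sorted(candidates):
        window = reference[start:start + len(read)]
        mismatches = sum(a != b for a, b in zip(read, window))
        if mismatches <= max_mismatches:
            hits.append(start)
    return hits
```

A read is only found if at least one of its k-mers matches the reference exactly, which is the usual trade-off of seed-based mappers: shorter seeds tolerate more mismatches but produce more candidate positions to verify.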
Ravi, Rupesh Kanchi; Walton, Kendra; Khosroheidari, Mahdieh
MiSeq, Illumina's integrated next generation sequencing instrument, uses reversible-terminator sequencing-by-synthesis technology to provide end-to-end sequencing solutions. The MiSeq instrument is one of the smallest benchtop sequencers that can perform onboard cluster generation, amplification, genomic DNA sequencing, and data analysis, including base calling, alignment and variant calling, in a single run. It performs both single- and paired-end runs with adjustable read lengths from 1 × 36 base pairs to 2 × 300 base pairs. A single run can produce output data of up to 15 Gb in as little as 4 h of runtime and can output up to 25 M single reads and 50 M paired-end reads. Thus, MiSeq provides an ideal platform for rapid turnaround time. MiSeq is also a cost-effective tool for various analyses focused on targeted gene sequencing (amplicon sequencing and target enrichment), metagenomics, and gene expression studies. For these reasons, MiSeq has become one of the most widely used next generation sequencing platforms. Here, we provide a protocol to prepare libraries for sequencing using the MiSeq instrument and basic guidelines for analysis of output data from the MiSeq sequencing run.
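The throughput figures quoted in the abstract above are mutually consistent, as a short calculation shows. The sketch below assumes that the 25 M figure counts sequenced clusters and that a paired-end run reads each cluster twice (hence 50 M paired-end reads); the function name is hypothetical.

```python
def run_yield_gb(clusters: int, read_length_bp: int, reads_per_cluster: int) -> float:
    """Total sequenced bases for one run, in gigabases (1 Gb = 1e9 bases)."""
    return clusters * read_length_bp * reads_per_cluster / 1e9


# Longest configuration from the abstract: 2 x 300 bp on 25 M clusters.
max_yield = run_yield_gb(25_000_000, 300, 2)

# Shortest configuration from the abstract: 1 x 36 bp on 25 M clusters.
min_yield = run_yield_gb(25_000_000, 36, 1)
```

Under these assumptions the 2 × 300 bp configuration gives 15 Gb, matching the stated maximum output, while the 1 × 36 bp configuration gives roughly 0.9 Gb.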
Huang, Xiaoyan; Tian, Mao; Li, Jiankang; Cui, Ling; Li, Min; Zhang, Jianguo
Purpose: Norrie disease (ND) is a rare X-linked genetic disorder, the main symptoms of which are congenital blindness and white pupils. It has been reported that ND is caused by mutations in the NDP gene. Although many mutations in NDP have been reported, the genetic cause for many patients remains unknown. In this study, the aim is to investigate the genetic defect in a five-generation family with typical symptoms of ND. Methods: To identify the causative gene, next-generation sequencing bas...
Suresh, Padmanaban S; Venkatesh, Thejaswini; Tsutsumi, Rie; Shetty, Abhishek
Contemporary molecular biology research tools have enriched numerous areas of biomedical research that address challenging diseases, including endocrine cancers (pituitary, thyroid, parathyroid, adrenal, testicular, ovarian, and neuroendocrine cancers). These tools have placed several intriguing clues before the scientific community. Endocrine cancers pose a major challenge in health care and research despite considerable attempts by researchers to understand their etiology. Microarray analyses have provided gene signatures from many cells, tissues, and organs that can differentiate healthy states from diseased ones, and even show patterns that correlate with stages of a disease. Microarray data can also elucidate the responses of endocrine tumors to therapeutic treatments. The rapid progress in next-generation sequencing methods has overcome many of the initial challenges of these technologies, and their advantages over microarray techniques have enabled them to emerge as valuable aids for clinical research applications (prognosis, identification of drug targets, etc.). A comprehensive review describing the recent advances in next-generation sequencing methods and their application in the evaluation of endocrine and endocrine-related cancers is lacking. The main purpose of this review is to illustrate the concepts that collectively constitute our current view of the possibilities offered by next-generation sequencing technological platforms, challenges to relevant applications, and perspectives on the future of clinical genetic testing of patients with endocrine tumors. We focus on recent discoveries in the use of next-generation sequencing methods for clinical diagnosis of endocrine tumors in patients and conclude with a discussion on persisting challenges and future objectives.
Background. The large number of population-specific polymorphisms present in the HLA complex in the South African (SA) population reduces the probability of finding an adequate HLA-matched donor for individuals in need of an unrelated haematopoietic stem cell transplantation (HSCT). Next-generation sequencing ...
Deurenberg, Ruud H.; Bathoorn, Erik; Chlebowicz, Monika A.; Couto, Natacha; Ferdous, Mithila; Garcia-Cobos, Silvia; Kooistra-Smid, Anna M. D.; Raangs, Erwin C.; Rosema, Sigrid; Veloo, Alida C. M.; Zhou, Kai; Friedrich, Alexander W.; Rossen, John W. A.
Current molecular diagnostics of human pathogens provide limited information that is often not sufficient for outbreak and transmission investigation. Next generation sequencing (NGS) determines the DNA sequence of a complete bacterial genome in a single sequence run, and from these data,
Venco, Francesco; Vaskin, Yuriy; Ceol, Arnaud; Muller, Heiko
Background Life-science laboratories make increasing use of Next Generation Sequencing (NGS) for studying bio-macromolecules and their interactions. Array-based methods for measuring gene expression or protein-DNA interactions are being replaced by RNA-Seq and ChIP-Seq. Sequencing is generally performed by specialized facilities that have to keep track of sequencing requests, trace samples, ensure quality and make data available according to predefined privileges. An integrated tool helps to ...
Montmayeur, Anna M.; Schmidt, Alexander; Zhao, Kun; Magaña, Laura; Iber, Jane; Castro, Christina J.; Chen, Qi; Henderson, Elizabeth; Ramos, Edward; Shaw, Jing; Tatusov, Roman L.; Dybdahl-Sissoko, Naomi; Endegue-Zanga, Marie Claire; Adeniji, Johnson A.; Oberste, M. Steven; Burns, Cara C.
ABSTRACT The poliovirus (PV) is currently targeted for worldwide eradication and containment. Sanger-based sequencing of the viral protein 1 (VP1) capsid region is currently the standard method for PV surveillance. However, the whole-genome sequence is sometimes needed for higher resolution global surveillance. In this study, we optimized whole-genome sequencing protocols for poliovirus isolates and FTA cards using next-generation sequencing (NGS), aiming for high sequence coverage, efficiency, and throughput. We found that DNase treatment of poliovirus RNA followed by random reverse transcription (RT), amplification, and the use of the Nextera XT DNA library preparation kit produced significantly better results than other preparations. The average viral reads per total reads, a measurement of efficiency, was as high as 84.2% ± 15.6%. PV genomes covering >99 to 100% of the reference length were obtained and validated with Sanger sequencing. A total of 52 PV genomes were generated, multiplexing as many as 64 samples in a single Illumina MiSeq run. This high-throughput, sequence-independent NGS approach facilitated the detection of a diverse range of PVs, especially for those in vaccine-derived polioviruses (VDPV), circulating VDPV, or immunodeficiency-related VDPV. In contrast to results from previous studies on other viruses, our results showed that filtration and nuclease treatment did not discernibly increase the sequencing efficiency of PV isolates. However, DNase treatment after nucleic acid extraction to remove host DNA significantly improved the sequencing results. This NGS method has been successfully implemented to generate PV genomes for molecular epidemiology of the most recent PV isolates. Additionally, the ability to obtain full PV genomes from FTA cards will aid in facilitating global poliovirus surveillance. PMID:27927929
Doyle, Stephen R; Griffith, Ian S; Murphy, Nick P; Strugnell, Jan M
The complete mitochondrial genome of the Eastern Rock lobster, Sagmariasus verreauxi, is reported for the first time. Using low-coverage, long read MiSeq next generation sequencing, we constructed and determined the mtDNA genome organization of the 15,470 bp sequence from two isolates from Eastern Tasmania, Australia and Northern New Zealand, and identified 46 polymorphic nucleotides between the two sequences. This genome sequence and its genetic polymorphisms will likely be useful in understanding the distribution and population connectivity of the Eastern Rock Lobster, and in the fisheries management of this commercially important species.
Background: Genetic mapping and QTL detection are powerful methodologies in plant improvement and breeding. Construction of a high-density and high-quality genetic map would be of great benefit in the production of superior grapes to meet human demand. High throughput and low cost of the recently developed next generation sequencing (NGS) technology have resulted in its wide application in genome research. Sequencing restriction-site associated DNA (RAD) might be an efficient strategy to simplify genotyping. Combining NGS with RAD has proven to be powerful for single nucleotide polymorphism (SNP) marker development. Results: An F1 population of 100 individual plants was developed. In-silico digestion-site prediction was used to select an appropriate restriction enzyme for construction of a RAD sequencing library. Next generation RAD sequencing was applied to genotype the F1 population and its parents. Applying a cluster strategy for SNP modulation, a total of 1,814 high-quality SNP markers were developed: 1,121 of these were mapped to the female genetic map, 759 to the male map, and 1,646 to the integrated map. A comparison of the genetic maps to the published Vitis vinifera genome revealed both conservation and variations. Conclusions: The applicability of next generation RAD sequencing for genotyping a grape F1 population was demonstrated, leading to the successful development of a genetic map with high density and quality using our designed SNP markers. Detailed analysis revealed that this newly developed genetic map can be used for a variety of genome investigations, such as QTL detection, sequence assembly and genome comparison.
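In-silico digestion-site prediction, as used in the abstract above to choose a restriction enzyme, amounts to scanning a genome sequence for an enzyme's recognition site and examining the resulting fragments. The sketch below is an illustration only: the abstract does not name the enzyme, so EcoRI (recognition site GAATTC, cutting after the first G) is used purely as an example, and the function names are hypothetical. It treats the sequence as linear and scans the given strand only.

```python
from typing import List


def cut_positions(sequence: str, site: str, cut_offset: int) -> List[int]:
    """Positions at which the enzyme cuts: each occurrence of the
    recognition site, shifted by the cut offset within the site."""
    positions = []
    start = sequence.find(site)
    while start != -1:
        positions.append(start + cut_offset)
        start = sequence.find(site, start + 1)  # allow overlapping sites
    return positions


def fragment_lengths(sequence: str, site: str, cut_offset: int) -> List[int]:
    """Lengths of the fragments produced by a complete digest of a linear sequence."""
    cuts = [0] + cut_positions(sequence, site, cut_offset) + [len(sequence)]
    return [b - a for a, b in zip(cuts, cuts[1:])]


# Example enzyme (an assumption, not taken from the abstract): EcoRI, G^AATTC.
ECORI_SITE, ECORI_OFFSET = "GAATTC", 1
```

Comparing the number of cut sites and the fragment-length distribution across candidate enzymes is one plausible way to judge which enzyme would yield a suitable marker density for a RAD library.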
Gustavo S. Fernandes
OBJECTIVES: With the development of next-generation sequencing (NGS) technologies, DNA sequencing has been increasingly utilized in clinical practice. Our goal was to investigate the impact of genomic evaluation on treatment decisions for heavily pretreated patients with metastatic cancer. METHODS: We analyzed metastatic cancer patients from a single institution whose cancers had progressed after all available standard-of-care therapies and whose tumors underwent next-generation sequencing analysis. We determined the percentage of patients who received any therapy directed by the test, and its efficacy. RESULTS: From July 2013 to December 2015, 185 consecutive patients were tested using a commercially available next-generation sequencing-based test, and 157 patients were eligible. Sixty-six patients (42.0%) were female, and 91 (58.0%) were male. The mean age at diagnosis was 52.2 years, and the mean number of pre-test lines of systemic treatment was 2.7. One hundred and seventy-seven patients (95.6%) had at least one identified gene alteration. Twenty-four patients (15.2%) underwent systemic treatment directed by the test result. Of these, one patient had a complete response, four (16.7%) had partial responses, two (8.3%) had stable disease, and 17 (70.8%) had disease progression as the best result. The median progression-free survival time with matched therapy was 1.6 months, and the median overall survival was 10 months. CONCLUSION: We identified a high prevalence of gene alterations using a next-generation sequencing test. Although some benefit was associated with the matched therapy, most of the patients had disease progression as the best response, indicating the limited biological potential and unclear clinical relevance of this practice.
Willenbrock, Hanni; Salomon, Jesper; Søkilde, Rolf
Recently, next-generation sequencing has been introduced as a promising, new platform for assessing the copy number of transcripts, while the existing microarray technology is considered less reliable for absolute, quantitative expression measurements. Nonetheless, so far, results from the two...... technologies have only been compared based on biological data, leading to the conclusion that, although they are somewhat correlated, expression values differ significantly. Here, we use synthetic RNA samples, resembling human microRNA samples, to find that microarray expression measures actually correlate...... better with sample RNA content than expression measures obtained from sequencing data. In addition, microarrays appear highly sensitive and perform equivalently to next-generation sequencing in terms of reproducibility and relative ratio quantification....
Bowen, Margot Elizabeth
Next Generation Sequencing (NGS) technologies have dramatically increased the throughput and lowered the cost of DNA sequencing. In this thesis, I apply these technologies to unresolved questions in skeletal development and disease. Firstly, I use targeted re-sequencing of genomic DNA to identify the genetic cause of the cartilage tumor syndrome, metachondromatosis (MC). I show that the majority of MC patients carry heterozygous loss-of-function mutations in the PTPN11 gene, which encodes a p...
Iqbal, Z.; Neveling, K.; Razzaq, A.; Shahzad, M.; Zahoor, M.Y.; Qasim, M.; Gilissen, C.F.H.A.; Wieskamp, N.; Kwint, M.P.; Gijsen, S.; de Brouwer, A.P.; Veltman, J.A.; Riazuddin, S.; Bokhoven, J.H.L.M. van
BACKGROUNDS AND AIMS: Next generation sequencing (NGS) approaches have revolutionized the identification of mutations underlying genetic disorders. This technology is particularly useful for the identification of mutations in known and new genes for conditions with extensive genetic heterogeneity.
During the last two decades, genotyping technology has advanced rapidly, enabling the tremendous success of genome-wide association studies (GWAS) in the search for disease susceptibility loci (DSLs). However, only a small fraction of the overall predicted heritability can be explained by the DSLs discovered. One possible explanation for this "missing heritability" phenomenon is that many causal variants are rare. The recent development of high-throughput next-generation sequencing (NGS) ...
Heredia, Nicholas J
Digital PCR is a valuable tool to quantify next-generation sequencing (NGS) libraries precisely and accurately. Accurately quantifying NGS libraries enables accurate loading of the libraries onto the sequencer and thus improves sequencing performance by reducing under- and overloading errors. Accurate quantification also benefits users by enabling uniform loading of indexed/barcoded libraries, which in turn greatly improves sequencing uniformity of the indexed/barcoded samples. The advantages gained by employing the Droplet Digital PCR (ddPCR™) library QC assay include precise and accurate quantification in addition to size quality assessment, enabling users to QC their sequencing libraries with confidence.
Huang, Xiaoyan; Tian, Mao; Li, Jiankang; Cui, Ling; Li, Min; Zhang, Jianguo
Norrie disease (ND) is a rare X-linked genetic disorder, the main symptoms of which are congenital blindness and white pupils. It has been reported that ND is caused by mutations in the NDP gene. Although many mutations in NDP have been reported, the genetic cause for many patients remains unknown. This study aimed to investigate the genetic defect in a five-generation family with typical symptoms of ND. To identify the causative gene, next-generation sequencing based target capture sequencing was performed. Segregation analysis of the candidate variant was performed in additional family members using Sanger sequencing. We identified a novel missense variant (c.314C>A) located within the NDP gene. The mutation cosegregated within all affected individuals in the family and was not found in unaffected members. By happenstance, in this family, we also detected a known pathogenic variant of retinitis pigmentosa in a healthy individual. The c.314C>A mutation of the NDP gene is novel and broadens the genetic spectrum of ND.
Amy E O'Connell
The Wiskott-Aldrich syndrome (WAS) is due to mutations of the WAS gene encoding the cytoskeletal WAS protein (WASp), leading to abnormal downstream signaling from the T cell and B cell antigen receptors (TCR, BCR). We hypothesized that the impaired signaling through the TCR and BCR in WAS would subsequently lead to aberrations in the immune repertoire of WAS patients. Using next generation sequencing, the T cell receptor beta (TRB) and B cell immunoglobulin heavy chain (IGH) repertoires of 8 patients with WAS and 6 controls were sequenced. Clonal expansions were identified within memory CD4+ cells, as well as in total, naïve and memory CD8+ cells from WAS patients. In the B cell compartment, WAS patient IGH repertoires were also clonally expanded and showed skewed usage of IGHV and IGHJ genes, and increased usage of IGHG constant genes, compared with controls. To our knowledge, this is the first study that demonstrates significant abnormalities of the immune repertoire in WAS patients using next generation sequencing.
In vitro selection technology has transformed the development of therapeutic monoclonal antibodies. Using methods such as phage, ribosome, and yeast display, high affinity binders can be selected from diverse repertoires. Here, we review strategies for the next-generation sequencing (NGS) of phage- and other antibody-display libraries, as well as NGS platforms and analysis tools. Moreover, we discuss recent examples relating to the use of NGS to assess library diversity, clonal enrichment, and affinity maturation.
BACKGROUND: Metagenomics can reveal the vast majority of microbes that have been missed by traditional cultivation-based methods. Due to its extremely wide range of application areas, fast metagenome sequencing simulation systems with high fidelity are in great demand to facilitate the development and comparison of metagenomics analysis tools. RESULTS: We present here a customizable metagenome simulation system: NeSSM (Next-generation Sequencing Simulator for Metagenomics). Combining complete genomes currently available, a community composition table, and sequencing parameters, it can simulate metagenome sequencing better than existing systems. Sequencing error models based on the explicit distribution of errors at each base and sequencing coverage bias are incorporated in the simulation. In order to improve the fidelity of simulation, tools are provided by NeSSM to estimate the sequencing error models, sequencing coverage bias and the community composition directly from existing metagenome sequencing data. Currently, NeSSM supports single-end and paired-end sequencing for both 454 and Illumina platforms. In addition, a GPU (graphics processing unit) version of NeSSM has also been developed to accelerate the simulation. By comparing the simulated sequencing data from NeSSM with experimental metagenome sequencing data, we have demonstrated that NeSSM performs better in many aspects than existing popular metagenome simulators, such as MetaSim, GemSIM and Grinder. The GPU version of NeSSM is more than one order of magnitude faster than MetaSim. CONCLUSIONS: NeSSM is a fast simulation system for high-throughput metagenome sequencing. It can be helpful to develop tools and evaluate strategies for metagenomics analysis and it is freely available for academic users at http://cbb.sjtu.edu.cn/~ccwei/pub/software/NeSSM.php.
Until recently, the focus in dental research has been on studying a small fraction of the oral microbiome—so-called opportunistic pathogens. With the advent of next-generation sequencing (NGS) technologies, researchers now have the tools that allow for profiling of the microbiomes and metagenomes at
The yeast two-hybrid (Y2H) system exploits host cell genetics in order to display binary protein-protein interactions (PPIs) via defined and selectable phenotypes. Numerous improvements have been made to this method, adapting the screening principle for diverse applications, including drug discovery and the scale-up for proteome wide interaction screens in human and other organisms. Here we discuss a systematic workflow and analysis scheme for screening data generated by Y2H and related assays that includes high-throughput selection procedures, readout of comprehensive results via next-generation sequencing (NGS), and the interpretation of interaction data via quantitative statistics. The novel assays and tools will serve the broader scientific community to harness the power of NGS technology to address PPI networks in health and disease. We discuss examples of how this next-generation platform can be applied to address specific questions in diverse fields of biology and medicine.
Jul 26, 2017 ... Clinical utility of a 377 gene custom next-generation sequencing epilepsy panel ... number of genes, making it a very attractive option for a condition as .... clinical value of various test offerings to guide decision making.
So Mee Kwon
The explosive development of genomics technologies including microarrays and next generation sequencing (NGS) has provided comprehensive maps of cancer genomes, including the expression of mRNAs and microRNAs, DNA copy numbers, sequence variations, and epigenetic changes. These genome-wide profiles of genetic aberrations could reveal candidates for diagnostic and/or prognostic biomarkers as well as mechanistic insights into tumor development and progression. Recent efforts to establish the huge cancer genome compendium and integrative omics analyses, so-called "integromics", have extended our understanding of the cancer genome, showing its daunting complexity and heterogeneity. However, the challenges of the structured integration, sharing, and interpretation of the big omics data still remain to be resolved. Here, we review several issues raised in cancer omics data analysis, including NGS, focusing particularly on the study design and analysis strategies. This might be helpful to understand the current trends and strategies of the rapidly evolving cancer genomics research.
Fernando J Rossello
Next-generation sequencing (NGS) studies in cancer are limited by the amount, quality and purity of tissue samples. In this situation, primary xenografts have proven useful preclinical models. However, the presence of mouse-derived stromal cells represents a technical challenge to their use in NGS studies. We examined this problem in an established primary xenograft model of small cell lung cancer (SCLC), a malignancy often diagnosed from small biopsy or needle aspirate samples. Using an in silico strategy that assigns reads according to species-of-origin, we prospectively compared NGS data from primary xenograft models with matched cell lines and with published datasets. We show here that low-coverage whole-genome analysis demonstrated remarkable concordance between published genome data and internal controls, despite the presence of mouse genomic DNA. Exome capture sequencing revealed that this enrichment procedure was highly species-specific, with less than 4% of reads aligning to the mouse genome. Human-specific expression profiling with RNA-Seq replicated array-based gene expression experiments, whereas mouse-specific transcript profiles correlated with published datasets from human cancer stroma. We conclude that primary xenografts represent a useful platform for complex NGS analysis in cancer research for tumours with limited sample resources, or those with prominent stromal cell populations.
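The abstract does not spell out how reads are assigned to a species of origin. One plausible reading is that each read is aligned to both the human and the mouse reference and given to whichever reference it matches better. The Python sketch below illustrates that idea only; the function names, the use of alignment scores and the `min_margin` threshold are all assumptions made for illustration and are not taken from the study.

```python
def assign_species(human_score, mouse_score, min_margin=5):
    """Label one read as 'human', 'mouse', 'ambiguous' or 'unaligned'.

    human_score / mouse_score are the alignment scores of the same read
    against each reference (None means the read did not align there).
    min_margin is a hypothetical threshold: the better score must beat
    the other by at least this much for a confident call.
    """
    if human_score is None and mouse_score is None:
        return "unaligned"
    if mouse_score is None:
        return "human"
    if human_score is None:
        return "mouse"
    if human_score - mouse_score >= min_margin:
        return "human"
    if mouse_score - human_score >= min_margin:
        return "mouse"
    return "ambiguous"


def mouse_fraction(score_pairs, min_margin=5):
    """Fraction of confidently assigned reads that are mouse-derived."""
    labels = [assign_species(h, m, min_margin) for h, m in score_pairs]
    assigned = [label for label in labels if label in ("human", "mouse")]
    if not assigned:
        return 0.0
    return assigned.count("mouse") / len(assigned)


# Toy input: (human score, mouse score) for four reads.
example_reads = [(60, 20), (10, 55), (40, 38), (None, None)]
print(mouse_fraction(example_reads))
```

Under these assumptions, a quantity like `mouse_fraction` is the kind of figure behind a statement such as "less than 4% of reads aligning to the mouse genome", although the study may have computed it differently.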
Milne, Iain; Bayer, Micha; Cardle, Linda; Shaw, Paul; Stephen, Gordon; Wright, Frank; Marshall, David
Summary: Tablet is a lightweight, high-performance graphical viewer for next-generation sequence assemblies and alignments. Supporting a range of input assembly formats, Tablet provides high-quality visualizations showing data in packed or stacked views, allowing instant access and navigation to any region of interest, and whole contig overviews and data summaries. Tablet is both multi-core aware and memory efficient, allowing it to handle assemblies containing millions of reads, even on a 32-bit desktop machine. Availability: Tablet is freely available for Microsoft Windows, Apple Mac OS X, Linux and Solaris. Fully bundled installers can be downloaded from http://bioinf.scri.ac.uk/tablet in 32- and 64-bit versions. Contact: firstname.lastname@example.org PMID:19965881
nervous system ABSTRACT Objective: To determine the feasibility of next-generation sequencing (NGS) microbiome approaches in the diagnosis of infectious...V, van Doorn HR, Nghia HD, et al. Identification of a new cyclovirus in cerebrospinal fluid of patients with acute central nervous system infections...Kumar, et al. system Next-generation sequencing in neuropathologic diagnosis of infections of the nervous This information is current as of June 13
Deurenberg, Ruud H.; Bathoorn, Erik; Chlebowicz, Monika A.; Monge Gomes do Couto, Natacha; Ferdous, Mithila; Garcia-Cobos, Silvia; Kooistra-Smid, Anna M. D.; Raangs, Erwin C.; Rosema, Sigrid; Veloo, Alida C. M.; Zhou, Kai; Friedrich, Alexander W.; Rossen, John W. A.
Current molecular diagnostics of human pathogens provide limited information that is often not sufficient for outbreak and transmission investigation. Next generation sequencing (NGS) determines the DNA sequence of a complete bacterial genome in a single sequence run, and from these data,
Next-generation sequencing (NGS) is a catch-all term for several modern sequencing technologies that allow nucleic acids to be sequenced much more rapidly and cheaply than the previously used Sanger sequencing, and that have thereby revolutionized the study of molecular biology and genomics with excellent resolution and accuracy. Over the past years, many companies and academic institutions have continued to make technological advances that expand NGS applications from research to the clinic. In this review, the performance and technical features of current NGS platforms are described. Furthermore, advances in applying NGS technologies to clinical molecular diagnostics are emphasized. General advantages and disadvantages of each sequencing system are summarized and compared to guide the selection of NGS platforms for specific research aims.
Fumagalli, Matteo; Garrett Vieira, Filipe Jorge; Korneliussen, Thorfinn Sand
method for quantifying population genetic differentiation from next-generation sequencing data. In addition, we present a strategy to investigate population structure via Principal Components Analysis. Through extensive simulations, we compare the new method herein proposed to approaches based...... on genotype calling and demonstrate a marked improvement in estimation accuracy for a wide range of conditions. We apply the method to a large-scale genomic data set of domesticated and wild silkworms sequenced at low coverage. We find that we can infer the fine-scale genetic structure of the sampled......Over the last few years, new high-throughput DNA sequencing technologies have dramatically increased speed and reduced sequencing costs. However, the use of these sequencing technologies is often challenged by errors and biases associated with the bioinformatical methods used for analyzing the data...
Chan Cheong Xin
Thanks to advances in next-generation technologies, genome sequences are now being generated at breadth (e.g. across environments) and depth (thousands of closely related strains, individuals or samples) unimaginable only a few years ago. Phylogenomics – the study of evolutionary relationships based on comparative analysis of genome-scale data – has so far been developed as industrial-scale molecular phylogenetics, proceeding in the two classical steps: multiple alignment of homologous sequences, followed by inference of a tree (or multiple trees). However, the algorithms typically employed for these steps scale poorly with number of sequences, such that for an increasing number of problems, high-quality phylogenomic analysis is (or soon will be) computationally infeasible. Moreover, next-generation data are often incomplete and error-prone, and analysis may be further complicated by genome rearrangement, gene fusion and deletion, lateral genetic transfer, and transcript variation. Here we argue that next-generation data require next-generation phylogenomics, including so-called alignment-free approaches. Reviewers: Reviewed by Mr Alexander Panchin (nominated by Dr Mikhail Gelfand), Dr Eugene Koonin and Prof Peter Gogarten. For the full reviews, please go to the Reviewers’ comments section.
Kwok, Hin; Chiang, Alan Kwok Shing
Genomic sequences of Epstein-Barr virus (EBV) have been of interest because the virus is associated with cancers, such as nasopharyngeal carcinoma, and conditions such as infectious mononucleosis. The progress of whole-genome EBV sequencing has been limited by the inefficiency and cost of first-generation sequencing technology. With the advancement of next-generation sequencing (NGS) and target enrichment strategies, an increasing number of EBV genomes have been published. These genomes were sequenced using different approaches, either with or without EBV DNA enrichment. This review provides an overview of the EBV genomes published to date, and a description of the sequencing technology and bioinformatic analyses employed in generating these sequences. We further explore ways through which the quality of sequencing data can be improved, such as using DNA oligos for capture hybridization, and longer insert size and read length in the sequencing runs. These advances will enable large-scale genomic sequencing of EBV, which will facilitate a better understanding of the genetic variations of EBV in different geographic regions and the discovery of potentially pathogenic variants in specific diseases.
Novák, Petr; Neumann, Pavel; Macas, Jiří
Roč. 11, č. 1 (2010), s. 378-389 ISSN 1471-2105 R&D Projects: GA MŠk(CZ) OC10037; GA MŠk(CZ) LC06004 Institutional research plan: CEZ:AV0Z50510513 Keywords : repetitive DNA * plant genome * next generation sequencing Subject RIV: EB - Genetics ; Molecular Biology Impact factor: 3.028, year: 2010
Next-generation sequencing (NGS) has been applied to plant virology since 2009. NGS provides highly efficient, rapid, low cost DNA or RNA high-throughput sequencing of the genomes of plant viruses and viroids and of the specific small RNAs generated during the infection process. These small RNAs, which frequently cover the whole genome of the infectious agent, are 21-24 nt long and are known as vsRNAs for viruses and vd-sRNAs for viroids. NGS has been used in a number of studies in plant virology including, but not limited to, discovery of novel viruses and viroids as well as detection and identification of those pathogens already known, analysis of genome diversity and evolution, and study of pathogen epidemiology. The genome editing method known as the clustered regularly interspaced short palindromic repeats (CRISPR)-Cas9 system has recently been used successfully to engineer resistance to DNA geminiviruses (family Geminiviridae) by targeting different viral genome sequences in infected Nicotiana benthamiana or Arabidopsis plants. The DNA viruses targeted include tomato yellow leaf curl virus and merremia mosaic virus (begomovirus); beet curly top virus and beet severe curly top virus (curtovirus); and bean yellow dwarf virus (mastrevirus). The technique has also been used against the RNA viruses zucchini yellow mosaic virus, papaya ringspot virus and turnip mosaic virus (potyvirus) and cucumber vein yellowing virus (ipomovirus, family Potyviridae) by targeting the translation initiation genes eIF4E in cucumber or Arabidopsis plants. From these recent advances of major importance, it is expected that NGS and CRISPR-Cas technologies will play a significant role in the very near future in advancing the field of plant virology and connecting it with other related fields of biology. Keywords: Next-generation sequencing, NGS, plant virology, plant viruses, viroids, resistance to plant viruses by CRISPR-Cas9
Piednoël, Mathieu; Aberer, Andre J.; Schneeweiss, Gerald M.; Macas, Jiri; Novak, Petr; Gundlach, Heidrun; Temsch, Eva M.; Renner, Susanne S.
We used next-generation sequencing to characterize the genomes of nine species of Orobanchaceae of known phylogenetic relationships, different life forms, and including a polyploid species. The study species are the autotrophic, nonparasitic Lindenbergia philippensis, the hemiparasitic Schwalbea americana, and seven nonphotosynthetic parasitic species of Orobanche (Orobanche crenata, Orobanche cumana, Orobanche gracilis (tetraploid), and Orobanche pancicii) and Phelipanche (Phelipanche lavandulacea, Phelipanche purpurea, and Phelipanche ramosa). Ty3/Gypsy elements comprise 1.93%–28.34% of the nine genomes and Ty1/Copia elements comprise 8.09%–22.83%. When compared with L. philippensis and S. americana, the nonphotosynthetic species contain higher proportions of repetitive DNA sequences, perhaps reflecting relaxed selection on genome size in parasitic organisms. Among the parasitic species, those in the genus Orobanche have smaller genomes but higher proportions of repetitive DNA than those in Phelipanche, mostly due to a diversification of repeats and an accumulation of Ty3/Gypsy elements. Genome downsizing in the tetraploid O. gracilis probably led to sequence loss across most repeat types. PMID:22723303
Holcomb, C L; Rastrou, M; Williams, T C; Goodridge, D; Lazaro, A M; Tilanus, M; Erlich, H A
The high-resolution human leukocyte antigen (HLA) genotyping assay that we developed using 454 sequencing and Conexio software uses generic polymerase chain reaction (PCR) primers for DRB exon 2. Occasionally, we observed low abundance DRB amplicon sequences that resulted from in vitro PCR 'crossing over' between DRB1 and DRB3/4/5. These hybrid sequences, revealed by the clonal sequencing property of the 454 system, were generally observed at a read depth of 5%-10% of the true alleles. They usually contained at least one mismatch with the IMGT/HLA database, and consequently, were easily recognizable and did not cause a problem for HLA genotyping. Sometimes, however, these artifactual sequences matched a rare allele and the automatic genotype assignment was incorrect. These observations raised two issues: (1) could PCR conditions be modified to reduce such artifacts? and (2) could some of the rare alleles listed in the IMGT/HLA database be artifacts rather than true alleles? Because PCR crossing over occurs during late cycles of PCR, we compared DRB genotypes resulting from 28 and (our standard) 35 cycles of PCR. For all 21 cell line DNAs amplified for 35 cycles, crossover products were detected. In 33% of the cases, these hybrid sequences corresponded to named alleles. With amplification for only 28 cycles, these artifactual sequences were not detectable. To investigate whether some rare alleles in the IMGT/HLA database might be due to PCR artifacts, we analyzed four samples obtained from the investigators who submitted the sequences. In three cases, the sequences were generated from true alleles. In one case, our 454 sequencing revealed an error in the previously submitted sequence. © 2013 John Wiley & Sons A/S. Published by John Wiley & Sons Ltd.
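The PCR "crossing over" artifact described in this abstract produces a hybrid read whose first part matches one template (for example a DRB1 allele) and whose remainder matches another (for example DRB3/4/5). A minimal Python sketch of how such a hybrid could be flagged is shown here. It assumes, purely for illustration, that the candidate and both parental sequences are already aligned and of equal length; the function names and toy sequences are hypothetical and do not reflect the Conexio software or the authors' pipeline.

```python
def crossover_breakpoints(candidate, parent_a, parent_b):
    """Return every position k where candidate == parent_a[:k] + parent_b[k:].

    Assumes the three sequences are pre-aligned and of equal length.
    Several consecutive k values are returned when the parents are
    identical around the switch point.
    """
    if not (len(candidate) == len(parent_a) == len(parent_b)):
        raise ValueError("sequences must have equal length")
    return [
        k
        for k in range(1, len(candidate))
        if candidate == parent_a[:k] + parent_b[k:]
    ]


def is_crossover_hybrid(candidate, parent_a, parent_b):
    """True if candidate is a single-switch hybrid of the two parents."""
    if candidate in (parent_a, parent_b):
        return False  # a true parental sequence is not a hybrid
    return bool(
        crossover_breakpoints(candidate, parent_a, parent_b)
        or crossover_breakpoints(candidate, parent_b, parent_a)
    )


# Toy sequences (not real HLA alleles): the parents differ at two positions.
allele_a = "ACGTACGT"
allele_b = "ACCTAGGT"
hybrid = allele_a[:4] + allele_b[4:]

print(crossover_breakpoints(hybrid, allele_a, allele_b))
print(is_crossover_hybrid(hybrid, allele_a, allele_b))
```

A check of this kind, combined with the low read depth reported for the hybrids (roughly 5%-10% of the true alleles), is one way a low-abundance sequence that happens to match a rare named allele could be flagged for review rather than assigned automatically.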
Next generation sequencing (NGS) instruments produce an unprecedented amount of sequence data at contained costs. This gives researchers the possibility of designing studies with adequate power to identify rare variants at a fraction of the economic and labor resources required by individual Sanger sequencing. As of today, only three research groups working in plant sciences have exploited this potential. They showed that pooled NGS can provide results in excellent agreement with those obtained by individual Sanger sequencing. The aim of this review is to convey to the reader the general ideas underlying the use of pooled NGS for the identification of rare variants. To facilitate a thorough understanding of the possibilities of the method, we will explain in detail the variations in study design and discuss their advantages and disadvantages. We will show that information on allele frequency obtained by pooled next generation sequencing can be used to accurately compute basic population genetics indexes such as allele frequency, nucleotide diversity and Tajima’s D. Finally we will discuss applications and future perspectives of the multiplexed NGS approach.
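To make the link between pooled allele frequencies and these indexes concrete, the following Python sketch computes nucleotide diversity (summed over sites), Watterson's estimator and Tajima's D from a list of per-site allele frequencies using the standard Tajima (1989) formulas. It is a sketch under stated assumptions, not the review's own procedure: `n` is taken to be the number of chromosomes in the pool, sites are treated as biallelic, and the extra sampling noise introduced by sequencing depth in pooled data is ignored.

```python
import math


def tajimas_d(allele_freqs, n):
    """Return (pi, theta_w, D) from per-site allele frequencies.

    allele_freqs: frequency of one allele at each biallelic site.
    n: number of sampled chromosomes (assumed equal to the pool size).
    pi and theta_w are sums over sites; divide by sequence length for
    per-site values. D is None when it is undefined.
    """
    segregating = [p for p in allele_freqs if 0.0 < p < 1.0]
    s = len(segregating)
    if s == 0 or n < 4:
        return 0.0 if s == 0 else None, 0.0 if s == 0 else None, None

    # Expected pairwise differences with the n/(n-1) small-sample correction.
    pi = sum(2.0 * p * (1.0 - p) * n / (n - 1) for p in segregating)

    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i**2 for i in range(1, n))
    theta_w = s / a1

    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)

    variance = e1 * s + e2 * s * (s - 1)
    if variance <= 0:
        return pi, theta_w, None
    return pi, theta_w, (pi - theta_w) / math.sqrt(variance)


# Toy example: five sites where the variant is a singleton in 10 chromosomes.
pi, theta_w, d = tajimas_d([0.1] * 5, n=10)
print(pi, theta_w, d)
```

An excess of rare variants, as in the toy example, pulls D below zero, whereas intermediate-frequency variants push it above zero.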
Schreiber, Matthew; Dorschner, Michael; Tsuang, Debby
Schizophrenia is a debilitating lifelong illness that lacks a cure and poses a worldwide public health burden. The disease is characterized by a heterogeneous clinical and genetic presentation that complicates research efforts to identify causative genetic variations. This review examines the potential of current findings in schizophrenia and in other related neuropsychiatric disorders for application in next-generation technologies, particularly whole-exome sequencing (WES) and whole-genome sequencing (WGS). These approaches may lead to the discovery of underlying genetic factors for schizophrenia and may thereby identify novel therapeutic targets for this devastating disorder. © 2013 Wiley Periodicals, Inc.
Many viruses, including the clinically relevant RNA viruses HIV and HCV, exist in large populations and display high genetic heterogeneity within and between infected hosts. Assessing intra-patient viral genetic diversity is essential for understanding the evolutionary dynamics of viruses, for designing effective vaccines, and for the success of antiviral therapy. Next-generation sequencing technologies allow the rapid and cost-effective acquisition of thousands to millions of short DNA sequences from a single sample. However, this approach entails several challenges in experimental design and computational data analysis. Here, we review the entire process of inferring viral diversity from sample collection to computing measures of genetic diversity. We discuss sample preparation, including reverse transcription and amplification, and the effect of experimental conditions on diversity estimates due to in vitro base substitutions, insertions, deletions, and recombination. The use of different next-generation sequencing platforms and their sequencing error profiles are compared in the context of various applications of diversity estimation, ranging from the detection of single nucleotide variants to the reconstruction of whole-genome haplotypes. We describe the statistical and computational challenges arising from these technical artifacts, and we review existing approaches, including available software, for their solution. Finally, we discuss open problems, and highlight successful biomedical applications and potential future clinical use of next-generation sequencing to estimate viral diversity.
Azam, Sarwar; Rathore, Abhishek; Shah, Trushar M; Telluri, Mohan; Amindala, BhanuPrakash; Ruperao, Pradeep; Katta, Mohan A V S K; Varshney, Rajeev K
Open source single nucleotide polymorphism (SNP) discovery pipelines for next generation sequencing data commonly require working knowledge of the command line interface, massive computational resources and expertise, which makes them daunting for biologists. Further, the SNP information generated may not be readily usable for downstream processes such as genotyping. Hence, a comprehensive pipeline called Integrated SNP Mining and Utilization (ISMU) has been developed by integrating several open source next generation sequencing (NGS) tools with a graphical user interface, for SNP discovery and utilization in developing genotyping assays. The pipeline features functionalities such as pre-processing of raw data, integration of open source alignment tools (Bowtie2, BWA, Maq, NovoAlign and SOAP2), SNP prediction (SAMtools/SOAPsnp/CNS2snp and CbCC) methods and interfaces for developing genotyping assays. The pipeline outputs a list of high quality SNPs between all pairwise combinations of genotypes analyzed, in addition to the reference genome/sequence. Visualization tools (Tablet and Flapjack) integrated into the pipeline enable inspection of the alignment and errors, if any. The pipeline also provides a confidence score or polymorphism information content value with flanking sequences for identified SNPs in the standard format required for developing marker genotyping (KASP and Golden Gate) assays. The pipeline enables users to process a range of NGS datasets such as whole genome re-sequencing, restriction site associated DNA sequencing and transcriptome sequencing data at a fast speed. The pipeline is very useful for members of the plant genetics and breeding community with no computational expertise who want to discover SNPs and utilize them in genomics, genetics and breeding studies. The pipeline has been parallelized to process huge datasets of next generation sequencing. It has been developed in the Java language and is available at http://hpc.icrisat.cgiar.org/ISMU as a standalone
Rasmussen, Maria; Sunde, Lone; Nielsen, Marlene Louise
Aim and Introduction Identification of abnormal kidneys in the fetus may lead to termination of the pregnancy and raises questions about the underlying cause and recurrence risk in future pregnancies. In this study, we investigate the effectiveness of targeted next generation sequencing in fetuses...... with prenatally detected kidney anomalies in order to uncover genetic explanations and assess recurrence risk. Also, we aim to study the relation between genetic findings and post mortem kidney histology. Methods The study comprises fetuses diagnosed prenatally with bilateral kidney anomalies that have undergone...... postmortem examination. The approximately 110 genes included in the targeted panel were chosen on the basis of their potential involvement in embryonic kidney development, cystic kidney disease, or the renin-angiotensin system. DNA was extracted from fetal tissue samples or cultured chorion villus cells...
Li, Zhigang; Breitwieser, Florian P; Lu, Jennifer; Jun, Albert S; Asnaghi, Laura; Salzberg, Steven L; Eberhart, Charles G
We test the ability of next-generation sequencing, combined with computational analysis, to identify a range of organisms causing infectious keratitis. This retrospective study evaluated 16 cases of infectious keratitis and four control corneas in formalin-fixed tissues from the pathology laboratory. Infectious cases also were analyzed in the microbiology laboratory using culture, polymerase chain reaction, and direct staining. Classified sequence reads were analyzed with two different metagenomics classification engines, Kraken and Centrifuge, and visualized using the Pavian software tool. Sequencing generated 20 to 46 million reads per sample. On average, 96% of the reads were classified as human, 0.3% corresponded to known vectors or contaminant sequences, 1.7% represented microbial sequences, and 2.4% could not be classified. The two computational strategies successfully identified the fungal, bacterial, and amoebal pathogens in most patients, including all four bacterial and mycobacterial cases, five of six fungal cases, three of three Acanthamoeba cases, and one of three herpetic keratitis cases. In several cases, additional potential pathogens also were identified. In one case with cytomegalovirus identified by Kraken and Centrifuge, the virus was confirmed by direct testing, while two where Staphylococcus aureus or cytomegalovirus were identified by Centrifuge but not Kraken could not be confirmed. Confirmation was not attempted for an additional three potential pathogens identified by Kraken and 11 identified by Centrifuge. Next generation sequencing combined with computational analysis can identify a wide range of pathogens in formalin-fixed corneal specimens, with potential applications in clinical diagnostics and research.
Alkhateeb, Abedalrhman; Rueda, Luis
Next-generation sequencing technology generates a huge number of reads (short sequences), which contain a vast amount of genomic data. The sequencing process, however, comes with artifacts. Preprocessing of sequences is mandatory for further downstream analysis. We present Zseq, a linear method that identifies the most informative genomic sequences and reduces the number of biased sequences, sequence duplications, and ambiguous nucleotides. Zseq measures the complexity of each sequence by counting its unique k-mers and using that count as the sequence's score; it also takes into account other factors such as ambiguous nucleotides or high GC-content percentage in k-mers. Based on a z-score threshold, Zseq sweeps through the sequences again and filters out those with a z-score less than the user-defined threshold. The Zseq algorithm is able to provide a better mapping rate; it reduces the number of ambiguous bases significantly in comparison with other methods. Evaluation of the filtered reads has been conducted by aligning the reads and assembling the transcripts using the reference genome as well as de novo assembly. The assembled transcripts show a better discriminative ability to separate cancer and normal samples in comparison with another state-of-the-art method. Moreover, de novo assembled transcripts from the reads filtered by Zseq have longer genomic sequences than other tested methods. Estimating the threshold of the cutoff point is introduced using labeling rules with optimistic results.
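The two-pass idea described above (score each read by its number of unique k-mers, then drop reads whose z-score falls below a threshold) can be sketched as follows. This is a simplified illustration, not the Zseq implementation: the function names, the default k, and the choice to ignore k-mers containing `N` are assumptions, and the GC-content adjustment mentioned in the abstract is left out.

```python
from statistics import mean, pstdev


def unique_kmer_count(seq, k=4):
    """Count distinct k-mers in seq, ignoring k-mers with an ambiguous base (N)."""
    kmers = {seq[i:i + k] for i in range(len(seq) - k + 1)}
    return sum(1 for kmer in kmers if "N" not in kmer)


def zscore_filter(reads, k=4, threshold=0.0):
    """Keep reads whose complexity z-score is at or above threshold."""
    # First pass: score every read.
    scores = [unique_kmer_count(read, k) for read in reads]
    mu = mean(scores)
    sigma = pstdev(scores)
    if sigma == 0:
        # All reads are equally complex; nothing can be filtered.
        return list(reads)
    # Second pass: drop reads with a z-score below the threshold.
    return [read for read, score in zip(reads, scores)
            if (score - mu) / sigma >= threshold]


if __name__ == "__main__":
    reads = [
        "ACGTACGGTCATGCAA",   # complex read
        "AAAAAAAAAAAAAAAA",   # low-complexity homopolymer
        "ACGTTGCAAGGCTTAC",   # complex read
        "NNNNNNNNACGTNNNN",   # mostly ambiguous bases
    ]
    print(zscore_filter(reads, k=4, threshold=0.0))
```

Both passes are linear in the total read length, which matches the abstract's description of Zseq as a linear method.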
Jonathan B Puritz
The field of phylogeography has long recognized the need for, and utility of, incorporating nuclear DNA (nDNA) sequences into analyses. However, the use of nDNA sequence data at the population level has been hindered by technical laboratory difficulty, sequencing costs, and problematic analytical methods for dealing with genotypic sequence data, especially in non-model organisms. Here, we present a method utilizing the 454 GS-FLX Titanium pyrosequencing platform with the capacity to simultaneously sequence two species of sea star (Meridiastra calcar and Parvulastra exigua) at five different nDNA loci across 16 different populations of 20 individuals each per species. We compare results from 3 populations with traditional Sanger sequencing based methods, and demonstrate that this next-generation sequencing platform is more time and cost effective and more sensitive to rare variants than Sanger based sequencing. A crucial advantage is that the high coverage of clonally amplified sequences simplifies haplotype determination, even in highly polymorphic species. This targeted next-generation approach can greatly increase the use of nDNA sequence loci in phylogeographic and population genetic studies by mitigating many of the time, cost, and analytical issues associated with highly polymorphic, diploid sequence markers.
Next Generation Sequencing (NGS) is the latest high throughput technology to revolutionize genomic research. NGS generates massive genomic datasets that play a key role in the big data phenomenon that surrounds us today. To extract signals from high-dimensional NGS data and make valid statistical inferences and predictions, novel data analytic and statistical techniques are needed. This book contains 20 chapters written by prominent statisticians working with NGS data. The topics range from basic preprocessing and analysis with NGS data to more complex genomic applications such as copy number variation and isoform expression detection. Research statisticians who want to learn about this growing and exciting area will find this book useful. In addition, many chapters from this book could be included in graduate-level classes in statistical bioinformatics for training future biostatisticians who will be expected to deal with genomic data in basic biomedical research, genomic clinical trials and personalized med...
Smith, David J.; Burton, Aaron; Castro-Wallace, Sarah; John, Kristen; Stahl, Sarah E.; Dworkin, Jason Peter; Lupisella, Mark L.
On the International Space Station (ISS), technologies capable of rapid microbial identification and disease diagnostics are not currently available. NASA still relies upon sample return for comprehensive, molecular-based sample characterization. Next-generation DNA sequencing is a powerful approach for identifying microorganisms in air, water, and surfaces onboard spacecraft. The Biomolecule Sequencer payload, manifested to SpaceX-9 and scheduled on the Increment 4748 research plan (June 2016), will assess the functionality of a commercially-available next-generation DNA sequencer in the microgravity environment of ISS. The MinION device from Oxford Nanopore Technologies (Oxford, UK) measures picoamp changes in electrical current dependent on nucleotide sequences of the DNA strand migrating through nanopores in the system. The hardware is exceptionally small (9.5 x 3.2 x 1.6 cm), lightweight (120 grams), and powered only by a USB connection. For the ISS technology demonstration, the Biomolecule Sequencer will be powered by a Microsoft Surface Pro3. Ground-prepared samples containing lambda bacteriophage, Escherichia coli, and mouse genomic DNA, will be launched and stored frozen on the ISS until experiment initiation. Immediately prior to sequencing, a crew member will collect and thaw frozen DNA samples, connect the sequencer to the Surface Pro3, inject thawed samples into a MinION flow cell, and initiate sequencing. At the completion of the sequencing run, data will be downlinked for ground analysis. Identical, synchronous ground controls will be used for data comparisons to determine sequencer functionality, run-time sequence, current dynamics, and overall accuracy. We will present our latest results from the ISS flight experiment the first time DNA has ever been sequenced in space and discuss the many potential applications of the Biomolecule Sequencer for environmental monitoring, medical diagnostics, higher fidelity and more adaptable Space Biology Human
Larsen, Martin Jakob; Burton, Mark; Thomassen, Mads
Accurate mutation detection is essential in clinical genetic diagnostics of monogenic hereditary diseases. Targeted next generation sequencing (NGS) provides a promising and cost-effective alternative to Sanger sequencing and MLPA analysis currently used in most diagnostic laboratories. One...... of mutation positive controls previously characterized by Sanger/MLPA analysis. Agilent SureSelect Target-Enrichment kits were used for capturing a set of genes associated with hereditary breast and ovarian cancer syndrome and a compilation of genes involved in multiple rare single gene disorders......, respectively. For diagnostics, the sequencing coverage is essential, wherefore a minimum coverage of 30x per nucleotide in the coding regions was used as our primary quality criterion. For the majority of the included genes, we obtained adequate gene coverage, in which we were able to detect 100% of the known...
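The quality criterion above (at least 30x coverage at every nucleotide in the coding regions) can be checked with a few lines of code. The sketch below is only an illustration under stated assumptions: it takes a plain list of per-base depths for one region, and the function name `low_coverage_regions` is hypothetical rather than part of any tool named in the abstract.

```python
def low_coverage_regions(depths, min_depth=30):
    """Return half-open (start, end) intervals where depth < min_depth.

    depths is a list of per-base read depths for one target region,
    indexed from 0.
    """
    regions = []
    start = None
    for pos, depth in enumerate(depths):
        if depth < min_depth:
            if start is None:
                start = pos          # open a new low-coverage interval
        elif start is not None:
            regions.append((start, pos))  # close the current interval
            start = None
    if start is not None:
        regions.append((start, len(depths)))
    return regions


if __name__ == "__main__":
    depths = [35, 40, 29, 12, 30, 31, 5]
    print(low_coverage_regions(depths))
```

A region passes the criterion when the returned list is empty; any reported interval would need re-sequencing or a complementary method.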
Ustek, Duran; Sirma, Sema; Gumus, Ergun; Arikan, Muzaffer; Cakiris, Aris; Abaci, Neslihan; Mathew, Jaicy; Emrence, Zeliha; Azakli, Hulya; Cosan, Fulya; Cakar, Atilla; Parlak, Mahmut; Kursun, Olcay
One application of next-generation sequencing (NGS) is the targeted resequencing of genes of interest, which has not previously been used in viral integration site analysis for gene therapy applications. Here, we combined a targeted sequence capture array with next generation sequencing to address whole genome profiling of viral integration sites. Human 293T and K562 cells were transduced with an HIV-1 derived vector. A custom-made DNA probe set targeting the pLVTHM vector was used to capture lentiviral vector/human genome junctions. The captured DNA was sequenced using the GS FLX platform. In both cell lines, 7,484 human genome sequences flanking the long terminal repeats (LTR) of pLVTHM fragment sequences matched with an identity of at least 98% over a minimum of 50 bp. In total, 203 unique integration sites were identified. The integrations in both cell lines were distant from CpG islands and from transcription start sites, and were preferentially located in introns. A comparison between the two cell lines showed that the lentiviral-transduced DNA does not have the same preferred regions in the two different cell lines. Copyright © 2012 Elsevier B.V. All rights reserved.
Fabio Marroni; Sara Pinosio; Michele Morgante
Next generation sequencing (NGS) instruments produce an unprecedented amount of sequence data at contained costs. This gives researchers the possibility of designing studies with adequate power to identify rare variants at a fraction of the economic and labor resources required by individual Sanger sequencing. As of today, only three research groups working in plant sciences have exploited this potentiality. They showed that pooled NGS can provide results in excellent agreement with those obt...
Marroni, Fabio; Pinosio, Sara; Morgante, Michele
Next generation sequencing (NGS) instruments produce an unprecedented amount of sequence data at contained costs. This gives researchers the possibility of designing studies with adequate power to identify rare variants at a fraction of the economic and labor resources required by individual Sanger sequencing. As of today, few research groups working in plant sciences have exploited this potentiality, showing that pooled NGS provides results in excellent agreement with those obtained by indiv...
Wouters, Roel H P; Bijlsma, Rhodé M; Ausems, Margreet G E M; van Delden, Johannes J M; Voest, Emile E; Bredenoord, Annelien L
Ever since genetic testing became possible for specific mutations, ethical debate has arisen over whether professionals have a duty to warn not only patients but also their relatives who might be at risk of hereditary diseases. As next generation sequencing swiftly finds its way into
Rama R Gullapalli
The Human Genome Project (HGP) provided the initial draft of mankind's DNA sequence in 2001. The HGP was produced by 23 collaborating laboratories using Sanger sequencing of mapped regions as well as shotgun sequencing techniques in a process that occupied 13 years at a cost of ~$3 billion. Today, Next Generation Sequencing (NGS) techniques represent the next phase in the evolution of DNA sequencing technology at dramatically reduced cost compared to traditional Sanger sequencing. A single laboratory today can sequence the entire human genome in a few days for a few thousand dollars in reagents and staff time. Routine whole exome or even whole genome sequencing of clinical patients is well within the realm of affordability for many academic institutions across the country. This paper reviews current sequencing technology methods and upcoming advancements in sequencing technology as well as challenges associated with data generation, data manipulation and data storage. Implementation of routine NGS data in cancer genomics is discussed along with potential pitfalls in the interpretation of the NGS data. The overarching importance of bioinformatics in the clinical implementation of NGS is emphasized. We also review the issue of physician education, which is an important consideration for the successful implementation of NGS in the clinical workplace. NGS technologies represent a golden opportunity for the next generation of pathologists to be at the leading edge of the personalized medicine approaches coming our way. Often under-emphasized issues of data access and control as well as potential ethical implications of whole genome NGS sequencing are also discussed. Despite some challenges, it's hard not to be optimistic about the future of personalized genome sequencing and its potential impact on patient care and the advancement of knowledge of human biology and disease in the near future.
Johnston, Christine; Magaret, Amalia; Roychoudhury, Pavitra; Greninger, Alexander L; Cheng, Anqi; Diem, Kurt; Fitzgibbon, Matthew P; Huang, Meei-Li; Selke, Stacy; Lingappa, Jairam R; Celum, Connie; Jerome, Keith R; Wald, Anna; Koelle, David M
Understanding the variability in circulating herpes simplex virus type 2 (HSV-2) genomic sequences is critical to the development of HSV-2 vaccines. Genital lesion swabs containing ≥ 10^7 log10 copies HSV DNA collected from Africa, the USA, and South America underwent next-generation sequencing, followed by K-mer based filtering and de novo genomic assembly. Sites of heterogeneity within coding regions in unique long and unique short (UL_US) regions were identified. Phylogenetic trees were created using maximum likelihood reconstruction. Among 46 samples from 38 persons, 1468 intragenic base-pair substitutions were identified. The maximum nucleotide distance between strains for concatenated UL_US segments was 0.4%. Phylogeny did not reveal geographic clustering. The most variable proteins had non-synonymous mutations in < 3% of amino acids. Unenriched HSV-2 DNA can undergo next-generation sequencing to identify intragenic variability. The use of clinical swabs for sequencing expands the information that can be gathered directly from these specimens. Copyright © 2017 Elsevier Inc. All rights reserved.
Breese, Marcus R.; Liu, Yunlong
Summary: NGSUtils is a suite of software tools for manipulating data common to next-generation sequencing experiments, such as FASTQ, BED and BAM format files. These tools provide a stable and modular platform for data management and analysis.
Kim, Su Yeon; Lohmueller, Kirk E; Albrechtsen, Anders
Estimation of allele frequency is of fundamental importance in population genetic analyses and in association mapping. In most studies using next-generation sequencing, a cost effective approach is to use medium or low-coverage data (e.g., frequency estimation...
Molenaar, Nicholas; Burger, Johan T; Maree, Hans J
The complete genome sequence of a South African isolate of grapevine virus F (GVF) is presented. It was first detected by metagenomic next-generation sequencing of field samples and validated through direct Sanger sequencing. The genome sequence of GVF isolate V5 consists of 7539 nucleotides and contains a poly(A) tail. It has a typical vitivirus genome arrangement that comprises five open reading frames (ORFs), which share only 88.96 % nucleotide sequence identity with the existing complete GVF genome sequence (JX105428).
Weisschuh, Nicole; Mayer, Anja K; Strom, Tim M
Retinal dystrophies (RD) constitute a group of blinding diseases that are characterized by clinical variability and pronounced genetic heterogeneity. The different nonsyndromic and syndromic forms of RD can be attributed to mutations in more than 200 genes. Consequently, next generation sequencing...
Bowling, Bethany; Zimmer, Erin; Pyatt, Robert E.
Although the development of next-generation (NextGen) sequencing technologies has revolutionized genomic research and medicine, the incorporation of these topics into the classroom is challenging, given an implied high degree of technical complexity. We developed an easy-to-implement, interactive classroom activity investigating the similarities…
Blomstrøm, Monica Marie
several growth modulators and invasion modulators were identified and independently validated. These candidates revealed a group of genes with metastasis-related functions in vitro that are involved in RNA-related processes, such as RNA-processing. Moreover, a general feature was that proliferation......) and non-CSCs. The main goal of this project was to functionally characterize a set of candidate genes recovered from next-generation sequencing analysis for their role in breast cancer metastasis formation. The starting gene set comprised 104 gene variants; i.e. 57 wildtype and 47 mutated variants. During...
Gong, Zhuwen; Yu, Yongguo; Zhang, Qigang; Gu, Xuefan
To provide prenatal diagnosis for a pregnant woman who had given birth to a child with Fanconi anemia with combined next-generation sequencing (NGS) and Sanger sequencing. For the affected child, potential mutations of the FANCA gene were analyzed with NGS. Suspected mutation was verified with Sanger sequencing. For prenatal diagnosis, genomic DNA was extracted from cultured fetal amniotic fluid cells and subjected to analysis of the same mutations. A low-frequency frameshifting mutation c.989_995del7 (p.H330LfsX2, inherited from his father) and a truncating mutation c.3971C>T (p.P1324L, inherited from his mother) have been identified in the affected child and considered to be pathogenic. The two mutations were subsequently verified by Sanger sequencing. Upon prenatal diagnosis, the fetus was found to carry two mutations. The combined next-generation sequencing and Sanger sequencing can reduce the time for diagnosis and identify subtypes of Fanconi anemia and the mutational sites, which has enabled reliable prenatal diagnosis of this disease.
Joensen, Katrine Grimstrup; Engsbro, A L Ø; Lukjancenko, Oksana
The accurate microbiological diagnosis of diarrhoea involves numerous laboratory tests and, often, the pathogen is not identified in time to guide clinical management. With next-generation sequencing (NGS) becoming cheaper, it has huge potential in routine diagnostics. The aim of this study...... was to evaluate the potential of NGS-based diagnostics through direct sequencing of faecal samples. Fifty-eight clinical faecal samples were obtained from patients with diarrhoea as part of the routine diagnostics at Hvidovre University Hospital, Denmark. Ten samples from healthy individuals were also included...
Keller, A; Danner, N; Grimmer, G; Ankenbrand, M; von der Ohe, K; von der Ohe, W; Rost, S; Härtel, S; Steffan-Dewenter, I
The identification of pollen plays an important role in ecology, palaeo-climatology, honey quality control and other areas. Currently, expert knowledge and reference collections are essential to identify pollen origin through light microscopy. Pollen identification through molecular sequencing and DNA barcoding has been proposed as an alternative approach, but the assessment of mixed pollen samples originating from multiple plant species is still a tedious and error-prone task. Next-generation sequencing has been proposed to avoid this hindrance. In this study we assessed mixed pollen probes through next-generation sequencing of amplicons from the highly variable, species-specific internal transcribed spacer 2 region of nuclear ribosomal DNA. Further, we developed a bioinformatic workflow to analyse these high-throughput data with a newly created reference database. To evaluate the feasibility, we compared results from classical identification based on light microscopy from the same samples with our sequencing results. We assessed in total 16 mixed pollen samples, 14 originated from honeybee colonies and two from solitary bee nests. The sequencing technique resulted in higher taxon richness (deeper assignments and more identified taxa) compared to light microscopy. Abundance estimations from sequencing data were significantly correlated with counted abundances through light microscopy. Simulation analyses of taxon specificity and sensitivity indicate that 96% of taxa present in the database are correctly identifiable at the genus level and 70% at the species level. Next-generation sequencing thus presents a useful and efficient workflow to identify pollen at the genus and species level without requiring specialised palynological expert knowledge. © 2014 German Botanical Society and The Royal Botanical Society of the Netherlands.
Next-generation sequencing technologies are able to produce high-throughput short sequence reads in a cost-effective fashion. The emergence of these technologies has not only facilitated genome sequencing but also changed the landscape of life sciences. Here I survey their major applications ranging...
Tabatabaeifar, Siavosh; Kruse, Torben A; Thomassen, Mads
Background: Oral cavity cancer is a subgroup of head and neck cancer, which is the world's 6th most common form of cancer. Oral squamous cell carcinomas (OSCC) constitute almost all oral cavity cancers, and OSCC are primarily attributed to excessive alcohol consumption and tobacco exposure...... of tumour cells exists. Conclusions: Use of next generation sequencing in oral cavity cancer can give valuable insight into the biology of the disease. By investigating intra tumour heterogeneity we see that the different tumour specimens in each patient are quite homogenous, but evidence of heterogeneous...
Shen, Li; Shao, Ningyi; Liu, Xiaochuan; Nestler, Eric
Understanding the relationship between the millions of functional DNA elements and their protein regulators, and how they work in conjunction to manifest diverse phenotypes, is key to advancing our understanding of the mammalian genome. Next-generation sequencing technology is now used widely to probe these protein-DNA interactions and to profile gene expression at a genome-wide scale. As the cost of DNA sequencing continues to fall, the interpretation of the ever increasing amount of data generated represents a considerable challenge. We have developed ngs.plot - a standalone program to visualize enrichment patterns of DNA-interacting proteins at functionally important regions based on next-generation sequencing data. We demonstrate that ngs.plot is not only efficient but also scalable. We use a few examples to demonstrate that ngs.plot is easy to use and yet very powerful to generate figures that are publication ready. We conclude that ngs.plot is a useful tool to help fill the gap between massive datasets and genomic information in this era of big sequencing data.
Skums, Pavel; Dimitrova, Zoya; Campo, David S; Vaughan, Gilberto; Rossi, Livia; Forbi, Joseph C; Yokosawa, Jonny; Zelikovsky, Alex; Khudyakov, Yury
Next-generation sequencing allows the analysis of an unprecedented number of viral sequence variants from infected patients, presenting a novel opportunity for understanding virus evolution, drug resistance and immune escape. However, sequencing in bulk is error prone. Thus, the generated data require error identification and correction. Most error-correction methods to date are not optimized for amplicon analysis and assume that the error rate is randomly distributed. Recent quality assessment of amplicon sequences obtained using 454-sequencing showed that the error rate is strongly linked to the presence and size of homopolymers, position in the sequence and length of the amplicon. All these parameters are strongly sequence specific and should be incorporated into the calibration of error-correction algorithms designed for amplicon sequencing. In this paper, we present two new efficient error correction algorithms optimized for viral amplicons: (i) k-mer-based error correction (KEC) and (ii) empirical frequency threshold (ET). Both were compared to a previously published clustering algorithm (SHORAH), in order to evaluate their relative performance on 24 experimental datasets obtained by 454-sequencing of amplicons with known sequences. All three algorithms show similar accuracy in finding true haplotypes. However, KEC and ET were significantly more efficient than SHORAH in removing false haplotypes and estimating the frequency of true ones. Both algorithms, KEC and ET, are highly suitable for rapid recovery of error-free haplotypes obtained by 454-sequencing of amplicons from heterogeneous viruses.The implementations of the algorithms and data sets used for their testing are available at: http://alan.cs.gsu.edu/NGS/?q=content/pyrosequencing-error-correction-algorithm.
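The general idea behind a frequency-threshold filter can be illustrated with a short sketch: count how often each distinct amplicon read occurs, discard reads below a minimum fraction as likely errors, and re-normalize the frequencies of the survivors. This is a deliberately simplified stand-in, not the published ET algorithm (which, per the abstract, is calibrated for 454-specific error patterns such as homopolymers); the function name and the 5% default threshold are assumptions.

```python
from collections import Counter


def frequency_threshold_filter(reads, min_fraction=0.05):
    """Return {haplotype: frequency} for reads at or above min_fraction.

    Frequencies are re-normalized over the retained haplotypes only.
    """
    counts = Counter(reads)
    total = sum(counts.values())
    if total == 0:
        return {}
    # Keep haplotypes whose share of all reads reaches the threshold.
    kept = {hap: n for hap, n in counts.items() if n / total >= min_fraction}
    kept_total = sum(kept.values())
    return {hap: n / kept_total for hap, n in kept.items()}


if __name__ == "__main__":
    # Two true variants plus a rare read that is treated as an error.
    reads = ["ACGT"] * 60 + ["ACGA"] * 38 + ["ACGG"] * 2
    for haplotype, freq in frequency_threshold_filter(reads).items():
        print(haplotype, round(freq, 3))
```

A fixed global threshold like this cannot distinguish a rare true variant from a recurrent sequencing error, which is one reason the abstract stresses calibrating error correction to sequence-specific parameters.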
Hegele, Robert A; Ban, Matthew R; Cao, Henian; McIntyre, Adam D; Robinson, John F; Wang, Jian
To evaluate the potential clinical translation of high-throughput next-generation sequencing (NGS) methods in diagnosis and management of dyslipidemia. Recent NGS experiments indicate that most causative genes for monogenic dyslipidemias are already known. Thus, monogenic dyslipidemias can now be diagnosed using targeted NGS. Targeting of dyslipidemia genes can be achieved by either: designing custom reagents for a dyslipidemia-specific NGS panel; or performing genome-wide NGS and focusing on genes of interest. Advantages of the former approach are lower cost and limited potential to detect incidental pathogenic variants unrelated to dyslipidemia. However, the latter approach is more flexible because masking criteria can be altered as knowledge advances, with no need for re-design of reagents or follow-up sequencing runs. Also, the cost of genome-wide analysis is decreasing and ethical concerns can likely be mitigated. DNA-based diagnosis is already part of the clinical diagnostic algorithms for familial hypercholesterolemia. Furthermore, DNA-based diagnosis is supplanting traditional biochemical methods to diagnose chylomicronemia caused by deficiency of lipoprotein lipase or its co-factors. The increasing availability and decreasing cost of clinical NGS for dyslipidemia means that its potential benefits can now be evaluated on a larger scale.
Zhang, Jin; Ruhlman, Tracey A; Mower, Jeffrey P; Jansen, Robert K
Organelle genomes of Geraniaceae exhibit several unusual evolutionary phenomena compared to other angiosperm families including accelerated nucleotide substitution rates, widespread gene loss, reduced RNA editing, and extensive genomic rearrangements. Since most organelle-encoded proteins function in multi-subunit complexes that also contain nuclear-encoded proteins, it is likely that the atypical organellar phenomena affect the evolution of nuclear genes encoding organellar proteins. To begin to unravel the complex co-evolutionary interplay between organellar and nuclear genomes in this family, we sequenced nuclear transcriptomes of two species, Geranium maderense and Pelargonium x hortorum. Normalized cDNA libraries of G. maderense and P. x hortorum were used for transcriptome sequencing. Five assemblers (MIRA, Newbler, SOAPdenovo, SOAPdenovo-trans [SOAPtrans], Trinity) and two next-generation technologies (454 and Illumina) were compared to determine the optimal transcriptome sequencing approach. Trinity provided the highest quality assembly of Illumina data with the deepest transcriptome coverage. An analysis to determine the amount of sequencing needed for de novo assembly revealed diminishing returns of coverage and quality with data sets larger than sixty million Illumina paired end reads for both species. The G. maderense and P. x hortorum transcriptomes contained fewer transcripts encoding the PLS subclass of PPR proteins relative to other angiosperms, consistent with reduced mitochondrial RNA editing activity in Geraniaceae. In addition, transcripts for all six plastid targeted sigma factors were identified in both transcriptomes, suggesting that one of the highly divergent rpoA-like ORFs in the P. x hortorum plastid genome is functional. The findings support the use of the Illumina platform and assemblers optimized for transcriptome assembly, such as Trinity or SOAPtrans, to generate high-quality de novo transcriptomes with broad coverage. In addition
Skotte, Line; Korneliussen, Thorfinn Sand; Albrechtsen, Anders
computationally feasible due to the use of score statistics. As part of the joint likelihood, we model the distribution of the phenotypes using a generalized linear model framework, which works for both quantitative and discrete phenotypes. Thus, the method presented here is applicable to case-control studies...... of genotype calls into account have been proposed; most require numerical optimization which for large-scale data is not always computationally feasible. We show that using a score statistic for the joint likelihood of observed phenotypes and observed sequencing data provides an attractive approach...... to association testing for next-generation sequencing data. The joint model accounts for the genotype classification uncertainty via the posterior probabilities of the genotypes given the observed sequencing data, which gives the approach higher power than methods based on called genotypes. This strategy remains...
Sucher, Nikolaus J; Hennell, James R; Carles, Maria C
DNA fingerprinting of plants has become an invaluable tool in forensic, scientific, and industrial laboratories all over the world. PCR has become part of virtually every variation of the plethora of approaches used for DNA fingerprinting today. DNA sequencing is increasingly used either in combination with or as a replacement for traditional DNA fingerprinting techniques. A prime example is the use of short, standardized regions of the genome as taxon barcodes for biological identification of plants. Rapid advances in "next generation sequencing" (NGS) technology are driving down the cost of sequencing and bringing large-scale sequencing projects into the reach of individual investigators. We present an overview of recent publications that demonstrate the use of "NGS" technology for DNA fingerprinting and DNA barcoding applications.
Kuśmirek, Wiktor; Nowak, Robert M.; Neumann, Łukasz
Next-generation sequencing techniques produce large amounts of sequencing data. Some parts of the genome are composed of repetitive DNA sequences, which are very problematic for existing genome assemblers. We propose a modification of the DNA assembly algorithm that uses the relative frequency of reads to properly reconstruct repetitive sequences. The new approach was implemented and tested; to demonstrate the capability of our software, we present some results for model organisms. The implementation uses a three-layer software architecture in which the presentation layer, data processing layer, and data storage layer are kept separate. Source code, a demo application with a web interface, and additional data are available at the project web page: http://dnaasm.sourceforge.net.
Willems, Sander; Fraiture, Marie-Alice; Deforce, Dieter; De Keersmaecker, Sigrid C J; De Loose, Marc; Ruttink, Tom; Herman, Philippe; Van Nieuwerburgh, Filip; Roosens, Nancy
Because the number and diversity of genetically modified (GM) crops have significantly increased, their analysis based on real-time PCR (qPCR) methods is becoming increasingly complex and laborious. While several pioneers already investigated Next Generation Sequencing (NGS) as an alternative to qPCR, its practical use has not been assessed for routine analysis. In this study a statistical framework was developed to predict the number of NGS reads needed to detect transgene sequences, to prove their integration into the host genome and to identify the specific transgene event in a sample with known composition. This framework was validated by applying it to experimental data from food matrices composed of pure GM rice, processed GM rice (noodles) or a 10% GM/non-GM rice mixture, revealing some influential factors. Finally, feasibility of NGS for routine analysis of GM crops was investigated by applying the framework to samples commonly encountered in routine analysis of GM crops. Copyright © 2015 The Authors. Published by Elsevier Ltd. All rights reserved.
Aparisi, María J; Aller, Elena; Fuster-García, Carla; García-García, Gema; Rodrigo, Regina; Vázquez-Manrique, Rafael P; Blanco-Kelly, Fiona; Ayuso, Carmen; Roux, Anne-Françoise; Jaijo, Teresa; Millán, José M
Usher syndrome is an autosomal recessive disease that associates sensorineural hearing loss, retinitis pigmentosa and, in some cases, vestibular dysfunction. It is clinically and genetically heterogeneous. To date, 10 genes have been associated with the disease, making molecular diagnosis based on Sanger sequencing expensive and time-consuming. Consequently, the aim of the present study was to develop a molecular diagnostics method for Usher syndrome, based on targeted next generation sequencing. A custom HaloPlex panel for Illumina platforms was designed to capture all exons of the 10 known causative Usher syndrome genes (MYO7A, USH1C, CDH23, PCDH15, USH1G, CIB2, USH2A, GPR98, DFNB31 and CLRN1), the two Usher syndrome-related genes (HARS and PDZD7) and the two candidate genes VEZT and MYO15A. A cohort of 44 patients suffering from Usher syndrome was selected for this study. This cohort was divided into two groups: a test group of 11 patients with known mutations and another group of 33 patients with unknown mutations. Forty USH patients were successfully sequenced: 8 USH patients from the test group and 32 patients from the group composed of USH patients without genetic diagnosis. We were able to detect biallelic mutations in one USH gene in 22 out of 32 USH patients (68.75%) and to identify 79.7% of the expected mutated alleles. Fifty-three different mutations were detected. These mutations included 21 missense, 8 nonsense, 9 frameshifts, 9 intronic mutations and 6 large rearrangements. Targeted next generation sequencing allowed us to detect both point mutations and large rearrangements in a single experiment, minimizing the economic cost of the study, increasing the detection ratio of the genetic cause of the disease and improving the genetic diagnosis of Usher syndrome patients.
Szabadosova, Viktoria; Boronova, Iveta; Ferenc, Peter; Tothova, Iveta; Bernasovska, Jarmila; Zigova, Michaela; Kmec, Jan; Bernasovsky, Ivan
As the leading cause of congestive heart failure, cardiomyopathy represents a heterogeneous group of heart muscle disorders. Despite considerable progress being made in the genetic diagnosis of cardiomyopathy by detection of the mutations in the most prevalent cardiomyopathy genes, the cause remains unsolved in many patients. High-throughput mutation screening in the disease genes for cardiomyopathy is now possible through target enrichment followed by next-generation sequencing. The aim of the study was to analyze a panel of genes associated with dilated or hypertrophic cardiomyopathy based on previously published results in order to identify the subjects at risk. Next-generation sequencing on the Illumina HiSeq 2500 platform was used to detect sequence variants in 16 individuals diagnosed with dilated or hypertrophic cardiomyopathy. Detected variants were filtered and the functional impact of amino acid changes was predicted by computational programs. DNA samples of the 16 patients were analyzed by whole exome sequencing. We identified six nonsynonymous variants that were predicted to be pathogenic by all prediction software used: rs3744998 (EPG5), rs11551768 (MGME1), rs148374985 (MURC), rs78461695 (PLEC), rs17158558 (RET) and rs2295190 (SYNE1). Two of the analyzed sequence variants had minor allele frequency (MAF) ... (MURC), rs34580776 (MYBPC3). Our data support the potential role of the detected variants in pathogenesis of dilated or hypertrophic cardiomyopathy; however, the possibility that these variants might not be true disease-causing variants but are susceptibility alleles that require additional mutations or injury to cause the clinical phenotype of disease must be considered. © 2017 Wiley Periodicals, Inc.
Hiroshi Ikeda,1 Kazuya Ishiguro,1 Tetsuyuki Igarashi,1 Yuka Aoki,1 Toshiaki Hayashi,1 Tadao Ishida,1 Yasushi Sasaki,1,2 Takashi Tokino,2 Yasuhisa Shinomura1 (1Department of Gastroenterology, Rheumatology and Clinical Immunology, 2Medical Genome Sciences, Research Institute for Frontier Medicine, Sapporo Medical University, Sapporo, Japan)
A 69-year-old man was diagnosed with IgG λ-type multiple myeloma (MM), Stage II, in October 2010. He was treated with one cycle of high-dose dexamethasone. After three cycles of bortezomib, the patient exhibited slow elevations in the free light-chain levels and developed a significant new increase of serum M protein. Bone marrow cytogenetic analysis revealed a complex karyotype characteristic of malignant plasma cells. To better understand the molecular pathogenesis of this patient, we sequenced for mutations in the entire coding regions of 409 cancer-related genes using a semiconductor-based sequencing platform. Sequencing analysis revealed eight nonsynonymous somatic mutations in addition to several copy number variants, including CCND1 and RB1. These alterations may play roles in the pathobiology of this disease. This targeted next-generation sequencing can allow for the prediction of drug resistance and facilitate improvements in the treatment of MM patients. Keywords: multiple myeloma, drug resistance, genome-wide sequencing, semiconductor sequencer, target therapy
Fahnøe, Ulrik; Orton, Richard; Höper, Dirk
Next Generation Sequencing (NGS) has rapidly become the preferred technology in nucleotide sequencing, and can be applied to unravel molecular adaptation of RNA viruses such as Classical Swine Fever Virus (CSFV). However, the detection of low frequency variants within viral populations by NGS...... is affected by errors introduced during sample preparation and sequencing, and so far no definitive solution to this problem has been presented....
Andersen, Jeppe D; Pereira, Vania; Pietroni, Carlotta
The simultaneous sequencing of samples from multiple individuals increases the efficiency of next-generation sequencing (NGS) while also reducing costs. Here we describe a novel and simple approach for sequencing DNA from multiple individuals per barcode. Our strategy relies on the endonuclease...... digestion of PCR amplicons prior to library preparation, creating a specific fragment pattern for each individual that can be resolved after sequencing. By using both barcodes and restriction fragment patterns, we demonstrate the ability to sequence the human melanocortin 1 receptor (MC1R) genes from 72...... individuals using only 24 barcoded libraries....
Ravi K Patel
Next generation sequencing (NGS) technologies provide a high-throughput means to generate large amounts of sequence data. However, quality control (QC) of sequence data generated from these technologies is extremely important for meaningful downstream analysis. Further, highly efficient and fast processing tools are required to handle the large volume of datasets. Here, we have developed an application, NGS QC Toolkit, for quality check and filtering of high-quality data. This toolkit is a standalone and open source application freely available at http://www.nipgr.res.in/ngsqctoolkit.html. All the tools in the application have been implemented in the Perl programming language. The toolkit comprises user-friendly tools for QC of sequencing data generated using Roche 454 and Illumina platforms, and additional tools to aid QC (sequence format converter and trimming tools) and analysis (statistics tools). A variety of options have been provided to facilitate the QC at user-defined parameters. The toolkit is expected to be very useful for the QC of NGS data to facilitate better downstream analysis.
Børsting, Claus; Morling, Niels
It has been almost a decade since the first next generation sequencing (NGS) technologies emerged and quickly changed the way genetic research is conducted. Today, full genomes are mapped and published almost weekly and with ever increasing speed and decreasing costs. NGS methods and platforms have matured during the last 10 years, and the quality of the sequences has reached a level where NGS is used in clinical diagnostics of humans. Forensic genetic laboratories have also explored NGS technologies and especially in the last year, there has been a small explosion in the number of scientific articles and presentations at conferences with forensic aspects of NGS. These contributions have demonstrated that NGS offers new possibilities for forensic genetic case work. More information may be obtained from unique samples in a single experiment by analyzing combinations of markers (STRs, SNPs...
Pipelines for the analysis of Next-Generation Sequencing (NGS) data are generally composed of a set of different publicly available software, configured together in order to map short reads of a genome and call variants. The fidelity of pipelines is variable. We have developed ArtificialFastqGenerator, which takes a reference genome sequence as input and outputs artificial paired-end FASTQ files containing Phred quality scores. Since these artificial FASTQs are derived from the reference genome, it provides a gold-standard for read-alignment and variant-calling, thereby enabling the performance of any NGS pipeline to be evaluated. The user can customise DNA template/read length, the modelling of coverage based on GC content, whether to use real Phred base quality scores taken from existing FASTQ files, and whether to simulate sequencing errors. Detailed coverage and error summary statistics are output. Here we describe ArtificialFastqGenerator and illustrate its implementation in evaluating a typical bespoke NGS analysis pipeline under different experimental conditions. ArtificialFastqGenerator was released in January 2012. Source code, example files and binaries are freely available under the terms of the GNU General Public License v3.0 from https://sourceforge.net/projects/artfastqgen/.
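The idea of deriving paired-end reads with known origin from a reference can be illustrated with a minimal Python sketch. This is not ArtificialFastqGenerator's code or interface: the function names, the fixed per-base quality, the uniform substitution-error model and the Phred+33 encoding are all assumptions made for illustration.

```python
import random

# Translation table for complementing DNA bases.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def phred_to_char(q: int) -> str:
    """Encode a Phred score as a Phred+33 ASCII character (assumed encoding)."""
    return chr(q + 33)


def add_errors(read: str, error_rate: float, rng: random.Random) -> str:
    """Substitute each base with a different base with probability error_rate."""
    out = []
    for base in read:
        if rng.random() < error_rate:
            base = rng.choice([b for b in "ACGT" if b != base])
        out.append(base)
    return "".join(out)


def simulate_pair(reference, start, template_len, read_len,
                  quality=30, error_rate=0.0, rng=None):
    """Return ((read1, qual1), (read2, qual2)) for one DNA template.

    read1 is the 5' end of the template; read2 is the 5' end of its
    reverse complement, as in a typical paired-end layout.
    """
    rng = rng or random.Random(0)
    template = reference[start:start + template_len]
    read1 = add_errors(template[:read_len], error_rate, rng)
    read2 = add_errors(reverse_complement(template)[:read_len], error_rate, rng)
    qual = phred_to_char(quality) * read_len
    return (read1, qual), (read2, qual)


def fastq_record(name: str, seq: str, qual: str) -> str:
    """Format one four-line FASTQ record."""
    return f"@{name}\n{seq}\n+\n{qual}\n"
```

Because every read's true position is known by construction, alignments and variant calls produced by a pipeline can be scored against it, which is the gold-standard property the abstract describes.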
Soltis Douglas E
Background: We have developed a simulation approach to help determine the optimal mixture of sequencing methods for most complete and cost effective transcriptome sequencing. We compared simulation results for traditional capillary sequencing with "Next Generation" (NG) ultra high-throughput technologies. The simulation model was parameterized using mappings of 130,000 cDNA sequence reads to the Arabidopsis genome (NCBI Accession SRA008180.19). We also generated 454-GS20 sequences and de novo assemblies for the basal eudicot California poppy (Eschscholzia californica) and the magnoliid avocado (Persea americana) using a variety of methods for cDNA synthesis. Results: The Arabidopsis reads tagged more than 15,000 genes, including new splice variants and extended UTR regions. Of the total 134,791 reads (13.8 MB), 119,518 (88.7%) mapped exactly to known exons, while 1,117 (0.8%) mapped to introns, 11,524 (8.6%) spanned annotated intron/exon boundaries, and 3,066 (2.3%) extended beyond the end of annotated UTRs. Sequence-based inference of relative gene expression levels correlated significantly with microarray data. As expected, NG sequencing of normalized libraries tagged more genes than non-normalized libraries, although non-normalized libraries yielded more full-length cDNA sequences. The Arabidopsis data were used to simulate additional rounds of NG and traditional EST sequencing, and various combinations of each. Our simulations suggest a combination of FLX and Solexa sequencing for optimal transcriptome coverage at modest cost. We have also developed ESTcalc (http://fgp.huck.psu.edu/NG_Sims/ngsim.pl), an online webtool, which allows users to explore the results of this study by specifying individualized costs and sequencing characteristics. Conclusion: NG sequencing technologies are a highly flexible set of platforms that can be scaled to suit different project goals. In terms of sequence coverage alone, the NG sequencing is a dramatic advance
Currás-Freixes, Maria; Piñeiro-Yañez, Elena; Montero-Conde, Cristina; Apellániz-Ruiz, María; Calsina, Bruna; Mancikova, Veronika; Remacha, Laura; Richter, Susan; Ercolino, Tonino; Rogowski-Lehmann, Natalie; Deutschbein, Timo; Calatayud, María; Guadalix, Sonsoles; Álvarez-Escolá, Cristina; Lamas, Cristina; Aller, Javier; Sastre-Marcos, Julia; Lázaro, Conxi; Galofré, Juan C.; Patiño-García, Ana; Meoro-Avilés, Amparo; Balmaña-Gelpi, Judith; De Miguel-Novoa, Paz; Balbín, Milagros; Matías-Guiu, Xavier; Letón, Rocío; Inglada-Pérez, Lucía; Torres-Pérez, Rafael; Roldán-Romero, Juan M.; Rodríguez-Antona, Cristina; Fliedner, Stephanie M J; Opocher, Giuseppe; Pacak, Karel; Korpershoek, Esther; de Krijger, Ronald R.; Vroonen, Laurent; Mannelli, Massimo; Fassnacht, Martin; Beuschlein, Felix; Eisenhofer, Graeme; Cascón, Alberto; Al-Shahrour, Fátima; Robledo, Mercedes
Genetic diagnosis is recommended for all pheochromocytoma and paraganglioma (PPGL) cases, as driver mutations are identified in approximately 80% of the cases. As the list of related genes expands, genetic diagnosis becomes more time-consuming, and targeted next-generation sequencing (NGS) has
Daoud, Hussein; Luco, Stephanie M.; Li, Rui; Bareke, Eric; Beaulieu, Chandree; Jarinova, Olga; Carson, Nancy; Nikkel, Sarah M.; Graham, Gail E.; Richer, Julie; Armour, Christine; Bulman, Dennis E.; Chakraborty, Pranesh; Geraghty, Michael; Lines, Matthew A.; Lacaze-Masmonteil, Thierry; Majewski, Jacek; Boycott, Kym M.; Dyment, David A.
Background: Rare diseases often present in the first days and weeks of life and may require complex management in the setting of a neonatal intensive care unit (NICU). Exhaustive consultations and traditional genetic or metabolic investigations are costly and often fail to arrive at a final diagnosis when no recognizable syndrome is suspected. For this pilot project, we assessed the feasibility of next-generation sequencing as a tool to improve the diagnosis of rare diseases in newborns in the NICU. Methods: We retrospectively identified and prospectively recruited newborns and infants admitted to the NICU of the Children’s Hospital of Eastern Ontario and the Ottawa Hospital, General Campus, who had been referred to the medical genetics or metabolics inpatient consult service and had features suggesting an underlying genetic or metabolic condition. DNA from the newborns and parents was enriched for a panel of clinically relevant genes and sequenced on a MiSeq sequencing platform (Illumina Inc.). The data were interpreted with a standard informatics pipeline and reported to care providers, who assessed the importance of genotype–phenotype correlations. Results: Of 20 newborns studied, 8 received a diagnosis on the basis of next-generation sequencing (diagnostic rate 40%). The diagnoses were renal tubular dysgenesis, SCN1A-related encephalopathy syndrome, myotubular myopathy, FTO deficiency syndrome, cranioectodermal dysplasia, congenital myasthenic syndrome, autosomal dominant intellectual disability syndrome type 7 and Denys–Drash syndrome. Interpretation: This pilot study highlighted the potential of next-generation sequencing to deliver molecular diagnoses rapidly with a high success rate. With broader use, this approach has the potential to alter health care delivery in the NICU. PMID:27241786
Daoud, Hussein; Luco, Stephanie M; Li, Rui; Bareke, Eric; Beaulieu, Chandree; Jarinova, Olga; Carson, Nancy; Nikkel, Sarah M; Graham, Gail E; Richer, Julie; Armour, Christine; Bulman, Dennis E; Chakraborty, Pranesh; Geraghty, Michael; Lines, Matthew A; Lacaze-Masmonteil, Thierry; Majewski, Jacek; Boycott, Kym M; Dyment, David A
Rare diseases often present in the first days and weeks of life and may require complex management in the setting of a neonatal intensive care unit (NICU). Exhaustive consultations and traditional genetic or metabolic investigations are costly and often fail to arrive at a final diagnosis when no recognizable syndrome is suspected. For this pilot project, we assessed the feasibility of next-generation sequencing as a tool to improve the diagnosis of rare diseases in newborns in the NICU. We retrospectively identified and prospectively recruited newborns and infants admitted to the NICU of the Children's Hospital of Eastern Ontario and the Ottawa Hospital, General Campus, who had been referred to the medical genetics or metabolics inpatient consult service and had features suggesting an underlying genetic or metabolic condition. DNA from the newborns and parents was enriched for a panel of clinically relevant genes and sequenced on a MiSeq sequencing platform (Illumina Inc.). The data were interpreted with a standard informatics pipeline and reported to care providers, who assessed the importance of genotype-phenotype correlations. Of 20 newborns studied, 8 received a diagnosis on the basis of next-generation sequencing (diagnostic rate 40%). The diagnoses were renal tubular dysgenesis, SCN1A-related encephalopathy syndrome, myotubular myopathy, FTO deficiency syndrome, cranioectodermal dysplasia, congenital myasthenic syndrome, autosomal dominant intellectual disability syndrome type 7 and Denys-Drash syndrome. This pilot study highlighted the potential of next-generation sequencing to deliver molecular diagnoses rapidly with a high success rate. With broader use, this approach has the potential to alter health care delivery in the NICU. © 2016 Canadian Medical Association or its licensors.
Xie, Jing; Lu, Xiongxiong; Wu, Xue; Lin, Xiaoyi; Zhang, Chao; Huang, Xiaofang; Chang, Zhili; Wang, Xinjing; Wen, Chenlei; Tang, Xiaomei; Shi, Minmin; Zhan, Qian; Chen, Hao; Deng, Xiaxing; Peng, Chenghong; Li, Hongwei; Fang, Yuan; Shao, Yang; Shen, Baiyong
Targeted therapies, including monoclonal antibodies and small molecule inhibitors, have dramatically changed the treatment of cancer over the past 10 years. Their therapeutic advantages are greater tumor specificity and fewer side effects. To tailor available targeted therapies precisely to each individual or subset of cancer patients, next-generation sequencing (NGS) has been utilized as a promising diagnostic tool with its advantages of accuracy, sensitivity, and high throughput. We developed and validated an NGS-based cancer genomic diagnosis targeting 115 prognosis- and therapeutics-relevant genes on multiple specimen types, including blood, tumor tissue, and body fluid, from 10 patients with different cancer types. The sequencing data were then analyzed by clinically applicable analytical pipelines developed in house. We assessed the analytical sensitivity, specificity, and accuracy of the NGS-based molecular diagnosis. Our analytical pipelines were also capable of detecting base substitutions, indels, and gene copy number variations (CNVs). For instance, several actionable mutations of EGFR, PIK3CA, TP53, and KRAS were detected that indicate drug susceptibility and resistance in cases of lung cancer. Our study shows that NGS-based molecular diagnosis is more sensitive and comprehensive in detecting genomic alterations in cancer, and supports direct clinical use for guiding targeted therapy.
Ivanova, Natalia V; Kuzmina, Maria L; Braukmann, Thomas W A; Borisenko, Alex V; Zakharov, Evgeny V
DNA-based testing has been gaining acceptance as a tool for authentication of a wide range of food products; however, its applicability for testing of herbal supplements remains contentious. We utilized Sanger and Next-Generation Sequencing (NGS) for taxonomic authentication of fifteen herbal supplements representing three different producers from five medicinal plants: Echinacea purpurea, Valeriana officinalis, Ginkgo biloba, Hypericum perforatum and Trigonella foenum-graecum. Experimental design included three modifications of DNA extraction, two lysate dilutions, Internal Amplification Control, and multiple negative controls to exclude background contamination. Ginkgo supplements were also analyzed using HPLC-MS for the presence of active medicinal components. All supplements yielded DNA from multiple species, rendering Sanger sequencing results for rbcL and ITS2 regions either uninterpretable or non-reproducible between the experimental replicates. Overall, DNA from the manufacturer-listed medicinal plants was successfully detected in seven out of eight dry herb form supplements; however, low or poor DNA recovery due to degradation was observed in most plant extracts (none detected by Sanger; three out of seven-by NGS). NGS also revealed a diverse community of fungi, known to be associated with live plant material and/or the fermentation process used in the production of plant extracts. HPLC-MS testing demonstrated that Ginkgo supplements with degraded DNA contained ten key medicinal components. Quality control of herbal supplements should utilize a synergetic approach targeting both DNA and bioactive components, especially for standardized extracts with degraded DNA. The NGS workflow developed in this study enables reliable detection of plant and fungal DNA and can be utilized by manufacturers for quality assurance of raw plant materials, contamination control during the production process, and the final product. Interpretation of results should involve an
Balliu, Brunilda; Uh, Hae-Won; Tsonaka, Roula; Boehringer, Stefan; Helmer, Quinta; Houwing-Duistermaat, Jeanine J
In this analysis, we investigate the contributions that linkage-based methods, such as identical-by-descent mapping, can make to association mapping to identify rare variants in next-generation sequencing data. First, we identify regions in which cases share more segments identical-by-descent around a putative causal variant than do controls. Second, we use a two-stage mixed-effect model approach to summarize the single-nucleotide polymorphism data within each region and include them as covariates in the model for the phenotype. We assess the impact of linkage disequilibrium in determining identical-by-descent states between individuals by using markers with and without linkage disequilibrium for the first part and the impact of imputation in testing for association by using imputed genome-wide association studies or raw sequence markers for the second part. We apply the method to next-generation sequencing longitudinal family data from Genetic Association Workshop 18 and identify a significant region at chromosome 3: 40249244-41025167 (p-value = 2.3 × 10^-3).
Eiler, A.; Drakare, S.; Bertilsson, S.; Pernthaler, J.; Peura, S.; Rofner, C.; Šimek, Karel; Yang, Y.; Znachor, Petr; Lindström, E.S.
Vol. 8, No. 1 (2013), e53516 E-ISSN 1932-6203 R&D Projects: GA ČR(CZ) GA206/08/0015 Institutional support: RVO:60077344 Keywords: phytoplankton * next generation sequencing * diversity Subject RIV: EE - Microbiology, Virology Impact factor: 3.534, year: 2013
Cabanski Christopher R
Background: Next-generation sequencing technologies have become important tools for genome-wide studies. However, the quality scores that are assigned to each base have been shown to be inaccurate. If the quality scores are used in downstream analyses, these inaccuracies can have a significant impact on the results. Results: Here we present ReQON, a tool that recalibrates the base quality scores from an input BAM file of aligned sequencing data using logistic regression. ReQON also generates diagnostic plots showing the effectiveness of the recalibration. We show that ReQON produces quality scores that are both more accurate, in the sense that they more closely correspond to the probability of a sequencing error, and do a better job of discriminating between sequencing errors and non-errors than the original quality scores. We also compare ReQON to other available recalibration tools and show that ReQON is less biased and performs favorably in terms of quality score accuracy. Conclusion: ReQON is an open source software package, written in R and available through Bioconductor, for recalibrating base quality scores for next-generation sequencing data. ReQON produces a new BAM file with more accurate quality scores, which can improve the results of downstream analysis, and produces several diagnostic plots showing the effectiveness of the recalibration.
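A quality score is "accurate" in the sense used above when it matches the Phred definition Q = -10 log10(p), where p is the probability that the base call is wrong. The Python sketch below shows the general idea of logistic-regression recalibration on toy data. It is not ReQON's model: the single covariate (the reported quality score), the scaling constants, the gradient-descent fit and all function names are assumptions for illustration only.

```python
import math


def phred_to_prob(q: float) -> float:
    """Error probability implied by a Phred score: p = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def prob_to_phred(p: float) -> float:
    """Phred score implied by an error probability: Q = -10 log10(p)."""
    return -10 * math.log10(p)


def _scale(q: float) -> float:
    """Centre and scale reported qualities so gradient descent is well behaved."""
    return (q - 20.0) / 10.0


def _sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(reported_q, is_error, lr=0.5, epochs=2000):
    """Fit P(error) = sigmoid(a + b * scaled_q) by batch gradient descent.

    reported_q: reported Phred scores, one per base.
    is_error:   1 if the base mismatches the known truth, else 0.
    """
    a = b = 0.0
    n = len(reported_q)
    for _ in range(epochs):
        grad_a = grad_b = 0.0
        for q, y in zip(reported_q, is_error):
            p = _sigmoid(a + b * _scale(q))
            grad_a += p - y
            grad_b += (p - y) * _scale(q)
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return a, b


def recalibrate(q: float, a: float, b: float) -> float:
    """Map a reported quality to a recalibrated Phred score."""
    return prob_to_phred(_sigmoid(a + b * _scale(q)))


# Toy data: bases reported as Q10 are wrong half the time,
# bases reported as Q30 are wrong one time in ten.
toy_q = [10] * 10 + [30] * 10
toy_err = [1] * 5 + [0] * 5 + [1] * 1 + [0] * 9
a_hat, b_hat = fit_logistic(toy_q, toy_err)
```

In this toy set the bases reported as Q30 have an observed error rate of 1 in 10, so the fitted model lowers their score towards Q10, the value that matches the observed error rate.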
Cottrell, Catherine E; Al-Kateb, Hussam; Bredemeyer, Andrew J; Duncavage, Eric J; Spencer, David H; Abel, Haley J; Lockwood, Christina M; Hagemann, Ian S; O'Guin, Stephanie M; Burcea, Lauren C; Sawyer, Christopher S; Oschwald, Dayna M; Stratman, Jennifer L; Sher, Dorie A; Johnson, Mark R; Brown, Justin T; Cliften, Paul F; George, Bijoy; McIntosh, Leslie D; Shrivastava, Savita; Nguyen, Tudung T; Payton, Jacqueline E; Watson, Mark A; Crosby, Seth D; Head, Richard D; Mitra, Robi D; Nagarajan, Rakesh; Kulkarni, Shashikant; Seibert, Karen; Virgin, Herbert W; Milbrandt, Jeffrey; Pfeifer, John D
Currently, oncology testing includes molecular studies and cytogenetic analysis to detect genetic aberrations of clinical significance. Next-generation sequencing (NGS) allows rapid analysis of multiple genes for clinically actionable somatic variants. The WUCaMP assay uses targeted capture for NGS analysis of 25 cancer-associated genes to detect mutations at actionable loci. We present clinical validation of the assay and a detailed framework for design and validation of similar clinical assays. Deep sequencing of 78 tumor specimens (≥ 1000× average unique coverage across the capture region) achieved high sensitivity for detecting somatic variants at low allele fraction (AF). Validation revealed sensitivities and specificities of 100% for detection of single-nucleotide variants (SNVs) within coding regions, compared with SNP array sequence data (95% CI = 83.4-100.0 for sensitivity and 94.2-100.0 for specificity) or whole-genome sequencing (95% CI = 89.1-100.0 for sensitivity and 99.9-100.0 for specificity) of HapMap samples. Sensitivity for detecting variants at an observed 10% AF was 100% (95% CI = 93.2-100.0) in HapMap mixes. Analysis of 15 masked specimens harboring clinically reported variants yielded concordant calls for 13/13 variants at AF of ≥ 15%. The WUCaMP assay is a robust and sensitive method to detect somatic variants of clinical significance in molecular oncology laboratories, with reduced time and cost of genetic analysis allowing for strategic patient management. Copyright © 2014 American Society for Investigative Pathology and the Association for Molecular Pathology. Published by Elsevier Inc. All rights reserved.
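When every expected variant is detected, the observed sensitivity is 100% and only the lower confidence limit is informative. One common choice for such an interval is the exact (Clopper-Pearson) binomial interval; the abstract does not state which method the authors used, so the Python sketch below is an assumption, and the sample size in the usage line is hypothetical.

```python
def exact_lower_bound_all_detected(n: int, confidence: float = 0.95) -> float:
    """Two-sided Clopper-Pearson lower limit when all n of n events are detected.

    For x = n successes the exact lower limit has the closed form
    (alpha / 2) ** (1 / n); the upper limit is 1.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    alpha = 1.0 - confidence
    return (alpha / 2.0) ** (1.0 / n)


# Hypothetical example: 50 of 50 variants detected.
lower_50 = exact_lower_bound_all_detected(50)
```

The lower limit rises with the number of variants tested, which is why an interval reported alongside 100% sensitivity is needed to judge how strong the validation evidence is.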
Boyle, Michael D
Genomics and bioinformatics are dynamic fields well-suited for capturing the imagination of undergraduates in both research laboratories and classrooms. Currently, raw nucleotide sequence is being provided, as part of several genomics research initiatives, for undergraduate research and teaching. These initiatives could be easily extended and much more effective if the source of the sequenced material and the subsequent focus of the data analysis were aligned with the research interests of individual faculty at undergraduate institutions. By judicious use of surplus capacity in existing nucleotide sequencing cores, raw sequence data could be generated to support ongoing research efforts involving undergraduates. This would allow these students to participate actively in discovery research, with a goal of making novel contributions to their field through original research while nurturing the next generation of talented research scientists.
Yun, Sajung; Yun, Sijung
Next generation sequencing produces base calls with low quality scores that can affect the accuracy of identifying simple nucleotide variation calls, including single nucleotide polymorphisms and small insertions and deletions. Here we compare the effectiveness of two data preprocessing methods, masking and trimming, and the accuracy of simple nucleotide variation calls on whole-genome sequence data from Caenorhabditis elegans. Masking substitutes low quality base calls with 'N's (undetermined bases), whereas trimming removes low quality bases, which results in shorter read lengths. We demonstrate that masking is more effective than trimming in reducing the false-positive rate in single nucleotide polymorphism (SNP) calling. However, neither preprocessing method changed the false-negative rate in SNP calling with statistical significance compared to the data analysis without preprocessing. False-positive and false-negative rates for small insertions and deletions did not differ between masking and trimming. We recommend masking over trimming as a more effective preprocessing method for next generation sequencing data analysis, since masking reduces the false-positive rate in SNP calling without sacrificing the false-negative rate, although trimming is currently more commonly used in the field. The perl script for masking is available at http://code.google.com/p/subn/. The sequencing data used in the study were deposited in the Sequence Read Archive (SRX450968 and SRX451773).
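The difference between the two preprocessing methods can be shown on a single read. The Python sketch below is an illustration and not the authors' Perl script: the function names and the quality threshold are hypothetical, and trimming is assumed to remove low-quality bases from the 3' end only, which is one of several trimming strategies in use.

```python
def mask_read(seq: str, quals: list[int], min_q: int) -> str:
    """Replace every base whose quality is below min_q with 'N'.

    Read length and base positions are preserved.
    """
    return "".join(base if q >= min_q else "N" for base, q in zip(seq, quals))


def trim_read(seq: str, quals: list[int], min_q: int) -> str:
    """Remove the run of low-quality bases at the 3' end of the read.

    Low-quality bases inside the read are kept (assumed trimming strategy).
    """
    end = len(seq)
    while end > 0 and quals[end - 1] < min_q:
        end -= 1
    return seq[:end]


# One read with a low-quality base in the middle and two at the 3' end.
read = "ACGTACGT"
qualities = [30, 30, 5, 30, 30, 30, 8, 4]
masked = mask_read(read, qualities, min_q=20)
trimmed = trim_read(read, qualities, min_q=20)
```

Here `masked` is "ACNTACNN" and `trimmed` is "ACGTAC": masking hides the unreliable internal base from the variant caller while keeping the read at full length, whereas this form of trimming shortens the read and leaves the internal low-quality base in place.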
Objective. Wilson’s disease (WD) is a disorder of copper metabolism which is fatal without treatment. The great number of disease-causing ATP7B gene mutations and the variable clinical presentation of WD may cause a real diagnostic challenge. The emergence of next-generation sequencing (NGS) provides a time-saving, cost-effective method for full sequencing of the whole ATP7B gene compared to traditional Sanger sequencing. This is the first report on the clinical use of NGS to examine the ATP7B gene. Materials and Methods. We used the Ion Torrent Personal Genome Machine in four heterozygous patients to identify the other mutations, and also in two patients with no known mutation. One patient with acute-on-chronic liver failure was a candidate for acute liver transplantation. The results were validated by Sanger sequencing. Results. In each case, the diagnosis of Wilson’s disease was confirmed by identifying the mutations in both alleles within 48 hours. One novel mutation (p.Ala1270Ile) was found in addition to eight known ones. The rapid detection of the mutations made possible the prompt diagnosis of WD in a patient with acute liver failure. Conclusions. Our results show next-generation sequencing to be a very useful, reliable, time-saving, and cost-effective method for diagnosing Wilson’s disease in selected cases.
Chronic kidney disease (CKD) has a prevalence of approximately 10% in adult populations. CKD can progress to end-stage renal disease (ESRD) and this is usually fatal unless some form of renal replacement therapy (chronic dialysis or renal transplantation) is provided. There is an inherited predisposition to CKD with several genetic risk markers now identified. The UMOD gene has been associated with CKD of varying aetiologies. An AmpliSeq next generation sequencing panel was developed to facilitate comprehensive sequencing of the UMOD gene, covering exonic and regulatory regions. SNPs and CpG sites in the genomic region encompassing UMOD were evaluated for association with CKD in two studies: the UK Wellcome Trust Case-Control 3 Renal Transplant Dysfunction Study (n = 1088) and UK-ROI GENIE GWAS (n = 1726). A technological comparison of two Ion Torrent machines revealed 100% allele call concordance between S5 XL™ and PGM™ machines. One SNP (rs183962941), located in a non-coding region of UMOD, was nominally associated with ESRD (p = 0.008). No association was identified between UMOD variants and estimated glomerular filtration rate. Analysis of methylation data for over 480,000 CpG sites revealed differential methylation patterns within UMOD; the most significant of these was cg03140788 (p = 3.7 × 10⁻¹⁰).
Cosart, Ted; Beja-Pereira, Albano; Chen, Shanyuan; Ng, Sarah B; Shendure, Jay; Luikart, Gordon
Gene-targeted and genome-wide markers are crucial to advance evolutionary biology, agriculture, and biodiversity conservation by improving our understanding of genetic processes underlying adaptation and speciation. Unfortunately, for eukaryotic species with large genomes it remains costly to obtain genome sequences and to develop genome resources such as genome-wide SNPs. A method is needed to allow gene-targeted, next-generation sequencing that is flexible enough to include any gene or number of genes, unlike transcriptome sequencing. Such a method would allow sequencing of many individuals, avoiding ascertainment bias in subsequent population genetic analyses.We demonstrate the usefulness of a recent technology, exon capture, for genome-wide, gene-targeted marker discovery in species with no genome resources. We use coding gene sequences from the domestic cow genome sequence (Bos taurus) to capture (enrich for), and subsequently sequence, thousands of exons of B. taurus, B. indicus, and Bison bison (wild bison). Our capture array has probes for 16,131 exons in 2,570 genes, including 203 candidate genes with known function and of interest for their association with disease and other fitness traits. We successfully sequenced and mapped exon sequences from across the 29 autosomes and X chromosome in the B. taurus genome sequence. Exon capture and high-throughput sequencing identified thousands of putative SNPs spread evenly across all reference chromosomes, in all three individuals, including hundreds of SNPs in our targeted candidate genes. This study shows exon capture can be customized for SNP discovery in many individuals and for non-model species without genomic resources. Our captured exome subset was small enough for affordable next-generation sequencing, and successfully captured exons from a divergent wild species using the domestic cow genome as reference.
Hu, Bo; Ji, Yuan; Xu, Yaomin; Ting, Angela H
Allele-specific methylation (ASM) has long been studied but mainly documented in the context of genomic imprinting and X chromosome inactivation. Taking advantage of the next-generation sequencing technology, we conduct a high-throughput sequencing experiment with four prostate cell lines to survey the whole genome and identify single nucleotide polymorphisms (SNPs) with ASM. A Bayesian approach is proposed to model the counts of short reads for each SNP conditional on its genotypes of multip...
Rico, Ciro; Normandeau, Eric; Dion-Côté, Anne-Marie; Rico, María Inés; Côté, Guillaume; Bernatchez, Louis
Next-generation sequencing (NGS) is revolutionising marker development and the rapidly increasing amount of transcriptomes published across a wide variety of taxa is providing valuable sequence databases for the identification of genetic markers without the need to generate new sequences. Microsatellites are still the most important source of polymorphic markers in ecology and evolution. Motivated by our long-term interest in the adaptive radiation of a non-model species complex of whitefishes (Coregonus spp.), in this study, we focus on microsatellite characterisation and multiplex optimisation using transcriptome sequences generated by Illumina® and Roche-454, as well as online databases of Expressed Sequence Tags (EST) for the study of whitefish evolution and demographic history. We identified and optimised 40 polymorphic loci in multiplex PCR reactions and validated the robustness of our analyses by testing several population genetics and phylogeographic predictions using 494 fish from five lakes and 2 distinct ecotypes.
Transcripts are known to be incorporated in particles of DNA viruses belonging to the families of Herpesviridae and Mimiviridae, but the presence of transcripts in other DNA viruses, such as poxviruses, has not been analyzed yet. Therefore, we first established a next-generation-sequencing (NGS)-based protocol, enabling the unbiased identification of transcripts in virus particles. Subsequently, we applied our protocol to analyze RNA in an emerging zoonotic member of the Poxviridae family, namely Cowpox virus. Our results revealed the incorporation of 19 viral transcripts, while host identifications were restricted to ribosomal and mitochondrial RNA. Most viral transcripts had an unknown or immunomodulatory function, suggesting that transcript incorporation may be beneficial for poxvirus immune evasion. Notably, the most abundant transcript originated from the D5L/I1R gene that encodes a viral inhibitor of the host cytoplasmic DNA sensing machinery.
Wain, John; Keddy, Karen H.; Hendriksen, Rene S.
The publication of studies using next generation sequencing to analyse large numbers of bacterial isolates from global epidemics is transforming microbiology, epidemiology and public health. The emergence of multidrug resistant Salmonella Typhimurium ST313 is one example. While the epidemiology...... in Africa appears to be human-to-human spread and the association with invasive disease almost absolute, more needs to be done to exclude the possibility of animal reservoirs and to transfer the ability to track all Salmonella infections to the laboratories in the front line. In this mini-review we...
Nimwegen, K.J.M. van; Soest, R.A.; Veltman, J.A.; Nelen, M.R.; Wilt, G.J. van der; Peart-Vissers, L.E.L.M.; Grutters, J.P.C.
BACKGROUND: The substantial technological advancements in next-generation sequencing (NGS), combined with dropping costs, have allowed for a swift diffusion of NGS applications in clinical settings. Although several commercial parties report to have broken the $1000 barrier for sequencing an entire
Epilepsy is a neurological disorder characterized by an increased predisposition for seizures. Although this definition suggests that it is a single disorder, epilepsy encompasses a group of disorders with diverse aetiologies and outcomes. A genetic basis for epilepsy syndromes has been postulated for several decades, with several mutations in specific genes identified that have increased our understanding of the genetic influence on epilepsies. With 70-80% of epilepsy cases identified to have a genetic cause, there are now hundreds of genes identified to be associated with epilepsy syndromes which can be analyzed using next generation sequencing (NGS) techniques such as targeted gene panels, whole exome sequencing (WES) and whole genome sequencing (WGS). For effective use of these methodologies, diagnostic laboratories and clinicians require information on the relevant workflows including analysis and sequencing depth to understand the specific clinical application and diagnostic capabilities of these gene sequencing techniques. As epilepsy is a complex disorder, the differences associated with each technique influence the ability to form a diagnosis along with an accurate detection of the genetic etiology of the disorder. In addition, for diagnostic testing, an important parameter is the cost-effectiveness and the specific diagnostic outcome of each technique. Here, we review these commonly used NGS techniques to determine their suitability for application to epilepsy genetic diagnostic testing.
Cancer will cause 13 million deaths by the year 2030, ranking as the second leading cause of death worldwide. Previous studies indicate that most cancers originate from cells that acquired somatic mutations and evolved in a Darwinian manner. Ten biological insights of cancer have been summarized...... recently. Cutting-edge technologies like next generation sequencing (NGS) enable exploring cancer genome and evolution much more efficiently. However, integrated cancer genome sequencing studies showed great inter-/intra-tumoral heterogeneity (ITH) and complex evolution patterns beyond the cancer biological...... knowledge we previously know. There is very limited knowledge of the East Asian lung cancer genome except enrichment of EGFR mutations and lack of KRAS mutations. We carried out integrated genomic, transcriptomic and methylomic analysis of 335 primary Chinese lung adenocarcinomas (LUAD) and 35 corresponding...
Kato, Takeshi; Morisada, Naoya; Nagase, Hiroaki; Nishiyama, Masahiro; Toyoshima, Daisaku; Nakagawa, Taku; Maruyama, Azusa; Fu, Xue Jun; Nozu, Kandai; Wada, Hiroko; Takada, Satoshi; Iijima, Kazumoto
CDKL5-related encephalopathy is an X-linked dominantly inherited disorder that is characterized by early infantile epileptic encephalopathy or atypical Rett syndrome. We describe a 5-year-old Japanese boy with intractable epilepsy, severe developmental delay, and Rett syndrome-like features. Onset was at 2 months, when his electroencephalogram showed sporadic single poly spikes and diffuse irregular poly spikes. We conducted a genetic analysis using an Illumina® TruSight™ One sequencing panel on a next-generation sequencer. We identified two epilepsy-associated single nucleotide variants in our case: CDKL5 p.Ala40Val and KCNQ2 p.Glu515Asp. CDKL5 p.Ala40Val has been previously reported to be responsible for early infantile epileptic encephalopathy. In our case, the CDKL5 heterozygous mutation showed somatic mosaicism because the boy's karyotype was 46,XY. The KCNQ2 variant p.Glu515Asp is known to cause benign familial neonatal seizures-1, and this variant showed paternal inheritance. Although we believe that the somatic mosaic CDKL5 mutation is mainly responsible for the neurological phenotype in the patient, the KCNQ2 variant might have some neurological effect. Genetic analysis by next-generation sequencing is capable of identifying multiple variants in a patient. Copyright © 2015 The Japanese Society of Child Neurology. Published by Elsevier B.V. All rights reserved.
BACKGROUND: Pacific white shrimp (Litopenaeus vannamei), the major species of farmed shrimp in the world, has been attracting extensive studies, which require more and more genome background knowledge. The now available transcriptome data of L. vannamei are insufficient for research requirements, and have not been adequately assembled and annotated. METHODOLOGY/PRINCIPAL FINDINGS: This is the first study that used a next-generation high-throughput DNA sequencing technique, the Solexa/Illumina GA II method, to analyze the transcriptome from whole bodies of L. vannamei larvae. More than 2.4 Gb of raw data were generated, and 109,169 unigenes with a mean length of 396 bp were assembled using the SOAPdenovo software. 73,505 unigenes (>200 bp) with good quality sequences were selected and subjected to annotation analysis, among which 37.80% can be matched in the NCBI Nr database, 37.3% matched in Swissprot, and 44.1% matched in TrEMBL. Using the BLAST and BLAST2Go software, 11,153 unigenes were classified into 25 Clusters of Orthologous Groups of proteins (COG) categories, 8171 unigenes were assigned into 51 Gene ontology (GO) functional groups, and 18,154 unigenes were divided into 220 Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways. To preliminarily verify part of the results of assembly and annotations, 12 assembled unigenes that are homologous to many embryo development-related genes were chosen and subjected to RT-PCR for electrophoresis and Sanger sequencing analyses, and to real-time PCR for expression profile analyses during embryo development. CONCLUSIONS/SIGNIFICANCE: The L. vannamei transcriptome analyzed using the next-generation sequencing technique enriches the information of L. vannamei genes, which will facilitate our understanding of the genome background of crustaceans, and promote the studies on L. vannamei.
Robin, Jérôme D; Ludlow, Andrew T; LaRanger, Ryan; Wright, Woodring E; Shay, Jerry W
Next Generation Sequencing (NGS) is a powerful tool that depends on loading a precise amount of DNA onto a flowcell. NGS strategies have expanded our ability to investigate genomic phenomena by referencing mutations in cancer and diseases through large-scale genotyping, developing methods to map rare chromatin interactions (4C; 5C and Hi-C) and identifying chromatin features associated with regulatory elements (ChIP-seq, Bis-Seq, ChiA-PET). While many methods are available for DNA library quantification, there is no unambiguous gold standard. Most techniques use PCR to amplify DNA libraries to obtain sufficient quantities for optical density measurement. However, increased PCR cycles can distort the library's heterogeneity and prevent the detection of rare variants. In this analysis, we compared new digital PCR technologies (droplet digital PCR; ddPCR, ddPCR-Tail) with standard methods for the titration of NGS libraries. DdPCR-Tail is comparable to qPCR and fluorometry (QuBit) and allows sensitive quantification by analysis of barcode repartition after sequencing of multiplexed samples. This study provides a direct comparison between quantification methods throughout a complete sequencing experiment and provides the impetus to use ddPCR-based quantification for improvement of NGS quality.
Lee, Wonseok; Ahn, Sojin; Taye, Mengistie; Sung, Samsun; Lee, Hyun-Jeong; Cho, Seoae; Kim, Heebal
Goats (Capra hircus) are one of the oldest species of domesticated animals. Native Korean goats are a particularly interesting group, as they are indigenous to the area and were raised in the Korean peninsula almost 2,000 years ago. Although they have a small body size and produce low volumes of milk and meat, they are quite resistant to lumbar paralysis. Our study aimed to reveal the distinct genetic features and patterns of selection in native Korean goats by comparing the genomes of native Korean goat and crossbred goat populations. We sequenced the whole genome of 15 native Korean goats and 11 crossbred goats using next-generation sequencing (Illumina platform) to compare the genomes of the two populations. We found decreased nucleotide diversity in the native Korean goats compared to the crossbred goats. Genetic structural analysis demonstrated that the native Korean goat and crossbred goat populations shared a common ancestry, but were clearly distinct. Finally, to reveal the native Korean goat’s selective sweep region, selective sweep signals were identified in the native Korean goat genome using cross-population extended haplotype homozygosity (XP-EHH) and a cross-population composite likelihood ratio test (XP-CLR). As a result, we were able to identify candidate genes for recent selection, such as the CCR3 gene, which is related to lumbar paralysis resistance. Combined with future studies and recent goat genome information, this study will contribute to a thorough understanding of the native Korean goat genome. PMID:27989103
Background: Flax (Linum usitatissimum L.) is a significant fibre and oilseed crop. Current flax molecular markers, including isozymes, RAPDs, AFLPs and SSRs are of limited use in the construction of high density linkage maps and for association mapping applications due to factors such as low reproducibility, intense labour requirements and/or limited numbers. We report here on the use of a reduced representation library strategy combined with next generation Illumina sequencing for rapid and large scale discovery of SNPs in eight flax genotypes. SNP discovery was performed through in silico analysis of the sequencing data against the whole genome shotgun sequence assembly of flax genotype CDC Bethune. Genotyping-by-sequencing of an F6-derived recombinant inbred line population provided validation of the SNPs. Results: Reduced representation libraries of eight flax genotypes were sequenced on the Illumina sequencing platform resulting in sequence coverage ranging from 4.33 to 15.64X (genome equivalents). Depending on the relatedness of the genotypes and the number and length of the reads, between 78% and 93% of the reads mapped onto the CDC Bethune whole genome shotgun sequence assembly. A total of 55,465 SNPs were discovered with the largest number of SNPs belonging to the genotypes with the highest mapping coverage percentage. Approximately 84% of the SNPs discovered were identified in a single genotype, 13% were shared between any two genotypes and the remaining 3% in three or more. Nearly a quarter of the SNPs were found in genic regions. A total of 4,706 out of 4,863 SNPs discovered in Macbeth were validated using genotyping-by-sequencing of 96 F6 individuals from a recombinant inbred line population derived from a cross between CDC Bethune and Macbeth, corresponding to a validation rate of 96.8%. Conclusions: Next generation sequencing of reduced representation libraries was successfully implemented for genome-wide SNP discovery from
Tan, BoonFei; Ng, Charmaine; Nshimyimana, Jean Pierre; Loh, Lay Leng; Gin, Karina Y-H; Thompson, Janelle R
Water quality is an emergent property of a complex system comprised of interacting microbial populations and introduced microbial and chemical contaminants. Studies leveraging next-generation sequencing (NGS) technologies are providing new insights into the ecology of microbially mediated processes that influence fresh water quality such as algal blooms, contaminant biodegradation, and pathogen dissemination. In addition, sequencing methods targeting small subunit (SSU) rRNA hypervariable regions have allowed identification of signature microbial species that serve as bioindicators for sewage contamination in these environments. Beyond amplicon sequencing, metagenomic and metatranscriptomic analyses of microbial communities in fresh water environments reveal the genetic capabilities and interplay of waterborne microorganisms, shedding light on the mechanisms for production and biodegradation of toxins and other contaminants. This review discusses the challenges and benefits of applying NGS-based methods to water quality research and assessment. We will consider the suitability and biases inherent in the application of NGS as a screening tool for assessment of biological risks and discuss the potential and limitations for direct quantitative interpretation of NGS data. Secondly, we will examine case studies from recent literature where NGS based methods have been applied to topics in water quality assessment, including development of bioindicators for sewage pollution and microbial source tracking, characterizing the distribution of toxin and antibiotic resistance genes in water samples, and investigating mechanisms of biodegradation of harmful pollutants that threaten water quality. Finally, we provide a short review of emerging NGS platforms and their potential applications to the next generation of water quality assessment tools.
Chen, Guiqian; Qiu, Yuan; Zhuang, Qingye; Wang, Suchun; Wang, Tong; Chen, Jiming; Wang, Kaicheng
Next generation sequencing (NGS) is a powerful tool for the characterization, discovery, and molecular identification of RNA viruses. Multiple NGS library preparation methods have been published for strand-specific RNA-seq, but some of them are not suitable for identifying and characterizing RNA viruses. In this study, we report an NGS library preparation method to identify RNA viruses using the Ion Torrent PGM platform. The NGS sequencing adapters were directly inserted into the sequencing library through reverse transcription and polymerase chain reaction, without fragmentation and ligation of nucleic acids. The results show that this method is simple to perform and able to identify multiple species of RNA viruses in clinical samples.
Gian Marco Luna
Aquatic sediments are the repository of a variety of anthropogenic pollutants, including bacteria of fecal origin, that reach the aquatic environment from a variety of sources. Although fecal bacteria can survive for long periods of time in aquatic sediments, the microbiological quality of sediments is almost entirely neglected when performing quality assessments of aquatic ecosystems. Here we investigated the relative abundance, patterns and diversity of fecal bacterial populations in two coastal areas in the Northern Adriatic Sea (Italy): the Po river prodelta (PRP), an estuarine area receiving significant contaminant discharge from one of the largest European rivers, and the Lagoon of Venice (LV), a transitional environment impacted by a multitude of anthropogenic stressors. From both areas, several indicators of fecal and sewage contamination were determined in the sediments using Next Generation Sequencing (NGS) of 16S rDNA amplicons. In both areas, fecal contamination was high, with fecal bacteria accounting for up to 3.96% and 1.12% of the sediment bacterial assemblages in PRP and LV, respectively. The magnitude of the fecal signature was highest in the PRP site, highlighting the major role of the Po river in spreading microbial contaminants into the adjacent coastal area. In the LV site, fecal pollution was highest in the urban area, and almost disappeared when moving to the open sea. Our analysis revealed a large number of fecal Operational Taxonomic Units (OTUs; 960 and 181 in PRP and LV, respectively) and showed a different fecal signature in the two areas, suggesting a diverse contribution of human and non-human sources of contamination. These results highlight the potential of NGS techniques to gain insights into the origin and fate of different fecal bacteria populations in aquatic sediments.
Hawkins, Steve F C; Guest, Paul C
The emergence of next-generation sequencing (NGS) over the last 10 years has increased the efficiency of DNA sequencing in terms of speed, ease, and price. However, the exact quantification of an NGS library is crucial in order to obtain good data on sequencing platforms developed by the current market leader Illumina. Different approaches for DNA quantification are currently available, and the most commonly used are based on analysis of the physical properties of the DNA through spectrophotometric or fluorometric methods. Although these methods are technically simple, they do not allow the exact quantification that can be achieved using a real-time quantitative PCR (qPCR) approach. A qPCR protocol for DNA quantification with applications in NGS library preparation studies is presented here. This can be applied in various fields of study such as medical disorders resulting from nutritional programming disturbances.
Inherited retinal degenerative diseases (RDDs) display wide variation in their mode of inheritance, underlying genetic defects, age of onset, and phenotypic severity. Molecular mechanisms have not been delineated for many retinal diseases, and treatment options are limited. In most instances, genotype-phenotype correlations have not been elucidated because of extensive clinical and genetic heterogeneity. Next-generation sequencing (NGS) methods, including exome, genome, transcriptome and epigenome sequencing, provide novel avenues towards achieving comprehensive understanding of the genetic architecture of RDDs. Whole-exome sequencing (WES) has already revealed several new RDD genes, whereas RNA-Seq and ChIP-Seq analyses are expected to uncover novel aspects of gene regulation and biological networks that are involved in retinal development, aging and disease. In this review, we focus on the genetic characterization of retinal and macular degeneration using NGS technology and discuss the basic framework for further investigations. We also examine the challenges of NGS application in clinical diagnosis and management. PMID:24112618
Wu, Wells W; Phue, Je-Nie; Lee, Chun-Ting; Lin, Changyi; Xu, Lai; Wang, Rong; Zhang, Yaqin; Shen, Rong-Fong
Current library preparation protocols for Illumina HiSeq and MiSeq DNA sequencers require ≥2 nM initial library for subsequent loading of denatured cDNA onto flow cells. Such amounts are not always attainable from samples having a relatively low DNA or RNA input; or those for which a limited number of PCR amplification cycles is preferred (less PCR bias and/or more even coverage). A well-tested sub-nanomolar library preparation protocol for Illumina sequencers has however not been reported. The aim of this study is to provide a much needed working protocol for sub-nanomolar libraries to achieve outcomes as informative as those obtained with the higher library input (≥ 2 nM) recommended by Illumina's protocols. Extensive studies were conducted to validate a robust sub-nanomolar (initial library of 100 pM) protocol using PhiX DNA (as a control), genomic DNA (Bordetella bronchiseptica and microbial mock community B for 16S rRNA gene sequencing), messenger RNA, microRNA, and other small noncoding RNA samples. The utility of our protocol was further explored for PhiX library concentrations as low as 25 pM, which generated only slightly fewer than 50% of the reads achieved under the standard Illumina protocol starting with > 2 nM. A sub-nanomolar library preparation protocol (100 pM) could generate next generation sequencing (NGS) results as robust as the standard Illumina protocol. Following the sub-nanomolar protocol, libraries with initial concentrations as low as 25 pM could also be sequenced to yield satisfactory and reproducible sequencing results.
Classification of pediatric brain tumors with unusual histologic and clinical features may be a diagnostic challenge to the pathologist. We present a case of a 12-year-old girl with a primary intracranial tumor. The tumor classification was not certain initially, and the site of origin and clinical behavior were unusual. Genomic characterization of the tumor using a Clinical Laboratory Improvement Amendment (CLIA)-certified next-generation sequencing assay assisted in the diagnosis and translated into patient benefit, albeit transient. Our case argues that next generation sequencing may play a role in the pathological classification of pediatric brain cancers and guiding targeted therapy, supporting additional studies of genetically targeted therapeutics.
Van Amerongen, Rosa A.; Retèl, Valesca P.; Coupé, Veerle M.H.; Nederlof, Petra M.; Vogel, Maartje J.; Van Harten, Wim H.
Next-generation sequencing (NGS) has reached the molecular diagnostic laboratories. Although the NGS technology aims to improve the effectiveness of therapies by selecting the most promising therapy, concerns are that NGS testing is expensive and that the 'benefits' are not yet in relation to these
Next generation sequencing technology has become widely available and it offers many new opportunities in vaccine technology. Both human and veterinary medicine have numerous examples of adventitious agents being found in live vaccines. In veterinary medicine a continuing trend is the use of viral ...
Boyle, Michael D.
Genomics and bioinformatics are dynamic fields well-suited for capturing the imagination of undergraduates in both research laboratories and classrooms. Currently, raw nucleotide sequence is being provided, as part of several genomics research initiatives, for undergraduate research and teaching. These initiatives could be easily extended and much more effective if the source of the sequenced material and the subsequent focus of the data analysis were aligned with the research interests of individual faculty at undergraduate institutions. By judicious use of surplus capacity in existing nucleotide sequencing cores, raw sequence data could be generated to support ongoing research efforts involving undergraduates. This would allow these students to participate actively in discovery research, with a goal of making novel contributions to their field through original research while nurturing the next generation of talented research scientists. PMID:23653696
Advances in Next Generation Sequencing (NGS) allow for rapid development of genomics resources needed to generate molecular diagnostics assays for infectious agents. NGS approaches are particularly helpful for organisms that cannot be cultured, such as the downy mildew pathogens, a group of biotrop...
Børsting, Claus; Morling, Niels
It has been almost a decade since the first next generation sequencing (NGS) technologies emerged and quickly changed the way genetic research is conducted. Today, full genomes are mapped and published almost weekly and with ever increasing speed and decreasing costs. NGS methods and platforms have matured during the last 10 years, and the quality of the sequences has reached a level where NGS is used in clinical diagnostics of humans. Forensic genetic laboratories have also explored NGS technologies and especially in the last year, there has been a small explosion in the number of scientific articles and presentations at conferences with forensic aspects of NGS. These contributions have demonstrated that NGS offers new possibilities for forensic genetic case work. More information may be obtained from unique samples in a single experiment by analyzing combinations of markers (STRs, SNPs, insertion/deletions, mRNA) that cannot be analyzed simultaneously with the standard PCR-CE methods used today. The true variation in core forensic STR loci has been uncovered, and previously unknown STR alleles have been discovered. The detailed sequence information may aid mixture interpretation and will increase the statistical weight of the evidence. In this review, we will give an introduction to NGS and single-molecule sequencing, and we will discuss the possible applications of NGS in forensic genetics. Copyright © 2015 Elsevier Ireland Ltd. All rights reserved.
Guttikonda, Satish K; Marri, Pradeep; Mammadov, Jafar; Ye, Liang; Soe, Khaing; Richey, Kimberly; Cruse, James; Zhuang, Meibao; Gao, Zhifang; Evans, Clive; Rounsley, Steve; Kumpatla, Siva P
Demand for the commercial use of genetically modified (GM) crops has been increasing in light of the projected growth of world population to nine billion by 2050. A prerequisite of paramount importance for regulatory submissions is the rigorous safety assessment of GM crops. One of the components of safety assessment is molecular characterization at the DNA level, which helps to determine the copy number, integrity and stability of a transgene; characterize the integration site within a host genome; and confirm the absence of vector DNA. Historically, molecular characterization has been carried out using Southern blot analysis coupled with Sanger sequencing. While this is a robust approach to characterize transgenic crops, it is both time- and resource-consuming. The emergence of next-generation sequencing (NGS) technologies has provided a highly sensitive and cost- and labor-effective alternative to traditional Southern blot analysis for molecular characterization. Herein, we have demonstrated the successful application of both whole genome sequencing and target capture sequencing approaches for the characterization of single and stacked transgenic events and compared the results and inferences with the traditional method with respect to key criteria required for regulatory submissions.
Brhelova, Eva; Antonova, Mariya; Pardy, Filip; Kocmanova, Iva; Mayer, Jiri; Racil, Zdenek; Lengerova, Martina
Rapid identification and characterization of multidrug-resistant Klebsiella pneumoniae strains is necessary due to the increasing frequency of severe infections in patients. The decreasing cost of next-generation sequencing enables us to obtain a comprehensive overview of genetic information in one step. The aim of this study is to demonstrate and evaluate the utility and scope of the application of web-based databases to next-generation sequenced (NGS) data. The whole genomes of 11 clinical Klebsiella pneumoniae isolates were sequenced using Illumina MiSeq. Selected web-based tools were used to identify a variety of genetic characteristics, such as acquired antimicrobial resistance genes, multilocus sequence types, plasmid replicons, and identify virulence factors, such as virulence genes, cps clusters, urease-nickel clusters and efflux systems. Using web-based tools hosted by the Center for Genomic Epidemiology, we detected resistance to 8 main antimicrobial groups with at least 11 acquired resistance genes. The isolates were divided into eight sequence types (ST11, 23, 37, 323, 433, 495 and 562, and a new one, ST1646). All of the isolates carried replicons of large plasmids. Capsular types, virulence factors and genes coding AcrAB and OqxAB efflux pumps were detected using BIGSdb-Kp, whereas the selected virulence genes, identified in almost all of the isolates, were detected using CLC Genomic Workbench software. Applying appropriate web-based online tools to NGS data enables the rapid extraction of comprehensive information that can be used for more efficient diagnosis and treatment of patients, while data processing is free of charge, easy and time-efficient.
Kim, Hanyoup; Jebrail, Mais J; Sinha, Anupama; Bent, Zachary W; Solberg, Owen D; Williams, Kelly P; Langevin, Stanley A; Renzi, Ronald F; Van De Vreugde, James L; Meagher, Robert J; Schoeniger, Joseph S; Lane, Todd W; Branda, Steven S; Bartsch, Michael S; Patel, Kamlesh D
Next-generation sequencing (NGS) is emerging as a powerful tool for elucidating genetic information for a wide range of applications. Unfortunately, the surging popularity of NGS has not yet been accompanied by an improvement in automated techniques for preparing formatted sequencing libraries. To address this challenge, we have developed a prototype microfluidic system for preparing sequencer-ready DNA libraries for analysis by Illumina sequencing. Our system combines droplet-based digital microfluidic (DMF) sample handling with peripheral modules to create a fully-integrated, sample-in library-out platform. In this report, we use our automated system to prepare NGS libraries from samples of human and bacterial genomic DNA. E. coli libraries prepared on-device from 5 ng of total DNA yielded excellent sequence coverage over the entire bacterial genome, with >99% alignment to the reference genome, even genome coverage, and good quality scores. Furthermore, we produced a de novo assembly on a previously unsequenced multi-drug resistant Klebsiella pneumoniae strain BAA-2146 (KpnNDM). The new method described here is fast, robust, scalable, and automated. Our device for library preparation will assist in the integration of NGS technology into a wide variety of laboratories, including small research laboratories and clinical laboratories.
Lu, Zen H; Brown, Alexander; Wilson, Alison D; Calvert, Jay G; Balasch, Monica; Fuentes-Utrilla, Pablo; Loecherbach, Julia; Turner, Frances; Talbot, Richard; Archibald, Alan L; Ait-Ali, Tahar
Porcine Reproductive and Respiratory Syndrome (PRRS) is a disease of major economic impact worldwide. The etiologic agent of this disease is the PRRS virus (PRRSV). Increasing evidence suggests that microevolution within a coexisting quasispecies population can give rise to high sequence heterogeneity in PRRSV. We developed a pipeline based on the ultra-deep next generation sequencing approach to first construct the complete genome of a European PRRSV, strain Olot/91, cultured on macrophages and then capture the rare variants representative of the mixed quasispecies population. Olot/91 differs from the reference Lelystad strain by about 5%, and a total of 88 variants, with frequencies as low as 1%, were detected in the mixed population. These variants included 16 non-synonymous variants concentrated in the genes encoding structural and nonstructural proteins, including Glycoprotein 2a and 5. Using an ultra-deep sequencing methodology, the complete genome of Olot/91 was constructed without any prior knowledge of the sequence. Rare variants that constitute minor fractions of the heterogeneous PRRSV population could successfully be detected to allow further exploration of microevolutionary events.
McCormack, John E.; Maley, James M.; Hird, Sarah M.
divergence in four phylogenetically diverse avian systems using a method for quick and cost-effective generation of primary DNA sequence data using pyrosequencing. NGS data were processed using an analytical pipeline that reduces many reads into two called alleles per locus per individual. Using single...... throughout the genome. Using eight loci found in Zonotrichia and Junco lineages, we were also able to generate a species tree of these sparrow sister genera, demonstrating the potential of this method for generating data amenable to coalescent-based analysis. We discuss improvements that should enhance...
Patel, Nirali M; Michelini, Vanessa V; Snell, Jeff M; Balu, Saianand; Hoyle, Alan P; Parker, Joel S; Hayward, Michele C; Eberhard, David A; Salazar, Ashley H; McNeillie, Patrick; Xu, Jia; Huettner, Claudia S; Koyama, Takahiko; Utro, Filippo; Rhrissorrakrai, Kahn; Norel, Raquel; Bilal, Erhan; Royyuru, Ajay; Parida, Laxmi; Earp, H Shelton; Grilley-Olson, Juneko E; Hayes, D Neil; Harvey, Stephen J; Sharpless, Norman E; Kim, William Y
Using next-generation sequencing (NGS) to guide cancer therapy has created challenges in analyzing and reporting large volumes of genomic data to patients and caregivers. Specifically, providing current, accurate information on newly approved therapies and open clinical trials requires considerable manual curation performed mainly by human "molecular tumor boards" (MTBs). The purpose of this study was to determine the utility of cognitive computing as performed by Watson for Genomics (WfG) compared with a human MTB. One thousand eighteen patient cases that previously underwent targeted exon sequencing at the University of North Carolina (UNC) and subsequent analysis by the UNCseq informatics pipeline and the UNC MTB between November 7, 2011, and May 12, 2015, were analyzed with WfG, a cognitive computing technology for genomic analysis. Using a WfG-curated actionable gene list, we identified additional genomic events of potential significance (not discovered by traditional MTB curation) in 323 (32%) patients. The majority of these additional genomic events were considered actionable based upon their ability to qualify patients for biomarker-selected clinical trials. Indeed, the opening of a relevant clinical trial within 1 month prior to WfG analysis provided the rationale for identification of a new actionable event in nearly a quarter of the 323 patients. This automated analysis took ... potentially improve patient care by providing a rapid, comprehensive approach for data analysis and consideration of up-to-date availability of clinical trials. The results of this study demonstrate that the interpretation and actionability of somatic next-generation sequencing results are evolving too rapidly to rely solely on human curation. Molecular tumor boards empowered by cognitive computing can significantly improve patient care by providing a fast, cost-effective, and comprehensive approach for data analysis in the delivery of precision medicine. Patients and physicians who
BACKGROUND: Transcriptome profiling of patterns of RNA expression is a powerful approach to identify networks of genes that play a role in disease. To date, most mRNA profiling of tissues has been accomplished using microarrays, but next-generation sequencing can offer a richer and more comprehensive picture. METHODOLOGY/PRINCIPAL FINDINGS: ECO is a rare multi-system developmental disorder caused by a homozygous mutation in ICK encoding intestinal cell kinase. We performed gene expression profiling using both cDNA microarrays and next-generation mRNA sequencing (mRNA-seq) of skin fibroblasts from ECO-affected subjects. We then validated a subset of differentially expressed transcripts identified by each method using quantitative reverse transcription-polymerase chain reaction (qRT-PCR). Finally, we used gene ontology (GO) to identify critical pathways and processes that were abnormal according to each technical platform. Methodologically, mRNA-seq identifies a much larger number of differentially expressed genes with much better correlation to qRT-PCR results than the microarray (r² = 0.794 and 0.137, respectively). Biologically, cDNA microarray identified functional pathways focused on anatomical structure and development, while the mRNA-seq platform identified a higher proportion of genes involved in cell division and DNA replication pathways. CONCLUSIONS/SIGNIFICANCE: Transcriptome profiling with mRNA-seq had greater sensitivity, range and accuracy than the microarray. The two platforms generated different but complementary hypotheses for further evaluation.
Subramanian, Sankar; Huynen, Leon; Millar, Craig D; Lambert, David M
Kiwi is a highly distinctive, flightless and endangered ratite bird endemic to New Zealand. To understand the patterns of molecular evolution of the nuclear protein-coding genes in brown kiwi (Apteryx australis mantelli) and to determine the timescale of avian history we sequenced a transcriptome obtained from a kiwi embryo using next generation sequencing methods. We then assembled the conserved protein-coding regions using the chicken proteome as a scaffold. Using 1,543 conserved protein coding genes we estimated the neutral evolutionary divergence between the kiwi and chicken to be ~45%, which is approximately equal to the divergence computed for the human-mouse pair using the same set of genes. A large fraction of genes was found to be under high selective constraint, as most of the expressed genes appeared to be involved in developmental gene regulation. Our study suggests a significant relationship between gene expression levels and protein evolution. Using sequences from over 700 nuclear genes we estimated the divergence between the two basal avian groups, Palaeognathae and Neognathae to be 132 million years, which is consistent with previous studies using mitochondrial genes. The results of this investigation revealed patterns of mutation and purifying selection in conserved protein coding regions in birds. Furthermore this study suggests a relatively cost-effective way of obtaining a glimpse into the fundamental molecular evolutionary attributes of a genome, particularly when no closely related genomic sequence is available.
Korneliussen, Thorfinn Sand; Moltke, Ida; Albrechtsen, Anders
A number of different statistics are used for detecting natural selection using DNA sequencing data, including statistics that are summaries of the frequency spectrum, such as Tajima's D. These statistics are now often being applied in the analysis of Next Generation Sequencing (NGS) data. However, estimates of frequency spectra from NGS data are strongly affected by low sequencing coverage; the inherent technology dependent variation in sequencing depth causes systematic differences in the value of the statistic among genomic regions....
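For readers unfamiliar with the statistic named above, the following is a minimal sketch of the textbook Tajima's D calculation, assuming fully called, gap-free, equal-length sequences. It is not the method of the abstract, which concerns the case where low-coverage NGS data make such called sequences unreliable. The function names `tajimas_d_from_summary` and `tajimas_d` are hypothetical and chosen for illustration only.

```python
from itertools import combinations
from math import sqrt


def tajimas_d_from_summary(n, seg_sites, pi):
    """Tajima's D from sample size n, number of segregating sites S,
    and mean pairwise difference pi. Returns None when undefined."""
    if n < 4 or seg_sites == 0:
        return None  # the variance term is zero for n < 4 or S == 0
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / (i * i) for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / (a1 * a1)
    e1 = c1 / a1
    e2 = c2 / (a1 * a1 + a2)
    variance = e1 * seg_sites + e2 * seg_sites * (seg_sites - 1)
    if variance <= 0:
        return None
    # Difference between the two estimators of theta, scaled by its s.d.
    return (pi - seg_sites / a1) / sqrt(variance)


def tajimas_d(seqs):
    """Tajima's D for a list of aligned, equal-length sequences."""
    n = len(seqs)
    if len({len(s) for s in seqs}) > 1:
        raise ValueError("sequences must be aligned to equal length")
    # S: alignment columns with more than one distinct base
    seg_sites = sum(1 for column in zip(*seqs) if len(set(column)) > 1)
    # pi: average number of differences over all sequence pairs
    pairs = list(combinations(seqs, 2))
    if not pairs:
        return None
    total = sum(sum(x != y for x, y in zip(a, b)) for a, b in pairs)
    return tajimas_d_from_summary(n, seg_sites, total / len(pairs))
```

An excess of rare variants gives a negative value: `tajimas_d(["AAAA", "AAAT", "AATA", "ATAA"])` is below zero because every variant is a singleton.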
Simon H Tausch
The assembly of viral or endosymbiont genomes from Next Generation Sequencing (NGS) data is often hampered by the predominant abundance of reads originating from the host organism. These reads increase the memory and CPU time usage of the assembler and can lead to misassemblies. We developed RAMBO-K (Read Assignment Method Based On K-mers), a tool which allows rapid and sensitive removal of unwanted host sequences from NGS datasets. Reaching a speed of 10 Megabases/s on 4 CPU cores and a standard hard drive, RAMBO-K is faster than any tool we tested, while showing a consistently high sensitivity and specificity across different datasets. RAMBO-K rapidly and reliably separates reads from different species without data preprocessing. It is suitable as a straightforward standard solution for workflows dealing with mixed datasets. Binaries and source code (java and python) are available from http://sourceforge.net/projects/rambok/.
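The general idea of assigning reads to a host or a target by their k-mer content can be illustrated with a deliberately simple sketch. This is not RAMBO-K's algorithm, which the abstract does not describe beyond being k-mer based; the scoring rule (count of shared k-mers), the function names `kmers` and `assign_read`, and the tiny reference strings are all assumptions made for illustration.

```python
def kmers(seq, k):
    """Return the set of all length-k substrings of seq (upper-cased)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def assign_read(read, host_kmers, target_kmers, k):
    """Label a read by which reference shares more k-mers with it.

    Returns "host", "target", or "unassigned" (tie, or read shorter than k).
    """
    read_kmers = kmers(read, k)
    if not read_kmers:
        return "unassigned"
    host_hits = len(read_kmers & host_kmers)
    target_hits = len(read_kmers & target_kmers)
    if host_hits > target_hits:
        return "host"
    if target_hits > host_hits:
        return "target"
    return "unassigned"


# Toy references; real use would index whole genomes with a larger k.
K = 4
host_index = kmers("ACGTACGTACGTACGT", K)
target_index = kmers("TTGGCCAATTGGCCAA", K)
reads = ["ACGTACG", "GGCCAAT", "AC"]
labels = [assign_read(r, host_index, target_index, K) for r in reads]
```

Keeping only the reads labelled `"target"` before assembly is the kind of host-read removal the abstract motivates; a production tool would also need to handle sequencing errors and reverse complements, which this sketch ignores.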
Wei, Lijuan; Xiao, Meili; Hayward, Alice; Fu, Donghui
Next-generation sequencing (NGS) produces numerous (often millions) short DNA sequence reads, typically varying between 25 and 400 bp in length, at a relatively low cost and in a short time. This revolutionary technology is being increasingly applied in whole-genome, transcriptome, epigenome and small RNA sequencing, molecular marker and gene discovery, comparative and evolutionary genomics, and association studies. The Brassica genus comprises some of the most agro-economically important crops, providing abundant vegetables, condiments, fodder, oil and medicinal products. Many Brassica species have undergone the process of polyploidization, which makes their genomes exceptionally complex and can create difficulties in genomics research. NGS injects new vigor into Brassica research, yet also faces specific challenges in the analysis of complex crop genomes and traits. In this article, we review the advantages and limitations of different NGS technologies and their applications and challenges, using Brassica as an advanced model system for agronomically important, polyploid crops. Specifically, we focus on the use of NGS for genome resequencing, transcriptome sequencing, development of single-nucleotide polymorphism markers, and identification of novel microRNAs and their targets. We present trends and advances in NGS technology in relation to Brassica crop improvement, with wide application for sophisticated genomics research into agronomically important polyploid crops.
Next-generation sequencing has become more widely used to reveal genetic defects in monogenic disorders. Retinitis pigmentosa (RP), the leading cause of hereditary blindness worldwide, has been attributed to more than 67 disease-causing genes. Due to the extreme genetic heterogeneity, using general molecular screening alone is inadequate for identifying genetic predispositions in susceptible individuals. In order to identify the underlying mutation rapidly, we utilized next-generation sequencing in a four-generation Chinese family with RP. Two affected patients and an unaffected sibling were subjected to whole exome sequencing. Through bioinformatics analysis and direct sequencing confirmation, we identified the p.R135W transition in the rhodopsin gene. The mutation was subsequently confirmed to cosegregate with the disease in the family. In this study, our results suggest that whole exome sequencing is a robust method in diagnosing familial hereditary disease.
Anderson, Matthew W.; Schrijver, Iris
In the years since the first complete human genome sequence was reported, there has been a rapid development of technologies to facilitate high-throughput sequence analysis of DNA (termed “next-generation” sequencing). These novel approaches to DNA sequencing offer the promise of complete genomic analysis at a cost feasible for routine clinical diagnostics. However, the ability to more thoroughly interrogate genomic sequence raises a number of important issues with regard to result interpreta...
Tinhofer, Ingeborg; Niehr, Franziska; Konschak, Robert; Liebs, Sandra; Munz, Matthias; Stenzinger, Albrecht; Weichert, Wilko; Keilholz, Ulrich; Budach, Volker
The introduction of next-generation sequencing (NGS) in the field of cancer research has boosted worldwide efforts of genome-wide personalized oncology aiming at identifying predictive biomarkers and novel actionable targets. Despite considerable progress in understanding the molecular biology of distinct cancer entities by the use of this revolutionary technology and despite contemporaneous innovations in drug development, translation of NGS findings into improved concepts for cancer treatment remains a challenge. The aim of this article is to describe briefly the NGS platforms for DNA sequencing and, in more detail, key achievements and unresolved hurdles. A special focus will be placed on potential clinical applications of this innovative technique in the field of radiation oncology.
BACKGROUND: Transcriptome sequencing using next-generation sequencing platforms will soon be competing with DNA microarray technologies for global gene expression analysis. As a preliminary evaluation of these promising technologies, we performed deep sequencing of cDNA synthesized from the Microarray Quality Control (MAQC) reference RNA samples using Roche's 454 Genome Sequencer FLX. RESULTS: We generated more than 3.6 million sequence reads of average length 250 bp for the MAQC A and B samples and introduced a data analysis pipeline for translating cDNA read counts into gene expression levels. Using BLAST, 90% of the reads mapped to the human genome and 64% of the reads mapped to the RefSeq database of well annotated genes with e-values ≤ 10^-20. We measured gene expression levels in the A and B samples by counting the numbers of reads that mapped to individual RefSeq genes in multiple sequencing runs to evaluate the MAQC quality metrics for reproducibility, sensitivity, specificity, and accuracy and compared the results with DNA microarrays and Quantitative RT-PCR (QRTPCR) from the MAQC studies. In addition, 88% of the reads were successfully aligned directly to the human genome using the AceView alignment programs with an average 90% sequence similarity to identify 137,899 unique exon junctions, including 22,193 new exon junctions not yet contained in the RefSeq database. CONCLUSION: Using the MAQC metrics for evaluating the performance of gene expression platforms, the ExpressSeq results for gene expression levels showed excellent reproducibility, sensitivity, and specificity that improved systematically with increasing shotgun sequencing depth, and quantitative accuracy that was comparable to DNA microarrays and QRTPCR. In addition, a careful mapping of the reads to the genome using the AceView alignment programs shed new light on the complexity of the human transcriptome including the discovery of thousands of new splice variants.
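The step of turning per-gene read counts into expression levels can be sketched with one widely used normalization, reads per kilobase of transcript per million mapped reads (RPKM). The abstract does not say which normalization its pipeline applies, so RPKM is an assumption here, and the function name `rpkm` and the gene names, counts, and lengths below are invented for illustration.

```python
def rpkm(counts, lengths_bp):
    """Convert raw per-gene read counts to RPKM.

    counts: dict of gene name to number of mapped reads.
    lengths_bp: dict of gene name to transcript length in base pairs.
    RPKM = count * 1e9 / (total mapped reads * transcript length in bp).
    """
    total_reads = sum(counts.values())
    if total_reads == 0:
        raise ValueError("no mapped reads")
    return {
        gene: count * 1e9 / (total_reads * lengths_bp[gene])
        for gene, count in counts.items()
    }


# Hypothetical example: geneB has three times the reads of geneA but is
# also three times as long, so both get the same length-normalized value.
example_counts = {"geneA": 500, "geneB": 1500}
example_lengths = {"geneA": 1000, "geneB": 3000}
expression = rpkm(example_counts, example_lengths)
```

Dividing by transcript length matters for shotgun cDNA reads because longer transcripts yield more reads at the same molar abundance; dividing by the total makes runs of different depth comparable.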
The assessment of genetically modified (GM) crops for regulatory approval currently requires a detailed molecular characterization of the DNA sequence and integrity of the transgene locus. In addition, molecular characterization is a critical component of event selection and advancement during product development. Typically, molecular characterization has relied on Southern blot analysis to establish locus and copy number along with targeted sequencing of polymerase chain reaction products spanning any inserted DNA to complete the characterization process. Here we describe the use of next generation (NexGen) sequencing and junction sequence analysis bioinformatics in a new method for achieving full molecular characterization of a GM event without the need for Southern blot analysis. In this study, we examine a typical GM soybean [Glycine max (L.) Merr.] line and demonstrate that this new method provides molecular characterization equivalent to the current Southern blot-based method. We also examine an event containing in vivo DNA rearrangement of multiple transfer DNA inserts to demonstrate that the new method is effective at identifying complex cases. Next generation sequencing and bioinformatics offer certain advantages over current approaches, most notably the simplicity, efficiency, and consistency of the method, and provide a viable alternative for efficiently and robustly achieving molecular characterization of GM crops.
Cseke Leland J
BACKGROUND: Mycorrhizae, symbiotic interactions between soil fungi and tree roots, are ubiquitous in terrestrial ecosystems. The fungi contribute phosphorus, nitrogen and mobilized nutrients from organic matter in the soil and in return the fungus receives photosynthetically-derived carbohydrates. This union of plant and fungal metabolisms is the mycorrhizal metabolome. Understanding this symbiotic relationship at a molecular level provides important contributions to the understanding of forest ecosystems and global carbon cycling. RESULTS: We generated next generation short-read transcriptomic sequencing data from fully-formed ectomycorrhizae between Laccaria bicolor and aspen (Populus tremuloides) roots. The transcriptomic data was used to identify statistically significantly expressed gene models using a bootstrap-style approach, and these expressed genes were mapped to specific metabolic pathways. Integration of expressed genes that code for metabolic enzymes and the set of expressed membrane transporters generates a predictive model of the ectomycorrhizal metabolome. The generated model of mycorrhizal metabolome predicts that the specific compounds glycine, glutamate, and allantoin are synthesized by L. bicolor and that these compounds or their metabolites may be used for the benefit of aspen in exchange for the photosynthetically-derived sugars fructose and glucose. CONCLUSIONS: The analysis illustrates an approach to generate testable biological hypotheses to investigate the complex molecular interactions that drive ectomycorrhizal symbiosis. These models are consistent with experimental environmental data and provide insight into the molecular exchange processes for organisms in this complex ecosystem. The method used here for predicting metabolomic models of mycorrhizal systems from deep RNA sequencing data can be generalized and is broadly applicable to transcriptomic data derived from complex systems.
Jimenez, Nelson Lopez; Flannick, Jason; Yahyavi, Mani; Li, Jiang; Bardakjian, Tanya; Tonkin, Leath; Schneider, Adele; Sherr, Elliott H; Slavotinek, Anne M
Anophthalmia/microphthalmia (A/M) is caused by mutations in several different transcription factors, but mutations in each causative gene are relatively rare, emphasizing the need for a testing approach that screens multiple genes simultaneously. We used next-generation sequencing to screen 15 A/M patients for mutations in 9 pathogenic genes to evaluate this technology for screening in A/M. We used a pooled sequencing design, together with custom single nucleotide polymorphism (SNP) calling software. We verified predicted sequence alterations using Sanger sequencing. We verified three mutations - c.542delC in SOX2, resulting in p.Pro181Argfs*22, p.Glu105X in OTX2 and p.Cys240X in FOXE3. We found several novel sequence alterations and SNPs that were likely to be non-pathogenic - p.Glu42Lys in CRYBA4, p.Val201Met in FOXE3 and p.Asp291Asn in VSX2. Our analysis methodology gave one false positive result comprising a mutation in PAX6 (c.1268A > T, predicting p.X423LeuextX*15) that was not verified by Sanger sequencing. We also failed to detect one 20 base pair (bp) deletion and one 3 bp duplication in SOX2. Our results demonstrated the power of next-generation sequencing with pooled sample groups for the rapid screening of candidate genes for A/M as we were correctly able to identify disease-causing mutations. However, next-generation sequencing was less useful for small, intragenic deletions and duplications. We did not find mutations in 10/15 patients and conclude that there is a need for further gene discovery in A/M.
Łopacińska-Jørgensen, Joanna M; Pedersen, Jonas Nyvold; Bak, Mads
Next-generation sequencing (NGS) has caused a revolution, yet left a gap: long-range genetic information from native, non-amplified DNA fragments is unavailable. It might be obtained by optical mapping of megabase-sized DNA molecules. Frequently only a specific genomic region is of interest, so...
Tanase, Koji; Nishitani, Chikako; Hirakawa, Hideki; Isobe, Sachiko; Tabata, Satoshi; Ohmiya, Akemi; Onozaki, Takashi
Carnation (Dianthus caryophyllus L.), in the family Caryophyllaceae, can be found in a wide range of colors and is a model system for studies of flower senescence. In addition, it is one of the most important flowers in the global floriculture industry. However, few genomics resources, such as sequences and markers are available for carnation or other members of the Caryophyllaceae. To increase our understanding of the genetic control of important characters in carnation, we generated an expressed sequence tag (EST) database for a carnation cultivar important in horticulture by high-throughput sequencing using 454 pyrosequencing technology. We constructed a normalized cDNA library and a 3'-UTR library of carnation, obtaining a total of 1,162,126 high-quality reads. These reads were assembled into 300,740 unigenes consisting of 37,844 contigs and 262,896 singlets. The contigs were searched against an Arabidopsis sequence database, and 61.8% (23,380) of them had at least one BLASTX hit. These contigs were also annotated with Gene Ontology (GO) and were found to cover a broad range of GO categories. Furthermore, we identified 17,362 potential simple sequence repeats (SSRs) in 14,291 of the unigenes. We focused on gene discovery in the areas of flower color and ethylene biosynthesis. Transcripts were identified for almost every gene involved in flower chlorophyll and carotenoid metabolism and in anthocyanin biosynthesis. Transcripts were also identified for every step in the ethylene biosynthesis pathway. We present the first large-scale sequence data set for carnation, generated using next-generation sequencing technology. The large EST database generated from these sequences is an informative resource for identifying genes involved in various biological processes in carnation and provides an EST resource for understanding the genetic diversity of this plant.
Gangras, Pooja; Dayeh, Daniel M; Mabin, Justin W; Nakanishi, Kotaro; Singh, Guramrit
Argonaute proteins (AGOs) are loaded with small RNAs as guides to recognize target mRNAs. Since the target specificity heavily depends on the base complementarity between the two strands, it is important to identify the small guide and long target RNAs bound to AGOs. For this purpose, next-generation sequencing (NGS) technologies have extended our appreciation truly to the nucleotide level. However, the identification of RNAs via NGS from scarce RNA samples remains a challenge. Further, most commercial and published methods are compatible with either small RNAs or long RNAs, but are not equally applicable to both. Therefore, a single method that yields quantitative, bias-free NGS libraries to identify small and long RNAs from low levels of input will be of wide interest. Here, we introduce such a procedure that is based on several modifications of two published protocols and allows robust, sensitive, and reproducible cloning and sequencing of small amounts of RNAs of variable lengths. The method was applied to the identification of small RNAs bound to a purified eukaryotic AGO. Following ligation of a DNA adapter to the RNA 3'-end, the key feature of this method is to use the adapter for priming reverse transcription (RT), wherein biotinylated deoxyribonucleotides are specifically incorporated into the extended complementary DNA. Such RT products are enriched on streptavidin beads, circularized while immobilized on beads and directly used for PCR amplification. We provide a stepwise guide to generating RNA-Seq libraries and to their purification, quantification, validation, and preparation for next-generation sequencing. We also provide basic steps in post-NGS data analyses using Galaxy, an open-source, web-based platform.
Schmidt, Ane Y; Hansen, Thomas V O; Ahlborn, Lise B
Genetic testing of BRCA1/2 includes screening for single nucleotide variants and small insertions/deletions and for larger copy number variations (CNVs), primarily by Sanger sequencing and multiplex ligation-dependent probe amplification (MLPA). With the advent of next-generation sequencing (NGS)...
Grünewald, Inga; Vollbrecht, Claudia; Meinrath, Jeannine; Meyer, Moritz F; Heukamp, Lukas C; Drebber, Uta; Quaas, Alexander; Beutner, Dirk; Hüttenbrink, Karl-Bernd; Wardelmann, Eva; Hartmann, Wolfgang; Büttner, Reinhard; Odenthal, Margarete; Stenner, Markus
Salivary gland cancer represents a heterogeneous group of malignant tumors. Due to their low incidence and the existence of multiple morphologically defined subtypes, these tumors are still poorly understood with regard to their molecular pathogenesis and therapeutically relevant genetic alterations. Performing a systematic and comprehensive study covering 13 subtypes of salivary gland cancer, next generation sequencing was done on 84 tissue samples of parotid gland cancer using multiplex PCR for enrichment of cancer related gene loci covering hotspots of 46 cancer genes. Mutations were identified in 22 different genes. The most frequent alterations affected TP53, followed by RAS genes, PIK3CA, SMAD4 and members of the ERB family. HRAS mutations accounted for more than 90% of RAS mutations, occurring especially in epithelial-myoepithelial carcinomas and salivary duct carcinomas. Additional mutations in PIK3CA also affected particularly epithelial-myoepithelial carcinomas and salivary duct carcinomas, occurring simultaneously with HRAS mutations in almost all cases, pointing to an unknown and therapeutically relevant molecular constellation. Interestingly, 14% of tumors revealed mutations in surface growth factor receptor genes including ALK, HER2, ERBB4, FGFR, cMET and RET, which might prove to be targetable by new therapeutic agents. 6% of tumors revealed mutations in SMAD4. In summary, our data provide novel insight into the fundamental molecular heterogeneity of salivary gland cancer, relevant in terms of tumor classification and the establishment of targeted therapeutic concepts.
Schlaberg, Robert; Chiu, Charles Y; Miller, Steve; Procop, Gary W; Weinstock, George
- Metagenomic sequencing can be used for detection of any pathogens using unbiased, shotgun next-generation sequencing (NGS), without the need for sequence-specific amplification. Proof-of-concept has been demonstrated in infectious disease outbreaks of unknown causes and in patients with suspected infections but negative results for conventional tests. Metagenomic NGS tests hold great promise to improve infectious disease diagnostics, especially in immunocompromised and critically ill patients. - To discuss challenges and provide example solutions for validating metagenomic pathogen detection tests in clinical laboratories. A summary of current regulatory requirements, largely based on prior guidance for NGS testing in constitutional genetics and oncology, is provided. - Examples from 2 separate validation studies are provided for steps from assay design, and validation of wet bench and bioinformatics protocols, to quality control and assurance. - Although laboratory and data analysis workflows are still complex, metagenomic NGS tests for infectious diseases are increasingly being validated in clinical laboratories. Many parallels exist to NGS tests in other fields. Nevertheless, specimen preparation, rapidly evolving data analysis algorithms, and incomplete reference sequence databases are idiosyncratic to the field of microbiology and often overlooked.
Lin, Hsiu Chin; Wong, Yue Him; Tsang, Ling Ming; Chu, Ka Hou; Qian, Pei Yuan; Chan, Benny K K
This is the first study applying Next-Generation Sequencing (NGS) technology to survey the kinds, expression location, and pattern of adhesion-related genes in a membranous-based barnacle. A total of 77,528,326 and 59,244,468 raw sequence reads of total RNA were generated from the prosoma and the basis of Tetraclita japonica formosana, respectively. From these, 55,441 and 67,774 genes, respectively, were assembled and analyzed. The combined sequence data from both body parts yielded a total of 79,833 genes, of which 47.7% were shared. Homologues of barnacle cement proteins - CP-19K, -52K, and -100K - were found and all were dominantly expressed at the basis, where the cement gland complex is located. This is the main area where transcripts of cement proteins and other potential adhesion-related genes were detected. The absence of another common barnacle cement protein, CP-20K, in the adult transcriptome suggested a possible life-stage restricted gene function and/or a different mechanism in adhesion between membranous-based and calcareous-based barnacles. © 2013 Taylor & Francis.
Gimode, Davis; Odeny, Damaris A; de Villiers, Etienne P; Wanyonyi, Solomon; Dida, Mathews M; Mneney, Emmarold E; Muchugi, Alice; Machuka, Jesse; de Villiers, Santie M
Finger millet is an important cereal crop in eastern Africa and southern India with excellent grain storage quality and unique ability to thrive in extreme environmental conditions. Since negligible attention has been paid to improving this crop to date, the current study used Next Generation Sequencing (NGS) technologies to develop both Simple Sequence Repeat (SSR) and Single Nucleotide Polymorphism (SNP) markers. Genomic DNA from cultivated finger millet genotypes KNE755 and KNE796 was sequenced using both Roche 454 and Illumina technologies. Non-organelle sequencing reads were assembled into 207 Mbp representing approximately 13% of the finger millet genome. We identified 10,327 SSRs and 23,285 non-homeologous SNPs and tested 101 of each for polymorphism across a diverse set of wild and cultivated finger millet germplasm. For the 49 polymorphic SSRs, the mean polymorphism information content (PIC) was 0.42, ranging from 0.16 to 0.77. We also validated 92 SNP markers, 80 of which were polymorphic with a mean PIC of 0.29 across 30 wild and 59 cultivated accessions. Seventy-six of the 80 SNPs were polymorphic across 30 wild germplasm with a mean PIC of 0.30 while only 22 of the SNP markers showed polymorphism among the 59 cultivated accessions with an average PIC value of 0.15. Genetic diversity analysis using the polymorphic SNP markers revealed two major clusters; one of wild and another of cultivated accessions. Detailed STRUCTURE analysis confirmed this grouping pattern and further revealed 2 sub-populations within wild E. coracana subsp. africana. Both STRUCTURE and genetic diversity analysis assisted with the correct identification of the new germplasm collections. These polymorphic SSR and SNP markers are a significant addition to the existing 82 published SSRs, especially with regard to the previously reported low polymorphism levels in finger millet. Our results also reveal an unexploited finger millet genetic resource that can be included in the regional
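The PIC values quoted in the finger millet abstract can be related to allele frequencies with a short calculation. The sketch below assumes the commonly used Botstein-style definition, PIC = 1 - sum(p_i^2) - sum over i<j of 2 p_i^2 p_j^2; the abstract does not state which formula the authors applied, and the function name `pic` and the example frequencies are illustrative only.

```python
from itertools import combinations


def pic(freqs):
    """Polymorphism information content for one marker.

    Assumes the Botstein-style formula; `freqs` are the allele
    frequencies at the marker and must sum to 1.
    """
    if abs(sum(freqs) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    # Probability that two randomly drawn alleles are identical.
    homozygosity = sum(p * p for p in freqs)
    # Correction term for matings that are uninformative.
    cross = sum(2 * (p * p) * (q * q) for p, q in combinations(freqs, 2))
    return 1.0 - homozygosity - cross


if __name__ == "__main__":
    # Hypothetical frequencies, not taken from the study.
    print(round(pic([0.5, 0.5]), 4))   # biallelic marker at equal frequencies
    print(round(pic([0.9, 0.1]), 4))   # biallelic marker with a rare allele
```

Under this definition a biallelic marker cannot exceed 0.375, which is consistent with the SNP means reported above being lower than the upper range reported for the multi-allelic SSRs.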
Background Polyploidy is important from a phylogenetic perspective because of its immense past impact on evolution and its potential future impact on diversification, survival and adaptation, especially in plants. Molecular population genetics studies of polyploid organisms have been difficult because of problems in sequencing multiple-copy nuclear genes using Sanger sequencing. This paper describes a method for sequencing a barcoded mixture of targeted gene regions using next-generation sequencing methods to overcome these problems. Results Using 64 3-bp barcodes, we successfully sequenced three chloroplast and two nuclear gene regions (each of which contained two gene copies with up to two alleles per individual) in a total of 60 individuals across 11 species of Australian Poa grasses. This method had high replicability, a low sequencing error rate (after appropriate quality control) and a low rate of missing data. Eighty-eight percent of the 320 gene/individual combinations produced sequence reads, and >80% of individuals produced sufficient reads to detect all four possible nuclear alleles of the homeologous nuclear loci with 95% probability. We applied this method to a group of sympatric Australian alpine Poa species, which we discovered to share an allopolyploid ancestor with a group of American Poa species. All markers revealed extensive allele sharing among the Australian species and so we recommend that the current taxonomy be re-examined. We also detected hypermutation in the trnH-psbA marker, suggesting it should not be used as a land plant barcode region. Some markers indicated differentiation between Tasmanian and mainland samples. Significant positive spatial genetic structure was detected at Conclusions Our results demonstrate that 454 sequencing of barcoded amplicon mixtures can be used to reliably sample all alleles of homeologous loci in polyploid species and successfully investigate phylogenetic relationships among
BoonFei Tan; Charmaine Marie Ng; Jean Pierre Nshimyimana; Lay-Leng Loh; Karina Yew-Hoong Gin; Janelle Renee Thompson
Water quality is an emergent property of a complex system comprised of interacting microbial populations and introduced microbial and chemical contaminants. Studies leveraging next-generation sequencing (NGS) technologies are providing new insights into the ecology of microbially mediated processes that influence fresh water quality such as algal blooms, contaminant biodegradation, and pathogen dissemination. In addition, sequencing methods targeting small subunit (SSU) rRNA hypervariable reg...
Roy, Somak; Durso, Mary Beth; Wald, Abigail; Nikiforov, Yuri E; Nikiforova, Marina N
Ralf, Arwin; Montiel González, Diego; Zhong, Kaiyin; Kayser, Manfred
Next-generation sequencing (NGS) technologies offer immense possibilities given the large volume of genomic data they deliver simultaneously. The human Y-chromosome serves as a good example of how NGS benefits various applications in evolution, anthropology, genealogy, and forensics. Prior to NGS, the Y-chromosome phylogenetic tree consisted of a few hundred branches; based on NGS data, it now contains many thousands. The complexity of both the Y tree and the NGS data poses challenges for haplogroup assignment. For effective analysis and interpretation of Y-chromosome NGS data, we present Yleaf, a publicly available, automated, user-friendly software tool for high-resolution Y-chromosome haplogroup inference, independent of library and sequencing methods.
Nabakishore Nayak; Mahesh Chanda Sahu
Next-generation sequencing (NGS) has the potential to provide typing results and detect resistance genes in a single assay, thus guiding timely treatment decisions and allowing rapid tracking of transmission of resistant clones. The performance of a new NGS assay can be evaluated during an outbreak of sequence type 131 (ST131) Escherichia coli infections in a teaching hospital. The assay will be performed on 100 extended-spectrum beta-lactamase (ESBL) E. coli isolates collected from UTI d...
Forty-two cytopathic effect (CPE)-positive isolates were collected from 2008 to 2012. None of the isolates could be identified as a known viral pathogen by routine diagnostic assays. They were pooled into 8 groups of 5-6 isolates to reduce the sequencing cost. Next-generation sequencing (NGS) was conducted for each group of mixed samples, and the proposed data analysis pipeline was used to identify viral pathogens in these mixed samples. Polymerase chain reaction (PCR) or enzyme-linked immunosorbent assay (ELISA) was individually conducted for each of these 42 isolates depending on the predicted viral types in each group. Two isolates remained unknown after these tests. Iteration mapping was then implemented for each of these 2 isolates, and predicted human parechovirus (HPeV) in both. In summary, our NGS pipeline detected the following viruses among the 42 isolates: 29 human rhinoviruses (HRVs), 10 HPeVs, 1 human adenovirus (HAdV), 1 echovirus and 1 rotavirus. We then focused on the 10 identified Taiwanese HPeVs because of their reported clinical significance over HRVs. Their genomes were assembled and their genetic diversity was explored. One novel 6-bp deletion was found in one HPeV-1 virus. In terms of nucleotide heterogeneity, 64 genetic variants were detected from these HPeVs using the mapped NGS reads. Most importantly, a recombination event was found between our HPeV-3 and a known HPeV-4 strain in the database. A similar event was detected in the other HPeV-3 strains in the same clade of the phylogenetic tree. These findings demonstrated that the proposed NGS data analysis pipeline identified unknown viruses from the mixed clinical samples, revealed their genetic identity and variants, and characterized their genetic features in terms of viral evolution.
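The pooling step in the abstract above (42 isolates into 8 groups of 5-6) can be illustrated with a minimal sketch. How the authors actually assigned isolates to pools is not described; the function name `split_into_pools` and the isolate labels are hypothetical, and the near-equal contiguous split is an assumption made for illustration.

```python
def split_into_pools(items, n_pools):
    """Split `items` into `n_pools` contiguous pools of near-equal size.

    Illustrative only: the first (len(items) % n_pools) pools get one
    extra item, so pool sizes differ by at most one.
    """
    if n_pools <= 0:
        raise ValueError("n_pools must be positive")
    base, extra = divmod(len(items), n_pools)
    pools, start = [], 0
    for i in range(n_pools):
        size = base + (1 if i < extra else 0)
        pools.append(items[start:start + size])
        start += size
    return pools


if __name__ == "__main__":
    # Hypothetical isolate labels; the study's identifiers are not given.
    isolates = [f"isolate_{i:02d}" for i in range(1, 43)]
    pools = split_into_pools(isolates, 8)
    print([len(p) for p in pools])
```

With 42 items and 8 pools this gives two pools of 6 and six pools of 5, matching the "5-6 isolates" per group stated in the abstract.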
Elbeaino, Toufic; Belghacem, Imen; Mascia, Tiziana; Gallitelli, Donato; Digiaro, Michele
Next-generation sequencing (NGS) allowed the assembly of the complete RNA-1 and RNA-2 sequences of a grapevine isolate of artichoke Italian latent virus (AILV). RNA-1 and RNA-2 are 7,338 and 4,630 nucleotides in length excluding the 3' terminal poly(A) tail, and encode two putative polyproteins of 255.8 kDa (p1) and 149.6 kDa (p2), respectively. All conserved motifs and predicted cleavage sites, typical for nepovirus polyproteins, were found in p1 and p2. AILV p1 and p2 share high amino acid identity with their homologues in beet ringspot virus (p1, 81% and p2, 71%), tomato black ring virus (p1, 79% and p2, 63%), grapevine Anatolian ringspot virus (p1, 65% and p2, 63%), and grapevine chrome mosaic virus (p1, 60% and p2, 54%), and to a lesser extent with other grapevine nepoviruses of subgroup A and C. Phylogenetic and sequence analyses, all confirmed the strict relationship of AILV with members classified in subgroup B of genus Nepovirus.
Risk assessment of tick-borne and zoonotic disease emergence necessitates sound knowledge of the particular microorganisms circulating within the communities of these major vectors. Assessment of pathogens carried by wild ticks must be performed without a priori assumptions, to allow for the detection of new or unexpected agents. We evaluated the potential of Next-Generation Sequencing (NGS) techniques to produce an inventory of parasites carried by questing ticks. Sequences corresponding to parasites from two distinct genera were recovered in Ixodes ricinus ticks collected in Eastern France: Babesia spp. and Theileria spp. Four Babesia species were identified, three of which were zoonotic: B. divergens, Babesia sp. EU1 and B. microti; and one which infects cattle, B. major. This is the first time that these last two species have been identified in France. This approach also identified new sequences corresponding to as-yet unknown organisms similar to tropical Theileria species. Our findings demonstrate the capability of NGS to produce an inventory of live tick-borne parasites, which could potentially be transmitted by the ticks, and uncover unexpected parasites in Western Europe.
McDaniel, Andrew S.; Stall, Jennifer N.; Hovelson, Daniel H.; Cani, Andi K.; Liu, Chia-Jen; Tomlins, Scott A.; Cho, Kathleen R.
Importance High-grade serous carcinoma (HGSC) is the most prevalent and lethal form of ovarian cancer. HGSCs frequently arise in the distal fallopian tubes rather than the ovary, developing from small precursor lesions called serous tubal intraepithelial carcinomas (TICs or more specifically STICs). While STICs have been reported to harbor TP53 mutations, detailed molecular characterizations of these lesions are lacking. Observations We performed targeted next generation sequencing (NGS) on formalin-fixed, paraffin-embedded tissue from four women, two with HGSC and two with uterine endometrioid carcinoma (UEC) who were diagnosed with synchronous STICs. We detected concordant mutations in both HGSCs with synchronous STICs, including TP53 mutations as well as assumed germline BRCA1/2 alterations, confirming a clonal relationship between these lesions. NGS confirmed the presence of a STIC clonally unrelated to one case of UEC. NGS of the other tubal lesion diagnosed as a STIC unexpectedly supported the lesion as a micrometastasis from the associated UEC. Conclusions and Relevance We demonstrate that targeted NGS can identify genetic lesions in minute lesions such as TICs, and confirm TP53 mutations as early driving events for HGSC. NGS also demonstrated unexpected relationships between presumed STICs and synchronous carcinomas, suggesting potential diagnostic and translational research applications. PMID:26181193
Onsongo, Getiria; Erdmann, Jesse; Spears, Michael D; Chilton, John; Beckman, Kenneth B; Hauge, Adam; Yohe, Sophia; Schomaker, Matthew; Bower, Matthew; Silverstein, Kevin A T; Thyagarajan, Bharat
The introduction of next generation sequencing (NGS) has revolutionized molecular diagnostics, though several challenges remain limiting the widespread adoption of NGS testing into clinical practice. One such difficulty includes the development of a robust bioinformatics pipeline that can handle the volume of data generated by high-throughput sequencing in a cost-effective manner. Analysis of sequencing data typically requires a substantial level of computing power that is often cost-prohibitive to most clinical diagnostics laboratories. To address this challenge, our institution has developed a Galaxy-based data analysis pipeline which relies on a web-based, cloud-computing infrastructure to process NGS data and identify genetic variants. It provides additional flexibility, needed to control storage costs, resulting in a pipeline that is cost-effective on a per-sample basis. It does not require the usage of EBS disk to run a sample. We demonstrate the validation and feasibility of implementing this bioinformatics pipeline in a molecular diagnostics laboratory. Four samples were analyzed in duplicate pairs and showed 100% concordance in mutations identified. This pipeline is currently being used in the clinic and all identified pathogenic variants confirmed using Sanger sequencing further validating the software.
Chen, Xue; Sheng, Xunlun; Liu, Xiaoxing; Li, Huiping; Liu, Yani; Rong, Weining; Ha, Shaoping; Liu, Wenzhou; Kang, Xiaoli; Zhao, Kanxing; Zhao, Chen
USH2A mutations have been implicated in the etiology of several inherited diseases, including Usher syndrome type 2 (USH2), nonsyndromic retinitis pigmentosa (RP), and nonsyndromic deafness. The complex genetic and phenotypic spectrums relevant to USH2A defects make it difficult to manage patients with such mutations. In the present study, we aim to determine the genetic etiology and to characterize the correlated clinical phenotypes for three Chinese pedigrees with nonsyndromic RP, one with RP sine pigmento (RPSP), and one with USH2. Family histories and clinical details for all included patients were reviewed. Ophthalmic examinations included best corrected visual acuities, visual field measurements, funduscopy, and electroretinography. Targeted next-generation sequencing (NGS) was applied using two sequence capture arrays to reveal the disease-causative mutations for each family. Genotype-phenotype correlations were also annotated. Seven USH2A mutations, including four missense substitutions (p.P2762A, p.G3320C, p.R3719H, and p.G4763R), two splice site variants (c.8223+1G>A and c.8559-2T>C), and a nonsense mutation (p.Y3745*), were identified as disease causative in the five investigated families, three of which reported consanguineous marriage. Among the seven mutations, six were novel, and one was recurrent. Two homozygous missense mutations (p.P2762A and p.G3320C) were found in one family, suggesting a potential double hit effect. Significant phenotypic divergences were revealed among the five families. Three of the five families were affected with early, moderate, or late onset RP, one with RPSP, and the other one with USH2. Our study expands the genotypic and phenotypic variability relevant to USH2A mutations, which provides clearer insight into the complex genetic and phenotypic spectrums relevant to USH2A defects and supports better management of patients with such mutations. We have also
Qiu, Biyuan; Ma, Tao; Peng, Chunyan; Zheng, Xiaoqin; Yang, Jiyun
The diagnosis of oculocutaneous albinism (OCA) is established using clinical signs and symptoms. OCA is, however, a highly genetically heterogeneous disease with mutations identified in at least nineteen unique genes, many of which produce overlapping phenotypic traits. Thus, differentiating genetic OCA subtypes for diagnoses and genetic counseling is challenging based on clinical presentation alone, and would benefit from a comprehensive molecular diagnostic. We aimed to develop and validate a more comprehensive, targeted, next-generation-sequencing-based diagnostic for the identification of OCA-causing variants. The genomic DNA samples from 28 OCA probands were analyzed by targeted next-generation sequencing (NGS), and the candidate variants were confirmed through Sanger sequencing. We observed mutations in the TYR, OCA2, and SLC45A2 genes in 25/28 (89%) patients with OCA. We identified 38 pathogenic variants among these three genes, including 5 novel variants: c.1970G>T (p.Gly657Val), c.1669A>C (p.Thr557Pro), c.2339-2A>C, and c.1349C>G (p.Thr450Arg) in OCA2; c.459_470delTTTTGCTGCCGA (p.Ala155_Phe158del) in SLC45A2. Our findings expand the mutational spectrum of OCA in the Chinese population, and the assay we developed should be broadly useful as a molecular diagnostic, and as an aid for genetic counseling for OCA patients.
Szymanski, Maciej; Karlowski, Wojciech M
In eukaryotes, ribosomal 5S rRNAs are products of multigene families organized within clusters of tandemly repeated units. Accumulation of genomic data obtained from a variety of organisms demonstrated that the potential 5S rRNA coding sequences show a large number of variants, often incompatible with folding into a correct secondary structure. Here, we present results of an analysis of a large set of short RNA sequences generated by the next generation sequencing techniques, to address the problem of heterogeneity of the 5S rRNA transcripts in Arabidopsis and identification of potentially functional rRNA-derived fragments.
Connor, Ashton A; Gallinger, Steven
Pancreatic ductal adenocarcinoma (PDAC) has the highest mortality rate of all epithelial malignancies and a paradoxically rising incidence rate. Clinical translation of next generation sequencing (NGS) of tumour and germline samples may ameliorate outcomes by identifying prognostic and predictive genomic and transcriptomic features in appreciable fractions of patients, facilitating enrolment in biomarker-matched trials. Areas covered: The literature on precision oncology is reviewed. It is found that outcomes may be improved across various malignancies, and it is suggested that current issues of adequate tissue acquisition, turnaround times, analytic expertise and clinical trial accessibility may lessen as experience accrues. Also reviewed are PDAC genomic and transcriptomic NGS studies, emphasizing discoveries of promising biomarkers, though these require validation, and the fraction of patients that will benefit from these outside of the research setting is currently unknown. Expert commentary: Clinical use of NGS with PDAC should be used in investigational contexts in centers with multidisciplinary expertise in cancer sequencing and pancreatic cancer management. Biomarker directed studies will improve our understanding of actionable genomic variation in PDAC, and improve outcomes for this challenging disease.
Xue, J J; Xue, J F; Xue, H Q; Guo, Y Y; Liu, Y; Ouyang, N
Albinism is a diverse group of hypopigmentary disorders caused by multiple genetic defects. The genetic diagnosis of patients affected with albinism by Sanger sequencing is often complex, expensive, and time-consuming. In this study, we performed targeted next-generation sequencing to screen 16 genes in a patient with albinism, and identified 21 genetic variants, including 19 known single nucleotide polymorphisms, one novel missense mutation (c.1456 G>A), and one disease-causing mutation (c.478 G>C). The novel mutation was not observed in 100 controls, and was predicted to be a damaging mutation by SIFT and Polyphen. Thus, we identified a novel mutation in SLC45A2 in a Chinese family, expanding the mutational spectrum of albinism. Our results also demonstrate that targeted next-generation sequencing is an effective genetic test for albinism.
Milicchio, Franco; Rose, Rebecca; Bian, Jiang; Min, Jae; Prosperi, Mattia
High-throughput or next-generation sequencing (NGS) technologies have become an established and affordable experimental framework in biological and medical sciences for all basic and translational research. Processing and analyzing NGS data is challenging. NGS data are big, heterogeneous, sparse, and error prone. Although a plethora of tools for NGS data analysis has emerged in the past decade, (i) software development is still lagging behind data generation capabilities, and (ii) there is a 'cultural' gap between the end user and the developer. Generic software template libraries specifically developed for NGS can help in dealing with the former problem, whilst coupling template libraries with visual programming may help with the latter. Here we scrutinize the state-of-the-art low-level software libraries implemented specifically for NGS and graphical tools for NGS analytics. An ideal developing environment for NGS should be modular (with a native library interface), scalable in computational methods (i.e. serial, multithread, distributed), transparent (platform-independent), interoperable (with external software interface), and usable (via an intuitive graphical user interface). These characteristics should facilitate both the run of standardized NGS pipelines and the development of new workflows based on technological advancements or users' needs. We discuss in detail the potential of a computational framework blending generic template programming and visual programming that addresses all of the current limitations. In the long term, a proper, well-developed (although not necessarily unique) software framework will bridge the current gap between data generation and hypothesis testing. This will eventually facilitate the development of novel diagnostic tools embedded in routine healthcare.
Martínez, Francisco; Caro-Llopis, Alfonso; Roselló, Mónica; Oltra, Silvestre; Mayo, Sonia; Monfort, Sandra; Orellana, Carmen
Intellectual disability is a very complex condition where more than 600 genes have been reported. Due to this extraordinary heterogeneity, a large proportion of patients remain without a specific diagnosis and genetic counselling. The need for new methodological strategies in order to detect a greater number of mutations in multiple genes is therefore crucial. In this work, we screened a large panel of 1256 genes (646 pathogenic, 610 candidate) by next-generation sequencing to determine the molecular aetiology of syndromic intellectual disability. A total of 92 patients, negative for previous genetic analyses, were studied together with their parents. Clinically relevant variants were validated by conventional sequencing. A definitive diagnosis was achieved in 29 families by testing the 646 known pathogenic genes. Mutations were found in 25 different genes, where only the genes KMT2D, KMT2A and MED13L were found mutated in more than one patient. A preponderance of de novo mutations was noted even among the X linked conditions. Additionally, seven de novo probably pathogenic mutations were found in the candidate genes AGO1, JARID2, SIN3B, FBXO11, MAP3K7, HDAC2 and SMARCC2. Altogether, this means a diagnostic yield of 39% of the cases (95% CI 30% to 49%). The developed panel proved to be efficient and suitable for the genetic diagnosis of syndromic intellectual disability in a clinical setting. Next-generation sequencing has the potential for high-throughput identification of genetic variations, although the challenges of an adequate clinical interpretation of these variants and the knowledge on further unknown genes causing intellectual disability remain to be solved. Published by the BMJ Publishing Group Limited.
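The reported yield can be checked with a short calculation. The sketch below assumes that the 39% figure combines the 29 definitive diagnoses with the 7 probably pathogenic candidate-gene findings (36 of 92 patients), and it uses the Wilson score interval; the abstract does not state which interval method the authors used, so the function name and the choice of method are assumptions made for illustration only.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (assumed method)."""
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half_width, centre + half_width


# Assumption: 29 diagnoses in known genes plus 7 in candidate genes, 92 patients.
diagnosed = 29 + 7
low, high = wilson_interval(diagnosed, 92)
print(f"yield = {diagnosed / 92:.0%}, 95% CI {low:.0%} to {high:.0%}")
```

Under these assumptions the interval rounds to the same bounds as those quoted in the abstract, which suggests, but does not prove, that a comparable method was used.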
Marroni, Fabio; Pinosio, Sara; Morgante, Michele
Next generation sequencing (NGS) instruments produce an unprecedented amount of sequence data at contained costs. This gives researchers the possibility of designing studies with adequate power to identify rare variants at a fraction of the economic and labor resources required by individual Sanger sequencing. As of today, few research groups working in plant sciences have exploited this potentiality, showing that pooled NGS provides results in excellent agreement with those obtained by individual Sanger sequencing. The aim of this review is to convey to the reader the general ideas underlying the use of pooled NGS for the identification of rare variants. To facilitate a thorough understanding of the possibilities of the method, we will explain in detail the possible experimental and analytical approaches and discuss their advantages and disadvantages. We will show that information on allele frequency obtained by pooled NGS can be used to accurately compute basic population genetics indexes such as allele frequency, nucleotide diversity, and Tajima's D. Finally, we will discuss applications and future perspectives of the multiplexed NGS approach.
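As a rough illustration of how pooled read counts translate into the population genetics indexes mentioned above, the following sketch estimates allele frequencies from read counts and a per-site nucleotide diversity. All function names, read counts, and the pool size are hypothetical. The estimator deliberately ignores sequencing error and the additional sampling variance introduced by drawing reads from a pool, which dedicated pooled-sequencing estimators correct for.

```python
def pooled_allele_frequency(alt_reads: int, total_reads: int) -> float:
    """Naive allele-frequency estimate: fraction of reads carrying the variant."""
    return alt_reads / total_reads


def nucleotide_diversity(freqs, n_chromosomes: int, n_sites: int) -> float:
    """Per-site pi: summed expected heterozygosity 2p(1-p) over variant sites,
    with the n/(n-1) small-sample correction, divided by the sites surveyed."""
    correction = n_chromosomes / (n_chromosomes - 1)
    return correction * sum(2 * p * (1 - p) for p in freqs) / n_sites


# Hypothetical data: (alternate reads, total reads) at two variant sites
# in a pool of 20 diploid individuals (40 chromosomes), 1000 sites surveyed.
counts = [(5, 100), (50, 100)]
freqs = [pooled_allele_frequency(alt, total) for alt, total in counts]
pi = nucleotide_diversity(freqs, n_chromosomes=40, n_sites=1000)
print(freqs, pi)
```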
Bardak, H; Gunay, M; Ercalik, Y; Bardak, Y; Ozbas, H; Bagci, O
Age-related macular degeneration (AMD) is the leading cause of blindness in developed countries. It is a complex disease with both genetic and environmental risk factors. To improve clinical management of this condition, it is important to develop risk assessment and prevention strategies for environmental influences, and establish a more effective treatment approach. The aim of the present study was to investigate age-related maculopathy susceptibility protein 2 (ARMS2) gene sequences among Turkish patients with exudative AMD. In addition to 39 advanced exudative AMD patients, 250 healthy individuals for whom exome sequencing data were available were included as a control group. Patients with a history of known environmental and systemic AMD risk factors were excluded. Genomic DNA was isolated from peripheral blood and analyzed using next-generation sequencing. All coding exons of the ARMS2 gene were assessed. Three different ARMS2 sequence variations (rs10490923, rs2736911, and rs10490924) were identified in both the patient and control group. Within the control group, two further ARMS2 gene variants (rs7088128 and rs36213074) were also detected. Logistic regression analysis revealed a relationship between the rs10490924 polymorphism and AMD in the Turkish population.
The discovery of prostate cancer biomarkers has been boosted by the advent of next-generation sequencing (NGS) technologies. Nevertheless, many challenges still exist in exploiting the flood of sequence data and translating them into routine diagnostics and prognosis of prostate cancer. Here we review the recent developments in prostate cancer biomarkers by high throughput sequencing technologies. We highlight some fundamental issues of translational bioinformatics and the potential use of cloud computing in NGS data processing for the improvement of prostate cancer treatment.
The application of next-generation sequencing (NGS) to characterize cancer genomes has resulted in the discovery of numerous genetic markers. Consequently, the number of markers that warrant routine screening in molecular diagnostic laboratories, often from limited tumor material, has increased. This increased demand has been difficult to manage by traditional low- and/or medium-throughput sequencing platforms. Massively parallel sequencing capabilities of NGS provide a much-needed alternative for mutation screening in multiple genes with a single low investment of DNA. However, implementation of NGS technologies, most of which are for research use only (RUO), in a diagnostic laboratory needs extensive validation in order to establish Clinical Laboratory Improvement Amendments (CLIA) and College of American Pathologists (CAP)-compliant performance characteristics. Here, we have reviewed approaches for validation of NGS technology for routine screening of tumors. We discuss the criteria for selecting gene markers to include in the NGS panel and the deciding factors for selecting target capture approaches and sequencing platforms. We also discuss challenges in result reporting, storage and retrieval of the voluminous sequencing data and the future potential of clinical NGS.
Mouatt, Julia Thidamarth Vilstrup
The sequencing of ancient DNA provides perspectives on the genetic history of past populations and extinct species. However, ancient DNA research presents specific limitations mostly due to DNA survival, damage and contamination. Yet with stringent laboratory procedures, the sensitivity of target enrichment methods and the massive throughput and latest advances within DNA sequencing, the field of ancient DNA has flourished in later years. Those advances have even enabled the sequencing of complete genomes from the past, moving the field into genomic sciences. In this thesis we have used these latest developments within ancient DNA research, including target enrichment capture and Next-Generation Sequencing, to address a range of evolutionary questions related to two major mammalian groups, equids and rodents. In particular we have resolved phylogenetic relationships within equids using complete mitochond...
Simbolo, Michele; Gottardi, Marisa; Corbo, Vincenzo; Fassan, Matteo; Mafficini, Andrea; Malpeli, Giorgio; Lawlor, Rita T.; Scarpa, Aldo
Histopathological samples are a treasure-trove of DNA for clinical research. However, the quality of DNA can vary depending on the source or extraction method applied. Thus a standardized and cost-effective workflow for the qualification of DNA preparations is essential to guarantee interlaboratory reproducible results. The qualification process consists of the quantification of double strand DNA (dsDNA) and the assessment of its suitability for downstream applications, such as high-throughput next-generation sequencing. We tested the two most frequently used instrumentations to define their role in this process: NanoDrop, based on UV spectroscopy, and Qubit 2.0, which uses fluorochromes specifically binding dsDNA. Quantitative PCR (qPCR) was used as the reference technique as it simultaneously assesses DNA concentration and suitability for PCR amplification. We used 17 genomic DNAs from 6 fresh-frozen (FF) tissues, 6 formalin-fixed paraffin-embedded (FFPE) tissues, 3 cell lines, and 2 commercial preparations. Intra- and inter-operator variability was negligible, and intra-methodology variability was minimal, while consistent inter-methodology divergences were observed. In fact, NanoDrop measured DNA concentrations higher than Qubit and its consistency with dsDNA quantification by qPCR was limited to high molecular weight DNA from FF samples and cell lines, where total DNA and dsDNA quantity virtually coincide. In partially degraded DNA from FFPE samples, only Qubit proved highly reproducible and consistent with qPCR measurements. Multiplex PCR amplifying 191 regions of 46 cancer-related genes was designated the downstream application, using 40 ng dsDNA from FFPE samples calculated by Qubit. All but one sample produced amplicon libraries suitable for next-generation sequencing. NanoDrop UV-spectrum verified contamination of the unsuccessful sample. In conclusion, as qPCR has high costs and is labor intensive, an alternative effective standard workflow for
Motivation: Next-generation sequencing generates large amounts of data affected by errors in the form of substitutions, insertions or deletions of bases. Error correction based on high-coverage information typically improves de novo assembly. Most existing tools can correct substitution errors only; some support insertions and deletions, but accuracy in many cases is low. Results: We present Karect, a novel error correction technique based on multiple alignment. Our approach supports substitution, insertion and deletion errors. It can handle non-uniform coverage as well as moderately covered areas of the sequenced genome. Experiments with data from Illumina, 454 FLX and Ion Torrent sequencing machines demonstrate that Karect is more accurate than previous methods, both in terms of correcting individual-base errors (up to 10% increase in accuracy gain) and post de novo assembly quality (up to 10% increase in NGA50). We also introduce an improved framework for evaluating the quality of error correction.
Dheilly, Nolwenn M; Adema, Coen; Raftos, David A; Gourbal, Benjamin; Grunau, Christoph; Du Pasquier, Louis
Next generation sequencing (NGS) allows for the rapid, comprehensive and cost effective analysis of entire genomes and transcriptomes. NGS provides approaches for immune response gene discovery, profiling gene expression over the course of parasitosis, studying mechanisms of diversification of immune receptors and investigating the role of epigenetic mechanisms in regulating immune gene expression and/or diversification. NGS will allow meaningful comparisons to be made between organisms from different taxa in an effort to understand the selection of diverse strategies for host defence under different environmental pathogen pressures. At the same time, it will reveal the shared and unique components of the immunological toolkit and basic functional aspects that are essential for immune defence throughout the living world. In this review, we argue that NGS will revolutionize our understanding of immune responses throughout the animal kingdom because the depth of information it provides will circumvent the need to concentrate on a few "model" species. Copyright © 2014 Elsevier Ltd. All rights reserved.
Belstrøm, Daniel; Paster, Bruce J; Fiehn, Nils-Erik
Identification using Next Generation Sequencing) for comparison of the salivary microbiota in patients with periodontitis, patients with dental caries, and orally healthy individuals. The hypothesis was that this method could add on to the existing knowledge on salivary bacterial profiles in oral health...... and disease. DESIGN: Stimulated saliva samples (n=30) were collected from 10 patients with untreated periodontitis, 10 patients with untreated dental caries, and 10 orally healthy individuals. Salivary microbiota was analyzed using HOMINGS and statistical analysis was performed using Kruskal-Wallis test...... with Benjamini-Hochberg's correction. RESULTS: From a total of 30 saliva samples, a mean number of probe targets of 205 (range 120-353) were identified, and a statistically significant higher mean number of targets was registered in samples from patients with periodontitis (mean 220, range 143-306) and dental...
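The Benjamini-Hochberg correction named in the design can be sketched in a few lines. The function name and the p-values below are hypothetical and are not taken from the study; they stand in for per-probe p-values such as those a Kruskal-Wallis test would produce across the three groups.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, so each adjusted value is the
    # minimum of p * m / rank over all ranks at or above the current one.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical per-probe p-values.
raw = [0.01, 0.04, 0.03, 0.005]
print(benjamini_hochberg(raw))
```

A probe would then be reported as significant when its adjusted value falls below the chosen false discovery rate, for example 0.05.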
Yohe, Sophia; Hauge, Adam; Bunjer, Kari; Kemmer, Teresa; Bower, Matthew; Schomaker, Matthew; Onsongo, Getiria; Wilson, Jon; Erdmann, Jesse; Zhou, Yi; Deshpande, Archana; Spears, Michael D; Beckman, Kenneth; Silverstein, Kevin A T; Thyagarajan, Bharat
Although next-generation sequencing (NGS) can revolutionize molecular diagnostics, several hurdles remain in the implementation of this technology in clinical laboratories. We set out to validate and implement an NGS panel for genetic diagnosis of more than 100 inherited diseases, such as neurologic conditions, congenital hearing loss and eye disorders, developmental disorders, nonmalignant diseases treated by hematopoietic cell transplantation, familial cancers, connective tissue disorders, metabolic disorders, disorders of sexual development, and cardiac disorders. The diagnostic gene panels ranged from 1 to 54 genes, with most panels containing 10 genes or fewer. We used a liquid hybridization-based, target-enrichment strategy to enrich 10 067 exons in 568 genes, followed by NGS with a HiSeq 2000 sequencing system (Illumina, San Diego, California). We successfully sequenced 97.6% (9825 of 10 067) of the targeted exons to obtain a minimum coverage of 20× at all bases. We demonstrated 100% concordance in detecting 19 pathogenic single-nucleotide variations and 11 pathogenic insertion-deletion mutations ranging in size from 1 to 18 base pairs across 18 samples that were previously characterized by Sanger sequencing. Using 4 pairs of blinded, duplicate samples, we demonstrated a high degree of concordance (>99%) among the blinded, duplicate pairs. We have successfully demonstrated the feasibility of using the NGS platform to multiplex genetic tests for several rare diseases and the use of cloud computing for bioinformatics analysis as a relatively low-cost solution for implementing NGS in clinical laboratories.
Barabaschi, Delfina; Tondelli, Alessandro; Desiderio, Francesca; Volante, Andrea; Vaccino, Patrizia; Valè, Giampiero; Cattivelli, Luigi
The genomic revolution of the past decade has greatly improved our understanding of the genetic make-up of living organisms. The sequencing of crop genomes has completely changed our vision and interpretation of genome organization and evolution. Re-sequencing allows the identification of an unlimited number of markers as well as the analysis of germplasm allelic diversity based on allele mining approaches. High throughput marker technologies coupled with advanced phenotyping platforms provide new opportunities for discovering marker-trait associations which can sustain genomic-assisted breeding. The availability of genome sequencing information is enabling genome editing (site-specific mutagenesis), to obtain gene sequences desired by breeders. This review illustrates how next generation sequencing-derived information can be used to tailor genomic tools for different breeders' needs to revolutionize crop improvement. Copyright © 2015 Elsevier Ireland Ltd. All rights reserved.
In functional metagenomics, BLAST homology search is a common method to classify metagenomic reads into protein/domain sequence families such as Clusters of Orthologous Groups of proteins (COGs) in order to quantify the abundance of each COG in the community. The resulting functional profile of the community is then used in downstream analysis to correlate the change in abundance to environmental perturbation, clinical variation, and so on. However, the short read length coupled with next-generation sequencing technologies poses a barrier in this approach, essentially because similarity significance cannot be discerned by searching with short reads. Consequently, artificial functional families are produced, and those with a large number of assigned reads dramatically decrease the accuracy of the functional profile. There is no method available to address this problem. We intended to fill this gap in this paper. We revealed that BLAST similarity scores of homologues for short reads from the coding sequences of COG protein members are distributed differently from the scores of those derived elsewhere. We showed that, by choosing an appropriate score cut-off, we are able to filter out most artificial families and simultaneously to preserve sufficient information in order to build the functional profile. We also showed that, by applying BLAST and RPS-BLAST together, some artificial families with large read counts can be further identified after the score cut-off filtration. Evaluated on three experimental metagenomic datasets with different coverages, we found that the proposed method is robust against read coverage and consistently outperforms the other E-value cut-off methods currently used in the literature.
Watters, Kyle E; Lucks, Julius B
Mapping RNA structure with selective 2'-hydroxyl acylation analyzed by primer extension (SHAPE) chemistry has proven to be a versatile method for characterizing RNA structure in a variety of contexts. SHAPE reagents covalently modify RNAs in a structure-dependent manner to create adducts at the 2'-OH group of the ribose backbone at nucleotides that are structurally flexible. The positions of these adducts are detected using reverse transcriptase (RT) primer extension, which stops one nucleotide before the modification, to create a pool of cDNAs whose lengths reflect the location of SHAPE modification. Quantification of the cDNA pools is used to estimate the "reactivity" of each nucleotide in an RNA molecule to the SHAPE reagent. High reactivities indicate nucleotides that are structurally flexible, while low reactivities indicate nucleotides that are inflexible. These SHAPE reactivities can then be used to infer RNA structures by restraining RNA structure prediction algorithms. Here, we provide a state-of-the-art protocol describing how to perform in vitro RNA structure probing with SHAPE chemistry using next-generation sequencing to quantify cDNA pools and estimate reactivities (SHAPE-Seq). The use of next-generation sequencing allows for higher throughput, more consistent data analysis, and multiplexing capabilities. The technique described herein, SHAPE-Seq v2.0, uses a universal reverse transcription priming site that is ligated to the RNA after SHAPE modification. The introduced priming site allows for the structural analysis of an RNA independent of its sequence.
Catherine Campbell on "Finishing and Special Motifs: Lessons learned from CRISPR analysis using next-generation draft sequences" at the 2012 Sequencing, Finishing, Analysis in the Future Meeting held June 5-7, 2012 in Santa Fe, New Mexico.
Several transcription factors (TFs) coordinate to regulate expression of specific genes at the transcriptional level. In Arabidopsis thaliana it is estimated that approximately 10% of all genes encode TFs or TF-like proteins. It is important to identify target genes that are directly regulated by TFs in order to understand the complete picture of a plant’s transcriptome profile. Here, we investigate the role of the LONG HYPOCOTYL5 (HY5) transcription factor that acts as a regulator of photomorphogenesis. We used an in vitro genomic DNA binding assay coupled with immunoprecipitation and next-generation sequencing (gDB-seq) instead of the in vivo chromatin immunoprecipitation (ChIP)-based methods. The results demonstrate that the HY5-binding motif predicted here was similar to the motif reported previously and that in vitro HY5-binding loci largely overlapped with the HY5-targeted candidate genes identified in previous ChIP-chip analysis. By combining these results with microarray analysis, we identified hundreds of HY5-binding genes that were differentially expressed in hy5. We also observed delayed induction of some transcripts of HY5-binding genes in hy5 mutants in response to blue-light exposure after dark treatment. Thus, an in vitro gDNA-binding assay coupled with sequencing is a convenient and powerful method to bridge the gap between identifying TF binding potential and establishing function.
Molecular characterization of genetically modified organisms, and the way transgenic biotechnologies are developed, now require full transparency to assess the risk to living modified and non-modified organisms. Next generation sequencing (NGS) methodology is suggested as an effective means of genome characterization and detection of transgenic insertion locations. In the present study, we applied NGS to characterize transgenic insertion loci, specifically for the epidermal growth factor (EGF), in genetically modified rice cells. A total of 29.3 Gb (~72× coverage) was sequenced with a 2 × 150 bp paired end method by Illumina HiSeq2500, and the reads were then mapped to the rice genome and T-vector sequence. Compatible read pairs were successfully mapped to 10 loci on the rice chromosomes, and the insertion locations of the vector sequences were validated by polymerase chain reaction (PCR) amplification. The EGF transgenic site was confirmed only on chromosome 4 by PCR. Results of this study demonstrated the success of NGS data in characterizing the rice genome. Bioinformatics analyses must be developed in association with NGS data to identify highly accurate transgenic sites.
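The sequencing figures quoted above are related by simple arithmetic: mean coverage is the number of sequenced bases divided by the genome size. The sketch below works backwards from the two stated values (29.3 Gb and about 72× coverage) to the genome size they imply. The abstract does not give the genome size the authors used, so the result is an inference from the quoted numbers, and the function names are illustrative only.

```python
def implied_genome_size(total_bases: float, coverage: float) -> float:
    """Genome size implied by total sequenced bases and mean coverage."""
    return total_bases / coverage


def read_pairs(total_bases: float, read_length: int) -> float:
    """Number of read pairs for paired-end reads of the given length."""
    return total_bases / (2 * read_length)


total_bases = 29.3e9   # 29.3 Gb, as stated in the abstract
coverage = 72          # approximate coverage, as stated in the abstract

size = implied_genome_size(total_bases, coverage)
pairs = read_pairs(total_bases, read_length=150)
print(f"implied genome size: {size / 1e6:.0f} Mb")
print(f"read pairs at 2 x 150 bp: {pairs / 1e6:.1f} million")
```

The implied size of roughly 400 Mb is in the range generally reported for the rice genome, so the two stated figures are mutually consistent.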
Novel DNA sequencing techniques, referred to as “next-generation” sequencing (NGS), provide high speed and throughput that can produce an enormous volume of sequences with many possible applications in research and diagnostic settings. In this article, we provide an overview of the many applications of NGS in diagnostic virology. NGS techniques have been used for high-throughput whole viral genome sequencing, such as sequencing of new influenza viruses, for detection of viral genome variability and evolution within the host, such as investigation of human immunodeficiency virus and human hepatitis C virus quasispecies, and monitoring of low-abundance antiviral drug-resistance mutations. NGS techniques have been applied to metagenomics-based strategies for the detection of unexpected disease-associated viruses and for the discovery of novel human viruses, including cancer-related viruses. Finally, the human virome in healthy and disease conditions has been described by NGS-based metagenomics.
Qing-Xuan Wang, En-Dong Chen, Ye-Feng Cai, Yi-Li Zhou, Zhou-Ci Zheng, Ying-Hao Wang, Yi-Xiang Jin, Wen-Xu Jin, Xiao-Hua Zhang, Ou-Chen Wang
Department of Oncology, The First Affiliated Hospital of Wenzhou Medical University, Wenzhou, Zhejiang Province, China
Purpose: Thyroid cancer is the most frequent malignancy of the endocrine system, and it has become the fastest growing type of cancer worldwide. Much still remains unknown about the molecular mechanisms of thyroid cancer. Studies have found a relationship between ARAP3 and human cancer. However, the role of ARAP3 in thyroid cancer has not been well explained. This study aimed to investigate the role of the ARAP3 gene in papillary thyroid carcinoma. Methods: Whole-exon and whole-genome sequencing of primary papillary thyroid carcinoma (PTC) samples and matched adjacent normal thyroid tissue samples was performed, and bioinformatics analysis was then carried out. PTC cell lines (TPC1, BCPAP, and KTC-1) transfected with small interfering RNA were used to investigate the functions of the ARAP3 gene, using cell proliferation, colony formation, migration, and invasion assays. Results: Using next-generation sequencing and bioinformatics analysis, we found that the ARAP3 gene may play an important role in thyroid cancer. Downregulation of ARAP3 significantly suppressed cell proliferation, colony formation, migration, and invasion in PTC cell lines (TPC1, BCPAP, and KTC-1). Conclusion: This study indicated that ARAP3 has important biological implications and may act as a potentially druggable target in PTC. Keywords: papillary thyroid carcinoma, next-generation sequence, ARAP3, oncogene
Hackenberg, Michael; Sturm, Martin; Langenberger, David; Falcón-Pérez, Juan Manuel; Aransay, Ana M
Next-generation sequencing now allows the sequencing of small RNA molecules and the estimation of their expression levels. Consequently, there will be high demand for bioinformatics tools to cope with the several gigabytes of sequence data generated in each deep-sequencing experiment. Given this situation, we developed miRanalyzer, a web server tool for the analysis of deep-sequencing experiments for small RNAs. The web server tool requires a simple input file containing a list of unique reads and their copy numbers (expression levels). Using these data, miRanalyzer (i) detects all known microRNA sequences annotated in miRBase, (ii) finds all perfect matches against other libraries of transcribed sequences and (iii) predicts new microRNAs. The prediction of new microRNAs is an especially important point as there are many species with very few known microRNAs. Therefore, we implemented a highly accurate machine learning algorithm for the prediction of new microRNAs that reaches AUC values of 97.9% and recall values of up to 75% on unseen data. The web tool summarizes all the described steps in a single output page, which provides a comprehensive overview of the analysis, adding links to more detailed output pages for each analysis module. miRanalyzer is available at http://web.bioinformatics.cicbiogune.es/microRNA/.
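The input described above, a list of unique reads with their copy numbers, can be illustrated with a minimal Python sketch. The function names and the tab-separated layout are assumptions made for illustration; the abstract does not specify the exact file format that miRanalyzer expects.

```python
from collections import Counter

def collapse_reads(reads):
    """Collapse raw reads into (sequence, copy_number) pairs.

    Reads are upper-cased and blank lines are skipped. Pairs are sorted
    by decreasing copy number, then alphabetically by sequence.
    """
    counts = Counter(r.strip().upper() for r in reads if r.strip())
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))

def to_tab_format(pairs):
    """Render pairs as 'sequence<TAB>count' lines (hypothetical layout)."""
    return "\n".join(f"{seq}\t{n}" for seq, n in pairs)

# Toy reads, invented for this example.
reads = [
    "TGAGGTAGTAGGTTGTATAGTT",
    "tgaggtagtaggttgtatagtt",
    "TAGCTTATCAGACTGATGTTGA",
    "",
]
pairs = collapse_reads(reads)
print(to_tab_format(pairs))
```

Collapsing identical reads before upload keeps the input small, because each distinct sequence appears once regardless of how often it was sequenced.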
Yin, Li; Yao, Jiqiang; Gardner, Brent P; Chang, Kaifen; Yu, Fahong; Goodenow, Maureen M
Next generation sequencing (NGS) applied to human papilloma viruses (HPV) can provide sensitive methods to investigate the molecular epidemiology of multiple-type HPV infection. Currently a genotyping system with a comprehensive collection of updated HPV reference sequences and a capacity to handle NGS data sets is lacking. HPV-QUEST was developed as an automated and rapid HPV genotyping system. The web-based HPV-QUEST subtyping algorithm was developed using HTML, PHP, and the Perl scripting language, with MySQL as the database backend. HPV-QUEST includes a database of annotated HPV reference sequences with updated nomenclature covering 5 genera, 14 species and 150 mucosal and cutaneous types to genotype blasted query sequences. HPV-QUEST processes up to 10 megabases of sequences within 1 to 2 minutes. Results are reported in HTML, text and Excel formats and display e-value, blast score, and local and coverage identities; provide genus, species, type, infection site and risk for the best matched reference HPV sequence; and produce results ready for additional analyses.
Microsatellites, or simple sequence repeats (SSRs), are among the most informative and multi-purpose genetic markers exploited in plant functional genomics. However, SSR discovery and development using traditional methods are laborious, time-consuming, and costly. Recently, the availability of high-throughput sequencing technologies has enabled researchers to identify a substantial number of microsatellites at lower cost and with less effort than traditional approaches. Illumina is a noteworthy transcriptome sequencing technology that is currently used in SSR marker development. Although 454 pyrosequencing datasets can be used for SSR development, this type of sequencing is no longer supported. This review aims to present an overview of next generation sequencing, with a focus on the efficient use of de novo transcriptome sequencing (RNA-Seq) and related tools for mining and development of microsatellites in plants.
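As a rough illustration of what mining microsatellites from sequence data involves, the following Python sketch scans a sequence for tandem repeats using regular expressions. It is not one of the tools covered by the review; the function name and the minimum repeat thresholds are assumptions chosen for the example.

```python
import re

# Hypothetical thresholds: motif length -> minimum number of tandem repeats.
DEFAULT_MIN_REPEATS = {2: 6, 3: 5, 4: 4}

def find_ssrs(seq, min_repeats=None):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs."""
    if min_repeats is None:
        min_repeats = DEFAULT_MIN_REPEATS
    seq = seq.upper()
    hits = []
    for unit_len, min_rep in sorted(min_repeats.items()):
        # A motif of unit_len bases followed by at least min_rep - 1 copies.
        pattern = re.compile(r"(([ACGT]{%d})\2{%d,})" % (unit_len, min_rep - 1))
        for m in pattern.finditer(seq):
            motif = m.group(2)
            # Skip motifs that are repeats of a shorter unit (e.g. AA, ATAT).
            if motif in (motif + motif)[1:-1]:
                continue
            hits.append((m.start(), motif, len(m.group(1)) // unit_len))
    return sorted(hits)

# Toy transcript fragment, invented for this example.
example = "GGC" + "AT" * 7 + "CCGT" + "CAG" * 5 + "TT"
print(find_ssrs(example))
```

Only perfect repeats are reported here; dedicated SSR tools typically also handle compound and interrupted repeats.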
Campopiano, Rosa; Ryskalin, Larisa; Giardina, Emiliano; Zampatti, Stefania; Busceti, Carla L; Biagioni, Francesca; Ferese, Rosangela; Storto, Marianna; Gambardella, Stefano; Fornai, Francesco
Amyotrophic lateral sclerosis (ALS) is a fatal neurodegenerative disease clinically characterized by upper and lower motor neuron dysfunction resulting in rapidly progressive paralysis and death from respiratory failure. Most cases appear to be sporadic, but 5-10% of cases have a family history of the disease, and over the last decade, identification of mutations in about 20 genes predisposing to these disorders has provided the means to better understand their pathogenesis. Next generation sequencing (NGS) is an advanced high-throughput DNA sequencing technology that has rapidly accelerated the discovery of genetic risk factors for both familial and sporadic neurological and neurodegenerative diseases. These strategies have allowed the rapid identification of disease-associated variants and genetic risk factors for both familial (fALS) and sporadic ALS (sALS), strongly contributing to the knowledge of the genetic architecture of ALS. Moreover, as the number of ALS genes grows, many of the proteins they encode turn out to be involved in intracellular processes shared with other known diseases, suggesting an overlap of clinical and pathological features between different diseases. To emphasize this concept, the review focuses on genes coding for valosin-containing protein (VCP) and two heterogeneous nuclear RNA-binding proteins (HNRNPA1 and hnRNPA2B1), recently identified through NGS, in which different mutations have been associated with both ALS and other neurological and neurodegenerative diseases.
Lu, Emily; Elizondo-Riojas, Miguel-Angel; Chang, Jeffrey T; Volk, David E
Next-generation sequencing results from bead-based aptamer libraries have demonstrated that traditional DNA/RNA alignment software is insufficient. This is particularly true for X-aptamers containing specialty bases (W, X, Y, Z, ...) that are identified by special encoding. Thus, we sought an automated program that uses the inherent design scheme of bead-based X-aptamers to create a hypothetical reference library and Markov modeling techniques to provide improved alignments. Aptaligner provides this feature as well as length error and noise level cutoff features, is parallelized to run on multiple central processing units (cores), and sorts sequences from a single chip into projects and subprojects.
Background: DNA copy number variations occur within populations and aberrations can cause disease. We sought to develop an improved lab-automatable, cost-efficient, accurate platform to profile DNA copy number. Results: We developed a sequencing-based assay of nuclear, mitochondrial, and telomeric DNA copy number that draws on the unbiased nature of next-generation sequencing and incorporates techniques developed for RNA expression profiling. To demonstrate this platform, we assayed UMC-11 cells using 5 million 33 nt reads and found tremendous copy number variation, including regions of single and homogeneous deletions and amplifications to 29 copies; 5 times more mitochondria and 4 times less telomeric sequence than a pool of non-diseased, blood-derived DNA; and that UMC-11 was derived from a male individual. Conclusion: The described assay outputs absolute copy number, outputs an error estimate (p-value), and is more accurate than array-based platforms at high copy number. The platform enables profiling of mitochondrial levels and telomeric length. The assay is lab-automatable and has a genomic resolution and cost that are tunable based on the number of sequence reads.
Vidal, Silvia; Brandi, Núria; Pacheco, Paola; Gerotina, Edgar; Blasco, Laura; Trotta, Jean-Rémi; Derdak, Sophia; Del Mar O'Callaghan, Maria; Garcia-Cazorla, Àngels; Pineda, Mercè; Armstrong, Judith
Rett syndrome (RTT) is an early-onset neurodevelopmental disorder that almost exclusively affects girls and is totally disabling. Three genes have been identified that cause RTT: MECP2, CDKL5 and FOXG1. However, the etiology in some RTT patients remains unknown. Recently, next generation sequencing (NGS) has advanced genetic diagnosis because of the speed and affordability of the method. To evaluate the usefulness of NGS in genetic diagnosis, we present the genetic study of RTT-like patients using different techniques based on this technology. We studied 1577 patients with RTT-like clinical diagnoses and reviewed patients who were previously studied and thought to have RTT genes by Sanger sequencing. A genetic diagnosis was reached in 477 of the 1577 patients with suspected RTT-like disease. Positive results were found in 30% by Sanger sequencing, 23% with a custom panel, 24% with a commercial panel and 32% with whole exome sequencing. A genetic study using NGS allows a larger number of genes associated with RTT-like symptoms to be studied simultaneously, extending genetic study to a wider group of patients and significantly reducing the response time and cost of the study.
Identification of driver mutations in lung adenocarcinoma has led to development of targeted agents that are already approved for clinical use or are in clinical trials. Therefore, the number of biomarkers that will need to be assessed is expected to increase rapidly. This calls for the implementation of methods probing the mutational status of multiple genes for inoperable cases, for which limited cytological or bioptic material is available. Cytology specimens from 38 lung adenocarcinomas were subjected to the simultaneous assessment of 504 mutational hotspots of 22 lung cancer-associated genes using 10 nanograms of DNA and Ion Torrent PGM next-generation sequencing. Thirty-six cases were successfully sequenced (95%). In 24/36 cases (67%) at least one mutated gene was observed, including EGFR, KRAS, PIK3CA, BRAF, TP53, PTEN, MET, SMAD4, FGFR3, STK11, MAP2K1. EGFR and KRAS mutations, found in 6/36 (16%) and 10/36 (28%) cases respectively, were mutually exclusive. Nine samples (25%) showed concurrent alterations in different genes. The next-generation sequencing test used is superior to current standard methodologies, as it interrogates multiple genes and requires limited amounts of DNA. Its applicability to routine cytology samples might allow a significant increase in the fraction of lung cancer patients eligible for personalized therapy.
Gomez-Escribano, Juan Pablo; Alt, Silke; Bibb, Mervyn J.
Like many fields of the biosciences, actinomycete natural products research has been revolutionised by next-generation DNA sequencing (NGS). Hundreds of new genome sequences from actinobacteria are made public every year, many of them as a result of projects aimed at identifying new natural products and their biosynthetic pathways through genome mining. Advances in these technologies in the last five years have meant not only a reduction in the cost of whole genome sequencing, but also a substantial increase in the quality of the data, having moved from obtaining a draft genome sequence comprised of several hundred short contigs, sometimes of doubtful reliability, to the possibility of obtaining an almost complete and accurate chromosome sequence in a single contig, allowing a detailed study of gene clusters and the design of strategies for refactoring and full gene cluster synthesis. The impact that these technologies are having in the discovery and study of natural products from actinobacteria, including those from the marine environment, is only starting to be realised. In this review we provide a historical perspective of the field, analyse the strengths and limitations of the most relevant technologies, and share the insights acquired during our genome mining projects. PMID:27089350
Romão, Daniela; Staley, Christopher; Ferreira, Filipa; Rodrigues, Raquel; Sabino, Raquel; Veríssimo, Cristina; Wang, Ping; Sadowsky, Michael; Brandão, João
A next-generation sequencing (NGS) approach, in conjunction with culture-based methods, was used to examine fungal and prokaryotic communities for the presence of potential pathogens in beach sands throughout Portugal. Culture-based fungal enumeration revealed low and variable concentrations of the species targeted (yeasts and dermatophytes), which were underrepresented in the community characterized by NGS targeting the ITS1 region. Conversely, NGS indicated that the potentially pathogenic species Purpureocillium lilacinum comprised nearly the entire fungal community. Culturable fecal indicator bacterial concentrations were low throughout the study and unrelated to communities characterized by NGS. Notably, the prokaryotic communities characterized revealed a considerable abundance of archaea. Results highlight differences in communities between methods in beach sand monitoring but indicate the techniques offer complementary insights. Thus, there is a need to combine culture-based methods with NGS methods, using a toolbox approach, to determine appropriate targets and metrics for beach sand monitoring to adequately protect public health. Copyright © 2017. Published by Elsevier Ltd.
Retinal dystrophies (RD) constitute a group of blinding diseases that are characterized by clinical variability and pronounced genetic heterogeneity. The different nonsyndromic and syndromic forms of RD can be attributed to mutations in more than 200 genes. Consequently, next generation sequencing (NGS) technologies are among the most promising approaches to identify mutations in RD. We screened a large cohort of patients comprising 89 independent cases and families with various subforms of RD applying different NGS platforms. While mutation screening in 50 cases was performed using a RD gene capture panel, 47 cases were analyzed using whole exome sequencing. One family was analyzed using whole genome sequencing. A detection rate of 61% was achieved including mutations in 34 known and two novel RD genes. A total of 69 distinct mutations were identified, including 39 novel mutations. Notably, genetic findings in several families were not consistent with the initial clinical diagnosis. Clinical reassessment resulted in refinement of the clinical diagnosis in some of these families and confirmed the broad clinical spectrum associated with mutations in RD genes.
Jespersen, Jakob S.; Petersen, Bent; Seguin-Orlando, Andaine
at identifying PfEMP1 features associated with high virulence. Here we present the first effective method for sequence analysis of var genes expressed in field samples: a sequential PCR and next generation sequencing based technique applied on expressed var sequence tags and subsequently on long range PCR......, encoded by ~60 highly variable 'var' genes per haploid genome. PfEMP1 is exported to the surface of infected erythrocytes and is thought to be fundamental to immune evasion by adhesion to host and parasite factors. The highly variable nature has constituted a roadblock in var expression studies aimed...
Background: Whole exome capture sequencing allows researchers to cost-effectively sequence the coding regions of the genome. Although exome capture sequencing methods have become routine and well established, there is currently a lack of tools specialized for variant calling in this type of data. Results: Using statistical models trained on validated whole-exome capture sequencing data, the Atlas2 Suite is an integrative variant analysis pipeline optimized for variant discovery on all three of the widely used next generation sequencing platforms (SOLiD, Illumina, and Roche 454). The suite employs logistic regression models in conjunction with user-adjustable cutoffs to accurately separate true SNPs and INDELs from sequencing and mapping errors with high sensitivity (96.7%). Conclusion: We have implemented the Atlas2 Suite and applied it to 92 whole exome samples from the 1000 Genomes Project. The Atlas2 Suite is available for download at http://sourceforge.net/projects/atlas2/. In addition to a command line version, the suite has been integrated into the Genboree Workbench, allowing biomedical scientists with minimal informatics expertise to remotely call, view, and further analyze variants through a simple web interface. The existing genomic databases displayed via the Genboree browser also streamline the process from variant discovery to functional genomics analysis, resulting in an off-the-shelf toolkit for the broader community.
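The general idea of pairing a logistic regression score with a user-adjustable cutoff can be sketched in a few lines of Python. This is not the Atlas2 implementation: the feature names, the coefficients, and the default cutoff below are all invented for illustration, whereas a real model would learn its coefficients from validated training data.

```python
import math

# Hypothetical coefficients; not taken from the Atlas2 Suite.
WEIGHTS = {
    "intercept": -4.0,
    "variant_reads": 0.6,       # reads supporting the variant allele
    "mean_base_quality": 0.1,   # mean quality of the variant bases
    "strand_bias": -2.0,        # 0 = balanced strands, 1 = one strand only
}

def snp_probability(features):
    """Logistic model: estimated probability that a candidate is a true SNP."""
    z = WEIGHTS["intercept"] + sum(WEIGHTS[k] * v for k, v in features.items())
    return 1.0 / (1.0 + math.exp(-z))

def call_variant(features, cutoff=0.5):
    """Accept the candidate when its probability reaches the chosen cutoff."""
    return snp_probability(features) >= cutoff

strong = {"variant_reads": 10, "mean_base_quality": 30, "strand_bias": 0.1}
weak = {"variant_reads": 1, "mean_base_quality": 15, "strand_bias": 0.9}
print(call_variant(strong), call_variant(weak))
```

Raising the cutoff trades sensitivity for specificity, which is why exposing it to the user is a reasonable design choice.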
USH2A mutations have been implicated in the etiology of several inherited diseases, including Usher syndrome type 2 (USH2), nonsyndromic retinitis pigmentosa (RP), and nonsyndromic deafness. The complex genetic and phenotypic spectra associated with USH2A defects make it difficult to manage patients with such mutations. In the present study, we aimed to determine the genetic etiology and to characterize the correlated clinical phenotypes for three Chinese pedigrees with nonsyndromic RP, one with RP sine pigmento (RPSP), and one with USH2. Family histories and clinical details for all included patients were reviewed. Ophthalmic examinations included best corrected visual acuities, visual field measurements, funduscopy, and electroretinography. Targeted next-generation sequencing (NGS) was applied using two sequence capture arrays to reveal the disease-causing mutations for each family. Genotype-phenotype correlations were also annotated. Seven USH2A mutations, including four missense substitutions (p.P2762A, p.G3320C, p.R3719H, and p.G4763R), two splice site variants (c.8223+1G>A and c.8559-2T>C), and a nonsense mutation (p.Y3745*), were identified as disease-causing in the five investigated families, three of which reported consanguineous marriage. Among the seven mutations, six were novel and one was recurrent. Two homozygous missense mutations (p.P2762A and p.G3320C) were found in one family, suggesting a potential double-hit effect. Significant phenotypic divergence was revealed among the five families: three were affected with early, moderate, or late onset RP, one with RPSP, and one with USH2. Our study expands the known genotypic and phenotypic variability associated with USH2A mutations, which provides clearer insight into the complex genetic and phenotypic spectra of USH2A defects and may support better management of patients with such mutations. We have
Venco, Francesco; Vaskin, Yuriy; Ceol, Arnaud; Muller, Heiko
Life-science laboratories make increasing use of Next Generation Sequencing (NGS) for studying bio-macromolecules and their interactions. Array-based methods for measuring gene expression or protein-DNA interactions are being replaced by RNA-Seq and ChIP-Seq. Sequencing is generally performed by specialized facilities that have to keep track of sequencing requests, trace samples, ensure quality and make data available according to predefined privileges. An integrated tool helps to troubleshoot problems, to maintain a high quality standard, and to reduce time and costs. Commercial and non-commercial tools called LIMS (Laboratory Information Management Systems) are available for this purpose. However, they often come at prohibitive cost and/or lack the flexibility and scalability needed to adjust seamlessly to the frequently changing protocols employed. In order to manage the flow of sequencing data produced at the Genomic Unit of the Italian Institute of Technology (IIT), we developed SMITH (Sequencing Machine Information Tracking and Handling). SMITH is a web application with a MySQL server at the backend. SMITH was developed by wet-lab scientists of the Centre for Genomic Science and database experts from the Politecnico of Milan in the context of a Genomic Data Model Project. The database schema stores all the information of an NGS experiment, including the descriptions of all protocols and algorithms used in the process. Notably, an attribute-value table allows an unconstrained textual description to be associated with each sample and all the data produced afterwards. This method permits the creation of metadata that can be used to search the database for specific files as well as for statistical analyses. SMITH runs automatically and limits direct human interaction mainly to administrative tasks. SMITH data-delivery procedures were standardized, making it easier for biologists and analysts to navigate the data. Automation also helps save time. The workflows are available
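The attribute-value table mentioned above can be illustrated with a small Python sketch. It uses the standard-library sqlite3 module as a stand-in for the MySQL backend, and the table names, column names, and sample values are hypothetical rather than taken from the SMITH schema.

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL server (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sample (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
-- One row per (sample, attribute) pair: new kinds of metadata
-- need no schema change, only new rows.
CREATE TABLE sample_attribute (
    sample_id INTEGER NOT NULL REFERENCES sample(id),
    attribute TEXT NOT NULL,
    value     TEXT NOT NULL
);
""")
conn.executemany("INSERT INTO sample (id, name) VALUES (?, ?)",
                 [(1, "S1"), (2, "S2")])
conn.executemany("INSERT INTO sample_attribute VALUES (?, ?, ?)", [
    (1, "assay", "ChIP-Seq"),
    (1, "antibody", "H3K4me3"),
    (2, "assay", "RNA-Seq"),
])

def find_samples(conn, attribute, value):
    """Return names of samples carrying the given attribute/value pair."""
    rows = conn.execute(
        "SELECT s.name FROM sample s "
        "JOIN sample_attribute a ON a.sample_id = s.id "
        "WHERE a.attribute = ? AND a.value = ? ORDER BY s.name",
        (attribute, value),
    )
    return [r[0] for r in rows]

print(find_samples(conn, "assay", "ChIP-Seq"))
```

The trade-off of this layout is flexibility against type safety: any textual description can be attached to a sample, but values are stored as text and must be validated by the application.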
Roh, Seong Woon; Abell, Guy C J; Kim, Kyoung-Ho; Nam, Young-Do; Bae, Jin-Woo
Recent advances in molecular biology have resulted in the application of DNA microarrays and next-generation sequencing (NGS) technologies to the field of microbial ecology. This review aims to examine the strengths and weaknesses of each of the methodologies, including depth and ease of analysis, throughput and cost-effectiveness. It also intends to highlight the optimal application of each of the individual technologies toward the study of a particular environment and identify potential synergies between the two main technologies, whereby both sample number and coverage can be maximized. We suggest that the efficient use of microarray and NGS technologies will allow researchers to advance the field of microbial ecology, and importantly, improve our understanding of the role of microorganisms in their various environments.
Rieneck, Klaus; Bak, Mads; Jønson, Lars
, Illumina); several millions of PCR sequences were analyzed. RESULTS: The results demonstrated the feasibility of diagnosing the fetal KEL1 or KEL2 blood group from cell-free DNA purified from maternal plasma. CONCLUSION: This method requires only one primer pair, and the large amount of sequence...... information obtained allows well for statistical analysis of the data. This general approach can be integrated into current laboratory practice and has numerous applications. Besides DNA-based predictions of blood group phenotypes, platelet phenotypes, or sickle cell anemia, and the determination of zygosity...
Farhat, Maha; Shaheed, Raja A; Al-Ali, Haider H; Al-Ghamdi, Abdullah S; Al-Hamaqi, Ghadeer M; Maan, Hawraa S; Al-Mahfoodh, Zainab A; Al-Seba, Hussain Z
To investigate the presence of Legionella spp in cooling tower water. Legionella proliferation in cooling tower water has serious public health implications as it can be transmitted to humans via aerosols and cause Legionnaires' disease. Samples of cooling tower water were collected from King Fahd Hospital of the University (KFHU) (Imam Abdulrahman Bin Faisal University, 2015/2016). The water samples were analyzed by a standard Legionella culture method, real-time polymerase chain reaction (RT-PCR), and 16S rRNA next-generation sequencing. In addition, the bacterial community composition was evaluated. All samples were negative by conventional Legionella culture. In contrast, all water samples yielded positive results by real-time PCR (10^5 to 10^6 GU/L). The results of 16S rRNA next generation sequencing showed high similarity and reproducibility among the water samples. The majority of sequences were Alpha-, Beta-, and Gamma-proteobacteria, and Legionella was the predominant genus. The hydrogen-oxidizing gram-negative bacterium Hydrogenophaga was present at high abundance, indicating high metabolic activity. Sphingopyxis, which is known for its resistance to antimicrobials and as a pioneer in biofilm formation, was also detected. Our findings indicate that monitoring of Legionella in cooling tower water would be enhanced by use of both conventional culturing and molecular methods.
Penile cancer (PeCa) is a relatively rare tumor entity but has higher morbidity and mortality rates, especially in developing countries. To date, the pathogenic signaling pathways and core machineries involved in tumorigenesis and progression of PeCa remain to be elucidated. Several studies have suggested that miRNAs, which modulate gene expression at the posttranscriptional level, are frequently mis-regulated and aberrantly expressed in human cancers. However, the miRNA profile in human PeCa has not been reported before. In the present study, the miRNA profile was obtained from 10 fresh penile cancerous tissues and matched adjacent non-cancerous tissues via next-generation sequencing. As a result, a total of 751 and 806 annotated miRNAs were identified in normal and cancerous penile tissues, respectively. Among these, 56 miRNAs with significantly different expression levels between paired tissues were identified. Subsequently, several annotated miRNAs were selected randomly and validated using quantitative real-time PCR. Compared with previous publications on altered miRNA expression in various cancers, especially genitourinary (prostate, bladder, kidney, testis) cancers, the great majority of deregulated miRNAs showed a similar expression pattern in penile cancer. Moreover, bioinformatics analyses suggested that the putative target genes of miRNAs differentially expressed between cancerous and matched normal penile tissues were tightly associated with cell junction, proliferation, growth, and genomic instability, by modulating the Wnt, MAPK, p53, PI3K-Akt, Notch and TGF-β signaling pathways, all of which are well established to participate in cancer initiation and progression. Our work presents a global view of the differentially expressed miRNAs and potentially regulatory networks of their target genes for clarifying the pathogenic transformation of normal penis to PeCa, which research resource also
The discovery of genetic factors behind an increasing number of human diseases, together with growing public education in genetics, is rapidly increasing the demand for genetic testing. However, traditional genetic testing methods cannot meet all of these requirements. Next generation seq...
Warnke-Sommer, Julia; Ali, Hesham
The assembly of Next Generation Sequencing (NGS) reads remains a challenging task. This is especially true for the assembly of metagenomics data that originate from environmental samples potentially containing hundreds to thousands of unique species. The principal objective of current assembly tools is to assemble NGS reads into contiguous stretches of sequence called contigs while maximizing both accuracy and contig length. The end goal of this process is to produce longer contigs with the major focus being on assembly only. Sequence read assembly is an aggregative process, during which read overlap relationship information is lost as reads are merged into longer sequences or contigs. The assembly graph is information rich and capable of capturing the genomic architecture of an input read data set. We have developed a novel hybrid graph in which nodes represent sequence regions at different levels of granularity. This model, utilized in the assembly and analysis pipeline Focus, presents a concise yet feature rich view of a given input data set, allowing for the extraction of biologically relevant graph structures for graph mining purposes. Focus was used to create hybrid graphs to model metagenomics data sets obtained from the gut microbiomes of five individuals with Crohn's disease and eight healthy individuals. Repetitive and mobile genetic elements are found to be associated with hybrid graph structure. Using graph mining techniques, a comparative study of the Crohn's disease and healthy data sets was conducted with focus on antibiotic resistance genes associated with transposase genes. Results demonstrated significant differences in the phylogenetic distribution of categories of antibiotic resistance genes in the healthy and diseased patients. Focus was also evaluated as a pure assembly tool and produced excellent results when compared against the Meta-velvet, Omega, and UD-IDBA assemblers. Mining the hybrid graph can reveal biological phenomena captured
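To make concrete the read overlap relationships that the abstract says are lost during assembly, here is a minimal Python sketch of a plain overlap graph. It is not the hybrid graph used by Focus; the function names, the minimum overlap of 3 bases, and the toy reads are assumptions for illustration only.

```python
def overlap_len(a, b, min_overlap):
    """Length of the longest suffix of a that equals a prefix of b.

    Returns 0 when no overlap of at least min_overlap bases exists.
    """
    for k in range(min(len(a), len(b)), min_overlap - 1, -1):
        if a[-k:] == b[:k]:
            return k
    return 0

def build_overlap_graph(reads, min_overlap=3):
    """Directed graph: graph[a][b] = overlap length from read a into read b."""
    graph = {r: {} for r in reads}
    for a in reads:
        for b in reads:
            if a != b:
                k = overlap_len(a, b, min_overlap)
                if k:
                    graph[a][b] = k
    return graph

# Toy reads, invented for this example.
reads = ["ACGTAC", "TACGGA", "GGATTT"]
graph = build_overlap_graph(reads)
for src, edges in graph.items():
    print(src, "->", edges)
```

This all-against-all comparison is quadratic in the number of reads, so it only serves to show the data structure; production assemblers rely on indexing to find overlaps at scale.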
Schulz, Wade L; Tormey, Christopher A; Torres, Richard
Next generation sequencing (NGS) has become a common technology in the clinical laboratory, particularly for the analysis of malignant neoplasms. However, most mutations identified by NGS are variants of unknown clinical significance (VOUS). Although the approach to define these variants differs by institution, software algorithms that predict variant effect on protein function may be used. However, these algorithms commonly generate conflicting results, potentially adding uncertainty to interpretation. In this review, we examine several computational tools used to predict whether a variant has clinical significance. In addition to describing the role of these tools in clinical diagnostics, we assess their efficacy in analyzing known pathogenic and benign variants in hematologic malignancies. Copyright© by the American Society for Clinical Pathology (ASCP).
Xiao, Jianping; Guo, Xueqin; Wang, Yong
Purpose: To identify disease-causing mutations in a Chinese patient with retinitis pigmentosa (RP). Methods: A detailed clinical examination was performed on the proband. Targeted next-generation sequencing (NGS) combined with bioinformatics analysis was performed on the proband to detect candidate...
Quail Michael A
Background: Next generation sequencing (NGS) technology has revolutionized genomic and genetic research. The pace of change in this area is rapid with three major new sequencing platforms having been released in 2011: Ion Torrent’s PGM, Pacific Biosciences’ RS and the Illumina MiSeq. Here we compare the results obtained with those platforms to the performance of the Illumina HiSeq, the current market leader. In order to compare these platforms, and get sufficient coverage depth to allow meaningful analysis, we have sequenced a set of 4 microbial genomes with mean GC content ranging from 19.3 to 67.7%. Together, these represent a comprehensive range of genome content. Here we report our analysis of that sequence data in terms of coverage distribution, bias, GC distribution, variant detection and accuracy. Results: Sequence generated by Ion Torrent, MiSeq and Pacific Biosciences technologies displays near perfect coverage behaviour on GC-rich, neutral and moderately AT-rich genomes, but a profound bias was observed upon sequencing the extremely AT-rich genome of Plasmodium falciparum on the PGM, resulting in no coverage for approximately 30% of the genome. We analysed the ability to call variants from each platform and found that we could call slightly more variants from Ion Torrent data compared to MiSeq data, but at the expense of a higher false positive rate. Variant calling from Pacific Biosciences data was possible but higher coverage depth was required. Context specific errors were observed in both PGM and MiSeq data, but not in that from the Pacific Biosciences platform. Conclusions: All three fast turnaround sequencers evaluated here were able to generate usable sequence. However, there are key differences between the quality of that data and the applications it will support.
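Because the comparison above hinges on GC content, a short Python sketch of how that quantity is computed may help. The function names and the window size are assumptions for illustration and are not taken from the study.

```python
def gc_content(seq):
    """Percentage of G and C among the unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        raise ValueError("sequence contains no A, C, G or T bases")
    return 100.0 * (seq.count("G") + seq.count("C")) / acgt

def gc_windows(seq, size):
    """GC percentage for consecutive non-overlapping windows of `size` bases.

    A trailing partial window is ignored.
    """
    return [gc_content(seq[i:i + size])
            for i in range(0, len(seq) - size + 1, size)]

# Toy sequence, invented for this example.
print(gc_content("ATGC"))
print(gc_windows("AAAAGGGGATGC", 4))
```

Plotting per-window GC percentage against per-window read depth is a common way to look for the kind of coverage bias the abstract reports for extremely AT-rich genomes.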
Lyu, Yuqiang; Huang, Jing; Zhang, Kaihui; Liu, Guohua; Gao, Min; Gai, Zhongtao; Liu, Yi
To explore the clinical and genetic features of a Chinese boy with oculocutaneous albinism. The clinical features of the patient were analyzed. The DNA of the patient and his parents was extracted and sequenced by next generation exome capture sequencing. The nature and impact of the detected mutations were predicted and validated. The child displayed strabismus, poor vision, nystagmus and brown hair. DNA sequencing showed that the patient carried compound heterozygous mutations of the TYRP1 gene, namely c.1214C>A (p.T405N) and c.1333dupG, which were inherited from his mother and father, respectively. Neither mutation had been reported previously. The child has oculocutaneous albinism type Ⅲ caused by mutations of the TYRP1 gene.
Cao, Yu; Fanning, Séamus; Proos, Sinéad; Jordan, Kieran; Srikumar, Shabarinath
The development of next generation sequencing (NGS) techniques has enabled researchers to study and understand the world of microorganisms from broader and deeper perspectives. The contemporary advances in DNA sequencing technologies have not only enabled finer characterization of bacterial genomes but also provided deeper taxonomic identification of complex microbiomes. A microbiome, in its genomic essence, is the combined genetic material of the microorganisms inhabiting an environment, whether the environment be a particular body econiche (e.g., human intestinal contents) or a food manufacturing facility econiche (e.g., floor drain). To date, 16S rDNA sequencing, metagenomics and metatranscriptomics are the three basic sequencing strategies used in the taxonomic identification and characterization of food-related microbiomes. These sequencing strategies have used different NGS platforms for DNA and RNA sequence identification. Traditionally, 16S rDNA sequencing has played a key role in understanding the taxonomic composition of a food-related microbiome. Recently, metagenomic approaches have resulted in improved understanding of a microbiome by providing a species-level/strain-level characterization. Further, metatranscriptomic approaches have contributed to the functional characterization of the complex interactions between different microbial communities within a single microbiome. Many studies have highlighted the use of NGS techniques in investigating the microbiome of fermented foods. However, the utilization of NGS techniques in studying the microbiome of non-fermented foods is limited. This review provides a brief overview of the advances in DNA sequencing chemistries as the technology progressed through the first, next and third generations and highlights how NGS provided a deeper understanding of food-related microbiomes with special focus on non-fermented foods. PMID:29033905
Bonatelli, Isabel A S; Carstens, Bryan C; Moraes, Evandro M
Microsatellite markers (also known as SSRs, Simple Sequence Repeats) are widely used in plant science and are among the most informative molecular markers for population genetic investigations, but the development of such markers presents substantial challenges. In this report, we discuss how next generation sequencing can replace the cloning, Sanger sequencing, identification of polymorphic loci, and testing of cross-amplification that were previously required to develop microsatellites. We report the development of a large set of microsatellite markers for five species of the Neotropical cactus genus Pilosocereus using restriction-site-associated DNA sequencing (RAD-seq) on a Roche 454 platform. We identified an average of 165 microsatellites per individual, with the absolute numbers across individuals proportional to the sequence reads obtained per individual. Frequency distribution of the repeat units was similar in the five species, with shorter motifs such as di- and trinucleotide being the most abundant repeats. In addition, we provide 72 microsatellites that could be potentially amplified in the sampled species and 22 polymorphic microsatellites validated in two populations of the species Pilosocereus machrisii. Low coverage sequencing among individuals was observed for most of the loci, which we suggest is more related to the nature of the microsatellite markers and the possible bias introduced by the restriction enzymes than to the genome size. Nevertheless, our work demonstrates that an NGS approach is an efficient method to isolate multispecies microsatellites even in non-model organisms.
Pawlowski, Jan; Esling, Philippe; Lejzerowicz, Franck
This report presents the study of foraminiferal and metazoan benthic community based on next-generation sequencing (NGS) of environmental DNA and RNA (eDNA/RNA). The objective of this study was to test the application of NGS assays for benthic monitoring of salmon farms in Norway, in order to ove...
Sequence-Based Genotyping of Expressed Swine Leukocyte Antigen Class I Alleles by Next-Generation Sequencing Reveal Novel Swine Leukocyte Antigen Class I Haplotypes and Alleles in Belgian, Danish, and Kenyan Fattening Pigs and Göttingen Minipigs
Sørensen, Maria Rathmann; Ilsøe, Mette; Strube, Mikael Lenz
for the prediction of epitope binding in pigs. The low number of known SLA class I alleles and the limited knowledge of their prevalence in different pig breeds emphasizes the need for efficient SLA typing methods. This study utilizes an SLA class I-typing method based on next-generation sequencing of barcoded PCR...
Hu, Bo; Ji, Yuan; Xu, Yaomin; Ting, Angela H
Allele-specific methylation (ASM) has long been studied but mainly documented in the context of genomic imprinting and X chromosome inactivation. Taking advantage of the next-generation sequencing technology, we conduct a high-throughput sequencing experiment with four prostate cell lines to survey the whole genome and identify single nucleotide polymorphisms (SNPs) with ASM. A Bayesian approach is proposed to model the counts of short reads for each SNP conditional on its genotypes of multiple subjects, leading to a posterior probability of ASM. We flag SNPs with high posterior probabilities of ASM by accounting for multiple comparisons based on posterior false discovery rates. Applying the Bayesian approach to the in-house prostate cell line data, we identify 269 SNPs as candidates of ASM. A simulation study is carried out to demonstrate the quantitative performance of the proposed approach.
Hye Suck An
Mytilus coruscus (family Mytilidae) is one of the most important marine shellfish species in Korea. During the past few decades, this species has become endangered due to the loss of habitats and overfishing. Despite this species’ importance, information on its genetic background is scarce. In this study, we developed microsatellite markers for M. coruscus using next-generation sequencing. A total of 263,900 raw reads were obtained from a quarter-plate run on the 454 GS-FLX titanium platform, and 176,327 unique sequences were generated with an average length of 381 bp; 2569 (1.45%) sequences contained a minimum of five di- to tetra-nucleotide repeat motifs. Of the 51 loci screened, 46 were amplified successfully, and 22 were polymorphic among 30 individuals, including seven trinucleotide-repeat and three tetranucleotide-repeat loci. All loci exhibited high genetic variability, with an average of 17.32 alleles per locus, and the mean observed and expected heterozygosities were 0.67 and 0.90, respectively. In addition, cross-amplification was tested for all 22 loci in a congeneric species, M. galloprovincialis. None of the primer pairs resulted in effective amplification, which might be due to their high mutation rates. Our work demonstrated the utility of next-generation 454 sequencing as a method for the rapid and cost-effective identification of microsatellites. The high degree of polymorphism exhibited by the 22 newly developed microsatellites will be useful in future conservation genetic studies of this species.
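The observed and expected heterozygosities quoted above can be illustrated with a minimal sketch. It assumes diploid genotypes stored as allele pairs and uses the uncorrected gene diversity 1 - sum(p_i^2); the function names and the toy genotypes are hypothetical, and the study may have used a different (for example, sample-size corrected) estimator.

```python
from collections import Counter


def observed_heterozygosity(genotypes):
    """Fraction of individuals whose two alleles differ at the locus."""
    het = sum(1 for a, b in genotypes if a != b)
    return het / len(genotypes)


def expected_heterozygosity(genotypes):
    """Uncorrected gene diversity: 1 - sum(p_i^2) over allele frequencies."""
    counts = Counter(allele for pair in genotypes for allele in pair)
    total = sum(counts.values())
    return 1.0 - sum((n / total) ** 2 for n in counts.values())


# Hypothetical genotypes at one locus (allele sizes in bp); not study data.
genotypes = [(150, 154), (150, 150), (154, 158), (158, 158)]
print(observed_heterozygosity(genotypes))
print(expected_heterozygosity(genotypes))
```

An observed value well below the expected one, as in the reported means of 0.67 versus 0.90, indicates fewer heterozygotes than the allele frequencies alone would predict.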
Ip, Hon S.; Wiley, Michael R.; Long, Renee; Gustavo, Palacios; Shearn-Bochsler, Valerie; Whitehouse, Chris A.
Advances in massively parallel DNA sequencing platforms, commonly termed next-generation sequencing (NGS) technologies, have greatly reduced time, labor, and cost associated with DNA sequencing. Thus, NGS has become a routine tool for new viral pathogen discovery and will likely become the standard for routine laboratory diagnostics of infectious diseases in the near future. This study demonstrated the application of NGS for the rapid identification and characterization of a virus isolated from the brain of an endangered Mississippi sandhill crane. This bird was part of a population restoration effort and was found in an emaciated state several days after Hurricane Isaac passed over the refuge in Mississippi in 2012. Post-mortem examination had identified trichostrongyliasis as the possible cause of death, but because a virus with morphology consistent with a togavirus was isolated from the brain of the bird, an arboviral etiology was strongly suspected. Because individual molecular assays for several known arboviruses were negative, unbiased NGS by Illumina MiSeq was used to definitively identify and characterize the causative viral agent. Whole genome sequencing and phylogenetic analysis revealed the viral isolate to be the Highlands J virus, a known avian pathogen. This study demonstrates the use of unbiased NGS for the rapid detection and characterization of an unidentified viral pathogen and the application of this technology to wildlife disease diagnostics and conservation medicine.
Giefing, M; Wierzbicka, M; Szyfter, K
of the discovery and functional impact of recurrent genetic lesions that are likely to influence the management of this disease in the near future. This manuscript integrates genetic data from publicly available array comparative genome hybridization (aCGH) and next-generation sequencing genetics databases...
O’Donovan, Brian D.; Gelfand, Jeffrey M.; Sample, Hannah A.; Chow, Felicia C.; Betjemann, John P.; Shah, Maulik P.; Richie, Megan B.; Gorman, Mark P.; Hajj-Ali, Rula A.; Calabrese, Leonard H.; Zorn, Kelsey C.; Chow, Eric D.; Greenlee, John E.; Blum, Jonathan H.; Green, Gary; Khan, Lillian M.; Banerji, Debarko; Langelier, Charles; Bryson-Cahn, Chloe; Harrington, Whitney; Lingappa, Jairam R.; Shanbhag, Niraj M.; Green, Ari J.; Brew, Bruce J.; Soldatos, Ariane; Strnad, Luke; Doernberg, Sarah B.; Jay, Cheryl A.; Douglas, Vanja; Josephson, S. Andrew; DeRisi, Joseph L.
Importance: Identifying infectious causes of subacute or chronic meningitis can be challenging. Enhanced, unbiased diagnostic approaches are needed. Objective: To present a case series of patients with diagnostically challenging subacute or chronic meningitis using metagenomic next-generation sequencing (mNGS) of cerebrospinal fluid (CSF) supported by a statistical framework generated from mNGS of control samples from the environment and from patients who were noninfectious. Design, Setting, and Participants: In this case series, mNGS data obtained from the CSF of 94 patients with noninfectious neuroinflammatory disorders and from 24 water and reagent control samples were used to develop and implement a weighted scoring metric based on z scores at the species and genus levels for both nucleotide and protein alignments to prioritize and rank the mNGS results. Total RNA was extracted for mNGS from the CSF of 7 participants with subacute or chronic meningitis who were recruited between September 2013 and March 2017 as part of a multicenter study of mNGS pathogen discovery among patients with suspected neuroinflammatory conditions. The neurologic infections identified by mNGS in these 7 participants represented a diverse array of pathogens. The patients were referred from the University of California, San Francisco Medical Center (n = 2), Zuckerberg San Francisco General Hospital and Trauma Center (n = 2), Cleveland Clinic (n = 1), University of Washington (n = 1), and Kaiser Permanente (n = 1). A weighted z score was used to filter out environmental contaminants and facilitate efficient data triage and analysis. Main Outcomes and Measures: Pathogens identified by mNGS and the ability of a statistical model to prioritize, rank, and simplify mNGS results. Results: The 7 participants ranged in age from 10 to 55 years, and 3 (43%) were female. A parasitic worm (Taenia solium, in 2 participants), a virus (HIV-1), and 4 fungi (Cryptococcus neoformans
Deurenberg, Ruud H; Bathoorn, Erik; Chlebowicz, Monika A; Couto, Natacha; Ferdous, Mithila; García-Cobos, Silvia; Kooistra-Smid, Anna M D; Raangs, Erwin C; Rosema, Sigrid; Veloo, Alida C M; Zhou, Kai; Friedrich, Alexander W; Rossen, John W A
Current molecular diagnostics of human pathogens provide limited information that is often not sufficient for outbreak and transmission investigation. Next generation sequencing (NGS) determines the DNA sequence of a complete bacterial genome in a single sequence run, and these data yield information on resistance and virulence, as well as typing information useful for outbreak investigation. The obtained genome data can be further used for the development of an outbreak-specific screening test. In this review, a general introduction to NGS is presented, including the library preparation and the major characteristics of the most common NGS platforms, such as the MiSeq (Illumina) and the Ion PGM™ (ThermoFisher). An overview is given of the software used for NGS data analysis at the medical microbiology diagnostic laboratory in the University Medical Center Groningen in The Netherlands. Furthermore, applications of NGS in the clinical setting are described, such as outbreak management, molecular case finding, characterization and surveillance of pathogens, rapid identification of bacteria using the 16S-23S rRNA region, taxonomy, metagenomics approaches on clinical samples, and the determination of the transmission of zoonotic micro-organisms from animals to humans. Finally, we share our vision on the use of NGS in personalised microbiology in the near future, pointing out specific requirements. Copyright © 2016 The Author(s). Published by Elsevier B.V. All rights reserved.
Zhou, Wei; Hu, Yiyi; Sui, Zhenghong; Fu, Feng; Wang, Jinguo; Chang, Lianpeng; Guo, Weihua; Li, Binbin
Gracilariopsis lemaneiformis has a high economic value and is one of the most important aquaculture species in China. Despite its economic importance, it has remained largely unstudied at the genomic level. In this study, we conducted a genome survey of Gp. lemaneiformis using next-generation sequencing (NGS) technologies. In total, 18.70 Gb of high-quality sequence data with an estimated genome size of 97 Mb were obtained by HiSeq 2000 sequencing for Gp. lemaneiformis. These reads were assembled into 160,390 contigs with an N50 length of 3.64 kb, which were further assembled into 125,685 scaffolds with a total length of 81.17 Mb. Genome analysis predicted 3490 genes and a GC content of 48%. The identified genes have an average transcript length of 1,429 bp, an average coding sequence size of 1,369 bp, 1.36 exons per gene, exon length of 1,008 bp, and intron length of 191 bp. From the initial assembled scaffold, transposable elements constituted 54.64% (44.35 Mb) of the genome, and 7737 simple sequence repeats (SSRs) were identified. Among these SSRs, the trinucleotide repeat type was the most abundant (up to 73.20% of total SSRs), followed by the di- (17.41%), tetra- (5.49%), hexa- (2.90%), and penta- (1.00%) nucleotide repeat type. These characteristics suggest that Gp. lemaneiformis is a model organism for genetic study. This is the first report of genome-wide characterization within this taxon. PMID:23875008
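The N50 statistic reported for the contigs above has a simple definition: the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the assembly. The following is a minimal sketch of that definition; the function name and the toy lengths are illustrative and are not taken from the study.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    Contigs are visited from longest to shortest; the N50 is the length
    of the contig at which the running total first reaches half of the
    summed assembly length.
    """
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    raise ValueError("no contig lengths given")


# Toy assembly of nine contigs (arbitrary units); total length is 54.
print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))  # 10 + 9 + 8 = 27 reaches half, so N50 is 8
```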
Khandelwal, Garima; Girotti, María Romina; Smowton, Christopher; Taylor, Sam; Wirth, Christopher; Dynowski, Marek; Frese, Kristopher K; Brady, Ged; Dive, Caroline; Marais, Richard; Miller, Crispin
Patient-derived xenograft (PDX) and circulating tumor cell-derived explant (CDX) models are powerful methods for the study of human disease. In cancer research, these methods have been applied to multiple questions, including the study of metastatic progression, genetic evolution, and therapeutic drug responses. As PDX and CDX models can recapitulate the highly heterogeneous characteristics of a patient tumor, as well as their response to chemotherapy, there is considerable interest in combining them with next-generation sequencing to monitor the genomic, transcriptional, and epigenetic changes that accompany oncogenesis. When used for this purpose, their reliability is highly dependent on being able to accurately distinguish between sequencing reads that originate from the host, and those that arise from the xenograft itself. Here, we demonstrate that failure to correctly identify contaminating host reads when analyzing DNA- and RNA-sequencing (DNA-Seq and RNA-Seq) data from PDX and CDX models is a major confounding factor that can lead to incorrect mutation calls and a failure to identify canonical mutation signatures associated with tumorigenicity. In addition, a highly sensitive algorithm and open source software tool for identifying and removing contaminating host sequences is described. Importantly, when applied to PDX and CDX models of melanoma, these data demonstrate its utility as a sensitive and selective tool for the correction of PDX- and CDX-derived whole-exome and RNA-Seq data. Implications: This study describes a sensitive method to identify contaminating host reads in xenograft and explant DNA- and RNA-Seq data and is applicable to other forms of deep sequencing. Mol Cancer Res; 15(8); 1012-6. ©2017 American Association for Cancer Research.
Fuentes-Pananá, Ezequiel M; Larios-Serrato, Violeta; Méndez-Tenorio, Alfonso; Morales-Sánchez, Abigail; Arias, Carlos F; Torres, Javier
Gastric (GC) and breast (BrC) cancer are two of the most common and deadly tumours. Different lines of evidence suggest a possible causative role of viral infections for both GC and BrC. Wide genome sequencing (WGS) technologies allow searching for viral agents in tissues of patients with cancer. These technologies have already contributed to establishing virus-cancer associations as well as to the discovery of new tumour viruses. The objective of this study was to document possible associations of viral infection with GC and BrC in Mexican patients. In order to gauge cost-effective conditions for experimental sequencing, we first carried out an in silico simulation of WGS. The next-generation platform Illumina GAIIx was then used to sequence GC and BrC tumour samples. While we did not find viral sequences in tissues from BrC patients, multiple reads matching Epstein-Barr virus (EBV) sequences were found in GC tissues. An end-point polymerase chain reaction confirmed an enrichment of EBV sequences in one of the GC samples sequenced, validating the next-generation sequencing-bioinformatics pipeline. PMID:26910355
Straub, Shannon C K; Parks, Matthew; Weitemier, Kevin; Fishbein, Mark; Cronn, Richard C; Liston, Aaron
Just as Sanger sequencing did more than 20 years ago, next-generation sequencing (NGS) is poised to revolutionize plant systematics. By combining multiplexing approaches with NGS throughput, systematists may no longer need to choose between more taxa or more characters. Here we describe a genome skimming (shallow sequencing) approach for plant systematics. Through simulations, we evaluated optimal sequencing depth and performance of single-end and paired-end short read sequences for assembly of nuclear ribosomal DNA (rDNA) and plastomes and addressed the effect of divergence on reference-guided plastome assembly. We also used simulations to identify potential phylogenetic markers from low-copy nuclear loci at different sequencing depths. We demonstrated the utility of genome skimming through phylogenetic analysis of the Sonoran Desert clade (SDC) of Asclepias (Apocynaceae). Paired-end reads performed better than single-end reads. Minimum sequencing depths for high quality rDNA and plastome assemblies were 40× and 30×, respectively. Divergence from the reference significantly affected plastome assembly, but relatively similar references are available for most seed plants. Deeper rDNA sequencing is necessary to characterize intragenomic polymorphism. The low-copy fraction of the nuclear genome was readily surveyed, even at low sequencing depths. Nearly 160000 bp of sequence from three organelles provided evidence of phylogenetic incongruence in the SDC. Adoption of NGS will facilitate progress in plant systematics, as whole plastome and rDNA cistrons, partial mitochondrial genomes, and low-copy nuclear markers can now be efficiently obtained for molecular phylogenetics studies.
Aguilar, Maria; Richardson, Elisabeth; Tan, BoonFei; Walker, Giselle; Dunfield, Peter F; Bass, David; Nesbø, Camilla; Foght, Julia; Dacks, Joel B
Tailings ponds in the Athabasca oil sands (Canada) contain fluid wastes, generated by the extraction of bitumen from oil sands ores. Although the autochthonous prokaryotic communities have been relatively well characterized, almost nothing is known about microbial eukaryotes living in the anoxic soft sediments of tailings ponds or in the thin oxic layer of water that covers them. We carried out the first next-generation sequencing study of microbial eukaryotic diversity in oil sands tailings ponds. In metagenomes prepared from tailings sediment and surface water, we detected very low numbers of sequences encoding eukaryotic small subunit ribosomal RNA representing seven major taxonomic groups of protists. We also produced and analysed three amplicon-based 18S rRNA libraries prepared from sediment samples. These revealed a more diverse set of taxa, 169 different OTUs encompassing up to eleven higher order groups of eukaryotes, according to detailed classification using homology searching and phylogenetic methods. The 10 most abundant OTUs accounted for > 90% of the total of reads, vs. large numbers of rare OTUs (< 1% abundance). Despite the anoxic and hydrocarbon-enriched nature of the environment, the tailings ponds harbour complex communities of microbial eukaryotes indicating that these organisms should be taken into account when studying the microbiology of the oil sands. © 2016 The Author(s) Journal of Eukaryotic Microbiology © 2016 International Society of Protistologists.
Tawari, Nilesh R; Seow, Justine Jia Wen; Perumal, Dharuman; Ow, Jack L; Ang, Shimin; Devasia, Arun George; Ng, Pauline C
ChronQC is a quality control (QC) tracking system for clinical implementation of next-generation sequencing (NGS). ChronQC generates time series plots for various QC metrics to allow comparison of current runs to historical runs. ChronQC has multiple features for tracking QC data including Westgard rules for clinical validity, laboratory-defined thresholds and historical observations within a specified time period. Users can record their notes and corrective actions directly onto the plots for long-term recordkeeping. ChronQC facilitates regular monitoring of clinical NGS to enable adherence to high quality clinical standards. ChronQC is freely available on GitHub (https://github.com/nilesh-tawari/ChronQC), Docker (https://hub.docker.com/r/nileshtawari/chronqc/) and the Python Package Index. ChronQC is implemented in Python and runs on all common operating systems (Windows, Linux and Mac OS X). Supplementary data are available at Bioinformatics online.
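To make the idea of Westgard rules concrete, the sketch below checks two commonly cited rules against a historical baseline: 1_3s (a single value more than 3 SD from the mean) and 2_2s (two consecutive values more than 2 SD from the mean on the same side). This is an illustration only and is not ChronQC's code or API; the function name, the metric values and the assumption of a non-zero historical standard deviation are all hypothetical.

```python
from statistics import mean, stdev


def westgard_flags(history, current_runs):
    """Flag runs that violate the 1_3s or 2_2s rule.

    history:      QC metric values from past runs, used as the baseline.
    current_runs: values from recent runs, in chronological order.
    Returns a list of (index, rule_name) tuples.
    Assumes the baseline has a non-zero standard deviation.
    """
    mu = mean(history)
    sd = stdev(history)
    z = [(x - mu) / sd for x in current_runs]
    flags = []
    for i, zi in enumerate(z):
        if abs(zi) > 3:
            flags.append((i, "1_3s"))
        if i > 0 and ((zi > 2 and z[i - 1] > 2) or (zi < -2 and z[i - 1] < -2)):
            flags.append((i, "2_2s"))
    return flags


# Hypothetical mean base quality per run; not real QC data.
history = [30, 31, 29, 30, 32, 28, 30, 31, 29, 30]
recent = [30.5, 33.0, 32.8, 25.0]
print(westgard_flags(history, recent))
```

In this toy series the third run trips 2_2s because it and its predecessor both sit more than 2 SD above the baseline mean, and the fourth run trips 1_3s.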
Desai, Aarti; Marwah, Veer Singh; Yadav, Akshay; Jha, Vineet; Dhaygude, Kishor; Bangar, Ujwala; Kulkarni, Vivek; Jere, Abhay
Next Generation Sequencing (NGS) is a disruptive technology that has found widespread acceptance in the life sciences research community. The high throughput and low cost of sequencing have encouraged researchers to undertake ambitious genomic projects, especially in de novo genome sequencing. Currently, NGS systems generate sequence data as short reads, and de novo genome assembly using these short reads is computationally very intensive. Due to lower cost of sequencing and higher throughput, NGS systems now provide the ability to sequence genomes at high depth. However, no report is currently available that examines the impact of high sequence depth on genome assembly using real data sets and multiple assembly algorithms. Recently, some studies have evaluated the impact of sequence coverage, error rate and average read length on genome assembly using multiple assembly algorithms; however, these evaluations were performed using simulated datasets. One limitation of using simulated datasets is that variables such as error rates, read length and coverage, which are known to impact genome assembly, are carefully controlled. Hence, this study was undertaken to identify the minimum depth of sequencing required for de novo assembly for different sized genomes using graph-based assembly algorithms and real datasets. Illumina reads for E. coli (4.6 Mb), S. kudriavzevii (11.18 Mb) and C. elegans (100 Mb) were assembled using SOAPdenovo, Velvet, ABySS, Meraculous and IDBA-UD. Our analysis shows that 50X is the optimum read depth for assembling these genomes using all assemblers except Meraculous, which requires 100X read depth. Moreover, our analysis shows that de novo assembly from 50X read data requires only 6-40 GB RAM depending on the genome size and assembly algorithm used. We believe that this information can be extremely valuable for researchers in designing experiments and multiplexing, which will enable optimum utilization of sequencing as well as analysis resources.
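The read depths discussed above follow from the usual relation depth = (number of reads × read length) / genome size. The sketch below applies it to the 4.6 Mb E. coli genome at the 50X depth named in the abstract; the 100 bp read length is an assumption made for illustration and is not stated in the source.

```python
import math


def sequencing_depth(n_reads, read_length_bp, genome_size_bp):
    """Average depth: total sequenced bases divided by genome size."""
    return n_reads * read_length_bp / genome_size_bp


def reads_for_depth(genome_size_bp, read_length_bp, target_depth):
    """Smallest whole number of reads that reaches the target depth."""
    return math.ceil(target_depth * genome_size_bp / read_length_bp)


# 4.6 Mb genome and 50X depth from the abstract; 100 bp reads are assumed.
needed = reads_for_depth(4_600_000, 100, 50)
print(needed)
print(sequencing_depth(needed, 100, 4_600_000))
```

Under these assumptions about 2.3 million reads are needed; the same arithmetic gives the budget for multiplexing several genomes on one run.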
Dilliott, Allison A; Farhan, Sali M K; Ghani, Mahdi; Sato, Christine; Liang, Eric; Zhang, Ming; McIntyre, Adam D; Cao, Henian; Racacho, Lemuel; Robinson, John F; Strong, Michael J; Masellis, Mario; Bulman, Dennis E; Rogaeva, Ekaterina; Lang, Anthony; Tartaglia, Carmela; Finger, Elizabeth; Zinman, Lorne; Turnbull, John; Freedman, Morris; Swartz, Rick; Black, Sandra E; Hegele, Robert A
Next-generation sequencing (NGS) is quickly revolutionizing how research into the genetic determinants of constitutional disease is performed. The technique is highly efficient with millions of sequencing reads being produced in a short time span and at relatively low cost. Specifically, targeted NGS is able to focus investigations to genomic regions of particular interest based on the disease of study. Not only does this further reduce costs and increase the speed of the process, but it lessens the computational burden that often accompanies NGS. Although targeted NGS is restricted to certain regions of the genome, preventing identification of potential novel loci of interest, it can be an excellent technique when faced with a phenotypically and genetically heterogeneous disease, for which there are previously known genetic associations. Because of the complex nature of the sequencing technique, it is important to closely adhere to protocols and methodologies in order to achieve sequencing reads of high coverage and quality. Further, once sequencing reads are obtained, a sophisticated bioinformatics workflow is utilized to accurately map reads to a reference genome, to call variants, and to ensure the variants pass quality metrics. Variants must also be annotated and curated based on their clinical significance, which can be standardized by applying the American College of Medical Genetics and Genomics Pathogenicity Guidelines. The methods presented herein will display the steps involved in generating and analyzing NGS data from a targeted sequencing panel, using the ONDRISeq neurodegenerative disease panel as a model, to identify variants that may be of clinical significance.
Olejnik, Michael; Steuwer, Michel; Gorlatch, Sergei; Heider, Dominik
Next-generation sequencing (NGS) has a large potential in HIV diagnostics, and genotypic prediction models have been developed and successfully tested in recent years. However, although highly accurate, these computational models lack the computational efficiency needed to reach their full potential. In this study, we demonstrate the use of graphics processing units (GPUs) in combination with a computational prediction model for HIV tropism. Our new model named gCUP, parallelized and optimized for GPU, is highly accurate and can classify >175 000 sequences per second on an NVIDIA GeForce GTX 460. The computational efficiency of our new model is the next step to enable NGS technologies to reach clinical significance in HIV diagnostics. Moreover, our approach is not limited to HIV tropism prediction, but can also be easily adapted to other settings, e.g. drug resistance prediction. The source code can be downloaded at http://www.heiderlab.de.
Lo, Chien-Chi; Chain, Patrick S G
Next generation sequencing (NGS) technologies that parallelize the sequencing process and produce thousands to millions, or even hundreds of millions of sequences in a single sequencing run, have revolutionized genomic and genetic research. Because of the vagaries of any platform's sequencing chemistry, the experimental processing, machine failure, and so on, the quality of sequencing reads is never perfect, and often declines as the read is extended. These errors invariably affect downstream analysis/application and should therefore be identified early on to mitigate any unforeseen effects. Here we present a novel FastQ Quality Control Software (FaQCs) that can rapidly process large volumes of data, and which improves upon previous solutions to monitor the quality and remove poor quality data from sequencing runs. Both the speed of processing and the memory footprint of storing all required information have been optimized via algorithmic and parallel processing solutions. The trimmed output compared side-by-side with the original data is part of the automated PDF output. We show how this tool can help data analysis by providing a few examples, including an increased percentage of reads recruited to references, improved single nucleotide polymorphism identification as well as de novo sequence assembly metrics. FaQCs combines several features of currently available applications into a single, user-friendly process, and includes additional unique capabilities such as filtering the PhiX control sequences, conversion of FASTQ formats, and multi-threading. The original data and trimmed summaries are reported within a variety of graphics and reports, providing a simple way to do data quality control and assurance.
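The kind of 3' quality trimming described in this abstract can be illustrated with a minimal Python sketch. This is not the FaQCs algorithm; the function names, the Phred+33 encoding and the threshold of 20 are assumptions chosen only for illustration.

```python
def phred_scores(quality_string, offset=33):
    """Decode a FASTQ quality string into Phred scores (Phred+33 assumed)."""
    return [ord(ch) - offset for ch in quality_string]


def trim_3prime(sequence, quality_string, min_quality=20, offset=33):
    """Drop trailing bases whose Phred score is below min_quality.

    Hypothetical helper for illustration only; not the FaQCs trimming method.
    """
    scores = phred_scores(quality_string, offset)
    end = len(scores)
    # Walk back from the 3' end while the base quality is too low.
    while end > 0 and scores[end - 1] < min_quality:
        end -= 1
    return sequence[:end], quality_string[:end]
```

A read whose quality falls off towards the end, as the abstract notes is common, loses only its low-quality tail; a read with no base above the threshold is trimmed to an empty string and would typically be discarded.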
Next-generation sequencing (NGS) technologies have had a great impact on every field of molecular research, mainly because they reduce costs and increase the throughput of DNA sequencing. These features, together with the technology’s flexibility, have opened the way to a variety of applications including the study of the molecular basis of human diseases. Several analytical approaches have been developed to selectively enrich regions of interest from the whole genome in order to identify germline and/or somatic sequence variants and to study DNA methylation. These approaches are now widely used in research, and they are already being used in routine molecular diagnostics. However, some issues are still controversial, namely, standardization of methods, data analysis and storage, and ethical aspects. Besides providing an overview of the NGS-based approaches most frequently used to study the molecular basis of human diseases at DNA level, we discuss the principal challenges and applications of NGS in the field of human genomics.
BACKGROUND: In humans, copies of the Long Interspersed Nuclear Element 1 (LINE-1) retrotransposon comprise 21% of the reference genome, and have been shown to modulate expression and produce novel splice isoforms of transcripts from genes that span or neighbor the LINE-1 insertion site. RESULTS: In this work, newly released pilot data from the 1000 Genomes Project are analyzed to detect previously unreported full-length insertions of the retrotransposon LINE-1. By direct analysis of the sequence data, we have identified 22 previously unreported LINE-1 insertion sites within the sequence data reported for a mother/father/daughter trio. CONCLUSIONS: It is demonstrated here that next generation sequencing data, as well as emerging high quality datasets from individual genome projects, allow us to assess the amount of heterogeneity with respect to the LINE-1 retrotransposon amongst humans, and provide us with a wealth of testable hypotheses as to the impact that this diversity may have on the health of individuals and populations.
Vanni, Irene; Coco, Simona; Truini, Anna; Rusmini, Marta; Dal Bello, Maria Giovanna; Alama, Angela; Banelli, Barbara; Mora, Marco; Rijavec, Erika; Barletta, Giulia; Genova, Carlo; Biello, Federica; Maggioni, Claudia; Grossi, Francesco
Next-generation sequencing (NGS) is a cost-effective technology capable of screening several genes simultaneously; however, its application in a clinical context requires an established workflow to acquire reliable sequencing results. Here, we report an optimized NGS workflow analyzing 22 lung cancer-related genes to sequence critical samples such as DNA from formalin-fixed paraffin-embedded (FFPE) blocks and circulating free DNA (cfDNA). Snap frozen and matched FFPE gDNA from 12 non-small cell lung cancer (NSCLC) patients, whose gDNA fragmentation status was previously evaluated using a multiplex PCR-based quality control, were successfully sequenced with Ion Torrent PGM™. The robust bioinformatic pipeline allowed us to correctly call both Single Nucleotide Variants (SNVs) and indels with a detection limit of 5%, achieving 100% specificity and 96% sensitivity. This workflow was also validated in 13 FFPE NSCLC biopsies. Furthermore, a specific protocol for low input gDNA capable of producing good sequencing data with high coverage, high uniformity, and a low error rate was also optimized. In conclusion, we demonstrate the feasibility of obtaining gDNA from FFPE samples suitable for NGS by performing appropriate quality controls. The optimized workflow, capable of screening low input gDNA, highlights NGS as a potential tool in the detection, disease monitoring, and treatment of NSCLC.
Liu, Bo; Madduri, Ravi K; Sotomayor, Borja; Chard, Kyle; Lacinski, Lukasz; Dave, Utpal J; Li, Jianqiang; Liu, Chunchen; Foster, Ian T
Due to the upcoming deluge of genome data, the need for storing and processing large-scale genome data, easy access to biomedical analysis tools, and efficient data sharing and retrieval has presented significant challenges. The variability in data volume results in variable computing and storage requirements; therefore, biomedical researchers are pursuing more reliable, dynamic and convenient methods for conducting sequencing analyses. This paper proposes a Cloud-based bioinformatics workflow platform for large-scale next-generation sequencing analyses, which enables reliable and highly scalable execution of sequencing analysis workflows in a fully automated manner. Our platform extends the existing Galaxy workflow system by adding data management capabilities for transferring large quantities of data efficiently and reliably (via Globus Transfer), domain-specific analysis tools preconfigured for immediate use by researchers (via user-specific tools integration), automatic deployment on Cloud for on-demand resource allocation and pay-as-you-go pricing (via Globus Provision), a Cloud provisioning tool for auto-scaling (via HTCondor scheduler), and the support for validating the correctness of workflows (via semantic verification tools). Two bioinformatics workflow use cases as well as performance evaluation are presented to validate the feasibility of the proposed approach.
Boel, Annekatrien; Steyaert, Woutert; De Rocker, Nina; Menten, Björn; Callewaert, Bert; De Paepe, Anne; Coucke, Paul; Willaert, Andy
Targeted mutagenesis by the CRISPR/Cas9 system is currently revolutionizing genetics. The ease of this technique has enabled genome engineering in-vitro and in a range of model organisms and has pushed experimental dimensions to unprecedented proportions. Due to its tremendous progress in terms of speed, read length, throughput and cost, Next-Generation Sequencing (NGS) has been increasingly used for the analysis of CRISPR/Cas9 genome editing experiments. However, the current tools for genome editing assessment lack flexibility and fall short in the analysis of large amounts of NGS data. Therefore, we designed BATCH-GE, an easy-to-use bioinformatics tool for batch analysis of NGS-generated genome editing data, available from https://github.com/WouterSteyaert/BATCH-GE.git. BATCH-GE detects and reports indel mutations and other precise genome editing events and calculates the corresponding mutagenesis efficiencies for a large number of samples in parallel. Furthermore, this new tool provides flexibility by allowing the user to adapt a number of input variables. The performance of BATCH-GE was evaluated in two genome editing experiments, aiming to generate knock-out and knock-in zebrafish mutants. This tool will not only contribute to the evaluation of CRISPR/Cas9-based experiments, but will be of use in any genome editing experiment and has the ability to analyze data from every organism with a sequenced genome. PMID:27461955
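To make the idea of a per-sample mutagenesis efficiency concrete, here is a minimal Python sketch that reports the percentage of aligned reads carrying an insertion or deletion, judged from SAM-style CIGAR strings. It is only an assumption about how such a figure could be computed, not BATCH-GE's actual procedure; the function names and the list-of-CIGARs input are hypothetical.

```python
import re

# An insertion (I) or deletion (D) operation in a SAM CIGAR string.
INDEL_PATTERN = re.compile(r"\d+[ID]")


def has_indel(cigar):
    """Return True if the CIGAR string contains an insertion or deletion."""
    return bool(INDEL_PATTERN.search(cigar))


def mutagenesis_efficiency(cigars):
    """Percentage of reads with at least one indel (illustrative only)."""
    cigars = list(cigars)
    if not cigars:
        raise ValueError("no reads supplied")
    edited = sum(1 for cigar in cigars if has_indel(cigar))
    return 100.0 * edited / len(cigars)
```

In practice only reads overlapping the targeted cut site would be counted, and one such value would be computed per sample when many samples are processed in a batch.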
Bradysia odoriphaga (Diptera: Sciaridae) is the most important pest of Chinese chive. Insecticides are used widely and frequently to control B. odoriphaga in China. However, the performance of the insecticides chlorpyrifos and clothianidin in controlling the Chinese chive maggot is quite different. Using next generation sequencing technology, differentially expressed unigenes (DEUs) in B. odoriphaga were detected after treatment with chlorpyrifos and clothianidin for 6 and 48 h in comparison with control. The number of DEUs ranged between 703 and 1161 after insecticide treatment. In these DEUs, 370–863 unigenes can be classified into 41–46 categories of gene ontology (GO), and 354–658 DEUs can be mapped into 987–1623 Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways. The expressions of DEUs related to insecticide-metabolism-related genes were analyzed. The cytochrome P450-like unigene group was the largest group in DEUs. Most glutathione S-transferase-like unigenes were down-regulated and most sodium channel-like unigenes were up-regulated after insecticide treatment. Finally, 14 insecticide-metabolism-related unigenes were chosen to confirm the relative expression in each treatment by quantitative Real Time Polymerase Chain Reaction (qRT-PCR). The results of qRT-PCR and RNA Sequencing (RNA-Seq) are fairly well-established. Our results demonstrate that a next-generation sequencing tool facilitates the identification of insecticide-metabolism-related genes and the illustration of the insecticide mechanisms of chlorpyrifos and clothianidin.
Shen, Kang-Ning; Tsai, Shiou-Yi; Chen, Ching-Hung; Hsiao, Chung-Der; Durand, Jean-Dominique
In this study, the complete mitogenome sequence of largescale mullet (Teleostei: Mugilidae) has been sequenced by the next-generation sequencing method. The assembled mitogenome, consisting of 16,832 bp, had the typical vertebrate mitochondrial gene arrangement, including 13 protein-coding genes, 22 transfer RNAs, two ribosomal RNA genes, and a non-coding control region (D-loop). The D-loop, which has a length of 1094 bp, is located between tRNA-Pro and tRNA-Phe. The overall base composition of largescale mullet is 27.8% for A, 30.1% for C, 16.2% for G, and 25.9% for T. The complete mitogenome may provide essential and important DNA molecular data for further phylogenetic and evolutionary analysis for Mugilidae.
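An overall base composition such as the one reported above can be computed from an assembled sequence with a few lines of Python. This is a minimal sketch; the function name is hypothetical, and the choice to ignore any characters other than A, C, G and T is an assumption.

```python
from collections import Counter


def base_composition(sequence):
    """Return the percentage of A, C, G and T in a DNA sequence.

    Characters other than A/C/G/T (e.g. N) are ignored, which is an
    assumption made for this illustration.
    """
    counts = Counter(base for base in sequence.upper() if base in "ACGT")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    return {base: 100.0 * counts[base] / total for base in "ACGT"}
```

As a quick consistency check, the four percentages reported in the abstract (27.8, 30.1, 16.2 and 25.9) sum to 100.0%.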
Shen, Kang-Ning; Chen, Ching-Hung; Hsiao, Chung-Der
In this study, the complete mitogenome sequence of hornlip mullet Plicomugil labiosus (Teleostei: Mugilidae) has been sequenced by the next-generation sequencing method. The assembled mitogenome, consisting of 16,829 bp, had the typical vertebrate mitochondrial gene arrangement, including 13 protein-coding genes, 22 transfer RNAs, 2 ribosomal RNA genes and a non-coding control region (D-loop). The D-loop, 1057 bp in length, is located between tRNA-Pro and tRNA-Phe. The overall base composition of P. labiosus is 28.0% for A, 29.3% for C, 15.5% for G and 27.2% for T. The complete mitogenome may provide essential and important DNA molecular data for further population, phylogenetic and evolutionary analysis for Mugilidae.
Simbolo, Michele; Mafficini, Andrea; Agostini, Marco; Pedrazzani, Corrado; Bedin, Chiara; Urso, Emanuele D; Nitti, Donato; Turri, Giona; Scardoni, Maria; Fassan, Matteo; Scarpa, Aldo
Genetic screening in families with high risk to develop colorectal cancer (CRC) prevents incurable disease and permits personalized therapeutic and follow-up strategies. The advancement of next-generation sequencing (NGS) technologies has revolutionized the throughput of DNA sequencing. A series of 16 probands for either familial adenomatous polyposis (FAP; 8 cases) or hereditary nonpolyposis colorectal cancer (HNPCC; 8 cases) were investigated for intragenic mutations in five CRC familial syndromes-associated genes (APC, MUTYH, MLH1, MSH2, MSH6) applying both a custom multigene Ion AmpliSeq NGS panel and conventional Sanger sequencing. Fourteen pathogenic variants were detected in 13/16 FAP/HNPCC probands (81.3 %); one FAP proband presented two co-existing pathogenic variants, one in APC and one in MUTYH. Thirteen of these 14 pathogenic variants were detected by both NGS and Sanger, while one MSH2 mutation (L280FfsX3) was identified only by Sanger sequencing. This is due to a limitation of the NGS approach in resolving sequences close or within homopolymeric stretches of DNA. To evaluate the performance of our NGS custom panel we assessed its capability to resolve the DNA sequences corresponding to 2225 pathogenic variants reported in the COSMIC database for APC, MUTYH, MLH1, MSH2, MSH6. Our NGS custom panel resolves the sequences where 2108 (94.7 %) of these variants occur. The remaining 117 mutations reside inside or in close proximity to homopolymer stretches; of these 27 (1.2 %) are imprecisely identified by the software but can be resolved by visual inspection of the region, while the remaining 90 variants (4.0 %) are blind spots. In summary, our custom panel would miss 4 % (90/2225) of pathogenic variants that would need a small set of Sanger sequencing reactions to be solved. The multiplex NGS approach has the advantage of analyzing multiple genes in multiple samples simultaneously, requiring only a reduced number of Sanger sequences to resolve
Milius, Robert P; Heuer, Michael; Valiga, Daniel; Doroschak, Kathryn J; Kennedy, Caleb J; Bolon, Yung-Tsi; Schneider, Joel; Pollack, Jane; Kim, Hwa Ran; Cereb, Nezih; Hollenbach, Jill A; Mack, Steven J; Maiers, Martin
We present an electronic format for exchanging data for HLA and KIR genotyping with extensions for next-generation sequencing (NGS). This format addresses NGS data exchange by refining the Histoimmunogenetics Markup Language (HML) to conform to the proposed Minimum Information for Reporting Immunogenomic NGS Genotyping (MIRING) reporting guidelines (miring.immunogenomics.org). Our refinements of HML include two major additions. First, NGS is supported by new XML structures to capture additional NGS data and metadata required to produce a genotyping result, including analysis-dependent (dynamic) and method-dependent (static) components. A full genotype, consensus sequence, and the surrounding metadata are included directly, while the raw sequence reads and platform documentation are externally referenced. Second, genotype ambiguity is fully represented by integrating Genotype List Strings, which use a hierarchical set of delimiters to represent allele and genotype ambiguity in a complete and accurate fashion. HML also continues to enable the transmission of legacy methods (e.g. site-specific oligonucleotide, sequence-specific priming, and Sequence Based Typing (SBT)), adding features such as allowing multiple group-specific sequencing primers, and fully leveraging techniques that combine multiple methods to obtain a single result, such as SBT integrated with NGS.
Ranbir Singh Fougat
Isabgol (Plantago ovata Forsk.) is an important medicinal plant having high pharmacological activity in its seed husk, which is substantially used in the food, beverages and packaging industries. Nevertheless, isabgol lags behind in research, particularly for genomic resources, like molecular markers, genetic maps, etc. Presently, molecular markers can be easily developed through next generation sequencing technologies, more efficiently, cost effectively and in less time than ever before. This study was framed keeping in view the need to develop molecular markers for this economically important crop by employing a microsatellite enrichment protocol using a next generation sequencing platform (Ion Torrent PGM™) to obtain simple sequence repeats (SSRs) for Plantago ovata for the very first time. A total of 3447 contigs were assembled, which contained 249 SSRs. Thirty seven loci were randomly selected for primer development; of which, 30 loci were successfully amplified. The developed microsatellite markers showed the amplification of the expected size and cross-amplification in another six species of Plantago. The SSR markers were unable to show polymorphism within P. ovata, suggesting that low variability exists within genotypes of P. ovata. This study suggests that PGM™ sequencing is a rapid and cost-effective tool for developing SSR markers for non-model species, and the markers so-observed could be useful in the molecular breeding of P. ovata.
Fumagalli, Caterina; Vacirca, Davide; Rappa, Alessandra; Passaro, Antonio; Guarize, Juliana; Rafaniello Raviele, Paola; de Marinis, Filippo; Spaggiari, Lorenzo; Casadio, Chiara; Viale, Giuseppe; Barberis, Massimo; Guerini-Rocco, Elena
Molecular profiling of advanced non-small cell lung cancers (NSCLC) is essential to identify patients who may benefit from targeted treatments. In recent years, the number of potentially actionable molecular alterations has rapidly increased. Next-generation sequencing allows for the analysis of multiple genes simultaneously. This study aimed to evaluate the feasibility and the throughput of next-generation sequencing in clinical molecular diagnostics of advanced NSCLC. A single-institution cohort of 535 non-squamous NSCLC was profiled using a next-generation sequencing panel targeting 22 actionable and cancer-related genes. 441 non-squamous NSCLC (82.4%) harboured at least one gene alteration, including 340 cases (63.6%) with clinically relevant molecular aberrations. Mutations were detected in all but one gene (FGFR1) of the panel. Recurrent alterations were observed in KRAS, TP53, EGFR, STK11 and MET genes, whereas the remaining genes were mutated in <5% of the cases. Concurrent mutations were detected in 183 tumours (34.2%), mostly impairing KRAS or EGFR in association with TP53 alterations. The study highlights the feasibility of targeted next-generation sequencing in a clinical setting. The majority of NSCLC harboured mutations in clinically relevant genes, thus identifying patients who might benefit from different targeted therapies.
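The percentages quoted in this abstract follow directly from the reported case counts and the cohort size of 535. The short Python check below reproduces them; the dictionary labels and the helper name are illustrative, while the counts are taken from the abstract.

```python
COHORT_SIZE = 535  # non-squamous NSCLC cases profiled, as reported above

# Case counts as reported in the abstract; the labels are paraphrased.
case_counts = {
    "at least one gene alteration": 441,
    "clinically relevant aberration": 340,
    "concurrent mutations": 183,
}


def percent(count, total):
    """Percentage of total, rounded to one decimal place."""
    return round(100.0 * count / total, 1)


percentages = {label: percent(n, COHORT_SIZE) for label, n in case_counts.items()}
```

Each value agrees with the figure given in the text (82.4%, 63.6% and 34.2%), so the reported proportions are internally consistent.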
Accurate identification of DNA polymorphisms using next-generation sequencing technology is challenging because of a high rate of sequencing error and incorrect mapping of reads to reference genomes. Currently available short read aligners and DNA variant callers suffer from these problems. We developed the Coval software to improve the quality of short read alignments. Coval is designed to minimize the incidence of spurious alignment of short reads, by filtering mismatched reads that remained in alignments after local realignment and error correction of mismatched reads. The error correction is executed based on the base quality and allele frequency at the non-reference positions for an individual or pooled sample. We demonstrated the utility of Coval by applying it to simulated genomes and experimentally obtained short-read data of rice, nematode, and mouse. Moreover, we found an unexpectedly large number of incorrectly mapped reads in 'targeted' alignments, where the whole genome sequencing reads had been aligned to a local genomic segment, and showed that Coval effectively eliminated such spurious alignments. We conclude that Coval significantly improves the quality of short-read sequence alignments, thereby increasing the calling accuracy of currently available tools for SNP and indel identification. Coval is available at http://sourceforge.net/projects/coval105/.
Wei, Xiaoming; Sun, Yan; Xie, Jiansheng; Shi, Quan; Qu, Ning; Yang, Guanghui; Cai, Jun; Yang, Yi; Liang, Yu; Wang, Wei; Yi, Xin
Targeted enrichment and next-generation sequencing (NGS) have been employed for detection of genetic diseases. The purpose of this study was to validate the accuracy and sensitivity of our method for comprehensive mutation detection of hereditary hearing loss, and to identify inherited mutations involved in human deafness accurately and economically. To make genetic diagnosis of hereditary hearing loss simple and timesaving, we designed a 0.60 MB array-based chip containing 69 nuclear genes and the mitochondrial genome responsible for human deafness and performed NGS on ten patients with five known mutations and a Chinese family with hearing loss (never genetically investigated). Ten patients with five known mutations were sequenced using next-generation sequencing to validate the sensitivity of the method. We identified four known mutations in two nuclear deafness-causing genes (GJB2 and SLC26A4) and one in mitochondrial DNA. We then applied this method to analyze the variants in a Chinese family with hearing loss and identified compound heterozygosity for two novel mutations in gene MYO7A. The compound heterozygosity identified in gene MYO7A causes Usher Syndrome 1B with severe phenotypes. The results support that the combination of enrichment of targeted genes and next-generation sequencing is a valuable molecular diagnostic tool for hereditary deafness and suitable for clinical application.
Hermann-Bank, Marie Louise; Skovgaard, Kerstin; Stockmarr, Anders
®) followed by next generation sequencing. Primers were designed if necessary and all primer sets were screened against DNA extracted from pure cultures of 15 representative bacterial species. Subsequently the setup was tested on DNA extracted from small and large intestinal content from piglets...
Pareja-Tobes, Pablo; Manrique, Marina; Pareja-Tobes, Eduardo; Pareja, Eduardo; Tobes, Raquel
BG7 is a new system for de novo bacterial, archaeal and viral genome annotation based on a new approach specifically designed for annotating genomes sequenced with next generation sequencing technologies. The system is versatile and able to annotate genes even in the step of preliminary assembly of the genome. It is especially efficient detecting unexpected genes horizontally acquired from bacterial or archaeal distant genomes, phages, plasmids, and mobile elements. From the initial phases of the gene annotation process, BG7 exploits the massive availability of annotated protein sequences in databases. BG7 predicts ORFs and infers their function based on protein similarity with a wide set of reference proteins, integrating ORF prediction and functional annotation phases in just one step. BG7 is especially tolerant to sequencing errors in start and stop codons, to frameshifts, and to assembly or scaffolding errors. The system is also tolerant to the high level of gene fragmentation which is frequently found in not fully assembled genomes. BG7 current version – which is developed in Java, takes advantage of Amazon Web Services (AWS) cloud computing features, but it can also be run locally in any operating system. BG7 is a fast, automated and scalable system that can cope with the challenge of analyzing the huge amount of genomes that are being sequenced with NGS technologies. Its capabilities and efficiency were demonstrated in the 2011 EHEC Germany outbreak in which BG7 was used to get the first annotations right the next day after the first entero-hemorrhagic E. coli genome sequences were made publicly available. The suitability of BG7 for genome annotation has been proved for Illumina, 454, Ion Torrent, and PacBio sequencing technologies. Besides, thanks to its plasticity, our system could be very easily adapted to work with new technologies in the future. PMID:23185310
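For contrast with the similarity-based approach described above, the following Python sketch shows a naive ORF scan that simply looks for an ATG start codon followed in frame by a stop codon. It is not BG7's method; the function name, the forward-strand-only scan and the `min_codons` parameter are assumptions for illustration. Because it depends on intact start and stop codons and an unbroken reading frame, a scan of this kind is sensitive to exactly the sequencing errors and frameshifts that the abstract says BG7 tolerates.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(sequence, min_codons=3):
    """Return (start, end, frame) tuples for ATG...stop ORFs.

    Only the forward strand is scanned; `end` is exclusive and includes
    the stop codon. Illustrative only; not the BG7 algorithm.
    """
    seq = sequence.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first start codon opens the ORF
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None  # look for the next ORF in this frame
    return orfs
```

A single inserted or deleted base between the start and stop codons shifts the frame and hides the ORF from this scan, which is why error-tolerant, protein-similarity-driven prediction is attractive for preliminary assemblies.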
Zhao, Yue; Zhang, Hong; Xia, Xue-shan
Inherited cardiomyopathy is the most common hereditary cardiac disease. It also causes a significant proportion of sudden cardiac deaths in young adults and athletes. So far, approximately one hundred genes have been reported to be involved in cardiomyopathies through different mechanisms. Therefore, the identification of the genetic basis and disease mechanisms of cardiomyopathies are important for establishing a clinical diagnosis and genetic testing. Next-generation semiconductor sequencing (NGSS) technology platform is a high-throughput sequencer capable of analyzing clinically derived genomes with high productivity, sensitivity and specificity. It was launched in 2010 by Life Technologies of USA, and it is based on a high density semiconductor chip, which was covered with tens of thousands of wells. NGSS has been successfully used in candidate gene mutation screening to identify hereditary disease. In this review, we summarize these genetic variations, challenge and application of NGSS in inherited cardiomyopathy, and its value in disease diagnosis, prevention and treatment.
Lalonde, Emilie; Albrecht, Steffen; Ha, Kevin C H; Jacob, Karine; Bolduc, Nathalie; Polychronakos, Constantin; Dechelotte, Pierre; Majewski, Jacek; Jabado, Nada
Protein coding genes constitute approximately 1% of the human genome but harbor 85% of the mutations with large effects on disease-related traits. Therefore, efficient strategies for selectively sequencing complete coding regions (i.e., "whole exome") have the potential to contribute to our understanding of human diseases. We used a method for whole-exome sequencing coupling Agilent whole-exome capture to the Illumina DNA-sequencing platform, and investigated two unrelated fetuses from nonconsanguineous families with Fowler Syndrome (FS), a stereotyped phenotype lethal disease. We report novel germline mutations in feline leukemia virus subgroup C cellular-receptor-family member 2, FLVCR2, which has recently been shown to cause FS. Using this technology, we identified three types of genetic abnormalities: point-mutations, insertions-deletions, and intronic splice-site changes (first pathogenic report using this technology), in the fetuses who both were compound heterozygotes for the disease. Although revealing a high level of allelic heterogeneity and mutational spectrum in FS, this study further illustrates the successful application of whole-exome sequencing to uncover genetic defects in rare Mendelian disorders. Of importance, we show that we can identify genes underlying rare, monogenic and recessive diseases using a limited number of patients (n=2), in the absence of shared genetic heritage and in the presence of allelic heterogeneity.
Yang, Lei; Naylor, Gavin J P
We determined the complete mitochondrial genome sequence (16,760 bp) of the peacock skate Pavoraja nitida using a long-PCR based next generation sequencing method. It has 13 protein-coding genes, 22 tRNA genes, 2 rRNA genes, and 1 control region in the typical vertebrate arrangement. Primers, protocols, and procedures used to obtain this mitogenome are provided. We anticipate that this approach will facilitate rapid collection of mitogenome sequences for studies on phylogenetic relationships, population genetics, and conservation of cartilaginous fishes.
BACKGROUND: The concept of the utilization of rearranged ends for development of personalized biomarkers has attracted much attention owing to its clinical applicability. Although targeted next-generation sequencing (NGS) for recurrent rearrangements has been successful in hematologic malignancies, its application to solid tumors is problematic due to the paucity of recurrent translocations. However, copy-number breakpoints (CNBs), which are abundant in solid tumors, can be utilized for identification of rearranged ends. METHOD: As a proof of concept, we performed targeted next-generation sequencing at copy-number breakpoints (TNGS-CNB) in nine colon cancer cases including seven primary cancers and two cell lines, COLO205 and SW620. For deduction of CNBs, we developed a novel competitive single-nucleotide polymorphism (cSNP) microarray method entailing CNB-region refinement by competitor DNA. RESULT: Using TNGS-CNB, 19 specific rearrangements out of 91 CNBs (20.9%) were identified, and two polymerase chain reaction (PCR)-amplifiable rearrangements were obtained in six cases (66.7%). And significantly, TNGS-CNB, with its high positive identification rate (82.6%) of PCR-amplifiable rearrangements at candidate sites (19/23), just from filtering of aligned sequences, requires little effort for validation. CONCLUSION: Our results indicate that TNGS-CNB, with its utility for identification of rearrangements in solid tumors, can be successfully applied in the clinical laboratory for cancer-relapse and therapy-response monitoring.
Kim, Hyun-Kyoung; Park, Won Cheol; Lee, Kwang Man; Hwang, Hai-Li; Park, Seong-Yeol; Sorn, Sungbin; Chandra, Vishal; Kim, Kwang Gi; Yoon, Woong-Bae; Bae, Joon Seol; Shin, Hyoung Doo; Shin, Jong-Yeon; Seoh, Ju-Young; Kim, Jong-Il; Hong, Kyeong-Man
The concept of the utilization of rearranged ends for development of personalized biomarkers has attracted much attention owing to its clinical applicability. Although targeted next-generation sequencing (NGS) for recurrent rearrangements has been successful in hematologic malignancies, its application to solid tumors is problematic due to the paucity of recurrent translocations. However, copy-number breakpoints (CNBs), which are abundant in solid tumors, can be utilized for identification of rearranged ends. As a proof of concept, we performed targeted next-generation sequencing at copy-number breakpoints (TNGS-CNB) in nine colon cancer cases including seven primary cancers and two cell lines, COLO205 and SW620. For deduction of CNBs, we developed a novel competitive single-nucleotide polymorphism (cSNP) microarray method entailing CNB-region refinement by competitor DNA. Using TNGS-CNB, 19 specific rearrangements out of 91 CNBs (20.9%) were identified, and two polymerase chain reaction (PCR)-amplifiable rearrangements were obtained in six cases (66.7%). And significantly, TNGS-CNB, with its high positive identification rate (82.6%) of PCR-amplifiable rearrangements at candidate sites (19/23), just from filtering of aligned sequences, requires little effort for validation. Our results indicate that TNGS-CNB, with its utility for identification of rearrangements in solid tumors, can be successfully applied in the clinical laboratory for cancer-relapse and therapy-response monitoring.
Hoffman, Jodi D; Greger, Valerie; Strovel, Erin T; Blitzer, Miriam G; Umbarger, Mark A; Kennedy, Caleb; Bishop, Brian; Saunders, Patrick; Porreca, Gregory J; Schienda, Jaclyn; Davie, Jocelyn; Hallam, Stephanie; Towne, Charles
Tay-Sachs disease (TSD) is the prototype for ethnic-based carrier screening, with a carrier rate of ∼1/27 in Ashkenazi Jews and French Canadians. HexA enzyme analysis is the current gold standard for TSD carrier screening (detection rate ∼98%), but has technical limitations. We compared DNA analysis by next-generation DNA sequencing (NGS) plus an assay for the 7.6 kb deletion to enzyme analysis for TSD carrier screening using 74 samples collected from participants at a TSD family conference. ...
Qu, Ling-Hui; Jin, Xin; Xu, Hai-Wei; Li, Shi-Ying; Yin, Zheng-Qin
Usher syndrome (USH) is the most common cause of combined blindness and deafness inherited in an autosomal recessive mode. Molecular diagnosis is of great significance in revealing the molecular pathogenesis and aiding the clinical diagnosis of this disease. However, molecular diagnosis remains a challenge due to high phenotypic and genetic heterogeneity in USH. This study explored an approach for detecting disease-causing genetic mutations in candidate genes in five index cases from unrelated USH families based on targeted next-generation sequencing (NGS) technology. Through systematic data analysis using an established bioinformatics pipeline and segregation analysis, 10 pathogenic mutations in the USH disease genes were identified in the five USH families. Six of these mutations were novel: c.4398G > A and EX38-49del in MYO7A, c.988_989delAT in USH1C, c.15104_15105delCA and c.6875_6876insG in USH2A. All novel variations segregated with the disease phenotypes in their respective families and were absent from ethnically matched control individuals. This study expanded the mutation spectrum of USH and revealed the genotype-phenotype relationships of the novel USH mutations in Chinese patients. Moreover, this study proved that targeted NGS is an accurate and effective method for detecting genetic mutations related to USH. The identification of pathogenic mutations is of great significance for elucidating the underlying pathophysiology of USH.
Hertel, Robert; Rodríguez, David Pintor; Hollensteiner, Jacqueline; Dietrich, Sascha; Leimbach, Andreas; Hoppert, Michael; Liesegang, Heiko; Volland, Sonja
Prophages are viruses, which have integrated their genomes into the genome of a bacterial host. The status of the prophage genome can vary from fully intact with the potential to form infective particles to a remnant state where only a few phage genes persist. Prophages have impact on the properties of their host and are therefore of great interest for genomic research and strain design. Here we present a genome- and next generation sequencing (NGS)-based approach for identification and activity evaluation of prophage regions. Seven prophage or prophage-like regions were identified in the genome of Bacillus licheniformis DSM13. Six of these regions show similarity to members of the Siphoviridae phage family. The remaining region encodes the B. licheniformis orthologue of the PBSX prophage from Bacillus subtilis. Analysis of isolated phage particles (induced by mitomycin C) from the wild-type strain and prophage deletion mutant strains revealed activity of the prophage regions BLi_Pp2 (PBSX-like), BLi_Pp3 and BLi_Pp6. In contrast to BLi_Pp2 and BLi_Pp3, neither phage DNA nor phage particles of BLi_Pp6 could be visualized. However, the ability of prophage BLi_Pp6 to generate particles could be confirmed by sequencing of particle-protected DNA mapping to prophage locus BLi_Pp6. The introduced NGS-based approach allows the investigation of prophage regions and their ability to form particles. Our results show that this approach increases the sensitivity of prophage activity analysis and can complement more conventional approaches such as transmission electron microscopy (TEM). PMID:25811873
Xu, Jiajia; Li, Yuanyuan; Ma, Xiuling; Ding, Jianfeng; Wang, Kai; Wang, Sisi; Tian, Ye; Zhang, Hui; Zhu, Xin-Guang
Setaria viridis is an emerging model species for genetic studies of C4 photosynthesis. Many basic molecular resources still need to be developed to support research on this species. In this paper, we performed a comprehensive transcriptome analysis of multiple developmental stages and tissues of S. viridis using next-generation sequencing technologies. Sequencing of the transcriptome from multiple tissues across three developmental stages (seed germination, vegetative growth, and reproduction) yielded a total of 71 million single-end 100 bp reads. Reference-based assembly using the Setaria italica genome as a reference generated 42,754 transcripts. De novo assembly generated 60,751 transcripts. In addition, 9,576 and 7,056 potential simple sequence repeats (SSRs) covering the S. viridis genome were identified when using the reference-based assembled transcripts and the de novo assembled transcripts, respectively. The transcripts and SSRs identified in this study can be used for both reverse and forward genetic studies based on S. viridis.
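As an illustration of how perfect SSRs can be located in assembled transcripts, the following is a minimal sketch using regular expressions. It is not the tool used in the study above; the function names and the minimum-repeat thresholds (6 for di-, 5 for tri-, 4 for tetra-nucleotide motifs) are assumptions chosen only for the example.

```python
import re

def is_primitive(motif):
    """True if the motif is not itself a repeat of a shorter unit (e.g. ATAT -> AT)."""
    n = len(motif)
    for d in range(1, n):
        if n % d == 0 and motif == motif[:d] * (n // d):
            return False
    return True

def find_ssrs(seq, min_repeats=None):
    """Return (start, motif, repeat_count) tuples for perfect SSRs in seq."""
    if min_repeats is None:
        # Hypothetical thresholds: motif length -> minimum number of tandem copies
        min_repeats = {2: 6, 3: 5, 4: 4}
    seq = seq.upper()
    hits = []
    for motif_len, min_rep in sorted(min_repeats.items()):
        # Group 2 captures one motif; the backreference demands further tandem copies
        pattern = re.compile(r"(([ACGT]{%d})\2{%d,})" % (motif_len, min_rep - 1))
        for m in pattern.finditer(seq):
            motif = m.group(2)
            if not is_primitive(motif):
                continue  # skip homopolymers and motifs already counted at a shorter length
            hits.append((m.start(), motif, len(m.group(1)) // motif_len))
    return hits

example = "GGC" + "AT" * 7 + "CCG" + "AGC" * 5 + "T"
print(find_ssrs(example))
```

The sketch reports only perfect repeats; compound or interrupted SSRs, which dedicated tools also report, would need additional logic.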
Gil, Jinsu; Um, Yurry; Kim, Serim; Kim, Ok Tae; Koo, Sung Cheol; Reddy, Chinreddy Subramanyam; Kim, Seong-Cheol; Hong, Chang Pyo; Park, Sin-Gi; Kim, Ho Bang; Lee, Dong Hoon; Jeong, Byung-Hoon; Chung, Jong-Wook; Lee, Yi
Angelica gigas Nakai is an important medicinal herb, widely utilized in Asian countries, especially in Korea, Japan, and China. Although it is a vital medicinal herb, the lack of sequencing data and efficient molecular markers has limited the application of a genetic approach for horticultural improvements. Simple sequence repeats (SSRs) are universally accepted molecular markers for population structure study. In this study, we found over 130,000 SSRs, ranging from di- to deca-nucleotide motifs, using the genome sequence of the Manchu variety (MV) of A. gigas, derived from next generation sequencing (NGS). From the putative SSR regions identified, a total of 16,496 primer sets were successfully designed. Among them, we selected 848 SSR markers that showed polymorphism from in silico analysis and contained tri- to hexa-nucleotide motifs. We tested 36 SSR primer sets for polymorphism in 16 A. gigas accessions. The average polymorphism information content (PIC) was 0.69; the average observed heterozygosity (Ho) and expected heterozygosity (He) values were 0.53 and 0.73, respectively. These newly developed SSR markers would be useful tools for molecular genetics, genotype identification, genetic mapping, molecular breeding, and studying species relationships of the Angelica genus.
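The marker statistics quoted above can be computed from diploid genotype calls. The sketch below assumes the standard definitions (expected heterozygosity as one minus the sum of squared allele frequencies, and PIC in the Botstein et al. form); the abstract does not state which formulas or software were used, and the genotype data here are invented for illustration.

```python
from collections import Counter
from itertools import combinations

def marker_stats(genotypes):
    """Return (Ho, He, PIC) for one marker from a list of diploid genotypes.

    genotypes: list of (allele_a, allele_b) tuples, one per accession.
    """
    alleles = [allele for genotype in genotypes for allele in genotype]
    total = len(alleles)
    freqs = [count / total for count in Counter(alleles).values()]

    sum_sq = sum(p * p for p in freqs)
    h_e = 1.0 - sum_sq  # expected heterozygosity
    # PIC subtracts the probability of uninformative matings between heterozygotes
    pic = h_e - sum(2 * p * p * q * q for p, q in combinations(freqs, 2))
    # Observed heterozygosity: fraction of accessions carrying two different alleles
    h_o = sum(1 for a, b in genotypes if a != b) / len(genotypes)
    return h_o, h_e, pic

# Hypothetical allele sizes (bp) for four accessions at one SSR locus
calls = [(150, 150), (150, 154), (154, 154), (150, 154)]
print(marker_stats(calls))
```

Averaging these per-marker values over all tested loci would give summary figures of the kind reported in the abstract.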
Chen, Hui [Department of Pathology, The University of Texas MD Anderson Cancer Center, 1515 Holcombe Blvd, Houston, TX 77030 (United States)]; Luthra, Rajyalakshmi; Goswami, Rashmi S.; Singh, Rajesh R. [Department of Hematopathology, The University of Texas MD Anderson Cancer Center, 1515 Holcombe Blvd, Houston, TX 77030 (United States)]; Roy-Chowdhuri, Sinchita [Department of Pathology, The University of Texas MD Anderson Cancer Center, 1515 Holcombe Blvd, Houston, TX 77030 (United States)]
Application of next-generation sequencing (NGS) technology to routine clinical practice has enabled characterization of personalized cancer genomes to identify patients likely to have a response to targeted therapy. The proper selection of tumor sample for downstream NGS based mutational analysis is critical to generate accurate results and to guide therapeutic intervention. However, multiple pre-analytic factors come into play in determining the success of NGS testing. In this review, we discuss pre-analytic requirements for AmpliSeq PCR-based sequencing using Ion Torrent Personal Genome Machine (PGM) (Life Technologies), a NGS sequencing platform that is often used by clinical laboratories for sequencing solid tumors because of its low input DNA requirement from formalin fixed and paraffin embedded tissue. The success of NGS mutational analysis is affected not only by the input DNA quantity but also by several other factors, including the specimen type, the DNA quality, and the tumor cellularity. Here, we review tissue requirements for solid tumor NGS based mutational analysis, including procedure types, tissue types, tumor volume and fraction, decalcification, and treatment effects.
Mollerup, Sarah; Friis-Nielsen, Jens; Vinner, Lasse
Propionibacterium acnes is the most abundant bacterium on human skin, particularly in sebaceous areas. P. acnes is suggested to be an opportunistic pathogen involved in the development of diverse medical conditions, but is also a proven contaminant of human samples and surgical wounds. Its significance as a pathogen is consequently a matter of debate. In the present study we investigated the presence of P. acnes DNA in 250 next generation sequencing datasets generated from 180 samples of 20 different sample types, mostly of cancerous origin. The samples were either subjected to microbial enrichment, involving nuclease treatment to reduce the amount of host nucleic acids, or shotgun-sequenced. We detected high proportions of P. acnes in enriched samples, particularly skin derived and other tissue samples, with levels being higher in enriched compared to shotgun-sequenced samples. P. acnes...
Richard Cronn; Brian J. Knaus; Aaron Liston; Peter J. Maughan; Matthew Parks; John V. Syring; Joshua. Udall
The dramatic advances offered by modern DNA sequencers continue to redefine the limits of what can be accomplished in comparative plant biology. Even with recent achievements, however, plant genomes present obstacles that can make it difficult to execute large-scale population and phylogenetic studies on next-generation sequencing platforms. Factors like large genome...
Hasan, Mohammad R.; Rawat, Arun; Tang, Patrick; Jithesh, Puthen V.; Thomas, Eva; Tan, Rusung; Tilley, Peter
Next-generation sequencing (NGS) technology has shown promise for the detection of human pathogens from clinical samples. However, one of the major obstacles to the use of NGS in diagnostic microbiology is the low ratio of pathogen DNA to human DNA in most clinical specimens. In this study, we aimed to develop a specimen-processing protocol to remove human DNA and enrich specimens for bacterial and viral DNA for shotgun metagenomic sequencing. Cerebrospinal fluid (CSF) and nasopharyngeal aspi...
Tindall Elizabeth A
Background: High-throughput custom designed genotyping arrays are a valuable resource for biologically focused research studies and increasingly for validation of variation predicted by next-generation sequencing (NGS) technologies. We investigate the Illumina GoldenGate chemistry using custom designed VeraCode and Sentrix Array Matrix (SAM) assays for each of these applications, respectively. We highlight applications for interpretation of Illumina generated genotype cluster plots to maximise data inclusion and reduce genotyping errors. Findings: We illustrate the dramatic effect of outliers in genotype calling and data interpretation, and suggest simple means to avoid genotyping errors. Furthermore, we present this platform as a successful method for two-cluster rare or non-autosomal variant calling. The ability of high-throughput technologies to accurately call rare variants will become an essential feature for future association studies. Finally, we highlight additional advantages of the Illumina GoldenGate chemistry in generating unusually segregated cluster plots that identify potential NGS generated sequencing error resulting from minimal coverage. Conclusions: We demonstrate the importance of visually inspecting genotype cluster plots generated by the Illumina software and issue warnings regarding commonly accepted quality control parameters. In addition to suggesting applications to minimise data exclusion, we propose that the Illumina cluster plots may be helpful in identifying potential input sequence errors, which is particularly important for studies to validate NGS generated variation.
Seo, Dong-Won; Oh, Jae-Don; Jin, Shil; Song, Ki-Duk; Park, Hee-Bok; Heo, Kang-Nyeong; Shin, Younhee; Jung, Myunghee; Park, Junhyung; Jo, Cheorun; Lee, Hak-Kyo; Lee, Jun-Heon
There are five native chicken lines in Korea, which are mainly classified by plumage color (black, white, red, yellow, gray). These five lines are very important genetic resources in the Korean poultry industry. Using next-generation sequencing technology, we sequenced the whole genomes of these lines and assembled them against the Gallus_gallus_4.0 (NCBI) reference to identify common and novel single nucleotide polymorphisms (SNPs). We obtained 36,660,731,136 ± 1,257,159,120 bp of raw sequence and an average of 26.6-fold of 25-29 billion reference assembly sequences, representing 97.288% coverage. Also, 4,006,068 ± 97,534 SNPs were observed on 29 autosomes and the Z chromosome and, of these, 752,309 SNPs were common across lines. Among the identified SNPs, the numbers of novel- and known-location assigned SNPs were 1,047,951 ± 14,956 and 2,948,648 ± 81,414, respectively. The number of unassigned known SNPs was 1,181 ± 150 and that of unassigned novel SNPs was 8,238 ± 1,019. Synonymous SNPs, non-synonymous SNPs, and SNPs having character changes numbered 26,266 ± 1,456, 11,467 ± 604, and 8,180 ± 458, respectively. Overall, 443,048 ± 26,389 SNPs in each bird were identified by comparison with dbSNP in NCBI. The genome sequence and SNP information obtained here for Korean native chickens have wide applications for further genome studies, such as genetic diversity studies to detect causative mutations for economic and disease-related traits.
Cahill, Matt J.; Köser, Claudio U.; Ross, Nicholas E.; Archer, John A.C.
Background: There are a growing number of next-generation sequencing technologies. At present, the most cost-effective options also produce the shortest reads. However, even for prokaryotes, there is uncertainty concerning the utility of these technologies for the de novo assembly of complete genomes. This reflects an expectation that short reads will be unable to resolve small, but presumably abundant, repeats. Methodology/Principal Findings: Using a simple model of repeat assembly, we develop and test a technique that, for any read length, can estimate the occurrence of unresolvable repeats in a genome, and thus predict the number of gaps that would need to be closed to produce a complete sequence. We apply this technique to 818 prokaryote genome sequences. This provides a quantitative assessment of the relative performance of various lengths. Notably, unpaired reads of only 150nt can reconstruct approximately 50% of the analysed genomes with fewer than 96 repeat-induced gaps. Nonetheless, there is considerable variation amongst prokaryotes. Some genomes can be assembled to near contiguity using very short reads while others require much longer reads. Conclusions: Given the diversity of prokaryote genomes, a sequencing strategy should be tailored to the organism under study. Our results will provide researchers with a practical resource to guide the selection of the appropriate read length. 2010 Cahill et al.
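To make the relationship between read length and unresolvable repeats concrete, the sketch below counts sequences of a given length that occur more than once in a genome. This is a deliberately crude proxy and not the authors' repeat-assembly model, which the abstract does not describe in detail; the function names and the toy sequence are assumptions, and reverse-complement repeats are ignored.

```python
from collections import Counter

def repeated_kmers(genome, k):
    """Return the k-mers that occur more than once in the genome, with their counts."""
    counts = Counter(genome[i:i + k] for i in range(len(genome) - k + 1))
    return {kmer: n for kmer, n in counts.items() if n > 1}

def repeat_profile(genome, read_lengths):
    """Map each read length to the number of distinct repeated sequences of that length.

    A repeat at least as long as the read cannot be spanned by a single unpaired
    read, so a falling count suggests fewer repeat-induced gaps.
    """
    return {length: len(repeated_kmers(genome, length)) for length in read_lengths}

toy_genome = "ACGTACGTTTGACCA"  # invented sequence, far shorter than any real genome
print(repeat_profile(toy_genome, [3, 4, 5]))
```

On the toy sequence the count drops to zero once the length exceeds the longest repeat, which mirrors the qualitative point that some genomes assemble well from very short reads while others need longer ones.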
Ruffalo, Matthew; Koyutürk, Mehmet; Ray, Soumya; LaFramboise, Thomas
Motivation: Several software tools specialize in the alignment of short next-generation sequencing reads to a reference sequence. Some of these tools report a mapping quality score for each alignment; in principle, this quality score tells researchers the likelihood that the alignment is correct. However, the reported mapping quality often correlates weakly with actual accuracy and the qualities of many mappings are underestimated, encouraging the researchers to discard correct mappings. Further, these low-quality mappings tend to correlate with variations in the genome (both single nucleotide and structural), and such mappings are important in accurately identifying genomic variants. Approach: We develop a machine learning tool, LoQuM (LOgistic regression tool for calibrating the Quality of short read mappings), to assign reliable mapping quality scores to mappings of Illumina reads returned by any alignment tool. LoQuM uses statistics on the read (base quality scores reported by the sequencer) and the alignment (number of matches, mismatches and deletions, mapping quality score returned by the alignment tool, if available, and number of mappings) as features for classification and uses simulated reads to learn a logistic regression model that relates these features to actual mapping quality. Results: We test the predictions of LoQuM on an independent dataset generated by the ART short read simulation software and observe that LoQuM can ‘resurrect’ many mappings that are assigned zero quality scores by the alignment tools and are therefore likely to be discarded by researchers. We also observe that the recalibration of mapping quality scores greatly enhances the precision of called single nucleotide polymorphisms. Availability: LoQuM is available as open source at http://compbio.case.edu/loqum/. PMID:22962451
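The general idea of recalibrating mapping quality with logistic regression can be sketched as follows. This is not LoQuM's code or feature set: the two features (mismatch count and number of candidate mappings), the training data, the learning rate and the cap of 60 are all assumptions made for the example, and the model is fitted with plain gradient descent to keep the block dependency-free.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic(rows, labels, lr=0.1, epochs=2000):
    """Fit weights [bias, w1, w2, ...] by batch gradient descent on the log loss."""
    weights = [0.0] * (len(rows[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(weights)
        for x, y in zip(rows, labels):
            err = predict(weights, x) - y
            grad[0] += err
            for j, xj in enumerate(x):
                grad[j + 1] += err * xj
        weights = [w - lr * g / len(rows) for w, g in zip(weights, grad)]
    return weights

def predict(weights, x):
    """Estimated probability that a mapping with features x is correct."""
    return sigmoid(weights[0] + sum(w * xi for w, xi in zip(weights[1:], x)))

def to_phred(p_correct, cap=60):
    """Convert a probability of correctness to a Phred-scaled quality score."""
    p_err = 1.0 - p_correct
    if p_err <= 0.0:
        return cap
    return min(cap, round(-10.0 * math.log10(p_err)))

# Invented simulated reads: (mismatches, number of candidate mappings) -> correct?
features = [(0, 1), (0, 1), (1, 1), (1, 2), (2, 1), (3, 2), (4, 3), (5, 4), (4, 2), (0, 2)]
correct = [1, 1, 1, 1, 1, 0, 0, 0, 0, 1]

model = fit_logistic(features, correct)
for x in [(0, 1), (2, 2), (5, 4)]:
    print(x, to_phred(predict(model, x)))
```

Because the labels come from simulated reads whose true origin is known, the fitted probability can be read directly as a calibrated estimate of mapping correctness, which is then expressed on the familiar Phred scale.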
To assess the clinical utility of targeted Next-Generation Sequencing (NGS) for the diagnosis of Inherited Retinal Dystrophies (IRDs), a total of 109 subjects were enrolled in the study, including 88 IRD affected probands and 21 healthy relatives. Clinical diagnoses included Retinitis Pigmentosa (RP), Leber Congenital Amaurosis (LCA), Stargardt Disease (STGD), Best Macular Dystrophy (BMD), Usher Syndrome (USH), and other IRDs with undefined clinical diagnosis. Participants underwent a complete ophthalmologic examination followed by genetic counseling. A custom AmpliSeq™ panel of 72 IRD-related genes was designed for the analysis and tested using Ion semiconductor NGS. Potential disease-causing mutations were identified in 59.1% of probands, comprising mutations in 16 genes. The highest diagnostic yields were achieved for BMD, LCA, USH, and STGD patients, whereas RP confirmed its high genetic heterogeneity. Causative mutations were identified in 17.6% of probands with undefined diagnosis. The initial diagnosis was revised for 9.6% of genetically diagnosed patients. This study demonstrates that NGS represents a comprehensive, cost-effective approach for the molecular diagnosis of IRDs. The identification of the genetic alterations underlying the phenotype enabled the clinicians to achieve a more accurate diagnosis. The results emphasize the importance of molecular diagnosis coupled with clinical information to unravel the extensive phenotypic heterogeneity of these diseases.
Tabatabaiefar, Mohammad Amin; Alipour, Paria; Pourahmadiyan, Azam; Fattahi, Najmeh; Shariati, Laleh; Golchin, Neda; Mohammadi-Asl, Javad
Ataxia telangiectasia (A-T) is a neurodegenerative autosomal recessive disorder whose main characteristics are progressive cerebellar degeneration, sensitivity to ionizing radiation, immunodeficiency, telangiectasia, premature aging, recurrent sinopulmonary infections, and increased risk of malignancy, especially of lymphoid origin. The Ataxia Telangiectasia Mutated gene, ATM, the causative gene for A-T, encodes the ATM protein, which plays an important role in the activation of cell-cycle checkpoints and initiation of DNA repair in response to DNA damage. Targeted next-generation sequencing (NGS) was performed on an Iranian 5-year-old boy who presented with truncal and limb ataxia, telangiectasia of the eye, Hodgkin lymphoma, hyperpigmentation, total alopecia, hepatomegaly, and dysarthria. Sanger sequencing was used to confirm the candidate pathogenic variants. Computational docking was done using the HEX software to examine how this change affects the interactions of ATM with the upstream and downstream proteins. Three different variants were identified, comprising two homozygous SNPs and one novel homozygous frameshift variant (c.8046_8047delTA, p.Thr2682ThrfsX5), which creates a stop codon in exon 57, leaving the protein truncated at its C-terminal portion. Therefore, the activation and phosphorylation of target proteins are lost. Moreover, the HEX docking analysis indicated that the mutated protein lost its interaction with upstream and downstream proteins. The variant was classified as pathogenic based on the American College of Medical Genetics and Genomics guideline. This study expands the spectrum of ATM pathogenic variants in Iran and demonstrates the utility of targeted NGS in genetic diagnostics. Copyright © 2017. Published by Elsevier B.V.
RNA sequencing is a powerful tool for studying RNomics. However, the highly abundant ribosomal RNAs (rRNAs) and transfer RNAs (tRNAs) predominate in the sequencing reads, thereby hindering the study of lowly expressed genes. Therefore, rRNA depletion prior to sequencing is often performed in order to preserve subtle alterations in gene expression, especially for genes at relatively low expression levels. One of the commercially available methods uses DNA or RNA probes that hybridize to the target RNAs. However, there is always a concern about non-specific binding and unintended removal of messenger RNA (mRNA) when the same set of probes is applied to different organisms. The degree of such unintended mRNA removal varies among organisms due to organism-specific genomic variation. We developed a computer-based method to design probes that deplete rRNA in an organism-specific manner. Based on the computation results, biotinylated RNA probes were produced by in vitro transcription and were used to perform rRNA depletion by subtractive hybridization. We demonstrated that the designed probes for 16S and 23S rRNAs can efficiently remove rRNAs from Mycobacterium smegmatis. In comparison with a commercial subtractive hybridization-based rRNA removal kit, organism-specific probes better preserve RNA integrity and abundance. We believe the computer-based design approach can be used as a generic method for preparing RNA from any organism for next-generation sequencing, particularly for the transcriptome analysis of microbes.
Zhang, Liangxuan; Chen, Liangjing; Sah, Sachin; Latham, Gary J; Patel, Rajesh; Song, Qinghua; Koeppen, Hartmut; Tam, Rachel; Schleifman, Erica; Mashhedi, Haider; Chalasani, Sreedevi; Fu, Ling; Sumiyoshi, Teiko; Raja, Rajiv; Forrest, William; Hampton, Garret M; Lackner, Mark R; Hegde, Priti; Jia, Shidong
The success of precision oncology relies on accurate and sensitive molecular profiling. The Ion AmpliSeq Cancer Panel, a targeted enrichment method for next-generation sequencing (NGS) using the Ion Torrent platform, provides a fast, easy, and cost-effective sequencing workflow for detecting genomic "hotspot" regions that are frequently mutated in human cancer genes. Most recently, the U.K. has launched the AmpliSeq sequencing test in its National Health Service. This study aimed to evaluate the clinical application of the AmpliSeq methodology. We used 10 ng of genomic DNA from formalin-fixed, paraffin-embedded human colorectal cancer (CRC) tumor specimens to sequence 46 cancer genes using the AmpliSeq platform. In a validation study, we developed an orthogonal NGS-based resequencing approach (SimpliSeq) to assess the AmpliSeq variant calls. Validated mutational analyses revealed that AmpliSeq was effective in profiling gene mutations, and that the method correctly pinpointed "true-positive" gene mutations with variant frequency >5% and demonstrated high-level molecular heterogeneity in CRC. However, AmpliSeq enrichment and NGS also produced several recurrent "false-positive" calls in clinically druggable oncogenes such as PIK3CA. AmpliSeq provided highly sensitive and quantitative mutation detection for most of the genes on its cancer panel using limited DNA quantities from formalin-fixed, paraffin-embedded samples. For those genes with recurrent "false-positive" variant calls, caution should be used in data interpretation, and orthogonal verification of mutations is recommended for clinical decision making.
Cefalù, Angelo B; Spina, Rossella; Noto, Davide; Ingrassia, Valeria; Valenti, Vincenza; Giammanco, Antonina; Fayer, Francesca; Misiano, Gabriella; Cocorullo, Gianfranco; Scrimali, Chiara; Palesano, Ornella; Altieri, Grazia I; Ganci, Antonina; Barbagallo, Carlo M; Averna, Maurizio R
Severe hypertriglyceridemia (HTG) may result from mutations in genes affecting the intravascular lipolysis of triglyceride (TG)-rich lipoproteins. The aim of this study was to develop a targeted next-generation sequencing panel for the molecular diagnosis of disorders characterized by severe HTG. We developed a targeted customized panel for next-generation sequencing on the Ion Torrent Personal Genome Machine to capture the coding exons and intron/exon boundaries of 18 genes affecting the main pathways of TG synthesis and metabolism. We sequenced 11 samples from patients with severe HTG (TG > 885 mg/dL, i.e. 10 mmol/L): 4 positive controls in whom pathogenic mutations had previously been identified by Sanger sequencing and 7 patients in whom the molecular defect was still unknown. The customized panel was accurate, and it confirmed the genetic variants previously identified in all positive controls with primary severe HTG. Only 1 of the 7 patients with HTG was found to be a carrier of a homozygous pathogenic mutation, the third novel mutation of the LMF1 gene (c.1380C>G, p.Y460X). Clinical and molecular familial cascade screening allowed the identification of 2 additional affected siblings and 7 heterozygous carriers of the mutation. We showed that our targeted resequencing approach for genetic diagnosis of severe HTG appears to be accurate, less time consuming, and more economical compared with traditional Sanger resequencing. The identification of pathogenic mutations in candidate genes remains challenging, and clinical resequencing should mainly be intended for patients with strong clinical criteria for monogenic severe HTG. Copyright © 2017 National Lipid Association. Published by Elsevier Inc. All rights reserved.
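The two TG thresholds quoted in the abstract are the same cut-off in different units. A short sketch of the conversion, assuming the commonly used factor of about 88.57 mg/dL per mmol/L for triglycerides (a value not stated in the abstract):

```python
# Assumed conversion factor for triglycerides: 1 mmol/L is roughly 88.57 mg/dL
TG_MGDL_PER_MMOLL = 88.57

def tg_mgdl_to_mmoll(mg_per_dl):
    """Convert a triglyceride concentration from mg/dL to mmol/L."""
    return mg_per_dl / TG_MGDL_PER_MMOLL

def tg_mmoll_to_mgdl(mmol_per_l):
    """Convert a triglyceride concentration from mmol/L to mg/dL."""
    return mmol_per_l * TG_MGDL_PER_MMOLL

print(round(tg_mgdl_to_mmoll(885), 1))  # severe HTG cut-off expressed in mmol/L
```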
Thaitrong, Numrin; Kim, Hanyoup; Renzi, Ronald F; Bartsch, Michael S; Meagher, Robert J; Patel, Kamlesh D
We have developed an automated quality control (QC) platform for next-generation sequencing (NGS) library characterization by integrating a droplet-based digital microfluidic (DMF) system with a capillary-based reagent delivery unit and a quantitative CE module. Using an in-plane capillary-DMF interface, a prepared sample droplet was actuated into position between the ground electrode and the inlet of the separation capillary to complete the circuit for an electrokinetic injection. Using a DNA ladder as an internal standard, the CE module with a compact LIF detector was capable of detecting dsDNA in the range of 5-100 pg/μL, suitable for the amount of DNA required by the Illumina Genome Analyzer sequencing platform. This DMF-CE platform consumes tenfold less sample volume than the current Agilent BioAnalyzer QC technique, preserving precious sample while providing necessary sensitivity and accuracy for optimal sequencing performance. The ability of this microfluidic system to validate NGS library preparation was demonstrated by examining the effects of limited-cycle PCR amplification on the size distribution and the yield of Illumina-compatible libraries, demonstrating that as few as ten cycles of PCR bias the size distribution of the library toward undesirable larger fragments. © 2012 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Yamamoto, F; Höglund, B; Fernandez-Vina, M; Tyan, D; Rastrou, M; Williams, T; Moonsamy, P; Goodridge, D; Anderson, M; Erlich, H A; Holcomb, C L
Compared to Sanger sequencing, next-generation sequencing offers advantages for high resolution HLA genotyping including increased throughput, lower cost, and reduced genotype ambiguity. Here we describe an enhancement of the Roche 454 GS GType HLA genotyping assay to provide very high resolution (VHR) typing, by the addition of 8 primer pairs to the original 14, to genotype 11 HLA loci. These additional amplicons help resolve common and well-documented alleles and exclude commonly found null alleles in genotype ambiguity strings. Simplification of workflow to reduce the initial preparation effort using early pooling of amplicons or the Fluidigm Access Array™ is also described. Performance of the VHR assay was evaluated on 28 well characterized cell lines using Conexio Assign MPS software which uses genomic, rather than cDNA, reference sequence. Concordance was 98.4%; 1.6% had no genotype assignment. Of concordant calls, 53% were unambiguous. To further assess the assay, 59 clinical samples were genotyped and results compared to unambiguous allele assignments obtained by prior sequence-based typing supplemented with SSO and/or SSP. Concordance was 98.7% with 58.2% as unambiguous calls; 1.3% could not be assigned. Our results show that the amplicon-based VHR assay is robust and can replace current Sanger methodology. Together with software enhancements, it has the potential to provide even higher resolution HLA typing. Copyright © 2015. Published by Elsevier Inc.
Applying Unique Molecular Identifiers in Next Generation Sequencing Reveals a Constrained Viral Quasispecies Evolution under Cross-Reactive Antibody Pressure Targeting Long Alpha Helix of Hemagglutinin
Hauck, Nastasja C.; Kirpach, Josiane; Kiefer, Christina; Farinelle, Sophie; Morris, Stephen A.; Muller, Claude P.; Lu, I-Na
To overcome yearly efforts and costs for the production of seasonal influenza vaccines, new approaches for the induction of broadly protective and long-lasting immune responses have been developed in the past decade. To warrant safety and efficacy of the emerging crossreactive vaccine candidates, it is critical to understand the evolution of influenza viruses in response to these new immune pressures. Here we applied unique molecular identifiers in next generation sequencing to analyze the evolution of influenza quasispecies under in vivo antibody pressure targeting the hemagglutinin (HA) long alpha helix (LAH). Our vaccine targeting LAH of hemagglutinin elicited significant seroconversion and protection against homologous and heterologous influenza virus strains in mice. The vaccine not only significantly reduced lung viral titers, but also induced a well-known bottleneck effect by decreasing virus diversity. In contrast to the classical bottleneck effect, here we showed a significant increase in the frequency of viruses with amino acid sequences identical to that of vaccine targeting LAH domain. No escape mutant emerged after vaccination. These results not only support the potential of a universal influenza vaccine targeting the conserved LAH domains, but also clearly demonstrate that the well-established bottleneck effect on viral quasispecies evolution does not necessarily generate escape mutants. PMID:29587397
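The abstract does not describe how the unique molecular identifiers (UMIs) were processed, so the following is only a minimal sketch of the general idea, under stated assumptions: reads sharing a UMI are taken to be copies of one original template, each UMI group is collapsed to a per-position majority consensus, and variant frequencies are then counted per template rather than per read. The function names and the toy reads are hypothetical and are not taken from the study.

```python
from collections import Counter, defaultdict


def collapse_by_umi(reads):
    """Collapse (umi, sequence) pairs into one consensus sequence per UMI."""
    groups = defaultdict(list)
    for umi, seq in reads:
        groups[umi].append(seq)

    consensus = {}
    for umi, seqs in groups.items():
        # Keep only reads of the most common length so columns line up.
        length = Counter(len(s) for s in seqs).most_common(1)[0][0]
        seqs = [s for s in seqs if len(s) == length]
        # Majority vote at each position removes sporadic PCR/sequencing errors.
        consensus[umi] = "".join(
            Counter(column).most_common(1)[0][0] for column in zip(*seqs)
        )
    return consensus


def variant_frequencies(consensus):
    """Frequency of each distinct consensus sequence, counted once per UMI."""
    counts = Counter(consensus.values())
    total = sum(counts.values())
    return {seq: n / total for seq, n in counts.items()}


# Hypothetical toy data: three templates, one read carrying a single error.
reads = [
    ("AAGT", "ACGTAC"), ("AAGT", "ACGTAC"), ("AAGT", "ACGAAC"),
    ("CCTA", "ACGTAC"),
    ("GGAT", "ACGTTC"), ("GGAT", "ACGTTC"),
]
consensus = collapse_by_umi(reads)
freqs = variant_frequencies(consensus)
print(consensus)
print(freqs)
```

Counting per UMI rather than per read keeps heavily amplified templates from inflating the apparent frequency of a variant, which matters when comparing quasispecies diversity between groups.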
The application of next generation sequencing technology has greatly facilitated high throughput single nucleotide polymorphism (SNP) discovery and genotyping in genetic research. In the present study, SNPs were discovered based on two transcriptomes of Litopenaeus vannamei (L. vannamei) generated on the Illumina HiSeq 2000 sequencing platform. One transcriptome of L. vannamei was obtained by sequencing RNA from larvae at the mysis stage, and its reference sequence was assembled de novo. The data for the other transcriptome were downloaded from NCBI, and the reads of the two transcriptomes were mapped separately to the assembled reference by BWA. SNP calling was performed using SAMtools. A total of 58,717 and 36,277 high-quality SNPs were predicted from the two transcriptomes, respectively. SNP calling was also performed using the reads of the two transcriptomes together, and a total of 96,040 high-quality SNPs were predicted. Among these 96,040 SNPs, 5,242 and 29,129 were predicted as non-synonymous and synonymous SNPs, respectively. Characterization of the predicted SNPs in L. vannamei showed that the estimated SNP frequency was 0.21% (one SNP per 476 bp) and the estimated ratio of transitions to transversions was 2.0. Fifty SNPs were randomly selected for validation by Sanger sequencing after PCR amplification and 76% of SNPs were confirmed, which indicated that the SNPs predicted in this study were reliable. These SNPs will be very useful for genetic study in L. vannamei, especially for high density linkage map construction and genome-wide association studies.
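The two summary statistics quoted above can be checked with a few lines of arithmetic. This is a small illustrative sketch, not the authors' pipeline: the helper names are hypothetical, and the list of reference/alternate allele pairs is toy data chosen to give a transition/transversion ratio of 2.0.

```python
# Purine<->purine and pyrimidine<->pyrimidine changes are transitions;
# every other single-base substitution is a transversion.
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def snp_frequency_percent(bp_per_snp):
    """Convert 'one SNP per N bp' into a percentage of positions."""
    return 100.0 / bp_per_snp


def ts_tv_ratio(snps):
    """Ratio of transitions to transversions for (ref, alt) allele pairs."""
    ts = sum(1 for pair in snps if pair in TRANSITIONS)
    tv = len(snps) - ts
    return ts / tv if tv else float("inf")


freq = snp_frequency_percent(476)
print(round(freq, 2))  # one SNP per 476 bp, expressed as a percentage

# Hypothetical toy SNP list: four transitions and two transversions.
toy_snps = [("A", "G"), ("C", "T"), ("G", "A"), ("T", "C"), ("A", "C"), ("G", "T")]
ratio = ts_tv_ratio(toy_snps)
print(ratio)
```

The first value agrees with the 0.21% reported in the abstract; the ratio is computed on the toy list only and says nothing about the real data beyond showing how the statistic is defined.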
Łopacińska-Jørgensen, Joanna M; Pedersen, Jonas Nyvold; Bak, Mads
Next-generation sequencing (NGS) has caused a revolution, yet left a gap: long-range genetic information from native, non-amplified DNA fragments is unavailable. It might be obtained by optical mapping of megabase-sized DNA molecules. Frequently only a specific genomic region is of interest, so......-megabase- to megabase-sized DNA molecules were recovered from the gel and analysed by denaturation-renaturation optical mapping. Size-selected molecules from the same gel were sequenced by NGS. The optically mapped molecules and the NGS reads showed enrichment from regions defined by NotI restriction sites. We...... demonstrate that the unannotated genome can be characterized in a locus-specific manner via molecules partially overlapping with the annotated genome. The method is a promising tool for investigation of structural variants in enriched human genomic regions for both research and diagnostic purposes. Our...
Barrett's esophagus (BE) is the transition from squamous to columnar mucosa as a result of gastroesophageal reflux disease (GERD). The role of microRNA during this transition has not been systematically studied. For initial screening, total RNA from 5 GERD and 6 BE patients was size fractionated. RNA <70 nucleotides was subjected to SOLiD 3 library preparation and next generation sequencing (NGS). Bioinformatics analysis was performed using R package "DEseq". A p value<0.05 adjusted for a false discovery rate of 5% was considered significant. NGS-identified miRNA were validated using qRT-PCR in an independent group of 40 GERD and 27 BE patients. MicroRNA expression of human BE tissues was also compared with three BE cell lines. NGS detected 19.6 million raw reads per sample. 53.1% of filtered reads mapped to miRBase version 18. NGS analysis followed by qRT-PCR validation found 10 differentially expressed miRNA; several are novel (-708-5p, -944, -224-5p and -3065-5p). Up- or down-regulation predicted by NGS was matched by qRT-PCR in every case. Human BE tissues and BE cell lines showed a high degree of concordance (70-80%) in miRNA expression. Prediction analysis identified targets that mapped to developmental signaling pathways such as TGFβ and Notch and inflammatory pathways such as toll-like receptor signaling and TGFβ. Cluster analysis found similarly regulated (up or down) miRNA to share common targets, suggesting coordination between miRNA. Using highly sensitive next-generation sequencing, we have performed a comprehensive genome-wide analysis of microRNA in BE and GERD patients. Differentially expressed miRNA between BE and GERD have been further validated. Expression of miRNA in human BE tissues and BE cell lines is highly correlated. These miRNA should be studied in biological models to further understand BE development.
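The significance rule above (adjusted p value below 0.05 at a 5% false discovery rate) is commonly implemented with the Benjamini-Hochberg procedure. Whether the authors used exactly this adjustment is an assumption; the sketch below only illustrates how such an adjustment works, with hypothetical raw p values and a hypothetical function name.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p values in the input order."""
    m = len(pvalues)
    # Indices of the p values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p values for five miRNAs (not data from the study).
raw = [0.001, 0.008, 0.039, 0.041, 0.60]
adj = benjamini_hochberg(raw)
significant = [p < 0.05 for p in adj]
print(adj)
print(significant)
```

In this toy input, two raw p values that are individually below 0.05 (0.039 and 0.041) no longer pass after adjustment, which is the practical effect of controlling the false discovery rate across many miRNAs tested at once.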
Kinoti, Wycliff M; Constable, Fiona E; Nancarrow, Narelle; Plummer, Kim M; Rodoni, Brendan
PCR amplicon next generation sequencing (NGS) analysis offers a broadly applicable and targeted approach to detect populations of both high- or low-frequency virus variants in one or more plant samples. In this study, amplicon NGS was used to explore the diversity of the tripartite genome virus, Prunus necrotic ringspot virus (PNRSV) from 53 PNRSV-infected trees using amplicons from conserved gene regions of each of PNRSV RNA1, RNA2 and RNA3. Sequencing of the amplicons from 53 PNRSV-infected trees revealed differing levels of polymorphism across the three different components of the PNRSV genome with a total number of 5040, 2083 and 5486 sequence variants observed for RNA1, RNA2 and RNA3 respectively. The RNA2 had the lowest diversity of sequences compared to RNA1 and RNA3, reflecting the lack of flexibility tolerated by the replicase gene that is encoded by this RNA component. Distinct PNRSV phylo-groups, consisting of closely related clusters of sequence variants, were observed in each of PNRSV RNA1, RNA2 and RNA3. Most plant samples had a single phylo-group for each RNA component. Haplotype network analysis showed that smaller clusters of PNRSV sequence variants were genetically connected to the largest sequence variant cluster within a phylo-group of each RNA component. Some plant samples had sequence variants occurring in multiple PNRSV phylo-groups in at least one of each RNA and these phylo-groups formed distinct clades that represent PNRSV genetic strains. Variants within the same phylo-group of each Prunus plant sample had ≥97% similarity and phylo-groups within a Prunus plant sample and between samples had ≤97% similarity. Based on the analysis of diversity, a definition of a PNRSV genetic strain was proposed. The proposed definition was applied to determine the number of PNRSV genetic strains in each of the plant samples and the complexity in defining genetic strains in multipartite genome viruses was explored.
van den Akker, Jeroen; Mishne, Gilad; Zimmer, Anjali D; Zhou, Alicia Y
Next generation sequencing (NGS) has become a common technology for clinical genetic tests. The quality of NGS calls varies widely and is influenced by features like reference sequence characteristics, read depth, and mapping accuracy. With recent advances in NGS technology and software tools, the majority of variants called using NGS alone are in fact accurate and reliable. However, a small subset of difficult-to-call variants that still require orthogonal confirmation exists. For this reason, many clinical laboratories confirm NGS results using orthogonal technologies such as Sanger sequencing. Here, we report the development of a deterministic machine-learning-based model to differentiate between these two types of variant calls: those that do not require confirmation using an orthogonal technology (high confidence), and those that require additional quality testing (low confidence). This approach allows reliable NGS-based calling in a clinical setting by identifying the few important variant calls that require orthogonal confirmation. We developed and tested the model using a set of 7179 variants identified by a targeted NGS panel and re-tested by Sanger sequencing. The model incorporated several signals of sequence characteristics and call quality to determine if a variant was identified at high or low confidence. The model was tuned to eliminate false positives, defined as variants that were called by NGS but not confirmed by Sanger sequencing. The model achieved very high accuracy: 99.4% (95% confidence interval: +/- 0.03%). It categorized 92.2% (6622/7179) of the variants as high confidence, and 100% of these were confirmed to be present by Sanger sequencing. Among the variants that were categorized as low confidence, defined as NGS calls of low quality that are likely to be artifacts, 92.1% (513/557) were found to be not present by Sanger sequencing. This work shows that NGS data contains sufficient characteristics for a machine-learning-based model to
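The counts quoted in this abstract are internally consistent, which a short worked calculation makes explicit. The variable names are hypothetical, and the accuracy line rests on one assumption that the abstract does not spell out: that a call counts as correct when it is either high confidence and confirmed by Sanger, or low confidence and not confirmed.

```python
# Counts taken directly from the abstract.
total_variants = 7179
high_confidence = 6622           # all confirmed by Sanger sequencing
low_confidence = total_variants - high_confidence
low_conf_not_confirmed = 513     # low-confidence calls absent by Sanger

# Shares reported in the abstract.
high_conf_share = 100.0 * high_confidence / total_variants
artifact_share = 100.0 * low_conf_not_confirmed / low_confidence

# Assumption: "correct" = high confidence and confirmed, or
# low confidence and not confirmed.
correct = high_confidence + low_conf_not_confirmed
accuracy = 100.0 * correct / total_variants

# Low-confidence calls that were real variants after all.
low_conf_confirmed = low_confidence - low_conf_not_confirmed

print(low_confidence, low_conf_confirmed)
print(round(high_conf_share, 1), round(artifact_share, 1), round(accuracy, 1))
```

Under that assumption the arithmetic reproduces the reported 92.2%, 92.1% and 99.4% figures, and it shows that only the 557 low-confidence calls would need orthogonal confirmation.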
Vermeulen, Elke T; Lott, Matthew J; Eldridge, Mark D B; Power, Michelle L
Next-generation sequencing (NGS) techniques are well-established for studying bacterial communities but not yet for microbial eukaryotes. Parasite communities remain poorly studied, due in part to the lack of reliable and accessible molecular methods to analyse eukaryotic communities. We aimed to develop and evaluate a methodology to analyse communities of the protozoan parasite Eimeria from populations of the Australian marsupial Petrogale penicillata (brush-tailed rock-wallaby) using NGS. An oocyst purification method for small sample sizes and polymerase chain reaction (PCR) protocol for the 18S rRNA locus targeting Eimeria was developed and optimised prior to sequencing on the Illumina MiSeq platform. A data analysis approach was developed by modifying methods from bacterial metagenomics and utilising existing Eimeria sequences in GenBank. Operational taxonomic unit (OTU) assignment at a high similarity threshold (97%) was more accurate at assigning Eimeria contigs into Eimeria OTUs but at a lower threshold (95%) there was greater resolution between OTU consensus sequences. The assessment of two amplification PCR methods prior to Illumina MiSeq, single and nested PCR, determined that single PCR was more sensitive to Eimeria as more Eimeria OTUs were detected in single amplicons. We have developed a simple and cost-effective approach to a data analysis pipeline for community analysis of eukaryotic organisms using Eimeria communities as a model. The pipeline provides a basis for evaluation using other eukaryotic organisms and potential for diverse community analysis studies. Copyright © 2016 Elsevier B.V. All rights reserved.
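To make the effect of the OTU similarity threshold concrete, here is a minimal sketch of greedy centroid clustering on pre-aligned, equal-length sequences. It is not the pipeline described in the abstract: the function names, the simple mismatch-based identity measure and the toy sequences are all assumptions made for illustration.

```python
def identity(a, b):
    """Fraction of matching positions between two aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_otus(seqs, threshold):
    """Assign each sequence to the first centroid it matches at >= threshold."""
    centroids = []
    assignment = []
    for seq in seqs:
        for idx, centroid in enumerate(centroids):
            if identity(seq, centroid) >= threshold:
                assignment.append(idx)
                break
        else:
            # No existing centroid is similar enough: start a new OTU.
            centroids.append(seq)
            assignment.append(len(centroids) - 1)
    return centroids, assignment


def mutate(seq, positions):
    """Return seq with a guaranteed substitution at each listed position."""
    chars = list(seq)
    for p in positions:
        chars[p] = "A" if chars[p] != "A" else "C"
    return "".join(chars)


# Hypothetical 100-bp toy sequences: 2 and 4 mismatches against the base.
base = "ACGT" * 25
seqs = [base, mutate(base, [0, 1]), mutate(base, [10, 11, 12, 13])]

otus_97, _ = greedy_otus(seqs, 0.97)
otus_95, _ = greedy_otus(seqs, 0.95)
print(len(otus_97), len(otus_95))
```

In the toy data, the sequence at 96% identity forms its own OTU at the 97% threshold but merges with the base sequence at 95%, so the threshold directly controls how many OTUs are reported.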
At its core, the work of clinical microbiologists consists of retrieving a few bytes of information (species identification; metabolic capacities; staining and antigenic properties; antibiotic resistance profiles, etc.) from pathogenic agents. The development of next generation sequencing technologies (NGS), and the possibility of determining the entire genome of bacterial pathogens, fungi and protozoans, will likely introduce a breakthrough in the amount of information generated by clinical microbiology laboratories: from bytes to megabytes of information for a single isolate. In parallel, the development of novel informatics tools, designed for the management and analysis of so-called Big Data, offers the possibility to search for patterns in databases collecting genomic and microbiological information on the pathogens, as well as epidemiological data and information on the clinical parameters of the patients. Nosocomial infections and antibiotic resistance will likely represent major challenges for clinical microbiologists in the next decades. In this paper, we describe how bacterial genomics based on NGS, integrated with novel informatics tools, could contribute to the control of hospital infections and multi-drug resistant pathogens.
Maddock, Simon T.; Briscoe, Andrew G.; Wilkinson, Mark; Waeschenbach, Andrea; San Mauro, Diego; Day, Julia J.; Littlewood, D. Tim J.; Foster, Peter G.; Nussbaum, Ronald A.; Gower, David J.
Mitochondrial genome (mitogenome) sequences are being generated with increasing speed due to the advances of next-generation sequencing (NGS) technology and associated analytical tools. However, detailed comparisons to explore the utility of alternative NGS approaches applied to the same taxa have not been undertaken. We compared a ‘traditional’ Sanger sequencing method with two NGS approaches (shotgun sequencing and non-indexed, multiplex amplicon sequencing) on four different sequencing pla...
Hastreiter, Maximilian; Jeske, Tim; Hoser, Jonathan; Kluge, Michael; Ahomaa, Kaarin; Friedl, Marie-Sophie; Kopetzky, Sebastian J; Quell, Jan-Dominik; Mewes, H Werner; Küffner, Robert
Analysis of Next Generation Sequencing (NGS) data requires the processing of large datasets by chaining various tools with complex input and output formats. In order to automate data analysis, we propose to standardize NGS tasks into modular workflows. This simplifies reliable handling and processing of NGS data, and corresponding solutions become substantially more reproducible and easier to maintain. Here, we present a documented, Linux-based toolbox of 42 processing modules that are combined to construct workflows facilitating a variety of tasks such as DNAseq and RNAseq analysis. We also describe important technical extensions. The high throughput executor (HTE) helps to increase the reliability and to reduce manual interventions when processing complex datasets. We also provide a dedicated binary manager that assists users in obtaining the modules' executables and keeping them up to date. As the basis for this actively developed toolbox we use the workflow management software KNIME. See http://ibisngs.github.io/knime4ngs for nodes and user manual (GPLv3 license). Supplementary data are available at Bioinformatics online.
BACKGROUND: MicroRNAs (miRNAs) are a class of small endogenous RNAs that play an important regulatory role in cells by negatively affecting gene expression at transcriptional and post-transcriptional levels. There have been extensive studies aiming to discover miRNAs and to analyze their functions in cells from a variety of species. However, there are no published studies of miRNA profiles in human testis using next generation sequencing (NGS) technology. RESULTS: We employed Solexa sequencing technology to profile miRNAs in normal human testis. A total of 770 known and 5 novel human miRNAs, and 20121 piRNAs, were detected, indicating that the human testis has a complex population of small RNAs. The expression of 15 known and 5 novel detected miRNAs was validated by qRT-PCR. We have also predicted the potential target genes of the abundant known and novel miRNAs, and subjected them to GO and pathway analysis, revealing the involvement of miRNAs in many important biological phenomena including meiosis and p53-related pathways that are implicated in the regulation of spermatogenesis. CONCLUSIONS: This study reports the first genome-wide miRNA profiles in human testis using an NGS approach. The presence of a large number of miRNAs and the nature of their target genes suggest that miRNAs play important roles in spermatogenesis. Here we provide a useful resource for further elucidation of the regulatory role of miRNAs and piRNAs in spermatogenesis. It may also facilitate the development of prophylactic strategies for male infertility.
Jiménez, Cristina; Jara-Acevedo, María; Corchete, Luis A; Castillo, David; Ordóñez, Gonzalo R; Sarasquete, María E; Puig, Noemí; Martínez-López, Joaquín; Prieto-Conde, María I; García-Álvarez, María; Chillón, María C; Balanzategui, Ana; Alcoceba, Miguel; Oriol, Albert; Rosiñol, Laura; Palomera, Luis; Teruel, Ana I; Lahuerta, Juan J; Bladé, Joan; Mateos, María V; Orfão, Alberto; San Miguel, Jesús F; González, Marcos; Gutiérrez, Norma C; García-Sanz, Ramón
Identification and characterization of genetic alterations are essential for diagnosis of multiple myeloma and may guide therapeutic decisions. Currently, genomic analysis of myeloma to cover the diverse range of alterations with prognostic impact requires fluorescence in situ hybridization (FISH), single nucleotide polymorphism arrays, and sequencing techniques, which are costly and labor intensive and require large numbers of plasma cells. To overcome these limitations, we designed a targeted-capture next-generation sequencing approach for one-step identification of IGH translocations, V(D)J clonal rearrangements, the IgH isotype, and somatic mutations to rapidly identify risk groups and specific targetable molecular lesions. Forty-eight newly diagnosed myeloma patients were tested with the panel, which included IGH and six genes that are recurrently mutated in myeloma: NRAS, KRAS, HRAS, TP53, MYC, and BRAF. We identified 14 of 17 IGH translocations previously detected by FISH and three confirmed translocations not detected by FISH, with the additional advantage of breakpoint identification, which can be used as a target for evaluating minimal residual disease. IgH subclass and V(D)J rearrangements were identified in 77% and 65% of patients, respectively. Mutation analysis revealed the presence of missense protein-coding alterations in at least one of the evaluating genes in 16 of 48 patients (33%). This method may represent a time- and cost-effective diagnostic method for the molecular characterization of multiple myeloma. Copyright © 2017 American Society for Investigative Pathology and the Association for Molecular Pathology. Published by Elsevier Inc. All rights reserved.
Rami A Dalloul
A synergistic combination of two next-generation sequencing platforms with a detailed comparative BAC physical contig map provided a cost-effective assembly of the genome sequence of the domestic turkey (Meleagris gallopavo). Heterozygosity of the sequenced source genome allowed discovery of more than 600,000 high quality single nucleotide variants. Despite this heterozygosity, the current genome assembly (∼1.1 Gb) includes 917 Mb of sequence assigned to specific turkey chromosomes. Annotation identified nearly 16,000 genes, with 15,093 recognized as protein coding and 611 as non-coding RNA genes. Comparative analysis of the turkey, chicken, and zebra finch genomes, and comparing avian to mammalian species, supports the characteristic stability of avian genomes and identifies genes unique to the avian lineage. Clear differences are seen in number and variety of genes of the avian immune system where expansions and novel genes are less frequent than examples of gene loss. The turkey genome sequence provides resources to further understand the evolution of vertebrate genomes and genetic variation underlying economically important quantitative traits in poultry. This integrated approach may be a model for providing both gene and chromosome level assemblies of other species with agricultural, ecological, and evolutionary interest.
Dohrn, Maike F; Glöckle, Nicola; Mulahasanovic, Lejla; Heller, Corina; Mohr, Julia; Bauer, Christine; Riesch, Erik; Becker, Andrea; Battke, Florian; Hörtnagel, Konstanze; Hornemann, Thorsten; Suriyanarayanan, Saranya; Blankenburg, Markus; Schulz, Jörg B; Claeys, Kristl G; Gess, Burkhard; Katona, Istvan; Ferbert, Andreas; Vittore, Debora; Grimm, Alexander; Wolking, Stefan; Schöls, Ludger; Lerche, Holger; Korenke, G Christoph; Fischer, Dirk; Schrank, Bertold; Kotzaeridou, Urania; Kurlemann, Gerhard; Dräger, Bianca; Schirmacher, Anja; Young, Peter; Schlotter-Weigel, Beate; Biskup, Saskia
Hereditary neuropathies comprise a wide variety of chronic diseases associated to more than 80 genes identified to date. We herein examined 612 index patients with either a Charcot-Marie-Tooth phenotype, hereditary sensory neuropathy, familial amyloid neuropathy, or small fiber neuropathy using a customized multigene panel based on the next generation sequencing technique. In 121 cases (19.8%), we identified at least one putative pathogenic mutation. Of these, 54.4% showed an autosomal dominant, 33.9% an autosomal recessive, and 11.6% an X-linked inheritance. The most frequently affected genes were PMP22 (16.4%), GJB1 (10.7%), MPZ, and SH3TC2 (both 9.9%), and MFN2 (8.3%). We further detected likely or known pathogenic variants in HINT1, HSPB1, NEFL, PRX, IGHMBP2, NDRG1, TTR, EGR2, FIG4, GDAP1, LMNA, LRSAM1, POLG, TRPV4, AARS, BIC2, DHTKD1, FGD4, HK1, INF2, KIF5A, PDK3, REEP1, SBF1, SBF2, SCN9A, and SPTLC2 with a declining frequency. Thirty-four novel variants were considered likely pathogenic not having previously been described in association with any disorder in the literature. In one patient, two homozygous mutations in HK1 were detected in the multigene panel, but not by whole exome sequencing. A novel missense mutation in KIF5A was considered pathogenic because of the highly compatible phenotype. In one patient, the plasma sphingolipid profile could functionally prove the pathogenicity of a mutation in SPTLC2. One pathogenic mutation in MPZ was identified after being previously missed by Sanger sequencing. We conclude that panel based next generation sequencing is a useful, time- and cost-effective approach to assist clinicians in identifying the correct diagnosis and enable causative treatment considerations. © 2017 International Society for Neurochemistry.
Ağladıoğlu, Sebahat Yılmaz; Aycan, Zehra; Çetinkaya, Semra; Baş, Veysel Nijat; Önder, Aşan; Peltek Kendirci, Havva Nur; Doğan, Haldun; Ceylaner, Serdar
Maturity-onset diabetes of the young (MODY) is a genetically and clinically heterogeneous group of diseases and is often misdiagnosed as type 1 or type 2 diabetes. The aim of this study is to investigate both novel and proven mutations of 11 MODY genes in Turkish children by using targeted next generation sequencing. A panel of 11 MODY genes was screened in 43 children with MODY diagnosed by clinical criteria. Studies of index cases were done with the Illumina MiSeq, and family screenings and confirmation studies of mutations were done by Sanger sequencing. We identified 28 (65%) point mutations among 43 patients. Eighteen patients have GCK mutations, four have HNF1A, one has HNF4A, one has HNF1B, two have NEUROD1, one has PDX1 gene variations and one patient has both HNF1A and HNF4A heterozygous mutations. This is the first study including molecular studies of 11 MODY genes in Turkish children. GCK is the most frequent type of MODY in our study population. The very high frequency of novel mutations (42%) in our study population supports the view that, in heterogeneous disorders like MODY, sequence analysis provides rapid, cost-effective and accurate genetic diagnosis.
The presence of high molecular weight double-stranded RNA (dsRNA) within plant cells is an indicator of infection with RNA viruses as these possess genomic or replicative dsRNA. DECS (dsRNA isolation, exhaustive amplification, cloning, and sequencing) analysis has been shown to be capable of detecting unknown viruses. We postulated that a combination of DECS analysis and next-generation sequencing (NGS) would improve detection efficiency and usability of the technique. Here, we describe a model case in which we efficiently detected the presumed genome sequence of Blueberry shoestring virus (BSSV), a member of the genus Sobemovirus, which has not so far been reported. dsRNAs were isolated from BSSV-infected blueberry plants using the dsRNA-binding protein, reverse-transcribed, amplified, and sequenced using NGS. A contig of 4,020 nucleotides (nt) that shared similarities with sequences from other Sobemovirus species was obtained as a candidate of the BSSV genomic sequence. Reverse transcription (RT)-PCR primer sets based on sequences from this contig enabled the detection of BSSV in all BSSV-infected plants tested but not in healthy controls. A recombinant protein encoded by the putative coat protein gene was bound by the BSSV-antibody, indicating that the candidate sequence was that of BSSV itself. Our results suggest that a combination of DECS analysis and NGS, designated here as “DECS-C,” is a powerful method for detecting novel plant viruses.
Licastro, Danilo; Mutarelli, Margherita; Peluso, Ivana; Neveling, Kornelia; Wieskamp, Nienke; Rispoli, Rossella; Vozzi, Diego; Athanasakis, Emmanouil; D'Eustacchio, Angela; Pizzo, Mariateresa; D'Amico, Francesca; Ziviello, Carmela; Simonelli, Francesca; Fabretto, Antonella; Scheffer, Hans; Gasparini, Paolo; Banfi, Sandro; Nigro, Vincenzo
Usher syndrome (USH) is a clinically and genetically heterogeneous disorder characterized by visual and hearing impairments. Clinically, it is subdivided into three subclasses with nine genes identified so far. In the present study, we investigated whether the currently available Next Generation Sequencing (NGS) technologies are already suitable for molecular diagnostics of USH. We analyzed a total of 12 patients, most of which were negative for previously described mutations in known USH genes upon primer extension-based microarray genotyping. We enriched the NGS template either by whole exome capture or by Long-PCR of the known USH genes. The main NGS sequencing platforms were used: SOLiD for whole exome sequencing, Illumina (Genome Analyzer II) and Roche 454 (GS FLX) for the Long-PCR sequencing. Long-PCR targeting was more efficient with up to 94% of USH gene regions displaying an overall coverage higher than 25×, whereas whole exome sequencing yielded a similar coverage for only 50% of those regions. Overall this integrated analysis led to the identification of 11 novel sequence variations in USH genes (2 homozygous and 9 heterozygous) out of 18 detected. However, at least two cases were not genetically solved. Our result highlights the current limitations in the diagnostic use of NGS for USH patients. The limit for whole exome sequencing is linked to the need of a strong coverage and to the correct interpretation of sequence variations with a non obvious, pathogenic role, whereas the targeted approach suffers from the high genetic heterogeneity of USH that may be also caused by the presence of additional causative genes yet to be identified. PMID:22952768
Church, George M; Gao, Yuan; Kosuri, Sriram
Digital information is accumulating at an astounding rate, straining our ability to store and archive it. DNA is among the most dense and stable information media known. The development of new technologies in both DNA synthesis and sequencing makes DNA an increasingly feasible digital storage medium. We developed a strategy to encode arbitrary digital information in DNA, wrote a 5.27-megabit book using DNA microchips, and read the book by using next-generation DNA sequencing.
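The idea of encoding arbitrary digital information in DNA can be illustrated with a minimal sketch. The abstract does not describe the encoding itself, so everything below is an assumption made for illustration: one bit per base (0 written as A or C, 1 written as G or T), with the choice between the two candidate bases used only to avoid repeating the previous base. The function names are hypothetical.

```python
# Illustrative sketch only: a one-bit-per-base mapping assumed for this example,
# not a description of the authors' published scheme.
ZERO_BASES = "AC"  # either base encodes bit 0 (assumption)
ONE_BASES = "GT"   # either base encodes bit 1 (assumption)


def encode(data: bytes) -> str:
    """Turn bytes into a DNA string, one bit per base, never repeating a base."""
    seq = []
    for byte in data:
        for bit in f"{byte:08b}":
            choices = ONE_BASES if bit == "1" else ZERO_BASES
            # Prefer the first candidate unless it would repeat the previous base.
            base = choices[1] if seq and seq[-1] == choices[0] else choices[0]
            seq.append(base)
    return "".join(seq)


def decode(seq: str) -> bytes:
    """Recover the original bytes from a DNA string produced by encode()."""
    bits = "".join("1" if base in ONE_BASES else "0" for base in seq)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


if __name__ == "__main__":
    dna = encode(b"DNA")
    print(dna, decode(dna))
```

Spending one base per bit halves the theoretical density of a two-bit-per-base code, but the spare choice of base leaves room to avoid homopolymer runs, which several sequencing platforms read poorly.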
Background: Next generation sequencing provides detailed insight into the variation present within viral populations, introducing the possibility of treatment strategies that are both reactive and predictive. Current software tools, however, need to be scaled up to accommodate high-depth viral data sets, which are often temporally or spatially linked. In addition, due to the development of novel sequencing platforms and chemistries, each with implicit strengths and weaknesses, it will be helpful for researchers to be able to routinely compare and combine data sets from different platforms/chemistries. In particular, error associated with a specific sequencing process must be quantified so that true biological variation may be identified. Results: Segminator II was developed to allow for the efficient comparison of data sets derived from different sources. We demonstrate its usage by comparing large data sets from 12 influenza H1N1 samples sequenced on both the 454 Life Sciences and Illumina platforms, permitting quantification of platform error. For mismatches, median error rates of 0.10 and 0.12%, respectively, suggested that both platforms performed similarly. For insertions and deletions, median error rates within the 454 data (0.3 and 0.2%, respectively) were significantly higher than those within the Illumina data (0.004 and 0.006%, respectively). In agreement with previous observations, these higher rates were strongly associated with homopolymeric stretches on the 454 platform. Outside of such regions both platforms had similar indel error profiles. Additionally, we apply our software to the identification of low frequency variants. Conclusion: We have demonstrated, using Segminator II, that it is possible to distinguish platform specific error from biological variation using data derived from two different platforms. We have used this approach to quantify the amount of error present within the 454 and Illumina platforms in
Deurenberg, Ruud H; Bathoorn, Erik; Chlebowicz, Monika A; Couto, Natacha; Ferdous, Mithila; García-Cobos, Silvia; Kooistra-Smid, Anna M D; Raangs, Erwin C; Rosema, Sigrid; Veloo, Alida C M; Zhou, Kai; Friedrich, Alexander W; Rossen, John W A
Current molecular diagnostics of human pathogens provide limited information that is often not sufficient for outbreak and transmission investigation. Next generation sequencing (NGS) determines the DNA sequence of a complete bacterial genome in a single sequence run, and from these data, information on resistance and virulence, as well as information for typing, is obtained, which is useful for outbreak investigation. The obtained genome data can be further used for the development of an outbreak-specific screening test. In this review, a general introduction to NGS is presented, including the library preparation and the major characteristics of the most common NGS platforms, such as the MiSeq (Illumina) and the Ion PGM™ (ThermoFisher). An overview is given of the software used for NGS data analysis at the medical microbiology diagnostic laboratory of the University Medical Center Groningen in The Netherlands. Furthermore, applications of NGS in the clinical setting are described, such as outbreak management, molecular case finding, characterization and surveillance of pathogens, rapid identification of bacteria using the 16S-23S rRNA region, taxonomy, metagenomics approaches on clinical samples, and the determination of the transmission of zoonotic micro-organisms from animals to humans. Finally, we share our vision on the use of NGS in personalised microbiology in the near future, pointing out specific requirements. Copyright © 2017. Published by Elsevier B.V.
Novák, Petr; Neumann, Pavel; Pech, Jiří; Steinhaisl, J.; Macas, Jiří
Vol. 29, No. 6 (2013), pp. 792-793. ISSN 1367-4803. R&D Projects: GA ČR GBP501/12/G090; GA MŠk(CZ) OC10037. Institutional support: RVO:60077344. Keywords: repetitive DNA * computational analysis * next generation sequencing. Subject RIV: EB - Genetics; Molecular Biology. Impact factor: 4.621, year: 2013
Hung, Chih-Ming; Lin, Rong-Chien; Chu, Jui-Hua; Yeh, Chia-Fen; Yao, Chiou-Ju; Li, Shou-Hsien
The information from ancient DNA (aDNA) provides an unparalleled opportunity to infer phylogenetic relationships and population history of extinct species and to investigate genetic evolution directly. However, the degraded and fragmented nature of aDNA has posed technical challenges for studies based on conventional PCR amplification. In this study, we present an approach based on next generation sequencing to efficiently sequence the complete mitochondrial genome (mitogenome) of two extinct passenger pigeons (Ectopistes migratorius) using de novo assembly of massive short (90 bp), paired-end or single-end reads. Although varying levels of human contamination and low levels of postmortem nucleotide lesion were observed, they did not impact sequencing accuracy. Our results demonstrated that the de novo assembly of shotgun sequence reads could be a potent approach to sequence mitogenomes, and offered an efficient way to infer evolutionary history of extinct species. PMID:23437111
Archer, John; Weber, Jan; Henry, Kenneth; Winner, Dane; Gibson, Richard; Lee, Lawrence; Paxinos, Ellen; Arts, Eric J; Robertson, David L; Mimms, Larry; Quiñones-Mateu, Miguel E
HIV-1 coreceptor tropism assays are required to rule out the presence of CXCR4-tropic (non-R5) viruses prior to treatment with CCR5 antagonists. Phenotypic (e.g., Trofile™, Monogram Biosciences) and genotypic (e.g., population sequencing linked to bioinformatic algorithms) assays are the most widely used. Although several next-generation sequencing (NGS) platforms are available, to date all published deep sequencing HIV-1 tropism studies have used the 454™ Life Sciences/Roche platform. In this study, HIV-1 co-receptor usage was predicted for twelve patients scheduled to start a maraviroc-based antiretroviral regimen. The V3 region of the HIV-1 env gene was sequenced using four NGS platforms: 454™, PacBio® RS (Pacific Biosciences), Illumina®, and Ion Torrent™ (Life Technologies). Cross-platform variation was evaluated, including number of reads, read length and error rates. HIV-1 tropism was inferred using Geno2Pheno, Web PSSM, and the 11/24/25 rule and compared with Trofile™ and virologic response to antiretroviral therapy. Error rates related to insertions/deletions (indels) and nucleotide substitutions introduced by the four NGS platforms were low compared to the actual HIV-1 sequence variation. Each platform detected all major virus variants within the HIV-1 population with similar frequencies. Identification of non-R5 viruses was comparable among the four platforms, with minor differences attributable to the algorithms used to infer HIV-1 tropism. All NGS platforms showed similar concordance with virologic response to the maraviroc-based regimen (75% to 80% range depending on the algorithm used), compared to Trofile (80%) and population sequencing (70%). In conclusion, all four NGS platforms were able to detect minority non-R5 variants at comparable levels, suggesting that any NGS-based method can be used to predict HIV-1 coreceptor usage.
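Of the three inference methods named above, the 11/24/25 rule is simple enough to sketch. As commonly described, it calls a virus non-R5 when a positively charged residue sits at position 11, 24 or 25 of the V3 loop. The version below is a hedged illustration, not the study's pipeline: it assumes the input is an already aligned 35-residue V3 amino-acid sequence, treats only K and R as positively charged, and uses a hypothetical function name and an illustrative example sequence.

```python
# Minimal sketch of the 11/24/25 rule under the assumptions stated above.
POSITIVE_RESIDUES = {"K", "R"}   # assumption: lysine and arginine only
RULE_POSITIONS = (11, 24, 25)    # 1-based positions within the V3 loop


def predict_tropism(v3: str) -> str:
    """Return 'non-R5' if any rule position carries a positive residue, else 'R5'."""
    v3 = v3.strip().upper()
    if len(v3) < max(RULE_POSITIONS):
        raise ValueError("V3 sequence is too short to apply the 11/24/25 rule")
    flagged = [pos for pos in RULE_POSITIONS if v3[pos - 1] in POSITIVE_RESIDUES]
    return "non-R5" if flagged else "R5"


if __name__ == "__main__":
    example = "CTRPNNNTRKSIHIGPGRAFYTTGEIIGDIRQAHC"  # illustrative 35-residue V3 loop
    print(predict_tropism(example))
    # Substituting arginine at position 11 flips the call.
    print(predict_tropism(example[:10] + "R" + example[11:]))
```

With deep sequencing data, such a rule would be applied to each distinct V3 variant and its read count, so that minority non-R5 variants can be reported as a fraction of the viral population.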
Thai, Binh Thanh; Tan, Mun Hua; Lee, Yin Peng; Gan, Han Ming; Tran, Trang Thi; Austin, Christopher M
The marine clam Lutraria rhynchaena is gaining popularity as an aquaculture species in Asia. Lutraria populations are present in the wild throughout Vietnam and several stocks have been established and translocated for breeding and aquaculture grow-out purposes. In this study, we demonstrate the feasibility of utilising Illumina next-generation sequencing technology to streamline the identification and genotyping of microsatellite loci from this clam species. Based on an initial partial genome scan, 48 microsatellite markers with similar melting temperatures were identified and characterised. The 12 most suitable polymorphic loci were then genotyped using 51 individuals from a population in Quang Ninh Province, North Vietnam. Genetic variation was low (mean number of alleles per locus = 2.6; mean expected heterozygosity = 0.41). Two loci showed significant deviation from Hardy-Weinberg equilibrium (HWE) and the presence of null alleles, but there was no evidence of linkage disequilibrium among loci. Three additional populations were screened (n = 7-36) to test the geographic utility of the 12 loci, which revealed 100% successful genotyping in two populations from central Vietnam (Nha Trang). However, a second population from north Vietnam (Co To) could not be successfully genotyped, and morphological evidence and mitochondrial variation suggest that this population represents a cryptic species of Lutraria. Comparisons of the Quang Ninh and Nha Trang populations, excluding the 2 loci out of HWE, revealed statistically significant allelic variation at 4 loci. We reported the first microsatellite loci set for the marine clam Lutraria rhynchaena and demonstrated its potential in differentiating clam populations. Additionally, a cryptic species population of Lutraria rhynchaena was identified during initial loci development, underscoring the overlooked diversity of marine clam species in Vietnam and the need to genetically characterise population representatives prior
Li, Haonan; Jin, Peng; Hao, Qian; Zhu, Wei; Chen, Xia; Wang, Ping
Waardenburg syndrome (WS) is a rare autosomal dominant disorder associated with pigmentation abnormalities and sensorineural hearing loss. In this study, we investigated the genetic cause of WSII in a patient and evaluated the reliability of the targeted next-generation exome sequencing method for the genetic diagnosis of WS. Clinical evaluations were conducted on the patient and targeted next-generation sequencing (NGS) was used to identify the candidate genes responsible for WSII. Multiplex ligation-dependent probe amplification (MLPA) and real-time quantitative polymerase chain reaction (qPCR) were performed to confirm the targeted NGS results. Targeted NGS detected a deletion of the entire coding sequence (CDS) of the SOX10 gene in the WSII patient. MLPA results indicated a heterozygous deletion of all SOX10 exons; no aberrant copy number was found in the PAX3 and microphthalmia-associated transcription factor (MITF) genes. Real-time qPCR results identified the mutation as a de novo heterozygous deletion. This is the first report of using a targeted NGS method for WS candidate gene sequencing; its accuracy was verified by using the MLPA and qPCR methods. Our research provides a valuable method for the genetic diagnosis of WS.
De Bellis, Fabien; Malapa, Roger; Kagy, Valérie; Lebegin, Stéphane; Billot, Claire; Labouisse, Jean-Pierre
Using next-generation sequencing technology, new microsatellite loci were characterized in Artocarpus altilis (Moraceae) and two congeners to increase the number of available markers for genotyping breadfruit cultivars. A total of 47,607 simple sequence repeat loci were obtained by sequencing a library of breadfruit genomic DNA with an Illumina MiSeq system. Among them, 50 single-locus markers were selected and assessed using 41 samples (39 A. altilis, one A. camansi, and one A. heterophyllus). All loci were polymorphic in A. altilis, 44 in A. camansi, and 21 in A. heterophyllus. The number of alleles per locus ranged from two to 19. The new markers will be useful for assessing the identity and genetic diversity of breadfruit cultivars on a small geographical scale, gaining a better understanding of farmer management practices, and will help to optimize breadfruit genebank management.
Weiß, Clemens L; Pais, Marina; Cano, Liliana M; Kamoun, Sophien; Burbano, Hernán A
Intraspecific variation in ploidy occurs in a wide range of species including pathogenic and nonpathogenic eukaryotes such as yeasts and oomycetes. Ploidy can be inferred indirectly - without measuring DNA content - from experiments using next-generation sequencing (NGS). We present nQuire, a statistical framework that distinguishes between diploids, triploids and tetraploids using NGS. The command-line tool models the distribution of base frequencies at variable sites using a Gaussian Mixture Model, and uses maximum likelihood to select the most plausible ploidy model. nQuire handles large genomes at high coverage efficiently and uses standard input file formats. We demonstrate the utility of nQuire by analyzing individual samples of the pathogenic oomycete Phytophthora infestans and the baker's yeast Saccharomyces cerevisiae. Using these organisms we show the dependence between reliability of the ploidy assignment and sequencing depth. Additionally, we employ normalized maximized log-likelihoods generated by nQuire to ascertain ploidy level in a population of samples with ploidy heterogeneity. Using these normalized values we cluster samples in three dimensions using multivariate Gaussian mixtures. The cluster assignments retrieved from a S. cerevisiae population recovered the true ploidy level in over 96% of samples. Finally, we show that nQuire can be used regionally to identify chromosomal aneuploidies. nQuire provides a statistical framework to study organisms with intraspecific variation in ploidy. nQuire is likely to be useful in epidemiological studies of pathogens, artificial selection experiments, and for historical or ancient samples where intact nuclei are not preserved. It is implemented as a stand-alone Linux command line tool in the C programming language and is available at https://github.com/clwgg/nQuire under the MIT license.
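The model-selection idea described above can be sketched in a few lines. This is a simplified illustration and not nQuire's implementation: it assumes idealised frequency peaks (0.5 for diploids; 1/3 and 2/3 for triploids; 0.25, 0.5 and 0.75 for tetraploids), equal mixture weights and a fixed standard deviation, and the function names are hypothetical.

```python
import math

# Assumed idealised base-frequency peaks at variable biallelic sites.
PLOIDY_MEANS = {
    "diploid": [0.5],
    "triploid": [1 / 3, 2 / 3],
    "tetraploid": [0.25, 0.5, 0.75],
}


def gaussian_pdf(x: float, mean: float, sigma: float) -> float:
    """Density of a normal distribution with the given mean and sigma at x."""
    z = (x - mean) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))


def log_likelihood(freqs, means, sigma=0.05):
    """Log-likelihood of the frequencies under an equal-weight Gaussian mixture."""
    weight = 1.0 / len(means)
    total = 0.0
    for f in freqs:
        density = sum(weight * gaussian_pdf(f, m, sigma) for m in means)
        total += math.log(max(density, 1e-300))  # guard against log(0)
    return total


def infer_ploidy(freqs, sigma=0.05):
    """Return the ploidy label whose fixed mixture has the highest likelihood."""
    scores = {name: log_likelihood(freqs, means, sigma)
              for name, means in PLOIDY_MEANS.items()}
    return max(scores, key=scores.get), scores


if __name__ == "__main__":
    label, _ = infer_ploidy([0.32, 0.34, 0.66, 0.68, 0.33])
    print(label)
```

Because the three candidate mixtures here have fixed means and no free parameters, their likelihoods can be compared directly; a model with fitted parameters would need some form of normalisation before comparison.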
Fukuishi, Nobuyuki; Igawa, Yuusuke; Kunimi, Tomoyo; Hamano, Hirofumi; Toyota, Masao; Takahashi, Hironobu; Kenmoku, Hiromichi; Yagi, Yasuyuki; Matsui, Nobuaki; Akagi, Masaaki
While gene knockout technology can reveal the roles of proteins in cellular functions, including in mast cells, fetal death due to gene manipulation frequently interrupts experimental analysis. We generated mast cells from mouse fetal liver (FLMC), and compared the fundamental functions of FLMC with those of bone marrow-derived mouse mast cells (BMMC). Under electron microscopy, numerous small and electron-dense granules were observed in FLMC. In FLMC, the expression levels of a subunit of the FcεRI receptor and degranulation by IgE cross-linking were comparable with BMMC. By flow cytometry we observed surface expression of c-Kit prior to that of FcεRI on FLMC, although on BMMC the expression of c-Kit came after FcεRI. The surface expression levels of Sca-1 and c-Kit, a marker of putative mast cell precursors, were slightly different between bone marrow cells and fetal liver cells, suggesting that differentiation stage or cell type are not necessarily equivalent between both lineages. Moreover, this indicates that phenotypically similar mast cells may not have undergone an identical process of differentiation. By comprehensive analysis using the next generation sequencer, the same frequency of gene expression was observed for 98.6% of all transcripts in both cell types. These results indicate that FLMC could represent a new and useful tool for exploring mast cell differentiation, and may help to elucidate the roles of individual proteins in the function of mast cells where gene manipulation can induce embryonic lethality in the mid to late stages of pregnancy.
Y.J. Kim (Young Jin); J. Lee (Juyoung); B.-J. Kim (Bong-Jo); T. Park (Taesung); G.R. Abecasis (Gonçalo); M.A.A. De Almeida (Marcio); D. Altshuler (David); J.L. Asimit (Jennifer L.); G. Atzmon (Gil); M. Barber (Mathew); A. Barzilai (Ari); N.L. Beer (Nicola L.); G.I. Bell (Graeme I.); J. Below (Jennifer); T. Blackwell (Tom); J. Blangero (John); M. Boehnke (Michael); D.W. Bowden (Donald W.); N.P. Burtt (Noël); J.C. Chambers (John); H. Chen (Han); P. Chen (Ping); P.S. Chines (Peter); S. Choi (Sungkyoung); C. Churchhouse (Claire); P. Cingolani (Pablo); B.K. Cornes (Belinda); N.J. Cox (Nancy); A.G. Day-Williams (Aaron); A. Duggirala (Aparna); J. Dupuis (Josée); T. Dyer (Thomas); S. Feng (Shuang); J. Fernandez-Tajes (Juan); T. Ferreira (Teresa); T.E. Fingerlin (Tasha E.); J. Flannick (Jason); J.C. Florez (Jose); P. Fontanillas (Pierre); T.M. Frayling (Timothy); C. Fuchsberger (Christian); E. Gamazon (Eric); K. Gaulton (Kyle); S. Ghosh (Saurabh); B. Glaser (Benjamin); A.L. Gloyn (Anna); R.L. Grossman (Robert L.); J. Grundstad (Jason); C. Hanis (Craig); A. Heath (Allison); H. Highland (Heather); M. Horikoshi (Momoko); I.-S. Huh (Ik-Soo); J.R. Huyghe (Jeroen R.); M.K. Ikram (Kamran); K.A. Jablonski (Kathleen); Y. Jun (Yang); N. Kato (Norihiro); J. Kim (Jayoun); Y.J. Kim (Young Jin); B.-J. Kim (Bong-Jo); J. Lee (Juyoung); C.R. King (C. Ryan); J.S. Kooner (Jaspal S.); M.-S. Kwon (Min-Seok); H.K. Im (Hae Kyung); M. Laakso (Markku); K.K.-Y. Lam (Kevin Koi-Yau); J. Lee (Jaehoon); S. Lee (Selyeong); S. Lee (Sungyoung); D.M. Lehman (Donna M.); H. Li (Heng); C.M. Lindgren (Cecilia); X. Liu (Xuanyao); O.E. Livne (Oren E.); A.E. Locke (Adam E.); A. Mahajan (Anubha); J.B. Maller (Julian B.); A.K. Manning (Alisa K.); T.J. Maxwell (Taylor J.); A. Mazoure (Alexander); M.I. McCarthy (Mark); J.B. Meigs (James B.); B. Min (Byungju); K.L. Mohlke (Karen); A.P. Morris (Andrew); S. Musani (Solomon); Y. Nagai (Yoshihiko); M.C.Y. Ng (Maggie C.Y.); D. Nicolae (Dan); S. Oh (Sohee); N.D. Palmer (Nicholette); T. Park (Taesung); T.I. Pollin (Toni I.); I. Prokopenko (Inga); D. Reich (David); M.A. Rivas (Manuel); L.J. Scott (Laura); M. Seielstad (Mark); Y.S. Cho (Yoon Shin); X. Sim (Xueling); R. Sladek (Rob); P. Smith (Philip); I. Tachmazidou (Ioanna); E.S. Tai (Shyong); Y.Y. Teo (Yik Ying); T.M. Teslovich (Tanya M.); J. Torres (Jason); V. Trubetskoy (Vasily); S.M. Willems (Sara); A.L. Williams (Amy L.); J.G. Wilson (James); S. Wiltshire (Steven); S. Won (Sungho); A.R. Wood (Andrew); W. Xu (Wang); J. Yoon (Joon); M. Zawistowski (Matthew); E. Zeggini (Eleftheria); W. Zhang (Weihua); S. Zöllner (Sebastian)
Background: Rare variants have gathered increasing attention as a possible alternative source of missing heritability. Since next generation sequencing technology is not yet cost-effective for large-scale genomic studies, a widely used alternative approach is imputation. However, the
Liu, Zhimei; Fang, Fang; Ding, Changhong; Zhang, Weihua; Li, Jiuwei; Yang, Xinying; Wang, Xiaohui; Wu, Yun; Wang, Hongmei; Liu, Liying; Han, Tongli; Wang, Xu; Chen, Chunhong; Lyu, Junlan; Wu, Husheng
To explore the application value of next generation sequencing (NGS) in the diagnosis of mitochondrial disorders. According to mitochondrial disease criteria, genomic DNA was extracted using a standard procedure from peripheral venous blood of patients with suspected mitochondrial disease, collected from the neurological department of Beijing Children's Hospital Affiliated to Capital Medical University between October 2012 and February 2014. Targeted NGS was used to capture and sequence the entire mtDNA and the exons of 1,000 nuclear genes related to mitochondrial structure and function. Clinical data were collected from patients diagnosed at a molecular level, then clinical features and the relationship between genotype and phenotype were analyzed. Mutations were detected in 21 of 70 patients with suspected mitochondrial disease, of whom 10 harbored an mtDNA mutation and 11 a nuclear DNA (nDNA) mutation. Of the 21 patients, 1 was diagnosed with congenital myasthenic syndrome with episodic apnea due to a homozygous CHAT gene p.I187T mutation, and 20 were diagnosed with mitochondrial disease: 10 with Leigh syndrome, 4 with mitochondrial encephalomyopathy with lactic acidosis and stroke-like episodes syndrome, 3 with Leber hereditary optic neuropathy (LHON) and LHON plus, 2 with mitochondrial DNA depletion syndrome and 1 unknown. All the mtDNA mutations were point mutations, which comprised A3243G, G3460A, G11778A, T14484C, T14502C and T14487C. Ten mitochondrial disease patients harbored homozygous or compound heterozygous mutations in 5 genes previously shown to cause disease: SURF1, PDHA1, NDUFV1, SUCLA2 and SUCLG1. These genes carried 14 mutations, 7 of which have not been reported before. NGS has application value in the diagnosis of mitochondrial diseases, especially in Leigh syndrome, atypical mitochondrial syndromes and rare mitochondrial disorders.
Sarcey, Eric; Serres, Aurélie; Tindy, Fabrice; Chareyre, Audrey; Ng, Siemon; Nicolas, Marine; Vetter, Emmanuelle; Bonnevay, Thierry; Abachin, Eric; Mallet, Laurent
Spontaneous reversion to neurovirulence of live attenuated oral poliovirus vaccine (OPV) serotype 3 (chiefly involving the n.472U>C mutation), must be monitored during production to ensure vaccine safety and consistency. Mutant analysis by polymerase chain reaction and restriction enzyme cleavage (MAPREC) has long been endorsed by the World Health Organization as the preferred in vitro test for this purpose; however, it requires radiolabeling, which is no longer supported by many laboratories. We evaluated the performance and suitability of next generation sequencing (NGS) as an alternative to MAPREC. The linearity of NGS was demonstrated at revertant concentrations equivalent to the study range of 0.25%-1.5%. NGS repeatability and intermediate precision were comparable across all tested samples, and NGS was highly reproducible, irrespective of sequencing platform or analysis software used. NGS was performed on OPV serotype 3 working seed lots and monovalent bulks (n=21) that were previously tested using MAPREC, and which covered the representative range of vaccine production. Percentages of 472-C revertants identified by NGS and MAPREC were comparable and highly correlated (r≥0.80), with a Pearson correlation coefficient of 0.95585 (p<0.0001). NGS demonstrated statistically equivalent performance to that of MAPREC for quantifying low-frequency OPV serotype 3 revertants, and offers a valid alternative to MAPREC. Copyright © 2017 The Authors. Published by Elsevier B.V. All rights reserved.
Lloyd Rhiannon E
-coding genes were shown to be under strong negative (purifying) selection, with genes under the strongest pressure (Complex 4) also being the most highly expressed, highlighting their potentially crucial functions in the mitochondrial respiratory chain. Conclusions: Next generation sequencing of long-PCR amplicons using single taxon or multi-taxon approaches enabled the mtDNA of two new Xenopus species to be fully characterized. We anticipate our complete mitochondrial genome amplification methods to be applicable to other amphibians, helpful for identifying the most appropriate markers for differentiating species, populations and resolving phylogenies, a pressing need since amphibians are undergoing drastic global decline. Our mtDNAs also provide templates for conserved primer design and the assembly of RNA and DNA reads following high throughput “omic” techniques such as RNA- and ChIP-seq. These could help us better understand how processes such as mitochondrial replication and gene expression influence Xenopus growth and development, as well as how they evolved and are regulated.
High-throughput technology has driven progress in omics studies, including genomics and transcriptomics. We review the improvements in comparative omics studies that are attributable to the high-throughput measurements of next generation sequencing technology. Comparative genomics has been successfully applied to evolutionary analysis, while comparative transcriptomics is used to compare the expression profiles of two subjects by differential expression or differential coexpression, which enables its application in evolutionary developmental biology (EVO-DEVO) studies. EVO-DEVO studies focus on the evolutionary pressure affecting morphogenesis during development, and previous work has sought to identify the most conserved stages of embryonic development. Earlier measurements in these studies were based on morphological similarity at the macro level, whereas new technology enables detection of similarity in molecular mechanisms at the micro level. Evolutionary models of embryonic development, including the “funnel-like” model and the “hourglass” model, have been evaluated by combining these new comparative transcriptomic methods with prior comparative genomic information. Although the technology has brought EVO-DEVO studies into a new era, technological and material limitations still exist, and further investigations require more careful study design and procedures.
Wouters, Roel H P; Bijlsma, Rhodé M; Ausems, Margreet G E M; van Delden, Johannes J M; Voest, Emile E; Bredenoord, Annelien L
Ever since genetic testing became possible for specific mutations, ethical debate has been sparked on the question of whether professionals have a duty to warn not only patients but also their relatives who might be at risk for hereditary diseases. As next-generation sequencing (NGS) swiftly finds its way into clinical practice, the question of who is responsible for conveying unsolicited findings to family members becomes increasingly urgent. Traditionally, there is a strong emphasis on the duties of the professional in this debate. But what is the role of the patient and her family? In this article, we discuss the question of whose duty it is to convey relevant genetic risk information concerning hereditary diseases that can be cured or prevented to the relatives of patients undergoing NGS. We argue in favor of a shared responsibility for professionals and patients and present a strategy that reconciles these roles: a moral accountability nudge. Incorporated into informed consent and counseling services such as letters and online tools, this nudge aims to create awareness of specific patient responsibilities. Commitment of all parties is needed to ensure adequate dissemination of results in the NGS era. © 2016 WILEY PERIODICALS, INC.
Bertolini, Francesca; Scimone, Concetta; Geraci, Claudia; Schiavo, Giuseppina; Utzeri, Valerio Joe; Chiofalo, Vincenzo; Fontanesi, Luca
Few studies have investigated the donkey (Equus asinus) at the whole genome level so far. Here, we sequenced the genome of two male donkeys using a next generation semiconductor based sequencing platform (the Ion Proton sequencer) and compared the sequence information obtained with the available donkey draft genome (and the Illumina reads from which it originated) and with the EquCab2.0 assembly of the horse genome. Moreover, the Ion Torrent Personal Genome Analyzer was used to sequence reduced representation libraries (RRL) obtained from a DNA pool including donkeys of different breeds (Grigio Siciliano, Ragusano and Martina Franca). The number of next generation sequencing reads aligned with the EquCab2.0 horse genome was larger than the number aligned with the draft donkey genome. This was due to the larger N50 for contigs and scaffolds of the horse genome. Nucleotide divergence between E. caballus and E. asinus was estimated to be ~ 0.52-0.57%. Regions with low nucleotide divergence were identified in several autosomal chromosomes and in the whole chromosome X. These regions might be evolutionarily important in equids. Comparing Y-chromosome regions, we identified variants that could be useful to track donkey paternal lineages. Moreover, about 4.8 million single nucleotide polymorphisms (SNPs) in the donkey genome were identified and annotated by combining sequencing data from Ion Proton (whole genome sequencing) and Ion Torrent (RRL) runs with Illumina reads. A higher density of SNPs was present in regions homologous to horse chromosome 12, in which several studies reported a high frequency of copy number variants. The SNPs we identified constitute a first resource useful to describe variability at the population genomic level in E. asinus and to establish monitoring systems for the conservation of donkey genetic resources. PMID:26151450
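The nucleotide divergence quoted above is, in its simplest form, the fraction of aligned sites at which two sequences differ. The following is a minimal sketch of that calculation, assuming two sequences that have already been aligned to equal length; the function name `p_distance` and the choice to skip gaps and ambiguous bases are illustrative assumptions, not the method used in the study.

```python
def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of differing sites between two pre-aligned sequences.

    Hypothetical helper for illustration: sites where either sequence
    has a gap or a non-ACGT symbol are skipped rather than counted.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    mismatches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # skip gaps and ambiguous bases
        compared += 1
        if a != b:
            mismatches += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return mismatches / compared


if __name__ == "__main__":
    # Toy alignment: one difference in ten comparable sites.
    print(p_distance("ACGTACGTAC", "ACGTACGTAT"))
```

On this scale, a divergence of 0.52-0.57% would correspond to roughly five or six differing sites per thousand compared.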
Mitsui, Jun; Fukuda, Yoko; Azuma, Kyo; Tozaki, Hirokazu; Ishiura, Hiroyuki; Takahashi, Yuji; Goto, Jun; Tsuji, Shoji
We have recently found that multiple rare variants of the glucocerebrosidase gene (GBA) confer a robust risk for Parkinson disease, supporting the 'common disease-multiple rare variants' hypothesis. To develop an efficient method of identifying rare variants in a large number of samples, we applied multiplexed resequencing using a next-generation sequencer to identification of rare variants of GBA. Sixteen sets of pooled DNAs from six pooled DNA samples were prepared. Each set of pooled DNAs was subjected to polymerase chain reaction to amplify the target gene (GBA) covering 6.5 kb, pooled into one tube with barcode indexing, and then subjected to extensive sequence analysis using the SOLiD System. Individual samples were also subjected to direct nucleotide sequence analysis. With the optimization of data processing, we were able to extract all the variants from 96 samples with acceptable rates of false-positive single-nucleotide variants.
Maltese, Paolo E; Iarossi, Giancarlo; Ziccardi, Lucia; Colombo, Leonardo; Buzzonetti, Luca; Crinò, Antonino; Tezzele, Silvia; Bertelli, Matteo
The obesity phenotype can be manifested as an isolated trait or accompanied by multisystem disorders as part of a syndromic picture. In both situations, the same molecular pathways may be involved to different degrees. This evidence is stronger in syndromic obesity, in which phenotypes of different syndromes may overlap. In these cases, genetic testing can unequivocally provide a final diagnosis. Here we describe a patient who met the diagnostic criteria for Alström syndrome only during adolescence. Genetic testing was requested at 25 years of age for a final confirmation of the diagnosis. The genetic diagnosis of Alström syndrome was obtained through a Next Generation Sequencing genetic test approach using a custom-designed gene panel of 47 genes associated with syndromic and non-syndromic obesity. Genetic analysis revealed a novel homozygous frameshift variant p.(Arg1550Lysfs*10) in exon 8 of the ALMS1 gene. This case shows the need for a revision of the diagnostic criteria guidelines, as a consequence of the recent advent of massively parallel sequencing technology. The indications for genetic testing reported in the currently accepted diagnostic criteria for Alström syndrome were drafted when sequencing was expensive and time consuming. Nowadays, Next Generation Sequencing testing could be considered as a first-line diagnostic tool not only for Alström syndrome but, more generally, for all atypical or not clearly distinguishable cases of syndromic obesity, thus avoiding delayed diagnosis and treatment. Early diagnosis permits better follow-up and pre-symptomatic interventions. Copyright © 2017 Elsevier Masson SAS. All rights reserved.
Meason-Smith, Courtney; Diesel, Alison; Patterson, Adam P; Older, Caitlin E; Johnson, Timothy J; Mansell, Joanne M; Suchodolski, Jan S; Rodrigues Hoffmann, Aline
Next generation sequencing (NGS) studies have demonstrated a diverse skin-associated microbiota and microbial dysbiosis associated with atopic dermatitis in people and in dogs. The skin of cats has yet to be investigated using NGS techniques. We hypothesized that the fungal microbiota of healthy feline skin would be similar to that of dogs, with a predominance of environmental fungi, and that fungal dysbiosis would be present on the skin of allergic cats. The study included eleven healthy cats and nine cats diagnosed with one or more cutaneous hypersensitivity disorders, including flea bite, food-induced and nonflea nonfood-induced hypersensitivity. Healthy cats were sampled at twelve body sites and allergic cats at six sites. DNA was isolated and Illumina sequencing was performed targeting the internal transcribed spacer region of fungi. Sequences were processed using the bioinformatics software QIIME. The most abundant fungal sequences from the skin of all cats were classified as Cladosporium and Alternaria. The mucosal sites, including nostril, conjunctiva and reproductive tracts, had the fewest fungi, whereas the pre-aural space had the most. Allergic feline skin had significantly greater amounts of Agaricomycetes and Sordariomycetes, and significantly less Epicoccum compared to healthy feline skin. The skin of healthy cats appears to have a more diverse fungal microbiota compared to previous studies, and a fungal dysbiosis is noted in the skin of allergic cats. Future studies assessing the temporal stability of the skin microbiota in cats will be useful in determining whether the microbiota sequenced using NGS are colonizers or transient microbes. © 2016 ESVD and ACVD.
Brinkmann, Annika; Ergünay, Koray; Radonić, Aleksandar; Kocak Tufan, Zeliha; Domingo, Cristina; Nitsche, Andreas
We describe the development and evaluation of a novel method for targeted amplification and Next Generation Sequencing (NGS)-based identification of viral hemorrhagic fever (VHF) agents and assess the feasibility of this approach in diagnostics. An ultrahigh-multiplex panel was designed with primers to amplify all known variants of VHF-associated viruses and relevant controls. The performance of the panel was evaluated via serially quantified nucleic acids from Yellow fever virus, Rift Valley fever virus, Crimean-Congo hemorrhagic fever (CCHF) virus, Ebola virus, Junin virus and Chikungunya virus in a semiconductor-based sequencing platform. A comparison of direct NGS and targeted amplification-NGS was performed. The panel was further tested via a real-time nanopore sequencing-based platform, using clinical specimens from CCHF patients. The multiplex primer panel comprises two pools of 285 and 256 primer pairs for the identification of 46 virus species causing hemorrhagic fevers, encompassing 6,130 genetic variants of the strains involved. In silico validation revealed that the panel detected over 97% of all known genetic variants of the targeted virus species. High levels of specificity and sensitivity were observed for the tested virus strains. Targeted amplification ensured viral read detection in specimens with the lowest virus concentration (1-10 genome equivalents) and enabled significant increases in specific reads over background for all viruses investigated. In clinical specimens, the panel enabled detection of the causative agent and its characterization within 10 minutes of sequencing, with sample-to-result time of less than 3.5 hours. Virus enrichment via targeted amplification followed by NGS is an applicable strategy for the diagnosis of VHFs which can be adapted for high-throughput or nanopore sequencing platforms and employed for surveillance or outbreak monitoring.
Jeffrey W Koehler
A detailed understanding of the circulating pathogens in a particular geographic location aids in effectively utilizing targeted, rapid diagnostic assays, thus allowing for appropriate therapeutic and containment procedures. This is especially important in regions where highly pathogenic viruses co-circulate with other endemic pathogens such as the malaria parasite. The importance of biosurveillance is highlighted by the ongoing Ebola virus disease outbreak in West Africa. For example, a more comprehensive assessment of the regional pathogens could have identified the risk of a filovirus disease outbreak earlier and led to an improved diagnostic and response capacity in the region. In this context, being able to rapidly screen a single sample for multiple pathogens in a single tube reaction could improve both diagnostics and pathogen surveillance. Here, probes were designed to capture identifying filovirus sequence for the ebolaviruses Sudan, Ebola, Reston, Taï Forest, and Bundibugyo and the Marburg virus variants Musoke, Ci67, and Angola. These probes were combined into a single probe panel, and the captured filovirus sequence was successfully identified using the MiSeq next-generation sequencing platform. This panel was then used to identify the specific filovirus from nonhuman primates experimentally infected with Ebola virus as well as Bundibugyo virus in human sera samples from the Democratic Republic of the Congo, thus demonstrating the utility for pathogen detection using clinical samples. While not as sensitive and rapid as real-time PCR, this panel, along with additional sequence capture probe panels, could be used for broad pathogen screening and biosurveillance.
The advent of next-generation sequencing technologies has been accompanied by the development of many whole-genome sequence assembly methods and software tools, especially for de novo fragment assembly. Because little is known about the applicability and performance of these software tools, choosing a suitable assembler is a difficult task. Here, we describe the applicability of each program and, above all, compare the performance of eight distinct tools against eight groups of simulated datasets from the Solexa sequencing platform. Considering computational time, maximum random access memory (RAM) occupancy, and assembly accuracy and integrity, our study indicates that string-based assemblers and overlap-layout-consensus (OLC) assemblers are well suited for very short reads and for longer reads of small genomes, respectively. For large datasets of more than a hundred million short reads, De Bruijn graph-based assemblers would be more appropriate. In terms of software implementation, string-based assemblers are superior to graph-based ones, among which SOAPdenovo requires a complex configuration file. Our comparison study will assist researchers in selecting a well-suited assembler and offers essential information for the improvement of existing assemblers or the development of novel assemblers.
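To make the De Bruijn graph approach mentioned above concrete, the sketch below builds a graph whose nodes are (k-1)-mers and whose edges are the k-mers observed in the reads, then extends a contig along unambiguous edges. It is a toy illustration only: the function names are hypothetical, and it ignores sequencing errors, reverse complements, coverage, and in-degree branching, all of which real assemblers such as those compared in the study must handle.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Map each (k-1)-mer prefix to the set of (k-1)-mer suffixes that follow it."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def extend_contig(graph, start):
    """Walk forward from `start` while exactly one outgoing edge exists.

    Simplification: only out-degree is checked, so this is not a full
    unitig construction.
    """
    contig = start
    node = start
    seen = {start}
    while len(graph.get(node, ())) == 1:
        (nxt,) = graph[node]
        if nxt in seen:  # stop on cycles
            break
        contig += nxt[-1]
        seen.add(nxt)
        node = nxt
    return contig


if __name__ == "__main__":
    reads = ["ACGTCA", "GTCAGG"]  # two overlapping toy reads
    graph = build_de_bruijn(reads, k=4)
    print(extend_contig(graph, "ACG"))
```

Because the graph size depends on the number of distinct k-mers rather than on the number of reads, this representation scales better to very large short-read datasets than pairwise overlap computation, which is consistent with the recommendation in the abstract.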
Kathiresan, Nagarajan; Temanni, Ramzi; Almabrazi, Hakeem; Syed, Najeeb; Jithesh, Puthen V; Al-Ali, Rashid
Next generation sequencing (NGS) data analysis is highly compute intensive. In-memory computing, vectorization, bulk data transfer, and CPU frequency scaling are some of the hardware features of modern computing architectures. To get the best execution time and utilize these hardware features, it is necessary to tune the system level parameters before running the application. We studied the GATK-HaplotypeCaller, which is part of common NGS workflows and consumes more than 43% of the total execution time. Multiple GATK 3.x versions were benchmarked and the execution time of HaplotypeCaller was optimized by various system level parameters, which included: (i) tuning the parallel garbage collection and kernel shared memory to simulate in-memory computing, (ii) architecture-specific tuning in the PairHMM library for vectorization, (iii) including Java 1.8 features through GATK source code compilation and building a runtime environment for parallel sorting and bulk data transfer, and (iv) overriding the default 'on-demand' mode of CPU frequency with 'performance-mode' to accelerate the Java multi-threads. As a result, the HaplotypeCaller execution time was reduced by 82.66% in GATK 3.3 and 42.61% in GATK 3.7. Overall, the execution time of NGS pipeline was reduced to 70.60% and 34.14% for GATK 3.3 and GATK 3.7 respectively.
Liu, Qingqing; Tomaszewicz, Keith; Hutchinson, Lloyd; Hornick, Jason L; Woda, Bruce; Yu, Hongbo
Histiocytic sarcoma is a rare malignant neoplasm of presumed hematopoietic origin showing morphologic and immunophenotypic evidence of histiocytic differentiation. Somatic mutation importance in the pathogenesis or disease progression of histiocytic sarcoma was largely unknown. To identify somatic mutations in histiocytic sarcoma, we studied 5 histiocytic sarcomas [3 female and 2 male patients; mean age 54.8 (20-72), anatomic sites include lymph node, uterus, and pleura] and matched normal tissues from each patient as germ line controls. Somatic mutations in 50 "Hotspot" oncogenes and tumor suppressor genes were examined using next generation sequencing. Three (out of five) histiocytic sarcoma cases carried somatic mutations in BRAF. Among them, G464V [variant frequency (VF) of 43.6 %] and G466R (VF of 29.6 %) located at the P loop potentially interfere with the hydrophobic interaction between P and activating loops and ultimately activation of BRAF. Also detected was BRAF somatic mutation N581S (VF of 7.4 %), which was located at the catalytic loop of BRAF kinase domain: its role in modifying kinase activity was unclear. A similar mutational analysis was also performed on nine acute monocytic/monoblastic leukemia cases, which did not identify any BRAF somatic mutations. Our study detected several BRAF mutations in histiocytic sarcomas, which may be important in understanding the tumorigenesis of this rare neoplasm and providing mechanisms for potential therapeutical opportunities.
Pak, Theodore R; Kasarskis, Andrew
Recent reviews have examined the extent to which routine next-generation sequencing (NGS) on clinical specimens will improve the capabilities of clinical microbiology laboratories in the short term, but do not explore integrating NGS with clinical data from electronic medical records (EMRs), immune profiling data, and other rich datasets to create multiscale predictive models. This review introduces a range of "omics" and patient data sources relevant to managing infections and proposes 3 potentially disruptive applications for these data in the clinical workflow. The combined threats of healthcare-associated infections and multidrug-resistant organisms may be addressed by multiscale analysis of NGS and EMR data that is ideally updated and refined over time within each healthcare organization. Such data and analysis should form the cornerstone of future learning health systems for infectious disease. © The Author 2015. Published by Oxford University Press on behalf of the Infectious Diseases Society of America.
Weiss, Marjan M.; van der Zwaag, Bert; Jongbloed, Jan D. H.; Vogel, Maartje J.; Brüggenwirth, Hennie T.; Lekanne Deprez, Ronald H.; Mook, Olaf; Ruivenkamp, Claudia A. L.; van Slegtenhorst, Marjon A.; van den Wijngaard, Arthur; Waisfisz, Quinten; Nelen, Marcel R.; van der Stoep, Nienke
Next-generation sequencing (NGS) methods are being adopted by genome diagnostics laboratories worldwide. However, implementing NGS-based tests according to diagnostic standards is a challenge for individual laboratories. To facilitate the implementation of NGS in Dutch laboratories, the Dutch
Teshome Tilahun Bizuayehu
BACKGROUND: microRNAs (miRNAs) are implicated in regulation of many cellular processes. miRNAs are processed to their mature functional form in a step-wise manner by multiple proteins and cofactors in the nucleus and cytoplasm. Many miRNAs are conserved across vertebrates. Mature miRNAs have recently been characterized in Atlantic halibut (Hippoglossus hippoglossus L.). The aim of this study was to identify and characterize precursor miRNAs (pre-miRNAs) and miRNA targets in this non-model flatfish. Discovery of miRNA precursor forms and targets in non-model organisms is difficult because of limited source information available. Therefore, we have developed a methodology to overcome this limitation. METHODS: Genomic DNA and small transcriptome of Atlantic halibut were sequenced using Roche 454 pyrosequencing and SOLiD next generation sequencing (NGS), respectively. Identified pre-miRNAs were further validated with reverse-transcription PCR. miRNA targets were identified using miRanda and RNAhybrid target prediction tools using sequences from public databases. Some of miRNA targets were also identified using RACE-PCR. miRNA binding sites were validated with luciferase assay using the RTS34st cell line. RESULTS: We obtained more than 1.3 M and 92 M sequence reads from 454 genomic DNA sequencing and SOLiD small RNA sequencing, respectively. We identified 34 known and 9 novel pre-miRNAs. We predicted a number of miRNA target genes involved in various biological pathways. miR-24 binding to kisspeptin 1 receptor-2 (kiss1-r2) was confirmed using luciferase assay. CONCLUSION: This study demonstrates that identification of conserved and novel pre-miRNAs in a non-model vertebrate lacking substantial genomic resources can be performed by combining different next generation sequencing technologies. Our results indicate a wide conservation of miRNA precursors and involvement of miRNA in multiple regulatory pathways, and provide resources for further research on mi
Gargis, Amy S; Kalman, Lisa; Lubin, Ira M
Clinical microbiology and public health laboratories are beginning to utilize next-generation sequencing (NGS) for a range of applications. This technology has the potential to transform the field by providing approaches that will complement, or even replace, many conventional laboratory tests. While the benefits of NGS are significant, the complexities of these assays require an evolving set of standards to ensure testing quality. Regulatory and accreditation requirements, professional guidelines, and best practices that help ensure the quality of NGS-based tests are emerging. This review highlights currently available standards and guidelines for the implementation of NGS in the clinical and public health laboratory setting, and it includes considerations for NGS test validation, quality control procedures, proficiency testing, and reference materials. Copyright © 2016, American Society for Microbiology. All Rights Reserved.
Lee, Sejoon; Lee, Soohyun; Ouellette, Scott; Park, Woong-Yang; Lee, Eunjung A; Park, Peter J
In many next-generation sequencing (NGS) studies, multiple samples or data types are profiled for each individual. An important quality control (QC) step in these studies is to ensure that datasets from the same subject are properly paired. Given the heterogeneity of data types, file types and sequencing depths in a multi-dimensional study, a robust program that provides a standardized metric for genotype comparisons would be useful. Here, we describe NGSCheckMate, a user-friendly software package for verifying sample identities from FASTQ, BAM or VCF files. This tool uses a model-based method to compare allele read fractions at known single-nucleotide polymorphisms, considering depth-dependent behavior of similarity metrics for identical and unrelated samples. Our evaluation shows that NGSCheckMate is effective for a variety of data types, including exome sequencing, whole-genome sequencing, RNA-seq, ChIP-seq, targeted sequencing and single-cell whole-genome sequencing, with a minimal requirement for sequencing depth (>0.5X). An alignment-free module can be run directly on FASTQ files for a quick initial check. We recommend using this software as a QC step in NGS studies. https://github.com/parklab/NGSCheckMate. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.
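The idea of comparing allele read fractions at known SNPs can be illustrated with a small sketch. The code below computes the alternate-allele fraction per SNP for two samples and correlates the fractions over shared sites. The use of a plain Pearson correlation, the input layout (SNP id mapped to reference and alternate read counts), and all function names are assumptions made for illustration; this is not NGSCheckMate's API, and the tool's depth-dependent model is not reproduced here.

```python
from math import sqrt


def allele_fractions(counts):
    """counts: dict of snp_id -> (ref_reads, alt_reads); returns snp_id -> alt fraction."""
    return {
        snp: alt / (ref + alt)
        for snp, (ref, alt) in counts.items()
        if ref + alt > 0  # skip uncovered sites
    }


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation undefined for constant input")
    return cov / sqrt(var_x * var_y)


def vaf_correlation(sample_a, sample_b):
    """Correlate allele fractions over SNPs covered in both samples."""
    frac_a = allele_fractions(sample_a)
    frac_b = allele_fractions(sample_b)
    shared = sorted(frac_a.keys() & frac_b.keys())
    if len(shared) < 2:
        raise ValueError("need at least two shared SNPs")
    return pearson([frac_a[s] for s in shared], [frac_b[s] for s in shared])


if __name__ == "__main__":
    # Same genotypes at different depths versus an opposite genotype pattern.
    exome = {"rs1": (10, 0), "rs2": (5, 5), "rs3": (0, 10)}
    rnaseq = {"rs1": (20, 0), "rs2": (10, 10), "rs3": (0, 20)}
    other = {"rs1": (0, 10), "rs2": (5, 5), "rs3": (10, 0)}
    print(vaf_correlation(exome, rnaseq), vaf_correlation(exome, other))
```

A fixed correlation cutoff would be unreliable at low coverage, where allele fractions are noisy; this is presumably why the abstract stresses the depth-dependent behavior of similarity metrics.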
Ji, Yuan; Si, Yue; McMillin, Gwendolyn A; Lyon, Elaine
The rapid development and dramatic decrease in cost of sequencing techniques have ushered in the implementation of genomic testing in patient care. Next generation DNA sequencing (NGS) techniques have been used increasingly in clinical laboratories to scan the whole or part of the human genome in order to facilitate diagnosis and/or prognostics of genetic disease. Despite many hurdles and debates, pharmacogenomics (PGx) is believed to be an area of genomic medicine where precision medicine could have an impact in the near future. Areas covered: This review focuses on lessons learned through early attempts at clinically implementing PGx testing, and on the challenges and opportunities that PGx testing brings to precision medicine in the era of NGS. Expert commentary: Replacing the targeted analysis approach with NGS for PGx testing is currently neither technically feasible nor necessary, owing to several technical limitations and the difficulty of interpreting PGx variants of uncertain significance. However, reporting PGx variants from clinical whole exome or whole genome sequencing (WES/WGS) might provide additional benefits for patients who are tested by WES/WGS.
Cho, Namjin; Hwang, Byungjin; Yoon, Jung-ki; Park, Sangun; Lee, Joongoo; Seo, Han Na; Lee, Jeewon; Huh, Sunghoon; Chung, Jinsoo; Bang, Duhee
Interpreting epistatic interactions is crucial for understanding evolutionary dynamics of complex genetic systems and unveiling structure and function of genetic pathways. Although high resolution mapping of en masse variant libraries enables molecular biologists to address genotype-phenotype relationships, long-read sequencing technology remains indispensable to assess functional relationships between mutations that lie far apart. Here, we introduce JigsawSeq for multiplexed sequence identification of pooled gene variant libraries by combining a codon-based molecular barcoding strategy and de novo assembly of short-read data. We first validated JigsawSeq on small sub-pools and observed high precision and recall at various experimental settings. With extensive simulations, we then applied JigsawSeq to large-scale gene variant libraries to show that our method can be reliably scaled using next-generation sequencing. JigsawSeq may serve as a rapid screening tool for functional genomics and offer the opportunity to explore evolutionary trajectories of protein variants.
Lo, David; Weng, Jingning; Liu, Xiaohong; Yang, Juhua; He, Fen; Wang, Yun; Liu, Xuyang
PURPOSE: To detect the disease-causing gene in a Chinese pedigree with autosomal-recessive retinitis pigmentosa (ARRP). METHODS: All subjects in this family underwent a complete ophthalmic examination. Targeted-capture next generation sequencing (NGS) was performed on the proband to detect variants. All variants were verified in the remaining family members by PCR amplification and Sanger sequencing. RESULTS: All the affected subjects in this pedigree were diagnosed with retinitis pigmentosa (RP). The compound heterozygous c.138delA (p.Asp47IlefsX24) and c.1841G>T (p.Gly614Val) mutations in the Crumbs homolog 1 (CRB1) gene were identified in all the affected patients but not in the unaffected individuals in this family. These mutations were inherited from their parents, respectively. CONCLUSION: Novel compound heterozygous mutations in CRB1 were identified in a Chinese pedigree with ARRP using targeted-capture next generation sequencing. After evaluating the significant heredity and impaired protein function, the compound heterozygous c.138delA (p.Asp47IlefsX24) and c.1841G>T (p.Gly614Val) mutations are considered the causal mutations of early onset ARRP in this pedigree. To the best of our knowledge, there is no previous report regarding these compound mutations. PMID:27806333
The impact of natural killer (NK) cell alloreactivity on hematopoietic stem cell transplantation (HSCT) outcome is still debated due to the complexity of graft parameters, HLA class I environment, the nature of killer cell immunoglobulin-like receptor (KIR)/KIR ligand genetic combinations studied, and KIR+ NK cell repertoire size. KIR genes are known to be polymorphic in terms of gene content, copy number variation, and number of alleles. These allelic polymorphisms may impact both the phenotype and function of KIR+ NK cells. We, therefore, speculate that polymorphisms may alter donor KIR+ NK cell phenotype/function thus modulating post-HSCT KIR+ NK cell alloreactivity. To investigate KIR allele polymorphisms of all KIR genes, we developed a next-generation sequencing (NGS) technology on a MiSeq platform. To ensure the reliability and specificity of our method, genomic DNA from well-characterized cell lines was used; the high-resolution KIR typing results obtained were then compared to those previously reported. Two different bioinformatic pipelines were used, allowing the attribution of sequencing reads to specific KIR genes and the assignment of KIR alleles for each KIR gene. Our results demonstrated successful long-range KIR gene amplifications of all reference samples using intergenic KIR primers. The alignment of reads to the human genome reference (hg19) using the BiRD pipeline or visualization of data using Profiler software demonstrated that all KIR genes were completely sequenced with a sufficient read depth (mean 317× for all loci) and a high percentage of mapping (mean 93% for all loci). Comparison of the high-resolution KIR typing obtained with published data obtained using exome capture resulted in a reported concordance rate of 95% for centromeric and telomeric KIR genes. Overall, our results suggest that NGS can be used to investigate the broad KIR allelic polymorphism. Hence, these data improve our knowledge, not only on KIR+ NK cell alloreactivity in
Sturk-Andreaggi, Kimberly; Peck, Michelle A; Boysen, Cecilie; Dekker, Patrick; McMahon, Timothy P; Marshall, Charla K
The feasibility of generating mitochondrial DNA (mtDNA) data has expanded considerably with the advent of next-generation sequencing (NGS), specifically in the generation of entire mtDNA genome (mitogenome) sequences. However, the analysis of these data has emerged as the greatest challenge to implementation in forensics. To address this need, a custom toolkit for use in the CLC Genomics Workbench (QIAGEN, Hilden, Germany) was developed through a collaborative effort between the Armed Forces Medical Examiner System - Armed Forces DNA Identification Laboratory (AFMES-AFDIL) and QIAGEN Bioinformatics. The AFDIL-QIAGEN mtDNA Expert, or AQME, generates an editable mtDNA profile that employs forensic conventions and includes the interpretation range required for mtDNA data reporting. AQME also integrates an mtDNA haplogroup estimate into the analysis workflow, which provides the analyst with phylogenetic nomenclature guidance and a profile quality check without the use of an external tool. Supplemental AQME outputs such as nucleotide-per-position metrics, configurable export files, and an audit trail are produced to assist the analyst during review. AQME is applied to standard CLC outputs and thus can be incorporated into any mtDNA bioinformatics pipeline within CLC regardless of sample type, library preparation or NGS platform. An evaluation of AQME was performed to demonstrate its functionality and reliability for the analysis of mitogenome NGS data. The study analyzed Illumina mitogenome data from 21 samples (including associated controls) of varying quality and sample preparations with the AQME toolkit. A total of 211 tool edits were automatically applied to 130 of the 698 total variants reported in an effort to adhere to forensic nomenclature. Although additional manual edits were required for three samples, supplemental tools such as mtDNA haplogroup estimation assisted in identifying and guiding these necessary modifications to the AQME-generated profile. Along
Gu, Shun; Tian, Yuanyuan; Chen, Xue; Zhao, Chen
We aim to determine genetic lesions with a phenotypic correlation in four Chinese families with autosomal recessive retinitis pigmentosa (RP). Medical histories were carefully reviewed. All patients received comprehensive ophthalmic evaluations. The next-generation sequencing (NGS) approach targeting a panel of 205 retinal disease-relevant genes and 15 candidate genes was selectively performed on probands from the four recruited families for mutation detection. Online predictive software and crystal structure modeling were also applied to test the potential pathogenic effects of identified mutations. Of the four families, two were diagnosed with RP sino pigmento (RPSP). Patients with RPSP claimed to have earlier RP age of onset but slower disease progression. Five mutations in the eyes shut homolog (EYS) gene, involving two novel (c.7228+1G>A and c.9248G>A) and three recurrent mutations (c.4957dupA, c.6416G>A and c.6557G>A), were found as RP causative in the four families. The missense variant c.5093T>C was determined to be a variant of unknown significance (VUS) due to the variant's colocalization in the same allele with the reported pathogenic mutation c.6416G>A. The two novel variants were further confirmed absent in 100 unrelated healthy controls. Online predictive software indicated potential pathogenicity of the three missense mutations. Further, crystal structural modeling suggested generation of two abnormal hydrogen bonds by the missense mutation p.G2186E (c.6557G>A) and elongation of its neighboring β-sheet induced by p.G3083D (c.9248G>A), which could alter the tertiary structure of the eys protein and thus interrupt its physicochemical properties. Taken together, with the targeted NGS approach, we reveal novel EYS mutations and prove the efficiency of targeted NGS in the genetic diagnoses of RP. We also first report the correlation between EYS mutations and RPSP. The genotypic-phenotypic relationship in all Chinese patients carrying mutations in the EYS
Ziya Motalebipour, Elmira; Kafkas, Salih; Khodaeiaminjan, Mortaza; Çoban, Nergiz; Gözel, Hatice
Background Pistachio (Pistacia vera L.) is one of the most important nut crops in the world. There are about 11 wild species in the genus Pistacia, and they have importance as rootstock seed sources for cultivated P. vera and forest trees. Published information on the pistachio genome is limited. Therefore, a genome survey is necessary to obtain knowledge on the genome structure of pistachio by next generation sequencing. Simple sequence repeat (SSR) markers are useful tools for germplasm cha...
Boland, PM; Ruth, K; Matro, JM; Rainey, KL; Fang, CY; Wong, YN; Daly, MB; Hall, MJ
Genomic tests are increasingly complex, less expensive, and more widely available with the advent of next-generation sequencing (NGS). We assessed knowledge and perceptions among genetic counselors pertaining to NGS genomic testing via an online survey. Associations between selected characteristics and perceptions were examined. Recent education on NGS testing was common, but practical experience limited. Perceived understanding of clinical NGS was modest, specifically concerning tumor testing. Greater perceived understanding of clinical NGS testing correlated with more time spent in cancer-related counseling, exposure to NGS testing, and NGS-focused education. Substantial disagreement about the role of counseling for tumor-based testing was seen. Finally, a majority of counselors agreed with the need for more education about clinical NGS testing, supporting this approach to optimizing implementation. PMID:25523111
Boland, P M; Ruth, K; Matro, J M; Rainey, K L; Fang, C Y; Wong, Y N; Daly, M B; Hall, M J
Genomic tests are increasingly complex, less expensive, and more widely available with the advent of next-generation sequencing (NGS). We assessed knowledge and perceptions among genetic counselors pertaining to NGS genomic testing via an online survey. Associations between selected characteristics and perceptions were examined. Recent education on NGS testing was common, but practical experience limited. Perceived understanding of clinical NGS was modest, specifically concerning tumor testing. Greater perceived understanding of clinical NGS testing correlated with more time spent in cancer-related counseling, exposure to NGS testing, and NGS-focused education. Substantial disagreement about the role of counseling for tumor-based testing was seen. Finally, a majority of counselors agreed with the need for more education about clinical NGS testing, supporting this approach to optimizing implementation. © 2014 John Wiley & Sons A/S. Published by John Wiley & Sons Ltd.
Wecker, Thomas; Hoffmeier, Klaus; Plötner, Anne; Grüning, Björn Andreas; Horres, Ralf; Backofen, Rolf; Reinhard, Thomas; Schlunck, Günther
Extracellular microRNAs (miRNAs) in aqueous humor were suggested to have a role in transcellular signaling and may serve as disease biomarkers. The authors adopted next-generation sequencing (NGS) techniques to further characterize the miRNA profile in single samples of 60 to 80 μL human aqueous humor. Samples were obtained at the outset of cataract surgery in nine independent, otherwise healthy eyes. Four samples were used to extract RNA and generate sequencing libraries, followed by an adapter-driven amplification step, electrophoretic size selection, sequencing, and data analysis. Five samples were used for quantitative PCR (qPCR) validation of NGS results. Published NGS data on circulating miRNAs in blood were analyzed in comparison. One hundred fifty-eight miRNAs were consistently detected by NGS in all four samples; an additional 59 miRNAs were present in at least three samples. The aqueous humor miRNA profile shows some overlap with published NGS-derived inventories of circulating miRNAs in blood plasma with high prevalence of human miR-451a, -21, and -16. In contrast to blood, miR-184, -4448, -30a, -29a, -29c, -19a, -30d, -205, -24, -22, and -3074 were detected among the 20 most prevalent miRNAs in aqueous humor. Relative expression patterns of miR-451a, -202, and -144 suggested by NGS were confirmed by qPCR. Our data illustrate the feasibility of miRNA analysis by NGS in small individual aqueous humor samples. Intraocular cells as well as blood plasma contribute to the extracellular aqueous humor miRNome. The data suggest possible roles of miRNA in intraocular cell adhesion and signaling by TGF-β and Wnt, which are important in intraocular pressure regulation and glaucoma.
Jakaitiene, Audrone; Avino, Mariano; Guarracino, Mario Rosario
Despite diminishing costs, next-generation sequencing (NGS) remains expensive for studies with a large number of individuals. To save costs, the genomes of pools containing multiple samples can be sequenced instead. Many software packages are currently available for the detection of single-nucleotide polymorphisms (SNPs). Their sensitivity and specificity depend on the model used and the data analyzed, indicating that all of them have room for improvement. We use a beta-binomial model to detect rare mutations in untagged pooled NGS experiments. We propose a multireference framework for pooled data that can be specific for up to two patients affected by neuromuscular disorders (NMD). We assessed the results by comparison with the Genome Analysis Toolkit (GATK), CRISP, SNVer, and FreeBayes. Our results show that the multireference approach applying the beta-binomial model is accurate in predicting rare mutations at a fraction of 0.01. Finally, we explored the concordance of mutations between the model and the other software, checking their involvement in any NMD-related gene. We detected seven novel SNPs, for which the functional analysis produced enriched terms related to locomotion and musculature.
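As a rough illustration of how a beta-binomial model can flag a rare variant in a pooled sample, the sketch below computes the upper-tail probability of the observed alternate-read count under an error-only model. It is not the authors' multireference framework: the function names, the shape parameters (alpha = 1, beta = 999, i.e. a mean error rate of 0.001) and the p-value cutoff are all assumptions chosen for the example.

```python
from math import exp, lgamma


def log_beta(a, b):
    """Natural log of the Beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def betabinom_pmf(k, n, alpha, beta):
    """P(X = k) for a beta-binomial with n trials and shape (alpha, beta)."""
    log_choose = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
    return exp(log_choose + log_beta(k + alpha, n - k + beta) - log_beta(alpha, beta))


def betabinom_sf(k, n, alpha, beta):
    """Upper tail P(X >= k), summed directly over the support."""
    return sum(betabinom_pmf(i, n, alpha, beta) for i in range(k, n + 1))


def is_candidate_variant(alt_reads, depth, alpha=1.0, beta=999.0, p_cutoff=1e-6):
    """Flag a site when alt_reads is unlikely under the error-only model.

    alpha, beta and p_cutoff are illustrative assumptions, not published values.
    """
    return betabinom_sf(alt_reads, depth, alpha, beta) < p_cutoff
```

In practice the shape parameters would be estimated from control data rather than fixed by hand, and the cutoff would be adjusted for the number of sites tested.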
William E Stutz
Genes of the vertebrate major histocompatibility complex (MHC) are of great interest to biologists because of their important role in immunity and disease, and their extremely high levels of genetic diversity. Next generation sequencing (NGS) technologies are quickly becoming the method of choice for high-throughput genotyping of multi-locus templates like MHC in non-model organisms. Previous approaches to genotyping MHC genes using NGS technologies suffer from two problems: (1) a "gray zone" where low frequency alleles and high frequency artifacts can be difficult to disentangle and (2) a similar sequence problem, where very similar alleles can be difficult to distinguish as two distinct alleles. Here we present a new method for genotyping MHC loci, Stepwise Threshold Clustering (STC), that addresses these problems by taking full advantage of the increase in sequence data provided by NGS technologies. Unlike previous approaches for genotyping MHC with NGS data that attempt to classify individual sequences as alleles or artifacts, STC uses a quasi-Dirichlet clustering algorithm to cluster similar sequences at increasing levels of sequence similarity. By applying frequency and similarity based criteria to clusters rather than individual sequences, STC is able to successfully identify clusters of sequences that correspond to individual or similar alleles present in the genomes of individual samples. Furthermore, STC does not require duplicate runs of all samples, increasing the number of samples that can be genotyped in a given project. We show how the STC method works using a single sample library. We then apply STC to 295 threespine stickleback (Gasterosteus aculeatus) samples from four populations and show that neighboring populations differ significantly in MHC allele pools. We show that STC is a reliable, accurate, efficient, and flexible method for genotyping MHC that will be of use to biologists interested in a variety of downstream applications.
De Bellis, Fabien; Malapa, Roger; Kagy, Valérie; Lebegin, Stéphane; Billot, Claire; Labouisse, Jean-Pierre
Premise of the study: Using next-generation sequencing technology, new microsatellite loci were characterized in Artocarpus altilis (Moraceae) and two congeners to increase the number of available markers for genotyping breadfruit cultivars. Methods and Results: A total of 47,607 simple sequence repeat loci were obtained by sequencing a library of breadfruit genomic DNA with an Illumina MiSeq system. Among them, 50 single-locus markers were selected and assessed using 41 samples (39 A. altilis, one A. camansi, and one A. heterophyllus). All loci were polymorphic in A. altilis, 44 in A. camansi, and 21 in A. heterophyllus. The number of alleles per locus ranged from two to 19. Conclusions: The new markers will be useful for assessing the identity and genetic diversity of breadfruit cultivars on a small geographical scale, gaining a better understanding of farmer management practices, and will help to optimize breadfruit genebank management. PMID:27610273
We have previously described ProxiMAX, a technology that enables the fabrication of precise, combinatorial gene libraries via codon-by-codon saturation mutagenesis. ProxiMAX was originally performed using manual, enzymatic transfer of codons via blunt-end ligation. Here we present Colibra™: an automated, proprietary version of ProxiMAX used specifically for antibody library generation, in which double-codon hexamers are transferred during the saturation cycling process. The reduction in process complexity, resulting library quality and an unprecedented saturation of up to 24 contiguous codons are described. Utility of the method is demonstrated via fabrication of complementarity determining regions (CDR) in antibody fragment libraries and next generation sequencing (NGS) analysis of their quality and diversity.
Tafazoli, Alireza; Eshraghi, Peyman; Pantaleoni, Francesca; Vakili, Rahim; Moghaddassian, Morteza; Ghahraman, Martha; Muto, Valentina; Paolacci, Stefano; Golyan, Fatemeh Fardi; Abbaszadegan, Mohammad Reza
Noonan Syndrome (NS) is an autosomal dominant disorder with many variable and heterogeneous conditions. The genetic basis for 20-30% of cases is still unknown. This study evaluates Iranian Noonan patients both clinically and genetically for the first time. In phase one, mutational analysis of the PTPN11 gene was performed in 15 Iranian patients using PCR and Sanger sequencing. In phase two, Next Generation Sequencing (NGS) in the form of targeted resequencing was used to analyze exons from other related genes. Homology modelling for the newly found mutations was performed as well. Genotype-phenotype correlation was done according to the molecular findings and clinical features. A previously reported mutation (p.N308D) in some patients and a novel mutation (p.D155N) in one patient were identified in phase one. After applying NGS methods, known and new variants were found in four patients in other genes: CBL (p.V904I), KRAS (p.L53W), SOS1 (p.I1302V), and SOS1 (p.R552G). Structural studies of two deduced novel mutations in related genes revealed deficiencies in the mutated proteins. Following genotype-phenotype correlation, a new pattern of intellectual disability in two patients was registered. NS shows strongly variable expressivity along with high genetic heterogeneity, especially in distinct populations and ethnic groups. Other, as yet unknown, causative genes may also exist. Clearly, more comprehensive and newer technologies such as NGS methods are the best choice for detecting molecular defects in patients for genotype-phenotype correlation and disease management. Copyright © 2017 Medical University of Bialystok. Published by Elsevier B.V. All rights reserved.
Fike, Jennifer A.; Oyler-McCance, Sara J.; Zimmerman, Shawna J; Castoe, Todd A.
Gunnison Sage-grouse are an obligate sagebrush species that has experienced significant population declines and has been proposed for listing under the U.S. Endangered Species Act. In order to examine levels of connectivity among Gunnison Sage-grouse leks, we identified 13 novel microsatellite loci through next-generation shotgun sequencing, and tested them on the closely related Greater Sage-grouse. The number of alleles per locus ranged from 2 to 12. No loci were found to be linked, although 2 loci revealed significant departures from Hardy–Weinberg equilibrium or evidence of null alleles. While these microsatellites were designed for Gunnison Sage-grouse, they also work well for Greater Sage-grouse and could be used for numerous genetic questions including landscape and population genetics.
Eerkens, Jelmer W; Nichols, Ruth V; Murray, Gemma G R; Perez, Katherine; Murga, Engel; Kaijankoski, Phil; Rosenthal, Jeffrey S; Engbring, Laurel; Shapiro, Beth
Next Generation Sequencing (NGS) of ancient dental calculus samples from a prehistoric site in San Francisco Bay, CA-SCL-919, reveals a wide range of potentially pathogenic bacteria. One older adult woman, in particular, had high levels of Neisseria meningitidis and low levels of Haemophilus influenzae, species that were not observed in the calculus from three other individuals. Combined with the presence of incipient endocranial lesions and pronounced meningeal grooves, we interpret this as an ancient case of meningococcal disease. This disease afflicts millions around the globe today, but little is known about its (pre)history. With additional sampling, we suggest NGS of calculus offers an exciting new window into the evolutionary history of these bacterial species and their interactions with humans. Copyright © 2018 Elsevier Inc. All rights reserved.
Bybee, Seth M; Bracken-Grissom, Heather; Haynes, Benjamin D; Hermansen, Russell A; Byers, Robert L; Clement, Mark J; Udall, Joshua A; Wilcox, Edward R; Crandall, Keith A
Next-gen sequencing technologies have revolutionized data collection in genetic studies and advanced genome biology to novel frontiers. However, to date, next-gen technologies have been used principally for whole genome sequencing and transcriptome sequencing. Yet many questions in population genetics and systematics rely on sequencing specific genes of known function or diversity levels. Here, we describe a targeted amplicon sequencing (TAS) approach capitalizing on next-gen capacity to sequence large numbers of targeted gene regions from a large number of samples. Our TAS approach is easily scalable, simple in execution, neither time- nor labor-intensive, relatively inexpensive, and can be applied to a broad diversity of organisms and/or genes. Our TAS approach includes a bioinformatic application, BarcodeCrucher, to take raw next-gen sequence reads and perform quality control checks and convert the data into FASTA format organized by gene and sample, ready for phylogenetic analyses. We demonstrate our approach by sequencing targeted genes of known phylogenetic utility to estimate a phylogeny for the Pancrustacea. We generated data from 44 taxa using 68 different 10-bp multiplexing identifiers. The overall quality of data produced was robust and was informative for phylogeny estimation. The potential for this method to produce copious amounts of data from a single 454 plate (e.g., 325 taxa for 24 loci) significantly reduces sequencing expenses incurred from traditional Sanger sequencing. We further discuss the advantages and disadvantages of this method, while offering suggestions to enhance the approach.
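The general idea of sorting reads by a 10-bp multiplexing identifier and writing them out as FASTA can be sketched as follows. This is a hypothetical illustration only, not the algorithm used by the application named in the abstract: the function names, the exact-match barcode lookup and the minimum-length filter are assumptions.

```python
from collections import defaultdict


def demultiplex(reads, barcode_to_sample, barcode_len=10, min_len=50):
    """Group reads by their leading barcode.

    reads: iterable of (read_id, sequence) pairs.
    barcode_to_sample: dict mapping a barcode string to a sample/gene label.
    Reads with an unknown barcode, or a too-short insert, are dropped.
    """
    bins = defaultdict(list)
    for read_id, seq in reads:
        barcode, insert = seq[:barcode_len], seq[barcode_len:]
        sample = barcode_to_sample.get(barcode)
        if sample is None or len(insert) < min_len:
            continue  # simple quality-control filter (an assumption)
        bins[sample].append((read_id, insert))
    return dict(bins)


def to_fasta(records):
    """Render (read_id, sequence) pairs as FASTA text."""
    return "".join(f">{read_id}\n{seq}\n" for read_id, seq in records)
```

A real pipeline would likely also tolerate sequencing errors in the barcode and apply base-quality filters; both are omitted here to keep the sketch short.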
Alana Alexander; Debbie Steel; Beth Slikas; Kendra Hoekzema; Colm Carraher; Matthew Parks; Richard Cronn; C. Scott Baker
Large population sizes and global distributions generally associate with high mitochondrial DNA control region (CR) diversity. The sperm whale (Physeter macrocephalus) is an exception, showing low CR diversity relative to other cetaceans; however, diversity levels throughout the remainder of the sperm whale mitogenome are unknown. We sequenced 20...
Combining single-cell methods and next-generation sequencing should provide a powerful means to understand single-cell biology and obviate the effects of sample heterogeneity. Here we report a single-cell identification method and seamless cancer gene profiling using semiconductor-based massively parallel sequencing. A549 cells (adenocarcinomic human alveolar basal epithelial cell line) were used as a model. Single-cell capture was performed using laser capture microdissection (LCM) with an Arcturus® XT system, and a captured single cell and a bulk population of A549 cells (≈10^6 cells) were subjected to whole genome amplification (WGA). For cell identification, a multiplex PCR method (AmpliSeq™ SNP HID panel) was used to enrich 136 highly discriminatory SNPs with a genotype concordance probability of 10^31–35. For cancer gene profiling, mutation profiling was performed in parallel using a hotspot panel for 50 cancer-related genes. Sequencing was performed using a semiconductor-based bench top sequencer. The distribution of sequence reads for both HID and Cancer panel amplicons was consistent across these samples. For the bulk population of cells, the percentages of sequence covered at coverage of more than 100× were 99.04% for the HID panel and 98.83% for the Cancer panel, while for the single cell the percentages of sequence covered at coverage of more than 100× were 55.93% for the HID panel and 65.96% for the Cancer panel. Partial amplification failure or randomly distributed non-amplified regions across samples from single cells during the WGA procedures or random allele drop out probably caused these differences. However, comparative analyses showed that this method successfully discriminated a single A549 cancer cell from a bulk population of A549 cells. Thus, our approach provides a powerful means to overcome tumor sample heterogeneity when searching for somatic mutations.
While Next-Generation Sequencing (NGS) can now be considered an established analysis technology for research applications across the life sciences, the analysis workflows still require substantial bioinformatics expertise. Typical challenges include the appropriate selection of analytical software tools, the speedup of the overall procedure using HPC parallelization and acceleration technology, the development of automation strategies, data storage solutions and finally the development of methods for full exploitation of the analysis results across multiple experimental conditions. Recently, NGS has begun to expand into clinical environments, where it facilitates diagnostics enabling personalized therapeutic approaches, but is also accompanied by new technological, legal and ethical challenges. There are probably as many overall concepts for the analysis of the data as there are academic research institutions. Among these concepts are, for instance, complex IT architectures developed in-house, ready-to-use technologies installed on-site as well as comprehensive Everything as a Service (XaaS) solutions. In this mini-review, we summarize the key points to consider in the setup of the analysis architectures, mostly for scientific rather than diagnostic purposes, and provide an overview of the current state of the art and challenges of the field.
Lim, Byung Chan; Lee, Seungbok; Shin, Jong-Yeon; Kim, Jong-Il; Hwang, Hee; Kim, Ki Joong; Hwang, Yong Seung; Seo, Jeong-Sun; Chae, Jong Hee
Duchenne muscular dystrophy or Becker muscular dystrophy might be a suitable candidate disease for application of next-generation sequencing in the genetic diagnosis because the complex mutational spectrum and the large size of the dystrophin gene require two or more analytical methods and have a high cost. The authors tested whether large deletions/duplications or small mutations, such as point mutations or short insertions/deletions of the dystrophin gene, could be predicted accurately in a single platform using next-generation sequencing technology. A custom solution-based target enrichment kit was designed to capture whole genomic regions of the dystrophin gene and other muscular-dystrophy-related genes. A multiplexing strategy, wherein four differently bar-coded samples were captured and sequenced together in a single lane of the Illumina Genome Analyser, was applied. The study subjects were 25 patients: 16 with deficient dystrophin expression without a large deletion/duplication and 9 with a known large deletion/duplication. Nearly 100% of the exonic region of the dystrophin gene was covered by at least eight reads with a mean read depth of 107. Pathogenic small mutations were identified in 15 of the 16 patients without a large deletion/duplication. Using these 16 patients as the standard, the authors' method accurately predicted the deleted or duplicated exons in the 9 patients with known mutations. Inclusion of non-coding regions and paired-end sequence analysis enabled accurate identification by increasing the read depth and providing information about the breakpoint junction. The current method has an advantage for the genetic diagnosis of Duchenne muscular dystrophy and Becker muscular dystrophy wherein a comprehensive mutational search may be feasible using a single platform.
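One common way to predict exon-level deletions or duplications from read depth, using samples without large rearrangements as the standard, is to compare normalised per-exon depth against a reference baseline. The sketch below illustrates that general idea only; it is not the authors' published procedure, and the function names and ratio thresholds (0.25 and 1.5, plausible for a hemizygous X-linked gene) are assumptions.

```python
from statistics import median


def normalise(depths):
    """Scale per-exon depths by the sample's own median depth.

    Assumes the median depth is non-zero (i.e. most exons are covered).
    """
    centre = median(depths.values())
    return {exon: depth / centre for exon, depth in depths.items()}


def call_exon_cnv(sample, references, del_max=0.25, dup_min=1.5):
    """Label each exon as deletion, duplication, normal or no_call.

    sample: dict exon -> mean read depth for the test sample.
    references: list of such dicts from samples without large rearrangements.
    del_max and dup_min are illustrative thresholds, not published values.
    """
    scaled = normalise(sample)
    scaled_refs = [normalise(ref) for ref in references]
    calls = {}
    for exon, value in scaled.items():
        baseline = median(ref[exon] for ref in scaled_refs)
        if baseline == 0:
            calls[exon] = "no_call"  # exon not covered in the references
            continue
        ratio = value / baseline
        if ratio <= del_max:
            calls[exon] = "deletion"
        elif ratio >= dup_min:
            calls[exon] = "duplication"
        else:
            calls[exon] = "normal"
    return calls
```

Normalising by the median rather than the total depth keeps a single large deletion from inflating the ratios of the remaining exons.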
Yan, Liying; Huang, Lei; Xu, Liya; Huang, Jin; Ma, Fei; Zhu, Xiaohui; Tang, Yaqiong; Liu, Mingshan; Lian, Ying; Liu, Ping; Li, Rong; Lu, Sijia; Tang, Fuchou; Qiao, Jie; Xie, X Sunney
In vitro fertilization (IVF), preimplantation genetic diagnosis (PGD), and preimplantation genetic screening (PGS) help patients to select embryos free of monogenic diseases and aneuploidy (chromosome abnormality). Next-generation sequencing (NGS) methods, while experiencing a rapid cost reduction, have improved the precision of PGD/PGS. However, the precision of PGD has been limited by the false-positive and false-negative single-nucleotide variations (SNVs), which are not acceptable in IVF and can be circumvented by linkage analyses, such as short tandem repeats or karyomapping. It is noteworthy that existing methods of detecting SNV/copy number variation (CNV) and linkage analysis often require separate procedures for the same embryo. Here we report an NGS-based PGD/PGS procedure that can simultaneously detect a single-gene disorder and aneuploidy and is capable of linkage analysis in a cost-effective way. This method, called "mutated allele revealed by sequencing with aneuploidy and linkage analyses" (MARSALA), involves multiple annealing and looping-based amplification cycles (MALBAC) for single-cell whole-genome amplification. Aneuploidy is determined by CNVs, whereas SNVs associated with the monogenic diseases are detected by PCR amplification of the MALBAC product. The false-positive and -negative SNVs are avoided by an NGS-based linkage analysis. Two healthy babies, free of the monogenic diseases of their parents, were born after such embryo selection. The monogenic diseases originated from a single base mutation on the autosome and the X-chromosome of the disease-carrying father and mother, respectively.
Bijwaard, Karen; Dickey, Jennifer S; Kelm, Kellie; Težak, Živana
The rapid emergence and clinical translation of novel high-throughput sequencing technologies created a need to clarify the regulatory pathway for the evaluation and authorization of these unique technologies. Recently, the US FDA authorized for marketing four next generation sequencing (NGS)-based diagnostic devices which consisted of two heritable disease-specific assays, library preparation reagents and a NGS platform that are intended for human germline targeted sequencing from whole blood. These first authorizations can serve as a case study in how different types of NGS-based technology are reviewed by the FDA. In this manuscript we describe challenges associated with the evaluation of these novel technologies and provide an overview of what was reviewed. Besides making validated NGS-based devices available for in vitro diagnostic use, these first authorizations create a regulatory path for similar future instruments and assays.
Cosart, Ted; Beja-Pereira, Albano; Luikart, Gordon
The computer program EXONSAMPLER automates the sampling of thousands of exon sequences from publicly available reference genome sequences and gene annotation databases. It was designed to provide exon sequences for the efficient, next-generation gene sequencing method called exon capture. The exon sequences can be sampled by a list of gene name abbreviations (e.g. IFNG, TLR1), or by sampling exons from genes spaced evenly across chromosomes. It provides a list of genomic coordinates (a bed file), as well as a set of sequences in fasta format. User-adjustable parameters for collecting exon sequences include a minimum and maximum acceptable exon length, maximum number of exonic base pairs (bp) to sample per gene, and maximum total bp for the entire collection. It allows for partial sampling of very large exons. It can preferentially sample upstream (5 prime) exons, downstream (3 prime) exons, both external exons, or all internal exons. It is written in the Python programming language using its free libraries. We describe the use of EXONSAMPLER to collect exon sequences from the domestic cow (Bos taurus) genome for the design of an exon-capture microarray to sequence exons from related species, including the zebu cow and wild bison. We collected ~10% of the exome (~3 million bp), including 155 candidate genes, and ~16,000 exons evenly spaced genomewide. We prioritized the collection of 5 prime exons to facilitate discovery and genotyping of SNPs near upstream gene regulatory DNA sequences, which control gene expression and are often under natural selection. © 2014 John Wiley & Sons Ltd.
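The user-adjustable sampling rules described above (minimum and maximum exon length, partial sampling of very large exons, a per-gene base-pair cap, a total cap, and preference for 5 prime exons) can be sketched roughly as follows. This is a hypothetical illustration and not EXONSAMPLER's actual code or interface: the function names, tuple layout, default values and example coordinates are assumptions.

```python
from collections import defaultdict


def sample_exons(exons, min_len=100, max_len=500,
                 max_bp_per_gene=1000, max_total_bp=3_000_000):
    """Pick exons gene by gene, 5' exons first.

    exons: iterable of (gene, chrom, start, end, strand) with 0-based,
    half-open coordinates as in BED. Exons shorter than min_len are skipped;
    exons longer than max_len are truncated to their 5' max_len bases.
    """
    by_gene = defaultdict(list)
    for exon in exons:
        by_gene[exon[0]].append(exon)

    picked, total_bp = [], 0
    for gene, gene_exons in by_gene.items():
        strand = gene_exons[0][4]
        # On the minus strand the 5' end has the highest coordinates.
        ordered = sorted(gene_exons, key=lambda e: e[2], reverse=(strand == "-"))
        gene_bp = 0
        for _, chrom, start, end, _ in ordered:
            length = end - start
            if length < min_len:
                continue
            if length > max_len:  # partial sampling of a very large exon
                if strand == "+":
                    end = start + max_len
                else:
                    start = end - max_len
                length = max_len
            if gene_bp + length > max_bp_per_gene or total_bp + length > max_total_bp:
                break
            picked.append((chrom, start, end, gene, strand))
            gene_bp += length
            total_bp += length
    return picked


def to_bed(rows):
    """Format picked exons as six-column BED lines (score fixed at 0)."""
    return "\n".join(f"{c}\t{s}\t{e}\t{g}\t0\t{st}" for c, s, e, g, st in rows)
```

The returned coordinates could then be used to extract the matching sequences from a reference genome in FASTA format, a step left out of this sketch.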
Dannemiller, Karen C.; Lang-Yona, Naama; Yamamoto, Naomichi; Rudich, Yinon; Peccia, Jordan
We examined fungal communities associated with the PM10 mass of Rehovot, Israel outdoor air samples collected in the spring and fall seasons. Fungal communities were described by 454 pyrosequencing of the internal transcribed spacer (ITS) region of the fungal ribosomal RNA encoding gene. To allow for a more quantitative comparison of fungal exposure in humans, the relative abundance values of specific taxa were transformed to absolute concentrations through multiplying these values by the sample's total fungal spore concentration (derived from universal fungal qPCR). Next, the sequencing-based absolute concentrations for Alternaria alternata, Cladosporium cladosporioides, Epicoccum nigrum, and Penicillium/Aspergillus spp. were compared to taxon-specific qPCR concentrations for A. alternata, C. cladosporioides, E. nigrum, and Penicillium/Aspergillus spp. derived from the same spring and fall aerosol samples. Results of these comparisons showed that the absolute concentration values generated from pyrosequencing were strongly associated with the concentration values derived from taxon-specific qPCR (for all four species, p 0.70). The correlation coefficients were greater for species present in higher concentrations. Our microbial aerosol population analyses demonstrated that fungal diversity (number of fungal operational taxonomic units) was higher in the spring compared to the fall (p = 0.02), and principal coordinate analysis showed distinct seasonal differences in taxa distribution (ANOSIM p = 0.004). Among genera containing allergenic and/or pathogenic species, the absolute concentrations of Alternaria, Aspergillus, Fusarium, and Cladosporium were greater in the fall, while Cryptococcus, Penicillium, and Ulocladium concentrations were greater in the spring. The transformation of pyrosequencing fungal population relative abundance data to absolute concentrations can improve next-generation DNA sequencing-based quantitative aerosol exposure assessment.
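The transformation from relative abundance to absolute concentration described above is a per-taxon multiplication by the sample's total fungal spore concentration. A minimal sketch, with a hypothetical function name and made-up example values:

```python
def absolute_concentrations(relative_abundance, total_spores_per_m3):
    """Convert sequencing-based fractions to absolute concentrations.

    relative_abundance: dict taxon -> fraction of reads (0 to 1).
    total_spores_per_m3: total fungal concentration for the same sample,
    e.g. from universal fungal qPCR (units are an assumption here).
    """
    return {taxon: fraction * total_spores_per_m3
            for taxon, fraction in relative_abundance.items()}


# Example with made-up values: 25% of reads and 1000 spores per cubic metre.
example = absolute_concentrations({"Alternaria": 0.25, "Cladosporium": 0.5}, 1000.0)
```

The resulting values are only as accurate as both inputs, so any amplification bias in the sequencing fractions carries over into the absolute estimates.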
Warner, Tom; Derber, John; Zupanski, Milija; Cohn, Steve; Verlinde, Hans
Four-dimensional data assimilation strategies can generally be classified as either current or next generation, depending upon whether they are used operationally or not. Current-generation data-assimilation techniques are those that are presently used routinely in operational-forecasting or research applications. They can be classified into the following categories: intermittent assimilation, Newtonian relaxation, and physical initialization. It should be noted that these techniques are the subject of continued research, and their improvement will parallel the development of next generation techniques described by the other speakers. Next generation assimilation techniques are those that are under development but are not yet used operationally. Most of these procedures are derived from control theory or variational methods and primarily represent continuous assimilation approaches, in which the data and model dynamics are 'fitted' to each other in an optimal way. Another 'next generation' category is the initialization of convective-scale models. Intermittent assimilation systems use an objective analysis to combine all observations within a time window that is centered on the analysis time. Continuous first-generation assimilation systems are usually based on the Newtonian-relaxation or 'nudging' techniques. Physical initialization procedures generally involve the use of standard or nonstandard data to force some physical process in the model during an assimilation period. Under the topic of next-generation assimilation techniques, variational approaches are currently being actively developed. Variational approaches seek to minimize a cost or penalty function which measures a model's fit to observations, background fields and other imposed constraints. Alternatively, the Kalman filter technique, which is also under investigation as a data assimilation procedure for numerical weather prediction, can yield acceptable initial conditions for mesoscale models. The
Wagle, Prerana; Nikolić, Miloš; Frommolt, Peter
Next-Generation Sequencing (NGS) has emerged as a widely used tool in molecular biology. While time and cost for the sequencing itself are decreasing, the analysis of the massive amounts of data remains challenging. Since multiple algorithmic approaches for the basic data analysis have been developed, there is now an increasing need to efficiently use these tools to obtain results in reasonable time. We have developed QuickNGS, a new workflow system for laboratories with the need to analyze data from multiple NGS projects at a time. QuickNGS takes advantage of parallel computing resources, a comprehensive back-end database, and a careful selection of previously published algorithmic approaches to build fully automated data analysis workflows. We demonstrate the efficiency of our new software by a comprehensive analysis of 10 RNA-Seq samples which we can finish in only a few minutes of hands-on time. The approach we have taken is suitable to process even much larger numbers of samples and multiple projects at a time. Our approach considerably reduces the barriers that still limit the usability of the powerful NGS technology and finally decreases the time to be spent before proceeding to further downstream analysis and interpretation of the data.
Ono, Shintaro; Nakayama, Manabu; Kanegane, Hirokazu; Hoshino, Akihiro; Shimodera, Saeko; Shibata, Hirofumi; Fujino, Hisanori; Fujino, Takahiro; Yunomae, Yuta; Okano, Tsubasa; Yamashita, Motoi; Yasumi, Takahiro; Izawa, Kazushi; Takagi, Masatoshi; Imai, Kohsuke; Zhang, Kejian; Marsh, Rebecca; Picard, Capucine; Latour, Sylvain; Ohara, Osamu; Morio, Tomohiro
Epstein-Barr virus (EBV) is associated with several life-threatening diseases, such as lymphoproliferative disease (LPD), particularly in immunocompromised hosts. Some categories of primary immunodeficiency diseases (PIDs) including X-linked lymphoproliferative syndrome (XLP), are characterized by susceptibility and vulnerability to EBV infection. The number of genetically defined PIDs is rapidly increasing, and clinical genetic testing plays an important role in establishing a definitive diagnosis. Whole-exome sequencing is performed for diagnosing rare genetic diseases, but is both expensive and time-consuming. Low-cost, high-throughput gene analysis systems are thus necessary. We developed a comprehensive molecular diagnostic method using a two-step tailed polymerase chain reaction (PCR) and a next-generation sequencing (NGS) platform to detect mutations in 23 candidate genes responsible for XLP or XLP-like diseases. Samples from 19 patients suspected of having EBV-associated LPD were used in this comprehensive molecular diagnosis. Causative gene mutations (involving PRF1 and SH2D1A) were detected in two of the 19 patients studied. This comprehensive diagnosis method effectively detected mutations in all coding exons of 23 genes with sufficient read numbers for each amplicon. This comprehensive molecular diagnostic method using PCR and NGS provides a rapid, accurate, low-cost diagnosis for patients with XLP or XLP-like diseases.
Karan, M; Evans, D S; Reilly, D; Schulte, K; Wright, C; Innes, D; Holton, T A; Nikles, D G; Dickinson, G R
Khaya senegalensis (African mahogany or dry-zone mahogany) is a high-value hardwood timber species with great potential for forest plantations in northern Australia. The species is distributed across the sub-Saharan belt from Senegal to Sudan and Uganda. Because of heavy exploitation and constraints on natural regeneration and sustainable planting, it is now classified as a vulnerable species. Here, we describe the development of microsatellite markers for K. senegalensis using next-generation sequencing to assess its intra-specific diversity across its natural range, which is a key for successful breeding programs and effective conservation management of the species. Next-generation sequencing yielded 93,943 sequences with an average read length of 234 bp. The assembled sequences contained 1030 simple sequence repeats, with primers designed for 522 microsatellite loci. Twenty-one microsatellite loci were tested with 11 showing reliable amplification and polymorphism in K. senegalensis. The 11 novel microsatellites, together with one previously published, were used to assess 73 accessions belonging to the Australian K. senegalensis domestication program, sampled from across the natural range of the species. STRUCTURE analysis shows two major clusters, one comprising mainly accessions from west Africa (Senegal to Benin) and the second based in the far eastern limits of the range in Sudan and Uganda. Higher levels of genetic diversity were found in material from western Africa. This suggests that new seed collections from this region may yield more diverse genotypes than those originating from Sudan and Uganda in eastern Africa. © 2011 Blackwell Publishing Ltd.
Quantitation of next generation sequencing library preparation protocol efficiencies using droplet digital PCR assays - a systematic comparison of DNA library preparation kits for Illumina sequencing.
Aigrain, Louise; Gu, Yong; Quail, Michael A
The emergence of next-generation sequencing (NGS) technologies in the past decade has democratized DNA sequencing, both in terms of price per sequenced base and ease of producing DNA libraries. When it comes to preparing DNA sequencing libraries for Illumina, the current market leader, a plethora of kits is available, and it can be difficult for users to determine which kit is the most appropriate and efficient for their applications; the main concerns are not only cost but also minimal bias, yield and time efficiency. We compared 9 commercially available library preparation kits in a systematic manner using the same DNA sample, probing the amount of DNA remaining after each protocol step with a new droplet digital PCR (ddPCR) assay. This method allows the precise quantification of fragments bearing either adaptors or P5/P7 sequences on both ends just after ligation or PCR enrichment. We also investigated the potential influence of DNA input and DNA fragment size on the final library preparation efficiency. The overall library preparation efficiencies vary considerably between kits, with those that combine several steps into one exhibiting, in some cases, final yields 4 to 7 times higher than the other kits. Detailed ddPCR data also reveal that the adaptor ligation yield itself varies by more than a factor of 10 between kits, with certain ligation efficiencies so low that they could impair the original library complexity and impoverish the sequencing results. When a PCR enrichment step is necessary, lower adaptor-ligated DNA inputs lead to greater amplification yields, hiding the latent disparity between kits. We describe a ddPCR assay that allows us to probe the efficiency of the most critical step in library preparation, ligation, and to draw conclusions on which kits are more likely to preserve sample heterogeneity and reduce the need for amplification.
Liu, Biao; Conroy, Jeffrey M.; Morrison, Carl D.; Odunsi, Adekunle O.; Qin, Maochun; Wei, Lei; Trump, Donald L.; Johnson, Candace S.; Liu, Song; Wang, Jianmin
Somatic Structural Variations (SVs) are a complex collection of chromosomal mutations that could directly contribute to carcinogenesis. Next Generation Sequencing (NGS) technology has emerged as the primary means of interrogating the SVs of the cancer genome in recent investigations. Sophisticated computational methods are required to accurately identify the SV events and delineate their breakpoints from the massive amounts of reads generated by a NGS experiment. In this review, we provide an overview of current analytic tools used for SV detection in NGS-based cancer studies. We summarize the features of common SV groups and the primary types of NGS signatures that can be used in SV detection methods. We discuss the principles and key similarities and differences of existing computational programs and comment on unresolved issues related to this research field. The aim of this article is to provide a practical guide of relevant concepts, computational methods, software tools and important factors for analyzing and interpreting NGS data for the detection of SVs in the cancer genome. PMID:25849937
Polpass Arul Jose
Starting with the discovery of streptomycin, the promise of natural products research on actinomycetes has captivated researchers and yielded an array of life-saving antibiotics. However, most actinomycetes have received little attention from researchers beyond isolation and activity screening. Noticeable gaps in genomic information and in the associated biosynthetic potential of actinomycetes are the main reasons for this situation, which has led to a decline in the discovery rate of novel antibiotics. Recent insights gained from genome mining have revealed a wealth of previously unrecognized biosynthetic potential in actinomycetes. Successive developments in next-generation sequencing, genome editing, analytical separation and high-resolution spectroscopic methods have reinvigorated interest in such actinomycetes and opened new avenues for the discovery of natural and natural-inspired antibiotics. This article describes the new dimensions that have driven the ongoing resurgence of research on actinomycetes, with historical background since its commencement in 1940, for the attention of researchers worldwide. Coupled with increasing advancement in molecular and analytical tools and techniques, the discovery of next-generation antibiotics could be possible by revisiting the untapped potential of actinomycetes from different natural sources.
Complex chromosome rearrangements (CCRs), which are rather rare in the general population, may be associated with aberrant phenotypes. Next-generation sequencing (NGS) and conventional techniques can be used to reveal specific CCRs for better genetic counseling. We report the CCRs of a girl and her mother, which were identified using a combination of NGS and conventional techniques including G-banding, fluorescence in situ hybridization (FISH) and PCR. The girl demonstrated CCRs involving chromosomes 3 and 8, while the CCRs of her mother involved chromosomes 3, 5, 8, 11 and 15. HumanCytoSNP-12 Chip analysis identified a 35.4 Mb duplication on chromosome 15q21.3-q26.2 in the proband and a 1.6 Mb microdeletion at chromosome 15q21.3 in her mother. The proband inherited the rearranged chromosomes 3 and 8 from her mother, and the duplicated region on chromosome 15 of the proband was inherited from the mother. Approximately one hundred genes were identified in the 15q21.3-q26.2 duplicated region of the proband. In particular, TPM1, SMAD6, SMAD3, and HCN4 may be associated with her heart defects, and HEXA, KIF7, and IDH2 are responsible for her developmental and mental retardation. In addition, we suggest that a microdeletion in the 15q21.3 region of the mother, which involved TCF2, TCF12, ADMA10 and AQP9, might be associated with mental retardation. By combining NGS data with conventional techniques, we delineate the precise structures of the derivative chromosomes, the origin of the chromosome duplication and possible molecular mechanisms for the aberrant phenotypes.
Qian, Xiaoqin; Hou, Jiayi; Wang, Zheng; Ye, Yi; Lang, Min; Gao, Tianzhen; Liu, Jing; Hou, Yiping
There is high demand for forensic pedigree searches with Y-chromosome short tandem repeat (Y-STR) profiling in large-scale crime investigations. However, when two Y-STR haplotypes have a few mismatched loci, it is difficult to determine whether they are from the same male lineage because of the high mutation rate of Y-STRs. Here we design a new strategy to handle cases in which none of the pedigree samples shares an identical Y-STR haplotype. We combine next generation sequencing (NGS), capillary electrophoresis and pyrosequencing under the term 'NGS+' for typing Y-STRs and Y-chromosomal single nucleotide polymorphisms (Y-SNPs). The high-resolution Y-SNP haplogroup and Y-STR haplotype can be obtained with NGS+. We further developed a new data-driven decision rule, FSindex, for estimating the likelihood of each retrieved pedigree. Our approach enables positive identification of a pedigree from mismatched Y-STR haplotypes. It is envisaged that NGS+ will revolutionize forensic pedigree searches, especially when the person of interest is not recorded in a forensic DNA database.
Knowledge about the diversity and taxonomic structure of the microbial population present in traditional fermented foods plays a key role in starter culture selection, safety improvement and quality enhancement of the end product. The aim of this study was to investigate the composition of microbial consortia in Slovak bryndza cheese. For this purpose, we used a culture-independent approach based on 16S rDNA amplicon sequencing on a next generation sequencing platform. Results obtained from the analysis of three commercial (produced on an industrial scale in the winter season) and one traditional (artisanal, most valued, produced in May) Slovak bryndza cheese samples were compared. A diverse prokaryotic microflora composed mostly of the genera Lactococcus, Streptococcus, Lactobacillus, and Enterococcus was identified. Lactococcus lactis subsp. lactis and Lactococcus lactis subsp. cremoris were the dominant taxa in all tested samples. The second most abundant species, detected in all bryndza cheeses, were Lactococcus fujiensis and Lactococcus taiwanensis, identified independently by two approaches using different reference 16S rRNA gene databases (Greengenes and NCBI, respectively). They have been detected in bryndza cheese samples in substantial amounts for the first time. The narrowest microbial diversity was observed in a sample made with a starter culture from pasteurised milk. Metagenomic analysis by high-throughput sequencing of 16S rRNA genes seems to be a powerful tool for studying the structure of the microbial population in cheeses.
Sarah M Hird
Genomic enrichment methods and next-generation sequencing produce uneven coverage for the portions of the genome (the loci) they target; this information is essential for ascertaining the suitability of each locus for further analysis. lociNGS is a user-friendly accessory program that takes multi-FASTA formatted loci, next-generation sequence alignments and demographic data as input and collates, displays and outputs information about the data. Summary information includes coverage per locus, coverage per individual and number of polymorphic sites, among others. The program can output the raw sequences used to call loci from next-generation sequencing data. lociNGS also reformats subsets of loci in three commonly used formats for multi-locus phylogeographic and population genetics analyses: NEXUS, IMa2 and Migrate. lociNGS is available at https://github.com/SHird/lociNGS and requires an installation of MongoDB (freely available at http://www.mongodb.org/downloads). lociNGS is written in Python and is supported on MacOSX and Unix; it is distributed under a GNU General Public License.
A diverse antibody repertoire is primarily generated by the rearrangement of V, D, and J genes and subsequent somatic hypermutation (SHM). Class-switch recombination (CSR) produces various isotypes and subclasses with different functional properties. Although antibody isotypes and subclasses are considered to be produced by both direct and sequential CSR, it is still not fully understood how SHMs accumulate during the process in which antibody subclasses are generated. Here, we developed a new next-generation sequencing (NGS)-based antibody repertoire analysis capable of identifying all antibody isotype and subclass genes and used it to examine the peripheral blood mononuclear cells of 12 healthy individuals. Using a total of 5,480,040 sequences, we compared the percentage frequency of variable (V) and junctional (J) sequences and of combinations of V and J; the diversity, length, and amino acid composition of CDR3; SHM; and shared clones in the IgM, IgD, IgG3, IgG1, IgG2, IgG4, IgA1, IgE, and IgA2 genes. The usage and diversity were similar among the immunoglobulin (Ig) subclasses. Clonally related sequences sharing identical V, D, J, and CDR3 amino acid sequences were frequently found within multiple Ig subclasses, especially between IgG1 and IgG2 or IgA1 and IgA2. SHM occurred most frequently in IgG4, while IgG3 genes were the least mutated among all IgG subclasses. The shared clones had almost the same SHM levels among Ig subclasses, while subclass-specific clones had different levels of SHM dependent on the genomic location. Given the sequential CSR, these results suggest that CSR occurs sequentially over multiple subclasses in the order corresponding to the genomic location of IGHCs, but CSR is likely to occur more quickly than SHMs accumulate within Ig genes under physiological conditions. NGS-based antibody repertoire analysis should provide critical information on how various antibodies are generated in the immune system.
Sarver Aaron L
Background: Next generation sequencing approaches applied to the analysis of transposon insertion junction fragments generated in high throughput forward genetic screens have created the need for clear informatics and statistical approaches to deal with the massive amount of data currently being generated. Previous approaches used to (1) map junction fragments within the genome and (2) identify Common Insertion Sites (CISs) within the genome are not practical due to the volume of data generated by current sequencing technologies. Previous approaches applied to this problem also required significant manual annotation. Results: We describe Transposon Annotation Poisson Distribution Association Network Connectivity Environment (TAPDANCE) software, which automates the identification of CISs within transposon junction fragment insertion data. Starting with barcoded sequence data, the software identifies and trims sequences and maps putative genomic sequence to a reference genome using the bowtie short read mapper. Poisson distribution statistics are then applied to assess and rank genomic regions showing significant enrichment for transposon insertion. Novel methods of counting insertions are used to ensure that the results presented have the expected characteristics of informative CISs. A persistent MySQL database is generated and used to keep track of sequences, mappings and common insertion sites. Additionally, associations between phenotypes and CISs are identified using Fisher’s exact test with multiple testing correction. In a case study using previously published data we show that the TAPDANCE software identifies CISs as previously described, prioritizes them based on p-value, allows holistic visualization of the data within genome browser software and identifies relationships present in the structure of the data. Conclusions: The TAPDANCE process is fully automated, performs similarly to previous labor intensive approaches
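The Poisson enrichment step mentioned in the abstract above can be illustrated with a minimal sketch. This is not the TAPDANCE implementation: the function names, the uniform-background assumption and the example numbers are all hypothetical, chosen only to show how a window with more insertions than expected by chance receives a small p-value.

```python
import math

def poisson_sf(k, lam):
    """P(X >= k) for X ~ Poisson(lam), via the complement of the CDF.

    Adequate for illustration; very small tail probabilities lose
    precision because of the subtraction from 1.
    """
    if k <= 0:
        return 1.0
    term = math.exp(-lam)          # P(X = 0)
    cdf = term
    for i in range(1, k):          # accumulate P(X = 1) .. P(X = k - 1)
        term *= lam / i
        cdf += term
    return max(0.0, 1.0 - cdf)

def window_p_value(insertions_in_window, total_insertions, window_bp, genome_bp):
    """Chance of seeing at least this many insertions in one window,
    assuming (hypothetically) that insertions fall uniformly on the genome."""
    expected = total_insertions * window_bp / genome_bp
    return poisson_sf(insertions_in_window, expected)

# Hypothetical numbers: 1,000 insertions in a 10 Mb genome, 10 kb window.
p = window_p_value(5, 1_000, 10_000, 10_000_000)
```

Windows would then be ranked by p-value; any multiple-testing correction and the insertion-counting rules used by the actual software are outside this sketch.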
Low, Teck Yew; Heck, Albert Jr
Both genomics and proteomics technologies have matured in the last decade to a level where they are able to deliver system-wide data on the qualitative and quantitative abundance of their respective molecular entities, that is DNA/RNA and proteins. A next logical step is the collective use of these
Liang, Chanjuan; van Dijk, Jeroen P; Scholtens, Ingrid M J; Staats, Martijn; Prins, Theo W; Voorhuijzen, Marleen M; da Silva, Andrea M; Arisi, Ana Carolina Maisonnave; den Dunnen, Johan T; Kok, Esther J
The growing number of biotech crops with novel genetic elements increasingly complicates the detection of genetically modified organisms (GMOs) in food and feed samples using conventional screening methods. Unauthorized GMOs (UGMOs) in food and feed are currently identified through combining GMO element screening with sequencing the DNA flanking these elements. In this study, a specific and sensitive qPCR assay was developed for vip3A element detection based on the vip3Aa20 coding sequences of the recently marketed MIR162 maize and COT102 cotton. Furthermore, SiteFinding-PCR in combination with Sanger, Illumina or Pacific BioSciences (PacBio) sequencing was performed targeting the flanking DNA of the vip3Aa20 element in MIR162. De novo assembly and Basic Local Alignment Search Tool searches were used to mimic UGMO identification. PacBio data resulted in relatively long contigs in the upstream (1,326 nucleotides (nt); 95 % identity) and downstream (1,135 nt; 92 % identity) regions, whereas Illumina data resulted in two smaller contigs of 858 and 1,038 nt with higher sequence identity (>99 % identity). Both approaches outperformed Sanger sequencing, underlining the potential for next-generation sequencing in UGMO identification.
We used targeted next generation deep-sequencing (Safe Sequencing System) to measure ultra-rare de novo mutation frequencies in the human male germline by attaching a unique identifier code to each target DNA molecule. Segments from three different human genes (FGFR3, MECP2 and PTPN11) were studied. Regardless of the gene segment, the particular testis donor or the 73 different testis pieces used, the frequencies for any one of the six different mutation types were consistent. Averaging over the C>T/G>A and G>T/C>A mutation types the background mutation frequency was 2.6 × 10^-5 per base pair, while for the four other mutation types the average background frequency was lower at 1.5 × 10^-6 per base pair. These rates far exceed the well documented human genome average frequency per base pair (~10^-8), suggesting a non-biological explanation for our data. By computational modeling and a new experimental procedure to distinguish between pre-mutagenic lesion base mismatches and a fully mutated base pair in the original DNA molecule, we argue that most of the base-dependent variation in background frequency is due to a mixture of deamination and oxidation during the first two PCR cycles. Finally, we looked at a previously studied disease mutation in the PTPN11 gene and could easily distinguish true mutations from the SSS background. We also discuss the limits and possibilities of this and other methods to measure exceptionally rare mutation frequencies, and we present calculations for other scientists seeking to design their own such experiments.
Ossa, Carmen G; Larridon, Isabel; Peralta, Gioconda; Asselman, Pieter; Pérez, Fernanda
The aim of this study was to develop microsatellite markers as a tool to study population structure, genetic diversity and effective population size of Echinopsis chiloensis, an endemic cactus from arid and semiarid regions of Central Chile. We developed 12 polymorphic microsatellite markers for E. chiloensis using next-generation sequencing and tested them in 60 individuals from six sites, covering the entire latitudinal range of this species. The number of alleles per locus ranged from 3 to 8, while the observed (Ho) and expected (He) heterozygosity ranged from 0.0 to 0.80 and from 0.10 to 0.76, respectively. We also detected significant differences between sites, with FST values ranging from 0.05 to 0.29. Microsatellite markers will enable us to estimate genetic diversity and population structure of E. chiloensis in future ecological and phylogeographic studies.
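For readers unfamiliar with the Ho and He statistics reported above, the following is a minimal sketch of how they are commonly computed for one locus. It assumes diploid genotypes and uses the simple estimator He = 1 - sum(p_i^2); the abstract does not say which estimator the authors used, and the genotype data here are invented for illustration.

```python
from collections import Counter

def heterozygosity(genotypes):
    """Return (Ho, He) for one locus.

    genotypes: list of (allele_a, allele_b) tuples, one per diploid individual.
    Ho is the fraction of heterozygous individuals; He is 1 minus the sum of
    squared allele frequencies (no small-sample correction applied).
    """
    n = len(genotypes)
    observed = sum(1 for a, b in genotypes if a != b) / n
    allele_counts = Counter(allele for g in genotypes for allele in g)
    total_alleles = 2 * n
    expected = 1.0 - sum((c / total_alleles) ** 2 for c in allele_counts.values())
    return observed, expected

# Invented example: allele sizes (bp) at one microsatellite locus.
ho, he = heterozygosity([(150, 150), (150, 154), (154, 154), (150, 154)])
```

A locus with Ho well below He, as at the low end of the ranges reported above, would be one to check for null alleles or population substructure.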
Brassac, Jonathan; Blattner, Frank R
Polyploidization is an important speciation mechanism in the barley genus Hordeum. To analyze evolutionary changes after allopolyploidization, knowledge of parental relationships is essential. One chloroplast and 12 nuclear single-copy loci were amplified by polymerase chain reaction (PCR) in all Hordeum plus six out-group species. Amplicons from each of 96 individuals were pooled, sheared, labeled with individual-specific barcodes and sequenced in a single run on a 454 platform. Reference sequences were obtained by cloning and Sanger sequencing of all loci for nine supplementary individuals. The 454 reads were assembled into contigs representing the 13 loci and, for polyploids, also homoeologues. Phylogenetic analyses were conducted for all loci separately and for a concatenated data matrix of all loci. For diploid taxa, a Bayesian concordance analysis and a coalescent-based dated species tree was inferred from all gene trees. Chloroplast matK was used to determine the maternal parent in allopolyploid taxa. The relative performance of different multilocus analyses in the presence of incomplete lineage sorting and hybridization was also assessed. The resulting multilocus phylogeny reveals for the first time species phylogeny and progenitor-derivative relationships of all di- and polyploid Hordeum taxa within a single analysis. Our study proves that it is possible to obtain a multilocus species-level phylogeny for di- and polyploid taxa by combining PCR with next-generation sequencing, without cloning and without creating a heavy load of sequence data. © The Author(s) 2015. Published by Oxford University Press, on behalf of the Society of Systematic Biologists.
Marchal, Claire; Sasaki, Takayo; Vera, Daniel; Wilson, Korey; Sima, Jiao; Rivera-Mulia, Juan Carlos; Trevilla-García, Claudia; Nogues, Coralin; Nafie, Ebtesam; Gilbert, David M
This protocol is an extension to: Nat. Protoc. 6, 870-895 (2014); doi:10.1038/nprot.2011.328; published online 02 June 2011. Cycling cells duplicate their DNA content during S phase, following a defined program called replication timing (RT). Early- and late-replicating regions differ in terms of mutation rates, transcriptional activity, chromatin marks and subnuclear position. Moreover, RT is regulated during development and is altered in diseases. Here, we describe E/L Repli-seq, an extension of our Repli-chip protocol. E/L Repli-seq is a rapid, robust and relatively inexpensive protocol for analyzing RT by next-generation sequencing (NGS), allowing genome-wide assessment of how cellular processes are linked to RT. Briefly, cells are pulse-labeled with BrdU, and early and late S-phase fractions are sorted by flow cytometry. Labeled nascent DNA is immunoprecipitated from both fractions and sequenced. Data processing leads to a single bedGraph file containing the ratio of nascent DNA from early versus late S-phase fractions. The results are comparable to those of Repli-chip, with the additional benefits of genome-wide sequence information and an increased dynamic range. We also provide computational pipelines for downstream analyses, for parsing phased genomes using single-nucleotide polymorphisms (SNPs) to analyze RT allelic asynchrony, and for direct comparison to Repli-chip data. This protocol can be performed in up to 3 d before sequencing, and requires basic cellular and molecular biology skills, as well as a basic understanding of Unix and R.
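The early-versus-late ratio described above can be sketched as follows. This is not the published pipeline (which the abstract says relies on Unix tools and R); the log2 transform, the reads-per-million scaling, the pseudocount and the function name are assumptions made for illustration, with invented read counts.

```python
import math

def early_late_log2(bins, early_counts, late_counts, pseudocount=1.0):
    """Build bedGraph-style lines holding log2(early / late) per genomic bin.

    bins: list of (chrom, start, end); counts: reads per bin in each fraction.
    Counts are scaled to reads per million so that sequencing depth cancels;
    the pseudocount avoids division by zero in empty bins.
    """
    early_total = sum(early_counts)
    late_total = sum(late_counts)
    lines = []
    for (chrom, start, end), e, l in zip(bins, early_counts, late_counts):
        e_rpm = (e + pseudocount) / early_total * 1e6
        l_rpm = (l + pseudocount) / late_total * 1e6
        lines.append(f"{chrom}\t{start}\t{end}\t{math.log2(e_rpm / l_rpm):.3f}")
    return lines

# Invented counts: the first bin replicates early, the second late.
bins = [("chr1", 0, 50000), ("chr1", 50000, 100000)]
lines = early_late_log2(bins, early_counts=[30, 10], late_counts=[10, 30])
```

Positive values mark early-replicating bins and negative values late-replicating ones; smoothing and quantile normalization, which a real analysis would likely need, are left out.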
Friedman, Adam A; Letai, Anthony; Fisher, David E; Flaherty, Keith T
Precision medicine is about matching the right drugs to the right patients. Although this approach is technology agnostic, in cancer there is a tendency to make precision medicine synonymous with genomics. However, genome-based cancer therapeutic matching is limited by incomplete biological understanding of the relationship between phenotype and cancer genotype. This limitation can be addressed by functional testing of live patient tumour cells exposed to potential therapies. Recently, several 'next-generation' functional diagnostic technologies have been reported, including novel methods for tumour manipulation, molecularly precise assays of tumour responses and device-based in situ approaches; these address the limitations of the older generation of chemosensitivity tests. The promise of these new technologies suggests a future diagnostic strategy that integrates functional testing with next-generation sequencing and immunoprofiling to precisely match combination therapies to individual cancer patients.
Ramos, Antonio M.; Crooijmans, Richard P. M. A.; Affara, Nabeel A.; Amaral, Andreia J.; Archibald, Alan L.; Beever, Jonathan E.; Bendixen, Christian; Churcher, Carol; Clark, Richard; Dehais, Patrick; Hansen, Mark S.; Hedegaard, Jakob; Hu, Zhi-Liang; Kerstens, Hindrik H.; Law, Andy S.; Megens, Hendrik-Jan; Milan, Denis; Nonneman, Danny J.; Rohrer, Gary A.; Rothschild, Max F.; Smith, Tim P. L.; Schnabel, Robert D.; Van Tassell, Curt P.; Taylor, Jeremy F.; Wiedmann, Ralph T.; Schook, Lawrence B.; Groenen, Martien A. M.
Background The dissection of complex traits of economic importance to the pig industry requires the availability of a significant number of genetic markers, such as single nucleotide polymorphisms (SNPs). This study was conducted to discover several hundreds of thousands of porcine SNPs using next generation sequencing technologies and use these SNPs, as well as others from different public sources, to design a high-density SNP genotyping assay. Methodology/Principal Findings A total of 19 reduced representation libraries derived from four swine breeds (Duroc, Landrace, Large White, Pietrain) and a Wild Boar population and three restriction enzymes (AluI, HaeIII and MspI) were sequenced using Illumina's Genome Analyzer (GA). The SNP discovery effort resulted in the de novo identification of over 372K SNPs. More than 549K SNPs were used to design the Illumina Porcine 60K+SNP iSelect Beadchip, now commercially available as the PorcineSNP60. A total of 64,232 SNPs were included on the Beadchip. Results from genotyping the 158 individuals used for sequencing showed a high overall SNP call rate (97.5%). Of the 62,621 loci that could be reliably scored, 58,994 were polymorphic yielding a SNP conversion success rate of 94%. The average minor allele frequency (MAF) for all scorable SNPs was 0.274. Conclusions/Significance Overall, the results of this study indicate the utility of using next generation sequencing technologies to identify large numbers of reliable SNPs. In addition, the validation of the PorcineSNP60 Beadchip demonstrated that the assay is an excellent tool that will likely be used in a variety of future studies in pigs. PMID:19654876
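The 94% conversion figure quoted above follows directly from the counts in the abstract, as this short check shows. The variable names are ours; only the three counts come from the text.

```python
# Counts reported in the abstract.
snps_on_chip = 64_232      # SNPs included on the Beadchip
scorable = 62_621          # loci that could be reliably scored
polymorphic = 58_994       # scorable loci that were polymorphic

# Conversion success rate: polymorphic loci as a share of scorable loci.
conversion_rate = polymorphic / scorable
scorable_fraction = scorable / snps_on_chip

print(f"conversion rate: {conversion_rate:.1%}")
```

The rate works out to roughly 94.2%, consistent with the rounded 94% in the abstract.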
Shen, Kang-Ning; Yen, Ta-Chi; Chen, Ching-Hung; Li, Huei-Ying; Chen, Pei-Lung; Hsiao, Chung-Der
In this study, the complete mitogenome sequence of the Northwestern Pacific 2 (NWP2) cryptic species of flathead mullet, Mugil cephalus (Teleostei: Mugilidae), was amplified by long-range PCR and sequenced by next-generation sequencing. The assembled mitogenome, consisting of 16,686 bp, had the typical vertebrate mitochondrial gene arrangement, including 13 protein-coding genes, 22 transfer RNA genes, 2 ribosomal RNA genes and a non-coding control region (D-loop). The D-loop was 909 bp in length and was located between tRNA-Pro and tRNA-Phe. The overall base composition of NWP2 M. cephalus was 28.4% A, 29.8% C, 26.5% T and 15.3% G. The complete mitogenome may provide essential DNA molecular data for further phylogenetic and evolutionary analysis of the flathead mullet species complex.
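Base composition figures such as those above are obtained by counting each nucleotide in the assembled sequence. The sketch below shows the idea on a toy string; the function name is ours and the sequence is invented, not the NWP2 mitogenome. From the reported percentages, the GC content is 29.8% + 15.3% = 45.1%.

```python
from collections import Counter

def base_composition(seq):
    """Percentage of A, C, G and T in a DNA string.

    Ambiguity codes such as N are ignored in both numerator and denominator.
    """
    counts = Counter(seq.upper())
    total = sum(counts[b] for b in "ACGT")
    return {b: 100.0 * counts[b] / total for b in "ACGT"}

comp = base_composition("AACGnn")          # toy sequence with two ambiguous bases
gc_content = comp["G"] + comp["C"]
```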
Results from numerous linkage and association studies have greatly deepened scientists’ understanding of the genetic basis of many human diseases, yet some important questions remain unanswered. For example, although a large number of disease-associated loci have been identified from genome-wide association studies (GWAS) in the past 10 years, it is challenging to interpret these results, as most disease-associated markers have no clear functional roles in disease etiology, and all the identified genomic factors explain only a small portion of disease heritability. With the help of next-generation sequencing (NGS), diverse types of genomic and epigenetic variations can be detected with high accuracy. More importantly, instead of using linkage disequilibrium to detect association signals based on a set of pre-set probes, NGS allows researchers to directly study all the variants in each individual, and therefore promises opportunities for identifying functional variants and a more comprehensive dissection of disease heritability. Although the current scale of NGS studies is still limited due to the high cost, the success of several recent studies suggests great potential for applying NGS in genomic epidemiology, especially as the cost of sequencing continues to drop. In this review, we discuss several pioneering applications of NGS, summarize scientific discoveries for rare and complex diseases, and compare various study designs including targeted sequencing and whole-genome sequencing using population-based and family-based cohorts. Finally, we highlight recent advancements in statistical methods proposed for sequencing analysis, including group-based association tests, meta-analysis techniques, and annotation tools for variant prioritization.
Shaw, Wen Hui; Lin, Qianqian; Muhammad, Zikry Zhiwei Bin Roslee; Lee, Jia Jun; Khong, Wei Xin; Ng, Oon Tek; Tan, Eng Lee; Li, Peng
Current clinical detection of human immunodeficiency virus 1 (HIV-1) targets viral genes and proteins. However, conventional assays such as immunoassay, viral culture or polymerase chain reaction (PCR) lack diagnostic accuracy, as they rely on a stable genome and HIV-1 is a highly mutated virus. Next generation sequencing (NGS) promises to be transformative for the practice of infectious disease, and the rapidly reducing cost and processing time mean that this will become a feasible technology in diagnostic and research laboratories in the near future. The technology offers superior sensitivity for detecting pathogenic viruses, including unknown and unexpected strains. The objective was to leverage NGS technology to improve current HIV-1 diagnosis and genotyping methods. Ten blood samples were collected from HIV-1 infected patients who were diagnosed by RT-PCR at the Singapore Communicable Disease Centre, Tan Tock Seng Hospital, from October 2014 to March 2015. Viral RNAs were extracted from blood plasma and reverse-transcribed into cDNA. The HIV-1 cDNA samples were cleaned up using a PCR purification kit, and the sequencing library was prepared and sequenced on the MiSeq. Two common mutations were observed in all ten samples. The common mutations were identified at genome locations 1908 and 2104 as missense and silent mutations respectively, conferring S37N and S3S found on the aspartic protease and reverse transcriptase subunits. The common mutations identified in this study were not previously reported, therefore suggesting the potential for them to be used for identification of viral infection, disease transmission and drug resistance. This was especially the case for the missense mutation S37N, which could cause an amino acid change in viral proteases, thus reducing the binding affinity of some protease inhibitors. Thus, the unique common mutations identified in this study could be used as diagnostic biomarkers to indicate the origin of infection as being
Bhat, Javaid A; Ali, Sajad; Salgotra, Romesh K; Mir, Zahoor A; Dutta, Sutapa; Jadon, Vasudha; Tyagi, Anshika; Mushtaq, Muntazir; Jain, Neelu; Singh, Pradeep K; Singh, Gyanendra P; Prabhu, K V
Genomic selection (GS) is a promising approach that exploits molecular genetic markers to design novel breeding programs and to develop new marker-based models for genetic evaluation. In plant breeding, it provides opportunities to increase the genetic gain of complex traits per unit time and cost. The cost-benefit balance is an important consideration for GS to work in crop plants. The most important factor for its successful and effective implementation in crop species is the availability of genome-wide, high-throughput, cost-effective and flexible markers that have low ascertainment bias and are suitable for large populations and for both model and non-model crop species, with or without a reference genome sequence. These requirements were the major limitations of earlier marker systems, viz. SSR and array-based markers, and meeting them was unimaginable before the availability of next-generation sequencing (NGS) technologies, which have provided novel SNP genotyping platforms, especially genotyping by sequencing. These marker technologies have changed the entire scenario of marker applications and made the use of GS routine for crop improvement in both model and non-model crop species. NGS-based genotyping has increased the prediction accuracy of genomic-estimated breeding values over other established marker platforms in cereals and other crop species, and has made GS a reality in crop breeding. To harness the true benefits of GS, however, these marker technologies will need to be combined with high-throughput phenotyping to achieve valuable genetic gain for complex traits. Moreover, the continuous decline in sequencing cost will make whole-genome sequencing (WGS) feasible and cost-effective for GS in the near future. Until then, targeted sequencing seems to be the more cost-effective option for large-scale marker discovery and GS, particularly for large and un-decoded genomes.
Held, Kathrin; Beltrán, Eduardo; Moser, Markus; Hohlfeld, Reinhard; Dornmair, Klaus
Mucosal-associated invariant T (MAIT) cells are a T-cell subset that expresses a conserved TRAV1-2 (Vα7.2) T-cell receptor (TCR) chain and the surface marker CD161. They are involved in the defence against microbes as they recognise small organic molecules of microbial origin that are presented by the non-classical MHC molecule 1 (MR1). MAIT cells express a semi-restricted TCR α chain with TRAV1-2 preferentially linked to TRAJ33, TRAJ12, or TRAJ20 which pairs with a limited set of β chains. To investigate the TCR repertoire of human CD161(hi)TRAV1-2(+) T cells in depth we analysed the α and β chains of this T-cell subset by next generation sequencing. Concomitantly we analysed 132 paired α and β chains from single cells to assess the αβ pairing preferences. We found that the CD161(hi)TRAV1-2(+) TCR repertoire in addition to the typical MAIT TCRs further contains polyclonal elements reminiscent of classical αβ T cells. Copyright © 2015 American Society for Histocompatibility and Immunogenetics. Published by Elsevier Inc. All rights reserved.
Vuyisich, Momchilo [Los Alamos National Laboratory]
NGS technology overview: (1) NGS library preparation - Nucleic acids extraction, Sample quality control, RNA conversion to cDNA, Addition of sequencing adapters, Quality control of library; (2) Sequencing - Clonal amplification of library fragments (except PacBio), Sequencing by synthesis, Data output (reads and quality); and (3) Data analysis - Read mapping, Genome assembly, Gene expression, Operon structure, sRNA discovery, and Epigenetic analyses.
Katalin Komlosi MD, PhD
Next-generation sequencing (NGS) panels are used widely in clinical diagnostics to identify genetic causes of various monogenic disease groups including neurometabolic disorders and, more recently, lysosomal storage disorders (LSDs). Many new challenges have been introduced through these new technologies, both at the laboratory level and at the bioinformatics level, with consequences including new requirements for interpretation of results, and for genetic counseling. We review some recent examples of the application of NGS technologies, with purely diagnostic and with both diagnostic and research aims, for establishing a rapid genetic diagnosis in LSDs. Given that NGS can be applied in a way that takes into account the many issues raised by international consensus guidelines, it can have a significant role even early in the course of the diagnostic process, in combination with biochemical and clinical data. Besides decreasing the delay in diagnosis for many patients, a precise molecular diagnosis is extremely important as new therapies are becoming available within the LSD spectrum for patients who share specific types of mutations. A genetic diagnosis is also the prerequisite for genetic counseling, family planning, and the individual choice of reproductive options in affected families.
Next-generation sequencing (NGS) has the potential to provide typing results and detect resistance genes in a single assay, thus guiding timely treatment decisions and allowing rapid tracking of transmission of resistant clones. We will evaluate the performance of a new NGS assay during an outbreak of sequence type 131 (ST131) Escherichia coli infections in a teaching hospital. The assay will be performed on 100 extended-spectrum beta-lactamase (ESBL) E. coli isolates collected from urinary tract infections (UTI) during the last 5 years. Typing results will be compared to those of amplified fragment length polymorphism (AFLP), whereby we will visually assess the agreement of the BioDetection phylogenetic tree with clusters defined by AFLP. A microarray will be considered the gold standard for detection of resistance genes. AFLP is expected to identify a large cluster of indistinguishable isolates on adjacent departments, indicating clonal spread. The BioDetection phylogenetic tree is expected to show that all isolates of this outbreak cluster are strongly related, while the further arrangement of the tree should also largely agree with other clusters defined by AFLP. With these experiments we will detect ESBL and MBL strains so that patients can be prescribed antibiotics accordingly.
Mathias, Patrick C; Turner, Emily H; Scroggins, Sheena M; Salipante, Stephen J; Hoffman, Noah G; Pritchard, Colin C; Shirts, Brian H
To apply techniques for ancestry and sex computation from next-generation sequencing (NGS) data as an approach to confirm sample identity and detect sample processing errors. We combined a principal component analysis method with k-nearest neighbors classification to compute the ancestry of patients undergoing NGS testing. By combining this calculation with X chromosome copy number data, we determined the sex and ancestry of patients for comparison with self-report. We also modeled the sensitivity of this technique in detecting sample processing errors. We applied this technique to 859 patient samples with reliable self-report data. Our k-nearest neighbors ancestry screen had an accuracy of 98.7% for patients reporting a single ancestry. Visual inspection of principal component plots was consistent with self-report in 99.6% of single-ancestry and mixed-ancestry patients. Our model demonstrates that approximately two-thirds of potential sample swaps could be detected in our patient population using this technique. Patient ancestry can be estimated from NGS data incidentally sequenced in targeted panels, enabling an inexpensive quality control method when coupled with patient self-report. © American Society for Clinical Pathology, 2016. All rights reserved.
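The combination of principal component analysis with k-nearest neighbors classification described above can be sketched roughly as follows. This is a minimal, dependency-free illustration under stated assumptions, not the authors' implementation: genotypes are coded as 0/1/2 alternate-allele counts, only the first principal component is used, the two reference groups "A" and "B" are synthetic, and every function name is hypothetical.

```python
import math
import random
from collections import Counter


def center(rows):
    """Return per-marker means and the mean-centered genotype matrix."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return means, [[x - m for x, m in zip(row, means)] for row in rows]


def first_pc(centered, rng, iterations=100):
    """Leading principal axis via power iteration on X^T X."""
    dim = len(centered[0])
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    for _ in range(iterations):
        scores = [sum(x * w for x, w in zip(row, v)) for row in centered]
        w = [sum(s * row[j] for s, row in zip(scores, centered)) for j in range(dim)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


def project(row, means, axis):
    """Score of one sample on the principal axis."""
    return sum((x - m) * a for x, m, a in zip(row, means, axis))


def knn_label(score, ref_scores, ref_labels, k=3):
    """Majority vote among the k reference samples closest in PC space."""
    nearest = sorted(range(len(ref_scores)), key=lambda i: abs(ref_scores[i] - score))[:k]
    return Counter(ref_labels[i] for i in nearest).most_common(1)[0][0]


def simulate(rng, freq, n_samples, n_markers=30):
    """Synthetic genotypes: two Bernoulli(freq) allele draws per marker."""
    return [[(rng.random() < freq) + (rng.random() < freq) for _ in range(n_markers)]
            for _ in range(n_samples)]


rng = random.Random(0)
# Synthetic reference panel: two groups with very different allele frequencies.
reference = simulate(rng, 0.1, 15) + simulate(rng, 0.9, 15)
labels = ["A"] * 15 + ["B"] * 15

means, centered = center(reference)
axis = first_pc(centered, rng)
ref_scores = [project(row, means, axis) for row in reference]

# Two query "patients", one drawn from each synthetic group.
queries = simulate(rng, 0.1, 1) + simulate(rng, 0.9, 1)
predicted = [knn_label(project(q, means, axis), ref_scores, labels) for q in queries]
```

In a quality-control setting, a predicted label would then be compared with the self-reported ancestry, and a disagreement would flag the sample for review. Real panels would need several components and a curated reference population.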
Lawrie, Charles H; Armesto, María; Fernandez-Mercado, Marta; Arestín, María; Manterola, Lorea; Goicoechea, Ibai; Larrea, Erika; Caffarel, María M; Araujo, Angela M; Sole, Carla; Sperga, Maris; Alvarado-Cabrero, Isabel; Michal, Michal; Hes, Ondrej; López, José I
Tubulocystic renal cell carcinoma (TC-RCC) is a rare, recently described renal neoplasm characterized by gross, microscopic, and immunohistochemical differences from other renal tumor types and was recently classified as a distinct entity. However, this distinction remains controversial, particularly because some genetic studies suggest a close relationship with papillary RCC (PRCC). The molecular basis of this disease remains largely unexplored. We therefore performed noncoding (nc) RNA/miRNA expression analysis and targeted next-generation sequencing mutational profiling on 13 TC-RCC cases (11 pure, two mixed TC-RCC/PRCC) and compared them with other renal neoplasms. The expression profile of miRNAs and other ncRNAs in TC-RCC was distinct, and we validated 10 differentially expressed miRNAs by quantitative RT-PCR, including miR-155 and miR-34a, that were significantly down-regulated compared with PRCC cases (n = 22). With the use of targeted next-generation sequencing we identified mutations in 14 different genes, most frequently (>60% of TC-RCC cases) in the ABL1 and PDGFRA genes. These mutations were present in 600) of The Cancer Genome Atlas database. In summary, this study is by far the largest molecular study of TC-RCC cases and the first to investigate either ncRNA expression or their genomic profile. These results add molecular evidence that TC-RCC is indeed a distinct entity from PRCC and other renal neoplasms. Copyright © 2018 American Society for Investigative Pathology and the Association for Molecular Pathology. Published by Elsevier Inc. All rights reserved.
Anık, Ahmet; Çatlı, Gönül; Abacı, Ayhan; Sarı, Erkan; Yeşilkaya, Ediz; Korkmaz, Hüseyin Anıl; Demir, Korcan; Altıncık, Ayça; Tuhan, Hale Ünver; Kızıldağ, Sefa; Özkan, Behzat; Ceylaner, Serdar; Böber, Ece
To perform molecular analysis of pediatric maturity onset diabetes of the young (MODY) patients by next-generation sequencing, which enables simultaneous analysis of multiple genes in a single test, to determine the genetic etiology of a group of Turkish children clinically diagnosed as MODY, and to assess genotype-phenotype relationship. Forty-two children diagnosed with MODY and their parents were enrolled in the study. Clinical and laboratory characteristics of the patients at the time of diagnosis were obtained from hospital records. Molecular analyses of GCK, HNF1A, HNF4A, HNF1B, PDX1, NEUROD1, KLF11, CEL, PAX4, INS, and BLK genes were performed on genomic DNA by using next-generation sequencing. Pathogenicity for novel mutations was assessed by bioinformatics prediction software programs and segregation analyses. A mutation in MODY genes was identified in 12 (29%) of the cases. GCK mutations were detected in eight cases, and HNF1B, HNF1A, PDX1, and BLK mutations in the others. We identified five novel missense mutations - three in GCK (p.Val338Met, p.Cys252Ser, and p.Val86Ala), one in HNF1A (p.Cys241Ter), and one in PDX1 (p.Gly55Asp), which we believe to be pathogenic. The results of this study showed that mutations in the GCK gene are the leading cause of MODY in our population. Moreover, genetic diagnosis could be made in 29% of Turkish patients, and five novel mutations were identified.
Weerakkody, Ruwan A; Vandrovcova, Jana; Kanonidou, Christina; Mueller, Michael; Gampawar, Piyush; Ibrahim, Yousef; Norsworthy, Penny; Biggs, Jennifer; Abdullah, Abdulshakur; Ross, David; Black, Holly A; Ferguson, David; Cheshire, Nicholas J; Kazkaz, Hanadi; Grahame, Rodney; Ghali, Neeti; Vandersteen, Anthony; Pope, F Michael; Aitman, Timothy J
Ehlers-Danlos syndrome (EDS) comprises a group of overlapping hereditary disorders of connective tissue with significant morbidity and mortality, including major vascular complications. We sought to identify the diagnostic utility of a next-generation sequencing (NGS) panel in a mixed EDS cohort. We developed and applied PCR-based NGS assays for targeted, unbiased sequencing of 12 collagen and aortopathy genes to a cohort of 177 unrelated EDS patients. Variants were scored blind to previous genetic testing and then compared with results of previous Sanger sequencing. Twenty-eight pathogenic variants in COL5A1/2, COL3A1, FBN1, and COL1A1 and four likely pathogenic variants in COL1A1, TGFBR1/2, and SMAD3 were identified by the NGS assays. These included all previously detected single-nucleotide and other short pathogenic variants in these genes, and seven newly detected pathogenic or likely pathogenic variants leading to clinically significant diagnostic revisions. Twenty-two variants of uncertain significance were identified, seven of which were in aortopathy genes and required clinical follow-up. Unbiased NGS-based sequencing made new molecular diagnoses outside the expected EDS genotype-phenotype relationship and identified previously undetected clinically actionable variants in aortopathy susceptibility genes. These data may be of value in guiding future clinical pathways for genetic diagnosis in EDS. Genet Med 18(11), 1119-1127.
Buonuomo, Paola Sabrina; Iughetti, Lorenzo; Pisciotta, Livia; Rabacchi, Claudio; Papadia, Francesco; Bruzzi, Patrizia; Tummolo, Albina; Bartuli, Andrea; Cortese, Claudio; Bertolini, Stefano; Calandra, Sebastiano
Severe hypercholesterolemia, whether or not associated with xanthomas, in a child may suggest the diagnosis of homozygous autosomal dominant hypercholesterolemia (ADH), autosomal recessive hypercholesterolemia (ARH) or sitosterolemia, depending on the transmission of hypercholesterolemia in the patient's family. Sitosterolemia is a recessive disorder characterized by high plasma levels of cholesterol and plant sterols due to mutations in the ABCG5 or the ABCG8 gene, leading to a loss of function of the ATP-binding cassette (ABC) heterodimer transporter G5-G8. We aimed to perform the molecular characterization of two children with severe primary hypercholesterolemia. Case #1 was a 2-year-old girl with high LDL-cholesterol (690 mg/dl) and tuberous and intertriginous xanthomas. Case #2 was a 7-year-old boy with elevated LDL-C (432 mg/dl) but no xanthomas. In both cases, at least one parent had elevated LDL-cholesterol levels. For the molecular diagnosis, we applied targeted next generation sequencing (NGS), which unexpectedly revealed that both patients were compound heterozygous for nonsense mutations: Case #1 in the ABCG5 gene [p.(Gln251*)/p.(Arg446*)] and Case #2 in the ABCG8 gene [p.(Ser107*)/p.(Trp361*)]. Both children had extremely high serum sitosterol and campesterol levels, thus confirming the diagnosis of sitosterolemia. A low-fat/low-sterol diet was promptly adopted with and without the addition of ezetimibe for Case #1 and Case #2, respectively. In both patients, serum total and LDL-cholesterol decreased dramatically in two months and progressively normalized. Targeted NGS allows the rapid diagnosis of sitosterolemia in children with severe hypercholesterolemia, even though their family history does not unequivocally suggest a recessive transmission of hypercholesterolemia. A timely diagnosis is crucial to avoid delays in treatment. Copyright © 2017 Elsevier B.V. All rights reserved.
Tan, Swee Jin; Phan, Huan; Gerry, Benjamin Michael; Kuhn, Alexandre; Hong, Lewis Zuocheng; Min Ong, Yao; Poon, Polly Suk Yean; Unger, Marc Alexander; Jones, Robert C.; Quake, Stephen R.; Burkholder, William F.
Library preparation for next-generation DNA sequencing (NGS) remains a key bottleneck in the sequencing process which can be relieved through improved automation and miniaturization. We describe a microfluidic device for automating laboratory protocols that require one or more column chromatography steps and demonstrate its utility for preparing Next Generation sequencing libraries for the Illumina and Ion Torrent platforms. Sixteen different libraries can be generated simultaneously with significantly reduced reagent cost and hands-on time compared to manual library preparation. Using an appropriate column matrix and buffers, size selection can be performed on-chip following end-repair, dA tailing, and linker ligation, so that the libraries eluted from the chip are ready for sequencing. The core architecture of the device ensures uniform, reproducible column packing without user supervision and accommodates multiple routine protocol steps in any sequence, such as reagent mixing and incubation; column packing, loading, washing, elution, and regeneration; capture of eluted material for use as a substrate in a later step of the protocol; and removal of one column matrix so that two or more column matrices with different functional properties can be used in the same protocol. The microfluidic device is mounted on a plastic carrier so that reagents and products can be aliquoted and recovered using standard pipettors and liquid handling robots. The carrier-mounted device is operated using a benchtop controller that seals and operates the device with programmable temperature control, eliminating any requirement for the user to manually attach tubing or connectors. In addition to NGS library preparation, the device and controller are suitable for automating other time-consuming and error-prone laboratory protocols requiring column chromatography steps, such as chromatin immunoprecipitation. PMID:23894273
Hirsch, B; Endris, V; Lassmann, S; Weichert, W; Pfarr, N; Schirmacher, P; Kovaleva, V; Werner, M; Bonzheim, I; Fend, F; Sperveslage, J; Kaulich, K; Zacher, A; Reifenberger, G; Köhrer, K; Stepanow, S; Lerke, S; Mayr, T; Aust, D E; Baretton, G; Weidner, S; Jung, A; Kirchner, T; Hansmann, M L; Burbat, L; von der Wall, E; Dietel, M; Hummel, M
The simultaneous detection of multiple somatic mutations in the context of molecular diagnostics of cancer is frequently performed by means of amplicon-based targeted next-generation sequencing (NGS). However, only a few studies are available comparing multicenter testing of different NGS platforms and gene panels. Therefore, seven partner sites of the German Cancer Consortium (DKTK) performed a multicenter interlaboratory trial for targeted NGS using the same formalin-fixed, paraffin-embedded (FFPE) specimens of molecularly pre-characterized tumors (n = 15; n = 5 cases each of breast, lung, and colon carcinoma) and a colorectal cancer cell line DNA dilution series. Detailed information regarding pre-characterized mutations was not disclosed to the partners. Commercially available and custom-designed cancer gene panels were used for library preparation and subsequent sequencing on several devices of two different NGS platforms. For every case, centrally extracted DNA and FFPE tissue sections for local processing were delivered to each partner site to be sequenced with the commercial gene panel and local bioinformatics. For cancer-specific panel-based sequencing, only centrally extracted DNA was analyzed at seven sequencing sites. Subsequently, local data were compiled and bioinformatics was performed centrally. We were able to demonstrate that all pre-characterized mutations were re-identified correctly, irrespective of the NGS platform or gene panel used. However, locally processed FFPE tissue sections disclosed that the DNA extraction method can affect the detection of mutations, with a trend in favor of magnetic bead-based DNA extraction methods. In conclusion, targeted NGS is a very robust method for simultaneous detection of various mutations in FFPE tissue specimens if certain pre-analytical conditions are carefully considered.
Yu, Hui; Zhang, Victor Wei; Stray-Pedersen, Asbjørg; Hanson, Imelda Celine; Forbes, Lisa R; de la Morena, M Teresa; Chinn, Ivan K; Gorman, Elizabeth; Mendelsohn, Nancy J; Pozos, Tamara; Wiszniewski, Wojciech; Nicholas, Sarah K; Yates, Anne B; Moore, Lindsey E; Berge, Knut Erik; Sorte, Hanne; Bayer, Diana K; ALZahrani, Daifulah; Geha, Raif S; Feng, Yanming; Wang, Guoli; Orange, Jordan S; Lupski, James R; Wang, Jing; Wong, Lee-Jun
Primary immunodeficiency diseases (PIDDs) are inherited disorders of the immune system. The most severe form, severe combined immunodeficiency (SCID), presents with profound deficiencies of T cells, B cells, or both at birth. If not treated promptly, affected patients usually do not live beyond infancy because of infections. Genetic heterogeneity of SCID frequently delays the diagnosis; a specific diagnosis is crucial for life-saving treatment and optimal management. We developed a next-generation sequencing (NGS)-based multigene-targeted panel for SCID and other severe PIDDs requiring rapid therapeutic actions in a clinical laboratory setting. The target gene capture/NGS assay provides an average read depth of approximately 1000×. The deep coverage facilitates simultaneous detection of single nucleotide variants and exonic copy number variants in one comprehensive assessment. Exons with insufficient coverage (diagnostic yield of severe primary immunodeficiency. Establishing a molecular diagnosis enables early immune reconstitution through prompt therapeutic intervention and guides management for improved long-term quality of life. Copyright © 2016 American Academy of Allergy, Asthma & Immunology. Published by Elsevier Inc. All rights reserved.
In countries where new orders for nuclear reactors have ceased, the development of new types of light water reactors has been discussed with the aim of reviving nuclear power. In Japan as well, since light water reactors are expected to remain the main power reactors for a long period, next-generation light water reactor technology has been discussed. The development of nuclear power requires an extremely long lead time. The next-generation light water reactors now under consideration will continue to operate until the middle of the next century; therefore, they must sufficiently anticipate the needs of that era. Improvement of the relationship between people and facilities, simple design, flexibility toward trends in the fuel cycle and so on are required of next-generation light water reactors. The trend in the development of next-generation light water reactors is discussed. Construction of an ABWR was started in September 1991 as the No. 6 plant at the Kashiwazaki Kariwa Power Station. (K.I.)
Zhang, Jing; Song, Xiaohong; Ma, Marella J; Xiao, Li; Kenri, Tsuyoshi; Sun, Hongmei; Ptacek, Travis; Li, Shaoli; Waites, Ken B; Atkinson, T Prescott; Shibayama, Keigo; Dybvig, Kevin; Feng, Yanmei
To characterize inter- and intra-strain variability of variable-number tandem repeats (VNTRs) in Mycoplasma pneumoniae to determine the optimal multilocus VNTR analysis scheme for improved strain typing. Whole genome assemblies and next-generation sequencing data from diverse M. pneumoniae isolates were used to characterize VNTRs and their variability, and to compare the strain discriminability of new VNTR and existing markers. We identified 13 VNTRs including five reported previously. These VNTRs displayed different levels of inter- and intra-strain copy number variations. All new markers showed similar or higher discriminability compared with existing VNTR markers and the P1 typing system. Our study provides novel insights into VNTR variations and potential new multilocus VNTR analysis schemes for improved genotyping of M. pneumoniae.
Lucarelli, Marco; Porcaro, Luigi; Biffignandi, Alice; Costantino, Lucy; Giannone, Valentina; Alberti, Luisella; Bruno, Sabina Maria; Corbetta, Carlo; Torresani, Erminio; Colombo, Carla; Seia, Manuela
Searching for mutations in the cystic fibrosis transmembrane conductance regulator gene (CFTR) is a key step in the diagnosis of and neonatal and carrier screening for cystic fibrosis (CF), and it has implications for prognosis and personalized therapy. The large number of mutations and genetic and phenotypic variability make this search a complex task. Herein, we developed, validated, and tested a laboratory assay for an extended search for mutations in CFTR using a next-generation sequencing-based method, with a panel of 188 CFTR mutations customized for the Italian population. Overall, 1426 dried blood spots from neonatal screening, 402 genomic DNA samples from various origins, and 1138 genomic DNA samples from patients with CF were analyzed. The assay showed excellent analytical and diagnostic operative characteristics. We identified and experimentally validated 159 (of 188) CFTR mutations. The assay achieved detection rates of 95.0% and 95.6% in two large-scale case series of CF patients from central and northern Italy, respectively. These detection rates are among the highest reported so far with a genetic test for CF based on a mutation panel. This assay appears to be well suited for diagnostics, neonatal and carrier screening, and assisted reproduction, and it represents a considerable advantage in CF genetic counseling. Copyright © 2017 American Society for Investigative Pathology and the Association for Molecular Pathology. Published by Elsevier Inc. All rights reserved.
Tillmar, Andreas O.; Dell'Amico, Barbara; Welander, Jenny; Holmlund, Gunilla
Species identification is of interest in a wide range of areas, for example in forensic applications, food monitoring and archeology. The vast majority of existing DNA typing methods developed for species determination focus mainly on a single species source. There are, however, many instances where all species from mixed sources need to be determined, even when the minority species constitutes less than 1 % of the sample. The introduction of next generation sequencing opens new possibilities for such challenging samples. In this study we present a universal deep sequencing method using 454 GS Junior sequencing of a target on the mitochondrial gene 16S rRNA. The method was designed through phylogenetic analyses of DNA reference sequences from more than 300 mammal species. Experiments were performed on artificial species-species mixture samples in order to verify the method’s robustness and its ability to detect all species within a mixture. The method was also tested on samples from authentic forensic casework. The results were promising: the method discriminated over 99.9 % of mammal species, detected multiple donors within a mixture, and detected minor components as low as 1 % of a mixed sample. PMID:24358309
Mihovska, Albena D.; Anggorojati, Bayu; Luo, Jijun
This paper describes a load-dependent multi-stage admission control suitable for next generation systems. The concept uses decision polling in entities located at different levels of the architecture hierarchy and based on the load to activate a sequence of actions related to the admission...
The identification of the species of origin of meat and meat products is an important issue to prevent and detect frauds that might have economic, ethical and health implications. In this paper we evaluated the potential of the next generation semiconductor based sequencing technology (Ion Torrent Personal Genome Machine) for the identification of DNA from meat species (pig, horse, cattle, sheep, rabbit, chicken, turkey, pheasant, duck, goose and pigeon) as well as from human and rat in DNA mixtures through the sequencing of PCR products obtained from different pairs of universal primers that amplify the 12S and 16S rRNA mitochondrial DNA genes. Six libraries were produced including PCR products obtained separately from 13 species or from DNA mixtures containing DNA from all species or only avian or only mammalian species at equimolar concentration or at 1:10 or 1:50 ratios for pig and horse DNA. Sequencing obtained a total of 33,294,511 called nucleotides, of which 29,109,688 (87.43%) had a quality of at least Q20, in a total of 215,944 reads. Different alignment algorithms were used to assign the species based on sequence data. The error rate calculated after confirmation of the obtained sequences by Sanger sequencing ranged from 0.0003 to 0.02 for the different species. Correlation of the number of reads per species between different libraries was high for mammalian species (0.97) and lower for avian species (0.70). PCR competition limited the efficiency of amplification and sequencing for avian species for some primer pairs. Detection of low levels of pig and horse DNA was possible with reads obtained from different primer pairs. The sequencing of the products obtained from different universal PCR primers could be a useful strategy to overcome potential problems of amplification. Based on these results, the Ion Torrent technology can be applied for the identification of meat species in DNA mixtures.
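For readers unfamiliar with the Q20 statistic quoted above, the following sketch shows one common way to compute the fraction of called bases with a Phred quality of at least 20 from FASTQ quality strings. It assumes the Phred+33 ASCII encoding and uses hypothetical function names and toy quality strings; it is an illustration, not the software used in the study.

```python
def phred_scores(quality: str, offset: int = 33) -> list[int]:
    """Decode a FASTQ quality string into Phred scores (Phred+33 assumed)."""
    return [ord(char) - offset for char in quality]


def fraction_at_least(quality_strings, threshold: int = 20, offset: int = 33) -> float:
    """Fraction of all called bases whose Phred score is >= threshold."""
    total = 0
    passed = 0
    for quality in quality_strings:
        for score in phred_scores(quality, offset):
            total += 1
            if score >= threshold:
                passed += 1
    return passed / total if total else 0.0


# Toy quality strings (hypothetical): 'I' = Q40, '5' = Q20, '+' = Q10, '!' = Q0.
toy_reads = ["II5+", "5!II"]
toy_q20 = fraction_at_least(toy_reads)

# Consistency check of the figures reported in the abstract.
reported_q20_percent = round(100 * 29_109_688 / 33_294_511, 2)
```

A Phred score of 20 corresponds to an estimated base-calling error probability of 1 in 100, which is why Q20 is a customary summary threshold.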
Precision medicine or personalized medicine has been proposed as a modernized and promising medical strategy. Genetic variants of patients are the key information for implementation of precision medicine. Next-generation sequencing (NGS) is an emerging technology for deciphering genetic variants. Alignment of raw reads to a reference genome is one of the key steps in NGS data analysis. Many algorithms have been developed for alignment of short read sequences since 2008. Users have to decide which alignment algorithm to use in their studies. Selecting the right aligner means choosing not only the algorithm but also the set of suitable parameters to be used by it. Understanding these algorithms helps in selecting the appropriate alignment algorithm for different applications in precision medicine. Here, we review currently available algorithms and their major strategies such as seed-and-extend and q-gram filter. We also discuss the challenges in current alignment algorithms, including alignment in multiple repeated regions, long read alignment and alignment facilitated with known genetic variants.
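The seed-and-extend strategy named in the abstract above can be illustrated with a minimal sketch. It is not taken from the review or from any particular aligner: the function names, the k-mer length and the mismatch threshold are all hypothetical, and the extension step is a plain ungapped mismatch count rather than the dynamic-programming extension that production aligners typically use.

```python
from collections import defaultdict


def build_kmer_index(reference, k):
    """Map every k-mer in the reference to the list of positions where it occurs."""
    index = defaultdict(list)
    for i in range(len(reference) - k + 1):
        index[reference[i:i + k]].append(i)
    return index


def seed_and_extend(read, reference, index, k, max_mismatches=2):
    """Return (start, mismatches) for the best ungapped placement, or None.

    Seed: look up each k-mer of the read in the index (exact match).
    Extend: for each implied start position, compare the whole read
    against the reference window and count mismatches.
    """
    best = None
    seen = set()  # candidate starts already extended
    for offset in range(len(read) - k + 1):
        for hit in index.get(read[offset:offset + k], ()):
            start = hit - offset
            if start < 0 or start + len(read) > len(reference) or start in seen:
                continue
            seen.add(start)
            window = reference[start:start + len(read)]
            mismatches = sum(a != b for a, b in zip(read, window))
            if mismatches <= max_mismatches and (best is None or mismatches < best[1]):
                best = (start, mismatches)
    return best


# Toy example: the read is reference[8:18] with one substituted base.
reference = "ACGTTGCATGCCGATAGCTAGGCTA"
index = build_kmer_index(reference, k=4)
placement = seed_and_extend("TGCCGTTAGC", reference, index, k=4)
```

The index makes seeding cheap, so the expensive comparison is only run at the few positions that share an exact k-mer with the read; a read with no seed hit is simply reported as unplaced.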
Background: The rapid adoption of next-generation sequencing provides an efficient system for detecting somatic alterations in neoplasms. The detection of such alterations requires a matched non-neoplastic sample for adequate filtering of non-somatic events such as germline polymorphisms. Non-neoplastic tissue adjacent to the excised neoplasm is often used for this purpose as it is simultaneously collected and generally contains the same tissue type as the neoplasm. Following NGS analysis, we and others have frequently observed low-level somatic mutations in these non-neoplastic tissues, which may impose additional challenges to somatic mutation detection as it complicates germline variant filtering. Methods: We hypothesized that the low-level somatic mutation observed in non-neoplastic tissues may be entirely or partially caused by inadvertent contamination by neoplastic cells during the surgical pathology gross assessment or tissue procurement process. To test this hypothesis, we applied a systematic protocol designed to collect multiple grossly non-neoplastic tissues using different methods surrounding each single neoplasm. The procedure was applied in two breast cancer lumpectomy specimens. In each case, all samples were first sequenced by whole-exome sequencing to identify somatic mutations in the neoplasm and determine their presence in the adjacent non-neoplastic tissues. We then generated ultra-deep coverage using targeted sequencing to assess the levels of contamination in non-neoplastic tissue samples collected under different conditions. Results: Contamination levels in non-neoplastic tissues ranged up to 3.5% and 20.9%, respectively, in the two cases tested, with a consistent pattern correlated with the manner of grossing and procurement. By carefully controlling the conditions of various steps during this process, we were able to eliminate any detectable contamination in both patients. Conclusion: The results demonstrated that the
Pan, Luyuan; Shah, Arish N; Phelps, Ian G; Doherty, Dan; Johnson, Eric A; Moens, Cecilia B
Targeting Induced Local Lesions IN Genomes (TILLING) is a reverse genetics approach to directly identify point mutations in specific genes of interest in genomic DNA from a large chemically mutagenized population. Classical TILLING processes, based on enzymatic detection of mutations in heteroduplex PCR amplicons, are slow and labor intensive. Here we describe a new TILLING strategy in zebrafish using direct next generation sequencing (NGS) of 250 bp amplicons followed by Paired-End Low-Error (PELE) sequence analysis. By pooling a genomic DNA library made from over 9,000 N-ethyl-N-nitrosourea (ENU) mutagenized F1 fish into 32 equal pools of 288 fish, each with a unique Illumina barcode, we reduce the complexity of the template to a level at which we can detect mutations that occur in a single heterozygous fish in the entire library. MiSeq sequencing generates 250 base-pair overlapping paired-end reads, and PELE analysis aligns the overlapping sequences to each other and filters out any imperfect matches, thereby eliminating variants introduced during the sequencing process. We find that this filtering step reduces the number of false positive calls 50-fold without loss of true variant calls. After PELE we were able to validate 61.5% of the mutant calls that occurred at a frequency between 1 mutant call:100 wildtype calls and 1 mutant call:1000 wildtype calls in a pool of 288 fish. We then use high-resolution melt analysis to identify the single heterozygous mutation carrier in the 288-fish pool in which the mutation was identified. Using this NGS-TILLING protocol we validated 28 nonsense or splice site mutations in 20 genes, at a two-fold higher efficiency than using traditional Cel1 screening. We conclude that this approach significantly increases screening efficiency and accuracy at reduced cost and can be applied in a wide range of organisms.
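The filtering idea described above, keeping only read pairs whose overlapping sequences agree perfectly, can be sketched as follows. This is an illustration under stated assumptions, not the authors' PELE implementation: it assumes the two reads of a pair overlap along their full length with no offset, and every function name is hypothetical. The helper at the end works through the expected allele fraction for one heterozygous carrier in a pool, assuming diploid fish contributing equal amounts of DNA.

```python
# Assumed alphabet: unambiguous bases plus N.
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def keep_concordant_pairs(pairs):
    """Keep a pair only if the forward read exactly matches the reverse
    complement of the reverse read (simplifying assumption: full overlap).

    A sequencing error in either read breaks the match and drops the pair,
    whereas a true variant appears identically in both reads and survives.
    """
    kept = []
    for fwd, rev in pairs:
        if len(fwd) == len(rev) and fwd == reverse_complement(rev):
            kept.append(fwd)
    return kept


def heterozygous_allele_fraction(pool_size, ploidy=2):
    """Expected mutant allele fraction for one heterozygous carrier in a pool."""
    return 1.0 / (ploidy * pool_size)


pairs = [
    ("ACGTTG", "CAACGT"),  # concordant: reverse complement of read 2 equals read 1
    ("ACGTTG", "CAACGA"),  # discordant: one base differs, pair is discarded
]
kept = keep_concordant_pairs(pairs)
fraction = heterozygous_allele_fraction(288)  # 1 mutant allele in 576
```

For a pool of 288 fish the expected fraction is 1/576, roughly 0.17%, which lies inside the 1:1000 to 1:100 range of mutant call frequencies reported in the abstract.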
Lee, Yujung; Kim, Changshin; Park, YoungJoon; Pyun, Jung-A; Kwack, KyuBum
Premature ovarian failure (POF) is characterized by heterogeneous genetic causes such as chromosomal abnormalities and variants in causal genes. Recent technical developments have made it possible to use next generation sequencing (NGS) to detect genome-wide variants, including chromosomal abnormalities. Among 37 Korean POF patients, an XY karyotype with deletions of distal parts of the Y chromosome, Yp11.32-31 and Yp12 end part, was observed in two patients through NGS. Six deleterious variants in POF genes were also detected, which might explain the pathogenesis of POF with abnormalities in the sex chromosomes. Additionally, the two POF patients had no mutation in SRY, but three non-synonymous variants were detected in genes related to sex reversal. These findings suggest candidate causes of POF and sex reversal and show the suitability of NGS for investigating the heterogeneous pathogenesis of POF. Copyright © 2016 Elsevier Inc. All rights reserved.
Shen, Kang-Ning; Chen, Ching-Hung; Hsiao, Chung-Der; Durand, Jean-Dominique
In this study, the complete mitogenome sequence of a cryptic species from East Australia (Mugil sp. H) belonging to the worldwide Mugil cephalus species complex (Teleostei: Mugilidae) was determined by next-generation sequencing. The assembled mitogenome, consisting of 16,845 bp, had the typical vertebrate mitochondrial gene arrangement, including 13 protein-coding genes, 22 transfer RNAs, 2 ribosomal RNA genes and a non-coding control region (D-loop). The D-loop is 1067 bp long and is located between tRNA-Pro and tRNA-Phe. The overall base composition of East Australia M. cephalus is 28.4% for A, 29.3% for C, 15.4% for G and 26.9% for T. The complete mitogenome may provide essential DNA molecular data for further phylogenetic and evolutionary analysis of the flathead mullet species complex.
Fior, Simone; Li, Mingai; Oxelman, Bengt; Viola, Roberto; Hodges, Scott A; Ometto, Lino; Varotto, Claudio
Aquilegia is a well-known model system in the field of evolutionary biology, but obtaining a resolved and well-supported phylogenetic reconstruction for the genus has been hindered by its recent and rapid diversification. Here, we applied 454 next-generation sequencing to PCR amplicons of 21 of the most rapidly evolving regions of the plastome to generate c. 24 kb of sequences from each of 84 individuals from throughout the genus. The resulting phylogeny has well-supported resolution of the main lineages of the genus, although recent diversification such as in the European taxa remains unresolved. By producing a chronogram of the whole Ranunculaceae family based on published data, we inferred calibration points for dating the Aquilegia radiation. The genus originated in the upper Miocene c. 6.9 million yr ago (Ma) in Eastern Asia, and diversification occurred c. 4.8 Ma with the split of two main clades, one colonizing North America, and the other Western Eurasia through the mountains of Central Asia. This was followed by a back-to-Asia migration, originating from the European stock using a North Asian route. These results provide the first backbone phylogeny and spatiotemporal reconstruction of the Aquilegia radiation, and constitute a robust framework to address the adaptative nature of speciation within the group. © 2013 The Authors. New Phytologist © 2013 New Phytologist Trust.
Wong, Ka-Chun; Peng, Chengbin; Li, Yue
With the prevalence of chromatin immunoprecipitation (ChIP) with sequencing (ChIP-Seq) technology, massive ChIP-Seq data has been accumulated. The ChIP-Seq technology measures the genome-wide occupancy of DNA-binding proteins in vivo. It is well-known that different DNA-binding protein occupancies may result in a gene being regulated in different conditions (e.g. different cell types). To fully understand a gene's function, it is essential to develop probabilistic models on multiple ChIP-Seq profiles for deciphering the gene transcription causalities. In this work, we propose and describe two probabilistic models. Assuming the conditional independence of different DNA-binding proteins' occupancies, the first method (SignalRanker) is developed as an intuitive method for ChIP-Seq genome-wide signal profile inference. Unfortunately, such an assumption may not always hold in some gene regulation cases. Thus, we propose and describe another method (FullSignalRanker) which does not make the conditional independence assumption. The proposed methods are compared with other existing methods on ENCODE ChIP-Seq datasets, demonstrating its regression and classification ability. The results suggest that FullSignalRanker is the best-performing method for recovering the signal ranks on the promoter and enhancer regions. In addition, FullSignalRanker is also the best-performing method for peak sequence classification. We envision that SignalRanker and FullSignalRanker will become important in the era of next generation sequencing. FullSignalRanker program is available on the following website: http://www.cs.toronto.edu/∼wkc/FullSignalRanker/ © 2015 IEEE.
The human immune system is a finely tuned network consisting of innumerable functional cells that balance immunity and tolerance against various endogenous and environmental challenges. Although advances in modern immunology have revealed a role for many unique immune cell subsets, technologies that enable us to capture the whole landscape of immune responses against specific antigens have not been available to date. Acquired immunity against various microorganisms, including the host microbiome, is principally founded on T cell and B cell populations, each of which expresses antigen-specific receptors that define a unique clonotype. Over the past several years, high-throughput next-generation sequencing has been developed as a powerful tool to profile T- and B-cell receptor repertoires in a given individual at the single-cell level. Sophisticated immuno-bioinformatic analyses using this innovative methodology have already been implemented in clinical development of antibody engineering, vaccine design, and cellular immunotherapy. In this article, we aim to discuss the possible application of high-throughput immune receptor sequencing in the field of nutritional and intestinal immunology. Although there are still unsolved caveats, this emerging technology combined with single-cell transcriptomics/proteomics provides a critical tool to unveil the previously unrecognized principle of host–microbiome immune homeostasis. Accumulation of such knowledge will lead to the development of effective ways for personalized immune modulation through deeper understanding of the mechanisms by which the intestinal environment affects our immune ecosystem.
Lőrinc S Pongor
Next generation sequencing (NGS) of PCR amplicons is a standard approach to detect genetic variations in personalized medicine such as cancer diagnostics. Computer programs used in the NGS community often miss insertions and deletions (indels) that constitute a large part of known human mutations. We have developed HeurAA, an open source, heuristic amplicon aligner program. We tested the program on simulated datasets as well as experimental data from multiplex sequencing of 40 amplicons in 12 oncogenes collected on a 454 Genome Sequencer from lung cancer cell lines. We found that HeurAA can accurately detect all indels, and is more than an order of magnitude faster than previous programs. HeurAA can compare reads and reference sequences up to several thousand base pairs in length, and it can evaluate data from complex mixtures containing reads of different gene-segments from different samples. HeurAA is written in C and Perl for Linux operating systems; the code and the documentation are available for research applications at http://sourceforge.net/projects/heuraa/
Koutsis, Georgios; Lynch, David S; Tucci, Arianna; Houlden, Henry; Karadima, Georgia; Panas, Marios
To present a Greek family in which 5 male and 2 female members developed progressive spastic paraplegia. Plasma very long chain fatty acids (VLCFA) were reportedly normal at first testing in an affected male and for over 30 years the presumed diagnosis was hereditary spastic paraplegia (HSP). Targeted next generation sequencing (NGS) was used as a further diagnostic tool. Targeted exome sequencing in the proband, followed by Sanger sequencing confirmation; mutation segregation testing in multiple family members and plasma VLCFA measurement in the proband. NGS of the proband revealed a novel frameshift mutation in ABCD1 (c.1174_1178del, p.Leu392Serfs*7), bringing an end to diagnostic uncertainty by establishing the diagnosis of adrenomyeloneuropathy (AMN), the myelopathic phenotype of X-linked adrenoleukodystrophy (ALD). The mutation segregated in all family members and the diagnosis of AMN/ALD was confirmed by plasma VLCFA measurement. Confounding factors that delayed the diagnosis are presented. This report highlights the diagnostic utility of NGS in patients with undiagnosed spastic paraplegia, establishing a molecular diagnosis of AMN, allowing proper genetic counseling and management, and overcoming the diagnostic delay that can be rarely caused by false negative VLCFA analysis. Copyright © 2015 Elsevier B.V. All rights reserved.
Maddock, Simon T; Briscoe, Andrew G; Wilkinson, Mark; Waeschenbach, Andrea; San Mauro, Diego; Day, Julia J; Littlewood, D Tim J; Foster, Peter G; Nussbaum, Ronald A; Gower, David J
Mitochondrial genome (mitogenome) sequences are being generated with increasing speed due to the advances of next-generation sequencing (NGS) technology and associated analytical tools. However, detailed comparisons to explore the utility of alternative NGS approaches applied to the same taxa have not been undertaken. We compared a 'traditional' Sanger sequencing method with two NGS approaches (shotgun sequencing and non-indexed, multiplex amplicon sequencing) on four different sequencing platforms (Illumina's HiSeq and MiSeq, Roche's 454 GS FLX, and Life Technologies' Ion Torrent) to produce seven (near-) complete mitogenomes from six species that form a small radiation of caecilian amphibians from the Seychelles. The fastest, most accurate method of obtaining mitogenome sequences that we tested was direct sequencing of genomic DNA (shotgun sequencing) using the MiSeq platform. Bayesian inference and maximum likelihood analyses using seven different partitioning strategies were unable to resolve compellingly all phylogenetic relationships among the Seychelles caecilian species, indicating the need for additional data in this case.
Background: Chronic lymphocytic leukemia (CLL) is a highly genetically heterogeneous disease. Although CLL has been traditionally considered as a mature B cell leukemia, few independent studies have shown that the genetic alterations may appear in CD34+ hematopoietic progenitors. However, the presence of both chromosomal aberrations and gene mutations in CD34+ cells from the same patients has not been explored. Methods: Amplicon-based deep next-generation sequencing (NGS) studies were carried out in magnetically activated-cell-sorting separated CD19+ mature B lymphocytes and CD34+ hematopoietic progenitors (n = 56) to study the mutational status of TP53, NOTCH1, SF3B1, FBXW7, MYD88, and XPO1 genes. In addition, ultra-deep NGS was performed in a subset of seven patients to determine the presence of mutations in flow-sorted CD34+CD19− early hematopoietic progenitors. Fluorescence in situ hybridization (FISH) studies were performed in the CD34+ cells from nine patients of the cohort to examine the presence of cytogenetic abnormalities. Results: NGS studies revealed a total of 28 mutations in 24 CLL patients. Interestingly, 15 of them also showed the same mutations in their corresponding whole population of CD34+ progenitors. The majority of NOTCH1 (7/9) and XPO1 (4/4) mutations presented a similar mutational burden in both cell fractions; by contrast, mutations of TP53 (2/2), FBXW7 (2/2), and SF3B1 (3/4) showed lower mutational allele frequencies, or even none, in the CD34+ cells compared with the CD19+ population. Ultra-deep NGS confirmed the presence of FBXW7, MYD88, NOTCH1, and XPO1 mutations in the subpopulation of CD34+CD19− early hematopoietic progenitors (6/7). Furthermore, FISH studies showed the presence of 11q and 13q deletions (2/2 and 3/5, respectively) in CD34+ progenitors but the absence of IGH cytogenetic alterations (0/2) in the CD34+ cells. Combining all the results from NGS and FISH, a model of the appearance and expansion of
Nilyanimit, Pornjarim; Chansaenroj, Jira; Poomipak, Witthaya; Praianantathavorn, Kesmanee; Payungporn, Sunchai; Poovorawan, Yong
Human papillomavirus (HPV) infection causes cervical cancer, thus necessitating early detection by screening. Rapid and accurate HPV genotyping is crucial both for the assessment of patients with HPV infection and for surveillance studies. Fifty-eight cervicovaginal samples were tested for HPV genotypes using four methods in parallel: nested-PCR followed by conventional sequencing, INNO-LiPA, electrochemical DNA chip, and next-generation sequencing (NGS). Seven HPV genotypes (16, 18, 31, 33, 45, 56, and 58) were identified by all four methods. Nineteen HPV genotypes were detected by NGS, but not by nested-PCR, INNO-LiPA, or electrochemical DNA chip. Although NGS is relatively expensive and complex, it may serve as a sensitive HPV genotyping method. Because of its highly sensitive detection of multiple HPV genotypes, NGS may serve as an alternative for diagnostic HPV genotyping in certain situations. © The Korean Society for Laboratory Medicine
Derkach, Andriy; Chiang, Theodore; Gong, Jiafen; Addis, Laura; Dobbins, Sara; Tomlinson, Ian; Houlston, Richard; Pal, Deb K; Strug, Lisa J
Sufficiently powered case-control studies with next-generation sequence (NGS) data remain prohibitively expensive for many investigators. If feasible, a more efficient strategy would be to include publicly available sequenced controls. However, these studies can be confounded by differences in sequencing platform; alignment, single nucleotide polymorphism and variant calling algorithms; read depth; and selection thresholds. Assuming one can match cases and controls on the basis of ethnicity and other potential confounding factors, and one has access to the aligned reads in both groups, we investigate the effect of systematic differences in read depth and selection threshold when comparing allele frequencies between cases and controls. We propose a novel likelihood-based method, the robust variance score (RVS), that substitutes genotype calls by their expected values given observed sequence data. We show theoretically that the RVS eliminates read depth bias in the estimation of minor allele frequency. We also demonstrate, using simulated and real NGS data, that the RVS method controls Type I error and has comparable power to the 'gold standard' analysis with the true underlying genotypes for both common and rare variants. An RVS R script and instructions can be found at strug.research.sickkids.ca, and at https://github.com/strug-lab/RVS. Supplementary data are available at Bioinformatics online. © The Author 2014. Published by Oxford University Press. All rights reserved.
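The central substitution described above, replacing a hard genotype call with its expected value given the sequence data, can be sketched as a posterior mean over the three diploid genotypes. This is only an illustration and not the RVS script: the function name is hypothetical, the genotype likelihoods are assumed to be supplied by an upstream tool, and the Hardy-Weinberg prior derived from a minor allele frequency is an assumption made for this sketch rather than something stated in the abstract.

```python
def expected_genotype(likelihoods, maf):
    """Posterior mean of the minor-allele count (0, 1 or 2) at one site.

    likelihoods: P(reads | genotype = g) for g = 0, 1, 2 (assumed given).
    maf: minor allele frequency used for a Hardy-Weinberg prior (assumption).
    """
    priors = ((1 - maf) ** 2, 2 * maf * (1 - maf), maf ** 2)
    weights = [lik * prior for lik, prior in zip(likelihoods, priors)]
    total = sum(weights)
    if total == 0:
        raise ValueError("likelihoods and prior give zero total probability")
    # E[G | data] = sum_g g * P(g | data)
    return sum(g * w for g, w in enumerate(weights)) / total


# With uninformative reads the expectation falls back to the prior mean 2 * maf;
# with reads that support only one genotype it equals that genotype.
uninformative = expected_genotype((1.0, 1.0, 1.0), maf=0.1)
confident_het = expected_genotype((0.0, 1.0, 0.0), maf=0.1)
```

Because the expected value moves continuously between the prior mean and the called genotype as read evidence accumulates, a low-depth sample contributes a shrunken value instead of a possibly wrong hard call.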
Okumura, Kayo; Kato, Masako; Kirikae, Teruo; Kayano, Mitsunori; Miyoshi-Akiyama, Tohru
Although Mycobacterium tuberculosis isolates comprise several different lineages, epidemiological analyses are usually performed relative to a single reference genome, M. tuberculosis H37Rv, which might introduce bias. These analyses are essentially based on genome sequence information of M. tuberculosis and could in theory be performed in silico, using whole genome sequence (WGS) data available in databases and obtained by next generation sequencers (NGSs). As an approach to establishing higher-resolution methods for such analyses, whole genome sequences of M. tuberculosis complex (MTBC) strains available in databases were aligned to construct a virtual reference genome sequence, called the consensus sequence (CS), and its feasibility in in silico epidemiological analyses was evaluated. The CS was successfully constructed and used to perform phylogenetic analysis, evaluation of read mapping efficacy, which is crucial for detecting single nucleotide polymorphisms (SNPs), and various virtual MTBC typing methods, including spoligotyping, VNTR, long sequence polymorphism and Beijing typing. SNPs detected based on CS, in comparison with H37Rv, were utilized in concatemer-based phylogenetic analysis to determine their reliability relative to a phylogenetic tree based on whole genome alignment as the gold standard. Statistical comparison of phylogenetic trees based on CS with those based on H37Rv indicated that the former always gave better results than the latter. SNP detection and concatenation with CS was advantageous because the frequency of crucial SNPs distinguishing among strain lineages was higher than with H37Rv. The number of SNPs detected was lower with the consensus than with the H37Rv sequence, resulting in a significant reduction in computational time. Performance of each virtual typing method was satisfactory and agreed with published results where those were available. These results indicated that virtual CS
Crisol-Martínez, Eduardo; Moreno-Moyano, Laura T; Wormington, Kevin R; Brown, Philip H; Stanley, Dragana
Worldwide, avian communities inhabiting agro-ecosystems are threatened as a consequence of agricultural intensification. Unravelling their ecological role is essential to focus conservation efforts. Dietary analysis can elucidate bird-insect interactions and expose avian pest-reduction services, thus supporting avian conservation. In this study, we used next-generation sequencing to analyse the dietary arthropod contents of 11 sympatric bird species foraging in macadamia orchards in eastern Australia. Across all species and based on arthropod DNA sequence similarities ≥98% with records in the Barcode of Life Database, 257 operational taxonomy units were assigned to 8 orders, 40 families, 90 genera and 89 species. These taxa included 15 insect pests, 5 of which were macadamia pests. Among the latter group, Nezara viridula (Pentatomidae; green vegetable bug), considered a major pest, was present in 23% of all faecal samples collected. Results also showed that resource partitioning in this system is low, as most bird species shared large proportion of their diets by feeding primarily on lepidopteran, dipteran and arachnids. Dietary composition differed between some species, most likely because of differences in foraging behaviour. Overall, this study reached a level of taxonomic resolution never achieved before in the studied species, thus contributing to a significant improvement in the avian ecological knowledge. Our results showed that bird communities prey upon economically important pests in macadamia orchards. This study set a precedent by exploring avian pest-reduction services using next-generation sequencing, which could contribute to the conservation of avian communities and their natural habitats in agricultural systems.
Dilara Fatma Akin
Oct 1, 2015 ... sarcoma viral oncogene homolog (KRAS), and Casitas B-cell ... AML by screening hot-spot exons of TET2, KRAS, and CBL using Next Generation Sequencing ... Methods: Eight patients who were diagnosed with pediatric AML at Losante ..... mutations in pre-leukemic stem cells in acute myeloid leukemia.