WorldWideScience

Sample records for accurate bisulfite sequencing

  1. Analysis and Visualization Tool for Targeted Amplicon Bisulfite Sequencing on Ion Torrent Sequencers.

    Directory of Open Access Journals (Sweden)

    Stephan Pabinger

    Full Text Available Targeted sequencing of PCR amplicons generated from bisulfite deaminated DNA is a flexible, cost-effective way to study methylation of a sample at single CpG resolution and perform subsequent multi-target, multi-sample comparisons. Currently, no platform specific protocol, support, or analysis solution is provided to perform targeted bisulfite sequencing on a Personal Genome Machine (PGM. Here, we present a novel tool, called TABSAT, for analyzing targeted bisulfite sequencing data generated on Ion Torrent sequencers. The workflow starts with raw sequencing data, performs quality assessment, and uses a tailored version of Bismark to map the reads to a reference genome. The pipeline visualizes results as lollipop plots and is able to deduce specific methylation-patterns present in a sample. The obtained profiles are then summarized and compared between samples. In order to assess the performance of the targeted bisulfite sequencing workflow, 48 samples were used to generate 53 different Bisulfite-Sequencing PCR amplicons from each sample, resulting in 2,544 amplicon targets. We obtained a mean coverage of 282X using 1,196,822 aligned reads. Next, we compared the sequencing results of these targets to the methylation level of the corresponding sites on an Illumina 450k methylation chip. The calculated average Pearson correlation coefficient of 0.91 confirms the sequencing results with one of the industry-leading CpG methylation platforms and shows that targeted amplicon bisulfite sequencing provides an accurate and cost-efficient method for DNA methylation studies, e.g., to provide platform-independent confirmation of Illumina Infinium 450k methylation data. TABSAT offers a novel way to analyze data generated by Ion Torrent instruments and can also be used with data from the Illumina MiSeq platform. It can be easily accessed via the Platomics platform, which offers a web-based graphical user interface along with sample and parameter storage

  2. Kismeth: Analyzer of plant methylation states through bisulfite sequencing

    Directory of Open Access Journals (Sweden)

    Martienssen Robert A

    2008-09-01

    Full Text Available Abstract Background There is great interest in probing the temporal and spatial patterns of cytosine methylation states in genomes of a variety of organisms. It is hoped that this will shed light on the biological roles of DNA methylation in the epigenetic control of gene expression. Bisulfite sequencing refers to the treatment of isolated DNA with sodium bisulfite to convert unmethylated cytosine to uracil, with PCR converting the uracil to thymidine followed by sequencing of the resultant DNA to detect DNA methylation. For the study of DNA methylation, plants provide an excellent model system, since they can tolerate major changes in their DNA methylation patterns and have long been studied for the effects of DNA methylation on transposons and epimutations. However, in contrast to the situation in animals, there aren't many tools that analyze bisulfite data in plants, which can exhibit methylation of cytosines in a variety of sequence contexts (CG, CHG, and CHH. Results Kismeth http://katahdin.mssm.edu/kismeth is a web-based tool for bisulfite sequencing analysis. Kismeth was designed to be used with plants, since it considers potential cytosine methylation in any sequence context (CG, CHG, and CHH. It provides a tool for the design of bisulfite primers as well as several tools for the analysis of the bisulfite sequencing results. Kismeth is not limited to data from plants, as it can be used with data from any species. Conclusion Kismeth simplifies bisulfite sequencing analysis. It is the only publicly available tool for the design of bisulfite primers for plants, and one of the few tools for the analysis of methylation patterns in plants. It facilitates analysis at both global and local scales, demonstrated in the examples cited in the text, allowing dissection of the genetic pathways involved in DNA methylation. Kismeth can also be used to study methylation states in different tissues and disease cells compared to a reference sequence.

  3. Bicycle: a bioinformatics pipeline to analyze bisulfite sequencing data.

    Science.gov (United States)

    Graña, Osvaldo; López-Fernández, Hugo; Fdez-Riverola, Florentino; González Pisano, David; Glez-Peña, Daniel

    2018-04-15

    High-throughput sequencing of bisulfite-converted DNA is a technique used to measure DNA methylation levels. Although a considerable number of computational pipelines have been developed to analyze such data, none of them tackles all the peculiarities of the analysis together, revealing limitations that can force the user to manually perform additional steps needed for a complete processing of the data. This article presents bicycle, an integrated, flexible analysis pipeline for bisulfite sequencing data. Bicycle analyzes whole genome bisulfite sequencing data, targeted bisulfite sequencing data and hydroxymethylation data. To show how bicycle overtakes other available pipelines, we compared them on a defined number of features that are summarized in a table. We also tested bicycle with both simulated and real datasets, to show its level of performance, and compared it to different state-of-the-art methylation analysis pipelines. Bicycle is publicly available under GNU LGPL v3.0 license at http://www.sing-group.org/bicycle. Users can also download a customized Ubuntu LiveCD including bicycle and other bisulfite sequencing data pipelines compared here. In addition, a docker image with bicycle and its dependencies, which allows a straightforward use of bicycle in any platform (e.g. Linux, OS X or Windows), is also available. ograna@cnio.es or dgpena@uvigo.es. Supplementary data are available at Bioinformatics online.

  4. Technical Considerations for Reduced Representation Bisulfite Sequencing with Multiplexed Libraries

    Science.gov (United States)

    Chatterjee, Aniruddha; Rodger, Euan J.; Stockwell, Peter A.; Weeks, Robert J.; Morison, Ian M.

    2012-01-01

    Reduced representation bisulfite sequencing (RRBS), which couples bisulfite conversion and next generation sequencing, is an innovative method that specifically enriches genomic regions with a high density of potential methylation sites and enables investigation of DNA methylation at single-nucleotide resolution. Recent advances in the Illumina DNA sample preparation protocol and sequencing technology have vastly improved sequencing throughput capacity. Although the new Illumina technology is now widely used, the unique challenges associated with multiplexed RRBS libraries on this platform have not been previously described. We have made modifications to the RRBS library preparation protocol to sequence multiplexed libraries on a single flow cell lane of the Illumina HiSeq 2000. Furthermore, our analysis incorporates a bioinformatics pipeline specifically designed to process bisulfite-converted sequencing reads and evaluate the output and quality of the sequencing data generated from the multiplexed libraries. We obtained an average of 42 million paired-end reads per sample for each flow-cell lane, with a high unique mapping efficiency to the reference human genome. Here we provide a roadmap of modifications, strategies, and trouble shooting approaches we implemented to optimize sequencing of multiplexed libraries on an a RRBS background. PMID:23193365

  5. A Flexible, Efficient Binomial Mixed Model for Identifying Differential DNA Methylation in Bisulfite Sequencing Data

    Science.gov (United States)

    Lea, Amanda J.

    2015-01-01

    Identifying sources of variation in DNA methylation levels is important for understanding gene regulation. Recently, bisulfite sequencing has become a popular tool for investigating DNA methylation levels. However, modeling bisulfite sequencing data is complicated by dramatic variation in coverage across sites and individual samples, and because of the computational challenges of controlling for genetic covariance in count data. To address these challenges, we present a binomial mixed model and an efficient, sampling-based algorithm (MACAU: Mixed model association for count data via data augmentation) for approximate parameter estimation and p-value computation. This framework allows us to simultaneously account for both the over-dispersed, count-based nature of bisulfite sequencing data, as well as genetic relatedness among individuals. Using simulations and two real data sets (whole genome bisulfite sequencing (WGBS) data from Arabidopsis thaliana and reduced representation bisulfite sequencing (RRBS) data from baboons), we show that our method provides well-calibrated test statistics in the presence of population structure. Further, it improves power to detect differentially methylated sites: in the RRBS data set, MACAU detected 1.6-fold more age-associated CpG sites than a beta-binomial model (the next best approach). Changes in these sites are consistent with known age-related shifts in DNA methylation levels, and are enriched near genes that are differentially expressed with age in the same population. Taken together, our results indicate that MACAU is an efficient, effective tool for analyzing bisulfite sequencing data, with particular salience to analyses of structured populations. MACAU is freely available at www.xzlab.org/software.html. PMID:26599596

  6. Using beta-binomial regression for high-precision differential methylation analysis in multifactor whole-genome bisulfite sequencing experiments

    Science.gov (United States)

    2014-01-01

    Background Whole-genome bisulfite sequencing currently provides the highest-precision view of the epigenome, with quantitative information about populations of cells down to single nucleotide resolution. Several studies have demonstrated the value of this precision: meaningful features that correlate strongly with biological functions can be found associated with only a few CpG sites. Understanding the role of DNA methylation, and more broadly the role of DNA accessibility, requires that methylation differences between populations of cells are identified with extreme precision and in complex experimental designs. Results In this work we investigated the use of beta-binomial regression as a general approach for modeling whole-genome bisulfite data to identify differentially methylated sites and genomic intervals. Conclusions The regression-based analysis can handle medium- and large-scale experiments where it becomes critical to accurately model variation in methylation levels between replicates and account for influence of various experimental factors like cell types or batch effects. PMID:24962134

  7. Bisulfite sequencing reveals that Aspergillus flavus holds a hollow in DNA methylation.

    Directory of Open Access Journals (Sweden)

    Si-Yang Liu

    Full Text Available Aspergillus flavus first gained scientific attention for its production of aflatoxin. The underlying regulation of aflatoxin biosynthesis has been serving as a theoretical model for biosynthesis of other microbial secondary metabolites. Nevertheless, for several decades, the DNA methylation status, one of the important epigenomic modifications involved in gene regulation, in A. flavus remains to be controversial. Here, we applied bisulfite sequencing in conjunction with a biological replicate strategy to investigate the DNA methylation profiling of A. flavus genome. Both the bisulfite sequencing data and the methylome comparisons with other fungi confirm that the DNA methylation level of this fungus is negligible. Further investigation into the DNA methyltransferase of Aspergillus uncovers its close relationship with RID-like enzymes as well as its divergence with the methyltransferase of species with validated DNA methylation. The lack of repeat contents of the A. flavus' genome and the high RIP-index of the small amount of remanent repeat potentially support our speculation that DNA methylation may be absent in A. flavus or that it may possess de novo DNA methylation which occurs very transiently during the obscure sexual stage of this fungal species. This work contributes to our understanding on the DNA methylation status of A. flavus, as well as reinforces our views on the DNA methylation in fungal species. In addition, our strategy of applying bisulfite sequencing to DNA methylation detection in species with low DNA methylation may serve as a reference for later scientific investigations in other hypomethylated species.

  8. Objective and Comprehensive Evaluation of Bisulfite Short Read Mapping Tools

    Directory of Open Access Journals (Sweden)

    Hong Tran

    2014-01-01

    Full Text Available Background. Large-scale bisulfite treatment and short reads sequencing technology allow comprehensive estimation of methylation states of Cs in the genomes of different tissues, cell types, and developmental stages. Accurate characterization of DNA methylation is essential for understanding genotype phenotype association, gene and environment interaction, diseases, and cancer. Aligning bisulfite short reads to a reference genome has been a challenging task. We compared five bisulfite short read mapping tools, BSMAP, Bismark, BS-Seeker, BiSS, and BRAT-BW, representing two classes of mapping algorithms (hash table and suffix/prefix tries. We examined their mapping efficiency (i.e., the percentage of reads that can be mapped to the genomes, usability, running time, and effects of changing default parameter settings using both real and simulated reads. We also investigated how preprocessing data might affect mapping efficiency. Conclusion. Among the five programs compared, in terms of mapping efficiency, Bismark performs the best on the real data, followed by BiSS, BSMAP, and finally BRAT-BW and BS-Seeker with very similar performance. If CPU time is not a constraint, Bismark is a good choice of program for mapping bisulfite treated short reads. Data quality impacts a great deal mapping efficiency. Although increasing the number of mismatches allowed can increase mapping efficiency, it not only significantly slows down the program, but also runs the risk of having increased false positives. Therefore, users should carefully set the related parameters depending on the quality of their sequencing data.

  9. Genome-wide DNA methylation maps in follicular lymphoma cells determined by methylation-enriched bisulfite sequencing.

    Directory of Open Access Journals (Sweden)

    Jeong-Hyeon Choi

    Full Text Available BACKGROUND: Follicular lymphoma (FL is a form of non-Hodgkin's lymphoma (NHL that arises from germinal center (GC B-cells. Despite the significant advances in immunotherapy, FL is still not curable. Beyond transcriptional profiling and genomics datasets, there currently is no epigenome-scale dataset or integrative biology approach that can adequately model this disease and therefore identify novel mechanisms and targets for successful prevention and treatment of FL. METHODOLOGY/PRINCIPAL FINDINGS: We performed methylation-enriched genome-wide bisulfite sequencing of FL cells and normal CD19(+ B-cells using 454 sequencing technology. The methylated DNA fragments were enriched with methyl-binding proteins, treated with bisulfite, and sequenced using the Roche-454 GS FLX sequencer. The total number of bases covered in the human genome was 18.2 and 49.3 million including 726,003 and 1.3 million CpGs in FL and CD19(+ B-cells, respectively. 11,971 and 7,882 methylated regions of interest (MRIs were identified respectively. The genome-wide distribution of these MRIs displayed significant differences between FL and normal B-cells. A reverse trend in the distribution of MRIs between the promoter and the gene body was observed in FL and CD19(+ B-cells. The MRIs identified in FL cells also correlated well with transcriptomic data and ChIP-on-Chip analyses of genome-wide histone modifications such as tri-methyl-H3K27, and tri-methyl-H3K4, indicating a concerted epigenetic alteration in FL cells. CONCLUSIONS/SIGNIFICANCE: This study is the first to provide a large scale and comprehensive analysis of the DNA methylation sequence composition and distribution in the FL epigenome. These integrated approaches have led to the discovery of novel and frequent targets of aberrant epigenetic alterations. The genome-wide bisulfite sequencing approach developed here can be a useful tool for profiling DNA methylation in clinical samples.

  10. Detecting differential DNA methylation from sequencing of bisulfite converted DNA of diverse species.

    Science.gov (United States)

    Huh, Iksoo; Wu, Xin; Park, Taesung; Yi, Soojin V

    2017-07-21

    DNA methylation is one of the most extensively studied epigenetic modifications of genomic DNA. In recent years, sequencing of bisulfite-converted DNA, particularly via next-generation sequencing technologies, has become a widely popular method to study DNA methylation. This method can be readily applied to a variety of species, dramatically expanding the scope of DNA methylation studies beyond the traditionally studied human and mouse systems. In parallel to the increasing wealth of genomic methylation profiles, many statistical tools have been developed to detect differentially methylated loci (DMLs) or differentially methylated regions (DMRs) between biological conditions. We discuss and summarize several key properties of currently available tools to detect DMLs and DMRs from sequencing of bisulfite-converted DNA. However, the majority of the statistical tools developed for DML/DMR analyses have been validated using only mammalian data sets, and less priority has been placed on the analyses of invertebrate or plant DNA methylation data. We demonstrate that genomic methylation profiles of non-mammalian species are often highly distinct from those of mammalian species using examples of honey bees and humans. We then discuss how such differences in data properties may affect statistical analyses. Based on these differences, we provide three specific recommendations to improve the power and accuracy of DML and DMR analyses of invertebrate data when using currently available statistical tools. These considerations should facilitate systematic and robust analyses of DNA methylation from diverse species, thus advancing our understanding of DNA methylation. © The Author 2017. Published by Oxford University Press.

  11. GPU-BSM: a GPU-based tool to map bisulfite-treated reads.

    Directory of Open Access Journals (Sweden)

    Andrea Manconi

    Full Text Available Cytosine DNA methylation is an epigenetic mark implicated in several biological processes. Bisulfite treatment of DNA is acknowledged as the gold standard technique to study methylation. This technique introduces changes in the genomic DNA by converting cytosines to uracils while 5-methylcytosines remain nonreactive. During PCR amplification 5-methylcytosines are amplified as cytosine, whereas uracils and thymines as thymine. To detect the methylation levels, reads treated with the bisulfite must be aligned against a reference genome. Mapping these reads to a reference genome represents a significant computational challenge mainly due to the increased search space and the loss of information introduced by the treatment. To deal with this computational challenge we devised GPU-BSM, a tool based on modern Graphics Processing Units. Graphics Processing Units are hardware accelerators that are increasingly being used successfully to accelerate general-purpose scientific applications. GPU-BSM is a tool able to map bisulfite-treated reads from whole genome bisulfite sequencing and reduced representation bisulfite sequencing, and to estimate methylation levels, with the goal of detecting methylation. Due to the massive parallelization obtained by exploiting graphics cards, GPU-BSM aligns bisulfite-treated reads faster than other cutting-edge solutions, while outperforming most of them in terms of unique mapped reads.

  12. 21 CFR 573.620 - Menadione dimethylpyrimidinol bisulfite.

    Science.gov (United States)

    2010-04-01

    ... 21 Food and Drugs 6 2010-04-01 2010-04-01 false Menadione dimethylpyrimidinol bisulfite. 573.620... ANIMALS Food Additive Listing § 573.620 Menadione dimethylpyrimidinol bisulfite. The food additive, menadione dimethylpyrimidinol bisulfite, may be safely used in accordance with the following conditions: (a...

  13. Performances of Different Fragment Sizes for Reduced Representation Bisulfite Sequencing in Pigs.

    Science.gov (United States)

    Yuan, Xiao-Long; Zhang, Zhe; Pan, Rong-Yang; Gao, Ning; Deng, Xi; Li, Bin; Zhang, Hao; Sangild, Per Torp; Li, Jia-Qi

    2017-01-01

    Reduced representation bisulfite sequencing (RRBS) has been widely used to profile genome-scale DNA methylation in mammalian genomes. However, the applications and technical performances of RRBS with different fragment sizes have not been systematically reported in pigs, which serve as one of the important biomedical models for humans. The aims of this study were to evaluate capacities of RRBS libraries with different fragment sizes to characterize the porcine genome. We found that the Msp I-digested segments between 40 and 220 bp harbored a high distribution peak at 74 bp, which were highly overlapped with the repetitive elements and might reduce the unique mapping alignment. The RRBS library of 110-220 bp fragment size had the highest unique mapping alignment and the lowest multiple alignment. The cost-effectiveness of the 40-110 bp, 110-220 bp and 40-220 bp fragment sizes might decrease when the dataset size was more than 70, 50 and 110 million reads for these three fragment sizes, respectively. Given a 50-million dataset size, the average sequencing depth of the detected CpG sites in the 110-220 bp fragment size appeared to be deeper than in the 40-110 bp and 40-220 bp fragment sizes, and these detected CpG sties differently located in gene- and CpG island-related regions. In this study, our results demonstrated that selections of fragment sizes could affect the numbers and sequencing depth of detected CpG sites as well as the cost-efficiency. No single solution of RRBS is optimal in all circumstances for investigating genome-scale DNA methylation. This work provides the useful knowledge on designing and executing RRBS for investigating the genome-wide DNA methylation in tissues from pigs.

  14. Shotgun Bisulfite Sequencing of the Betula platyphylla Genome Reveals the Tree’s DNA Methylation Patterning

    Directory of Open Access Journals (Sweden)

    Chang Su

    2014-12-01

    Full Text Available DNA methylation plays a critical role in the regulation of gene expression. Most studies of DNA methylation have been performed in herbaceous plants, and little is known about the methylation patterns in tree genomes. In the present study, we generated a map of methylated cytosines at single base pair resolution for Betula platyphylla (white birch by bisulfite sequencing combined with transcriptomics to analyze DNA methylation and its effects on gene expression. We obtained a detailed view of the function of DNA methylation sequence composition and distribution in the genome of B. platyphylla. There are 34,460 genes in the whole genome of birch, and 31,297 genes are methylated. Conservatively, we estimated that 14.29% of genomic cytosines are methylcytosines in birch. Among the methylation sites, the CHH context accounts for 48.86%, and is the largest proportion. Combined transcriptome and methylation analysis showed that the genes with moderate methylation levels had higher expression levels than genes with high and low methylation. In addition, methylated genes are highly enriched for the GO subcategories of binding activities, catalytic activities, cellular processes, response to stimulus and cell death, suggesting that methylation mediates these pathways in birch trees.

  15. 21 CFR 573.625 - Menadione nicotinamide bisulfite.

    Science.gov (United States)

    2010-04-01

    ... 21 Food and Drugs 6 2010-04-01 2010-04-01 false Menadione nicotinamide bisulfite. 573.625 Section 573.625 Food and Drugs FOOD AND DRUG ADMINISTRATION, DEPARTMENT OF HEALTH AND HUMAN SERVICES... ANIMALS Food Additive Listing § 573.625 Menadione nicotinamide bisulfite. The food additive may be safely...

  16. Inhibition of phospholipase A2 from human plasma by sodium bisulfite

    International Nuclear Information System (INIS)

    Wiggins, C.W.; Franson, R.C.

    1987-01-01

    The anti-oxidant sodium bisulfite has been shown to inhibit acid active(lysosomal), non-Ca ++ -dependent phospholipase A 2 (PLA 2 ), and to interact reversibly with unsaturated fatty acids, altering their chromatographic mobility. The authors examined the effect of bisulfite on neutral active, Ca ++ -dependent PLA 2 from human plasma. Using [1- 14 C]oleate-labelled autoclaved E. coli as substrate, PLA 2 activity was inhibited in a dose-dependent manner by bisulfite. Maximal inhibition occurred at 100μM bisulfite. Preincubation of plasma for 0-30 minutes with bisulfite resulted in a time-dependent increase in PLA 2 inhibition. Preincubation of substrate with bisulfite had no such effect. When the plasma PLA 2 was purified 25-fold by SP-Sephadex chromatography it was no longer inhibited by bisulfite. The SP-Sephadex wash through fraction, which contained greater than 95% of the applied protein but not PLA 2 activity, did not inhibit the purified enzyme. When incubated with bisulfite however, the SP-wash through fraction produced dose-dependent inhibition of the purified enzyme. These results indicate that sodium bisulfite inhibits human plasma PLA 2 , in vitro, indirectly by interaction with a factor(s) present in plasma and suggests that anti-oxidants may similarly influence expression of extracellular PLA 2 in vivo

  17. Limitations of retarded (bisulfite) x-ray film processing

    International Nuclear Information System (INIS)

    Stoering, J.P.; Dittmore, C.

    1979-01-01

    We demonstrate the limitations of using retarded (bisulfite) developer to abate film sensitivity of x-ray films that have been exposed to intense radiation. We compared the measured densities of a large number of Kodak Type-M x-ray film samples exposed to a known fluence of monochromatic x-rays. These film samples were processed in three separate batches of bisulfite developer mixed in the same proportions. We concluded that reproducible film-density information cannot be obtained using different batches of (bisulfite) developer solutions

  18. Final report on the safety assessment of sodium sulfite, potassium sulfite, ammonium sulfite, sodium bisulfite, ammonium bisulfite, sodium metabisulfite and potassium metabisulfite.

    Science.gov (United States)

    Nair, Bindu; Elmore, Amy R

    2003-01-01

    Sodium Sulfite, Ammonium Sulfite, Sodium Bisulfite, Potassium Bisulfite, Ammonium Bisulfite, Sodium Metabisulfite, and Potassium Metabisulfite are inorganic salts that function as reducing agents in cosmetic formulations. All except Sodium Metabisulfite also function as hair-waving/straightening agents. In addition, Sodium Sulfite, Potassium Sulfite, Sodium Bisulfite, and Sodium Metabisulfite function as antioxidants. Although Ammonium Sulfite is not in current use, the others are widely used in hair care products. Sulfites that enter mammals via ingestion, inhalation, or injection are metabolized by sulfite oxidase to sulfate. In oral-dose animal toxicity studies, hyperplastic changes in the gastric mucosa were the most common findings at high doses. Ammonium Sulfite aerosol had an acute LC(50) of >400 mg/m(3) in guinea pigs. A single exposure to low concentrations of a Sodium Sulfite fine aerosol produced dose-related changes in the lung capacity parameters of guinea pigs. A 3-day exposure of rats to a Sodium Sulfite fine aerosol produced mild pulmonary edema and irritation of the tracheal epithelium. Severe epithelial changes were observed in dogs exposed for 290 days to 1 mg/m(3) of a Sodium Metabisulfite fine aerosol. These fine aerosols contained fine respirable particle sizes that are not found in cosmetic aerosols or pump sprays. None of the cosmetic product types, however, in which these ingredients are used are aerosolized. Sodium Bisulfite (tested at 38%) and Sodium Metabisulfite (undiluted) were not irritants to rabbits following occlusive exposures. Sodium Metabisulfite (tested at 50%) was irritating to guinea pigs following repeated exposure. In rats, Sodium Sulfite heptahydrate at large doses (up to 3.3 g/kg) produced fetal toxicity but not teratogenicity. Sodium Bisulfite, Sodium Metabisulfite, and Potassium Metabisulfite were not teratogenic for mice, rats, hamsters, or rabbits at doses up to 160 mg/kg. Generally, Sodium Sulfite, Sodium

  19. Mapping of the methylation pattern of the hMSH2 promoter in colon cancer, using bisulfite genomic sequencing

    Directory of Open Access Journals (Sweden)

    Zhang Hua

    2006-08-01

    Full Text Available Abstract The detailed methylation status of CpG sites in the promoter region of hMSH2 gene has yet not to be reported. We have mapped the complete methylation status of the hMSH2 promoter, a region that contains 75 CpG sites, using bisulfite genomic sequencing in 60 primary colorectal cancers. And the expression of hMSH2 was detected by immunohistochemistry. The hypermethylation of hMSH2 was detected in 18.33% (11/60 of tumor tissues. The protein of hMSH2 was detected in 41.67% (25/60 of tumor tissues. No hypermethylation of hMSH2 was detected in normal tissues. The protein of hMSH2 was detected in all normal tissues. Our study demonstrated that hMSH2 hypermethylation and protein expression were associated with the development of colorectal cancer.

  20. An information-theoretic approach to the modeling and analysis of whole-genome bisulfite sequencing data.

    Science.gov (United States)

    Jenkinson, Garrett; Abante, Jordi; Feinberg, Andrew P; Goutsias, John

    2018-03-07

    DNA methylation is a stable form of epigenetic memory used by cells to control gene expression. Whole genome bisulfite sequencing (WGBS) has emerged as a gold-standard experimental technique for studying DNA methylation by producing high resolution genome-wide methylation profiles. Statistical modeling and analysis is employed to computationally extract and quantify information from these profiles in an effort to identify regions of the genome that demonstrate crucial or aberrant epigenetic behavior. However, the performance of most currently available methods for methylation analysis is hampered by their inability to directly account for statistical dependencies between neighboring methylation sites, thus ignoring significant information available in WGBS reads. We present a powerful information-theoretic approach for genome-wide modeling and analysis of WGBS data based on the 1D Ising model of statistical physics. This approach takes into account correlations in methylation by utilizing a joint probability model that encapsulates all information available in WGBS methylation reads and produces accurate results even when applied on single WGBS samples with low coverage. Using the Shannon entropy, our approach provides a rigorous quantification of methylation stochasticity in individual WGBS samples genome-wide. Furthermore, it utilizes the Jensen-Shannon distance to evaluate differences in methylation distributions between a test and a reference sample. Differential performance assessment using simulated and real human lung normal/cancer data demonstrate a clear superiority of our approach over DSS, a recently proposed method for WGBS data analysis. Critically, these results demonstrate that marginal methods become statistically invalid when correlations are present in the data. This contribution demonstrates clear benefits and the necessity of modeling joint probability distributions of methylation using the 1D Ising model of statistical physics and of

  1. Bisulfite compounds as metabolic inhibitors: nonspecific effects on membranes

    Energy Technology Data Exchange (ETDEWEB)

    Luettge, U; Osmond, C B; Ball, E; Brinckmann, E; Kinze, G

    1972-01-01

    Bisulfite compounds are shown to be nonspecific inhibitors of photosynthetic processes and of ion transport in green tissues. CO/sub 2/ fixation and light-dependent transient changes in external pH are inhibited about 50% by 5 x 10/sup -4/M glyoxal-Na-bisulfite. Chloride uptake in the light and in the dark is inhibited to the same extent at this concentration. At 5 x 10/sup -3/M the inhibitor reduces ATP levels in the light and in the dark, and the effects on glycolate oxidase and PEP carboxylase are observed. The extent of inhibition is dependent on time of treatment with glyoxal-Na-bisulfite and freshly prepared NaHSO/sub 3/ is equally as effective as the addition compound. Possible explanations of the bisulfite effects and the relationships to SO/sub 2/ effects on photosynthesis are discussed.

  2. An effective colorimetric and ratiometric fluorescent probe for bisulfite in aqueous solution

    International Nuclear Information System (INIS)

    Dai, Xi; Zhang, Tao; Du, Zhi-Fang; Cao, Xiang-Jian; Chen, Ming-Yu; Hu, Sheng-Wen; Miao, Jun-Ying; Zhao, Bao-Xiang

    2015-01-01

    We have developed the first two-photon colorimetric and ratiometric fluorescent probe, BICO, for the detection of bisulfite (HSO 3 − ) in aqueous solution. The probe contains coumarin and benzimidazole moieties and can detect HSO 3 − based on the Michael addition reaction with a limit of detection 5.3 × 10 −8  M in phosphate-buffered saline solution. The probe was used to detect bisulfite in tap water, sugar and dry white wine. Moreover, test strips were made and used easily. We successfully applied the probe to image living cells, using one-photon fluorescence imaging. BICO overcomes the limitations in sensitivity of previously reported probes and the solvation effect of bisulfite, which demonstrates its excellent value in practical application. - Highlights: • A colorimetric and ratiometric fluorescent probe was developed. • The probe could detect bisulfite in PBS buffer solution and real samples. • Bisulfite test paper was made to naked-eye detect bisulfite. • This probe successfully used to living cell imaging in ratiometric manner

  3. An effective colorimetric and ratiometric fluorescent probe for bisulfite in aqueous solution

    Energy Technology Data Exchange (ETDEWEB)

    Dai, Xi [Institute of Organic Chemistry, School of Chemistry and Chemical Engineering, Shandong University, Jinan 250100 (China); Zhang, Tao [Institute of Developmental Biology, School of Life Science, Shandong University, Jinan 250100 (China); Du, Zhi-Fang; Cao, Xiang-Jian; Chen, Ming-Yu [Institute of Organic Chemistry, School of Chemistry and Chemical Engineering, Shandong University, Jinan 250100 (China); Taishan College, Shandong University, Jinan 250100 (China); Hu, Sheng-Wen [Institute of Organic Chemistry, School of Chemistry and Chemical Engineering, Shandong University, Jinan 250100 (China); Miao, Jun-Ying, E-mail: miaojy@sdu.edu.cn [Institute of Developmental Biology, School of Life Science, Shandong University, Jinan 250100 (China); Zhao, Bao-Xiang, E-mail: bxzhao@sdu.edu.cn [Institute of Organic Chemistry, School of Chemistry and Chemical Engineering, Shandong University, Jinan 250100 (China)

    2015-08-12

    We have developed the first two-photon colorimetric and ratiometric fluorescent probe, BICO, for the detection of bisulfite (HSO{sub 3}{sup −}) in aqueous solution. The probe contains coumarin and benzimidazole moieties and can detect HSO{sub 3}{sup −} based on the Michael addition reaction with a limit of detection 5.3 × 10{sup −8} M in phosphate-buffered saline solution. The probe was used to detect bisulfite in tap water, sugar and dry white wine. Moreover, test strips were made and used easily. We successfully applied the probe to image living cells, using one-photon fluorescence imaging. BICO overcomes the limitations in sensitivity of previously reported probes and the solvation effect of bisulfite, which demonstrates its excellent value in practical application. - Highlights: • A colorimetric and ratiometric fluorescent probe was developed. • The probe could detect bisulfite in PBS buffer solution and real samples. • Bisulfite test paper was made to naked-eye detect bisulfite. • This probe successfully used to living cell imaging in ratiometric manner.

  4. Kinetics of oxygen exchange between bisulfite ion and water as studied by oxygen-17 nuclear magnetic resonance spectroscopy

    International Nuclear Information System (INIS)

    Horner, D.A.

    1984-08-01

    The nuclear magnetic relaxation times of oxygen-17 have been measured in aqueous sodium bisulfite solutions in the pH range from 2.5 to 5 as a function of temperature, pH, and S(IV) concentration, at an ionic strength of 1.0 m. The rate law for oxygen exchange between bisulfite ion and water was obtained from an analysis of the data, and is consistent with oxygen exchange occurring via the reaction SO 2 + H 2 O right reversible H + + SHO 3 - . The value of k/sub -1/ is in agreement with relaxation measurements. Direct spectroscopic evidence was found for the existence of two isomers of bisulfite ion: one with the proton bonded to the sulfur (HSO 3 - ) and the other with the proton bonded to an oxygen (SO 3 H - ). (The symbol SHO 3 - in the above chemical equation refers to both isomeric forms of bisulfite ion.) The relative amounts of the two isomers were determined as a function of temperature, and the rate and mechanism of oxygen exchange between the two was investigated. One of the two isomers, presumably SO 3 H - , exchanges oxygens with water much more rapidly than does the other. A two-pulse sequence was developed which greatly diminished the solvent peak in the NMR spectrum

  5. Development of a Method to Implement Whole-Genome Bisulfite Sequencing of cfDNA from Cancer Patients and a Mouse Tumor Model

    Directory of Open Access Journals (Sweden)

    Elaine C. Maggi

    2018-01-01

    Full Text Available The goal of this study was to develop a method for whole genome cell-free DNA (cfDNA methylation analysis in humans and mice with the ultimate goal to facilitate the identification of tumor derived DNA methylation changes in the blood. Plasma or serum from patients with pancreatic neuroendocrine tumors or lung cancer, and plasma from a murine model of pancreatic adenocarcinoma was used to develop a protocol for cfDNA isolation, library preparation and whole-genome bisulfite sequencing of ultra low quantities of cfDNA, including tumor-specific DNA. The protocol developed produced high quality libraries consistently generating a conversion rate >98% that will be applicable for the analysis of human and mouse plasma or serum to detect tumor-derived changes in DNA methylation.

  6. Exploring the relationship between sequence similarity and accurate phylogenetic trees.

    Science.gov (United States)

    Cantarel, Brandi L; Morrison, Hilary G; Pearson, William

    2006-11-01

    We have characterized the relationship between accurate phylogenetic reconstruction and sequence similarity, testing whether high levels of sequence similarity can consistently produce accurate evolutionary trees. We generated protein families with known phylogenies using a modified version of the PAML/EVOLVER program that produces insertions and deletions as well as substitutions. Protein families were evolved over a range of 100-400 point accepted mutations; at these distances 63% of the families shared significant sequence similarity. Protein families were evolved using balanced and unbalanced trees, with ancient or recent radiations. In families sharing statistically significant similarity, about 60% of multiple sequence alignments were 95% identical to true alignments. To compare recovered topologies with true topologies, we used a score that reflects the fraction of clades that were correctly clustered. As expected, the accuracy of the phylogenies was greatest in the least divergent families. About 88% of phylogenies clustered over 80% of clades in families that shared significant sequence similarity, using Bayesian, parsimony, distance, and maximum likelihood methods. However, for protein families with short ancient branches (ancient radiation), only 30% of the most divergent (but statistically significant) families produced accurate phylogenies, and only about 70% of the second most highly conserved families, with median expectation values better than 10(-60), produced accurate trees. These values represent upper bounds on expected tree accuracy for sequences with a simple divergence history; proteins from 700 Giardia families, with a similar range of sequence similarities but considerably more gaps, produced much less accurate trees. For our simulated insertions and deletions, correct multiple sequence alignments did not perform much better than those produced by T-COFFEE, and including sequences with expressed sequence tag-like sequencing errors did not

  7. An optimized rapid bisulfite conversion method with high recovery of cell-free DNA.

    Science.gov (United States)

    Yi, Shaohua; Long, Fei; Cheng, Juanbo; Huang, Daixin

    2017-12-19

    Methylation analysis of cell-free DNA is a encouraging tool for tumor diagnosis, monitoring and prognosis. Sensitivity of methylation analysis is a very important matter due to the tiny amounts of cell-free DNA available in plasma. Most current methods of DNA methylation analysis are based on the difference of bisulfite-mediated deamination of cytosine between cytosine and 5-methylcytosine. However, the recovery of bisulfite-converted DNA based on current methods is very poor for the methylation analysis of cell-free DNA. We optimized a rapid method for the crucial steps of bisulfite conversion with high recovery of cell-free DNA. A rapid deamination step and alkaline desulfonation was combined with the purification of DNA on a silica column. The conversion efficiency and recovery of bisulfite-treated DNA was investigated by the droplet digital PCR. The optimization of the reaction results in complete cytosine conversion in 30 min at 70 °C and about 65% of recovery of bisulfite-treated cell-free DNA, which is higher than current methods. The method allows high recovery from low levels of bisulfite-treated cell-free DNA, enhancing the analysis sensitivity of methylation detection from cell-free DNA.

  8. A colorimetric and fluorogenic probe for bisulfite using benzopyrylium as the recognition unit.

    Science.gov (United States)

    Zhang, Yun; Zhang, Xiangwen; Yang, Xiao-Feng; Zhang, Juan

    2017-11-01

    A coumarin-benzopyrylium (CB) platform has been developed for the colorimetric and fluorogenic detection of bisulfite. The proposed probe utilizes coumarin as the fluorophore and positively charged benzopyrylium as the reaction site. The method employs the nucleophilic addition of bisulfite to the benzopyrylium moiety of CB to inactivate the electron-deficient oxonium ion. The driving force for photo-induced electron transfer is considerably diminished, thereby promoting the emission intensity of the coumarin fluorophore. The fluorescence intensity at 510 nm is linear with bisulfite concentration over a range of 0.2-7.5 μM with a detection limit of 42 nM (3δ). CB shows a rapid response (within 30 s) and high selectivity and sensitivity for bisulfite. Preliminary studies show that CB has great potential for bisulfite detection in real samples and in living cells. Copyright © 2017 John Wiley & Sons, Ltd.

  9. Disclosing bias in bisulfite assay: MethPrimers underestimate high DNA methylation.

    Directory of Open Access Journals (Sweden)

    Andrea Fuso

    Full Text Available Discordant results obtained in bisulfite assays using MethPrimers (PCR primers designed using MethPrimer software or assuming that non-CpGs cytosines are non methylated versus primers insensitive to cytosine methylation lead us to hypothesize a technical bias. We therefore used the two kinds of primers to study different experimental models and methylation statuses. We demonstrated that MethPrimers negatively select hypermethylated DNA sequences in the PCR step of the bisulfite assay, resulting in CpG methylation underestimation and non-CpG methylation masking, failing to evidence differential methylation statuses. We also describe the characteristics of "Methylation-Insensitive Primers" (MIPs, having degenerated bases (G/A to cope with the uncertain C/U conversion. As CpG and non-CpG DNA methylation patterns are largely variable depending on the species, developmental stage, tissue and cell type, a variable extent of the bias is expected. The more the methylome is methylated, the greater is the extent of the bias, with a prevalent effect of non-CpG methylation. These findings suggest a revision of several DNA methylation patterns so far documented and also point out the necessity of applying unbiased analyses to the increasing number of epigenomic studies.

  10. CpG location and methylation level are crucial factors for the early detection of oral squamous cell carcinoma in brushing samples using bisulfite sequencing of a 13-gene panel.

    Science.gov (United States)

    Morandi, Luca; Gissi, Davide; Tarsitano, Achille; Asioli, Sofia; Gabusi, Andrea; Marchetti, Claudio; Montebugnoli, Lucio; Foschini, Maria Pia

    2017-01-01

    Oral squamous cell carcinoma (OSCC) is usually diagnosed at an advanced stage and is commonly preceded by oral premalignant lesions. The mortality rates have remained unchanged (50% within 5 years after diagnosis), and it is related to tobacco smoking and alcohol intake. Novel molecular markers for early diagnosis are urgently needed. The purpose of this study was to evaluate the diagnostic value of methylation level in a set of 18 genes by bisulfite next-generation sequencing. With minimally invasive oral brushing, 28 consecutive OSCC, one squamous cell carcinoma with sarcomatoid features, six high-grade squamous intraepithelial lesions (HGSIL), 30 normal contralateral mucosa from the same patients, and 65 healthy donors were evaluated for DNA methylation analyzing 18 target genes by quantitative bisulfite next-generation sequencing. We further evaluated an independent cohort (validation dataset) made of 20 normal donors, one oral fibroma, 14 oral lichen planus (OLP), three proliferative verrucous leukoplakia (PVL), and two OSCC. Comparing OSCC with normal healthy donors and contralateral mucosa in 355 CpGs, we identified the following epigenetically altered genes: ZAP70 , ITGA4 , KIF1A , PARP15 , EPHX3 , NTM , LRRTM1 , FLI1 , MIR193 , LINC00599 , PAX1 , and MIR137HG showing hypermethylation and MIR296 , TERT , and GP1BB showing hypomethylation . The behavior of ZAP70 , GP1BB , H19 , EPHX3 , and MIR193 fluctuated among different interrogated CpGs. The gap between normal and OSCC samples remained mostly the same (Kruskal-Wallis P values < 0.05), but the absolute values changed conspicuously. ROC curve analysis identified the most informative CpGs, and we correctly stratified OSCC and HGSIL from normal donors using a multiclass linear discriminant analysis in a 13-gene panel (AUC 0.981). Only the OSCC with sarcomatoid features was negative. Three contralateral mucosa were positive, a sign of a possible field cancerization. Among imprinted genes, only MIR296

  11. Arbitrarily accurate twin composite π -pulse sequences

    Science.gov (United States)

    Torosov, Boyan T.; Vitanov, Nikolay V.

    2018-04-01

    We present three classes of symmetric broadband composite pulse sequences. The composite phases are given by analytic formulas (rational fractions of π ) valid for any number of constituent pulses. The transition probability is expressed by simple analytic formulas and the order of pulse area error compensation grows linearly with the number of pulses. Therefore, any desired compensation order can be produced by an appropriate composite sequence; in this sense, they are arbitrarily accurate. These composite pulses perform equally well as or better than previously published ones. Moreover, the current sequences are more flexible as they allow total pulse areas of arbitrary integer multiples of π .

  12. High glucose recovery from direct enzymatic hydrolysis of bisulfite-pretreatment on non-detoxified furfural residues.

    Science.gov (United States)

    Xing, Yang; Bu, Lingxi; Sun, Dafeng; Liu, Zhiping; Liu, Shijie; Jiang, Jianxin

    2015-10-01

    This study reports four schemes to pretreat wet furfural residues (FRs) with sodium bisulfite for production of fermentable sugar. The results showed that non-detoxified FRs (pH 2-3) had great potential to lower the cost of bioconversion. The optimal process was that unwashed FRs were first pretreated with bisulfite, and the whole slurry was then directly used for enzymatic hydrolysis. A maximum glucose yield of 99.4% was achieved from substrates pretreated with 0.1 g NaHSO3/g dry substrate (DS), at a relatively low temperature of 100 °C for 3 h. Compared with raw material, enzymatic hydrolysis at a high-solid of 16.5% (w/w) specifically showed more excellent performance with bisulfite treated FRs. Direct bisulfite pretreatment improved the accessibility of substrates and the total glucose recovery. Lignosulfonate in the non-detoxified slurry decreased the non-productive adsorption of cellulase on the substrate, thus improving enzymatic hydrolysis. Copyright © 2015 Elsevier Ltd. All rights reserved.

  13. Validation of a stability-indicating hydrophilic interaction liquid chromatographic method for the quantitative determination of vitamin k3 (menadione sodium bisulfite) in injectable solution formulation.

    Science.gov (United States)

    Ghanem, Mashhour M; Abu-Lafi, Saleh A; Hallak, Hussein O

    2013-01-01

    A simple, specific, accurate, and stability-indicating method was developed and validated for the quantitative determination of menadione sodium bisulfite in the injectable solution formulation. The method is based on zwitterionic hydrophilic interaction liquid chromatography (ZIC-HILIC) coupled with a photodiode array detector. The desired separation was achieved on the ZIC-HILIC column (250 mm × 4.6 mm, 5 μm) at 25°C temperature. The optimized mobile phase consisted of an isocratic solvent mixture of 200mM ammonium acetate (NH4AC) solution and acetonitrile (ACN) (20:80; v/v) pH-adjusted to 5.7 by glacial acetic acid. The mobile phase was fixed at 0.5 ml/min and the analytes were monitored at 261 nm using a photodiode array detector. The effects of the chromatographic conditions on the peak retention, peak USP tailing factor, and column efficiency were systematically optimized. Forced degradation experiments were carried out by exposing menadione sodium bisulfite standard and the injectable solution formulation to thermal, photolytic, oxidative, and acid-base hydrolytic stress conditions. The degradation products were well-resolved from the main peak and the excipients, thus proving that the method is a reliable, stability-indicating tool. The method was validated as per ICH and USP guidelines (USP34/NF29) and found to be adequate for the routine quantitative estimation of menadione sodium bisulfite in commercially available menadione sodium bisulfite injectable solution dosage forms.

  14. Accurate CpG and non-CpG cytosine methylation analysis by high-throughput locus-specific pyrosequencing in plants.

    Science.gov (United States)

    How-Kit, Alexandre; Daunay, Antoine; Mazaleyrat, Nicolas; Busato, Florence; Daviaud, Christian; Teyssier, Emeline; Deleuze, Jean-François; Gallusci, Philippe; Tost, Jörg

    2015-07-01

    Pyrosequencing permits accurate quantification of DNA methylation of specific regions where the proportions of the C/T polymorphism induced by sodium bisulfite treatment of DNA reflects the DNA methylation level. The commercially available high-throughput locus-specific pyrosequencing instruments allow for the simultaneous analysis of 96 samples, but restrict the DNA methylation analysis to CpG dinucleotide sites, which can be limiting in many biological systems. In contrast to mammals where DNA methylation occurs nearly exclusively on CpG dinucleotides, plants genomes harbor DNA methylation also in other sequence contexts including CHG and CHH motives, which cannot be evaluated by these pyrosequencing instruments due to software limitations. Here, we present a complete pipeline for accurate CpG and non-CpG cytosine methylation analysis at single base-resolution using high-throughput locus-specific pyrosequencing. The devised approach includes the design and validation of PCR amplification on bisulfite-treated DNA and pyrosequencing assays as well as the quantification of the methylation level at every cytosine from the raw peak intensities of the Pyrograms by two newly developed Visual Basic Applications. Our method presents accurate and reproducible results as exemplified by the cytosine methylation analysis of the promoter regions of two Tomato genes (NOR and CNR) encoding transcription regulators of fruit ripening during different stages of fruit development. Our results confirmed a significant and temporally coordinated loss of DNA methylation on specific cytosines during the early stages of fruit development in both promoters as previously shown by WGBS. The manuscript describes thus the first high-throughput locus-specific DNA methylation analysis in plants using pyrosequencing.

  15. Highly accurate fluorogenic DNA sequencing with information theory-based error correction.

    Science.gov (United States)

    Chen, Zitian; Zhou, Wenxiong; Qiao, Shuo; Kang, Li; Duan, Haifeng; Xie, X Sunney; Huang, Yanyi

    2017-12-01

    Eliminating errors in next-generation DNA sequencing has proved challenging. Here we present error-correction code (ECC) sequencing, a method to greatly improve sequencing accuracy by combining fluorogenic sequencing-by-synthesis (SBS) with an information theory-based error-correction algorithm. ECC embeds redundancy in sequencing reads by creating three orthogonal degenerate sequences, generated by alternate dual-base reactions. This is similar to encoding and decoding strategies that have proved effective in detecting and correcting errors in information communication and storage. We show that, when combined with a fluorogenic SBS chemistry with raw accuracy of 98.1%, ECC sequencing provides single-end, error-free sequences up to 200 bp. ECC approaches should enable accurate identification of extremely rare genomic variations in various applications in biology and medicine.

  16. Rapid identification of sequences for orphan enzymes to power accurate protein annotation.

    Directory of Open Access Journals (Sweden)

    Kevin R Ramkissoon

    Full Text Available The power of genome sequencing depends on the ability to understand what those genes and their proteins products actually do. The automated methods used to assign functions to putative proteins in newly sequenced organisms are limited by the size of our library of proteins with both known function and sequence. Unfortunately this library grows slowly, lagging well behind the rapid increase in novel protein sequences produced by modern genome sequencing methods. One potential source for rapidly expanding this functional library is the "back catalog" of enzymology--"orphan enzymes," those enzymes that have been characterized and yet lack any associated sequence. There are hundreds of orphan enzymes in the Enzyme Commission (EC database alone. In this study, we demonstrate how this orphan enzyme "back catalog" is a fertile source for rapidly advancing the state of protein annotation. Starting from three orphan enzyme samples, we applied mass-spectrometry based analysis and computational methods (including sequence similarity networks, sequence and structural alignments, and operon context analysis to rapidly identify the specific sequence for each orphan while avoiding the most time- and labor-intensive aspects of typical sequence identifications. We then used these three new sequences to more accurately predict the catalytic function of 385 previously uncharacterized or misannotated proteins. We expect that this kind of rapid sequence identification could be efficiently applied on a larger scale to make enzymology's "back catalog" another powerful tool to drive accurate genome annotation.

  17. Rapid Identification of Sequences for Orphan Enzymes to Power Accurate Protein Annotation

    Science.gov (United States)

    Ojha, Sunil; Watson, Douglas S.; Bomar, Martha G.; Galande, Amit K.; Shearer, Alexander G.

    2013-01-01

    The power of genome sequencing depends on the ability to understand what those genes and their proteins products actually do. The automated methods used to assign functions to putative proteins in newly sequenced organisms are limited by the size of our library of proteins with both known function and sequence. Unfortunately this library grows slowly, lagging well behind the rapid increase in novel protein sequences produced by modern genome sequencing methods. One potential source for rapidly expanding this functional library is the “back catalog” of enzymology – “orphan enzymes,” those enzymes that have been characterized and yet lack any associated sequence. There are hundreds of orphan enzymes in the Enzyme Commission (EC) database alone. In this study, we demonstrate how this orphan enzyme “back catalog” is a fertile source for rapidly advancing the state of protein annotation. Starting from three orphan enzyme samples, we applied mass-spectrometry based analysis and computational methods (including sequence similarity networks, sequence and structural alignments, and operon context analysis) to rapidly identify the specific sequence for each orphan while avoiding the most time- and labor-intensive aspects of typical sequence identifications. We then used these three new sequences to more accurately predict the catalytic function of 385 previously uncharacterized or misannotated proteins. We expect that this kind of rapid sequence identification could be efficiently applied on a larger scale to make enzymology’s “back catalog” another powerful tool to drive accurate genome annotation. PMID:24386392

  18. Flexible, fast and accurate sequence alignment profiling on GPGPU with PaSWAS

    NARCIS (Netherlands)

    Warris, S.; Yalcin, F.; Jackson, K.J.; Nap, J.P.H.

    2015-01-01

    Motivation To obtain large-scale sequence alignments in a fast and flexible way is an important step in the analyses of next generation sequencing data. Applications based on the Smith-Waterman (SW) algorithm are often either not fast enough, limited to dedicated tasks or not sufficiently accurate

  19. A novel fluorescent turn-on probe for bisulfite based on NBD ...

    Indian Academy of Sciences (India)

    College of Sciences, Henan Agricultural University, Zhengzhou 450002, P. R. China e-mail: phxie2013@163. ... used as an additive in foods, beverages and pharmaceu- tical products. ... of bisulfite is of great importance for food safety and.

  20. Influence of 2,3-diphosphoglycerate metabolism on sodium-potassium permeability in human red blood cells: studies with bisulfite and other redox agents

    Science.gov (United States)

    Parker, John C.

    1969-01-01

    It is known that bisulfite ions can selectively deplete red blood cells of 2,3-diphosphoglycerate (2,3-DPG). Studies of the effects of bisulfite on sodium-potassium permeability and metabolism were undertaken to clarify the physiologic role of the abundant quantities of 2,3-DPG in human erythrocytes. Treatment of cells with bisulfite results in a reversible increase in the passive permeability to Na and K ions. Metabolism of glucose to lactate is increased, with a rise in the intracellular ratio of fructose diphosphate to hexose monophosphate. Cell 2,3-DPG is quantitatively converted to pyruvate and inorganic phosphate. The permeability effects of bisulfite are countered by ethacrynic acid and by such oxidizing agents as pyruvate and methylene blue. Taken together, the results suggest that the effects on Na-K flux of bisulfite are related more to the reducing potential of this anion than to its capacity to deplete cells of 2,3-DPG. PMID:5765015

  1. Kinetics, mechanism and thermodynamics of bisulfite-aldehyde adduct formation

    Energy Technology Data Exchange (ETDEWEB)

    Olson, T.M.; Boyce, S.D.; Hoffmann, M.R.

    1986-04-01

    The kinetics and mechanism of bisulfite addition to benzaldehyde were studied at low pH in order to assess the importance of this reaction in stabilizing S(IV) in fog-, cloud-, and rainwater. Previously, the authors established that appreciable concentrations of the formaldehyde-bisulfite adduct (HMSA) are often present in fogwater. Measured HMSA concentrations in fogwater often do not fully account for observed excess S(IV) concentrations, however, so that other S(IV)-aldehyde adducts may be present. Reaction rates were determined by monitoring the disappearance of benzaldehyde by U.V. spectrophotometry under pseudo-first order conditions, (S(IV))/sub T/ >>(phi-CHO)/sub T/, in the pH range 0 - 4.4 at 25/sup 0/C. The equilibrium constant was determined by dissolving the sodium salt of the addition compound in a solution adjusted to pH 3.9, and measuring the absorbance of the equilibrated solution at 250 nm. A literature value of the extinction coefficient for benzaldehyde was used to calculate the concentration of free benzaldehyde. All solutions were prepared under an N/sub 2/ atmosphere using deoxygenated, deionized water and ionic strength was maintained at 1.0 M with sodium chloride.

  2. Accurate Local-Ancestry Inference in Exome-Sequenced Admixed Individuals via Off-Target Sequence Reads

    Science.gov (United States)

    Hu, Youna; Willer, Cristen; Zhan, Xiaowei; Kang, Hyun Min; Abecasis, Gonçalo R.

    2013-01-01

    Estimates of the ancestry of specific chromosomal regions in admixed individuals are useful for studies of human evolutionary history and for genetic association studies. Previously, this ancestry inference relied on high-quality genotypes from genome-wide association study (GWAS) arrays. These high-quality genotypes are not always available when samples are exome sequenced, and exome sequencing is the strategy of choice for many ongoing genetic studies. Here we show that off-target reads generated during exome-sequencing experiments can be combined with on-target reads to accurately estimate the ancestry of each chromosomal segment in an admixed individual. To reconstruct local ancestry, our method SEQMIX models aligned bases directly instead of relying on hard genotype calls. We evaluate the accuracy of our method through simulations and analysis of samples sequenced by the 1000 Genomes Project and the NHLBI Grand Opportunity Exome Sequencing Project. In African Americans, we show that local-ancestry estimates derived by our method are very similar to those derived with Illumina’s Omni 2.5M genotyping array and much improved in relation to estimates that use only exome genotypes and ignore off-target sequencing reads. Software implementing this method, SEQMIX, can be applied to analysis of human population history or used for genetic association studies in admixed individuals. PMID:24210252

  3. Accurate RNA consensus sequencing for high-fidelity detection of transcriptional mutagenesis-induced epimutations.

    Science.gov (United States)

    Reid-Bayliss, Kate S; Loeb, Lawrence A

    2017-08-29

    Transcriptional mutagenesis (TM) due to misincorporation during RNA transcription can result in mutant RNAs, or epimutations, that generate proteins with altered properties. TM has long been hypothesized to play a role in aging, cancer, and viral and bacterial evolution. However, inadequate methodologies have limited progress in elucidating a causal association. We present a high-throughput, highly accurate RNA sequencing method to measure epimutations with single-molecule sensitivity. Accurate RNA consensus sequencing (ARC-seq) uniquely combines RNA barcoding and generation of multiple cDNA copies per RNA molecule to eliminate errors introduced during cDNA synthesis, PCR, and sequencing. The stringency of ARC-seq can be scaled to accommodate the quality of input RNAs. We apply ARC-seq to directly assess transcriptome-wide epimutations resulting from RNA polymerase mutants and oxidative stress.

  4. Immunocytochemical localization of APS reductase and bisulfite reductase in three Desulfovibrio species

    NARCIS (Netherlands)

    Kremer, D.R.; Veenhuis, M.; Fauque, G.; Peck Jr., H.D.; LeGall, J.; Lampreia, J.; Moura, J.J.G.; Hansen, T.A.

    1988-01-01

    The localization of APS reductase and bisulfite reductase in Desulfovibrio gigas, D. vulgaris Hildenborough and D. thermophilus was studied by immunoelectron microscopy. Polyclonal antibodies were raised against the purified enzymes from each strain. Cells fixed with formaldehyde/glutaraldehyde were

  5. Easy and accurate reconstruction of whole HIV genomes from short-read sequence data with shiver

    Science.gov (United States)

    Blanquart, François; Golubchik, Tanya; Gall, Astrid; Bakker, Margreet; Bezemer, Daniela; Croucher, Nicholas J; Hall, Matthew; Hillebregt, Mariska; Ratmann, Oliver; Albert, Jan; Bannert, Norbert; Fellay, Jacques; Fransen, Katrien; Gourlay, Annabelle; Grabowski, M Kate; Gunsenheimer-Bartmeyer, Barbara; Günthard, Huldrych F; Kivelä, Pia; Kouyos, Roger; Laeyendecker, Oliver; Liitsola, Kirsi; Meyer, Laurence; Porter, Kholoud; Ristola, Matti; van Sighem, Ard; Cornelissen, Marion; Kellam, Paul; Reiss, Peter

    2018-01-01

    Abstract Studying the evolution of viruses and their molecular epidemiology relies on accurate viral sequence data, so that small differences between similar viruses can be meaningfully interpreted. Despite its higher throughput and more detailed minority variant data, next-generation sequencing has yet to be widely adopted for HIV. The difficulty of accurately reconstructing the consensus sequence of a quasispecies from reads (short fragments of DNA) in the presence of large between- and within-host diversity, including frequent indels, may have presented a barrier. In particular, mapping (aligning) reads to a reference sequence leads to biased loss of information; this bias can distort epidemiological and evolutionary conclusions. De novo assembly avoids this bias by aligning the reads to themselves, producing a set of sequences called contigs. However contigs provide only a partial summary of the reads, misassembly may result in their having an incorrect structure, and no information is available at parts of the genome where contigs could not be assembled. To address these problems we developed the tool shiver to pre-process reads for quality and contamination, then map them to a reference tailored to the sample using corrected contigs supplemented with the user’s choice of existing reference sequences. Run with two commands per sample, it can easily be used for large heterogeneous data sets. We used shiver to reconstruct the consensus sequence and minority variant information from paired-end short-read whole-genome data produced with the Illumina platform, for sixty-five existing publicly available samples and fifty new samples. We show the systematic superiority of mapping to shiver’s constructed reference compared with mapping the same reads to the closest of 3,249 real references: median values of 13 bases called differently and more accurately, 0 bases called differently and less accurately, and 205 bases of missing sequence recovered. We also

  6. An accurate clone-based haplotyping method by overlapping pool sequencing.

    Science.gov (United States)

    Li, Cheng; Cao, Changchang; Tu, Jing; Sun, Xiao

    2016-07-08

    Chromosome-long haplotyping of human genomes is important to identify genetic variants with differing gene expression, in human evolution studies, clinical diagnosis, and other biological and medical fields. Although several methods have realized haplotyping based on sequencing technologies or population statistics, accuracy and cost are factors that prohibit their wide use. Borrowing ideas from group testing theories, we proposed a clone-based haplotyping method by overlapping pool sequencing. The clones from a single individual were pooled combinatorially and then sequenced. According to the distinct pooling pattern for each clone in the overlapping pool sequencing, alleles for the recovered variants could be assigned to their original clones precisely. Subsequently, the clone sequences could be reconstructed by linking these alleles accordingly and assembling them into haplotypes with high accuracy. To verify the utility of our method, we constructed 130 110 clones in silico for the individual NA12878 and simulated the pooling and sequencing process. Ultimately, 99.9% of variants on chromosome 1 that were covered by clones from both parental chromosomes were recovered correctly, and 112 haplotype contigs were assembled with an N50 length of 3.4 Mb and no switch errors. A comparison with current clone-based haplotyping methods indicated our method was more accurate. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.

  7. SATe-II: very fast and accurate simultaneous estimation of multiple sequence alignments and phylogenetic trees.

    Science.gov (United States)

    Liu, Kevin; Warnow, Tandy J; Holder, Mark T; Nelesen, Serita M; Yu, Jiaye; Stamatakis, Alexandros P; Linder, C Randal

    2012-01-01

    Highly accurate estimation of phylogenetic trees for large data sets is difficult, in part because multiple sequence alignments must be accurate for phylogeny estimation methods to be accurate. Coestimation of alignments and trees has been attempted but currently only SATé estimates reasonably accurate trees and alignments for large data sets in practical time frames (Liu K., Raghavan S., Nelesen S., Linder C.R., Warnow T. 2009b. Rapid and accurate large-scale coestimation of sequence alignments and phylogenetic trees. Science. 324:1561-1564). Here, we present a modification to the original SATé algorithm that improves upon SATé (which we now call SATé-I) in terms of speed and of phylogenetic and alignment accuracy. SATé-II uses a different divide-and-conquer strategy than SATé-I and so produces smaller more closely related subsets than SATé-I; as a result, SATé-II produces more accurate alignments and trees, can analyze larger data sets, and runs more efficiently than SATé-I. Generally, SATé is a metamethod that takes an existing multiple sequence alignment method as an input parameter and boosts the quality of that alignment method. SATé-II-boosted alignment methods are significantly more accurate than their unboosted versions, and trees based upon these improved alignments are more accurate than trees based upon the original alignments. Because SATé-I used maximum likelihood (ML) methods that treat gaps as missing data to estimate trees and because we found a correlation between the quality of tree/alignment pairs and ML scores, we explored the degree to which SATé's performance depends on using ML with gaps treated as missing data to determine the best tree/alignment pair. We present two lines of evidence that using ML with gaps treated as missing data to optimize the alignment and tree produces very poor results. First, we show that the optimization problem where a set of unaligned DNA sequences is given and the output is the tree and alignment of

  8. Flexible, fast and accurate sequence alignment profiling on GPGPU with PaSWAS.

    Directory of Open Access Journals (Sweden)

    Sven Warris

    Full Text Available To obtain large-scale sequence alignments in a fast and flexible way is an important step in the analyses of next generation sequencing data. Applications based on the Smith-Waterman (SW algorithm are often either not fast enough, limited to dedicated tasks or not sufficiently accurate due to statistical issues. Current SW implementations that run on graphics hardware do not report the alignment details necessary for further analysis.With the Parallel SW Alignment Software (PaSWAS it is possible (a to have easy access to the computational power of NVIDIA-based general purpose graphics processing units (GPGPUs to perform high-speed sequence alignments, and (b retrieve relevant information such as score, number of gaps and mismatches. The software reports multiple hits per alignment. The added value of the new SW implementation is demonstrated with two test cases: (1 tag recovery in next generation sequence data and (2 isotype assignment within an immunoglobulin 454 sequence data set. Both cases show the usability and versatility of the new parallel Smith-Waterman implementation.

  9. Flexible, fast and accurate sequence alignment profiling on GPGPU with PaSWAS.

    Science.gov (United States)

    Warris, Sven; Yalcin, Feyruz; Jackson, Katherine J L; Nap, Jan Peter

    2015-01-01

    To obtain large-scale sequence alignments in a fast and flexible way is an important step in the analyses of next generation sequencing data. Applications based on the Smith-Waterman (SW) algorithm are often either not fast enough, limited to dedicated tasks or not sufficiently accurate due to statistical issues. Current SW implementations that run on graphics hardware do not report the alignment details necessary for further analysis. With the Parallel SW Alignment Software (PaSWAS) it is possible (a) to have easy access to the computational power of NVIDIA-based general purpose graphics processing units (GPGPUs) to perform high-speed sequence alignments, and (b) retrieve relevant information such as score, number of gaps and mismatches. The software reports multiple hits per alignment. The added value of the new SW implementation is demonstrated with two test cases: (1) tag recovery in next generation sequence data and (2) isotype assignment within an immunoglobulin 454 sequence data set. Both cases show the usability and versatility of the new parallel Smith-Waterman implementation.

  10. Metagenomic analysis indicates Epsilonproteobacteria as a potential cause of microbial corrosion in pipelines injected with bisulfite

    Directory of Open Access Journals (Sweden)

    Dongshan eAn

    2016-01-01

    Full Text Available Sodium bisulfite (SBS is used as an oxygen scavenger to decrease corrosion in pipelines transporting brackish subsurface water used in the production of bitumen by steam-assisted gravity drainage. Sequencing 16S rRNA gene amplicons has indicated that SBS addition increased the fraction of the sulfate-reducing bacteria (SRB Desulfomicrobium, as well as of Desulfocapsa, which can also grow by disproportionating sulfite into sulfide, sulfur and sulfate. SRB use cathodic H2, formed by reduction of aqueous protons at the iron surface, or use low potential electrons from iron and aqueous protons directly for sulfate reduction. In order to reveal the effects of SBS treatment in more detail, metagenomic analysis was performed with pipe-associated solids (PAS scraped from a pipe section upstream (PAS-616P and downstream (PAS-821TP of the SBS injection point. A major SBS-induced change in microbial community composition and in affiliated hynL genes for the large subunit of [NiFe] hydrogenase was the appearance of sulfur-metabolizing Epsilonproteobacteria of the genera Sulfuricurvum and Sulfurovum. These are chemolithotrophs, which oxidize sulfide or sulfur with O2 or reduce sulfur with H2. Because O2 was absent, this class likely catalyzed reduction of sulfur (S0 originating from the metabolism of bisulfite with cathodic H2 (or low potential electrons and aqueous protons originating from the corrosion of steel (Fe0. Overall this accelerates reaction of of S0 and Fe0 to form FeS, making this class a potentially powerful contributor to microbial corrosion. The PAS-821TP metagenome also had increased fractions of Deltaproteobacteria including the SRB Desulfomicrobium and Desulfocapsa. Altogether, SBS increased the fraction of hydrogen-utilizing Delta- and Epsilonproteobacteria in brackish-water-transporting pipelines, potentially stimulating anaerobic pipeline corrosion if dosed in excess of the intended oxygen scavenger function.

  11. Accurate and fast methods to estimate the population mutation rate from error prone sequences

    Directory of Open Access Journals (Sweden)

    Miyamoto Michael M

    2009-08-01

    Full Text Available Abstract Background The population mutation rate (θ remains one of the most fundamental parameters in genetics, ecology, and evolutionary biology. However, its accurate estimation can be seriously compromised when working with error prone data such as expressed sequence tags, low coverage draft sequences, and other such unfinished products. This study is premised on the simple idea that a random sequence error due to a chance accident during data collection or recording will be distributed within a population dataset as a singleton (i.e., as a polymorphic site where one sampled sequence exhibits a unique base relative to the common nucleotide of the others. Thus, one can avoid these random errors by ignoring the singletons within a dataset. Results This strategy is implemented under an infinite sites model that focuses on only the internal branches of the sample genealogy where a shared polymorphism can arise (i.e., a variable site where each alternative base is represented by at least two sequences. This approach is first used to derive independently the same new Watterson and Tajima estimators of θ, as recently reported by Achaz 1 for error prone sequences. It is then used to modify the recent, full, maximum-likelihood model of Knudsen and Miyamoto 2, which incorporates various factors for experimental error and design with those for coalescence and mutation. These new methods are all accurate and fast according to evolutionary simulations and analyses of a real complex population dataset for the California seahare. Conclusion In light of these results, we recommend the use of these three new methods for the determination of θ from error prone sequences. In particular, we advocate the new maximum likelihood model as a starting point for the further development of more complex coalescent/mutation models that also account for experimental error and design.

  12. Reference Materials for Calibration of Analytical Biases in Quantification of DNA Methylation.

    Science.gov (United States)

    Yu, Hannah; Hahn, Yoonsoo; Yang, Inchul

    2015-01-01

    Most contemporary methods for the quantification of DNA methylation employ bisulfite conversion and PCR amplification. However, many reports have indicated that bisulfite-mediated PCR methodologies can result in inaccurate measurements of DNA methylation owing to amplification biases. To calibrate analytical biases in quantification of gene methylation, especially those that arise during PCR, we utilized reference materials that represent exact bisulfite-converted sequences with 0% and 100% methylation status of specific genes. After determining relative quantities using qPCR, pairs of plasmids were gravimetrically mixed to generate working standards with predefined DNA methylation levels at 10% intervals in terms of mole fractions. The working standards were used as controls to optimize the experimental conditions and also as calibration standards in melting-based and sequencing-based analyses of DNA methylation. Use of the reference materials enabled precise characterization and proper calibration of various biases during PCR and subsequent methylation measurement processes, resulting in accurate measurements.

  13. Targeted DNA Methylation Analysis by High Throughput Sequencing in Porcine Peri-attachment Embryos

    OpenAIRE

    MORRILL, Benson H.; COX, Lindsay; WARD, Anika; HEYWOOD, Sierra; PRATHER, Randall S.; ISOM, S. Clay

    2013-01-01

    Abstract The purpose of this experiment was to implement and evaluate the effectiveness of a next-generation sequencing-based method for DNA methylation analysis in porcine embryonic samples. Fourteen discrete genomic regions were amplified by PCR using bisulfite-converted genomic DNA derived from day 14 in vivo-derived (IVV) and parthenogenetic (PA) porcine embryos as template DNA. Resulting PCR products were subjected to high-throughput sequencing using the Illumina Genome Analyzer IIx plat...

  14. Statistical method evaluation for differentially methylated CpGs in base resolution next-generation DNA sequencing data.

    Science.gov (United States)

    Zhang, Yun; Baheti, Saurabh; Sun, Zhifu

    2018-05-01

    High-throughput bisulfite methylation sequencing such as reduced representation bisulfite sequencing (RRBS), Agilent SureSelect Human Methyl-Seq (Methyl-seq) or whole-genome bisulfite sequencing is commonly used for base resolution methylome research. These data are represented either by the ratio of methylated cytosine versus total coverage at a CpG site or numbers of methylated and unmethylated cytosines. Multiple statistical methods can be used to detect differentially methylated CpGs (DMCs) between conditions, and these methods are often the base for the next step of differentially methylated region identification. The ratio data have a flexibility of fitting to many linear models, but the raw count data take consideration of coverage information. There is an array of options in each datatype for DMC detection; however, it is not clear which is an optimal statistical method. In this study, we systematically evaluated four statistic methods on methylation ratio data and four methods on count-based data and compared their performances with regard to type I error control, sensitivity and specificity of DMC detection and computational resource demands using real RRBS data along with simulation. Our results show that the ratio-based tests are generally more conservative (less sensitive) than the count-based tests. However, some count-based methods have high false-positive rates and should be avoided. The beta-binomial model gives a good balance between sensitivity and specificity and is preferred method. Selection of methods in different settings, signal versus noise and sample size estimation are also discussed.

  15. Sequence- vs. chip-assisted genomic selection: accurate biological information is advised.

    Science.gov (United States)

    Pérez-Enciso, Miguel; Rincón, Juan C; Legarra, Andrés

    2015-05-09

    The development of next-generation sequencing technologies (NGS) has made the use of whole-genome sequence data for routine genetic evaluations possible, which has triggered a considerable interest in animal and plant breeding fields. Here, we investigated whether complete or partial sequence data can improve upon existing SNP (single nucleotide polymorphism) array-based selection strategies by simulation using a mixed coalescence - gene-dropping approach. We simulated 20 or 100 causal mutations (quantitative trait nucleotides, QTN) within 65 predefined 'gene' regions, each 10 kb long, within a genome composed of ten 3-Mb chromosomes. We compared prediction accuracy by cross-validation using a medium-density chip (7.5 k SNPs), a high-density (HD, 17 k) and sequence data (335 k). Genetic evaluation was based on a GBLUP method. The simulations showed: (1) a law of diminishing returns with increasing number of SNPs; (2) a modest effect of SNP ascertainment bias in arrays; (3) a small advantage of using whole-genome sequence data vs. HD arrays i.e. ~4%; (4) a minor effect of NGS errors except when imputation error rates are high (≥20%); and (5) if QTN were known, prediction accuracy approached 1. Since this is obviously unrealistic, we explored milder assumptions. We showed that, if all SNPs within causal genes were included in the prediction model, accuracy could also dramatically increase by ~40%. However, this criterion was highly sensitive to either misspecification (including wrong genes) or to the use of an incomplete gene list; in these cases, accuracy fell rapidly towards that reached when all SNPs from sequence data were blindly included in the model. Our study shows that, unless an accurate prior estimate on the functionality of SNPs can be included in the predictor, there is a law of diminishing returns with increasing SNP density. As a result, use of whole-genome sequence data may not result in a highly increased selection response over high

  16. Does universal 16S rRNA gene amplicon sequencing of environmental communities provide an accurate description of nitrifying guilds?

    DEFF Research Database (Denmark)

    Diwan, Vaibhav; Albrechtsen, Hans-Jørgen; Smets, Barth F.

    2018-01-01

    amplicon sequencing and from guild targeted approaches. The universal amplicon sequencing provided 1) accurate estimates of nitrifier composition, 2) clustering of the samples based on these compositions consistent with sample origin, 3) estimates of the relative abundance of the guilds correlated...

  17. Use of Whole-Genus Genome Sequence Data To Develop a Multilocus Sequence Typing Tool That Accurately Identifies Yersinia Isolates to the Species and Subspecies Levels

    Science.gov (United States)

    Hall, Miquette; Chattaway, Marie A.; Reuter, Sandra; Savin, Cyril; Strauch, Eckhard; Carniel, Elisabeth; Connor, Thomas; Van Damme, Inge; Rajakaruna, Lakshani; Rajendram, Dunstan; Jenkins, Claire; Thomson, Nicholas R.

    2014-01-01

    The genus Yersinia is a large and diverse bacterial genus consisting of human-pathogenic species, a fish-pathogenic species, and a large number of environmental species. Recently, the phylogenetic and population structure of the entire genus was elucidated through the genome sequence data of 241 strains encompassing every known species in the genus. Here we report the mining of this enormous data set to create a multilocus sequence typing-based scheme that can identify Yersinia strains to the species level to a level of resolution equal to that for whole-genome sequencing. Our assay is designed to be able to accurately subtype the important human-pathogenic species Yersinia enterocolitica to whole-genome resolution levels. We also report the validation of the scheme on 386 strains from reference laboratory collections across Europe. We propose that the scheme is an important molecular typing system to allow accurate and reproducible identification of Yersinia isolates to the species level, a process often inconsistent in nonspecialist laboratories. Additionally, our assay is the most phylogenetically informative typing scheme available for Y. enterocolitica. PMID:25339391

  18. Performances of Different Fragment Sizes for Reduced Representation Bisulfite Sequencing in Pigs

    DEFF Research Database (Denmark)

    Yuan, Xiao Long; Zhang, Zhe; Pan, Rong Yang

    2017-01-01

    sizes might decrease when the dataset size was more than 70, 50 and 110 million reads for these three fragment sizes, respectively. Given a 50-million dataset size, the average sequencing depth of the detected CpG sites in the 110-220 bp fragment size appeared to be deeper than in the 40-110 bp and 40...

  19. Karect: accurate correction of substitution, insertion and deletion errors for next-generation sequencing data

    KAUST Repository

    Allam, Amin

    2015-07-14

    Motivation: Next-generation sequencing generates large amounts of data affected by errors in the form of substitutions, insertions or deletions of bases. Error correction based on the high-coverage information, typically improves de novo assembly. Most existing tools can correct substitution errors only; some support insertions and deletions, but accuracy in many cases is low. Results: We present Karect, a novel error correction technique based on multiple alignment. Our approach supports substitution, insertion and deletion errors. It can handle non-uniform coverage as well as moderately covered areas of the sequenced genome. Experiments with data from Illumina, 454 FLX and Ion Torrent sequencing machines demonstrate that Karect is more accurate than previous methods, both in terms of correcting individual-bases errors (up to 10% increase in accuracy gain) and post de novo assembly quality (up to 10% increase in NGA50). We also introduce an improved framework for evaluating the quality of error correction.

  20. Accurate typing of short tandem repeats from genome-wide sequencing data and its applications.

    Science.gov (United States)

    Fungtammasan, Arkarachai; Ananda, Guruprasad; Hile, Suzanne E; Su, Marcia Shu-Wei; Sun, Chen; Harris, Robert; Medvedev, Paul; Eckert, Kristin; Makova, Kateryna D

    2015-05-01

    Short tandem repeats (STRs) are implicated in dozens of human genetic diseases and contribute significantly to genome variation and instability. Yet profiling STRs from short-read sequencing data is challenging because of their high sequencing error rates. Here, we developed STR-FM, short tandem repeat profiling using flank-based mapping, a computational pipeline that can detect the full spectrum of STR alleles from short-read data, can adapt to emerging read-mapping algorithms, and can be applied to heterogeneous genetic samples (e.g., tumors, viruses, and genomes of organelles). We used STR-FM to study STR error rates and patterns in publicly available human and in-house generated ultradeep plasmid sequencing data sets. We discovered that STRs sequenced with a PCR-free protocol have up to ninefold fewer errors than those sequenced with a PCR-containing protocol. We constructed an error correction model for genotyping STRs that can distinguish heterozygous alleles containing STRs with consecutive repeat numbers. Applying our model and pipeline to Illumina sequencing data with 100-bp reads, we could confidently genotype several disease-related long trinucleotide STRs. Utilizing this pipeline, for the first time we determined the genome-wide STR germline mutation rate from a deeply sequenced human pedigree. Additionally, we built a tool that recommends minimal sequencing depth for accurate STR genotyping, depending on repeat length and sequencing read length. The required read depth increases with STR length and is lower for a PCR-free protocol. This suite of tools addresses the pressing challenges surrounding STR genotyping, and thus is of wide interest to researchers investigating disease-related STRs and STR evolution. © 2015 Fungtammasan et al.; Published by Cold Spring Harbor Laboratory Press.

  1. Rapid and Accurate Sequencing of Enterovirus Genomes Using MinION Nanopore Sequencer.

    Science.gov (United States)

    Wang, Ji; Ke, Yue Hua; Zhang, Yong; Huang, Ke Qiang; Wang, Lei; Shen, Xin Xin; Dong, Xiao Ping; Xu, Wen Bo; Ma, Xue Jun

    2017-10-01

    Knowledge of an enterovirus genome sequence is very important in epidemiological investigation to identify transmission patterns and ascertain the extent of an outbreak. The MinION sequencer is increasingly used to sequence various viral pathogens in many clinical situations because of its long reads, portability, real-time accessibility of sequenced data, and very low initial costs. However, information is lacking on MinION sequencing of enterovirus genomes. In this proof-of-concept study using Enterovirus 71 (EV71) and Coxsackievirus A16 (CA16) strains as examples, we established an amplicon-based whole genome sequencing method using MinION. We explored the accuracy, minimum sequencing time, discrimination and high-throughput sequencing ability of MinION, and compared its performance with Sanger sequencing. Within the first minute (min) of sequencing, the accuracy of MinION was 98.5% for the single EV71 strain and 94.12%-97.33% for 10 genetically-related CA16 strains. In as little as 14 min, 99% identity was reached for the single EV71 strain, and in 17 min (on average), 99% identity was achieved for 10 CA16 strains in a single run. MinION is suitable for whole genome sequencing of enteroviruses with sufficient accuracy and fine discrimination and has the potential as a fast, reliable and convenient method for routine use. Copyright © 2017 The Editorial Board of Biomedical and Environmental Sciences. Published by China CDC. All rights reserved.

  2. Accurate Typing of Human Leukocyte Antigen Class I Genes by Oxford Nanopore Sequencing.

    Science.gov (United States)

    Liu, Chang; Xiao, Fangzhou; Hoisington-Lopez, Jessica; Lang, Kathrin; Quenzel, Philipp; Duffy, Brian; Mitra, Robi David

    2018-04-03

    Oxford Nanopore Technologies' MinION has expanded the current DNA sequencing toolkit by delivering long read lengths and extreme portability. The MinION has the potential to enable expedited point-of-care human leukocyte antigen (HLA) typing, an assay routinely used to assess the immunologic compatibility between organ donors and recipients, but the platform's high error rate makes it challenging to type alleles with accuracy. We developed and validated accurate typing of HLA by Oxford nanopore (Athlon), a bioinformatic pipeline that i) maps nanopore reads to a database of known HLA alleles, ii) identifies candidate alleles with the highest read coverage at different resolution levels that are represented as branching nodes and leaves of a tree structure, iii) generates consensus sequences by remapping the reads to the candidate alleles, and iv) calls the final diploid genotype by blasting consensus sequences against the reference database. Using two independent data sets generated on the R9.4 flow cell chemistry, Athlon achieved a 100% accuracy in class I HLA typing at the two-field resolution. Copyright © 2018 American Society for Investigative Pathology and the Association for Molecular Pathology. Published by Elsevier Inc. All rights reserved.

  3. Different characteristics between menadione and menadione sodium bisulfite as redox mediator in yeast cell suspension

    OpenAIRE

    Yamashoji, Shiro

    2016-01-01

    Menadione promoted the production of active oxygen species (AOS) in both yeast cell suspension and the crude enzymes from the cells, but menadione sodium bisulfite (MSB) had little effect on the production of AOS in the cell suspension. MSB kept the stable increase in the electron transfer from intact yeast cells to anode compared to menadione, but the electron transfer promoted by MSB was inhibited in permeabilized yeast cell suspension. Menadione promoted oxidation of NAD(P)H much faster th...

  4. Fast and accurate taxonomic assignments of metagenomic sequences using MetaBin.

    Directory of Open Access Journals (Sweden)

    Vineet K Sharma

    Full Text Available Taxonomic assignment of sequence reads is a challenging task in metagenomic data analysis, for which the present methods mainly use either composition- or homology-based approaches. Though the homology-based methods are more sensitive and accurate, they suffer primarily due to the time needed to generate the Blast alignments. We developed the MetaBin program and web server for better homology-based taxonomic assignments using an ORF-based approach. By implementing Blat as the faster alignment method in place of Blastx, the analysis time has been reduced by severalfold. It is benchmarked using both simulated and real metagenomic datasets, and can be used for both single and paired-end sequence reads of varying lengths (≥45 bp. To our knowledge, MetaBin is the only available program that can be used for the taxonomic binning of short reads (<100 bp with high accuracy and high sensitivity using a homology-based approach. The MetaBin web server can be used to carry out the taxonomic analysis, by either submitting reads or Blastx output. It provides several options including construction of taxonomic trees, creation of a composition chart, functional analysis using COGs, and comparative analysis of multiple metagenomic datasets. MetaBin web server and a standalone version for high-throughput analysis are available freely at http://metabin.riken.jp/.

  5. Rapid detection, classification and accurate alignment of up to a million or more related protein sequences.

    Science.gov (United States)

    Neuwald, Andrew F

    2009-08-01

    The patterns of sequence similarity and divergence present within functionally diverse, evolutionarily related proteins contain implicit information about corresponding biochemical similarities and differences. A first step toward accessing such information is to statistically analyze these patterns, which, in turn, requires that one first identify and accurately align a very large set of protein sequences. Ideally, the set should include many distantly related, functionally divergent subgroups. Because it is extremely difficult, if not impossible for fully automated methods to align such sequences correctly, researchers often resort to manual curation based on detailed structural and biochemical information. However, multiply-aligning vast numbers of sequences in this way is clearly impractical. This problem is addressed using Multiply-Aligned Profiles for Global Alignment of Protein Sequences (MAPGAPS). The MAPGAPS program uses a set of multiply-aligned profiles both as a query to detect and classify related sequences and as a template to multiply-align the sequences. It relies on Karlin-Altschul statistics for sensitivity and on PSI-BLAST (and other) heuristics for speed. Using as input a carefully curated multiple-profile alignment for P-loop GTPases, MAPGAPS correctly aligned weakly conserved sequence motifs within 33 distantly related GTPases of known structure. By comparison, the sequence- and structurally based alignment methods hmmalign and PROMALS3D misaligned at least 11 and 23 of these regions, respectively. When applied to a dataset of 65 million protein sequences, MAPGAPS identified, classified and aligned (with comparable accuracy) nearly half a million putative P-loop GTPase sequences. A C++ implementation of MAPGAPS is available at http://mapgaps.igs.umaryland.edu. Supplementary data are available at Bioinformatics online.

  6. Ulysses: accurate detection of low-frequency structural variations in large insert-size sequencing libraries.

    Science.gov (United States)

    Gillet-Markowska, Alexandre; Richard, Hugues; Fischer, Gilles; Lafontaine, Ingrid

    2015-03-15

    The detection of structural variations (SVs) in short-range Paired-End (PE) libraries remains challenging because SV breakpoints can involve large dispersed repeated sequences, or carry inherent complexity, hardly resolvable with classical PE sequencing data. In contrast, large insert-size sequencing libraries (Mate-Pair libraries) provide higher physical coverage of the genome and give access to repeat-containing regions. They can thus theoretically overcome previous limitations as they are becoming routinely accessible. Nevertheless, broad insert size distributions and high rates of chimerical sequences are usually associated to this type of libraries, which makes the accurate annotation of SV challenging. Here, we present Ulysses, a tool that achieves drastically higher detection accuracy than existing tools, both on simulated and real mate-pair sequencing datasets from the 1000 Human Genome project. Ulysses achieves high specificity over the complete spectrum of variants by assessing, in a principled manner, the statistical significance of each possible variant (duplications, deletions, translocations, insertions and inversions) against an explicit model for the generation of experimental noise. This statistical model proves particularly useful for the detection of low frequency variants. SV detection performed on a large insert Mate-Pair library from a breast cancer sample revealed a high level of somatic duplications in the tumor and, to a lesser extent, in the blood sample as well. Altogether, these results show that Ulysses is a valuable tool for the characterization of somatic mosaicism in human tissues and in cancer genomes. © The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

  7. Archived neonatal dried blood spot samples can be used for accurate whole genome and exome-targeted next-generation sequencing

    DEFF Research Database (Denmark)

    Hollegaard, Mads Vilhelm; Grauholm, Jonas; Nielsen, Ronni

    2013-01-01

    Dried blood spot samples (DBSS) have been collected and stored for decades as part of newborn screening programmes worldwide. Representing almost an entire population under a certain age and collected with virtually no bias, the Newborn Screening Biobanks are of immense value in medical studies......, for example, to examine the genetics of various disorders. We have previously demonstrated that DNA extracted from a fraction (2×3.2mm discs) of an archived DBSS can be whole genome amplified (wgaDNA) and used for accurate array genotyping. However, until now, it has been uncertain whether wgaDNA from DBSS...... can be used for accurate whole genome sequencing (WGS) and exome sequencing (WES). This study examined two individuals represented by three different types of samples each: whole-blood (reference samples), 3-year-old DBSS spotted with reference material (refDBSS), and 27- to 29-year-old archived...

  8. Combined Bisulfite Restriction Analysis for brain tissue identification.

    Science.gov (United States)

    Samsuwan, Jarunya; Muangsub, Tachapol; Yanatatsaneejit, Pattamawadee; Mutirangura, Apiwat; Kitkumthorn, Nakarin

    2018-05-01

    According to the tissue-specific methylation database (doi: 10.1016/j.gene.2014.09.060), methylation at CpG locus cg03096975 in EML2 has been preliminarily proven to be specific to brain tissue. In this study, we enlarged sample size and developed a technique for identifying brain tissue in aged samples. Combined Bisulfite Restriction Analysis-for EML2 (COBRA-EML2) technique was established and validated in various organ samples obtained from 108 autopsies. In addition, this technique was also tested for its reliability, minimal DNA concentration detected, and use in aged samples and in samples obtained from specific brain compartments and spinal cord. COBRA-EML2 displayed 100% sensitivity and specificity for distinguishing brain tissue from other tissues, showed high reliability, was capable of detecting minimal DNA concentration (0.015ng/μl), could be used for identifying brain tissue in aged samples. In summary, COBRA-EML2 is a technique to identify brain tissue. This analysis is useful in criminal cases since it can identify the vital organ tissues from small samples acquired from criminal scenes. The results from this analysis can be counted as a medical and forensic marker supporting criminal investigations, and as one of the evidences in court rulings. Copyright © 2018 Elsevier B.V. All rights reserved.

  9. CloudAligner: A fast and full-featured MapReduce based tool for sequence mapping

    Directory of Open Access Journals (Sweden)

    Shi Weisong

    2011-06-01

    Full Text Available Abstract Background Research in genetics has developed rapidly recently due to the aid of next generation sequencing (NGS. However, massively-parallel NGS produces enormous amounts of data, which leads to storage, compatibility, scalability, and performance issues. The Cloud Computing and MapReduce framework, which utilizes hundreds or thousands of shared computers to map sequencing reads quickly and efficiently to reference genome sequences, appears to be a very promising solution for these issues. Consequently, it has been adopted by many organizations recently, and the initial results are very promising. However, since these are only initial steps toward this trend, the developed software does not provide adequate primary functions like bisulfite, pair-end mapping, etc., in on-site software such as RMAP or BS Seeker. In addition, existing MapReduce-based applications were not designed to process the long reads produced by the most recent second-generation and third-generation NGS instruments and, therefore, are inefficient. Last, it is difficult for a majority of biologists untrained in programming skills to use these tools because most were developed on Linux with a command line interface. Results To urge the trend of using Cloud technologies in genomics and prepare for advances in second- and third-generation DNA sequencing, we have built a Hadoop MapReduce-based application, CloudAligner, which achieves higher performance, covers most primary features, is more accurate, and has a user-friendly interface. It was also designed to be able to deal with long sequences. The performance gain of CloudAligner over Cloud-based counterparts (35 to 80% mainly comes from the omission of the reduce phase. In comparison to local-based approaches, the performance gain of CloudAligner is from the partition and parallel processing of the huge reference genome as well as the reads. The source code of CloudAligner is available at http

  10. CloudAligner: A fast and full-featured MapReduce based tool for sequence mapping.

    Science.gov (United States)

    Nguyen, Tung; Shi, Weisong; Ruden, Douglas

    2011-06-06

    Research in genetics has developed rapidly recently due to the aid of next generation sequencing (NGS). However, massively-parallel NGS produces enormous amounts of data, which leads to storage, compatibility, scalability, and performance issues. The Cloud Computing and MapReduce framework, which utilizes hundreds or thousands of shared computers to map sequencing reads quickly and efficiently to reference genome sequences, appears to be a very promising solution for these issues. Consequently, it has been adopted by many organizations recently, and the initial results are very promising. However, since these are only initial steps toward this trend, the developed software does not provide adequate primary functions like bisulfite, pair-end mapping, etc., in on-site software such as RMAP or BS Seeker. In addition, existing MapReduce-based applications were not designed to process the long reads produced by the most recent second-generation and third-generation NGS instruments and, therefore, are inefficient. Last, it is difficult for a majority of biologists untrained in programming skills to use these tools because most were developed on Linux with a command line interface. To urge the trend of using Cloud technologies in genomics and prepare for advances in second- and third-generation DNA sequencing, we have built a Hadoop MapReduce-based application, CloudAligner, which achieves higher performance, covers most primary features, is more accurate, and has a user-friendly interface. It was also designed to be able to deal with long sequences. The performance gain of CloudAligner over Cloud-based counterparts (35 to 80%) mainly comes from the omission of the reduce phase. In comparison to local-based approaches, the performance gain of CloudAligner is from the partition and parallel processing of the huge reference genome as well as the reads. The source code of CloudAligner is available at http://cloudaligner.sourceforge.net/ and its web version is at http

  11. Accurate estimation of short read mapping quality for next-generation genome sequencing

    Science.gov (United States)

    Ruffalo, Matthew; Koyutürk, Mehmet; Ray, Soumya; LaFramboise, Thomas

    2012-01-01

    Motivation: Several software tools specialize in the alignment of short next-generation sequencing reads to a reference sequence. Some of these tools report a mapping quality score for each alignment—in principle, this quality score tells researchers the likelihood that the alignment is correct. However, the reported mapping quality often correlates weakly with actual accuracy and the qualities of many mappings are underestimated, encouraging the researchers to discard correct mappings. Further, these low-quality mappings tend to correlate with variations in the genome (both single nucleotide and structural), and such mappings are important in accurately identifying genomic variants. Approach: We develop a machine learning tool, LoQuM (LOgistic regression tool for calibrating the Quality of short read mappings, to assign reliable mapping quality scores to mappings of Illumina reads returned by any alignment tool. LoQuM uses statistics on the read (base quality scores reported by the sequencer) and the alignment (number of matches, mismatches and deletions, mapping quality score returned by the alignment tool, if available, and number of mappings) as features for classification and uses simulated reads to learn a logistic regression model that relates these features to actual mapping quality. Results: We test the predictions of LoQuM on an independent dataset generated by the ART short read simulation software and observe that LoQuM can ‘resurrect’ many mappings that are assigned zero quality scores by the alignment tools and are therefore likely to be discarded by researchers. We also observe that the recalibration of mapping quality scores greatly enhances the precision of called single nucleotide polymorphisms. Availability: LoQuM is available as open source at http://compbio.case.edu/loqum/. Contact: matthew.ruffalo@case.edu. PMID:22962451

  12. Targeted sequencing of large genomic regions with CATCH-Seq.

    Directory of Open Access Journals (Sweden)

    Kenneth Day

    Full Text Available Current target enrichment systems for large-scale next-generation sequencing typically require synthetic oligonucleotides used as capture reagents to isolate sequences of interest. The majority of target enrichment reagents are focused on gene coding regions or promoters en masse. Here we introduce development of a customizable targeted capture system using biotinylated RNA probe baits transcribed from sheared bacterial artificial chromosome clone templates that enables capture of large, contiguous blocks of the genome for sequencing applications. This clone adapted template capture hybridization sequencing (CATCH-Seq procedure can be used to capture both coding and non-coding regions of a gene, and resolve the boundaries of copy number variations within a genomic target site. Furthermore, libraries constructed with methylated adapters prior to solution hybridization also enable targeted bisulfite sequencing. We applied CATCH-Seq to diverse targets ranging in size from 125 kb to 3.5 Mb. Our approach provides a simple and cost effective alternative to other capture platforms because of template-based, enzymatic probe synthesis and the lack of oligonucleotide design costs. Given its similarity in procedure, CATCH-Seq can also be performed in parallel with commercial systems.

  13. MethPrimer: designing primers for methylation PCRs.

    Science.gov (United States)

    Li, Long-Cheng; Dahiya, Rajvir

    2002-11-01

    DNA methylation is an epigenetic mechanism of gene regulation. Bisulfite- conversion-based PCR methods, such as bisulfite sequencing PCR (BSP) and methylation specific PCR (MSP), remain the most commonly used techniques for methylation mapping. Existing primer design programs developed for standard PCR cannot handle primer design for bisulfite-conversion-based PCRs due to changes in DNA sequence context caused by bisulfite treatment and many special constraints both on the primers and the region to be amplified for such experiments. Therefore, the present study was designed to develop a program for such applications. MethPrimer, based on Primer 3, is a program for designing PCR primers for methylation mapping. It first takes a DNA sequence as its input and searches the sequence for potential CpG islands. Primers are then picked around the predicted CpG islands or around regions specified by users. MethPrimer can design primers for BSP and MSP. Results of primer selection are delivered through a web browser in text and in graphic view.

  14. An accurate and rapid continuous wavelet dynamic time warping algorithm for unbalanced global mapping in nanopore sequencing

    KAUST Repository

    Han, Renmin

    2017-12-24

    Long-reads, point-of-care, and PCR-free are the promises brought by nanopore sequencing. Among various steps in nanopore data analysis, the global mapping between the raw electrical current signal sequence and the expected signal sequence from the pore model serves as the key building block to base calling, reads mapping, variant identification, and methylation detection. However, the ultra-long reads of nanopore sequencing and an order of magnitude difference in the sampling speeds of the two sequences make the classical dynamic time warping (DTW) and its variants infeasible to solve the problem. Here, we propose a novel multi-level DTW algorithm, cwDTW, based on continuous wavelet transforms with different scales of the two signal sequences. Our algorithm starts from low-resolution wavelet transforms of the two sequences, such that the transformed sequences are short and have similar sampling rates. Then the peaks and nadirs of the transformed sequences are extracted to form feature sequences with similar lengths, which can be easily mapped by the original DTW. Our algorithm then recursively projects the warping path from a lower-resolution level to a higher-resolution one by building a context-dependent boundary and enabling a constrained search for the warping path in the latter. Comprehensive experiments on two real nanopore datasets on human and on Pandoraea pnomenusa, as well as two benchmark datasets from previous studies, demonstrate the efficiency and effectiveness of the proposed algorithm. In particular, cwDTW can almost always generate warping paths that are very close to the original DTW, which are remarkably more accurate than the state-of-the-art methods including FastDTW and PrunedDTW. Meanwhile, on the real nanopore datasets, cwDTW is about 440 times faster than FastDTW and 3000 times faster than the original DTW. Our program is available at https://github.com/realbigws/cwDTW.

  15. Different pathways are involved in the enhancement of photosynthetic rate by sodium bisulfite and benzyladenine, a case study with strawberry (Fragaria x Ananassa Duch) plants

    NARCIS (Netherlands)

    Guo, Y.P.; Peng, Y.; Lin, M.L.; Guo, D.P.; Hu, M.J.; Shen, Y.K.; Li, D.Y.; Zheng, S.J.

    2006-01-01

    In order to understand the pathway involved in the chemical enhancement of photosynthetic rate, sodium bisulfite (NaHSO3) and benzyladenine (BA), a growth regulator, were applied to strawberry plants. The influence of these compounds on gas exchange and millisecond delayed light emission (ms-DLE)

  16. On-Chip Evaluation of DNA Methylation with Electrochemical Combined Bisulfite Restriction Analysis Utilizing a Carbon Film Containing a Nanocrystalline Structure.

    Science.gov (United States)

    Kurita, Ryoji; Yanagisawa, Hiroyuki; Kamata, Tomoyuki; Kato, Dai; Niwa, Osamu

    2017-06-06

    This paper reports an on-chip electrochemical assessment of the DNA methylation status in genomic DNA on a conductive nanocarbon film electrode realized with combined bisulfite restriction analysis (COBRA). The film electrode consists of sp 2 and sp 3 hybrid bonds and is fabricated with an unbalanced magnetron (UBM) sputtering method. First, we studied the effect of the sp 2 /sp 3 ratio of the UBM nanocarbon film electrode with p-aminophenol, which is a major electro-active product of the labeling enzyme from p-aminophenol phosphate. The signal current for p-aminophenol increases as the sp 2 content in the UBM nanocarbon film electrode increases because of the π-π interaction between aromatic p-aminophenol and the graphene-like sp 2 structure. Furthermore, the capacitative current at the UBM nanocarbon film electrode was successfully reduced by about 1 order of magnitude thanks to the angstrom-level surface flatness. Therefore, a high signal-to-noise ratio was achieved compared with that of conventional electrodes. Then, after performing an ELISA-like hybridization assay with a restriction enzyme, we undertook an electrochemical evaluation of the cytosine methylation status in DNA by measuring the oxidation current derived from p-aminophenol. When the target cytosine in the analyte sequence is methylated (unmethylated), the restriction enzyme of HpyCH4IV is able (unable) to cleave the sequence, that is, the detection probe cannot (can) hybridize. We succeeded in estimating the methylation ratio at a site-specific CpG site from the peak current of a cyclic voltammogram obtained from a PCR product solution ranging from 0.01 to 1 nM.

  17. DNA barcode data accurately assign higher spider taxa

    Directory of Open Access Journals (Sweden)

    Jonathan A. Coddington

    2016-07-01

    Full Text Available The use of unique DNA sequences as a method for taxonomic identification is no longer fundamentally controversial, even though debate continues on the best markers, methods, and technology to use. Although both existing databanks such as GenBank and BOLD, as well as reference taxonomies, are imperfect, in best case scenarios “barcodes” (whether single or multiple, organelle or nuclear, loci clearly are an increasingly fast and inexpensive method of identification, especially as compared to manual identification of unknowns by increasingly rare expert taxonomists. Because most species on Earth are undescribed, a complete reference database at the species level is impractical in the near term. The question therefore arises whether unidentified species can, using DNA barcodes, be accurately assigned to more inclusive groups such as genera and families—taxonomic ranks of putatively monophyletic groups for which the global inventory is more complete and stable. We used a carefully chosen test library of CO1 sequences from 49 families, 313 genera, and 816 species of spiders to assess the accuracy of genus and family-level assignment. We used BLAST queries of each sequence against the entire library and got the top ten hits. The percent sequence identity was reported from these hits (PIdent, range 75–100%. Accurate assignment of higher taxa (PIdent above which errors totaled less than 5% occurred for genera at PIdent values >95 and families at PIdent values ≥ 91, suggesting these as heuristic thresholds for accurate generic and familial identifications in spiders. Accuracy of identification increases with numbers of species/genus and genera/family in the library; above five genera per family and fifteen species per genus all higher taxon assignments were correct. We propose that using percent sequence identity between conventional barcode sequences may be a feasible and reasonably accurate method to identify animals to family/genus. However

  18. Heap: a highly sensitive and accurate SNP detection tool for low-coverage high-throughput sequencing data

    KAUST Repository

    Kobayashi, Masaaki

    2017-04-20

    Recent availability of large-scale genomic resources enables us to conduct so called genome-wide association studies (GWAS) and genomic prediction (GP) studies, particularly with next-generation sequencing (NGS) data. The effectiveness of GWAS and GP depends on not only their mathematical models, but the quality and quantity of variants employed in the analysis. In NGS single nucleotide polymorphism (SNP) calling, conventional tools ideally require more reads for higher SNP sensitivity and accuracy. In this study, we aimed to develop a tool, Heap, that enables robustly sensitive and accurate calling of SNPs, particularly with a low coverage NGS data, which must be aligned to the reference genome sequences in advance. To reduce false positive SNPs, Heap determines genotypes and calls SNPs at each site except for sites at the both ends of reads or containing a minor allele supported by only one read. Performance comparison with existing tools showed that Heap achieved the highest F-scores with low coverage (7X) restriction-site associated DNA sequencing reads of sorghum and rice individuals. This will facilitate cost-effective GWAS and GP studies in this NGS era. Code and documentation of Heap are freely available from https://github.com/meiji-bioinf/heap (29 March 2017, date last accessed) and our web site (http://bioinf.mind.meiji.ac.jp/lab/en/tools.html (29 March 2017, date last accessed)).

  19. HeurAA: accurate and fast detection of genetic variations with a novel heuristic amplicon aligner program for next generation sequencing.

    Directory of Open Access Journals (Sweden)

    Lőrinc S Pongor

    Full Text Available Next generation sequencing (NGS of PCR amplicons is a standard approach to detect genetic variations in personalized medicine such as cancer diagnostics. Computer programs used in the NGS community often miss insertions and deletions (indels that constitute a large part of known human mutations. We have developed HeurAA, an open source, heuristic amplicon aligner program. We tested the program on simulated datasets as well as experimental data from multiplex sequencing of 40 amplicons in 12 oncogenes collected on a 454 Genome Sequencer from lung cancer cell lines. We found that HeurAA can accurately detect all indels, and is more than an order of magnitude faster than previous programs. HeurAA can compare reads and reference sequences up to several thousand base pairs in length, and it can evaluate data from complex mixtures containing reads of different gene-segments from different samples. HeurAA is written in C and Perl for Linux operating systems, the code and the documentation are available for research applications at http://sourceforge.net/projects/heuraa/

  20. A standardized framework for accurate, high-throughput genotyping of recombinant and non-recombinant viral sequences.

    Science.gov (United States)

    Alcantara, Luiz Carlos Junior; Cassol, Sharon; Libin, Pieter; Deforche, Koen; Pybus, Oliver G; Van Ranst, Marc; Galvão-Castro, Bernardo; Vandamme, Anne-Mieke; de Oliveira, Tulio

    2009-07-01

    Human immunodeficiency virus type-1 (HIV-1), hepatitis B and C and other rapidly evolving viruses are characterized by extremely high levels of genetic diversity. To facilitate diagnosis and the development of prevention and treatment strategies that efficiently target the diversity of these viruses, and other pathogens such as human T-lymphotropic virus type-1 (HTLV-1), human herpes virus type-8 (HHV8) and human papillomavirus (HPV), we developed a rapid high-throughput-genotyping system. The method involves the alignment of a query sequence with a carefully selected set of pre-defined reference strains, followed by phylogenetic analysis of multiple overlapping segments of the alignment using a sliding window. Each segment of the query sequence is assigned the genotype and sub-genotype of the reference strain with the highest bootstrap (>70%) and bootscanning (>90%) scores. Results from all windows are combined and displayed graphically using color-coded genotypes. The new Virus-Genotyping Tools provide accurate classification of recombinant and non-recombinant viruses and are currently being assessed for their diagnostic utility. They have incorporated into several HIV drug resistance algorithms including the Stanford (http://hivdb.stanford.edu) and two European databases (http://www.umcutrecht.nl/subsite/spread-programme/ and http://www.hivrdb.org.uk/) and have been successfully used to genotype a large number of sequences in these and other databases. The tools are a PHP/JAVA web application and are freely accessible on a number of servers including: http://bioafrica.mrc.ac.za/rega-genotype/html/, http://lasp.cpqgm.fiocruz.br/virus-genotype/html/, http://jose.med.kuleuven.be/genotypetool/html/.

  1. Hi-Plex for Simple, Accurate, and Cost-Effective Amplicon-based Targeted DNA Sequencing.

    Science.gov (United States)

    Pope, Bernard J; Hammet, Fleur; Nguyen-Dumont, Tu; Park, Daniel J

    2018-01-01

    Hi-Plex is a suite of methods to enable simple, accurate, and cost-effective highly multiplex PCR-based targeted sequencing (Nguyen-Dumont et al., Biotechniques 58:33-36, 2015). At its core is the principle of using gene-specific primers (GSPs) to "seed" (or target) the reaction and universal primers to "drive" the majority of the reaction. In this manner, effects on amplification efficiencies across the target amplicons can, to a large extent, be restricted to early seeding cycles. Product sizes are defined within a relatively narrow range to enable high-specificity size selection, replication uniformity across target sites (including in the context of fragmented input DNA such as that derived from fixed tumor specimens (Nguyen-Dumont et al., Biotechniques 55:69-74, 2013; Nguyen-Dumont et al., Anal Biochem 470:48-51, 2015), and application of high-specificity genetic variant calling algorithms (Pope et al., Source Code Biol Med 9:3, 2014; Park et al., BMC Bioinformatics 17:165, 2016). Hi-Plex offers a streamlined workflow that is suitable for testing large numbers of specimens without the need for automation.

  2. A probabilistic generative model for quantification of DNA modifications enables analysis of demethylation pathways.

    Science.gov (United States)

    Äijö, Tarmo; Huang, Yun; Mannerström, Henrik; Chavez, Lukas; Tsagaratou, Ageliki; Rao, Anjana; Lähdesmäki, Harri

    2016-03-14

    We present a generative model, Lux, to quantify DNA methylation modifications from any combination of bisulfite sequencing approaches, including reduced, oxidative, TET-assisted, chemical-modification assisted, and methylase-assisted bisulfite sequencing data. Lux models all cytosine modifications (C, 5mC, 5hmC, 5fC, and 5caC) simultaneously together with experimental parameters, including bisulfite conversion and oxidation efficiencies, as well as various chemical labeling and protection steps. We show that Lux improves the quantification and comparison of cytosine modification levels and that Lux can process any oxidized methylcytosine sequencing data sets to quantify all cytosine modifications. Analysis of targeted data from Tet2-knockdown embryonic stem cells and T cells during development demonstrates DNA modification quantification at unprecedented detail, quantifies active demethylation pathways and reveals 5hmC localization in putative regulatory regions.

  3. SINA: accurate high-throughput multiple sequence alignment of ribosomal RNA genes.

    Science.gov (United States)

    Pruesse, Elmar; Peplies, Jörg; Glöckner, Frank Oliver

    2012-07-15

    In the analysis of homologous sequences, computation of multiple sequence alignments (MSAs) has become a bottleneck. This is especially troublesome for marker genes like the ribosomal RNA (rRNA) where already millions of sequences are publicly available and individual studies can easily produce hundreds of thousands of new sequences. Methods have been developed to cope with such numbers, but further improvements are needed to meet accuracy requirements. In this study, we present the SILVA Incremental Aligner (SINA) used to align the rRNA gene databases provided by the SILVA ribosomal RNA project. SINA uses a combination of k-mer searching and partial order alignment (POA) to maintain very high alignment accuracy while satisfying high throughput performance demands. SINA was evaluated in comparison with the commonly used high throughput MSA programs PyNAST and mothur. The three BRAliBase III benchmark MSAs could be reproduced with 99.3, 97.6 and 96.1 accuracy. A larger benchmark MSA comprising 38 772 sequences could be reproduced with 98.9 and 99.3% accuracy using reference MSAs comprising 1000 and 5000 sequences. SINA was able to achieve higher accuracy than PyNAST and mothur in all performed benchmarks. Alignment of up to 500 sequences using the latest SILVA SSU/LSU Ref datasets as reference MSA is offered at http://www.arb-silva.de/aligner. This page also links to Linux binaries, user manual and tutorial. SINA is made available under a personal use license.

  4. Droplet Digital™ PCR Next-Generation Sequencing Library QC Assay.

    Science.gov (United States)

    Heredia, Nicholas J

    2018-01-01

    Digital PCR is a valuable tool to quantify next-generation sequencing (NGS) libraries precisely and accurately. Accurately quantifying NGS libraries enable accurate loading of the libraries on to the sequencer and thus improve sequencing performance by reducing under and overloading error. Accurate quantification also benefits users by enabling uniform loading of indexed/barcoded libraries which in turn greatly improves sequencing uniformity of the indexed/barcoded samples. The advantages gained by employing the Droplet Digital PCR (ddPCR™) library QC assay includes the precise and accurate quantification in addition to size quality assessment, enabling users to QC their sequencing libraries with confidence.

  5. Multilocus Sequence Typing of Total-Genome-Sequenced Bacteria

    DEFF Research Database (Denmark)

    Larsen, Mette Voldby; Cosentino, Salvatore; Rasmussen, Simon

    2012-01-01

    Accurate strain identification is essential for anyone working with bacteria. For many species, multilocus sequence typing (MLST) is considered the "gold standard" of typing, but it is traditionally performed in an expensive and time-consuming manner. As the costs of whole-genome sequencing (WGS...

  6. Evidence Suggesting Absence of Mitochondrial DNA Methylation

    DEFF Research Database (Denmark)

    Mechta, Mie; Ingerslev, Lars R; Fabre, Odile

    2017-01-01

    , 16S, ND5 and CYTB, suggesting that mtDNA supercoiled structure blocks the access to bisulfite conversion. Here, we identified an artifact of mtDNA bisulfite sequencing that can lead to an overestimation of mtDNA methylation levels. Our study supports that cytosine methylation is virtually absent...

  7. QNB: differential RNA methylation analysis for count-based small-sample sequencing data with a quad-negative binomial model.

    Science.gov (United States)

    Liu, Lian; Zhang, Shao-Wu; Huang, Yufei; Meng, Jia

    2017-08-31

    As a newly emerged research area, RNA epigenetics has drawn increasing attention recently for the participation of RNA methylation and other modifications in a number of crucial biological processes. Thanks to high throughput sequencing techniques, such as, MeRIP-Seq, transcriptome-wide RNA methylation profile is now available in the form of count-based data, with which it is often of interests to study the dynamics at epitranscriptomic layer. However, the sample size of RNA methylation experiment is usually very small due to its costs; and additionally, there usually exist a large number of genes whose methylation level cannot be accurately estimated due to their low expression level, making differential RNA methylation analysis a difficult task. We present QNB, a statistical approach for differential RNA methylation analysis with count-based small-sample sequencing data. Compared with previous approaches such as DRME model based on a statistical test covering the IP samples only with 2 negative binomial distributions, QNB is based on 4 independent negative binomial distributions with their variances and means linked by local regressions, and in the way, the input control samples are also properly taken care of. In addition, different from DRME approach, which relies only the input control sample only for estimating the background, QNB uses a more robust estimator for gene expression by combining information from both input and IP samples, which could largely improve the testing performance for very lowly expressed genes. QNB showed improved performance on both simulated and real MeRIP-Seq datasets when compared with competing algorithms. And the QNB model is also applicable to other datasets related RNA modifications, including but not limited to RNA bisulfite sequencing, m 1 A-Seq, Par-CLIP, RIP-Seq, etc.

  8. Highly accurate sequence imputation enables precise QTL mapping in Brown Swiss cattle.

    Science.gov (United States)

    Frischknecht, Mirjam; Pausch, Hubert; Bapst, Beat; Signer-Hasler, Heidi; Flury, Christine; Garrick, Dorian; Stricker, Christian; Fries, Ruedi; Gredler-Grandl, Birgit

    2017-12-29

    Within the last few years a large amount of genomic information has become available in cattle. Densities of genomic information vary from a few thousand variants up to whole genome sequence information. In order to combine genomic information from different sources and infer genotypes for a common set of variants, genotype imputation is required. In this study we evaluated the accuracy of imputation from high density chips to whole genome sequence data in Brown Swiss cattle. Using four popular imputation programs (Beagle, FImpute, Impute2, Minimac) and various compositions of reference panels, the accuracy of the imputed sequence variant genotypes was high and differences between the programs and scenarios were small. We imputed sequence variant genotypes for more than 1600 Brown Swiss bulls and performed genome-wide association studies for milk fat percentage at two stages of lactation. We found one and three quantitative trait loci for early and late lactation fat content, respectively. Known causal variants that were imputed from the sequenced reference panel were among the most significantly associated variants of the genome-wide association study. Our study demonstrates that whole-genome sequence information can be imputed at high accuracy in cattle populations. Using imputed sequence variant genotypes in genome-wide association studies may facilitate causal variant detection.

  9. Accurate quantification of mouse mitochondrial DNA without co-amplification of nuclear mitochondrial insertion sequences.

    Science.gov (United States)

    Malik, Afshan N; Czajka, Anna; Cunningham, Phil

    2016-07-01

    Mitochondria contain an extra-nuclear genome in the form of mitochondrial DNA (MtDNA), damage to which can lead to inflammation and bioenergetic deficit. Changes in MtDNA levels are increasingly used as a biomarker of mitochondrial dysfunction. We previously reported that in humans, fragments in the nuclear genome known as nuclear mitochondrial insertion sequences (NumtS) affect accurate quantification of MtDNA. In the current paper our aim was to determine whether mouse NumtS affect the quantification of MtDNA and to establish a method designed to avoid this. The existence of NumtS in the mouse genome was confirmed using blast N, unique MtDNA regions were identified using FASTA, and MtDNA primers which do not co-amplify NumtS were designed and tested. MtDNA copy numbers were determined in a range of mouse tissues as the ratio of the mitochondrial and nuclear genome using real time qPCR and absolute quantification. Approximately 95% of mouse MtDNA was duplicated in the nuclear genome as NumtS which were located in 15 out of 21 chromosomes. A unique region was identified and primers flanking this region were used. MtDNA levels differed significantly in mouse tissues being the highest in the heart, with levels in descending order (highest to lowest) in kidney, liver, blood, brain, islets and lung. The presence of NumtS in the nuclear genome of mouse could lead to erroneous data when studying MtDNA content or mutation. The unique primers described here will allow accurate quantification of MtDNA content in mouse models without co-amplification of NumtS. Copyright © 2016 Elsevier B.V. and Mitochondria Research Society. All rights reserved.

  10. An integrated tool to study MHC region: accurate SNV detection and HLA genes typing in human MHC region using targeted high-throughput sequencing.

    Directory of Open Access Journals (Sweden)

    Hongzhi Cao

    Full Text Available The major histocompatibility complex (MHC is one of the most variable and gene-dense regions of the human genome. Most studies of the MHC, and associated regions, focus on minor variants and HLA typing, many of which have been demonstrated to be associated with human disease susceptibility and metabolic pathways. However, the detection of variants in the MHC region, and diagnostic HLA typing, still lacks a coherent, standardized, cost effective and high coverage protocol of clinical quality and reliability. In this paper, we presented such a method for the accurate detection of minor variants and HLA types in the human MHC region, using high-throughput, high-coverage sequencing of target regions. A probe set was designed to template upon the 8 annotated human MHC haplotypes, and to encompass the 5 megabases (Mb of the extended MHC region. We deployed our probes upon three, genetically diverse human samples for probe set evaluation, and sequencing data show that ∼97% of the MHC region, and over 99% of the genes in MHC region, are covered with sufficient depth and good evenness. 98% of genotypes called by this capture sequencing prove consistent with established HapMap genotypes. We have concurrently developed a one-step pipeline for calling any HLA type referenced in the IMGT/HLA database from this target capture sequencing data, which shows over 96% typing accuracy when deployed at 4 digital resolution. This cost-effective and highly accurate approach for variant detection and HLA typing in the MHC region may lend further insight into immune-mediated diseases studies, and may find clinical utility in transplantation medicine research. This one-step pipeline is released for general evaluation and use by the scientific community.

  11. Learning a weighted sequence model of the nucleosome core and linker yields more accurate predictions in Saccharomyces cerevisiae and Homo sapiens.

    Directory of Open Access Journals (Sweden)

    Sheila M Reynolds

    2010-07-01

    Full Text Available DNA in eukaryotes is packaged into a chromatin complex, the most basic element of which is the nucleosome. The precise positioning of the nucleosome cores allows for selective access to the DNA, and the mechanisms that control this positioning are important pieces of the gene expression puzzle. We describe a large-scale nucleosome pattern that jointly characterizes the nucleosome core and the adjacent linkers and is predominantly characterized by long-range oscillations in the mono, di- and tri-nucleotide content of the DNA sequence, and we show that this pattern can be used to predict nucleosome positions in both Homo sapiens and Saccharomyces cerevisiae more accurately than previously published methods. Surprisingly, in both H. sapiens and S. cerevisiae, the most informative individual features are the mono-nucleotide patterns, although the inclusion of di- and tri-nucleotide features results in improved performance. Our approach combines a much longer pattern than has been previously used to predict nucleosome positioning from sequence-301 base pairs, centered at the position to be scored-with a novel discriminative classification approach that selectively weights the contributions from each of the input features. The resulting scores are relatively insensitive to local AT-content and can be used to accurately discriminate putative dyad positions from adjacent linker regions without requiring an additional dynamic programming step and without the attendant edge effects and assumptions about linker length modeling and overall nucleosome density. Our approach produces the best dyad-linker classification results published to date in H. sapiens, and outperforms two recently published models on a large set of S. cerevisiae nucleosome positions. Our results suggest that in both genomes, a comparable and relatively small fraction of nucleosomes are well-positioned and that these positions are predictable based on sequence alone. We believe that the

  12. Learning a weighted sequence model of the nucleosome core and linker yields more accurate predictions in Saccharomyces cerevisiae and Homo sapiens.

    Science.gov (United States)

    Reynolds, Sheila M; Bilmes, Jeff A; Noble, William Stafford

    2010-07-08

    DNA in eukaryotes is packaged into a chromatin complex, the most basic element of which is the nucleosome. The precise positioning of the nucleosome cores allows for selective access to the DNA, and the mechanisms that control this positioning are important pieces of the gene expression puzzle. We describe a large-scale nucleosome pattern that jointly characterizes the nucleosome core and the adjacent linkers and is predominantly characterized by long-range oscillations in the mono, di- and tri-nucleotide content of the DNA sequence, and we show that this pattern can be used to predict nucleosome positions in both Homo sapiens and Saccharomyces cerevisiae more accurately than previously published methods. Surprisingly, in both H. sapiens and S. cerevisiae, the most informative individual features are the mono-nucleotide patterns, although the inclusion of di- and tri-nucleotide features results in improved performance. Our approach combines a much longer pattern than has been previously used to predict nucleosome positioning from sequence-301 base pairs, centered at the position to be scored-with a novel discriminative classification approach that selectively weights the contributions from each of the input features. The resulting scores are relatively insensitive to local AT-content and can be used to accurately discriminate putative dyad positions from adjacent linker regions without requiring an additional dynamic programming step and without the attendant edge effects and assumptions about linker length modeling and overall nucleosome density. Our approach produces the best dyad-linker classification results published to date in H. sapiens, and outperforms two recently published models on a large set of S. cerevisiae nucleosome positions. Our results suggest that in both genomes, a comparable and relatively small fraction of nucleosomes are well-positioned and that these positions are predictable based on sequence alone. We believe that the bulk of the

  13. ReadDepth: a parallel R package for detecting copy number alterations from short sequencing reads.

    Directory of Open Access Journals (Sweden)

    Christopher A Miller

    2011-01-01

    Full Text Available Copy number alterations are important contributors to many genetic diseases, including cancer. We present the readDepth package for R, which can detect these aberrations by measuring the depth of coverage obtained by massively parallel sequencing of the genome. In addition to achieving higher accuracy than existing packages, our tool runs much faster by utilizing multi-core architectures to parallelize the processing of these large data sets. In contrast to other published methods, readDepth does not require the sequencing of a reference sample, and uses a robust statistical model that accounts for overdispersed data. It includes a method for effectively increasing the resolution obtained from low-coverage experiments by utilizing breakpoint information from paired end sequencing to do positional refinement. We also demonstrate a method for inferring copy number using reads generated by whole-genome bisulfite sequencing, thus enabling integrative study of epigenomic and copy number alterations. Finally, we apply this tool to two genomes, showing that it performs well on genomes sequenced to both low and high coverage. The readDepth package runs on Linux and MacOSX, is released under the Apache 2.0 license, and is available at http://code.google.com/p/readdepth/.

  14. Accurate phylogenetic classification of DNA fragments based onsequence composition

    Energy Technology Data Exchange (ETDEWEB)

    McHardy, Alice C.; Garcia Martin, Hector; Tsirigos, Aristotelis; Hugenholtz, Philip; Rigoutsos, Isidore

    2006-05-01

    Metagenome studies have retrieved vast amounts of sequenceout of a variety of environments, leading to novel discoveries and greatinsights into the uncultured microbial world. Except for very simplecommunities, diversity makes sequence assembly and analysis a verychallenging problem. To understand the structure a 5 nd function ofmicrobial communities, a taxonomic characterization of the obtainedsequence fragments is highly desirable, yet currently limited mostly tothose sequences that contain phylogenetic marker genes. We show that forclades at the rank of domain down to genus, sequence composition allowsthe very accurate phylogenetic 10 characterization of genomic sequence.We developed a composition-based classifier, PhyloPythia, for de novophylogenetic sequence characterization and have trained it on adata setof 340 genomes. By extensive evaluation experiments we show that themethodis accurate across all taxonomic ranks considered, even forsequences that originate fromnovel organisms and are as short as 1kb.Application to two metagenome datasets 15 obtained from samples ofphosphorus-removing sludge showed that the method allows the accurateclassification at genus level of most sequence fragments from thedominant populations, while at the same time correctly characterizingeven larger parts of the samples at higher taxonomic levels.

  15. Accurate and High-Coverage Immune Repertoire Sequencing Reveals Characteristics of Antibody Repertoire Diversification in Young Children with Malaria

    Science.gov (United States)

    Jiang, Ning

    Accurately measuring the immune repertoire sequence composition, diversity, and abundance is important in studying repertoire response in infections, vaccinations, and cancer immunology. Using molecular identifiers (MIDs) to tag mRNA molecules is an effective method in improving the accuracy of immune repertoire sequencing (IR-seq). However, it is still difficult to use IR-seq on small amount of clinical samples to achieve a high coverage of the repertoire diversities. This is especially challenging in studying infections and vaccinations where B cell subpopulations with fewer cells, such as memory B cells or plasmablasts, are often of great interest to study somatic mutation patterns and diversity changes. Here, we describe an approach of IR-seq based on the use of MIDs in combination with a clustering method that can reveal more than 80% of the antibody diversity in a sample and can be applied to as few as 1,000 B cells. We applied this to study the antibody repertoires of young children before and during an acute malaria infection. We discovered unexpectedly high levels of somatic hypermutation (SHM) in infants and revealed characteristics of antibody repertoire development in young children that would have a profound impact on immunization in children.

  16. Non-invasive prenatal diagnosis of achondroplasia and thanatophoric dysplasia: next-generation sequencing allows for a safer, more accurate, and comprehensive approach.

    Science.gov (United States)

    Chitty, Lyn S; Mason, Sarah; Barrett, Angela N; McKay, Fiona; Lench, Nicholas; Daley, Rebecca; Jenkins, Lucy A

    2015-07-01

    Accurate prenatal diagnosis of genetic conditions can be challenging and usually requires invasive testing. Here, we demonstrate the potential of next-generation sequencing (NGS) for the analysis of cell-free DNA in maternal blood to transform prenatal diagnosis of monogenic disorders. Analysis of cell-free DNA using a PCR and restriction enzyme digest (PCR-RED) was compared with a novel NGS assay in pregnancies at risk of achondroplasia and thanatophoric dysplasia. PCR-RED was performed in 72 cases and was correct in 88.6%, inconclusive in 7% with one false negative. NGS was performed in 47 cases and was accurate in 96.2% with no inconclusives. Both approaches were used in 27 cases, with NGS giving the correct result in the two cases inconclusive with PCR-RED. NGS provides an accurate, flexible approach to non-invasive prenatal diagnosis of de novo and paternally inherited mutations. It is more sensitive than PCR-RED and is ideal when screening a gene with multiple potential pathogenic mutations. These findings highlight the value of NGS in the development of non-invasive prenatal diagnosis for other monogenic disorders. © 2015 John Wiley & Sons, Ltd.

  17. PRIMAL: Fast and accurate pedigree-based imputation from sequence data in a founder population.

    Directory of Open Access Journals (Sweden)

    Oren E Livne

    2015-03-01

    Full Text Available Founder populations and large pedigrees offer many well-known advantages for genetic mapping studies, including cost-efficient study designs. Here, we describe PRIMAL (PedigRee IMputation ALgorithm, a fast and accurate pedigree-based phasing and imputation algorithm for founder populations. PRIMAL incorporates both existing and original ideas, such as a novel indexing strategy of Identity-By-Descent (IBD segments based on clique graphs. We were able to impute the genomes of 1,317 South Dakota Hutterites, who had genome-wide genotypes for ~300,000 common single nucleotide variants (SNVs, from 98 whole genome sequences. Using a combination of pedigree-based and LD-based imputation, we were able to assign 87% of genotypes with >99% accuracy over the full range of allele frequencies. Using the IBD cliques we were also able to infer the parental origin of 83% of alleles, and genotypes of deceased recent ancestors for whom no genotype information was available. This imputed data set will enable us to better study the relative contribution of rare and common variants on human phenotypes, as well as parental origin effect of disease risk alleles in >1,000 individuals at minimal cost.

  18. 3T MRI of the knee with optimised isotropic 3D sequences. Accurate delineation of intra-articular pathology without prolonged acquisition times

    Energy Technology Data Exchange (ETDEWEB)

    Abdulaal, Osamah M.; Rainford, Louise; Galligan, Marie; McGee, Allison [University College Dublin, Radiography and Diagnostic Imaging, School of Medicine, Belfield, Dublin (Ireland); MacMahon, Peter; Kavanagh, Eoin [Mater Misericordiae University Hospital, Department of Radiology, Dublin (Ireland); University College Dublin, School of Medicine, Dublin (Ireland); Cashman, James [Mater Misericordiae University Hospital, Department of Orthopaedics, Dublin (Ireland); University College Dublin, School of Medicine, Dublin (Ireland)

    2017-11-15

    To investigate optimised isotropic 3D turbo spin echo (TSE) and gradient echo (GRE)-based pulse sequences for visualisation of articular cartilage lesions within the knee joint. Optimisation of experimental imaging sequences was completed using healthy volunteers (n=16) with a 3-Tesla (3T) MRI scanner. Imaging of patients with knee cartilage abnormalities (n=57) was then performed. Acquired sequences included 3D proton density-weighted (PDW) TSE (SPACE) with and without fat-suppression (FS), and T2*W GRE (TrueFISP) sequences, with acquisition times of 6:51, 6:32 and 5:35 min, respectively. One hundred sixty-one confirmed cartilage lesions were detected and categorised (Grade II n=90, Grade III n=71). The highest sensitivity and specificity for detecting cartilage lesions were obtained with TrueFISP with values of 84.7% and 92%, respectively. Cartilage SNR mean for PDW SPACE-FS was the highest at 72.2. TrueFISP attained the highest CNR means for joint fluid/cartilage (101.5) and joint fluid/ligament (156.5), and the lowest CNR for cartilage/meniscus (48.5). Significant differences were identified across the three sequences for all anatomical structures with respect to SNR and CNR findings (p-value <0.05). Isotropic TrueFISP at 3T, optimised for acquisition time, accurately detects cartilage defects, although it demonstrated the lowest contrast between cartilage and meniscus. (orig.)

  19. BS-virus-finder

    DEFF Research Database (Denmark)

    Gao, Shengjie; Hu, Xuesong; Xu, Fengping

    2018-01-01

    Background: DNA methylation plays a key role in the regulation of gene expression and carcinogenesis. Bisulfite sequencing studies mainly focus on calling SNP, DMR, and ASM. Until now, only a few software tools focus on virus integration using bisulfite sequencing data. Findings: We have developed...... a new and easy-to-use software tool, named BS-virus-finder (BSVF, RRID:SCR_015727), to detect viral integration breakpoints in whole human genomes. The tool is hosted at https://github.com/BGI-SZ/BSVF. Conclusions: BS-virus-finder demonstrates high sensitivity and specificity. It is useful in epigenetic...

  20. Mapping DNA methylation by transverse current sequencing: Reduction of noise from neighboring nucleotides

    Science.gov (United States)

    Alvarez, Jose; Massey, Steven; Kalitsov, Alan; Velev, Julian

    Nanopore sequencing via transverse current has emerged as a competitive candidate for mapping DNA methylation without needed bisulfite-treatment, fluorescent tag, or PCR amplification. By eliminating the error producing amplification step, long read lengths become feasible, which greatly simplifies the assembly process and reduces the time and the cost inherent in current technologies. However, due to the large error rates of nanopore sequencing, single base resolution has not been reached. A very important source of noise is the intrinsic structural noise in the electric signature of the nucleotide arising from the influence of neighboring nucleotides. In this work we perform calculations of the tunneling current through DNA molecules in nanopores using the non-equilibrium electron transport method within an effective multi-orbital tight-binding model derived from first-principles calculations. We develop a base-calling algorithm accounting for the correlations of the current through neighboring bases, which in principle can reduce the error rate below any desired precision. Using this method we show that we can clearly distinguish DNA methylation and other base modifications based on the reading of the tunneling current.

  1. Rapid and accurate pyrosequencing of angiosperm plastid genomes

    Science.gov (United States)

    Moore, Michael J; Dhingra, Amit; Soltis, Pamela S; Shaw, Regina; Farmerie, William G; Folta, Kevin M; Soltis, Douglas E

    2006-01-01

    Background Plastid genome sequence information is vital to several disciplines in plant biology, including phylogenetics and molecular biology. The past five years have witnessed a dramatic increase in the number of completely sequenced plastid genomes, fuelled largely by advances in conventional Sanger sequencing technology. Here we report a further significant reduction in time and cost for plastid genome sequencing through the successful use of a newly available pyrosequencing platform, the Genome Sequencer 20 (GS 20) System (454 Life Sciences Corporation), to rapidly and accurately sequence the whole plastid genomes of the basal eudicot angiosperms Nandina domestica (Berberidaceae) and Platanus occidentalis (Platanaceae). Results More than 99.75% of each plastid genome was simultaneously obtained during two GS 20 sequence runs, to an average depth of coverage of 24.6× in Nandina and 17.3× in Platanus. The Nandina and Platanus plastid genomes shared essentially identical gene complements and possessed the typical angiosperm plastid structure and gene arrangement. To assess the accuracy of the GS 20 sequence, over 45 kilobases of sequence were generated for each genome using conventional sequencing. Overall error rates of 0.043% and 0.031% were observed in GS 20 sequence for Nandina and Platanus, respectively. More than 97% of all observed errors were associated with homopolymer runs, with ~60% of all errors associated with homopolymer runs of 5 or more nucleotides and ~50% of all errors associated with regions of extensive homopolymer runs. No substitution errors were present in either genome. Error rates were generally higher in the single-copy and noncoding regions of both plastid genomes relative to the inverted repeat and coding regions. Conclusion Highly accurate and essentially complete sequence information was obtained for the Nandina and Platanus plastid genomes using the GS 20 System. More importantly, the high accuracy observed in the GS 20 plastid

  2. Rapid and accurate pyrosequencing of angiosperm plastid genomes

    Directory of Open Access Journals (Sweden)

    Farmerie William G

    2006-08-01

    Full Text Available Abstract Background Plastid genome sequence information is vital to several disciplines in plant biology, including phylogenetics and molecular biology. The past five years have witnessed a dramatic increase in the number of completely sequenced plastid genomes, fuelled largely by advances in conventional Sanger sequencing technology. Here we report a further significant reduction in time and cost for plastid genome sequencing through the successful use of a newly available pyrosequencing platform, the Genome Sequencer 20 (GS 20 System (454 Life Sciences Corporation, to rapidly and accurately sequence the whole plastid genomes of the basal eudicot angiosperms Nandina domestica (Berberidaceae and Platanus occidentalis (Platanaceae. Results More than 99.75% of each plastid genome was simultaneously obtained during two GS 20 sequence runs, to an average depth of coverage of 24.6× in Nandina and 17.3× in Platanus. The Nandina and Platanus plastid genomes shared essentially identical gene complements and possessed the typical angiosperm plastid structure and gene arrangement. To assess the accuracy of the GS 20 sequence, over 45 kilobases of sequence were generated for each genome using conventional sequencing. Overall error rates of 0.043% and 0.031% were observed in GS 20 sequence for Nandina and Platanus, respectively. More than 97% of all observed errors were associated with homopolymer runs, with ~60% of all errors associated with homopolymer runs of 5 or more nucleotides and ~50% of all errors associated with regions of extensive homopolymer runs. No substitution errors were present in either genome. Error rates were generally higher in the single-copy and noncoding regions of both plastid genomes relative to the inverted repeat and coding regions. Conclusion Highly accurate and essentially complete sequence information was obtained for the Nandina and Platanus plastid genomes using the GS 20 System. More importantly, the high accuracy

  3. Identification of Differentially Methylated Sites with Weak Methylation Effects

    Directory of Open Access Journals (Sweden)

    Hong Tran

    2018-02-01

    Full Text Available Deoxyribonucleic acid (DNA methylation is an epigenetic alteration crucial for regulating stress responses. Identifying large-scale DNA methylation at single nucleotide resolution is made possible by whole genome bisulfite sequencing. An essential task following the generation of bisulfite sequencing data is to detect differentially methylated cytosines (DMCs among treatments. Most statistical methods for DMC detection do not consider the dependency of methylation patterns across the genome, thus possibly inflating type I error. Furthermore, small sample sizes and weak methylation effects among different phenotype categories make it difficult for these statistical methods to accurately detect DMCs. To address these issues, the wavelet-based functional mixed model (WFMM was introduced to detect DMCs. To further examine the performance of WFMM in detecting weak differential methylation events, we used both simulated and empirical data and compare WFMM performance to a popular DMC detection tool methylKit. Analyses of simulated data that replicated the effects of the herbicide glyphosate on DNA methylation in Arabidopsis thaliana show that WFMM results in higher sensitivity and specificity in detecting DMCs compared to methylKit, especially when the methylation differences among phenotype groups are small. Moreover, the performance of WFMM is robust with respect to small sample sizes, making it particularly attractive considering the current high costs of bisulfite sequencing. Analysis of empirical Arabidopsis thaliana data under varying glyphosate dosages, and the analysis of monozygotic (MZ twins who have different pain sensitivities—both datasets have weak methylation effects of <1%—show that WFMM can identify more relevant DMCs related to the phenotype of interest than methylKit. Differentially methylated regions (DMRs are genomic regions with different DNA methylation status across biological samples. DMRs and DMCs are essentially the same

  4. AMID: Accurate Magnetic Indoor Localization Using Deep Learning

    Directory of Open Access Journals (Sweden)

    Namkyoung Lee

    2018-05-01

    Full Text Available Geomagnetic-based indoor positioning has drawn a great attention from academia and industry due to its advantage of being operable without infrastructure support and its reliable signal characteristics. However, it must overcome the problems of ambiguity that originate with the nature of geomagnetic data. Most studies manage this problem by incorporating particle filters along with inertial sensors. However, they cannot yield reliable positioning results because the inertial sensors in smartphones cannot precisely predict the movement of users. There have been attempts to recognize the magnetic sequence pattern, but these attempts are proven only in a one-dimensional space, because magnetic intensity fluctuates severely with even a slight change of locations. This paper proposes accurate magnetic indoor localization using deep learning (AMID, an indoor positioning system that recognizes magnetic sequence patterns using a deep neural network. Features are extracted from magnetic sequences, and then the deep neural network is used for classifying the sequences by patterns that are generated by nearby magnetic landmarks. Locations are estimated by detecting the landmarks. AMID manifested the proposed features and deep learning as an outstanding classifier, revealing the potential of accurate magnetic positioning with smartphone sensors alone. The landmark detection accuracy was over 80% in a two-dimensional environment.

  5. Fast and accurate methods for phylogenomic analyses

    Directory of Open Access Journals (Sweden)

    Warnow Tandy

    2011-10-01

    Full Text Available Abstract Background Species phylogenies are not estimated directly, but rather through phylogenetic analyses of different gene datasets. However, true gene trees can differ from the true species tree (and hence from one another due to biological processes such as horizontal gene transfer, incomplete lineage sorting, and gene duplication and loss, so that no single gene tree is a reliable estimate of the species tree. Several methods have been developed to estimate species trees from estimated gene trees, differing according to the specific algorithmic technique used and the biological model used to explain differences between species and gene trees. Relatively little is known about the relative performance of these methods. Results We report on a study evaluating several different methods for estimating species trees from sequence datasets, simulating sequence evolution under a complex model including indels (insertions and deletions, substitutions, and incomplete lineage sorting. The most important finding of our study is that some fast and simple methods are nearly as accurate as the most accurate methods, which employ sophisticated statistical methods and are computationally quite intensive. We also observe that methods that explicitly consider errors in the estimated gene trees produce more accurate trees than methods that assume the estimated gene trees are correct. Conclusions Our study shows that highly accurate estimations of species trees are achievable, even when gene trees differ from each other and from the species tree, and that these estimations can be obtained using fairly simple and computationally tractable methods.

  6. Rapid Multiplex Small DNA Sequencing on the MinION Nanopore Sequencing Platform

    Directory of Open Access Journals (Sweden)

    Shan Wei

    2018-05-01

    Full Text Available Real-time sequencing of short DNA reads has a wide variety of clinical and research applications including screening for mutations, target sequences and aneuploidy. We recently demonstrated that MinION, a nanopore-based DNA sequencing device the size of a USB drive, could be used for short-read DNA sequencing. In this study, an ultra-rapid multiplex library preparation and sequencing method for the MinION is presented and applied to accurately test normal diploid and aneuploidy samples’ genomic DNA in under three hours, including library preparation and sequencing. This novel method shows great promise as a clinical diagnostic test for applications requiring rapid short-read DNA sequencing.

  7. MethylMeter(®): bisulfite-free quantitative and sensitive DNA methylation profiling and mutation detection in FFPE samples.

    Science.gov (United States)

    McCarthy, David; Pulverer, Walter; Weinhaeusel, Andreas; Diago, Oscar R; Hogan, Daniel J; Ostertag, Derek; Hanna, Michelle M

    2016-06-01

    Development of a sensitive method for DNA methylation profiling and associated mutation detection in clinical samples. Formalin-fixed and paraffin-embedded tumors received by clinical laboratories often contain insufficient DNA for analysis with bisulfite or methylation sensitive restriction enzymes-based methods. To increase sensitivity, methyl-CpG DNA capture and Coupled Abscription PCR Signaling detection were combined in a new assay, MethylMeter(®). Gliomas were analyzed for MGMT methylation, glioma CpG island methylator phenotype and IDH1 R132H. MethylMeter had 100% assay success rate measuring all five biomarkers in formalin-fixed and paraffin-embedded tissue. MGMT methylation results were supported by survival and mRNA expression data. MethylMeter is a sensitive and quantitative method for multitarget DNA methylation profiling and associated mutation detection. The MethylMeter-based GliomaSTRAT assay measures methylation of four targets and one mutation to simultaneously grade gliomas and predict their response to temozolomide. This information is clinically valuable in management of gliomas.

  8. CaMV-35S promoter sequence-specific DNA methylation in lettuce.

    Science.gov (United States)

    Okumura, Azusa; Shimada, Asahi; Yamasaki, Satoshi; Horino, Takuya; Iwata, Yuji; Koizumi, Nozomu; Nishihara, Masahiro; Mishiba, Kei-ichiro

    2016-01-01

    We found 35S promoter sequence-specific DNA methylation in lettuce. Additionally, transgenic lettuce plants having a modified 35S promoter lost methylation, suggesting the modified sequence is subjected to the methylation machinery. We previously reported that cauliflower mosaic virus 35S promoter-specific DNA methylation in transgenic gentian (Gentiana triflora × G. scabra) plants occurs irrespective of the copy number and the genomic location of T-DNA, and causes strong gene silencing. To confirm whether 35S-specific methylation can occur in other plant species, transgenic lettuce (Lactuca sativa L.) plants with a single copy of the 35S promoter-driven sGFP gene were produced and analyzed. Among 10 lines of transgenic plants, 3, 4, and 3 lines showed strong, weak, and no expression of sGFP mRNA, respectively. Bisulfite genomic sequencing of the 35S promoter region showed hypermethylation at CpG and CpWpG (where W is A or T) sites in 9 of 10 lines. Gentian-type de novo methylation pattern, consisting of methylated cytosines at CpHpH (where H is A, C, or T) sites, was also observed in the transgenic lettuce lines, suggesting that lettuce and gentian share similar methylation machinery. Four of five transgenic lettuce lines having a single copy of a modified 35S promoter, which was modified in the proposed core target of de novo methylation in gentian, exhibited 35S hypomethylation, indicating that the modified sequence may be the target of the 35S-specific methylation machinery.

  9. Enhanced degradation of organic contaminants in water by peroxydisulfate coupled with bisulfite

    Energy Technology Data Exchange (ETDEWEB)

    Qi, Chengdu, E-mail: qichengdu@mail.tsinghua.edu.cn [State Key Laboratory of Water Environment Simulation, School of Environment, Beijing Normal University, Beijing 100875 (China); State Key Joint Laboratory of Environment Simulation and Pollution Control (SKLESPC), Beijing Key Laboratory for Emerging Organic Contaminants Control, School of Environment, Tsinghua University, Beijing 100084 (China); Liu, Xitao, E-mail: liuxt@bnu.edu.cn [State Key Laboratory of Water Environment Simulation, School of Environment, Beijing Normal University, Beijing 100875 (China); Li, Yang; Lin, Chunye; Ma, Jun; Li, Xiaowan; Zhang, Huijuan [State Key Laboratory of Water Environment Simulation, School of Environment, Beijing Normal University, Beijing 100875 (China)

    2017-04-15

    Highlights: • S(IV)/PDS system showed synergistic degradation of BPA than S(IV) and PDS. • BPA degradation involved hydroxyl and oxysulfur radicals in the S(IV)/PDS system. • Based on the identified intermediates, the BPA degradation pathway was proposed. - Abstract: In this study, the bisulfite-peroxydisulfate system (S(IV)/PDS) widely used in polymerization was innovatively applied for organic contaminants degradation in water. The addition of S(IV) into PDS system remarkably enhanced the degradation efficiency of bisphenol A (BPA, a frequently detected endocrine disrupting chemical in the environments) from 17.0% to 84.7% within 360 min. The degradation efficiency of BPA in the S(IV)/PDS system followed pseudo-first-order kinetics, with rate constant values ranging from 0.00005 min{sup −1} to 0.02717 min{sup −1} depending on the operating parameters, such as the initial S(IV) and PDS dosage, solution pH, reaction temperature, chloride and water type. Furthermore, nitrogen purging experiment, radical scavenging experiment and electron spin resonance (ESR) analysis were used to elucidate the possible mechanism. The results revealed that sulfate radical was the dominant reactive species in the S(IV)/PDS system. Finally, based on the results of liquid chromatography–mass spectrometry (LC–MS) and gas chromatography–mass spectrometry (GC–MS), the BPA degradation pathway was proposed to involve β-scission (C−C), hydroxylation, dehydration, oxidative skeletal rearrangement, and ring opening. This study helps to characterize the combination of PDS and inorganic S(IV), a common industrial contaminant, to generate reactive species to enhance organic contaminants degradation in water.

  10. Learning a Weighted Sequence Model of the Nucleosome Core and Linker Yields More Accurate Predictions in Saccharomyces cerevisiae and Homo sapiens

    Science.gov (United States)

    Reynolds, Sheila M.; Bilmes, Jeff A.; Noble, William Stafford

    2010-01-01

    DNA in eukaryotes is packaged into a chromatin complex, the most basic element of which is the nucleosome. The precise positioning of the nucleosome cores allows for selective access to the DNA, and the mechanisms that control this positioning are important pieces of the gene expression puzzle. We describe a large-scale nucleosome pattern that jointly characterizes the nucleosome core and the adjacent linkers and is predominantly characterized by long-range oscillations in the mono, di- and tri-nucleotide content of the DNA sequence, and we show that this pattern can be used to predict nucleosome positions in both Homo sapiens and Saccharomyces cerevisiae more accurately than previously published methods. Surprisingly, in both H. sapiens and S. cerevisiae, the most informative individual features are the mono-nucleotide patterns, although the inclusion of di- and tri-nucleotide features results in improved performance. Our approach combines a much longer pattern than has been previously used to predict nucleosome positioning from sequence—301 base pairs, centered at the position to be scored—with a novel discriminative classification approach that selectively weights the contributions from each of the input features. The resulting scores are relatively insensitive to local AT-content and can be used to accurately discriminate putative dyad positions from adjacent linker regions without requiring an additional dynamic programming step and without the attendant edge effects and assumptions about linker length modeling and overall nucleosome density. Our approach produces the best dyad-linker classification results published to date in H. sapiens, and outperforms two recently published models on a large set of S. cerevisiae nucleosome positions. Our results suggest that in both genomes, a comparable and relatively small fraction of nucleosomes are well-positioned and that these positions are predictable based on sequence alone. We believe that the bulk of the

  11. MAFsnp: A Multi-Sample Accurate and Flexible SNP Caller Using Next-Generation Sequencing Data

    Science.gov (United States)

    Hu, Jiyuan; Li, Tengfei; Xiu, Zidi; Zhang, Hong

    2015-01-01

    Most existing statistical methods developed for calling single nucleotide polymorphisms (SNPs) using next-generation sequencing (NGS) data are based on Bayesian frameworks, and there does not exist any SNP caller that produces p-values for calling SNPs in a frequentist framework. To fill in this gap, we develop a new method MAFsnp, a Multiple-sample based Accurate and Flexible algorithm for calling SNPs with NGS data. MAFsnp is based on an estimated likelihood ratio test (eLRT) statistic. In practical situation, the involved parameter is very close to the boundary of the parametric space, so the standard large sample property is not suitable to evaluate the finite-sample distribution of the eLRT statistic. Observing that the distribution of the test statistic is a mixture of zero and a continuous part, we propose to model the test statistic with a novel two-parameter mixture distribution. Once the parameters in the mixture distribution are estimated, p-values can be easily calculated for detecting SNPs, and the multiple-testing corrected p-values can be used to control false discovery rate (FDR) at any pre-specified level. With simulated data, MAFsnp is shown to have much better control of FDR than the existing SNP callers. Through the application to two real datasets, MAFsnp is also shown to outperform the existing SNP callers in terms of calling accuracy. An R package “MAFsnp” implementing the new SNP caller is freely available at http://homepage.fudan.edu.cn/zhangh/softwares/. PMID:26309201

  12. Indexed variation graphs for efficient and accurate resistome profiling.

    Science.gov (United States)

    Rowe, Will P M; Winn, Martyn D

    2018-05-14

    Antimicrobial resistance remains a major threat to global health. Profiling the collective antimicrobial resistance genes within a metagenome (the "resistome") facilitates greater understanding of antimicrobial resistance gene diversity and dynamics. In turn, this can allow for gene surveillance, individualised treatment of bacterial infections and more sustainable use of antimicrobials. However, resistome profiling can be complicated by high similarity between reference genes, as well as the sheer volume of sequencing data and the complexity of analysis workflows. We have developed an efficient and accurate method for resistome profiling that addresses these complications and improves upon currently available tools. Our method combines a variation graph representation of gene sets with an LSH Forest indexing scheme to allow for fast classification of metagenomic sequence reads using similarity-search queries. Subsequent hierarchical local alignment of classified reads against graph traversals enables accurate reconstruction of full-length gene sequences using a scoring scheme. We provide our implementation, GROOT, and show it to be both faster and more accurate than a current reference-dependent tool for resistome profiling. GROOT runs on a laptop and can process a typical 2 gigabyte metagenome in 2 minutes using a single CPU. Our method is not restricted to resistome profiling and has the potential to improve current metagenomic workflows. GROOT is written in Go and is available at https://github.com/will-rowe/groot (MIT license). will.rowe@stfc.ac.uk. Supplementary data are available at Bioinformatics online.

  13. Short read sequence typing (SRST: multi-locus sequence types from short reads

    Directory of Open Access Journals (Sweden)

    Inouye Michael

    2012-07-01

    Full Text Available Abstract Background Multi-locus sequence typing (MLST has become the gold standard for population analyses of bacterial pathogens. This method focuses on the sequences of a small number of loci (usually seven to divide the population and is simple, robust and facilitates comparison of results between laboratories and over time. Over the last decade, researchers and population health specialists have invested substantial effort in building up public MLST databases for nearly 100 different bacterial species, and these databases contain a wealth of important information linked to MLST sequence types such as time and place of isolation, host or niche, serotype and even clinical or drug resistance profiles. Recent advances in sequencing technology mean it is increasingly feasible to perform bacterial population analysis at the whole genome level. This offers massive gains in resolving power and genetic profiling compared to MLST, and will eventually replace MLST for bacterial typing and population analysis. However given the wealth of data currently available in MLST databases, it is crucial to maintain backwards compatibility with MLST schemes so that new genome analyses can be understood in their proper historical context. Results We present a software tool, SRST, for quick and accurate retrieval of sequence types from short read sets, using inputs easily downloaded from public databases. SRST uses read mapping and an allele assignment score incorporating sequence coverage and variability, to determine the most likely allele at each MLST locus. Analysis of over 3,500 loci in more than 500 publicly accessible Illumina read sets showed SRST to be highly accurate at allele assignment. SRST output is compatible with common analysis tools such as eBURST, Clonal Frame or PhyloViz, allowing easy comparison between novel genome data and MLST data. Alignment, fastq and pileup files can also be generated for novel alleles. Conclusions SRST is a novel

  14. Pairagon: a highly accurate, HMM-based cDNA-to-genome aligner.

    Science.gov (United States)

    Lu, David V; Brown, Randall H; Arumugam, Manimozhiyan; Brent, Michael R

    2009-07-01

    The most accurate way to determine the intron-exon structures in a genome is to align spliced cDNA sequences to the genome. Thus, cDNA-to-genome alignment programs are a key component of most annotation pipelines. The scoring system used to choose the best alignment is a primary determinant of alignment accuracy, while heuristics that prevent consideration of certain alignments are a primary determinant of runtime and memory usage. Both accuracy and speed are important considerations in choosing an alignment algorithm, but scoring systems have received much less attention than heuristics. We present Pairagon, a pair hidden Markov model based cDNA-to-genome alignment program, as the most accurate aligner for sequences with high- and low-identity levels. We conducted a series of experiments testing alignment accuracy with varying sequence identity. We first created 'perfect' simulated cDNA sequences by splicing the sequences of exons in the reference genome sequences of fly and human. The complete reference genome sequences were then mutated to various degrees using a realistic mutation simulator and the perfect cDNAs were aligned to them using Pairagon and 12 other aligners. To validate these results with natural sequences, we performed cross-species alignment using orthologous transcripts from human, mouse and rat. We found that aligner accuracy is heavily dependent on sequence identity. For sequences with 100% identity, Pairagon achieved accuracy levels of >99.6%, with one quarter of the errors of any other aligner. Furthermore, for human/mouse alignments, which are only 85% identical, Pairagon achieved 87% accuracy, higher than any other aligner. Pairagon source and executables are freely available at http://mblab.wustl.edu/software/pairagon/

  15. Using structure to explore the sequence alignment space of remote homologs.

    Science.gov (United States)

    Kuziemko, Andrew; Honig, Barry; Petrey, Donald

    2011-10-01

    Protein structure modeling by homology requires an accurate sequence alignment between the query protein and its structural template. However, sequence alignment methods based on dynamic programming (DP) are typically unable to generate accurate alignments for remote sequence homologs, thus limiting the applicability of modeling methods. A central problem is that the alignment that is "optimal" in terms of the DP score does not necessarily correspond to the alignment that produces the most accurate structural model. That is, the correct alignment based on structural superposition will generally have a lower score than the optimal alignment obtained from sequence. Variations of the DP algorithm have been developed that generate alternative alignments that are "suboptimal" in terms of the DP score, but these still encounter difficulties in detecting the correct structural alignment. We present here a new alternative sequence alignment method that relies heavily on the structure of the template. By initially aligning the query sequence to individual fragments in secondary structure elements and combining high-scoring fragments that pass basic tests for "modelability", we can generate accurate alignments within a small ensemble. Our results suggest that the set of sequences that can currently be modeled by homology can be greatly extended.

  16. Environmental assessment of mild bisulfite pretreatment of forest residues into fermentable sugars for biofuel production.

    Science.gov (United States)

    Nwaneshiudu, Ikechukwu C; Ganguly, Indroneil; Pierobon, Francesca; Bowers, Tait; Eastin, Ivan

    2016-01-01

    Sugar production via pretreatment and enzymatic hydrolysis of cellulosic feedstock, in this case softwood harvest residues, is a critical step in the biochemical conversion pathway towards drop-in biofuels. Mild bisulfite (MBS) pretreatment is an emerging option for the breakdown and subsequent processing of biomass towards fermentable sugars. An environmental assessment of this process is critical to discern its future sustainability in the ever-changing biofuels landscape. The subsequent cradle-to-gate assessment of a proposed sugar production facility analyzes sugar made from woody biomass using MBS pretreatment across all seven impact categories (functional unit 1 kg dry mass sugar), with a specific focus on potential global warming and eutrophication impacts. The study found that the eutrophication impact (0.000201 kg N equivalent) is less than the impacts from conventional beet and cane sugars, while the global warming impact (0.353 kg CO2 equivalent) falls within the range of conventional processes. This work discusses some of the environmental impacts of designing and operating a sugar production facility that uses MBS as a method of treating cellulosic forest residuals. The impacts of each unit process in the proposed facility are highlighted. A comparison to other sugar-making process is detailed and will inform the growing biofuels literature.

  17. DNA Methylation as a Biomarker for Body Fluid Identification

    Directory of Open Access Journals (Sweden)

    Rania Gomaa

    2017-12-01

    Full Text Available Currently, available identification techniques for forensic samples are either enzyme or protein based, which can be subjected to degradation, thus limiting its storage potentials. Epigenetic changes arising due to DNA methylation and histone acetylation can be used for body fluid identification. Markers DACT1, USP49, ZC3H12D, FGF7, cg23521140, cg17610929, chromosome 4 (25287119–25287254, chromosome 11 (72085678–72085798, 57171095–57171236, 1493401–1493538, and chromosome 19 (47395505–47395651 are currently being used for semen identification. Markers cg26107890, cg20691722, cg01774894 and cg14991487 are used to differentiate saliva and vaginal secretions from other body fluids. However, such markers show overlapping methylation pattern. This review article aimed to highlight the feasibility of using DNA methylation of certain genetic markers in body fluid identification and its implications for forensic investigations. The reviewed articles have employed molecular genetics techniques such as Bisulfite sequencing PCR (BSP, methylation specific PCR (MSP, Pyrosequencing, Combined Bisulfite Restriction Analysis (COBRA, Methylation-sensitive Single Nucleotide Primer Extension (SNuPE, and Multiplex SNaPshot Microarray. Bioinformatics software such as MATLAB and BiQ Analyzer has been used. Biological fluids have different methylation patterns and thus, this difference can be used to identify the nature of the biological fluid found at the crime scene. Using DNA methylation to identify the body fluids gives accurate results without consumption of the trace evidence and requires a minute amount of DNA for analysis. Recent studies have incorporated next-generation sequencing aiming to find out more reliable markers that can differentiate between different body fluids. Nonetheless, new DNA methylation markers are yet to be discovered to accurately differentiate between saliva and vaginal secretions with high confidence. Epigenetic changes are

  18. Genomic Prediction from Whole Genome Sequence in Livestock: The 1000 Bull Genomes Project

    DEFF Research Database (Denmark)

    Hayes, Benjamin J; MacLeod, Iona M; Daetwyler, Hans D

    Advantages of using whole genome sequence data to predict genomic estimated breeding values (GEBV) include better persistence of accuracy of GEBV across generations and more accurate GEBV across breeds. The 1000 Bull Genomes Project provides a database of whole genome sequenced key ancestor bulls....... In a dairy data set, predictions using BayesRC and imputed sequence data from 1000 Bull Genomes were 2% more accurate than with 800k data. We could demonstrate the method identified causal mutations in some cases. Further improvements will come from more accurate imputation of sequence variant genotypes...

  19. Using structure to explore the sequence alignment space of remote homologs.

    Directory of Open Access Journals (Sweden)

    Andrew Kuziemko

    2011-10-01

    Full Text Available Protein structure modeling by homology requires an accurate sequence alignment between the query protein and its structural template. However, sequence alignment methods based on dynamic programming (DP are typically unable to generate accurate alignments for remote sequence homologs, thus limiting the applicability of modeling methods. A central problem is that the alignment that is "optimal" in terms of the DP score does not necessarily correspond to the alignment that produces the most accurate structural model. That is, the correct alignment based on structural superposition will generally have a lower score than the optimal alignment obtained from sequence. Variations of the DP algorithm have been developed that generate alternative alignments that are "suboptimal" in terms of the DP score, but these still encounter difficulties in detecting the correct structural alignment. We present here a new alternative sequence alignment method that relies heavily on the structure of the template. By initially aligning the query sequence to individual fragments in secondary structure elements and combining high-scoring fragments that pass basic tests for "modelability", we can generate accurate alignments within a small ensemble. Our results suggest that the set of sequences that can currently be modeled by homology can be greatly extended.

  20. Historian: accurate reconstruction of ancestral sequences and evolutionary rates.

    Science.gov (United States)

    Holmes, Ian H

    2017-04-15

    Reconstruction of ancestral sequence histories, and estimation of parameters like indel rates, are improved by using explicit evolutionary models and summing over uncertain alignments. The previous best tool for this purpose (according to simulation benchmarks) was ProtPal, but this tool was too slow for practical use. Historian combines an efficient reimplementation of the ProtPal algorithm with performance-improving heuristics from other alignment tools. Simulation results on fidelity of rate estimation via ancestral reconstruction, along with evaluations on the structurally informed alignment dataset BAliBase 3.0, recommend Historian over other alignment tools for evolutionary applications. Historian is available at https://github.com/evoldoers/historian under the Creative Commons Attribution 3.0 US license. ihholmes+historian@gmail.com. © The Author 2017. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com

  1. Neural network and SVM classifiers accurately predict lipid binding proteins, irrespective of sequence homology.

    Science.gov (United States)

    Bakhtiarizadeh, Mohammad Reza; Moradi-Shahrbabak, Mohammad; Ebrahimi, Mansour; Ebrahimie, Esmaeil

    2014-09-07

    Due to the central roles of lipid binding proteins (LBPs) in many biological processes, sequence based identification of LBPs is of great interest. The major challenge is that LBPs are diverse in sequence, structure, and function which results in low accuracy of sequence homology based methods. Therefore, there is a need for developing alternative functional prediction methods irrespective of sequence similarity. To identify LBPs from non-LBPs, the performances of support vector machine (SVM) and neural network were compared in this study. Comprehensive protein features and various techniques were employed to create datasets. Five-fold cross-validation (CV) and independent evaluation (IE) tests were used to assess the validity of the two methods. The results indicated that SVM outperforms neural network. SVM achieved 89.28% (CV) and 89.55% (IE) overall accuracy in identification of LBPs from non-LBPs and 92.06% (CV) and 92.90% (IE) (in average) for classification of different LBPs classes. Increasing the number and the range of extracted protein features as well as optimization of the SVM parameters significantly increased the efficiency of LBPs class prediction in comparison to the only previous report in this field. Altogether, the results showed that the SVM algorithm can be run on broad, computationally calculated protein features and offers a promising tool in detection of LBPs classes. The proposed approach has the potential to integrate and improve the common sequence alignment based methods. Copyright © 2014 Elsevier Ltd. All rights reserved.

  2. A rapid, naked-eye detection of hypochlorite and bisulfite using a robust and highly-photostable indicator dye Quinaldine Red in aqueous medium

    Science.gov (United States)

    Dutta, Tanoy; Chandra, Falguni; Koner, Apurba L.

    2018-02-01

    A ;naked-eye; detection of health hazardous bisulfite (HSO3-) and hypochlorite (ClO-) using an indicator dye (Quinaldine Red, QR) in a wide range of pH is demonstrated. The molecule contains a quinoline moiety linked to an N,N-dimethylaniline moiety with a conjugated double bond. Treatment of QR with HSO3- and ClO-, in aqueous solution at near-neutral pH, resulted in a colorless product with high selectivity and sensitivity. The detection limit was 47.8 μM and 0.2 μM for HSO3- and ClO- respectively. However, ClO- was 50 times more sensitive and with 2 times faster response compared to HSO3-. The detail characterization and related analysis demonstrate the potential of QR for a rapid, robust and highly efficient colorimetric sensor for the practical applications to detect hypochlorite in water samples.

  3. Device accurately measures and records low gas-flow rates

    Science.gov (United States)

    Branum, L. W.

    1966-01-01

    Free-floating piston in a vertical column accurately measures and records low gas-flow rates. The system may be calibrated, using an adjustable flow-rate gas supply, a low pressure gage, and a sequence recorder. From the calibration rates, a nomograph may be made for easy reduction. Temperature correction may be added for further accuracy.

  4. A robust and accurate binning algorithm for metagenomic sequences with arbitrary species abundance ratio.

    Science.gov (United States)

    Leung, Henry C M; Yiu, S M; Yang, Bin; Peng, Yu; Wang, Yi; Liu, Zhihua; Chen, Jingchi; Qin, Junjie; Li, Ruiqiang; Chin, Francis Y L

    2011-06-01

    With the rapid development of next-generation sequencing techniques, metagenomics, also known as environmental genomics, has emerged as an exciting research area that enables us to analyze the microbial environment in which we live. An important step for metagenomic data analysis is the identification and taxonomic characterization of DNA fragments (reads or contigs) resulting from sequencing a sample of mixed species. This step is referred to as 'binning'. Binning algorithms that are based on sequence similarity and sequence composition markers rely heavily on the reference genomes of known microorganisms or phylogenetic markers. Due to the limited availability of reference genomes and the bias and low availability of markers, these algorithms may not be applicable in all cases. Unsupervised binning algorithms which can handle fragments from unknown species provide an alternative approach. However, existing unsupervised binning algorithms only work on datasets either with balanced species abundance ratios or rather different abundance ratios, but not both. In this article, we present MetaCluster 3.0, an integrated binning method based on the unsupervised top--down separation and bottom--up merging strategy, which can bin metagenomic fragments of species with very balanced abundance ratios (say 1:1) to very different abundance ratios (e.g. 1:24) with consistently higher accuracy than existing methods. MetaCluster 3.0 can be downloaded at http://i.cs.hku.hk/~alse/MetaCluster/.

  5. Methylation Integration (Mint) | Informatics Technology for Cancer Research (ITCR)

    Science.gov (United States)

    A comprehensive software pipeline and set of Galaxy tools/workflows for integrative analysis of genome-wide DNA methylation and hydroxymethylation data. Data types can be either bisulfite sequencing and/or pull-down methods.

  6. Fast and accurate phylogeny reconstruction using filtered spaced-word matches

    Science.gov (United States)

    Sohrabi-Jahromi, Salma; Morgenstern, Burkhard

    2017-01-01

    Abstract Motivation: Word-based or ‘alignment-free’ algorithms are increasingly used for phylogeny reconstruction and genome comparison, since they are much faster than traditional approaches that are based on full sequence alignments. Existing alignment-free programs, however, are less accurate than alignment-based methods. Results: We propose Filtered Spaced Word Matches (FSWM), a fast alignment-free approach to estimate phylogenetic distances between large genomic sequences. For a pre-defined binary pattern of match and don’t-care positions, FSWM rapidly identifies spaced word-matches between input sequences, i.e. gap-free local alignments with matching nucleotides at the match positions and with mismatches allowed at the don’t-care positions. We then estimate the number of nucleotide substitutions per site by considering the nucleotides aligned at the don’t-care positions of the identified spaced-word matches. To reduce the noise from spurious random matches, we use a filtering procedure where we discard all spaced-word matches for which the overall similarity between the aligned segments is below a threshold. We show that our approach can accurately estimate substitution frequencies even for distantly related sequences that cannot be analyzed with existing alignment-free methods; phylogenetic trees constructed with FSWM distances are of high quality. A program run on a pair of eukaryotic genomes of a few hundred Mb each takes a few minutes. Availability and Implementation: The program source code for FSWM including a documentation, as well as the software that we used to generate artificial genome sequences are freely available at http://fswm.gobics.de/ Contact: chris.leimeister@stud.uni-goettingen.de Supplementary information: Supplementary data are available at Bioinformatics online. PMID:28073754

  7. Facilitating genome navigation : survey sequencing and dense radiation-hybrid gene mapping

    NARCIS (Netherlands)

    Hitte, C; Madeoy, J; Kirkness, EF; Priat, C; Lorentzen, TD; Senger, F; Thomas, D; Derrien, T; Ramirez, C; Scott, C; Evanno, G; Pullar, B; Cadieu, E; Oza, [No Value; Lourgant, K; Jaffe, DB; Tacher, S; Dreano, S; Berkova, N; Andre, C; Deloukas, P; Fraser, C; Lindblad-Toh, K; Ostrander, EA; Galibert, F

    Accurate and comprehensive sequence coverage for large genomes has been restricted to only a few species of specific interest. Lower sequence coverage (survey sequencing) of related species can yield a wealth of information about gene content and putative regulatory elements. But survey sequences

  8. Accurate identification of RNA editing sites from primitive sequence with deep neural networks.

    Science.gov (United States)

    Ouyang, Zhangyi; Liu, Feng; Zhao, Chenghui; Ren, Chao; An, Gaole; Mei, Chuan; Bo, Xiaochen; Shu, Wenjie

    2018-04-16

    RNA editing is a post-transcriptional RNA sequence alteration. Current methods have identified editing sites and facilitated research but require sufficient genomic annotations and prior-knowledge-based filtering steps, resulting in a cumbersome, time-consuming identification process. Moreover, these methods have limited generalizability and applicability in species with insufficient genomic annotations or in conditions of limited prior knowledge. We developed DeepRed, a deep learning-based method that identifies RNA editing from primitive RNA sequences without prior-knowledge-based filtering steps or genomic annotations. DeepRed achieved 98.1% and 97.9% area under the curve (AUC) in training and test sets, respectively. We further validated DeepRed using experimentally verified U87 cell RNA-seq data, achieving 97.9% positive predictive value (PPV). We demonstrated that DeepRed offers better prediction accuracy and computational efficiency than current methods with large-scale, mass RNA-seq data. We used DeepRed to assess the impact of multiple factors on editing identification with RNA-seq data from the Association of Biomolecular Resource Facilities and Sequencing Quality Control projects. We explored developmental RNA editing pattern changes during human early embryogenesis and evolutionary patterns in Drosophila species and the primate lineage using DeepRed. Our work illustrates DeepRed's state-of-the-art performance; it may decipher the hidden principles behind RNA editing, making editing detection convenient and effective.

  9. An accurate and rapid continuous wavelet dynamic time warping algorithm for unbalanced global mapping in nanopore sequencing

    KAUST Repository

    Han, Renmin; Li, Yu; Wang, Sheng; Gao, Xin

    2017-01-01

    Long-reads, point-of-care, and PCR-free are the promises brought by nanopore sequencing. Among various steps in nanopore data analysis, the global mapping between the raw electrical current signal sequence and the expected signal sequence from

  10. Sequence History Update Tool

    Science.gov (United States)

    Khanampompan, Teerapat; Gladden, Roy; Fisher, Forest; DelGuercio, Chris

    2008-01-01

    The Sequence History Update Tool performs Web-based sequence statistics archiving for Mars Reconnaissance Orbiter (MRO). Using a single UNIX command, the software takes advantage of sequencing conventions to automatically extract the needed statistics from multiple files. This information is then used to populate a PHP database, which is then seamlessly formatted into a dynamic Web page. This tool replaces a previous tedious and error-prone process of manually editing HTML code to construct a Web-based table. Because the tool manages all of the statistics gathering and file delivery to and from multiple data sources spread across multiple servers, there is also a considerable time and effort savings. With the use of The Sequence History Update Tool what previously took minutes is now done in less than 30 seconds, and now provides a more accurate archival record of the sequence commanding for MRO.

  11. Accurate molecular diagnosis of phenylketonuria and tetrahydrobiopterin-deficient hyperphenylalaninemias using high-throughput targeted sequencing

    Science.gov (United States)

    Trujillano, Daniel; Perez, Belén; González, Justo; Tornador, Cristian; Navarrete, Rosa; Escaramis, Georgia; Ossowski, Stephan; Armengol, Lluís; Cornejo, Verónica; Desviat, Lourdes R; Ugarte, Magdalena; Estivill, Xavier

    2014-01-01

    Genetic diagnostics of phenylketonuria (PKU) and tetrahydrobiopterin (BH4) deficient hyperphenylalaninemia (BH4DH) rely on methods that scan for known mutations or on laborious molecular tools that use Sanger sequencing. We have implemented a novel and much more efficient strategy based on high-throughput multiplex-targeted resequencing of four genes (PAH, GCH1, PTS, and QDPR) that, when affected by loss-of-function mutations, cause PKU and BH4DH. We have validated this approach in a cohort of 95 samples with the previously known PAH, GCH1, PTS, and QDPR mutations and one control sample. Pooled barcoded DNA libraries were enriched using a custom NimbleGen SeqCap EZ Choice array and sequenced using a HiSeq2000 sequencer. The combination of several robust bioinformatics tools allowed us to detect all known pathogenic mutations (point mutations, short insertions/deletions, and large genomic rearrangements) in the 95 samples, without detecting spurious calls in these genes in the control sample. We then used the same capture assay in a discovery cohort of 11 uncharacterized HPA patients using a MiSeq sequencer. In addition, we report the precise characterization of the breakpoints of four genomic rearrangements in PAH, including a novel deletion of 899 bp in intron 3. Our study is a proof-of-principle that high-throughput-targeted resequencing is ready to substitute classical molecular methods to perform differential genetic diagnosis of hyperphenylalaninemias, allowing the establishment of specifically tailored treatments a few days after birth. PMID:23942198

  12. A simple and accurate two-step long DNA sequences synthesis strategy to improve heterologous gene expression in pichia.

    Directory of Open Access Journals (Sweden)

    Jiang-Ke Yang

    Full Text Available In vitro gene chemical synthesis is a powerful tool to improve the expression of gene in heterologous system. In this study, a two-step gene synthesis strategy that combines an assembly PCR and an overlap extension PCR (AOE was developed. In this strategy, the chemically synthesized oligonucleotides were assembled into several 200-500 bp fragments with 20-25 bp overlap at each end by assembly PCR, and then an overlap extension PCR was conducted to assemble all these fragments into a full length DNA sequence. Using this method, we de novo designed and optimized the codon of Rhizopus oryzae lipase gene ROL (810 bp and Aspergillus niger phytase gene phyA (1404 bp. Compared with the original ROL gene and phyA gene, the codon-optimized genes expressed at a significantly higher level in yeasts after methanol induction. We believe this AOE method to be of special interest as it is simple, accurate and has no limitation with respect to the size of the gene to be synthesized. Combined with de novo design, this method allows the rapid synthesis of a gene optimized for expression in the system of choice and production of sufficient biological material for molecular characterization and biotechnological application.

  13. Induction and maintenance of DNA methylation in plant promoter sequences by apple latent spherical virus-induced transcriptional gene silencing

    Directory of Open Access Journals (Sweden)

    Tatsuya eKon

    2014-11-01

    Full Text Available Apple latent spherical virus (ALSV is an efficient virus-induced gene silencing vector in functional genomics analyses of a broad range of plant species. Here, an Agrobacterium-mediated inoculation (agroinoculation system was developed for the ALSV vector, and virus-induced transcriptional gene silencing (VITGS is described in plants infected with the ALSV vector. The cDNAs of ALSV RNA1 and RNA2 were inserted between the CaMV 35S promoter and the NOS-T sequences in a binary vector pCAMBIA1300 to produce pCALSR1 and pCALSR2-XSB or pCALSR2-XSB/MN. When these vector constructs were agroinoculated into Nicotiana benthamiana plants with a construct expressing a viral silencing suppressor, the infection efficiency of the vectors was 100%. A recombinant ALSV vector carrying part of the 35S promoter sequence induced transcriptional gene silencing of the green fluorescent protein gene in a line of N. benthamiana plants, resulting in the disappearance of green fluorescence of infected plants. Bisulfite sequencing showed that cytosine residues at CG and CHG sites of the 35S promoter sequence were highly methylated in the silenced generation 0 plants infected with the ALSV carrying the promoter sequence as well as in progeny. The ALSV-mediated VITGS state was inherited by progeny for multiple generations. In addition, induction of VITGS of an endogenous gene (chalcone synthase-A was demonstrated in petunia plants infected with an ALSV vector carrying the native promoter sequence. These results suggest that ALSV-based vectors can be applied to study DNA methylation in plant genomes, and provide a useful tool for plant breeding via epigenetic modification.

  14. READSCAN: A fast and scalable pathogen discovery program with accurate genome relative abundance estimation

    KAUST Repository

    Naeem, Raeece

    2012-11-28

    Summary: READSCAN is a highly scalable parallel program to identify non-host sequences (of potential pathogen origin) and estimate their genome relative abundance in high-throughput sequence datasets. READSCAN accurately classified human and viral sequences on a 20.1 million reads simulated dataset in <27 min using a small Beowulf compute cluster with 16 nodes (Supplementary Material). Availability: http://cbrc.kaust.edu.sa/readscan Contact: or raeece.naeem@gmail.com Supplementary information: Supplementary data are available at Bioinformatics online. 2012 The Author(s).

  15. READSCAN: A fast and scalable pathogen discovery program with accurate genome relative abundance estimation

    KAUST Repository

    Naeem, Raeece; Rashid, Mamoon; Pain, Arnab

    2012-01-01

    Summary: READSCAN is a highly scalable parallel program to identify non-host sequences (of potential pathogen origin) and estimate their genome relative abundance in high-throughput sequence datasets. READSCAN accurately classified human and viral sequences on a 20.1 million reads simulated dataset in <27 min using a small Beowulf compute cluster with 16 nodes (Supplementary Material). Availability: http://cbrc.kaust.edu.sa/readscan Contact: or raeece.naeem@gmail.com Supplementary information: Supplementary data are available at Bioinformatics online. 2012 The Author(s).

  16. Methylation-Specific PCR Unraveled

    Directory of Open Access Journals (Sweden)

    Sarah Derks

    2004-01-01

    Full Text Available Methylation‐specific PCR (MSP is a simple, quick and cost‐effective method to analyze the DNA methylation status of virtually any group of CpG sites within a CpG island. The technique comprises two parts: (1 sodium bisulfite conversion of unmethylated cytosine's to uracil under conditions whereby methylated cytosines remains unchanged and (2 detection of the bisulfite induced sequence differences by PCR using specific primer sets for both unmethylated and methylated DNA. This review discusses the critical parameters of MSP and presents an overview of the available MSP variants and the (clinical applications.

  17. A Comprehensive Strategy for Accurate Mutation Detection of the Highly Homologous PMS2.

    Science.gov (United States)

    Li, Jianli; Dai, Hongzheng; Feng, Yanming; Tang, Jia; Chen, Stella; Tian, Xia; Gorman, Elizabeth; Schmitt, Eric S; Hansen, Terah A A; Wang, Jing; Plon, Sharon E; Zhang, Victor Wei; Wong, Lee-Jun C

    2015-09-01

    Germline mutations in the DNA mismatch repair gene PMS2 underlie the cancer susceptibility syndrome, Lynch syndrome. However, accurate molecular testing of PMS2 is complicated by a large number of highly homologous sequences. To establish a comprehensive approach for mutation detection of PMS2, we have designed a strategy combining targeted capture next-generation sequencing (NGS), multiplex ligation-dependent probe amplification, and long-range PCR followed by NGS to simultaneously detect point mutations and copy number changes of PMS2. Exonic deletions (E2 to E9, E5 to E9, E8, E10, E14, and E1 to E15), duplications (E11 to E12), and a nonsense mutation, p.S22*, were identified. Traditional multiplex ligation-dependent probe amplification and Sanger sequencing approaches cannot differentiate the origin of the exonic deletions in the 3' region when PMS2 and PMS2CL share identical sequences as a result of gene conversion. Our approach allows unambiguous identification of mutations in the active gene with a straightforward long-range-PCR/NGS method. Breakpoint analysis of multiple samples revealed that recurrent exon 14 deletions are mediated by homologous Alu sequences. Our comprehensive approach provides a reliable tool for accurate molecular analysis of genes containing multiple copies of highly homologous sequences and should improve PMS2 molecular analysis for patients with Lynch syndrome. Copyright © 2015 American Society for Investigative Pathology and the Association for Molecular Pathology. Published by Elsevier Inc. All rights reserved.

  18. Karect: accurate correction of substitution, insertion and deletion errors for next-generation sequencing data

    KAUST Repository

    Allam, Amin; Kalnis, Panos; Solovyev, Victor

    2015-01-01

    accurate than previous methods, both in terms of correcting individual-bases errors (up to 10% increase in accuracy gain) and post de novo assembly quality (up to 10% increase in NGA50). We also introduce an improved framework for evaluating the quality

  19. A reference methylome database and analysis pipeline to facilitate integrative and comparative epigenomics.

    Directory of Open Access Journals (Sweden)

    Qiang Song

    Full Text Available DNA methylation is implicated in a surprising diversity of regulatory, evolutionary processes and diseases in eukaryotes. The introduction of whole-genome bisulfite sequencing has enabled the study of DNA methylation at a single-base resolution, revealing many new aspects of DNA methylation and highlighting the usefulness of methylome data in understanding a variety of genomic phenomena. As the number of publicly available whole-genome bisulfite sequencing studies reaches into the hundreds, reliable and convenient tools for comparing and analyzing methylomes become increasingly important. We present MethPipe, a pipeline for both low and high-level methylome analysis, and MethBase, an accompanying database of annotated methylomes from the public domain. Together these resources enable researchers to extract interesting features from methylomes and compare them with those identified in public methylomes in our database.

  20. Noninvasive prenatal diagnosis of fetal trisomy 18 and trisomy 13 by maternal plasma DNA sequencing.

    NARCIS (Netherlands)

    Chen, E.Z.; Chiu, R.W.; Sun, H; Akolekar, R.; Chan, K.C.; Leung, T.Y.; Jiang, P.; Zheng, Y.W.; Lun, F.M.; Chan, L.Y.; Jin, Y.; Go, A.T.; Lau, E.T; To, W.W.; Leung, W.C.; Tang, R.Y.; Au-Yeung, S.K.; Lam, H.; Kung, Y.Y.; Zhang, X.; Vugt, J.M.G. van; Minekawa, R.; Tang, M.H.; Wang, J.; Oudejans, C.B.; Lau, T.K.; Nicolaides, K.H.; Lo, Y.M.

    2011-01-01

    Massively parallel sequencing of DNA molecules in the plasma of pregnant women has been shown to allow accurate and noninvasive prenatal detection of fetal trisomy 21. However, whether the sequencing approach is as accurate for the noninvasive prenatal diagnosis of trisomy 13 and 18 is unclear due

  1. An evolutionary model-based algorithm for accurate phylogenetic breakpoint mapping and subtype prediction in HIV-1.

    Directory of Open Access Journals (Sweden)

    Sergei L Kosakovsky Pond

    2009-11-01

    Full Text Available Genetically diverse pathogens (such as Human Immunodeficiency virus type 1, HIV-1 are frequently stratified into phylogenetically or immunologically defined subtypes for classification purposes. Computational identification of such subtypes is helpful in surveillance, epidemiological analysis and detection of novel variants, e.g., circulating recombinant forms in HIV-1. A number of conceptually and technically different techniques have been proposed for determining the subtype of a query sequence, but there is not a universally optimal approach. We present a model-based phylogenetic method for automatically subtyping an HIV-1 (or other viral or bacterial sequence, mapping the location of breakpoints and assigning parental sequences in recombinant strains as well as computing confidence levels for the inferred quantities. Our Subtype Classification Using Evolutionary ALgorithms (SCUEAL procedure is shown to perform very well in a variety of simulation scenarios, runs in parallel when multiple sequences are being screened, and matches or exceeds the performance of existing approaches on typical empirical cases. We applied SCUEAL to all available polymerase (pol sequences from two large databases, the Stanford Drug Resistance database and the UK HIV Drug Resistance Database. Comparing with subtypes which had previously been assigned revealed that a minor but substantial (approximately 5% fraction of pure subtype sequences may in fact be within- or inter-subtype recombinants. A free implementation of SCUEAL is provided as a module for the HyPhy package and the Datamonkey web server. Our method is especially useful when an accurate automatic classification of an unknown strain is desired, and is positioned to complement and extend faster but less accurate methods. Given the increasingly frequent use of HIV subtype information in studies focusing on the effect of subtype on treatment, clinical outcome, pathogenicity and vaccine design, the importance

  2. Treatment of PCR products with exonuclease I and heat-labile alkaline phosphatase improves the visibility of combined bisulfite restriction analysis

    International Nuclear Information System (INIS)

    Watanabe, Kousuke; Emoto, Noriko; Sunohara, Mitsuhiro; Kawakami, Masanori; Kage, Hidenori; Nagase, Takahide; Ohishi, Nobuya; Takai, Daiya

    2010-01-01

    Research highlights: → Incubating PCR products at a high temperature causes smears in gel electrophoresis. → Smears interfere with the interpretation of methylation analysis using COBRA. → Treatment with exonuclease I and heat-labile alkaline phosphatase eliminates smears. → The elimination of smears improves the visibility of COBRA. -- Abstract: DNA methylation plays a vital role in the regulation of gene expression. Abnormal promoter hypermethylation is an important mechanism of inactivating tumor suppressor genes in human cancers. Combined bisulfite restriction analysis (COBRA) is a widely used method for identifying the DNA methylation of specific CpG sites. Here, we report that exonuclease I and heat-labile alkaline phosphatase can be used for PCR purification for COBRA, improving the visibility of gel electrophoresis after restriction digestion. This improvement is observed when restriction digestion is performed at a high temperature, such as 60 o C or 65 o C, with BstUI and TaqI, respectively. This simple method can be applied instead of DNA purification using spin columns or phenol/chloroform extraction. It can also be applied to other situations when PCR products are digested by thermophile-derived restriction enzymes, such as PCR restriction fragment length polymorphism (RFLP) analysis.

  3. ChIP-seq Accurately Predicts Tissue-Specific Activity of Enhancers

    Energy Technology Data Exchange (ETDEWEB)

    Visel, Axel; Blow, Matthew J.; Li, Zirong; Zhang, Tao; Akiyama, Jennifer A.; Holt, Amy; Plajzer-Frick, Ingrid; Shoukry, Malak; Wright, Crystal; Chen, Feng; Afzal, Veena; Ren, Bing; Rubin, Edward M.; Pennacchio, Len A.

    2009-02-01

    A major yet unresolved quest in decoding the human genome is the identification of the regulatory sequences that control the spatial and temporal expression of genes. Distant-acting transcriptional enhancers are particularly challenging to uncover since they are scattered amongst the vast non-coding portion of the genome. Evolutionary sequence constraint can facilitate the discovery of enhancers, but fails to predict when and where they are active in vivo. Here, we performed chromatin immunoprecipitation with the enhancer-associated protein p300, followed by massively-parallel sequencing, to map several thousand in vivo binding sites of p300 in mouse embryonic forebrain, midbrain, and limb tissue. We tested 86 of these sequences in a transgenic mouse assay, which in nearly all cases revealed reproducible enhancer activity in those tissues predicted by p300 binding. Our results indicate that in vivo mapping of p300 binding is a highly accurate means for identifying enhancers and their associated activities and suggest that such datasets will be useful to study the role of tissue-specific enhancers in human biology and disease on a genome-wide scale.

  4. Zadoff-Chu coded ultrasonic signal for accurate range estimation

    KAUST Repository

    AlSharif, Mohammed H.

    2017-11-02

    This paper presents a new adaptation of Zadoff-Chu sequences for the purpose of range estimation and movement tracking. The proposed method uses Zadoff-Chu sequences utilizing a wideband ultrasonic signal to estimate the range between two devices with very high accuracy and high update rate. This range estimation method is based on time of flight (TOF) estimation using cyclic cross correlation. The system was experimentally evaluated under different noise levels and multi-user interference scenarios. For a single user, the results show less than 7 mm error for 90% of range estimates in a typical indoor environment. Under the interference from three other users, the 90% error was less than 25 mm. The system provides high estimation update rate allowing accurate tracking of objects moving with high speed.

  5. Zadoff-Chu coded ultrasonic signal for accurate range estimation

    KAUST Repository

    AlSharif, Mohammed H.; Saad, Mohamed; Siala, Mohamed; Ballal, Tarig; Boujemaa, Hatem; Al-Naffouri, Tareq Y.

    2017-01-01

    This paper presents a new adaptation of Zadoff-Chu sequences for the purpose of range estimation and movement tracking. The proposed method uses Zadoff-Chu sequences utilizing a wideband ultrasonic signal to estimate the range between two devices with very high accuracy and high update rate. This range estimation method is based on time of flight (TOF) estimation using cyclic cross correlation. The system was experimentally evaluated under different noise levels and multi-user interference scenarios. For a single user, the results show less than 7 mm error for 90% of range estimates in a typical indoor environment. Under the interference from three other users, the 90% error was less than 25 mm. The system provides high estimation update rate allowing accurate tracking of objects moving with high speed.

  6. DendroBLAST: approximate phylogenetic trees in the absence of multiple sequence alignments.

    Science.gov (United States)

    Kelly, Steven; Maini, Philip K

    2013-01-01

    The rapidly growing availability of genome information has created considerable demand for both fast and accurate phylogenetic inference algorithms. We present a novel method called DendroBLAST for reconstructing phylogenetic dendrograms/trees from protein sequences using BLAST. This method differs from other methods by incorporating a simple model of sequence evolution to test the effect of introducing sequence changes on the reliability of the bipartitions in the inferred tree. Using realistic simulated sequence data we demonstrate that this method produces phylogenetic trees that are more accurate than other commonly-used distance based methods though not as accurate as maximum likelihood methods from good quality multiple sequence alignments. In addition to tests on simulated data, we use DendroBLAST to generate input trees for a supertree reconstruction of the phylogeny of the Archaea. This independent analysis produces an approximate phylogeny of the Archaea that has both high precision and recall when compared to previously published analysis of the same dataset using conventional methods. Taken together these results demonstrate that approximate phylogenetic trees can be produced in the absence of multiple sequence alignments, and we propose that these trees will provide a platform for improving and informing downstream bioinformatic analysis. A web implementation of the DendroBLAST method is freely available for use at http://www.dendroblast.com/.

  7. DendroBLAST: approximate phylogenetic trees in the absence of multiple sequence alignments.

    Directory of Open Access Journals (Sweden)

    Steven Kelly

    Full Text Available The rapidly growing availability of genome information has created considerable demand for both fast and accurate phylogenetic inference algorithms. We present a novel method called DendroBLAST for reconstructing phylogenetic dendrograms/trees from protein sequences using BLAST. This method differs from other methods by incorporating a simple model of sequence evolution to test the effect of introducing sequence changes on the reliability of the bipartitions in the inferred tree. Using realistic simulated sequence data we demonstrate that this method produces phylogenetic trees that are more accurate than other commonly-used distance based methods though not as accurate as maximum likelihood methods from good quality multiple sequence alignments. In addition to tests on simulated data, we use DendroBLAST to generate input trees for a supertree reconstruction of the phylogeny of the Archaea. This independent analysis produces an approximate phylogeny of the Archaea that has both high precision and recall when compared to previously published analysis of the same dataset using conventional methods. Taken together these results demonstrate that approximate phylogenetic trees can be produced in the absence of multiple sequence alignments, and we propose that these trees will provide a platform for improving and informing downstream bioinformatic analysis. A web implementation of the DendroBLAST method is freely available for use at http://www.dendroblast.com/.

  8. Single base-resolution methylome of the silkworm reveals a sparse epigenomic map

    DEFF Research Database (Denmark)

    Xiang, Hui; Zhu, Jingde; Chen, Quan

    2010-01-01

    Epigenetic regulation in insects may have effects on diverse biological processes. Here we survey the methylome of a model insect, the silkworm Bombyx mori, at single-base resolution using Illumina high-throughput bisulfite sequencing (MethylC-Seq). We conservatively estimate that 0.11% of genomi...

  9. The potential role of Alu Y in the development of resistance to SN38 (Irinotecan) or oxaliplatin in colorectal cancer

    DEFF Research Database (Denmark)

    Lin, Xue; Stenvang, Jan; Rasmussen, Mads Heilskov

    2015-01-01

    toxicity induced by carcinogens or drugs can reactivate Alus by altering DNA methylation. Whether or not reactivation of Alus occurs in SN38 and oxaliplatin resistance remains unknown. Results: We applied reduced representation bisulfite sequencing (RRBS) to investigate the DNA methylome in SN38...

  10. CpG methylation differences between neurons and glia are highly conserved from mouse to human

    Science.gov (United States)

    Understanding epigenetic differences that distinguish neurons and glia is of fundamental importance to the nascent field of neuroepigenetics. A recent study used genome-wide bisulfite sequencing to survey differences in DNA methylation between these two cell types, in both humans and mice. That stud...

  11. Experimental Influences in the Accurate Measurement of Cartilage Thickness in MRI.

    Science.gov (United States)

    Wang, Nian; Badar, Farid; Xia, Yang

    2018-01-01

    Objective To study the experimental influences to the measurement of cartilage thickness by magnetic resonance imaging (MRI). Design The complete thicknesses of healthy and trypsin-degraded cartilage were measured at high-resolution MRI under different conditions, using two intensity-based imaging sequences (ultra-short echo [UTE] and multislice-multiecho [MSME]) and 3 quantitative relaxation imaging sequences (T 1 , T 2 , and T 1 ρ). Other variables included different orientations in the magnet, 2 soaking solutions (saline and phosphate buffered saline [PBS]), and external loading. Results With cartilage soaked in saline, UTE and T 1 methods yielded complete and consistent measurement of cartilage thickness, while the thickness measurement by T 2 , T 1 ρ, and MSME methods were orientation dependent. The effect of external loading on cartilage thickness is also sequence and orientation dependent. All variations in cartilage thickness in MRI could be eliminated with the use of a 100 mM PBS or imaged by UTE sequence. Conclusions The appearance of articular cartilage and the measurement accuracy of cartilage thickness in MRI can be influenced by a number of experimental factors in ex vivo MRI, from the use of various pulse sequences and soaking solutions to the health of the tissue. T 2 -based imaging sequence, both proton-intensity sequence and quantitative relaxation sequence, similarly produced the largest variations. With adequate resolution, the accurate measurement of whole cartilage tissue in clinical MRI could be utilized to detect differences between healthy and osteoarthritic cartilage after compression.

  12. LocARNA-P: Accurate boundary prediction and improved detection of structural RNAs

    DEFF Research Database (Denmark)

    Will, Sebastian; Joshi, Tejal; Hofacker, Ivo L.

    2012-01-01

    Current genomic screens for noncoding RNAs (ncRNAs) predict a large number of genomic regions containing potential structural ncRNAs. The analysis of these data requires highly accurate prediction of ncRNA boundaries and discrimination of promising candidate ncRNAs from weak predictions. Existing...... methods struggle with these goals because they rely on sequence-based multiple sequence alignments, which regularly misalign RNA structure and therefore do not support identification of structural similarities. To overcome this limitation, we compute columnwise and global reliabilities of alignments based...... on sequence and structure similarity; we refer to these structure-based alignment reliabilities as STARs. The columnwise STARs of alignments, or STAR profiles, provide a versatile tool for the manual and automatic analysis of ncRNAs. In particular, we improve the boundary prediction of the widely used nc...

  13. Sequencing intractable DNA to close microbial genomes.

    Directory of Open Access Journals (Sweden)

    Richard A Hurt

    Full Text Available Advancement in high throughput DNA sequencing technologies has supported a rapid proliferation of microbial genome sequencing projects, providing the genetic blueprint for in-depth studies. Oftentimes, difficult to sequence regions in microbial genomes are ruled "intractable" resulting in a growing number of genomes with sequence gaps deposited in databases. A procedure was developed to sequence such problematic regions in the "non-contiguous finished" Desulfovibrio desulfuricans ND132 genome (6 intractable gaps and the Desulfovibrio africanus genome (1 intractable gap. The polynucleotides surrounding each gap formed GC rich secondary structures making the regions refractory to amplification and sequencing. Strand-displacing DNA polymerases used in concert with a novel ramped PCR extension cycle supported amplification and closure of all gap regions in both genomes. The developed procedures support accurate gene annotation, and provide a step-wise method that reduces the effort required for genome finishing.

  14. Sequencing Intractable DNA to Close Microbial Genomes

    Energy Technology Data Exchange (ETDEWEB)

    Hurt, Jr., Richard Ashley [ORNL; Brown, Steven D [ORNL; Podar, Mircea [ORNL; Palumbo, Anthony Vito [ORNL; Elias, Dwayne A [ORNL

    2012-01-01

    Advancement in high throughput DNA sequencing technologies has supported a rapid proliferation of microbial genome sequencing projects, providing the genetic blueprint for for in-depth studies. Oftentimes, difficult to sequence regions in microbial genomes are ruled intractable resulting in a growing number of genomes with sequence gaps deposited in databases. A procedure was developed to sequence such difficult regions in the non-contiguous finished Desulfovibrio desulfuricans ND132 genome (6 intractable gaps) and the Desulfovibrio africanus genome (1 intractable gap). The polynucleotides surrounding each gap formed GC rich secondary structures making the regions refractory to amplification and sequencing. Strand-displacing DNA polymerases used in concert with a novel ramped PCR extension cycle supported amplification and closure of all gap regions in both genomes. These developed procedures support accurate gene annotation, and provide a step-wise method that reduces the effort required for genome finishing.

  15. Efficient alignment of pyrosequencing reads for re-sequencing applications

    Directory of Open Access Journals (Sweden)

    Russo Luis MS

    2011-05-01

    Full Text Available Abstract Background Over the past few years, new massively parallel DNA sequencing technologies have emerged. These platforms generate massive amounts of data per run, greatly reducing the cost of DNA sequencing. However, these techniques also raise important computational difficulties mostly due to the huge volume of data produced, but also because of some of their specific characteristics such as read length and sequencing errors. Among the most critical problems is that of efficiently and accurately mapping reads to a reference genome in the context of re-sequencing projects. Results We present an efficient method for the local alignment of pyrosequencing reads produced by the GS FLX (454 system against a reference sequence. Our approach explores the characteristics of the data in these re-sequencing applications and uses state of the art indexing techniques combined with a flexible seed-based approach, leading to a fast and accurate algorithm which needs very little user parameterization. An evaluation performed using real and simulated data shows that our proposed method outperforms a number of mainstream tools on the quantity and quality of successful alignments, as well as on the execution time. Conclusions The proposed methodology was implemented in a software tool called TAPyR--Tool for the Alignment of Pyrosequencing Reads--which is publicly available from http://www.tapyr.net.

  16. HIPPI: highly accurate protein family classification with ensembles of HMMs

    Directory of Open Access Journals (Sweden)

    Nam-phuong Nguyen

    2016-11-01

    Full Text Available Abstract Background Given a new biological sequence, detecting membership in a known family is a basic step in many bioinformatics analyses, with applications to protein structure and function prediction and metagenomic taxon identification and abundance profiling, among others. Yet family identification of sequences that are distantly related to sequences in public databases or that are fragmentary remains one of the more difficult analytical problems in bioinformatics. Results We present a new technique for family identification called HIPPI (Hierarchical Profile Hidden Markov Models for Protein family Identification. HIPPI uses a novel technique to represent a multiple sequence alignment for a given protein family or superfamily by an ensemble of profile hidden Markov models computed using HMMER. An evaluation of HIPPI on the Pfam database shows that HIPPI has better overall precision and recall than blastp, HMMER, and pipelines based on HHsearch, and maintains good accuracy even for fragmentary query sequences and for protein families with low average pairwise sequence identity, both conditions where other methods degrade in accuracy. Conclusion HIPPI provides accurate protein family identification and is robust to difficult model conditions. Our results, combined with observations from previous studies, show that ensembles of profile Hidden Markov models can better represent multiple sequence alignments than a single profile Hidden Markov model, and thus can improve downstream analyses for various bioinformatic tasks. Further research is needed to determine the best practices for building the ensemble of profile Hidden Markov models. HIPPI is available on GitHub at https://github.com/smirarab/sepp .

  17. CycloBranch: De Novo Sequencing of Nonribosomal Peptides from Accurate Product Ion Mass Spectra

    Czech Academy of Sciences Publication Activity Database

    Novák, Jiří; Lemr, Karel; Schug, K. A.; Havlíček, Vladimír

    2015-01-01

    Roč. 26, č. 10 (2015), s. 1780-1786 ISSN 1044-0305 R&D Projects: GA ČR(CZ) GAP206/12/1150 Grant - others:OPPC(XE) CZ.2.16/3.1.00/24023 Institutional support: RVO:61388971 Keywords : De novo sequencing * Nonribosomal peptides * Linear Subject RIV: CE - Biochemistry Impact factor: 3.031, year: 2015

  18. An accurate and efficient method for large-scale SSR genotyping and applications.

    Science.gov (United States)

    Li, Lun; Fang, Zhiwei; Zhou, Junfei; Chen, Hong; Hu, Zhangfeng; Gao, Lifen; Chen, Lihong; Ren, Sheng; Ma, Hongyu; Lu, Long; Zhang, Weixiong; Peng, Hai

    2017-06-02

    Accurate and efficient genotyping of simple sequence repeats (SSRs) constitutes the basis of SSRs as an effective genetic marker with various applications. However, the existing methods for SSR genotyping suffer from low sensitivity, low accuracy, low efficiency and high cost. In order to fully exploit the potential of SSRs as genetic marker, we developed a novel method for SSR genotyping, named as AmpSeq-SSR, which combines multiplexing polymerase chain reaction (PCR), targeted deep sequencing and comprehensive analysis. AmpSeq-SSR is able to genotype potentially more than a million SSRs at once using the current sequencing techniques. In the current study, we simultaneously genotyped 3105 SSRs in eight rice varieties, which were further validated experimentally. The results showed that the accuracies of AmpSeq-SSR were nearly 100 and 94% with a single base resolution for homozygous and heterozygous samples, respectively. To demonstrate the power of AmpSeq-SSR, we adopted it in two applications. The first was to construct discriminative fingerprints of the rice varieties using 3105 SSRs, which offer much greater discriminative power than the 48 SSRs commonly used for rice. The second was to map Xa21, a gene that confers persistent resistance to rice bacterial blight. We demonstrated that genome-scale fingerprints of an organism can be efficiently constructed and candidate genes, such as Xa21 in rice, can be accurately and efficiently mapped using an innovative strategy consisting of multiplexing PCR, targeted sequencing and computational analysis. While the work we present focused on rice, AmpSeq-SSR can be readily extended to animals and micro-organisms. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.

  19. Comparison of static model and dynamic model for the evaluation of station blackout sequences

    International Nuclear Information System (INIS)

    Lee, Kwang-Nam; Kang, Sun-Koo; Hong, Sung-Yull.

    1992-01-01

    Station blackout is one of major contributors to the core damage frequency (CDF) in many PSA studies. Since station blackout sequence exhibits dynamic features, accurate calculation of CDF for the station blackout sequence is not possible with event tree/fault tree (ET/FT) method. Although the integral method can determine accurate CDF, it is time consuming and is difficult to evaluate various alternative AC source configuration and sensitivities. In this study, a comparison is made between static model and dynamic model and a new methodology which combines static model and dynamic model is provided for the accurate quantification of CDF and evaluation of improvement alternatives. Results of several case studies show that accurate calculation of CDF is possible by introducing equivalent mission time. (author)

  20. A Dual-Mode Large-Arrayed CMOS ISFET Sensor for Accurate and High-Throughput pH Sensing in Biomedical Diagnosis.

    Science.gov (United States)

    Huang, Xiwei; Yu, Hao; Liu, Xu; Jiang, Yu; Yan, Mei; Wu, Dongping

    2015-09-01

    The existing ISFET-based DNA sequencing detects hydrogen ions released during the polymerization of DNA strands on microbeads, which are scattered into microwell array above the ISFET sensor with unknown distribution. However, false pH detection happens at empty microwells due to crosstalk from neighboring microbeads. In this paper, a dual-mode CMOS ISFET sensor is proposed to have accurate pH detection toward DNA sequencing. Dual-mode sensing, optical and chemical modes, is realized by integrating a CMOS image sensor (CIS) with ISFET pH sensor, and is fabricated in a standard 0.18-μm CIS process. With accurate determination of microbead physical locations with CIS pixel by contact imaging, the dual-mode sensor can correlate local pH for one DNA slice at one location-determined microbead, which can result in improved pH detection accuracy. Moreover, toward a high-throughput DNA sequencing, a correlated-double-sampling readout that supports large array for both modes is deployed to reduce pixel-to-pixel nonuniformity such as threshold voltage mismatch. The proposed CMOS dual-mode sensor is experimentally examined to show a well correlated pH map and optical image for microbeads with a pH sensitivity of 26.2 mV/pH, a fixed pattern noise (FPN) reduction from 4% to 0.3%, and a readout speed of 1200 frames/s. A dual-mode CMOS ISFET sensor with suppressed FPN for accurate large-arrayed pH sensing is proposed and demonstrated with state-of-the-art measured results toward accurate and high-throughput DNA sequencing. The developed dual-mode CMOS ISFET sensor has great potential for future personal genome diagnostics with high accuracy and low cost.

  1. Nonhybrid, finished microbial genome assemblies from long-read SMRT sequencing data.

    Science.gov (United States)

    Chin, Chen-Shan; Alexander, David H; Marks, Patrick; Klammer, Aaron A; Drake, James; Heiner, Cheryl; Clum, Alicia; Copeland, Alex; Huddleston, John; Eichler, Evan E; Turner, Stephen W; Korlach, Jonas

    2013-06-01

    We present a hierarchical genome-assembly process (HGAP) for high-quality de novo microbial genome assemblies using only a single, long-insert shotgun DNA library in conjunction with Single Molecule, Real-Time (SMRT) DNA sequencing. Our method uses the longest reads as seeds to recruit all other reads for construction of highly accurate preassembled reads through a directed acyclic graph-based consensus procedure, which we follow with assembly using off-the-shelf long-read assemblers. In contrast to hybrid approaches, HGAP does not require highly accurate raw reads for error correction. We demonstrate efficient genome assembly for several microorganisms using as few as three SMRT Cell zero-mode waveguide arrays of sequencing and for BACs using just one SMRT Cell. Long repeat regions can be successfully resolved with this workflow. We also describe a consensus algorithm that incorporates SMRT sequencing primary quality values to produce de novo genome sequence exceeding 99.999% accuracy.

  2. PSI/TM-Coffee: a web server for fast and accurate multiple sequence alignments of regular and transmembrane proteins using homology extension on reduced databases.

    Science.gov (United States)

    Floden, Evan W; Tommaso, Paolo D; Chatzou, Maria; Magis, Cedrik; Notredame, Cedric; Chang, Jia-Ming

    2016-07-08

    The PSI/TM-Coffee web server performs multiple sequence alignment (MSA) of proteins by combining homology extension with a consistency based alignment approach. Homology extension is performed with Position Specific Iterative (PSI) BLAST searches against a choice of redundant and non-redundant databases. The main novelty of this server is to allow databases of reduced complexity to rapidly perform homology extension. This server also gives the possibility to use transmembrane proteins (TMPs) reference databases to allow even faster homology extension on this important category of proteins. Aside from an MSA, the server also outputs topological prediction of TMPs using the HMMTOP algorithm. Previous benchmarking of the method has shown this approach outperforms the most accurate alignment methods such as MSAProbs, Kalign, PROMALS, MAFFT, ProbCons and PRALINE™. The web server is available at http://tcoffee.crg.cat/tmcoffee. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.

  3. Centrifuge: rapid and sensitive classification of metagenomic sequences.

    Science.gov (United States)

    Kim, Daehwan; Song, Li; Breitwieser, Florian P; Salzberg, Steven L

    2016-12-01

    Centrifuge is a novel microbial classification engine that enables rapid, accurate, and sensitive labeling of reads and quantification of species on desktop computers. The system uses an indexing scheme based on the Burrows-Wheeler transform (BWT) and the Ferragina-Manzini (FM) index, optimized specifically for the metagenomic classification problem. Centrifuge requires a relatively small index (4.2 GB for 4078 bacterial and 200 archaeal genomes) and classifies sequences at very high speed, allowing it to process the millions of reads from a typical high-throughput DNA sequencing run within a few minutes. Together, these advances enable timely and accurate analysis of large metagenomics data sets on conventional desktop computers. Because of its space-optimized indexing schemes, Centrifuge also makes it possible to index the entire NCBI nonredundant nucleotide sequence database (a total of 109 billion bases) with an index size of 69 GB, in contrast to k-mer-based indexing schemes, which require far more extensive space. © 2016 Kim et al.; Published by Cold Spring Harbor Laboratory Press.

  4. Accurate and simple wavefunctions for the helium isoelectronic sequence with correct cusp conditions

    Energy Technology Data Exchange (ETDEWEB)

    Rodriguez, K V [Departamento de Fisica, Universidad Nacional del Sur and Consejo Nacional de Investigaciones CientIficas y Tecnicas, 8000 BahIa Blanca, Buenos Aires (Argentina); Gasaneo, G [Departamento de Fisica, Universidad Nacional del Sur and Consejo Nacional de Investigaciones CientIficas y Tecnicas, 8000 BahIa Blanca, Buenos Aires (Argentina); Mitnik, D M [Instituto de AstronomIa y Fisica del Espacio, and Departamento de Fisica, Facultad de Ciencias Exactas y Naturales, Universidad de Buenos Aires, C C 67, Suc. 28 (C1428EGA) Buenos Aires (Argentina)

    2007-10-14

    Simple and accurate wavefunctions for the He atom and He-like isoelectronic ions are presented. These functions-the product of hydrogenic one-electron solutions and a fully correlated part-satisfy all the coalescence cusp conditions at the Coulomb singularities. Functions with different numbers of parameters and different degrees of accuracy are discussed. Simple analytic expressions for the wavefunction and the energy, valid for a wide range of nuclear charges, are presented. The wavefunctions are tested, in the case of helium, through the calculations of various cross sections which probe different regions of the configuration space, mostly those close to the two-particle coalescence points.

  5. Safety Assessment of Advanced Imaging Sequences I: Measurements

    DEFF Research Database (Denmark)

    Jensen, Jørgen Arendt; Rasmussen, Morten Fischer; Pihl, Michael Johannes

    2016-01-01

    intensity measurement program. The approach can measure and store data for a full imaging sequence in 3.8 to 8.2 s per spatial position. Based on Ispta, MI, and probe surface temperature, the method gives the ability to determine whether a sequence is within US FDA limits, or alternatively indicate how......A method for rapid measurement of intensities (Ispta), mechanical index (MI), and probe surface temperature for any ultrasound scanning sequence is presented. It uses the scanner’s sampling capability to give an accurate measurement of the whole imaging sequence for all emissions to yield the true...... measurement system (Onda Corporation, Sunnyvale, CA, USA). Four different sequences have been measured: a fixed focus emission, a duplex sequence containing B-mode and flow emissions, a vector flow sequence with B-mode and flow emissions in 17 directions, and finally a synthetic aperture (SA) duplex flow...

  6. PredPPCrys: accurate prediction of sequence cloning, protein production, purification and crystallization propensity from protein sequences using multi-step heterogeneous feature fusion and selection.

    Directory of Open Access Journals (Sweden)

    Huilin Wang

    Full Text Available X-ray crystallography is the primary approach to solve the three-dimensional structure of a protein. However, a major bottleneck of this method is the failure of multi-step experimental procedures to yield diffraction-quality crystals, including sequence cloning, protein material production, purification, crystallization and ultimately, structural determination. Accordingly, prediction of the propensity of a protein to successfully undergo these experimental procedures based on the protein sequence may help narrow down laborious experimental efforts and facilitate target selection. A number of bioinformatics methods based on protein sequence information have been developed for this purpose. However, our knowledge on the important determinants of propensity for a protein sequence to produce high diffraction-quality crystals remains largely incomplete. In practice, most of the existing methods display poorer performance when evaluated on larger and updated datasets. To address this problem, we constructed an up-to-date dataset as the benchmark, and subsequently developed a new approach termed 'PredPPCrys' using the support vector machine (SVM. Using a comprehensive set of multifaceted sequence-derived features in combination with a novel multi-step feature selection strategy, we identified and characterized the relative importance and contribution of each feature type to the prediction performance of five individual experimental steps required for successful crystallization. The resulting optimal candidate features were used as inputs to build the first-level SVM predictor (PredPPCrys I. Next, prediction outputs of PredPPCrys I were used as the input to build second-level SVM classifiers (PredPPCrys II, which led to significantly enhanced prediction performance. Benchmarking experiments indicated that our PredPPCrys method outperforms most existing procedures on both up-to-date and previous datasets. In addition, the predicted crystallization

  7. Genome-wide SNP identification by high-throughput sequencing and selective mapping allows sequence assembly positioning using a framework genetic linkage map

    Directory of Open Access Journals (Sweden)

    Xu Xiangming

    2010-12-01

    Full Text Available Abstract Background Determining the position and order of contigs and scaffolds from a genome assembly within an organism's genome remains a technical challenge in a majority of sequencing projects. In order to exploit contemporary technologies for DNA sequencing, we developed a strategy for whole genome single nucleotide polymorphism sequencing allowing the positioning of sequence contigs onto a linkage map using the bin mapping method. Results The strategy was tested on a draft genome of the fungal pathogen Venturia inaequalis, the causal agent of apple scab, and further validated using sequence contigs derived from the diploid plant genome Fragaria vesca. Using our novel method we were able to anchor 70% and 92% of sequences assemblies for V. inaequalis and F. vesca, respectively, to genetic linkage maps. Conclusions We demonstrated the utility of this approach by accurately determining the bin map positions of the majority of the large sequence contigs from each genome sequence and validated our method by mapping single sequence repeat markers derived from sequence contigs on a full mapping population.

  8. RCK: accurate and efficient inference of sequence- and structure-based protein-RNA binding models from RNAcompete data.

    Science.gov (United States)

    Orenstein, Yaron; Wang, Yuhao; Berger, Bonnie

    2016-06-15

    Protein-RNA interactions, which play vital roles in many processes, are mediated through both RNA sequence and structure. CLIP-based methods, which measure protein-RNA binding in vivo, suffer from experimental noise and systematic biases, whereas in vitro experiments capture a clearer signal of protein RNA-binding. Among them, RNAcompete provides binding affinities of a specific protein to more than 240 000 unstructured RNA probes in one experiment. The computational challenge is to infer RNA structure- and sequence-based binding models from these data. The state-of-the-art in sequence models, Deepbind, does not model structural preferences. RNAcontext models both sequence and structure preferences, but is outperformed by GraphProt. Unfortunately, GraphProt cannot detect structural preferences from RNAcompete data due to the unstructured nature of the data, as noted by its developers, nor can it be tractably run on the full RNACompete dataset. We develop RCK, an efficient, scalable algorithm that infers both sequence and structure preferences based on a new k-mer based model. Remarkably, even though RNAcompete data is designed to be unstructured, RCK can still learn structural preferences from it. RCK significantly outperforms both RNAcontext and Deepbind in in vitro binding prediction for 244 RNAcompete experiments. Moreover, RCK is also faster and uses less memory, which enables scalability. While currently on par with existing methods in in vivo binding prediction on a small scale test, we demonstrate that RCK will increasingly benefit from experimentally measured RNA structure profiles as compared to computationally predicted ones. By running RCK on the entire RNAcompete dataset, we generate and provide as a resource a set of protein-RNA structure-based models on an unprecedented scale. Software and models are freely available at http://rck.csail.mit.edu/ bab@mit.edu Supplementary data are available at Bioinformatics online. © The Author 2016. Published by

  9. An assembly sequence planning method based on composite algorithm

    Directory of Open Access Journals (Sweden)

    Enfu LIU

    2016-02-01

    Full Text Available To solve the combination explosion problem and the blind searching problem in assembly sequence planning of complex products, an assembly sequence planning method based on composite algorithm is proposed. In the composite algorithm, a sufficient number of feasible assembly sequences are generated using formalization reasoning algorithm as the initial population of genetic algorithm. Then fuzzy knowledge of assembly is integrated into the planning process of genetic algorithm and ant algorithm to get the accurate solution. At last, an example is conducted to verify the feasibility of composite algorithm.

  10. Transforming clinical microbiology with bacterial genome sequencing.

    Science.gov (United States)

    Didelot, Xavier; Bowden, Rory; Wilson, Daniel J; Peto, Tim E A; Crook, Derrick W

    2012-09-01

    Whole-genome sequencing of bacteria has recently emerged as a cost-effective and convenient approach for addressing many microbiological questions. Here, we review the current status of clinical microbiology and how it has already begun to be transformed by using next-generation sequencing. We focus on three essential tasks: identifying the species of an isolate, testing its properties, such as resistance to antibiotics and virulence, and monitoring the emergence and spread of bacterial pathogens. We predict that the application of next-generation sequencing will soon be sufficiently fast, accurate and cheap to be used in routine clinical microbiology practice, where it could replace many complex current techniques with a single, more efficient workflow.

  11. Sequencing Cyclic Peptides by Multistage Mass Spectrometry

    Science.gov (United States)

    Mohimani, Hosein; Yang, Yu-Liang; Liu, Wei-Ting; Hsieh, Pei-Wen; Dorrestein, Pieter C.; Pevzner, Pavel A.

    2012-01-01

    Some of the most effective antibiotics (e.g., Vancomycin and Daptomycin) are cyclic peptides produced by non-ribosomal biosynthetic pathways. While hundreds of biomedically important cyclic peptides have been sequenced, the computational techniques for sequencing cyclic peptides are still in their infancy. Previous methods for sequencing peptide antibiotics and other cyclic peptides are based on Nuclear Magnetic Resonance spectroscopy, and require large amount (miligrams) of purified materials that, for most compounds, are not possible to obtain. Recently, development of mass spectrometry based methods has provided some hope for accurate sequencing of cyclic peptides using picograms of materials. In this paper we develop a method for sequencing of cyclic peptides by multistage mass spectrometry, and show its advantages over single stage mass spectrometry. The method is tested on known and new cyclic peptides from Bacillus brevis, Dianthus superbus and Streptomyces griseus, as well as a new family of cyclic peptides produced by marine bacteria. PMID:21751357

  12. Combinatorial Pooling Enables Selective Sequencing of the Barley Gene Space

    Science.gov (United States)

    Lonardi, Stefano; Duma, Denisa; Alpert, Matthew; Cordero, Francesca; Beccuti, Marco; Bhat, Prasanna R.; Wu, Yonghui; Ciardo, Gianfranco; Alsaihati, Burair; Ma, Yaqin; Wanamaker, Steve; Resnik, Josh; Bozdag, Serdar; Luo, Ming-Cheng; Close, Timothy J.

    2013-01-01

    For the vast majority of species – including many economically or ecologically important organisms, progress in biological research is hampered due to the lack of a reference genome sequence. Despite recent advances in sequencing technologies, several factors still limit the availability of such a critical resource. At the same time, many research groups and international consortia have already produced BAC libraries and physical maps and now are in a position to proceed with the development of whole-genome sequences organized around a physical map anchored to a genetic map. We propose a BAC-by-BAC sequencing protocol that combines combinatorial pooling design and second-generation sequencing technology to efficiently approach denovo selective genome sequencing. We show that combinatorial pooling is a cost-effective and practical alternative to exhaustive DNA barcoding when preparing sequencing libraries for hundreds or thousands of DNA samples, such as in this case gene-bearing minimum-tiling-path BAC clones. The novelty of the protocol hinges on the computational ability to efficiently compare hundred millions of short reads and assign them to the correct BAC clones (deconvolution) so that the assembly can be carried out clone-by-clone. Experimental results on simulated data for the rice genome show that the deconvolution is very accurate, and the resulting BAC assemblies have high quality. Results on real data for a gene-rich subset of the barley genome confirm that the deconvolution is accurate and the BAC assemblies have good quality. While our method cannot provide the level of completeness that one would achieve with a comprehensive whole-genome sequencing project, we show that it is quite successful in reconstructing the gene sequences within BACs. In the case of plants such as barley, this level of sequence knowledge is sufficient to support critical end-point objectives such as map-based cloning and marker-assisted breeding. PMID:23592960

  13. Combinatorial pooling enables selective sequencing of the barley gene space.

    Directory of Open Access Journals (Sweden)

    Stefano Lonardi

    2013-04-01

    Full Text Available For the vast majority of species - including many economically or ecologically important organisms, progress in biological research is hampered due to the lack of a reference genome sequence. Despite recent advances in sequencing technologies, several factors still limit the availability of such a critical resource. At the same time, many research groups and international consortia have already produced BAC libraries and physical maps and now are in a position to proceed with the development of whole-genome sequences organized around a physical map anchored to a genetic map. We propose a BAC-by-BAC sequencing protocol that combines combinatorial pooling design and second-generation sequencing technology to efficiently approach denovo selective genome sequencing. We show that combinatorial pooling is a cost-effective and practical alternative to exhaustive DNA barcoding when preparing sequencing libraries for hundreds or thousands of DNA samples, such as in this case gene-bearing minimum-tiling-path BAC clones. The novelty of the protocol hinges on the computational ability to efficiently compare hundred millions of short reads and assign them to the correct BAC clones (deconvolution so that the assembly can be carried out clone-by-clone. Experimental results on simulated data for the rice genome show that the deconvolution is very accurate, and the resulting BAC assemblies have high quality. Results on real data for a gene-rich subset of the barley genome confirm that the deconvolution is accurate and the BAC assemblies have good quality. While our method cannot provide the level of completeness that one would achieve with a comprehensive whole-genome sequencing project, we show that it is quite successful in reconstructing the gene sequences within BACs. In the case of plants such as barley, this level of sequence knowledge is sufficient to support critical end-point objectives such as map-based cloning and marker-assisted breeding.

  14. Combinatorial pooling enables selective sequencing of the barley gene space.

    Science.gov (United States)

    Lonardi, Stefano; Duma, Denisa; Alpert, Matthew; Cordero, Francesca; Beccuti, Marco; Bhat, Prasanna R; Wu, Yonghui; Ciardo, Gianfranco; Alsaihati, Burair; Ma, Yaqin; Wanamaker, Steve; Resnik, Josh; Bozdag, Serdar; Luo, Ming-Cheng; Close, Timothy J

    2013-04-01

    For the vast majority of species - including many economically or ecologically important organisms, progress in biological research is hampered due to the lack of a reference genome sequence. Despite recent advances in sequencing technologies, several factors still limit the availability of such a critical resource. At the same time, many research groups and international consortia have already produced BAC libraries and physical maps and now are in a position to proceed with the development of whole-genome sequences organized around a physical map anchored to a genetic map. We propose a BAC-by-BAC sequencing protocol that combines combinatorial pooling design and second-generation sequencing technology to efficiently approach denovo selective genome sequencing. We show that combinatorial pooling is a cost-effective and practical alternative to exhaustive DNA barcoding when preparing sequencing libraries for hundreds or thousands of DNA samples, such as in this case gene-bearing minimum-tiling-path BAC clones. The novelty of the protocol hinges on the computational ability to efficiently compare hundred millions of short reads and assign them to the correct BAC clones (deconvolution) so that the assembly can be carried out clone-by-clone. Experimental results on simulated data for the rice genome show that the deconvolution is very accurate, and the resulting BAC assemblies have high quality. Results on real data for a gene-rich subset of the barley genome confirm that the deconvolution is accurate and the BAC assemblies have good quality. While our method cannot provide the level of completeness that one would achieve with a comprehensive whole-genome sequencing project, we show that it is quite successful in reconstructing the gene sequences within BACs. In the case of plants such as barley, this level of sequence knowledge is sufficient to support critical end-point objectives such as map-based cloning and marker-assisted breeding.

  15. Uniform, optimal signal processing of mapped deep-sequencing data.

    Science.gov (United States)

    Kumar, Vibhor; Muratani, Masafumi; Rayan, Nirmala Arul; Kraus, Petra; Lufkin, Thomas; Ng, Huck Hui; Prabhakar, Shyam

    2013-07-01

    Despite their apparent diversity, many problems in the analysis of high-throughput sequencing data are merely special cases of two general problems, signal detection and signal estimation. Here we adapt formally optimal solutions from signal processing theory to analyze signals of DNA sequence reads mapped to a genome. We describe DFilter, a detection algorithm that identifies regulatory features in ChIP-seq, DNase-seq and FAIRE-seq data more accurately than assay-specific algorithms. We also describe EFilter, an estimation algorithm that accurately predicts mRNA levels from as few as 1-2 histone profiles (R ∼0.9). Notably, the presence of regulatory motifs in promoters correlates more with histone modifications than with mRNA levels, suggesting that histone profiles are more predictive of cis-regulatory mechanisms. We show by applying DFilter and EFilter to embryonic forebrain ChIP-seq data that regulatory protein identification and functional annotation are feasible despite tissue heterogeneity. The mathematical formalism underlying our tools facilitates integrative analysis of data from virtually any sequencing-based functional profile.

  16. Targeted assembly of short sequence reads.

    Directory of Open Access Journals (Sweden)

    René L Warren

    Full Text Available As next-generation sequence (NGS production continues to increase, analysis is becoming a significant bottleneck. However, in situations where information is required only for specific sequence variants, it is not necessary to assemble or align whole genome data sets in their entirety. Rather, NGS data sets can be mined for the presence of sequence variants of interest by localized assembly, which is a faster, easier, and more accurate approach. We present TASR, a streamlined assembler that interrogates very large NGS data sets for the presence of specific variants by only considering reads within the sequence space of input target sequences provided by the user. The NGS data set is searched for reads with an exact match to all possible short words within the target sequence, and these reads are then assembled stringently to generate a consensus of the target and flanking sequence. Typically, variants of a particular locus are provided as different target sequences, and the presence of the variant in the data set being interrogated is revealed by a successful assembly outcome. However, TASR can also be used to find unknown sequences that flank a given target. We demonstrate that TASR has utility in finding or confirming genomic mutations, polymorphisms, fusions and integration events. Targeted assembly is a powerful method for interrogating large data sets for the presence of sequence variants of interest. TASR is a fast, flexible and easy to use tool for targeted assembly.

  17. Multiplexed microsatellite recovery using massively parallel sequencing

    Science.gov (United States)

    Jennings, T.N.; Knaus, B.J.; Mullins, T.D.; Haig, S.M.; Cronn, R.C.

    2011-01-01

    Conservation and management of natural populations requires accurate and inexpensive genotyping methods. Traditional microsatellite, or simple sequence repeat (SSR), marker analysis remains a popular genotyping method because of the comparatively low cost of marker development, ease of analysis and high power of genotype discrimination. With the availability of massively parallel sequencing (MPS), it is now possible to sequence microsatellite-enriched genomic libraries in multiplex pools. To test this approach, we prepared seven microsatellite-enriched, barcoded genomic libraries from diverse taxa (two conifer trees, five birds) and sequenced these on one lane of the Illumina Genome Analyzer using paired-end 80-bp reads. In this experiment, we screened 6.1 million sequences and identified 356958 unique microreads that contained di- or trinucleotide microsatellites. Examination of four species shows that our conversion rate from raw sequences to polymorphic markers compares favourably to Sanger- and 454-based methods. The advantage of multiplexed MPS is that the staggering capacity of modern microread sequencing is spread across many libraries; this reduces sample preparation and sequencing costs to less than $400 (USD) per species. This price is sufficiently low that microsatellite libraries could be prepared and sequenced for all 1373 organisms listed as 'threatened' and 'endangered' in the United States for under $0.5M (USD).

  18. De novo assembly of human genomes with massively parallel short read sequencing

    DEFF Research Database (Denmark)

    Li, Ruiqiang; Zhu, Hongmei; Ruan, Jue

    2010-01-01

    genomes from short read sequences. We successfully assembled both the Asian and African human genome sequences, achieving an N50 contig size of 7.4 and 5.9 kilobases (kb) and scaffold of 446.3 and 61.9 kb, respectively. The development of this de novo short read assembly method creates new opportunities...... for building reference sequences and carrying out accurate analyses of unexplored genomes in a cost-effective way....

  19. Towards predicting the encoding capability of MR fingerprinting sequences.

    Science.gov (United States)

    Sommer, K; Amthor, T; Doneva, M; Koken, P; Meineke, J; Börnert, P

    2017-09-01

    Sequence optimization and appropriate sequence selection is still an unmet need in magnetic resonance fingerprinting (MRF). The main challenge in MRF sequence design is the lack of an appropriate measure of the sequence's encoding capability. To find such a measure, three different candidates for judging the encoding capability have been investigated: local and global dot-product-based measures judging dictionary entry similarity as well as a Monte Carlo method that evaluates the noise propagation properties of an MRF sequence. Consistency of these measures for different sequence lengths as well as the capability to predict actual sequence performance in both phantom and in vivo measurements was analyzed. While the dot-product-based measures yielded inconsistent results for different sequence lengths, the Monte Carlo method was in a good agreement with phantom experiments. In particular, the Monte Carlo method could accurately predict the performance of different flip angle patterns in actual measurements. The proposed Monte Carlo method provides an appropriate measure of MRF sequence encoding capability and may be used for sequence optimization. Copyright © 2017 Elsevier Inc. All rights reserved.

  20. MATAM: reconstruction of phylogenetic marker genes from short sequencing reads in metagenomes.

    Science.gov (United States)

    Pericard, Pierre; Dufresne, Yoann; Couderc, Loïc; Blanquart, Samuel; Touzet, Hélène

    2018-02-15

    Advances in the sequencing of uncultured environmental samples, dubbed metagenomics, raise a growing need for accurate taxonomic assignment. Accurate identification of organisms present within a community is essential to understanding even the most elementary ecosystems. However, current high-throughput sequencing technologies generate short reads which partially cover full-length marker genes and this poses difficult bioinformatic challenges for taxonomy identification at high resolution. We designed MATAM, a software dedicated to the fast and accurate targeted assembly of short reads sequenced from a genomic marker of interest. The method implements a stepwise process based on construction and analysis of a read overlap graph. It is applied to the assembly of 16S rRNA markers and is validated on simulated, synthetic and genuine metagenomes. We show that MATAM outperforms other available methods in terms of low error rates and recovered fractions and is suitable to provide improved assemblies for precise taxonomic assignments. https://github.com/bonsai-team/matam. pierre.pericard@gmail.com or helene.touzet@univ-lille1.fr. Supplementary data are available at Bioinformatics online. © The Author (2017). Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com

  1. Population-Sequencing as a Biomarker of Burkholderia mallei and Burkholderia pseudomallei Evolution through Microbial Forensic Analysis

    Directory of Open Access Journals (Sweden)

    John P. Jakupciak

    2013-01-01

    Full Text Available Large-scale genomics projects are identifying biomarkers to detect human disease. B. pseudomallei and B. mallei are two closely related select agents that cause melioidosis and glanders. Accurate characterization of metagenomic samples is dependent on accurate measurements of genetic variation between isolates with resolution down to strain level. Often single biomarker sensitivity is augmented by use of multiple or panels of biomarkers. In parallel with single biomarker validation, advances in DNA sequencing enable analysis of entire genomes in a single run: population-sequencing. Potentially, direct sequencing could be used to analyze an entire genome to serve as the biomarker for genome identification. However, genome variation and population diversity complicate use of direct sequencing, as well as differences caused by sample preparation protocols including sequencing artifacts and mistakes. As part of a Department of Homeland Security program in bacterial forensics, we examined how to implement whole genome sequencing (WGS analysis as a judicially defensible forensic method for attributing microbial sample relatedness; and also to determine the strengths and limitations of whole genome sequence analysis in a forensics context. Herein, we demonstrate use of sequencing to provide genetic characterization of populations: direct sequencing of populations.

  2. TurboFold: Iterative probabilistic estimation of secondary structures for multiple RNA sequences

    Directory of Open Access Journals (Sweden)

    Sharma Gaurav

    2011-04-01

    Full Text Available Abstract Background The prediction of secondary structure, i.e. the set of canonical base pairs between nucleotides, is a first step in developing an understanding of the function of an RNA sequence. The most accurate computational methods predict conserved structures for a set of homologous RNA sequences. These methods usually suffer from high computational complexity. In this paper, TurboFold, a novel and efficient method for secondary structure prediction for multiple RNA sequences, is presented. Results TurboFold takes, as input, a set of homologous RNA sequences and outputs estimates of the base pairing probabilities for each sequence. The base pairing probabilities for a sequence are estimated by combining intrinsic information, derived from the sequence itself via the nearest neighbor thermodynamic model, with extrinsic information, derived from the other sequences in the input set. For a given sequence, the extrinsic information is computed by using pairwise-sequence-alignment-based probabilities for co-incidence with each of the other sequences, along with estimated base pairing probabilities, from the previous iteration, for the other sequences. The extrinsic information is introduced as free energy modifications for base pairing in a partition function computation based on the nearest neighbor thermodynamic model. This process yields updated estimates of base pairing probability. The updated base pairing probabilities in turn are used to recompute extrinsic information, resulting in the overall iterative estimation procedure that defines TurboFold. TurboFold is benchmarked on a number of ncRNA datasets and compared against alternative secondary structure prediction methods. The iterative procedure in TurboFold is shown to improve estimates of base pairing probability with each iteration, though only small gains are obtained beyond three iterations. Secondary structures composed of base pairs with estimated probabilities higher than a

  3. DNA demethylation activates genes in seed maternal integument development in rice (Oryza sativa L.).

    Science.gov (United States)

    Wang, Yifeng; Lin, Haiyan; Tong, Xiaohong; Hou, Yuxuan; Chang, Yuxiao; Zhang, Jian

    2017-11-01

    DNA methylation is an important epigenetic modification that regulates various plant developmental processes. Rice seed integument determines the seed size. However, the role of DNA methylation in its development remains largely unknown. Here, we report the first dynamic DNA methylomic profiling of rice maternal integument before and after pollination by using a whole-genome bisulfite deep sequencing approach. Analysis of DNA methylation patterns identified 4238 differentially methylated regions underpin 4112 differentially methylated genes, including GW2, DEP1, RGB1 and numerous other regulators participated in maternal integument development. Bisulfite sanger sequencing and qRT-PCR of six differentially methylated genes revealed extensive occurrence of DNA hypomethylation triggered by double fertilization at IAP compared with IBP, suggesting that DNA demethylation might be a key mechanism to activate numerous maternal controlling genes. These results presented here not only greatly expanded the rice methylome dataset, but also shed novel insight into the regulatory roles of DNA methylation in rice seed maternal integument development. Copyright © 2017 Elsevier Masson SAS. All rights reserved.

  4. Reconstruction of ancestral RNA sequences under multiple structural constraints

    OpenAIRE

    Tremblay-Savard, Olivier; Reinharz, Vladimir; Waldisp?hl, J?r?me

    2016-01-01

    Background Secondary structures form the scaffold of multiple sequence alignment of non-coding RNA (ncRNA) families. An accurate reconstruction of ancestral ncRNAs must use this structural signal. However, the inference of ancestors of a single ncRNA family with a single consensus structure may bias the results towards sequences with high affinity to this structure, which are far from the true ancestors. Methods In this paper, we introduce achARNement, a maximum parsimony approach that, given...

  5. Protein 3D structure computed from evolutionary sequence variation.

    Directory of Open Access Journals (Sweden)

    Debora S Marks

    Full Text Available The evolutionary trajectory of a protein through sequence space is constrained by its function. Collections of sequence homologs record the outcomes of millions of evolutionary experiments in which the protein evolves according to these constraints. Deciphering the evolutionary record held in these sequences and exploiting it for predictive and engineering purposes presents a formidable challenge. The potential benefit of solving this challenge is amplified by the advent of inexpensive high-throughput genomic sequencing.In this paper we ask whether we can infer evolutionary constraints from a set of sequence homologs of a protein. The challenge is to distinguish true co-evolution couplings from the noisy set of observed correlations. We address this challenge using a maximum entropy model of the protein sequence, constrained by the statistics of the multiple sequence alignment, to infer residue pair couplings. Surprisingly, we find that the strength of these inferred couplings is an excellent predictor of residue-residue proximity in folded structures. Indeed, the top-scoring residue couplings are sufficiently accurate and well-distributed to define the 3D protein fold with remarkable accuracy.We quantify this observation by computing, from sequence alone, all-atom 3D structures of fifteen test proteins from different fold classes, ranging in size from 50 to 260 residues, including a G-protein coupled receptor. These blinded inferences are de novo, i.e., they do not use homology modeling or sequence-similar fragments from known structures. The co-evolution signals provide sufficient information to determine accurate 3D protein structure to 2.7-4.8 Å C(α-RMSD error relative to the observed structure, over at least two-thirds of the protein (method called EVfold, details at http://EVfold.org. This discovery provides insight into essential interactions constraining protein evolution and will facilitate a comprehensive survey of the universe of

  6. A parallel reconfigurable platform for efficient sequence alignment ...

    African Journals Online (AJOL)

    Bioinformatics is one of the emerging trends in today's world. The major part of bioinformatics is dealing with DNA. Analysis of DNA requires more memory and high efficient computations to produce accurate outputs. Researchers use various bioinformatics algorithms for sequencing and pattern detection techniques, but still ...

  7. On the Accuracy of Ancestral Sequence Reconstruction for Ultrametric Trees with Parsimony.

    Science.gov (United States)

    Herbst, Lina; Fischer, Mareike

    2018-04-01

    We examine a mathematical question concerning the reconstruction accuracy of the Fitch algorithm for reconstructing the ancestral sequence of the most recent common ancestor given a phylogenetic tree and sequence data for all taxa under consideration. In particular, for the symmetric four-state substitution model which is also known as Jukes-Cantor model, we answer affirmatively a conjecture of Li, Steel and Zhang which states that for any ultrametric phylogenetic tree and a symmetric model, the Fitch parsimony method using all terminal taxa is more accurate, or at least as accurate, for ancestral state reconstruction than using any particular terminal taxon or any particular pair of taxa. This conjecture had so far only been answered for two-state data by Fischer and Thatte. Here, we focus on answering the biologically more relevant case with four states, which corresponds to ancestral sequence reconstruction from DNA or RNA data.

  8. Differential DNA methylation profile of key genes in malignant prostate epithelial cells transformed by inorganic arsenic or cadmium.

    Science.gov (United States)

    Pelch, Katherine E; Tokar, Erik J; Merrick, B Alex; Waalkes, Michael P

    2015-08-01

    Previous work shows altered methylation patterns in inorganic arsenic (iAs)- or cadmium (Cd)-transformed epithelial cells. Here, the methylation status near the transcriptional start site was assessed in the normal human prostate epithelial cell line (RWPE-1) that was malignantly transformed by 10μM Cd for 11weeks (CTPE) or 5μM iAs for 29weeks (CAsE-PE), at which time cells showed multiple markers of acquired cancer phenotype. Next generation sequencing of the transcriptome of CAsE-PE cells identified multiple dysregulated genes. Of the most highly dysregulated genes, five genes that can be relevant to the carcinogenic process (S100P, HYAL1, NTM, NES, ALDH1A1) were chosen for an in-depth analysis of the DNA methylation profile. DNA was isolated, bisulfite converted, and combined bisulfite restriction analysis was used to identify differentially methylated CpG sites, which was confirmed with bisulfite sequencing. Four of the five genes showed differential methylation in transformants relative to control cells that was inversely related to altered gene expression. Increased expression of HYAL1 (>25-fold) and S100P (>40-fold) in transformants was correlated with hypomethylation near the transcriptional start site. Decreased expression of NES (>15-fold) and NTM (>1000-fold) in transformants was correlated with hypermethylation near the transcriptional start site. ALDH1A1 expression was differentially expressed in transformed cells but was not differentially methylated relative to control. In conclusion, altered gene expression observed in Cd and iAs transformed cells may result from altered DNA methylation status. Published by Elsevier Inc.

  9. FastaValidator: an open-source Java library to parse and validate FASTA formatted sequences.

    Science.gov (United States)

    Waldmann, Jost; Gerken, Jan; Hankeln, Wolfgang; Schweer, Timmy; Glöckner, Frank Oliver

    2014-06-14

    Advances in sequencing technologies challenge the efficient importing and validation of FASTA formatted sequence data which is still a prerequisite for most bioinformatic tools and pipelines. Comparative analysis of commonly used Bio*-frameworks (BioPerl, BioJava and Biopython) shows that their scalability and accuracy is hampered. FastaValidator represents a platform-independent, standardized, light-weight software library written in the Java programming language. It targets computer scientists and bioinformaticians writing software which needs to parse quickly and accurately large amounts of sequence data. For end-users FastaValidator includes an interactive out-of-the-box validation of FASTA formatted files, as well as a non-interactive mode designed for high-throughput validation in software pipelines. The accuracy and performance of the FastaValidator library qualifies it for large data sets such as those commonly produced by massive parallel (NGS) technologies. It offers scientists a fast, accurate and standardized method for parsing and validating FASTA formatted sequence data.

  10. Enduring epigenetic landmarks define the cancer microenvironment

    Science.gov (United States)

    Pidsley, Ruth; Lawrence, Mitchell G.; Zotenko, Elena; Niranjan, Birunthi; Statham, Aaron; Song, Jenny; Chabanon, Roman M.; Qu, Wenjia; Wang, Hong; Richards, Michelle; Nair, Shalima S.; Armstrong, Nicola J.; Nim, Hieu T.; Papargiris, Melissa; Balanathan, Preetika; French, Hugh; Peters, Timothy; Norden, Sam; Ryan, Andrew; Pedersen, John; Kench, James; Daly, Roger J.; Horvath, Lisa G.; Stricker, Phillip; Frydenberg, Mark; Taylor, Renea A.; Stirzaker, Clare; Risbridger, Gail P.; Clark, Susan J.

    2018-01-01

    The growth and progression of solid tumors involves dynamic cross-talk between cancer epithelium and the surrounding microenvironment. To date, molecular profiling has largely been restricted to the epithelial component of tumors; therefore, features underpinning the persistent protumorigenic phenotype of the tumor microenvironment are unknown. Using whole-genome bisulfite sequencing, we show for the first time that cancer-associated fibroblasts (CAFs) from localized prostate cancer display remarkably distinct and enduring genome-wide changes in DNA methylation, significantly at enhancers and promoters, compared to nonmalignant prostate fibroblasts (NPFs). Differentially methylated regions associated with changes in gene expression have cancer-related functions and accurately distinguish CAFs from NPFs. Remarkably, a subset of changes is shared with prostate cancer epithelial cells, revealing the new concept of tumor-specific epigenome modifications in the tumor and its microenvironment. The distinct methylome of CAFs provides a novel epigenetic hallmark of the cancer microenvironment and promises new biomarkers to improve interpretation of diagnostic samples. PMID:29650553

  11. Digital quantification of gene methylation in stool DNA by emulsion-PCR coupled with hydrogel immobilized bead-array.

    Science.gov (United States)

    Liu, Yunlong; Wu, Haiping; Zhou, Qiang; Song, Qinxin; Rui, Jianzhong; Zou, Bingjie; Zhou, Guohua

    2017-06-15

    Aberrations of gene methylation in stool DNA (sDNA) is an effective biomarker for non-invasive colorectal cancer diagnosis. However, it is challenging to accurately quantitate the gene methylation levels in sDNA due to the low abundance and degradation of sDNA. In this study, a digital quantification strategy was proposed by combining emulsion PCR (emPCR) with hydrogel immobilized bead-array. The assay includes following steps: bisulfite conversion of sDNA, pre-amplification by PCR with specific primers containing 5' universal sequences, emPCR of pre-amplicons with beaded primers to achieve single-molecular amplification and identification of hydrogel embedding beads coated with amplicons. The sensitivity and the specificity of the method are high enough to pick up 0.05% methylated targets from unmethylated DNA background. The successful detection of hypermethylated vimentin gene in clinical stool samples suggests that the proposed method should be a potential tool for non-invasive colorectal cancer screening. Copyright © 2016 Elsevier B.V. All rights reserved.

  12. Genome-wide identification of coding and non-coding conserved sequence tags in human and mouse genomes

    Directory of Open Access Journals (Sweden)

    Maggi Giorgio P

    2008-06-01

    Full Text Available Abstract Background The accurate detection of genes and the identification of functional regions is still an open issue in the annotation of genomic sequences. This problem affects new genomes but also those of very well studied organisms such as human and mouse where, despite the great efforts, the inventory of genes and regulatory regions is far from complete. Comparative genomics is an effective approach to address this problem. Unfortunately it is limited by the computational requirements needed to perform genome-wide comparisons and by the problem of discriminating between conserved coding and non-coding sequences. This discrimination is often based (thus dependent on the availability of annotated proteins. Results In this paper we present the results of a comprehensive comparison of human and mouse genomes performed with a new high throughput grid-based system which allows the rapid detection of conserved sequences and accurate assessment of their coding potential. By detecting clusters of coding conserved sequences the system is also suitable to accurately identify potential gene loci. Following this analysis we created a collection of human-mouse conserved sequence tags and carefully compared our results to reliable annotations in order to benchmark the reliability of our classifications. Strikingly we were able to detect several potential gene loci supported by EST sequences but not corresponding to as yet annotated genes. Conclusion Here we present a new system which allows comprehensive comparison of genomes to detect conserved coding and non-coding sequences and the identification of potential gene loci. Our system does not require the availability of any annotated sequence thus is suitable for the analysis of new or poorly annotated genomes.

  13. Bacterial identification and subtyping using DNA microarray and DNA sequencing.

    Science.gov (United States)

    Al-Khaldi, Sufian F; Mossoba, Magdi M; Allard, Marc M; Lienau, E Kurt; Brown, Eric D

    2012-01-01

    The era of fast and accurate discovery of biological sequence motifs in prokaryotic and eukaryotic cells is here. The co-evolution of direct genome sequencing and DNA microarray strategies not only will identify, isotype, and serotype pathogenic bacteria, but also it will aid in the discovery of new gene functions by detecting gene expressions in different diseases and environmental conditions. Microarray bacterial identification has made great advances in working with pure and mixed bacterial samples. The technological advances have moved beyond bacterial gene expression to include bacterial identification and isotyping. Application of new tools such as mid-infrared chemical imaging improves detection of hybridization in DNA microarrays. The research in this field is promising and future work will reveal the potential of infrared technology in bacterial identification. On the other hand, DNA sequencing by using 454 pyrosequencing is so cost effective that the promise of $1,000 per bacterial genome sequence is becoming a reality. Pyrosequencing technology is a simple to use technique that can produce accurate and quantitative analysis of DNA sequences with a great speed. The deposition of massive amounts of bacterial genomic information in databanks is creating fingerprint phylogenetic analysis that will ultimately replace several technologies such as Pulsed Field Gel Electrophoresis. In this chapter, we will review (1) the use of DNA microarray using fluorescence and infrared imaging detection for identification of pathogenic bacteria, and (2) use of pyrosequencing in DNA cluster analysis to fingerprint bacterial phylogenetic trees.

  14. Pairagon: a highly accurate, HMM-based cDNA-to-genome aligner

    DEFF Research Database (Denmark)

    Lu, David V; Brown, Randall H; Arumugam, Manimozhiyan

    2009-01-01

    MOTIVATION: The most accurate way to determine the intron-exon structures in a genome is to align spliced cDNA sequences to the genome. Thus, cDNA-to-genome alignment programs are a key component of most annotation pipelines. The scoring system used to choose the best alignment is a primary...... determinant of alignment accuracy, while heuristics that prevent consideration of certain alignments are a primary determinant of runtime and memory usage. Both accuracy and speed are important considerations in choosing an alignment algorithm, but scoring systems have received much less attention than...

  15. Efficiency to Discovery Transgenic Loci in GM Rice Using Next Generation Sequencing Whole Genome Re-sequencing

    Directory of Open Access Journals (Sweden)

    Doori Park

    2015-09-01

    Full Text Available Molecular characterization technology in genetically modified organisms, in addition to how transgenic biotechnologies are developed now require full transparency to assess the risk to living modified and non-modified organisms. Next generation sequencing (NGS methodology is suggested as an effective means in genome characterization and detection of transgenic insertion locations. In the present study, we applied NGS to insert transgenic loci, specifically the epidermal growth factor (EGF in genetically modified rice cells. A total of 29.3 Gb (~72× coverage was sequenced with a 2 × 150 bp paired end method by Illumina HiSeq2500, which was consecutively mapped to the rice genome and T-vector sequence. The compatible pairs of reads were successfully mapped to 10 loci on the rice chromosome and vector sequences were validated to the insertion location by polymerase chain reaction (PCR amplification. The EGF transgenic site was confirmed only on chromosome 4 by PCR. Results of this study demonstrated the success of NGS data to characterize the rice genome. Bioinformatics analyses must be developed in association with NGS data to identify highly accurate transgenic sites.

  16. Multilocus Sequence Analysis and rpoB Sequencing of Mycobacterium abscessus (Sensu Lato) Strains▿

    Science.gov (United States)

    Macheras, Edouard; Roux, Anne-Laure; Bastian, Sylvaine; Leão, Sylvia Cardoso; Palaci, Moises; Sivadon-Tardy, Valérie; Gutierrez, Cristina; Richter, Elvira; Rüsch-Gerdes, Sabine; Pfyffer, Gaby; Bodmer, Thomas; Cambau, Emmanuelle; Gaillard, Jean-Louis; Heym, Beate

    2011-01-01

    of strains. We found 10/120 (8.3%) isolates for which the concatenated MLSA gene sequence and rpoB sequence were discordant (e.g., M. massiliense MLSA sequence and M. abscessus rpoB sequence), suggesting the intergroup lateral transfers of rpoB. In conclusion, our study strongly supports the recent proposal that M. abscessus, M. massiliense, and M. bolletii should constitute a single species. Our findings also indicate that there has been a horizontal transfer of rpoB sequences between these subgroups, precluding the use of rpoB sequencing alone for the accurate identification of the two proposed M. abscessus subspecies. PMID:21106786

  17. Multilocus sequence analysis and rpoB sequencing of Mycobacterium abscessus (sensu lato) strains.

    Science.gov (United States)

    Macheras, Edouard; Roux, Anne-Laure; Bastian, Sylvaine; Leão, Sylvia Cardoso; Palaci, Moises; Sivadon-Tardy, Valérie; Gutierrez, Cristina; Richter, Elvira; Rüsch-Gerdes, Sabine; Pfyffer, Gaby; Bodmer, Thomas; Cambau, Emmanuelle; Gaillard, Jean-Louis; Heym, Beate

    2011-02-01

    clustering of strains. We found 10/120 (8.3%) isolates for which the concatenated MLSA gene sequence and rpoB sequence were discordant (e.g., M. massiliense MLSA sequence and M. abscessus rpoB sequence), suggesting the intergroup lateral transfers of rpoB. In conclusion, our study strongly supports the recent proposal that M. abscessus, M. massiliense, and M. bolletii should constitute a single species. Our findings also indicate that there has been a horizontal transfer of rpoB sequences between these subgroups, precluding the use of rpoB sequencing alone for the accurate identification of the two proposed M. abscessus subspecies.

  18. DNA copy number, including telomeres and mitochondria, assayed using next-generation sequencing

    Directory of Open Access Journals (Sweden)

    Jackson Stuart

    2010-04-01

    Full Text Available Abstract Background DNA copy number variations occur within populations and aberrations can cause disease. We sought to develop an improved lab-automatable, cost-efficient, accurate platform to profile DNA copy number. Results We developed a sequencing-based assay of nuclear, mitochondrial, and telomeric DNA copy number that draws on the unbiased nature of next-generation sequencing and incorporates techniques developed for RNA expression profiling. To demonstrate this platform, we assayed UMC-11 cells using 5 million 33 nt reads and found tremendous copy number variation, including regions of single and homogeneous deletions and amplifications to 29 copies; 5 times more mitochondria and 4 times less telomeric sequence than a pool of non-diseased, blood-derived DNA; and that UMC-11 was derived from a male individual. Conclusion The described assay outputs absolute copy number, outputs an error estimate (p-value, and is more accurate than array-based platforms at high copy number. The platform enables profiling of mitochondrial levels and telomeric length. The assay is lab-automatable and has a genomic resolution and cost that are tunable based on the number of sequence reads.

  19. Rapid Polymer Sequencer

    Science.gov (United States)

    Stolc, Viktor (Inventor); Brock, Matthew W (Inventor)

    2013-01-01

    Method and system for rapid and accurate determination of each of a sequence of unknown polymer components, such as nucleic acid components. A self-assembling monolayer of a selected substance is optionally provided on an interior surface of a pipette tip, and the interior surface is immersed in a selected liquid. A selected electrical field is impressed in a longitudinal direction, or in a transverse direction, in the tip region, a polymer sequence is passed through the tip region, and a change in an electrical current signal is measured as each polymer component passes through the tip region. Each of the measured changes in electrical current signals is compared with a database of reference electrical change signals, with each reference signal corresponding to an identified polymer component, to identify the unknown polymer component with a reference polymer component. The nanopore preferably has a pore inner diameter of no more than about 40 nm and is prepared by heating and pulling a very small section of a glass tubing.

  20. Accurate, model-based tuning of synthetic gene expression using introns in S. cerevisiae.

    Directory of Open Access Journals (Sweden)

    Ido Yofe

    2014-06-01

    Full Text Available Introns are key regulators of eukaryotic gene expression and present a potentially powerful tool for the design of synthetic eukaryotic gene expression systems. However, intronic control over gene expression is governed by a multitude of complex, incompletely understood, regulatory mechanisms. Despite this lack of detailed mechanistic understanding, here we show how a relatively simple model enables accurate and predictable tuning of synthetic gene expression system in yeast using several predictive intron features such as transcript folding and sequence motifs. Using only natural Saccharomyces cerevisiae introns as regulators, we demonstrate fine and accurate control over gene expression spanning a 100 fold expression range. These results broaden the engineering toolbox of synthetic gene expression systems and provide a framework in which precise and robust tuning of gene expression is accomplished.

  1. BLESS 2: accurate, memory-efficient and fast error correction method.

    Science.gov (United States)

    Heo, Yun; Ramachandran, Anand; Hwu, Wen-Mei; Ma, Jian; Chen, Deming

    2016-08-01

    The most important features of error correction tools for sequencing data are accuracy, memory efficiency and fast runtime. The previous version of BLESS was highly memory-efficient and accurate, but it was too slow to handle reads from large genomes. We have developed a new version of BLESS to improve runtime and accuracy while maintaining a small memory usage. The new version, called BLESS 2, has an error correction algorithm that is more accurate than BLESS, and the algorithm has been parallelized using hybrid MPI and OpenMP programming. BLESS 2 was compared with five top-performing tools, and it was found to be the fastest when it was executed on two computing nodes using MPI, with each node containing twelve cores. Also, BLESS 2 showed at least 11% higher gain while retaining the memory efficiency of the previous version for large genomes. Freely available at https://sourceforge.net/projects/bless-ec dchen@illinois.edu Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

  2. On site DNA barcoding by nanopore sequencing.

    Directory of Open Access Journals (Sweden)

    Michele Menegon

    Full Text Available Biodiversity research is becoming increasingly dependent on genomics, which allows the unprecedented digitization and understanding of the planet's biological heritage. The use of genetic markers i.e. DNA barcoding, has proved to be a powerful tool in species identification. However, full exploitation of this approach is hampered by the high sequencing costs and the absence of equipped facilities in biodiversity-rich countries. In the present work, we developed a portable sequencing laboratory based on the portable DNA sequencer from Oxford Nanopore Technologies, the MinION. Complementary laboratory equipment and reagents were selected to be used in remote and tough environmental conditions. The performance of the MinION sequencer and the portable laboratory was tested for DNA barcoding in a mimicking tropical environment, as well as in a remote rainforest of Tanzania lacking electricity. Despite the relatively high sequencing error-rate of the MinION, the development of a suitable pipeline for data analysis allowed the accurate identification of different species of vertebrates including amphibians, reptiles and mammals. In situ sequencing of a wild frog allowed us to rapidly identify the species captured, thus confirming that effective DNA barcoding in the field is possible. These results open new perspectives for real-time-on-site DNA sequencing thus potentially increasing opportunities for the understanding of biodiversity in areas lacking conventional laboratory facilities.

  3. A multiplex microplatform for the detection of multiple DNA methylation events using gold-DNA affinity.

    Science.gov (United States)

    Sina, Abu Ali Ibn; Foster, Matthew Thomas; Korbie, Darren; Carrascosa, Laura G; Shiddiky, Muhammad J A; Gao, Jing; Dey, Shuvashis; Trau, Matt

    2017-10-07

    We report a new multiplexed strategy for the electrochemical detection of regional DNA methylation across multiple regions. Using the sequence dependent affinity of bisulfite treated DNA towards gold surfaces, the method integrates the high sensitivity of a micro-fabricated multiplex device comprising a microarray of gold electrodes, with the powerful multiplexing capability of multiplex-PCR. The synergy of this combination enables the monitoring of the methylation changes across several genomic regions simultaneously from as low as 500 pg μl -1 of DNA with no sequencing requirement.

  4. Sequence2Vec: A novel embedding approach for modeling transcription factor binding affinity landscape

    KAUST Repository

    Dai, Hanjun

    2017-07-26

    Motivation: An accurate characterization of transcription factor (TF)-DNA affinity landscape is crucial to a quantitative understanding of the molecular mechanisms underpinning endogenous gene regulation. While recent advances in biotechnology have brought the opportunity for building binding affinity prediction methods, the accurate characterization of TF-DNA binding affinity landscape still remains a challenging problem. Results: Here we propose a novel sequence embedding approach for modeling the transcription factor binding affinity landscape. Our method represents DNA binding sequences as a hidden Markov model (HMM) which captures both position specific information and long-range dependency in the sequence. A cornerstone of our method is a novel message passing-like embedding algorithm, called Sequence2Vec, which maps these HMMs into a common nonlinear feature space and uses these embedded features to build a predictive model. Our method is a novel combination of the strength of probabilistic graphical models, feature space embedding and deep learning. We conducted comprehensive experiments on over 90 large-scale TF-DNA data sets which were measured by different high-throughput experimental technologies. Sequence2Vec outperforms alternative machine learning methods as well as the state-of-the-art binding affinity prediction methods.

  5. NINJA-OPS: Fast Accurate Marker Gene Alignment Using Concatenated Ribosomes.

    Directory of Open Access Journals (Sweden)

    Gabriel A Al-Ghalith

    2016-01-01

    Full Text Available The explosion of bioinformatics technologies in the form of next generation sequencing (NGS has facilitated a massive influx of genomics data in the form of short reads. Short read mapping is therefore a fundamental component of next generation sequencing pipelines which routinely match these short reads against reference genomes for contig assembly. However, such techniques have seldom been applied to microbial marker gene sequencing studies, which have mostly relied on novel heuristic approaches. We propose NINJA Is Not Just Another OTU-Picking Solution (NINJA-OPS, or NINJA for short, a fast and highly accurate novel method enabling reference-based marker gene matching (picking Operational Taxonomic Units, or OTUs. NINJA takes advantage of the Burrows-Wheeler (BW alignment using an artificial reference chromosome composed of concatenated reference sequences, the "concatesome," as the BW input. Other features include automatic support for paired-end reads with arbitrary insert sizes. NINJA is also free and open source and implements several pre-filtering methods that elicit substantial speedup when coupled with existing tools. We applied NINJA to several published microbiome studies, obtaining accuracy similar to or better than previous reference-based OTU-picking methods while achieving an order of magnitude or more speedup and using a fraction of the memory footprint. NINJA is a complete pipeline that takes a FASTA-formatted input file and outputs a QIIME-formatted taxonomy-annotated BIOM file for an entire MiSeq run of human gut microbiome 16S genes in under 10 minutes on a dual-core laptop.

  6. Re-Ranking Sequencing Variants in the Post-GWAS Era for Accurate Causal Variant Identification

    Science.gov (United States)

    Faye, Laura L.; Machiela, Mitchell J.; Kraft, Peter; Bull, Shelley B.; Sun, Lei

    2013-01-01

    Next generation sequencing has dramatically increased our ability to localize disease-causing variants by providing base-pair level information at costs increasingly feasible for the large sample sizes required to detect complex-trait associations. Yet, identification of causal variants within an established region of association remains a challenge. Counter-intuitively, certain factors that increase power to detect an associated region can decrease power to localize the causal variant. First, combining GWAS with imputation or low coverage sequencing to achieve the large sample sizes required for high power can have the unintended effect of producing differential genotyping error among SNPs. This tends to bias the relative evidence for association toward better genotyped SNPs. Second, re-use of GWAS data for fine-mapping exploits previous findings to ensure genome-wide significance in GWAS-associated regions. However, using GWAS findings to inform fine-mapping analysis can bias evidence away from the causal SNP toward the tag SNP and SNPs in high LD with the tag. Together these factors can reduce power to localize the causal SNP by more than half. Other strategies commonly employed to increase power to detect association, namely increasing sample size and using higher density genotyping arrays, can, in certain common scenarios, actually exacerbate these effects and further decrease power to localize causal variants. We develop a re-ranking procedure that accounts for these adverse effects and substantially improves the accuracy of causal SNP identification, often doubling the probability that the causal SNP is top-ranked. Application to the NCI BPC3 aggressive prostate cancer GWAS with imputation meta-analysis identified a new top SNP at 2 of 3 associated loci and several additional possible causal SNPs at these loci that may have otherwise been overlooked. This method is simple to implement using R scripts provided on the author's website. PMID:23950724

  7. Probabilistic Methods for Processing High-Throughput Sequencing Signals

    DEFF Research Database (Denmark)

    Sørensen, Lasse Maretty

    High-throughput sequencing has the potential to answer many of the big questions in biology and medicine. It can be used to determine the ancestry of species, to chart complex ecosystems and to understand and diagnose disease. However, going from raw sequencing data to biological or medical insig....... By estimating the genotypes on a set of candidate variants obtained from both a standard mapping-based approach as well as de novo assemblies, we are able to find considerably more structural variation than previous studies...... for reconstructing transcript sequences from RNA sequencing data. The method is based on a novel sparse prior distribution over transcript abundances and is markedly more accurate than existing approaches. The second chapter describes a new method for calling genotypes from a fixed set of candidate variants....... The method queries the reads using a graph representation of the variants and hereby mitigates the reference-bias that characterise standard genotyping methods. In the last chapter, we apply this method to call the genotypes of 50 deeply sequencing parent-offspring trios from the GenomeDenmark project...

  8. Main sequences defined by Hyades and field stars

    International Nuclear Information System (INIS)

    Upgren, A.R.

    1978-01-01

    The author reviews the main sequences defined by members of the Hyades cluster and by the field stars in the solar neighborhood. For this purpose, the discussion is limited primarily to the stars of the lower portions of the main sequence, especially those of spectral classes K and early M. There are two reasons for emphasis on the faint red dwarf stars. First, the value of a parallax depends on its size or, more accurately, on the error in parallax divided by the parallax itself. Large parallaxes of high precision occur in large numbers only for stars inhabiting the lower main sequence. Furthermore, brighter stars of earlier spectral classes are more likely to be influenced by evolutionary effects which may differ between the Hyades and field stars, and which are difficult to calibrate. (Auth.)

  9. Genome Sequences of Oryza Species

    KAUST Repository

    Kumagai, Masahiko

    2018-02-14

    This chapter summarizes recent data obtained from genome sequencing, annotation projects, and studies on the genome diversity of Oryza sativa and related Oryza species. O. sativa, commonly known as Asian rice, is the first monocot species whose complete genome sequence was deciphered based on physical mapping by an international collaborative effort. This genome, along with its accurate and comprehensive annotation, has become an indispensable foundation for crop genomics and breeding. With the development of innovative sequencing technologies, genomic studies of O. sativa have dramatically increased; in particular, a large number of cultivars and wild accessions have been sequenced and compared with the reference rice genome. Since de novo genome sequencing has become cost-effective, the genome of African cultivated rice, O. glaberrima, has also been determined. Comparative genomic studies have highlighted the independent domestication processes of different rice species, but it also turned out that Asian and African rice share a common gene set that has experienced similar artificial selection. An international project aimed at constructing reference genomes and examining the genome diversity of wild Oryza species is currently underway, and the genomes of some species are publicly available. This project provides a platform for investigations such as the evolution, development, polyploidization, and improvement of crops. Studies on the genomic diversity of Oryza species, including wild species, should provide new insights to solve the problem of growing food demands in the face of rapid climatic changes.

  10. Genome Sequences of Oryza Species

    KAUST Repository

    Kumagai, Masahiko; Tanaka, Tsuyoshi; Ohyanagi, Hajime; Hsing, Yue-Ie C.; Itoh, Takeshi

    2018-01-01

    This chapter summarizes recent data obtained from genome sequencing, annotation projects, and studies on the genome diversity of Oryza sativa and related Oryza species. O. sativa, commonly known as Asian rice, is the first monocot species whose complete genome sequence was deciphered based on physical mapping by an international collaborative effort. This genome, along with its accurate and comprehensive annotation, has become an indispensable foundation for crop genomics and breeding. With the development of innovative sequencing technologies, genomic studies of O. sativa have dramatically increased; in particular, a large number of cultivars and wild accessions have been sequenced and compared with the reference rice genome. Since de novo genome sequencing has become cost-effective, the genome of African cultivated rice, O. glaberrima, has also been determined. Comparative genomic studies have highlighted the independent domestication processes of different rice species, but it also turned out that Asian and African rice share a common gene set that has experienced similar artificial selection. An international project aimed at constructing reference genomes and examining the genome diversity of wild Oryza species is currently underway, and the genomes of some species are publicly available. This project provides a platform for investigations such as the evolution, development, polyploidization, and improvement of crops. Studies on the genomic diversity of Oryza species, including wild species, should provide new insights to solve the problem of growing food demands in the face of rapid climatic changes.

  11. MANGO: a new approach to multiple sequence alignment.

    Science.gov (United States)

    Zhang, Zefeng; Lin, Hao; Li, Ming

    2007-01-01

    Multiple sequence alignment is a classical and challenging task for biological sequence analysis. The problem is NP-hard. The full dynamic programming takes too much time. The progressive alignment heuristics adopted by most state of the art multiple sequence alignment programs suffer from the 'once a gap, always a gap' phenomenon. Is there a radically new way to do multiple sequence alignment? This paper introduces a novel and orthogonal multiple sequence alignment method, using multiple optimized spaced seeds and new algorithms to handle these seeds efficiently. Our new algorithm processes information of all sequences as a whole, avoiding problems caused by the popular progressive approaches. Because the optimized spaced seeds are provably significantly more sensitive than the consecutive k-mers, the new approach promises to be more accurate and reliable. To validate our new approach, we have implemented MANGO: Multiple Alignment with N Gapped Oligos. Experiments were carried out on large 16S RNA benchmarks showing that MANGO compares favorably, in both accuracy and speed, against state-of-art multiple sequence alignment methods, including ClustalW 1.83, MUSCLE 3.6, MAFFT 5.861, Prob-ConsRNA 1.11, Dialign 2.2.1, DIALIGN-T 0.2.1, T-Coffee 4.85, POA 2.0 and Kalign 2.0.

  12. sRNAnalyzer-a flexible and customizable small RNA sequencing data analysis pipeline.

    Science.gov (United States)

    Wu, Xiaogang; Kim, Taek-Kyun; Baxter, David; Scherler, Kelsey; Gordon, Aaron; Fong, Olivia; Etheridge, Alton; Galas, David J; Wang, Kai

    2017-12-01

    Although many tools have been developed to analyze small RNA sequencing (sRNA-Seq) data, it remains challenging to accurately analyze the small RNA population, mainly due to multiple sequence ID assignment caused by short read length. Additional issues in small RNA analysis include low consistency of microRNA (miRNA) measurement results across different platforms, miRNA mapping associated with miRNA sequence variation (isomiR) and RNA editing, and the origin of those unmapped reads after screening against all endogenous reference sequence databases. To address these issues, we built a comprehensive and customizable sRNA-Seq data analysis pipeline-sRNAnalyzer, which enables: (i) comprehensive miRNA profiling strategies to better handle isomiRs and summarization based on each nucleotide position to detect potential SNPs in miRNAs, (ii) different sequence mapping result assignment approaches to simulate results from microarray/qRT-PCR platforms and a local probabilistic model to assign mapping results to the most-likely IDs, (iii) comprehensive ribosomal RNA filtering for accurate mapping of exogenous RNAs and summarization based on taxonomy annotation. We evaluated our pipeline on both artificial samples (including synthetic miRNA and Escherichia coli cultures) and biological samples (human tissue and plasma). sRNAnalyzer is implemented in Perl and available at: http://srnanalyzer.systemsbiology.net/. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.

  13. sRNAnalyzer—a flexible and customizable small RNA sequencing data analysis pipeline

    Science.gov (United States)

    Kim, Taek-Kyun; Baxter, David; Scherler, Kelsey; Gordon, Aaron; Fong, Olivia; Etheridge, Alton; Galas, David J.

    2017-01-01

    Abstract Although many tools have been developed to analyze small RNA sequencing (sRNA-Seq) data, it remains challenging to accurately analyze the small RNA population, mainly due to multiple sequence ID assignment caused by short read length. Additional issues in small RNA analysis include low consistency of microRNA (miRNA) measurement results across different platforms, miRNA mapping associated with miRNA sequence variation (isomiR) and RNA editing, and the origin of those unmapped reads after screening against all endogenous reference sequence databases. To address these issues, we built a comprehensive and customizable sRNA-Seq data analysis pipeline—sRNAnalyzer, which enables: (i) comprehensive miRNA profiling strategies to better handle isomiRs and summarization based on each nucleotide position to detect potential SNPs in miRNAs, (ii) different sequence mapping result assignment approaches to simulate results from microarray/qRT-PCR platforms and a local probabilistic model to assign mapping results to the most-likely IDs, (iii) comprehensive ribosomal RNA filtering for accurate mapping of exogenous RNAs and summarization based on taxonomy annotation. We evaluated our pipeline on both artificial samples (including synthetic miRNA and Escherichia coli cultures) and biological samples (human tissue and plasma). sRNAnalyzer is implemented in Perl and available at: http://srnanalyzer.systemsbiology.net/. PMID:29069500

  14. pyPaSWAS: Python-based multi-core CPU and GPU sequence alignment.

    Science.gov (United States)

    Warris, Sven; Timal, N Roshan N; Kempenaar, Marcel; Poortinga, Arne M; van de Geest, Henri; Varbanescu, Ana L; Nap, Jan-Peter

    2018-01-01

    Our previously published CUDA-only application PaSWAS for Smith-Waterman (SW) sequence alignment of any type of sequence on NVIDIA-based GPUs is platform-specific and therefore adopted less than could be. The OpenCL language is supported more widely and allows use on a variety of hardware platforms. Moreover, there is a need to promote the adoption of parallel computing in bioinformatics by making its use and extension more simple through more and better application of high-level languages commonly used in bioinformatics, such as Python. The novel application pyPaSWAS presents the parallel SW sequence alignment code fully packed in Python. It is a generic SW implementation running on several hardware platforms with multi-core systems and/or GPUs that provides accurate sequence alignments that also can be inspected for alignment details. Additionally, pyPaSWAS support the affine gap penalty. Python libraries are used for automated system configuration, I/O and logging. This way, the Python environment will stimulate further extension and use of pyPaSWAS. pyPaSWAS presents an easy Python-based environment for accurate and retrievable parallel SW sequence alignments on GPUs and multi-core systems. The strategy of integrating Python with high-performance parallel compute languages to create a developer- and user-friendly environment should be considered for other computationally intensive bioinformatics algorithms.

  15. S3 HMBC hetero: Spin-State-Selective HMBC for accurate measurement of long-range heteronuclear coupling constants

    DEFF Research Database (Denmark)

    Hoeck, Casper; Gotfredsen, Charlotte Held; Sørensen, Ole W.

    2017-01-01

    A novel method, Spin-State-Selective (S3) HMBC hetero, for accurate measurement of heteronuclear coupling constants is introduced. The method extends the S3 HMBC technique for measurement of homonuclear coupling constants by appending a pulse sequence element that interchanges the polarization...

  16. The use of coded PCR primers enables high-throughput sequencing of multiple homolog amplification products by 454 parallel sequencing.

    Directory of Open Access Journals (Sweden)

    Jonas Binladen

    2007-02-01

    Full Text Available The invention of the Genome Sequence 20 DNA Sequencing System (454 parallel sequencing platform has enabled the rapid and high-volume production of sequence data. Until now, however, individual emulsion PCR (emPCR reactions and subsequent sequencing runs have been unable to combine template DNA from multiple individuals, as homologous sequences cannot be subsequently assigned to their original sources.We use conventional PCR with 5'-nucleotide tagged primers to generate homologous DNA amplification products from multiple specimens, followed by sequencing through the high-throughput Genome Sequence 20 DNA Sequencing System (GS20, Roche/454 Life Sciences. Each DNA sequence is subsequently traced back to its individual source through 5'tag-analysis.We demonstrate that this new approach enables the assignment of virtually all the generated DNA sequences to the correct source once sequencing anomalies are accounted for (miss-assignment rate<0.4%. Therefore, the method enables accurate sequencing and assignment of homologous DNA sequences from multiple sources in single high-throughput GS20 run. We observe a bias in the distribution of the differently tagged primers that is dependent on the 5' nucleotide of the tag. In particular, primers 5' labelled with a cytosine are heavily overrepresented among the final sequences, while those 5' labelled with a thymine are strongly underrepresented. A weaker bias also exists with regards to the distribution of the sequences as sorted by the second nucleotide of the dinucleotide tags. As the results are based on a single GS20 run, the general applicability of the approach requires confirmation. However, our experiments demonstrate that 5'primer tagging is a useful method in which the sequencing power of the GS20 can be applied to PCR-based assays of multiple homologous PCR products. The new approach will be of value to a broad range of research areas, such as those of comparative genomics, complete mitochondrial

  17. Seq2Logo: a method for construction and visualization of amino acid binding motifs and sequence profiles including sequence weighting, pseudo counts and two-sided representation of amino acid enrichment and depletion

    DEFF Research Database (Denmark)

    Thomsen, Martin Christen Frølund; Nielsen, Morten

    2012-01-01

    Seq2Logo is a web-based sequence logo generator. Sequence logos are a graphical representation of the information content stored in a multiple sequence alignment (MSA) and provide a compact and highly intuitive representation of the position-specific amino acid composition of binding motifs, active...... related to amino acid enrichment and depletion. Besides allowing input in the format of peptides and MSA, Seq2Logo accepts input as Blast sequence profiles, providing easy access for non-expert end-users to characterize and identify functionally conserved/variable amino acids in any given protein...... sites, etc. in biological sequences. Accurate generation of sequence logos is often compromised by sequence redundancy and low number of observations. Moreover, most methods available for sequence logo generation focus on displaying the position-specific enrichment of amino acids, discarding the equally...

  18. Noninvasive prenatal paternity testing (NIPAT) through maternal plasma DNA sequencing

    DEFF Research Database (Denmark)

    Jiang, Haojun; Xie, Yifan; Li, Xuchao

    2016-01-01

    developed a noninvasive prenatal paternity testing (NIPAT) based on SNP typing with maternal plasma DNA sequencing. We evaluated the influence factors (minor allele frequency (MAF), the number of total SNP, fetal fraction and effective sequencing depth) and designed three different selective SNP panels......Short tandem repeats (STRs) and single nucleotide polymorphisms (SNPs) have been already used to perform noninvasive prenatal paternity testing from maternal plasma DNA. The frequently used technologies were PCR followed by capillary electrophoresis and SNP typing array, respectively. Here, we...... paternity test using STR multiplex system. Our study here proved that the maternal plasma DNA sequencing-based technology is feasible and accurate in determining paternity, which may provide an alternative in forensic application in the future....

  19. Chiron: translating nanopore raw signal directly into nucleotide sequence using deep learning

    KAUST Repository

    Teng, Haotian; Cao, Minh Duc; Hall, Michael B; Duarte, Tania; Wang, Sheng; Coin, Lachlan J M

    2018-01-01

    Sequencing by translocating DNA fragments through an array of nanopores is a rapidly maturing technology that offers faster and cheaper sequencing than other approaches. However, accurately deciphering the DNA sequence from the noisy and complex electrical signal is challenging. Here, we report Chiron, the first deep learning model to achieve end-to-end basecalling and directly translate the raw signal to DNA sequence without the error-prone segmentation step. Trained with only a small set of 4,000 reads, we show that our model provides state-of-the-art basecalling accuracy, even on previously unseen species. Chiron achieves basecalling speeds of more than 2,000 bases per second using desktop computer graphics processing units.

  20. Chiron: translating nanopore raw signal directly into nucleotide sequence using deep learning

    KAUST Repository

    Teng, Haotian

    2018-04-10

    Sequencing by translocating DNA fragments through an array of nanopores is a rapidly maturing technology that offers faster and cheaper sequencing than other approaches. However, accurately deciphering the DNA sequence from the noisy and complex electrical signal is challenging. Here, we report Chiron, the first deep learning model to achieve end-to-end basecalling and directly translate the raw signal to DNA sequence without the error-prone segmentation step. Trained with only a small set of 4,000 reads, we show that our model provides state-of-the-art basecalling accuracy, even on previously unseen species. Chiron achieves basecalling speeds of more than 2,000 bases per second using desktop computer graphics processing units.

  1. Integrating sequencing technologies in personal genomics: optimal low cost reconstruction of structural variants.

    Directory of Open Access Journals (Sweden)

    Jiang Du

    2009-07-01

    Full Text Available The goal of human genome re-sequencing is obtaining an accurate assembly of an individual's genome. Recently, there has been great excitement in the development of many technologies for this (e.g. medium and short read sequencing from companies such as 454 and SOLiD, and high-density oligo-arrays from Affymetrix and NimbelGen, with even more expected to appear. The costs and sensitivities of these technologies differ considerably from each other. As an important goal of personal genomics is to reduce the cost of re-sequencing to an affordable point, it is worthwhile to consider optimally integrating technologies. Here, we build a simulation toolbox that will help us optimally combine different technologies for genome re-sequencing, especially in reconstructing large structural variants (SVs. SV reconstruction is considered the most challenging step in human genome re-sequencing. (It is sometimes even harder than de novo assembly of small genomes because of the duplications and repetitive sequences in the human genome. To this end, we formulate canonical problems that are representative of issues in reconstruction and are of small enough scale to be computationally tractable and simulatable. Using semi-realistic simulations, we show how we can combine different technologies to optimally solve the assembly at low cost. With mapability maps, our simulations efficiently handle the inhomogeneous repeat-containing structure of the human genome and the computational complexity of practical assembly algorithms. They quantitatively show how combining different read lengths is more cost-effective than using one length, how an optimal mixed sequencing strategy for reconstructing large novel SVs usually also gives accurate detection of SNPs/indels, how paired-end reads can improve reconstruction efficiency, and how adding in arrays is more efficient than just sequencing for disentangling some complex SVs. Our strategy should facilitate the sequencing of

  2. COBRA-Seq: Sensitive and Quantitative Methylome Profiling

    Directory of Open Access Journals (Sweden)

    Hilal Varinli

    2015-10-01

    Full Text Available Combined Bisulfite Restriction Analysis (COBRA quantifies DNA methylation at a specific locus. It does so via digestion of PCR amplicons produced from bisulfite-treated DNA, using a restriction enzyme that contains a cytosine within its recognition sequence, such as TaqI. Here, we introduce COBRA-seq, a genome wide reduced methylome method that requires minimal DNA input (0.1–1.0 mg and can either use PCR or linear amplification to amplify the sequencing library. Variants of COBRA-seq can be used to explore CpG-depleted as well as CpG-rich regions in vertebrate DNA. The choice of enzyme influences enrichment for specific genomic features, such as CpG-rich promoters and CpG islands, or enrichment for less CpG dense regions such as enhancers. COBRA-seq coupled with linear amplification has the additional advantage of reduced PCR bias by producing full length fragments at high abundance. Unlike other reduced representative methylome methods, COBRA-seq has great flexibility in the choice of enzyme and can be multiplexed and tuned, to reduce sequencing costs and to interrogate different numbers of sites. Moreover, COBRA-seq is applicable to non-model organisms without the reference genome and compatible with the investigation of non-CpG methylation by using restriction enzymes containing CpA, CpT, and CpC in their recognition site.

  3. The DNA methylome of human peripheral blood mononuclear cells

    DEFF Research Database (Denmark)

    Li, Yingrui; Zhu, Jingde; Tian, Geng

    2010-01-01

    DNA methylation plays an important role in biological processes in human health and disease. Recent technological advances allow unbiased whole-genome DNA methylation (methylome) analysis to be carried out on human cells. Using whole-genome bisulfite sequencing at 24.7-fold coverage (12.3-fold per...... strand), we report a comprehensive (92.62%) methylome and analysis of the unique sequences in human peripheral blood mononuclear cells (PBMC) from the same Asian individual whose genome was deciphered in the YH project. PBMC constitute an important source for clinical blood tests world-wide. We found...... research and confirms new sequencing technology as a paradigm for large-scale epigenomics studies....

  4. Combining structural modeling with ensemble machine learning to accurately predict protein fold stability and binding affinity effects upon mutation.

    Directory of Open Access Journals (Sweden)

    Niklas Berliner

    Full Text Available Advances in sequencing have led to a rapid accumulation of mutations, some of which are associated with diseases. However, to draw mechanistic conclusions, a biochemical understanding of these mutations is necessary. For coding mutations, accurate prediction of significant changes in either the stability of proteins or their affinity to their binding partners is required. Traditional methods have used semi-empirical force fields, while newer methods employ machine learning of sequence and structural features. Here, we show how combining both of these approaches leads to a marked boost in accuracy. We introduce ELASPIC, a novel ensemble machine learning approach that is able to predict stability effects upon mutation in both, domain cores and domain-domain interfaces. We combine semi-empirical energy terms, sequence conservation, and a wide variety of molecular details with a Stochastic Gradient Boosting of Decision Trees (SGB-DT algorithm. The accuracy of our predictions surpasses existing methods by a considerable margin, achieving correlation coefficients of 0.77 for stability, and 0.75 for affinity predictions. Notably, we integrated homology modeling to enable proteome-wide prediction and show that accurate prediction on modeled structures is possible. Lastly, ELASPIC showed significant differences between various types of disease-associated mutations, as well as between disease and common neutral mutations. Unlike pure sequence-based prediction methods that try to predict phenotypic effects of mutations, our predictions unravel the molecular details governing the protein instability, and help us better understand the molecular causes of diseases.

  5. Divide and conquer: enriching environmental sequencing data.

    Directory of Open Access Journals (Sweden)

    Anne Bergeron

    2007-09-01

    Full Text Available In environmental sequencing projects, a mix of DNA from a whole microbial community is fragmented and sequenced, with one of the possible goals being to reconstruct partial or complete genomes of members of the community. In communities with high diversity of species, a significant proportion of the sequences do not overlap any other fragment in the sample. This problem will arise not only in situations with a relatively even distribution of many species, but also when the community in a particular environment is routinely dominated by the same few species. In the former case, no genomes may be assembled at all, while in the latter case a few dominant species in an environment will always be sequenced at high coverage to the detriment of coverage of the greater number of sparse species.Here we show that, with the same global sequencing effort, separating the species into two or more sub-communities prior to sequencing can yield a much higher proportion of sequences that can be assembled. We first use the Lander-Waterman model to show that, if the expected percentage of singleton sequences is higher than 25%, then, under the uniform distribution hypothesis, splitting the community is always a wise choice. We then construct simulated microbial communities to show that the results hold for highly non-uniform distributions. We also show that, for the distributions considered in the experiments, it is possible to estimate quite accurately the relative diversity of the two sub-communities.Given the fact that several methods exist to split microbial communities based on physical properties such as size, density, surface biochemistry, or optical properties, we strongly suggest that groups involved in environmental sequencing, and expecting high diversity, consider splitting their communities in order to maximize the information content of their sequencing effort.

  6. BPP: a sequence-based algorithm for branch point prediction.

    Science.gov (United States)

    Zhang, Qing; Fan, Xiaodan; Wang, Yejun; Sun, Ming-An; Shao, Jianlin; Guo, Dianjing

    2017-10-15

    Although high-throughput sequencing methods have been proposed to identify splicing branch points in the human genome, these methods can only detect a small fraction of the branch points subject to the sequencing depth, experimental cost and the expression level of the mRNA. An accurate computational model for branch point prediction is therefore an ongoing objective in human genome research. We here propose a novel branch point prediction algorithm that utilizes information on the branch point sequence and the polypyrimidine tract. Using experimentally validated data, we demonstrate that our proposed method outperforms existing methods. Availability and implementation: https://github.com/zhqingit/BPP. djguo@cuhk.edu.hk. Supplementary data are available at Bioinformatics online. © The Author (2017). Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com

  7. Implementation of Targeted Next Generation Sequencing in Clinical Diagnostics

    DEFF Research Database (Denmark)

    Larsen, Martin Jakob; Burton, Mark; Thomassen, Mads

    Accurate mutation detection is essential in clinical genetic diagnostics of monogenic hereditary diseases. Targeted next generation sequencing (NGS) provides a promising and cost-effective alternative to Sanger sequencing and MLPA analysis currently used in most diagnostic laboratories. One...... of mutation positive controls previously characterized by Sanger/MLPA analysis. Agilent SureSelect Target-Enrichment kits were used for capturing a set of genes associated with hereditary breast and ovarian cancer syndrome and a compilation of genes involved in multiple rare single gene disorders......, respectively. For diagnostics, the sequencing coverage is essential, wherefore a minimum coverage of 30x per nucleotide in the coding regions was used as our primary quality criterion. For the majority of the included genes, we obtained adequate gene coverage, in which we were able to detect 100% of the known...

  8. Robustness of ancestral sequence reconstruction to phylogenetic uncertainty.

    Science.gov (United States)

    Hanson-Smith, Victor; Kolaczkowski, Bryan; Thornton, Joseph W

    2010-09-01

    Ancestral sequence reconstruction (ASR) is widely used to formulate and test hypotheses about the sequences, functions, and structures of ancient genes. Ancestral sequences are usually inferred from an alignment of extant sequences using a maximum likelihood (ML) phylogenetic algorithm, which calculates the most likely ancestral sequence assuming a probabilistic model of sequence evolution and a specific phylogeny--typically the tree with the ML. The true phylogeny is seldom known with certainty, however. ML methods ignore this uncertainty, whereas Bayesian methods incorporate it by integrating the likelihood of each ancestral state over a distribution of possible trees. It is not known whether Bayesian approaches to phylogenetic uncertainty improve the accuracy of inferred ancestral sequences. Here, we use simulation-based experiments under both simplified and empirically derived conditions to compare the accuracy of ASR carried out using ML and Bayesian approaches. We show that incorporating phylogenetic uncertainty by integrating over topologies very rarely changes the inferred ancestral state and does not improve the accuracy of the reconstructed ancestral sequence. Ancestral state reconstructions are robust to uncertainty about the underlying tree because the conditions that produce phylogenetic uncertainty also make the ancestral state identical across plausible trees; conversely, the conditions under which different phylogenies yield different inferred ancestral states produce little or no ambiguity about the true phylogeny. Our results suggest that ML can produce accurate ASRs, even in the face of phylogenetic uncertainty. Using Bayesian integration to incorporate this uncertainty is neither necessary nor beneficial.

  9. Effective Feature Selection for Classification of Promoter Sequences.

    Directory of Open Access Journals (Sweden)

    Kouser K

    Full Text Available Exploring novel computational methods in making sense of biological data has not only been a necessity, but also productive. A part of this trend is the search for more efficient in silico methods/tools for analysis of promoters, which are parts of DNA sequences that are involved in regulation of expression of genes into other functional molecules. Promoter regions vary greatly in their function based on the sequence of nucleotides and the arrangement of protein-binding short-regions called motifs. In fact, the regulatory nature of the promoters seems to be largely driven by the selective presence and/or the arrangement of these motifs. Here, we explore computational classification of promoter sequences based on the pattern of motif distributions, as such classification can pave a new way of functional analysis of promoters and to discover the functionally crucial motifs. We make use of Position Specific Motif Matrix (PSMM features for exploring the possibility of accurately classifying promoter sequences using some of the popular classification techniques. The classification results on the complete feature set are low, perhaps due to the huge number of features. We propose two ways of reducing features. Our test results show improvement in the classification output after the reduction of features. The results also show that decision trees outperform SVM (Support Vector Machine, KNN (K Nearest Neighbor and ensemble classifier LibD3C, particularly with reduced features. The proposed feature selection methods outperform some of the popular feature transformation methods such as PCA and SVD. Also, the methods proposed are as accurate as MRMR (feature selection method but much faster than MRMR. Such methods could be useful to categorize new promoters and explore regulatory mechanisms of gene expressions in complex eukaryotic species.

  10. Statistical processing of large image sequences.

    Science.gov (United States)

    Khellah, F; Fieguth, P; Murray, M J; Allen, M

    2005-01-01

    The dynamic estimation of large-scale stochastic image sequences, as frequently encountered in remote sensing, is important in a variety of scientific applications. However, the size of such images makes conventional dynamic estimation methods, for example, the Kalman and related filters, impractical. In this paper, we present an approach that emulates the Kalman filter, but with considerably reduced computational and storage requirements. Our approach is illustrated in the context of a 512 x 512 image sequence of ocean surface temperature. The static estimation step, the primary contribution here, uses a mixture of stationary models to accurately mimic the effect of a nonstationary prior, simplifying both computational complexity and modeling. Our approach provides an efficient, stable, positive-definite model which is consistent with the given correlation structure. Thus, the methods of this paper may find application in modeling and single-frame estimation.

  11. Evaluating next-generation sequencing for direct clinical diagnostics in diarrhoeal disease

    DEFF Research Database (Denmark)

    Joensen, Katrine Grimstrup; Engsbro, A L Ø; Lukjancenko, Oksana

    2017-01-01

    The accurate microbiological diagnosis of diarrhoea involves numerous laboratory tests and, often, the pathogen is not identified in time to guide clinical management. With next-generation sequencing (NGS) becoming cheaper, it has huge potential in routine diagnostics. The aim of this study...... was to evaluate the potential of NGS-based diagnostics through direct sequencing of faecal samples. Fifty-eight clinical faecal samples were obtained from patients with diarrhoea as part of the routine diagnostics at Hvidovre University Hospital, Denmark. Ten samples from healthy individuals were also included...

  12. Increased sequence diversity coverage improves detection of HIV-Specific T cell responses

    DEFF Research Database (Denmark)

    Frahm, N.; Kaufmann, D.E.; Yusim, K.

    2007-01-01

    The accurate identification of HIV-specific T cell responses is important for determining the relationship between immune response, viral control, and disease progression. HIV-specific immune responses are usually measured using peptide sets based on consensus sequences, which frequently miss res...

  13. Application of discrete Fourier inter-coefficient difference for assessing genetic sequence similarity.

    Science.gov (United States)

    King, Brian R; Aburdene, Maurice; Thompson, Alex; Warres, Zach

    2014-01-01

    Digital signal processing (DSP) techniques for biological sequence analysis continue to grow in popularity due to the inherent digital nature of these sequences. DSP methods have demonstrated early success for detection of coding regions in a gene. Recently, these methods are being used to establish DNA gene similarity. We present the inter-coefficient difference (ICD) transformation, a novel extension of the discrete Fourier transformation, which can be applied to any DNA sequence. The ICD method is a mathematical, alignment-free DNA comparison method that generates a genetic signature for any DNA sequence that is used to generate relative measures of similarity among DNA sequences. We demonstrate our method on a set of insulin genes obtained from an evolutionarily wide range of species, and on a set of avian influenza viral sequences, which represents a set of highly similar sequences. We compare phylogenetic trees generated using our technique against trees generated using traditional alignment techniques for similarity and demonstrate that the ICD method produces a highly accurate tree without requiring an alignment prior to establishing sequence similarity.

  14. Rapid-Sequence Serial Sexual Homicides.

    Science.gov (United States)

    Schlesinger, Louis B; Ramirez, Stephanie; Tusa, Brittany; Jarvis, John P; Erdberg, Philip

    2017-03-01

    Serial sexual murderers have been described as committing homicides in a methodical manner, taking substantial time between offenses to elude the authorities. The results of our study of the temporal patterns (i.e., the length of time between homicides) of a nonrandom national sample of 44 serial sexual murderers and their 201 victims indicate that this representation may not always be accurate. Although 25 offenders (56.8%) killed with longer than a 14-day period between homicides, a sizeable subgroup was identified: 19 offenders (43.2%) who committed homicides in rapid-sequence fashion, with fewer than 14 days between all or some of the murders. Six offenders (13.6%) killed all their victims in one rapid-sequence, spree-like episode, with homicides just days apart or sometimes two murders in the same day. Thirteen offenders (29.5%) killed in one or two rapid-sequence clusters (i.e., more than one murder within a 14-day period, as well as additional homicides with greater than 14 days between each). The purpose of our study was to describe this subgroup of rapid-sequence offenders who have not been identified until now. These findings argue for accelerated forensic assessments of dangerousness and public safety when a sexual murder is detected. Psychiatric disorders with rapidly occurring symptom patterns, or even atypical mania or mood dysregulation, may serve as exemplars for understanding this extraordinary group of offenders. © 2017 American Academy of Psychiatry and the Law.

  15. Accurate clinical genetic testing for autoinflammatory diseases using the next-generation sequencing platform MiSeq.

    Science.gov (United States)

    Nakayama, Manabu; Oda, Hirotsugu; Nakagawa, Kenji; Yasumi, Takahiro; Kawai, Tomoki; Izawa, Kazushi; Nishikomori, Ryuta; Heike, Toshio; Ohara, Osamu

    2017-03-01

    Autoinflammatory diseases occupy one of a group of primary immunodeficiency diseases that are generally thought to be caused by mutation of genes responsible for innate immunity, rather than by acquired immunity. Mutations related to autoinflammatory diseases occur in 12 genes. For example, low-level somatic mosaic NLRP3 mutations underlie chronic infantile neurologic, cutaneous, articular syndrome (CINCA), also known as neonatal-onset multisystem inflammatory disease (NOMID). In current clinical practice, clinical genetic testing plays an important role in providing patients with quick, definite diagnoses. To increase the availability of such testing, low-cost high-throughput gene-analysis systems are required, ones that not only have the sensitivity to detect even low-level somatic mosaic mutations, but also can operate simply in a clinical setting. To this end, we developed a simple method that employs two-step tailed PCR and an NGS system, MiSeq platform, to detect mutations in all coding exons of the 12 genes responsible for autoinflammatory diseases. Using this amplicon sequencing system, we amplified a total of 234 amplicons derived from the 12 genes with multiplex PCR. This was done simultaneously and in one test tube. Each sample was distinguished by an index sequence of second PCR primers following PCR amplification. With our procedure and tips for reducing PCR amplification bias, we were able to analyze 12 genes from 25 clinical samples in one MiSeq run. Moreover, with the certified primers designed by our short program-which detects and avoids common SNPs in gene-specific PCR primers-we used this system for routine genetic testing. Our optimized procedure uses a simple protocol, which can easily be followed by virtually any office medical staff. Because of the small PCR amplification bias, we can analyze simultaneously several clinical DNA samples with low cost and can obtain sufficient read numbers to detect a low level of somatic mosaic mutations.

  16. Ancestral sequence alignment under optimal conditions

    Directory of Open Access Journals (Sweden)

    Brown Daniel G

    2005-11-01

    success of aligning ancestral sequences containing ambiguity is very sensitive to the choice of gap open cost. Surprisingly, we find that using maximum likelihood to infer ancestral sequences results in less accurate alignments than when using parsimony to infer ancestral sequences. Finally, we find that the sum-of-pairs methods produce better alignments than all of the ancestral alignment methods.

  17. Quantification of massively parallel sequencing libraries - a comparative study of eight methods

    DEFF Research Database (Denmark)

    Hussing, Christian; Kampmann, Marie-Louise; Mogensen, Helle Smidt

    2018-01-01

    Quantification of massively parallel sequencing libraries is important for acquisition of monoclonal beads or clusters prior to clonal amplification and to avoid large variations in library coverage when multiple samples are included in one sequencing analysis. No gold standard for quantification...... estimates followed by Qubit and electrophoresis-based instruments (Bioanalyzer, TapeStation, GX Touch, and Fragment Analyzer), while SYBR Green and TaqMan based qPCR assays gave the lowest estimates. qPCR gave more accurate predictions of sequencing coverage than Qubit and TapeStation did. Costs, time......-consumption, workflow simplicity, and ability to quantify multiple samples are discussed. Technical specifications, advantages, and disadvantages of the various methods are pointed out....

  18. A STUDY ON DETERMINING THE REFERENCE SPREADING SEQUENCES FOR A DS/CDMACOMMUNICATION SYSTEM

    Directory of Open Access Journals (Sweden)

    Cebrail ÇİFTLİKLİ

    2002-02-01

    Full Text Available In a direct sequence/code division multiple access (DS/CDMA system, the role of the spreading sequences (codes is crucial since the multiple access interference (MAI is the main performance limitation. In this study, we propose an accurate criterion which enables the determination of the reference spreading codes which yield lower bit error rates (BER's in a given code set for a DS/CDMA system using despreading sequences weighted by stepping chip waveforms. The numerical results show that the spreading codes determined by the proposed criterion are the most suitable codes for using as references.

  19. ReQON: a Bioconductor package for recalibrating quality scores from next-generation sequencing data

    Directory of Open Access Journals (Sweden)

    Cabanski Christopher R

    2012-09-01

    Full Text Available Abstract Background Next-generation sequencing technologies have become important tools for genome-wide studies. However, the quality scores that are assigned to each base have been shown to be inaccurate. If the quality scores are used in downstream analyses, these inaccuracies can have a significant impact on the results. Results Here we present ReQON, a tool that recalibrates the base quality scores from an input BAM file of aligned sequencing data using logistic regression. ReQON also generates diagnostic plots showing the effectiveness of the recalibration. We show that ReQON produces quality scores that are both more accurate, in the sense that they more closely correspond to the probability of a sequencing error, and do a better job of discriminating between sequencing errors and non-errors than the original quality scores. We also compare ReQON to other available recalibration tools and show that ReQON is less biased and performs favorably in terms of quality score accuracy. Conclusion ReQON is an open source software package, written in R and available through Bioconductor, for recalibrating base quality scores for next-generation sequencing data. ReQON produces a new BAM file with more accurate quality scores, which can improve the results of downstream analysis, and produces several diagnostic plots showing the effectiveness of the recalibration.

  20. DisoMCS: Accurately Predicting Protein Intrinsically Disordered Regions Using a Multi-Class Conservative Score Approach.

    Directory of Open Access Journals (Sweden)

    Zhiheng Wang

    Full Text Available The precise prediction of protein intrinsically disordered regions, which play a crucial role in biological procedures, is a necessary prerequisite to further the understanding of the principles and mechanisms of protein function. Here, we propose a novel predictor, DisoMCS, which is a more accurate predictor of protein intrinsically disordered regions. The DisoMCS bases on an original multi-class conservative score (MCS obtained by sequence-order/disorder alignment. Initially, near-disorder regions are defined on fragments located at both the terminus of an ordered region connecting a disordered region. Then the multi-class conservative score is generated by sequence alignment against a known structure database and represented as order, near-disorder and disorder conservative scores. The MCS of each amino acid has three elements: order, near-disorder and disorder profiles. Finally, the MCS is exploited as features to identify disordered regions in sequences. DisoMCS utilizes a non-redundant data set as the training set, MCS and predicted secondary structure as features, and a conditional random field as the classification algorithm. In predicted near-disorder regions a residue is determined as an order or a disorder according to the optimized decision threshold. DisoMCS was evaluated by cross-validation, large-scale prediction, independent tests and CASP (Critical Assessment of Techniques for Protein Structure Prediction tests. All results confirmed that DisoMCS was very competitive in terms of accuracy of prediction when compared with well-established publicly available disordered region predictors. It also indicated our approach was more accurate when a query has higher homologous with the knowledge database.The DisoMCS is available at http://cal.tongji.edu.cn/disorder/.

  1. Effects of metal-rich particulate matter exposure on exogenous and endogenous viral sequence methylation in healthy steel-workers.

    Science.gov (United States)

    Mercorio, Roberta; Bonzini, Matteo; Angelici, Laura; Iodice, Simona; Delbue, Serena; Mariani, Jacopo; Apostoli, Pietro; Pesatori, Angela Cecilia; Bollati, Valentina

    2017-11-01

    Inhaled particles have been shown to produce systemic changes in DNA methylation. Global hypomethylation has been associated to viral sequence reactivation, possibly linked to the activation of pro-inflammatory pathways occurring after exposure. This observation provides a rationale to investigate viral sequence (both exogenous and endogenous) methylation in association to metal-rich particulate matter exposure. To verify this hypothesis, we chose the Wp promoter of the Epstein-Barr Virus (EBV-Wp) and the promoter of the human-endogenous-retrovirus w (HERV-w), respectively as a paradigm of an exogenous and an endogenous retroviral sequence, to be investigated by bisulfite PCR Pyrosequencing. We enrolled 63 male workers in an electric furnace steel plant, exposed to high level of metal-rich particulate matter. Comparing samples obtained in the first day of a work week (time 0-baseline, after 2 days off work) and the samples obtained after 3 days of work (time 1-post exposure), the mean methylation of EBV-Wp was significantly higher at baseline compared to post-exposure (mean baseline = 56.7%5mC; mean post-exposure = 47.9%5mC; p-value = 0.009), whereas the mean methylation of HERV-w did not significantly differ. Individual exposure to inhalable particles and metals was estimated based on measures in all working areas and time spent by the study subjects in each area. In a regression model adjusted for age, body mass index and smoking, PM and metal components had a positive association with EBV-Wp methylation (i.e. PM10: β = 5.99, p-value < 0.038; nickel: β = 17.82, p-value = 0.02; arsenic: β = 13.59, p-value < 0.015). The difference observed comparing baseline and post-exposure samples may be suggestive of a rapid change in EBV methylation induced by air particles, while correlation between EBV methylation and PM/metal exposure may represent a more stable adaptive mechanism. Future studies investigating a larger panel of viral sequences could better elucidate

  2. MR imaging of the orbit and eye using inversion recovery sequences

    International Nuclear Information System (INIS)

    Smith, F.W.; Parekh, S.; Forrester, J.; Redpath, T.W.

    1986-01-01

    Most centers performing MR imaging use spin-echo sequences to produce images; however, there are many advantages to using short TI inversion-recovery sequences for examination of the orbits. By selecting a TI similar to the relaxation time of any structure, the signal from this can be suppressed, thereby enhancing the signal from other structures. Using a sequence of TR = 1,000 msec and TI of less than 200 msec, the signal from fat is suppressed, improving image quality adjacent to the surface coil and providing better contrast between orbital structures and fat. The use of this short TI sequence for the examination of the eye in patients with opaque lenses is an accurate method of diagnosis since the sequence enhances the signal from both long T1 and T2 lesions. Eighty-five patients with orbital or ocular pathology have been studied, and the results demonstrate the usefulness of this technique for diagnosis

  3. Implementing targeted region capture sequencing for the clinical detection of Alagille syndrome: An efficient and cost‑effective method.

    Science.gov (United States)

    Huang, Tianhong; Yang, Guilin; Dang, Xiao; Ao, Feijian; Li, Jiankang; He, Yizhou; Tang, Qiyuan; He, Qing

    2017-11-01

    Alagille syndrome (AGS) is a highly variable, autosomal dominant disease that affects multiple structures including the liver, heart, eyes, bones and face. Targeted region capture sequencing focuses on a panel of known pathogenic genes and provides a rapid, cost‑effective and accurate method for molecular diagnosis. In a Chinese family, this method was used on the proband and Sanger sequencing was applied to validate the candidate mutation. A de novo heterozygous mutation (c.3254_3255insT p.Leu1085PhefsX24) of the jagged 1 gene was identified as the potential disease‑causing gene mutation. In conclusion, the present study suggested that target region capture sequencing is an efficient, reliable and accurate approach for the clinical diagnosis of AGS. Furthermore, these results expand on the understanding of the pathogenesis of AGS.

  4. Effects of Early and Late Rest Intervals on Performance and Overnight Consolidation of a Keyboard Sequence

    Science.gov (United States)

    Cash, Carla Davis

    2009-01-01

    Thirty-six nonmusicians practiced a five-element key-press sequence on a digital piano, repeating the sequence as quickly and accurately as possible during twelve 30-s practice blocks alternating with 30-s pauses. Twelve learners rested for 5 min between Blocks 3 and 4, another 12 learners rested for 5 min between Blocks 9 and 10, and the…

  5. Estimation of kinship coefficient in structured and admixed populations using sparse sequencing data.

    Directory of Open Access Journals (Sweden)

    Jinzhuang Dou

    2017-09-01

    Full Text Available Knowledge of biological relatedness between samples is important for many genetic studies. In large-scale human genetic association studies, the estimated kinship is used to remove cryptic relatedness, control for family structure, and estimate trait heritability. However, estimation of kinship is challenging for sparse sequencing data, such as those from off-target regions in target sequencing studies, where genotypes are largely uncertain or missing. Existing methods often assume accurate genotypes at a large number of markers across the genome. We show that these methods, without accounting for the genotype uncertainty in sparse sequencing data, can yield a strong downward bias in kinship estimation. We develop a computationally efficient method called SEEKIN to estimate kinship for both homogeneous samples and heterogeneous samples with population structure and admixture. Our method models genotype uncertainty and leverages linkage disequilibrium through imputation. We test SEEKIN on a whole exome sequencing dataset (WES of Singapore Chinese and Malays, which involves substantial population structure and admixture. We show that SEEKIN can accurately estimate kinship coefficient and classify genetic relatedness using off-target sequencing data down sampled to ~0.15X depth. In application to the full WES dataset without down sampling, SEEKIN also outperforms existing methods by properly analyzing shallow off-target data (~0.75X. Using both simulated and real phenotypes, we further illustrate how our method improves estimation of trait heritability for WES studies.

  6. Patterns of DNA Methylation in Development, Division of Labor and Hybridization in an Ant with Genetic Caste Determination

    OpenAIRE

    Smith, Chris R.; Mutti, Navdeep S.; Jasper, W. Cameron; Naidu, Agni; Smith, Christopher D.; Gadau, Jürgen

    2012-01-01

    BACKGROUND: DNA methylation is a common regulator of gene expression, including acting as a regulator of developmental events and behavioral changes in adults. Using the unique system of genetic caste determination in Pogonomyrmex barbatus, we were able to document changes in DNA methylation during development, and also across both ancient and contemporary hybridization events. METHODOLOGY/PRINCIPAL FINDINGS: Sodium bisulfite sequencing demonstrated in vivo methylation of symmetric CG dinucle...

  7. Measuring Sandy Bottom Dynamics by Exploiting Depth from Stereo Video Sequences

    DEFF Research Database (Denmark)

    Musumeci, Rosaria E.; Farinella, Giovanni M.; Foti, Enrico

    2013-01-01

    In this paper an imaging system for measuring sandy bottom dynamics is proposed. The system exploits stereo sequences and projected laser beams to build the 3D shape of the sandy bottom during time. The reconstruction is used by experts of the field to perform accurate measurements and analysis...

  8. Why do results conflict regarding the prognostic value of the methylation status in colon cancers? the role of the preservation method

    Directory of Open Access Journals (Sweden)

    Tournier Benjamin

    2012-01-01

    Full Text Available Abstract Background In colorectal carcinoma, extensive gene promoter hypermethylation is called the CpG island methylator phenotype (CIMP. Explaining why studies on CIMP and survival yield conflicting results is essential. Most experiments to measure DNA methylation rely on the sodium bisulfite conversion of unmethylated cytosines into uracils. No study has evaluated the performance of bisulfite conversion and methylation levels from matched cryo-preserved and Formalin-Fixed Paraffin Embedded (FFPE samples using pyrosequencing. Methods Couples of matched cryo-preserved and FFPE samples from 40 colon adenocarcinomas were analyzed. Rates of bisulfite conversion and levels of methylation of LINE-1, MLH1 and MGMT markers were measured. Results For the reproducibility of bisulfite conversion, the mean of bisulfite-to-bisulfite standard deviation (SD was 1.3%. The mean of run-to-run SD of PCR/pyrosequencing was 0.9%. Of the 40 DNA couples, only 67.5%, 55.0%, and 57.5% of FFPE DNA were interpretable for LINE-1, MLH1, and MGMT markers, respectively, after the first analysis. On frozen samples the proportion of well converted samples was 95.0%, 97.4% and 87.2% respectively. For DNA showing a total bisulfite conversion, 8 couples (27.6% for LINE-1, 4 couples (15.4% for MLH1 and 8 couples (25.8% for MGMT displayed significant differences in methylation levels. Conclusions Frozen samples gave reproducible results for bisulfite conversion and reliable methylation levels. FFPE samples gave unsatisfactory and non reproducible bisulfite conversions leading to random results for methylation levels. The use of FFPE collections to assess DNA methylation by bisulfite methods must not be recommended. This can partly explain the conflicting results on the prognosis of CIMP colon cancers.

  9. BLAST and FASTA similarity searching for multiple sequence alignment.

    Science.gov (United States)

    Pearson, William R

    2014-01-01

    BLAST, FASTA, and other similarity searching programs seek to identify homologous proteins and DNA sequences based on excess sequence similarity. If two sequences share much more similarity than expected by chance, the simplest explanation for the excess similarity is common ancestry-homology. The most effective similarity searches compare protein sequences, rather than DNA sequences, for sequences that encode proteins, and use expectation values, rather than percent identity, to infer homology. The BLAST and FASTA packages of sequence comparison programs provide programs for comparing protein and DNA sequences to protein databases (the most sensitive searches). Protein and translated-DNA comparisons to protein databases routinely allow evolutionary look back times from 1 to 2 billion years; DNA:DNA searches are 5-10-fold less sensitive. BLAST and FASTA can be run on popular web sites, but can also be downloaded and installed on local computers. With local installation, target databases can be customized for the sequence data being characterized. With today's very large protein databases, search sensitivity can also be improved by searching smaller comprehensive databases, for example, a complete protein set from an evolutionarily neighboring model organism. By default, BLAST and FASTA use scoring strategies target for distant evolutionary relationships; for comparisons involving short domains or queries, or searches that seek relatively close homologs (e.g. mouse-human), shallower scoring matrices will be more effective. Both BLAST and FASTA provide very accurate statistical estimates, which can be used to reliably identify protein sequences that diverged more than 2 billion years ago.

  10. Digital PCR provides sensitive and absolute calibration for high throughput sequencing

    Directory of Open Access Journals (Sweden)

    Fan H Christina

    2009-03-01

    Full Text Available Abstract Background Next-generation DNA sequencing on the 454, Solexa, and SOLiD platforms requires absolute calibration of the number of molecules to be sequenced. This requirement has two unfavorable consequences. First, large amounts of sample-typically micrograms-are needed for library preparation, thereby limiting the scope of samples which can be sequenced. For many applications, including metagenomics and the sequencing of ancient, forensic, and clinical samples, the quantity of input DNA can be critically limiting. Second, each library requires a titration sequencing run, thereby increasing the cost and lowering the throughput of sequencing. Results We demonstrate the use of digital PCR to accurately quantify 454 and Solexa sequencing libraries, enabling the preparation of sequencing libraries from nanogram quantities of input material while eliminating costly and time-consuming titration runs of the sequencer. We successfully sequenced low-nanogram scale bacterial and mammalian DNA samples on the 454 FLX and Solexa DNA sequencing platforms. This study is the first to definitively demonstrate the successful sequencing of picogram quantities of input DNA on the 454 platform, reducing the sample requirement more than 1000-fold without pre-amplification and the associated bias and reduction in library depth. Conclusion The digital PCR assay allows absolute quantification of sequencing libraries, eliminates uncertainties associated with the construction and application of standard curves to PCR-based quantification, and with a coefficient of variation close to 10%, is sufficiently precise to enable direct sequencing without titration runs.

  11. Validation of Genotyping-By-Sequencing Analysis in Populations of Tetraploid Alfalfa by 454 Sequencing

    Science.gov (United States)

    Rocher, Solen; Jean, Martine; Castonguay, Yves; Belzile, François

    2015-01-01

    Genotyping-by-sequencing (GBS) is a relatively low-cost high throughput genotyping technology based on next generation sequencing and is applicable to orphan species with no reference genome. A combination of genome complexity reduction and multiplexing with DNA barcoding provides a simple and affordable way to resolve allelic variation between plant samples or populations. GBS was performed on ApeKI libraries using DNA from 48 genotypes each of two heterogeneous populations of tetraploid alfalfa (Medicago sativa spp. sativa): the synthetic cultivar Apica (ATF0) and a derived population (ATF5) obtained after five cycles of recurrent selection for superior tolerance to freezing (TF). Nearly 400 million reads were obtained from two lanes of an Illumina HiSeq 2000 sequencer and analyzed with the Universal Network-Enabled Analysis Kit (UNEAK) pipeline designed for species with no reference genome. Following the application of whole dataset-level filters, 11,694 single nucleotide polymorphism (SNP) loci were obtained. About 60% had a significant match on the Medicago truncatula syntenic genome. The accuracy of allelic ratios and genotype calls based on GBS data was directly assessed using 454 sequencing on a subset of SNP loci scored in eight plant samples. Sequencing depth in this study was not sufficient for accurate tetraploid allelic dosage, but reliable genotype calls based on diploid allelic dosage were obtained when using additional quality filtering. Principal Component Analysis of SNP loci in plant samples revealed that a small proportion (<5%) of the genetic variability assessed by GBS is able to differentiate ATF0 and ATF5. Our results confirm that analysis of GBS data using UNEAK is a reliable approach for genome-wide discovery of SNP loci in outcrossed polyploids. PMID:26115486

  12. Genome Sequence of the Palaeopolyploid soybean

    Energy Technology Data Exchange (ETDEWEB)

    Schmutz, Jeremy; Cannon, Steven B.; Schlueter, Jessica; Ma, Jianxin; Mitros, Therese; Nelson, William; Hyten, David L.; Song, Qijian; Thelen, Jay J.; Cheng, Jianlin; Xu, Dong; Hellsten, Uffe; May, Gregory D.; Yu, Yeisoo; Sakura, Tetsuya; Umezawa, Taishi; Bhattacharyya, Madan K.; Sandhu, Devinder; Valliyodan, Babu; Lindquist, Erika; Peto, Myron; Grant, David; Shu, Shengqiang; Goodstein, David; Barry, Kerrie; Futrell-Griggs, Montona; Abernathy, Brian; Du, Jianchang; Tian, Zhixi; Zhu, Liucun; Gill, Navdeep; Joshi, Trupti; Libault, Marc; Sethuraman, Anand; Zhang, Xue-Cheng; Shinozaki, Kazuo; Nguyen, Henry T.; Wing, Rod A.; Cregan, Perry; Specht, James; Grimwood, Jane; Rokhsar, Dan; Stacey, Gary; Shoemaker, Randy C.; Jackson, Scott A.

    2009-08-03

    Soybean (Glycine max) is one of the most important crop plants for seed protein and oil content, and for its capacity to fix atmospheric nitrogen through symbioses with soil-borne microorganisms. We sequenced the 1.1-gigabase genome by a whole-genome shotgun approach and integrated it with physical and high-density genetic maps to create a chromosome-scale draft sequence assembly. We predict 46,430 protein-coding genes, 70percent more than Arabidopsis and similar to the poplar genome which, like soybean, is an ancient polyploid (palaeopolyploid). About 78percent of the predicted genes occur in chromosome ends, which comprise less than one-half of the genome but account for nearly all of the genetic recombination. Genome duplications occurred at approximately 59 and 13 million years ago, resulting in a highly duplicated genome with nearly 75percent of the genes present in multiple copies. The two duplication events were followed by gene diversification and loss, and numerous chromosome rearrangements. An accurate soybean genome sequence will facilitate the identification of the genetic basis of many soybean traits, and accelerate the creation of improved soybean varieties.

  13. Genotyping common and rare variation using overlapping pool sequencing

    Directory of Open Access Journals (Sweden)

    Pasaniuc Bogdan

    2011-07-01

    Full Text Available Abstract Background Recent advances in sequencing technologies set the stage for large, population based studies, in which the ANA or RNA of thousands of individuals will be sequenced. Currently, however, such studies are still infeasible using a straightforward sequencing approach; as a result, recently a few multiplexing schemes have been suggested, in which a small number of ANA pools are sequenced, and the results are then deconvoluted using compressed sensing or similar approaches. These methods, however, are limited to the detection of rare variants. Results In this paper we provide a new algorithm for the deconvolution of DNA pools multiplexing schemes. The presented algorithm utilizes a likelihood model and linear programming. The approach allows for the addition of external data, particularly imputation data, resulting in a flexible environment that is suitable for different applications. Conclusions Particularly, we demonstrate that both low and high allele frequency SNPs can be accurately genotyped when the DNA pooling scheme is performed in conjunction with microarray genotyping and imputation. Additionally, we demonstrate the use of our framework for the detection of cancer fusion genes from RNA sequences.

  14. CREST--classification resources for environmental sequence tags.

    Directory of Open Access Journals (Sweden)

    Anders Lanzén

    Full Text Available Sequencing of taxonomic or phylogenetic markers is becoming a fast and efficient method for studying environmental microbial communities. This has resulted in a steadily growing collection of marker sequences, most notably of the small-subunit (SSU ribosomal RNA gene, and an increased understanding of microbial phylogeny, diversity and community composition patterns. However, to utilize these large datasets together with new sequencing technologies, a reliable and flexible system for taxonomic classification is critical. We developed CREST (Classification Resources for Environmental Sequence Tags, a set of resources and tools for generating and utilizing custom taxonomies and reference datasets for classification of environmental sequences. CREST uses an alignment-based classification method with the lowest common ancestor algorithm. It also uses explicit rank similarity criteria to reduce false positives and identify novel taxa. We implemented this method in a web server, a command line tool and the graphical user interfaced program MEGAN. Further, we provide the SSU rRNA reference database and taxonomy SilvaMod, derived from the publicly available SILVA SSURef, for classification of sequences from bacteria, archaea and eukaryotes. Using cross-validation and environmental datasets, we compared the performance of CREST and SilvaMod to the RDP Classifier. We also utilized Greengenes as a reference database, both with CREST and the RDP Classifier. These analyses indicate that CREST performs better than alignment-free methods with higher recall rate (sensitivity as well as precision, and with the ability to accurately identify most sequences from novel taxa. Classification using SilvaMod performed better than with Greengenes, particularly when applied to environmental sequences. CREST is freely available under a GNU General Public License (v3 from http://apps.cbu.uib.no/crest and http://lcaclassifier.googlecode.com.

  15. Accurate measurement of heteronuclear dipolar couplings by phase-alternating R-symmetry (PARS) sequences in magic angle spinning NMR spectroscopy

    Energy Technology Data Exchange (ETDEWEB)

    Hou, Guangjin, E-mail: hou@udel.edu, E-mail: tpolenov@udel.edu; Lu, Xingyu, E-mail: luxingyu@udel.edu, E-mail: lexvega@comcast.net; Vega, Alexander J., E-mail: luxingyu@udel.edu, E-mail: lexvega@comcast.net; Polenova, Tatyana, E-mail: hou@udel.edu, E-mail: tpolenov@udel.edu [Department of Chemistry and Biochemistry, University of Delaware, Newark, Delaware 19716, USA and Pittsburgh Center for HIV Protein Interactions, University of Pittsburgh School of Medicine, 1051 Biomedical Science Tower 3, 3501 Fifth Ave., Pittsburgh, Pennsylvania 15261 (United States)

    2014-09-14

    We report a Phase-Alternating R-Symmetry (PARS) dipolar recoupling scheme for accurate measurement of heteronuclear {sup 1}H-X (X = {sup 13}C, {sup 15}N, {sup 31}P, etc.) dipolar couplings in MAS NMR experiments. It is an improvement of conventional C- and R-symmetry type DIPSHIFT experiments where, in addition to the dipolar interaction, the {sup 1}H CSA interaction persists and thereby introduces considerable errors in the dipolar measurements. In PARS, phase-shifted RN symmetry pulse blocks applied on the {sup 1}H spins combined with π pulses applied on the X spins at the end of each RN block efficiently suppress the effect from {sup 1}H chemical shift anisotropy, while keeping the {sup 1}H-X dipolar couplings intact. Another advantage over conventional DIPSHIFT experiments, which require the signal to be detected in the form of a reduced-intensity Hahn echo, is that the series of π pulses refocuses the X chemical shift and avoids the necessity of echo formation. PARS permits determination of accurate dipolar couplings in a single experiment; it is suitable for a wide range of MAS conditions including both slow and fast MAS frequencies; and it assures dipolar truncation from the remote protons. The performance of PARS is tested on two model systems, [{sup 15}N]-N-acetyl-valine and [U-{sup 13}C,{sup 15}N]-N-formyl-Met-Leu-Phe tripeptide. The application of PARS for site-resolved measurement of accurate {sup 1}H-{sup 15}N dipolar couplings in the context of 3D experiments is presented on U-{sup 13}C,{sup 15}N-enriched dynein light chain protein LC8.

  16. Validation of rice genome sequence by optical mapping

    Directory of Open Access Journals (Sweden)

    Pape Louise

    2007-08-01

    Full Text Available Abstract Background Rice feeds much of the world, and possesses the simplest genome analyzed to date within the grass family, making it an economically relevant model system for other cereal crops. Although the rice genome is sequenced, validation and gap closing efforts require purely independent means for accurate finishing of sequence build data. Results To facilitate ongoing sequencing finishing and validation efforts, we have constructed a whole-genome SwaI optical restriction map of the rice genome. The physical map consists of 14 contigs, covering 12 chromosomes, with a total genome size of 382.17 Mb; this value is about 11% smaller than original estimates. 9 of the 14 optical map contigs are without gaps, covering chromosomes 1, 2, 3, 4, 5, 7, 8 10, and 12 in their entirety – including centromeres and telomeres. Alignments between optical and in silico restriction maps constructed from IRGSP (International Rice Genome Sequencing Project and TIGR (The Institute for Genomic Research genome sequence sources are comprehensive and informative, evidenced by map coverage across virtually all published gaps, discovery of new ones, and characterization of sequence misassemblies; all totalling ~14 Mb. Furthermore, since optical maps are ordered restriction maps, identified discordances are pinpointed on a reliable physical scaffold providing an independent resource for closure of gaps and rectification of misassemblies. Conclusion Analysis of sequence and optical mapping data effectively validates genome sequence assemblies constructed from large, repeat-rich genomes. Given this conclusion we envision new applications of such single molecule analysis that will merge advantages offered by high-resolution optical maps with inexpensive, but short sequence reads generated by emerging sequencing platforms. Lastly, map construction techniques presented here points the way to new types of comparative genome analysis that would focus on discernment of

  17. The 2012 Emilia seismic sequence (Northern Italy): Imaging the thrust fault system by accurate aftershock location

    Science.gov (United States)

    Govoni, Aladino; Marchetti, Alessandro; De Gori, Pasquale; Di Bona, Massimo; Lucente, Francesco Pio; Improta, Luigi; Chiarabba, Claudio; Nardi, Anna; Margheriti, Lucia; Agostinetti, Nicola Piana; Di Giovambattista, Rita; Latorre, Diana; Anselmi, Mario; Ciaccio, Maria Grazia; Moretti, Milena; Castellano, Corrado; Piccinini, Davide

    2014-05-01

    Starting from late May 2012, the Emilia region (Northern Italy) was severely shaken by an intense seismic sequence, originated from a ML 5.9 earthquake on May 20th, at a hypocentral depth of 6.3 km, with thrust-type focal mechanism. In the following days, the seismic rate remained high, counting 50 ML ≥ 2.0 earthquakes a day, on average. Seismicity spreads along a 30 km east-west elongated area, in the Po river alluvial plain, in the nearby of the cities Ferrara and Modena. Nine days after the first shock, another destructive thrust-type earthquake (ML 5.8) hit the area to the west, causing further damage and fatalities. Aftershocks following this second destructive event extended along the same east-westerly trend for further 20 km to the west, thus illuminating an area of about 50 km in length, on the whole. After the first shock struck, on May 20th, a dense network of temporary seismic stations, in addition to the permanent ones, was deployed in the meizoseismal area, leading to a sensible improvement of the earthquake monitoring capability there. A combined dataset, including three-component seismic waveforms recorded by both permanent and temporary stations, has been analyzed in order to obtain an appropriate 1-D velocity model for earthquake location in the study area. Here we describe the main seismological characteristics of this seismic sequence and, relying on refined earthquakes location, we make inferences on the geometry of the thrust system responsible for the two strongest shocks.

  18. Accurate clinical genetic testing for autoinflammatory diseases using the next-generation sequencing platform MiSeq

    Directory of Open Access Journals (Sweden)

    Manabu Nakayama

    2017-03-01

    Full Text Available Autoinflammatory diseases occupy one of a group of primary immunodeficiency diseases that are generally thought to be caused by mutation of genes responsible for innate immunity, rather than by acquired immunity. Mutations related to autoinflammatory diseases occur in 12 genes. For example, low-level somatic mosaic NLRP3 mutations underlie chronic infantile neurologic, cutaneous, articular syndrome (CINCA, also known as neonatal-onset multisystem inflammatory disease (NOMID. In current clinical practice, clinical genetic testing plays an important role in providing patients with quick, definite diagnoses. To increase the availability of such testing, low-cost high-throughput gene-analysis systems are required, ones that not only have the sensitivity to detect even low-level somatic mosaic mutations, but also can operate simply in a clinical setting. To this end, we developed a simple method that employs two-step tailed PCR and an NGS system, MiSeq platform, to detect mutations in all coding exons of the 12 genes responsible for autoinflammatory diseases. Using this amplicon sequencing system, we amplified a total of 234 amplicons derived from the 12 genes with multiplex PCR. This was done simultaneously and in one test tube. Each sample was distinguished by an index sequence of second PCR primers following PCR amplification. With our procedure and tips for reducing PCR amplification bias, we were able to analyze 12 genes from 25 clinical samples in one MiSeq run. Moreover, with the certified primers designed by our short program—which detects and avoids common SNPs in gene-specific PCR primers—we used this system for routine genetic testing. Our optimized procedure uses a simple protocol, which can easily be followed by virtually any office medical staff. Because of the small PCR amplification bias, we can analyze simultaneously several clinical DNA samples with low cost and can obtain sufficient read numbers to detect a low level of

  19. Plann: A command-line application for annotating plastome sequences.

    Science.gov (United States)

    Huang, Daisie I; Cronk, Quentin C B

    2015-08-01

    Plann automates the process of annotating a plastome sequence in GenBank format for either downstream processing or for GenBank submission by annotating a new plastome based on a similar, well-annotated plastome. Plann is a Perl script to be executed on the command line. Plann compares a new plastome sequence to the features annotated in a reference plastome and then shifts the intervals of any matching features to the locations in the new plastome. Plann's output can be used in the National Center for Biotechnology Information's tbl2asn to create a Sequin file for GenBank submission. Unlike Web-based annotation packages, Plann is a locally executable script that will accurately annotate a plastome sequence to a locally specified reference plastome. Because it executes from the command line, it is ready to use in other software pipelines and can be easily rerun as a draft plastome is improved.

  20. Genome-wide DNA methylation analysis in jejunum of Sus scrofa with intrauterine growth restriction.

    Science.gov (United States)

    Hu, Yue; Hu, Liang; Gong, Desheng; Lu, Hanlin; Xuan, Yue; Wang, Ru; Wu, De; Chen, Daiwen; Zhang, Keying; Gao, Fei; Che, Lianqiang

    2018-02-01

    Intrauterine growth restriction (IUGR) may elicit a series of postnatal body developmental and metabolic diseases due to their impaired growth and development in the mammalian embryo/fetus during pregnancy. In the present study, we hypothesized that IUGR may lead to abnormally regulated DNA methylation in the intestine, causing intestinal dysfunctions. We applied reduced representation bisulfite sequencing (RRBS) technology to study the jejunum tissues from four newborn IUGR piglets and their normal body weight (NBW) littermates. The results revealed extensively regional DNA methylation changes between IUGR/NBW pairs from different gilts, affecting dozens of genes. Hiseq-based bisulfite sequencing PCR (Hiseq-BSP) was used for validations of 19 genes with epigenetic abnormality, confirming three genes (AIFM1, MTMR1, and TWIST2) in extra samples. Furthermore, integrated analysis of these 19 genes with proteome data indicated that there were three main genes (BCAP31, IRAK1, and AIFM1) interacting with important immunity- or metabolism-related proteins, which could explain the potential intestinal dysfunctions of IUGR piglets. We conclude that IUGR can lead to disparate DNA methylation in the intestine and these changes may affect several important biological processes such as cell apoptosis, cell differentiation, and immunity, which provides more clues linking IUGR and its long-term complications.

  1. Flexible taxonomic assignment of ambiguous sequencing reads

    Directory of Open Access Journals (Sweden)

    Jansson Jesper

    2011-01-01

    Full Text Available Abstract Background To characterize the diversity of bacterial populations in metagenomic studies, sequencing reads need to be accurately assigned to taxonomic units in a given reference taxonomy. Reads that cannot be reliably assigned to a unique leaf in the taxonomy (ambiguous reads are typically assigned to the lowest common ancestor of the set of species that match it. This introduces a potentially severe error in the estimation of bacteria present in the sample due to false positives, since all species in the subtree rooted at the ancestor are implicitly assigned to the read even though many of them may not match it. Results We present a method that maps each read to a node in the taxonomy that minimizes a penalty score while balancing the relevance of precision and recall in the assignment through a parameter q. This mapping can be obtained in time linear in the number of matching sequences, because LCA queries to the reference taxonomy take constant time. When applied to six different metagenomic datasets, our algorithm produces different taxonomic distributions depending on whether coverage or precision is maximized. Including information on the quality of the reads reduces the number of unassigned reads but increases the number of ambiguous reads, stressing the relevance of our method. Finally, two measures of performance are described and results with a set of artificially generated datasets are discussed. Conclusions The assignment strategy of sequencing reads introduced in this paper is a versatile and a quick method to study bacterial communities. The bacterial composition of the analyzed samples can vary significantly depending on how ambiguous reads are assigned depending on the value of the q parameter. Validation of our results in an artificial dataset confirm that a combination of values of q produces the most accurate results.

  2. CSReport: A New Computational Tool Designed for Automatic Analysis of Class Switch Recombination Junctions Sequenced by High-Throughput Sequencing.

    Science.gov (United States)

    Boyer, François; Boutouil, Hend; Dalloul, Iman; Dalloul, Zeinab; Cook-Moreau, Jeanne; Aldigier, Jean-Claude; Carrion, Claire; Herve, Bastien; Scaon, Erwan; Cogné, Michel; Péron, Sophie

    2017-05-15

    B cells ensure humoral immune responses due to the production of Ag-specific memory B cells and Ab-secreting plasma cells. In secondary lymphoid organs, Ag-driven B cell activation induces terminal maturation and Ig isotype class switch (class switch recombination [CSR]). CSR creates a virtually unique IgH locus in every B cell clone by intrachromosomal recombination between two switch (S) regions upstream of each C region gene. Amount and structural features of CSR junctions reveal valuable information about the CSR mechanism, and analysis of CSR junctions is useful in basic and clinical research studies of B cell functions. To provide an automated tool able to analyze large data sets of CSR junction sequences produced by high-throughput sequencing (HTS), we designed CSReport, a software program dedicated to support analysis of CSR recombination junctions sequenced with a HTS-based protocol (Ion Torrent technology). CSReport was assessed using simulated data sets of CSR junctions and then used for analysis of Sμ-Sα and Sμ-Sγ1 junctions from CH12F3 cells and primary murine B cells, respectively. CSReport identifies junction segment breakpoints on reference sequences and junction structure (blunt-ended junctions or junctions with insertions or microhomology). Besides the ability to analyze unprecedentedly large libraries of junction sequences, CSReport will provide a unified framework for CSR junction studies. Our results show that CSReport is an accurate tool for analysis of sequences from our HTS-based protocol for CSR junctions, thereby facilitating and accelerating their study. Copyright © 2017 by The American Association of Immunologists, Inc.

  3. A platform-independent method for detecting errors in metagenomic sequencing data: DRISEE.

    Directory of Open Access Journals (Sweden)

    Kevin P Keegan

    Full Text Available We provide a novel method, DRISEE (duplicate read inferred sequencing error estimation, to assess sequencing quality (alternatively referred to as "noise" or "error" within and/or between sequencing samples. DRISEE provides positional error estimates that can be used to inform read trimming within a sample. It also provides global (whole sample error estimates that can be used to identify samples with high or varying levels of sequencing error that may confound downstream analyses, particularly in the case of studies that utilize data from multiple sequencing samples. For shotgun metagenomic data, we believe that DRISEE provides estimates of sequencing error that are more accurate and less constrained by technical limitations than existing methods that rely on reference genomes or the use of scores (e.g. Phred. Here, DRISEE is applied to (non amplicon data sets from both the 454 and Illumina platforms. The DRISEE error estimate is obtained by analyzing sets of artifactual duplicate reads (ADRs, a known by-product of both sequencing platforms. We present DRISEE as an open-source, platform-independent method to assess sequencing error in shotgun metagenomic data, and utilize it to discover previously uncharacterized error in de novo sequence data from the 454 and Illumina sequencing platforms.

  4. DeepBound: accurate identification of transcript boundaries via deep convolutional neural fields

    KAUST Repository

    Shao, Mingfu; Ma, Jianzhu; Wang, Sheng

    2017-01-01

    Motivation: Reconstructing the full- length expressed transcripts (a. k. a. the transcript assembly problem) from the short sequencing reads produced by RNA-seq protocol plays a central role in identifying novel genes and transcripts as well as in studying gene expressions and gene functions. A crucial step in transcript assembly is to accurately determine the splicing junctions and boundaries of the expressed transcripts from the reads alignment. In contrast to the splicing junctions that can be efficiently detected from spliced reads, the problem of identifying boundaries remains open and challenging, due to the fact that the signal related to boundaries is noisy and weak.

  5. DeepBound: accurate identification of transcript boundaries via deep convolutional neural fields

    KAUST Repository

    Shao, Mingfu

    2017-04-20

    Motivation: Reconstructing the full- length expressed transcripts (a. k. a. the transcript assembly problem) from the short sequencing reads produced by RNA-seq protocol plays a central role in identifying novel genes and transcripts as well as in studying gene expressions and gene functions. A crucial step in transcript assembly is to accurately determine the splicing junctions and boundaries of the expressed transcripts from the reads alignment. In contrast to the splicing junctions that can be efficiently detected from spliced reads, the problem of identifying boundaries remains open and challenging, due to the fact that the signal related to boundaries is noisy and weak.

  6. Accurate prediction of hot spot residues through physicochemical characteristics of amino acid sequences

    KAUST Repository

    Chen, Peng; Li, Jinyan; Limsoon, Wong; Kuwahara, Hiroyuki; Huang, Jianhua Z.; Gao, Xin

    2013-01-01

    Hot spot residues of proteins are fundamental interface residues that help proteins perform their functions. Detecting hot spots by experimental methods is costly and time-consuming. Sequential and structural information has been widely used in the computational prediction of hot spots. However, structural information is not always available. In this article, we investigated the problem of identifying hot spots using only physicochemical characteristics extracted from amino acid sequences. We first extracted 132 relatively independent physicochemical features from a set of the 544 properties in AAindex1, an amino acid index database. Each feature was utilized to train a classification model with a novel encoding schema for hot spot prediction by the IBk algorithm, an extension of the K-nearest neighbor algorithm. The combinations of the individual classifiers were explored and the classifiers that appeared frequently in the top performing combinations were selected. The hot spot predictor was built based on an ensemble of these classifiers and to work in a voting manner. Experimental results demonstrated that our method effectively exploited the feature space and allowed flexible weights of features for different queries. On the commonly used hot spot benchmark sets, our method significantly outperformed other machine learning algorithms and state-of-the-art hot spot predictors. The program is available at http://sfb.kaust.edu.sa/pages/software.aspx. © 2013 Wiley Periodicals, Inc.

  7. Accurate prediction of hot spot residues through physicochemical characteristics of amino acid sequences

    KAUST Repository

    Chen, Peng

    2013-07-23

    Hot spot residues of proteins are fundamental interface residues that help proteins perform their functions. Detecting hot spots by experimental methods is costly and time-consuming. Sequential and structural information has been widely used in the computational prediction of hot spots. However, structural information is not always available. In this article, we investigated the problem of identifying hot spots using only physicochemical characteristics extracted from amino acid sequences. We first extracted 132 relatively independent physicochemical features from a set of the 544 properties in AAindex1, an amino acid index database. Each feature was utilized to train a classification model with a novel encoding schema for hot spot prediction by the IBk algorithm, an extension of the K-nearest neighbor algorithm. The combinations of the individual classifiers were explored and the classifiers that appeared frequently in the top performing combinations were selected. The hot spot predictor was built based on an ensemble of these classifiers and to work in a voting manner. Experimental results demonstrated that our method effectively exploited the feature space and allowed flexible weights of features for different queries. On the commonly used hot spot benchmark sets, our method significantly outperformed other machine learning algorithms and state-of-the-art hot spot predictors. The program is available at http://sfb.kaust.edu.sa/pages/software.aspx. © 2013 Wiley Periodicals, Inc.

  8. Accurate prediction of hot spot residues through physicochemical characteristics of amino acid sequences.

    Science.gov (United States)

    Chen, Peng; Li, Jinyan; Wong, Limsoon; Kuwahara, Hiroyuki; Huang, Jianhua Z; Gao, Xin

    2013-08-01

    Hot spot residues of proteins are fundamental interface residues that help proteins perform their functions. Detecting hot spots by experimental methods is costly and time-consuming. Sequential and structural information has been widely used in the computational prediction of hot spots. However, structural information is not always available. In this article, we investigated the problem of identifying hot spots using only physicochemical characteristics extracted from amino acid sequences. We first extracted 132 relatively independent physicochemical features from a set of the 544 properties in AAindex1, an amino acid index database. Each feature was utilized to train a classification model with a novel encoding schema for hot spot prediction by the IBk algorithm, an extension of the K-nearest neighbor algorithm. The combinations of the individual classifiers were explored and the classifiers that appeared frequently in the top performing combinations were selected. The hot spot predictor was built based on an ensemble of these classifiers and to work in a voting manner. Experimental results demonstrated that our method effectively exploited the feature space and allowed flexible weights of features for different queries. On the commonly used hot spot benchmark sets, our method significantly outperformed other machine learning algorithms and state-of-the-art hot spot predictors. The program is available at http://sfb.kaust.edu.sa/pages/software.aspx. Copyright © 2013 Wiley Periodicals, Inc.

  9. Interspecies hybridization on DNA resequencing microarrays: efficiency of sequence recovery and accuracy of SNP detection in human, ape, and codfish mitochondrial DNA genomes sequenced on a human-specific MitoChip

    Directory of Open Access Journals (Sweden)

    Carr Steven M

    2007-09-01

    Full Text Available Abstract Background Iterative DNA "resequencing" on oligonucleotide microarrays offers a high-throughput method to measure intraspecific biodiversity, one that is especially suited to SNP-dense gene regions such as vertebrate mitochondrial (mtDNA genomes. However, costs of single-species design and microarray fabrication are prohibitive. A cost-effective, multi-species strategy is to hybridize experimental DNAs from diverse species to a common microarray that is tiled with oligonucleotide sets from multiple, homologous reference genomes. Such a strategy requires that cross-hybridization between the experimental DNAs and reference oligos from the different species not interfere with the accurate recovery of species-specific data. To determine the pattern and limits of such interspecific hybridization, we compared the efficiency of sequence recovery and accuracy of SNP identification by a 15,452-base human-specific microarray challenged with human, chimpanzee, gorilla, and codfish mtDNA genomes. Results In the human genome, 99.67% of the sequence was recovered with 100.0% accuracy. Accuracy of SNP identification declines log-linearly with sequence divergence from the reference, from 0.067 to 0.247 errors per SNP in the chimpanzee and gorilla genomes, respectively. Efficiency of sequence recovery declines with the increase of the number of interspecific SNPs in the 25b interval tiled by the reference oligonucleotides. In the gorilla genome, which differs from the human reference by 10%, and in which 46% of these 25b regions contain 3 or more SNP differences from the reference, only 88% of the sequence is recoverable. In the codfish genome, which differs from the reference by > 30%, less than 4% of the sequence is recoverable, in short islands ≥ 12b that are conserved between primates and fish. Conclusion Experimental DNAs bind inefficiently to homologous reference oligonucleotide sets on a re-sequencing microarray when their sequences differ by

  10. OTU analysis using metagenomic shotgun sequencing data.

    Directory of Open Access Journals (Sweden)

    Xiaolin Hao

    Full Text Available Because of technological limitations, the primer and amplification biases in targeted sequencing of 16S rRNA genes have veiled the true microbial diversity underlying environmental samples. However, the protocol of metagenomic shotgun sequencing provides 16S rRNA gene fragment data with natural immunity against the biases raised during priming and thus the potential of uncovering the true structure of microbial community by giving more accurate predictions of operational taxonomic units (OTUs. Nonetheless, the lack of statistically rigorous comparison between 16S rRNA gene fragments and other data types makes it difficult to interpret previously reported results using 16S rRNA gene fragments. Therefore, in the present work, we established a standard analysis pipeline that would help confirm if the differences in the data are true or are just due to potential technical bias. This pipeline is built by using simulated data to find optimal mapping and OTU prediction methods. The comparison between simulated datasets revealed a relationship between 16S rRNA gene fragments and full-length 16S rRNA sequences that a 16S rRNA gene fragment having a length >150 bp provides the same accuracy as a full-length 16S rRNA sequence using our proposed pipeline, which could serve as a good starting point for experimental design and making the comparison between 16S rRNA gene fragment-based and targeted 16S rRNA sequencing-based surveys possible.

  11. Seismic sequences in the Sombrero Seismic Zone

    Science.gov (United States)

    Pulliam, J.; Huerfano, V. A.; ten Brink, U.; von Hillebrandt, C.

    2007-05-01

    The northeastern Caribbean, in the vicinity of Puerto Rico and the Virgin Islands, has a long and well-documented history of devastating earthquakes and tsunamis, including major events in 1670, 1787, 1867, 1916, 1918, and 1943. Recently, seismicity has been concentrated to the north and west of the British Virgin Islands, in the region referred to as the Sombrero Seismic Zone by the Puerto Rico Seismic Network (PRSN). In the combined seismicity catalog maintained by the PRSN, several hundred small to moderate magnitude events can be found in this region prior to 2006. However, beginning in 2006 and continuing to the present, the rate of seismicity in the Sombrero suddenly increased, and a new locus of activity developed to the east of the previous location. Accurate estimates of seismic hazard, and the tsunamigenic potential of seismic events, depend on an accurate and comprehensive understanding of how strain is being accommodated in this corner region. Are faults locked and accumulating strain for release in a major event? Or is strain being released via slip over a diffuse system of faults? A careful analysis of seismicity patterns in the Sombrero region has the potential to both identify faults and modes of failure, provided the aggregation scheme is tuned to properly identify related events. To this end, we experimented with a scheme to identify seismic sequences based on physical and temporal proximity, under the assumptions that (a) events occur on related fault systems as stress is refocused by immediately previous events and (b) such 'stress waves' die out with time, so that two events that occur on the same system within a relatively short time window can be said to have a similar 'trigger' in ways that two nearby events that occurred years apart cannot. Patterns that emerge from the identification, temporal sequence, and refined locations of such sequences of events carry information about stress accommodation that is obscured by large clouds of

  12. Inactivation of the Tumor Suppressor Genes Causing the Hereditary Syndromes Predisposing to Head and Neck Cancer via Promoter Hypermethylation in Sporadic Head and Neck Cancers

    OpenAIRE

    Smith, Ian M.; Mithani, Suhail K.; Mydlarz, Wojciech K.; Chang, Steven S.; Califano, Joseph A.

    2010-01-01

    Fanconi anemia (FA) and dyskeratosis congenita (DC) are rare inherited syndromes that cause head and neck squamous cell cancer (HNSCC). Prior studies of inherited forms of cancer have been extremely important in elucidating tumor suppressor genes inactivated in sporadic tumors. Here, we studied whether sporadic tumors have epigenetic silencing of the genes causing the inherited forms of HNSCC. Using bisulfite sequencing, we investigated the incidence of promoter hypermethylation of the 17 Fan...

  13. Whole genome DNA methylation: beyond genes silencing

    OpenAIRE

    Tirado-Magallanes, Roberto; Rebbani, Khadija; Lim, Ricky; Pradhan, Sriharsa; Benoukraf, Touati

    2016-01-01

    The combination of DNA bisulfite treatment with high-throughput sequencing technologies has enabled investigation of genome-wide DNA methylation at near base pair level resolution, far beyond that of the kilobase-long canonical CpG islands that initially revealed the biological relevance of this covalent DNA modification. The latest high-resolution studies have revealed a role for very punctual DNA methylation in chromatin plasticity, gene regulation and splicing. Here, we aim to outline the ...

  14. A novel approach to sequence validating protein expression clones with automated decision making

    Directory of Open Access Journals (Sweden)

    Mohr Stephanie E

    2007-06-01

    Full Text Available Abstract Background Whereas the molecular assembly of protein expression clones is readily automated and routinely accomplished in high throughput, sequence verification of these clones is still largely performed manually, an arduous and time consuming process. The ultimate goal of validation is to determine if a given plasmid clone matches its reference sequence sufficiently to be "acceptable" for use in protein expression experiments. Given the accelerating increase in availability of tens of thousands of unverified clones, there is a strong demand for rapid, efficient and accurate software that automates clone validation. Results We have developed an Automated Clone Evaluation (ACE system – the first comprehensive, multi-platform, web-based plasmid sequence verification software package. ACE automates the clone verification process by defining each clone sequence as a list of multidimensional discrepancy objects, each describing a difference between the clone and its expected sequence including the resulting polypeptide consequences. To evaluate clones automatically, this list can be compared against user acceptance criteria that specify the allowable number of discrepancies of each type. This strategy allows users to re-evaluate the same set of clones against different acceptance criteria as needed for use in other experiments. ACE manages the entire sequence validation process including contig management, identifying and annotating discrepancies, determining if discrepancies correspond to polymorphisms and clone finishing. Designed to manage thousands of clones simultaneously, ACE maintains a relational database to store information about clones at various completion stages, project processing parameters and acceptance criteria. In a direct comparison, the automated analysis by ACE took less time and was more accurate than a manual analysis of a 93 gene clone set. Conclusion ACE was designed to facilitate high throughput clone sequence

  15. Fixing Formalin: A Method to Recover Genomic-Scale DNA Sequence Data from Formalin-Fixed Museum Specimens Using High-Throughput Sequencing.

    Directory of Open Access Journals (Sweden)

    Sarah M Hykin

    Full Text Available For 150 years or more, specimens were routinely collected and deposited in natural history collections without preserving fresh tissue samples for genetic analysis. In the case of most herpetological specimens (i.e. amphibians and reptiles, attempts to extract and sequence DNA from formalin-fixed, ethanol-preserved specimens-particularly for use in phylogenetic analyses-has been laborious and largely ineffective due to the highly fragmented nature of the DNA. As a result, tens of thousands of specimens in herpetological collections have not been available for sequence-based phylogenetic studies. Massively parallel High-Throughput Sequencing methods and the associated bioinformatics, however, are particularly suited to recovering meaningful genetic markers from severely degraded/fragmented DNA sequences such as DNA damaged by formalin-fixation. In this study, we compared previously published DNA extraction methods on three tissue types subsampled from formalin-fixed specimens of Anolis carolinensis, followed by sequencing. Sufficient quality DNA was recovered from liver tissue, making this technique minimally destructive to museum specimens. Sequencing was only successful for the more recently collected specimen (collected ~30 ybp. We suspect this could be due either to the conditions of preservation and/or the amount of tissue used for extraction purposes. For the successfully sequenced sample, we found a high rate of base misincorporation. After rigorous trimming, we successfully mapped 27.93% of the cleaned reads to the reference genome, were able to reconstruct the complete mitochondrial genome, and recovered an accurate phylogenetic placement for our specimen. We conclude that the amount of DNA available, which can vary depending on specimen age and preservation conditions, will determine if sequencing will be successful. The technique described here will greatly improve the value of museum collections by making many formalin-fixed specimens

  16. Fixing Formalin: A Method to Recover Genomic-Scale DNA Sequence Data from Formalin-Fixed Museum Specimens Using High-Throughput Sequencing.

    Science.gov (United States)

    Hykin, Sarah M; Bi, Ke; McGuire, Jimmy A

    2015-01-01

    For 150 years or more, specimens were routinely collected and deposited in natural history collections without preserving fresh tissue samples for genetic analysis. In the case of most herpetological specimens (i.e. amphibians and reptiles), attempts to extract and sequence DNA from formalin-fixed, ethanol-preserved specimens-particularly for use in phylogenetic analyses-has been laborious and largely ineffective due to the highly fragmented nature of the DNA. As a result, tens of thousands of specimens in herpetological collections have not been available for sequence-based phylogenetic studies. Massively parallel High-Throughput Sequencing methods and the associated bioinformatics, however, are particularly suited to recovering meaningful genetic markers from severely degraded/fragmented DNA sequences such as DNA damaged by formalin-fixation. In this study, we compared previously published DNA extraction methods on three tissue types subsampled from formalin-fixed specimens of Anolis carolinensis, followed by sequencing. Sufficient quality DNA was recovered from liver tissue, making this technique minimally destructive to museum specimens. Sequencing was only successful for the more recently collected specimen (collected ~30 ybp). We suspect this could be due either to the conditions of preservation and/or the amount of tissue used for extraction purposes. For the successfully sequenced sample, we found a high rate of base misincorporation. After rigorous trimming, we successfully mapped 27.93% of the cleaned reads to the reference genome, were able to reconstruct the complete mitochondrial genome, and recovered an accurate phylogenetic placement for our specimen. We conclude that the amount of DNA available, which can vary depending on specimen age and preservation conditions, will determine if sequencing will be successful. The technique described here will greatly improve the value of museum collections by making many formalin-fixed specimens available for

  17. Tracking TCRβ sequence clonotype expansions during antiviral therapy using high-throughput sequencing of the hypervariable region

    Directory of Open Access Journals (Sweden)

    Mark W Robinson

    2016-04-01

    future experiments to accurately address the question of whether T cell responses contribute to SVR upon antiviral therapy. This pipeline represents a novel technique to analyse T cell dynamics in situations where conventional antigen-dependent methods are limited due to suppression of T cell functions and highly diverse antigenic sequences.

  18. Accurate protein structure modeling using sparse NMR data and homologous structure information.

    Science.gov (United States)

    Thompson, James M; Sgourakis, Nikolaos G; Liu, Gaohua; Rossi, Paolo; Tang, Yuefeng; Mills, Jeffrey L; Szyperski, Thomas; Montelione, Gaetano T; Baker, David

    2012-06-19

    While information from homologous structures plays a central role in X-ray structure determination by molecular replacement, such information is rarely used in NMR structure determination because it can be incorrect, both locally and globally, when evolutionary relationships are inferred incorrectly or there has been considerable evolutionary structural divergence. Here we describe a method that allows robust modeling of protein structures of up to 225 residues by combining (1)H(N), (13)C, and (15)N backbone and (13)Cβ chemical shift data, distance restraints derived from homologous structures, and a physically realistic all-atom energy function. Accurate models are distinguished from inaccurate models generated using incorrect sequence alignments by requiring that (i) the all-atom energies of models generated using the restraints are lower than models generated in unrestrained calculations and (ii) the low-energy structures converge to within 2.0 Å backbone rmsd over 75% of the protein. Benchmark calculations on known structures and blind targets show that the method can accurately model protein structures, even with very remote homology information, to a backbone rmsd of 1.2-1.9 Å relative to the conventional determined NMR ensembles and of 0.9-1.6 Å relative to X-ray structures for well-defined regions of the protein structures. This approach facilitates the accurate modeling of protein structures using backbone chemical shift data without need for side-chain resonance assignments and extensive analysis of NOESY cross-peak assignments.

  19. Impacts of visuomotor sequence learning methods on speed and accuracy: Starting over from the beginning or from the point of error.

    Science.gov (United States)

    Tanaka, Kanji; Watanabe, Katsumi

    2016-02-01

    The present study examined whether sequence learning led to more accurate and shorter performance time if people who are learning a sequence start over from the beginning when they make an error (i.e., practice the whole sequence) or only from the point of error (i.e., practice a part of the sequence). We used a visuomotor sequence learning paradigm with a trial-and-error procedure. In Experiment 1, we found fewer errors, and shorter performance time for those who restarted their performance from the beginning of the sequence as compared to those who restarted from the point at which an error occurred, indicating better learning of spatial and motor representations of the sequence. This might be because the learned elements were repeated when the next performance started over from the beginning. In subsequent experiments, we increased the occasions for the repetitions of learned elements by modulating the number of fresh start points in the sequence after errors. The results showed that fewer fresh start points were likely to lead to fewer errors and shorter performance time, indicating that the repetitions of learned elements enabled participants to develop stronger spatial and motor representations of the sequence. Thus, a single or two fresh start points in the sequence (i.e., starting over only from the beginning or from the beginning or midpoint of the sequence after errors) is likely to lead to more accurate and faster performance. Copyright © 2016 Elsevier B.V. All rights reserved.

  20. Nanopore sequencing technology and tools for genome assembly: computational analysis of the current state, bottlenecks and future directions.

    Science.gov (United States)

    Senol Cali, Damla; Kim, Jeremie S; Ghose, Saugata; Alkan, Can; Mutlu, Onur

    2018-04-02

    Nanopore sequencing technology has the potential to render other sequencing technologies obsolete with its ability to generate long reads and provide portability. However, high error rates of the technology pose a challenge while generating accurate genome assemblies. The tools used for nanopore sequence analysis are of critical importance, as they should overcome the high error rates of the technology. Our goal in this work is to comprehensively analyze current publicly available tools for nanopore sequence analysis to understand their advantages, disadvantages and performance bottlenecks. It is important to understand where the current tools do not perform well to develop better tools. To this end, we (1) analyze the multiple steps and the associated tools in the genome assembly pipeline using nanopore sequence data, and (2) provide guidelines for determining the appropriate tools for each step. Based on our analyses, we make four key observations: (1) the choice of the tool for basecalling plays a critical role in overcoming the high error rates of nanopore sequencing technology. (2) Read-to-read overlap finding tools, GraphMap and Minimap, perform similarly in terms of accuracy. However, Minimap has a lower memory usage, and it is faster than GraphMap. (3) There is a trade-off between accuracy and performance when deciding on the appropriate tool for the assembly step. The fast but less accurate assembler Miniasm can be used for quick initial assembly, and further polishing can be applied on top of it to increase the accuracy, which leads to faster overall assembly. (4) The state-of-the-art polishing tool, Racon, generates high-quality consensus sequences while providing a significant speedup over another polishing tool, Nanopolish. We analyze various combinations of different tools and expose the trade-offs between accuracy, performance, memory usage and scalability. We conclude that our observations can guide researchers and practitioners in making conscious

  1. Characterizing the D2 statistic: word matches in biological sequences.

    Science.gov (United States)

    Forêt, Sylvain; Wilson, Susan R; Burden, Conrad J

    2009-01-01

    Word matches are often used in sequence comparison methods, either as a measure of sequence similarity or in the first search steps of algorithms such as BLAST or BLAT. The D2 statistic is the number of matches of words of k letters between two sequences. Recent advances have been made in the characterization of this statistic and in the approximation of its distribution. Here, these results are extended to the case of approximate word matches. We compute the exact value of the variance of the D2 statistic for the case of a uniform letter distribution, and introduce a method to provide accurate approximations of the variance in the remaining cases. This enables the distribution of D2 to be approximated for typical situations arising in biological research. We apply these results to the identification of cis-regulatory modules, and show that this method detects such sequences with a high accuracy. The ability to approximate the distribution of D2 for both exact and approximate word matches will enable the use of this statistic in a more precise manner for sequence comparison, database searches, and identification of transcription factor binding sites.

  2. Fast sweeping algorithm for accurate solution of the TTI eikonal equation using factorization

    KAUST Repository

    bin Waheed, Umair

    2017-06-10

    Traveltime computation is essential for many seismic data processing applications and velocity analysis tools. High-resolution seismic imaging requires eikonal solvers to account for anisotropy whenever it significantly affects the seismic wave kinematics. Moreover, computation of auxiliary quantities, such as amplitude and take-off angle, rely on highly accurate traveltime solutions. However, the finite-difference based eikonal solution for a point-source initial condition has an upwind source-singularity at the source position, since the wavefront curvature is large near the source point. Therefore, all finite-difference solvers, even the high-order ones, show inaccuracies since the errors due to source-singularity spread from the source point to the whole computational domain. We address the source-singularity problem for tilted transversely isotropic (TTI) eikonal solvers using factorization. We solve a sequence of factored tilted elliptically anisotropic (TEA) eikonal equations iteratively, each time by updating the right hand side function. At each iteration, we factor the unknown TEA traveltime into two factors. One of the factors is specified analytically, such that the other factor is smooth in the source neighborhood. Therefore, through the iterative procedure we obtain accurate solution to the TTI eikonal equation. Numerical tests show significant improvement in accuracy due to factorization. The idea can be easily extended to compute accurate traveltimes for models with lower anisotropic symmetries, such as orthorhombic, monoclinic or even triclinic media.

  3. Calculation of accurate small angle X-ray scattering curves from coarse-grained protein models

    Directory of Open Access Journals (Sweden)

    Stovgaard Kasper

    2010-08-01

    Full Text Available Abstract Background Genome sequencing projects have expanded the gap between the amount of known protein sequences and structures. The limitations of current high resolution structure determination methods make it unlikely that this gap will disappear in the near future. Small angle X-ray scattering (SAXS is an established low resolution method for routinely determining the structure of proteins in solution. The purpose of this study is to develop a method for the efficient calculation of accurate SAXS curves from coarse-grained protein models. Such a method can for example be used to construct a likelihood function, which is paramount for structure determination based on statistical inference. Results We present a method for the efficient calculation of accurate SAXS curves based on the Debye formula and a set of scattering form factors for dummy atom representations of amino acids. Such a method avoids the computationally costly iteration over all atoms. We estimated the form factors using generated data from a set of high quality protein structures. No ad hoc scaling or correction factors are applied in the calculation of the curves. Two coarse-grained representations of protein structure were investigated; two scattering bodies per amino acid led to significantly better results than a single scattering body. Conclusion We show that the obtained point estimates allow the calculation of accurate SAXS curves from coarse-grained protein models. The resulting curves are on par with the current state-of-the-art program CRYSOL, which requires full atomic detail. Our method was also comparable to CRYSOL in recognizing native structures among native-like decoys. As a proof-of-concept, we combined the coarse-grained Debye calculation with a previously described probabilistic model of protein structure, TorusDBN. This resulted in a significant improvement in the decoy recognition performance. In conclusion, the presented method shows great promise for

  4. RSEM: accurate transcript quantification from RNA-Seq data with or without a reference genome

    Directory of Open Access Journals (Sweden)

    Dewey Colin N

    2011-08-01

    Full Text Available Abstract Background RNA-Seq is revolutionizing the way transcript abundances are measured. A key challenge in transcript quantification from RNA-Seq data is the handling of reads that map to multiple genes or isoforms. This issue is particularly important for quantification with de novo transcriptome assemblies in the absence of sequenced genomes, as it is difficult to determine which transcripts are isoforms of the same gene. A second significant issue is the design of RNA-Seq experiments, in terms of the number of reads, read length, and whether reads come from one or both ends of cDNA fragments. Results We present RSEM, an user-friendly software package for quantifying gene and isoform abundances from single-end or paired-end RNA-Seq data. RSEM outputs abundance estimates, 95% credibility intervals, and visualization files and can also simulate RNA-Seq data. In contrast to other existing tools, the software does not require a reference genome. Thus, in combination with a de novo transcriptome assembler, RSEM enables accurate transcript quantification for species without sequenced genomes. On simulated and real data sets, RSEM has superior or comparable performance to quantification methods that rely on a reference genome. Taking advantage of RSEM's ability to effectively use ambiguously-mapping reads, we show that accurate gene-level abundance estimates are best obtained with large numbers of short single-end reads. On the other hand, estimates of the relative frequencies of isoforms within single genes may be improved through the use of paired-end reads, depending on the number of possible splice forms for each gene. Conclusions RSEM is an accurate and user-friendly software tool for quantifying transcript abundances from RNA-Seq data. As it does not rely on the existence of a reference genome, it is particularly useful for quantification with de novo transcriptome assemblies. In addition, RSEM has enabled valuable guidance for cost

  5. [Identification of common medicinal snakes in medicated liquor of Guangdong by COI barcode sequence].

    Science.gov (United States)

    Liao, Jing; Chao, Zhi; Zhang, Liang

    2013-11-01

    To identify the common snakes in medicated liquor of Guangdong using COI barcode sequence,and to test the feasibility. The COI barcode sequences of collected medicinal snakes were amplified and sequenced. The sequences combined with the data from GenBank were analyzed for divergence and building a neighbor-joining(NJ) tree with MEGA 5.0. The genetic distance and NJ tree demonstrated that there were 241 variable sites in these species, and the average (A + T) content of 56.2% was higher than the average (G + C) content of 43.7%. The maximum interspecific genetic distance was 0.2568, and the minimum was 0. 1519. In the NJ tree,each species formed a monophyletic clade with bootstrap supports of 100%. DNA barcoding identification method based on the COI sequence is accurate and can be applied to identify the common medicinal snakes.

  6. Protein sequencing via nanopore based devices: a nanofluidics perspective

    Science.gov (United States)

    Chinappi, Mauro; Cecconi, Fabio

    2018-05-01

    Proteins perform a huge number of central functions in living organisms, thus all the new techniques allowing their precise, fast and accurate characterization at single-molecule level certainly represent a burst in proteomics with important biomedical impact. In this review, we describe the recent progresses in the developing of nanopore based devices for protein sequencing. We start with a critical analysis of the main technical requirements for nanopore protein sequencing, summarizing some ideas and methodologies that have recently appeared in the literature. In the last sections, we focus on the physical modelling of the transport phenomena occurring in nanopore based devices. The multiscale nature of the problem is discussed and, in this respect, some of the main possible computational approaches are illustrated.

  7. Software for rapid time dependent ChIP-sequencing analysis (TDCA).

    Science.gov (United States)

    Myschyshyn, Mike; Farren-Dai, Marco; Chuang, Tien-Jui; Vocadlo, David

    2017-11-25

    Chromatin immunoprecipitation followed by DNA sequencing (ChIP-seq) and associated methods are widely used to define the genome wide distribution of chromatin associated proteins, post-translational epigenetic marks, and modifications found on DNA bases. An area of emerging interest is to study time dependent changes in the distribution of such proteins and marks by using serial ChIP-seq experiments performed in a time resolved manner. Despite such time resolved studies becoming increasingly common, software to facilitate analysis of such data in a robust automated manner is limited. We have designed software called Time-Dependent ChIP-Sequencing Analyser (TDCA), which is the first program to automate analysis of time-dependent ChIP-seq data by fitting to sigmoidal curves. We provide users with guidance for experimental design of TDCA for modeling of time course (TC) ChIP-seq data using two simulated data sets. Furthermore, we demonstrate that this fitting strategy is widely applicable by showing that automated analysis of three previously published TC data sets accurately recapitulates key findings reported in these studies. Using each of these data sets, we highlight how biologically relevant findings can be readily obtained by exploiting TDCA to yield intuitive parameters that describe behavior at either a single locus or sets of loci. TDCA enables customizable analysis of user input aligned DNA sequencing data, coupled with graphical outputs in the form of publication-ready figures that describe behavior at either individual loci or sets of loci sharing common traits defined by the user. TDCA accepts sequencing data as standard binary alignment map (BAM) files and loci of interest in browser extensible data (BED) file format. TDCA accurately models the number of sequencing reads, or coverage, at loci from TC ChIP-seq studies or conceptually related TC sequencing experiments. TC experiments are reduced to intuitive parametric values that facilitate biologically

  8. Microbiome Data Accurately Predicts the Postmortem Interval Using Random Forest Regression Models

    Directory of Open Access Journals (Sweden)

    Aeriel Belk

    2018-02-01

    Full Text Available Death investigations often include an effort to establish the postmortem interval (PMI in cases in which the time of death is uncertain. The postmortem interval can lead to the identification of the deceased and the validation of witness statements and suspect alibis. Recent research has demonstrated that microbes provide an accurate clock that starts at death and relies on ecological change in the microbial communities that normally inhabit a body and its surrounding environment. Here, we explore how to build the most robust Random Forest regression models for prediction of PMI by testing models built on different sample types (gravesoil, skin of the torso, skin of the head, gene markers (16S ribosomal RNA (rRNA, 18S rRNA, internal transcribed spacer regions (ITS, and taxonomic levels (sequence variants, species, genus, etc.. We also tested whether particular suites of indicator microbes were informative across different datasets. Generally, results indicate that the most accurate models for predicting PMI were built using gravesoil and skin data using the 16S rRNA genetic marker at the taxonomic level of phyla. Additionally, several phyla consistently contributed highly to model accuracy and may be candidate indicators of PMI.

  9. Maturity onset diabetes of youth (MODY) in Turkish children: sequence analysis of 11 causative genes by next generation sequencing.

    Science.gov (United States)

    Ağladıoğlu, Sebahat Yılmaz; Aycan, Zehra; Çetinkaya, Semra; Baş, Veysel Nijat; Önder, Aşan; Peltek Kendirci, Havva Nur; Doğan, Haldun; Ceylaner, Serdar

    2016-04-01

    Maturity-onset diabetes of the youth (MODY), is a genetically and clinically heterogeneous group of diseasesand is often misdiagnosed as type 1 or type 2 diabetes. The aim of this study is to investigate both novel and proven mutations of 11 MODY genes in Turkish children by using targeted next generation sequencing. A panel of 11 MODY genes were screened in 43 children with MODY diagnosed by clinical criterias. Studies of index cases was done with MISEQ-ILLUMINA, and family screenings and confirmation studies of mutations was done by Sanger sequencing. We identified 28 (65%) point mutations among 43 patients. Eighteen patients have GCK mutations, four have HNF1A, one has HNF4A, one has HNF1B, two have NEUROD1, one has PDX1 gene variations and one patient has both HNF1A and HNF4A heterozygote mutations. This is the first study including molecular studies of 11 MODY genes in Turkish children. GCK is the most frequent type of MODY in our study population. Very high frequency of novel mutations (42%) in our study population, supports that in heterogenous disorders like MODY sequence analysis provides rapid, cost effective and accurate genetic diagnosis.

  10. An Internet-Accessible DNA Sequence Database for Identifying Fusaria from Human and Animal Infections

    Science.gov (United States)

    Because less than one-third of clinically relevant fusaria can be accurately identified to species level using phenotypic data (i.e., morphological species recognition), we constructed a three-locus DNA sequence database to facilitate molecular identification of the 69 Fusarium species associated wi...

  11. Reconstruction of ancestral RNA sequences under multiple structural constraints.

    Science.gov (United States)

    Tremblay-Savard, Olivier; Reinharz, Vladimir; Waldispühl, Jérôme

    2016-11-11

    Secondary structures form the scaffold of multiple sequence alignment of non-coding RNA (ncRNA) families. An accurate reconstruction of ancestral ncRNAs must use this structural signal. However, the inference of ancestors of a single ncRNA family with a single consensus structure may bias the results towards sequences with high affinity to this structure, which are far from the true ancestors. In this paper, we introduce achARNement, a maximum parsimony approach that, given two alignments of homologous ncRNA families with consensus secondary structures and a phylogenetic tree, simultaneously calculates ancestral RNA sequences for these two families. We test our methodology on simulated data sets, and show that achARNement outperforms classical maximum parsimony approaches in terms of accuracy, but also reduces by several orders of magnitude the number of candidate sequences. To conclude this study, we apply our algorithms on the Glm clan and the FinP-traJ clan from the Rfam database. Our results show that our methods reconstruct small sets of high-quality candidate ancestors with better agreement to the two target structures than with classical approaches. Our program is freely available at: http://csb.cs.mcgill.ca/acharnement .

  12. Biophysical and structural considerations for protein sequence evolution

    Directory of Open Access Journals (Sweden)

    Grahnen Johan A

    2011-12-01

    Full Text Available Abstract Background Protein sequence evolution is constrained by the biophysics of folding and function, causing interdependence between interacting sites in the sequence. However, current site-independent models of sequence evolutions do not take this into account. Recent attempts to integrate the influence of structure and biophysics into phylogenetic models via statistical/informational approaches have not resulted in expected improvements in model performance. This suggests that further innovations are needed for progress in this field. Results Here we develop a coarse-grained physics-based model of protein folding and binding function, and compare it to a popular informational model. We find that both models violate the assumption of the native sequence being close to a thermodynamic optimum, causing directional selection away from the native state. Sampling and simulation show that the physics-based model is more specific for fold-defining interactions that vary less among residue type. The informational model diffuses further in sequence space with fewer barriers and tends to provide less support for an invariant sites model, although amino acid substitutions are generally conservative. Both approaches produce sequences with natural features like dN/dS Conclusions Simple coarse-grained models of protein folding can describe some natural features of evolving proteins but are currently not accurate enough to use in evolutionary inference. This is partly due to improper packing of the hydrophobic core. We suggest possible improvements on the representation of structure, folding energy, and binding function, as regards both native and non-native conformations, and describe a large number of possible applications for such a model.

  13. Application of Ammonium Persulfate for Selective Oxidation of Guanines for Nucleic Acid Sequencing

    Directory of Open Access Journals (Sweden)

    Yafen Wang

    2017-07-01

    Full Text Available Nucleic acids can be sequenced by a chemical procedure that partially damages the nucleotide positions at their base repetition. Many methods have been reported for the selective recognition of guanine. The accurate identification of guanine in both single and double regions of DNA and RNA remains a challenging task. Herein, we present a new, non-toxic and simple method for the selective recognition of guanine in both DNA and RNA sequences via ammonium persulfate modification. This strategy can be further successfully applied to the detection of 5-methylcytosine by using PCR.

  14. Large scale identification and categorization of protein sequences using structured logistic regression

    DEFF Research Database (Denmark)

    Pedersen, Bjørn Panella; Ifrim, Georgiana; Liboriussen, Poul

    2014-01-01

    Abstract Background Structured Logistic Regression (SLR) is a newly developed machine learning tool first proposed in the context of text categorization. Current availability of extensive protein sequence databases calls for an automated method to reliably classify sequences and SLR seems well...... problem. Results Using SLR, we have built classifiers to identify and automatically categorize P-type ATPases into one of 11 pre-defined classes. The SLR-classifiers are compared to a Hidden Markov Model approach and shown to be highly accurate and scalable. Representing the bulk of currently known...... for further biochemical characterization and structural analysis....

  15. Intra-Genomic Internal Transcribed Spacer Region Sequence Heterogeneity and Molecular Diagnosis in Clinical Microbiology.

    Science.gov (United States)

    Zhao, Ying; Tsang, Chi-Ching; Xiao, Meng; Cheng, Jingwei; Xu, Yingchun; Lau, Susanna K P; Woo, Patrick C Y

    2015-10-22

    Internal transcribed spacer region (ITS) sequencing is the most extensively used technology for accurate molecular identification of fungal pathogens in clinical microbiology laboratories. Intra-genomic ITS sequence heterogeneity, which makes fungal identification based on direct sequencing of PCR products difficult, has rarely been reported in pathogenic fungi. During the process of performing ITS sequencing on 71 yeast strains isolated from various clinical specimens, direct sequencing of the PCR products showed ambiguous sequences in six of them. After cloning the PCR products into plasmids for sequencing, interpretable sequencing electropherograms could be obtained. For each of the six isolates, 10-49 clones were selected for sequencing and two to seven intra-genomic ITS copies were detected. The identities of these six isolates were confirmed to be Candida glabrata (n=2), Pichia (Candida) norvegensis (n=2), Candida tropicalis (n=1) and Saccharomyces cerevisiae (n=1). Multiple sequence alignment revealed that one to four intra-genomic ITS polymorphic sites were present in the six isolates, and all these polymorphic sites were located in the ITS1 and/or ITS2 regions. We report and describe the first evidence of intra-genomic ITS sequence heterogeneity in four different pathogenic yeasts, which occurred exclusively in the ITS1 and ITS2 spacer regions for the six isolates in this study.

  16. Using pseudoalignment and base quality to accurately quantify microbial community composition.

    Directory of Open Access Journals (Sweden)

    Mark Reppell

    2018-04-01

    Full Text Available Pooled DNA from multiple unknown organisms arises in a variety of contexts, for example microbial samples from ecological or human health research. Determining the composition of pooled samples can be difficult, especially at the scale of modern sequencing data and reference databases. Here we propose a novel method for taxonomic profiling in pooled DNA that combines the speed and low-memory requirements of k-mer based pseudoalignment with a likelihood framework that uses base quality information to better resolve multiply mapped reads. We apply the method to the problem of classifying 16S rRNA reads using a reference database of known organisms, a common challenge in microbiome research. Using simulations, we show the method is accurate across a variety of read lengths, with different length reference sequences, at different sample depths, and when samples contain reads originating from organisms absent from the reference. We also assess performance in real 16S data, where we reanalyze previous genetic association data to show our method discovers a larger number of quantitative trait associations than other widely used methods. We implement our method in the software Karp, for k-mer based analysis of read pools, to provide a novel combination of speed and accuracy that is uniquely suited for enhancing discoveries in microbial studies.

  17. The Quest for Rare Variants: Pooled Multiplexed Next Generation Sequencing in Plants

    Directory of Open Access Journals (Sweden)

    Fabio eMarroni

    2012-06-01

    Full Text Available Next generation sequencing (NGS instruments produce an unprecedented amount of sequence data at contained costs. This gives researchers the possibility of designing studies with adequate power to identify rare variants at a fraction of the economic and labor resources required by individual Sanger sequencing. As of today, only three research groups working in plant sciences have exploited this potentiality. They showed that pooled NGS can provide results in excellent agreement with those obtained by individual Sanger sequencing. Aim of this review is to convey to the reader the general ideas underlying the use of pooled NGS for the identification of rare variants. To facilitate a thorough understanding of the possibilities of the method we will explain in detail the variations in study design and discuss their advantages and disadvantages. We will show that information on allele frequency obtained by pooled next generation sequencing can be used to accurately compute basic population genetics indexes such as allele frequency, nucleotide diversity and Tajima’s D. Finally we will discuss applications and future perspectives of the multiplexed NGS approach.

  18. 3D knee segmentation based on three MRI sequences from different planes.

    Science.gov (United States)

    Zhou, L; Chav, R; Cresson, T; Chartrand, G; de Guise, J

    2016-08-01

    In clinical practice, knee MRI sequences with 3.5~5 mm slice distance in sagittal, coronal, and axial planes are often requested for the knee examination since its acquisition is faster than high-resolution MRI sequence in a single plane, thereby reducing the probability of motion artifact. In order to take advantage of the three sequences from different planes, a 3D segmentation method based on the combination of three knee models obtained from the three sequences is proposed in this paper. In the method, the sub-segmentation is respectively performed with sagittal, coronal, and axial MRI sequence in the image coordinate system. With each sequence, an initial knee model is hierarchically deformed, and then the three deformed models are mapped to reference coordinate system defined by the DICOM standard and combined to obtain a patient-specific model. The experimental results verified that the three sub-segmentation results can complement each other, and their integration can compensate for the insufficiency of boundary information caused by 3.5~5 mm gap between consecutive slices. Therefore, the obtained patient-specific model is substantially more accurate than each sub-segmentation results.

  19. Adenosine stress cardiovascular magnetic resonance with variable-density spiral pulse sequences accurately detects coronary artery disease: initial clinical evaluation.

    Science.gov (United States)

    Salerno, Michael; Taylor, Angela; Yang, Yang; Kuruvilla, Sujith; Ragosta, Michael; Meyer, Craig H; Kramer, Christopher M

    2014-07-01

    Adenosine stress cardiovascular magnetic resonance perfusion imaging can be limited by motion-induced dark-rim artifacts, which may be mistaken for true perfusion abnormalities. A high-resolution variable-density spiral pulse sequence with a novel density compensation strategy has been shown to reduce dark-rim artifacts in first-pass perfusion imaging. We aimed to assess the clinical performance of adenosine stress cardiovascular magnetic resonance using this new perfusion sequence to detect obstructive coronary artery disease. Cardiovascular magnetic resonance perfusion imaging was performed during adenosine stress (140 μg/kg per minute) and at rest on a Siemens 1.5-T Avanto scanner in 41 subjects with chest pain scheduled for coronary angiography. Perfusion images were acquired during injection of 0.1 mmol/kg Gadolinium-diethylenetriaminepentacetate at 3 short-axis locations using a saturation recovery interleaved variable-density spiral pulse sequence. Significant stenosis was defined as >50% by quantitative coronary angiography. Two blinded reviewers evaluated the perfusion images for the presence of adenosine-induced perfusion abnormalities and assessed image quality using a 5-point scale (1 [poor] to 5 [excellent]). The prevalence of obstructive coronary artery disease by quantitative coronary angiography was 68%. The average sensitivity, specificity, and accuracy were 89%, 85%, and 88%, respectively, with a positive predictive value and negative predictive value of 93% and 79%, respectively. The average image quality score was 4.4±0.7, with only 1 study with more than mild dark-rim artifacts. There was good inter-reader reliability with a κ statistic of 0.67. Spiral adenosine stress cardiovascular magnetic resonance results in high diagnostic accuracy for the detection of obstructive coronary artery disease with excellent image quality and minimal dark-rim artifacts. © 2014 American Heart Association, Inc.

  20. Computational prediction of miRNA genes from small RNA sequencing data

    Directory of Open Access Journals (Sweden)

    Wenjing eKang

    2015-01-01

    Full Text Available Next-generation sequencing now for the first time allows researchers to gauge the depth and variation of entire transcriptomes. However, now as rare transcripts can be detected that are present in cells at single copies, more advanced computational tools are needed to accurately annotate and profile them. miRNAs are 22 nucleotide small RNAs (sRNAs that post-transcriptionally reduce the output of protein coding genes. They have established roles in numerous biological processes, including cancers and other diseases. During miRNA biogenesis, the sRNAs are sequentially cleaved from precursor molecules that have a characteristic hairpin RNA structure. The vast majority of new miRNA genes that are discovered are mined from small RNA sequencing (sRNA-seq, which can detect more than a billion RNAs in a single run. However, given that many of the detected RNAs are degradation products from all types of transcripts, the accurate identification of miRNAs remain a non-trivial computational problem. Here we review the tools available to predict animal miRNAs from sRNA sequencing data. We present tools for generalist and specialist use cases, including prediction from massively pooled data or in species without reference genome. We also present wet-lab methods used to validate predicted miRNAs, and approaches to computationally benchmark prediction accuracy. For each tool, we reference validation experiments and benchmarking efforts. Last, we discuss the future of the field.

  1. Predictive genomics: A cancer hallmark network framework for predicting tumor clinical phenotypes using genome sequencing data

    OpenAIRE

    Wang, Edwin; Zaman, Naif; Mcgee, Shauna; Milanese, Jean-Sébastien; Masoudi-Nejad, Ali; O'Connor, Maureen

    2014-01-01

    We discuss a cancer hallmark network framework for modelling genome-sequencing data to predict cancer clonal evolution and associated clinical phenotypes. Strategies of using this framework in conjunction with genome sequencing data in an attempt to predict personalized drug targets, drug resistance, and metastasis for a cancer patient, as well as cancer risks for a healthy individual are discussed. Accurate prediction of cancer clonal evolution and clinical phenotypes will have substantial i...

  2. On precise ZAMSs, the solar color, and pre-main-sequence lithium depletion

    International Nuclear Information System (INIS)

    Vandenberg, D.A.; Poll, H.E.

    1989-01-01

    This paper describes a semiempirical main-sequence-fitting method for the determination of distances to stellar systems, which uses a ZAMS locus carefully normalized to the sun, and whose shape is defined by a quartic over the color range for (B-V)0 values between 0.2 and 1.0 such that the morphology of the Pleiades C-M diagram is accurately reproduced. Using this technique, distances were derived for a number of star clusters. It was found that the observed depletion of lithium among cool main-sequence stars in the Hyades and Pleiades can be matched quite well by the present models. Calculations also show that the depletion of Li at a fixed T(eff) along the main sequence is a sensitive function of Fe/H. 98 refs

  3. Next Generation Sequencing Methods for Diagnosis of Epilepsy Syndromes

    Directory of Open Access Journals (Sweden)

    Paul Dunn

    2018-02-01

    Full Text Available Epilepsy is a neurological disorder characterized by an increased predisposition for seizures. Although this definition suggests that it is a single disorder, epilepsy encompasses a group of disorders with diverse aetiologies and outcomes. A genetic basis for epilepsy syndromes has been postulated for several decades, with several mutations in specific genes identified that have increased our understanding of the genetic influence on epilepsies. With 70-80% of epilepsy cases identified to have a genetic cause, there are now hundreds of genes identified to be associated with epilepsy syndromes which can be analyzed using next generation sequencing (NGS techniques such as targeted gene panels, whole exome sequencing (WES and whole genome sequencing (WGS. For effective use of these methodologies, diagnostic laboratories and clinicians require information on the relevant workflows including analysis and sequencing depth to understand the specific clinical application and diagnostic capabilities of these gene sequencing techniques. As epilepsy is a complex disorder, the differences associated with each technique influence the ability to form a diagnosis along with an accurate detection of the genetic etiology of the disorder. In addition, for diagnostic testing, an important parameter is the cost-effectiveness and the specific diagnostic outcome of each technique. Here, we review these commonly used NGS techniques to determine their suitability for application to epilepsy genetic diagnostic testing.

  4. Towards accurate emergency response behavior

    International Nuclear Information System (INIS)

    Sargent, T.O.

    1981-01-01

    Nuclear reactor operator emergency response behavior has persisted as a training problem through lack of information. The industry needs an accurate definition of operator behavior in adverse stress conditions, and training methods which will produce the desired behavior. Newly assembled information from fifty years of research into human behavior in both high and low stress provides a more accurate definition of appropriate operator response, and supports training methods which will produce the needed control room behavior. The research indicates that operator response in emergencies is divided into two modes, conditioned behavior and knowledge based behavior. Methods which assure accurate conditioned behavior, and provide for the recovery of knowledge based behavior, are described in detail

  5. Developing market class specific InDel markers from next generation sequence data in Phaseolus vulgaris L.

    Directory of Open Access Journals (Sweden)

    Samira eMafi Moghaddam

    2014-05-01

    Full Text Available Next generation sequence data provides valuable information and tools for genetic and genomic research and offers new insights useful for marker development. This data is useful for the design of accurate and user-friendly molecular tools. Common bean (Phaseolus vulgaris L. is a diverse crop in which separate domestication events happened in each gene pool followed by race and market class diversification that has resulted in different morphological characteristics in each commercial market class. This has led to essentially independent breeding programs within each market class which in turn has resulted in limited within market class sequence variation. Sequence data from selected genotypes of five bean market classes (pinto, black, navy, and light and dark red kidney were used to develop InDel-based markers specific to each market class. Design of the InDel markers was conducted through a combination of assembly, alignment and primer design software using 1.6x to 5.1x coverage of Illumina GAII sequence data for each of the selected genotypes. The procedure we developed for primer design is fast, accurate, less error prone, and higher throughput than when they are designed manually. All InDel markers are easy to run and score with no need for PCR optimization. A total of 2,687 InDel markers distributed across the genome were developed. To highlight their usefulness, they were employed to construct a phylogenetic tree and a genetic map, showing that InDel markers are reliable, simple, and accurate.

  6. Plann: A command-line application for annotating plastome sequences1

    Science.gov (United States)

    Huang, Daisie I.; Cronk, Quentin C. B.

    2015-01-01

    Premise of the study: Plann automates the process of annotating a plastome sequence in GenBank format for either downstream processing or for GenBank submission by annotating a new plastome based on a similar, well-annotated plastome. Methods and Results: Plann is a Perl script to be executed on the command line. Plann compares a new plastome sequence to the features annotated in a reference plastome and then shifts the intervals of any matching features to the locations in the new plastome. Plann’s output can be used in the National Center for Biotechnology Information’s tbl2asn to create a Sequin file for GenBank submission. Conclusions: Unlike Web-based annotation packages, Plann is a locally executable script that will accurately annotate a plastome sequence to a locally specified reference plastome. Because it executes from the command line, it is ready to use in other software pipelines and can be easily rerun as a draft plastome is improved. PMID:26312193

  7. CMASA: an accurate algorithm for detecting local protein structural similarity and its application to enzyme catalytic site annotation

    Directory of Open Access Journals (Sweden)

    Li Gong-Hua

    2010-08-01

    Full Text Available Abstract Background The rapid development of structural genomics has resulted in many "unknown function" proteins being deposited in Protein Data Bank (PDB, thus, the functional prediction of these proteins has become a challenge for structural bioinformatics. Several sequence-based and structure-based methods have been developed to predict protein function, but these methods need to be improved further, such as, enhancing the accuracy, sensitivity, and the computational speed. Here, an accurate algorithm, the CMASA (Contact MAtrix based local Structural Alignment algorithm, has been developed to predict unknown functions of proteins based on the local protein structural similarity. This algorithm has been evaluated by building a test set including 164 enzyme families, and also been compared to other methods. Results The evaluation of CMASA shows that the CMASA is highly accurate (0.96, sensitive (0.86, and fast enough to be used in the large-scale functional annotation. Comparing to both sequence-based and global structure-based methods, not only the CMASA can find remote homologous proteins, but also can find the active site convergence. Comparing to other local structure comparison-based methods, the CMASA can obtain the better performance than both FFF (a method using geometry to predict protein function and SPASM (a local structure alignment method; and the CMASA is more sensitive than PINTS and is more accurate than JESS (both are local structure alignment methods. The CMASA was applied to annotate the enzyme catalytic sites of the non-redundant PDB, and at least 166 putative catalytic sites have been suggested, these sites can not be observed by the Catalytic Site Atlas (CSA. Conclusions The CMASA is an accurate algorithm for detecting local protein structural similarity, and it holds several advantages in predicting enzyme active sites. The CMASA can be used in large-scale enzyme active site annotation. The CMASA can be available by the

  8. Unenhanced breast MRI (STIR, T2-weighted TSE, DWIBS): An accurate and alternative strategy for detecting and differentiating breast lesions.

    Science.gov (United States)

    Telegrafo, Michele; Rella, Leonarda; Stabile Ianora, Amato Antonio; Angelelli, Giuseppe; Moschetta, Marco

    2015-10-01

    To assess the role of STIR, T2-weighted TSE and DWIBS sequences for detecting and characterizing breast lesions and to compare unenhanced (UE)-MRI results with contrast-enhanced (CE)-MRI and histological findings, having the latter as the reference standard. Two hundred eighty consecutive patients (age range, 27-73 years; mean age±standard deviation (SD), 48.8±9.8years) underwent MR examination with a diagnostic protocol including STIR, T2-weighted TSE, THRIVE and DWIBS sequences. Two radiologists blinded to both dynamic sequences and histological findings evaluated in consensus STIR, T2-weighted TSE and DWIBS sequences and after two weeks CE-MRI images searching for breast lesions. Sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), and diagnostic accuracy for UE-MRI and CE-MRI were calculated. UE-MRI results were also compared with CE- MRI. UE-MRI sequences obtained sensitivity, specificity, diagnostic accuracy, PPV and NPV values of 94%, 79%, 86%, 79% and 94%, respectively. CE-MRI sequences obtained sensitivity, specificity, diagnostic accuracy, PPV and NPV values of 98%, 83%, 90%, 84% and 98%, respectively. No statistically significant difference between UE-MRI and CE-MRI was found. Breast UE-MRI could represent an accurate diagnostic tool and a valid alternative to CE-MRI for evaluating breast lesions. STIR and DWIBS sequences allow to detect breast lesions while T2-weighted TSE sequences and ADC values could be useful for lesion characterization. Copyright © 2015 Elsevier Inc. All rights reserved.

  9. [Study on method of tracking the active cells in image sequences based on EKF-PF].

    Science.gov (United States)

    Tang, Chunming; Liu, Ying

    2013-02-01

    In cell image sequences, due to the nonlinear and nonGaussian motion characteristics of active cells, the accurate prediction and tracking is still an unsolved problem. We applied extended Kalman particle filter (EKF-PF) here in our study, attempting to solve the problem. Firstly we confirmed the existence and positions of the active cells. Then we established a motion model and improved it via adding motion angle estimation. Next we predicted motion parameters, such as displacement, velocity, accelerated velocity and motion angle, in region centers of the cells being tracked. Finally we obtained the motion traces of active cells. There were fourteen active cells in three image sequences which have been tracked. The errors were less than 2.5 pixels when the prediction values were compared with actual values. It showed that the presented algorithm may basically reach the solution of accurate predition and tracking of the active cells.

  10. Genometa--a fast and accurate classifier for short metagenomic shotgun reads.

    Science.gov (United States)

    Davenport, Colin F; Neugebauer, Jens; Beckmann, Nils; Friedrich, Benedikt; Kameri, Burim; Kokott, Svea; Paetow, Malte; Siekmann, Björn; Wieding-Drewes, Matthias; Wienhöfer, Markus; Wolf, Stefan; Tümmler, Burkhard; Ahlers, Volker; Sprengel, Frauke

    2012-01-01

    Metagenomic studies use high-throughput sequence data to investigate microbial communities in situ. However, considerable challenges remain in the analysis of these data, particularly with regard to speed and reliable analysis of microbial species as opposed to higher level taxa such as phyla. We here present Genometa, a computationally undemanding graphical user interface program that enables identification of bacterial species and gene content from datasets generated by inexpensive high-throughput short read sequencing technologies. Our approach was first verified on two simulated metagenomic short read datasets, detecting 100% and 94% of the bacterial species included with few false positives or false negatives. Subsequent comparative benchmarking analysis against three popular metagenomic algorithms on an Illumina human gut dataset revealed Genometa to attribute the most reads to bacteria at species level (i.e. including all strains of that species) and demonstrate similar or better accuracy than the other programs. Lastly, speed was demonstrated to be many times that of BLAST due to the use of modern short read aligners. Our method is highly accurate if bacteria in the sample are represented by genomes in the reference sequence but cannot find species absent from the reference. This method is one of the most user-friendly and resource efficient approaches and is thus feasible for rapidly analysing millions of short reads on a personal computer. The Genometa program, a step by step tutorial and Java source code are freely available from http://genomics1.mh-hannover.de/genometa/ and on http://code.google.com/p/genometa/. This program has been tested on Ubuntu Linux and Windows XP/7.

  11. Genometa--a fast and accurate classifier for short metagenomic shotgun reads.

    Directory of Open Access Journals (Sweden)

    Colin F Davenport

    Full Text Available Metagenomic studies use high-throughput sequence data to investigate microbial communities in situ. However, considerable challenges remain in the analysis of these data, particularly with regard to speed and reliable analysis of microbial species as opposed to higher level taxa such as phyla. We here present Genometa, a computationally undemanding graphical user interface program that enables identification of bacterial species and gene content from datasets generated by inexpensive high-throughput short read sequencing technologies. Our approach was first verified on two simulated metagenomic short read datasets, detecting 100% and 94% of the bacterial species included with few false positives or false negatives. Subsequent comparative benchmarking analysis against three popular metagenomic algorithms on an Illumina human gut dataset revealed Genometa to attribute the most reads to bacteria at species level (i.e. including all strains of that species and demonstrate similar or better accuracy than the other programs. Lastly, speed was demonstrated to be many times that of BLAST due to the use of modern short read aligners. Our method is highly accurate if bacteria in the sample are represented by genomes in the reference sequence but cannot find species absent from the reference. This method is one of the most user-friendly and resource efficient approaches and is thus feasible for rapidly analysing millions of short reads on a personal computer.The Genometa program, a step by step tutorial and Java source code are freely available from http://genomics1.mh-hannover.de/genometa/ and on http://code.google.com/p/genometa/. This program has been tested on Ubuntu Linux and Windows XP/7.

  12. Advanced Whole-Genome Sequencing and Analysis of Fetal Genomes from Amniotic Fluid.

    Science.gov (United States)

    Mao, Qing; Chin, Robert; Xie, Weiwei; Deng, Yuqing; Zhang, Wenwei; Xu, Huixin; Zhang, Rebecca Yu; Shi, Quan; Peters, Erin E; Gulbahce, Natali; Li, Zhenyu; Chen, Fang; Drmanac, Radoje; Peters, Brock A

    2018-04-01

    Amniocentesis is a common procedure, the primary purpose of which is to collect cells from the fetus to allow testing for abnormal chromosomes, altered chromosomal copy number, or a small number of genes that have small single- to multibase defects. Here we demonstrate the feasibility of generating an accurate whole-genome sequence of a fetus from either the cellular or cell-free DNA (cfDNA) of an amniotic sample. cfDNA and DNA isolated from the cell pellet of 31 amniocenteses were sequenced to approximately 50× genome coverage by use of the Complete Genomics nanoarray platform. In a subset of the samples, long fragment read libraries were generated from DNA isolated from cells and sequenced to approximately 100× genome coverage. Concordance of variant calls between the 2 DNA sources and with parental libraries was >96%. Two fetal genomes were found to harbor potentially detrimental variants in chromodomain helicase DNA binding protein 8 ( CHD8 ) and LDL receptor-related protein 1 ( LRP1 ), variations of which have been associated with autism spectrum disorder and keratosis pilaris atrophicans, respectively. We also discovered drug sensitivities and carrier information of fetuses for a variety of diseases. We were able to elucidate the complete genome sequence of 31 fetuses from amniotic fluid and demonstrate that the cfDNA or DNA from the cell pellet can be analyzed with little difference in quality. We believe that current technologies could analyze this material in a highly accurate and complete manner and that analyses like these should be considered for addition to current amniocentesis procedures. © 2018 American Association for Clinical Chemistry.

  13. Spectrally accurate contour dynamics

    International Nuclear Information System (INIS)

    Van Buskirk, R.D.; Marcus, P.S.

    1994-01-01

    We present an exponentially accurate boundary integral method for calculation the equilibria and dynamics of piece-wise constant distributions of potential vorticity. The method represents contours of potential vorticity as a spectral sum and solves the Biot-Savart equation for the velocity by spectrally evaluating a desingularized contour integral. We use the technique in both an initial-value code and a newton continuation method. Our methods are tested by comparing the numerical solutions with known analytic results, and it is shown that for the same amount of computational work our spectral methods are more accurate than other contour dynamics methods currently in use

  14. The sequence and de novo assembly of the giant panda genome

    Science.gov (United States)

    Li, Ruiqiang; Fan, Wei; Tian, Geng; Zhu, Hongmei; He, Lin; Cai, Jing; Huang, Quanfei; Cai, Qingle; Li, Bo; Bai, Yinqi; Zhang, Zhihe; Zhang, Yaping; Wang, Wen; Li, Jun; Wei, Fuwen; Li, Heng; Jian, Min; Li, Jianwen; Zhang, Zhaolei; Nielsen, Rasmus; Li, Dawei; Gu, Wanjun; Yang, Zhentao; Xuan, Zhaoling; Ryder, Oliver A.; Leung, Frederick Chi-Ching; Zhou, Yan; Cao, Jianjun; Sun, Xiao; Fu, Yonggui; Fang, Xiaodong; Guo, Xiaosen; Wang, Bo; Hou, Rong; Shen, Fujun; Mu, Bo; Ni, Peixiang; Lin, Runmao; Qian, Wubin; Wang, Guodong; Yu, Chang; Nie, Wenhui; Wang, Jinhuan; Wu, Zhigang; Liang, Huiqing; Min, Jiumeng; Wu, Qi; Cheng, Shifeng; Ruan, Jue; Wang, Mingwei; Shi, Zhongbin; Wen, Ming; Liu, Binghang; Ren, Xiaoli; Zheng, Huisong; Dong, Dong; Cook, Kathleen; Shan, Gao; Zhang, Hao; Kosiol, Carolin; Xie, Xueying; Lu, Zuhong; Zheng, Hancheng; Li, Yingrui; Steiner, Cynthia C.; Lam, Tommy Tsan-Yuk; Lin, Siyuan; Zhang, Qinghui; Li, Guoqing; Tian, Jing; Gong, Timing; Liu, Hongde; Zhang, Dejin; Fang, Lin; Ye, Chen; Zhang, Juanbin; Hu, Wenbo; Xu, Anlong; Ren, Yuanyuan; Zhang, Guojie; Bruford, Michael W.; Li, Qibin; Ma, Lijia; Guo, Yiran; An, Na; Hu, Yujie; Zheng, Yang; Shi, Yongyong; Li, Zhiqiang; Liu, Qing; Chen, Yanling; Zhao, Jing; Qu, Ning; Zhao, Shancen; Tian, Feng; Wang, Xiaoling; Wang, Haiyin; Xu, Lizhi; Liu, Xiao; Vinar, Tomas; Wang, Yajun; Lam, Tak-Wah; Yiu, Siu-Ming; Liu, Shiping; Zhang, Hemin; Li, Desheng; Huang, Yan; Wang, Xia; Yang, Guohua; Jiang, Zhi; Wang, Junyi; Qin, Nan; Li, Li; Li, Jingxiang; Bolund, Lars; Kristiansen, Karsten; Wong, Gane Ka-Shu; Olson, Maynard; Zhang, Xiuqing; Li, Songgang; Yang, Huanming; Wang, Jian; Wang, Jun

    2013-01-01

    Using next-generation sequencing technology alone, we have successfully generated and assembled a draft sequence of the giant panda genome. The assembled contigs (2.25 gigabases (Gb)) cover approximately 94% of the whole genome, and the remaining gaps (0.05 Gb) seem to contain carnivore-specific repeats and tandem repeats. Comparisons with the dog and human showed that the panda genome has a lower divergence rate. The assessment of panda genes potentially underlying some of its unique traits indicated that its bamboo diet might be more dependent on its gut microbiome than its own genetic composition. We also identified more than 2.7 million heterozygous single nucleotide polymorphisms in the diploid genome. Our data and analyses provide a foundation for promoting mammalian genetic research, and demonstrate the feasibility for using next-generation sequencing technologies for accurate, cost-effective and rapid de novo assembly of large eukaryotic genomes. PMID:20010809

  15. Discovery of candidate disease genes in ENU-induced mouse mutants by large-scale sequencing, including a splice-site mutation in nucleoredoxin.

    Directory of Open Access Journals (Sweden)

    Melissa K Boles

    2009-12-01

    Full Text Available An accurate and precisely annotated genome assembly is a fundamental requirement for functional genomic analysis. Here, the complete DNA sequence and gene annotation of mouse Chromosome 11 was used to test the efficacy of large-scale sequencing for mutation identification. We re-sequenced the 14,000 annotated exons and boundaries from over 900 genes in 41 recessive mutant mouse lines that were isolated in an N-ethyl-N-nitrosourea (ENU mutation screen targeted to mouse Chromosome 11. Fifty-nine sequence variants were identified in 55 genes from 31 mutant lines. 39% of the lesions lie in coding sequences and create primarily missense mutations. The other 61% lie in noncoding regions, many of them in highly conserved sequences. A lesion in the perinatal lethal line l11Jus13 alters a consensus splice site of nucleoredoxin (Nxn, inserting 10 amino acids into the resulting protein. We conclude that point mutations can be accurately and sensitively recovered by large-scale sequencing, and that conserved noncoding regions should be included for disease mutation identification. Only seven of the candidate genes we report have been previously targeted by mutation in mice or rats, showing that despite ongoing efforts to functionally annotate genes in the mammalian genome, an enormous gap remains between phenotype and function. Our data show that the classical positional mapping approach of disease mutation identification can be extended to large target regions using high-throughput sequencing.

  16. Reconstruction of ancestral RNA sequences under multiple structural constraints

    Directory of Open Access Journals (Sweden)

    Olivier Tremblay-Savard

    2016-11-01

    Full Text Available Abstract Background Secondary structures form the scaffold of multiple sequence alignment of non-coding RNA (ncRNA families. An accurate reconstruction of ancestral ncRNAs must use this structural signal. However, the inference of ancestors of a single ncRNA family with a single consensus structure may bias the results towards sequences with high affinity to this structure, which are far from the true ancestors. Methods In this paper, we introduce achARNement, a maximum parsimony approach that, given two alignments of homologous ncRNA families with consensus secondary structures and a phylogenetic tree, simultaneously calculates ancestral RNA sequences for these two families. Results We test our methodology on simulated data sets, and show that achARNement outperforms classical maximum parsimony approaches in terms of accuracy, but also reduces by several orders of magnitude the number of candidate sequences. To conclude this study, we apply our algorithms on the Glm clan and the FinP-traJ clan from the Rfam database. Conclusions Our results show that our methods reconstruct small sets of high-quality candidate ancestors with better agreement to the two target structures than with classical approaches. Our program is freely available at: http://csb.cs.mcgill.ca/acharnement .

  17. Genomic prediction in families of perennial ryegrass based on genotyping-by-sequencing

    DEFF Research Database (Denmark)

    Ashraf, Bilal

    In this thesis we investigate the potential for genomic prediction in perennial ryegrass using genotyping-by-sequencing (GBS) data. Association method based on family-based breeding systems was developed, genomic heritabilities, genomic prediction accurancies and effects of some key factors wer...... explored. Results show that low sequencing depth caused underestimation of allele substitution effects in GWAS and overestimation of genomic heritability in prediction studies. Other factors susch as SNP marker density, population structure and size of training population influenced accuracy of genomic...... prediction. Overall, GBS allows for genomic prediction in breeding families of perennial ryegrass and holds good potential to expedite genetic gain and encourage the application of genomic prediction...

  18. The Use of Non-Variant Sites to Improve the Clinical Assessment of Whole-Genome Sequence Data.

    Directory of Open Access Journals (Sweden)

    Alberto Ferrarini

    Full Text Available Genetic testing, which is now a routine part of clinical practice and disease management protocols, is often based on the assessment of small panels of variants or genes. On the other hand, continuous improvements in the speed and per-base costs of sequencing have now made whole exome sequencing (WES and whole genome sequencing (WGS viable strategies for targeted or complete genetic analysis, respectively. Standard WGS/WES data analytical workflows generally rely on calling of sequence variants respect to the reference genome sequence. However, the reference genome sequence contains a large number of sites represented by rare alleles, by known pathogenic alleles and by alleles strongly associated to disease by GWAS. It's thus critical, for clinical applications of WGS and WES, to interpret whether non-variant sites are homozygous for the reference allele or if the corresponding genotype cannot be reliably called. Here we show that an alternative analytical approach based on the analysis of both variant and non-variant sites from WGS data allows to genotype more than 92% of sites corresponding to known SNPs compared to 6% genotyped by standard variant analysis. These include homozygous reference sites of clinical interest, thus leading to a broad and comprehensive characterization of variation necessary to an accurate evaluation of disease risk. Altogether, our findings indicate that characterization of both variant and non-variant clinically informative sites in the genome is necessary to allow an accurate clinical assessment of a personal genome. Finally, we propose a highly efficient extended VCF (eVCF file format which allows to store genotype calls for sites of clinical interest while remaining compatible with current variant interpretation software.

  19. Distinct DNA methylomes of newborns and centenarians

    DEFF Research Database (Denmark)

    Heyn, Holger; Li, Ning; Ferreira, Humberto J.

    2012-01-01

    Human aging cannot be fully understood in terms of the constrained genetic setting. Epigenetic drift is an alternative means of explaining age-associated alterations. To address this issue, we performed whole-genome bisulfite sequencing (WGBS) of newborn and centenarian genomes. The centenarian DNA......-age individuals demonstrated DNA methylomes in the crossroad between the newborn and the nonagenarian/centenarian groups. Our study constitutes a unique DNA methylation analysis of the extreme points of human life at a single-nucleotide resolution level....

  20. Differential DNA methylation profile of key genes in malignant prostate epithelial cells transformed by inorganic arsenic or cadmium

    International Nuclear Information System (INIS)

    Pelch, Katherine E.; Tokar, Erik J.; Merrick, B. Alex; Waalkes, Michael P.

    2015-01-01

    Previous work shows altered methylation patterns in inorganic arsenic (iAs)- or cadmium (Cd)-transformed epithelial cells. Here, the methylation status near the transcriptional start site was assessed in the normal human prostate epithelial cell line (RWPE-1) that was malignantly transformed by 10 μM Cd for 11 weeks (CTPE) or 5 μM iAs for 29 weeks (CAsE-PE), at which time cells showed multiple markers of acquired cancer phenotype. Next generation sequencing of the transcriptome of CAsE-PE cells identified multiple dysregulated genes. Of the most highly dysregulated genes, five genes that can be relevant to the carcinogenic process (S100P, HYAL1, NTM, NES, ALDH1A1) were chosen for an in-depth analysis of the DNA methylation profile. DNA was isolated, bisulfite converted, and combined bisulfite restriction analysis was used to identify differentially methylated CpG sites, which was confirmed with bisulfite sequencing. Four of the five genes showed differential methylation in transformants relative to control cells that was inversely related to altered gene expression. Increased expression of HYAL1 (> 25-fold) and S100P (> 40-fold) in transformants was correlated with hypomethylation near the transcriptional start site. Decreased expression of NES (> 15-fold) and NTM (> 1000-fold) in transformants was correlated with hypermethylation near the transcriptional start site. ALDH1A1 expression was differentially expressed in transformed cells but was not differentially methylated relative to control. In conclusion, altered gene expression observed in Cd and iAs transformed cells may result from altered DNA methylation status. - Highlights: • Cd and iAs are known human carcinogens, yet neither appears directly mutagenic. • Prior data suggest epigenetic modification plays a role in Cd or iAs induced cancer. • Altered methylation of four misregulated genes was found in Cd or iAs transformants. • The resulting altered gene expression may be relevant to cellular

  1. Differential DNA methylation profile of key genes in malignant prostate epithelial cells transformed by inorganic arsenic or cadmium

    Energy Technology Data Exchange (ETDEWEB)

    Pelch, Katherine E.; Tokar, Erik J. [National Toxicology Program Laboratory, Division of the National Toxicology Program, National Institute of Environmental Health Sciences, Research Triangle Park, NC 27709 (United States); Merrick, B. Alex [Molecular Toxicology and Informatics Group, Biomolecular Screening Branch, Division of the National Toxicology Program, National Institute of Environmental Health Sciences, Morrisville, NC 27560 (United States); Waalkes, Michael P., E-mail: waalkes@niehs.nih.gov [National Toxicology Program Laboratory, Division of the National Toxicology Program, National Institute of Environmental Health Sciences, Research Triangle Park, NC 27709 (United States)

    2015-08-01

    Previous work shows altered methylation patterns in inorganic arsenic (iAs)- or cadmium (Cd)-transformed epithelial cells. Here, the methylation status near the transcriptional start site was assessed in the normal human prostate epithelial cell line (RWPE-1) that was malignantly transformed by 10 μM Cd for 11 weeks (CTPE) or 5 μM iAs for 29 weeks (CAsE-PE), at which time cells showed multiple markers of acquired cancer phenotype. Next generation sequencing of the transcriptome of CAsE-PE cells identified multiple dysregulated genes. Of the most highly dysregulated genes, five genes that can be relevant to the carcinogenic process (S100P, HYAL1, NTM, NES, ALDH1A1) were chosen for an in-depth analysis of the DNA methylation profile. DNA was isolated, bisulfite converted, and combined bisulfite restriction analysis was used to identify differentially methylated CpG sites, which was confirmed with bisulfite sequencing. Four of the five genes showed differential methylation in transformants relative to control cells that was inversely related to altered gene expression. Increased expression of HYAL1 (> 25-fold) and S100P (> 40-fold) in transformants was correlated with hypomethylation near the transcriptional start site. Decreased expression of NES (> 15-fold) and NTM (> 1000-fold) in transformants was correlated with hypermethylation near the transcriptional start site. ALDH1A1 expression was differentially expressed in transformed cells but was not differentially methylated relative to control. In conclusion, altered gene expression observed in Cd and iAs transformed cells may result from altered DNA methylation status. - Highlights: • Cd and iAs are known human carcinogens, yet neither appears directly mutagenic. • Prior data suggest epigenetic modification plays a role in Cd or iAs induced cancer. • Altered methylation of four misregulated genes was found in Cd or iAs transformants. • The resulting altered gene expression may be relevant to cellular

  2. Epigenetic silencing of MicroRNA-503 regulates FANCA expression in non-small cell lung cancer cell.

    Science.gov (United States)

    Li, Ning; Zhang, Fangfang; Li, Suyun; Zhou, Suzhen

    2014-02-21

    It is reported that MicroRNA-503 (miR-503) regulates cell apoptosis, and thus modulates the resistance of non-small cell lung cancer cells (NSCLC) to cisplatin. However, the exact role of miR-503 in NSCLC remains unknown. In the present study, the level of miR-503 expression in NSCLC was evaluated using realtime PCR, and the DNA methylation status within miR-503 promoter was analyzed by Combined Bisulfite Restriction Analysis (COBRA) or bisulfite-treated DNA sequencing assays (BSP). We found that the expression of miR-503 was significantly decreased in NSCLC tissues compared to normal tissues. A statistically significant inverse association was found between miR-503 methylation status and expression of the miR-503 in tumor tissues (PFANCA) gene and represses its expression at the transcriptional level. Taken together, our results suggest that miR-503 regulates the resistance of non-small cell lung cancer cells to cisplatin at least in part by targeting FANCA. Copyright © 2014 Elsevier Inc. All rights reserved.

  3. Hypercapnic normalization of BOLD fMRI: comparison across field strengths and pulse sequences

    DEFF Research Database (Denmark)

    Cohen, Eric R.; Rostrup, Egill; Sidaros, Karam

    2004-01-01

    to be more accurately localized and quantified based on changes in venous blood oxygenation alone. The normalized BOLD signal induced by the motor task was consistent across different magnetic fields and pulse sequences, and corresponded well with cerebral blood flow measurements. Our data suggest...... size, as well as experimental, such as pulse sequence and static magnetic field strength (B(0)). Thus, it is difficult to compare task-induced fMRI signals across subjects, field strengths, and pulse sequences. This problem can be overcome by normalizing the neural activity-induced BOLD fMRI response...... for global stimulation, subjects breathed a 5% CO(2) gas mixture. Under all conditions, voxels containing primarily large veins and those containing primarily active tissue (i.e., capillaries and small veins) showed distinguishable behavior after hypercapnic normalization. This allowed functional activity...

  4. Cloning, sequencing, and expression of interferon-γ from elk in North America

    Science.gov (United States)

    Sweeney, Steven J.; Emerson, Carlene; Eriks, Inge S.

    2001-01-01

    Eradication of Mycobacterium bovis relies on accurate detection of infected animals, including potential domestic and wildlife reservoirs. Available diagnostic tests lack the sensitivity and specificity necessary for accurate detection, particularly in infected wildlife populations. Recently, an in vitro diagnostic test for cattle which measures plasma interferon-gamma (IFN-γ) levels in blood following in vitro incubation with M. bovis purified protein derivative has been enveloped. This test appears to have increased sensitivity over traditional testing. Unfortunately, it does not detect IFN-γ from Cervidae. To begin to address this problem, the IFN-γ gene from elk (Cervus elaphus) was cloned, sequenced, expressed, and characterized. cDNA was cloned from mitogen stimulated peripheral blood mononuclear cells. The predicted amino acid (aa) sequence was compared to known sequences from cattle, sheep, goats, red deer (Cervus elaphus), humans, and mice. Biological activity of the recombinant elk IFN-γ (rElkIFN-γ) was confirmed in a vesicular stomatitis virus cytopathic effect reduction assay. Production of monoclonal antibodies to IFN-γ epitopes conserved between ruminant species could provide an important tool for the development of reliable, practical diagnostic assays for detection of a delayed type hypersensitivity response to a variety of persistent infectious agents in ruminants, including M. bovis and Brucella abortus. Moreover, development of these reagents will aid investigators in studies to explore immunological responses of elk that are associated with resistance to infectious diseases.

  5. Tumor transcriptome sequencing reveals allelic expression imbalances associated with copy number alterations.

    Directory of Open Access Journals (Sweden)

    Brian B Tuch

    Full Text Available Due to growing throughput and shrinking cost, massively parallel sequencing is rapidly becoming an attractive alternative to microarrays for the genome-wide study of gene expression and copy number alterations in primary tumors. The sequencing of transcripts (RNA-Seq should offer several advantages over microarray-based methods, including the ability to detect somatic mutations and accurately measure allele-specific expression. To investigate these advantages we have applied a novel, strand-specific RNA-Seq method to tumors and matched normal tissue from three patients with oral squamous cell carcinomas. Additionally, to better understand the genomic determinants of the gene expression changes observed, we have sequenced the tumor and normal genomes of one of these patients. We demonstrate here that our RNA-Seq method accurately measures allelic imbalance and that measurement on the genome-wide scale yields novel insights into cancer etiology. As expected, the set of genes differentially expressed in the tumors is enriched for cell adhesion and differentiation functions, but, unexpectedly, the set of allelically imbalanced genes is also enriched for these same cancer-related functions. By comparing the transcriptomic perturbations observed in one patient to his underlying normal and tumor genomes, we find that allelic imbalance in the tumor is associated with copy number mutations and that copy number mutations are, in turn, strongly associated with changes in transcript abundance. These results support a model in which allele-specific deletions and duplications drive allele-specific changes in gene expression in the developing tumor.

  6. In vivo quantitative NMR imaging of fruit tissues during growth using Spoiled Gradient Echo sequence

    DEFF Research Database (Denmark)

    Kenouche, S.; Perrier, M.; Bertin, N.

    2014-01-01

    of this study was to design a robust and accurate quantitative measurement method based on NMR imaging combined with contrast agent (CA) for mapping and quantifying water transport in growing cherry tomato fruits. A multiple flip-angle Spoiled Gradient Echo (SGE) imaging sequence was used to evaluate...

  7. Real-time UAV trajectory generation using feature points matching between video image sequences

    Science.gov (United States)

    Byun, Younggi; Song, Jeongheon; Han, Dongyeob

    2017-09-01

    Unmanned aerial vehicles (UAVs), equipped with navigation systems and video capability, are currently being deployed for intelligence, reconnaissance and surveillance mission. In this paper, we present a systematic approach for the generation of UAV trajectory using a video image matching system based on SURF (Speeded up Robust Feature) and Preemptive RANSAC (Random Sample Consensus). Video image matching to find matching points is one of the most important steps for the accurate generation of UAV trajectory (sequence of poses in 3D space). We used the SURF algorithm to find the matching points between video image sequences, and removed mismatching by using the Preemptive RANSAC which divides all matching points to outliers and inliers. The inliers are only used to determine the epipolar geometry for estimating the relative pose (rotation and translation) between image sequences. Experimental results from simulated video image sequences showed that our approach has a good potential to be applied to the automatic geo-localization of the UAVs system

  8. Assessing the utility of the Oxford Nanopore MinION for snake venom gland cDNA sequencing.

    Science.gov (United States)

    Hargreaves, Adam D; Mulley, John F

    2015-01-01

    Portable DNA sequencers such as the Oxford Nanopore MinION device have the potential to be truly disruptive technologies, facilitating new approaches and analyses and, in some cases, taking sequencing out of the lab and into the field. However, the capabilities of these technologies are still being revealed. Here we show that single-molecule cDNA sequencing using the MinION accurately characterises venom toxin-encoding genes in the painted saw-scaled viper, Echis coloratus. We find the raw sequencing error rate to be around 12%, improved to 0-2% with hybrid error correction and 3% with de novo error correction. Our corrected data provides full coding sequences and 5' and 3' UTRs for 29 of 33 candidate venom toxins detected, far superior to Illumina data (13/40 complete) and Sanger-based ESTs (15/29). We suggest that, should the current pace of improvement continue, the MinION will become the default approach for cDNA sequencing in a variety of species.

  9. Technology development for gene discovery and full-length sequencing

    Energy Technology Data Exchange (ETDEWEB)

    Marcelo Bento Soares

    2004-07-19

    In previous years, with support from the U.S. Department of Energy, we developed methods for construction of normalized and subtracted cDNA libraries, and constructed hundreds of high-quality libraries for production of Expressed Sequence Tags (ESTs). Our clones were made widely available to the scientific community through the IMAGE Consortium, and millions of ESTs were produced from our libraries either by collaborators or by our own sequencing laboratory at the University of Iowa. During this grant period, we focused on (1) the development of a method for preferential cloning of tissue-specific and/or rare transcripts, (2) its utilization to expedite EST-based gene discovery for the NIH Mouse Brain Molecular Anatomy Project, (3) further development and optimization of a method for construction of full-length-enriched cDNA libraries, and (4) modification of a plasmid vector to maximize efficiency of full-length cDNA sequencing by the transposon-mediated approach. It is noteworthy that the technology developed for preferential cloning of rare mRNAs enabled identification of over 2,000 mouse transcripts differentially expressed in the hippocampus. In addition, the method that we optimized for construction of full-length-enriched cDNA libraries was successfully utilized for the production of approximately fifty libraries from the developing mouse nervous system, from which over 2,500 full-ORF-containing cDNAs have been identified and accurately sequenced in their entirety either by our group or by the NIH-Mammalian Gene Collection Program Sequencing Team.

  10. MRI investigation of normal fetal lung maturation using signal intensities on different imaging sequences

    International Nuclear Information System (INIS)

    Balassy, Csilla; Kasprian, Gregor; Weber, Michael; Hoermann, Marcus; Prayer, Daniela; Brugger, Peter C.; Csapo, Bence; Mittermayer, Christoph

    2007-01-01

    To purpose of this paper is to study the relation between normal lung maturation signal and changes in intensity ratios (SIR) and to determine which magnetic resonance imaging sequence provides the strongest correlation of normal lung SIs with gestational age. 126 normal singleton pregnancies (20-37 weeks) were examined with a 1.5 Tesla unit. Mean SIs for lungs, liver, and gastric fluid were assessed on six different sequences, and SIRs of lung/liver (LLSIR) and lung/gastric fluid (LGSIR) were correlated with gestational age for each sequence. To evaluate the feasibility of SIRs in the prediction of the state of the lung maturity, accuracy of the predicted SIRs (D*) was measured by calculating relative residuals (D*-D)/D for each sequence. LLSIRs showed significant changes in every sequence (p<0.05), while LGSIRs only on two sequences. Significant differences were shown for the mean of absolute residuals for both LLSIRs (p<0.001) and for LGSIRs (p=0.003). Relative residuals of LLSIRs were significantly smaller on T1-weighted sequence, whereas they were significantly higher for LGSIRs on FLAIR sequence. Fetal liver seems to be adequate reference for the investigation of lung maturation. T1-weighted sequence was the most accurate for the measurement of the lung SIs; thus, we propose to determine LLSIR on T1-weighted sequence when evaluating lung development. (orig.)

  11. MRI investigation of normal fetal lung maturation using signal intensities on different imaging sequences

    Energy Technology Data Exchange (ETDEWEB)

    Balassy, Csilla; Kasprian, Gregor; Weber, Michael; Hoermann, Marcus; Prayer, Daniela [Medical University of Vienna, Department of Radiology, Vienna (Austria); Brugger, Peter C. [Medical University of Vienna, Center of Anatomy and Cell Biology, Vienna (Austria); Csapo, Bence [Medical University of Vienna, Department of Obstetrics and Gyneocology, Vienna (Austria); Mittermayer, Christoph [Medical University of Vienna, Department of Pediatrics, Vienna (Austria)

    2007-03-15

    To purpose of this paper is to study the relation between normal lung maturation signal and changes in intensity ratios (SIR) and to determine which magnetic resonance imaging sequence provides the strongest correlation of normal lung SIs with gestational age. 126 normal singleton pregnancies (20-37 weeks) were examined with a 1.5 Tesla unit. Mean SIs for lungs, liver, and gastric fluid were assessed on six different sequences, and SIRs of lung/liver (LLSIR) and lung/gastric fluid (LGSIR) were correlated with gestational age for each sequence. To evaluate the feasibility of SIRs in the prediction of the state of the lung maturity, accuracy of the predicted SIRs (D*) was measured by calculating relative residuals (D*-D)/D for each sequence. LLSIRs showed significant changes in every sequence (p<0.05), while LGSIRs only on two sequences. Significant differences were shown for the mean of absolute residuals for both LLSIRs (p<0.001) and for LGSIRs (p=0.003). Relative residuals of LLSIRs were significantly smaller on T1-weighted sequence, whereas they were significantly higher for LGSIRs on FLAIR sequence. Fetal liver seems to be adequate reference for the investigation of lung maturation. T1-weighted sequence was the most accurate for the measurement of the lung SIs; thus, we propose to determine LLSIR on T1-weighted sequence when evaluating lung development. (orig.)

  12. Thickness of patellofemoral articular cartilage as measured on MR imaging: sequence comparison of accuracy, reproducibility, and interobserver variation

    Energy Technology Data Exchange (ETDEWEB)

    Van Leersum, M.D. [Dept. of Radiology, Thomas Jefferson Univ. Hospital, Philadelphia, PA (United States); Schweitzer, M.E. [Dept. of Radiology, Thomas Jefferson Univ. Hospital, Philadelphia, PA (United States); Gannon, F. [Dept. of Pathology, Thomas Jefferson Univ. Hospital, Philadelphia, PA (United States); Vinitski, S. [Dept. of Radiology, Thomas Jefferson Univ. Hospital, Philadelphia, PA (United States); Finkel, G. [Dept. of Pathology, Thomas Jefferson Univ. Hospital, Philadelphia, PA (United States); Mitchell, D.G. [Dept. of Radiology, Thomas Jefferson Univ. Hospital, Philadelphia, PA (United States)

    1995-08-01

    This study was undertaken to assess the accuracy, precision, and reliability of magnetic resonance (MR) measurements of articular cartilage. Fifteen cadaveric patellas were imaged in the axial plane at 1.5 T. Gradient echo and fat-suppressed FSE, T2-weighted, proton density, and T1-weighted sequences were performed. We measured each 5-mm section separately at three standardized positions, giving a total of 900 measurements. These findings were correlated with independently performed measurements of the corresponding anatomic sections. A hundred random measurements were also evaluated for reproducibility and interobserver variation. Although all sequences were highly accurate, the T1-weighted images were the most accurate, with a mean difference of 0.25 mm and a correlation coefficient of 0.85. All sequences were also highly reproducible with little inter-observer variation. In an attempt to improve the accuracy of the MR measurements further, we retrospectively evaluated all measurements with discrepancies greater than 1 mm from the specimen. All these differences were attributable to focal defects causing exaggeration of the thickness on MR imaging. (orig.)

  13. Thickness of patellofemoral articular cartilage as measured on MR imaging: sequence comparison of accuracy, reproducibility, and interobserver variation

    International Nuclear Information System (INIS)

    Van Leersum, M.D.; Schweitzer, M.E.; Gannon, F.; Vinitski, S.; Finkel, G.; Mitchell, D.G.

    1995-01-01

    This study was undertaken to assess the accuracy, precision, and reliability of magnetic resonance (MR) measurements of articular cartilage. Fifteen cadaveric patellas were imaged in the axial plane at 1.5 T. Gradient echo and fat-suppressed FSE, T2-weighted, proton density, and T1-weighted sequences were performed. We measured each 5-mm section separately at three standardized positions, giving a total of 900 measurements. These findings were correlated with independently performed measurements of the corresponding anatomic sections. A hundred random measurements were also evaluated for reproducibility and interobserver variation. Although all sequences were highly accurate, the T1-weighted images were the most accurate, with a mean difference of 0.25 mm and a correlation coefficient of 0.85. All sequences were also highly reproducible with little inter-observer variation. In an attempt to improve the accuracy of the MR measurements further, we retrospectively evaluated all measurements with discrepancies greater than 1 mm from the specimen. All these differences were attributable to focal defects causing exaggeration of the thickness on MR imaging. (orig.)

  14. Evaluation of exome variants using the Ion Proton Platform to sequence error-prone regions.

    Science.gov (United States)

    Seo, Heewon; Park, Yoomi; Min, Byung Joo; Seo, Myung Eui; Kim, Ju Han

    2017-01-01

    The Ion Proton sequencer from Thermo Fisher accurately determines sequence variants from target regions with a rapid turnaround time at a low cost. However, misleading variant-calling errors can occur. We performed a systematic evaluation and manual curation of read-level alignments for the 675 ultrarare variants reported by the Ion Proton sequencer from 27 whole-exome sequencing data but that are not present in either the 1000 Genomes Project and the Exome Aggregation Consortium. We classified positive variant calls into 393 highly likely false positives, 126 likely false positives, and 156 likely true positives, which comprised 58.2%, 18.7%, and 23.1% of the variants, respectively. We identified four distinct error patterns of variant calling that may be bioinformatically corrected when using different strategies: simplicity region, SNV cluster, peripheral sequence read, and base inversion. Local de novo assembly successfully corrected 201 (38.7%) of the 519 highly likely or likely false positives. We also demonstrate that the two sequencing kits from Thermo Fisher (the Ion PI Sequencing 200 kit V3 and the Ion PI Hi-Q kit) exhibit different error profiles across different error types. A refined calling algorithm with better polymerase may improve the performance of the Ion Proton sequencing platform.

  15. Evaluation of exome variants using the Ion Proton Platform to sequence error-prone regions.

    Directory of Open Access Journals (Sweden)

    Heewon Seo

    Full Text Available The Ion Proton sequencer from Thermo Fisher accurately determines sequence variants from target regions with a rapid turnaround time at a low cost. However, misleading variant-calling errors can occur. We performed a systematic evaluation and manual curation of read-level alignments for the 675 ultrarare variants reported by the Ion Proton sequencer from 27 whole-exome sequencing data but that are not present in either the 1000 Genomes Project and the Exome Aggregation Consortium. We classified positive variant calls into 393 highly likely false positives, 126 likely false positives, and 156 likely true positives, which comprised 58.2%, 18.7%, and 23.1% of the variants, respectively. We identified four distinct error patterns of variant calling that may be bioinformatically corrected when using different strategies: simplicity region, SNV cluster, peripheral sequence read, and base inversion. Local de novo assembly successfully corrected 201 (38.7% of the 519 highly likely or likely false positives. We also demonstrate that the two sequencing kits from Thermo Fisher (the Ion PI Sequencing 200 kit V3 and the Ion PI Hi-Q kit exhibit different error profiles across different error types. A refined calling algorithm with better polymerase may improve the performance of the Ion Proton sequencing platform.

  16. Towards clinical molecular diagnosis of inherited cardiac conditions: a comparison of bench-top genome DNA sequencers.

    Directory of Open Access Journals (Sweden)

    Xinzhong Li

    Full Text Available Molecular genetic testing is recommended for diagnosis of inherited cardiac disease, to guide prognosis and treatment, but access is often limited by cost and availability. Recently introduced high-throughput bench-top DNA sequencing platforms have the potential to overcome these limitations.We evaluated two next-generation sequencing (NGS platforms for molecular diagnostics. The protein-coding regions of six genes associated with inherited arrhythmia syndromes were amplified from 15 human samples using parallelised multiplex PCR (Access Array, Fluidigm, and sequenced on the MiSeq (Illumina and Ion Torrent PGM (Life Technologies. Overall, 97.9% of the target was sequenced adequately for variant calling on the MiSeq, and 96.8% on the Ion Torrent PGM. Regions missed tended to be of high GC-content, and most were problematic for both platforms. Variant calling was assessed using 107 variants detected using Sanger sequencing: within adequately sequenced regions, variant calling on both platforms was highly accurate (Sensitivity: MiSeq 100%, PGM 99.1%. Positive predictive value: MiSeq 95.9%, PGM 95.5%. At the time of the study the Ion Torrent PGM had a lower capital cost and individual runs were cheaper and faster. The MiSeq had a higher capacity (requiring fewer runs, with reduced hands-on time and simpler laboratory workflows. Both provide significant cost and time savings over conventional methods, even allowing for adjunct Sanger sequencing to validate findings and sequence exons missed by NGS.MiSeq and Ion Torrent PGM both provide accurate variant detection as part of a PCR-based molecular diagnostic workflow, and provide alternative platforms for molecular diagnosis of inherited cardiac conditions. Though there were performance differences at this throughput, platforms differed primarily in terms of cost, scalability, protocol stability and ease of use. Compared with current molecular genetic diagnostic tests for inherited cardiac arrhythmias

  17. Incorporation of unique molecular identifiers in TruSeq adapters improves the accuracy of quantitative sequencing.

    Science.gov (United States)

    Hong, Jungeui; Gresham, David

    2017-11-01

    Quantitative analysis of next-generation sequencing (NGS) data requires discriminating duplicate reads generated by PCR from identical molecules that are of unique origin. Typically, PCR duplicates are identified as sequence reads that align to the same genomic coordinates using reference-based alignment. However, identical molecules can be independently generated during library preparation. Misidentification of these molecules as PCR duplicates can introduce unforeseen biases during analyses. Here, we developed a cost-effective sequencing adapter design by modifying Illumina TruSeq adapters to incorporate a unique molecular identifier (UMI) while maintaining the capacity to undertake multiplexed, single-index sequencing. Incorporation of UMIs into TruSeq adapters (TrUMIseq adapters) enables identification of bona fide PCR duplicates as identically mapped reads with identical UMIs. Using TrUMIseq adapters, we show that accurate removal of PCR duplicates results in improved accuracy of both allele frequency (AF) estimation in heterogeneous populations using DNA sequencing and gene expression quantification using RNA-Seq.

  18. Concurrent and Accurate Short Read Mapping on Multicore Processors.

    Science.gov (United States)

    Martínez, Héctor; Tárraga, Joaquín; Medina, Ignacio; Barrachina, Sergio; Castillo, Maribel; Dopazo, Joaquín; Quintana-Ortí, Enrique S

    2015-01-01

    We introduce a parallel aligner with a work-flow organization for fast and accurate mapping of RNA sequences on servers equipped with multicore processors. Our software, HPG Aligner SA (HPG Aligner SA is an open-source application. The software is available at http://www.opencb.org, exploits a suffix array to rapidly map a large fraction of the RNA fragments (reads), as well as leverages the accuracy of the Smith-Waterman algorithm to deal with conflictive reads. The aligner is enhanced with a careful strategy to detect splice junctions based on an adaptive division of RNA reads into small segments (or seeds), which are then mapped onto a number of candidate alignment locations, providing crucial information for the successful alignment of the complete reads. The experimental results on a platform with Intel multicore technology report the parallel performance of HPG Aligner SA, on RNA reads of 100-400 nucleotides, which excels in execution time/sensitivity to state-of-the-art aligners such as TopHat 2+Bowtie 2, MapSplice, and STAR.

  19. The quest for rare variants: pooled multiplexed next generation sequencing in plants.

    Science.gov (United States)

    Marroni, Fabio; Pinosio, Sara; Morgante, Michele

    2012-01-01

    Next generation sequencing (NGS) instruments produce an unprecedented amount of sequence data at contained costs. This gives researchers the possibility of designing studies with adequate power to identify rare variants at a fraction of the economic and labor resources required by individual Sanger sequencing. As of today, few research groups working in plant sciences have exploited this potentiality, showing that pooled NGS provides results in excellent agreement with those obtained by individual Sanger sequencing. The aim of this review is to convey to the reader the general ideas underlying the use of pooled NGS for the identification of rare variants. To facilitate a thorough understanding of the possibilities of the method, we will explain in detail the possible experimental and analytical approaches and discuss their advantages and disadvantages. We will show that information on allele frequency obtained by pooled NGS can be used to accurately compute basic population genetics indexes such as allele frequency, nucleotide diversity, and Tajima's D. Finally, we will discuss applications and future perspectives of the multiplexed NGS approach.

  20. Accurate Evaluation of Quantum Integrals

    Science.gov (United States)

    Galant, D. C.; Goorvitch, D.; Witteborn, Fred C. (Technical Monitor)

    1995-01-01

    Combining an appropriate finite difference method with Richardson's extrapolation results in a simple, highly accurate numerical method for solving a Schrodinger's equation. Important results are that error estimates are provided, and that one can extrapolate expectation values rather than the wavefunctions to obtain highly accurate expectation values. We discuss the eigenvalues, the error growth in repeated Richardson's extrapolation, and show that the expectation values calculated on a crude mesh can be extrapolated to obtain expectation values of high accuracy.

  1. Improving transcriptome assembly through error correction of high-throughput sequence reads

    Directory of Open Access Journals (Sweden)

    Matthew D. MacManes

    2013-07-01

    Full Text Available The study of functional genomics, particularly in non-model organisms, has been dramatically improved over the last few years by the use of transcriptomes and RNAseq. While these studies are potentially extremely powerful, a computationally intensive procedure, the de novo construction of a reference transcriptome must be completed as a prerequisite to further analyses. The accurate reference is critically important as all downstream steps, including estimating transcript abundance are critically dependent on the construction of an accurate reference. Though a substantial amount of research has been done on assembly, only recently have the pre-assembly procedures been studied in detail. Specifically, several stand-alone error correction modules have been reported on and, while they have shown to be effective in reducing errors at the level of sequencing reads, how error correction impacts assembly accuracy is largely unknown. Here, we show via use of a simulated and empiric dataset, that applying error correction to sequencing reads has significant positive effects on assembly accuracy, and should be applied to all datasets. A complete collection of commands which will allow for the production of Reptile corrected reads is available at https://github.com/macmanes/error_correction/tree/master/scripts and as File S1.

  2. Alignment-free Transcriptomic and Metatranscriptomic Comparison Using Sequencing Signatures with Variable Length Markov Chains.

    Science.gov (United States)

    Liao, Weinan; Ren, Jie; Wang, Kun; Wang, Shun; Zeng, Feng; Wang, Ying; Sun, Fengzhu

    2016-11-23

    The comparison between microbial sequencing data is critical to understand the dynamics of microbial communities. The alignment-based tools analyzing metagenomic datasets require reference sequences and read alignments. The available alignment-free dissimilarity approaches model the background sequences with Fixed Order Markov Chain (FOMC) yielding promising results for the comparison of microbial communities. However, in FOMC, the number of parameters grows exponentially with the increase of the order of Markov Chain (MC). Under a fixed high order of MC, the parameters might not be accurately estimated owing to the limitation of sequencing depth. In our study, we investigate an alternative to FOMC to model background sequences with the data-driven Variable Length Markov Chain (VLMC) in metatranscriptomic data. The VLMC originally designed for long sequences was extended to apply to high-throughput sequencing reads and the strategies to estimate the corresponding parameters were developed. The flexible number of parameters in VLMC avoids estimating the vast number of parameters of high-order MC under limited sequencing depth. Different from the manual selection in FOMC, VLMC determines the MC order adaptively. Several beta diversity measures based on VLMC were applied to compare the bacterial RNA-Seq and metatranscriptomic datasets. Experiments show that VLMC outperforms FOMC to model the background sequences in transcriptomic and metatranscriptomic samples. A software pipeline is available at https://d2vlmc.codeplex.com.

  3. Detecting false positive sequence homology: a machine learning approach.

    Science.gov (United States)

    Fujimoto, M Stanley; Suvorov, Anton; Jensen, Nicholas O; Clement, Mark J; Bybee, Seth M

    2016-02-24

    Accurate detection of homologous relationships of biological sequences (DNA or amino acid) amongst organisms is an important and often difficult task that is essential to various evolutionary studies, ranging from building phylogenies to predicting functional gene annotations. There are many existing heuristic tools, most commonly based on bidirectional BLAST searches that are used to identify homologous genes and combine them into two fundamentally distinct classes: orthologs and paralogs. Due to only using heuristic filtering based on significance score cutoffs and having no cluster post-processing tools available, these methods can often produce multiple clusters constituting unrelated (non-homologous) sequences. Therefore sequencing data extracted from incomplete genome/transcriptome assemblies originated from low coverage sequencing or produced by de novo processes without a reference genome are susceptible to high false positive rates of homology detection. In this paper we develop biologically informative features that can be extracted from multiple sequence alignments of putative homologous genes (orthologs and paralogs) and further utilized in context of guided experimentation to verify false positive outcomes. We demonstrate that our machine learning method trained on both known homology clusters obtained from OrthoDB and randomly generated sequence alignments (non-homologs), successfully determines apparent false positives inferred by heuristic algorithms especially among proteomes recovered from low-coverage RNA-seq data. Almost ~42 % and ~25 % of predicted putative homologies by InParanoid and HaMStR respectively were classified as false positives on experimental data set. Our process increases the quality of output from other clustering algorithms by providing a novel post-processing method that is both fast and efficient at removing low quality clusters of putative homologous genes recovered by heuristic-based approaches.

  4. Information-Theoretic Properties of Auditory Sequences Dynamically Influence Expectation and Memory.

    Science.gov (United States)

    Agres, Kat; Abdallah, Samer; Pearce, Marcus

    2018-01-01

    A basic function of cognition is to detect regularities in sensory input to facilitate the prediction and recognition of future events. It has been proposed that these implicit expectations arise from an internal predictive coding model, based on knowledge acquired through processes such as statistical learning, but it is unclear how different types of statistical information affect listeners' memory for auditory stimuli. We used a combination of behavioral and computational methods to investigate memory for non-linguistic auditory sequences. Participants repeatedly heard tone sequences varying systematically in their information-theoretic properties. Expectedness ratings of tones were collected during three listening sessions, and a recognition memory test was given after each session. Information-theoretic measures of sequential predictability significantly influenced listeners' expectedness ratings, and variations in these properties had a significant impact on memory performance. Predictable sequences yielded increasingly better memory performance with increasing exposure. Computational simulations using a probabilistic model of auditory expectation suggest that listeners dynamically formed a new, and increasingly accurate, implicit cognitive model of the information-theoretic structure of the sequences throughout the experimental session. Copyright © 2017 Cognitive Science Society, Inc.

  5. Next generation sequencing reveals the hidden diversity of zooplankton assemblages.

    Directory of Open Access Journals (Sweden)

    Penelope K Lindeque

    become increasingly attractive in future if sequence reference libraries of accurately identified individuals are better populated.

  6. DNA Methylation Status of the Interspersed Repetitive Sequences for LINE-1, Alu, HERV-E, and HERV-K in Trabeculectomy Specimens from Glaucoma Eyes

    Directory of Open Access Journals (Sweden)

    Sunee Chansangpetch

    2018-01-01

    Full Text Available Background/Aims. Epigenetic mechanisms via DNA methylation may be related to glaucoma pathogenesis. This study aimed to determine the global DNA methylation level of the trabeculectomy specimens among patients with different types of glaucoma and normal subjects. Methods. Trabeculectomy sections from 16 primary open-angle glaucoma (POAG, 12 primary angle-closure glaucoma (PACG, 16 secondary glaucoma patients, and 10 normal controls were assessed for DNA methylation using combined-bisulfite restriction analysis. The percentage of global methylation level of the interspersed repetitive sequences for LINE-1, Alu, HERV-E, and HERV-K were compared between the 4 groups. Results. There were no significant differences in the methylation for LINE-1 and HERV-E between patients and normal controls. For the Alu marker, the methylation was significantly lower in all types of glaucoma patients compared to controls (POAG 52.19% versus control 52.83%, p=0.021; PACG 51.50% versus control, p=0.005; secondary glaucoma 51.95% versus control, p=0.014, whereas the methylation level of HERV-K was statistically higher in POAG patients compared to controls (POAG 49.22% versus control 48.09%, p=0.017. Conclusions. The trabeculectomy sections had relative DNA hypomethylation of Alu in all glaucoma subtypes and relative DNA hypermethylation of HERV-K in POAG patients. These methylation changes may lead to the fibrotic phenotype in the trabecular meshwork.

  7. Approaching system equilibrium with accurate or not accurate feedback information in a two-route system

    Science.gov (United States)

    Zhao, Xiao-mei; Xie, Dong-fan; Li, Qi

    2015-02-01

    With the development of intelligent transport system, advanced information feedback strategies have been developed to reduce traffic congestion and enhance the capacity. However, previous strategies provide accurate information to travelers and our simulation results show that accurate information brings negative effects, especially in delay case. Because travelers prefer to the best condition route with accurate information, and delayed information cannot reflect current traffic condition but past. Then travelers make wrong routing decisions, causing the decrease of the capacity and the increase of oscillations and the system deviating from the equilibrium. To avoid the negative effect, bounded rationality is taken into account by introducing a boundedly rational threshold BR. When difference between two routes is less than the BR, routes have equal probability to be chosen. The bounded rationality is helpful to improve the efficiency in terms of capacity, oscillation and the gap deviating from the system equilibrium.

  8. Elongation Factor-1α Accurately Reconstructs Relationships Amongst Psyllid Families (Hemiptera: Psylloidea), with Possible Diagnostic Implications.

    Science.gov (United States)

    Martoni, Francesco; Bulman, Simon R; Pitman, Andrew; Armstrong, Karen F

    2017-12-05

    The superfamily Psylloidea (Hemiptera: Sternorrhyncha) lacks a robust multigene phylogeny. This impedes our understanding of the evolution of this group of insects and, consequently, an accurate identification of individuals, of their plant host associations, and their roles as vectors of economically important plant pathogens. The conserved nuclear gene elongation factor-1 alpha (EF-1α) has been valuable as a higher-level phylogenetic marker in insects and it has also been widely used to investigate the evolution of intron/exon structure. To explore evolutionary relationships among Psylloidea, polymerase chain reaction amplification and nucleotide sequencing of a 250-bp EF-1α gene fragment was applied to psyllids belonging to five different families. Introns were detected in three individuals belonging to two families. The nine genera belonging to the family Aphalaridae all lacked introns, highlighting the possibility of using intron presence/absence as a diagnostic tool at a family level. When paired with cytochrome oxidase I gene sequences, the 250 bp EF-1α sequence appeared to be a very promising higher-level phylogenetic marker for psyllids. © The Author(s) 2017. Published by Oxford University Press on behalf of Entomological Society of America. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.

  9. Taxonomic evaluation of selected Ganoderma species and database sequence validation

    Directory of Open Access Journals (Sweden)

    Suldbold Jargalmaa

    2017-07-01

    Full Text Available Species in the genus Ganoderma include several ecologically important and pathogenic fungal species whose medicinal and economic value is substantial. Due to the highly similar morphological features within the Ganoderma, identification of species has relied heavily on DNA sequencing using BLAST searches, which are only reliable if the GenBank submissions are accurately labeled. In this study, we examined 113 specimens collected from 1969 to 2016 from various regions in Korea using morphological features and multigene analysis (internal transcribed spacer, translation elongation factor 1-α, and the second largest subunit of RNA polymerase II. These specimens were identified as four Ganoderma species: G. sichuanense, G. cf. adspersum, G. cf. applanatum, and G. cf. gibbosum. With the exception of G. sichuanense, these species were difficult to distinguish based solely on morphological features. However, phylogenetic analysis at three different loci yielded concordant phylogenetic information, and supported the four species distinctions with high bootstrap support. A survey of over 600 Ganoderma sequences available on GenBank revealed that 65% of sequences were either misidentified or ambiguously labeled. Here, we suggest corrected annotations for GenBank sequences based on our phylogenetic validation and provide updated global distribution patterns for these Ganoderma species.

  10. Taxonomic evaluation of selected Ganoderma species and database sequence validation

    Science.gov (United States)

    Jargalmaa, Suldbold; Eimes, John A.; Park, Myung Soo; Park, Jae Young; Oh, Seung-Yoon

    2017-01-01

    Species in the genus Ganoderma include several ecologically important and pathogenic fungal species whose medicinal and economic value is substantial. Due to the highly similar morphological features within the Ganoderma, identification of species has relied heavily on DNA sequencing using BLAST searches, which are only reliable if the GenBank submissions are accurately labeled. In this study, we examined 113 specimens collected from 1969 to 2016 from various regions in Korea using morphological features and multigene analysis (internal transcribed spacer, translation elongation factor 1-α, and the second largest subunit of RNA polymerase II). These specimens were identified as four Ganoderma species: G. sichuanense, G. cf. adspersum, G. cf. applanatum, and G. cf. gibbosum. With the exception of G. sichuanense, these species were difficult to distinguish based solely on morphological features. However, phylogenetic analysis at three different loci yielded concordant phylogenetic information, and supported the four species distinctions with high bootstrap support. A survey of over 600 Ganoderma sequences available on GenBank revealed that 65% of sequences were either misidentified or ambiguously labeled. Here, we suggest corrected annotations for GenBank sequences based on our phylogenetic validation and provide updated global distribution patterns for these Ganoderma species. PMID:28761785

  11. Clinical sequencing in leukemia with the assistance of artificial intelligence.

    Science.gov (United States)

    Tojo, Arinobu

    2017-01-01

    Next generation sequencing (NGS) of cancer genomes is now becoming a prerequisite for accurate diagnosis and proper treatment in clinical oncology. Because the genomic regions for NGS expand from a certain set of genes to the whole exome or whole genome, the resulting sequence data becomes incredibly enormous and makes it quite laborious to translate the genomic data into medicine, so-called annotation and curation. We organized a clinical sequencing team and established a bidirectional (bed-to-bench and bench-to-bed) system to integrate clinical and genomic data for hematological malignancies. We also started a collaborative research project with IBM Japan to adopt the artificial intelligence Watson for Genomics (WfG) to the pipeline of medical informatics. Genomic DNA was prepared from malignant as well as normal tissues in each patient and subjected to NGS. Sequence data was analyzed using an in-house semi-automated pipeline in combination with WfG, which was used to identify candidate driver mutations and relevant pathways from which applicable drug information was deduced. Currently, we have analyzed more than 150 patients with hematological disorders, including AML and ALL, and obtained many informative findings. In this presentation, I will introduce some of the achievements we have made so far.

  12. Cardiac tumors: optimal cardiac MR sequences and spectrum of imaging appearances.

    LENUS (Irish Health Repository)

    O'Donnell, David H

    2012-02-01

    OBJECTIVE: This article reviews the optimal cardiac MRI sequences for and the spectrum of imaging appearances of cardiac tumors. CONCLUSION: Recent technologic advances in cardiac MRI have resulted in the rapid acquisition of images of the heart with high spatial and temporal resolution and excellent myocardial tissue characterization. Cardiac MRI provides optimal assessment of the location, functional characteristics, and soft-tissue features of cardiac tumors, allowing accurate differentiation of benign and malignant lesions.

  13. A theoretical justification for single molecule peptide sequencing.

    Directory of Open Access Journals (Sweden)

    Jagannath Swaminathan

    2015-02-01

    Full Text Available The proteomes of cells, tissues, and organisms reflect active cellular processes and change continuously in response to intracellular and extracellular cues. Deep, quantitative profiling of the proteome, especially if combined with mRNA and metabolite measurements, should provide an unprecedented view of cell state, better revealing functions and interactions of cell components. Molecular diagnostics and biomarker discovery should benefit particularly from the accurate quantification of proteomes, since complex diseases like cancer change protein abundances and modifications. Currently, shotgun mass spectrometry is the primary technology for high-throughput protein identification and quantification; while powerful, it lacks high sensitivity and coverage. We draw parallels with next-generation DNA sequencing and propose a strategy, termed fluorosequencing, for sequencing peptides in a complex protein sample at the level of single molecules. In the proposed approach, millions of individual fluorescently labeled peptides are visualized in parallel, monitoring changing patterns of fluorescence intensity as N-terminal amino acids are sequentially removed, and using the resulting fluorescence signatures (fluorosequences to uniquely identify individual peptides. We introduce a theoretical foundation for fluorosequencing and, by using Monte Carlo computer simulations, we explore its feasibility, anticipate the most likely experimental errors, quantify their potential impact, and discuss the broad potential utility offered by a high-throughput peptide sequencing technology.

  14. Development of taxon-specific sequence characterized amplified region (SCAR) markers based on actin sequences and DNA amplification fingerprinting (DAF): a case study in the Phoma exigua species complex.

    Science.gov (United States)

    Aveskamp, Maikel M; Woudenberg, Joyce H C; de Gruyter, Johannes; Turco, Elena; Groenewald, Johannes Z; Crous, Pedro W

    2009-05-01

    Phoma exigua is considered to be an assemblage of at least nine varieties that are mainly distinguished on the basis of host specificity and pathogenicity. However, these varieties are also reported to be weak pathogens and secondary invaders on non-host tissue. In practice, it is difficult to distinguish P. exigua from its close relatives and to correctly identify isolates up to the variety level, because of their low genetic variation and high morphological similarity. Because of quarantine issues and phytosanitary measures, a robust DNA-based tool is required for accurate and rapid identification of the separate taxa in this species complex. The present study therefore aims to develop such a tool based on unique nucleotide sequence identifiers. More than 60 strains of P. exigua and related species were compared in terms of partial actin gene sequences, or analysed using DNA amplification fingerprinting (DAF) with short, arbitrary, mini-hairpin primers. Fragments in the fingerprint unique to a single taxon were identified, purified and sequenced. Alignment of the sequence data and subsequent primer trials led to the identification of taxon-specific sequence characterized amplified regions (SCARs), and to a set of specific oligonucleotide combinations that can be used to identify these organisms in plant quarantine inspections.

  15. Accurate Breakpoint Mapping in Apparently Balanced Translocation Families with Discordant Phenotypes Using Whole Genome Mate-Pair Sequencing

    DEFF Research Database (Denmark)

    Aristidou, Constantia; Koufaris, Costas; Theodosiou, Athina

    2017-01-01

    Familial apparently balanced translocations (ABTs) segregating with discordant phenotypes are extremely challenging for interpretation and counseling due to the scarcity of publications and lack of routine techniques for quick investigation. Recently, next generation sequencing has emerged...... and non-affected members carrying the same translocations. PTCD1, ATP5J2-PTCD1, CADPS2, and STPG1 were disrupted by the translocations in three families, rendering them initially as possible disease candidate genes. However, subsequent mutation screening and structural variant analysis did not reveal any...... can also be used in routine clinical investigation of ABT cases. Unlike de novo translocations, no associations were determined here between familial two-way ABTs and the phenotype of the affected members, in which the presence of cryptic imbalances and complex chromosomal rearrangements has been...

  16. Continuous Distributed Representation of Biological Sequences for Deep Proteomics and Genomics.

    Directory of Open Access Journals (Sweden)

    Ehsaneddin Asgari

    Full Text Available We introduce a new representation and feature extraction method for biological sequences. Named bio-vectors (BioVec to refer to biological sequences in general with protein-vectors (ProtVec for proteins (amino-acid sequences and gene-vectors (GeneVec for gene sequences, this representation can be widely used in applications of deep learning in proteomics and genomics. In the present paper, we focus on protein-vectors that can be utilized in a wide array of bioinformatics investigations such as family classification, protein visualization, structure prediction, disordered protein identification, and protein-protein interaction prediction. In this method, we adopt artificial neural network approaches and represent a protein sequence with a single dense n-dimensional vector. To evaluate this method, we apply it in classification of 324,018 protein sequences obtained from Swiss-Prot belonging to 7,027 protein families, where an average family classification accuracy of 93%±0.06% is obtained, outperforming existing family classification methods. In addition, we use ProtVec representation to predict disordered proteins from structured proteins. Two databases of disordered sequences are used: the DisProt database as well as a database featuring the disordered regions of nucleoporins rich with phenylalanine-glycine repeats (FG-Nups. Using support vector machine classifiers, FG-Nup sequences are distinguished from structured protein sequences found in Protein Data Bank (PDB with a 99.8% accuracy, and unstructured DisProt sequences are differentiated from structured DisProt sequences with 100.0% accuracy. These results indicate that by only providing sequence data for various proteins into this model, accurate information about protein structure can be determined. Importantly, this model needs to be trained only once and can then be applied to extract a comprehensive set of information regarding proteins of interest. Moreover, this representation can be

  17. Improvements and impacts of GRCh38 human reference on high throughput sequencing data analysis.

    Science.gov (United States)

    Guo, Yan; Dai, Yulin; Yu, Hui; Zhao, Shilin; Samuels, David C; Shyr, Yu

    2017-03-01

    Analyses of high throughput sequencing data starts with alignment against a reference genome, which is the foundation for all re-sequencing data analyses. Each new release of the human reference genome has been augmented with improved accuracy and completeness. It is presumed that the latest release of human reference genome, GRCh38 will contribute more to high throughput sequencing data analysis by providing more accuracy. But the amount of improvement has not yet been quantified. We conducted a study to compare the genomic analysis results between the GRCh38 reference and its predecessor GRCh37. Through analyses of alignment, single nucleotide polymorphisms, small insertion/deletions, copy number and structural variants, we show that GRCh38 offers overall more accurate analysis of human sequencing data. More importantly, GRCh38 produced fewer false positive structural variants. In conclusion, GRCh38 is an improvement over GRCh37 not only from the genome assembly aspect, but also yields more reliable genomic analysis results. Copyright © 2017. Published by Elsevier Inc.

  18. Recombination in the 5' leader of murine leukemia virus is accurate and influenced by sequence identity with a strong bias toward the kissing-loop dimerization region

    DEFF Research Database (Denmark)

    Mikkelsen, J G; Lund, Anders Henrik; Duch, M

    1998-01-01

    during minus-strand DNA synthesis occurred within defined areas of the genome and did not lead to misincorporations at the crossover site. The nonrandom distribution of recombination sites did not reflect a bias for specific sites due to selection at the level of marker gene expression. We address...... whether template switching is affected by the length of sequence identity, by palindromic sequences, and/or by putative stem-loop structures. Sixteen of 24 sites of recombination colocalized with the kissing-loop dimerization region, and we propose that RNA-RNA interactions between palindromic sequences...

  19. PCR-Free Enrichment of Mitochondrial DNA from Human Blood and Cell Lines for High Quality Next-Generation DNA Sequencing.

    Directory of Open Access Journals (Sweden)

    Meetha P Gould

    Full Text Available Recent advances in sequencing technology allow for accurate detection of mitochondrial sequence variants, even those in low abundance at heteroplasmic sites. Considerable sequencing cost savings can be achieved by enriching samples for mitochondrial (relative to nuclear DNA. Reduction in nuclear DNA (nDNA content can also help to avoid false positive variants resulting from nuclear mitochondrial sequences (numts. We isolate intact mitochondrial organelles from both human cell lines and blood components using two separate methods: a magnetic bead binding protocol and differential centrifugation. DNA is extracted and further enriched for mitochondrial DNA (mtDNA by an enzyme digest. Only 1 ng of the purified DNA is necessary for library preparation and next generation sequence (NGS analysis. Enrichment methods are assessed and compared using mtDNA (versus nDNA content as a metric, measured by using real-time quantitative PCR and NGS read analysis. Among the various strategies examined, the optimal is differential centrifugation isolation followed by exonuclease digest. This strategy yields >35% mtDNA reads in blood and cell lines, which corresponds to hundreds-fold enrichment over baseline. The strategy also avoids false variant calls that, as we show, can be induced by the long-range PCR approaches that are the current standard in enrichment procedures. This optimization procedure allows mtDNA enrichment for efficient and accurate massively parallel sequencing, enabling NGS from samples with small amounts of starting material. This will decrease costs by increasing the number of samples that may be multiplexed, ultimately facilitating efforts to better understand mitochondria-related diseases.

  20. Distilled single-cell genome sequencing and de novo assembly for sparse microbial communities.

    Science.gov (United States)

    Taghavi, Zeinab; Movahedi, Narjes S; Draghici, Sorin; Chitsaz, Hamidreza

    2013-10-01

    Identification of every single genome present in a microbial sample is an important and challenging task with crucial applications. It is challenging because there are typically millions of cells in a microbial sample, the vast majority of which elude cultivation. The most accurate method to date is exhaustive single-cell sequencing using multiple displacement amplification, which is simply intractable for a large number of cells. However, there is hope for breaking this barrier, as the number of different cell types with distinct genome sequences is usually much smaller than the number of cells. Here, we present a novel divide and conquer method to sequence and de novo assemble all distinct genomes present in a microbial sample with a sequencing cost and computational complexity proportional to the number of genome types, rather than the number of cells. The method is implemented in a tool called Squeezambler. We evaluated Squeezambler on simulated data. The proposed divide and conquer method successfully reduces the cost of sequencing in comparison with the naïve exhaustive approach. Squeezambler and datasets are available at http://compbio.cs.wayne.edu/software/squeezambler/.

  1. An integrative variant analysis suite for whole exome next-generation sequencing data

    Directory of Open Access Journals (Sweden)

    Challis Danny

    2012-01-01

    Full Text Available Abstract Background Whole exome capture sequencing allows researchers to cost-effectively sequence the coding regions of the genome. Although the exome capture sequencing methods have become routine and well established, there is currently a lack of tools specialized for variant calling in this type of data. Results Using statistical models trained on validated whole-exome capture sequencing data, the Atlas2 Suite is an integrative variant analysis pipeline optimized for variant discovery on all three of the widely used next generation sequencing platforms (SOLiD, Illumina, and Roche 454. The suite employs logistic regression models in conjunction with user-adjustable cutoffs to accurately separate true SNPs and INDELs from sequencing and mapping errors with high sensitivity (96.7%. Conclusion We have implemented the Atlas2 Suite and applied it to 92 whole exome samples from the 1000 Genomes Project. The Atlas2 Suite is available for download at http://sourceforge.net/projects/atlas2/. In addition to a command line version, the suite has been integrated into the Genboree Workbench, allowing biomedical scientists with minimal informatics expertise to remotely call, view, and further analyze variants through a simple web interface. The existing genomic databases displayed via the Genboree browser also streamline the process from variant discovery to functional genomics analysis, resulting in an off-the-shelf toolkit for the broader community.

  2. Efficient use of unlabeled data for protein sequence classification: a comparative study.

    Science.gov (United States)

    Kuksa, Pavel; Huang, Pai-Hsi; Pavlovic, Vladimir

    2009-04-29

    Recent studies in computational primary protein sequence analysis have leveraged the power of unlabeled data. For example, predictive models based on string kernels trained on sequences known to belong to particular folds or superfamilies, the so-called labeled data set, can attain significantly improved accuracy if this data is supplemented with protein sequences that lack any class tags-the unlabeled data. In this study, we present a principled and biologically motivated computational framework that more effectively exploits the unlabeled data by only using the sequence regions that are more likely to be biologically relevant for better prediction accuracy. As overly-represented sequences in large uncurated databases may bias the estimation of computational models that rely on unlabeled data, we also propose a method to remove this bias and improve performance of the resulting classifiers. Combined with state-of-the-art string kernels, our proposed computational framework achieves very accurate semi-supervised protein remote fold and homology detection on three large unlabeled databases. It outperforms current state-of-the-art methods and exhibits significant reduction in running time. The unlabeled sequences used under the semi-supervised setting resemble the unpolished gemstones; when used as-is, they may carry unnecessary features and hence compromise the classification accuracy but once cut and polished, they improve the accuracy of the classifiers considerably.

  3. High-Throughput Sequencing, a VersatileWeapon to Support Genome-Based Diagnosis in Infectious Diseases: Applications to Clinical Bacteriology

    Directory of Open Access Journals (Sweden)

    Ségolène Caboche

    2014-04-01

    Full Text Available The recent progresses of high-throughput sequencing (HTS technologies enable easy and cost-reduced access to whole genome sequencing (WGS or re-sequencing. HTS associated with adapted, automatic and fast bioinformatics solutions for sequencing applications promises an accurate and timely identification and characterization of pathogenic agents. Many studies have demonstrated that data obtained from HTS analysis have allowed genome-based diagnosis, which has been consistent with phenotypic observations. These proofs of concept are probably the first steps toward the future of clinical microbiology. From concept to routine use, many parameters need to be considered to promote HTS as a powerful tool to help physicians and clinicians in microbiological investigations. This review highlights the milestones to be completed toward this purpose.

  4. TagDust2: a generic method to extract reads from sequencing data.

    Science.gov (United States)

    Lassmann, Timo

    2015-01-28

    Arguably the most basic step in the analysis of next generation sequencing data (NGS) involves the extraction of mappable reads from the raw reads produced by sequencing instruments. The presence of barcodes, adaptors and artifacts subject to sequencing errors makes this step non-trivial. Here I present TagDust2, a generic approach utilizing a library of hidden Markov models (HMM) to accurately extract reads from a wide array of possible read architectures. TagDust2 extracts more reads of higher quality compared to other approaches. Processing of multiplexed single, paired end and libraries containing unique molecular identifiers is fully supported. Two additional post processing steps are included to exclude known contaminants and filter out low complexity sequences. Finally, TagDust2 can automatically detect the library type of sequenced data from a predefined selection. Taken together TagDust2 is a feature rich, flexible and adaptive solution to go from raw to mappable NGS reads in a single step. The ability to recognize and record the contents of raw reads will help to automate and demystify the initial, and often poorly documented, steps in NGS data analysis pipelines. TagDust2 is freely available at: http://tagdust.sourceforge.net .

  5. Next Generation Sequencing of Actinobacteria for the Discovery of Novel Natural Products

    Science.gov (United States)

    Gomez-Escribano, Juan Pablo; Alt, Silke; Bibb, Mervyn J.

    2016-01-01

    Like many fields of the biosciences, actinomycete natural products research has been revolutionised by next-generation DNA sequencing (NGS). Hundreds of new genome sequences from actinobacteria are made public every year, many of them as a result of projects aimed at identifying new natural products and their biosynthetic pathways through genome mining. Advances in these technologies in the last five years have meant not only a reduction in the cost of whole genome sequencing, but also a substantial increase in the quality of the data, having moved from obtaining a draft genome sequence comprised of several hundred short contigs, sometimes of doubtful reliability, to the possibility of obtaining an almost complete and accurate chromosome sequence in a single contig, allowing a detailed study of gene clusters and the design of strategies for refactoring and full gene cluster synthesis. The impact that these technologies are having in the discovery and study of natural products from actinobacteria, including those from the marine environment, is only starting to be realised. In this review we provide a historical perspective of the field, analyse the strengths and limitations of the most relevant technologies, and share the insights acquired during our genome mining projects. PMID:27089350

  6. Sulfite-induced protein radical formation in LPS aerosol-challenged mice: Implications for sulfite sensitivity in human lung disease

    Directory of Open Access Journals (Sweden)

    Ashutosh Kumar

    2018-05-01

    Full Text Available Exposure to (bisulfite (HSO3– and sulfite (SO32– has been shown to induce a wide range of adverse reactions in sensitive individuals. Studies have shown that peroxidase-catalyzed oxidation of (bisulfite leads to formation of several reactive free radicals, such as sulfur trioxide anion (.SO3–, peroxymonosulfate (–O3SOO., and especially the sulfate (SO4. – anion radicals. One such peroxidase in neutrophils is myeloperoxidase (MPO, which has been shown to form protein radicals. Although formation of (bisulfite-derived protein radicals is documented in isolated neutrophils, its involvement and role in in vivo inflammatory processes, has not been demonstrated. Therefore, we aimed to investigate (bisulfite-derived protein radical formation and its mechanism in LPS aerosol-challenged mice, a model of non-atopic asthma. Using immuno-spin trapping to detect protein radical formation, we show that, in the presence of (bisulfite, neutrophils present in bronchoalveolar lavage and in the lung parenchyma exhibit, MPO-catalyzed oxidation of MPO to a protein radical. The absence of radical formation in LPS-challenged MPO- or NADPH oxidase-knockout mice indicates that sulfite-derived radical formation is dependent on both MPO and NADPH oxidase activity. In addition to its oxidation by the MPO-catalyzed pathway, (bisulfite is efficiently detoxified to sulfate by the sulfite oxidase (SOX pathway, which forms sulfate in a two-electron oxidation reaction. Since SOX activity in rodents is much higher than in humans, to better model sulfite toxicity in humans, we induced SOX deficiency in mice by feeding them a low molybdenum diet with tungstate. We found that mice treated with the SOX deficiency diet prior to exposure to (bisulfite had much higher protein radical formation than mice with normal SOX activity. Altogether, these results demonstrate the role of MPO and NADPH oxidase in (bisulfite-derived protein radical formation and show the involvement of

  7. Assessing the utility of the Oxford Nanopore MinION for snake venom gland cDNA sequencing

    Directory of Open Access Journals (Sweden)

    Adam D. Hargreaves

    2015-11-01

    Full Text Available Portable DNA sequencers such as the Oxford Nanopore MinION device have the potential to be truly disruptive technologies, facilitating new approaches and analyses and, in some cases, taking sequencing out of the lab and into the field. However, the capabilities of these technologies are still being revealed. Here we show that single-molecule cDNA sequencing using the MinION accurately characterises venom toxin-encoding genes in the painted saw-scaled viper, Echis coloratus. We find the raw sequencing error rate to be around 12%, improved to 0–2% with hybrid error correction and 3% with de novo error correction. Our corrected data provides full coding sequences and 5′ and 3′ UTRs for 29 of 33 candidate venom toxins detected, far superior to Illumina data (13/40 complete and Sanger-based ESTs (15/29. We suggest that, should the current pace of improvement continue, the MinION will become the default approach for cDNA sequencing in a variety of species.

  8. Targeted genotyping-by-sequencing permits cost-effective identification and discrimination of pasture grass species and cultivars.

    Science.gov (United States)

    Pembleton, Luke W; Drayton, Michelle C; Bain, Melissa; Baillie, Rebecca C; Inch, Courtney; Spangenberg, German C; Wang, Junping; Forster, John W; Cogan, Noel O I

    2016-05-01

    A targeted amplicon-based genotyping-by-sequencing approach has permitted cost-effective and accurate discrimination between ryegrass species (perennial, Italian and inter-species hybrid), and identification of cultivars based on bulked samples. Perennial ryegrass and Italian ryegrass are the most important temperate forage species for global agriculture, and are represented in the commercial pasture seed market by numerous cultivars each composed of multiple highly heterozygous individuals. Previous studies have identified difficulties in the use of morphophysiological criteria to discriminate between these two closely related taxa. Recently, a highly multiplexed single nucleotide polymorphism (SNP)-based genotyping assay has been developed that permits accurate differentiation between both species and cultivars of ryegrasses at the genetic level. This assay has since been further developed into an amplicon-based genotyping-by-sequencing (GBS) approach implemented on a second-generation sequencing platform, allowing accelerated throughput and ca. sixfold reduction in cost. Using the GBS approach, 63 cultivars of perennial, Italian and interspecific hybrid ryegrasses, as well as intergeneric Festulolium hybrids, were genotyped. The genetic relationships between cultivars were interpreted in terms of known breeding histories and indistinct species boundaries within the Lolium genus, as well as suitability of current cultivar registration methodologies. An example of applicability to quality assurance and control (QA/QC) of seed purity is also described. Rapid, low-cost genotypic assays provide new opportunities for breeders to more fully explore genetic diversity within breeding programs, allowing the combination of novel unique genetic backgrounds. Such tools also offer the potential to more accurately define cultivar identities, allowing protection of varieties in the commercial market and supporting processes of cultivar accreditation and quality assurance.

  9. Comparison of PIV with 4D-Flow in a physiological accurate flow phantom

    Science.gov (United States)

    Sansom, Kurt; Balu, Niranjan; Liu, Haining; Aliseda, Alberto; Yuan, Chun; Canton, Maria De Gador

    2016-11-01

    Validation of 4D MRI flow sequences with planar particle image velocimetry (PIV) is performed in a physiologically-accurate flow phantom. A patient-specific phantom of a carotid artery is connected to a pulsatile flow loop to simulate the 3D unsteady flow in the cardiovascular anatomy. Cardiac-cycle synchronized MRI provides time-resolved 3D blood velocity measurements in clinical tool that is promising but lacks a robust validation framework. PIV at three different Reynolds numbers (540, 680, and 815, chosen based on +/- 20 % of the average velocity from the patient-specific CCA waveform) and four different Womersley numbers (3.30, 3.68, 4.03, and 4.35, chosen to reflect a physiological range of heart rates) are compared to 4D-MRI measurements. An accuracy assessment of raw velocity measurements and a comparison of estimated and measureable flow parameters such as wall shear stress, fluctuating velocity rms, and Lagrangian particle residence time, will be presented, with justification for their biomechanics relevance to the pathophysiology of arterial disease: atherosclerosis and intimal hyperplasia. Lastly, the framework is applied to a new 4D-Flow MRI sequence and post processing techniques to provide a quantitative assessment with the benchmarked data. Department of Education GAANN Fellowship.

  10. Are Escherichia coli Pathotypes Still Relevant in the Era of Whole-Genome Sequencing?

    Science.gov (United States)

    Robins-Browne, Roy M.; Holt, Kathryn E.; Ingle, Danielle J.; Hocking, Dianna M.; Yang, Ji; Tauschek, Marija

    2016-01-01

    The empirical and pragmatic nature of diagnostic microbiology has given rise to several different schemes to subtype E.coli, including biotyping, serotyping, and pathotyping. These schemes have proved invaluable in identifying and tracking outbreaks, and for prognostication in individual cases of infection, but they are imprecise and potentially misleading due to the malleability and continuous evolution of E. coli. Whole genome sequencing can be used to accurately determine E. coli subtypes that are based on allelic variation or differences in gene content, such as serotyping and pathotyping. Whole genome sequencing also provides information about single nucleotide polymorphisms in the core genome of E. coli, which form the basis of sequence typing, and is more reliable than other systems for tracking the evolution and spread of individual strains. A typing scheme for E. coli based on genome sequences that includes elements of both the core and accessory genomes, should reduce typing anomalies and promote understanding of how different varieties of E. coli spread and cause disease. Such a scheme could also define pathotypes more precisely than current methods. PMID:27917373

  11. Fast discovery and visualization of conserved regions in DNA sequences using quasi-alignment.

    Science.gov (United States)

    Nagar, Anurag; Hahsler, Michael

    2013-01-01

    Next Generation Sequencing techniques are producing enormous amounts of biological sequence data and analysis becomes a major computational problem. Currently, most analysis, especially the identification of conserved regions, relies heavily on Multiple Sequence Alignment and its various heuristics such as progressive alignment, whose run time grows with the square of the number and the length of the aligned sequences and requires significant computational resources. In this work, we present a method to efficiently discover regions of high similarity across multiple sequences without performing expensive sequence alignment. The method is based on approximating edit distance between segments of sequences using p-mer frequency counts. Then, efficient high-throughput data stream clustering is used to group highly similar segments into so called quasi-alignments. Quasi-alignments have numerous applications such as identifying species and their taxonomic class from sequences, comparing sequences for similarities, and, as in this paper, discovering conserved regions across related sequences. In this paper, we show that quasi-alignments can be used to discover highly similar segments across multiple sequences from related or different genomes efficiently and accurately. Experiments on a large number of unaligned 16S rRNA sequences obtained from the Greengenes database show that the method is able to identify conserved regions which agree with known hypervariable regions in 16S rRNA. Furthermore, the experiments show that the proposed method scales well for large data sets with a run time that grows only linearly with the number and length of sequences, whereas for existing multiple sequence alignment heuristics the run time grows super-linearly. Quasi-alignment-based algorithms can detect highly similar regions and conserved areas across multiple sequences. Since the run time is linear and the sequences are converted into a compact clustering model, we are able to

  12. Synthesis Study Of Surfactants Sodium Ligno Sulphonate (SLS From Biomass Waste Using Fourier Transform Infra Red (FTIR

    Directory of Open Access Journals (Sweden)

    Priyanto Slamet

    2018-01-01

    Full Text Available Lignin from biomass waste (Black Liquor was isolated by using sulfuric acid 25% and sodium hydroxide solutions 2N. The obtained lignin was reacted with Sodium Bisulfite to Sodium Ligno Sulfonate (SLS. The best result was achieved at 80 ° C, pH 9, ratio of lignin and bisulfite 4: 1, for 2 hours, and 290 rpm stirring rate. The result of lignin formed was sulfonated using Sodium Bisulfite (NaHSO3 to Sodium Ligno Sulfonate (SLS whose results were tested by the role of groups in peak formation by FTIR and compared to the spectrum of Sodium Ligno Sulfonate made from pure Lignin (commercial reacted with the commercial Sodium Bisulfite. The result can be seen by the typical functional groups present in the SLS.

  13. An ongoing earthquake sequence near Dhaka, Bangladesh, from regional recordings

    Science.gov (United States)

    Howe, M.; Mondal, D. R.; Akhter, S. H.; Kim, W.; Seeber, L.; Steckler, M. S.

    2013-12-01

    of this population to earthquakes is amplified by poor infrastructure and building codes. The only event in this sequence included in the global Centroid Moment Tensor (CMT) catalog is a Mw 5.1 strike-slip event 18 km deep. At least 10 events in this sequence have been recorded globally (ISC). Many more events from the sequence have been recorded by a regional array of seismographs we have operated in Bangladesh since 2007. We apply several techniques to these data to explore source parameters and dimensions of seismogenesis in this sequence. We present both double-difference relocations and waveform modeling, which provide constraints on the source characteristics. Using the Mw 5.1 and other regional events as calibration, we obtain source parameters for several other events in the sequence. This sequence is ideal for double-difference relocation techniques because the source-receiver paths of the events in the sequence, recorded regionally, are very similar. The event relocation enables us to obtain accurate estimates of fault dimensions of this source. By combining accurate spatial dimensions of the source, the depth range of seismogenesis for the source zone, and well-constrained source parameters of events within the sequence, it we assess the maximum size of possible ruptures in this source.

  14. Molecular Profiling of Microbial Communities from Contaminated Sources: Use of Subtractive Cloning Methods and rDNA Spacer Sequences; FINAL

    International Nuclear Information System (INIS)

    Robb, Frank T.

    2001-01-01

    The major objective of this research was to provide appropriate sequences and assemble a DNA array of oligonucleotides to be used for rapid profiling of microbial populations from polluted areas and other areas of interest. The sequences to be assigned to the DNA array were chosen from cloned genomic DNA taken from groundwater sites having well characterized pollutant histories at Hanford Nuclear Plant and Lawrence Livermore Site 300. Glass-slide arrays were made and tested; and a new multiplexed, bead-based method was developed that uses nucleic acid hybridization on the surface of microscopic polystyrene spheres to identify specific sequences in heterogeneous mixtures of DNA sequences. The test data revealed considerable strain variation between sample sites showing a striking distribution of sequences. It also suggests that diversity varies greatly with bioremediation, and that there are many bacterial intergenic spacer region sequences that can indicate its effects. The bead method exhibited superior sequence discrimination and has features for easier and more accurate measurement

  15. Epigenetic Loss of MLH1 Expression in Normal Human Hematopoietic Stem Cell Clones is Defined by the Promoter CpG Methylation Pattern Observed by High-Throughput Methylation Specific Sequencing.

    Science.gov (United States)

    Kenyon, Jonathan; Nickel-Meester, Gabrielle; Qing, Yulan; Santos-Guasch, Gabriela; Drake, Ellen; PingfuFu; Sun, Shuying; Bai, Xiaodong; Wald, David; Arts, Eric; Gerson, Stanton L

    Normal human hematopoietic stem and progenitor cells (HPC) lose expression of MLH1 , an important mismatch repair (MMR) pathway gene, with age. Loss of MMR leads to replication dependent mutational events and microsatellite instability observed in secondary acute myelogenous leukemia and other hematologic malignancies. Epigenetic CpG methylation upstream of the MLH1 promoter is a contributing factor to acquired loss of MLH1 expression in tumors of the epithelia and proximal mucosa. Using single molecule high-throughput bisulfite sequencing we have characterized the CpG methylation landscape from -938 to -337 bp upstream of the MLH1 transcriptional start site (position +0), from 30 hematopoietic colony forming cell clones (CFC) either expressing or not expressing MLH1 . We identify a correlation between MLH1 promoter methylation and loss of MLH1 expression. Additionally, using the CpG site methylation frequencies obtained in this study we were able to generate a classification algorithm capable of sorting the expressing and non-expressing CFC. Thus, as has been previously described for many tumor cell types, we report for the first time a correlation between the loss of MLH1 expression and increased MLH1 promoter methylation in CFC derived from CD34 + selected hematopoietic stem and progenitor cells.

  16. SPARSE: quadratic time simultaneous alignment and folding of RNAs without sequence-based heuristics.

    Science.gov (United States)

    Will, Sebastian; Otto, Christina; Miladi, Milad; Möhl, Mathias; Backofen, Rolf

    2015-08-01

    RNA-Seq experiments have revealed a multitude of novel ncRNAs. The gold standard for their analysis based on simultaneous alignment and folding suffers from extreme time complexity of [Formula: see text]. Subsequently, numerous faster 'Sankoff-style' approaches have been suggested. Commonly, the performance of such methods relies on sequence-based heuristics that restrict the search space to optimal or near-optimal sequence alignments; however, the accuracy of sequence-based methods breaks down for RNAs with sequence identities below 60%. Alignment approaches like LocARNA that do not require sequence-based heuristics, have been limited to high complexity ([Formula: see text] quartic time). Breaking this barrier, we introduce the novel Sankoff-style algorithm 'sparsified prediction and alignment of RNAs based on their structure ensembles (SPARSE)', which runs in quadratic time without sequence-based heuristics. To achieve this low complexity, on par with sequence alignment algorithms, SPARSE features strong sparsification based on structural properties of the RNA ensembles. Following PMcomp, SPARSE gains further speed-up from lightweight energy computation. Although all existing lightweight Sankoff-style methods restrict Sankoff's original model by disallowing loop deletions and insertions, SPARSE transfers the Sankoff algorithm to the lightweight energy model completely for the first time. Compared with LocARNA, SPARSE achieves similar alignment and better folding quality in significantly less time (speedup: 3.7). At similar run-time, it aligns low sequence identity instances substantially more accurate than RAF, which uses sequence-based heuristics. © The Author 2015. Published by Oxford University Press.

  17. Genetic diagnosis of Duchenne and Becker muscular dystrophy using next-generation sequencing technology: comprehensive mutational search in a single platform.

    Science.gov (United States)

    Lim, Byung Chan; Lee, Seungbok; Shin, Jong-Yeon; Kim, Jong-Il; Hwang, Hee; Kim, Ki Joong; Hwang, Yong Seung; Seo, Jeong-Sun; Chae, Jong Hee

    2011-11-01

    Duchenne muscular dystrophy or Becker muscular dystrophy might be a suitable candidate disease for application of next-generation sequencing in the genetic diagnosis because the complex mutational spectrum and the large size of the dystrophin gene require two or more analytical methods and have a high cost. The authors tested whether large deletions/duplications or small mutations, such as point mutations or short insertions/deletions of the dystrophin gene, could be predicted accurately in a single platform using next-generation sequencing technology. A custom solution-based target enrichment kit was designed to capture whole genomic regions of the dystrophin gene and other muscular-dystrophy-related genes. A multiplexing strategy, wherein four differently bar-coded samples were captured and sequenced together in a single lane of the Illumina Genome Analyser, was applied. The study subjects were 25 16 with deficient dystrophin expression without a large deletion/duplication and 9 with a known large deletion/duplication. Nearly 100% of the exonic region of the dystrophin gene was covered by at least eight reads with a mean read depth of 107. Pathogenic small mutations were identified in 15 of the 16 patients without a large deletion/duplication. Using these 16 patients as the standard, the authors' method accurately predicted the deleted or duplicated exons in the 9 patients with known mutations. Inclusion of non-coding regions and paired-end sequence analysis enabled accurate identification by increasing the read depth and providing information about the breakpoint junction. The current method has an advantage for the genetic diagnosis of Duchenne muscular dystrophy and Becker muscular dystrophy wherein a comprehensive mutational search may be feasible using a single platform.

  18. MinION™ nanopore sequencing of environmental metagenomes: a synthetic approach.

    Science.gov (United States)

    Brown, Bonnie L; Watson, Mick; Minot, Samuel S; Rivera, Maria C; Franklin, Rima B

    2017-03-01

    to 98% assignment accuracy at the species level. The observed community proportions for “equal” and “rare” synthetic libraries were close to the known proportions, deviating from 0.1% to 10% across all tests. For a 20-species mock community with staggered contributions, a sequencing run detected all but 3 species (each included at 99% of reads were assigned to the correct family. Conclusions: At the current level of output and sequence quality (just under 4 × 103 2D reads for a synthetic metagenome), MinION sequencing followed by Kraken or One Codex analysis has the potential to provide rapid and accurate metagenomic analysis where the consortium is comprised of a limited number of taxa. Important considerations noted in this study included: high sensitivity of the MinION platform to the quality of input DNA, high variability of sequencing results across libraries and flow cells, and relatively small numbers of 2D reads per analysis limit. Together, these limited detection of very rare components of the microbial consortia, and would likely limit the utility of MinION for the sequencing of high-complexity metagenomic communities where thousands of taxa are expected. Furthermore, the limitations of the currently available data analysis tools suggest there is considerable room for improvement in the analytical approaches for the characterization of microbial communities using long reads. Nevertheless, the fact that the accurate taxonomic assignment of high-quality reads generated by MinION is approaching 99.5% and, in most cases, the inferred community structure mirrors the known proportions of a synthetic mixture warrants further exploration of practical application to environmental metagenomics as the platform continues to develop and improve. With further improvement in sequence throughput and error rate reduction, this platform shows great promise for precise real-time analysis of the composition and structure of more complex microbial communities. © The

  19. REMap: Operon map of M. tuberculosis based on RNA sequence data.

    Science.gov (United States)

    Pelly, Shaaretha; Winglee, Kathryn; Xia, Fang Fang; Stevens, Rick L; Bishai, William R; Lamichhane, Gyanu

    2016-07-01

    A map of the transcriptional organization of genes of an organism is a basic tool that is necessary to understand and facilitate a more accurate genetic manipulation of the organism. Operon maps are largely generated by computational prediction programs that rely on gene conservation and genome architecture and may not be physiologically relevant. With the widespread use of RNA sequencing (RNAseq), the prediction of operons based on actual transcriptome sequencing rather than computational genomics alone is much needed. Here, we report a validated operon map of Mycobacterium tuberculosis, developed using RNAseq data from both the exponential and stationary phases of growth. At least 58.4% of M. tuberculosis genes are organized into 749 operons. Our prediction algorithm, REMap (RNA Expression Mapping of operons), considers the many cases of transcription coverage of intergenic regions, and avoids dependencies on functional annotation and arbitrary assumptions about gene structure. As a result, we demonstrate that REMap is able to more accurately predict operons, especially those that contain long intergenic regions or functionally unrelated genes, than previous operon prediction programs. The REMap algorithm is publicly available as a user-friendly tool that can be readily modified to predict operons in other bacteria. Copyright © 2016 Elsevier Ltd. All rights reserved.

  20. gCUP: rapid GPU-based HIV-1 co-receptor usage prediction for next-generation sequencing.

    Science.gov (United States)

    Olejnik, Michael; Steuwer, Michel; Gorlatch, Sergei; Heider, Dominik

    2014-11-15

    Next-generation sequencing (NGS) has a large potential in HIV diagnostics, and genotypic prediction models have been developed and successfully tested in the recent years. However, albeit being highly accurate, these computational models lack computational efficiency to reach their full potential. In this study, we demonstrate the use of graphics processing units (GPUs) in combination with a computational prediction model for HIV tropism. Our new model named gCUP, parallelized and optimized for GPU, is highly accurate and can classify >175 000 sequences per second on an NVIDIA GeForce GTX 460. The computational efficiency of our new model is the next step to enable NGS technologies to reach clinical significance in HIV diagnostics. Moreover, our approach is not limited to HIV tropism prediction, but can also be easily adapted to other settings, e.g. drug resistance prediction. The source code can be downloaded at http://www.heiderlab.de d.heider@wz-straubing.de. © The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

  1. Universal sequence map (USM of arbitrary discrete sequences

    Directory of Open Access Journals (Sweden)

    Almeida Jonas S

    2002-02-01

    Full Text Available Abstract Background For over a decade the idea of representing biological sequences in a continuous coordinate space has maintained its appeal but not been fully realized. The basic idea is that any sequence of symbols may define trajectories in the continuous space conserving all its statistical properties. Ideally, such a representation would allow scale independent sequence analysis – without the context of fixed memory length. A simple example would consist on being able to infer the homology between two sequences solely by comparing the coordinates of any two homologous units. Results We have successfully identified such an iterative function for bijective mappingψ of discrete sequences into objects of continuous state space that enable scale-independent sequence analysis. The technique, named Universal Sequence Mapping (USM, is applicable to sequences with an arbitrary length and arbitrary number of unique units and generates a representation where map distance estimates sequence similarity. The novel USM procedure is based on earlier work by these and other authors on the properties of Chaos Game Representation (CGR. The latter enables the representation of 4 unit type sequences (like DNA as an order free Markov Chain transition table. The properties of USM are illustrated with test data and can be verified for other data by using the accompanying web-based tool:http://bioinformatics.musc.edu/~jonas/usm/. Conclusions USM is shown to enable a statistical mechanics approach to sequence analysis. The scale independent representation frees sequence analysis from the need to assume a memory length in the investigation of syntactic rules.

  2. TE-Locate: A Tool to Locate and Group Transposable Element Occurrences Using Paired-End Next-Generation Sequencing Data

    OpenAIRE

    Platzer, Alexander; Nizhynska, Viktoria; Long, Quan

    2012-01-01

    Transposable elements (TEs) are common mobile DNA elements present in nearly all genomes. Since the movement of TEs within a genome can sometimes have phenotypic consequences, an accurate report of TE actions is desirable. To this end, we developed TE-Locate, a computational tool that uses paired-end reads to identify the novel locations of known TEs. TE-Locate can utilize either a database of TE sequences, or annotated TEs within the reference sequence of interest. This makes TE-Locate usefu...

  3. Effects of stacking sequence on impact damage resistance and residual strength for quasi-isotropic laminates

    Science.gov (United States)

    Dost, Ernest F.; Ilcewicz, Larry B.; Avery, William B.; Coxon, Brian R.

    1991-01-01

    Residual strength of an impacted composite laminate is dependent on details of the damage state. Stacking sequence was varied to judge its effect on damage caused by low-velocity impact. This was done for quasi-isotropic layups of a toughened composite material. Experimental observations on changes in the impact damage state and postimpact compressive performance were presented for seven different laminate stacking sequences. The applicability and limitations of analysis compared to experimental results were also discussed. Postimpact compressive behavior was found to be a strong function of the laminate stacking sequence. This relationship was found to depend on thickness, stacking sequence, size, and location of sublaminates that comprise the impact damage state. The postimpact strength for specimens with a relatively symmetric distribution of damage through the laminate thickness was accurately predicted by models that accounted for sublaminate stability and in-plane stress redistribution. An asymmetric distribution of damage in some laminate stacking sequences tended to alter specimen stability. Geometrically nonlinear finite element analysis was used to predict this behavior.

  4. Computational identification of MoRFs in protein sequences.

    Science.gov (United States)

    Malhis, Nawar; Gsponer, Jörg

    2015-06-01

    Intrinsically disordered regions of proteins play an essential role in the regulation of various biological processes. Key to their regulatory function is the binding of molecular recognition features (MoRFs) to globular protein domains in a process known as a disorder-to-order transition. Predicting the location of MoRFs in protein sequences with high accuracy remains an important computational challenge. In this study, we introduce MoRFCHiBi, a new computational approach for fast and accurate prediction of MoRFs in protein sequences. MoRFCHiBi combines the outcomes of two support vector machine (SVM) models that take advantage of two different kernels with high noise tolerance. The first, SVMS, is designed to extract maximal information from the general contrast in amino acid compositions between MoRFs, their surrounding regions (Flanks), and the remainders of the sequences. The second, SVMT, is used to identify similarities between regions in a query sequence and MoRFs of the training set. We evaluated the performance of our predictor by comparing its results with those of two currently available MoRF predictors, MoRFpred and ANCHOR. Using three test sets that have previously been collected and used to evaluate MoRFpred and ANCHOR, we demonstrate that MoRFCHiBi outperforms the other predictors with respect to different evaluation metrics. In addition, MoRFCHiBi is downloadable and fast, which makes it useful as a component in other computational prediction tools. http://www.chibi.ubc.ca/morf/. © The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.

  5. Model morphing and sequence assignment after molecular replacement.

    Science.gov (United States)

    Terwilliger, Thomas C; Read, Randy J; Adams, Paul D; Brunger, Axel T; Afonine, Pavel V; Hung, Li-Wei

    2013-11-01

    A procedure termed `morphing' for improving a model after it has been placed in the crystallographic cell by molecular replacement has recently been developed. Morphing consists of applying a smooth deformation to a model to make it match an electron-density map more closely. Morphing does not change the identities of the residues in the chain, only their coordinates. Consequently, if the true structure differs from the working model by containing different residues, these differences cannot be corrected by morphing. Here, a procedure that helps to address this limitation is described. The goal of the procedure is to obtain a relatively complete model that has accurate main-chain atomic positions and residues that are correctly assigned to the sequence. Residues in a morphed model that do not match the electron-density map are removed. Each segment of the resulting trimmed morphed model is then assigned to the sequence of the molecule using information about the connectivity of the chains from the working model and from connections that can be identified from the electron-density map. The procedure was tested by application to a recently determined structure at a resolution of 3.2 Å and was found to increase the number of correctly identified residues in this structure from the 88 obtained using phenix.resolve sequence assignment alone (Terwilliger, 2003) to 247 of a possible 359. Additionally, the procedure was tested by application to a series of templates with sequence identities to a target structure ranging between 7 and 36%. The mean fraction of correctly identified residues in these cases was increased from 33% using phenix.resolve sequence assignment to 47% using the current procedure. The procedure is simple to apply and is available in the Phenix software package.

  6. Approaches for the accurate definition of geological time boundaries

    Science.gov (United States)

    Schaltegger, Urs; Baresel, Björn; Ovtcharova, Maria; Goudemand, Nicolas; Bucher, Hugo

    2015-04-01

    Which strategies lead to the most precise and accurate date of a given geological boundary? Geological units are usually defined by the occurrence of characteristic taxa and hence boundaries between these geological units correspond to dramatic faunal and/or floral turnovers and they are primarily defined using first or last occurrences of index species, or ideally by the separation interval between two consecutive, characteristic associations of fossil taxa. These boundaries need to be defined in a way that enables their worldwide recognition and correlation across different stratigraphic successions, using tools as different as bio-, magneto-, and chemo-stratigraphy, and astrochronology. Sedimentary sequences can be dated in numerical terms by applying high-precision chemical-abrasion, isotope-dilution, thermal-ionization mass spectrometry (CA-ID-TIMS) U-Pb age determination to zircon (ZrSiO4) in intercalated volcanic ashes. But, though volcanic activity is common in geological history, ashes are not necessarily close to the boundary we would like to date precisely and accurately. In addition, U-Pb zircon data sets may be very complex and difficult to interpret in terms of the age of ash deposition. To overcome these difficulties we use a multi-proxy approach we applied to the precise and accurate dating of the Permo-Triassic and Early-Middle Triassic boundaries in South China. a) Dense sampling of ashes across the critical time interval and a sufficiently large number of analysed zircons per ash sample can guarantee the recognition of all system complexities. Geochronological datasets from U-Pb dating of volcanic zircon may indeed combine effects of i) post-crystallization Pb loss from percolation of hydrothermal fluids (even using chemical abrasion), with ii) age dispersion from prolonged residence of earlier crystallized zircon in the magmatic system. As a result, U-Pb dates of individual zircons are both apparently younger and older than the depositional age

  7. The complete sequence of human chromosome 5

    Energy Technology Data Exchange (ETDEWEB)

    Schmutz, Jeremy; Martin, Joel; Terry, Astrid; Couronne, Olivier; Grimwood, Jane; Lowry, State; Gordon, Laurie A.; Scott, Duncan; Xie, Gary; Huang, Wayne; Hellsten, Uffe; Tran-Gyamfi, Mary; She, Xinwei; Prabhakar, Shyam; Aerts, Andrea; Altherr, Michael; Bajorek, Eva; Black, Stacey; Branscomb, Elbert; Caoile, Chenier; Challacombe, Jean F.; Chan, Yee Man; Denys, Mirian; Detter, Chris; Escobar, Julio; Flowers, Dave; Fotopulos, Dea; Glavina, Tijana; Gomez, Maria; Gonzales, Eidelyn; Goodstenin, David; Grigoriev, Igor; Groza, Matthew; Hammon, Nancy; Hawkins, Trevor; Haydu, Lauren; Israni, Sanjay; Jett, Jamie; Kadner, Kristen; Kimbal, Heather; Kobayashi, Arthur; Lopez, Frederick; Lou, Yunian; Martinez, Diego; Medina, Catherine; Morgan, Jenna; Nandkeshwar, Richard; Noonan, James P.; Pitluck, Sam; Pollard, Martin; Predki, Paul; Priest, James; Ramirez, Lucia; Rash, Sam; Retterer, James; Rodriguez, Alex; Rogers, Stephanie; Salamov, Asaf; Salazar, Angelica; Thayer, Nina; Tice, Hope; Tsai, Ming; Ustaszewska, Anna; Vo, Nu; Wheeler, Jeremy; Wu, Kevin; Yang, Joan; Dickson, Mark; Cheng, Jan-Fang; Eichler, Evan E.; Olsen, Anne; Pennacchio, Len A.; Rokhsar, Daniel S.; Richardson, Paul; Lucas, Susan M.; Myers, Richard M.; Rubin, Edward M.

    2004-04-15

    Chromosome 5 is one of the largest human chromosomes yet has one of the lowest gene densities. This is partially explained by numerous gene-poor regions that display a remarkable degree of noncoding and syntenic conservation with non-mammalian vertebrates, suggesting they are functionally constrained. In total, we compiled 177.7 million base pairs of highly accurate finished sequence containing 923 manually curated protein-encoding genes including the protocadherin and interleukin gene families and the first complete versions of each of the large chromosome 5 specific internal duplications. These duplications are very recent evolutionary events and play a likely mechanistic role, since deletions of these regions are the cause of debilitating disorders including spinal muscular atrophy (SMA).

  8. Fast and accurate phylogenetic reconstruction from high-resolution whole-genome data and a novel robustness estimator.

    Science.gov (United States)

    Lin, Y; Rajan, V; Moret, B M E

    2011-09-01

    The rapid accumulation of whole-genome data has renewed interest in the study of genomic rearrangements. Comparative genomics, evolutionary biology, and cancer research all require models and algorithms to elucidate the mechanisms, history, and consequences of these rearrangements. However, even simple models lead to NP-hard problems, particularly in the area of phylogenetic analysis. Current approaches are limited to small collections of genomes and low-resolution data (typically a few hundred syntenic blocks). Moreover, whereas phylogenetic analyses from sequence data are deemed incomplete unless bootstrapping scores (a measure of confidence) are given for each tree edge, no equivalent to bootstrapping exists for rearrangement-based phylogenetic analysis. We describe a fast and accurate algorithm for rearrangement analysis that scales up, in both time and accuracy, to modern high-resolution genomic data. We also describe a novel approach to estimate the robustness of results-an equivalent to the bootstrapping analysis used in sequence-based phylogenetic reconstruction. We present the results of extensive testing on both simulated and real data showing that our algorithm returns very accurate results, while scaling linearly with the size of the genomes and cubically with their number. We also present extensive experimental results showing that our approach to robustness testing provides excellent estimates of confidence, which, moreover, can be tuned to trade off thresholds between false positives and false negatives. Together, these two novel approaches enable us to attack heretofore intractable problems, such as phylogenetic inference for high-resolution vertebrate genomes, as we demonstrate on a set of six vertebrate genomes with 8,380 syntenic blocks. A copy of the software is available on demand.

  9. Is sequence awareness mandatory for perceptual sequence learning: An assessment using a pure perceptual sequence learning design.

    Science.gov (United States)

    Deroost, Natacha; Coomans, Daphné

    2018-02-01

    We examined the role of sequence awareness in a pure perceptual sequence learning design. Participants had to react to the target's colour that changed according to a perceptual sequence. By varying the mapping of the target's colour onto the response keys, motor responses changed randomly. The effect of sequence awareness on perceptual sequence learning was determined by manipulating the learning instructions (explicit versus implicit) and assessing the amount of sequence awareness after the experiment. In the explicit instruction condition (n = 15), participants were instructed to intentionally search for the colour sequence, whereas in the implicit instruction condition (n = 15), they were left uninformed about the sequenced nature of the task. Sequence awareness after the sequence learning task was tested by means of a questionnaire and the process-dissociation-procedure. The results showed that the instruction manipulation had no effect on the amount of perceptual sequence learning. Based on their report to have actively applied their sequence knowledge during the experiment, participants were subsequently regrouped in a sequence strategy group (n = 14, of which 4 participants from the implicit instruction condition and 10 participants from the explicit instruction condition) and a no-sequence strategy group (n = 16, of which 11 participants from the implicit instruction condition and 5 participants from the explicit instruction condition). Only participants of the sequence strategy group showed reliable perceptual sequence learning and sequence awareness. These results indicate that perceptual sequence learning depends upon the continuous employment of strategic cognitive control processes on sequence knowledge. Sequence awareness is suggested to be a necessary but not sufficient condition for perceptual learning to take place. Copyright © 2018 Elsevier B.V. All rights reserved.

  10. Rapid Classification and Identification of Multiple Microorganisms with Accurate Statistical Significance via High-Resolution Tandem Mass Spectrometry.

    Science.gov (United States)

    Alves, Gelio; Wang, Guanghui; Ogurtsov, Aleksey Y; Drake, Steven K; Gucek, Marjan; Sacks, David B; Yu, Yi-Kuo

    2018-06-05

    Rapid and accurate identification and classification of microorganisms is of paramount importance to public health and safety. With the advance of mass spectrometry (MS) technology, the speed of identification can be greatly improved. However, the increasing number of microbes sequenced is complicating correct microbial identification even in a simple sample due to the large number of candidates present. To properly untwine candidate microbes in samples containing one or more microbes, one needs to go beyond apparent morphology or simple "fingerprinting"; to correctly prioritize the candidate microbes, one needs to have accurate statistical significance in microbial identification. We meet these challenges by using peptide-centric representations of microbes to better separate them and by augmenting our earlier analysis method that yields accurate statistical significance. Here, we present an updated analysis workflow that uses tandem MS (MS/MS) spectra for microbial identification or classification. We have demonstrated, using 226 MS/MS publicly available data files (each containing from 2500 to nearly 100,000 MS/MS spectra) and 4000 additional MS/MS data files, that the updated workflow can correctly identify multiple microbes at the genus and often the species level for samples containing more than one microbe. We have also shown that the proposed workflow computes accurate statistical significances, i.e., E values for identified peptides and unified E values for identified microbes. Our updated analysis workflow MiCId, a freely available software for Microorganism Classification and Identification, is available for download at https://www.ncbi.nlm.nih.gov/CBBresearch/Yu/downloads.html . Graphical Abstract ᅟ.

  11. Bayesian reconstruction of photon interaction sequences for high-resolution PET detectors

    Energy Technology Data Exchange (ETDEWEB)

    Pratx, Guillem; Levin, Craig S [Molecular Imaging Program at Stanford, Department of Radiology, Stanford, CA (United States)], E-mail: cslevin@stanford.edu

    2009-09-07

    Realizing the full potential of high-resolution positron emission tomography (PET) systems involves accurately positioning events in which the annihilation photon deposits all its energy across multiple detector elements. Reconstructing the complete sequence of interactions of each photon provides a reliable way to select the earliest interaction because it ensures that all the interactions are consistent with one another. Bayesian estimation forms a natural framework to maximize the consistency of the sequence with the measurements while taking into account the physics of {gamma}-ray transport. An inherently statistical method, it accounts for the uncertainty in the measured energy and position of each interaction. An algorithm based on maximum a posteriori (MAP) was evaluated for computer simulations. For a high-resolution PET system based on cadmium zinc telluride detectors, 93.8% of the recorded coincidences involved at least one photon multiple-interactions event (PMIE). The MAP estimate of the first interaction was accurate for 85.2% of the single photons. This represents a two-fold reduction in the number of mispositioned events compared to minimum pair distance, a simpler yet efficient positioning method. The point-spread function of the system presented lower tails and higher peak value when MAP was used. This translated into improved image quality, which we quantified by studying contrast and spatial resolution gains.

  12. Stress triggering and the Canterbury earthquake sequence

    Science.gov (United States)

    Steacy, Sandy; Jiménez, Abigail; Holden, Caroline

    2014-01-01

    The Canterbury earthquake sequence, which includes the devastating Christchurch event of 2011 February, has to date led to losses of around 40 billion NZ dollars. The location and severity of the earthquakes was a surprise to most inhabitants as the seismic hazard model was dominated by an expected Mw > 8 earthquake on the Alpine fault and an Mw 7.5 earthquake on the Porters Pass fault, 150 and 80 km to the west of Christchurch. The sequence to date has included an Mw = 7.1 earthquake and 3 Mw ≥ 5.9 events which migrated from west to east. Here we investigate whether the later events are consistent with stress triggering and whether a simple stress map produced shortly after the first earthquake would have accurately indicated the regions where the subsequent activity occurred. We find that 100 per cent of M > 5.5 earthquakes occurred in positive stress areas computed using a slip model for the first event that was available within 10 d of its occurrence. We further find that the stress changes at the starting points of major slip patches of post-Darfield main events are consistent with triggering although this is not always true at the hypocentral locations. Our results suggest that Coulomb stress changes contributed to the evolution of the Canterbury sequence and we note additional areas of increased stress in the Christchurch region and on the Porters Pass fault.

  13. Multiple sequence alignment accuracy and phylogenetic inference.

    Science.gov (United States)

    Ogden, T Heath; Rosenberg, Michael S

    2006-04-01

    Phylogenies are often thought to be more dependent upon the specifics of the sequence alignment rather than on the method of reconstruction. Simulation of sequences containing insertion and deletion events was performed in order to determine the role that alignment accuracy plays during phylogenetic inference. Data sets were simulated for pectinate, balanced, and random tree shapes under different conditions (ultrametric equal branch length, ultrametric random branch length, nonultrametric random branch length). Comparisons between hypothesized alignments and true alignments enabled determination of two measures of alignment accuracy, that of the total data set and that of individual branches. In general, our results indicate that as alignment error increases, topological accuracy decreases. This trend was much more pronounced for data sets derived from more pectinate topologies. In contrast, for balanced, ultrametric, equal branch length tree shapes, alignment inaccuracy had little average effect on tree reconstruction. These conclusions are based on average trends of many analyses under different conditions, and any one specific analysis, independent of the alignment accuracy, may recover very accurate or inaccurate topologies. Maximum likelihood and Bayesian, in general, outperformed neighbor joining and maximum parsimony in terms of tree reconstruction accuracy. Results also indicated that as the length of the branch and of the neighboring branches increase, alignment accuracy decreases, and the length of the neighboring branches is the major factor in topological accuracy. Thus, multiple-sequence alignment can be an important factor in downstream effects on topological reconstruction.

  14. Model morphing and sequence assignment after molecular replacement

    Energy Technology Data Exchange (ETDEWEB)

    Terwilliger, Thomas C., E-mail: terwilliger@lanl.gov [Los Alamos National Laboratory, Mail Stop M888, Los Alamos, NM 87545 (United States); Read, Randy J. [University of Cambridge, Cambridge Institute for Medical Research, Cambridge CB2 0XY (United Kingdom); Adams, Paul D. [Lawrence Berkeley National Laboratory, One Cyclotron Road, Bldg 64R0121, Berkeley, CA 94720 (United States); Brunger, Axel T. [Stanford University, 318 Campus Drive West, Stanford, CA 94305 (United States); Afonine, Pavel V. [Lawrence Berkeley National Laboratory, One Cyclotron Road, Bldg 64R0121, Berkeley, CA 94720 (United States); Hung, Li-Wei [Los Alamos National Laboratory, Mail Stop M888, Los Alamos, NM 87545 (United States)

    2013-11-01

    A procedure for model building is described that combines morphing a model to match a density map, trimming the morphed model and aligning the model to a sequence. A procedure termed ‘morphing’ for improving a model after it has been placed in the crystallographic cell by molecular replacement has recently been developed. Morphing consists of applying a smooth deformation to a model to make it match an electron-density map more closely. Morphing does not change the identities of the residues in the chain, only their coordinates. Consequently, if the true structure differs from the working model by containing different residues, these differences cannot be corrected by morphing. Here, a procedure that helps to address this limitation is described. The goal of the procedure is to obtain a relatively complete model that has accurate main-chain atomic positions and residues that are correctly assigned to the sequence. Residues in a morphed model that do not match the electron-density map are removed. Each segment of the resulting trimmed morphed model is then assigned to the sequence of the molecule using information about the connectivity of the chains from the working model and from connections that can be identified from the electron-density map. The procedure was tested by application to a recently determined structure at a resolution of 3.2 Å and was found to increase the number of correctly identified residues in this structure from the 88 obtained using phenix.resolve sequence assignment alone (Terwilliger, 2003 ▶) to 247 of a possible 359. Additionally, the procedure was tested by application to a series of templates with sequence identities to a target structure ranging between 7 and 36%. The mean fraction of correctly identified residues in these cases was increased from 33% using phenix.resolve sequence assignment to 47% using the current procedure. The procedure is simple to apply and is available in the Phenix software package.

  15. Model morphing and sequence assignment after molecular replacement

    International Nuclear Information System (INIS)

    Terwilliger, Thomas C.; Read, Randy J.; Adams, Paul D.; Brunger, Axel T.; Afonine, Pavel V.; Hung, Li-Wei

    2013-01-01

    A procedure for model building is described that combines morphing a model to match a density map, trimming the morphed model and aligning the model to a sequence. A procedure termed ‘morphing’ for improving a model after it has been placed in the crystallographic cell by molecular replacement has recently been developed. Morphing consists of applying a smooth deformation to a model to make it match an electron-density map more closely. Morphing does not change the identities of the residues in the chain, only their coordinates. Consequently, if the true structure differs from the working model by containing different residues, these differences cannot be corrected by morphing. Here, a procedure that helps to address this limitation is described. The goal of the procedure is to obtain a relatively complete model that has accurate main-chain atomic positions and residues that are correctly assigned to the sequence. Residues in a morphed model that do not match the electron-density map are removed. Each segment of the resulting trimmed morphed model is then assigned to the sequence of the molecule using information about the connectivity of the chains from the working model and from connections that can be identified from the electron-density map. The procedure was tested by application to a recently determined structure at a resolution of 3.2 Å and was found to increase the number of correctly identified residues in this structure from the 88 obtained using phenix.resolve sequence assignment alone (Terwilliger, 2003 ▶) to 247 of a possible 359. Additionally, the procedure was tested by application to a series of templates with sequence identities to a target structure ranging between 7 and 36%. The mean fraction of correctly identified residues in these cases was increased from 33% using phenix.resolve sequence assignment to 47% using the current procedure. The procedure is simple to apply and is available in the Phenix software package

  16. Chromosomal deletion, promoter hypermethylation and downregulation of FYN in prostate cancer

    DEFF Research Database (Denmark)

    Sørensen, Karina Dalsgaard; Borre, Michael; Ørntoft, Torben Falck

    2008-01-01

    prostate hyperplasia (BPH), as well as in 6 prostate adenocarcinoma cell lines compared with that in BPH-1 cells. By immunohistochemistry, FYN protein was detected in nonmalignant prostate epithelium, but not in cancerous glands. Moreover, genomic bisulfite sequencing revealed frequent aberrant methylation......, consistent with gene silencing, was detected in 2 of 18 tumors (11%). No methylation was found in BPH-1 cells or nonmalignant prostate tissue samples (0 of 7). These results indicate that FYN is downregulated in prostate cancer by both chromosomal deletion and promoter hypermethylation, and therefore...

  17. Accurate Fall Detection in a Top View Privacy Preserving Configuration.

    Science.gov (United States)

    Ricciuti, Manola; Spinsante, Susanna; Gambi, Ennio

    2018-05-29

    Fall detection is one of the most investigated themes in the research on assistive solutions for aged people. In particular, a false-alarm-free discrimination between falls and non-falls is indispensable, especially to assist elderly people living alone. Current technological solutions designed to monitor several types of activities in indoor environments can guarantee absolute privacy to the people that decide to rely on them. Devices integrating RGB and depth cameras, such as the Microsoft Kinect, can ensure privacy and anonymity, since the depth information is considered to extract only meaningful information from video streams. In this paper, we propose an accurate fall detection method investigating the depth frames of the human body using a single device in a top-view configuration, with the subjects located under the device inside a room. Features extracted from depth frames train a classifier based on a binary support vector machine learning algorithm. The dataset includes 32 falls and 8 activities considered for comparison, for a total of 800 sequences performed by 20 adults. The system showed an accuracy of 98.6% and only one false positive.

  18. Histoimmunogenetics Markup Language 1.0: Reporting next generation sequencing-based HLA and KIR genotyping.

    Science.gov (United States)

    Milius, Robert P; Heuer, Michael; Valiga, Daniel; Doroschak, Kathryn J; Kennedy, Caleb J; Bolon, Yung-Tsi; Schneider, Joel; Pollack, Jane; Kim, Hwa Ran; Cereb, Nezih; Hollenbach, Jill A; Mack, Steven J; Maiers, Martin

    2015-12-01

    We present an electronic format for exchanging data for HLA and KIR genotyping with extensions for next-generation sequencing (NGS). This format addresses NGS data exchange by refining the Histoimmunogenetics Markup Language (HML) to conform to the proposed Minimum Information for Reporting Immunogenomic NGS Genotyping (MIRING) reporting guidelines (miring.immunogenomics.org). Our refinements of HML include two major additions. First, NGS is supported by new XML structures to capture additional NGS data and metadata required to produce a genotyping result, including analysis-dependent (dynamic) and method-dependent (static) components. A full genotype, consensus sequence, and the surrounding metadata are included directly, while the raw sequence reads and platform documentation are externally referenced. Second, genotype ambiguity is fully represented by integrating Genotype List Strings, which use a hierarchical set of delimiters to represent allele and genotype ambiguity in a complete and accurate fashion. HML also continues to enable the transmission of legacy methods (e.g. site-specific oligonucleotide, sequence-specific priming, and Sequence Based Typing (SBT)), adding features such as allowing multiple group-specific sequencing primers, and fully leveraging techniques that combine multiple methods to obtain a single result, such as SBT integrated with NGS. Copyright © 2015 The Authors. Published by Elsevier Inc. All rights reserved.

  19. Automatic discovery of cross-family sequence features associated with protein function

    Directory of Open Access Journals (Sweden)

    Krings Andrea

    2006-01-01

    Full Text Available Abstract Background Methods for predicting protein function directly from amino acid sequences are useful tools in the study of uncharacterised protein families and in comparative genomics. Until now, this problem has been approached using machine learning techniques that attempt to predict membership, or otherwise, to predefined functional categories or subcellular locations. A potential drawback of this approach is that the human-designated functional classes may not accurately reflect the underlying biology, and consequently important sequence-to-function relationships may be missed. Results We show that a self-supervised data mining approach is able to find relationships between sequence features and functional annotations. No preconceived ideas about functional categories are required, and the training data is simply a set of protein sequences and their UniProt/Swiss-Prot annotations. The main technical aspect of the approach is the co-evolution of amino acid-based regular expressions and keyword-based logical expressions with genetic programming. Our experiments on a strictly non-redundant set of eukaryotic proteins reveal that the strongest and most easily detected sequence-to-function relationships are concerned with targeting to various cellular compartments, which is an area already well studied both experimentally and computationally. Of more interest are a number of broad functional roles which can also be correlated with sequence features. These include inhibition, biosynthesis, transcription and defence against bacteria. Despite substantial overlaps between these functions and their corresponding cellular compartments, we find clear differences in the sequence motifs used to predict some of these functions. For example, the presence of polyglutamine repeats appears to be linked more strongly to the "transcription" function than to the general "nuclear" function/location. Conclusion We have developed a novel and useful approach for

  20. Evaluation of MRI sequences for quantitative T1 brain mapping

    Science.gov (United States)

    Tsialios, P.; Thrippleton, M.; Glatz, A.; Pernet, C.

    2017-11-01

    T1 mapping constitutes a quantitative MRI technique finding significant application in brain imaging. It allows evaluation of contrast uptake, blood perfusion, volume, providing a more specific biomarker of disease progression compared to conventional T1-weighted images. While there are many techniques for T1-mapping there is a wide range of reported T1-values in tissues, raising the issue of protocols reproducibility and standardization. The gold standard for obtaining T1-maps is based on acquiring IR-SE sequence. Widely used alternative sequences are IR-SE-EPI, VFA (DESPOT), DESPOT-HIFI and MP2RAGE that speed up scanning and fitting procedures. A custom MRI phantom was used to assess the reproducibility and accuracy of the different methods. All scans were performed using a 3T Siemens Prisma scanner. The acquired data processed using two different codes. The main difference was observed for VFA (DESPOT) which grossly overestimated T1 relaxation time by 214 ms [126 270] compared to the IR-SE sequence. MP2RAGE and DESPOT-HIFI sequences gave slightly shorter time than IR-SE (~20 to 30ms) and can be considered as alternative and time-efficient methods for acquiring accurate T1 maps of the human brain, while IR-SE-EPI gave identical result, at a cost of a lower image quality.

  1. Mandible-first sequence in bimaxillary orthognathic surgery: a systematic review.

    Science.gov (United States)

    Borba, A M; Borges, A H; Cé, P S; Venturi, B A; Naclério-Homem, M G; Miloro, M

    2016-04-01

    The sequencing of bimaxillary orthognathic surgery remains controversial, although the traditional maxilla-first approach is performed routinely. The goal of this study was to present a systematic review of the mandible-first sequence in bimaxillary orthognathic surgery, to provide data that may assist in the decision as to which jaw should undergo osteotomy first in bimaxillary orthognathic surgery cases. A literature search was conducted for articles published in the English language, reporting the use of the altered sequence for bimaxillary orthognathic surgery (mandible-first), using the following descriptors: 'orthognathic' and 'double-jaw', 'orthognathic' and 'two-jaw', 'orthognathic' and 'mandible-first', 'orthognathic' and 'bimaxillary'. Eight hundred eighty-seven abstracts were initially identified and were evaluated for inclusion according to the proposed inclusion criteria. After evaluation of these abstracts and relevant references, six publications met the criteria for consideration. Performing mandible-first surgery in bimaxillary orthognathic cases dates back to the 1970s; however the decision regarding the jaw to be operated on first seems to rely on accurate preoperative planning based upon the surgeon's experience and preference. While there appear to be significant theoretical advantages to support the use of the altered orthognathic sequence (mandible-first), future prospective studies on its reliability, accuracy, and short- and long-term outcomes are required. Copyright © 2015 International Association of Oral and Maxillofacial Surgeons. Published by Elsevier Ltd. All rights reserved.

  2. Ultra-fast evaluation of protein energies directly from sequence.

    Directory of Open Access Journals (Sweden)

    Gevorg Grigoryan

    2006-06-01

    Full Text Available The structure, function, stability, and many other properties of a protein in a fixed environment are fully specified by its sequence, but in a manner that is difficult to discern. We present a general approach for rapidly mapping sequences directly to their energies on a pre-specified rigid backbone, an important sub-problem in computational protein design and in some methods for protein structure prediction. The cluster expansion (CE method that we employ can, in principle, be extended to model any computable or measurable protein property directly as a function of sequence. Here we show how CE can be applied to the problem of computational protein design, and use it to derive excellent approximations of physical potentials. The approach provides several attractive advantages. First, following a one-time derivation of a CE expansion, the amount of time necessary to evaluate the energy of a sequence adopting a specified backbone conformation is reduced by a factor of 10(7 compared to standard full-atom methods for the same task. Second, the agreement between two full-atom methods that we tested and their CE sequence-based expressions is very high (root mean square deviation 1.1-4.7 kcal/mol, R2 = 0.7-1.0. Third, the functional form of the CE energy expression is such that individual terms of the expansion have clear physical interpretations. We derived expressions for the energies of three classic protein design targets-a coiled coil, a zinc finger, and a WW domain-as functions of sequence, and examined the most significant terms. Single-residue and residue-pair interactions are sufficient to accurately capture the energetics of the dimeric coiled coil, whereas higher-order contributions are important for the two more globular folds. For the task of designing novel zinc-finger sequences, a CE-derived energy function provides significantly better solutions than a standard design protocol, in comparable computation time. Given these advantages

  3. Advances of Single-Cell Sequencing Technique in Tumors

    Directory of Open Access Journals (Sweden)

    Ji-feng FENG

    2017-03-01

    Full Text Available With the completion of human genome project (HGP and the international HapMap project as well as rapid development of high-throughput biochip technology, whole genomic sequencing-targeted analysis of genomic structures has been primarily finished. Application of single cell for the analysis of the whole genomics is not only economical in material collection, but more importantly, the cell will be more purified, and the laboratory results will be more accurate and reliable. Therefore, exploration and analysis of hereditary information of single tumor cells has become the dream of all researchers in the field of basic research of tumors. At present, single-cell sequencing (SCS on malignancies has been widely used in the studies of pathogeneses of multiple malignancies, such as glioma, renal cancer and hematologic neoplasms, and in the studies of the metastatic mechanism of breast cancer by some researchers. This study mainly reviewed the SCS, the mechanisms and the methods of SCS in isolating tumor cells, and application of SCS technique in tumor-related basic research and clinical treatment.

  4. HGVS Recommendations for the Description of Sequence Variants: 2016 Update.

    Science.gov (United States)

    den Dunnen, Johan T; Dalgleish, Raymond; Maglott, Donna R; Hart, Reece K; Greenblatt, Marc S; McGowan-Jordan, Jean; Roux, Anne-Francoise; Smith, Timothy; Antonarakis, Stylianos E; Taschner, Peter E M

    2016-06-01

    The consistent and unambiguous description of sequence variants is essential to report and exchange information on the analysis of a genome. In particular, DNA diagnostics critically depends on accurate and standardized description and sharing of the variants detected. The sequence variant nomenclature system proposed in 2000 by the Human Genome Variation Society has been widely adopted and has developed into an internationally accepted standard. The recommendations are currently commissioned through a Sequence Variant Description Working Group (SVD-WG) operating under the auspices of three international organizations: the Human Genome Variation Society (HGVS), the Human Variome Project (HVP), and the Human Genome Organization (HUGO). Requests for modifications and extensions go through the SVD-WG following a standard procedure including a community consultation step. Version numbers are assigned to the nomenclature system to allow users to specify the version used in their variant descriptions. Here, we present the current recommendations, HGVS version 15.11, and briefly summarize the changes that were made since the 2000 publication. Most focus has been on removing inconsistencies and tightening definitions allowing automatic data processing. An extensive version of the recommendations is available online, at http://www.HGVS.org/varnomen. © 2016 WILEY PERIODICALS, INC.

  5. miRanalyzer: a microRNA detection and analysis tool for next-generation sequencing experiments.

    Science.gov (United States)

    Hackenberg, Michael; Sturm, Martin; Langenberger, David; Falcón-Pérez, Juan Manuel; Aransay, Ana M

    2009-07-01

    Next-generation sequencing allows now the sequencing of small RNA molecules and the estimation of their expression levels. Consequently, there will be a high demand of bioinformatics tools to cope with the several gigabytes of sequence data generated in each single deep-sequencing experiment. Given this scene, we developed miRanalyzer, a web server tool for the analysis of deep-sequencing experiments for small RNAs. The web server tool requires a simple input file containing a list of unique reads and its copy numbers (expression levels). Using these data, miRanalyzer (i) detects all known microRNA sequences annotated in miRBase, (ii) finds all perfect matches against other libraries of transcribed sequences and (iii) predicts new microRNAs. The prediction of new microRNAs is an especially important point as there are many species with very few known microRNAs. Therefore, we implemented a highly accurate machine learning algorithm for the prediction of new microRNAs that reaches AUC values of 97.9% and recall values of up to 75% on unseen data. The web tool summarizes all the described steps in a single output page, which provides a comprehensive overview of the analysis, adding links to more detailed output pages for each analysis module. miRanalyzer is available at http://web.bioinformatics.cicbiogune.es/microRNA/.

  6. Improved Ancestry Estimation for both Genotyping and Sequencing Data using Projection Procrustes Analysis and Genotype Imputation

    Science.gov (United States)

    Wang, Chaolong; Zhan, Xiaowei; Liang, Liming; Abecasis, Gonçalo R.; Lin, Xihong

    2015-01-01

    Accurate estimation of individual ancestry is important in genetic association studies, especially when a large number of samples are collected from multiple sources. However, existing approaches developed for genome-wide SNP data do not work well with modest amounts of genetic data, such as in targeted sequencing or exome chip genotyping experiments. We propose a statistical framework to estimate individual ancestry in a principal component ancestry map generated by a reference set of individuals. This framework extends and improves upon our previous method for estimating ancestry using low-coverage sequence reads (LASER 1.0) to analyze either genotyping or sequencing data. In particular, we introduce a projection Procrustes analysis approach that uses high-dimensional principal components to estimate ancestry in a low-dimensional reference space. Using extensive simulations and empirical data examples, we show that our new method (LASER 2.0), combined with genotype imputation on the reference individuals, can substantially outperform LASER 1.0 in estimating fine-scale genetic ancestry. Specifically, LASER 2.0 can accurately estimate fine-scale ancestry within Europe using either exome chip genotypes or targeted sequencing data with off-target coverage as low as 0.05×. Under the framework of LASER 2.0, we can estimate individual ancestry in a shared reference space for samples assayed at different loci or by different techniques. Therefore, our ancestry estimation method will accelerate discovery in disease association studies not only by helping model ancestry within individual studies but also by facilitating combined analysis of genetic data from multiple sources. PMID:26027497

  7. A new and accurate fault location algorithm for combined transmission lines using Adaptive Network-Based Fuzzy Inference System

    Energy Technology Data Exchange (ETDEWEB)

    Sadeh, Javad; Afradi, Hamid [Electrical Engineering Department, Faculty of Engineering, Ferdowsi University of Mashhad, P.O. Box: 91775-1111, Mashhad (Iran)

    2009-11-15

    This paper presents a new and accurate algorithm for locating faults in a combined overhead transmission line with underground power cable using Adaptive Network-Based Fuzzy Inference System (ANFIS). The proposed method uses 10 ANFIS networks and consists of 3 stages, including fault type classification, faulty section detection and exact fault location. In the first part, an ANFIS is used to determine the fault type, applying four inputs, i.e., fundamental component of three phase currents and zero sequence current. Another ANFIS network is used to detect the faulty section, whether the fault is on the overhead line or on the underground cable. Other eight ANFIS networks are utilized to pinpoint the faults (two for each fault type). Four inputs, i.e., the dc component of the current, fundamental frequency of the voltage and current and the angle between them, are used to train the neuro-fuzzy inference systems in order to accurately locate the faults on each part of the combined line. The proposed method is evaluated under different fault conditions such as different fault locations, different fault inception angles and different fault resistances. Simulation results confirm that the proposed method can be used as an efficient means for accurate fault location on the combined transmission lines. (author)

  8. Utility of Whole-Genome Sequencing in Characterizing Acinetobacter Epidemiology and Analyzing Hospital Outbreaks

    Science.gov (United States)

    Fitzpatrick, Margaret A.; Hauser, Alan R.

    2015-01-01

    Acinetobacter baumannii frequently causes nosocomial infections and outbreaks. Whole-genome sequencing (WGS) is a promising technique for strain typing and outbreak investigations. We compared the performance of conventional methods with WGS for strain typing clinical Acinetobacter isolates and analyzing a carbapenem-resistant A. baumannii (CRAB) outbreak. We performed two band-based typing techniques (pulsed-field gel electrophoresis and repetitive extragenic palindromic-PCR), multilocus sequence type (MLST) analysis, and WGS on 148 Acinetobacter calcoaceticus-A. baumannii complex bloodstream isolates collected from a single hospital from 2005 to 2012. Phylogenetic trees inferred from core-genome single nucleotide polymorphisms (SNPs) confirmed three Acinetobacter species within this collection. Four major A. baumannii clonal lineages (as defined by MLST) circulated during the study, three of which are globally distributed and one of which is novel. WGS indicated that a threshold of 2,500 core SNPs accurately distinguished A. baumannii isolates from different clonal lineages. The band-based techniques performed poorly in assigning isolates to clonal lineages and exhibited little agreement with sequence-based techniques. After applying WGS to a CRAB outbreak that occurred during the study, we identified a threshold of 2.5 core SNPs that distinguished nonoutbreak from outbreak strains. WGS was more discriminatory than the band-based techniques and was used to construct a more accurate transmission map that resolved many of the plausible transmission routes suggested by epidemiologic links. Our study demonstrates that WGS is superior to conventional techniques for A. baumannii strain typing and outbreak analysis. These findings support the incorporation of WGS into health care infection prevention efforts. PMID:26699703

  9. Noise-robust speech recognition through auditory feature detection and spike sequence decoding.

    Science.gov (United States)

    Schafer, Phillip B; Jin, Dezhe Z

    2014-03-01

    Speech recognition in noisy conditions is a major challenge for computer systems, but the human brain performs it routinely and accurately. Automatic speech recognition (ASR) systems that are inspired by neuroscience can potentially bridge the performance gap between humans and machines. We present a system for noise-robust isolated word recognition that works by decoding sequences of spikes from a population of simulated auditory feature-detecting neurons. Each neuron is trained to respond selectively to a brief spectrotemporal pattern, or feature, drawn from the simulated auditory nerve response to speech. The neural population conveys the time-dependent structure of a sound by its sequence of spikes. We compare two methods for decoding the spike sequences--one using a hidden Markov model-based recognizer, the other using a novel template-based recognition scheme. In the latter case, words are recognized by comparing their spike sequences to template sequences obtained from clean training data, using a similarity measure based on the length of the longest common sub-sequence. Using isolated spoken digits from the AURORA-2 database, we show that our combined system outperforms a state-of-the-art robust speech recognizer at low signal-to-noise ratios. Both the spike-based encoding scheme and the template-based decoding offer gains in noise robustness over traditional speech recognition methods. Our system highlights potential advantages of spike-based acoustic coding and provides a biologically motivated framework for robust ASR development.

  10. TCS: a web server for multiple sequence alignment evaluation and phylogenetic reconstruction.

    Science.gov (United States)

    Chang, Jia-Ming; Di Tommaso, Paolo; Lefort, Vincent; Gascuel, Olivier; Notredame, Cedric

    2015-07-01

    This article introduces the Transitive Consistency Score (TCS) web server; a service making it possible to estimate the local reliability of protein multiple sequence alignments (MSAs) using the TCS index. The evaluation can be used to identify the aligned positions most likely to contain structurally analogous residues and also most likely to support an accurate phylogenetic reconstruction. The TCS scoring scheme has been shown to be accurate predictor of structural alignment correctness among commonly used methods. It has also been shown to outperform common filtering schemes like Gblocks or trimAl when doing MSA post-processing prior to phylogenetic tree reconstruction. The web server is available from http://tcoffee.crg.cat/tcs. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.

  11. Preferential 5-Methylcytosine Oxidation in the Linker Region of Reconstituted Positioned Nucleosomes by Tet1 Protein.

    Science.gov (United States)

    Kizaki, Seiichiro; Zou, Tingting; Li, Yue; Han, Yong-Woon; Suzuki, Yuki; Harada, Yoshie; Sugiyama, Hiroshi

    2016-11-07

    Tet (ten-eleven translocation) family proteins oxidize 5-methylcytosine (mC) to 5-hydroxymethylcytosine (hmC), 5-formylcytosine (fC), and 5-carboxycytosine (caC), and are suggested to be involved in the active DNA demethylation pathway. In this study, we reconstituted positioned mononucleosomes using CpG-methylated 382 bp DNA containing the Widom 601 sequence and recombinant histone octamer, and subjected the nucleosome to treatment with Tet1 protein. The sites of oxidized methylcytosine were identified by bisulfite sequencing. We found that, for the oxidation reaction, Tet1 protein prefers mCs located in the linker region of the nucleosome compared with those located in the core region. © 2016 Wiley-VCH Verlag GmbH & Co. KGaA, Weinheim.

  12. Choice of reference sequence and assembler for alignment of Listeria monocytogenes short-read sequence data greatly influences rates of error in SNP analyses.

    Directory of Open Access Journals (Sweden)

    Arthur W Pightling

    Full Text Available The wide availability of whole-genome sequencing (WGS and an abundance of open-source software have made detection of single-nucleotide polymorphisms (SNPs in bacterial genomes an increasingly accessible and effective tool for comparative analyses. Thus, ensuring that real nucleotide differences between genomes (i.e., true SNPs are detected at high rates and that the influences of errors (such as false positive SNPs, ambiguously called sites, and gaps are mitigated is of utmost importance. The choices researchers make regarding the generation and analysis of WGS data can greatly influence the accuracy of short-read sequence alignments and, therefore, the efficacy of such experiments. We studied the effects of some of these choices, including: i depth of sequencing coverage, ii choice of reference-guided short-read sequence assembler, iii choice of reference genome, and iv whether to perform read-quality filtering and trimming, on our ability to detect true SNPs and on the frequencies of errors. We performed benchmarking experiments, during which we assembled simulated and real Listeria monocytogenes strain 08-5578 short-read sequence datasets of varying quality with four commonly used assemblers (BWA, MOSAIK, Novoalign, and SMALT, using reference genomes of varying genetic distances, and with or without read pre-processing (i.e., quality filtering and trimming. We found that assemblies of at least 50-fold coverage provided the most accurate results. In addition, MOSAIK yielded the fewest errors when reads were aligned to a nearly identical reference genome, while using SMALT to align reads against a reference sequence that is ∼0.82% distant from 08-5578 at the nucleotide level resulted in the detection of the greatest numbers of true SNPs and the fewest errors. Finally, we show that whether read pre-processing improves SNP detection depends upon the choice of reference sequence and assembler. In total, this study demonstrates that researchers

  13. Development of Mycoplasma synoviae (MS) core genome multilocus sequence typing (cgMLST) scheme.

    Science.gov (United States)

    Ghanem, Mostafa; El-Gazzar, Mohamed

    2018-05-01

    Mycoplasma synoviae (MS) is a poultry pathogen with reported increased prevalence and virulence in recent years. MS strain identification is essential for prevention, control efforts and epidemiological outbreak investigations. Multiple multilocus based sequence typing schemes have been developed for MS, yet the resolution of these schemes could be limited for outbreak investigation. The cost of whole genome sequencing became close to that of sequencing the seven MLST targets; however, there is no standardized method for typing MS strains based on whole genome sequences. In this paper, we propose a core genome multilocus sequence typing (cgMLST) scheme as a standardized and reproducible method for typing MS based whole genome sequences. A diverse set of 25 MS whole genome sequences were used to identify 302 core genome genes as cgMLST targets (35.5% of MS genome) and 44 whole genome sequences of MS isolates from six countries in four continents were used for typing applying this scheme. cgMLST based phylogenetic trees displayed a high degree of agreement with core genome SNP based analysis and available epidemiological information. cgMLST allowed evaluation of two conventional MLST schemes of MS. The high discriminatory power of cgMLST allowed differentiation between samples of the same conventional MLST type. cgMLST represents a standardized, accurate, highly discriminatory, and reproducible method for differentiation between MS isolates. Like conventional MLST, it provides stable and expandable nomenclature, allowing for comparing and sharing the typing results between different laboratories worldwide. Copyright © 2018 The Authors. Published by Elsevier B.V. All rights reserved.

  14. Synthesis Study Of Surfactants Sodium Ligno Sulphonate (SLS) From Biomass Waste Using Fourier Transform Infra Red (FTIR)

    OpenAIRE

    Priyanto Slamet; Pramudono Bambang; Kusworo Tutuk Djoko; Suherman; Aji Hapsoro Aruno; Untoro Edi; Ratu Puspa

    2018-01-01

    Lignin from biomass waste (Black Liquor) was isolated by using sulfuric acid 25% and sodium hydroxide solutions 2N. The obtained lignin was reacted with Sodium Bisulfite to Sodium Ligno Sulfonate (SLS). The best result was achieved at 80 ° C, pH 9, ratio of lignin and bisulfite 4: 1, for 2 hours, and 290 rpm stirring rate. The result of lignin formed was sulfonated using Sodium Bisulfite (NaHSO3) to Sodium Ligno Sulfonate (SLS) whose results were tested by the role of groups in peak formation...

  15. ddClone: joint statistical inference of clonal populations from single cell and bulk tumour sequencing data.

    Science.gov (United States)

    Salehi, Sohrab; Steif, Adi; Roth, Andrew; Aparicio, Samuel; Bouchard-Côté, Alexandre; Shah, Sohrab P

    2017-03-01

    Next-generation sequencing (NGS) of bulk tumour tissue can identify constituent cell populations in cancers and measure their abundance. This requires computational deconvolution of allelic counts from somatic mutations, which may be incapable of fully resolving the underlying population structure. Single cell sequencing (SCS) is a more direct method, although its replacement of NGS is impeded by technical noise and sampling limitations. We propose ddClone, which analytically integrates NGS and SCS data, leveraging their complementary attributes through joint statistical inference. We show on real and simulated datasets that ddClone produces more accurate results than can be achieved by either method alone.

  16. An accurate evaluation of the performance of asynchronous DS-CDMA systems with zero-correlation-zone coding in Rayleigh fading

    Science.gov (United States)

    Walker, Ernest; Chen, Xinjia; Cooper, Reginald L.

    2010-04-01

    An arbitrarily accurate approach is used to determine the bit-error rate (BER) performance for generalized asynchronous DS-CDMA systems, in Gaussian noise with Raleigh fading. In this paper, and the sequel, new theoretical work has been contributed which substantially enhances existing performance analysis formulations. Major contributions include: substantial computational complexity reduction, including a priori BER accuracy bounding; an analytical approach that facilitates performance evaluation for systems with arbitrary spectral spreading distributions, with non-uniform transmission delay distributions. Using prior results, augmented by these enhancements, a generalized DS-CDMA system model is constructed and used to evaluated the BER performance, in a variety of scenarios. In this paper, the generalized system modeling was used to evaluate the performance of both Walsh- Hadamard (WH) and Walsh-Hadamard-seeded zero-correlation-zone (WH-ZCZ) coding. The selection of these codes was informed by the observation that WH codes contain N spectral spreading values (0 to N - 1), one for each code sequence; while WH-ZCZ codes contain only two spectral spreading values (N/2 - 1,N/2); where N is the sequence length in chips. Since these codes span the spectral spreading range for DS-CDMA coding, by invoking an induction argument, the generalization of the system model is sufficiently supported. The results in this paper, and the sequel, support the claim that an arbitrary accurate performance analysis for DS-CDMA systems can be evaluated over the full range of binary coding, with minimal computational complexity.

  17. Anatomical landmarks and skin markers are not reliable for accurate labeling of thoracic vertebrae on MRI

    International Nuclear Information System (INIS)

    Shabshin, Nogah; Schweitzer, Mark E.; Carrino, John A.

    2010-01-01

    Background: Numbering of the thoracic spine on MRI can be tedious if C2 and L5-S1 are not included and may lead to errors in lesion level. Purpose: To determine whether anatomic landmarks or external markers are reliable as an aid for accurate numbering of thoracic vertebrae on MRI. Material and Methods: Sixty-seven thoracic spine MR studies of 67 patients (30 males, 37 females, age range 18-83 years) were studied, composed of 52 consecutive MR studies and an additional 15 MRI in which vitamin E markers were placed over the skin. In the 52 thoracic MR examinations potential numbering aids such as the level of the sternal apex, pulmonary artery, aortic arch, and osseous or disc abnormalities were numbered on both cervical localizer (standard of reference) and thoracic sagittal images. The additional 15 examinations in which vitamin E markers were placed over the skin were evaluated for consistency in the level of the markers on different sequences in the same exam. Results: The sternal apex level ranged from T2 to T5 [T3 in 28/51 patients (55%), T2 in 10/51 (20%)]. The aortic arch level ranged from T2 to T4 [T4 in 18/48 (38%) and T3 in 17 (35%)]. Pulmonary artery level ranged from T4 to T6-7 disc [T5 in 20/52 patients (38%) and T6 in 14/52 (27%)]. In 3 of 12 patients who had abnormalities in a vertebral body or disc as definite point reference, the non-localizer image mislabelled the level. In 11/15 (73%) patients with vitamin E markers that were placed over the upper thoracic spine, the results showed consistency in the level of the markers in relation to the reference points or consistent inter-marker gap between the sequences. Conclusion: There are only two reliable ways to accurately define the levels if no landmarking feature is available on the magnet. The first is by including C2 in the thoracic sequence of a diagnostic quality, and the second is by using an abnormality in the discs or vertebral bodies as a point of reference

  18. Anatomical landmarks and skin markers are not reliable for accurate labeling of thoracic vertebrae on MRI

    Energy Technology Data Exchange (ETDEWEB)

    Shabshin, Nogah (Dept. of Diagnostic Imaging, Chaim Sheba Medical Center, Tel-HaShomer (Israel)), e-mail: shabshin@gmail.com; Schweitzer, Mark E. (Dept. of Diagnostic Imaging, Ottawa Hospital and Univ. of Ottawa, Ottawa (Canada)); Carrino, John A. (Dept. of Radiology, Johns Hopkins Univ. School of Medicine, Baltimore, MD (United States))

    2010-11-15

    Background: Numbering of the thoracic spine on MRI can be tedious if C2 and L5-S1 are not included and may lead to errors in lesion level. Purpose: To determine whether anatomic landmarks or external markers are reliable as an aid for accurate numbering of thoracic vertebrae on MRI. Material and Methods: Sixty-seven thoracic spine MR studies of 67 patients (30 males, 37 females, age range 18-83 years) were studied, composed of 52 consecutive MR studies and an additional 15 MRI in which vitamin E markers were placed over the skin. In the 52 thoracic MR examinations potential numbering aids such as the level of the sternal apex, pulmonary artery, aortic arch, and osseous or disc abnormalities were numbered on both cervical localizer (standard of reference) and thoracic sagittal images. The additional 15 examinations in which vitamin E markers were placed over the skin were evaluated for consistency in the level of the markers on different sequences in the same exam. Results: The sternal apex level ranged from T2 to T5 [T3 in 28/51 patients (55%), T2 in 10/51 (20%)]. The aortic arch level ranged from T2 to T4 [T4 in 18/48 (38%) and T3 in 17 (35%)]. Pulmonary artery level ranged from T4 to T6-7 disc [T5 in 20/52 patients (38%) and T6 in 14/52 (27%)]. In 3 of 12 patients who had abnormalities in a vertebral body or disc as definite point reference, the non-localizer image mislabelled the level. In 11/15 (73%) patients with vitamin E markers that were placed over the upper thoracic spine, the results showed consistency in the level of the markers in relation to the reference points or consistent inter-marker gap between the sequences. Conclusion: There are only two reliable ways to accurately define the levels if no landmarking feature is available on the magnet. The first is by including C2 in the thoracic sequence of a diagnostic quality, and the second is by using an abnormality in the discs or vertebral bodies as a point of reference

  19. SWANS: A Prototypic SCALE Criticality Sequence for Automated Optimization Using the SWAN Methodology

    International Nuclear Information System (INIS)

    Greenspan, E.

    2001-01-01

    SWANS is a new prototypic analysis sequence that provides an intelligent, semi-automatic search for the maximum k eff of a given amount of specified fissile material, or of the minimum critical mass. It combines the optimization strategy of the SWAN code with the composition-dependent resonance self-shielded cross sections of the SCALE package. For a given system composition arrived at during the iterative optimization process, the value of k eff is as accurate and reliable as obtained using the CSAS1X Sequence of SCALE-4.4. This report describes how SWAN is integrated within the SCALE system to form the new prototypic optimization sequence, describes the optimization procedure, provides a user guide for SWANS, and illustrates its application to five different types of problems. In addition, the report illustrates that resonance self-shielding might have a significant effect on the maximum k eff value a given fissile material mass can have

  20. SWANS: A Prototypic SCALE Criticality Sequence for Automated Optimization Using the SWAN Methodology

    Energy Technology Data Exchange (ETDEWEB)

    Greenspan, E.

    2001-01-11

    SWANS is a new prototypic analysis sequence that provides an intelligent, semi-automatic search for the maximum k{sub eff} of a given amount of specified fissile material, or of the minimum critical mass. It combines the optimization strategy of the SWAN code with the composition-dependent resonance self-shielded cross sections of the SCALE package. For a given system composition arrived at during the iterative optimization process, the value of k{sub eff} is as accurate and reliable as obtained using the CSAS1X Sequence of SCALE-4.4. This report describes how SWAN is integrated within the SCALE system to form the new prototypic optimization sequence, describes the optimization procedure, provides a user guide for SWANS, and illustrates its application to five different types of problems. In addition, the report illustrates that resonance self-shielding might have a significant effect on the maximum k{sub eff} value a given fissile material mass can have.

  1. SPARSE: quadratic time simultaneous alignment and folding of RNAs without sequence-based heuristics

    Science.gov (United States)

    Will, Sebastian; Otto, Christina; Miladi, Milad; Möhl, Mathias; Backofen, Rolf

    2015-01-01

    Motivation: RNA-Seq experiments have revealed a multitude of novel ncRNAs. The gold standard for their analysis based on simultaneous alignment and folding suffers from extreme time complexity of O(n6). Subsequently, numerous faster ‘Sankoff-style’ approaches have been suggested. Commonly, the performance of such methods relies on sequence-based heuristics that restrict the search space to optimal or near-optimal sequence alignments; however, the accuracy of sequence-based methods breaks down for RNAs with sequence identities below 60%. Alignment approaches like LocARNA that do not require sequence-based heuristics, have been limited to high complexity (≥ quartic time). Results: Breaking this barrier, we introduce the novel Sankoff-style algorithm ‘sparsified prediction and alignment of RNAs based on their structure ensembles (SPARSE)’, which runs in quadratic time without sequence-based heuristics. To achieve this low complexity, on par with sequence alignment algorithms, SPARSE features strong sparsification based on structural properties of the RNA ensembles. Following PMcomp, SPARSE gains further speed-up from lightweight energy computation. Although all existing lightweight Sankoff-style methods restrict Sankoff’s original model by disallowing loop deletions and insertions, SPARSE transfers the Sankoff algorithm to the lightweight energy model completely for the first time. Compared with LocARNA, SPARSE achieves similar alignment and better folding quality in significantly less time (speedup: 3.7). At similar run-time, it aligns low sequence identity instances substantially more accurate than RAF, which uses sequence-based heuristics. Availability and implementation: SPARSE is freely available at http://www.bioinf.uni-freiburg.de/Software/SPARSE. Contact: backofen@informatik.uni-freiburg.de Supplementary information: Supplementary data are available at Bioinformatics online. PMID:25838465

  2. Molecular analysis of the metabolic rates of discrete subsurface populations of sulfate reducers

    Energy Technology Data Exchange (ETDEWEB)

    Miletto, M.; Williams, K.H.; N' Guessan, A.L.; Lovley, D.R.

    2011-04-01

    Elucidating the in situ metabolic activity of phylogenetically diverse populations of sulfate-reducing microorganisms that populate anoxic sedimentary environments is key to understanding subsurface ecology. Previous pure culture studies have demonstrated that transcript abundance of dissimilatory (bi)sulfite reductase genes is correlated with the sulfate reducing activity of individual cells. To evaluate whether expression of these genes was diagnostic for subsurface communities, dissimilatory (bi)sulfite reductase gene transcript abundance in phylogenetically distinct sulfate-reducing populations was quantified during a field experiment in which acetate was added to uranium-contaminated groundwater. Analysis of dsrAB sequences prior to the addition of acetate indicated that Desulfobacteraceae, Desulfobulbaceae, and Syntrophaceae-related sulfate reducers were the most abundant. Quantifying dsrB transcripts of the individual populations suggested that Desulfobacteraceae initially had higher dsrB transcripts per cell than Desulfobulbaceae or Syntrophaceae populations, and that the activity of Desulfobacteraceae increased further when the metabolism of dissimilatory metal reducers competing for the added acetate declined. In contrast, dsrB transcript abundance in Desulfobulbaceae and Syntrophaceae remained relatively constant, suggesting a lack of stimulation by added acetate. The indication of higher sulfate-reducing activity in the Desulfobacteraceae was consistent with the finding that Desulfobacteraceae became the predominant component of the sulfate-reducing community. Discontinuing acetate additions resulted in a decline in dsrB transcript abundance in the Desulfobacteraceae. These results suggest that monitoring transcripts of dissimilatory (bi)sulfite reductase genes in distinct populations of sulfate reducers can provide insight into the relative rates of metabolism of different components of the sulfate-reducing community and their ability to respond to

  3. Selective 2'-hydroxyl acylation analyzed by primer extension and mutational profiling (SHAPE-MaP) for direct, versatile and accurate RNA structure analysis.

    Science.gov (United States)

    Smola, Matthew J; Rice, Greggory M; Busan, Steven; Siegfried, Nathan A; Weeks, Kevin M

    2015-11-01

    Selective 2'-hydroxyl acylation analyzed by primer extension (SHAPE) chemistries exploit small electrophilic reagents that react with 2'-hydroxyl groups to interrogate RNA structure at single-nucleotide resolution. Mutational profiling (MaP) identifies modified residues by using reverse transcriptase to misread a SHAPE-modified nucleotide and then counting the resulting mutations by massively parallel sequencing. The SHAPE-MaP approach measures the structure of large and transcriptome-wide systems as accurately as can be done for simple model RNAs. This protocol describes the experimental steps, implemented over 3 d, that are required to perform SHAPE probing and to construct multiplexed SHAPE-MaP libraries suitable for deep sequencing. Automated processing of MaP sequencing data is accomplished using two software packages. ShapeMapper converts raw sequencing files into mutational profiles, creates SHAPE reactivity plots and provides useful troubleshooting information. SuperFold uses these data to model RNA secondary structures, identify regions with well-defined structures and visualize probable and alternative helices, often in under 1 d. SHAPE-MaP can be used to make nucleotide-resolution biophysical measurements of individual RNA motifs, rare components of complex RNA ensembles and entire transcriptomes.

  4. "Polymeromics": Mass spectrometry based strategies in polymer science toward complete sequencing approaches: a review.

    Science.gov (United States)

    Altuntaş, Esra; Schubert, Ulrich S

    2014-01-15

    Mass spectrometry (MS) is the most versatile and comprehensive method in "OMICS" sciences (i.e. in proteomics, genomics, metabolomics and lipidomics). The applications of MS and tandem MS (MS/MS or MS(n)) provide sequence information of the full complement of biological samples in order to understand the importance of the sequences on their precise and specific functions. Nowadays, the control of polymer sequences and their accurate characterization is one of the significant challenges of current polymer science. Therefore, a similar approach can be very beneficial for characterizing and understanding the complex structures of synthetic macromolecules. MS-based strategies allow a relatively precise examination of polymeric structures (e.g. their molar mass distributions, monomer units, side chain substituents, end-group functionalities, and copolymer compositions). Moreover, tandem MS offer accurate structural information from intricate macromolecular structures; however, it produces vast amount of data to interpret. In "OMICS" sciences, the software application to interpret the obtained data has developed satisfyingly (e.g. in proteomics), because it is not possible to handle the amount of data acquired via (tandem) MS studies on the biological samples manually. It can be expected that special software tools will improve the interpretation of (tandem) MS output from the investigations of synthetic polymers as well. Eventually, the MS/MS field will also open up for polymer scientists who are not MS-specialists. In this review, we dissect the overall framework of the MS and MS/MS analysis of synthetic polymers into its key components. We discuss the fundamentals of polymer analyses as well as recent advances in the areas of tandem mass spectrometry, software developments, and the overall future perspectives on the way to polymer sequencing, one of the last Holy Grail in polymer science. Copyright © 2013 Elsevier B.V. All rights reserved.

  5. Epigenetic Silencing of the Protocadherin Family Member PCDH-γ-All in Astrocytomas

    Directory of Open Access Journals (Sweden)

    Anke Waha

    2005-03-01

    Full Text Available In a microarray-based methylation analysis of astrocytomas [World Health Organization (WHO grade II], we identified a CpG island within the first exon of the protocadherin-γ subfamily A11 (PCDH-γ-A11 gene that showed hypermethylation compared to normal brain tissue. Bisulfite sequencing and combined bisulfite restriction analysis (COBRA was performed to screen low- and high-grade astrocytomas for the methylation status of this CpG island. Hypermethylation was detected in 30 of 34 (88% astrocytomas (WHO grades II and III, 20 of 23 (87% glioblastomas (WHO grade IV, 8 of 8 (100% glioma cell lines. There was a highly significant correlation (P = .00028 between PCDH-γ-A11 hypermethylation and decreased transcription as determined by competitive reverse transcription polymerase chain reaction in WHO grades II and III astrocytomas. After treatment of glioma cell lines with a demethylating agent, transcription of PCDH-γ-A11 was restored. In summary, we have identified PCDH-γ-A11 as a new target silenced epigenetically in astrocytic gliomas. The inactivation of this cell-cell contact molecule might be involved in the invasive growth of astrocytoma cells into normal brain parenchyma.

  6. Ribosomal DNA sequence heterogeneity reflects intraspecies phylogenies and predicts genome structure in two contrasting yeast species.

    Science.gov (United States)

    West, Claire; James, Stephen A; Davey, Robert P; Dicks, Jo; Roberts, Ian N

    2014-07-01

    The ribosomal RNA encapsulates a wealth of evolutionary information, including genetic variation that can be used to discriminate between organisms at a wide range of taxonomic levels. For example, the prokaryotic 16S rDNA sequence is very widely used both in phylogenetic studies and as a marker in metagenomic surveys and the internal transcribed spacer region, frequently used in plant phylogenetics, is now recognized as a fungal DNA barcode. However, this widespread use does not escape criticism, principally due to issues such as difficulties in classification of paralogous versus orthologous rDNA units and intragenomic variation, both of which may be significant barriers to accurate phylogenetic inference. We recently analyzed data sets from the Saccharomyces Genome Resequencing Project, characterizing rDNA sequence variation within multiple strains of the baker's yeast Saccharomyces cerevisiae and its nearest wild relative Saccharomyces paradoxus in unprecedented detail. Notably, both species possess single locus rDNA systems. Here, we use these new variation datasets to assess whether a more detailed characterization of the rDNA locus can alleviate the second of these phylogenetic issues, sequence heterogeneity, while controlling for the first. We demonstrate that a strong phylogenetic signal exists within both datasets and illustrate how they can be used, with existing methodology, to estimate intraspecies phylogenies of yeast strains consistent with those derived from whole-genome approaches. We also describe the use of partial Single Nucleotide Polymorphisms, a type of sequence variation found only in repetitive genomic regions, in identifying key evolutionary features such as genome hybridization events and show their consistency with whole-genome Structure analyses. We conclude that our approach can transform rDNA sequence heterogeneity from a problem to a useful source of evolutionary information, enabling the estimation of highly accurate phylogenies of

  7. Neural correlates of skill acquisition: decreased cortical activity during a serial interception sequence learning task.

    Science.gov (United States)

    Gobel, Eric W; Parrish, Todd B; Reber, Paul J

    2011-10-15

    Learning of complex motor skills requires learning of component movements as well as the sequential structure of their order and timing. Using a Serial Interception Sequence Learning (SISL) task, participants learned a sequence of precisely timed interception responses through training with a repeating sequence. Following initial implicit learning of the repeating sequence, functional MRI data were collected during performance of that known sequence and compared with activity evoked during novel sequences of actions, novel timing patterns, or both. Reduced activity was observed during the practiced sequence in a distributed bilateral network including extrastriate occipital, parietal, and premotor cortical regions. These reductions in evoked activity likely reflect improved efficiency in visuospatial processing, spatio-motor integration, motor planning, and motor execution for the trained sequence, which is likely supported by nondeclarative skill learning. In addition, the practiced sequence evoked increased activity in the left ventral striatum and medial prefrontal cortex, while the posterior cingulate was more active during periods of better performance. Many prior studies of perceptual-motor skill learning have found increased activity in motor areas of the frontal cortex (e.g., motor and premotor cortex, SMA) and striatal areas (e.g., the putamen). The change in activity observed here (i.e., decreased activity across a cortical network) may reflect skill learning that is predominantly expressed through more accurate performance rather than decreased reaction time. Copyright © 2011 Elsevier Inc. All rights reserved.

  8. STELLAR LOCUS REGRESSION: ACCURATE COLOR CALIBRATION AND THE REAL-TIME DETERMINATION OF GALAXY CLUSTER PHOTOMETRIC REDSHIFTS

    International Nuclear Information System (INIS)

    High, F. William; Stubbs, Christopher W.; Rest, Armin; Stalder, Brian; Challis, Peter

    2009-01-01

    We present stellar locus regression (SLR), a method of directly adjusting the instrumental broadband optical colors of stars to bring them into accord with a universal stellar color-color locus, producing accurately calibrated colors for both stars and galaxies. This is achieved without first establishing individual zero points for each passband, and can be performed in real-time at the telescope. We demonstrate how SLR naturally makes one wholesale correction for differences in instrumental response, for atmospheric transparency, for atmospheric extinction, and for Galactic extinction. We perform an example SLR treatment of Sloan Digital Sky Survey data over a wide range of Galactic dust values and independently recover the direction and magnitude of the canonical Galactic reddening vector with 14-18 mmag rms uncertainties. We then isolate the effect of atmospheric extinction, showing that SLR accounts for this and returns precise colors over a wide range of air mass, with 5-14 mmag rms residuals. We demonstrate that SLR-corrected colors are sufficiently accurate to allow photometric redshift estimates for galaxy clusters (using red sequence galaxies) with an uncertainty σ(z)/(1 + z) = 0.6% per cluster for redshifts 0.09 < z < 0.25. Finally, we identify our objects in the 2MASS all-sky catalog, and produce i-band zero points typically accurate to 18 mmag using only SLR. We offer open-source access to our IDL routines, validated and verified for the implementation of this technique, at http://stellar-locus-regression.googlecode.com.

  9. Comparison of illumina and 454 deep sequencing in participants failing raltegravir-based antiretroviral therapy.

    Directory of Open Access Journals (Sweden)

    Jonathan Z Li

    Full Text Available The impact of raltegravir-resistant HIV-1 minority variants (MVs on raltegravir treatment failure is unknown. Illumina sequencing offers greater throughput than 454, but sequence analysis tools for viral sequencing are needed. We evaluated Illumina and 454 for the detection of HIV-1 raltegravir-resistant MVs.A5262 was a single-arm study of raltegravir and darunavir/ritonavir in treatment-naïve patients. Pre-treatment plasma was obtained from 5 participants with raltegravir resistance at the time of virologic failure. A control library was created by pooling integrase clones at predefined proportions. Multiplexed sequencing was performed with Illumina and 454 platforms at comparable costs. Illumina sequence analysis was performed with the novel snp-assess tool and 454 sequencing was analyzed with V-Phaser.Illumina sequencing resulted in significantly higher sequence coverage and a 0.095% limit of detection. Illumina accurately detected all MVs in the control library at ≥0.5% and 7/10 MVs expected at 0.1%. 454 sequencing failed to detect any MVs at 0.1% with 5 false positive calls. For MVs detected in the patient samples by both 454 and Illumina, the correlation in the detected variant frequencies was high (R2 = 0.92, P<0.001. Illumina sequencing detected 2.4-fold greater nucleotide MVs and 2.9-fold greater amino acid MVs compared to 454. The only raltegravir-resistant MV detected was an E138K mutation in one participant by Illumina sequencing, but not by 454.In participants of A5262 with raltegravir resistance at virologic failure, baseline raltegravir-resistant MVs were rarely detected. At comparable costs to 454 sequencing, Illumina demonstrated greater depth of coverage, increased sensitivity for detecting HIV MVs, and fewer false positive variant calls.

  10. Accurate, high-throughput typing of copy number variation using paralogue ratios from dispersed repeats.

    Science.gov (United States)

    Armour, John A L; Palla, Raquel; Zeeuwen, Patrick L J M; den Heijer, Martin; Schalkwijk, Joost; Hollox, Edward J

    2007-01-01

    Recent work has demonstrated an unexpected prevalence of copy number variation in the human genome, and has highlighted the part this variation may play in predisposition to common phenotypes. Some important genes vary in number over a high range (e.g. DEFB4, which commonly varies between two and seven copies), and have posed formidable technical challenges for accurate copy number typing, so that there are no simple, cheap, high-throughput approaches suitable for large-scale screening. We have developed a simple comparative PCR method based on dispersed repeat sequences, using a single pair of precisely designed primers to amplify products simultaneously from both test and reference loci, which are subsequently distinguished and quantified via internal sequence differences. We have validated the method for the measurement of copy number at DEFB4 by comparison of results from >800 DNA samples with copy number measurements by MAPH/REDVR, MLPA and array-CGH. The new Paralogue Ratio Test (PRT) method can require as little as 10 ng genomic DNA, appears to be comparable in accuracy to the other methods, and for the first time provides a rapid, simple and inexpensive method for copy number analysis, suitable for application to typing thousands of samples in large case-control association studies.

  11. High depth, whole-genome sequencing of cholera isolates from Haiti and the Dominican Republic.

    Science.gov (United States)

    Sealfon, Rachel; Gire, Stephen; Ellis, Crystal; Calderwood, Stephen; Qadri, Firdausi; Hensley, Lisa; Kellis, Manolis; Ryan, Edward T; LaRocque, Regina C; Harris, Jason B; Sabeti, Pardis C

    2012-09-11

    Whole-genome sequencing is an important tool for understanding microbial evolution and identifying the emergence of functionally important variants over the course of epidemics. In October 2010, a severe cholera epidemic began in Haiti, with additional cases identified in the neighboring Dominican Republic. We used whole-genome approaches to sequence four Vibrio cholerae isolates from Haiti and the Dominican Republic and three additional V. cholerae isolates to a high depth of coverage (>2000x); four of the seven isolates were previously sequenced. Using these sequence data, we examined the effect of depth of coverage and sequencing platform on genome assembly and identification of sequence variants. We found that 50x coverage is sufficient to construct a whole-genome assembly and to accurately call most variants from 100 base pair paired-end sequencing reads. Phylogenetic analysis between the newly sequenced and thirty-three previously sequenced V. cholerae isolates indicates that the Haitian and Dominican Republic isolates are closest to strains from South Asia. The Haitian and Dominican Republic isolates form a tight cluster, with only four variants unique to individual isolates. These variants are located in the CTX region, the SXT region, and the core genome. Of the 126 mutations identified that separate the Haiti-Dominican Republic cluster from the V. cholerae reference strain (N16961), 73 are non-synonymous changes, and a number of these changes cluster in specific genes and pathways. Sequence variant analyses of V. cholerae isolates, including multiple isolates from the Haitian outbreak, identify coverage-specific and technology-specific effects on variant detection, and provide insight into genomic change and functional evolution during an epidemic.

  12. High depth, whole-genome sequencing of cholera isolates from Haiti and the Dominican Republic

    Directory of Open Access Journals (Sweden)

    Sealfon Rachel

    2012-09-01

    Full Text Available Abstract Background Whole-genome sequencing is an important tool for understanding microbial evolution and identifying the emergence of functionally important variants over the course of epidemics. In October 2010, a severe cholera epidemic began in Haiti, with additional cases identified in the neighboring Dominican Republic. We used whole-genome approaches to sequence four Vibrio cholerae isolates from Haiti and the Dominican Republic and three additional V. cholerae isolates to a high depth of coverage (>2000x; four of the seven isolates were previously sequenced. Results Using these sequence data, we examined the effect of depth of coverage and sequencing platform on genome assembly and identification of sequence variants. We found that 50x coverage is sufficient to construct a whole-genome assembly and to accurately call most variants from 100 base pair paired-end sequencing reads. Phylogenetic analysis between the newly sequenced and thirty-three previously sequenced V. cholerae isolates indicates that the Haitian and Dominican Republic isolates are closest to strains from South Asia. The Haitian and Dominican Republic isolates form a tight cluster, with only four variants unique to individual isolates. These variants are located in the CTX region, the SXT region, and the core genome. Of the 126 mutations identified that separate the Haiti-Dominican Republic cluster from the V. cholerae reference strain (N16961, 73 are non-synonymous changes, and a number of these changes cluster in specific genes and pathways. Conclusions Sequence variant analyses of V. cholerae isolates, including multiple isolates from the Haitian outbreak, identify coverage-specific and technology-specific effects on variant detection, and provide insight into genomic change and functional evolution during an epidemic.

  13. The Use of Artificial Neural Networks in Prediction of Congenital CMV Outcome from Sequence Data

    Directory of Open Access Journals (Sweden)

    Ravit Arav-Boger

    2008-01-01

    Full Text Available A large number of CMV strains has been reported to circulate in the human population, and the biological significance of these strains is currently an active area of research. The analysis of complex genetic information may be limited using conventional phylogenetic techniques. We constructed artificial neural networks to determine their feasibility in predicting the outcome of congenital CMV disease (defined as presence of CMV symptoms at birth based on two data sets: 54 sequences of CMV gene UL144 obtained from 54 amniotic fluids of women who contracted acute CMV infection during their pregnancy, and 80 sequences of 4 genes (US28, UL144, UL146 and UL147 obtained from urine, saliva or blood of 20 congenitally infected infants that displayed different outcomes at birth. When data from all four genes was used in the 20-infants’ set, the artificial neural network model accurately identified outcome in 90% of cases. While US28 and UL147 had low yield in predicting outcome, UL144 and UL146 predicted outcome in 80% and 85% respectively when used separately. The model identified specific nucleotide positions that were highly relevant to prediction of outcome. The artificial neural network classified genotypes in agreement with classic phylogenetic analysis. We suggest that artificial neural networks can accurately and efficiently analyze sequences obtained from larger cohorts to determine specific outcomes.The ANN training and analysis code is commercially available from Optimal Neural Informatics (Pikesville, MD.

  14. Quantum-Sequencing: Fast electronic single DNA molecule sequencing

    Science.gov (United States)

    Casamada Ribot, Josep; Chatterjee, Anushree; Nagpal, Prashant

    2014-03-01

    A major goal of third-generation sequencing technologies is to develop a fast, reliable, enzyme-free, high-throughput and cost-effective, single-molecule sequencing method. Here, we present the first demonstration of unique ``electronic fingerprint'' of all nucleotides (A, G, T, C), with single-molecule DNA sequencing, using Quantum-tunneling Sequencing (Q-Seq) at room temperature. We show that the electronic state of the nucleobases shift depending on the pH, with most distinct states identified at acidic pH. We also demonstrate identification of single nucleotide modifications (methylation here). Using these unique electronic fingerprints (or tunneling data), we report a partial sequence of beta lactamase (bla) gene, which encodes resistance to beta-lactam antibiotics, with over 95% success rate. These results highlight the potential of Q-Seq as a robust technique for next-generation sequencing.

  15. Multimodal sequence learning.

    Science.gov (United States)

    Kemény, Ferenc; Meier, Beat

    2016-02-01

    While sequence learning research models complex phenomena, previous studies have mostly focused on unimodal sequences. The goal of the current experiment is to put implicit sequence learning into a multimodal context: to test whether it can operate across different modalities. We used the Task Sequence Learning paradigm to test whether sequence learning varies across modalities, and whether participants are able to learn multimodal sequences. Our results show that implicit sequence learning is very similar regardless of the source modality. However, the presence of correlated task and response sequences was required for learning to take place. The experiment provides new evidence for implicit sequence learning of abstract conceptual representations. In general, the results suggest that correlated sequences are necessary for implicit sequence learning to occur. Moreover, they show that elements from different modalities can be automatically integrated into one unitary multimodal sequence. Copyright © 2015 Elsevier B.V. All rights reserved.

  16. Hyaline articular cartilage: relaxation times, pulse-sequence parameters and MR appearance at 1.5 T

    Energy Technology Data Exchange (ETDEWEB)

    Chalkias, S.M. [Dept. of Radiology, A.H.E.P.A. General Hospital of the Aristotelian Univ., Thessaloniki (Greece); Pozzi-Mucelli, R.S. [Dept. of Radiology, Univ. of Trieste (Italy); Pozzi-Mucelli, M. [Orthopaedic Clinic, Univ. of Trieste (Italy); Frezza, F. [Dept. of Radiology, Univ. of Trieste (Italy); Longo, R. [Dept. of Radiology, Univ. of Trieste (Italy)

    1994-08-01

    In order to optimize the parameters for the best visualization of the internal architecture of the hyaline articular cartilage a study both ex vivo and in vivo was performed. Accurate T1 and T2 relaxation times of articular cartilage were obtained with a particular mixed sequence and then used for the creation of isocontrast intensity graphs. These graphs subsequently allowed in all pulse sequences (spin echo, SE and gradient echo, GRE) the best combination of repetition time (TR), echo time (TE) and flip angle (FA) for optimization of signal differences between MR cartilage zones. For SE sequences maximum contrast between cartilage zones can be obtained by using a long TR (> 1,500 ms) with a short TE (< 30 ms), whereas for GRE sequences maximum contrast is obtained with the shortest TE (< 15 ms) combined with a relatively long TR (> 400 ms) and an FA greater than 40 . A trilaminar appearance was demonstrated with a superficial and deep hypointense zone in all sequences and an intermediate zone that was moderately hyperintense on SE T1-weighted images, slightly more hyperintense on proton density Rho and SE T2-weighted images and even more hyperintense on GRE images. (orig.)

  17. Implementation of Whole Genome Sequencing (WGS for Identification and Characterization of Shiga Toxin-Producing Escherichia coli (STEC in the United States

    Directory of Open Access Journals (Sweden)

    Rebecca L Lindsey

    2016-05-01

    Full Text Available Shiga toxin-producing Escherichia coli (STEC is an important foodborne pathogen capable of causing severe disease in humans. Rapid and accurate identification and characterization techniques are essential during outbreak investigations. Current methods for characterization of STEC are expensive and time-consuming. With the advent of rapid and cheap whole genome sequencing (WGS benchtop sequencers, the potential exists to replace traditional workflows with WGS. The aim of this study was to validate tools to do reference identification and characterization from WGS for STEC in a single workflow within an easy to use commercially available software platform. Publically available serotype, virulence, and antimicrobial resistance databases were downloaded from the Center for Genomic Epidemiology (CGE (www.genomicepidemiology.org and integrated into a genotyping plug-in with in silico PCR tools to confirm some of the virulence genes detected from WGS data. Additionally, down sampling experiments on the WGS sequence data were performed to determine a threshold for sequence coverage needed to accurately predict serotype and virulence genes using the established workflow. The serotype database was tested on a total of 228 genomes and correctly predicted from WGS for 96.1% of O serogroups and 96.5% of H serogroups identified by conventional testing techniques. A total of 59 genomes were evaluated to determine the threshold of coverage to detect the different WGS targets, 40 were evaluated for serotype and virulence gene detection and 19 for the stx gene subtypes. For serotype, 95% of the O and 100% of the H serogroups were detected at > 40x and ≥ 30x coverage, respectively. For virulence targets and stx gene subtypes, nearly all genes were detected at > 40x, though some targets were 100% detectable from genomes with coverage ≥20x. The resistance detection tool was 97% concordant with phenotypic testing results. With isolates sequenced to > 40x

  18. Implementation of Whole Genome Sequencing (WGS) for Identification and Characterization of Shiga Toxin-Producing Escherichia coli (STEC) in the United States

    Science.gov (United States)

    Lindsey, Rebecca L.; Pouseele, Hannes; Chen, Jessica C.; Strockbine, Nancy A.; Carleton, Heather A.

    2016-01-01

    Shiga toxin-producing Escherichia coli (STEC) is an important foodborne pathogen capable of causing severe disease in humans. Rapid and accurate identification and characterization techniques are essential during outbreak investigations. Current methods for characterization of STEC are expensive and time-consuming. With the advent of rapid and cheap whole genome sequencing (WGS) benchtop sequencers, the potential exists to replace traditional workflows with WGS. The aim of this study was to validate tools to do reference identification and characterization from WGS for STEC in a single workflow within an easy to use commercially available software platform. Publically available serotype, virulence, and antimicrobial resistance databases were downloaded from the Center for Genomic Epidemiology (CGE) (www.genomicepidemiology.org) and integrated into a genotyping plug-in with in silico PCR tools to confirm some of the virulence genes detected from WGS data. Additionally, down sampling experiments on the WGS sequence data were performed to determine a threshold for sequence coverage needed to accurately predict serotype and virulence genes using the established workflow. The serotype database was tested on a total of 228 genomes and correctly predicted from WGS for 96.1% of O serogroups and 96.5% of H serogroups identified by conventional testing techniques. A total of 59 genomes were evaluated to determine the threshold of coverage to detect the different WGS targets, 40 were evaluated for serotype and virulence gene detection and 19 for the stx gene subtypes. For serotype, 95% of the O and 100% of the H serogroups were detected at > 40x and ≥ 30x coverage, respectively. For virulence targets and stx gene subtypes, nearly all genes were detected at > 40x, though some targets were 100% detectable from genomes with coverage ≥20x. The resistance detection tool was 97% concordant with phenotypic testing results. With isolates sequenced to > 40x coverage, the different

  19. Toward an Integrated BAC Library Resource for Genome Sequencing and Analysis; FINAL

    International Nuclear Information System (INIS)

    Simon, M. I.; Kim, U.-J.

    2002-01-01

    We developed a great deal of expertise in building large BAC libraries from a variety of DNA sources including humans, mice, corn, microorganisms, worms, and Arabidopsis. We greatly improved the technology for screening these libraries rapidly and for selecting appropriate BACs and mapping BACs to develop large overlapping contigs. We became involved in supplying BACs and BAC contigs to a variety of sequencing and mapping projects and we began to collaborate with Drs. Adams and Venter at TIGR and with Dr. Leroy Hood and his group at University of Washington to provide BACs for end sequencing and for mapping and sequencing of large fragments of chromosome 16. Together with Dr. Ian Dunham and his co-workers at the Sanger Center we completed the mapping and they completed the sequencing of the first human chromosome, chromosome 22. This was published in Nature in 1999 and our BAC contigs made a major contribution to this sequencing effort. Drs. Shizuya and Ding invented an automated highly accurate BAC mapping technique. We also developed long-term collaborations with Dr. Uli Weier at UCSF in the design of BAC probes for characterization of human tumors and specific chromosome deletions and breakpoints. Finally the contribution of our work to the human genome project has been recognized in the publication both by the international consortium and the NIH of a draft sequence of the human genome in Nature last year. Dr. Shizuya was acknowledged in the authorship of that landmark paper. Dr. Simon was also an author on the Venter/Adams Celera project sequencing the human genome that was published in Science last year

  20. Aberrant methylation of the M-type phospholipase A2 receptor gene in leukemic cells

    International Nuclear Information System (INIS)

    Menschikowski, Mario; Platzbecker, Uwe; Hagelgans, Albert; Vogel, Margot; Thiede, Christian; Schönefeldt, Claudia; Lehnert, Renate; Eisenhofer, Graeme; Siegert, Gabriele

    2012-01-01

    The M-type phospholipase A2 receptor (PLA2R1) plays a crucial role in several signaling pathways and may act as tumor-suppressor. This study examined the expression and methylation of the PLA2R1 gene in Jurkat and U937 leukemic cell lines and its methylation in patients with myelodysplastic syndrome (MDS) or acute leukemia. Sites of methylation of the PLA2R1 locus were identified by sequencing bisulfite-modified DNA fragments. Methylation specific-high resolution melting (MS-HRM) analysis was then carried out to quantify PLA2R1 methylation at 5-CpG sites identified with differences in methylation between healthy control subjects and leukemic patients using sequencing of bisulfite-modified genomic DNA. Expression of PLA2R1 was found to be completely down-regulated in Jurkat and U937 cells, accompanied by complete methylation of PLA2R1 promoter and down-stream regions; PLA2R1 was re-expressed after exposure of cells to 5-aza-2´-deoxycytidine. MS-HRM analysis of the PLA2R1 locus in patients with different types of leukemia indicated an average methylation of 28.9% ± 17.8%, compared to less than 9% in control subjects. In MDS patients the extent of PLA2R1 methylation significantly increased with disease risk. Furthermore, measurements of PLA2R1 methylation appeared useful for predicting responsiveness to the methyltransferase inhibitor, azacitidine, as a pre-emptive treatment to avoid hematological relapse in patients with high-risk MDS or acute myeloid leukemia. The study shows for the first time that PLA2R1 gene sequences are a target of hypermethylation in leukemia, which may have pathophysiological relevance for disease evolution in MDS and leukemogenesis

  1. DNA methylation results depend on DNA integrity – role of post mortem interval

    Directory of Open Access Journals (Sweden)

    Mathias eRhein

    2015-05-01

    Full Text Available Major questions of neurological and psychiatric mechanisms involve the brain functions on a molecular level and cannot be easily addressed due to limitations in access to tissue samples. Post mortem studies are able to partly bridge the gap between brain tissue research retrieved from animal trials and the information derived from peripheral analysis (e.g. measurements in blood cells in patients. Here, we wanted to know how fast DNA degradation is progressing under controlled conditions in order to define thresholds for tissue quality to be used in respective trials. Our focus was on the applicability of partly degraded samples for bisulfite sequencing and the determination of simple means to define cut-off values.After opening the brain cavity, we kept two consecutive pig skulls at ambient temperature (19-21°C and removed cortex tissue up to a post mortem interval (PMI of 120h. We calculated the percentage of degradation on DNA gel electrophoresis of brain DNA to estimate quality and relate this estimation spectrum to the quality of human post-mortem control samples. Functional DNA quality was investigated by bisulfite sequencing of two functionally relevant genes for either the serotonin receptor 5 (SLC6A4 or aldehyde dehydrogenase 2 (ALDH2.Testing our approach in a heterogeneous collective of human blood and brain samples, we demonstrate integrity of measurement quality below the threshold of 72h PMI.While sequencing technically worked for all timepoints irrespective of conceivable DNA degradation, there is a good correlation between variance of methylation to degradation levels documented in the gel (R2=0.4311, p=0.0392 for advancing post mortem intervals (PMI. This otherwise elusive phenomenon is an important prerequisite for the interpretation and evaluation of samples prior to in-depth processing via an affordable and easy assay to estimate identical sample quality and thereby comparable methylation measurements.

  2. Genotypic tropism testing by massively parallel sequencing: qualitative and quantitative analysis

    Directory of Open Access Journals (Sweden)

    Thiele Bernhard

    2011-05-01

    Full Text Available Abstract Background Inferring viral tropism from genotype is a fast and inexpensive alternative to phenotypic testing. While being highly predictive when performed on clonal samples, sensitivity of predicting CXCR4-using (X4 variants drops substantially in clinical isolates. This is mainly attributed to minor variants not detected by standard bulk-sequencing. Massively parallel sequencing (MPS detects single clones thereby being much more sensitive. Using this technology we wanted to improve genotypic prediction of coreceptor usage. Methods Plasma samples from 55 antiretroviral-treated patients tested for coreceptor usage with the Monogram Trofile Assay were sequenced with standard population-based approaches. Fourteen of these samples were selected for further analysis with MPS. Tropism was predicted from each sequence with geno2pheno[coreceptor]. Results Prediction based on bulk-sequencing yielded 59.1% sensitivity and 90.9% specificity compared to the trofile assay. With MPS, 7600 reads were generated on average per isolate. Minorities of sequences with high confidence in CXCR4-usage were found in all samples, irrespective of phenotype. When using the default false-positive-rate of geno2pheno[coreceptor] (10%, and defining a minority cutoff of 5%, the results were concordant in all but one isolate. Conclusions The combination of MPS and coreceptor usage prediction results in a fast and accurate alternative to phenotypic assays. The detection of X4-viruses in all isolates suggests that coreceptor usage as well as fitness of minorities is important for therapy outcome. The high sensitivity of this technology in combination with a quantitative description of the viral population may allow implementing meaningful cutoffs for predicting response to CCR5-antagonists in the presence of X4-minorities.

  3. Genotypic tropism testing by massively parallel sequencing: qualitative and quantitative analysis.

    Science.gov (United States)

    Däumer, Martin; Kaiser, Rolf; Klein, Rolf; Lengauer, Thomas; Thiele, Bernhard; Thielen, Alexander

    2011-05-13

    Inferring viral tropism from genotype is a fast and inexpensive alternative to phenotypic testing. While being highly predictive when performed on clonal samples, sensitivity of predicting CXCR4-using (X4) variants drops substantially in clinical isolates. This is mainly attributed to minor variants not detected by standard bulk-sequencing. Massively parallel sequencing (MPS) detects single clones thereby being much more sensitive. Using this technology we wanted to improve genotypic prediction of coreceptor usage. Plasma samples from 55 antiretroviral-treated patients tested for coreceptor usage with the Monogram Trofile Assay were sequenced with standard population-based approaches. Fourteen of these samples were selected for further analysis with MPS. Tropism was predicted from each sequence with geno2pheno[coreceptor]. Prediction based on bulk-sequencing yielded 59.1% sensitivity and 90.9% specificity compared to the trofile assay. With MPS, 7600 reads were generated on average per isolate. Minorities of sequences with high confidence in CXCR4-usage were found in all samples, irrespective of phenotype. When using the default false-positive-rate of geno2pheno[coreceptor] (10%), and defining a minority cutoff of 5%, the results were concordant in all but one isolate. The combination of MPS and coreceptor usage prediction results in a fast and accurate alternative to phenotypic assays. The detection of X4-viruses in all isolates suggests that coreceptor usage as well as fitness of minorities is important for therapy outcome. The high sensitivity of this technology in combination with a quantitative description of the viral population may allow implementing meaningful cutoffs for predicting response to CCR5-antagonists in the presence of X4-minorities.

  4. Validation and optimization of the Ion Torrent S5 XL sequencer and Oncomine workflow for BRCA1 and BRCA2 genetic testing.

    Science.gov (United States)

    Shin, Saeam; Kim, Yoonjung; Chul Oh, Seoung; Yu, Nae; Lee, Seung-Tae; Rak Choi, Jong; Lee, Kyung-A

    2017-05-23

    In this study, we validated the analytical performance of BRCA1/2 sequencing using Ion Torrent's new bench-top sequencer with amplicon panel with optimized bioinformatics pipelines. Using 43 samples that were previously validated by Illumina's MiSeq platform and/or by Sanger sequencing/multiplex ligation-dependent probe amplification, we amplified the target with the Oncomine™ BRCA Research Assay and sequenced on Ion Torrent S5 XL (Thermo Fisher Scientific, Waltham, MA, USA). We compared two bioinformatics pipelines for optimal processing of S5 XL sequence data: the Torrent Suite with a plug-in Torrent Variant Caller (Thermo Fisher Scientific), and commercial NextGENe software (Softgenetics, State College, PA, USA). All expected 681 single nucleotide variants, 15 small indels, and three copy number variants were correctly called, except one common variant adjacent to a rare variant on the primer-binding site. The sensitivity, specificity, false positive rate, and accuracy for detection of single nucleotide variant and small indels of S5 XL sequencing were 99.85%, 100%, 0%, and 99.99% for the Torrent Variant Caller and 99.85%, 99.99%, 0.14%, and 99.99% for NextGENe, respectively. The reproducibility of variant calling was 100%, and the precision of variant frequency also showed good performance with coefficients of variation between 0.32 and 5.29%. We obtained highly accurate data through uniform and sufficient coverage depth over all target regions and through optimization of the bioinformatics pipeline. We confirmed that our platform is accurate and practical for diagnostic BRCA1/2 testing in a clinical laboratory.

  5. Genome cluster database. A sequence family analysis platform for Arabidopsis and rice.

    Science.gov (United States)

    Horan, Kevin; Lauricha, Josh; Bailey-Serres, Julia; Raikhel, Natasha; Girke, Thomas

    2005-05-01

    The genome-wide protein sequences from Arabidopsis (Arabidopsis thaliana) and rice (Oryza sativa) spp. japonica were clustered into families using sequence similarity and domain-based clustering. The two fundamentally different methods resulted in separate cluster sets with complementary properties to compensate the limitations for accurate family analysis. Functional names for the identified families were assigned with an efficient computational approach that uses the description of the most common molecular function gene ontology node within each cluster. Subsequently, multiple alignments and phylogenetic trees were calculated for the assembled families. All clustering results and their underlying sequences were organized in the Web-accessible Genome Cluster Database (http://bioinfo.ucr.edu/projects/GCD) with rich interactive and user-friendly sequence family mining tools to facilitate the analysis of any given family of interest for the plant science community. An automated clustering pipeline ensures current information for future updates in the annotations of the two genomes and clustering improvements. The analysis allowed the first systematic identification of family and singlet proteins present in both organisms as well as those restricted to one of them. In addition, the established Web resources for mining these data provide a road map for future studies of the composition and structure of protein families between the two species.

  6. Accurate and rapid modeling of iron-bleomycin-induced DNA damage using tethered duplex oligonucleotides and electrospray ionization ion trap mass spectrometric analysis.

    Science.gov (United States)

    Harsch, A; Marzilli, L A; Bunt, R C; Stubbe, J; Vouros, P

    2000-05-01

    Bleomycin B(2)(BLM) in the presence of iron [Fe(II)] and O(2)catalyzes single-stranded (ss) and double-stranded (ds) cleavage of DNA. Electrospray ionization ion trap mass spectrometry was used to monitor these cleavage processes. Two duplex oligonucleotides containing an ethylene oxide tether between both strands were used in this investigation, allowing facile monitoring of all ss and ds cleavage events. A sequence for site-specific binding and cleavage by Fe-BLM was incorporated into each analyte. One of these core sequences, GTAC, is a known hot-spot for ds cleavage, while the other sequence, GGCC, is a hot-spot for ss cleavage. Incubation of each oligo-nucleotide under anaerobic conditions with Fe(II)-BLM allowed detection of the non-covalent ternary Fe-BLM/oligonucleotide complex in the gas phase. Cleavage studies were then performed utilizing O(2)-activated Fe(II)-BLM. No work-up or separation steps were required and direct MS and MS/MS analyses of the crude reaction mixtures confirmed sequence-specific Fe-BLM-induced cleavage. Comparison of the cleavage patterns for both oligonucleotides revealed sequence-dependent preferences for ss and ds cleavages in accordance with previously established gel electrophoresis analysis of hairpin oligonucleotides. This novel methodology allowed direct, rapid and accurate determination of cleavage profiles of model duplex oligonucleotides after exposure to activated Fe-BLM.

  7. Calculation of T2 relaxation time from ultrafast single shot sequences for differentiation of liver tumors. Comparison of echo-planar, HASTE, and spin-echo sequences

    International Nuclear Information System (INIS)

    Abe, Yasuko; Yamashita, Yasuyuki; Tang, Yi; Namimoto, Tomohiro; Takahashi, Mutsumasa

    2000-01-01

    The purpose of this study was to evaluate the accuracy of T2 calculation from single shot imaging sequences such as echo-planar imaging (EPI) and half-Fourier single shot turbo spin-echo (HASTE) imaging. For the phantom study, we prepared vials containing different concentrations of agarose, copper sulfate, and nickel chloride. The temperature of the phantom was kept at 22 deg C. MR images were obtained with a 1.5-Tesla superconductive magnet. Spin-echo (SE)-type EPI and HASTE sequences with different TEs were obtained for T2 calculation, and the T2 values were compared with those obtained from the Carr-Purcell-Meiborm-Gill (CPMG) sequence. The clinical study group consisted of 30 consecutive patients referred for MR imaging to characterize focal liver lesions. A total of 40 focal liver lesions were evaluated, including 25 primary or metastatic solid masses and 15 non-solid lesions. Single shot SE-type EPI and HASTE were both performed with TEs of 64 and 90 msec. In the phantom study, the T2 values obtained from both single shot sequences showed significant correlations with those from the CPMG sequence (T2 on EPI vs. T2 on CPMG: r=0.98, p<0.01; T2 on HASTE vs. T2 on CPMG: r=0.99, p<0.01). In the clinical study, mean T2 values for liver calculated from EPI (42 msec) were significantly shorter than those calculated from the HASTE sequence (58 msec) (p<0.001). Mean T2 values for solid tumors were 95 msec with HASTE and 72 msec with EPI, and mean T2 values for non-solid lesions were 128 msec with HASTE and 159 msec with EPI. Although mean T2 values between solid and non-solid lesions were significantly different for both EPI and HASTE sequences (p=0.01 for HASTE, p<0.001 for EPI), the overlap of solid and non-solid lesions was less frequent in EPI than in HASTE. With single shot sequences, it is possible to obtain the T2 values that show excellent correlation with the CPMG sequence. Although both HASTE and EPI are useful to calculate T2 values, EPI appears to be more

  8. Sequence exploration reveals information bias among molecular markers used in phylogenetic reconstruction for Colletotrichum species.

    Science.gov (United States)

    Rampersad, Sephra N; Hosein, Fazeeda N; Carrington, Christine Vf

    2014-01-01

    The Colletotrichum gloeosporioides species complex is among the most destructive fungal plant pathogens in the world, however, identification of isolates of quarantine importance to the intra-specific level is confounded by a number of factors that affect phylogenetic reconstruction. Information bias and quality parameters were investigated to determine whether nucleotide sequence alignments and phylogenetic trees accurately reflect the genetic diversity and phylogenetic relatedness of individuals. Sequence exploration of GAPDH, ACT, TUB2 and ITS markers indicated that the query sequences had different patterns of nucleotide substitution but were without evidence of base substitution saturation. Regions of high entropy were much more dispersed in the ACT and GAPDH marker alignments than for the ITS and TUB2 markers. A discernible bimodal gap in the genetic distance frequency histograms was produced for the ACT and GAPDH markers which indicated successful separation of intra- and inter-specific sequences in the data set. Overall, analyses indicated clear differences in the ability of these markers to phylogenetically separate individuals to the intra-specific level which coincided with information bias.

  9. Novel expressed sequence tag- simple sequence repeats (EST ...

    African Journals Online (AJOL)

    Using different bioinformatic criteria, the SUCEST database was used to mine for simple sequence repeat (SSR) markers. Among 42,189 clusters, 1,425 expressed sequence tag- simple sequence repeats (EST-SSRs) were identified in silico. Trinucleotide repeats were the most abundant SSRs detected. Of 212 primer pairs ...

  10. Genome Sequence Databases (Overview): Sequencing and Assembly

    Energy Technology Data Exchange (ETDEWEB)

    Lapidus, Alla L.

    2009-01-01

    From the date its role in heredity was discovered, DNA has been generating interest among scientists from different fields of knowledge: physicists have studied the three dimensional structure of the DNA molecule, biologists tried to decode the secrets of life hidden within these long molecules, and technologists invent and improve methods of DNA analysis. The analysis of the nucleotide sequence of DNA occupies a special place among the methods developed. Thanks to the variety of sequencing technologies available, the process of decoding the sequence of genomic DNA (or whole genome sequencing) has become robust and inexpensive. Meanwhile the assembly of whole genome sequences remains a challenging task. In addition to the need to assemble millions of DNA fragments of different length (from 35 bp (Solexa) to 800 bp (Sanger)), great interest in analysis of microbial communities (metagenomes) of different complexities raises new problems and pushes some new requirements for sequence assembly tools to the forefront. The genome assembly process can be divided into two steps: draft assembly and assembly improvement (finishing). Despite the fact that automatically performed assembly (or draft assembly) is capable of covering up to 98% of the genome, in most cases, it still contains incorrectly assembled reads. The error rate of the consensus sequence produced at this stage is about 1/2000 bp. A finished genome represents the genome assembly of much higher accuracy (with no gaps or incorrectly assembled areas) and quality ({approx}1 error/10,000 bp), validated through a number of computer and laboratory experiments.

  11. Locus Reference Genomic sequences: An improved basis for describing human DNA variants

    KAUST Repository

    Dalgleish, Raymond; Flicek, Paul; Cunningham, Fiona; Astashyn, Alex; Tully, Raymond E; Proctor, Glenn; Chen, Yuan; McLaren, William M; Larsson, Pontus; Vaughan, Brendan W; Bé roud, Christophe; Dobson, Glen; Lehvä slaiho, Heikki; Taschner, Peter EM; den Dunnen, Johan T; Devereau, Andrew; Birney, Ewan; Brookes, Anthony J; Maglott, Donna R

    2010-01-01

    As our knowledge of the complexity of gene architecture grows, and we increase our understanding of the subtleties of gene expression, the process of accurately describing disease-causing gene variants has become increasingly problematic. In part, this is due to current reference DNA sequence formats that do not fully meet present needs. Here we present the Locus Reference Genomic (LRG) sequence format, which has been designed for the specifi c purpose of gene variant reporting. The format builds on the successful National Center for Biotechnology Information (NCBI) RefSeqGene project and provides a single-fi le record containing a uniquely stable reference DNA sequence along with all relevant transcript and protein sequences essential to the description of gene variants. In principle, LRGs can be created for any organism, not just human. In addition, we recognize the need to respect legacy numbering systems for exons and amino acids and the LRG format takes account of these. We hope that widespread adoption of LRGs - which will be created and maintained by the NCBI and the European Bioinformatics Institute (EBI) - along with consistent use of the Human Genome Variation Society (HGVS)- approved variant nomenclature will reduce errors in the reporting of variants in the literature and improve communication about variants aff ecting human health. Further information can be found on the LRG web site (http://www.lrg-sequence.org). 2010 Dalgleish et al.; licensee BioMed Central Ltd.

  12. Locus Reference Genomic sequences: An improved basis for describing human DNA variants

    KAUST Repository

    Dalgleish, Raymond

    2010-04-15

    As our knowledge of the complexity of gene architecture grows, and we increase our understanding of the subtleties of gene expression, the process of accurately describing disease-causing gene variants has become increasingly problematic. In part, this is due to current reference DNA sequence formats that do not fully meet present needs. Here we present the Locus Reference Genomic (LRG) sequence format, which has been designed for the specifi c purpose of gene variant reporting. The format builds on the successful National Center for Biotechnology Information (NCBI) RefSeqGene project and provides a single-fi le record containing a uniquely stable reference DNA sequence along with all relevant transcript and protein sequences essential to the description of gene variants. In principle, LRGs can be created for any organism, not just human. In addition, we recognize the need to respect legacy numbering systems for exons and amino acids and the LRG format takes account of these. We hope that widespread adoption of LRGs - which will be created and maintained by the NCBI and the European Bioinformatics Institute (EBI) - along with consistent use of the Human Genome Variation Society (HGVS)- approved variant nomenclature will reduce errors in the reporting of variants in the literature and improve communication about variants aff ecting human health. Further information can be found on the LRG web site (http://www.lrg-sequence.org). 2010 Dalgleish et al.; licensee BioMed Central Ltd.

  13. Evaluation of Methyl-Binding Domain Based Enrichment Approaches Revisited.

    Directory of Open Access Journals (Sweden)

    Karolina A Aberg

    Full Text Available Methyl-binding domain (MBD enrichment followed by deep sequencing (MBD-seq, is a robust and cost efficient approach for methylome-wide association studies (MWAS. MBD-seq has been demonstrated to be capable of identifying differentially methylated regions, detecting previously reported robust associations and producing findings that replicate with other technologies such as targeted pyrosequencing of bisulfite converted DNA. There are several kits commercially available that can be used for MBD enrichment. Our previous work has involved MethylMiner (Life Technologies, Foster City, CA, USA that we chose after careful investigation of its properties. However, in a recent evaluation of five commercially available MBD-enrichment kits the performance of the MethylMiner was deemed poor. Given our positive experience with MethylMiner, we were surprised by this report. In an attempt to reproduce these findings we here have performed a direct comparison of MethylMiner with MethylCap (Diagenode Inc, Denville, NJ, USA, the best performing kit in that study. We find that both MethylMiner and MethylCap are two well performing MBD-enrichment kits. However, MethylMiner shows somewhat better enrichment efficiency and lower levels of background "noise". In addition, for the purpose of MWAS where we want to investigate the majority of CpGs, we find MethylMiner to be superior as it allows tailoring the enrichment to the regions where most CpGs are located. Using targeted bisulfite sequencing we confirmed that sites where methylation was detected by either MethylMiner or by MethylCap indeed were methylated.

  14. Short sequence motifs, overrepresented in mammalian conservednon-coding sequences

    Energy Technology Data Exchange (ETDEWEB)

    Minovitsky, Simon; Stegmaier, Philip; Kel, Alexander; Kondrashov,Alexey S.; Dubchak, Inna

    2007-02-21

    Background: A substantial fraction of non-coding DNAsequences of multicellular eukaryotes is under selective constraint. Inparticular, ~;5 percent of the human genome consists of conservednon-coding sequences (CNSs). CNSs differ from other genomic sequences intheir nucleotide composition and must play important functional roles,which mostly remain obscure.Results: We investigated relative abundancesof short sequence motifs in all human CNSs present in the human/mousewhole-genome alignments vs. three background sets of sequences: (i)weakly conserved or unconserved non-coding sequences (non-CNSs); (ii)near-promoter sequences (located between nucleotides -500 and -1500,relative to a start of transcription); and (iii) random sequences withthe same nucleotide composition as that of CNSs. When compared tonon-CNSs and near-promoter sequences, CNSs possess an excess of AT-richmotifs, often containing runs of identical nucleotides. In contrast, whencompared to random sequences, CNSs contain an excess of GC-rich motifswhich, however, lack CpG dinucleotides. Thus, abundance of short sequencemotifs in human CNSs, taken as a whole, is mostly determined by theiroverall compositional properties and not by overrepresentation of anyspecific short motifs. These properties are: (i) high AT-content of CNSs,(ii) a tendency, probably due to context-dependent mutation, of A's andT's to clump, (iii) presence of short GC-rich regions, and (iv) avoidanceof CpG contexts, due to their hypermutability. Only a small number ofshort motifs, overrepresented in all human CNSs are similar to bindingsites of transcription factors from the FOX family.Conclusion: Human CNSsas a whole appear to be too broad a class of sequences to possess strongfootprints of any short sequence-specific functions. Such footprintsshould be studied at the level of functional subclasses of CNSs, such asthose which flank genes with a particular pattern of expression. Overallproperties of CNSs are affected by

  15. Evaluation of MeDIP-chip in the context of whole-genome bisulfite sequencing (WGBS-seq) in Arabidopsis

    NARCIS (Netherlands)

    Wardenaar, René; Liu, Haiyin; Colot, Vincent; Colomé-Tatché, Maria; Johannes, Frank

    2013-01-01

    Studies of DNA methylation in Arabidopsis have rapidly advanced from the analysis of a single reference accession to investigations of large populations. The goal of emerging population studies is to detect differentially methylated regions (DMRs) at the genome-wide scale, and to relate this

  16. Whole transcriptome sequencing enables discovery and analysis of viruses in archived primary central nervous system lymphomas.

    Directory of Open Access Journals (Sweden)

    Christopher DeBoever

    Full Text Available Primary central nervous system lymphomas (PCNSL have a dramatically increased prevalence among persons living with AIDS and are known to be associated with human Epstein Barr virus (EBV infection. Previous work suggests that in some cases, co-infection with other viruses may be important for PCNSL pathogenesis. Viral transcription in tumor samples can be measured using next generation transcriptome sequencing. We demonstrate the ability of transcriptome sequencing to identify viruses, characterize viral expression, and identify viral variants by sequencing four archived AIDS-related PCNSL tissue samples and analyzing raw sequencing reads. EBV was detected in all four PCNSL samples and cytomegalovirus (CMV, JC polyomavirus (JCV, and HIV were also discovered, consistent with clinical diagnoses. CMV was found to express three long non-coding RNAs recently reported as expressed during active infection. Single nucleotide variants were observed in each of the viruses observed and three indels were found in CMV. No viruses were found in several control tumor types including 32 diffuse large B-cell lymphoma samples. This study demonstrates the ability of next generation transcriptome sequencing to accurately identify viruses, including DNA viruses, in solid human cancer tissue samples.

  17. Blind sequence-length estimation of low-SNR cyclostationary sequences

    CSIR Research Space (South Africa)

    Vlok, JD

    2014-06-01

    Full Text Available Several existing direct-sequence spread spectrum (DSSS) detection and estimation algorithms assume prior knowledge of the symbol period or sequence length, although very few sequence-length estimation techniques are available in the literature...

  18. Discrepancy between Hepatitis C Virus Genotypes and NS4-Based Serotypes: Association with Their Subgenomic Sequences

    Directory of Open Access Journals (Sweden)

    Nan Nwe Win

    2017-01-01

    Full Text Available Determination of hepatitis C virus (HCV genotypes plays an important role in the direct-acting agent era. Discrepancies between HCV genotyping and serotyping assays are occasionally observed. Eighteen samples with discrepant results between genotyping and serotyping methods were analyzed. HCV serotyping and genotyping were based on the HCV nonstructural 4 (NS4 region and 5′-untranslated region (5′-UTR, respectively. HCV core and NS4 regions were chosen to be sequenced and were compared with the genotyping and serotyping results. Deep sequencing was also performed for the corresponding HCV NS4 regions. Seventeen out of 18 discrepant samples could be sequenced by the Sanger method. Both HCV core and NS4 sequences were concordant with that of genotyping in the 5′-UTR in all 17 samples. In cloning analysis of the HCV NS4 region, there were several amino acid variations, but each sequence was much closer to the peptide with the same genotype. Deep sequencing revealed that minor clones with different subgenotypes existed in two of the 17 samples. Genotyping by genome amplification showed high consistency, while several false reactions were detected by serotyping. The deep sequencing method also provides accurate genotyping results and may be useful for analyzing discrepant cases. HCV genotyping should be correctly determined before antiviral treatment.

  19. Dispersal of thermophilic Desulfotomaculum endospores into Baltic Sea sediments over thousands of years

    DEFF Research Database (Denmark)

    Rezende, Julia Rosa de; Kjeldsen, Kasper Urup; Hubert, Casey RJ

    2013-01-01

    S rRNA and of the dissimilatory (bi)sulfite reductase (dsrAB) genes within the endospore-forming SRB genus Desulfotomaculum. The thermophilic Desulfotomaculum community in Aarhus Bay sediments consisted of at least 23 species-level 16S rRNA sequence phylotypes. In two cases, pairs of identical 16S r......RNA and dsrAB sequences in Arctic surface sediment 3000km away showed that the same phylotypes are present in both locations. Radiotracer-enhanced most probable number analysis revealed that the abundance of endospores of thermophilic SRB in Aarhus Bay sediment was ca. 104 per cm3 at the surface and decreased...... in water masses preceding sedimentation. The sources of these thermophiles remain enigmatic, but at least one source may be common to both Aarhus Bay and Arctic sediments....

  20. Chromosomal location and nucleotide sequence of the Escherichia coli dapA gene.

    OpenAIRE

    Richaud, F; Richaud, C; Ratet, P; Patte, J C

    1986-01-01

    In Escherichia coli, the first enzyme of the diaminopimelate and lysine pathway is dihydrodipicolinate synthetase, which is feedback-inhibited by lysine and encoded by the dapA gene. The location of the dapA gene on the bacterial chromosome has been determined accurately with respect to the neighboring purC and dapE genes. The complete nucleotide sequence and the transcriptional start of the dapA gene were determined. The results show that dapA consists of a single cistron encoding a 292-amin...

  1. Observing complex action sequences: The role of the fronto-parietal mirror neuron system.

    Science.gov (United States)

    Molnar-Szakacs, Istvan; Kaplan, Jonas; Greenfield, Patricia M; Iacoboni, Marco

    2006-11-15

    A fronto-parietal mirror neuron network in the human brain supports the ability to represent and understand observed actions allowing us to successfully interact with others and our environment. Using functional magnetic resonance imaging (fMRI), we wanted to investigate the response of this network in adults during observation of hierarchically organized action sequences of varying complexity that emerge at different developmental stages. We hypothesized that fronto-parietal systems may play a role in coding the hierarchical structure of object-directed actions. The observation of all action sequences recruited a common bilateral network including the fronto-parietal mirror neuron system and occipito-temporal visual motion areas. Activity in mirror neuron areas varied according to the motoric complexity of the observed actions, but not according to the developmental sequence of action structures, possibly due to the fact that our subjects were all adults. These results suggest that the mirror neuron system provides a fairly accurate simulation process of observed actions, mimicking internally the level of motoric complexity. We also discuss the results in terms of the links between mirror neurons, language development and evolution.

  2. RSARF: Prediction of residue solvent accessibility from protein sequence using random forest method

    KAUST Repository

    Ganesan, Pugalenthi; Kandaswamy, Krishna Kumar Umar; Chou -, Kuochen; Vivekanandan, Saravanan; Kolatkar, Prasanna R.

    2012-01-01

    Prediction of protein structure from its amino acid sequence is still a challenging problem. The complete physicochemical understanding of protein folding is essential for the accurate structure prediction. Knowledge of residue solvent accessibility gives useful insights into protein structure prediction and function prediction. In this work, we propose a random forest method, RSARF, to predict residue accessible surface area from protein sequence information. The training and testing was performed using 120 proteins containing 22006 residues. For each residue, buried and exposed state was computed using five thresholds (0%, 5%, 10%, 25%, and 50%). The prediction accuracy for 0%, 5%, 10%, 25%, and 50% thresholds are 72.9%, 78.25%, 78.12%, 77.57% and 72.07% respectively. Further, comparison of RSARF with other methods using a benchmark dataset containing 20 proteins shows that our approach is useful for prediction of residue solvent accessibility from protein sequence without using structural information. The RSARF program, datasets and supplementary data are available at http://caps.ncbs.res.in/download/pugal/RSARF/. - See more at: http://www.eurekaselect.com/89216/article#sthash.pwVGFUjq.dpuf

  3. DNA Barcoding: Amplification and sequence analysis of rbcl and matK genome regions in three divergent plant species

    Directory of Open Access Journals (Sweden)

    Javed Iqbal Wattoo

    2016-11-01

    Full Text Available Background: DNA barcoding is a novel method of species identification based on nucleotide diversity of conserved sequences. The establishment and refining of plant DNA barcoding systems is more challenging due to high genetic diversity among different species. Therefore, targeting the conserved nuclear transcribed regions would be more reliable for plant scientists to reveal genetic diversity, species discrimination and phylogeny. Methods: In this study, we amplified and sequenced the chloroplast DNA regions (matk+rbcl of Solanum nigrum, Euphorbia helioscopia and Dalbergia sissoo to study the functional annotation, homology modeling and sequence analysis to allow a more efficient utilization of these sequences among different plant species. These three species represent three families; Solanaceae, Euphorbiaceae and Fabaceae respectively. Biological sequence homology and divergence of amplified sequences was studied using Basic Local Alignment Tool (BLAST. Results: Both primers (matk+rbcl showed good amplification in three species. The sequenced regions reveled conserved genome information for future identification of different medicinal plants belonging to these species. The amplified conserved barcodes revealed different levels of biological homology after sequence analysis. The results clearly showed that the use of these conserved DNA sequences as barcode primers would be an accurate way for species identification and discrimination. Conclusion: The amplification and sequencing of conserved genome regions identified a novel sequence of matK in native species of Solanum nigrum. The findings of the study would be applicable in medicinal industry to establish DNA based identification of different medicinal plant species to monitor adulteration.

  4. Rapid detection of SMARCB1 sequence variation using high resolution melting

    Directory of Open Access Journals (Sweden)

    Ashley David M

    2009-12-01

    Full Text Available Abstract Background Rhabdoid tumors are rare cancers of early childhood arising in the kidney, central nervous system and other organs. The majority are caused by somatic inactivating mutations or deletions affecting the tumor suppressor locus SMARCB1 [OMIM 601607]. Germ-line SMARCB1 inactivation has been reported in association with rhabdoid tumor, epitheloid sarcoma and familial schwannomatosis, underscoring the importance of accurate mutation screening to ascertain recurrence and transmission risks. We describe a rapid and sensitive diagnostic screening method, using high resolution melting (HRM, for detecting sequence variations in SMARCB1. Methods Amplicons, encompassing the nine coding exons of SMARCB1, flanking splice site sequences and the 5' and 3' UTR, were screened by both HRM and direct DNA sequencing to establish the reliability of HRM as a primary mutation screening tool. Reaction conditions were optimized with commercially available HRM mixes. Results The false negative rate for detecting sequence variants by HRM in our sample series was zero. Nine amplicons out of a total of 140 (6.4% showed variant melt profiles that were subsequently shown to be false positive. Overall nine distinct pathogenic SMARCB1 mutations were identified in a total of 19 possible rhabdoid tumors. Two tumors had two distinct mutations and two harbored SMARCB1 deletion. Other mutations were nonsense or frame-shifts. The detection sensitivity of the HRM screening method was influenced by both sequence context and specific nucleotide change and varied from 1: 4 to 1:1000 (variant to wild-type DNA. A novel method involving digital HRM, followed by re-sequencing, was used to confirm mutations in tumor specimens containing associated normal tissue. Conclusions This is the first report describing SMARCB1 mutation screening using HRM. HRM is a rapid, sensitive and inexpensive screening technology that is likely to be widely adopted in diagnostic laboratories to

  5. Rapid detection of SMARCB1 sequence variation using high resolution melting

    International Nuclear Information System (INIS)

    Dagar, Vinod; Chow, Chung-Wo; Ashley, David M; Algar, Elizabeth M

    2009-01-01

    Rhabdoid tumors are rare cancers of early childhood arising in the kidney, central nervous system and other organs. The majority are caused by somatic inactivating mutations or deletions affecting the tumor suppressor locus SMARCB1 [OMIM 601607]. Germ-line SMARCB1 inactivation has been reported in association with rhabdoid tumor, epitheloid sarcoma and familial schwannomatosis, underscoring the importance of accurate mutation screening to ascertain recurrence and transmission risks. We describe a rapid and sensitive diagnostic screening method, using high resolution melting (HRM), for detecting sequence variations in SMARCB1. Amplicons, encompassing the nine coding exons of SMARCB1, flanking splice site sequences and the 5' and 3' UTR, were screened by both HRM and direct DNA sequencing to establish the reliability of HRM as a primary mutation screening tool. Reaction conditions were optimized with commercially available HRM mixes. The false negative rate for detecting sequence variants by HRM in our sample series was zero. Nine amplicons out of a total of 140 (6.4%) showed variant melt profiles that were subsequently shown to be false positive. Overall nine distinct pathogenic SMARCB1 mutations were identified in a total of 19 possible rhabdoid tumors. Two tumors had two distinct mutations and two harbored SMARCB1 deletion. Other mutations were nonsense or frame-shifts. The detection sensitivity of the HRM screening method was influenced by both sequence context and specific nucleotide change and varied from 1: 4 to 1:1000 (variant to wild-type DNA). A novel method involving digital HRM, followed by re-sequencing, was used to confirm mutations in tumor specimens containing associated normal tissue. This is the first report describing SMARCB1 mutation screening using HRM. HRM is a rapid, sensitive and inexpensive screening technology that is likely to be widely adopted in diagnostic laboratories to facilitate whole gene mutation screening

  6. Identification of a novel LMF1 nonsense mutation responsible for severe hypertriglyceridemia by targeted next-generation sequencing.

    Science.gov (United States)

    Cefalù, Angelo B; Spina, Rossella; Noto, Davide; Ingrassia, Valeria; Valenti, Vincenza; Giammanco, Antonina; Fayer, Francesca; Misiano, Gabriella; Cocorullo, Gianfranco; Scrimali, Chiara; Palesano, Ornella; Altieri, Grazia I; Ganci, Antonina; Barbagallo, Carlo M; Averna, Maurizio R

    Severe hypertriglyceridemia (HTG) may result from mutations in genes affecting the intravascular lipolysis of triglyceride (TG)-rich lipoproteins. The aim of this study was to develop a targeted next-generation sequencing panel for the molecular diagnosis of disorders characterized by severe HTG. We developed a targeted customized panel for next-generation sequencing Ion Torrent Personal Genome Machine to capture the coding exons and intron/exon boundaries of 18 genes affecting the main pathways of TG synthesis and metabolism. We sequenced 11 samples of patients with severe HTG (TG>885 mg/dL-10 mmol/L): 4 positive controls in whom pathogenic mutations had previously been identified by Sanger sequencing and 7 patients in whom the molecular defect was still unknown. The customized panel was accurate, and it allowed to confirm genetic variants previously identified in all positive controls with primary severe HTG. Only 1 patient of 7 with HTG was found to be carrier of a homozygous pathogenic mutation of the third novel mutation of LMF1 gene (c.1380C>G-p.Y460X). The clinical and molecular familial cascade screening allowed the identification of 2 additional affected siblings and 7 heterozygous carriers of the mutation. We showed that our targeted resequencing approach for genetic diagnosis of severe HTG appears to be accurate, less time consuming, and more economical compared with traditional Sanger resequencing. The identification of pathogenic mutations in candidate genes remains challenging and clinical resequencing should mainly intended for patients with strong clinical criteria for monogenic severe HTG. Copyright © 2017 National Lipid Association. Published by Elsevier Inc. All rights reserved.

  7. Dog Y chromosomal DNA sequence: identification, sequencing and SNP discovery

    Directory of Open Access Journals (Sweden)

    Kirkness Ewen

    2006-10-01

    Full Text Available Abstract Background Population genetic studies of dogs have so far mainly been based on analysis of mitochondrial DNA, describing only the history of female dogs. To get a picture of the male history, as well as a second independent marker, there is a need for studies of biallelic Y-chromosome polymorphisms. However, there are no biallelic polymorphisms reported, and only 3200 bp of non-repetitive dog Y-chromosome sequence deposited in GenBank, necessitating the identification of dog Y chromosome sequence and the search for polymorphisms therein. The genome has been only partially sequenced for one male dog, disallowing mapping of the sequence into specific chromosomes. However, by comparing the male genome sequence to the complete female dog genome sequence, candidate Y-chromosome sequence may be identified by exclusion. Results The male dog genome sequence was analysed by Blast search against the human genome to identify sequences with a best match to the human Y chromosome and to the female dog genome to identify those absent in the female genome. Candidate sequences were then tested for male specificity by PCR of five male and five female dogs. 32 sequences from the male genome, with a total length of 24 kbp, were identified as male specific, based on a match to the human Y chromosome, absence in the female dog genome and male specific PCR results. 14437 bp were then sequenced for 10 male dogs originating from Europe, Southwest Asia, Siberia, East Asia, Africa and America. Nine haplotypes were found, which were defined by 14 substitutions. The genetic distance between the haplotypes indicates that they originate from at least five wolf haplotypes. There was no obvious trend in the geographic distribution of the haplotypes. Conclusion We have identified 24159 bp of dog Y-chromosome sequence to be used for population genetic studies. We sequenced 14437 bp in a worldwide collection of dogs, identifying 14 SNPs for future SNP analyses, and

  8. Molecular diagnosis of glycogen storage disease and disorders with overlapping clinical symptoms by massive parallel sequencing.

    Science.gov (United States)

    Vega, Ana I; Medrano, Celia; Navarrete, Rosa; Desviat, Lourdes R; Merinero, Begoña; Rodríguez-Pombo, Pilar; Vitoria, Isidro; Ugarte, Magdalena; Pérez-Cerdá, Celia; Pérez, Belen

    2016-10-01

    Glycogen storage disease (GSD) is an umbrella term for a group of genetic disorders that involve the abnormal metabolism of glycogen; to date, 23 types of GSD have been identified. The nonspecific clinical presentation of GSD and the lack of specific biomarkers mean that Sanger sequencing is now widely relied on for making a diagnosis. However, this gene-by-gene sequencing technique is both laborious and costly, which is a consequence of the number of genes to be sequenced and the large size of some genes. This work reports the use of massive parallel sequencing to diagnose patients at our laboratory in Spain using either a customized gene panel (targeted exome sequencing) or the Illumina Clinical-Exome TruSight One Gene Panel (clinical exome sequencing (CES)). Sequence variants were matched against biochemical and clinical hallmarks. Pathogenic mutations were detected in 23 patients. Twenty-two mutations were recognized (mostly loss-of-function mutations), including 11 that were novel in GSD-associated genes. In addition, CES detected five patients with mutations in ALDOB, LIPA, NKX2-5, CPT2, or ANO5. Although these genes are not involved in GSD, they are associated with overlapping phenotypic characteristics such as hepatic, muscular, and cardiac dysfunction. These results show that next-generation sequencing, in combination with the detection of biochemical and clinical hallmarks, provides an accurate, high-throughput means of making genetic diagnoses of GSD and related diseases.Genet Med 18 10, 1037-1043.

  9. Accurate and Simple Calibration of DLP Projector Systems

    DEFF Research Database (Denmark)

    Wilm, Jakob; Olesen, Oline Vinter; Larsen, Rasmus

    2014-01-01

    does not rely on an initial camera calibration, and so does not carry over the error into projector calibration. A radial interpolation scheme is used to convert features coordinates into projector space, thereby allowing for a very accurate procedure. This allows for highly accurate determination...

  10. Sequence assembly

    DEFF Research Database (Denmark)

    Scheibye-Alsing, Karsten; Hoffmann, S.; Frankel, Annett Maria

    2009-01-01

    Despite the rapidly increasing number of sequenced and re-sequenced genomes, many issues regarding the computational assembly of large-scale sequencing data have remain unresolved. Computational assembly is crucial in large genome projects as well for the evolving high-throughput technologies and...... in genomic DNA, highly expressed genes and alternative transcripts in EST sequences. We summarize existing comparisons of different assemblers and provide a detailed descriptions and directions for download of assembly programs at: http://genome.ku.dk/resources/assembly/methods.html....

  11. Towards rationally redesigning bacterial signaling systems using information encoded in abundant sequence data

    Science.gov (United States)

    Cheng, Ryan; Morcos, Faruck; Levine, Herbert; Onuchic, Jose

    2014-03-01

    An important challenge in biology is to distinguish the subset of residues that allow bacterial two-component signaling (TCS) proteins to preferentially interact with their correct TCS partner such that they can bind and transfer signal. Detailed knowledge of this information would allow one to search sequence-space for mutations that can systematically tune the signal transmission between TCS partners as well as re-encode a TCS protein to preferentially transfer signals to a non-partner. Motivated by the notion that this detailed information is found in sequence data, we explore the mutual sequence co-evolution between signaling partners to infer how mutations can positively or negatively alter their interaction. Using Direct Coupling Analysis (DCA) for determining evolutionarily conserved interprotein interactions, we apply a DCA-based metric to quantify mutational changes in the interaction between TCS proteins and demonstrate that it accurately correlates with experimental mutagenesis studies probing the mutational change in the in vitro phosphotransfer. Our methodology serves as a potential framework for the rational design of TCS systems as well as a framework for the system-level study of protein-protein interactions in sequence-rich systems. This research has been supported by the NSF INSPIRE award MCB-1241332 and by the CTBP sponsored by the NSF (Grant PHY-1308264).

  12. Bayesian clustering of DNA sequences using Markov chains and a stochastic partition model.

    Science.gov (United States)

    Jääskinen, Väinö; Parkkinen, Ville; Cheng, Lu; Corander, Jukka

    2014-02-01

    In many biological applications it is necessary to cluster DNA sequences into groups that represent underlying organismal units, such as named species or genera. In metagenomics this grouping needs typically to be achieved on the basis of relatively short sequences which contain different types of errors, making the use of a statistical modeling approach desirable. Here we introduce a novel method for this purpose by developing a stochastic partition model that clusters Markov chains of a given order. The model is based on a Dirichlet process prior and we use conjugate priors for the Markov chain parameters which enables an analytical expression for comparing the marginal likelihoods of any two partitions. To find a good candidate for the posterior mode in the partition space, we use a hybrid computational approach which combines the EM-algorithm with a greedy search. This is demonstrated to be faster and yield highly accurate results compared to earlier suggested clustering methods for the metagenomics application. Our model is fairly generic and could also be used for clustering of other types of sequence data for which Markov chains provide a reasonable way to compress information, as illustrated by experiments on shotgun sequence type data from an Escherichia coli strain.

  13. Coval: improving alignment quality and variant calling accuracy for next-generation sequencing data.

    Directory of Open Access Journals (Sweden)

    Shunichi Kosugi

    Full Text Available Accurate identification of DNA polymorphisms using next-generation sequencing technology is challenging because of a high rate of sequencing error and incorrect mapping of reads to reference genomes. Currently available short read aligners and DNA variant callers suffer from these problems. We developed the Coval software to improve the quality of short read alignments. Coval is designed to minimize the incidence of spurious alignment of short reads, by filtering mismatched reads that remained in alignments after local realignment and error correction of mismatched reads. The error correction is executed based on the base quality and allele frequency at the non-reference positions for an individual or pooled sample. We demonstrated the utility of Coval by applying it to simulated genomes and experimentally obtained short-read data of rice, nematode, and mouse. Moreover, we found an unexpectedly large number of incorrectly mapped reads in 'targeted' alignments, where the whole genome sequencing reads had been aligned to a local genomic segment, and showed that Coval effectively eliminated such spurious alignments. We conclude that Coval significantly improves the quality of short-read sequence alignments, thereby increasing the calling accuracy of currently available tools for SNP and indel identification. Coval is available at http://sourceforge.net/projects/coval105/.

  14. Genome-wide DNA methylation patterns and transcription analysis in sheep muscle.

    Directory of Open Access Journals (Sweden)

    Christine Couldrey

    Full Text Available DNA methylation plays a central role in regulating many aspects of growth and development in mammals through regulating gene expression. The development of next generation sequencing technologies have paved the way for genome-wide, high resolution analysis of DNA methylation landscapes using methodology known as reduced representation bisulfite sequencing (RRBS. While RRBS has proven to be effective in understanding DNA methylation landscapes in humans, mice, and rats, to date, few studies have utilised this powerful method for investigating DNA methylation in agricultural animals. Here we describe the utilisation of RRBS to investigate DNA methylation in sheep Longissimus dorsi muscles. RRBS analysis of ∼1% of the genome from Longissimus dorsi muscles provided data of suitably high precision and accuracy for DNA methylation analysis, at all levels of resolution from genome-wide to individual nucleotides. Combining RRBS data with mRNAseq data allowed the sheep Longissimus dorsi muscle methylome to be compared with methylomes from other species. While some species differences were identified, many similarities were observed between DNA methylation patterns in sheep and other more commonly studied species. The RRBS data presented here highlights the complexity of epigenetic regulation of genes. However, the similarities observed across species are promising, in that knowledge gained from epigenetic studies in human and mice may be applied, with caution, to agricultural species. The ability to accurately measure DNA methylation in agricultural animals will contribute an additional layer of information to the genetic analyses currently being used to maximise production gains in these species.

  15. Profile hidden Markov models for the detection of viruses within metagenomic sequence data.

    Directory of Open Access Journals (Sweden)

    Peter Skewes-Cox

    Full Text Available Rapid, sensitive, and specific virus detection is an important component of clinical diagnostics. Massively parallel sequencing enables new diagnostic opportunities that complement traditional serological and PCR based techniques. While massively parallel sequencing promises the benefits of being more comprehensive and less biased than traditional approaches, it presents new analytical challenges, especially with respect to detection of pathogen sequences in metagenomic contexts. To a first approximation, the initial detection of viruses can be achieved simply through alignment of sequence reads or assembled contigs to a reference database of pathogen genomes with tools such as BLAST. However, recognition of highly divergent viral sequences is problematic, and may be further complicated by the inherently high mutation rates of some viral types, especially RNA viruses. In these cases, increased sensitivity may be achieved by leveraging position-specific information during the alignment process. Here, we constructed HMMER3-compatible profile hidden Markov models (profile HMMs from all the virally annotated proteins in RefSeq in an automated fashion using a custom-built bioinformatic pipeline. We then tested the ability of these viral profile HMMs ("vFams" to accurately classify sequences as viral or non-viral. Cross-validation experiments with full-length gene sequences showed that the vFams were able to recall 91% of left-out viral test sequences without erroneously classifying any non-viral sequences into viral protein clusters. Thorough reanalysis of previously published metagenomic datasets with a set of the best-performing vFams showed that they were more sensitive than BLAST for detecting sequences originating from more distant relatives of known viruses. To facilitate the use of the vFams for rapid detection of remote viral homologs in metagenomic data, we provide two sets of vFams, comprising more than 4,000 vFams each, in the HMMER3

  16. Critical role of cerebellar fastigial nucleus in programming sequences of saccades

    Science.gov (United States)

    King, Susan A.; Schneider, Rosalyn M.; Serra, Alessandro; Leigh, R. John

    2011-01-01

    The cerebellum plays an important role in programming accurate saccades. Cerebellar lesions affecting the ocular motor region of the fastigial nucleus (FOR) cause saccadic hypermetria; however, if a second target is presented before a saccade can be initiated (double-step paradigm), saccade hypermetria may be decreased. We tested the hypothesis that the cerebellum, especially FOR, plays a pivotal role in programming sequences of saccades. We studied patients with saccadic hypermetria due either to genetic cerebellar ataxia or surgical lesions affecting FOR and confirmed that the gain of initial saccades made to double-step stimuli was reduced compared with the gain of saccades to single target jumps. Based on measurements of the intersaccadic interval, we found that the ability to perform parallel processing of saccades was reduced or absent in all of our patients with cerebellar disease. Our results support the crucial role of the cerebellum, especially FOR, in programming sequences of saccades. PMID:21950988

  17. Highly accurate surface maps from profilometer measurements

    Science.gov (United States)

    Medicus, Kate M.; Nelson, Jessica D.; Mandina, Mike P.

    2013-04-01

    Many aspheres and free-form optical surfaces are measured using a single line trace profilometer which is limiting because accurate 3D corrections are not possible with the single trace. We show a method to produce an accurate fully 2.5D surface height map when measuring a surface with a profilometer using only 6 traces and without expensive hardware. The 6 traces are taken at varying angular positions of the lens, rotating the part between each trace. The output height map contains low form error only, the first 36 Zernikes. The accuracy of the height map is ±10% of the actual Zernike values and within ±3% of the actual peak to valley number. The calculated Zernike values are affected by errors in the angular positioning, by the centering of the lens, and to a small effect, choices made in the processing algorithm. We have found that the angular positioning of the part should be better than 1?, which is achievable with typical hardware. The centering of the lens is essential to achieving accurate measurements. The part must be centered to within 0.5% of the diameter to achieve accurate results. This value is achievable with care, with an indicator, but the part must be edged to a clean diameter.

  18. Tidying up international nucleotide sequence databases: ecological, geographical and sequence quality annotation of its sequences of mycorrhizal fungi.

    Science.gov (United States)

    Tedersoo, Leho; Abarenkov, Kessy; Nilsson, R Henrik; Schüssler, Arthur; Grelet, Gwen-Aëlle; Kohout, Petr; Oja, Jane; Bonito, Gregory M; Veldre, Vilmar; Jairus, Teele; Ryberg, Martin; Larsson, Karl-Henrik; Kõljalg, Urmas

    2011-01-01

    Sequence analysis of the ribosomal RNA operon, particularly the internal transcribed spacer (ITS) region, provides a powerful tool for identification of mycorrhizal fungi. The sequence data deposited in the International Nucleotide Sequence Databases (INSD) are, however, unfiltered for quality and are often poorly annotated with metadata. To detect chimeric and low-quality sequences and assign the ectomycorrhizal fungi to phylogenetic lineages, fungal ITS sequences were downloaded from INSD, aligned within family-level groups, and examined through phylogenetic analyses and BLAST searches. By combining the fungal sequence database UNITE and the annotation and search tool PlutoF, we also added metadata from the literature to these accessions. Altogether 35,632 sequences belonged to mycorrhizal fungi or originated from ericoid and orchid mycorrhizal roots. Of these sequences, 677 were considered chimeric and 2,174 of low read quality. Information detailing country of collection, geographical coordinates, interacting taxon and isolation source were supplemented to cover 78.0%, 33.0%, 41.7% and 96.4% of the sequences, respectively. These annotated sequences are publicly available via UNITE (http://unite.ut.ee/) for downstream biogeographic, ecological and taxonomic analyses. In European Nucleotide Archive (ENA; http://www.ebi.ac.uk/ena/), the annotated sequences have a special link-out to UNITE. We intend to expand the data annotation to additional genes and all taxonomic groups and functional guilds of fungi.

  19. Comparison and evaluation of two exome capture kits and sequencing platforms for variant calling.

    Science.gov (United States)

    Zhang, Guoqiang; Wang, Jianfeng; Yang, Jin; Li, Wenjie; Deng, Yutian; Li, Jing; Huang, Jun; Hu, Songnian; Zhang, Bing

    2015-08-05

    To promote the clinical application of next-generation sequencing, it is important to obtain accurate and consistent variants of target genomic regions at low cost. Ion Proton, the latest updated semiconductor-based sequencing instrument from Life Technologies, is designed to provide investigators with an inexpensive platform for human whole exome sequencing that achieves a rapid turnaround time. However, few studies have comprehensively compared and evaluated the accuracy of variant calling between Ion Proton and Illumina sequencing platforms such as HiSeq 2000, which is the most popular sequencing platform for the human genome. The Ion Proton sequencer combined with the Ion TargetSeq Exome Enrichment Kit together make up TargetSeq-Proton, whereas SureSelect-Hiseq is based on the Agilent SureSelect Human All Exon v4 Kit and the HiSeq 2000 sequencer. Here, we sequenced exonic DNA from four human blood samples using both TargetSeq-Proton and SureSelect-HiSeq. We then called variants in the exonic regions that overlapped between the two exome capture kits (33.6 Mb). The rates of shared variant loci called by two sequencing platforms were from 68.0 to 75.3% in four samples, whereas the concordance of co-detected variant loci reached 99%. Sanger sequencing validation revealed that the validated rate of concordant single nucleotide polymorphisms (SNPs) (91.5%) was higher than the SNPs specific to TargetSeq-Proton (60.0%) or specific to SureSelect-HiSeq (88.3%). With regard to 1-bp small insertions and deletions (InDels), the Sanger sequencing validated rates of concordant variants (100.0%) and SureSelect-HiSeq-specific (89.6%) were higher than those of TargetSeq-Proton-specific (15.8%). In the sequencing of exonic regions, a combination of using of two sequencing strategies (SureSelect-HiSeq and TargetSeq-Proton) increased the variant calling specificity for concordant variant loci and the sensitivity for variant loci called by any one platform. However, for the

  20. A sequence-based dynamic ensemble learning system for protein ligand-binding site prediction

    KAUST Repository

    Chen, Peng

    2015-12-03

    Background: Proteins have the fundamental ability to selectively bind to other molecules and perform specific functions through such interactions, such as protein-ligand binding. Accurate prediction of protein residues that physically bind to ligands is important for drug design and protein docking studies. Most of the successful protein-ligand binding predictions were based on known structures. However, structural information is not largely available in practice due to the huge gap between the number of known protein sequences and that of experimentally solved structures

  1. A sequence-based dynamic ensemble learning system for protein ligand-binding site prediction

    KAUST Repository

    Chen, Peng; Hu, ShanShan; Zhang, Jun; Gao, Xin; Li, Jinyan; Xia, Junfeng; Wang, Bing

    2015-01-01

    Background: Proteins have the fundamental ability to selectively bind to other molecules and perform specific functions through such interactions, such as protein-ligand binding. Accurate prediction of protein residues that physically bind to ligands is important for drug design and protein docking studies. Most of the successful protein-ligand binding predictions were based on known structures. However, structural information is not largely available in practice due to the huge gap between the number of known protein sequences and that of experimentally solved structures

  2. Shotgun protein sequencing.

    Energy Technology Data Exchange (ETDEWEB)

    Faulon, Jean-Loup Michel; Heffelfinger, Grant S.

    2009-06-01

    A novel experimental and computational technique based on multiple enzymatic digestion of a protein or protein mixture that reconstructs protein sequences from sequences of overlapping peptides is described in this SAND report. This approach, analogous to shotgun sequencing of DNA, is to be used to sequence alternative spliced proteins, to identify post-translational modifications, and to sequence genetically engineered proteins.

  3. Decoding sequence learning from single-trial intracranial EEG in humans.

    Directory of Open Access Journals (Sweden)

    Marzia De Lucia

    Full Text Available We propose and validate a multivariate classification algorithm for characterizing changes in human intracranial electroencephalographic data (iEEG after learning motor sequences. The algorithm is based on a Hidden Markov Model (HMM that captures spatio-temporal properties of the iEEG at the level of single trials. Continuous intracranial iEEG was acquired during two sessions (one before and one after a night of sleep in two patients with depth electrodes implanted in several brain areas. They performed a visuomotor sequence (serial reaction time task, SRTT using the fingers of their non-dominant hand. Our results show that the decoding algorithm correctly classified single iEEG trials from the trained sequence as belonging to either the initial training phase (day 1, before sleep or a later consolidated phase (day 2, after sleep, whereas it failed to do so for trials belonging to a control condition (pseudo-random sequence. Accurate single-trial classification was achieved by taking advantage of the distributed pattern of neural activity. However, across all the contacts the hippocampus contributed most significantly to the classification accuracy for both patients, and one fronto-striatal contact for one patient. Together, these human intracranial findings demonstrate that a multivariate decoding approach can detect learning-related changes at the level of single-trial iEEG. Because it allows an unbiased identification of brain sites contributing to a behavioral effect (or experimental condition at the level of single subject, this approach could be usefully applied to assess the neural correlates of other complex cognitive functions in patients implanted with multiple electrodes.

  4. Utilization of deletion bins to anchor and order sequences along the wheat 7B chromosome.

    Science.gov (United States)

    Belova, Tatiana; Grønvold, Lars; Kumar, Ajay; Kianian, Shahryar; He, Xinyao; Lillemo, Morten; Springer, Nathan M; Lien, Sigbjørn; Olsen, Odd-Arne; Sandve, Simen R

    2014-09-01

    A total of 3,671 sequence contigs and scaffolds were mapped to deletion bins on wheat chromosome 7B providing a foundation for developing high-resolution integrated physical map for this chromosome. Bread wheat (Triticum aestivum L.) has a large, complex and highly repetitive genome which is challenging to assemble into high quality pseudo-chromosomes. As part of the international effort to sequence the hexaploid bread wheat genome by the international wheat genome sequencing consortium (IWGSC) we are focused on assembling a reference sequence for chromosome 7B. The successful completion of the reference chromosome sequence is highly dependent on the integration of genetic and physical maps. To aid the integration of these two types of maps, we have constructed a high-density deletion bin map of chromosome 7B. Using the 270 K Nimblegen comparative genomic hybridization (CGH) array on a set of cv. Chinese spring deletion lines, a total of 3,671 sequence contigs and scaffolds (~7.8 % of chromosome 7B physical length) were mapped into nine deletion bins. Our method of genotyping deletions on chromosome 7B relied on a model-based clustering algorithm (Mclust) to accurately predict the presence or absence of a given genomic sequence in a deletion line. The bin mapping results were validated using three different approaches, viz. (a) PCR-based amplification of randomly selected bin mapped sequences (b) comparison with previously mapped ESTs and (c) comparison with a 7B genetic map developed in the present study. Validation of the bin mapping results suggested a high accuracy of the assignment of 7B sequence contigs and scaffolds to the 7B deletion bins.

  5. OPAL: prediction of MoRF regions in intrinsically disordered protein sequences.

    Science.gov (United States)

    Sharma, Ronesh; Raicar, Gaurav; Tsunoda, Tatsuhiko; Patil, Ashwini; Sharma, Alok

    2018-06-01

    Intrinsically disordered proteins lack stable 3-dimensional structure and play a crucial role in performing various biological functions. Key to their biological function are the molecular recognition features (MoRFs) located within long disordered regions. Computationally identifying these MoRFs from disordered protein sequences is a challenging task. In this study, we present a new MoRF predictor, OPAL, to identify MoRFs in disordered protein sequences. OPAL utilizes two independent sources of information computed using different component predictors. The scores are processed and combined using common averaging method. The first score is computed using a component MoRF predictor which utilizes composition and sequence similarity of MoRF and non-MoRF regions to detect MoRFs. The second score is calculated using half-sphere exposure (HSE), solvent accessible surface area (ASA) and backbone angle information of the disordered protein sequence, using information from the amino acid properties of flanks surrounding the MoRFs to distinguish MoRF and non-MoRF residues. OPAL is evaluated using test sets that were previously used to evaluate MoRF predictors, MoRFpred, MoRFchibi and MoRFchibi-web. The results demonstrate that OPAL outperforms all the available MoRF predictors and is the most accurate predictor available for MoRF prediction. It is available at http://www.alok-ai-lab.com/tools/opal/. ashwini@hgc.jp or alok.sharma@griffith.edu.au. Supplementary data are available at Bioinformatics online.

  6. Foundations of Sequence-to-Sequence Modeling for Time Series

    OpenAIRE

    Kuznetsov, Vitaly; Mariet, Zelda

    2018-01-01

    The availability of large amounts of time series data, paired with the performance of deep-learning algorithms on a broad class of problems, has recently led to significant interest in the use of sequence-to-sequence models for time series forecasting. We provide the first theoretical analysis of this time series forecasting framework. We include a comparison of sequence-to-sequence modeling to classical time series models, and as such our theory can serve as a quantitative guide for practiti...

  7. Detection of M-Sequences from Spike Sequence in Neuronal Networks

    Directory of Open Access Journals (Sweden)

    Yoshi Nishitani

    2012-01-01

    Full Text Available In circuit theory, it is well known that a linear feedback shift register (LFSR circuit generates pseudorandom bit sequences (PRBS, including an M-sequence with the maximum period of length. In this study, we tried to detect M-sequences known as a pseudorandom sequence generated by the LFSR circuit from time series patterns of stimulated action potentials. Stimulated action potentials were recorded from dissociated cultures of hippocampal neurons grown on a multielectrode array. We could find several M-sequences from a 3-stage LFSR circuit (M3. These results show the possibility of assembling LFSR circuits or its equivalent ones in a neuronal network. However, since the M3 pattern was composed of only four spike intervals, the possibility of an accidental detection was not zero. Then, we detected M-sequences from random spike sequences which were not generated from an LFSR circuit and compare the result with the number of M-sequences from the originally observed raster data. As a result, a significant difference was confirmed: a greater number of “0–1” reversed the 3-stage M-sequences occurred than would have accidentally be detected. This result suggests that some LFSR equivalent circuits are assembled in neuronal networks.

  8. Genome Sequencing

    DEFF Research Database (Denmark)

    Sato, Shusei; Andersen, Stig Uggerhøj

    2014-01-01

    The current Lotus japonicus reference genome sequence is based on a hybrid assembly of Sanger TAC/BAC, Sanger shotgun and Illumina shotgun sequencing data generated from the Miyakojima-MG20 accession. It covers nearly all expressed L. japonicus genes and has been annotated mainly based on transcr......The current Lotus japonicus reference genome sequence is based on a hybrid assembly of Sanger TAC/BAC, Sanger shotgun and Illumina shotgun sequencing data generated from the Miyakojima-MG20 accession. It covers nearly all expressed L. japonicus genes and has been annotated mainly based...

  9. A high quality Arabidopsis transcriptome for accurate transcript-level analysis of alternative splicing

    KAUST Repository

    Zhang, Runxuan

    2017-04-05

    Alternative splicing generates multiple transcript and protein isoforms from the same gene and thus is important in gene expression regulation. To date, RNA-sequencing (RNA-seq) is the standard method for quantifying changes in alternative splicing on a genome-wide scale. Understanding the current limitations of RNA-seq is crucial for reliable analysis and the lack of high quality, comprehensive transcriptomes for most species, including model organisms such as Arabidopsis, is a major constraint in accurate quantification of transcript isoforms. To address this, we designed a novel pipeline with stringent filters and assembled a comprehensive Reference Transcript Dataset for Arabidopsis (AtRTD2) containing 82,190 non-redundant transcripts from 34 212 genes. Extensive experimental validation showed that AtRTD2 and its modified version, AtRTD2-QUASI, for use in Quantification of Alternatively Spliced Isoforms, outperform other available transcriptomes in RNA-seq analysis. This strategy can be implemented in other species to build a pipeline for transcript-level expression and alternative splicing analyses.

  10. A high quality Arabidopsis transcriptome for accurate transcript-level analysis of alternative splicing

    KAUST Repository

    Zhang, Runxuan; Calixto, Cristiane  P.  G.; Marquez, Yamile; Venhuizen, Peter; Tzioutziou, Nikoleta A.; Guo, Wenbin; Spensley, Mark; Entizne, Juan Carlos; Lewandowska, Dominika; ten  Have, Sara; Frei  dit  Frey, Nicolas; Hirt, Heribert; James, Allan B.; Nimmo, Hugh G.; Barta, Andrea; Kalyna, Maria; Brown, John  W.  S.

    2017-01-01

    Alternative splicing generates multiple transcript and protein isoforms from the same gene and thus is important in gene expression regulation. To date, RNA-sequencing (RNA-seq) is the standard method for quantifying changes in alternative splicing on a genome-wide scale. Understanding the current limitations of RNA-seq is crucial for reliable analysis and the lack of high quality, comprehensive transcriptomes for most species, including model organisms such as Arabidopsis, is a major constraint in accurate quantification of transcript isoforms. To address this, we designed a novel pipeline with stringent filters and assembled a comprehensive Reference Transcript Dataset for Arabidopsis (AtRTD2) containing 82,190 non-redundant transcripts from 34 212 genes. Extensive experimental validation showed that AtRTD2 and its modified version, AtRTD2-QUASI, for use in Quantification of Alternatively Spliced Isoforms, outperform other available transcriptomes in RNA-seq analysis. This strategy can be implemented in other species to build a pipeline for transcript-level expression and alternative splicing analyses.

  11. Nucleotide and amino acid sequences of a coat protein of an Ukrainian isolate of Potato virus Y: comparison with homologous sequences of other isolates and phylogenetic analysis

    Directory of Open Access Journals (Sweden)

    Budzanivska I. G.

    2014-03-01

    Full Text Available Aim. Identification of the widespread Ukrainian isolate(s of PVY (Potato virus Y in different potato cultivars and subsequent phylogenetic analysis of detected PVY isolates based on NA and AA sequences of coat protein. Methods. ELISA, RT-PCR, DNA sequencing and phylogenetic analysis. Results. PVY has been identified serologically in potato cultivars of Ukrainian selection. In this work we have optimized a method for total RNA extraction from potato samples and offered a sensitive and specific PCR-based test system of own design for diagnostics of the Ukrainian PVY isolates. Part of the CP gene of the Ukrainian PVY isolate has been sequenced and analyzed phylogenetically. It is demonstrated that the Ukrainian isolate of Potato virus Y (CP gene has a higher percentage of homology with the recombinant isolates (strains of this pathogen (approx. 98.8– 99.8 % of homology for both nucleotide and translated amino acid sequences of the CP gene. The Ukrainian isolate of PVY is positioned in the separate cluster together with the isolates found in Syria, Japan and Iran; these isolates possibly have common origin. The Ukrainian PVY isolate is confirmed to be recombinant. Conclusions. This work underlines the need and provides the means for accurate monitoring of Potato virus Y in the agroecosystems of Ukraine. Most importantly, the phylogenetic analysis demonstrated the recombinant nature of this PVY isolate which has been attributed to the strain group O, subclade N:O.

  12. Random Tagging Genotyping by Sequencing (rtGBS, an Unbiased Approach to Locate Restriction Enzyme Sites across the Target Genome.

    Directory of Open Access Journals (Sweden)

    Elena Hilario

    Full Text Available Genotyping by sequencing (GBS is a restriction enzyme based targeted approach developed to reduce the genome complexity and discover genetic markers when a priori sequence information is unavailable. Sufficient coverage at each locus is essential to distinguish heterozygous from homozygous sites accurately. The number of GBS samples able to be pooled in one sequencing lane is limited by the number of restriction sites present in the genome and the read depth required at each site per sample for accurate calling of single-nucleotide polymorphisms. Loci bias was observed using a slight modification of the Elshire et al.some restriction enzyme sites were represented in higher proportions while others were poorly represented or absent. This bias could be due to the quality of genomic DNA, the endonuclease and ligase reaction efficiency, the distance between restriction sites, the preferential amplification of small library restriction fragments, or bias towards cluster formation of small amplicons during the sequencing process. To overcome these issues, we have developed a GBS method based on randomly tagging genomic DNA (rtGBS. By randomly landing on the genome, we can, with less bias, find restriction sites that are far apart, and undetected by the standard GBS (stdGBS method. The study comprises two types of biological replicates: six different kiwifruit plants and two independent DNA extractions per plant; and three types of technical replicates: four samples of each DNA extraction, stdGBS vs. rtGBS methods, and two independent library amplifications, each sequenced in separate lanes. A statistically significant unbiased distribution of restriction fragment size by rtGBS showed that this method targeted 49% (39,145 of BamH I sites shared with the reference genome, compared to only 14% (11,513 by stdGBS.

  13. Next-Generation Sequencing Analysis and Algorithms for PDX and CDX Models.

    Science.gov (United States)

    Khandelwal, Garima; Girotti, María Romina; Smowton, Christopher; Taylor, Sam; Wirth, Christopher; Dynowski, Marek; Frese, Kristopher K; Brady, Ged; Dive, Caroline; Marais, Richard; Miller, Crispin

    2017-08-01

    Patient-derived xenograft (PDX) and circulating tumor cell-derived explant (CDX) models are powerful methods for the study of human disease. In cancer research, these methods have been applied to multiple questions, including the study of metastatic progression, genetic evolution, and therapeutic drug responses. As PDX and CDX models can recapitulate the highly heterogeneous characteristics of a patient tumor, as well as their response to chemotherapy, there is considerable interest in combining them with next-generation sequencing to monitor the genomic, transcriptional, and epigenetic changes that accompany oncogenesis. When used for this purpose, their reliability is highly dependent on being able to accurately distinguish between sequencing reads that originate from the host, and those that arise from the xenograft itself. Here, we demonstrate that failure to correctly identify contaminating host reads when analyzing DNA- and RNA-sequencing (DNA-Seq and RNA-Seq) data from PDX and CDX models is a major confounding factor that can lead to incorrect mutation calls and a failure to identify canonical mutation signatures associated with tumorigenicity. In addition, a highly sensitive algorithm and open source software tool for identifying and removing contaminating host sequences is described. Importantly, when applied to PDX and CDX models of melanoma, these data demonstrate its utility as a sensitive and selective tool for the correction of PDX- and CDX-derived whole-exome and RNA-Seq data. Implications: This study describes a sensitive method to identify contaminating host reads in xenograft and explant DNA- and RNA-Seq data and is applicable to other forms of deep sequencing. Mol Cancer Res; 15(8); 1012-6. ©2017 AACR . ©2017 American Association for Cancer Research.

  14. When Is Network Lasso Accurate?

    Directory of Open Access Journals (Sweden)

    Alexander Jung

    2018-01-01

    Full Text Available The “least absolute shrinkage and selection operator” (Lasso method has been adapted recently for network-structured datasets. In particular, this network Lasso method allows to learn graph signals from a small number of noisy signal samples by using the total variation of a graph signal for regularization. While efficient and scalable implementations of the network Lasso are available, only little is known about the conditions on the underlying network structure which ensure network Lasso to be accurate. By leveraging concepts of compressed sensing, we address this gap and derive precise conditions on the underlying network topology and sampling set which guarantee the network Lasso for a particular loss function to deliver an accurate estimate of the entire underlying graph signal. We also quantify the error incurred by network Lasso in terms of two constants which reflect the connectivity of the sampled nodes.

  15. Generic Amplicon Deep Sequencing to Determine Ilarvirus Species Diversity in Australian Prunus.

    Science.gov (United States)

    Kinoti, Wycliff M; Constable, Fiona E; Nancarrow, Narelle; Plummer, Kim M; Rodoni, Brendan

    2017-01-01

    The distribution of Ilarvirus species populations amongst 61 Australian Prunus trees was determined by next generation sequencing (NGS) of amplicons generated using a genus-based generic RT-PCR targeting a conserved region of the Ilarvirus RNA2 component that encodes the RNA dependent RNA polymerase (RdRp) gene. Presence of Ilarvirus sequences in each positive sample was further validated by Sanger sequencing of cloned amplicons of regions of each of RNA1, RNA2 and/or RNA3 that were generated by species specific PCRs and by metagenomic NGS. Prunus necrotic ringspot virus (PNRSV) was the most frequently detected Ilarvirus , occurring in 48 of the 61 Ilarvirus -positive trees and Prune dwarf virus (PDV) and Apple mosaic virus (ApMV) were detected in three trees and one tree, respectively. American plum line pattern virus (APLPV) was detected in three trees and represents the first report of APLPV detection in Australia. Two novel and distinct groups of Ilarvirus -like RNA2 amplicon sequences were also identified in several trees by the generic amplicon NGS approach. The high read depth from the amplicon NGS of the generic PCR products allowed the detection of distinct RNA2 RdRp sequence variant populations of PNRSV, PDV, ApMV, APLPV and the two novel Ilarvirus -like sequences. Mixed infections of ilarviruses were also detected in seven Prunus trees. Sanger sequencing of specific RNA1, RNA2, and/or RNA3 genome segments of each virus and total nucleic acid metagenomics NGS confirmed the presence of PNRSV, PDV, ApMV and APLPV detected by RNA2 generic amplicon NGS. However, the two novel groups of Ilarvirus -like RNA2 amplicon sequences detected by the generic amplicon NGS could not be associated to the presence of sequence from RNA1 or RNA3 genome segments or full Ilarvirus genomes, and their origin is unclear. This work highlights the sensitivity of genus-specific amplicon NGS in detection of virus sequences and their distinct populations in multiple samples, and the

  16. Generic Amplicon Deep Sequencing to Determine Ilarvirus Species Diversity in Australian Prunus

    Directory of Open Access Journals (Sweden)

    Wycliff M. Kinoti

    2017-06-01

    Full Text Available The distribution of Ilarvirus species populations amongst 61 Australian Prunus trees was determined by next generation sequencing (NGS of amplicons generated using a genus-based generic RT-PCR targeting a conserved region of the Ilarvirus RNA2 component that encodes the RNA dependent RNA polymerase (RdRp gene. Presence of Ilarvirus sequences in each positive sample was further validated by Sanger sequencing of cloned amplicons of regions of each of RNA1, RNA2 and/or RNA3 that were generated by species specific PCRs and by metagenomic NGS. Prunus necrotic ringspot virus (PNRSV was the most frequently detected Ilarvirus, occurring in 48 of the 61 Ilarvirus-positive trees and Prune dwarf virus (PDV and Apple mosaic virus (ApMV were detected in three trees and one tree, respectively. American plum line pattern virus (APLPV was detected in three trees and represents the first report of APLPV detection in Australia. Two novel and distinct groups of Ilarvirus-like RNA2 amplicon sequences were also identified in several trees by the generic amplicon NGS approach. The high read depth from the amplicon NGS of the generic PCR products allowed the detection of distinct RNA2 RdRp sequence variant populations of PNRSV, PDV, ApMV, APLPV and the two novel Ilarvirus-like sequences. Mixed infections of ilarviruses were also detected in seven Prunus trees. Sanger sequencing of specific RNA1, RNA2, and/or RNA3 genome segments of each virus and total nucleic acid metagenomics NGS confirmed the presence of PNRSV, PDV, ApMV and APLPV detected by RNA2 generic amplicon NGS. However, the two novel groups of Ilarvirus-like RNA2 amplicon sequences detected by the generic amplicon NGS could not be associated to the presence of sequence from RNA1 or RNA3 genome segments or full Ilarvirus genomes, and their origin is unclear. This work highlights the sensitivity of genus-specific amplicon NGS in detection of virus sequences and their distinct populations in multiple samples

  17. Chromosomal location and nucleotide sequence of the Escherichia coli dapA gene.

    Science.gov (United States)

    Richaud, F; Richaud, C; Ratet, P; Patte, J C

    1986-04-01

    In Escherichia coli, the first enzyme of the diaminopimelate and lysine pathway is dihydrodipicolinate synthetase, which is feedback-inhibited by lysine and encoded by the dapA gene. The location of the dapA gene on the bacterial chromosome has been determined accurately with respect to the neighboring purC and dapE genes. The complete nucleotide sequence and the transcriptional start of the dapA gene were determined. The results show that dapA consists of a single cistron encoding a 292-amino acid polypeptide of 31,372 daltons.

  18. Chromosomal location and nucleotide sequence of the Escherichia coli dapA gene.

    Science.gov (United States)

    Richaud, F; Richaud, C; Ratet, P; Patte, J C

    1986-01-01

    In Escherichia coli, the first enzyme of the diaminopimelate and lysine pathway is dihydrodipicolinate synthetase, which is feedback-inhibited by lysine and encoded by the dapA gene. The location of the dapA gene on the bacterial chromosome has been determined accurately with respect to the neighboring purC and dapE genes. The complete nucleotide sequence and the transcriptional start of the dapA gene were determined. The results show that dapA consists of a single cistron encoding a 292-amino acid polypeptide of 31,372 daltons. Images PMID:3514578

  19. A branch-heterogeneous model of protein evolution for efficient inference of ancestral sequences.

    Science.gov (United States)

    Groussin, M; Boussau, B; Gouy, M

    2013-07-01

    Most models of nucleotide or amino acid substitution used in phylogenetic studies assume that the evolutionary process has been homogeneous across lineages and that composition of nucleotides or amino acids has remained the same throughout the tree. These oversimplified assumptions are refuted by the observation that compositional variability characterizes extant biological sequences. Branch-heterogeneous models of protein evolution that account for compositional variability have been developed, but are not yet in common use because of the large number of parameters required, leading to high computational costs and potential overparameterization. Here, we present a new branch-nonhomogeneous and nonstationary model of protein evolution that captures more accurately the high complexity of sequence evolution. This model, henceforth called Correspondence and likelihood analysis (COaLA), makes use of a correspondence analysis to reduce the number of parameters to be optimized through maximum likelihood, focusing on most of the compositional variation observed in the data. The model was thoroughly tested on both simulated and biological data sets to show its high performance in terms of data fitting and CPU time. COaLA efficiently estimates ancestral amino acid frequencies and sequences, making it relevant for studies aiming at reconstructing and resurrecting ancestral amino acid sequences. Finally, we applied COaLA on a concatenate of universal amino acid sequences to confirm previous results obtained with a nonhomogeneous Bayesian model regarding the early pattern of adaptation to optimal growth temperature, supporting the mesophilic nature of the Last Universal Common Ancestor.

  20. The DNA methylome of human peripheral blood mononuclear cells.

    Directory of Open Access Journals (Sweden)

    Yingrui Li

    2010-11-01

    Full Text Available DNA methylation plays an important role in biological processes in human health and disease. Recent technological advances allow unbiased whole-genome DNA methylation (methylome analysis to be carried out on human cells. Using whole-genome bisulfite sequencing at 24.7-fold coverage (12.3-fold per strand, we report a comprehensive (92.62% methylome and analysis of the unique sequences in human peripheral blood mononuclear cells (PBMC from the same Asian individual whose genome was deciphered in the YH project. PBMC constitute an important source for clinical blood tests world-wide. We found that 68.4% of CpG sites and 80% displayed allele-specific expression (ASE. These data demonstrate that ASM is a recurrent phenomenon and is highly correlated with ASE in human PBMCs. Together with recently reported similar studies, our study provides a comprehensive resource for future epigenomic research and confirms new sequencing technology as a paradigm for large-scale epigenomics studies.