WorldWideScience

Sample records for gro-seq reads produced

  1. An Annotation Agnostic Algorithm for Detecting Nascent RNA Transcripts in GRO-Seq.

    Science.gov (United States)

    Azofeifa, Joseph G; Allen, Mary A; Lladser, Manuel E; Dowell, Robin D

    2017-01-01

    We present a fast and simple algorithm to detect nascent RNA transcription in global nuclear run-on sequencing (GRO-seq). GRO-seq is a relatively new protocol that captures nascent transcripts from actively engaged polymerase, providing a direct read-out on bona fide transcription. Most traditional assays, such as RNA-seq, measure steady state RNA levels which are affected by transcription, post-transcriptional processing, and RNA stability. GRO-seq data, however, presents unique analysis challenges that are only beginning to be addressed. Here, we describe a new algorithm, Fast Read Stitcher (FStitch), that takes advantage of two popular machine-learning techniques, hidden Markov models and logistic regression, to classify which regions of the genome are transcribed. Given a small user-defined training set, our algorithm is accurate, robust to varying read depth, annotation agnostic, and fast. Analysis of GRO-seq data without a priori need for annotation uncovers surprising new insights into several aspects of the transcription process.
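The abstract above names a hidden Markov model as one of the two ingredients used to label transcribed regions. The following is only a minimal sketch of that idea, not FStitch itself: a two-state (inactive/active) Viterbi decoder over binned read coverage, with emission probabilities taken from a logistic function of log coverage. The function names, the logistic coefficients `w` and `b`, and the `stay` probability are all hypothetical values chosen for illustration; FStitch learns its parameters from a user-supplied training set.

```python
import math

def logistic(x, w=1.5, b=-2.0):
    # Hypothetical, hand-picked coefficients (a real tool would fit them).
    return 1.0 / (1.0 + math.exp(-(w * x + b)))

def viterbi_two_state(coverage, stay=0.95):
    """Label each coverage bin 0 (inactive) or 1 (active) with Viterbi decoding."""
    if not coverage:
        return []
    log_stay, log_switch = math.log(stay), math.log(1.0 - stay)

    def emit(c):
        # Log-probability of the bin under (inactive, active).
        p = min(max(logistic(math.log1p(c)), 1e-9), 1.0 - 1e-9)
        return (math.log(1.0 - p), math.log(p))

    score = list(emit(coverage[0]))  # uniform prior, constant term dropped
    back = []
    for c in coverage[1:]:
        e = emit(c)
        new, ptr = [], []
        for s in (0, 1):
            cand = [score[p] + (log_stay if p == s else log_switch) for p in (0, 1)]
            best = 0 if cand[0] >= cand[1] else 1
            ptr.append(best)
            new.append(cand[best] + e[s])
        score = new
        back.append(ptr)

    # Trace back from the best final state.
    state = 0 if score[0] >= score[1] else 1
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return path

labels = viterbi_two_state([0, 0, 0, 20, 25, 0, 30, 22, 0, 0, 0])
print(labels)
```

With these assumed parameters the single empty bin between the two covered stretches is "stitched" into one active region, because two state switches cost more than one poorly explained bin.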

  2. Rcorrector: efficient and accurate error correction for Illumina RNA-seq reads.

    Science.gov (United States)

    Song, Li; Florea, Liliana

    2015-01-01

    Next-generation sequencing of cellular RNA (RNA-seq) is rapidly becoming the cornerstone of transcriptomic analysis. However, sequencing errors in the already short RNA-seq reads complicate bioinformatics analyses, in particular alignment and assembly. Error correction methods have been highly effective for whole-genome sequencing (WGS) reads, but are unsuitable for RNA-seq reads, owing to the variation in gene expression levels and alternative splicing. We developed a k-mer based method, Rcorrector, to correct random sequencing errors in Illumina RNA-seq reads. Rcorrector uses a De Bruijn graph to compactly represent all trusted k-mers in the input reads. Unlike WGS read correctors, which use a global threshold to determine trusted k-mers, Rcorrector computes a local threshold at every position in a read. Rcorrector has an accuracy higher than or comparable to existing methods, including the only other method (SEECER) designed for RNA-seq reads, and is more time and memory efficient. With a 5 GB memory footprint for 100 million reads, it can be run on virtually any desktop or server. The software is available free of charge under the GNU General Public License from https://github.com/mourisl/Rcorrector/.
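The contrast between a global and a local k-mer threshold can be illustrated with a short sketch. This is not Rcorrector's actual rule or data structure (it uses a De Bruijn graph); here a plain counter stands in, and a k-mer is flagged as weak when its count falls below an assumed `fraction` of the highest count among neighbouring k-mers in the same read. The names `kmer_counts` and `weak_positions`, the `window` size and the `fraction` value are hypothetical.

```python
from collections import Counter

def kmer_counts(reads, k):
    """Count every k-mer occurring in the input reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts

def weak_positions(read, counts, k, window=2, fraction=0.25):
    """Return start positions of k-mers whose count is low relative to nearby k-mers."""
    kmers = [read[i:i + k] for i in range(len(read) - k + 1)]
    weak = []
    for i, kmer in enumerate(kmers):
        lo, hi = max(0, i - window), min(len(kmers), i + window + 1)
        local_max = max(counts[x] for x in kmers[lo:hi])
        # The threshold scales with local coverage instead of being one global cutoff.
        if counts[kmer] < fraction * local_max:
            weak.append(i)
    return weak

reads = ["ACGTACGTAC"] * 10 + ["ACGTTCGTAC"]  # last read carries one substitution
counts = kmer_counts(reads, k=4)
print(weak_positions("ACGTTCGTAC", counts, k=4))
```

Because the cutoff follows local coverage, a lowly expressed transcript is not flagged wholesale just because a highly expressed one dominates the k-mer counts.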

  3. Mapping RNA-seq Reads with STAR.

    Science.gov (United States)

    Dobin, Alexander; Gingeras, Thomas R

    2015-09-03

    Mapping of large sets of high-throughput sequencing reads to a reference genome is one of the foundational steps in RNA-seq data analysis. The STAR software package performs this task with high levels of accuracy and speed. In addition to detecting annotated and novel splice junctions, STAR is capable of discovering more complex RNA sequence arrangements, such as chimeric and circular RNA. STAR can align spliced sequences of any length with moderate error rates, providing scalability for emerging sequencing technologies. STAR generates output files that can be used for many downstream analyses such as transcript/gene expression quantification, differential gene expression, novel isoform reconstruction, and signal visualization. In this unit, we describe computational protocols that produce various output files, use different RNA-seq datatypes, and utilize different mapping strategies. STAR is open source software that can be run on Unix, Linux, or Mac OS X systems. Copyright © 2015 John Wiley & Sons, Inc.

  4. CLIP-seq analysis of multi-mapped reads discovers novel functional RNA regulatory sites in the human transcriptome.

    Science.gov (United States)

    Zhang, Zijun; Xing, Yi

    2017-09-19

    Crosslinking or RNA immunoprecipitation followed by sequencing (CLIP-seq or RIP-seq) allows transcriptome-wide discovery of RNA regulatory sites. As CLIP-seq/RIP-seq reads are short, existing computational tools focus on uniquely mapped reads, while reads mapped to multiple loci are discarded. We present CLAM (CLIP-seq Analysis of Multi-mapped reads). CLAM uses an expectation-maximization algorithm to assign multi-mapped reads and calls peaks combining uniquely and multi-mapped reads. To demonstrate the utility of CLAM, we applied it to a wide range of public CLIP-seq/RIP-seq datasets involving numerous splicing factors, microRNAs and m6A RNA methylation. CLAM recovered a large number of novel RNA regulatory sites inaccessible by uniquely mapped reads. The functional significance of these sites was demonstrated by consensus motif patterns and association with alternative splicing (splicing factors), transcript abundance (AGO2) and mRNA half-life (m6A). CLAM provides a useful tool to discover novel protein-RNA interactions and RNA modification sites from CLIP-seq and RIP-seq data, and reveals the significant contribution of repetitive elements to the RNA regulatory landscape of the human transcriptome. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.
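The expectation-maximization step mentioned above can be sketched in a few lines. This is a generic illustration of EM for multi-mapped reads, not CLAM's implementation: each read lists the loci it aligns to, the E-step splits the read among those loci in proportion to the current abundance estimates, and the M-step re-estimates abundances from the fractional counts. The function name `em_assign` and its input layout are assumptions made for the example.

```python
def em_assign(reads, n_loci, n_iter=50):
    """Estimate relative locus abundance from reads given as lists of candidate loci."""
    theta = [1.0 / n_loci] * n_loci  # start from a uniform abundance estimate
    for _ in range(n_iter):
        counts = [0.0] * n_loci
        for loci in reads:
            total = sum(theta[j] for j in loci)
            for j in loci:
                # E-step: fractional assignment proportional to current abundance.
                counts[j] += theta[j] / total
        norm = sum(counts)
        # M-step: abundances are the normalized fractional counts.
        theta = [c / norm for c in counts]
    return theta

# Three reads unique to locus 0, one unique to locus 1, one mapping to both.
theta = em_assign([[0], [0], [0], [1], [0, 1]], n_loci=2)
print(theta)
```

In this toy input the shared read ends up assigned mostly to locus 0, which has more uniquely mapped support.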

  5. Myxococcus xanthus DK1622 Coordinates Expressions of the Duplicate groEL and Single groES Genes for Synergistic Functions of GroELs and GroES

    Directory of Open Access Journals (Sweden)

    Yue-zhong Li

    2017-04-01

    Chaperonin GroEL (Cpn60) requires cofactor GroES (Cpn10) for protein refolding in bacteria that possess single groEL and groES genes in a bicistronic groESL operon. Among 4,861 completely sequenced prokaryotic genomes, 884 possess duplicate groEL genes and 770 possess groEL genes with no neighboring groES. It is unclear whether stand-alone groEL requires groES in order to function and, if required, how duplicate groEL genes and unequal groES genes balance their expression. In Myxococcus xanthus DK1622, we determined that, while duplicate groELs were alternatively deletable, the single groES that clusters with groEL1 was essential for cell survival. Either GroEL1 or GroEL2 required interactions with GroES for in vitro and in vivo functions. Deletion of groEL1 or groEL2 resulted in decreased expression of both groEL and groES, and ectopic complementation of groEL recovered not only groEL but also groES expression. The addition of an extra groES gene upstream of groEL2 to form a bicistronic operon had almost no influence on groES expression and the cell survival rate, whereas over-expression of groES using a self-replicating plasmid simultaneously increased groEL expression. The results indicated that M. xanthus DK1622 cells coordinate expression of the duplicate groEL and single groES genes for synergistic functions of GroELs and GroES. We proposed a potential regulation mechanism for the expression coordination.

  6. Rcount: simple and flexible RNA-Seq read counting.

    Science.gov (United States)

    Schmid, Marc W; Grossniklaus, Ueli

    2015-02-01

    Analysis of differential gene expression by RNA sequencing (RNA-Seq) is frequently done using feature counts, i.e. the number of reads mapping to a gene. However, commonly used count algorithms (e.g. HTSeq) do not address the problem of reads aligning with multiple locations in the genome (multireads) or reads aligning with positions where two or more genes overlap (ambiguous reads). Rcount specifically addresses these issues. Furthermore, Rcount allows the user to assign priorities to certain feature types (e.g. higher priority for protein-coding genes compared to rRNA-coding genes) or to add flanking regions. Rcount provides a fast and easy-to-use graphical user interface requiring no command line or programming skills. It is implemented in C++ using the SeqAn (www.seqan.de) and the Qt libraries (qt-project.org). Source code and 64 bit binaries for (Ubuntu) Linux, Windows (7) and MacOSX are released under the GPLv3 license and are freely available on github.com/MWSchmid/Rcount. Contact: marcschmid@gmx.ch. Test data, genome annotation files, useful Python and R scripts and a step-by-step user guide (including run-time and memory usage tests) are available on github.com/MWSchmid/Rcount. © The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
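The priority idea described above (for example, protein-coding genes outranking rRNA genes when a read overlaps both) can be sketched as follows. This is a hypothetical illustration, not Rcount's code or file format: the `PRIORITY` table, the `assign_read` function and the tuple layout are assumptions, and reads that remain tied after applying priorities are simply left unassigned here.

```python
# Hypothetical priority table: a higher number wins.
PRIORITY = {"protein_coding": 2, "rRNA": 1}

def assign_read(overlapping, priority=PRIORITY):
    """Resolve an ambiguous read.

    overlapping: list of (gene_id, feature_type) pairs the read overlaps.
    Returns the winning gene_id, or None if the read is still ambiguous.
    """
    if not overlapping:
        return None
    top = max(priority.get(ftype, 0) for _, ftype in overlapping)
    winners = sorted({gene for gene, ftype in overlapping
                      if priority.get(ftype, 0) == top})
    # A single gene at the top priority level takes the read.
    return winners[0] if len(winners) == 1 else None

print(assign_read([("geneA", "protein_coding"), ("rrn1", "rRNA")]))
print(assign_read([("geneA", "protein_coding"), ("geneB", "protein_coding")]))
```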

  7. Hybrid De Novo Genome Assembly Using MiSeq and SOLiD Short Read Data.

    Directory of Open Access Journals (Sweden)

    Tsutomu Ikegami

    A hybrid de novo assembly pipeline was constructed to utilize both MiSeq and SOLiD short read data in combination in the assembly. The short read data were converted to a standard format of the pipeline, and were supplied to the pipeline components such as ABySS and SOAPdenovo. The assembly pipeline proceeded through several stages, and either MiSeq paired-end data, SOLiD mate-paired data, or both of them could be specified as input data at each stage separately. The pipeline was examined on the filamentous fungus Aspergillus oryzae RIB40, by aligning the assembly results against the reference sequences. Using both the MiSeq and the SOLiD data in the hybrid assembly, the alignment length was improved by a factor of 3 to 8, compared with the assemblies using either one of the data types. The number of the reproduced gene cluster regions encoding secondary metabolite biosyntheses (SMB) was also improved by the hybrid assemblies. These results imply that the MiSeq data with long read length are essential to construct accurate nucleotide sequences, while the SOLiD mate-paired reads with long insertion length enhance long-range arrangements of the sequences. The pipeline was also tested on the actinomycete Streptomyces avermitilis MA-4680, whose gene is known to have high-GC content. Although the quality of the SOLiD reads was too low to perform any meaningful assemblies by themselves, the alignment length to the reference was improved by a factor of 2, compared with the assembly using only the MiSeq data.

  8. Sequence analysis of the Legionella micdadei groELS operon

    DEFF Research Database (Denmark)

    Hindersson, P; Høiby, N; Bangsborg, Jette Marie

    1991-01-01

    A 2.7 kb DNA fragment encoding the 60 kDa common antigen (CA) and a 13 kDa protein of Legionella micdadei was sequenced. Two open reading frames of 57,677 and 10,456 Da were identified, corresponding to the heat shock proteins GroEL and GroES, respectively. Typical -35, -10, and Shine-Dalgarno heat...

  9. Quartz-Seq2: a high-throughput single-cell RNA-sequencing method that effectively uses limited sequence reads.

    Science.gov (United States)

    Sasagawa, Yohei; Danno, Hiroki; Takada, Hitomi; Ebisawa, Masashi; Tanaka, Kaori; Hayashi, Tetsutaro; Kurisaki, Akira; Nikaido, Itoshi

    2018-03-09

    High-throughput single-cell RNA-seq methods assign limited unique molecular identifier (UMI) counts as gene expression values to single cells from shallow sequence reads and detect limited gene counts. We thus developed a high-throughput single-cell RNA-seq method, Quartz-Seq2, to overcome these issues. Our improvements in the reaction steps make it possible to effectively convert initial reads to UMI counts, at a rate of 30-50%, and detect more genes. To demonstrate the power of Quartz-Seq2, we analyzed approximately 10,000 transcriptomes from in vitro embryonic stem cells and an in vivo stromal vascular fraction with a limited number of reads.

  10. Optimizing de novo common wheat transcriptome assembly using short-read RNA-Seq data

    Directory of Open Access Journals (Sweden)

    Duan Jialei

    2012-08-01

    Background: Rapid advances in next-generation sequencing methods have provided new opportunities for transcriptome sequencing (RNA-Seq). The unprecedented sequencing depth provided by RNA-Seq makes it a powerful and cost-efficient method for transcriptome study, and it has been widely used in model organisms and non-model organisms to identify and quantify RNA. For non-model organisms lacking well-defined genomes, de novo assembly is typically required for downstream RNA-Seq analyses, including SNP discovery and identification of genes differentially expressed by phenotypes. Although RNA-Seq has been successfully used to sequence many non-model organisms, the results of de novo assembly from short reads can still be improved by using recent bioinformatic developments. Results: In this study, we used 212.6 million paired-end reads, which accounted for 16.2 Gb, to assemble the hexaploid wheat transcriptome. Two state-of-the-art assemblers, Trinity and Trans-ABySS, which use the single and multiple k-mer methods, respectively, were used, and the whole de novo assembly process was divided into the following four steps: pre-assembly, merging different samples, removal of redundancy and scaffolding. We documented every detail of these steps and how these steps influenced assembly performance to gain insight into transcriptome assembly from short reads. After optimization, the assembled transcripts were comparable to Sanger-derived ESTs in terms of both continuity and accuracy. We also provided considerable new wheat transcript data to the community. Conclusions: It is feasible to assemble the hexaploid wheat transcriptome from short reads. Special attention should be paid to dealing with multiple samples to balance the spectrum of expression levels and redundancy. To obtain an accurate overview of RNA profiling, removal of redundancy may be crucial in de novo assembly.

  11. Gro2mat: a package to efficiently read gromacs output in MATLAB.

    Science.gov (United States)

    Dien, Hung; Deane, Charlotte M; Knapp, Bernhard

    2014-07-30

    Molecular dynamics (MD) simulations are a state-of-the-art computational method used to investigate molecular interactions at atomic scale. Interaction processes out of experimental reach can be monitored using MD software, such as Gromacs. Here, we present the gro2mat package that allows fast and easy access to Gromacs output files from Matlab. Gro2mat enables direct parsing of the most common Gromacs output formats including the binary xtc-format. No openly available Matlab parser currently exists for this format. The xtc reader is orders of magnitude faster than other available pdb/ascii workarounds. Gro2mat is especially useful for scientists with an interest in quick prototyping of new mathematical and statistical approaches for Gromacs trajectory analyses. Copyright © 2014 Wiley Periodicals, Inc.

  12. voom: Precision weights unlock linear model analysis tools for RNA-seq read counts.

    Science.gov (United States)

    Law, Charity W; Chen, Yunshun; Shi, Wei; Smyth, Gordon K

    2014-02-03

    New normal linear modeling strategies are presented for analyzing read counts from RNA-seq experiments. The voom method estimates the mean-variance relationship of the log-counts, generates a precision weight for each observation and enters these into the limma empirical Bayes analysis pipeline. This opens access for RNA-seq analysts to a large body of methodology developed for microarrays. Simulation studies show that voom performs as well as or better than count-based RNA-seq methods even when the data are generated according to the assumptions of the earlier methods. Two case studies illustrate the use of linear modeling and gene set testing methods.
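The mean-variance idea above can be made concrete with a heavily simplified sketch. It is not the voom implementation from limma: it computes log2 counts per million with a small offset, fits a straight line to log standard deviation against mean log-expression (a stand-in for the smooth trend voom fits), and returns one inverse-variance weight per gene rather than one per observation. The function names and the toy count matrix are hypothetical.

```python
import math

def log_cpm(counts, lib_sizes):
    """counts[gene][sample] -> log2 counts per million, offset to avoid log(0)."""
    return [[math.log2((c + 0.5) / (n + 1.0) * 1e6) for c, n in zip(row, lib_sizes)]
            for row in counts]

def gene_weights(counts):
    """Per-gene precision weights from a fitted mean-variance trend (simplified)."""
    lib_sizes = [sum(col) for col in zip(*counts)]
    y = log_cpm(counts, lib_sizes)
    means = [sum(r) / len(r) for r in y]
    sds = [math.sqrt(sum((v - m) ** 2 for v in r) / (len(r) - 1))
           for r, m in zip(y, means)]
    log_sds = [math.log(max(s, 1e-8)) for s in sds]
    # Ordinary least squares line: log(sd) = a + b * mean (assumed trend shape).
    n = len(means)
    mx, my = sum(means) / n, sum(log_sds) / n
    sxx = sum((x - mx) ** 2 for x in means)
    b = sum((x - mx) * (s - my) for x, s in zip(means, log_sds)) / sxx if sxx > 0 else 0.0
    a = my - b * mx
    # Precision weight = 1 / predicted variance.
    return [math.exp(-2.0 * (a + b * x)) for x in means]

toy = [[5, 9, 2, 7], [500, 520, 480, 510], [50, 65, 40, 55]]
print(gene_weights(toy))
```

In the toy matrix the highly expressed gene varies least on the log scale, so it receives the largest weight, which is the behaviour the precision weights are meant to capture.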

  13. Discovering transcription factor binding sites in highly repetitive regions of genomes with multi-read analysis of ChIP-Seq data.

    Science.gov (United States)

    Chung, Dongjun; Kuan, Pei Fen; Li, Bo; Sanalkumar, Rajendran; Liang, Kun; Bresnick, Emery H; Dewey, Colin; Keleş, Sündüz

    2011-07-01

    Chromatin immunoprecipitation followed by high-throughput sequencing (ChIP-seq) is rapidly replacing chromatin immunoprecipitation combined with genome-wide tiling array analysis (ChIP-chip) as the preferred approach for mapping transcription-factor binding sites and chromatin modifications. The state of the art for analyzing ChIP-seq data relies on using only reads that map uniquely to a relevant reference genome (uni-reads). This can lead to the omission of up to 30% of alignable reads. We describe a general approach for utilizing reads that map to multiple locations on the reference genome (multi-reads). Our approach is based on allocating multi-reads as fractional counts using a weighted alignment scheme. Using human STAT1 and mouse GATA1 ChIP-seq datasets, we illustrate that incorporation of multi-reads significantly increases sequencing depths, leads to detection of novel peaks that are not otherwise identifiable with uni-reads, and improves detection of peaks in mappable regions. We investigate various genome-wide characteristics of peaks detected only by utilization of multi-reads via computational experiments. Overall, peaks from multi-read analysis have similar characteristics to peaks that are identified by uni-reads except that the majority of them reside in segmental duplications. We further validate a number of GATA1 multi-read only peaks by independent quantitative real-time ChIP analysis and identify novel target genes of GATA1. These computational and experimental results establish that multi-reads can be of critical importance for studying transcription factor binding in highly repetitive regions of genomes with ChIP-seq experiments.

  14. Discovering transcription factor binding sites in highly repetitive regions of genomes with multi-read analysis of ChIP-Seq data.

    Directory of Open Access Journals (Sweden)

    Dongjun Chung

    2011-07-01

    Chromatin immunoprecipitation followed by high-throughput sequencing (ChIP-seq) is rapidly replacing chromatin immunoprecipitation combined with genome-wide tiling array analysis (ChIP-chip) as the preferred approach for mapping transcription-factor binding sites and chromatin modifications. The state of the art for analyzing ChIP-seq data relies on using only reads that map uniquely to a relevant reference genome (uni-reads). This can lead to the omission of up to 30% of alignable reads. We describe a general approach for utilizing reads that map to multiple locations on the reference genome (multi-reads). Our approach is based on allocating multi-reads as fractional counts using a weighted alignment scheme. Using human STAT1 and mouse GATA1 ChIP-seq datasets, we illustrate that incorporation of multi-reads significantly increases sequencing depths, leads to detection of novel peaks that are not otherwise identifiable with uni-reads, and improves detection of peaks in mappable regions. We investigate various genome-wide characteristics of peaks detected only by utilization of multi-reads via computational experiments. Overall, peaks from multi-read analysis have similar characteristics to peaks that are identified by uni-reads except that the majority of them reside in segmental duplications. We further validate a number of GATA1 multi-read only peaks by independent quantitative real-time ChIP analysis and identify novel target genes of GATA1. These computational and experimental results establish that multi-reads can be of critical importance for studying transcription factor binding in highly repetitive regions of genomes with ChIP-seq experiments.

  15. MiSeq: A Next Generation Sequencing Platform for Genomic Analysis.

    Science.gov (United States)

    Ravi, Rupesh Kanchi; Walton, Kendra; Khosroheidari, Mahdieh

    2018-01-01

    MiSeq, Illumina's integrated next generation sequencing instrument, uses reversible-terminator sequencing-by-synthesis technology to provide end-to-end sequencing solutions. The MiSeq instrument is one of the smallest benchtop sequencers that can perform onboard cluster generation, amplification, genomic DNA sequencing, and data analysis, including base calling, alignment and variant calling, in a single run. It performs both single- and paired-end runs with adjustable read lengths from 1 × 36 base pairs to 2 × 300 base pairs. A single run can produce output data of up to 15 Gb in as little as 4 h of runtime and can output up to 25 M single reads and 50 M paired-end reads. Thus, MiSeq provides an ideal platform for rapid turnaround time. MiSeq is also a cost-effective tool for various analyses focused on targeted gene sequencing (amplicon sequencing and target enrichment), metagenomics, and gene expression studies. For these reasons, MiSeq has become one of the most widely used next generation sequencing platforms. Here, we provide a protocol to prepare libraries for sequencing using the MiSeq instrument and basic guidelines for analysis of output data from the MiSeq sequencing run.
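The throughput figures quoted above are mutually consistent, which a short calculation makes explicit. The sketch assumes that the "25 M single reads" correspond to 25 million clusters and that a paired-end run yields two reads per cluster; the function name `run_yield_gb` is hypothetical.

```python
def run_yield_gb(clusters, read_length, paired=True):
    """Approximate run output in gigabases for a given cluster count and read length."""
    reads = clusters * (2 if paired else 1)
    return reads * read_length / 1e9

# 25 M clusters sequenced as 2 x 300 bp gives the quoted maximum of 15 Gb.
print(run_yield_gb(25_000_000, 300))
# The shortest configuration, 1 x 36 bp, yields under 1 Gb.
print(run_yield_gb(25_000_000, 36, paired=False))
```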

  16. HPeak: an HMM-based algorithm for defining read-enriched regions in ChIP-Seq data

    Directory of Open Access Journals (Sweden)

    Maher Christopher A

    2010-07-01

    Background: Protein-DNA interaction constitutes a basic mechanism for the genetic regulation of target gene expression. Deciphering this mechanism has been a daunting task due to the difficulty in characterizing protein-bound DNA on a large scale. A powerful technique has recently emerged that couples chromatin immunoprecipitation (ChIP) with next-generation sequencing (ChIP-Seq). This technique provides a direct survey of the cistrome of transcription factors and other chromatin-associated proteins. In order to realize the full potential of this technique, increasingly sophisticated statistical algorithms have been developed to analyze the massive amount of data generated by this method. Results: Here we introduce HPeak, a Hidden Markov model (HMM)-based Peak-finding algorithm for analyzing ChIP-Seq data to identify protein-interacting genomic regions. In contrast to the majority of available ChIP-Seq analysis software packages, HPeak is a model-based approach allowing for rigorous statistical inference. This approach enables HPeak to accurately infer genomic regions enriched with sequence reads by assuming realistic probability distributions, in conjunction with a novel weighting scheme on the sequencing read coverage. Conclusions: Using biologically relevant data collections, we found that HPeak showed a higher prevalence of the expected transcription factor binding motifs in ChIP-enriched sequences relative to the control sequences when compared to other currently available ChIP-Seq analysis approaches. Additionally, in comparison to the ChIP-chip assay, ChIP-Seq provides higher resolution along with improved sensitivity and specificity of binding site detection. Additional file and the HPeak program are freely available at http://www.sph.umich.edu/csg/qin/HPeak.

  17. ORMAN: optimal resolution of ambiguous RNA-Seq multimappings in the presence of novel isoforms.

    Science.gov (United States)

    Dao, Phuong; Numanagić, Ibrahim; Lin, Yen-Yi; Hach, Faraz; Karakoc, Emre; Donmez, Nilgun; Collins, Colin; Eichler, Evan E; Sahinalp, S Cenk

    2014-03-01

    RNA-Seq technology is promising to uncover many novel alternative splicing events, gene fusions and other variations in RNA transcripts. For an accurate detection and quantification of transcripts, it is important to resolve the mapping ambiguity for those RNA-Seq reads that can be mapped to multiple loci: >17% of the reads from mouse RNA-Seq data and 50% of the reads from some plant RNA-Seq data have multiple mapping loci. In this study, we show how to resolve the mapping ambiguity in the presence of novel transcriptomic events such as exon skipping and novel indels towards accurate downstream analysis. We introduce ORMAN (Optimal Resolution of Multimapping Ambiguity of RNA-Seq Reads), which aims to compute the minimum number of potential transcript products for each gene and to assign each multimapping read to one of these transcripts based on the estimated distribution of the region covering the read. ORMAN achieves this objective through a combinatorial optimization formulation, which is solved through well-known approximation algorithms, integer linear programs and heuristics. On a simulated RNA-Seq dataset including a random subset of transcripts from the UCSC database, the performance of several state-of-the-art methods for identifying and quantifying novel transcripts, such as Cufflinks, IsoLasso and CLIIQ, is significantly improved through the use of ORMAN. Furthermore, in an experiment using real RNA-Seq reads, we show that ORMAN is able to resolve multimapping to produce coverage values that are similar to the original distribution, even in genes with highly non-uniform coverage. ORMAN is available at http://orman.sf.net

  18. Stormbow: A Cloud-Based Tool for Reads Mapping and Expression Quantification in Large-Scale RNA-Seq Studies.

    Science.gov (United States)

    Zhao, Shanrong; Prenger, Kurt; Smith, Lance

    2013-01-01

    RNA-Seq is becoming a promising replacement for microarrays in transcriptome profiling and differential gene expression studies. Technical improvements have decreased sequencing costs and, as a result, the size and number of RNA-Seq datasets have increased rapidly. However, the increasing volume of data from large-scale RNA-Seq studies poses a practical challenge for data analysis in a local environment. To meet this challenge, we developed Stormbow, a cloud-based software package, to process large volumes of RNA-Seq data in parallel. The performance of Stormbow has been tested by applying it to analyse 178 RNA-Seq samples in the cloud. In our test, it took 6 to 8 hours to process an RNA-Seq sample with 100 million reads, and the average cost was $3.50 per sample. Utilizing Amazon Web Services as the infrastructure for Stormbow allows us to easily scale up to handle large datasets with on-demand computational resources. Stormbow is a scalable, cost-effective, and open-source based tool for large-scale RNA-Seq data analysis. Stormbow can be freely downloaded and can be used out of the box to process Illumina RNA-Seq datasets.

  19. GroEL-GroES assisted folding of multiple recombinant proteins simultaneously over-expressed in Escherichia coli.

    Science.gov (United States)

    Goyal, Megha; Chaudhuri, Tapan K

    2015-07-01

    Folding of aggregation-prone recombinant proteins through co-expression of chaperonin GroEL and GroES has been a popular practice in the effort to optimize preparation of functional protein in Escherichia coli. Considering the demand for functional recombinant protein products, it is desirable to apply the chaperone-assisted protein folding strategy for enhancing the yield of properly folded protein. In the same direction, it is also worth attempting folding of multiple recombinant proteins simultaneously over-expressed in E. coli through the assistance of co-expressed GroEL-ES. This idea originated from the fact that cellular GroEL and GroES assist in the folding of several endogenous proteins expressed in the bacterial cell. Here we present the experimental findings from our study on co-expressed GroEL-GroES assisted folding of simultaneously over-expressed proteins maltodextrin glucosidase (MalZ) and yeast mitochondrial aconitase (mAco). Both proteins are relatively large and aggregation prone, mostly form inclusion bodies, and undergo GroEL-ES assisted folding in E. coli cells during over-expression. It has been reported that the relative yield of properly folded functional forms of MalZ and mAco with the exogenous GroEL-ES assistance were comparable with the results when these proteins were overexpressed alone. This observation is quite promising and highlights the fact that GroEL and GroES can assist in the folding of multiple substrate proteins simultaneously when over-expressed in E. coli. This method might be a potential tool for enhanced production of multiple functional recombinant proteins simultaneously in E. coli. Copyright © 2015 Elsevier Ltd. All rights reserved.

  20. From reads to genes to pathways: differential expression analysis of RNA-Seq experiments using Rsubread and the edgeR quasi-likelihood pipeline.

    Science.gov (United States)

    Chen, Yunshun; Lun, Aaron T L; Smyth, Gordon K

    2016-01-01

    In recent years, RNA sequencing (RNA-seq) has become a very widely used technology for profiling gene expression. One of the most common aims of RNA-seq profiling is to identify genes or molecular pathways that are differentially expressed (DE) between two or more biological conditions. This article demonstrates a computational workflow for the detection of DE genes and pathways from RNA-seq data by providing a complete analysis of an RNA-seq experiment profiling epithelial cell subsets in the mouse mammary gland. The workflow uses R software packages from the open-source Bioconductor project and covers all steps of the analysis pipeline, including alignment of read sequences, data exploration, differential expression analysis, visualization and pathway analysis. Read alignment and count quantification is conducted using the Rsubread package and the statistical analyses are performed using the edgeR package. The differential expression analysis uses the quasi-likelihood functionality of edgeR.

  1. Mycobacteria contain two groEL genes: the second Mycobacterium leprae groEL gene is arranged in an operon with groES

    NARCIS (Netherlands)

    Rinke de Wit, T. F.; Bekelie, S.; Osland, A.; Miko, T. L.; Hermans, P. W.; van Soolingen, D.; Drijfhout, J. W.; Schöningh, R.; Janson, A. A.; Thole, J. E.

    1992-01-01

    In contrast to other bacterial species, mycobacteria were thus far considered to contain groEL and groES genes that are present on separate loci on their chromosomes. Here, by screening a Mycobacterium leprae lambda gt11 expression library with serum from an Ethiopian lepromatous leprosy patient,

  2. Systematic evaluation of the impact of ChIP-seq read designs on genome coverage, peak identification, and allele-specific binding detection.

    Science.gov (United States)

    Zhang, Qi; Zeng, Xin; Younkin, Sam; Kawli, Trupti; Snyder, Michael P; Keleş, Sündüz

    2016-02-24

    Chromatin immunoprecipitation followed by sequencing (ChIP-seq) experiments revolutionized genome-wide profiling of transcription factors and histone modifications. Although maturing sequencing technologies allow these experiments to be carried out with short (36-50 bps), long (75-100 bps), single-end, or paired-end reads, the impact of these read parameters on the downstream data analysis is not well understood. In this paper, we evaluate the effects of different read parameters on genome sequence alignment, coverage of different classes of genomic features, peak identification, and allele-specific binding detection. We generated 101 bps paired-end ChIP-seq data for many transcription factors from human GM12878 and MCF7 cell lines. Systematic evaluations using in silico variations of these data, as well as fully simulated data, revealed complex interplay between the sequencing parameters and analysis tools, and indicated clear advantages of paired-end designs in several aspects such as alignment accuracy, peak resolution, and most notably, allele-specific binding detection. Our work elucidates the effect of design on the downstream analysis and provides insights to investigators in deciding sequencing parameters in ChIP-seq experiments. We present the first systematic evaluation of the impact of ChIP-seq designs on allele-specific binding detection and highlight the power of paired-end designs in such studies.

  3. A structured sparse regression method for estimating isoform expression level from multi-sample RNA-seq data.

    Science.gov (United States)

    Zhang, L; Liu, X J

    2016-06-03

    With the rapid development of next-generation high-throughput sequencing technology, RNA-seq has become a standard and important technique for transcriptome analysis. For multi-sample RNA-seq data, the existing expression estimation methods usually deal with each RNA-seq sample separately, and ignore that the read distributions are consistent across multiple samples. In the current study, we propose a structured sparse regression method, SSRSeq, to estimate isoform expression using multi-sample RNA-seq data. SSRSeq uses a non-parametric model to capture the general tendency of the non-uniform read distribution for all genes across multiple samples. Additionally, our method adds a structured sparse regularization, which not only incorporates the sparse specificity between a gene and its corresponding isoform expression levels, but also reduces the effects of noisy reads, especially for lowly expressed genes and isoforms. Four real datasets were used to evaluate our method on isoform expression estimation. Compared with other popular methods, SSRSeq reduced the variance between multiple samples and produced more accurate isoform expression estimates, and thus more meaningful biological interpretations.

  4. In silico engineering of aggregation-prone recombinant proteins for substrate recognition by the chaperonin GroEL.

    Science.gov (United States)

    Kumar, Vipul; Punetha, Ankita; Sundar, Durai; Chaudhuri, Tapan K

    2012-01-01

    Molecular chaperones appear to have evolved to facilitate protein folding in the cell through entrapment of folding intermediates in the interior of a large cavity formed between GroEL and its co-chaperonin GroES. They bind newly synthesized or non-native polypeptides through hydrophobic interactions and prevent their aggregation. Some proteins do not interact with GroEL and hence, even though they are aggregation prone, cannot be assisted by GroEL in their folding. In this study, we have attempted to engineer these non-substrate proteins to convert them into substrates for GroEL without compromising their function. We have used a computational biology approach to generate mutants of the selected proteins by selectively mutating residues in the hydrophobic patch, similar to the GroES mobile loop region responsible for interaction with GroEL, and compared them with their wild-type counterparts by calculating their instability and aggregation propensities. The energies of the newly designed mutants were computed through molecular dynamics simulations. We observed increased aggregation propensity in some of the mutants formed after replacing charged amino acid residues with hydrophobic ones in the well-defined hydrophobic patch, raising the possibility that they can bind GroEL. The newly generated mutants may provide potential substrates for the chaperonin GroEL, which can be experimentally generated and tested for their tendency to aggregate, their interactions with GroEL and the possibility of chaperone-assisted folding to produce functional proteins.

  5. Read-Split-Run: an improved bioinformatics pipeline for identification of genome-wide non-canonical spliced regions using RNA-Seq data.

    Science.gov (United States)

    Bai, Yongsheng; Kinne, Jeff; Donham, Brandon; Jiang, Feng; Ding, Lizhong; Hassler, Justin R; Kaufman, Randal J

    2016-08-22

    Most existing tools for detecting next-generation sequencing-based splicing events focus on generic splicing events. Consequently, special types of non-canonical splicing events of short mRNA regions (IRE1α targeted) have not yet been thoroughly addressed at a genome-wide level using bioinformatics approaches in conjunction with next-generation technologies. During endoplasmic reticulum (ER) stress, the gene encoding the RNase Ire1α is known to splice out a short 26 nt region from the mRNA of the transcription factor Xbp1 non-canonically within the cytosol. This causes an open reading frame-shift that induces expression of many downstream genes in reaction to ER stress as part of the unfolded protein response (UPR). We previously published an algorithm termed "Read-Split-Walk" (RSW) to identify non-canonical splicing regions using RNA-Seq data and applied it to ER stress-induced Ire1α heterozygote and knockout mouse embryonic fibroblast cell lines. In this study, we have developed an improved algorithm, "Read-Split-Run" (RSR), for detecting genome-wide Ire1α-targeted genes with non-canonical spliced regions at a faster speed. We applied the RSR algorithm, using different combinations of several parameters, to the mouse embryonic fibroblast (MEF) cells previously tested with RSW and to the human Encyclopedia of DNA Elements (ENCODE) RNA-Seq data. We also compared the performance of RSR with two other tools for identifying alternative splicing events, TopHat (Trapnell et al., Bioinformatics 25:1105-1111, 2009) and Alt Event Finder (Zhou et al., BMC Genomics 13:S10, 2012), using the spliced Xbp1 mRNA as a positive control: in our data sets it was the top cleavage target, present in Ire1α (+/-) but absent in Ire1α (-/-) MEF samples. This comparison was also extended to human ENCODE RNA-Seq data. Proof of principle came in our results by the fact that the 26 nt non-conventional splice site in Xbp1 was detected as the top hit by our new RSR

  6. Effect of method of deduplication on estimation of differential gene expression using RNA-seq

    Directory of Open Access Journals (Sweden)

    Anna V. Klepikova

    2017-03-01

    Background: RNA-seq is a useful tool for analysis of gene expression. However, its robustness is greatly affected by a number of artifacts. One of them is the presence of duplicated reads. Results: To infer the influence of different methods of removal of duplicated reads on estimation of gene expression in cancer genomics, we analyzed paired samples of hepatocellular carcinoma (HCC) and non-tumor liver tissue. Four protocols of data analysis were applied to each sample: processing without deduplication, deduplication using a method implemented in SAMtools, and deduplication based on one or two molecular indices (MI). We also analyzed the influence of sequencing layout (single read or paired end) and read length. We found that deduplication without MI greatly affects estimated expression values; this effect is the most pronounced for highly expressed genes. Conclusion: The use of unique molecular identifiers greatly improves accuracy of RNA-seq analysis, especially for highly expressed genes. We developed a set of scripts that enable handling of MI and their incorporation into RNA-seq analysis pipelines. Deduplication without MI affects results of differential gene expression analysis, producing a high proportion of false negative results. Omitting duplicate read removal biases results towards false positives. In those cases where using MI is not possible, we recommend using paired-end sequencing layout.
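
    The contrast between coordinate-only deduplication and deduplication that also uses a molecular index can be illustrated with a minimal Python sketch. It is not the authors' scripts and not the SAMtools implementation: the read records, field names (`chrom`, `pos`, `strand`, `mi`) and both functions are hypothetical, and real tools also consider mate position, mapping quality and sequencing errors in the index.

```python
def dedup_by_position(reads):
    """Keep the first read seen at each (chrom, pos, strand).

    Simplified stand-in for coordinate-based duplicate removal.
    """
    seen, kept = set(), []
    for read in reads:
        key = (read["chrom"], read["pos"], read["strand"])
        if key not in seen:
            seen.add(key)
            kept.append(read)
    return kept


def dedup_by_position_and_mi(reads):
    """Keep the first read per (chrom, pos, strand, molecular index).

    Reads sharing coordinates but carrying different indices are treated
    as distinct molecules and are all retained.
    """
    seen, kept = set(), []
    for read in reads:
        key = (read["chrom"], read["pos"], read["strand"], read["mi"])
        if key not in seen:
            seen.add(key)
            kept.append(read)
    return kept


# Hypothetical reads from a highly expressed gene: three share coordinates,
# but only two of those three carry the same molecular index.
reads = [
    {"chrom": "chr1", "pos": 100, "strand": "+", "mi": "AACG"},
    {"chrom": "chr1", "pos": 100, "strand": "+", "mi": "AACG"},  # PCR copy
    {"chrom": "chr1", "pos": 100, "strand": "+", "mi": "GGTA"},  # distinct molecule
    {"chrom": "chr1", "pos": 250, "strand": "-", "mi": "AACG"},
]

print(len(dedup_by_position(reads)), len(dedup_by_position_and_mi(reads)))
```

    In this toy input the coordinate-only filter discards a read that came from a distinct molecule, which is the kind of undercounting the abstract reports for highly expressed genes.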

  7. A packet switched communications system for GRO

    Science.gov (United States)

    Husain, Shabu; Yang, Wen-Hsing; Vadlamudi, Rani; Valenti, Joseph

    1993-11-01

    This paper describes the packet switched Instrumenters Communication System (ICS) that was developed for the Command Management Facility at GSFC to support the Gamma Ray Observatory (GRO) spacecraft. The GRO ICS serves as a vital science data acquisition link to the GRO scientists to initiate commands for their spacecraft instruments. The system is ready to send and receive messages at any time, 24 hours a day and seven days a week. The system is based on X.25 and the International Standard Organization's (ISO) 7-layer Open Systems Interconnection (OSI) protocol model and has client and server components. The components of the GRO ICS are discussed along with how the Communications Subsystem for Interconnection (CSFI) and Network Control Program Packet Switching Interface (NPSI) software are used in the system.

  8. From reads to genes to pathways: differential expression analysis of RNA-Seq experiments using Rsubread and the edgeR quasi-likelihood pipeline [version 2; referees: 5 approved]

    Directory of Open Access Journals (Sweden)

    Yunshun Chen

    2016-08-01

    In recent years, RNA sequencing (RNA-seq) has become a very widely used technology for profiling gene expression. One of the most common aims of RNA-seq profiling is to identify genes or molecular pathways that are differentially expressed (DE) between two or more biological conditions. This article demonstrates a computational workflow for the detection of DE genes and pathways from RNA-seq data by providing a complete analysis of an RNA-seq experiment profiling epithelial cell subsets in the mouse mammary gland. The workflow uses R software packages from the open-source Bioconductor project and covers all steps of the analysis pipeline, including alignment of read sequences, data exploration, differential expression analysis, visualization and pathway analysis. Read alignment and count quantification are conducted using the Rsubread package and the statistical analyses are performed using the edgeR package. The differential expression analysis uses the quasi-likelihood functionality of edgeR.

  9. SpliceSeq: a resource for analysis and visualization of RNA-Seq data on alternative splicing and its functional impacts.

    Science.gov (United States)

    Ryan, Michael C; Cleland, James; Kim, RyangGuk; Wong, Wing Chung; Weinstein, John N

    2012-09-15

    SpliceSeq is a resource for RNA-Seq data that provides a clear view of alternative splicing and identifies potential functional changes that result from splice variation. It displays intuitive visualizations and prioritized lists of results that highlight splicing events and their biological consequences. SpliceSeq unambiguously aligns reads to gene splice graphs, facilitating accurate analysis of large, complex transcript variants that cannot be adequately represented in other formats. SpliceSeq is freely available at http://bioinformatics.mdanderson.org/main/SpliceSeq:Overview. The application is a Java program that can be launched via a browser or installed locally. Local installation requires MySQL and Bowtie. Contact: mryan@insilico.us.com. Supplementary data are available at Bioinformatics online.

  10. MetaRNA-Seq: An Interactive Tool to Browse and Annotate Metadata from RNA-Seq Studies

    Directory of Open Access Journals (Sweden)

    Pankaj Kumar

    2015-01-01

    The number of RNA-Seq studies has grown in recent years. The design of RNA-Seq studies varies from very simple (e.g., two-condition case-control) to very complicated (e.g., time series involving multiple samples at each time point with separate drug treatments). Most of these publicly available RNA-Seq studies are deposited in NCBI databases, but their metadata are scattered throughout four different databases: Sequence Read Archive (SRA), Biosample, Bioprojects, and Gene Expression Omnibus (GEO). Although the NCBI web interface is able to provide all of the metadata information, it often requires significant effort to retrieve study- or project-level information by traversing through multiple hyperlinks and going to another page. Moreover, project- and study-level metadata lack manual or automatic curation by categories, such as disease type, time series, case-control, or replicate type, which are vital to comprehending any RNA-Seq study. Here we describe “MetaRNA-Seq,” a new tool for interactively browsing, searching, and annotating RNA-Seq metadata with the capability of semiautomatic curation at the study level.

  11. Playing hide and seek with repeats in local and global de novo transcriptome assembly of short RNA-seq reads.

    Science.gov (United States)

    Lima, Leandro; Sinaimeri, Blerina; Sacomoto, Gustavo; Lopez-Maestre, Helene; Marchet, Camille; Miele, Vincent; Sagot, Marie-France; Lacroix, Vincent

    2017-01-01

    The main challenge in de novo genome assembly of DNA-seq data is certainly to deal with repeats that are longer than the reads. In de novo transcriptome assembly of RNA-seq reads, on the other hand, this problem has been underestimated so far. Even though we have fewer and shorter repeated sequences in transcriptomics, they do create ambiguities and confuse assemblers if not addressed properly. Most transcriptome assemblers of short reads are based on de Bruijn graphs (DBG) and have no clear and explicit model for repeats in RNA-seq data, relying instead on heuristics to deal with them. The results of this work are threefold. First, we introduce a formal model for representing high copy-number and low-divergence repeats in RNA-seq data and exploit its properties to infer a combinatorial characteristic of repeat-associated subgraphs. We show that the problem of identifying such subgraphs in a DBG is NP-complete. Second, we show that in the specific case of local assembly of alternative splicing (AS) events, we can implicitly avoid such subgraphs, and we present an efficient algorithm to enumerate AS events that are not included in repeats. Using simulated data, we show that this strategy is significantly more sensitive and precise than the previous version of KisSplice (Sacomoto et al. in WABI, pp 99-111, 1), Trinity (Grabherr et al. in Nat Biotechnol 29(7):644-652, 2), and Oases (Schulz et al. in Bioinformatics 28(8):1086-1092, 3), for the specific task of calling AS events. Third, we turn our focus to full-length transcriptome assembly, and we show that exploring the topology of DBGs can improve de novo transcriptome evaluation methods. Based on the observation that repeats create complicated regions in a DBG, and when assemblers try to traverse these regions, they can infer erroneous transcripts, we propose a measure to flag transcripts traversing such troublesome regions, thereby giving a confidence level for each transcript. The originality of our work when

  12. An optimized protocol for generation and analysis of Ion Proton sequencing reads for RNA-Seq.

    Science.gov (United States)

    Yuan, Yongxian; Xu, Huaiqian; Leung, Ross Ka-Kit

    2016-05-26

    Previous studies compared running cost, time and other performance measures of popular sequencing platforms. However, a comprehensive assessment of library construction and analysis protocols for the Proton sequencing platform has been lacking. Unlike reads from Illumina sequencing platforms, Proton reads are heterogeneous in length and quality. When sequencing data from different platforms are combined, this can result in reads of varying length. Whether commonly used software handles such data satisfactorily is unknown. Using universal human reference RNA as the starting material, RNaseIII and chemical fragmentation methods in library construction gave similar results in the number of genes and junctions discovered and in the accuracy of estimated expression levels. In contrast, sequencing quality, read length and the choice of software affected mapping rate to a much larger extent. The unspliced aligner TMAP attained the highest mapping rate (97.27 % to genome, 86.46 % to transcriptome), though 47.83 % of mapped reads were clipped. Long reads could paradoxically reduce mapping in junctions. With a reference annotation as a guide, the mapping rate of TopHat2 increased significantly from 75.79 to 92.09 %, especially for long (>150 bp) reads. Sailfish, a k-mer based gene expression quantifier, attained results highly consistent with those of the TaqMan array and the highest sensitivity. We provide, for the first time, reference statistics of library preparation methods, gene detection and quantification, and junction discovery for RNA-Seq on the Ion Proton platform. Chemical fragmentation performed as well as the enzyme-based method. The optimal Ion Proton sequencing options and analysis software have been evaluated.

  13. From reads to genes to pathways: differential expression analysis of RNA-Seq experiments using Rsubread and the edgeR quasi-likelihood pipeline [version 1; referees: 2 approved]

    Directory of Open Access Journals (Sweden)

    Yunshun Chen

    2016-06-01

    In recent years, RNA sequencing (RNA-seq) has become a very widely used technology for profiling gene expression. One of the most common aims of RNA-seq profiling is to identify genes or molecular pathways that are differentially expressed (DE) between two or more biological conditions. This article demonstrates a computational workflow for the detection of DE genes and pathways from RNA-seq data by providing a complete analysis of an RNA-seq experiment profiling epithelial cell subsets in the mouse mammary gland. The workflow uses R software packages from the open-source Bioconductor project and covers all steps of the analysis pipeline, including alignment of read sequences, data exploration, differential expression analysis, visualization and pathway analysis. Read alignment and count quantification are conducted using the Rsubread package and the statistical analyses are performed using the edgeR package. The differential expression analysis uses the quasi-likelihood functionality of edgeR.

  14. Dataset concerning GroEL chaperonin interaction with proteins

    Directory of Open Access Journals (Sweden)

    V.V. Marchenkov

    2016-03-01

    GroEL chaperonin is well known to interact with a wide variety of polypeptide chains. Here we show the data related to our previous work (http://dx.doi.org/10.1016/j.pep.2015.11.020) [1], concerning the interaction of GroEL with native (lysozyme, α-lactalbumin) and denatured (lysozyme, α-lactalbumin and pepsin) proteins in solution. The use of affinity chromatography based on denatured pepsin for GroEL purification from fluorescent impurities is presented as well.

  15. Single nucleotide polymorphism discovery in bovine liver using RNA-seq technology.

    Directory of Open Access Journals (Sweden)

    Chandra Shekhar Pareek

    RNA-seq is a useful next-generation sequencing (NGS) technology that has been widely used to understand mammalian transcriptome architecture and function. In this study, a breed-specific RNA-seq experiment was utilized to detect putative single nucleotide polymorphisms (SNPs) in liver tissue of young bulls of the Polish Red, Polish Holstein-Friesian (HF) and Hereford breeds, and to understand the genomic variation in the three cattle breeds that may reflect differences in production traits. The RNA-seq experiment on bovine liver produced 1,071,144,072 raw paired-end reads, with an average of approximately 60 million paired-end reads per library. Breed-wise, a total of 345.06, 290.04 and 436.03 million paired-end reads were obtained from the Polish Red, Polish HF, and Hereford breeds, respectively. Burrows-Wheeler Aligner (BWA) read alignments showed that 81.35%, 82.81% and 84.21% of the mapped sequencing reads were properly paired to the Polish Red, Polish HF, and Hereford breeds, respectively. This study identified 5,641,401 SNPs and insertion and deletion (indel) positions expressed in the bovine liver with an average of 313,411 SNPs and indel per young bull. Following the removal of the indel mutations, a total of 1,953,804, 1,527,120 and 2,053,184 raw SNPs expressed in bovine liver were identified for the Polish Red, Polish HF, and Hereford breeds, respectively. Breed-wise, three highly reliable breed-specific SNP-databases (SNP-dbs) with 31,562, 24,945 and 28,194 SNP records were constructed for the Polish Red, Polish HF, and Hereford breeds, respectively. Using a combination of stringent parameters of a minimum depth of ≥10 mapping reads that support the polymorphic nucleotide base and 100% SNP ratio, 4,368, 3,780 and 3,800 SNP records were detected in the Polish Red, Polish HF, and Hereford breeds, respectively. The SNP detections using RNA-seq data were successfully validated by kompetitive allele-specific PCR (KASP™) SNP genotyping assay. The

  16. Single nucleotide polymorphism discovery in bovine liver using RNA-seq technology.

    Science.gov (United States)

    Pareek, Chandra Shekhar; Błaszczyk, Paweł; Dziuba, Piotr; Czarnik, Urszula; Fraser, Leyland; Sobiech, Przemysław; Pierzchała, Mariusz; Feng, Yaping; Kadarmideen, Haja N; Kumar, Dibyendu

    2017-01-01

    RNA-seq is a useful next-generation sequencing (NGS) technology that has been widely used to understand mammalian transcriptome architecture and function. In this study, a breed-specific RNA-seq experiment was utilized to detect putative single nucleotide polymorphisms (SNPs) in liver tissue of young bulls of the Polish Red, Polish Holstein-Friesian (HF) and Hereford breeds, and to understand the genomic variation in the three cattle breeds that may reflect differences in production traits. The RNA-seq experiment on bovine liver produced 1,071,144,072 raw paired-end reads, with an average of approximately 60 million paired-end reads per library. Breed-wise, a total of 345.06, 290.04 and 436.03 million paired-end reads were obtained from the Polish Red, Polish HF, and Hereford breeds, respectively. Burrows-Wheeler Aligner (BWA) read alignments showed that 81.35%, 82.81% and 84.21% of the mapped sequencing reads were properly paired to the Polish Red, Polish HF, and Hereford breeds, respectively. This study identified 5,641,401 SNPs and insertion and deletion (indel) positions expressed in the bovine liver with an average of 313,411 SNPs and indel per young bull. Following the removal of the indel mutations, a total of 1,953,804, 1,527,120 and 2,053,184 raw SNPs expressed in bovine liver were identified for the Polish Red, Polish HF, and Hereford breeds, respectively. Breed-wise, three highly reliable breed-specific SNP-databases (SNP-dbs) with 31,562, 24,945 and 28,194 SNP records were constructed for the Polish Red, Polish HF, and Hereford breeds, respectively. Using a combination of stringent parameters of a minimum depth of ≥10 mapping reads that support the polymorphic nucleotide base and 100% SNP ratio, 4,368, 3,780 and 3,800 SNP records were detected in the Polish Red, Polish HF, and Hereford breeds, respectively. The SNP detections using RNA-seq data were successfully validated by kompetitive allele-specific PCR (KASP™) SNP genotyping assay. The comprehensive

  17. EMSAR: estimation of transcript abundance from RNA-seq data by mappability-based segmentation and reclustering.

    Science.gov (United States)

    Lee, Soohyun; Seo, Chae Hwa; Alver, Burak Han; Lee, Sanghyuk; Park, Peter J

    2015-09-03

    RNA-seq has been widely used for genome-wide expression profiling. RNA-seq data typically consists of tens of millions of short sequenced reads from different transcripts. However, due to sequence similarity among genes and among isoforms, the source of a given read is often ambiguous. Existing approaches for estimating expression levels from RNA-seq reads tend to compromise between accuracy and computational cost. We introduce a new approach for quantifying transcript abundance from RNA-seq data. EMSAR (Estimation by Mappability-based Segmentation And Reclustering) groups reads according to the set of transcripts to which they are mapped and finds maximum likelihood estimates using a joint Poisson model for each optimal set of segments of transcripts. The method uses nearly all mapped reads, including those mapped to multiple genes. With an efficient transcriptome indexing based on modified suffix arrays, EMSAR minimizes the use of CPU time and memory while achieving accuracy comparable to the best existing methods. EMSAR is a method for quantifying transcripts from RNA-seq data with high accuracy and low computational cost. EMSAR is available at https://github.com/parklab/emsar.
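
    The grouping step described above, in which reads are binned by the exact set of transcripts they map to, can be sketched in Python as follows. This is only an illustration of that one idea, not EMSAR's code: the function name and the input layout (a dict from read ID to transcript IDs) are assumptions, and the segmentation, indexing and Poisson likelihood steps are not shown.

```python
from collections import Counter


def group_reads_by_transcript_set(alignments):
    """Count reads per distinct set of compatible transcripts.

    alignments: dict mapping a read ID to an iterable of transcript IDs
    the read aligns to (hypothetical input layout).
    Returns a Counter keyed by frozenset of transcript IDs, so reads with
    the same transcript set fall into the same group regardless of the
    order in which their alignments were reported.
    """
    groups = Counter()
    for transcripts in alignments.values():
        groups[frozenset(transcripts)] += 1
    return groups


# Hypothetical alignments: r2 and r3 are ambiguous between T1 and T2.
alignments = {
    "r1": ["T1"],
    "r2": ["T1", "T2"],
    "r3": ["T2", "T1"],
    "r4": ["T3"],
}

for transcript_set, n_reads in group_reads_by_transcript_set(alignments).items():
    print(sorted(transcript_set), n_reads)
```

    Collapsing reads into such groups means a likelihood only needs one term per group rather than one per read, which is one plausible reason this kind of grouping keeps multi-mapped reads usable at low cost.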

  18. Limitations and possibilities of low cell number ChIP-seq

    Directory of Open Access Journals (Sweden)

    Gilfillan Gregor D

    2012-11-01

    Background: Chromatin immunoprecipitation coupled with high-throughput DNA sequencing (ChIP-seq) offers high resolution, genome-wide analysis of DNA-protein interactions. However, current standard methods require abundant starting material in the range of 1–20 million cells per immunoprecipitation, and remain a bottleneck to the acquisition of biologically relevant epigenetic data. Using a ChIP-seq protocol optimised for low cell numbers (down to 100,000 cells/IP), we examined the performance of the ChIP-seq technique on a series of decreasing cell numbers. Results: We present an enhanced native ChIP-seq method tailored to low cell numbers that represents a 200-fold reduction in input requirements over existing protocols. The protocol was tested over a range of starting cell numbers covering three orders of magnitude, enabling determination of the lower limit of the technique. At low input cell numbers, increased levels of unmapped and duplicate reads reduce the number of unique reads generated, and can drive up sequencing costs and affect sensitivity if ChIP is attempted from too few cells. Conclusions: The optimised method presented here considerably reduces the input requirements for performing native ChIP-seq. It extends the applicability of the technique to isolated primary cells and rare cell populations (e.g. biobank samples, stem cells), and in many cases will alleviate the need for cell culture and any associated alteration of epigenetic marks. However, this study highlights a challenge inherent to ChIP-seq from low cell numbers: as cell input numbers fall, levels of unmapped sequence reads and PCR-generated duplicate reads rise. We discuss a number of solutions to overcome the effects of reducing cell number that may aid further improvements to ChIP performance.

  19. A non-parametric peak calling algorithm for DamID-Seq.

    Directory of Open Access Journals (Sweden)

    Renhua Li

    Protein-DNA interactions play a significant role in gene regulation and expression. In order to identify transcription factor binding sites (TFBS) of double sex (DSX), an important transcription factor in sex determination, we applied the DNA adenine methylation identification (DamID) technology to the fat body tissue of Drosophila, followed by deep sequencing (DamID-Seq). One feature of DamID-Seq data is that induced adenine methylation signals are not assured to be symmetrically distributed at TFBS, which renders the existing peak calling algorithms for ChIP-Seq, including SPP and MACS, inappropriate for DamID-Seq data. This challenged us to develop a new algorithm for peak calling. A challenge in peak calling based on sequence data is estimating the averaged behavior of background signals. We applied a bootstrap resampling method to short sequence reads in the control (Dam only). After data quality check and mapping reads to a reference genome, the peak calling procedure comprises the following steps: 1) read resampling; 2) read scaling (normalization) and computing signal-to-noise fold changes; 3) filtering; 4) calling peaks based on a statistically significant threshold. This is a non-parametric method for peak calling (NPPC). We also used irreproducible discovery rate (IDR) analysis, as well as ChIP-Seq data, to compare the peaks called by the NPPC. We identified approximately 6,000 peaks for DSX, which point to 1,225 genes related to the fat body tissue difference between female and male Drosophila. Statistical evidence from IDR analysis indicated that these peaks are reproducible across biological replicates. In addition, these peaks are comparable to those identified by use of ChIP-Seq on S2 cells, in terms of peak number, location, and peak width.

  20. A non-parametric peak calling algorithm for DamID-Seq.

    Science.gov (United States)

    Li, Renhua; Hempel, Leonie U; Jiang, Tingbo

    2015-01-01

    Protein-DNA interactions play a significant role in gene regulation and expression. In order to identify transcription factor binding sites (TFBS) of double sex (DSX), an important transcription factor in sex determination, we applied the DNA adenine methylation identification (DamID) technology to the fat body tissue of Drosophila, followed by deep sequencing (DamID-Seq). One feature of DamID-Seq data is that induced adenine methylation signals are not assured to be symmetrically distributed at TFBS, which renders the existing peak calling algorithms for ChIP-Seq, including SPP and MACS, inappropriate for DamID-Seq data. This challenged us to develop a new algorithm for peak calling. A challenge in peak calling based on sequence data is estimating the averaged behavior of background signals. We applied a bootstrap resampling method to short sequence reads in the control (Dam only). After data quality check and mapping reads to a reference genome, the peak calling procedure comprises the following steps: 1) read resampling; 2) read scaling (normalization) and computing signal-to-noise fold changes; 3) filtering; 4) calling peaks based on a statistically significant threshold. This is a non-parametric method for peak calling (NPPC). We also used irreproducible discovery rate (IDR) analysis, as well as ChIP-Seq data, to compare the peaks called by the NPPC. We identified approximately 6,000 peaks for DSX, which point to 1,225 genes related to the fat body tissue difference between female and male Drosophila. Statistical evidence from IDR analysis indicated that these peaks are reproducible across biological replicates. In addition, these peaks are comparable to those identified by use of ChIP-Seq on S2 cells, in terms of peak number, location, and peak width.
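
    The four steps listed in this abstract can be pictured with a small Python sketch over binned read positions. It is a rough illustration only and not the published NPPC procedure: the bin representation, function names, pseudocount and the fixed `min_reads` and `min_fold` cut-offs are all assumptions, and the fixed fold-change cut-off stands in for the statistical threshold that the abstract mentions but does not specify.

```python
import random
from collections import Counter


def bootstrap_background(control_bins, n_boot=200, seed=0):
    """Step 1: resample control reads with replacement and average per bin.

    control_bins: list with one genomic bin index per control (Dam only) read.
    Returns a dict of mean read count per bin over n_boot resamples.
    """
    rng = random.Random(seed)
    totals = Counter()
    for _ in range(n_boot):
        totals.update(rng.choices(control_bins, k=len(control_bins)))
    return {b: c / n_boot for b, c in totals.items()}


def call_peaks(sample_bins, control_bins, min_reads=5, min_fold=3.0, pseudo=1.0):
    """Steps 2-4: scale, compute fold change, filter, and threshold."""
    background = bootstrap_background(control_bins)
    # Step 2: scale the background to the sample's library size.
    scale = len(sample_bins) / len(control_bins)
    peaks = []
    for b, count in sorted(Counter(sample_bins).items()):
        # Step 3: drop bins with too few sample reads (assumed filter).
        if count < min_reads:
            continue
        fold = count / (background.get(b, 0.0) * scale + pseudo)
        # Step 4: fixed cut-off used here in place of a significance test.
        if fold >= min_fold:
            peaks.append((b, fold))
    return peaks


# Hypothetical data: uniform control, sample enriched in bin 3.
control_bins = list(range(10)) * 3
sample_bins = [3] * 40 + list(range(10)) * 2
print(call_peaks(sample_bins, control_bins))
```

    Averaging over bootstrap resamples gives a smoothed estimate of the background per bin without assuming a parametric distribution, which is the sense in which such a procedure is non-parametric.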

  1. Hyb-Seq: Combining Target Enrichment and Genome Skimming for Plant Phylogenomics

    Directory of Open Access Journals (Sweden)

    Kevin Weitemier

    2014-08-01

    Premise of the study: Hyb-Seq, the combination of target enrichment and genome skimming, allows simultaneous data collection for low-copy nuclear genes and high-copy genomic targets for plant systematics and evolution studies. Methods and Results: Genome and transcriptome assemblies for milkweed (Asclepias syriaca) were used to design enrichment probes for 3385 exons from 768 genes (>1.6 Mbp) followed by Illumina sequencing of enriched libraries. Hyb-Seq of 12 individuals (10 Asclepias species and two related genera) resulted in at least partial assembly of 92.6% of exons and 99.7% of genes and an average assembly length >2 Mbp. Importantly, complete plastomes and nuclear ribosomal DNA cistrons were assembled using off-target reads. Phylogenomic analyses demonstrated signal conflict between genomes. Conclusions: The Hyb-Seq approach enables targeted sequencing of thousands of low-copy nuclear exons and flanking regions, as well as genome skimming of high-copy repeats and organellar genomes, to efficiently produce genome-scale data sets for phylogenomics.

  2. Strand-Specific RNA-Seq Analyses of Fruiting Body Development in Coprinopsis cinerea.

    Directory of Open Access Journals (Sweden)

    Hajime Muraguchi

    The basidiomycete fungus Coprinopsis cinerea is an important model system for multicellular development. Fruiting bodies of C. cinerea are typical mushrooms, which can be produced synchronously on defined media in the laboratory. To investigate the transcriptome in detail during fruiting body development, high-throughput sequencing (RNA-seq) was performed using cDNA libraries strand-specifically constructed from 13 points (stages/tissues) with two biological replicates. The reads were aligned to 14,245 predicted transcripts, and counted for forward and reverse transcripts. Differentially expressed genes (DEGs) between two adjacent points and between vegetative mycelium and each point were detected by Tag Count Comparison (TCC). To validate RNA-seq data, expression levels of selected genes were compared using RPKM values in RNA-seq data and qRT-PCR data, and DEGs detected in microarray data were examined in MA plots of RNA-seq data by TCC. We discuss events deduced from GO analysis of DEGs. In addition, we uncovered both transcription factor candidates and antisense transcripts that are likely to be involved in developmental regulation for fruiting.
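
    For readers unfamiliar with the RPKM values mentioned above, the standard definition (reads per kilobase of transcript per million mapped reads) can be written as a short Python function. The function name and the example numbers are illustrative assumptions and are not taken from the study.

```python
def rpkm(read_count, transcript_length_bp, total_mapped_reads):
    """Reads Per Kilobase of transcript per Million mapped reads.

    RPKM = read_count * 1e9 / (total_mapped_reads * transcript_length_bp)
    The factor 1e9 combines the per-kilobase (1e3) and per-million (1e6)
    scalings.
    """
    if transcript_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("length and total mapped reads must be positive")
    return read_count * 1e9 / (total_mapped_reads * transcript_length_bp)


# Illustrative numbers only: 500 reads on a 2,000 bp transcript in a
# library of 10 million mapped reads.
print(rpkm(500, 2000, 10_000_000))
```

    Because the value is normalized by both transcript length and library size, it can be compared across genes and across libraries of different depth, which is what makes it usable alongside qRT-PCR measurements.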

  3. Gene expression profiling of human breast tissue samples using SAGE-Seq.

    Science.gov (United States)

    Wu, Zhenhua Jeremy; Meyer, Clifford A; Choudhury, Sibgat; Shipitsin, Michail; Maruyama, Reo; Bessarabova, Marina; Nikolskaya, Tatiana; Sukumar, Saraswati; Schwartzman, Armin; Liu, Jun S; Polyak, Kornelia; Liu, X Shirley

    2010-12-01

    We present a powerful application of ultra high-throughput sequencing, SAGE-Seq, for the accurate quantification of normal and neoplastic mammary epithelial cell transcriptomes. We develop data analysis pipelines that allow the mapping of sense and antisense strands of mitochondrial and RefSeq genes, the normalization between libraries, and the identification of differentially expressed genes. We find that the diversity of cancer transcriptomes is significantly higher than that of normal cells. Our analysis indicates that transcript discovery plateaus at 10 million reads/sample, and suggests a minimum desired sequencing depth around five million reads. Comparison of SAGE-Seq and traditional SAGE on normal and cancerous breast tissues reveals higher sensitivity of SAGE-Seq to detect less-abundant genes, including those encoding for known breast cancer-related transcription factors and G protein-coupled receptors (GPCRs). SAGE-Seq is able to identify genes and pathways abnormally activated in breast cancer that traditional SAGE failed to call. SAGE-Seq is a powerful method for the identification of biomarkers and therapeutic targets in human disease.
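The statement that transcript discovery plateaus with depth can be illustrated with a saturation (rarefaction) curve. The following is a minimal sketch on simulated data, not the SAGE-Seq pipeline: the function name `discovery_curve`, the Zipf-like abundance model, and all numbers are assumptions made purely for illustration.

```python
import random

def discovery_curve(reads, depths):
    """Return (depth, distinct transcripts seen) for each requested depth.

    `reads` is a list of transcript identifiers in random order, so the
    first d reads act as a random subsample of size d.
    """
    targets = sorted(depths)
    seen = set()
    curve = []
    idx = 0
    for i, transcript in enumerate(reads, start=1):
        seen.add(transcript)
        # Record the running count whenever a requested depth is reached.
        while idx < len(targets) and targets[idx] == i:
            curve.append((i, len(seen)))
            idx += 1
    return curve

# Hypothetical transcriptome: 500 transcripts with Zipf-like abundances.
rng = random.Random(0)
n_transcripts = 500
weights = [1.0 / rank for rank in range(1, n_transcripts + 1)]
reads = rng.choices(range(n_transcripts), weights=weights, k=20000)

curve = discovery_curve(reads, [1000, 2000, 5000, 10000, 20000])
for depth, found in curve:
    print(f"{depth:>6} reads: {found} transcripts detected")
```

On data of this kind the number of newly detected transcripts per additional read shrinks as depth grows, which is the behaviour a plateau estimate relies on.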

  4. RSEM: accurate transcript quantification from RNA-Seq data with or without a reference genome

    Directory of Open Access Journals (Sweden)

    Dewey Colin N

    2011-08-01

    Full Text Available Abstract. Background: RNA-Seq is revolutionizing the way transcript abundances are measured. A key challenge in transcript quantification from RNA-Seq data is the handling of reads that map to multiple genes or isoforms. This issue is particularly important for quantification with de novo transcriptome assemblies in the absence of sequenced genomes, as it is difficult to determine which transcripts are isoforms of the same gene. A second significant issue is the design of RNA-Seq experiments, in terms of the number of reads, read length, and whether reads come from one or both ends of cDNA fragments. Results: We present RSEM, a user-friendly software package for quantifying gene and isoform abundances from single-end or paired-end RNA-Seq data. RSEM outputs abundance estimates, 95% credibility intervals, and visualization files and can also simulate RNA-Seq data. In contrast to other existing tools, the software does not require a reference genome. Thus, in combination with a de novo transcriptome assembler, RSEM enables accurate transcript quantification for species without sequenced genomes. On simulated and real data sets, RSEM has superior or comparable performance to quantification methods that rely on a reference genome. Taking advantage of RSEM's ability to effectively use ambiguously-mapping reads, we show that accurate gene-level abundance estimates are best obtained with large numbers of short single-end reads. On the other hand, estimates of the relative frequencies of isoforms within single genes may be improved through the use of paired-end reads, depending on the number of possible splice forms for each gene. Conclusions: RSEM is an accurate and user-friendly software tool for quantifying transcript abundances from RNA-Seq data. As it does not rely on the existence of a reference genome, it is particularly useful for quantification with de novo transcriptome assemblies. In addition, RSEM has enabled valuable guidance for cost
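The handling of reads that map to several isoforms can be illustrated with a toy expectation-maximization loop. This is only a sketch under strong simplifying assumptions (no transcript lengths, no fragment or sequencing-error model) and is not RSEM's actual implementation; the function name `em_abundances` and the toy reads are hypothetical.

```python
from collections import defaultdict

def em_abundances(read_compat, n_iter=200):
    """Estimate the fraction of reads originating from each transcript.

    read_compat: list of tuples; each tuple lists the transcripts that
    one read is compatible with (more than one entry = ambiguous read).
    """
    transcripts = sorted({t for hits in read_compat for t in hits})
    # Start from a uniform guess.
    theta = {t: 1.0 / len(transcripts) for t in transcripts}
    n_reads = len(read_compat)
    for _ in range(n_iter):
        expected = defaultdict(float)
        # E-step: split each read among its compatible transcripts
        # in proportion to the current abundance estimates.
        for hits in read_compat:
            total = sum(theta[t] for t in hits)
            for t in hits:
                expected[t] += theta[t] / total
        # M-step: new abundances are the normalized expected counts.
        theta = {t: expected[t] / n_reads for t in transcripts}
    return theta

# Toy data: 6 reads unique to A, 2 unique to B, 4 ambiguous between A and B.
reads = [("A",)] * 6 + [("B",)] * 2 + [("A", "B")] * 4
theta = em_abundances(reads)
print(theta)
```

In this toy case the ambiguous reads end up divided in the same 3:1 ratio as the uniquely mapping reads, rather than being discarded or split evenly.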

  5. Comparing the normalization methods for the differential analysis of Illumina high-throughput RNA-Seq data.

    Science.gov (United States)

    Li, Peipei; Piao, Yongjun; Shon, Ho Sun; Ryu, Keun Ho

    2015-10-28

    Recently, rapid improvements in technology and decrease in sequencing costs have made RNA-Seq a widely used technique to quantify gene expression levels. Various normalization approaches have been proposed, owing to the importance of normalization in the analysis of RNA-Seq data. A comparison of recently proposed normalization methods is required to generate suitable guidelines for the selection of the most appropriate approach for future experiments. In this paper, we compared eight non-abundance estimation (RC, UQ, Med, TMM, DESeq, Q, RPKM, and ERPKM) and two abundance estimation normalization methods (RSEM and Sailfish). The experiments were based on real Illumina high-throughput RNA-Seq reads of 35 and 76 nucleotides produced in the MAQC project and on simulated reads. Reads were mapped to the human genome obtained from the UCSC Genome Browser Database. For precise evaluation, we investigated the Spearman correlation between the normalization results from RNA-Seq and MAQC qRT-PCR values for 996 genes. Based on this work, we showed that out of the eight non-abundance estimation normalization methods, RC, UQ, Med, TMM, DESeq, and Q gave similar normalization results for all data sets. For the 35-nucleotide RNA-Seq data, RPKM showed the highest correlation, but for the 76-nucleotide data it showed lower correlation than the other methods. ERPKM did not improve on RPKM. Between the two abundance estimation normalization methods, for the 35-nucleotide data, higher correlation was obtained with Sailfish than with RSEM, which was better than not using abundance estimation methods. However, for the 76-nucleotide data, the results achieved by RSEM were similar to those obtained without abundance estimation methods, and were much better than those with Sailfish. Furthermore, we found that adding a poly-A tail increased alignment numbers, but did not improve normalization results.
Spearman correlation analysis revealed that RC, UQ
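The evaluation described in this record (normalize counts, then compare with qRT-PCR by Spearman correlation) can be sketched as follows. This is an illustration only: the RPKM formula is the standard reads-per-kilobase-per-million definition, while the gene counts, lengths, and qRT-PCR values are invented and the helper names are hypothetical.

```python
def rpkm(counts, lengths_bp):
    """Reads per kilobase of transcript per million mapped reads."""
    total = sum(counts)
    return [c * 1e9 / (length * total) for c, length in zip(counts, lengths_bp)]

def ranks(values):
    """Ranks starting at 1, with tied values given their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average
        i = j + 1
    return result

def pearson(x, y):
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

def spearman(x, y):
    """Spearman correlation = Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

# Invented example: four genes with raw counts, lengths, and qRT-PCR values.
counts = [100, 400, 250, 250]
lengths = [1000, 2000, 500, 5000]
qpcr = [1.0, 2.5, 6.0, 0.4]

normalized = rpkm(counts, lengths)
rho_raw = spearman(counts, qpcr)
rho_rpkm = spearman(normalized, qpcr)
print(rho_raw, rho_rpkm)
```

The toy values were chosen so that length normalization changes the gene ranking; on real data the effect of any normalization method has to be measured, as the study above does.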

  6. Transcriptome Analysis of the Thymus in Short-Term Calorie-Restricted Mice Using RNA-seq

    Directory of Open Access Journals (Sweden)

    Zehra Omeroğlu Ulu

    2018-01-01

    Full Text Available Calorie restriction (CR), which is a factor that extends lifespan and an important player in immune response, is an effective protective method against cancer development. Thymus, which plays a critical role in the development of the immune system, reacts to nutrition deficiency quickly. RNA-seq-based transcriptome sequencing was performed on thymus tissues of MMTV-TGF-α mice subjected to ad libitum (AL), chronic calorie restriction (CCR), and intermittent calorie restriction (ICR) diets in this study. Three cDNA libraries were sequenced using Illumina HiSeq™ 4000 to produce 100-bp paired-end reads. On average, 105 million clean reads were mapped and in total 6091 significantly differentially expressed genes (DEGs) were identified (p<0.05). These DEGs were clustered into Gene Ontology (GO) categories. The expression pattern revealed by RNA-seq was validated by quantitative real-time PCR (qPCR) analysis of four important genes, which are leptin, ghrelin, Igf1, and adiponectin. RNA-seq data has been deposited in NCBI Gene Expression Omnibus (GEO) database (GSE95371). We report the use of RNA sequencing to find DEGs that are affected by different feeding regimes in the thymus.

  7. Transcriptome Analysis of the Thymus in Short-Term Calorie-Restricted Mice Using RNA-seq

    Science.gov (United States)

    Omeroğlu Ulu, Zehra; Ulu, Salih; Dogan, Soner; Guvenc Tuna, Bilge

    2018-01-01

    Calorie restriction (CR), which is a factor that extends lifespan and an important player in immune response, is an effective protective method against cancer development. Thymus, which plays a critical role in the development of the immune system, reacts to nutrition deficiency quickly. RNA-seq-based transcriptome sequencing was performed on thymus tissues of MMTV-TGF-α mice subjected to ad libitum (AL), chronic calorie restriction (CCR), and intermittent calorie restriction (ICR) diets in this study. Three cDNA libraries were sequenced using Illumina HiSeq™ 4000 to produce 100-bp paired-end reads. On average, 105 million clean reads were mapped and in total 6091 significantly differentially expressed genes (DEGs) were identified (p < 0.05). These DEGs were clustered into Gene Ontology (GO) categories. The expression pattern revealed by RNA-seq was validated by quantitative real-time PCR (qPCR) analysis of four important genes, which are leptin, ghrelin, Igf1, and adiponectin. RNA-seq data has been deposited in NCBI Gene Expression Omnibus (GEO) database (GSE95371). We report the use of RNA sequencing to find DEGs that are affected by different feeding regimes in the thymus. PMID:29511668

  8. Hyb-Seq: Combining target enrichment and genome skimming for plant phylogenomics

    Science.gov (United States)

    Weitemier, Kevin; Straub, Shannon C. K.; Cronn, Richard C.; Fishbein, Mark; Schmickl, Roswitha; McDonnell, Angela; Liston, Aaron

    2014-01-01

    • Premise of the study: Hyb-Seq, the combination of target enrichment and genome skimming, allows simultaneous data collection for low-copy nuclear genes and high-copy genomic targets for plant systematics and evolution studies. • Methods and Results: Genome and transcriptome assemblies for milkweed (Asclepias syriaca) were used to design enrichment probes for 3385 exons from 768 genes (>1.6 Mbp) followed by Illumina sequencing of enriched libraries. Hyb-Seq of 12 individuals (10 Asclepias species and two related genera) resulted in at least partial assembly of 92.6% of exons and 99.7% of genes and an average assembly length >2 Mbp. Importantly, complete plastomes and nuclear ribosomal DNA cistrons were assembled using off-target reads. Phylogenomic analyses demonstrated signal conflict between genomes. • Conclusions: The Hyb-Seq approach enables targeted sequencing of thousands of low-copy nuclear exons and flanking regions, as well as genome skimming of high-copy repeats and organellar genomes, to efficiently produce genome-scale data sets for phylogenomics. PMID:25225629

  9. Chaperonin GroEL/GroES Over-Expression Promotes Aminoglycoside Resistance and Reduces Drug Susceptibilities in Escherichia coli Following Exposure to Sublethal Aminoglycoside Doses

    DEFF Research Database (Denmark)

    Goltermann, Lise; Sarusie, Menachem V; Bentin, Thomas

    2016-01-01

    Antibiotic resistance is an increasing challenge to modern healthcare. Aminoglycoside antibiotics cause translation corruption and protein misfolding and aggregation in Escherichia coli. We previously showed that chaperonin GroEL/GroES depletion and over-expression sensitize and promote short...

  10. [Prokaryotic expression of Leptospira interrogans groEL gene and immunoprotection of its products in hamsters].

    Science.gov (United States)

    Li, Xiaoyu; Wang, Yinhuan; Yan, Jie; Cheng, Dongqing

    2013-03-01

    To construct a prokaryotic expression system of groEL gene of Leptospira interrogans serogroup Icterohaemorrhagiae serovar Lai strain Lai, and to determine the immunoprotective effect of recombinant GroEL protein (rGroEL) in LVG hamsters. The groEL gene was amplified by high fidelity PCR and the amplification products were then sequenced. A prokaryotic expression system of groEL gene was constructed using routine genetic engineering technique. SDS-PAGE plus Bio-Rad Gel Image Analyzer was applied to examine the expression and solubility of rGroEL protein while Ni-NTA affinity chromatography was used to extract the expressed rGroEL. The immunoprotective rate in rGroEL-immunized LVG hamsters was determined after challenge with L. interrogans strain Lai. The cross agglutination titers of sera from immunized hamsters with different L. interrogans serogroups were detected using MAT. The nucleotide and amino acid sequences of the cloned groEL gene were the same as those reported in GenBank. The constructed prokaryotic expression system of groEL gene expressed soluble rGroEL. The immunoprotective rates of 100 and 200 μg rGroEL in LVG hamsters were 50.0% and 75.0%, respectively. The sera from the rGroEL-immunized LVG hamsters agglutinated all the L. interrogans serogroups tested at different levels. The GroEL protein is a genus-specific immunoprotective antigen of L. interrogans and can be used to develop a universal genetically engineered vaccine against Leptospira.

  11. Differential T-cell recognition of native and recombinant Mycobacterium tuberculosis GroES

    DEFF Research Database (Denmark)

    Rosenkrands, I; Weldingh, K; Ravn, P

    1999-01-01

    Mycobacterium tuberculosis GroES was purified from culture filtrate, and its identity was confirmed by immunoblot analysis and N-terminal sequencing. Comparing the immunological recognition of native and recombinant GroES, we found that whereas native GroES elicited a strong proliferative response...

  12. PRAPI: post-transcriptional regulation analysis pipeline for Iso-Seq.

    Science.gov (United States)

    Gao, Yubang; Wang, Huiyuan; Zhang, Hangxiao; Wang, Yongsheng; Chen, Jinfeng; Gu, Lianfeng

    2018-05-01

    The single-molecule real-time (SMRT) isoform sequencing (Iso-Seq) based on the Pacific Biosciences (PacBio) platform has received increasing attention for its ability to explore full-length isoforms. Thus, comprehensive tools for Iso-Seq bioinformatics analysis are extremely useful. Here, we present a one-stop solution for Iso-Seq analysis, called PRAPI, to comprehensively analyze alternative transcription initiation (ATI), alternative splicing (AS), alternative cleavage and polyadenylation (APA), natural antisense transcripts (NAT), and circular RNAs (circRNAs). PRAPI is capable of combining Iso-Seq full-length isoforms with short read data, such as RNA-Seq or polyadenylation site sequencing (PAS-seq), for differential expression analysis of NAT, AS, APA and circRNAs. Furthermore, PRAPI can annotate new genes and correct mis-annotated genes when gene annotation is available. Finally, PRAPI generates high-quality vector graphics to visualize and highlight the Iso-Seq results. The Dockerfile of PRAPI is available at http://www.bioinfor.org/tool/PRAPI. Contact: lfgu@fafu.edu.cn.

  13. The Chaperonin GroEL: A Versatile Tool for Applied Biotechnology Platforms

    Directory of Open Access Journals (Sweden)

    Pierce T. O'Neil

    2018-05-01

    Full Text Available The nucleotide-free chaperonin GroEL is capable of capturing transient unfolded or partially unfolded states that flicker in and out of existence due to large-scale protein dynamic vibrational modes. In this work, three short vignettes are presented to highlight our continuing advances in the application of GroEL biosensor biolayer interferometry (BLI) technologies, including expanded uses of GroEL as a molecular scaffold for electron microscopy determination. The first example extends the ability to detect dynamic pre-aggregate transients in therapeutic protein solutions to the assessment of the kinetic stability of any folded protein or, as shown herein, to the quantitative detection of mutant-type protein when mixed with wild-type native counterparts. Secondly, using a BLI denaturation pulse assay with GroEL, the comparison of kinetically controlled denaturation isotherms of various von Willebrand factor (vWF) triple A domain mutant-types is shown. These mutant-types are single point mutations that locally disorder the A1 platelet binding domain resulting in one gain of function and one loss of function phenotype. Clear, separate, and reproducible kinetic deviations in the mutant-type isotherms exist when compared with the wild-type curve. Finally, expanding on previous electron microscopy (EM) advances using GroEL as both a protein scaffold surface and a release platform, examples are presented where GroEL-protein complexes can be imaged using electron microscopy tilt series and the low-resolution structures of aggregation-prone proteins that have interacted with GroEL. The ability of GroEL to bind hydrophobic regions and transient partially folded states allows one to employ this unique molecular chaperone both as a versatile structural scaffold and as a sensor of a protein's folded states.

  14. Sequence analysis of the Legionella micdadei groELS operon

    DEFF Research Database (Denmark)

    Hindersson, P; Høiby, N; Bangsborg, Jette Marie

    1991-01-01

    shock expression signals were identified upstream of the L. micdadei groEL gene. Further upstream, a poly-T region, also a feature of the sigma 32-regulated Escherichia coli groELS heat shock operon, was found. Despite the high degree of homology of the expression signals in E. coli and L. micdadei...

  15. SeqLib: a C++ API for rapid BAM manipulation, sequence alignment and sequence assembly.

    Science.gov (United States)

    Wala, Jeremiah; Beroukhim, Rameen

    2017-03-01

    We present SeqLib, a C++ API and command line tool that provides a rapid and user-friendly interface to BAM/SAM/CRAM files, global sequence alignment operations and sequence assembly. Four C libraries perform core operations in SeqLib: HTSlib for BAM access, BWA-MEM and BLAT for sequence alignment and Fermi for error correction and sequence assembly. Benchmarking indicates that SeqLib has lower CPU and memory requirements than leading C++ sequence analysis APIs. We demonstrate an example of how minimal SeqLib code can extract, error-correct and assemble reads from a CRAM file and then align with BWA-MEM. SeqLib also provides additional capabilities, including chromosome-aware interval queries and read plotting. Command line tools are available for performing integrated error correction, micro-assemblies and alignment. SeqLib is available on Linux and OSX for the C++98 standard and later at github.com/walaj/SeqLib. SeqLib is released under the Apache2 license. Additional capabilities for BLAT alignment are available under the BLAT license. jwala@broadinstitue.org ; rameen@broadinstitute.org. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com

  16. On the Involvement and Significance of Grandparents in Contested Custody Proceedings [Zur Beteiligung und Bedeutung von Großeltern in strittigen Sorgerechtsverfahren]

    OpenAIRE

    Speidel, Leonie

    2009-01-01

    The involvement and significance of grandparents in contested custody proceedings were analyzed on the basis of 28 child and adolescent psychiatric expert reports covering 36 children. The grandparents' involvement in family interactions, as well as their varied roles and functions within the family, increases after the separation. Their importance grows above all for their grandchildren. Before the separation, the grandparents support the parents financially, in the household, and on an hourly basis ...

  17. Rnnotator: an automated de novo transcriptome assembly pipeline from stranded RNA-Seq reads

    Energy Technology Data Exchange (ETDEWEB)

    Martin, Jeffrey; Bruno, Vincent M.; Fang, Zhide; Meng, Xiandong; Blow, Matthew; Zhang, Tao; Sherlock, Gavin; Snyder, Michael; Wang, Zhong

    2010-11-19

    Background: Comprehensive annotation and quantification of transcriptomes are outstanding problems in functional genomics. While high throughput mRNA sequencing (RNA-Seq) has emerged as a powerful tool for addressing these problems, its success is dependent upon the availability and quality of reference genome sequences, thus limiting the organisms to which it can be applied. Results: Here, we describe Rnnotator, an automated software pipeline that generates transcript models by de novo assembly of RNA-Seq data without the need for a reference genome. We have applied the Rnnotator assembly pipeline to two yeast transcriptomes and compared the results to the reference gene catalogs of these organisms. The contigs produced by Rnnotator are highly accurate (95%) and reconstruct full-length genes for the majority of the existing gene models (54.3%). Furthermore, our analyses revealed many novel transcribed regions that are absent from well annotated genomes, suggesting Rnnotator serves as a complementary approach to analysis based on a reference genome for comprehensive transcriptomics. Conclusions: These results demonstrate that the Rnnotator pipeline is able to reconstruct full-length transcripts in the absence of a complete reference genome.

  18. Network-Based Isoform Quantification with RNA-Seq Data for Cancer Transcriptome Analysis.

    Directory of Open Access Journals (Sweden)

    Wei Zhang

    2015-12-01

    Full Text Available High-throughput mRNA sequencing (RNA-Seq) is widely used for transcript quantification of gene isoforms. Since RNA-Seq data alone is often not sufficient to accurately identify the read origins from the isoforms for quantification, we propose to explore protein domain-domain interactions as prior knowledge for integrative analysis with RNA-Seq data. We introduce a Network-based method for RNA-Seq-based Transcript Quantification (Net-RSTQ) to integrate protein domain-domain interaction network with short read alignments for transcript abundance estimation. Based on our observation that the abundances of the neighboring isoforms by domain-domain interactions in the network are positively correlated, Net-RSTQ models the expression of the neighboring transcripts as Dirichlet priors on the likelihood of the observed read alignments against the transcripts in one gene. The transcript abundances of all the genes are then jointly estimated with alternating optimization of multiple EM problems. In simulation Net-RSTQ effectively improved isoform transcript quantifications when isoform co-expressions correlate with their interactions. qRT-PCR results on 25 multi-isoform genes in a stem cell line, an ovarian cancer cell line, and a breast cancer cell line also showed that Net-RSTQ estimated more consistent isoform proportions with RNA-Seq data. In the experiments on the RNA-Seq data in The Cancer Genome Atlas (TCGA), the transcript abundances estimated by Net-RSTQ are more informative for patient sample classification of ovarian cancer, breast cancer and lung cancer. All experimental results collectively support that Net-RSTQ is a promising approach for isoform quantification. Net-RSTQ toolbox is available at http://compbio.cs.umn.edu/Net-RSTQ/.

  19. Predicting stimulation-dependent enhancer-promoter interactions from ChIP-Seq time course data

    Directory of Open Access Journals (Sweden)

    Tomasz Dzida

    2017-09-01

    Full Text Available We have developed a machine learning approach to predict stimulation-dependent enhancer-promoter interactions using evidence from changes in genomic protein occupancy over time. The occupancy of estrogen receptor alpha (ERα), RNA polymerase (Pol II) and histone marks H2AZ and H3K4me3 was measured over time using ChIP-Seq experiments in MCF7 cells stimulated with estrogen. A Bayesian classifier was developed which uses the correlation of temporal binding patterns at enhancers and promoters and genomic proximity as features to predict interactions. This method was trained using experimentally determined interactions from the same system and was shown to achieve much higher precision than predictions based on the genomic proximity of nearest ERα binding. We use the method to identify a confident genome-wide set of ERα target genes and their regulatory enhancers. Validation with publicly available GRO-Seq data demonstrates that our predicted targets are much more likely to show early nascent transcription than predictions based on genomic ERα binding proximity alone.

  20. The ChIP-Seq tools and web server: a resource for analyzing ChIP-seq and other types of genomic data.

    Science.gov (United States)

    Ambrosini, Giovanna; Dreos, René; Kumar, Sunil; Bucher, Philipp

    2016-11-18

    ChIP-seq and related high-throughput chromatin profiling assays generate ever increasing volumes of highly valuable biological data. To make sense of it, biologists need versatile, efficient and user-friendly tools for access, visualization and integrative analysis of such data. Here we present the ChIP-Seq command line tools and web server, implementing basic algorithms for ChIP-seq data analysis starting with a read alignment file. The tools are optimized for memory-efficiency and speed thus allowing for processing of large data volumes on inexpensive hardware. The web interface provides access to a large database of public data. The ChIP-Seq tools have a modular and interoperable design in that the output from one application can serve as input to another one. Complex and innovative tasks can thus be achieved by running several tools in a cascade. The various ChIP-Seq command line tools and web services either complement or compare favorably to related bioinformatics resources in terms of computational efficiency, ease of access to public data and interoperability with other web-based tools. The ChIP-Seq server is accessible at http://ccg.vital-it.ch/chipseq/ .

  1. How to orient the functional GroEL-SR1 mutant for atomic force microscopy investigations

    International Nuclear Information System (INIS)

    Schiener, Jens; Witt, Susanne; Hayer-Hartl, Manajit; Guckenberger, Reinhard

    2005-01-01

    We present high-resolution atomic force microscopy (AFM) imaging of the single-ring mutant of the chaperonin GroEL (SR-EL) from Escherichia coli in buffer solution. The native GroEL is generally unsuitable for AFM scanning as it is easily bisected by forces exerted by the AFM tip. The single-ring mutant of GroEL with its simplified composition, but unaltered capability of binding substrates and the co-chaperone GroES, is a better suited system for AFM studies. We worked out a scheme to systematically investigate both the apical and the equatorial faces of SR-EL, as it binds in a preferred orientation to hydrophilic mica and hydrophobic highly ordered pyrolytic graphite. High-resolution topographical imaging and the interaction of the co-chaperone GroES were used to assign the orientations of SR-EL in comparison with the physically bisected GroEL. The usage of SR-EL facilitates single molecule studies on the folding cycle of the GroE system using AFM.

  2. SSP: an interval integer linear programming for de novo transcriptome assembly and isoform discovery of RNA-seq reads.

    Science.gov (United States)

    Safikhani, Zhaleh; Sadeghi, Mehdi; Pezeshk, Hamid; Eslahchi, Changiz

    2013-01-01

    Recent advances in the sequencing technologies have provided a handful of RNA-seq datasets for transcriptome analysis. However, reconstruction of full-length isoforms and estimation of the expression level of transcripts with a low cost are challenging tasks. We propose a novel de novo method named SSP that incorporates interval integer linear programming to resolve alternatively spliced isoforms and reconstruct the whole transcriptome from short reads. Experimental results show that SSP is fast and precise in determining different alternatively spliced isoforms along with the estimation of reconstructed transcript abundances. The SSP software package is available at http://www.bioinf.cs.ipm.ir/software/ssp. © 2013.

  3. Quantifying the impact of inter-site heterogeneity on the distribution of ChIP-seq data

    Directory of Open Access Journals (Sweden)

    Jonathan Cairns

    2014-11-01

    Full Text Available Chromatin Immunoprecipitation followed by sequencing (ChIP-seq) is a valuable tool for epigenetic studies. Analysis of the data arising from ChIP-seq experiments often requires implicit or explicit statistical modelling of the read counts. The simple Poisson model is attractive, but does not provide a good fit to observed ChIP-seq data. Researchers therefore often either extend to a more general model (e.g. the Negative Binomial), and/or exclude regions of the genome that do not conform to the model. Since many modelling strategies employed for ChIP-seq data reduce to fitting a mixture of Poisson distributions, we explore the problem of inferring the optimal mixing distribution. We apply the Constrained Newton Method (CNM), which suggests the Negative Binomial - Negative Binomial (NB-NB) mixture model as a candidate for modelling ChIP-seq data. We illustrate fitting the NB-NB model with an accelerated EM algorithm on four data sets from three species. Zero-inflated models have been suggested as an approach to improve model fit for ChIP-seq data. We show that the NB-NB mixture model requires no zero-inflation and suggest that in some cases the need for zero inflation is driven by the model's inability to cope with both artefactual large read counts and the frequently observed very low read counts. We see that the CNM-based approach is a useful diagnostic for the assessment of model fit and inference in ChIP-seq data and beyond. Use of the suggested NB-NB mixture model will be of value not only when calling peaks or otherwise modelling ChIP-seq data, but also when simulating data or constructing blacklists de novo.
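Why a plain Poisson model tends to underfit such counts, and why a mixture of Poissons helps, can be shown with a small simulation. The sketch below is not the CNM or NB-NB fitting procedure from this record: it only draws counts from a single Poisson and from a gamma-mixed Poisson (which is a Negative Binomial) and compares their variance-to-mean ratios. All parameter values and function names are assumptions chosen for illustration.

```python
import math
import random

def poisson_sample(rng, lam):
    """Draw one Poisson(lam) variate using Knuth's multiplication method."""
    limit = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1

def dispersion_index(values):
    """Variance-to-mean ratio; close to 1 for Poisson-distributed counts."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / (n - 1)
    return variance / mean

rng = random.Random(42)
n_bins = 20000
mean_count = 5.0
shape = 2.0  # assumed gamma shape; smaller values give more overdispersion

# Every bin shares one rate: plain Poisson counts.
plain = [poisson_sample(rng, mean_count) for _ in range(n_bins)]

# Each bin gets its own rate drawn from a gamma distribution with the same
# mean; the resulting counts follow a Negative Binomial distribution.
mixed = [
    poisson_sample(rng, rng.gammavariate(shape, mean_count / shape))
    for _ in range(n_bins)
]

d_plain = dispersion_index(plain)
d_mixed = dispersion_index(mixed)
print(f"Poisson dispersion index:       {d_plain:.2f}")
print(f"gamma-Poisson dispersion index: {d_mixed:.2f}")
```

For the gamma-mixed counts the theoretical variance is mean + mean²/shape, so heterogeneity in the underlying rates alone is enough to push the variance well above the mean.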

  4. groE mutants of Escherichia coli are defective in umuDC-dependent UV mutagenesis

    International Nuclear Information System (INIS)

    Donnelly, C.E.; Walker, G.C.

    1989-01-01

    Overexpression of the SOS-inducible umuDC operon of Escherichia coli results in the inability of these cells to grow at 30 degrees C. Mutations in several heat shock genes suppress this cold sensitivity. Suppression of umuD+C+-dependent cold sensitivity appears to occur by two different mechanisms. We show that mutations in lon and dnaK heat shock genes suppress cold sensitivity in a lexA-dependent manner. In contrast, mutations in groES, groEL, and rpoH heat shock genes suppress cold sensitivity regardless of the transcriptional regulation of the umuDC genes. We have also found that groES and groEL mutants are defective in umuDC-dependent UV mutagenesis. This defect can be suppressed by increased expression of the umuDC operon. The mechanism by which groE mutations affect umuDC gene product function may be related to the stability of the UmuC protein, since the half-life of this protein is shortened because of mutations at the groE locus.

  5. Differential conformational modulations of MreB folding upon interactions with GroEL/ES and TRiC chaperonin components

    Science.gov (United States)

    Moparthi, Satish Babu; Carlsson, Uno; Vincentelli, Renaud; Jonsson, Bengt-Harald; Hammarström, Per; Wenger, Jérôme

    2016-01-01

    Here, we study and compare the mechanisms of action of the GroEL/GroES and the TRiC chaperonin systems on MreB client protein variants extracted from E. coli. MreB is a homologue to actin in prokaryotes. Single-molecule fluorescence correlation spectroscopy (FCS) and time-resolved fluorescence polarization anisotropy report the binding interaction of folding MreB with GroEL, GroES and TRiC. Fluorescence resonance energy transfer (FRET) measurements on MreB variants quantified molecular distance changes occurring during conformational rearrangements within folding MreB bound to chaperonins. We observed that the MreB structure is rearranged by a binding-induced expansion mechanism in TRiC, GroEL and GroES. These results are quantitatively comparable to the structural rearrangements found during the interaction of β-actin with GroEL and TRiC, indicating that the mechanism of chaperonins is conserved during evolution. The chaperonin-bound MreB is also significantly compacted after addition of AMP-PNP for both the GroEL/ES and TRiC systems. Most importantly, our results showed that GroES may act as an unfoldase by inducing a dramatic initial expansion of MreB (even more than for GroEL) implicating a role for MreB folding, allowing us to suggest a delivery mechanism for GroES to GroEL in prokaryotes. PMID:27328749

  6. Mimicking the action of GroEL in molecular dynamics simulations : Application to the refinement of protein structures

    NARCIS (Netherlands)

    Fan, H; Mark, AE

    Bacterial chaperonin, GroEL, together with its co-chaperonin, GroES, facilitates the folding of a variety of polypeptides. Experiments suggest that GroEL stimulates protein folding by multiple cycles of binding and release. Misfolded proteins first bind to an exposed hydrophobic surface on GroEL.

  7. Chaperonin GroE-facilitated refolding of disulfide-bonded and reduced Taka-amylase A from Aspergillus oryzae.

    Science.gov (United States)

    Kawata, Y; Hongo, K; Mizobata, T; Nagai, J

    1998-12-01

    The refolding characteristics of Taka-amylase A (TAA) from Aspergillus oryzae in the presence of the chaperonin GroE were studied in terms of activity and fluorescence. Disulfide-bonded (intact) TAA and non-disulfide-bonded (reduced) TAA were unfolded in guanidine hydrochloride and refolded by dilution into buffer containing GroE. The intermediates of both intact and reduced enzymes were trapped by GroEL in the absence of nucleotide. Upon addition of nucleotides such as ATP, ADP, CTP or UTP, the intermediates were released from GroEL and recovery of activity was detected. In both cases, the refolding yields in the presence of GroEL and ATP were higher than spontaneous recoveries. Fluorescence studies of intrinsic tryptophan and a hydrophobic probe, 8-anilinonaphthalene-1-sulfonate, suggested that the intermediates trapped by GroEL assumed conformations with different hydrophobic properties. The presence of protein disulfide isomerase or reduced and oxidized forms of glutathione in addition to GroE greatly enhanced the refolding reaction of reduced TAA. These findings suggest that GroE has an ability to recognize folding intermediates of TAA protein and facilitate refolding, regardless of the existence or absence of disulfide bonds in the protein.

  8. Characterizing and annotating the genome using RNA-seq data.

    Science.gov (United States)

    Chen, Geng; Shi, Tieliu; Shi, Leming

    2017-02-01

    Bioinformatics methods for various RNA-seq data analyses are in fast evolution with the improvement of sequencing technologies. However, many challenges still exist in how to efficiently process the RNA-seq data to obtain accurate and comprehensive results. Here we reviewed the strategies for improving diverse transcriptomic studies and the annotation of genetic variants based on RNA-seq data. Mapping RNA-seq reads to the genome and transcriptome represent two distinct methods for quantifying the expression of genes/transcripts. Besides the known genes annotated in current databases, many novel genes/transcripts (especially those long noncoding RNAs) still can be identified on the reference genome using RNA-seq. Moreover, owing to the incompleteness of current reference genomes, some novel genes are missing from them. Genome-guided and de novo transcriptome reconstruction are two effective and complementary strategies for identifying those novel genes/transcripts on or beyond the reference genome. In addition, integrating the genes of distinct databases to conduct transcriptomics and genetics studies can improve the results of corresponding analyses.

  9. Incorporation of unique molecular identifiers in TruSeq adapters improves the accuracy of quantitative sequencing.

    Science.gov (United States)

    Hong, Jungeui; Gresham, David

    2017-11-01

    Quantitative analysis of next-generation sequencing (NGS) data requires discriminating duplicate reads generated by PCR from identical molecules that are of unique origin. Typically, PCR duplicates are identified as sequence reads that align to the same genomic coordinates using reference-based alignment. However, identical molecules can be independently generated during library preparation. Misidentification of these molecules as PCR duplicates can introduce unforeseen biases during analyses. Here, we developed a cost-effective sequencing adapter design by modifying Illumina TruSeq adapters to incorporate a unique molecular identifier (UMI) while maintaining the capacity to undertake multiplexed, single-index sequencing. Incorporation of UMIs into TruSeq adapters (TrUMIseq adapters) enables identification of bona fide PCR duplicates as identically mapped reads with identical UMIs. Using TrUMIseq adapters, we show that accurate removal of PCR duplicates results in improved accuracy of both allele frequency (AF) estimation in heterogeneous populations using DNA sequencing and gene expression quantification using RNA-Seq.
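
    The duplicate-identification rule described in this abstract (reads that map identically and carry identical UMIs are treated as PCR duplicates) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the read records, the field names (`chrom`, `pos`, `strand`, `umi`) and the function `deduplicate` are hypothetical, and real tools typically also tolerate sequencing errors within the UMI.

```python
def deduplicate(reads):
    """Keep the first read seen for each (chrom, pos, strand, umi) combination.

    Reads sharing mapping coordinates AND UMI are treated as PCR duplicates;
    reads sharing coordinates but with different UMIs are kept as distinct molecules.
    """
    seen = set()
    unique = []
    for read in reads:
        key = (read["chrom"], read["pos"], read["strand"], read["umi"])
        if key not in seen:
            seen.add(key)
            unique.append(read)
    return unique


# Hypothetical toy input
reads = [
    {"chrom": "chr1", "pos": 100, "strand": "+", "umi": "ACGT"},
    {"chrom": "chr1", "pos": 100, "strand": "+", "umi": "ACGT"},  # PCR duplicate
    {"chrom": "chr1", "pos": 100, "strand": "+", "umi": "TTAG"},  # same position, different molecule
    {"chrom": "chr2", "pos": 5, "strand": "-", "umi": "ACGT"},
]
unique_reads = deduplicate(reads)
```

    Coordinate-only deduplication would have collapsed the first three reads into one; keying on the UMI as well retains the independently generated molecule.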

  10. PASSion: a pattern growth algorithm-based pipeline for splice junction detection in paired-end RNA-Seq data.

    Science.gov (United States)

    Zhang, Yanju; Lameijer, Eric-Wubbo; 't Hoen, Peter A C; Ning, Zemin; Slagboom, P Eline; Ye, Kai

    2012-02-15

    RNA-seq is a powerful technology for the study of transcriptome profiles that uses deep-sequencing technologies. Moreover, it may be used for cellular phenotyping and help establishing the etiology of diseases characterized by abnormal splicing patterns. In RNA-Seq, the exact nature of splicing events is buried in the reads that span exon-exon boundaries. The accurate and efficient mapping of these reads to the reference genome is a major challenge. We developed PASSion, a pattern growth algorithm-based pipeline for splice site detection in paired-end RNA-Seq reads. Comparing the performance of PASSion to three existing RNA-Seq analysis pipelines, TopHat, MapSplice and HMMSplicer, revealed that PASSion is competitive with these packages. Moreover, the performance of PASSion is not affected by read length and coverage. It performs better than the other three approaches when detecting junctions in highly abundant transcripts. PASSion has the ability to detect junctions that do not have known splicing motifs, which cannot be found by the other tools. Of the two public RNA-Seq datasets, PASSion predicted ≈ 137,000 and 173,000 splicing events, of which on average 82% are known junctions annotated in the Ensembl transcript database and 18% are novel. In addition, our package can discover differential and shared splicing patterns among multiple samples. The code and utilities can be freely downloaded from https://trac.nbic.nl/passion and ftp://ftp.sanger.ac.uk/pub/zn1/passion.

  11. A statistical method for the detection of alternative splicing using RNA-seq.

    Directory of Open Access Journals (Sweden)

    Liguo Wang

    2010-01-01

    Deep sequencing of the transcriptome (RNA-seq) provides an unprecedented opportunity to interrogate plausible mRNA splicing patterns by mapping RNA-seq reads to exon junctions (hereafter, junction reads). In most previous studies, exon junctions were detected by using the quantitative information of junction reads. The quantitative criterion (e.g. a minimum of two junction reads), although straightforward and widely used, usually results in high false positive and false negative rates, owing to the complexity of the transcriptome. Here, we introduced a new metric, namely Minimal Match on Either Side of exon junction (MMES), to measure the quality of each junction read, and subsequently implemented an empirical statistical model to detect exon junctions. When applied to a large dataset (>200M reads) consisting of mouse brain, liver and muscle mRNA sequences, and using independent transcript databases as positive controls, our method proved to be considerably more accurate than previous ones, especially for detecting junctions originating from low-abundance transcripts. Our results were also confirmed by real-time RT-PCR assay. The MMES metric can be used either in this empirical statistical model or in other more sophisticated classifiers, such as logistic regression.
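
    To make the MMES idea concrete, the sketch below scores one junction read by counting matched bases on each side of the exon junction and taking the smaller count. The abstract does not spell out the exact computation, so this reading of "Minimal Match on Either Side" is an assumption, and the function names and toy sequences are hypothetical.

```python
def count_matches(read_segment, ref_segment):
    """Number of positions at which the read and reference bases agree."""
    return sum(1 for a, b in zip(read_segment, ref_segment) if a == b)


def mmes(read, split, upstream_exon_end, downstream_exon_start):
    """Assumed MMES: minimum of matched bases left and right of the junction.

    read                  -- junction read sequence
    split                 -- number of read bases aligned to the upstream exon
    upstream_exon_end     -- reference bases at the 3' end of the upstream exon
    downstream_exon_start -- reference bases at the 5' end of the downstream exon
    """
    left_read, right_read = read[:split], read[split:]
    left_ref = upstream_exon_end[-split:] if split else ""
    right_ref = downstream_exon_start[:len(right_read)]
    return min(count_matches(left_read, left_ref),
               count_matches(right_read, right_ref))


# Toy example: 4 bases on the upstream side, 6 on the downstream side (one mismatch)
score = mmes("ACGTACGTAC", 4, "GGGACGT", "ACGAACTTT")
```

    Under this assumed definition, a read that barely overhangs the junction scores low however well its longer side matches, which is the kind of low-quality junction read a pure read-count criterion cannot distinguish.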

  12. Porphyromonas gingivalis GroEL induces osteoclastogenesis of periodontal ligament cells and enhances alveolar bone resorption in rats.

    Directory of Open Access Journals (Sweden)

    Feng-Yen Lin

    Porphyromonas gingivalis is a major periodontal pathogen that contains a variety of virulence factors. The antibody titer to P. gingivalis GroEL, a homologue of HSP60, is significantly higher in periodontitis patients than in healthy control subjects, suggesting that P. gingivalis GroEL is a potential stimulator of periodontal disease. However, the specific role of GroEL in periodontal disease remains unclear. Here, we investigated the effect of P. gingivalis GroEL on human periodontal ligament (PDL) cells in vitro, as well as its effect on alveolar bone resorption in rats in vivo. First, we found that stimulation of PDL cells with recombinant GroEL increased the secretion of the bone resorption-associated cytokines interleukin (IL)-6 and IL-8, potentially via NF-κB activation. Furthermore, GroEL could effectively stimulate PDL cell migration, possibly through activation of integrin α1 and α2 mRNA expression as well as cytoskeletal reorganization. Additionally, GroEL may be involved in osteoclastogenesis via receptor activator of nuclear factor κ-B ligand (RANKL) activation and alkaline phosphatase (ALP) mRNA inhibition in PDL cells. Finally, we inoculated GroEL into rat gingiva, and the results of microcomputed tomography (micro-CT) and histomorphometric assays indicated that the administration of GroEL significantly increased inflammation and bone loss. In conclusion, P. gingivalis GroEL may act as a potent virulence factor, contributing to osteoclastogenesis of PDL cells and resulting in periodontal disease with alveolar bone resorption.

  13. LookSeq: a browser-based viewer for deep sequencing data.

    Science.gov (United States)

    Manske, Heinrich Magnus; Kwiatkowski, Dominic P

    2009-11-01

    Sequencing a genome to great depth can be highly informative about heterogeneity within an individual or a population. Here we address the problem of how to visualize the multiple layers of information contained in deep sequencing data. We propose an interactive AJAX-based web viewer for browsing large data sets of aligned sequence reads. By enabling seamless browsing and fast zooming, the LookSeq program assists the user to assimilate information at different levels of resolution, from an overview of a genomic region to fine details such as heterogeneity within the sample. A specific problem, particularly if the sample is heterogeneous, is how to depict information about structural variation. LookSeq provides a simple graphical representation of paired sequence reads that is more revealing about potential insertions and deletions than are conventional methods.

  14. DistMap: a toolkit for distributed short read mapping on a Hadoop cluster.

    Directory of Open Access Journals (Sweden)

    Ram Vinay Pandey

    With the rapid and steady increase of next generation sequencing data output, the mapping of short reads has become a major data analysis bottleneck. On a single computer, it can take several days to map the vast quantity of reads produced from a single Illumina HiSeq lane. In an attempt to ameliorate this bottleneck we present a new tool, DistMap - a modular, scalable and integrated workflow to map reads in the Hadoop distributed computing framework. DistMap is easy to use, currently supports nine different short read mapping tools and can be run on all Unix-based operating systems. It accepts reads in FASTQ format as input and provides mapped reads in a SAM/BAM format. DistMap supports both paired-end and single-end reads thereby allowing the mapping of read data produced by different sequencing platforms. DistMap is available from http://code.google.com/p/distmap/

  15. Gro/TLE enables embryonic stem cell differentiation by repressing pluripotent gene expression

    DEFF Research Database (Denmark)

    Laing, Adam F; Lowell, Sally; Brickman, Joshua M

    2015-01-01

    Gro/TLE proteins (TLE1-4) are a family of transcriptional corepressors acting downstream of multiple signalling pathways. Several TLEs are expressed in a dynamic manner throughout embryonic development and at high levels in embryonic stem cells (ESCs). Here we find that Gro/TLE is not required...

  16. BRAKER1: Unsupervised RNA-Seq-Based Genome Annotation with GeneMark-ET and AUGUSTUS.

    Science.gov (United States)

    Hoff, Katharina J; Lange, Simone; Lomsadze, Alexandre; Borodovsky, Mark; Stanke, Mario

    2016-03-01

    Gene finding in eukaryotic genomes is notoriously difficult to automate. The task is to design a work flow with a minimal set of tools that would reach state-of-the-art performance across a wide range of species. GeneMark-ET is a gene prediction tool that incorporates RNA-Seq data into unsupervised training and subsequently generates ab initio gene predictions. AUGUSTUS is a gene finder that usually requires supervised training and uses information from RNA-Seq reads in the prediction step. Complementary strengths of GeneMark-ET and AUGUSTUS provided motivation for designing a new combined tool for automatic gene prediction. We present BRAKER1, a pipeline for unsupervised RNA-Seq-based genome annotation that combines the advantages of GeneMark-ET and AUGUSTUS. As input, BRAKER1 requires a genome assembly file and a file in bam-format with spliced alignments of RNA-Seq reads to the genome. First, GeneMark-ET performs iterative training and generates initial gene structures. Second, AUGUSTUS uses predicted genes for training and then integrates RNA-Seq read information into final gene predictions. In our experiments, we observed that BRAKER1 was more accurate than MAKER2 when using RNA-Seq as the sole source for training and prediction. BRAKER1 does not require pre-trained parameters or a separate expert-prepared training step. BRAKER1 is available for download at http://bioinf.uni-greifswald.de/bioinf/braker/ and http://exon.gatech.edu/GeneMark/. Contact: katharina.hoff@uni-greifswald.de or borodovsky@gatech.edu. Supplementary data are available at Bioinformatics online.

  17. Purification, crystallization and structure determination of native GroEL from Escherichia coli lacking bound potassium ions

    International Nuclear Information System (INIS)

    Kiser, Philip D.; Lodowski, David T.; Palczewski, Krzysztof

    2007-01-01

    A 3.02 Å crystal structure of native GroEL from E. coli is presented. GroEL is a member of the ATP-dependent chaperonin family that promotes the proper folding of many cytosolic bacterial proteins. The structures of GroEL in a variety of different states have been determined using X-ray crystallography and cryo-electron microscopy. In this study, a 3.02 Å crystal structure of the native GroEL complex from Escherichia coli is presented. The complex was purified and crystallized in the absence of potassium ions, which allowed evaluation of the structural changes that may occur in response to cognate potassium-ion binding by comparison to the previously determined wild-type GroEL structure (PDB code http://www.rcsb.org/pdb/explore.do?structureId), in which potassium ions were observed in all 14 subunits. In general, the structure is similar to the previously determined wild-type GroEL crystal structure with some differences in regard to temperature-factor distribution

  18. Overexpression of heat shock GroEL stress protein in leptospiral biofilm.

    Science.gov (United States)

    Vinod Kumar, K; Lall, Chandan; Vimal Raj, R; Vedhagiri, K; Kartick, C; Surya, P; Natarajaseenivasan, K; Vijayachari, P

    2017-01-01

    Leptospira is the causative agent of leptospirosis, which is an emerging zoonotic disease. Recent studies on Leptospira have demonstrated biofilm formation on abiotic surfaces. The proteins expressed in the biofilm were investigated by using SDS-PAGE and immunoblotting in combination with MALDI-TOF mass spectrometry. The proteins expressed in Leptospira biofilm and planktonic cells were analyzed and compared. Among these proteins, one (60 kDa) was found to be overexpressed in biofilm as compared to the planktonic cells. MALDI-TOF analysis identified this protein as the stress and heat shock chaperone GroEL. Our findings demonstrate that GroEL is associated with Leptospira biofilm. GroEL is a conserved, highly immunogenic and prominent stress response protein in pathogenic Leptospira spp., which may have clinical relevance.

  19. GRoW Buffalo Final Report

    Energy Technology Data Exchange (ETDEWEB)

    Bohm, Martha [Univ. at Buffalo, NY (United States)

    2016-04-17

    This document provides final reporting on the GRoW Home, University at Buffalo's entry to the 2015 Solar Decathlon competition in Irvine, CA. The report summarizes fundraising efforts, documents media outreach, lists online presence, analyzes the organizer's communication, describes post-competition life of the house and future employment plans for student team members. Last, it suggests improvements for future decathlons.

  20. RNA-seq analysis of early hepatic response to handling and confinement stress in rainbow trout.

    Directory of Open Access Journals (Sweden)

    Sixin Liu

    Fish under intensive rearing conditions experience various stressors which have negative impacts on survival, growth, reproduction and fillet quality. Identifying and characterizing the molecular mechanisms underlying stress responses will facilitate the development of strategies that aim to improve animal welfare and aquaculture production efficiency. In this study, we used RNA-seq to identify transcripts which are differentially expressed in the rainbow trout liver in response to handling and confinement stress. These stressors were selected due to their relevance in aquaculture production. Total RNA was extracted from the livers of individual fish in five tanks having eight fish each, including three tanks of fish subjected to a 3 hour handling and confinement stress and two control tanks. Equal amounts of total RNA from six individual fish were pooled by tank to create five RNA-seq libraries, which were sequenced in one lane of Illumina HiSeq 2000. Three sequencing runs were conducted to obtain a total of 491,570,566 reads, which were mapped onto the previously generated stress reference transcriptome to identify 316 differentially expressed transcripts (DETs). Twenty one DETs were selected for qPCR to validate the RNA-seq approach. The fold changes in gene expression identified by RNA-seq and qPCR were highly correlated (R² = 0.88). Several gene ontology terms including transcription factor activity and biological process such as glucose metabolic process were enriched among these DETs. Pathways involved in response to handling and confinement stress were implicated by mapping the DETs to reference pathways in the KEGG database. Raw RNA-seq reads have been submitted to the NCBI Short Read Archive under accession number SRP022881. All customized scripts described in this paper are available from Dr. Guangtu Gao or the corresponding author.

  1. Using Poisson mixed-effects model to quantify transcript-level gene expression in RNA-Seq.

    Science.gov (United States)

    Hu, Ming; Zhu, Yu; Taylor, Jeremy M G; Liu, Jun S; Qin, Zhaohui S

    2012-01-01

    RNA sequencing (RNA-Seq) is a powerful new technology for mapping and quantifying transcriptomes using ultra high-throughput next-generation sequencing technologies. Using deep sequencing, gene expression levels of all transcripts including novel ones can be quantified digitally. Although extremely promising, the massive amounts of data generated by RNA-Seq, substantial biases and uncertainty in short read alignment pose challenges for data analysis. In particular, large base-specific variation and between-base dependence make simple approaches, such as those that use averaging to normalize RNA-Seq data and quantify gene expressions, ineffective. In this study, we propose a Poisson mixed-effects (POME) model to characterize base-level read coverage within each transcript. The underlying expression level is included as a key parameter in this model. Since the proposed model is capable of incorporating base-specific variation as well as between-base dependence that affect read coverage profile throughout the transcript, it can lead to improved quantification of the true underlying expression level. POME can be freely downloaded at http://www.stat.purdue.edu/~yuzhu/pome.html. Contact: yuzhu@purdue.edu; zhaohui.qin@emory.edu. Supplementary data are available at Bioinformatics online.

  2. Selecting between-sample RNA-Seq normalization methods from the perspective of their assumptions.

    Science.gov (United States)

    Evans, Ciaran; Hardin, Johanna; Stoebel, Daniel M

    2017-02-27

    RNA-Seq is a widely used method for studying the behavior of genes under different biological conditions. An essential step in an RNA-Seq study is normalization, in which raw data are adjusted to account for factors that prevent direct comparison of expression measures. Errors in normalization can have a significant impact on downstream analysis, such as inflated false positives in differential expression analysis. An underemphasized feature of normalization is the assumptions on which the methods rely and how the validity of these assumptions can have a substantial impact on the performance of the methods. In this article, we explain how assumptions provide the link between raw RNA-Seq read counts and meaningful measures of gene expression. We examine normalization methods from the perspective of their assumptions, as an understanding of methodological assumptions is necessary for choosing methods appropriate for the data at hand. Furthermore, we discuss why normalization methods perform poorly when their assumptions are violated and how this causes problems in subsequent analysis. To analyze a biological experiment, researchers must select a normalization method with assumptions that are met and that produces a meaningful measure of expression for the given experiment.

  3. IsoSeq analysis and functional annotation of the infratentorial ependymoma tumor tissue on PacBio RSII platform.

    Science.gov (United States)

    Singh, Neetu; Sahu, Dinesh Kumar; Chowdhry, Rebecca; Mishra, Archana; Goel, Madhu Mati; Faheem, Mohd; Srivastava, Chhitij; Ojha, Bal Krishna; Gupta, Devendra Kumar; Kant, Ravi

    2016-02-01

    Here, we sequenced and functionally annotated the long-read (1-2 kb) cDNA library of an infratentorial ependymoma tumor tissue on PacBio RSII by the Iso-Seq protocol using SMRT technology. A total of 577 MB of data was generated from the brain tissue of an ependymoma tumor patient, producing 1,19,313 high-quality reads assembled into 19,878 contigs using Celera assembler followed by Quiver pipelines, which produced 2952 unique protein accessions in the nr protein database and 307 KEGG pathways. Additionally, when we compared GO terms of the second and third level with alternative splicing data obtained through HTA Array2.0, we identified four and twelve transcript cluster IDs in Level-2 and Level-3 scores respectively, with alternative splicing index predicting mainly the major pathways of hallmarks of cancer. Out of these transcript cluster IDs, only those of the genes PNMT, SNN and LAMB1 showed Reads Per Kilobase of exon model per Million mapped reads (RPKM) values at the gene-level expression (GE) and transcript-level (TE) tracks. Most importantly, the brain-specific genes PNMT, SNN and LAMB1 show involvement in ependymoma.
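
    The RPKM unit mentioned in this abstract normalizes a read count by both exon-model length (in kilobases) and sequencing depth (in millions of mapped reads). A minimal sketch of the standard formula follows; the function name and the numbers are illustrative and are not taken from the study.

```python
def rpkm(read_count, exon_length_bp, total_mapped_reads):
    """Reads Per Kilobase of exon model per Million mapped reads.

    RPKM = read_count / (exon_length_bp / 1e3) / (total_mapped_reads / 1e6)
         = read_count * 1e9 / (exon_length_bp * total_mapped_reads)
    """
    if exon_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("length and total mapped reads must be positive")
    return read_count * 1e9 / (exon_length_bp * total_mapped_reads)


# Illustrative values: 500 reads on a 2 kb exon model in a library of 10 million mapped reads
example_rpkm = rpkm(500, 2000, 10_000_000)
```

    Dividing by length keeps long transcripts from appearing more highly expressed simply because they attract more reads, and dividing by library size makes values comparable across runs of different depth.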

  4. JVM: Java Visual Mapping tool for next generation sequencing read.

    Science.gov (United States)

    Yang, Ye; Liu, Juan

    2015-01-01

    We developed a program, JVM (Java Visual Mapping), for mapping next-generation sequencing reads to a reference sequence. The program is implemented in Java and is designed to deal with millions of short reads generated by the Illumina sequencing technology. It employs a seed index strategy and octal encoding operations for sequence alignment. JVM is useful for DNA-Seq and RNA-Seq when dealing with single-end resequencing. JVM is a desktop application that supports read capacity from 1 MB to 10 GB.

  5. SimFuse: A Novel Fusion Simulator for RNA Sequencing (RNA-Seq) Data

    Directory of Open Access Journals (Sweden)

    Yuxiang Tan

    2015-01-01

    The performance evaluation of fusion detection algorithms from high-throughput sequencing data crucially relies on the availability of data with known positive and negative cases of gene rearrangements. The use of simulated data circumvents some shortcomings of real data by generation of an unlimited number of true and false positive events, and the consequent robust estimation of accuracy measures, such as precision and recall. Although a few simulated fusion datasets from RNA Sequencing (RNA-Seq) are available, they are of limited sample size. This makes it difficult to systematically evaluate the performance of RNA-Seq based fusion-detection algorithms. Here, we present SimFuse to address this problem. SimFuse utilizes real sequencing data as the fusions’ background to closely approximate the distribution of reads from a real sequencing library and uses a reference genome as the template from which to simulate fusions’ supporting reads. To assess the supporting read-specific performance, SimFuse generates multiple datasets with various numbers of fusion supporting reads. Compared to an extant simulated dataset, SimFuse gives users control over the supporting read features and the sample size of the simulated library, based on which the performance metrics needed for the validation and comparison of alternative fusion-detection algorithms can be rigorously estimated.

  6. A New Improved and Extended Version of the Multicell Bacterial Simulator gro.

    Science.gov (United States)

    Gutiérrez, Martín; Gregorio-Godoy, Paula; Pérez Del Pulgar, Guillermo; Muñoz, Luis E; Sáez, Sandra; Rodríguez-Patón, Alfonso

    2017-08-18

    gro is a cell programming language developed in Klavins Lab for simulating colony growth and cell-cell communication. It is used as a synthetic biology prototyping tool for simulating multicellular biocircuits and microbial consortia. In this work, we present several extensions made to gro that improve the performance of the simulator, make it easier to use, and provide new functionalities. The new version of gro is between 1 and 2 orders of magnitude faster than the original version. It is able to grow microbial colonies with up to 10^5 cells in less than 10 min. A new library, CellEngine, accelerates the resolution of spatial physical interactions between growing and dividing cells by implementing a new shoving algorithm. A genetic library, CellPro, based on Probabilistic Timed Automata, simulates gene expression dynamics using simplified and easy to compute digital proteins. We also propose a more convenient language specification layer, ProSpec, based on the idea that proteins drive cell behavior. CellNutrient, another library, implements Monod-based growth and nutrient uptake functionalities. The intercellular signaling management was improved and extended in a library called CellSignals. Finally, bacterial conjugation, another local cell-cell communication process, was added to the simulator. To show the versatility and potential outreach of this version of gro, we provide studies and novel examples ranging from synthetic biology to evolutionary microbiology. We believe that the upgrades implemented for gro have made it into a powerful and fast prototyping tool capable of simulating a large variety of systems and synthetic biology designs.
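
    The Monod-based growth mentioned for CellNutrient refers to the classical Monod relation, in which the specific growth rate saturates with nutrient concentration: mu = mu_max * S / (K_s + S). The sketch below is a generic well-mixed illustration of that relation with a simple Euler update. It is not gro's API or its spatial implementation, and all function names and parameter values are hypothetical.

```python
def monod_rate(substrate, mu_max, k_s):
    """Specific growth rate from the Monod equation: mu_max * S / (K_s + S)."""
    return mu_max * substrate / (k_s + substrate)


def simulate(biomass, substrate, mu_max, k_s, yield_coeff, dt, steps):
    """Euler integration of Monod growth coupled to nutrient uptake.

    yield_coeff is biomass produced per unit of substrate consumed.
    """
    for _ in range(steps):
        growth = monod_rate(substrate, mu_max, k_s) * biomass * dt
        growth = min(growth, substrate * yield_coeff)  # cannot use more nutrient than is left
        biomass += growth
        substrate = max(0.0, substrate - growth / yield_coeff)
    return biomass, substrate


# At S == K_s the growth rate is half of mu_max
half_max = monod_rate(0.5, mu_max=1.0, k_s=0.5)
final_biomass, final_substrate = simulate(1.0, 10.0, 1.0, 0.5, 0.5, 0.01, 1000)
```

    Growth slows as the nutrient is depleted and stops when it runs out, so in this toy model the final biomass is bounded by the initial biomass plus yield times the initial substrate.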

  7. Mining RNA-seq data for infections and contaminations.

    Directory of Open Access Journals (Sweden)

    Thomas Bonfert

    RNA sequencing (RNA-seq) provides novel opportunities for transcriptomic studies at nucleotide resolution, including transcriptomics of viruses or microbes infecting a cell. However, standard approaches for mapping the resulting sequencing reads generally ignore alternative sources of expression other than the host cell and are little equipped to address the problems arising from redundancies and gaps among sequenced microbe and virus genomes. We show that screening of sequencing reads for contaminations and infections can be performed easily using ContextMap, our recently developed mapping software. Based on mapping-derived statistics, mapping confidence, similarities and misidentifications (e.g. due to missing genome sequences of species/strains) can be assessed. Performance of our approach is evaluated on three real-life sequencing data sets and compared to state-of-the-art metagenomics tools. In particular, ContextMap vastly outperformed GASiC and GRAMMy in terms of runtime. In contrast to MEGAN4, it was capable of providing individual read mappings to species and resolving non-unique mappings, thus allowing the identification of misalignments caused by sequence similarities between genomes and missing genome sequences. Our study illustrates the importance and potentials of routinely mining RNA-seq experiments for infections or contaminations by microbes and viruses. By using ContextMap, gene expression of infecting agents can be analyzed and novel insights in infection processes and tumorigenesis can be obtained.

  8. COBRA-Seq: Sensitive and Quantitative Methylome Profiling

    Directory of Open Access Journals (Sweden)

    Hilal Varinli

    2015-10-01

    Combined Bisulfite Restriction Analysis (COBRA) quantifies DNA methylation at a specific locus. It does so via digestion of PCR amplicons produced from bisulfite-treated DNA, using a restriction enzyme that contains a cytosine within its recognition sequence, such as TaqI. Here, we introduce COBRA-seq, a genome wide reduced methylome method that requires minimal DNA input (0.1–1.0 mg) and can either use PCR or linear amplification to amplify the sequencing library. Variants of COBRA-seq can be used to explore CpG-depleted as well as CpG-rich regions in vertebrate DNA. The choice of enzyme influences enrichment for specific genomic features, such as CpG-rich promoters and CpG islands, or enrichment for less CpG dense regions such as enhancers. COBRA-seq coupled with linear amplification has the additional advantage of reduced PCR bias by producing full length fragments at high abundance. Unlike other reduced representative methylome methods, COBRA-seq has great flexibility in the choice of enzyme and can be multiplexed and tuned, to reduce sequencing costs and to interrogate different numbers of sites. Moreover, COBRA-seq is applicable to non-model organisms without the reference genome and compatible with the investigation of non-CpG methylation by using restriction enzymes containing CpA, CpT, and CpC in their recognition site.
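
    The principle behind COBRA can be illustrated with a toy model: bisulfite treatment followed by PCR reads unmethylated cytosines as thymine while methylated cytosines remain cytosine, so a TaqI site (TCGA) survives in the amplicon only when its CpG cytosine was methylated. The sketch below models the top strand only; the sequence, positions and function names are hypothetical and do not come from the paper.

```python
TAQI_SITE = "TCGA"  # TaqI recognition sequence


def bisulfite_convert(seq, methylated_positions):
    """Top-strand view after bisulfite treatment and PCR.

    Unmethylated C is read as T; C at an index in methylated_positions is protected.
    """
    return "".join(
        "T" if base == "C" and i not in methylated_positions else base
        for i, base in enumerate(seq)
    )


def taqi_sites(seq):
    """Start indices of all TaqI recognition sites in seq."""
    n = len(TAQI_SITE)
    return [i for i in range(len(seq) - n + 1) if seq[i:i + n] == TAQI_SITE]


locus = "AATCGAGG"  # the CpG cytosine is at index 3
methylated = bisulfite_convert(locus, methylated_positions={3})
unmethylated = bisulfite_convert(locus, methylated_positions=set())
cut_if_methylated = taqi_sites(methylated)
cut_if_unmethylated = taqi_sites(unmethylated)
```

    In this toy locus the methylated template keeps its TCGA site and would be cut, whereas the unmethylated template becomes TTGA and is not, which is why the cut fraction can serve as a readout of methylation.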

  9. Folding and unfolding pathway of chaperonin GroEL monomer and elucidation of thermodynamic parameters.

    Science.gov (United States)

    Puri, Sarita; Chaudhuri, Tapan K

    2017-03-01

    The conformation and thermodynamic stability of monomeric GroEL were studied by CD and fluorescence spectroscopy. GroEL denaturation with urea and dilution in buffer leads to formation of a folded GroEL monomer. The monomeric nature of this protein was verified by size-exclusion chromatography and native PAGE. It has a well-defined secondary and tertiary structure, folding activity (prevention of aggregation) for substrate protein and is resistant to proteolysis. Being properly folded and reversibly refoldable, monomeric GroEL is amenable to the study of thermodynamic stability by unfolding transition methods. We present the equilibrium unfolding of monomeric GroEL as studied by urea and heat mediated unfolding processes. The urea mediated unfolding shows two transitions and the heat mediated unfolding process shows a single transition. In the case of thermal unfolding, some residual structure unfolds at a higher temperature (70-75 °C). The process of folding/unfolding is reversible in both cases. Analysis of folding/unfolding data provides a measure of ΔG_NU(H₂O), T_m, ΔH_van and ΔS_van of monomeric GroEL. The thermodynamic stability parameter ΔG_NU(H₂O) is similar with both CD and intrinsic fluorescence, i.e. 7.10 ± 1.0 kcal/mol. The T_m, ΔH_van and ΔS_van calculated from the thermal unfolding transition are 46 ± 0.5 °C, 43.3 ± 0.1 kcal/mol and 143.9 ± 0.1 cal/mol/K, respectively.

  10. Single Nucleotide Polymorphism Discovery in Bovine Pituitary Gland Using RNA-Seq Technology.

    Science.gov (United States)

    Pareek, Chandra Shekhar; Smoczyński, Rafał; Kadarmideen, Haja N; Dziuba, Piotr; Błaszczyk, Paweł; Sikora, Marcin; Walendzik, Paulina; Grzybowski, Tomasz; Pierzchała, Mariusz; Horbańczuk, Jarosław; Szostak, Agnieszka; Ogluszka, Magdalena; Zwierzchowski, Lech; Czarnik, Urszula; Fraser, Leyland; Sobiech, Przemysław; Wąsowicz, Krzysztof; Gelfand, Brian; Feng, Yaping; Kumar, Dibyendu

    2016-01-01

    Examination of the bovine pituitary gland transcriptome by strand-specific RNA-seq allows detection of putative single nucleotide polymorphisms (SNPs) within potential candidate genes (CGs) or QTL regions, as well as an understanding of the genomic variations that contribute to economic traits. Here we report a breed-specific model to successfully perform the detection of SNPs in the pituitary gland of young growing bulls representing Polish Holstein-Friesian (HF), Polish Red, and Hereford breeds at three developmental ages, viz., six months, nine months, and twelve months. A total of 18 bovine pituitary gland polyA transcriptome libraries were prepared and sequenced using the Illumina NextSeq 500 platform. Sequenced FastQ databases of all 18 young bulls were submitted to the NCBI-SRA database with NCBI-SRA accession numbers SRS1296732. For the investigated young bulls, a total of 1,138,823,098 raw paired-end reads with a length of 156 bases were obtained, resulting in approximately 63 million paired-end reads per library. Breed-wise, a total of 515.38, 215.39, and 408.04 million paired-end reads were obtained for Polish HF, Polish Red, and Hereford breeds, respectively. Burrows-Wheeler Aligner (BWA) read alignments showed 93.04%, 94.39%, and 83.46% of the mapped sequencing reads were properly paired to the Polish HF, Polish Red, and Hereford breeds, respectively. The constructed breed-specific SNP-db of three cattle breeds yielded 13,775,885 SNPs. On average, 765,326 breed-specific SNPs per young bull were identified. Using two stringent filtering parameters, i.e., a minimum 10 SNP reads per base with an accuracy ≥ 90% and a minimum 10 SNP reads per base with an accuracy = 100%, SNP-db records were trimmed to construct a highly reliable SNP-db. This resulted in a reduction of 95.7% and 96.4% cut-off mark of constructed raw SNP-db. Finally, SNP discoveries using RNA-Seq data were validated by KASP™ SNP genotyping assay. The comprehensive QTLs/CGs analysis of 76 QTLs
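    The read and SNP totals quoted in this abstract can be cross-checked with simple arithmetic. The sketch below uses only the figures given in the abstract; the variable names are illustrative and not taken from the paper.

```python
# Figures quoted in the abstract (millions of paired-end reads per breed).
reads_by_breed_millions = {
    "Polish HF": 515.38,
    "Polish Red": 215.39,
    "Hereford": 408.04,
}
n_libraries = 18          # one polyA library per young bull
total_snps = 13_775_885   # size of the raw breed-specific SNP-db

# Sum the breed-wise totals and average over the libraries.
total_reads_millions = sum(reads_by_breed_millions.values())
reads_per_library_millions = total_reads_millions / n_libraries
snps_per_bull = total_snps / n_libraries

print(f"total reads: {total_reads_millions:.2f} million")
print(f"reads per library: {reads_per_library_millions:.1f} million")
print(f"SNPs per bull: {snps_per_bull:,.0f}")
```

    The breed-wise totals sum to roughly 1,139 million reads, which agrees with the stated average of about 63 million reads per library across 18 libraries.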

  11. Single Nucleotide Polymorphism Discovery in Bovine Pituitary Gland Using RNA-Seq Technology.

    Directory of Open Access Journals (Sweden)

    Chandra Shekhar Pareek

    Examination of the bovine pituitary gland transcriptome by strand-specific RNA-seq allows detection of putative single nucleotide polymorphisms (SNPs) within potential candidate genes (CGs) or QTL regions, as well as an understanding of the genomic variations that contribute to economic traits. Here we report a breed-specific model to successfully perform the detection of SNPs in the pituitary gland of young growing bulls representing Polish Holstein-Friesian (HF), Polish Red, and Hereford breeds at three developmental ages, viz., six months, nine months, and twelve months. A total of 18 bovine pituitary gland polyA transcriptome libraries were prepared and sequenced using the Illumina NextSeq 500 platform. Sequenced FastQ databases of all 18 young bulls were submitted to the NCBI-SRA database with NCBI-SRA accession numbers SRS1296732. For the investigated young bulls, a total of 1,138,823,098 raw paired-end reads with a length of 156 bases were obtained, resulting in approximately 63 million paired-end reads per library. Breed-wise, a total of 515.38, 215.39, and 408.04 million paired-end reads were obtained for Polish HF, Polish Red, and Hereford breeds, respectively. Burrows-Wheeler Aligner (BWA) read alignments showed 93.04%, 94.39%, and 83.46% of the mapped sequencing reads were properly paired to the Polish HF, Polish Red, and Hereford breeds, respectively. The constructed breed-specific SNP-db of three cattle breeds yielded 13,775,885 SNPs. On average, 765,326 breed-specific SNPs per young bull were identified. Using two stringent filtering parameters, i.e., a minimum 10 SNP reads per base with an accuracy ≥ 90% and a minimum 10 SNP reads per base with an accuracy = 100%, SNP-db records were trimmed to construct a highly reliable SNP-db. This resulted in a reduction of 95.7% and 96.4% cut-off mark of constructed raw SNP-db. Finally, SNP discoveries using RNA-Seq data were validated by KASP™ SNP genotyping assay. The comprehensive QTLs/CGs analysis

  12. GC-Content Normalization for RNA-Seq Data

    Science.gov (United States)

    2011-01-01

    Background: Transcriptome sequencing (RNA-Seq) has become the assay of choice for high-throughput studies of gene expression. However, as is the case with microarrays, major technology-related artifacts and biases affect the resulting expression measures. Normalization is therefore essential to ensure accurate inference of expression levels and subsequent analyses thereof. Results: We focus on biases related to GC-content and demonstrate the existence of strong sample-specific GC-content effects on RNA-Seq read counts, which can substantially bias differential expression analysis. We propose three simple within-lane gene-level GC-content normalization approaches and assess their performance on two different RNA-Seq datasets, involving different species and experimental designs. Our methods are compared to state-of-the-art normalization procedures in terms of bias and mean squared error for expression fold-change estimation and in terms of Type I error and p-value distributions for tests of differential expression. The exploratory data analysis and normalization methods proposed in this article are implemented in the open-source Bioconductor R package EDASeq. Conclusions: Our within-lane normalization procedures, followed by between-lane normalization, reduce GC-content bias and lead to more accurate estimates of expression fold-changes and tests of differential expression. Such results are crucial for the biological interpretation of RNA-Seq experiments, where downstream analyses can be sensitive to the supplied lists of genes. PMID:22177264
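    To make the idea of within-lane, gene-level GC-content normalization concrete, here is a minimal sketch. It is not the EDASeq implementation and does not reproduce any of the three approaches in the paper; it only illustrates one simple possibility, assumed here for illustration: genes are grouped into equal-size GC strata and each stratum is rescaled so that its median count matches the lane-wide median. The function name `gc_bin_scale` and its parameters are hypothetical.

```python
from statistics import median


def gc_bin_scale(counts, gc, n_bins=4):
    """Scale counts so each GC stratum has the same median as the whole lane.

    counts: read counts per gene for one lane (list of numbers)
    gc:     GC fraction per gene, in the same order as counts
    """
    if len(counts) != len(gc):
        raise ValueError("counts and gc must have the same length")
    overall = median(counts)
    # Rank genes by GC content and cut the ranking into equal-size strata.
    order = sorted(range(len(gc)), key=lambda i: gc[i])
    size = -(-len(order) // n_bins)  # ceiling division
    normalized = [0.0] * len(counts)
    for start in range(0, len(order), size):
        stratum = order[start:start + size]
        stratum_median = median(counts[i] for i in stratum)
        # Leave a stratum untouched if its median is zero.
        factor = overall / stratum_median if stratum_median > 0 else 1.0
        for i in stratum:
            normalized[i] = counts[i] * factor
    return normalized


# Toy lane in which high-GC genes receive ten times more reads.
toy_counts = [10, 20, 30, 40, 100, 200, 300, 400]
toy_gc = [0.30, 0.31, 0.32, 0.33, 0.60, 0.61, 0.62, 0.63]
toy_normalized = gc_bin_scale(toy_counts, toy_gc, n_bins=2)
print(toy_normalized)
```

    In the toy lane the two strata end up with identical values after scaling, which is the intended effect: the GC-dependent shift is removed while relative differences inside each stratum are kept.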

  13. QuickRNASeq lifts large-scale RNA-seq data analyses to the next level of automation and interactive visualization.

    Science.gov (United States)

    Zhao, Shanrong; Xi, Li; Quan, Jie; Xi, Hualin; Zhang, Ying; von Schack, David; Vincent, Michael; Zhang, Baohong

    2016-01-08

    RNA sequencing (RNA-seq), a next-generation sequencing technique for transcriptome profiling, is being increasingly used, in part driven by the decreasing cost of sequencing. Nevertheless, the analysis of the massive amounts of data generated by large-scale RNA-seq remains a challenge. Multiple algorithms pertinent to basic analyses have been developed, and there is an increasing need to automate the use of these tools so as to obtain results in an efficient and user-friendly manner. Increased automation and improved visualization of the results will help make the results and findings of the analyses readily available to experimental scientists. By combining the best open source tools developed for RNA-seq data analyses and the most advanced web 2.0 technologies, we have implemented QuickRNASeq, a pipeline for large-scale RNA-seq data analyses and visualization. The QuickRNASeq workflow consists of three main steps. In Step #1, each individual sample is processed, including mapping RNA-seq reads to a reference genome, counting the numbers of mapped reads, quality control of the aligned reads, and SNP (single nucleotide polymorphism) calling. Step #1 is computationally intensive, and can be processed in parallel. In Step #2, the results from individual samples are merged, and an integrated and interactive project report is generated. All analysis results in the report are accessible via a single HTML entry webpage. Step #3 is the data interpretation and presentation step. The rich visualization features implemented here allow end users to interactively explore the results of RNA-seq data analyses, and to gain more insights into RNA-seq datasets. In addition, we used a real world dataset to demonstrate the simplicity and efficiency of QuickRNASeq in RNA-seq data analyses and interactive visualizations. The seamless integration of automated capabilities with interactive visualizations in QuickRNASeq is not available in other published RNA-seq pipelines. 
The high degree

  14. Cloning, expression, and homology modeling of GroEL protein from Leptospira interrogans serovar autumnalis strain N2.

    Science.gov (United States)

    Natarajaseenivasan, Kalimuthusamy; Shanmughapriya, Santhanam; Velineni, Sridhar; Artiushin, Sergey C; Timoney, John F

    2011-10-01

    Leptospirosis is an infectious bacterial disease caused by Leptospira species. In this study, we cloned and sequenced the gene encoding the immunodominant protein GroEL from L. interrogans serovar Autumnalis strain N2, which was isolated from the urine of a patient during an outbreak of leptospirosis in Chennai, India. This groEL gene encodes a protein of 60 kDa with a high degree of homology (99% similarity) to those of other leptospiral serovars. Recombinant GroEL was overexpressed in Escherichia coli. Immunoblot analysis indicated that the sera from confirmed leptospirosis patients showed strong reactivity with the recombinant GroEL while no reactivity was observed with the sera from seronegative control patient. In addition, the 3D structure of GroEL was constructed using chaperonin complex cpn60 from Thermus thermophilus as template and validated. The results indicated a Z-score of -8.35, which is in good agreement with the expected value for a protein. The superposition of the Cα traces of cpn60 structure and predicted structure of leptospiral GroEL indicates good agreement of secondary structure elements with an RMSD value of 1.5 Å. Further study is necessary to evaluate GroEL for serological diagnosis of leptospirosis and for its potential as a vaccine component. Copyright © 2011 Beijing Genomics Institute. Published by Elsevier Ltd. All rights reserved.

  15. Prediction of Poly(A) Sites by Poly(A) Read Mapping.

    Directory of Open Access Journals (Sweden)

    Thomas Bonfert

    RNA-seq reads containing part of the poly(A) tail of transcripts (denoted as poly(A) reads) provide the most direct evidence for the position of poly(A) sites in the genome. However, due to reduced coverage of poly(A) tails by reads, poly(A) reads are not routinely identified during RNA-seq mapping. Nevertheless, recent studies for several herpesviruses successfully employed mapping of poly(A) reads to identify herpesvirus poly(A) sites using different strategies and customized programs. To more easily allow such analyses without requiring additional programs, we integrated poly(A) read mapping and prediction of poly(A) sites into our RNA-seq mapping program ContextMap 2. The implemented approach essentially generalizes previously used poly(A) read mapping approaches and combines them with the context-based approach of ContextMap 2 to take into account information provided by other reads aligned to the same location. Poly(A) read mapping using ContextMap 2 was evaluated on real-life data from the ENCODE project and compared against a competing approach based on transcriptome assembly (KLEAT). This showed high positive predictive value for our approach, evidenced also by the presence of poly(A) signals, and considerably lower runtime than KLEAT. Although sensitivity is low for both methods, we show that this is in part due to a high extent of spurious results in the gold standard set derived from RNA-PET data. Sensitivity improves for poly(A) sites of known transcripts or determined with a more specific poly(A) sequencing protocol and increases with read coverage on transcript ends. Finally, we illustrate the usefulness of the approach in a high read coverage scenario by a re-analysis of published data for herpes simplex virus 1. Thus, with current trends towards increasing sequencing depth and read length, poly(A) read mapping will prove to be increasingly useful and can now be performed automatically during RNA-seq mapping with ContextMap 2.
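    As a rough illustration of what identifying a poly(A) read involves, the sketch below checks whether a read ends in a run of adenines and, if so, splits it into a mappable body and the tail. This is not the ContextMap 2 algorithm; the function name `split_polya_tail` and the thresholds `min_tail` and `max_mismatch` are assumptions chosen for the example.

```python
def split_polya_tail(read, min_tail=10, max_mismatch=1):
    """Return (body, tail) if the read ends in a poly(A) run, else None.

    The read is scanned from its 3' end. Up to max_mismatch non-A bases
    are tolerated inside the tail, and the tail always starts on an A.
    """
    mismatches = 0
    tail_start = len(read)  # start index of the longest acceptable tail
    for i in range(len(read) - 1, -1, -1):
        if read[i] == "A":
            tail_start = i
        else:
            mismatches += 1
            if mismatches > max_mismatch:
                break
    tail = read[tail_start:]
    if len(tail) < min_tail:
        return None
    return read[:tail_start], tail


# A read whose last 12 bases are a tail with one sequencing error (G).
example = split_polya_tail("CCGGT" + "AAAAAGAAAAAA")
print(example)
```

    Only the body would then be aligned to the genome, and the position where the tail begins marks a candidate poly(A) site. Reads sequenced from the opposite strand begin with a poly(T) stretch instead, so a real implementation would also need to examine the reverse complement.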

  16. Induction of heat shock proteins DnaK, GroEL, and GroES by salt stress in Lactococcus lactis

    DEFF Research Database (Denmark)

    Kilstrup, Mogens; Jacobsen, Susanne; Hammer, Karin

    1997-01-01

    The bacterium Lactococcus lactis has become a model organism in studies of growth physiology and membrane transport, as a result of its simple fermentative metabolism. It is also used as a model for studying the importance of specific genes and functions during life in excess nutrients, by compari...... the timing during heat stress although at a lower induction level. These data indicate an overlap between the heat shock and salt stress responses in L. lactis......., by comparison of prototrophic wild-type strains and auxotrophic domesticated (dairy) strains. In a study of the capacity of domesticated strains to perform directed responses toward various stress conditions, we have analyzed the heat and salt stress response in the established L. lactis subsp. cremoris...... laboratory strain MG1363, which was originally derived from a dairy strain. After two-dimensional separation of proteins, the DnaK, GroEL, and GroES heat shock proteins, the HrcA (Orf1) heat shock repressor, and the glycolytic enzymes pyruvate kinase, glyceraldehyde-3-phosphate dehydrogenase...

  17. RSEQtools: a modular framework to analyze RNA-Seq data using compact, anonymized data summaries.

    Science.gov (United States)

    Habegger, Lukas; Sboner, Andrea; Gianoulis, Tara A; Rozowsky, Joel; Agarwal, Ashish; Snyder, Michael; Gerstein, Mark

    2011-01-15

    The advent of next-generation sequencing for functional genomics has given rise to quantities of sequence information that are often so large that they are difficult to handle. Moreover, sequence reads from a specific individual can contain sufficient information to potentially identify and genetically characterize that person, raising privacy concerns. In order to address these issues, we have developed the Mapped Read Format (MRF), a compact data summary format for both short and long read alignments that enables the anonymization of confidential sequence information, while allowing one to still carry out many functional genomics studies. We have developed a suite of tools (RSEQtools) that use this format for the analysis of RNA-Seq experiments. These tools consist of a set of modules that perform common tasks such as calculating gene expression values, generating signal tracks of mapped reads and segmenting that signal into actively transcribed regions. Moreover, the tools can readily be used to build customizable RNA-Seq workflows. In addition to the anonymization afforded by MRF, this format also facilitates the decoupling of the alignment of reads from downstream analyses. RSEQtools is implemented in C and the source code is available at http://rseqtools.gersteinlab.org/.

  18. GroEL1, a heat shock protein 60 of Chlamydia pneumoniae, impairs neovascularization by decreasing endothelial progenitor cell function.

    Directory of Open Access Journals (Sweden)

    Yi-Wen Lin

    The number and function of endothelial progenitor cells (EPCs) are sensitive to hyperglycemia, hypertension, and smoking in humans, which are also associated with the development of atherosclerosis. GroEL1 from Chlamydia pneumoniae has been found in atherosclerotic lesions and is related to atherosclerotic pathogenesis. However, the actual effects of GroEL1 on EPC function are unclear. In this study, we investigate the EPC function in GroEL1-administered hind limb-ischemic C57BL/B6 and C57BL/10ScNJ (a toll-like receptor 4 (TLR4) mutation) mice and human EPCs. In mice, laser Doppler imaging, flow cytometry, and immunohistochemistry were used to evaluate the degree of neo-vasculogenesis, circulating level of EPCs, and expression of CD34, vWF, and endothelial nitric oxide synthase (eNOS) in vessels. Blood flow in the ischemic limb was significantly impaired in C57BL/B6 but not C57BL/10ScNJ mice treated with GroEL1. Circulating EPCs were also decreased after GroEL1 administration in C57BL/B6 mice. Additionally, GroEL1 inhibited the expression of CD34 and eNOS in C57BL/B6 ischemic muscle. In vitro, GroEL1 impaired the capacity of differentiation, mobilization, tube formation, and migration of EPCs. GroEL1 increased senescence, which was mediated by caspases, p38 MAPK, and ERK1/2 signaling in EPCs. Furthermore, GroEL1 decreased integrin and E-selectin expression and induced inflammatory responses in EPCs. In conclusion, these findings suggest that TLR4 and impaired NO-related mechanisms could contribute to the reduced number and functional activity of EPCs in the presence of GroEL1 from C. pneumoniae.

  19. Identification and removal of low-complexity sites in allele-specific analysis of ChIP-seq data.

    Science.gov (United States)

    Waszak, Sebastian M; Kilpinen, Helena; Gschwind, Andreas R; Orioli, Andrea; Raghav, Sunil K; Witwicki, Robert M; Migliavacca, Eugenia; Yurovsky, Alisa; Lappalainen, Tuuli; Hernandez, Nouria; Reymond, Alexandre; Dermitzakis, Emmanouil T; Deplancke, Bart

    2014-01-15

    High-throughput sequencing technologies enable the genome-wide analysis of the impact of genetic variation on molecular phenotypes at unprecedented resolution. However, although powerful, these technologies can also introduce unexpected artifacts. We investigated the impact of library amplification bias on the identification of allele-specific (AS) molecular events from high-throughput sequencing data derived from chromatin immunoprecipitation assays (ChIP-seq). Putative AS DNA binding activity for RNA polymerase II was determined using ChIP-seq data derived from lymphoblastoid cell lines of two parent-daughter trios. We found that, at high sequencing depth, many significant AS binding sites suffered from an amplification bias, as evidenced by a larger number of clonal reads representing one of the two alleles. To alleviate this bias, we devised an amplification bias detection strategy, which filters out sites with low read complexity and sites featuring a significant excess of clonal reads. This method will be useful for AS analyses involving ChIP-seq and other functional sequencing assays. The R package absfilter for library clonality simulations and detection of amplification-biased sites is available from http://updepla1srv1.epfl.ch/waszaks/absfilter

  20. Schmitt’s Theory of Großraum: A Post-Statal Perspective?

    Directory of Open Access Journals (Sweden)

    Antonino Scalone

    2017-07-01

    The article analyses some aspects of Schmitt’s theories on international law, in particular the notion of Großraum. The assumption is that with this notion Schmitt tries to re-think politics and international relations beyond the classical categories of the State. From this point of view, there is an essential affinity between the concept of politics (Begriff des Politischen) explained in the homonymous essay published in 1927 and the concept of Großraum. After the Second World War, Schmitt distances himself from the notion of Großraum, which was too close to the Nazi theories, and focuses on that of nomos, explained in Der Nomos der Erde (1950), and on the crisis of the Jus publicum europaeum. However, Schmitt fails to define which system of international relations should follow the Jus publicum europaeum, and the notion of nomos itself remains rather undefined. In the last part of the paper the author compares Schmitt’s theories about international law with some theories of Kelsen, with particular reference to the theory of bellum justum.

  1. Sentence‐Chain Based Seq2seq Model for Corpus Expansion

    Directory of Open Access Journals (Sweden)

    Euisok Chung

    2017-08-01

    This study focuses on a method for sequential data augmentation in order to alleviate data sparseness problems. Specifically, we present corpus expansion techniques for enhancing the coverage of a language model. Recent recurrent neural network studies show that a seq2seq model can be applied for addressing language generation issues; it has the ability to generate new sentences from given input sentences. We present a method of corpus expansion using a sentence‐chain based seq2seq model. For training the seq2seq model, sentence chains are used as triples. The first two sentences in a triple are used for the encoder of the seq2seq model, while the last sentence becomes a target sequence for the decoder. Using only internal resources, evaluation results show an improvement of approximately 7.6% relative perplexity over a baseline language model of Korean text. Additionally, from a comparison with a previous study, the sentence chain approach reduces the size of the training data by 38.4% while generating 1.4‐times the number of n‐grams with superior performance for English text.
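    The triple construction described in this abstract can be sketched in a few lines. The abstract does not say how sentence chains are selected, so the sketch below assumes, purely for illustration, that a chain is any three consecutive sentences of a document; the function name `sentence_chain_pairs` and the space-joined encoder input are likewise hypothetical.

```python
def sentence_chain_pairs(sentences):
    """Build (encoder_input, decoder_target) pairs from sentence triples.

    Assumption for this sketch: every window of three consecutive
    sentences forms one chain. The first two sentences feed the encoder
    and the third is the target sequence for the decoder.
    """
    pairs = []
    for i in range(len(sentences) - 2):
        first, second, third = sentences[i:i + 3]
        # Hypothetical choice: join the two source sentences with a space.
        pairs.append((first + " " + second, third))
    return pairs


document = [
    "The model reads two sentences.",
    "It encodes them jointly.",
    "Then it generates a third.",
    "The output expands the corpus.",
]
training_pairs = sentence_chain_pairs(document)
for source, target in training_pairs:
    print(source, "->", target)
```

    A document with n sentences yields n - 2 training pairs under this windowing assumption; sentences generated by the trained decoder would then be added to the corpus used for the language model.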

  2. Confirmation of a high magnetic field in GRO J1008-57

    DEFF Research Database (Denmark)

    Bellm, Eric C.; Fuerst, Felix; Pottschmidt, Katja

    2014-01-01

    GRO J1008-57 is a high-mass X-ray binary for which several claims of a cyclotron resonance scattering feature near 80 keV have been reported. We use NuSTAR, Suzaku, and Swift data from its giant outburst of 2012 November to confirm the existence of the 80 keV feature and perform the most sensitiv...... a fundamental at lower energies with optical depth larger than 5% of the 78 keV line. These results indicate that GRO J1008-57 has a magnetic field of $6.7 \times 10^{12}\,(1 + z)$ G, the highest among known accreting pulsars....
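    The quoted field strength is consistent with the standard relation between the energy of the fundamental cyclotron line and the magnetic field in the line-forming region. The worked step below assumes that the 78 keV line is the fundamental and uses the usual non-relativistic expression; $B_{12}$ denotes the field in units of $10^{12}$ G and $z$ the gravitational redshift.

```latex
\begin{align}
E_{\mathrm{cyc}} &= \frac{\hbar e B}{m_e c}\,\frac{1}{1+z}
  \approx 11.57\ \mathrm{keV}\;\frac{B_{12}}{1+z}, \\
B_{12} &\approx \frac{E_{\mathrm{cyc}}}{11.57\ \mathrm{keV}}\,(1+z)
  = \frac{78}{11.57}\,(1+z) \approx 6.7\,(1+z).
\end{align}
```

    This gives $B \approx 6.7 \times 10^{12}\,(1+z)$ G, matching the value stated in the abstract.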

  3. Developing methodological awareness of reading, thinking and writing as knowledge producing practices

    DEFF Research Database (Denmark)

    Katan, Lina Hauge; Baarts, Charlotte

    Developing methodological awareness among university students about reading, thinking and writing as knowledge producing practices. Integrated acts of reading, thinking and writing comprise an extensive and extremely significant part of the learning processes through which we produce knowledge...... text books on method and classes too. As a consequence students have few chances of encountering the practices of reading, thinking and writing depicted as those imperative parts of knowledge making that we as researchers of the humanities and social sciences know them to be. Subsequently students...... are not taught to understand reading, thinking and writing as central practices of research nor do they come to develop methodological awareness about them as such. In this paper, we report from our endeavour into designing and developing a course offered for undergraduate and graduate students, with the aim...

  4. ATACseqQC: a Bioconductor package for post-alignment quality assessment of ATAC-seq data.

    Science.gov (United States)

    Ou, Jianhong; Liu, Haibo; Yu, Jun; Kelliher, Michelle A; Castilla, Lucio H; Lawson, Nathan D; Zhu, Lihua Julie

    2018-03-01

    ATAC-seq (Assays for Transposase-Accessible Chromatin using sequencing) is a recently developed technique for genome-wide analysis of chromatin accessibility. Compared to earlier methods for assaying chromatin accessibility, ATAC-seq is faster and easier to perform, does not require cross-linking, has higher signal to noise ratio, and can be performed on small cell numbers. However, to ensure a successful ATAC-seq experiment, step-by-step quality assurance processes, including both wet lab quality control and in silico quality assessment, are essential. While several tools have been developed or adopted for assessing read quality, identifying nucleosome occupancy and accessible regions from ATAC-seq data, none of the tools provide a comprehensive set of functionalities for preprocessing and quality assessment of aligned ATAC-seq datasets. We have developed a Bioconductor package, ATACseqQC, for easily generating various diagnostic plots to help researchers quickly assess the quality of their ATAC-seq data. In addition, this package contains functions to preprocess aligned ATAC-seq data for subsequent peak calling. Here we demonstrate the utilities of our package using 25 publicly available ATAC-seq datasets from four studies. We also provide guidelines on what the diagnostic plots should look like for an ideal ATAC-seq dataset. This software package has been used successfully for preprocessing and assessing several in-house and public ATAC-seq datasets. Diagnostic plots generated by this package will facilitate the quality assessment of ATAC-seq data, and help researchers to evaluate their own ATAC-seq experiments as well as select high-quality ATAC-seq datasets from public repositories such as GEO to avoid generating hypotheses or drawing conclusions from low-quality ATAC-seq experiments. The software, source code, and documentation are freely available as a Bioconductor package at https://bioconductor.org/packages/release/bioc/html/ATACseqQC.html .

  5. LiGRO: a graphical user interface for protein-ligand molecular dynamics.

    Science.gov (United States)

    Kagami, Luciano Porto; das Neves, Gustavo Machado; da Silva, Alan Wilter Sousa; Caceres, Rafael Andrade; Kawano, Daniel Fábio; Eifler-Lima, Vera Lucia

    2017-10-04

    To speed up the drug-discovery process, molecular dynamics (MD) calculations performed in GROMACS can be coupled to docking simulations for the post-screening analyses of large compound libraries. This requires generating the topology of the ligands in different software, some basic knowledge of Linux command lines, and a certain familiarity in handling the output files. LiGRO, the Python-based graphical interface introduced here, was designed to overcome these protein-ligand parameterization challenges by allowing the graphical (non command line-based) control of GROMACS (MD and analysis), ACPYPE (ligand topology builder) and PLIP (protein-binder interactions monitor), programs that can be used together to fully perform and analyze the outputs of complex MD simulations (including energy minimization and NVT/NPT equilibration). By allowing the calculation of linear interaction energies in a simple and quick fashion, LiGRO can be used in the drug-discovery pipeline to select compounds with a better protein-binding interaction profile. The design of LiGRO allows researchers to freely download and modify the software, with the source code being available under the terms of a GPLv3 license from http://www.ufrgs.br/lasomfarmacia/ligro/ .

  6. OccuPeak: ChIP-Seq peak calling based on internal background modelling

    NARCIS (Netherlands)

    de Boer, Bouke A.; van Duijvenboden, Karel; van den Boogaard, Malou; Christoffels, Vincent M.; Barnett, Phil; Ruijter, Jan M.

    2014-01-01

    ChIP-seq has become a major tool for the genome-wide identification of transcription factor binding or histone modification sites. Most peak-calling algorithms require input control datasets to model the occurrence of background reads to account for local sequencing and GC bias. However, the

  7. Discovery of a Nonblazar Gamma-Ray Transient Source Near the Galactic Plane: GRO J1838-04

    Science.gov (United States)

    Tavani, M.; Oliversen, Ronald (Technical Monitor)

    2001-01-01

    We report the discovery of a remarkable gamma-ray transient source near the Galactic plane, GRO J1838-04. This source was serendipitously discovered by EGRET in 1995 June with a peak intensity of approximately $(4 \pm 1) \times 10^{-6}$ photons cm$^{-2}$ s$^{-1}$ (for photon energies larger than 100 MeV) and a 5.9σ significance. At that time, GRO J1838-04 was the second brightest gamma-ray source in the sky. A subsequent EGRET pointing in late September 1995 detected the source at a flux smaller than its peak value by a factor of approximately 7. We determine that no radio-loud spectrally flat blazar is within the error box of GRO J1838-04. We discuss the origin of the gamma-ray transient source and show that interpretations in terms of active galactic nuclei or isolated pulsars are highly problematic. GRO J1838-04 provides strong evidence for the existence of a new class of variable gamma-ray sources.

  8. Quantitative RNA-Seq analysis in non-model species: assessing transcriptome assemblies as a scaffold and the utility of evolutionary divergent genomic reference species

    Directory of Open Access Journals (Sweden)

    Hornett Emily A

    2012-08-01

    Background: How well does RNA-Seq data perform for quantitative whole gene expression analysis in the absence of a genome? This is one unanswered question facing the rapidly growing number of researchers studying non-model species. Using Homo sapiens data and resources, we compared the direct mapping of sequencing reads to predicted genes from the genome with mapping to de novo transcriptomes assembled from RNA-Seq data. Gene coverage and expression analysis was further investigated in the non-model context by using increasingly divergent genomic reference species to group assembled contigs by unique genes. Results: Eight transcriptome sets, composed of varying amounts of Illumina and 454 data, were assembled and assessed. Hybrid 454/Illumina assemblies had the highest transcriptome and individual gene coverage. Quantitative whole gene expression levels were highly similar between using a de novo hybrid assembly and the predicted genes as a scaffold, although mapping to the de novo transcriptome assembly provided data on fewer genes. Using non-target species as reference scaffolds does result in some loss of sequence and expression data, and bias and error increase with evolutionary distance. However, within a 100 million year window these effect sizes are relatively small. Conclusions: Predicted gene sets from sequenced genomes of related species can provide a powerful method for grouping RNA-Seq reads and annotating contigs. Gene expression results can be produced that are similar to results obtained using gene models derived from a high quality genome, though biased towards conserved genes. Our results demonstrate the power and limitations of conducting RNA-Seq in non-model species.

  9. CASSys: an integrated software-system for the interactive analysis of ChIP-seq data

    Directory of Open Access Journals (Sweden)

    Alawi Malik

    2011-06-01

    The mapping of DNA-protein interactions is crucial for a full understanding of transcriptional regulation. Chromatin immunoprecipitation followed by massively parallel sequencing (ChIP-seq) has become the standard technique for analyzing these interactions on a genome-wide scale. We have developed a software system called CASSys (ChIP-seq data Analysis Software System) spanning all steps of ChIP-seq data analysis. It supersedes the laborious application of several single command line tools. CASSys provides functionality ranging from quality assessment and control of short reads, through the mapping of reads against a reference genome (readmapping) and the detection of enriched regions (peakdetection), to various follow-up analyses. The latter are accessible via a state-of-the-art web interface and can be performed interactively by the user. The follow-up analyses allow for flexible user-defined association of putative interaction sites with genes, visualization of their genomic context with an integrated genome browser, the detection of putative binding motifs, the identification of over-represented Gene Ontology terms, pathway analysis and the visualization of interaction networks. The system is client-server based, accessible via a web browser and does not require any software installation on the client side. To demonstrate CASSys’s functionality we used the system for the complete data analysis of a publicly available ChIP-seq study that investigated the role of the transcription factor estrogen receptor-α in breast cancer cells.

  10. A dual transcript-discovery approach to improve the delimitation of gene features from RNA-seq data in the chicken model

    Directory of Open Access Journals (Sweden)

    Mickael Orgeur

    2018-01-01

    Full Text Available The sequence of the chicken genome, like several other draft genome sequences, is presently not fully covered. Gaps, contigs assigned with low confidence and uncharacterized chromosomes result in gene fragmentation and imprecise gene annotation. Transcript abundance estimation from RNA sequencing (RNA-seq) data relies on read quality, library complexity and expression normalization. In addition, the quality of the genome sequence used to map sequencing reads, and the gene annotation that defines gene features, must also be taken into account. A partially covered genome sequence causes the loss of sequencing reads from the mapping step, while an inaccurate definition of gene features induces imprecise read counts from the assignment step. Both steps can significantly bias interpretation of RNA-seq data. Here, we describe a dual transcript-discovery approach combining a genome-guided gene prediction and a de novo transcriptome assembly. This dual approach enabled us to increase the assignment rate of RNA-seq data by nearly 20% as compared to when using only the chicken reference annotation, contributing therefore to a more accurate estimation of transcript abundance. More generally, this strategy could be applied to any organism with partial genome sequence and/or lacking a manually-curated reference annotation in order to improve the accuracy of gene expression studies.

  11. An RNA-Seq strategy to detect the complete coding and non-coding transcriptome including full-length imprinted macro ncRNAs.

    Directory of Open Access Journals (Sweden)

    Ru Huang

    Full Text Available Imprinted macro non-protein-coding (nc) RNAs are cis-repressor transcripts that silence multiple genes in at least three imprinted gene clusters in the mouse genome. Similar macro or long ncRNAs are abundant in the mammalian genome. Here we present the full coding and non-coding transcriptome of two mouse tissues: differentiated ES cells and fetal head using an optimized RNA-Seq strategy. The data produced is highly reproducible in different sequencing locations and is able to detect the full length of imprinted macro ncRNAs such as Airn and Kcnq1ot1, whose length ranges between 80-118 kb. Transcripts show a more uniform read coverage when RNA is fragmented with RNA hydrolysis compared with cDNA fragmentation by shearing. Irrespective of the fragmentation method, all coding and non-coding transcripts longer than 8 kb show a gradual loss of sequencing tags towards the 3' end. Comparisons to published RNA-Seq datasets show that the strategy presented here is more efficient in detecting known functional imprinted macro ncRNAs and also indicate that standardization of RNA preparation protocols would increase the comparability of the transcriptome between different RNA-Seq datasets.

  12. Substantial differences in bias between single-digest and double-digest RAD-seq libraries: A case study.

    Science.gov (United States)

    Flanagan, Sarah P; Jones, Adam G

    2018-03-01

    The trade-offs of using single-digest vs. double-digest restriction site-associated DNA sequencing (RAD-seq) protocols have been widely discussed. However, no direct empirical comparisons of the two methods have been conducted. Here, we sampled a single population of Gulf pipefish (Syngnathus scovelli) and genotyped 444 individuals using RAD-seq. Sixty individuals were subjected to single-digest RAD-seq (sdRAD-seq), and the remaining 384 individuals were genotyped using a double-digest RAD-seq (ddRAD-seq) protocol. We analysed the resulting Illumina sequencing data and compared the two genotyping methods when reads were analysed either together or separately. Coverage statistics, observed heterozygosity, and allele frequencies differed significantly between the two protocols, as did the results of selection components analysis. We also performed an in silico digestion of the Gulf pipefish genome and modelled five major sources of bias: PCR duplicates, polymorphic restriction sites, shearing bias, asymmetric sampling (i.e., genotyping fewer individuals with sdRAD-seq than with ddRAD-seq) and higher major allele frequencies. This combination of approaches allowed us to determine that polymorphic restriction sites, an asymmetric sampling scheme, mean allele frequencies and to some extent PCR duplicates all contribute to different estimates of allele frequencies between samples genotyped using sdRAD-seq versus ddRAD-seq. Our finding that sdRAD-seq and ddRAD-seq can result in different allele frequencies has implications for comparisons across studies and techniques that endeavour to identify genomewide signatures of evolutionary processes in natural populations. © 2017 John Wiley & Sons Ltd.

  13. Impact of artefact removal on ChIP quality metrics in ChIP-seq and ChIP-exo data.

    Directory of Open Access Journals (Sweden)

    Thomas Samuel Carroll

    2014-04-01

    Full Text Available With the advent of ChIP-seq multiplexing technologies and the subsequent increase in ChIP-seq throughput, the development of working standards for the quality assessment of ChIP-seq studies has received significant attention. The ENCODE consortium’s large scale analysis of transcription factor binding and epigenetic marks as well as concordant work on ChIP-seq by other laboratories has established a new generation of ChIP-seq quality control measures. The use of these metrics alongside common processing steps has however not been evaluated. In this study, we investigate the effects of blacklisting and removal of duplicated reads on established metrics of ChIP-seq quality and show that the interpretation of these metrics is highly dependent on the ChIP-seq preprocessing steps applied. Further to this we perform the first investigation of the use of these metrics for ChIP-exo data and make recommendations for the adaptation of the NSC statistic to allow for the assessment of ChIP-exo efficiency.

  14. Guidance for RNA-seq co-expression network construction and analysis: safety in numbers.

    Science.gov (United States)

    Ballouz, S; Verleyen, W; Gillis, J

    2015-07-01

    RNA-seq co-expression analysis is in its infancy and reasonable practices remain poorly defined. We assessed a variety of RNA-seq expression data to determine factors affecting functional connectivity and topology in co-expression networks. We examine RNA-seq co-expression data generated from 1970 RNA-seq samples using a Guilt-By-Association framework, in which genes are assessed for the tendency of co-expression to reflect shared function. Minimal experimental criteria to obtain performance on par with microarrays were >20 samples with read depth >10 M per sample. While the aggregate network constructed shows good performance (area under the receiver operator characteristic curve ∼0.71), the dependency on number of experiments used is nearly identical to that present in microarrays, suggesting thousands of samples are required to obtain 'gold-standard' co-expression. We find a major topological difference between RNA-seq and microarray co-expression in the form of low overlaps between hub-like genes from each network due to changes in the correlation of expression noise within each technology. Contact: jgillis@cshl.edu or sballouz@cshl.edu. Networks are available at: http://gillislab.labsites.cshl.edu/supplements/rna-seq-networks/ and supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

  15. Boiler: lossy compression of RNA-seq alignments using coverage vectors.

    Science.gov (United States)

    Pritt, Jacob; Langmead, Ben

    2016-09-19

    We describe Boiler, a new software tool for compressing and querying large collections of RNA-seq alignments. Boiler discards most per-read data, keeping only a genomic coverage vector plus a few empirical distributions summarizing the alignments. Since most per-read data is discarded, storage footprint is often much smaller than that achieved by other compression tools. Despite this, the most relevant per-read data can be recovered; we show that Boiler compression has only a slight negative impact on results given by downstream tools for isoform assembly and quantification. Boiler also allows the user to pose fast and useful queries without decompressing the entire file. Boiler is free open source software available from github.com/jpritt/boiler. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.
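    The coverage-vector idea in the abstract above can be illustrated with a minimal Python sketch. It is not Boiler's actual code or file format: the function names, the half-open interval convention, and the run-length encoding step are all assumptions made for illustration only.

```python
from typing import Iterable, List, Tuple


def coverage_vector(alignments: Iterable[Tuple[int, int]], length: int) -> List[int]:
    """Per-base read depth from half-open [start, end) alignment intervals.

    Hypothetical helper: only the interval of each alignment is used, so
    per-read identity is discarded, as in a coverage-only summary.
    """
    diff = [0] * (length + 1)          # difference array: +1 at start, -1 at end
    for start, end in alignments:
        start = max(start, 0)
        end = min(end, length)
        if start >= end:
            continue                   # interval falls outside the reference
        diff[start] += 1
        diff[end] -= 1
    coverage, depth = [], 0
    for delta in diff[:length]:        # prefix sum turns deltas into depth
        depth += delta
        coverage.append(depth)
    return coverage


def run_length_encode(values: List[int]) -> List[Tuple[int, int]]:
    """Collapse runs of equal depth into (value, run_length) pairs."""
    runs: List[List[int]] = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1
        else:
            runs.append([v, 1])
    return [(v, n) for v, n in runs]


if __name__ == "__main__":
    cov = coverage_vector([(0, 3), (2, 5)], length=6)
    print(cov)
    print(run_length_encode(cov))
```

    The difference-array trick keeps the cost linear in the number of alignments plus the reference length, which is why a coverage summary can be much cheaper to store than the reads themselves.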

  16. A Comparison and Integration of MiSeq and MinION Platforms for Sequencing Single Source and Mixed Mitochondrial Genomes.

    Directory of Open Access Journals (Sweden)

    Michael R Lindberg

    Full Text Available Single source and multiple donor (mixed) samples of human mitochondrial DNA were analyzed and compared using the MinION and the MiSeq platforms. A generalized variant detection strategy was employed to provide a cursory framework for evaluating the reliability and accuracy of mitochondrial sequences produced by the MinION. The feasibility of long-read phasing was investigated to establish its efficacy in quantitatively distinguishing and deconvolving individuals in a mixture. Finally, a proof-of-concept was demonstrated by integrating both platforms in a hybrid assembly that leverages solely mixture data to accurately reconstruct full mitochondrial genomes.

  17. SeqEntropy: genome-wide assessment of repeats for short read sequencing.

    Directory of Open Access Journals (Sweden)

    Hsueh-Ting Chu

    Full Text Available BACKGROUND: Recent studies on genome assembly from short-read sequencing data reported the limitation of this technology to reconstruct the entire genome even at very high depth coverage. We investigated the limitation from the perspective of information theory to evaluate the effect of repeats on short-read genome assembly using idealized (error-free) reads at different lengths. METHODOLOGY/PRINCIPAL FINDINGS: We define a metric H(k) to be the entropy of sequencing reads at a read length k and use the relative loss of entropy ΔH(k) to measure the impact of repeats for the reconstruction of whole-genome from sequences of length k. In our experiments, we found that entropy loss correlates well with de-novo assembly coverage of a genome, and a score of ΔH(k)>1% indicates a severe loss in genome reconstruction fidelity. The minimal read lengths to achieve ΔH(k)<1% are different for various organisms and are independent of the genome size. For example, in order to meet the threshold of ΔH(k)<1%, a read length of 60 bp is needed for the sequencing of human genome (3.2×10^9 bp) and 320 bp for the sequencing of fruit fly (1.8×10^8 bp). We also calculated the ΔH(k) scores for 2725 prokaryotic chromosomes and plasmids at several read lengths. Our results indicate that the levels of repeats in different genomes are diverse and the entropy of sequencing reads provides a measurement for the repeat structures. CONCLUSIONS/SIGNIFICANCE: The proposed entropy-based measurement, which can be calculated in seconds to minutes in most cases, provides a rapid quantitative evaluation on the limitation of idealized short-read genome sequencing. Moreover, the calculation can be parallelized to scale up to large eukaryotic genomes. This approach may be useful to tune the sequencing parameters to achieve better genome assemblies when a closely related genome is already available.
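    To make the entropy metric in the abstract above concrete, here is a minimal Python sketch. The abstract does not give the exact formula, so this assumes that H(k) is the Shannon entropy of the distribution of all error-free length-k reads drawn from one strand of the sequence, and that the relative loss is measured against the maximum entropy reached when every read is unique. Both function names are hypothetical.

```python
import math
from collections import Counter


def read_entropy(genome: str, k: int) -> float:
    """Shannon entropy (bits) of all length-k substrings of `genome`.

    Assumption: one idealized, error-free read starts at every position.
    """
    n = len(genome) - k + 1
    if k <= 0 or n <= 0:
        raise ValueError("k must be between 1 and len(genome)")
    counts = Counter(genome[i:i + k] for i in range(n))
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def relative_entropy_loss(genome: str, k: int) -> float:
    """Fraction of the maximum entropy lost because reads repeat.

    The maximum, log2(n), is reached only when all n reads are distinct,
    so a repeat-free sequence scores 0 and a pure repeat scores 1.
    """
    n = len(genome) - k + 1
    h_max = math.log2(n) if n > 0 else 0.0
    if h_max == 0.0:
        return 0.0
    return (h_max - read_entropy(genome, k)) / h_max


if __name__ == "__main__":
    toy = "ACGTACGT"
    for k in (2, 4, 5):
        print(k, round(relative_entropy_loss(toy, k), 4))
```

    On the toy sequence, the loss falls to zero once k is long enough to span the repeat, which mirrors the abstract's point that a minimal read length exists for each genome.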

  18. Quantification of Human Fecal Bifidobacterium Species by Use of Quantitative Real-Time PCR Analysis Targeting the groEL Gene

    Science.gov (United States)

    Junick, Jana

    2012-01-01

    Quantitative real-time PCR assays targeting the groEL gene for the specific enumeration of 12 human fecal Bifidobacterium species were developed. The housekeeping gene groEL (HSP60 in eukaryotes) was used as a discriminative marker for the differentiation of Bifidobacterium adolescentis, B. angulatum, B. animalis, B. bifidum, B. breve, B. catenulatum, B. dentium, B. gallicum, B. longum, B. pseudocatenulatum, B. pseudolongum, and B. thermophilum. The bifidobacterial chromosome contains a single copy of the groEL gene, allowing the determination of the cell number by quantification of the groEL copy number. Real-time PCR assays were validated by comparing fecal samples spiked with known numbers of a given Bifidobacterium species. Independent of the Bifidobacterium species tested, the proportion of groEL copies recovered from fecal samples spiked with 5 to 9 log10 cells/g feces was approximately 50%. The quantification limit was 5 to 6 log10 groEL copies/g feces. The interassay variability was less than 10%, and variability between different DNA extractions was less than 23%. The method developed was applied to fecal samples from healthy adults and full-term breast-fed infants. Bifidobacterial diversity in both adults and infants was low, with mostly ≤3 Bifidobacterium species and B. longum frequently detected. The predominant species in infant and adult fecal samples were B. breve and B. adolescentis, respectively. It was possible to distinguish B. catenulatum and B. pseudocatenulatum. We conclude that the groEL gene is a suitable molecular marker for the specific and accurate quantification of human fecal Bifidobacterium species by real-time PCR. PMID:22307308

  19. SERE: single-parameter quality control and sample comparison for RNA-Seq.

    Science.gov (United States)

    Schulze, Stefan K; Kanwar, Rahul; Gölzenleuchter, Meike; Therneau, Terry M; Beutler, Andreas S

    2012-10-03

    Assessing the reliability of experimental replicates (or global alterations corresponding to different experimental conditions) is a critical step in analyzing RNA-Seq data. Pearson's correlation coefficient r has been widely used in the RNA-Seq field even though its statistical characteristics may be poorly suited to the task. Here we present a single-parameter test procedure for count data, the Simple Error Ratio Estimate (SERE), that can determine whether two RNA-Seq libraries are faithful replicates or globally different. Benchmarking shows that the interpretation of SERE is unambiguous regardless of the total read count or the range of expression differences among bins (exons or genes), a score of 1 indicating faithful replication (i.e., samples are affected only by Poisson variation of individual counts), a score of 0 indicating data duplication, and scores >1 corresponding to true global differences between RNA-Seq libraries. On the contrary the interpretation of Pearson's r is generally ambiguous and highly dependent on sequencing depth and the range of expression levels inherent to the sample (difference between lowest and highest bin count). Cohen's simple Kappa results are also ambiguous and are highly dependent on the choice of bins. For quantifying global sample differences SERE performs similarly to a measure based on the negative binomial distribution yet is simpler to compute. SERE can therefore serve as a straightforward and reliable statistical procedure for the global assessment of pairs or large groups of RNA-Seq datasets by a single statistical parameter.
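    The behaviour described in the abstract above (1 for Poisson-only variation, 0 for duplicated data, values above 1 for real differences) can be reproduced by a ratio of observed to Poisson-expected variation. The Python sketch below is one such ratio, written from that description; the formula is an assumption and may differ in detail from the published SERE definition.

```python
import math
from typing import Sequence


def sere(counts: Sequence[Sequence[int]]) -> float:
    """Ratio of observed to Poisson-expected variation across libraries.

    `counts[i][j]` is the read count of bin i (exon or gene) in library j.
    Assumed formula: sqrt( sum_ij (y_ij - e_ij)^2 / e_ij / (N * (M - 1)) ),
    where e_ij = row_total_i * library_total_j / grand_total.
    """
    rows = [r for r in counts if sum(r) > 0]   # bins with no reads carry no information
    if not rows:
        raise ValueError("no bins with reads")
    m = len(rows[0])
    if m < 2 or any(len(r) != m for r in rows):
        raise ValueError("need at least two libraries of equal length")
    lib_totals = [sum(r[j] for r in rows) for j in range(m)]
    if min(lib_totals) == 0:
        raise ValueError("every library must contain reads")
    grand = sum(lib_totals)
    chi2 = 0.0
    for r in rows:
        row_total = sum(r)
        for j in range(m):
            expected = row_total * lib_totals[j] / grand
            chi2 += (r[j] - expected) ** 2 / expected
    return math.sqrt(chi2 / (len(rows) * (m - 1)))


if __name__ == "__main__":
    print(sere([[10, 10], [200, 200], [35, 35]]))   # duplicated data
    print(sere([[100, 10], [10, 100]]))             # globally different
```

    Because the expected counts are scaled by library size, two libraries that differ only in sequencing depth still score as duplicates, which is the depth independence that the abstract contrasts with Pearson's r.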

  20. High-specificity detection of rare alleles with Paired-End Low Error Sequencing (PELE-Seq).

    Science.gov (United States)

    Preston, Jessica L; Royall, Ariel E; Randel, Melissa A; Sikkink, Kristin L; Phillips, Patrick C; Johnson, Eric A

    2016-06-14

    Polymorphic loci exist throughout the genomes of a population and provide the raw genetic material needed for a species to adapt to changes in the environment. The minor allele frequencies of rare Single Nucleotide Polymorphisms (SNPs) within a population have been difficult to track with Next-Generation Sequencing (NGS), due to the high error rate of standard methods such as Illumina sequencing. We have developed a wet-lab protocol and variant-calling method that identifies both sequencing and PCR errors, called Paired-End Low Error Sequencing (PELE-Seq). To test the specificity and sensitivity of the PELE-Seq method, we sequenced control E. coli DNA libraries containing known rare alleles present at frequencies ranging from 0.2-0.4 % of the total reads. PELE-Seq had higher specificity and sensitivity than standard libraries. We then used PELE-Seq to characterize rare alleles in a Caenorhabditis remanei nematode worm population before and after laboratory adaptation, and found that minor and rare alleles can undergo large changes in frequency during lab-adaptation. We have developed a method of rare allele detection that mitigates both sequencing and PCR errors, called PELE-Seq. PELE-Seq was evaluated using control E. coli populations and was then used to compare a wild C. remanei population to a lab-adapted population. The PELE-Seq method is ideal for investigating the dynamics of rare alleles in a broad range of reduced-representation sequencing methods, including targeted amplicon sequencing, RAD-Seq, ddRAD, and GBS. PELE-Seq is also well-suited for whole genome sequencing of mitochondria and viruses, and for high-throughput rare mutation screens.

  1. SeqBox: RNAseq/ChIPseq reproducible analysis on a consumer game computer.

    Science.gov (United States)

    Beccuti, Marco; Cordero, Francesca; Arigoni, Maddalena; Panero, Riccardo; Amparore, Elvio G; Donatelli, Susanna; Calogero, Raffaele A

    2018-03-01

    Short reads sequencing technology has been used for more than a decade now. However, the analysis of RNAseq and ChIPseq data is still computationally demanding and the simple access to raw data does not guarantee results reproducibility between laboratories. To address these two aspects, we developed SeqBox, a cheap, efficient and reproducible RNAseq/ChIPseq hardware/software solution based on NUC6I7KYK mini-PC (an Intel consumer game computer with a fast processor and a high performance SSD disk), and Docker container platform. In SeqBox the analysis of RNAseq and ChIPseq data is supported by a friendly GUI. This gives scientists with or without scripting experience access to fast and reproducible analysis. Docker container images, docker4seq package and the GUI are available at http://www.bioinformatica.unito.it/reproducibile.bioinformatics.html. Contact: beccuti@di.unito.it. Supplementary data are available at Bioinformatics online. © The Author(s) 2017. Published by Oxford University Press.

  2. A glance at quality score: implication for de novo transcriptome reconstruction of Illumina reads

    Directory of Open Access Journals (Sweden)

    Stanley Kimbung Mbandi

    2014-02-01

    Full Text Available Downstream analyses of short-reads from next-generation sequencing platforms are often preceded by a pre-processing step that removes uncalled and wrongly called bases. Standard approaches rely on their associated base quality scores to retain the read or a portion of it when the score is above a predefined threshold. It is difficult to differentiate sequencing error from biological variation without a reference using quality scores. The effects of quality score based trimming have not been systematically studied in de novo transcriptome assembly. Using RNA-Seq data produced from Illumina, we teased out the effects of quality score based filtering or trimming on de novo transcriptome reconstruction. We showed that assemblies produced from reads subjected to different quality score thresholds contain truncated and missing transfrags when compared to those from untrimmed reads. Our data support the view that de novo assembling of untrimmed data is challenging for de Bruijn graph assemblers. However, our results indicate that comparing the assemblies from untrimmed and trimmed read subsets can suggest appropriate filtering parameters and enable selection of the optimum de novo transcriptome assembly in non-model organisms.
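    The threshold-based trimming step mentioned in the abstract above can be sketched in a few lines of Python. This is a generic illustration, not the procedure used in the study: it assumes Phred+33 encoded qualities, trims only from the 3' end, and uses a hypothetical default threshold of 20.

```python
from typing import List, Tuple


def phred33_scores(quality: str) -> List[int]:
    """Decode a Phred+33 quality string into integer scores."""
    return [ord(ch) - 33 for ch in quality]


def trim_3prime(seq: str, quality: str, threshold: int = 20) -> Tuple[str, str]:
    """Drop trailing bases whose quality score is below `threshold`.

    Assumed policy: trimming stops at the first base from the 3' end that
    meets the threshold, so low-quality bases inside the read are kept.
    """
    if len(seq) != len(quality):
        raise ValueError("sequence and quality must have the same length")
    scores = phred33_scores(quality)
    end = len(seq)
    while end > 0 and scores[end - 1] < threshold:
        end -= 1
    return seq[:end], quality[:end]


if __name__ == "__main__":
    # 'I' encodes Q40, '#' encodes Q2, '!' encodes Q0
    print(trim_3prime("ACGTAC", "IIII#!"))
```

    Raising the threshold shortens more reads, which is the trade-off the abstract examines: stricter trimming removes likely errors but can also truncate or eliminate assembled transfrags.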

  3. A Transcriptome Map of Actinobacillus pleuropneumoniae at Single-Nucleotide Resolution Using Deep RNA-Seq.

    Directory of Open Access Journals (Sweden)

    Zhipeng Su

    Full Text Available Actinobacillus pleuropneumoniae is the pathogen of porcine contagious pleuropneumonia, a highly contagious respiratory disease of swine. Although the genome of A. pleuropneumoniae was sequenced several years ago, limited information is available on the genome-wide transcriptional analysis to accurately annotate the gene structures and regulatory elements. High-throughput RNA sequencing (RNA-seq) has been applied to study the transcriptional landscape of bacteria, which can efficiently and accurately identify gene expression regions and unknown transcriptional units, especially small non-coding RNAs (sRNAs), UTRs and regulatory regions. The aim of this study is to comprehensively analyze the transcriptome of A. pleuropneumoniae by RNA-seq in order to improve the existing genome annotation and promote our understanding of A. pleuropneumoniae gene structures and RNA-based regulation. In this study, we utilized RNA-seq to construct a single nucleotide resolution transcriptome map of A. pleuropneumoniae. More than 3.8 million high-quality reads (average length ~90 bp) from a cDNA library were generated and aligned to the reference genome. We identified 32 open reading frames encoding novel proteins that were mis-annotated in the previous genome annotations. The start sites for 35 genes based on the current genome annotation were corrected. Furthermore, 51 sRNAs in the A. pleuropneumoniae genome were discovered, of which 40 sRNAs were never reported in previous studies. The transcriptome map also enabled visualization of 5'- and 3'-UTR regions, which contained 11 sRNAs. In addition, 351 operons covering 1230 genes throughout the whole genome were identified. The RNA-Seq based transcriptome map validated annotated genes and corrected annotations of open reading frames in the genome, and led to the identification of many functional elements (e.g. regions encoding novel proteins, non-coding sRNAs and operon structures). The transcriptional units

  4. RNA-seq: technical variability and sampling

    Science.gov (United States)

    2011-01-01

    Background: RNA-seq is revolutionizing the way we study transcriptomes. mRNA can be surveyed without prior knowledge of gene transcripts. Alternative splicing of transcript isoforms and the identification of previously unknown exons are being reported. Initial reports of differences in exon usage, and splicing between samples as well as quantitative differences among samples are beginning to surface. Biological variation has been reported to be larger than technical variation. In addition, technical variation has been reported to be in line with expectations due to random sampling. However, strategies for dealing with technical variation will differ depending on the magnitude. The size of technical variance, and the role of sampling are examined in this manuscript. Results: In this study three independent Solexa/Illumina experiments containing technical replicates are analyzed. When coverage is low, large disagreements between technical replicates are apparent. Exon detection between technical replicates is highly variable when the coverage is less than 5 reads per nucleotide and estimates of gene expression are more likely to disagree when coverage is low, although large disagreements in the estimates of expression are observed at all levels of coverage. Conclusions: Technical variability is too high to ignore. Technical variability results in inconsistent detection of exons at low levels of coverage. Further, the estimate of the relative abundance of a transcript can substantially disagree, even when coverage levels are high. This may be due to the low sampling fraction and if so, it will persist as an issue needing to be addressed in experimental design even as the next wave of technology produces larger numbers of reads. We provide practical recommendations for dealing with the technical variability, without dramatic cost increases. PMID:21645359

  5. A sensitive short read homology search tool for paired-end read sequencing data.

    Science.gov (United States)

    Techa-Angkoon, Prapaporn; Sun, Yanni; Lei, Jikai

    2017-10-16

    Homology search is still a significant step in functional analysis for genomic data. Profile Hidden Markov Model-based homology search has been widely used in protein domain analysis in many different species. In particular, with the fast accumulation of transcriptomic data of non-model species and metagenomic data, profile homology search is widely adopted in integrated pipelines for functional analysis. While the state-of-the-art tool HMMER has achieved high sensitivity and accuracy in domain annotation, the sensitivity of HMMER on short reads declines rapidly. The low sensitivity on short read homology search can lead to inaccurate domain composition and abundance computation. Our experimental results showed that half of the reads were missed by HMMER for an RNA-Seq dataset. Thus, there is a need for better methods to improve the homology search performance for short reads. We introduce a profile homology search tool named Short-Pair that is designed for short paired-end reads. By using an approximate Bayesian approach employing distribution of fragment lengths and alignment scores, Short-Pair can retrieve the missing end and determine true domains. In particular, Short-Pair increases the accuracy in aligning short reads that are part of remote homologs. We applied Short-Pair to an RNA-Seq dataset and a metagenomic dataset and quantified its sensitivity and accuracy on homology search. The experimental results show that Short-Pair can achieve better overall performance than the state-of-the-art methodology of profile homology search. Short-Pair is best used for next-generation sequencing (NGS) data that lack reference genomes. It provides a complementary paired-end read homology search tool to HMMER. The source code is freely available at https://sourceforge.net/projects/short-pair/ .

  6. Immobilization of cadmium in soils by UV-mutated Bacillus subtilis 38 bioaugmentation and NovoGro amendment

    International Nuclear Information System (INIS)

    Jiang Chunxiao; Sun Hongwen; Sun Tieheng; Zhang Qingmin; Zhang Yanfeng

    2009-01-01

    Immobilization of cadmium (10 mg Cd per kilogram soil) in soil by bioaugmentation of a UV-mutated microorganism, Bacillus subtilis 38 accompanied with amendment of a bio-fertilizer, NovoGro was investigated using extractable cadmium (E-Cd) by DTPA. B. subtilis 38, the mutant with the strongest resistance against Cd, could bioaccumulate Cd four times greater than the original wild type. Single bioaugmentation of B. subtilis 38 (SB treatment) to soil however did not reduce E-Cd significantly, while the amendment of NovoGro (SN treatment) reduced E-Cd remarkably. Simultaneous application of B. subtilis 38 and NovoGro (SNB treatment) exhibited a synergetic effect compared to the single SB and SN treatment. The immobilization effect was significantly affected by temperature, soil moisture, and pH. It seems that the immobilization of Cd reached the maximum when environmental conditions favored the activity of microorganisms. Under the optimum conditions, after 90 days incubation, E-Cd was 3.34, 3.39, 2.25 and 0.87 mg kg⁻¹ in the control soil, SB, SN and SNB soils, respectively. NovoGro not only showed a great capacity for Cd adsorption, but also promoted the growth of B. subtilis 38. This study provides a potential cost-effective technique for in situ remediation of Cd contaminated soils with bioaugmentation.

  7. A comprehensive evaluation of alignment algorithms in the context of RNA-seq.

    Directory of Open Access Journals (Sweden)

    Robert Lindner

    Full Text Available Transcriptome sequencing (RNA-Seq) overcomes limitations of previously used RNA quantification methods and provides one experimental framework for both high-throughput characterization and quantification of transcripts at the nucleotide level. The first step and a major challenge in the analysis of such experiments is the mapping of sequencing reads to a transcriptomic origin including the identification of splicing events. In recent years, a large number of such mapping algorithms have been developed, all of which have in common that they require algorithms for aligning a vast number of reads to genomic or transcriptomic sequences. Although the FM-index based aligner Bowtie has become a de facto standard within mapping pipelines, a much larger number of possible alignment algorithms have been developed also including other variants of FM-index based aligners. Accordingly, developers and users of RNA-seq mapping pipelines have the choice among a large number of available alignment algorithms. To provide guidance in the choice of alignment algorithms for these purposes, we evaluated the performance of 14 widely used alignment programs from three different algorithmic classes: algorithms using either hashing of the reference transcriptome, hashing of reads, or a compressed FM-index representation of the genome. Here, special emphasis was placed on both precision and recall and the performance for different read lengths and numbers of mismatches and indels in a read. Our results clearly showed the significant reduction in memory footprint and runtime provided by FM-index based aligners at a precision and recall comparable to the best hash table based aligners. Furthermore, the recently developed Bowtie 2 alignment algorithm shows a remarkable tolerance to both sequencing errors and indels, thus, essentially making hash-based aligners obsolete.
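    For readers unfamiliar with the FM-index class of aligners named in the abstract above, the following Python sketch shows exact-match backward search over a Burrows-Wheeler transform. It is a teaching-sized illustration only: the index is built naively and stored uncompressed, mismatches and indels are not handled, and none of the function names come from Bowtie or any other aligner.

```python
from collections import Counter
from typing import Dict, List, Tuple

Index = Tuple[List[int], Dict[str, int], Dict[str, List[int]]]


def build_fm_index(text: str) -> Index:
    """Build suffix array, C table and Occ table for `text` (naive construction)."""
    text += "$"                                   # sentinel, smaller than A/C/G/T
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    bwt = "".join(text[i - 1] for i in sa)        # character preceding each suffix
    counts = Counter(text)
    c_table, total = {}, 0
    for ch in sorted(counts):                     # C[ch]: characters smaller than ch
        c_table[ch] = total
        total += counts[ch]
    occ = {ch: [0] * (len(bwt) + 1) for ch in counts}
    for i, b in enumerate(bwt):                   # occ[ch][i]: count of ch in bwt[:i]
        for ch in occ:
            occ[ch][i + 1] = occ[ch][i] + (1 if b == ch else 0)
    return sa, c_table, occ


def find(pattern: str, index: Index) -> List[int]:
    """Return sorted start positions of exact matches of `pattern`."""
    sa, c_table, occ = index
    lo, hi = 0, len(sa)                           # current suffix-array interval
    for ch in reversed(pattern):                  # extend the match leftwards
        if ch not in c_table:
            return []
        lo = c_table[ch] + occ[ch][lo]
        hi = c_table[ch] + occ[ch][hi]
        if lo >= hi:
            return []
    return sorted(sa[lo:hi])


if __name__ == "__main__":
    idx = build_fm_index("BANANA")
    print(find("ANA", idx))
```

    Each pattern character costs two table lookups regardless of reference size, which hints at why the abstract reports lower runtime for this class; real aligners additionally compress the tables and sample the suffix array to cut memory.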

  8. Use of hold-gro erosion control fabric in the establishment of plant species on coal mine soil.

    Science.gov (United States)

    Day, A D; Ludeke, K L

    1986-09-01

    Experiments were conducted on the Black Mesa Coal Mine, Kayenta, Arizona in 1977 and 1978 to study the effectiveness of Hold-Gro Erosion Control Fabric (a product from the Gulf States Paper Corporation, Tuscaloosa, Alabama) in the establishment of plants on coal mine soil following the surface mining of coal. Four plant species were planted: (1) spring barley (Hordeum vulgare L.), an annual grass (2) crested wheatgrass (Agropyron cristatum L.), a perennial grass (3) alfalfa (lucerne) (Medicago sativa L.), a perennial legume and (4) fourwing saltbush (Atriplex canescens Pursh.), a perennial shrub. Seeds of each plant species were planted in reclaimed coal mine soil in the spring of the year by both broadcast seeding (conventional culture) and the incorporation of seeds in Hold-Gro Erosion Control Fabric. Average numbers of seedlings established and percent ground cover for all species studied were higher in areas where conventional culture was used than they were in areas where seeds were incorporated in Hold-Gro Erosion Control Fabric. The incorporation of seeds in Hold-Gro Erosion Control Fabric in the establishment of plant species on coal mine soil was not an effective cultural practice in the southwestern United States.

  9. Use of Hold-Gro Erosion Control Fabric in the establishment of plant species on coal mine soil

    Energy Technology Data Exchange (ETDEWEB)

    Day, A.D.; Ludeke, K.L.

    1986-09-01

    Experiments were conducted on the Black Mesa Coal Mine, Kayenta, Arizona in 1977 and 1978 to study the effectiveness of Hold-Gro Erosion Control Fabric (a product from the Gulf States Paper Corporation, Tuscaloosa, Alabama) in the establishment of plants on coal mine soil following the surface mining of coal. Four plant species were planted: spring barley (Hordeum vulgare L.), an annual grass; crested wheatgrass (Agropyron cristatum L.), a perennial grass; alfalfa (lucerne) (Medicago sativa L.), a perennial legume; and fourwing saltbush (Atriplex canescens Pursh.), a perennial shrub. Seeds of each plant species were planted in reclaimed coal mine soil in the spring of the year by both broadcast seeding (conventional culture) and the incorporation of seeds in Hold-Gro Erosion Control Fabric. Average numbers of seedlings established and percent ground cover for all species studied were higher in areas where conventional culture was used than they were in areas where seeds were incorporated in Hold-Gro Erosion Control Fabric. The incorporation of seeds in Hold-Gro Erosion Control Fabric in the establishment of plant species on coal mine soil was not an effective cultural practice in the southwestern United States. 11 refs.

  10. miRge - A Multiplexed Method of Processing Small RNA-Seq Data to Determine MicroRNA Entropy.

    Directory of Open Access Journals (Sweden)

    Alexander S Baras

    Small RNA RNA-seq for microRNAs (miRNAs) is a rapidly developing field where opportunities still exist to create better bioinformatics tools to process these large datasets and generate new, useful analyses. We built miRge to be a fast, smart small RNA-seq solution to process samples in a highly multiplexed fashion. miRge employs a Bayesian alignment approach, whereby reads are sequentially aligned against customized mature miRNA, hairpin miRNA, noncoding RNA and mRNA sequence libraries. miRNAs are summarized at the level of raw reads in addition to reads per million (RPM). Reads for all other RNA species (tRNA, rRNA, snoRNA, mRNA) are provided, which is useful for identifying potential contaminants and optimizing small RNA purification strategies. miRge was designed to optimally identify miRNA isomiRs and employs an entropy-based statistical measurement to identify differential production of isomiRs. This allowed us to identify decreasing entropy in isomiRs as stem cells mature into retinal pigment epithelial cells. Conversely, we show that pancreatic tumor miRNAs have similar entropy to matched normal pancreatic tissues. In a head-to-head comparison with other miRNA analysis tools (miRExpress 2.0, sRNAbench, omiRAs, miRDeep2, Chimira, UEA small RNA Workbench), miRge was faster (4 to 32-fold) and was among the top-two methods in maximally aligning miRNA reads per sample. Moreover, miRge has no inherent limits to its multiplexing. miRge was capable of simultaneously analyzing 100 small RNA-Seq samples in 52 minutes, providing an integrated analysis of miRNA expression across all samples. As miRge was designed for analysis of single as well as multiple samples, miRge is an ideal tool for high and low-throughput users. miRge is freely available at http://atlas.pathology.jhu.edu/baras/miRge.html.
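
    The abstract mentions two summary quantities, reads per million (RPM) and an entropy measure over isomiRs, without giving formulas. The sketch below is only an illustration under stated assumptions: it uses plain Shannon entropy over the read counts of one miRNA's isomiRs, which may differ from miRge's actual statistic, and the function names and counts are hypothetical.

```python
import math


def reads_per_million(count, total_reads):
    """Normalize a raw read count to reads per million (RPM)."""
    return count * 1_000_000 / total_reads


def isomir_entropy(isomir_counts):
    """Shannon entropy (in bits) of reads spread across the isomiRs of one miRNA.

    Assumption: plain Shannon entropy; the abstract does not state the
    exact entropy definition used by miRge.
    """
    total = sum(isomir_counts)
    if total == 0:
        return 0.0
    entropy = 0.0
    for c in isomir_counts:
        if c > 0:
            p = c / total
            entropy -= p * math.log2(p)
    return entropy


# Hypothetical counts: reads spread evenly over four isomiRs versus
# reads dominated by a single isomiR.
even_spread = [25, 25, 25, 25]
one_dominant = [97, 1, 1, 1]
print(isomir_entropy(even_spread), isomir_entropy(one_dominant))
```

    Under this definition, a miRNA whose reads concentrate in one isomiR has lower entropy than one whose reads are spread evenly, which is the direction of change the abstract reports for maturing stem cells.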

  11. Quark enables semi-reference-based compression of RNA-seq data.

    Science.gov (United States)

    Sarkar, Hirak; Patro, Rob

    2017-11-01

    The past decade has seen an exponential increase in biological sequencing capacity, and there has been a simultaneous effort to help organize and archive some of the vast quantities of sequencing data that are being generated. Although these developments are tremendous from the perspective of maximizing the scientific utility of available data, they come with heavy costs. The storage and transmission of such vast amounts of sequencing data is expensive. We present Quark, a semi-reference-based compression tool designed for RNA-seq data. Quark makes use of a reference sequence when encoding reads, but produces a representation that can be decoded independently, without the need for a reference. This allows Quark to achieve markedly better compression rates than existing reference-free schemes, while still relieving the burden of assuming a specific, shared reference sequence between the encoder and decoder. We demonstrate that Quark achieves state-of-the-art compression rates, and that, typically, only a small fraction of the reference sequence must be encoded along with the reads to allow reference-free decompression. Quark is implemented in C++11, and is available under a GPLv3 license at www.github.com/COMBINE-lab/quark. Contact: rob.patro@cs.stonybrook.edu. Supplementary data are available at Bioinformatics online. © The Author 2017. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com

  12. MITIE: Simultaneous RNA-Seq-based transcript identification and quantification in multiple samples.

    Science.gov (United States)

    Behr, Jonas; Kahles, André; Zhong, Yi; Sreedharan, Vipin T; Drewe, Philipp; Rätsch, Gunnar

    2013-10-15

    High-throughput sequencing of mRNA (RNA-Seq) has led to tremendous improvements in the detection of expressed genes and reconstruction of RNA transcripts. However, the extensive dynamic range of gene expression, technical limitations and biases, as well as the observed complexity of the transcriptional landscape, pose profound computational challenges for transcriptome reconstruction. We present the novel framework MITIE (Mixed Integer Transcript IdEntification) for simultaneous transcript reconstruction and quantification. We define a likelihood function based on the negative binomial distribution, use a regularization approach to select a few transcripts collectively explaining the observed read data and show how to find the optimal solution using Mixed Integer Programming. MITIE can (i) take advantage of known transcripts, (ii) reconstruct and quantify transcripts simultaneously in multiple samples, and (iii) resolve the location of multi-mapping reads. It is designed for genome- and assembly-based transcriptome reconstruction. We present an extensive study based on realistic simulated RNA-Seq data. When compared with state-of-the-art approaches, MITIE proves to be significantly more sensitive and overall more accurate. Moreover, MITIE yields substantial performance gains when used with multiple samples. We applied our system to 38 Drosophila melanogaster modENCODE RNA-Seq libraries and estimated the sensitivity of reconstructing omitted transcript annotations and the specificity with respect to annotated transcripts. Our results corroborate that a well-motivated objective paired with appropriate optimization techniques leads to significant improvements over the state-of-the-art in transcriptome reconstruction. MITIE is implemented in C++ and is available from http://bioweb.me/mitie under the GPL license.
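
    To make the idea of a negative binomial likelihood combined with a sparsity-favoring regularizer concrete, here is a minimal sketch. It is not MITIE's objective: the mean/dispersion parameterization (variance = mu + alpha * mu^2), the simple count-of-active-transcripts penalty, and all names are assumptions for illustration, and the mixed integer optimization itself is not shown.

```python
import math


def nb_logpmf(k, mu, alpha):
    """Log-probability of count k under a negative binomial with mean mu > 0
    and dispersion alpha > 0 (variance = mu + alpha * mu**2)."""
    r = 1.0 / alpha          # "size" parameter
    p = r / (r + mu)         # success probability
    return (math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
            + r * math.log(p) + k * math.log1p(-p))


def penalized_objective(observed, expected, n_active, alpha, lam):
    """Negative log-likelihood of observed read counts given expected counts,
    plus a penalty lam for every transcript with non-zero abundance.

    Hypothetical helper: 'expected' stands for the per-region coverage implied
    by a candidate transcript set; how it is computed is not modeled here.
    """
    nll = -sum(nb_logpmf(k, mu, alpha) for k, mu in zip(observed, expected))
    return nll + lam * n_active


# Two candidate explanations of the same observed counts: the second needs
# two more transcripts and pays a larger penalty for the same fit.
obs, exp = [3, 7], [4.0, 6.0]
print(penalized_objective(obs, exp, n_active=1, alpha=0.5, lam=2.0))
print(penalized_objective(obs, exp, n_active=3, alpha=0.5, lam=2.0))
```

    With equal fit, the objective prefers the explanation with fewer active transcripts, which is the role the abstract assigns to regularization.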

  13. LookSeq: A browser-based viewer for deep sequencing data

    OpenAIRE

    Manske, Heinrich Magnus; Kwiatkowski, Dominic P.

    2009-01-01

    Sequencing a genome to great depth can be highly informative about heterogeneity within an individual or a population. Here we address the problem of how to visualize the multiple layers of information contained in deep sequencing data. We propose an interactive AJAX-based web viewer for browsing large data sets of aligned sequence reads. By enabling seamless browsing and fast zooming, the LookSeq program assists the user to assimilate information at different levels of resolution, from an ov...

  14. Defining the maize transcriptome de novo using deep RNA-Seq

    Energy Technology Data Exchange (ETDEWEB)

    Martin, Jeffrey; Gross, Stephen; Choi, Cindy; Zhang, Tao; Lindquist, Erika; Wei, Chia-Lin; Wang, Zhong

    2011-06-01

    De novo assembly of the transcriptome is crucial for functional genomics studies in bioenergy research, since many of the organisms lack high quality reference genomes. In a previous study we successfully de novo assembled simple eukaryote transcriptomes exclusively from short Illumina RNA-Seq reads [1]. However, extensive alternative splicing, present in most of the higher eukaryotes, poses a significant challenge for current short read assembly processes. Furthermore, the size of next-generation datasets, often large for plant genomes, presents an informatics challenge. To tackle these challenges we present a combined experimental and informatics strategy for de novo assembly in higher eukaryotes. Using maize as a test case, preliminary results suggest our approach can resolve transcript variants and improve gene annotations.

  15. Defining the maize transcriptome de novo using deep RNA-Seq

    Energy Technology Data Exchange (ETDEWEB)

    Martin, Jeffrey; Gross, Stephen; Choi, Cindy; Zhang, Tao; Lindquist, Erika; Wei, Chia-Lin; Wang, Zhong

    2011-06-02

    De novo assembly of the transcriptome is crucial for functional genomics studies in bioenergy research, since many of the organisms lack high quality reference genomes. In a previous study we successfully de novo assembled simple eukaryote transcriptomes exclusively from short Illumina RNA-Seq reads [1]. However, extensive alternative splicing, present in most of the higher eukaryotes, poses a significant challenge for current short read assembly processes. Furthermore, the size of next-generation datasets, often large for plant genomes, presents an informatics challenge. To tackle these challenges we present a combined experimental and informatics strategy for de novo assembly in higher eukaryotes. Using maize as a test case, preliminary results suggest our approach can resolve transcript variants and improve gene annotations.

  16. The Accretion Powered Spin-up of GRO 1750–27

    DEFF Research Database (Denmark)

    Kretschmar, P.; Shaw, S.; Hill, A. B.

    2009-01-01

    The transient Be X-ray pulsar GRO J1750-27 was originally detected in 1995 by CGRO/BATSE during a giant outburst. After a long period of quiescence the source was detected in another outburst early 2008. Following this outburst with hard X-ray data from INTEGRAL and Swift, the orbital parameters...

  17. A ScaI RFLP demonstrated for the GRO gene on chromosome 4

    Energy Technology Data Exchange (ETDEWEB)

    Beck, J.S.; Murray, J.C. (Univ. of Iowa Hospitals, Iowa City (USA)); Sager, R. (Dana Farber Cancer Institute, Boston, MA (USA))

    1989-11-11

    TC870 is a 0.85 kb fragment running from the EcoRI site 200 bp from the 5′ end of the human cDNA subcloned into the EcoRI site of pGEM3. ScaI detects a polymorphism with two variable bands of 19 kb and 16 kb. One strong constant band (14 kb) and two fainter constant bands (5 kb and 3 kb) are also present. The polymorphism type is unknown. GRO has been localized to 4q13-4q21 by somatic cell hybrid analysis and in situ hybridization. Codominant segregation and Hardy-Weinberg equilibrium were demonstrated in 20 CEPH families. The probe also was assigned to chromosome 4 with significant linkage to ALB, GC, INP10, MT2P1. GRO may be the same gene as MGSA.

  18. Swine transcriptome characterization by combined Iso-Seq and RNA-seq for annotating the emerging long read-based reference genome

    Science.gov (United States)

    PacBio long-read sequencing technology is increasingly popular in genome sequence assembly and transcriptome cataloguing. Recently, a new-generation pig reference genome was assembled based on long reads from this technology. To finely annotate this genome assembly, transcriptomes of nine tissues fr...

  19. An effective approach for identification of in vivo protein-DNA binding sites from paired-end ChIP-Seq data

    Directory of Open Access Journals (Sweden)

    Wilson Zoe A

    2010-02-01

    Background: ChIP-Seq, which combines chromatin immunoprecipitation (ChIP) with high-throughput massively parallel sequencing, is increasingly being used for identification of protein-DNA interactions in vivo in the genome. However, maximizing the effectiveness of data analysis of such sequences requires the development of new algorithms that are able to accurately predict DNA-protein binding sites. Results: Here, we present SIPeS (Site Identification from Paired-end Sequencing), a novel algorithm for precise identification of binding sites from short reads generated by paired-end Solexa ChIP-Seq technology. In this paper we used ChIP-Seq data from the Arabidopsis basic helix-loop-helix transcription factor ABORTED MICROSPORES (AMS), which is expressed within the anther during pollen development. The results show that SIPeS has better resolution for binding site identification compared to two existing ChIP-Seq peak detection algorithms, Cisgenome and MACS. Conclusions: When compared to Cisgenome and MACS, SIPeS shows better resolution for binding site discovery. Moreover, SIPeS is designed to calculate the mappable genome length accurately with the fragment length based on the paired-end reads. Dynamic baselines are also employed to effectively discriminate closely adjacent binding sites, which is of particular value when working with high-density genomes.

  20. ChimericSeq: An open-source, user-friendly interface for analyzing NGS data to identify and characterize viral-host chimeric sequences.

    Directory of Open Access Journals (Sweden)

    Fwu-Shan Shieh

    Identification of viral integration sites has been important in understanding the pathogenesis and progression of diseases associated with particular viral infections. The advent of next-generation sequencing (NGS) has enabled researchers to understand the impact that viral integration has on the host, such as tumorigenesis. Current computational methods to analyze NGS data of virus-host junction sites have been limited in terms of their accessibility to a broad user base. In this study, we developed a software application (named ChimericSeq) that is the first program of its kind to offer a graphical user interface and compatibility with both Windows and Mac operating systems, and that is optimized for effectively identifying and annotating virus-host chimeric reads within NGS data. In addition, ChimericSeq's pipeline implements custom filtering to remove artifacts and detect reads with quantitative analytical reporting to provide functional significance to discovered integration sites. The improved accessibility of ChimericSeq through a GUI in both Windows and Mac has potential to expand NGS analytical support to a broader spectrum of the scientific community.

  1. SraTailor: graphical user interface software for processing and visualizing ChIP-seq data.

    Science.gov (United States)

    Oki, Shinya; Maehara, Kazumitsu; Ohkawa, Yasuyuki; Meno, Chikara

    2014-12-01

    Raw data from ChIP-seq (chromatin immunoprecipitation combined with massively parallel DNA sequencing) experiments are deposited in public databases as SRAs (Sequence Read Archives) that are publicly available to all researchers. However, to graphically visualize ChIP-seq data of interest, the corresponding SRAs must be downloaded and converted into BigWig format, a process that involves complicated command-line processing. This task requires users to possess skill with script languages and sequence data processing, a requirement that prevents a wide range of biologists from exploiting SRAs. To address these challenges, we developed SraTailor, a GUI (Graphical User Interface) software package that automatically converts an SRA into a BigWig-formatted file. Simplicity of use is one of the most notable features of SraTailor: entering an accession number of an SRA and clicking the mouse are the only steps required to obtain BigWig-formatted files and to graphically visualize the extents of reads at given loci. SraTailor is also able to make peak calls, generate files of other formats, process users' own data, and accept various command-line-like options. Therefore, this software makes ChIP-seq data fully exploitable by a wide range of biologists. SraTailor is freely available at http://www.devbio.med.kyushu-u.ac.jp/sra_tailor/, and runs on both Mac and Windows machines. © 2014 The Authors Genes to Cells © 2014 by the Molecular Biology Society of Japan and Wiley Publishing Asia Pty Ltd.

  2. Strip detectors read-out system user's guide

    International Nuclear Information System (INIS)

    Claus, G.; Dulinski, W.; Lounis, A.

    1996-01-01

    The Strip Detector Read-out System consists of two VME modules, SDR-Flash and SDR-seq, complemented by a fast-logic stand-alone card, SDR-Trig. The system is a self-consistent, cost-effective and easy-to-use solution for the read-out of analog multiplexed signals coming from some of the front-end electronics chips (Viking/VA chips family, Premus 128, etc.) currently used together with solid (silicon) or gas microstrip detectors. (author)

  3. Professional ethics for librarians in practice – results of a student project from Germany and Great Britain

    Directory of Open Access Journals (Sweden)

    Jens Boye

    2011-10-01

    In both Germany and Great Britain, policy papers on professional ethics in the information and library sector have existed for several years. Beyond the mere existence of such codes, the question arises of what relevance these documents have in professional practice. This question was investigated in a team project at the FH Köln. Using a standardized questionnaire, German library directors were surveyed on the topic. In Great Britain, an interview was conducted with Paul Sturges; in addition, the online offerings of CILIP, in particular the Information Ethics Blog, were analyzed.

  4. An empirical strategy to detect bacterial transcript structure from directional RNA-seq transcriptome data.

    Science.gov (United States)

    Wang, Yejun; MacKenzie, Keith D; White, Aaron P

    2015-05-07

    As sequencing costs are being lowered continuously, RNA-seq has gradually been adopted as the first choice for comparative transcriptome studies with bacteria. Unlike microarrays, RNA-seq can directly detect cDNA derived from mRNA transcripts at a single nucleotide resolution. Not only does this allow researchers to determine the absolute expression level of genes, but it also conveys information about transcript structure. Few automatic software tools have yet been established to investigate large-scale RNA-seq data for bacterial transcript structure analysis. In this study, 54 directional RNA-seq libraries from Salmonella serovar Typhimurium (S. Typhimurium) 14028s were examined for potential relationships between read mapping patterns and transcript structure. We developed an empirical method, combined with statistical tests, to automatically detect key transcript features, including transcriptional start sites (TSSs), transcriptional termination sites (TTSs) and operon organization. Using our method, we obtained 2,764 TSSs and 1,467 TTSs for 1,331 and 844 different genes, respectively. Identification of TSSs facilitated further discrimination of 215 putative sigma 38 regulons and 863 potential sigma 70 regulons. Combining the TSSs and TTSs with intergenic distance and co-expression information, we comprehensively annotated the operon organization in S. Typhimurium 14028s. Our results show that directional RNA-seq can be used to detect transcriptional borders at an acceptable resolution of ±10-20 nucleotides. Technical limitations of the RNA-seq procedure may prevent single nucleotide resolution. The automatic transcript border detection methods, statistical models and operon organization pipeline that we have described could be widely applied to RNA-seq studies in other bacteria. Furthermore, the TSSs, TTSs, operons, promoters and untranslated regions that we have defined for S. Typhimurium 14028s may constitute valuable resources that can be used for

  5. Discovery of the 198 s X-Ray Pulsar GRO J2058+42

    Science.gov (United States)

    Wilson, Colleen A.; Finger, Mark H.; Harmon, B. Alan; Chakrabarty, Deepto; Strohmayer, Tod

    1997-01-01

    GRO J2058+42, a transient 198 second X-ray pulsar, was discovered by the Burst and Transient Source Experiment (BATSE) on the Compton Gamma-Ray Observatory (CGRO), during a "giant" outburst in 1995 September-October. The total flux peaked at about 300 mCrab (20-50 keV) as measured by Earth occultation. The pulse period decreased from 198 s to 196 s during the 46-day outburst. The pulse shape evolved over the course of the outburst and exhibited energy-dependent variations. BATSE observed five additional weak outbursts from GRO J2058+42, each with a two-week duration and a peak pulsed flux of about 15 mCrab (20-50 keV), that were spaced by about 110 days. An observation of the 1996 November outburst by the Rossi X-ray Timing Explorer (RXTE) Proportional Counter Array (PCA) localized the source to within a 4' radius error circle (90% confidence) centered on R.A. = 20 h 59 m.0, Decl. = 41 deg 43 min (J2000). Additional shorter outbursts with peak pulsed fluxes of about 8 mCrab were detected by BATSE halfway between the first four 15 mCrab outbursts. The RXTE All-Sky Monitor detected 8 weak outbursts with approximately equal durations and intensities. GRO J2058+42 is most likely a Be/X-ray binary that appears to outburst at periastron and apastron. No optical counterpart has been identified to date and no X-ray source was present in the error circle in archival ROSAT observations.
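
    As a rough worked estimate, the quoted period change can be turned into a mean spin-up rate. This takes the rounded values in the abstract (198 s to 196 s over 46 days) at face value and assumes a constant rate over the outburst, so the result is only an order-of-magnitude figure, not a value reported by the authors.

```latex
\begin{align*}
\Delta t &= 46\ \mathrm{d} \times 86400\ \mathrm{s\,d^{-1}} \approx 3.97 \times 10^{6}\ \mathrm{s} \\
\dot{P} &\approx \frac{196\ \mathrm{s} - 198\ \mathrm{s}}{3.97 \times 10^{6}\ \mathrm{s}}
        \approx -5.0 \times 10^{-7}\ \mathrm{s\,s^{-1}} \\
\dot{\nu} &= -\frac{\dot{P}}{P^{2}}
        \approx \frac{5.0 \times 10^{-7}}{(197\ \mathrm{s})^{2}}\ \mathrm{s^{-1}}
        \approx 1.3 \times 10^{-11}\ \mathrm{Hz\,s^{-1}}
\end{align*}
```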

  6. Discovery of the 198 Second X-Ray Pulsar GRO J2058+42

    Science.gov (United States)

    Wilson, Colleen A.; Finger, Mark H.; Harmon, B. Alan; Chakrabarty, Deepto; Strohmayer, Tod

    1998-01-01

    GRO J2058+42, a transient 198 s X-ray pulsar, was discovered by the Burst and Transient Source Experiment (BATSE) on the Compton Gamma Ray Observatory (CGRO) during a "giant" outburst in 1995 September-October. The total flux peaked at about 300 mcrab (20-50 keV) as measured by Earth occultation. The pulse period decreased from 198 to 196 s during the 46 day outburst. The pulse shape evolved over the course of the outburst and exhibited energy-dependent variations. BATSE observed five additional weak outbursts from GRO J2058+42, each with a 2 week duration and a peak-pulsed flux of about 15 mcrab (20-50 keV), that were spaced by about 110 days. An observation of the 1996 November outburst by the Rossi X-Ray Timing Explorer (RXTE) proportional counter array (PCA) localized the source to within a 4' radius error circle (90% confidence) centered on R.A. = 20h 59m.0, decl. = 41 deg 43' (J2000). Additional shorter outbursts with peak-pulsed fluxes of about 8 mcrab were detected by BATSE halfway between the first four 15 mcrab outbursts. The RXTE All-Sky Monitor detected all eight weak outbursts with approximately equal durations and intensities. GRO J2058+42 is most likely a Be/X-ray binary that appears to outburst at periastron and apastron. No optical counterpart has been identified to date, and no X-ray source was present in the error circle in archival ROSAT observations.

  7. Confirmation of a high magnetic field in GRO J1008–57

    International Nuclear Information System (INIS)

    Bellm, Eric C.; Fürst, Felix; Harrison, Fiona A.; Walton, Dominic J.; Pottschmidt, Katja; Tomsick, John A.; Boggs, Steven E.; Craig, William W.; Chakrabarty, Deepto; Christensen, Finn E.; Hailey, Charles J.; Stern, Daniel; Wilms, Jörn; Zhang, William W.

    2014-01-01

    GRO J1008–57 is a high-mass X-ray binary for which several claims of a cyclotron resonance scattering feature near 80 keV have been reported. We use NuSTAR, Suzaku, and Swift data from its giant outburst of 2012 November to confirm the existence of the 80 keV feature and perform the most sensitive search to date for cyclotron scattering features at lower energies. We find evidence for a $78^{+3}_{-2}$ keV line in the NuSTAR and Suzaku data at >4σ significance, confirming the detection using Suzaku alone by Yamamoto et al. A search of both the phase-averaged and phase-resolved data rules out a fundamental at lower energies with optical depth larger than 5% of the 78 keV line. These results indicate that GRO J1008–57 has a magnetic field of $6.7 \times 10^{12}\,(1 + z)$ G, the highest among known accreting pulsars.
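
    The step from the line energy to the field strength is not spelled out in the abstract. Assuming the 78 keV feature is the fundamental, the standard relation between the observed cyclotron line energy and the magnetic field in units of $10^{12}$ G, with $z$ the gravitational redshift at the emission site, gives the quoted value:

```latex
\begin{align*}
E_{\mathrm{cyc}} &\approx 11.57\ \mathrm{keV} \times \frac{B_{12}}{1 + z},
\qquad B_{12} \equiv \frac{B}{10^{12}\ \mathrm{G}} \\
B_{12} &\approx \frac{E_{\mathrm{cyc}}\,(1 + z)}{11.57\ \mathrm{keV}}
       = \frac{78}{11.57}\,(1 + z) \approx 6.7\,(1 + z)
\end{align*}
```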

  8. Evaluation of logistic regression models and effect of covariates for case-control study in RNA-Seq analysis.

    Science.gov (United States)

    Choi, Seung Hoan; Labadorf, Adam T; Myers, Richard H; Lunetta, Kathryn L; Dupuis, Josée; DeStefano, Anita L

    2017-02-06

    Next generation sequencing provides a count of RNA molecules in the form of short reads, yielding discrete, often highly non-normally distributed gene expression measurements. Although Negative Binomial (NB) regression has been generally accepted in the analysis of RNA sequencing (RNA-Seq) data, its appropriateness has not been exhaustively evaluated. We explore logistic regression as an alternative method for RNA-Seq studies designed to compare cases and controls, where disease status is modeled as a function of RNA-Seq reads using simulated and Huntington disease data. We evaluate the effect of adjusting for covariates that have an unknown relationship with gene expression. Finally, we incorporate the data adaptive method in order to compare false positive rates. When the sample size is small or the expression levels of a gene are highly dispersed, the NB regression shows inflated Type-I error rates but the Classical logistic and Bayes logistic (BL) regressions are conservative. Firth's logistic (FL) regression performs well or is slightly conservative. Large sample size and low dispersion generally make Type-I error rates of all methods close to nominal alpha levels of 0.05 and 0.01. However, Type-I error rates are controlled after applying the data adaptive method. The NB, BL, and FL regressions gain increased power with large sample size, large log2 fold-change, and low dispersion. The FL regression has comparable power to NB regression. We conclude that implementing the data adaptive method appropriately controls Type-I error rates in RNA-Seq analysis. Firth's logistic regression provides a concise statistical inference process and reduces spurious associations from inaccurately estimated dispersion parameters in the negative binomial framework.

  9. Analysis Of Transcriptomes In A Porcine Tissue Collection Using RNA-Seq And Genome Assembly 10

    DEFF Research Database (Denmark)

    Hornshøj, Henrik; Thomsen, Bo; Hedegaard, Jakob

    2011-01-01

    The release of Sus scrofa genome assembly 10 supports improvement of the pig genome annotation and in depth transcriptome analyses using next-generation sequencing technologies. In this study we analyze RNA-seq reads from a tissue collection, including 10 separate tissues from Duroc boars and 10...... short read alignment software we mapped the reads to the genome assembly 10. We extracted contig sequences of gene transcripts using the Cufflinks software. Based on this information we identified expressed genes that are present in the genome assembly. The portion of these genes being previously known...... was roughly estimated by sequence comparison to known genes. Similarly, we searched for genes that are expressed in the tissues but not present in the genome assembly by aligning the non-genome-mapped reads to known gene transcripts. For the genes predicted to have alternative transcript variants by Cufflinks...

  10. RNA-Seq analysis and annotation of a draft blueberry genome assembly identifies candidate genes involved in fruit ripening, biosynthesis of bioactive compounds, and stage-specific alternative splicing.

    Science.gov (United States)

    Gupta, Vikas; Estrada, April D; Blakley, Ivory; Reid, Rob; Patel, Ketan; Meyer, Mason D; Andersen, Stig Uggerhøj; Brown, Allan F; Lila, Mary Ann; Loraine, Ann E

    2015-01-01

    Blueberries are a rich source of antioxidants and other beneficial compounds that can protect against disease. Identifying genes involved in synthesis of bioactive compounds could enable the breeding of berry varieties with enhanced health benefits. Toward this end, we annotated a previously sequenced draft blueberry genome assembly using RNA-Seq data from five stages of berry fruit development and ripening. Genome-guided assembly of RNA-Seq read alignments combined with output from ab initio gene finders produced around 60,000 gene models, of which more than half were similar to proteins from other species, typically the grape Vitis vinifera. Comparison of gene models to the PlantCyc database of metabolic pathway enzymes identified candidate genes involved in synthesis of bioactive compounds, including bixin, an apocarotenoid with potential disease-fighting properties, and defense-related cyanogenic glycosides, which are toxic. Cyanogenic glycoside (CG) biosynthetic enzymes were highly expressed in green fruit, and a candidate CG detoxification enzyme was up-regulated during fruit ripening. Candidate genes for ethylene, anthocyanin, and 400 other biosynthetic pathways were also identified. Homology-based annotation using Blast2GO and InterPro assigned Gene Ontology terms to around 15,000 genes. RNA-Seq expression profiling showed that blueberry growth, maturation, and ripening involve dynamic gene expression changes, including coordinated up- and down-regulation of metabolic pathway enzymes and transcriptional regulators. Analysis of RNA-seq alignments identified developmentally regulated alternative splicing, promoter use, and 3' end formation. We report genome sequence, gene models, functional annotations, and RNA-Seq expression data that provide an important new resource enabling high throughput studies in blueberry.

  11. Making the most of RNA-seq: Pre-processing sequencing data with Opossum for reliable SNP variant detection [version 2; referees: 2 approved, 1 approved with reservations]

    Directory of Open Access Journals (Sweden)

    Laura Oikkonen

    2017-03-01

    Identifying variants from RNA-seq (transcriptome sequencing) data is a cost-effective and versatile complement to whole-exome (WES) and whole-genome sequencing (WGS) analysis. RNA-seq is primarily considered a method of gene expression analysis, but it can also be used to detect DNA variants in expressed regions of the genome. However, current variant callers do not generally behave well with RNA-seq data due to reads encompassing intronic regions. We have developed a software programme called Opossum to address this problem. Opossum pre-processes RNA-seq reads prior to variant calling, and although it has been designed to work specifically with Platypus, it can be used equally well with other variant callers such as GATK HaplotypeCaller. In this work, we show that using Opossum in conjunction with either Platypus or GATK HaplotypeCaller maintains precision and improves the sensitivity for SNP detection compared to the GATK Best Practices pipeline. In addition, using it in combination with Platypus offers a substantial reduction in run times compared to the GATK pipeline, so it is ideal when there are only limited time or computational resources available.

  12. ExpEdit: a webserver to explore human RNA editing in RNA-Seq experiments.

    Science.gov (United States)

    Picardi, Ernesto; D'Antonio, Mattia; Carrabino, Danilo; Castrignanò, Tiziana; Pesole, Graziano

    2011-05-01

    ExpEdit is a web application for assessing RNA editing in human at known or user-specified sites supported by transcript data obtained by RNA-Seq experiments. Mapping data (in SAM/BAM format) or directly sequence reads [in FASTQ/short read archive (SRA) format] can be provided as input to carry out a comparative analysis against a large collection of known editing sites collected in DARNED database as well as other user-provided potentially edited positions. Results are shown as dynamic tables containing University of California, Santa Cruz (UCSC) links for a quick examination of the genomic context. ExpEdit is freely available on the web at http://www.caspur.it/ExpEdit/.

  13. Making the most of RNA-seq: Pre-processing sequencing data with Opossum for reliable SNP variant detection [version 1; referees: 2 approved, 1 approved with reservations]

    Directory of Open Access Journals (Sweden)

    Laura Oikkonen

    2017-01-01

    Identifying variants from RNA-seq (transcriptome sequencing) data is a cost-effective and versatile alternative to whole-genome sequencing. However, current variant callers do not generally behave well with RNA-seq data due to reads encompassing intronic regions. We have developed a software programme called Opossum to address this problem. Opossum pre-processes RNA-seq reads prior to variant calling, and although it has been designed to work specifically with Platypus, it can be used equally well with other variant callers such as GATK HaplotypeCaller. In this work, we show that using Opossum in conjunction with either Platypus or GATK HaplotypeCaller maintains precision and improves the sensitivity for SNP detection compared to the GATK Best Practices pipeline. In addition, using it in combination with Platypus offers a substantial reduction in run times compared to the GATK pipeline, so it is ideal when there are only limited time or computational resources available.

  14. rSeqNP: a non-parametric approach for detecting differential expression and splicing from RNA-Seq data.

    Science.gov (United States)

    Shi, Yang; Chinnaiyan, Arul M; Jiang, Hui

    2015-07-01

    High-throughput sequencing of transcriptomes (RNA-Seq) has become a powerful tool to study gene expression. Here we present an R package, rSeqNP, which implements a non-parametric approach to test for differential expression and splicing from RNA-Seq data. rSeqNP uses permutation tests to assess statistical significance and can be applied to a variety of experimental designs. By combining information across isoforms, rSeqNP is able to detect more differentially expressed or spliced genes from RNA-Seq data. The R package, with its source code and documentation, is freely available at http://www-personal.umich.edu/~jianghui/rseqnp/. Contact: jianghui@umich.edu. Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press.
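
The permutation-testing idea mentioned in this abstract can be illustrated with a minimal sketch. It is not rSeqNP's code (rSeqNP is an R package, and its test statistics are not described here). The difference of group means used below is an assumed, generic statistic, and the function name and parameters are hypothetical.

```python
import random


def permutation_p_value(group_a, group_b, n_permutations=10000, seed=0):
    """Two-sided permutation test on the absolute difference of group means.

    Assumption: the mean difference stands in for whatever statistic a real
    tool would use; labels are shuffled to build the null distribution.
    """
    rng = random.Random(seed)

    def mean(values):
        return sum(values) / len(values)

    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)

    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # random reassignment of sample labels
        stat = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if stat >= observed:
            hits += 1

    # Add-one correction so the estimated p-value is never exactly zero.
    return (hits + 1) / (n_permutations + 1)


if __name__ == "__main__":
    control = [1, 2, 3, 4, 5]            # hypothetical expression values
    treated = [101, 102, 103, 104, 105]  # hypothetical expression values
    print(permutation_p_value(control, treated))
```

Because the null distribution comes from the data themselves, no assumption about the read-count distribution is needed, which is the appeal of a non-parametric approach.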

  15. CcpA affects expression of the groESL and dnaK operons in Lactobacillus plantarum

    Directory of Open Access Journals (Sweden)

    Marasco Rosangela

    2006-11-01

    Background: Lactic acid bacteria (LAB) are widely used in the food industry and their growth performance is important for the quality of the fermented product. During industrial processes, changes in temperature may represent an environmental stress to be overcome by starter and non-starter LAB. Studies on adaptation to heat shock have shown the involvement of the chaperone system proteins in various Gram-positive bacteria. The corresponding operons, namely the dnaK and groESL operons, are controlled by a negative mechanism involving the HrcA repressor protein binding to the cis-acting element CIRCE. Results: We studied adaptation to heat shock in the lactic acid bacterium Lactobacillus plantarum. The LM3-2 strain, carrying a null mutation in the ccpA gene, encoding the catabolite control protein A (CcpA), showed a lower percentage of survival at high temperature than the LM3 wild-type strain. Among proteins differentially expressed in the two strains, the GroES chaperone was more abundant in the wild-type strain than in the mutant strain under standard growth conditions. Transcriptional studies showed that class I heat shock operons were differentially expressed upon heat shock in both strains. Indeed, the dnaK and groESL operons were induced about two times more in the LM3 strain than in the LM3-2 strain. Analysis of the regulatory region of the two operons showed the presence of cre sequences, putative binding sites for the CcpA protein. Conclusion: The L. plantarum dnaK and groESL operons are characterized by the presence of the cis-acting sequence CIRCE in the promoter region, suggesting a negative regulation by the HrcA/CIRCE system, which is a common type of control among the class I heat shock operons of Gram-positive bacteria. We found an additional system of regulation, based on a positive control exerted by the CcpA protein, which would interact with cre sequences present in the regulatory region of the dnaK and gro

  16. PolyaPeak: Detecting Transcription Factor Binding Sites from ChIP-seq Using Peak Shape Information

    Science.gov (United States)

    Wu, Hao; Ji, Hongkai

    2014-01-01

    ChIP-seq is a powerful technology for detecting genomic regions where a protein of interest interacts with DNA. ChIP-seq data for mapping transcription factor binding sites (TFBSs) have a characteristic pattern: around each binding site, sequence reads aligned to the forward and reverse strands of the reference genome form two separate peaks shifted away from each other, and the true binding site is located in between these two peaks. While it has been shown previously that the accuracy and resolution of binding site detection can be improved by modeling the pattern, efficient methods that fully utilize that information in the TFBS detection procedure are unavailable. We present PolyaPeak, a new method to improve TFBS detection by incorporating the peak shape information. PolyaPeak describes peak shapes using a flexible Pólya model. The shapes are automatically learnt from the data using a Minorization-Maximization (MM) algorithm, then integrated with the read count information via a hierarchical model to distinguish true binding sites from background noise. Extensive real data analyses show that PolyaPeak is capable of robustly improving TFBS detection compared with existing methods. An R package is freely available. PMID:24608116

  17. CEL-Seq: Single-Cell RNA-Seq by Multiplexed Linear Amplification

    Directory of Open Access Journals (Sweden)

    Tamar Hashimshony

    2012-09-01

    High-throughput sequencing has allowed for unprecedented detail in gene expression analyses, yet its efficient application to single cells is challenged by the small starting amounts of RNA. We have developed CEL-Seq, a method for overcoming this limitation by barcoding and pooling samples before linearly amplifying mRNA with the use of one round of in vitro transcription. We show that CEL-Seq gives more reproducible, linear, and sensitive results than a PCR-based amplification method. We demonstrate the power of this method by studying early C. elegans embryonic development at single-cell resolution. Differential distribution of transcripts between sister cells is seen as early as the two-cell stage embryo, and zygotic expression in the somatic cell lineages is enriched for transcription factors. The robust transcriptome quantifications enabled by CEL-Seq will be useful for transcriptomic analyses of complex tissues containing populations of diverse cell types.

  18. GenoGAM: genome-wide generalized additive models for ChIP-Seq analysis.

    Science.gov (United States)

    Stricker, Georg; Engelhardt, Alexander; Schulz, Daniel; Schmid, Matthias; Tresch, Achim; Gagneur, Julien

    2017-08-01

    Chromatin immunoprecipitation followed by deep sequencing (ChIP-Seq) is a widely used approach to study protein-DNA interactions. Often, the quantities of interest are the differential occupancies relative to controls, between genetic backgrounds, treatments, or combinations thereof. Current methods for differential occupancy analysis of ChIP-Seq data, however, rely on binning or sliding-window techniques, for which the choice of window and bin sizes is subjective. Here, we present GenoGAM (Genome-wide Generalized Additive Model), which brings the well-established and flexible generalized additive models framework to genomic applications using a data parallelism strategy. We model ChIP-Seq read count frequencies as products of smooth functions along chromosomes. Smoothing parameters are objectively estimated from the data by cross-validation, eliminating the ad hoc binning and windowing needed by current approaches. GenoGAM provides base-level and region-level significance testing for full factorial designs. Application to a ChIP-Seq dataset in yeast showed increased sensitivity over existing differential occupancy methods while controlling for type I error rate. By analyzing a set of DNA methylation data and illustrating an extension to a peak caller, we further demonstrate the potential of GenoGAM as a generic statistical modeling tool for genome-wide assays. Software is available from Bioconductor: https://www.bioconductor.org/packages/release/bioc/html/GenoGAM.html. Contact: gagneur@in.tum.de. Supplementary information is available at Bioinformatics online. © The Author (2017). Published by Oxford University Press.

  19. Impact of genome assembly status on ChIP-Seq and ChIP-PET data mapping

    Directory of Open Access Journals (Sweden)

    Sachs Laurent

    2009-12-01

    Background: ChIP-Seq and ChIP-PET can potentially be used with any genome for genome-wide profiling of protein-DNA interaction sites. Unfortunately, it is probable that most genome assemblies will never reach the quality of the human genome assembly. Therefore, it remains to be determined whether ChIP-Seq and ChIP-PET are practicable with genome sequences other than a few (e.g. human and mouse). Findings: Here, we used in silico simulations to assess the impact of completeness or fragmentation of genome assemblies on ChIP-Seq and ChIP-PET data mapping. Conclusions: Most currently published genome assemblies are suitable for mapping the short sequence tags produced by ChIP-Seq or ChIP-PET.

  20. Molecular cloning of the potato Gro1-4 gene conferring resistance to pathotype Ro1 of the root cyst nematode Globodera rostochiensis, based on a candidate gene approach.

    Science.gov (United States)

    Paal, Jürgen; Henselewski, Heike; Muth, Jost; Meksem, Khalid; Menéndez, Cristina M; Salamini, Francesco; Ballvora, Agim; Gebhardt, Christiane

    2004-04-01

    The endoparasitic root cyst nematode Globodera rostochiensis causes considerable damage in potato cultivation. In the past, major genes for nematode resistance have been introgressed from related potato species into cultivars. Elucidating the molecular basis of resistance will contribute to the understanding of nematode-plant interactions and assist in breeding nematode-resistant cultivars. The Gro1 resistance locus to G. rostochiensis on potato chromosome VII co-localized with a resistance-gene-like (RGL) DNA marker. This marker was used to isolate from genomic libraries 15 members of a closely related candidate gene family. Analysis of inheritance, linkage mapping, and sequencing reduced the number of candidate genes to three. Complementation analysis by stable potato transformation showed that the gene Gro1-4 conferred resistance to G. rostochiensis pathotype Ro1. Gro1-4 encodes a protein of 1136 amino acids that contains Toll-interleukin 1 receptor (TIR), nucleotide-binding (NB), leucine-rich repeat (LRR) homology domains and a C-terminal domain with unknown function. The deduced Gro1-4 protein differed by 29 amino acid changes from susceptible members of the Gro1 gene family. Sequence characterization of 13 members of the Gro1 gene family revealed putative regulatory elements and a variable microsatellite in the promoter region, insertion of a retrotransposon-like element in the first intron, and a stop codon in the NB coding region of some genes. Sequence analysis of RT-PCR products showed that Gro1-4 is expressed, among other members of the family including putative pseudogenes, in non-infected roots of nematode-resistant plants. RT-PCR also demonstrated that members of the Gro1 gene family are expressed in most potato tissues.

  1. Tandem differential mobility analysis-mass spectrometry reveals partial gas-phase collapse of the GroEL complex.

    Science.gov (United States)

    Hogan, Christopher J; Ruotolo, Brandon T; Robinson, Carol V; Fernandez de la Mora, Juan

    2011-04-07

    A parallel-plate differential mobility analyzer and a time-of-flight mass spectrometer (DMA-MS) are used in series to measure true mobility in dry atmospheric pressure air for mass-resolved electrosprayed GroEL tetradecamers (14-mers; ~800 kDa). Narrow mobility peaks are found (2.6-2.9% fwhm); hence, precise mobilities can be obtained for these ions without collisional activation, just following their generation by electrospray ionization. In contrast to previous studies, two conformers are found with mobilities (Z) differing by ~5% at charge state z ~ 79. By extrapolating to small z, a common mobility/charge ratio Z(0)/z = 0.0117 cm² V⁻¹ s⁻¹ is found for both conformers. When interpreted as if the GroEL ion surface were smooth and the gas molecule-protein collisions were perfectly elastic and specular, this mobility yields an experimental collision cross section, Ω, 11% smaller than in an earlier measurement, and close to the cross section, A(C,crystal), expected for the crystal structure (determined by a geometric approximation). However, the similarity between Ω and A(C,crystal) does not imply a coincidence between the native and gas-phase structures. The nonideal nature of protein-gas molecule collisions introduces a drag enhancement factor, ξ = 1.36, with which the true cross section A(C) is related to Ω via A(C) = Ω/ξ. Therefore, A(C) for GroEL 14-mer ions determined by DMA measurements is 0.69A(C,crystal). The factor 1.36 used here is based on the experimental Stokes-Millikan equation, as well as on prior and new numerical modeling accounting for multiple scattering events via exact hard-sphere scattering calculations. Therefore, we conclude that the gas-phase structure of the GroEL complex as electrosprayed is substantially more compact than the corresponding X-ray crystal structure.
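
The figures quoted in this abstract can be cross-checked with a short calculation. Writing the abstract's A(C) and A(C,crystal) as $A_C$ and $A_{C,\mathrm{crystal}}$, and using only the values stated above (ξ = 1.36 and $A_C = 0.69\,A_{C,\mathrm{crystal}}$), the implied ratio of Ω to the crystal cross section follows directly. The rounded value of 0.94 is derived here and is not quoted in the abstract.

```latex
\begin{align}
A_C &= \frac{\Omega}{\xi}, \qquad \xi = 1.36, \\
\Omega &= \xi\,A_C = 1.36 \times 0.69\,A_{C,\mathrm{crystal}} \approx 0.94\,A_{C,\mathrm{crystal}} .
\end{align}
```

This is consistent with the statement that Ω is close to the crystal-structure cross section, while the true cross section is roughly 31% smaller.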

  2. RNA-seq analysis of Quercus pubescens Leaves: de novo transcriptome assembly, annotation and functional markers development.

    Directory of Open Access Journals (Sweden)

    Sara Torre

    Quercus pubescens Willd., a species distributed from Spain to southwest Asia, ranks high for drought tolerance among European oaks. Q. pubescens performs a role of outstanding significance in most Mediterranean forest ecosystems, but few mechanistic studies have been conducted to explore its response to environmental constraints, due to the lack of genomic resources. In our study, we performed deep transcriptomic sequencing in Q. pubescens leaves, including de novo assembly, functional annotation and the identification of new molecular markers. Our results are a prerequisite for undertaking molecular functional studies, and may support population and association genetic studies. 254,265,700 clean reads were generated by the Illumina HiSeq 2000 platform, with an average length of 98 bp. De novo assembly, using CLC Genomics, produced 96,006 contigs with a mean length of 618 bp. Sequence similarity analyses against seven public databases (Uniprot, NR, RefSeq and KOGs at NCBI, Pfam, InterPro and KEGG) resulted in 83,065 transcripts annotated with gene descriptions, conserved protein domains, or gene ontology terms. These annotations and local BLAST allowed us to identify genes specifically associated with mechanisms of drought avoidance. Finally, 14,202 microsatellite markers and 18,425 single nucleotide polymorphisms (SNPs) were discovered in silico in assembled and annotated sequences. We completed a successful global analysis of the Q. pubescens leaf transcriptome using RNA-seq. The assembled and annotated sequences, together with the newly discovered molecular markers, provide genomic information for functional genomic studies in Q. pubescens, with special emphasis on response mechanisms to the severe constraints of the Mediterranean climate. Our tools enable comparative genomics studies on other Quercus species, taking advantage of large intra-specific ecophysiological differences.

  3. An empirical likelihood ratio test robust to individual heterogeneity for differential expression analysis of RNA-seq.

    Science.gov (United States)

    Xu, Maoqi; Chen, Liang

    2018-01-01

    Individual sample heterogeneity is one of the biggest obstacles in biomarker identification for complex diseases such as cancers. Current statistical models to identify differentially expressed genes between disease and control groups often overlook the substantial human sample heterogeneity. Meanwhile, traditional nonparametric tests lose detailed data information and sacrifice analysis power, although they are distribution-free and robust to heterogeneity. Here, we propose an empirical likelihood ratio test with a mean-variance relationship constraint (ELTSeq) for the differential expression analysis of RNA sequencing (RNA-seq). As a distribution-free nonparametric model, ELTSeq handles individual heterogeneity by estimating an empirical probability for each observation without making any assumption about read-count distribution. It also incorporates a constraint for the read-count overdispersion, which is widely observed in RNA-seq data. ELTSeq demonstrates a significant improvement over existing methods such as edgeR, DESeq, t-tests, Wilcoxon tests and the classic empirical likelihood-ratio test when handling heterogeneous groups. It will significantly advance the transcriptomics studies of cancers and other complex diseases. © The Author 2016. Published by Oxford University Press.

  4. Strontium-Substituted Bioceramics Particles: A New Way to Modulate MCP-1 and Gro-α Production by Human Primary Osteoblastic Cells

    Directory of Open Access Journals (Sweden)

    Julien Braux

    2016-12-01

    Background: To avoid the morbidity and limited availability associated with autografts, synthetic calcium phosphate (CaP) ceramics were extensively developed and used as bone filling materials. Controlling the inflammatory response they induce nevertheless remained a major concern. Strontium-containing CaP ceramics were recently shown to affect the cytokine secretion pattern of human primary monocytes. The present study focuses on the ability of strontium-containing CaP to control the human primary bone cell production of two major inflammatory and pro-osteoclastogenic mediators, namely MCP-1 and Gro-α, in response to ceramic particles. Methods: This in vitro study used human primary osteoblasts; their response to ceramics was screened with PCR arrays and antibody arrays, and real-time PCR and ELISA were used for more focused analyses. Results: The study of mRNA and protein expression highlights that human primary bone cells are able to produce these inflammatory mediators and reveals that the addition of CaP to the culture medium leads to their enhanced production. Importantly, the current work determines the down-regulating effect of strontium-substituted CaP on MCP-1 and Gro-α production. Conclusion: Our findings point out a new capability of strontium to modulate human primary bone cells’ communication with the immune system.

  5. Further observations of GRO J1750-27 (AXJ1749.1-2639) with INTEGRAL

    DEFF Research Database (Denmark)

    Brandt, Søren Kristian; Shaw, S.; Hill, A.

    2008-01-01

    The transient accreting X-ray pulsar GRO J1750-27 (AX J1749.1-2639), which became active at the end of January 2008 (ATel #1376), has been repeatedly observed by the INTEGRAL Galactic Bulge monitoring program since mid-February (ATel #1385), on 11, 20 and 23 Feb. 2008. During the three observations...

  6. Elucidation of terpenoid metabolism in Scoparia dulcis by RNA-seq analysis.

    Science.gov (United States)

    Yamamura, Yoshimi; Kurosaki, Fumiya; Lee, Jung-Bum

    2017-03-07

    Scoparia dulcis biosynthesizes bioactive diterpenes, such as scopadulcic acid B (SDB), which are known for their unique molecular skeleton. Although the biosynthesis of bioactive diterpenes is catalyzed by a sequence of class II and class I diterpene synthases (diTPSs), the mechanisms underlying this process are yet to be fully identified. To elucidate this biosynthetic machinery, we performed a high-throughput RNA-seq analysis, and de novo assembly of clean reads revealed 46,332 unique transcripts and 40,503 two unigenes. We found diTPS genes, including a putative syn-copalyl diphosphate synthase (SdCPS2) and two kaurene synthase-like (SdKSLs) genes. In addition, a total of 79 full-length cytochrome P450 (CYP450) genes were discovered. The expression analyses showed that selected CYP450s were associated with the expression patterns of SdCPS2 and SdKSL1, suggesting that these CYP450 candidates are involved in diterpene modification. SdCPS2 represents the first predicted gene to produce syn-copalyl diphosphate in dicots. In addition, SdKSL1 potentially contributes to the SDB biosynthetic pathway. Therefore, these identified genes associated with diterpene biosynthesis can lead to the development of genetic engineering focused on diterpene metabolism in S. dulcis.

  7. A comprehensive comparison of RNA-Seq-based transcriptome analysis from reads to differential gene expression and cross-comparison with microarrays: a case study in Saccharomyces cerevisiae

    DEFF Research Database (Denmark)

    Nookaew, Intawat; Papini, Marta; Pornputtapong, Natapol

    2012-01-01

    RNA-seq has recently become an attractive method of choice in transcriptome studies, promising several advantages compared with microarrays. In this study, we sought to assess the contribution of the different analytical steps involved in the analysis of RNA-seq data generated with the I... gene expression identification derived from the different statistical methods, as well as their integrated analysis results based on gene ontology annotation are in good agreement. Overall, our study provides a useful and comprehensive comparison between the two platforms (RNA-seq and microarrays...

  8. Nebula--a web-server for advanced ChIP-seq data analysis.

    Science.gov (United States)

    Boeva, Valentina; Lermine, Alban; Barette, Camille; Guillouf, Christel; Barillot, Emmanuel

    2012-10-01

    ChIP-seq consists of chromatin immunoprecipitation and deep sequencing of the extracted DNA fragments. It is the technique of choice for accurate characterization of the binding sites of transcription factors and other DNA-associated proteins. We present a web service, Nebula, which allows inexperienced users to perform a complete bioinformatics analysis of ChIP-seq data. Nebula was designed for both bioinformaticians and biologists. It is based on the Galaxy open source framework. Galaxy already includes a large number of functionalities for mapping reads and peak calling. We added the following to Galaxy: (i) peak calling with FindPeaks and a module for immunoprecipitation quality control, (ii) de novo motif discovery with ChIPMunk, (iii) calculation of the density and the cumulative distribution of peak locations relative to gene transcription start sites, (iv) annotation of peaks with genomic features and (v) annotation of genes with peak information. Nebula generates the graphs and the enrichment statistics at each step of the process. During steps (iii)-(v), Nebula optionally repeats the analysis on a control dataset and compares these results with those from the main dataset. Nebula can also incorporate gene expression (or gene modulation) data during these steps. In summary, Nebula is an innovative web service that provides an advanced ChIP-seq analysis pipeline with ready-to-publish results. Nebula is available at http://nebula.curie.fr/. Supplementary data are available at Bioinformatics online.
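
One of the analyses listed in this abstract, the cumulative distribution of peak locations relative to transcription start sites (TSSs), can be sketched as follows. This is an illustration of the general idea only, not Nebula's code. The function names are hypothetical, positions are assumed to lie on a single chromosome, and strand is ignored.

```python
from bisect import bisect_left


def distance_to_nearest_tss(peak_positions, tss_positions):
    """Signed distance (peak minus TSS) from each peak to its nearest TSS.

    Assumes tss_positions is non-empty and that all coordinates are on the
    same chromosome; strand orientation is deliberately ignored here.
    """
    tss_sorted = sorted(tss_positions)
    distances = []
    for peak in peak_positions:
        i = bisect_left(tss_sorted, peak)
        candidates = []
        if i < len(tss_sorted):
            candidates.append(peak - tss_sorted[i])      # TSS at or after the peak
        if i > 0:
            candidates.append(peak - tss_sorted[i - 1])  # TSS before the peak
        distances.append(min(candidates, key=abs))
    return distances


def cumulative_fraction(distances, thresholds):
    """Fraction of peaks lying within each absolute distance threshold."""
    return [sum(abs(d) <= t for d in distances) / len(distances) for t in thresholds]


if __name__ == "__main__":
    tss = [1000, 5000]                # hypothetical TSS coordinates
    peaks = [900, 1200, 4000, 5100]   # hypothetical peak summits
    dists = distance_to_nearest_tss(peaks, tss)
    print(dists, cumulative_fraction(dists, [100, 500, 1000]))
```

Running the same two functions on a control peak set would give the kind of main-versus-control comparison the abstract describes.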

  9. Limits on the quiescent radio emission from the black hole binaries GRO J1655−40 and XTE J1550−564

    NARCIS (Netherlands)

    Calvelo, D.E.; Fender, R.P.; Russell, D.M.; Gallo, E.; Corbel, S.; Tzioumis, A.K.; Bell, M.E.; Lewis, F.; Maccarone, T.J.

    2010-01-01

    We present the results of radio observations of the black hole binaries GRO J1655−40 and XTE J1550−564 in quiescence, with the upgraded Australia Telescope Compact Array. Neither system was detected. Radio flux density upper limits (3σ) of 26 μJy (at 5.5 GHz), 47 μJy (at 9 GHz) for GRO J1655−40 and

  10. Anton Grams dem Gustav Friedrich Wilhelm Großmann: Mozart schreibe eine neue Oper

    Czech Academy of Sciences Publication Activity Database

    Jonášová, Milada

    2016-01-01

    Vol. 53, No. 1 (2016), pp. 29-53 ISSN 0018-7003 R&D Projects: GA ČR GAP409/12/2563 Institutional support: RVO:68378076 Keywords: Mozart * 1786 * letter * Anton Grams * Gustav Friedrich Wilhelm Großmann Subject RIV: AL - Art, Architecture, Cultural Heritage

  11. Glyprolines exert protective and repair-promoting effects in the rat stomach: potential role of the cytokine GRO/CINC-1.

    Science.gov (United States)

    Bakaeva, Z V; Sangadzhieva, A D; Tani, S; Myasoedov, N F; Andreeva, L A; Torshin, V I; Wallace, J L; Tanaka, T

    2016-04-01

    Glyprolines have been reported to exert protective effects in the stomach. In this study, we examined the potential effects of intranasal administration of Pro-Gly-Pro (PGP) and N-acetyl-Pro-Gly-Pro (AcPGP) on experimental gastric ulcer formation and healing. We also studied gastric release of the cytokine GRO/CINC-1, and its potential role in ulcer development and healing. Gastric ulcers were induced in rats by applying acetic acid to the serosa of the stomach. PGP and AcPGP were then administered at a dose of 3.7 μmol/kg once daily on either days 1-3 (ulcer formation) or days 4-6 (ulcer healing). Measurement of ulcer area and histological examination of gastric tissue were carried out on days 4 and 7 after application of acetic acid. In vitro studies involved addition of the glyprolines to cultured rat gastric epithelial cells with or without lipopolysaccharide. Reverse transcription PCR, real-time PCR and ELISA were used for cytokine analysis. PGP and AcPGP significantly reduced ulcer areas on the 4th day and accelerated healing on the 7th day compared with the control. After acetic acid-induced ulceration, the expression of GRO/CINC-1 mRNA in gastric tissue was increased 9-fold versus the sham-operated group. Treatment with either PGP or AcPGP significantly suppressed the expression of GRO/CINC-1 mRNA in gastric tissue. However, the glyprolines did not alter LPS-induced mRNA expression or release of GRO/CINC-1 from cultured rat gastric epithelial cells, even though those cells were harvested from rats subjected to the ulcer-induction procedure. The results of this study show that intranasal administration of PGP and AcPGP significantly increased resistance against acetic acid-induced ulceration and accelerated healing in the rats. These effects may be due, at least in part, to their ability to reduce the acetic acid-induced GRO/CINC-1 expression and production in gastric tissue.

  12. SAMMate: a GUI tool for processing short read alignments in SAM/BAM format

    Directory of Open Access Journals (Sweden)

    Flemington Erik

    2011-01-01

    Background: Next Generation Sequencing (NGS) technology generates tens of millions of short reads for each DNA/RNA sample. A key step in NGS data analysis is the short read alignment of the generated sequences to a reference genome. Although storing alignment information in the Sequence Alignment/Map (SAM) or Binary SAM (BAM) format is now standard, biomedical researchers still have difficulty accessing this information. Results: We have developed a Graphical User Interface (GUI) software tool named SAMMate. SAMMate allows biomedical researchers to quickly process SAM/BAM files and is compatible with both single-end and paired-end sequencing technologies. SAMMate also automates some standard procedures in DNA-seq and RNA-seq data analysis. Using either standard or customized annotation files, SAMMate allows users to accurately calculate the short read coverage of genomic intervals. In particular, for RNA-seq data SAMMate can accurately calculate the gene expression abundance scores for customized genomic intervals using short reads originating from both exons and exon-exon junctions. Furthermore, SAMMate can quickly calculate a whole-genome signal map at base-wise resolution, allowing researchers to solve an array of bioinformatics problems. Finally, SAMMate can export both a wiggle file for alignment visualization in the UCSC genome browser and an alignment statistics report. The biological impact of these features is demonstrated via several case studies that predict miRNA targets using short read alignment information files. Conclusions: With just a few mouse clicks, SAMMate will provide biomedical researchers easy access to important alignment information stored in SAM/BAM files. Our software is constantly updated and will greatly facilitate the downstream analysis of NGS data. Both the source code and the GUI executable are freely available under the GNU General Public License at http://sammate.sourceforge.net.
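
The base-wise coverage calculation described in this abstract can be sketched in a few lines. This is not SAMMate's implementation. It assumes the alignments have already been parsed from a SAM/BAM file into lists of aligned blocks (0-based, half-open), so a junction-spanning RNA-seq read is simply a read with more than one block. The function name is hypothetical.

```python
def basewise_coverage(alignments, region_start, region_end):
    """Per-base read coverage over [region_start, region_end).

    alignments: iterable of reads, each a list of (start, end) aligned blocks
    in 0-based half-open coordinates (assumed pre-parsed from SAM/BAM).
    A difference array keeps the cost linear in reads plus region length.
    """
    length = region_end - region_start
    diff = [0] * (length + 1)
    for blocks in alignments:
        for start, end in blocks:
            clipped_start = max(start, region_start)
            clipped_end = min(end, region_end)
            if clipped_start < clipped_end:
                diff[clipped_start - region_start] += 1
                diff[clipped_end - region_start] -= 1

    coverage = []
    running = 0
    for delta in diff[:length]:
        running += delta
        coverage.append(running)
    return coverage


if __name__ == "__main__":
    reads = [
        [(0, 5)],             # unspliced read
        [(3, 8)],             # unspliced read
        [(2, 4), (10, 12)],   # read spanning a junction between two blocks
    ]
    cov = basewise_coverage(reads, 0, 12)
    print(cov, sum(cov) / len(cov))  # per-base signal and mean coverage
```

The per-base list is the kind of signal that a wiggle track stores, and averaging it over an annotated interval gives a simple coverage score for that interval.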

  13. Escherichia coli fusion carrier proteins act as solubilizing agents for recombinant uncoupling protein 1 through interactions with GroEL

    International Nuclear Information System (INIS)

    Douette, Pierre; Navet, Rachel; Gerkens, Pascal; Galleni, Moreno; Levy, Daniel; Sluse, Francis E.

    2005-01-01

    Fusing recombinant proteins to highly soluble partners is frequently used to prevent aggregation of recombinant proteins in Escherichia coli. Moreover, co-overexpression of prokaryotic chaperones can increase the amount of properly folded recombinant proteins. To understand the solubility enhancement of fusion proteins, we designed two recombinant proteins composed of uncoupling protein 1 (UCP1), a mitochondrial membrane protein, in fusion with MBP or NusA. We were able to express soluble forms of MBP-UCP1 and NusA-UCP1 despite the high hydrophobicity of UCP1. Furthermore, the yield of soluble fusion proteins depended on co-overexpression of GroEL, which catalyzes folding of polypeptides. MBP-UCP1 was expressed in the form of a non-covalent complex with GroEL. MBP-UCP1/GroEL was purified and characterized by dynamic light scattering, gel filtration, and electron microscopy. Our findings suggest that MBP and NusA act as solubilizing agents by forcing the recombinant protein to pass through the bacterial chaperone pathway in the context of fusion protein.

  14. Population structure of Atlantic Mackerel inferred from RAD-seq derived SNP markers: effects of sequence clustering parameters and hierarchical SNP selection

    KAUST Repository

    Rodríguez-Ezpeleta, Naiara

    2016-03-03

    Restriction-site associated DNA sequencing (RAD-seq) and related methods are revolutionizing the field of population genomics in non-model organisms, as they allow generating an unprecedented number of single nucleotide polymorphisms (SNPs) even when no genomic information is available. Yet, RAD-seq data analyses rely on assumptions about the nature and number of nucleotide variants present in a single locus, the choice of which may lead to an under- or overestimated number of SNPs and/or to incorrectly called genotypes. Using the Atlantic mackerel (Scomber scombrus L.) and a close relative, the Atlantic chub mackerel (Scomber colias), as a case study, here we explore the sensitivity of population structure inferences to two crucial aspects in RAD-seq data analysis: the maximum number of mismatches allowed to merge reads into a locus and the relatedness of the individuals used for genotype calling and SNP selection. Our study resolves the population structure of the Atlantic mackerel but, most importantly, provides insights into the effects of alternative RAD-seq data analysis strategies on population structure inferences that are directly applicable to other species.
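
The effect of the mismatch parameter discussed in this abstract can be illustrated with a toy example. The greedy clustering below is a simplified stand-in and not the pipeline used in the study. The function names and the example reads are hypothetical, and all reads are assumed to have equal length.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length reads."""
    if len(a) != len(b):
        raise ValueError("reads must have equal length")
    return sum(x != y for x, y in zip(a, b))


def cluster_reads(reads, max_mismatches):
    """Greedily merge reads into loci.

    A read joins the first locus whose seed (its first read) is within
    max_mismatches; otherwise it starts a new locus. This is a simplified
    assumption made for illustration only.
    """
    loci = []
    for read in reads:
        for locus in loci:
            if hamming(read, locus[0]) <= max_mismatches:
                locus.append(read)
                break
        else:
            loci.append([read])
    return loci


if __name__ == "__main__":
    reads = ["ACGTACGT", "ACGTACGA", "ACGTTCGA", "TTTTTTTT"]  # toy reads
    for m in (0, 1, 2):
        print(m, len(cluster_reads(reads, m)))
```

With a strict threshold, alleles of one locus are split into separate loci (fewer SNPs called); with a permissive threshold, distinct loci may be merged (spurious SNPs). That trade-off is why the abstract treats this parameter as crucial.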

  15. Profiling microRNA expression in bovine alveolar macrophages using RNA-seq.

    Science.gov (United States)

    Vegh, Peter; Foroushani, Amir B K; Magee, David A; McCabe, Matthew S; Browne, John A; Nalpas, Nicolas C; Conlon, Kevin M; Gordon, Stephen V; Bradley, Daniel G; MacHugh, David E; Lynn, David J

    2013-10-01

    MicroRNAs (miRNAs) are important regulators of gene expression and are known to play a key role in regulating both adaptive and innate immunity. Bovine alveolar macrophages (BAMs) help maintain lung homeostasis and constitute the front line of host defense against several infectious respiratory diseases, such as bovine tuberculosis. Little is known, however, about the role miRNAs play in these cells. In this study, we used a high-throughput sequencing approach, RNA-seq, to determine the expression levels of known and novel miRNAs in unchallenged BAMs isolated from lung lavages of eight different healthy Holstein-Friesian male calves. Approximately 80 million sequence reads were generated from eight BAM miRNA Illumina sequencing libraries, and 80 miRNAs were identified as being expressed in BAMs at a threshold of at least 100 reads per million (RPM). The expression levels of miRNAs varied over a large dynamic range, with a few miRNAs expressed at very high levels (up to 800,000 RPM), and the majority lowly expressed. Notably, many of the most highly expressed miRNAs in BAMs have known roles in regulating immunity in other species (e.g. bta-let-7i, bta-miR-21, bta-miR-27, bta-miR-99b, bta-miR-146, bta-miR-147, bta-miR-155 and bta-miR-223). The most highly expressed miRNA in BAMs was miR-21, which has been shown to regulate the expression of antimicrobial peptides in Mycobacterium leprae-infected human monocytes. Furthermore, the predicted target genes of BAM-expressed miRNAs were found to be statistically enriched for roles in innate immunity. In addition to profiling the expression of known miRNAs, the RNA-seq data was also analysed to identify potentially novel bovine miRNAs. One putatively novel bovine miRNA was identified. To the best of our knowledge, this is the first RNA-seq study to profile miRNA expression in BAMs and provides an important reference dataset for investigating the regulatory roles miRNAs play in this important immune cell type.
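
The reads-per-million (RPM) threshold used in this abstract can be made concrete with a minimal sketch. The miRNA names and counts below are invented for illustration, and the function names are hypothetical. Only the 100 RPM cut-off comes from the abstract.

```python
def reads_per_million(counts):
    """Scale raw read counts so that they sum to one million.

    counts: dict mapping miRNA name to raw read count (assumed non-empty
    with a positive total).
    """
    total = sum(counts.values())
    return {name: count * 1_000_000 / total for name, count in counts.items()}


def expressed_mirnas(counts, min_rpm=100):
    """Keep miRNAs whose RPM meets the expression threshold."""
    rpm = reads_per_million(counts)
    return {name: value for name, value in rpm.items() if value >= min_rpm}


if __name__ == "__main__":
    # Hypothetical library: one dominant miRNA, one moderate, one near noise.
    library = {"miR-a": 800_000, "miR-b": 199_950, "miR-c": 50}
    print(expressed_mirnas(library))
```

Normalizing to library size in this way is what makes counts comparable across the eight libraries, and it also shows why a single very abundant miRNA can account for most of the million.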

  16. Whole transcriptome expression analysis and comparison of two different strains of Plasmodium falciparum using RNA-Seq

    Directory of Open Access Journals (Sweden)

    Hiasindh Ashmi Antony

    2016-06-01

    The emergence and distribution of drug resistance in malaria are serious public health concerns in tropical and subtropical regions of the world. However, the molecular mechanism of drug resistance remains unclear. In the present study, we performed high-throughput RNA-Seq to identify and characterize the differentially expressed genes between the chloroquine (CQ) sensitive (3D7) and resistant (Dd2) strains of Plasmodium falciparum. The parasite cells were cultured in vitro in the presence and absence of CQ. Total RNA was isolated from the harvested parasite cells using TRIzol, and RNA-Seq was conducted using an Illumina HiSeq 2500 sequencing platform with paired-end reads and annotated using TopHat. The transcriptome analysis of P. falciparum revealed the expression of ~5000 genes, of which ~60% have unknown function. The Cuffdiff program was used to identify the differentially expressed genes between the CQ-sensitive and resistant strains. Here, we furnish a detailed description of the experimental design, procedure, and analysis of the transcriptome sequencing data, which have been deposited in the National Center for Biotechnology Information (accession nos. PRJNA308455 and GSE77499).

  17. ChimericSeq: An open-source, user-friendly interface for analyzing NGS data to identify and characterize viral-host chimeric sequences

    Science.gov (United States)

    Shieh, Fwu-Shan; Jongeneel, Patrick; Steffen, Jamin D.; Lin, Selena; Jain, Surbhi; Song, Wei

    2017-01-01

    Identification of viral integration sites has been important in understanding the pathogenesis and progression of diseases associated with particular viral infections. The advent of next-generation sequencing (NGS) has enabled researchers to understand the impact that viral integration has on the host, such as tumorigenesis. Current computational methods to analyze NGS data of virus-host junction sites have been limited in terms of their accessibility to a broad user base. In this study, we developed a software application (named ChimericSeq) that is the first program of its kind to offer a graphical user interface, to be compatible with both Windows and Mac operating systems, and to be optimized for effectively identifying and annotating virus-host chimeric reads within NGS data. In addition, ChimericSeq’s pipeline implements custom filtering to remove artifacts and detect reads with quantitative analytical reporting to provide functional significance to discovered integration sites. The improved accessibility of ChimericSeq through a GUI in both Windows and Mac has potential to expand NGS analytical support to a broader spectrum of the scientific community. PMID:28829778

  18. A technical assessment of the porcine ejaculated spermatozoa for a sperm-specific RNA-seq analysis.

    Science.gov (United States)

    Gòdia, Marta; Mayer, Fabiana Quoos; Nafissi, Julieta; Castelló, Anna; Rodríguez-Gil, Joan Enric; Sánchez, Armand; Clop, Alex

    2018-04-26

    The study of the boar sperm transcriptome by RNA-seq can provide relevant information on sperm quality and fertility and might contribute to animal breeding strategies. However, the analysis of the spermatozoa RNA is challenging as these cells harbor very low amounts of highly fragmented RNA, and the ejaculates also contain other cell types with larger amounts of non-fragmented RNA. Here, we describe a strategy for a successful boar sperm purification, RNA extraction and RNA-seq library preparation. Using these approaches our objectives were: (i) to evaluate the sperm recovery rate (SRR) after boar spermatozoa purification by density centrifugation using the non-porcine-specific commercial reagent BoviPure™; (ii) to assess the correlation between SRR and sperm quality characteristics; (iii) to evaluate the relationship between sperm cell RNA load and sperm quality traits and (iv) to compare different library preparation kits for both total RNA-seq (SMARTer Universal Low Input RNA and TruSeq RNA Library Prep kit) and small RNA-seq (NEBNext Small RNA and TailorMix miRNA Sample Prep v2) for high-throughput sequencing. Our results show that pig SRR (~22%) is lower than in other mammalian species and that it is not significantly dependent on the sperm quality parameters analyzed in our study. Moreover, no relationship between the RNA yield per sperm cell and sperm phenotypes was found. We compared an RNA-seq library preparation kit optimized for low amounts of fragmented RNA with a standard kit designed for high amount and quality of input RNA and found that for sperm, a protocol designed to work on low-quality RNA is essential. We also compared two small RNA-seq kits and did not find substantial differences in their performance. We propose the methodological workflow described for the RNA-seq screening of the boar spermatozoa transcriptome.

  19. Activation of phagocytic cells by Staphylococcus epidermidis biofilms: effects of extracellular matrix proteins and the bacterial stress protein GroEL on netosis and MRP-14 release.

    Science.gov (United States)

    Dapunt, Ulrike; Gaida, Matthias M; Meyle, Eva; Prior, Birgit; Hänsch, Gertrud M

    2016-07-01

    The recognition and phagocytosis of free-swimming (planktonic) bacteria by polymorphonuclear neutrophils have been investigated in depth. However, less is known about the neutrophil response towards bacterial biofilms. Our previous work demonstrated that neutrophils recognize activating entities within the extracellular polymeric substance (EPS) of biofilms (the bacterial heat shock protein GroEL) and that this process does not require opsonization. The aim of this study was to evaluate the release of DNA by neutrophils in response to biofilms, as well as the release of the inflammatory cytokine MRP-14. Neutrophils were stimulated with Staphylococcus epidermidis biofilms, planktonic bacteria, extracted EPS and GroEL. Release of DNA and of MRP-14 was evaluated. Furthermore, tissue samples from patients suffering from biofilm infections were collected and evaluated by histology. MRP-14 concentration in blood samples was measured. We were able to show that biofilms, the EPS and GroEL induce DNA release. MRP-14 was only released after stimulation with EPS, not GroEL. Histology of tissue samples revealed MRP-14 positive cells in association with neutrophil infiltration and MRP-14 concentration was elevated in blood samples of patients suffering from biofilm infections. Our data demonstrate that neutrophil-activating entities are present in the EPS and that GroEL induces DNA release by neutrophils. © FEMS 2016. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.

  20. The htpAB operon of Legionella pneumophila cannot be deleted in the presence of the groE chaperonin operon of Escherichia coli.

    Science.gov (United States)

    Nasrallah, Gheyath K; Gagnon, Elizabeth; Orton, Dennis J; Garduño, Rafael A

    2011-11-01

    HtpB, the chaperonin of the intracellular bacterial pathogen Legionella pneumophila, displays several virulence-related functions in vitro. To confirm HtpB's role in vivo, host infections with an htpB deletion mutant would be required. However, we previously reported that the htpAB operon (encoding co-chaperonin and chaperonin) is essential. We attempted here to delete htpAB in a L. pneumophila strain carrying the groE operon (encoding the Escherichia coli co-chaperonin and chaperonin). The groE operon was inserted into the chromosome of L. pneumophila Lp02, and then allelic replacement of htpAB with a gentamicin resistance cassette was attempted. Although numerous potential postallelic replacement transformants showed a correct selection phenotype, we still detected htpAB by PCR and full-size HtpB by immunoblot. Southern blot and PCR analysis indicated that the gentamicin resistance cassette had apparently integrated in a duplicated htpAB region. However, we showed by Southern blot that strain Lp02, and the Lp02 derivative carrying the groE operon, have only one copy of htpAB. These results confirmed that the htpAB operon cannot be deleted, not even in the presence of the groE operon, and suggested that attempts to delete htpAB under strong phenotypic selection result in aberrant genetic recombinations that could involve duplication of the htpAB locus.

  1. A multi-Poisson dynamic mixture model to cluster developmental patterns of gene expression by RNA-seq.

    Science.gov (United States)

    Ye, Meixia; Wang, Zhong; Wang, Yaqun; Wu, Rongling

    2015-03-01

    Dynamic changes of gene expression reflect an intrinsic mechanism of how an organism responds to developmental and environmental signals. With the increasing availability of expression data across a time-space scale by RNA-seq, the classification of genes as per their biological function using RNA-seq data has become one of the most significant challenges in contemporary biology. Here we develop a clustering mixture model to discover distinct groups of genes expressed during a period of organ development. By integrating the density function of multivariate Poisson distribution, the model accommodates the discrete property of read counts characteristic of RNA-seq data. The temporal dependence of gene expression is modeled by the first-order autoregressive process. The model is implemented with the Expectation-Maximization algorithm and model selection to determine the optimal number of gene clusters and obtain the estimates of Poisson parameters that describe the pattern of time-dependent expression of genes from each cluster. The model has been demonstrated by analyzing real data from an experiment aimed at linking the pattern of gene expression to catkin development in white poplar. The usefulness of the model has been validated through computer simulation. The model provides a valuable tool for clustering RNA-seq data, facilitating our global view of expression dynamics and understanding of gene regulation mechanisms. © The Author 2014. Published by Oxford University Press. For Permissions, please email: journals.permissions@oup.com.
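
    The idea of clustering time-course read counts with a Poisson mixture fitted by Expectation-Maximization can be sketched as below. This is a deliberately simplified illustration and not the authors' model: time points are treated as independent Poissons within a cluster, so the first-order autoregressive dependence and the model-selection step described in the abstract are omitted. The function name, initialisation scheme and toy counts are assumptions.

```python
import math
import random


def poisson_mixture_em(counts, k, n_iter=50, seed=0):
    """Cluster genes (rows of per-time-point counts) with a Poisson mixture.

    Simplification: time points are independent given the cluster.
    """
    rng = random.Random(seed)
    n, t = len(counts), len(counts[0])
    # Initialise cluster rates from k randomly chosen genes (floored above zero).
    rates = [[max(x, 0.5) for x in row] for row in rng.sample(counts, k)]
    weights = [1.0 / k] * k
    resp = []
    for _ in range(n_iter):
        # E-step: posterior cluster probabilities, computed in log space.
        # The log(x!) term is the same for every cluster, so it is dropped.
        resp = []
        for row in counts:
            logp = [
                math.log(weights[j])
                + sum(x * math.log(rates[j][i]) - rates[j][i]
                      for i, x in enumerate(row))
                for j in range(k)
            ]
            top = max(logp)
            unnorm = [math.exp(v - top) for v in logp]
            z = sum(unnorm)
            resp.append([u / z for u in unnorm])
        # M-step: update mixing weights and per-time-point Poisson rates.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = max(nj / n, 1e-12)
            if nj > 0:
                rates[j] = [
                    max(sum(r[j] * row[i] for r, row in zip(resp, counts)) / nj, 1e-6)
                    for i in range(t)
                ]
    labels = [max(range(k), key=lambda j: r[j]) for r in resp]
    return labels, rates, weights


# Toy data: three genes rising over time, three genes falling (made-up counts).
counts = [[2, 10, 30], [3, 12, 28], [1, 9, 33],
          [40, 12, 2], [38, 10, 3], [42, 11, 1]]
labels, rates, weights = poisson_mixture_em(counts, k=2)
print(labels)
```

    Cluster identifiers are arbitrary, so only the grouping of genes is meaningful, not which cluster is numbered 0 or 1.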

  2. Parallel factor ChIP provides essential internal control for quantitative differential ChIP-seq.

    Science.gov (United States)

    Guertin, Michael J; Cullen, Amy E; Markowetz, Florian; Holding, Andrew N

    2018-04-17

    A key challenge in quantitative ChIP combined with high-throughput sequencing (ChIP-seq) is the normalization of data in the presence of genome-wide changes in occupancy. Analysis-based normalization methods were developed for transcriptomic data and these are dependent on the underlying assumption that total transcription does not change between conditions. For genome-wide changes in transcription factor (TF) binding, these assumptions do not hold true. The challenges in normalization are confounded by experimental variability during sample preparation, processing and recovery. We present a novel normalization strategy utilizing an internal standard of unchanged peaks for reference. Our method can be readily applied to monitor genome-wide changes by ChIP-seq that are otherwise lost or misrepresented through analytical normalization. We compare our approach to normalization by total read depth and two alternative methods that utilize external experimental controls to study TF binding. We successfully resolve the key challenges in quantitative ChIP-seq analysis and demonstrate its application by monitoring the loss of Estrogen Receptor-alpha (ER) binding upon fulvestrant treatment, ER binding in response to estradiol, ER mediated change in H4K12 acetylation and profiling ER binding in patient-derived xenografts. This is supported by an adaptable pipeline to normalize and quantify differential TF binding genome-wide and generate metrics for differential binding at individual sites.
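
    The contrast between normalising by total read depth and normalising to an internal standard of unchanged peaks can be illustrated with a small sketch. The median-of-ratios estimator, the function names and the peak counts below are assumptions made for illustration; they are not taken from the authors' pipeline.

```python
from statistics import median


def reference_scale_factor(control, treated, reference_peaks):
    """Median control/treated count ratio over peaks assumed to be unchanged."""
    ratios = [
        control[p] / treated[p]
        for p in reference_peaks
        if control.get(p, 0) > 0 and treated.get(p, 0) > 0
    ]
    if not ratios:
        raise ValueError("no usable reference peaks")
    return median(ratios)


def total_depth_scale_factor(control, treated):
    """Scale factor that equalises the total read counts of the two samples."""
    return sum(control.values()) / sum(treated.values())


def normalise(sample, factor):
    """Multiply every peak count in a sample by the scale factor."""
    return {peak: count * factor for peak, count in sample.items()}


# Hypothetical read counts per peak (not data from the study).
control = {"ref1": 100, "ref2": 200, "ref3": 50, "target": 400}
treated = {"ref1": 50, "ref2": 100, "ref3": 25, "target": 40}

ref_factor = reference_scale_factor(control, treated, ["ref1", "ref2", "ref3"])
depth_factor = total_depth_scale_factor(control, treated)
print(normalise(treated, ref_factor)["target"])
print(normalise(treated, depth_factor)["target"])
```

    In this toy case the treated library is half as deep at the reference peaks, so the reference-based factor is 2. The depth-based factor is larger because the genuine loss of signal at the target peak lowers the treated total, which inflates the normalised target count and understates the loss of binding.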

  3. RNA-Seq for gene identification and transcript profiling of three Stevia rebaudiana genotypes.

    Science.gov (United States)

    Chen, Junwen; Hou, Kai; Qin, Peng; Liu, Hongchang; Yi, Bin; Yang, Wenting; Wu, Wei

    2014-07-07

    Stevia (Stevia rebaudiana) is an important medicinal plant that yields diterpenoid steviol glycosides (SGs). SGs are currently used in the preparation of medicines, food products and nutraceuticals because of their sweetening property (zero calories and about 300 times sweeter than sugar). Recently, some progress has been made in understanding the biosynthesis of SGs in Stevia, but little is known about the molecular mechanisms underlying this process. Additionally, the genomics of Stevia, a non-model species, remains uncharacterized. The recent advent of RNA-Seq, a next generation sequencing technology, provides an opportunity to expand the identification of Stevia genes through in-depth transcript profiling. We present a comprehensive landscape of the transcriptome profiles of three genotypes of Stevia with divergent SG compositions characterized using RNA-seq. 191,590,282 high-quality reads were generated and then assembled into 171,837 transcripts with an average sequence length of 969 base pairs. A total of 80,160 unigenes were annotated, and 14,211 of the unique sequences were assigned to specific metabolic pathways by the Kyoto Encyclopedia of Genes and Genomes. Gene sequences of all enzymes known to be involved in SG synthesis were examined. A total of 143 UDP-glucosyltransferase (UGT) unigenes were identified, some of which might be involved in SG biosynthesis. The expression patterns of eight of these genes were further confirmed by RT-QPCR. RNA-seq analysis identified candidate genes encoding enzymes responsible for the biosynthesis of SGs in Stevia, a non-model plant without a reference genome. The transcriptome data from this study yielded new insights into the process of SG accumulation in Stevia. Our results demonstrate that RNA-Seq can be successfully used for gene identification and transcript profiling in a non-model species.

  4. ISVASE: identification of sequence variant associated with splicing event using RNA-seq data.

    Science.gov (United States)

    Aljohi, Hasan Awad; Liu, Wanfei; Lin, Qiang; Yu, Jun; Hu, Songnian

    2017-06-28

    Precise and efficient exon recognition and splicing by the spliceosome is key to generating mature mRNAs. Roughly one third to one half of disease-related mutations affect RNA splicing. The software PVAAS was developed to identify variants associated with aberrant splicing directly from RNA-seq data. However, it is based on the assumption that annotated splice sites represent normal splicing, which is not always true. We developed ISVASE, a tool for specifically identifying sequence variants associated with splicing events (SVASE) using RNA-seq data. Compared with PVAAS, our tool has several advantages: multi-pass stringent rule-dependent filters and statistical filters; use of split reads only; independent sequence variant identification in each part of a splicing junction; sequence variant detection for both known and novel splicing events; additional detection of exon-exon junction shift events if known splicing events are provided; splicing signal evaluation; support for known DNA mutation and/or RNA editing data; higher precision and consistency; and short running time. Using a realistic RNA-seq dataset, we performed a case study to illustrate the functionality and effectiveness of our method. Moreover, the output of SVASEs can be used for downstream analysis such as splicing regulatory element study and sequence variant functional analysis. ISVASE is useful for researchers interested in sequence variants (DNA mutation and/or RNA editing) associated with splicing events. The package is freely available at https://sourceforge.net/projects/isvase/.

  5. 5' Rapid Amplification of cDNA Ends and Illumina MiSeq Reveals B Cell Receptor Features in Healthy Adults, Adults With Chronic HIV-1 Infection, Cord Blood, and Humanized Mice.

    Science.gov (United States)

    Waltari, Eric; Jia, Manxue; Jiang, Caroline S; Lu, Hong; Huang, Jing; Fernandez, Cristina; Finzi, Andrés; Kaufmann, Daniel E; Markowitz, Martin; Tsuji, Moriya; Wu, Xueling

    2018-01-01

    Using 5' rapid amplification of cDNA ends, Illumina MiSeq, and basic flow cytometry, we systematically analyzed the expressed B cell receptor (BCR) repertoire in 14 healthy adult PBMCs, 5 HIV-1+ adult PBMCs, 5 cord blood samples, and 3 HIS-CD4/B mice, examining the full-length variable region of μ, γ, α, κ, and λ chains for V-gene usage, somatic hypermutation (SHM), and CDR3 length. Adding to the known repertoire of healthy adults, Illumina MiSeq consistently detected small fractions of reads with high mutation frequencies including hypermutated μ reads, and reads with long CDR3s. Additionally, the less studied IgA repertoire displayed similar characteristics to that of IgG. Compared to healthy adults, the five HIV-1 chronically infected adults displayed elevated mutation frequencies for all μ, γ, α, κ, and λ chains examined and slightly longer CDR3 lengths for γ, α, and λ. To evaluate the reconstituted human BCR sequences in a humanized mouse model, we analyzed cord blood and HIS-CD4/B mice, which all lacked the typical SHM seen in the adult reference. Furthermore, MiSeq revealed identical unmutated IgM sequences derived from separate cell aliquots, thus for the first time demonstrating rare clonal members of unmutated IgM B cells by sequencing.

  6. RNA-SeQC: RNA-seq metrics for quality control and process optimization.

    Science.gov (United States)

    DeLuca, David S; Levin, Joshua Z; Sivachenko, Andrey; Fennell, Timothy; Nazaire, Marc-Danie; Williams, Chris; Reich, Michael; Winckler, Wendy; Getz, Gad

    2012-06-01

    RNA-seq, the application of next-generation sequencing to RNA, provides transcriptome-wide characterization of cellular activity. Assessment of sequencing performance and library quality is critical to the interpretation of RNA-seq data, yet few tools exist to address this issue. We introduce RNA-SeQC, a program which provides key measures of data quality. These metrics include yield, alignment and duplication rates; GC bias, rRNA content, regions of alignment (exon, intron and intragenic), continuity of coverage, 3'/5' bias and count of detectable transcripts, among others. The software provides multi-sample evaluation of library construction protocols, input materials and other experimental parameters. The modularity of the software enables pipeline integration and the routine monitoring of key measures of data quality such as the number of alignable reads, duplication rates and rRNA contamination. RNA-SeQC allows investigators to make informed decisions about sample inclusion in downstream analysis. In summary, RNA-SeQC provides quality control measures critical to experiment design, process optimization and downstream computational analysis. See www.genepattern.org to run online, or www.broadinstitute.org/rna-seqc/ for a command line tool.

  7. A quantitative and qualitative comparison of Illumina MiSeq and 454 amplicon sequencing for genotyping the highly polymorphic major histocompatibility complex (MHC) in a non-model species.

    Science.gov (United States)

    Razali, Haslina; O'Connor, Emily; Drews, Anna; Burke, Terry; Westerdahl, Helena

    2017-07-28

    High-throughput sequencing enables high-resolution genotyping of extremely duplicated genes. 454 amplicon sequencing (454) has become the standard technique for genotyping the major histocompatibility complex (MHC) genes in non-model organisms. However, Illumina MiSeq amplicon sequencing (MiSeq), which offers a much higher read depth, is now superseding 454. The aim of this study was to quantitatively and qualitatively evaluate the performance of MiSeq in relation to 454 for genotyping MHC class I alleles using a house sparrow (Passer domesticus) dataset with pedigree information. House sparrows provide a good study system for this comparison as their MHC class I genes have been studied previously and, consequently, we had prior expectations concerning the number of alleles per individual. We found that 454 and MiSeq performed equally well in genotyping amplicons with low diversity, i.e. amplicons from individuals that had fewer than 6 alleles. Although there was a higher rate of failure in the 454 dataset in resolving amplicons with higher diversity (6-9 alleles), the same genotypes were identified by both 454 and MiSeq in 98% of cases. We conclude that low diversity amplicons are equally well genotyped using either 454 or MiSeq, but the higher coverage afforded by MiSeq can lead to this approach outperforming 454 in amplicons with higher diversity.

  8. 5′ Rapid Amplification of cDNA Ends and Illumina MiSeq Reveals B Cell Receptor Features in Healthy Adults, Adults With Chronic HIV-1 Infection, Cord Blood, and Humanized Mice

    Directory of Open Access Journals (Sweden)

    Eric Waltari

    2018-03-01

    Using 5′ rapid amplification of cDNA ends, Illumina MiSeq, and basic flow cytometry, we systematically analyzed the expressed B cell receptor (BCR) repertoire in 14 healthy adult PBMCs, 5 HIV-1+ adult PBMCs, 5 cord blood samples, and 3 HIS-CD4/B mice, examining the full-length variable region of μ, γ, α, κ, and λ chains for V-gene usage, somatic hypermutation (SHM), and CDR3 length. Adding to the known repertoire of healthy adults, Illumina MiSeq consistently detected small fractions of reads with high mutation frequencies including hypermutated μ reads, and reads with long CDR3s. Additionally, the less studied IgA repertoire displayed similar characteristics to that of IgG. Compared to healthy adults, the five HIV-1 chronically infected adults displayed elevated mutation frequencies for all μ, γ, α, κ, and λ chains examined and slightly longer CDR3 lengths for γ, α, and λ. To evaluate the reconstituted human BCR sequences in a humanized mouse model, we analyzed cord blood and HIS-CD4/B mice, which all lacked the typical SHM seen in the adult reference. Furthermore, MiSeq revealed identical unmutated IgM sequences derived from separate cell aliquots, thus for the first time demonstrating rare clonal members of unmutated IgM B cells by sequencing.

  9. Insight into biases and sequencing errors for amplicon sequencing with the Illumina MiSeq platform.

    Science.gov (United States)

    Schirmer, Melanie; Ijaz, Umer Z; D'Amore, Rosalinda; Hall, Neil; Sloan, William T; Quince, Christopher

    2015-03-31

    With read lengths of currently up to 2 × 300 bp, high throughput and low sequencing costs, Illumina's MiSeq is becoming one of the most utilized sequencing platforms worldwide. The platform is manageable and affordable even for smaller labs. This enables quick turnaround on a broad range of applications such as targeted gene sequencing, metagenomics, small genome sequencing and clinical molecular diagnostics. However, Illumina error profiles are still poorly understood and programs are therefore not designed for the idiosyncrasies of Illumina data. A better knowledge of the error patterns is essential for sequence analysis and vital if we are to draw valid conclusions. Studying true genetic variation in a population sample is fundamental for understanding diseases, evolution and origin. We conducted a large study on the error patterns for the MiSeq based on 16S rRNA amplicon sequencing data. We tested state-of-the-art library preparation methods for amplicon sequencing and showed that the library preparation method and the choice of primers are the most significant sources of bias and cause distinct error patterns. Furthermore, we tested the efficiency of various error correction strategies and identified quality trimming (Sickle) combined with error correction (BayesHammer) followed by read overlapping (PANDAseq) as the most successful approach, reducing substitution error rates on average by 93%. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.
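
    The quality-trimming step named in this abstract can be illustrated with a generic sliding-window trimmer. This sketch is not Sickle's implementation or command-line interface. The window size, quality threshold, minimum length and Phred+33 encoding are assumptions chosen for the example, and the read is made up.

```python
def phred_scores(quality_string, offset=33):
    """Decode a FASTQ quality string into Phred scores (Phred+33 assumed)."""
    return [ord(ch) - offset for ch in quality_string]


def trim_3prime(sequence, quality_string, window=5, min_mean_q=20, min_length=20):
    """Cut the read at the first window whose mean quality is below threshold.

    Returns (sequence, quality) after trimming, or None if the read ends up
    shorter than min_length.
    """
    quals = phred_scores(quality_string)
    if not quals:
        return None
    cut = len(quals)
    # Slide a fixed-size window from the 5' end towards the 3' end.
    for start in range(max(len(quals) - window + 1, 1)):
        chunk = quals[start:start + window]
        if sum(chunk) / len(chunk) < min_mean_q:
            cut = start
            break
    if cut < min_length:
        return None
    return sequence[:cut], quality_string[:cut]


# Made-up 40-base read: 30 high-quality bases ('I' = Q40), then 10 poor ones ('#' = Q2).
read_seq = "ACGT" * 10
read_qual = "I" * 30 + "#" * 10
trimmed = trim_3prime(read_seq, read_qual)
print(len(trimmed[0]))
```

    The cut is placed at the start of the first failing window, so a few good bases just before the low-quality tail are also discarded. That is a conservative choice. A read that is poor throughout is dropped rather than returned as a short fragment.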

  10. Whole genome complete resequencing of Bacillus subtilis natto by combining long reads with high-quality short reads.

    Directory of Open Access Journals (Sweden)

    Mayumi Kamada

    De novo microbial genome sequencing reached a turning point with third-generation sequencing (TGS) platforms, and several microbial genomes have been improved by TGS long reads. Bacillus subtilis natto is closely related to the laboratory standard strain B. subtilis Marburg 168, and it has a function in the production of the traditional Japanese fermented food "natto." The B. subtilis natto BEST195 genome was previously sequenced with short reads, but it included some incomplete regions. We resequenced the BEST195 genome using a PacBio RS sequencer, and we successfully obtained a complete genome sequence from one scaffold without any gaps, and we also applied Illumina MiSeq short reads to enhance quality. Compared with the previous BEST195 draft genome and Marburg 168 genome, we found that incomplete regions in the previous genome sequence were attributed to GC-bias and repetitive sequences, and we also identified some novel genes that are found only in the new genome.

  11. Fully automated pipeline for detection of sex linked genes using RNA-Seq data.

    Science.gov (United States)

    Michalovova, Monika; Kubat, Zdenek; Hobza, Roman; Vyskot, Boris; Kejnovsky, Eduard

    2015-03-11

    Sex chromosomes present a genomic region which, to some extent, differs between the genders of a single species. Reliable high-throughput methods for detection of sex chromosome-specific markers are needed, especially in species where genome information is limited. Next generation sequencing (NGS) opens the door for identification of unique sequences or searching for nucleotide polymorphisms between datasets. A combination of classical genetic segregation analysis along with RNA-Seq data can present an ideal tool to map and identify sex chromosome-specific expressed markers. To address this challenge, we established a genetic cross of the dioecious plant Rumex acetosa and generated RNA-Seq data from both the parental generation and male and female offspring. We present a pipeline for detection of sex linked genes based on nucleotide polymorphism analysis. In our approach, tracking of nucleotide polymorphisms is carried out using a cross of preferably distant populations. For this reason, only 4 datasets are needed - reads from high-throughput sequencing platforms for parent generation (mother and father) and F1 generation (male and female progeny). Our pipeline uses custom scripts together with external assembly, mapping and variant calling software. Given the resource-intensive nature of the computation, servers with high capacity are a requirement. Therefore, in order to keep this pipeline easily accessible and reproducible, we implemented it in Galaxy - an open, web-based platform for data-intensive biomedical research. Our tools are present in the Galaxy Tool Shed, from which they can be installed to any local Galaxy instance. As an output of the pipeline, the user gets a FASTA file with candidate transcriptionally active sex-linked genes, sorted by their relevance. At the same time, a BAM file with identified genes and alignment of reads is also provided. Thus, polymorphisms following segregation pattern can be easily visualized, which significantly enhances primer design
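
    The segregation logic behind such a pipeline can be sketched for the four datasets named in the abstract (mother, father, female progeny, male progeny). The rule below, in which an allele is a Y-linked candidate if it is seen in the father and the sons but in neither the mother nor the daughters, is an assumed simplification for illustration and is not the authors' published filter. The contig names, positions and alleles are made up.

```python
def y_linked_alleles(site):
    """Alleles shared by father and sons but absent from mother and daughters."""
    male = site["father"] & site["sons"]
    female = site["mother"] | site["daughters"]
    return male - female


def candidate_sites(variants):
    """Keep only variant sites carrying at least one male-specific allele."""
    result = {}
    for key, site in variants.items():
        alleles = y_linked_alleles(site)
        if alleles:
            result[key] = alleles
    return result


# Hypothetical allele sets per dataset, keyed by (contig, position).
variants = {
    ("contig1", 120): {"mother": {"A"}, "father": {"A", "G"},
                       "daughters": {"A"}, "sons": {"A", "G"}},
    ("contig2", 45): {"mother": {"C", "T"}, "father": {"C"},
                      "daughters": {"C", "T"}, "sons": {"C", "T"}},
}
print(candidate_sites(variants))
```

    Only the first site passes, because its G allele is confined to the male datasets. At the second site the T allele is inherited from the mother and appears in both sexes.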

  12. GUIDEseq: a bioconductor package to analyze GUIDE-Seq datasets for CRISPR-Cas nucleases.

    Science.gov (United States)

    Zhu, Lihua Julie; Lawrence, Michael; Gupta, Ankit; Pagès, Hervé; Kucukural, Alper; Garber, Manuel; Wolfe, Scot A

    2017-05-15

    Genome editing technologies developed around the CRISPR-Cas9 nuclease system have facilitated the investigation of a broad range of biological questions. These nucleases also hold tremendous promise for treating a variety of genetic disorders. In the context of their therapeutic application, it is important to identify the spectrum of genomic sequences that are cleaved by a candidate nuclease when programmed with a particular guide RNA, as well as the cleavage efficiency of these sites. Powerful new experimental approaches, such as GUIDE-seq, facilitate the sensitive, unbiased genome-wide detection of nuclease cleavage sites within the genome. Flexible bioinformatics analysis tools for processing GUIDE-seq data are needed. Here, we describe an open source, open development software suite, GUIDEseq, for GUIDE-seq data analysis and annotation as a Bioconductor package in R. The GUIDEseq package provides a flexible platform with more than 60 adjustable parameters for the analysis of datasets associated with custom nuclease applications. These parameters allow data analysis to be tailored to different nuclease platforms with different length and complexity in their guide and PAM recognition sequences or their DNA cleavage position. They also enable users to customize sequence aggregation criteria, and vary peak calling thresholds that can influence the number of potential off-target sites recovered. GUIDEseq also annotates potential off-target sites that overlap with genes based on genome annotation information, as these may be the most important off-target sites for further characterization. In addition, GUIDEseq enables the comparison and visualization of off-target site overlap between different datasets for a rapid comparison of different nuclease configurations or experimental conditions. For each identified off-target, the GUIDEseq package outputs mapped GUIDE-Seq read count as well as cleavage score from a user specified off-target cleavage score prediction

  13. Towards the integration, annotation and association of historical microarray experiments with RNA-seq.

    Science.gov (United States)

    Chavan, Shweta S; Bauer, Michael A; Peterson, Erich A; Heuck, Christoph J; Johann, Donald J

    2013-01-01

    Transcriptome analysis by microarrays has produced important advances in biomedicine. For instance in multiple myeloma (MM), microarray approaches led to the development of an effective disease subtyping via cluster assignment, and a 70 gene risk score. Both enabled an improved molecular understanding of MM, and have provided prognostic information for the purposes of clinical management. Many researchers are now transitioning to Next Generation Sequencing (NGS) approaches and RNA-seq in particular, due to its discovery-based nature, improved sensitivity, and dynamic range. Additionally, RNA-seq allows for the analysis of gene isoforms, splice variants, and novel gene fusions. Given the voluminous amounts of historical microarray data, there is now a need to associate and integrate microarray and RNA-seq data via advanced bioinformatic approaches. Custom software was developed following a model-view-controller (MVC) approach to integrate Affymetrix probe set-IDs, and gene annotation information from a variety of sources. The tool/approach employs an assortment of strategies to integrate, cross reference, and associate microarray and RNA-seq datasets. Output from a variety of transcriptome reconstruction and quantitation tools (e.g., Cufflinks) can be directly integrated, and/or associated with Affymetrix probe set data, as well as necessary gene identifiers and/or symbols from a diversity of sources. Strategies are employed to maximize the annotation and cross referencing process. Custom gene sets (e.g., MM 70 risk score (GEP-70)) can be specified, and the tool can be directly assimilated into an RNA-seq pipeline. A novel bioinformatic approach to aid in the facilitation of both annotation and association of historic microarray data, in conjunction with richer RNA-seq data, is now assisting with the study of MM cancer biology.

  14. RNA-Seq-Based Transcript Structure Analysis with TrBorderExt.

    Science.gov (United States)

    Wang, Yejun; Sun, Ming-An; White, Aaron P

    2018-01-01

    RNA-Seq has become a routine strategy for genome-wide gene expression comparisons in bacteria. Despite its lower resolution in transcript border parsing compared with dRNA-Seq, TSS-EMOTE, Cappable-seq, Term-seq, and others, directional RNA-Seq still has advantages: low cost, quantification, and transcript border analysis at medium resolution (±10-20 nt). To facilitate mining of directional RNA-Seq datasets, especially with respect to transcript structure analysis, we developed a tool, TrBorderExt, which can parse transcript start sites and termination sites accurately in bacteria. This chapter describes a detailed, step-by-step protocol for using the software package to identify bacterial transcript borders from raw RNA-Seq data. The package was developed in the Perl and R programming languages, and is freely accessible at http://www.szu-bioinf.org/TrBorderExt.

  15. Optimizing ChIP-seq peak detectors using visual labels and supervised machine learning.

    Science.gov (United States)

    Hocking, Toby Dylan; Goerner-Potvin, Patricia; Morin, Andreanne; Shao, Xiaojian; Pastinen, Tomi; Bourque, Guillaume

    2017-02-15

    Many peak detection algorithms have been proposed for ChIP-seq data analysis, but it is not obvious which algorithm and what parameters are optimal for any given dataset. In contrast, regions with and without obvious peaks can be easily labeled by visual inspection of aligned read counts in a genome browser. We propose a supervised machine learning approach for ChIP-seq data analysis, using labels that encode qualitative judgments about which genomic regions contain or do not contain peaks. The main idea is to manually label a small subset of the genome, and then learn a model that makes consistent peak predictions on the rest of the genome. We created 7 new histone mark datasets with 12 826 visually determined labels, and analyzed 3 existing transcription factor datasets. We observed that default peak detection parameters yield high false positive rates, which can be reduced by learning parameters using a relatively small training set of labeled data from the same experiment type. We also observed that labels from different people are highly consistent. Overall, these data indicate that our supervised labeling method is useful for quantitatively training and testing peak detection algorithms. Labeled histone mark data are available at http://cbio.ensmp.fr/~thocking/chip-seq-chunk-db/, and an R package to compute the label error of predicted peaks is available at https://github.com/tdhock/PeakError.

  16. SplicingTypesAnno: annotating and quantifying alternative splicing events for RNA-Seq data.

    Science.gov (United States)

    Sun, Xiaoyong; Zuo, Fenghua; Ru, Yuanbin; Guo, Jiqiang; Yan, Xiaoyan; Sablok, Gaurav

    2015-04-01

    Alternative splicing plays a key role in the regulation of the central dogma. Four major types of alternative splicing have been classified: intron retention, exon skipping, alternative 5′ splice sites (alternative donor sites), and alternative 3′ splice sites (alternative acceptor sites). A few algorithms have been developed to detect splice junctions from RNA-Seq reads. However, few tools target the major alternative splicing types at the exon/intron level. This type of analysis may reveal subtle yet important alternative splicing events, and thus help gain a deeper understanding of the mechanism of alternative splicing. This paper describes SplicingTypesAnno, a user-friendly R package for extracting, annotating and analyzing alternative splicing types from RNA-Seq sequence alignment files. SplicingTypesAnno can: (1) annotate the major alternative splicing types at the exon/intron level and, by comparison with the annotation in a GTF/GFF file, identify novel alternative splicing sites; (2) offer a convenient two-level analysis: genome-scale annotation for users with a high performance computing environment, and gene-scale annotation for users with personal computers; (3) generate a user-friendly web report and additional BED files for IGV visualization. It is publicly available at https://sourceforge.net/projects/splicingtypes/files/ or http://genome.sdau.edu.cn/research/software/SplicingTypesAnno.html.

  17. It's DE-licious: A Recipe for Differential Expression Analyses of RNA-seq Experiments Using Quasi-Likelihood Methods in edgeR.

    Science.gov (United States)

    Lun, Aaron T L; Chen, Yunshun; Smyth, Gordon K

    2016-01-01

    RNA sequencing (RNA-seq) is widely used to profile transcriptional activity in biological systems. Here we present an analysis pipeline for differential expression analysis of RNA-seq experiments using the Rsubread and edgeR software packages. The basic pipeline includes read alignment and counting, filtering and normalization, modelling of biological variability and hypothesis testing. For hypothesis testing, we describe particularly the quasi-likelihood features of edgeR. Some more advanced downstream analysis steps are also covered, including complex comparisons, gene ontology enrichment analyses and gene set testing. The code required to run each step is described, along with an outline of the underlying theory. The chapter includes a case study in which the pipeline is used to study the expression profiles of mammary gland cells in virgin, pregnant and lactating mice.

  18. Illumina MiSeq sequencing analysis of fungal diversity in stored dates.

    Science.gov (United States)

    Al-Bulushi, Ismail M; Bani-Uraba, Muna S; Guizani, Nejib S; Al-Khusaibi, Mohammed K; Al-Sadi, Abdullah M

    2017-03-27

    Date palm has been a major fruit tree in the Middle East for thousands of years, especially in the Arabian Peninsula. Dates are consumed fresh (Rutab) or after partial drying and storage (Tamar) during the off-season. The aim of the study was to provide an in-depth analysis of fungal communities associated with the skin (outer part) and mesocarp (inner fleshy part) of stored dates (Tamar) of two cultivars (Khenizi and Burny) through the use of Illumina MiSeq sequencing. The study revealed the dominance of Ascomycota (94%) in both cultivars, followed by Chytridiomycota (4%) and Zygomycota (2%). Among the classes recovered, Eurotiomycetes, Dothideomycetes, Saccharomycetes and Sordariomycetes were the most dominant. A total of 54 fungal species were detected, with species belonging to Penicillium, Alternaria, Cladosporium and Aspergillus comprising more than 60% of the fungal reads. Some potentially mycotoxin-producing fungi were detected in stored dates, including Aspergillus flavus, A. versicolor and Penicillium citrinum, but their relative abundance was very limited. PerMANOVA analysis revealed non-significant differences in fungal communities between date parts or date cultivars, indicating that fungal species associated with the skin may also be detected in the mesocarp. It also indicates the possible contamination of dates from different cultivars with similar fungal species, even if they are obtained from different areas. The analysis shows the presence of different fungal species in dates. This appears to be the first study to report 25 new fungal species in Oman and 28 new fungal species from date fruits. The study discusses the sources of fungi on dates and the presence of potentially mycotoxin-producing fungi on date skin and mesocarp.

  19. Simultaneous NuSTAR/Chandra Observations of The Bursting Pulsar GRO J1744-28 During Its Third Reactivation

    DEFF Research Database (Denmark)

    Younes, G.; Kouveliotou, C.; Grefenstette, B. W.

    2015-01-01

    We report on a 10 ks simultaneous Chandra/High Energy Transmission Grating (HETG)-Nuclear Spectroscopic Telescope Array (NuSTAR) observation of the Bursting Pulsar, GRO J1744-28, during its third detected outburst since discovery and after nearly 18 yr of quiescence. The source is detected up to 60...

  20. BrAD-seq: Breath Adapter Directional sequencing: a streamlined, ultra-simple and fast library preparation protocol for strand specific mRNA library construction.

    Directory of Open Access Journals (Sweden)

    Brad Thomas Townsley

    2015-05-01

    Next Generation Sequencing (NGS) is driving rapid advancement in biological understanding, and RNA-sequencing (RNA-seq) has become an indispensable tool for biology and medicine. There is a growing need for access to these technologies, although preparation of NGS libraries remains a bottleneck to wider adoption. Here we report a novel method for the production of strand-specific RNA-seq libraries utilizing inherent properties of double-stranded cDNA to capture and incorporate a sequencing adapter. Breath Adapter Directional sequencing (BrAD-seq) reduces sample handling and requires far fewer enzymatic steps than most available methods to produce high quality strand-specific RNA-seq libraries. The method we present is optimized for 3-prime Digital Gene Expression (DGE) libraries, can easily extend to full transcript coverage shotgun (SHO) type strand-specific libraries, and is modularized to accommodate a diversity of RNA and DNA input materials. BrAD-seq offers a highly streamlined and inexpensive option for RNA-seq libraries.

  1. A tale of two sequences: microRNA-target chimeric reads.

    Science.gov (United States)

    Broughton, James P; Pasquinelli, Amy E

    2016-04-04

    In animals, a functional interaction between a microRNA (miRNA) and its target RNA requires only partial base pairing. The limited number of base pair interactions required for miRNA targeting provides miRNAs with broad regulatory potential and also makes target prediction challenging. Computational approaches to target prediction have focused on identifying miRNA target sites based on known sequence features that are important for canonical targeting and may miss non-canonical targets. Current state-of-the-art experimental approaches, such as CLIP-seq (cross-linking immunoprecipitation with sequencing), PAR-CLIP (photoactivatable-ribonucleoside-enhanced CLIP), and iCLIP (individual-nucleotide resolution CLIP), require inference of which miRNA is bound at each site. Recently, the development of methods to ligate miRNAs to their target RNAs during the preparation of sequencing libraries has provided a new tool for the identification of miRNA target sites. The chimeric, or hybrid, miRNA-target reads that are produced by these methods unambiguously identify the miRNA bound at a specific target site. The information provided by these chimeric reads has revealed extensive non-canonical interactions between miRNAs and their target mRNAs, and identified many novel interactions between miRNAs and noncoding RNAs.

  2. Characterization and Improvement of RNA-Seq Precision in Quantitative Transcript Expression Profiling

    Energy Technology Data Exchange (ETDEWEB)

    Labaj, Pawel P.; Leparc, German G.; Linggi, Bryan E.; Markillie, Lye Meng; Wiley, H. S.; Kreil, David P.

    2011-07-01

    Measurement precision determines the power of any analysis to reliably identify significant signals, such as in screens for differential expression, independent of whether the experimental design incorporates replicates or not. With the compilation of large scale RNA-Seq data sets with technical replicate samples, however, we can now, for the first time, perform a systematic analysis of the precision of expression level estimates from massively parallel sequencing technology. This then allows considerations for its improvement by computational or experimental means. Results: We report on a comprehensive study of target coverage and measurement precision, including their dependence on transcript expression levels, read depth and other parameters. In particular, an impressive target coverage of 84% of the estimated true transcript population could be achieved with 331 million 50 bp reads, with diminishing returns from longer read lengths and even less gains from increased sequencing depths. Most of the measurement power (75%) is spent on only 7% of the known transcriptome, however, making less strongly expressed transcripts harder to measure. Consequently, less than 30% of all transcripts could be quantified reliably with a relative error < 20%. Based on established tools, we then introduce a new approach for mapping and analyzing sequencing reads that yields substantially improved performance in gene expression profiling, increasing the number of transcripts that can reliably be quantified to over 40%. Extrapolations to higher sequencing depths highlight the need for efficient complementary steps. In discussion we outline possible experimental and computational strategies for further improvements in quantification precision.

  3. The accretion powered spin-up of GRO J1750-27

    DEFF Research Database (Denmark)

    Shaw, S.E.; Hill, A.B.; Kuulkers, E.

    2009-01-01

    The timing properties of the 4.45 s pulsar in the Be X-ray binary system GRO J1750-27 are examined using hard X-ray data from INTEGRAL and Swift during a type II outburst observed during 2008. The orbital parameters of the system are measured and agree well with those found during the last known ... outburst of the system in 1995. Correcting the effects of the Doppler shifting of the period, due to the orbital motion of the pulsar, leads to the detection of an intrinsic spin-up that is well described by a simple model including $\dot{P}$ and $\ddot{P}$ terms of $-7.5 \times 10^{-10}$ s s$^{-1}$ and $1 \times 10^{-16}$ s s$^{-2}$...

  4. From root to fruit: RNA-Seq analysis shows that arbuscular mycorrhizal symbiosis may affect tomato fruit metabolism.

    Science.gov (United States)

    Zouari, Inès; Salvioli, Alessandra; Chialva, Matteo; Novero, Mara; Miozzi, Laura; Tenore, Gian Carlo; Bagnaresi, Paolo; Bonfante, Paola

    2014-03-21

    Tomato (Solanum lycopersicum) establishes a beneficial symbiosis with arbuscular mycorrhizal (AM) fungi. The formation of the mycorrhizal association in the roots leads to plant-wide modulation of gene expression. To understand the systemic effect of the fungal symbiosis on the tomato fruit, we used RNA-Seq to perform global transcriptome profiling on Moneymaker tomato fruits at the turning ripening stage. Fruits were collected at 55 days after flowering, from plants colonized with Funneliformis mosseae and from control plants, which were fertilized to avoid responses related to nutrient deficiency. Transcriptome analysis identified 712 genes that are differentially expressed in fruits from mycorrhizal and control plants. Gene Ontology (GO) enrichment analysis of these genes showed 81 overrepresented functional GO classes. Up-regulated GO classes include photosynthesis, stress response, transport, amino acid synthesis and carbohydrate metabolism functions, suggesting a general impact of fungal symbiosis on primary metabolisms and, particularly, on mineral nutrition. Down-regulated GO classes include cell wall, metabolism and ethylene response pathways. Quantitative RT-PCR validated the RNA-Seq results for 12 genes out of 14 when tested at three fruit ripening stages, mature green, breaker and turning. Quantification of fruit nutraceutical and mineral contents produced values consistent with the expression changes observed by RNA-Seq analysis. This RNA-Seq profiling produced a novel data set that explores the intersection of mycorrhization and fruit development. We found that the fruits of mycorrhizal plants show two transcriptomic "signatures": genes characteristic of a climacteric fleshy fruit, and genes characteristic of mycorrhizal status, like phosphate and sulphate transporters. Moreover, mycorrhizal plants under low nutrient conditions produce fruits with a nutrient content similar to those from non-mycorrhizal plants under high nutrient conditions

  5. Resistance mechanisms to erlotinib in the non-small cell lung cancer cell line, HCC827 examined by RNA-seq

    DEFF Research Database (Denmark)

    Jacobsen, Kirstine; Alcaraz, Nicolas; Ditzel, Henrik

    Background: Erlotinib, an EGFR selective reversible inhibitor, has dramatically changed the treatment of non-small cell lung cancer (NSCLC) as approximately 70% of patients show significant tumor regression upon treatment. However, all patients eventually relapse due to development of acquired ... - in erlotinib-resistant subclones of the NSCLC cell line HCC827. Materials & Methods: We established 3 erlotinib-resistant subclones (resistant to 10, 20, 30 µM erlotinib, respectively), and prepared cDNA libraries of purified RNA from biological duplicates using TruSeq® Stranded Total RNA Ribo-Zero™ Gold ... (Illumina) prior to sequencing on an Illumina HiSeq platform (100 bp paired end). The resistant subclones were examined both in presence and absence of erlotinib. The data was analyzed by an in-house developed pipeline including quality control by Trim Galore v0.3.3, mapping of reads to HG19 by TopHat2 v.2...

  6. RNA-Seq Reveals Infection-Related Gene Expression Changes in Phytophthora capsici

    Science.gov (United States)

    Chen, Xiao-Ren; Xing, Yu-Ping; Li, Yan-Peng; Tong, Yun-Hui; Xu, Jing-You

    2013-01-01

    Phytophthora capsici is a soilborne plant pathogen capable of infecting a wide range of plants, including many solanaceous crops. However, genetic resistance and fungicides often fail to manage P. capsici due to limited knowledge of the molecular biology and basis of P. capsici pathogenicity. To begin to rectify this situation, Illumina RNA-Seq was used to perform massively parallel sequencing of three cDNA samples derived from P. capsici mycelia (MY), zoospores (ZO) and germinating cysts with germ tubes (GC). Over 11 million reads were generated for each cDNA library analyzed. After read mapping to the gene models of the P. capsici reference genome, 13,901, 14,633 and 14,695 putative genes were identified from the reads of the MY, ZO and GC libraries, respectively. Pairwise comparisons of the samples showed major differences between the expressed gene content of the MY, ZO and GC stages. A large number of genes associated with specific stages and pathogenicity were identified, including 98 predicted effector genes. The transcriptional levels of 19 effector genes during the developmental and host infection stages of P. capsici were validated by RT-PCR. Ectopic expression in Nicotiana benthamiana showed that P. capsici RXLR and Crinkler effectors can suppress host cell death triggered by diverse elicitors including P. capsici elicitin and NLP effectors. This study provides a first look at the transcriptome and effector arsenal of P. capsici during the important pre-infection stages. PMID:24019970

  7. QuASAR: quantitative allele-specific analysis of reads.

    Science.gov (United States)

    Harvey, Chris T; Moyerbrailean, Gregory A; Davis, Gordon O; Wen, Xiaoquan; Luca, Francesca; Pique-Regi, Roger

    2015-04-15

    Expression quantitative trait loci (eQTL) studies have discovered thousands of genetic variants that regulate gene expression, enabling a better understanding of the functional role of non-coding sequences. However, eQTL studies are costly, requiring large sample sizes and genome-wide genotyping of each sample. In contrast, analysis of allele-specific expression (ASE) is becoming a popular approach to detect the effect of genetic variation on gene expression, even within a single individual. This is typically achieved by counting the number of RNA-seq reads matching each allele at heterozygous sites and testing the null hypothesis of a 1:1 allelic ratio. In principle, when genotype information is not readily available, it could be inferred from the RNA-seq reads directly. However, there are currently no existing methods that jointly infer genotypes and conduct ASE inference, while considering uncertainty in the genotype calls. We present QuASAR, quantitative allele-specific analysis of reads, a novel statistical learning method for jointly detecting heterozygous genotypes and inferring ASE. The proposed ASE inference step takes into consideration the uncertainty in the genotype calls, while including parameters that model base-call errors in sequencing and allelic over-dispersion. We validated our method with experimental data for which high-quality genotypes are available. Results for an additional dataset with multiple replicates at different sequencing depths demonstrate that QuASAR is a powerful tool for ASE analysis when genotypes are not available. QuASAR is available at http://github.com/piquelab/QuASAR.
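
    The "typical" ASE test mentioned in the abstract above (counting reads per allele at a heterozygous site and testing a 1:1 ratio) can be sketched as an exact two-sided binomial test. This is a minimal illustration only, not QuASAR's method: it ignores genotype uncertainty, base-call errors and over-dispersion, and the function name `ase_binomial_p` and its parameters are hypothetical.

```python
from math import comb


def ase_binomial_p(ref_reads: int, alt_reads: int, p_ref: float = 0.5) -> float:
    """Exact two-sided binomial p-value for the null of a 1:1 allelic ratio.

    Hypothetical helper for illustration; intended for modest read depths,
    since every outcome probability is enumerated explicitly.
    """
    n = ref_reads + alt_reads
    if n == 0:
        return 1.0  # no reads, no evidence against the null
    # Probability of each possible reference-read count under the null.
    pmf = [comb(n, k) * p_ref**k * (1 - p_ref) ** (n - k) for k in range(n + 1)]
    observed = pmf[ref_reads]
    # Sum the outcomes that are at least as unlikely as the observed one
    # (small tolerance guards against floating-point ties).
    p_value = sum(p for p in pmf if p <= observed * (1 + 1e-9))
    return min(1.0, p_value)


if __name__ == "__main__":
    print(ase_binomial_p(10, 10))  # balanced counts
    print(ase_binomial_p(20, 0))   # fully imbalanced counts
```

    A site with balanced counts gives a p-value of 1, while strongly skewed counts give a small p-value; in practice, multiple-testing correction across sites would still be needed.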

  8. RNA-Seq as an Emerging Tool for Marine Dinoflagellate Transcriptome Analysis: Process and Challenges

    Directory of Open Access Journals (Sweden)

    Muhamad Afiq Akbar

    2018-01-01

    Dinoflagellates are a large group of marine phytoplankton, studied primarily for their symbiosis with coral reefs and their ability to form harmful algal blooms (HABs). Toxins produced by dinoflagellates during HABs have severe negative impacts on both the economy and the health sector. However, attempts to understand dinoflagellate genomic features are hindered by their complex genome organization. Transcriptomics has been employed to understand dinoflagellate genome structure and to profile genes and gene expression. RNA-seq is one of the latest methods for transcriptomic studies. This method is capable of profiling dinoflagellate transcriptomes and has several advantages, including high sensitivity, cost effectiveness and deeper sequence coverage. This review discusses the current workflow of dinoflagellate RNA-seq, which starts with the extraction of high-quality RNA and is followed by cDNA sequencing on a next-generation sequencing platform, transcriptome assembly and computational analysis. Considerations specific to RNA-seq are also highlighted, such as the difficulty of dinoflagellate sequence annotation, post-transcriptional activity and the effect of RNA pooling.

  9. Annotating and quantifying pri-miRNA transcripts using RNA-Seq data of wild type and serrate-1 globular stage embryos of Arabidopsis thaliana

    Directory of Open Access Journals (Sweden)

    Daniel Lepe-Soltero

    2017-12-01

    The genome annotation for the model plant Arabidopsis thaliana does not include the primary transcripts from which MIRNAs are processed. Here we present and analyze the raw mRNA sequencing data from wild type and serrate-1 globular stage embryos of A. thaliana, ecotype Columbia. Because SERRATE is required for pri-miRNA processing, these precursors accumulate in serrate-1 mutants, facilitating their detection using standard RNA-Seq protocols. We first use the mapping of the RNA-Seq reads to the reference genome to annotate the potential primary transcripts of MIRNAs expressed in the embryo. We then quantify these pri-miRNAs in wild type and serrate-1 mutants. Finally, we use differential expression analysis to determine which are up-regulated in serrate-1 compared to wild type, to select the best candidates for bona fide pri-miRNAs expressed in the globular stage embryos. In addition, we analyze a previously published RNA-Seq dataset of wild type and dicer-like 1 mutant embryos at the globular stage [1]. Our data are interpreted and discussed in a separate article [2].

  10. Annotating and quantifying pri-miRNA transcripts using RNA-Seq data of wild type and serrate-1 globular stage embryos of Arabidopsis thaliana.

    Science.gov (United States)

    Lepe-Soltero, Daniel; Armenta-Medina, Alma; Xiang, Daoquan; Datla, Raju; Gillmor, C Stewart; Abreu-Goodger, Cei

    2017-12-01

    The genome annotation for the model plant Arabidopsis thaliana does not include the primary transcripts from which MIRNAs are processed. Here we present and analyze the raw mRNA sequencing data from wild type and serrate-1 globular stage embryos of A. thaliana, ecotype Columbia. Because SERRATE is required for pri-miRNA processing, these precursors accumulate in serrate-1 mutants, facilitating their detection using standard RNA-Seq protocols. We first use the mapping of the RNA-Seq reads to the reference genome to annotate the potential primary transcripts of MIRNAs expressed in the embryo. We then quantify these pri-miRNAs in wild type and serrate-1 mutants. Finally, we use differential expression analysis to determine which are up-regulated in serrate-1 compared to wild type, to select the best candidates for bona fide pri-miRNAs expressed in the globular stage embryos. In addition, we analyze a previously published RNA-Seq dataset of wild type and dicer-like 1 mutant embryos at the globular stage [1]. Our data are interpreted and discussed in a separate article [2].

  11. An RNA-Seq transcriptome analysis of orthophosphate-deficient white lupin reveals novel insights into phosphorus acclimation in plants.

    Science.gov (United States)

    O'Rourke, Jamie A; Yang, S Samuel; Miller, Susan S; Bucciarelli, Bruna; Liu, Junqi; Rydeen, Ariel; Bozsoki, Zoltan; Uhde-Stone, Claudia; Tu, Zheng Jin; Allan, Deborah; Gronwald, John W; Vance, Carroll P

    2013-02-01

    Phosphorus, in its orthophosphate form (P(i)), is one of the most limiting macronutrients in soils for plant growth and development. However, the whole-genome molecular mechanisms contributing to plant acclimation to P(i) deficiency remain largely unknown. White lupin (Lupinus albus) has evolved unique adaptations for growth in P(i)-deficient soils, including the development of cluster roots to increase root surface area. In this study, we utilized RNA-Seq technology to assess global gene expression in white lupin cluster roots, normal roots, and leaves in response to P(i) supply. We de novo assembled 277,224,180 Illumina reads from 12 complementary DNA libraries to build what is to our knowledge the first white lupin gene index (LAGI 1.0). This index contains 125,821 unique sequences with an average length of 1,155 bp. Of these sequences, 50,734 were transcriptionally active (reads per kilobase per million reads ≥ 3), representing approximately 7.8% of the white lupin genome, using the predicted genome size of Lupinus angustifolius as a reference. We identified a total of 2,128 sequences differentially expressed in response to P(i) deficiency with a 2-fold or greater change and P ≤ 0.05. Twelve sequences were consistently differentially expressed due to P(i) deficiency stress in three species, Arabidopsis (Arabidopsis thaliana), potato (Solanum tuberosum), and white lupin, making them ideal candidates to monitor the P(i) status of plants. Additionally, classic physiological experiments were coupled with RNA-Seq data to examine the role of cytokinin and gibberellic acid in P(i) deficiency-induced cluster root development. This global gene expression analysis provides new insights into the biochemical and molecular mechanisms involved in the acclimation to P(i) deficiency.
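
    The activity criterion quoted in the abstract above (reads per kilobase per million reads ≥ 3) can be illustrated with a short sketch. It uses the standard RPKM definition; the function names and the example counts are hypothetical and are not taken from the study's pipeline.

```python
def rpkm(read_count: int, length_bp: int, total_mapped_reads: int) -> float:
    """Reads per kilobase of sequence per million mapped reads."""
    if length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("length and library size must be positive")
    # Equivalent to read_count / (length_bp / 1e3) / (total_mapped_reads / 1e6).
    return read_count * 1e9 / (length_bp * total_mapped_reads)


def is_transcriptionally_active(value: float, threshold: float = 3.0) -> bool:
    """Apply the RPKM >= 3 cutoff quoted in the abstract."""
    return value >= threshold


if __name__ == "__main__":
    # Hypothetical sequence: 300 reads on a 1,000 bp sequence in a 10 M read library.
    value = rpkm(300, 1_000, 10_000_000)
    print(value, is_transcriptionally_active(value))
```

    Normalizing by both sequence length and library size is what makes a single cutoff comparable across sequences and libraries.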

  12. HMCan: A method for detecting chromatin modifications in cancer samples using ChIP-seq data

    KAUST Repository

    Ashoor, Haitham

    2013-09-09

    Motivation: Cancer cells are often characterized by epigenetic changes, which include aberrant histone modifications. In particular, local or regional epigenetic silencing is a common mechanism in cancer for silencing expression of tumor suppressor genes. Though several tools have been created to enable detection of histone marks in ChIP-seq data from normal samples, it is unclear whether these tools can be efficiently applied to ChIP-seq data generated from cancer samples. Indeed, cancer genomes are often characterized by frequent copy number alterations: gains and losses of large regions of chromosomal material. Copy number alterations may create a substantial statistical bias in the evaluation of histone mark signal enrichment and result in underdetection of the signal in the regions of loss and overdetection of the signal in the regions of gain. Results: We present HMCan (Histone modifications in cancer), a tool specially designed to analyze histone modification ChIP-seq data produced from cancer genomes. HMCan corrects for the GC-content and copy number bias and then applies Hidden Markov Models to detect the signal from the corrected data. On simulated data, HMCan outperformed several commonly used tools developed to analyze histone modification data produced from genomes without copy number alterations. HMCan also showed superior results on a ChIP-seq dataset generated for the repressive histone mark H3K27me3 in a bladder cancer cell line. HMCan predictions matched well with experimental data (qPCR validated regions) and included, for example, the previously detected H3K27me3 mark in the promoter of the DLEC1 gene, missed by other tools we tested.

  13. Getting the most out of RNA-seq data analysis

    Directory of Open Access Journals (Sweden)

    Tsung Fei Khang

    2015-10-01

    Background. A common research goal in transcriptome projects is to find genes that are differentially expressed in different phenotype classes. Biologists might wish to validate such gene candidates experimentally, or use them for downstream systems biology analysis. Producing a coherent differential gene expression analysis from RNA-seq count data requires an understanding of how numerous sources of variation such as the replicate size, the hypothesized biological effect size, and the specific method for making differential expression calls interact. We believe an explicit demonstration of such interactions in real RNA-seq data sets is of practical interest to biologists. Results. Using two large public RNA-seq data sets, one representing strong and another mild biological effect size, we simulated different replicate size scenarios, and tested the performance of several commonly-used methods for calling differentially expressed genes in each of them. We found that, when biological effect size was mild, RNA-seq experiments should focus on experimental validation of differentially expressed gene candidates. Importantly, at least triplicates must be used, and the differentially expressed genes should be called using methods with high positive predictive value (PPV), such as NOISeq or GFOLD. In contrast, when biological effect size was strong, differentially expressed genes mined from unreplicated experiments using NOISeq, ASC and GFOLD had between 30 and 50% mean PPV, an increase of more than 30-fold compared to the cases of mild biological effect size. Among methods with good PPV performance, having triplicates or more substantially improved mean PPV to over 90% for GFOLD, 60% for DESeq2, 50% for NOISeq, and 30% for edgeR. At a replicate size of six, we found DESeq2 and edgeR to be reasonable methods for calling differentially expressed genes at systems level analysis, as their PPV and sensitivity trade-off were superior to the other methods.
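
    The two metrics compared in the abstract above, positive predictive value (PPV) and sensitivity, can be computed from a set of called genes and a reference set as sketched below. The definitions are the standard ones; the function name `ppv_and_sensitivity` and the gene identifiers are hypothetical, and how the study defined its reference set is not stated in this record.

```python
def ppv_and_sensitivity(called: set[str], truth: set[str]) -> tuple[float, float]:
    """Return (PPV, sensitivity) for a set of differentially expressed gene calls.

    PPV         = TP / (TP + FP): fraction of calls that are correct.
    Sensitivity = TP / (TP + FN): fraction of true genes that were called.
    """
    true_positives = len(called & truth)
    ppv = true_positives / len(called) if called else 0.0
    sensitivity = true_positives / len(truth) if truth else 0.0
    return ppv, sensitivity


if __name__ == "__main__":
    # Hypothetical gene identifiers, for illustration only.
    calls = {"geneA", "geneB", "geneC", "geneD"}
    reference = {"geneA", "geneB", "geneE"}
    print(ppv_and_sensitivity(calls, reference))
```

    A method that calls very few genes can reach a high PPV at low sensitivity, which is why the abstract discusses the trade-off between the two rather than either metric alone.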

  14. Co-administration of rIpaB domain of Shigella with rGroEL of S. Typhi enhances the immune responses and protective efficacy against Shigella infection.

    Science.gov (United States)

    Chitradevi, Sekar Tamil Selvi; Kaur, Gurpreet; Uppalapati, Sivaramakrishna; Yadav, Anandprakash; Singh, Dependrapratap; Bansal, Anju

    2015-11-01

    Shigella species cause severe bacillary dysentery in humans and are associated with high morbidity and mortality. The Invasion plasmid antigen (IpaB) protein, which is conserved across all Shigella spp., induces macrophage cell death and is required to invade host cells. The present study evaluates the immunogenicity and protective efficacy of the recombinant (r) domain region of IpaB (rIpaB) of S. flexneri. rIpaB was administered either alone or was co-administered with the rGroEL (heat shock protein 60) protein from S. Typhi as an adjuvant in a mouse model of intranasal immunization. The IpaB domain region (37 kDa) of S. flexneri was amplified from an invasion plasmid, cloned, expressed in BL21 Escherichia coli cells and purified. Immunization with the rIpaB domain alone stimulated both humoral and cell-mediated immune responses. Furthermore, robust antibody (IgG, IgA) and T-cell responses were induced when the rIpaB domain was co-administered with rGroEL. Antibody isotyping revealed higher IgG1 and IgG2a antibody titers and increased interferon-gamma (IFN-γ) secretion in the co-administered group. Immunization of mice with the rIpaB domain alone protected 60%-70% of the mice from lethal infection by S. flexneri, S. boydii and S. sonnei, whereas co-administration with rGroEL increased the protective efficacy to 80%-85%. Organ burden and histopathological studies also revealed a significant reduction in lung infection in the co-immunized mice compared with mice immunized with the rIpaB domain alone. This study emphasizes that the co-administration of the rIpaB domain and rGroEL protein improves immune responses in mice and increases protective efficacy against Shigella infection. This is also the first report to evaluate the potential of the GroEL (Hsp 60) protein of S. Typhi as an adjuvant molecule, thereby overcoming the need for commercial adjuvants.

  15. NBLDA: negative binomial linear discriminant analysis for RNA-Seq data.

    Science.gov (United States)

    Dong, Kai; Zhao, Hongyu; Tong, Tiejun; Wan, Xiang

    2016-09-13

    RNA-sequencing (RNA-Seq) has become a powerful technology to characterize gene expression profiles because it is more accurate and comprehensive than microarrays. Although statistical methods that have been developed for microarray data can be applied to RNA-Seq data, they are not ideal due to the discrete nature of RNA-Seq data. The Poisson distribution and negative binomial distribution are commonly used to model count data. Recently, Witten (Annals Appl Stat 5:2493-2518, 2011) proposed a Poisson linear discriminant analysis for RNA-Seq data. The Poisson assumption may not be as appropriate as the negative binomial distribution when biological replicates are available and in the presence of overdispersion (i.e., when the variance is larger than the mean). However, it is more complicated to model negative binomial variables because they involve a dispersion parameter that needs to be estimated. In this paper, we propose a negative binomial linear discriminant analysis for RNA-Seq data. By Bayes' rule, we construct the classifier by fitting a negative binomial model, and propose some plug-in rules to estimate the unknown parameters in the classifier. The relationship between the negative binomial classifier and the Poisson classifier is explored, with a numerical investigation of the impact of dispersion on the discriminant score. Simulation results show the superiority of our proposed method. We also analyze two real RNA-Seq data sets to demonstrate the advantages of our method in real-world applications. We have developed a new classifier using the negative binomial model for RNA-seq data classification. Our simulation results show that our proposed classifier has a better performance than existing works. The proposed classifier can serve as an effective tool for classifying RNA-seq data. Based on the comparison results, we have provided some guidelines for scientists to decide which method should be used in the discriminant analysis of RNA-Seq data.
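
    The idea of a Bayes-rule classifier built on a negative binomial model can be sketched as follows. This is a simplified illustration rather than the authors' NBLDA: it assumes a single dispersion value `phi` supplied by the user, ignores per-sample size factors, and uses plain class means in place of the paper's plug-in estimators. All function names are hypothetical.

```python
import math

def nb_logpmf(x, mu, phi):
    """Log-probability of count x under a negative binomial with mean mu
    and dispersion phi, so that variance = mu + phi * mu**2."""
    r = 1.0 / phi
    return (math.lgamma(x + r) - math.lgamma(r) - math.lgamma(x + 1)
            + x * math.log(phi * mu / (1.0 + phi * mu))
            - r * math.log(1.0 + phi * mu))

def fit_class_means(counts, labels, pseudo=0.5):
    """Per-class, per-gene mean counts; the pseudo-count keeps means > 0."""
    means = {}
    for k in sorted(set(labels)):
        rows = [c for c, y in zip(counts, labels) if y == k]
        n_genes = len(rows[0])
        means[k] = [(sum(r[g] for r in rows) + pseudo) / len(rows)
                    for g in range(n_genes)]
    return means

def classify(x, means, priors, phi):
    """Assign x to the class with the largest log-posterior score."""
    scores = {
        k: math.log(priors[k])
           + sum(nb_logpmf(xg, mg, phi) for xg, mg in zip(x, means[k]))
        for k in means
    }
    return max(scores, key=scores.get)
```

    As `phi` approaches zero the negative binomial log-probability approaches the Poisson one, which mirrors the relationship between the two classifiers that the abstract mentions.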

  16. RNA-Seq transcriptomics and pathway analyses reveal potential regulatory genes and molecular mechanisms in high- and low-residual feed intake in Nordic dairy cattle

    DEFF Research Database (Denmark)

    Salleh, M. S.; Mazzoni, G.; Höglund, J. K.

    2017-01-01

    -throughput RNA sequencing data of liver biopsies from 19 dairy cows were used to identify differentially expressed genes (DEGs) between high- and low-FE groups of cows (based on Residual Feed Intake or RFI). Subsequently, a profile of the pathways connecting the DEGs to FE was generated, and a list of candidate...... genes and biomarkers was derived for their potential inclusion in breeding programmes to improve FE. The bovine RNA-Seq gene expression data from the liver was analysed to identify DEGs and, subsequently, identify the molecular mechanisms, pathways and possible candidate biomarkers of feed efficiency....... On average, 57 million reads (short reads or short mRNA sequences ...

  17. RNA-Seq profiling reveals novel hepatic gene expression pattern in aflatoxin B1 treated rats.

    Science.gov (United States)

    Merrick, B Alex; Phadke, Dhiral P; Auerbach, Scott S; Mav, Deepak; Stiegelmeyer, Suzy M; Shah, Ruchir R; Tice, Raymond R

    2013-01-01

    Deep sequencing was used to investigate the subchronic effects of 1 ppm aflatoxin B1 (AFB1), a potent hepatocarcinogen, on the male rat liver transcriptome prior to onset of histopathological lesions or tumors. We hypothesized RNA-Seq would reveal more differentially expressed genes (DEG) than microarray analysis, including low copy and novel transcripts related to AFB1's carcinogenic activity compared to feed controls (CTRL). Paired-end reads were mapped to the rat genome (Rn4) with TopHat and further analyzed by DESeq and Cufflinks-Cuffdiff pipelines to identify differentially expressed transcripts, new exons and unannotated transcripts. PCA and cluster analysis of DEGs showed clear separation between AFB1 and CTRL treatments and concordance among group replicates. qPCR of eight high and medium DEGs and three low DEGs showed good comparability among RNA-Seq and microarray transcripts. DESeq analysis identified 1,026 differentially expressed transcripts at greater than two-fold change (p<0.005) compared to 626 transcripts by microarray due to base pair resolution of transcripts by RNA-Seq, probe placement within transcripts or an absence of probes to detect novel transcripts, splice variants and exons. Pathway analysis among DEGs revealed signaling of Ahr, Nrf2, GSH, xenobiotic, cell cycle, extracellular matrix, and cell differentiation networks consistent with pathways leading to AFB1 carcinogenesis, including almost 200 upregulated transcripts controlled by E2f1-related pathways related to kinetochore structure, mitotic spindle assembly and tissue remodeling. We report 49 novel, differentially-expressed transcripts including confirmation by PCR-cloning of two unique, unannotated, hepatic AFB1-responsive transcripts (HAfT's) on chromosomes 1.q55 and 15.q11, overexpressed by 10 to 25-fold. Several potentially novel exons were found and exon refinements were made including AFB1 exon-specific induction of homologous family members, Ugt1a6 and Ugt1a7c. We find the

  18. Suppression of the Escherichia coli ssb-1 mutation by an allele of groEL.

    OpenAIRE

    Ruben, S M; VanDenBrink-Webb, S E; Rein, D C; Meyer, R R

    1988-01-01

    A series of spontaneous suppressors to the temperature-sensitive phenotype of the single-stranded DNA-binding protein mutation ssb-1 were isolated. A genomic library of EcoRI fragments from one of these suppressor strains was prepared by using pBR325 as the cloning vector. A 10.0-kilobase class of inserts was identified as carrying the ssb-1 gene itself. A second class of 8.3-kilobase inserts was shown to contain the groE region by (i) restriction analysis, (ii) Southern hybridization of the ...

  19. Improved annotation of 3' untranslated regions and complex loci by combination of strand-specific direct RNA sequencing, RNA-Seq and ESTs.

    Directory of Open Access Journals (Sweden)

    Nicholas J Schurch

    The reference annotations made for a genome sequence provide the framework for all subsequent analyses of the genome. Correct and complete annotation in addition to the underlying genomic sequence is particularly important when interpreting the results of RNA-seq experiments where short sequence reads are mapped against the genome and assigned to genes according to the annotation. Inconsistencies in annotations between the reference and the experimental system can lead to incorrect interpretation of the effect on RNA expression of an experimental treatment or mutation in the system under study. Until recently, the genome-wide annotation of 3' untranslated regions received less attention than coding regions and the delineation of intron/exon boundaries. In this paper, data produced for samples in Human, Chicken and A. thaliana by the novel single-molecule, strand-specific, Direct RNA Sequencing technology from Helicos Biosciences, which locates 3' polyadenylation sites to within +/- 2 nt, were combined with archival EST and RNA-Seq data. Nine examples are illustrated where this combination of data allowed: (1) gene and 3' UTR re-annotation (including extension of one 3' UTR by 5.9 kb); (2) disentangling of gene expression in complex regions; (3) clearer interpretation of small RNA expression and (4) identification of novel genes. While the specific examples displayed here may become obsolete as genome sequences and their annotations are refined, the principles laid out in this paper will be of general use both to those annotating genomes and those seeking to interpret existing publicly available annotations in the context of their own experimental data.

  20. AtRTD2: A Reference Transcript Dataset for accurate quantification of alternative splicing and expression changes in Arabidopsis thaliana RNA-seq data

    KAUST Repository

    Zhang, Runxuan

    2016-05-06

    Background Alternative splicing is the major post-transcriptional mechanism by which gene expression is regulated and affects a wide range of processes and responses in most eukaryotic organisms. RNA-sequencing (RNA-seq) can generate genome-wide quantification of individual transcript isoforms to identify changes in expression and alternative splicing. RNA-seq is an essential modern tool but its ability to accurately quantify transcript isoforms depends on the diversity, completeness and quality of the transcript information. Results We have developed a new Reference Transcript Dataset for Arabidopsis (AtRTD2) for RNA-seq analysis containing over 82k non-redundant transcripts, whereby 74,194 transcripts originate from 27,667 protein-coding genes. A total of 13,524 protein-coding genes have at least one alternatively spliced transcript in AtRTD2 such that about 60% of the 22,453 protein-coding, intron-containing genes in Arabidopsis undergo alternative splicing. More than 600 putative U12 introns were identified in more than 2,000 transcripts. AtRTD2 was generated from transcript assemblies of ca. 8.5 billion pairs of reads from 285 RNA-seq data sets obtained from 129 RNA-seq libraries and merged along with the previous version, AtRTD, and Araport11 transcript assemblies. AtRTD2 increases the diversity of transcripts and through application of stringent filters represents the most extensive and accurate transcript collection for Arabidopsis to date. We have demonstrated a generally good correlation of alternative splicing ratios from RNA-seq data analysed by Salmon and experimental data from high resolution RT-PCR. However, we have observed inaccurate quantification of transcript isoforms for genes with multiple transcripts which have variation in the lengths of their UTRs. This variation is not effectively corrected in RNA-seq analysis programmes and will therefore impact RNA-seq analyses generally. To address this, we have tested different genome

  1. Der Ausbau des Hochrheins zur Schifffahrtsstraße - Die Geschichte eines gescheiterten Großprojekts

    OpenAIRE

    Steiner, Rudolf

    2006-01-01

    This thesis examines the genesis, decline and failure of one of the most ambitious large-scale engineering projects of the 20th century in southern Germany: the development of the stretch of the Rhine between Basel and Lake Constance into a navigable waterway. Although there had been several unsuccessful attempts since the Middle Ages to overcome the numerous natural obstacles on the High Rhine (Hochrhein), as this stretch is known, in order to allow continuous navigation as far as Lake Constance ...

  2. Synergistic Roles of Helicobacter pylori Methionine Sulfoxide Reductase and GroEL in Repairing Oxidant-damaged Catalase

    Science.gov (United States)

    Mahawar, Manish; Tran, ViLinh; Sharp, Joshua S.; Maier, Robert J.

    2011-01-01

    Hypochlorous acid (HOCl) produced via the enzyme myeloperoxidase is a major antibacterial oxidant produced by neutrophils, and Met residues are considered primary amino acid targets of HOCl damage via conversion to Met sulfoxide. Met sulfoxide can be repaired back to Met by methionine sulfoxide reductase (Msr). Catalase is an important antioxidant enzyme; we show it constitutes 4–5% of the total Helicobacter pylori protein levels. msr and katA strains were about 14- and 4-fold, respectively, more susceptible than the parent to killing by the neutrophil cell line HL-60 cells. Catalase activity of an msr strain was much more reduced by HOCl exposure than for the parental strain. Treatment of pure catalase with HOCl caused oxidation of specific MS-identified Met residues, as well as structural changes and activity loss depending on the oxidant dose. Treatment of catalase with HOCl at a level to limit structural perturbation (at a catalase/HOCl molar ratio of 1:60) resulted in oxidation of six identified Met residues. Msr repaired these residues in an in vitro reconstituted system, but no enzyme activity could be recovered. However, addition of GroEL to the Msr repair mixture significantly enhanced catalase activity recovery. Neutrophils produce large amounts of HOCl at inflammation sites, and bacterial catalase may be a prime target of the host inflammatory response; at high concentrations of HOCl (1:100), we observed loss of catalase secondary structure, oligomerization, and carbonylation. The same HOCl-sensitive Met residue oxidation targets in catalase were detected using chloramine-T as a milder oxidant. PMID:21460217

  3. An RNA-Seq Transcriptome Analysis of Orthophosphate-Deficient White Lupin Reveals Novel Insights into Phosphorus Acclimation in Plants

    Science.gov (United States)

    O’Rourke, Jamie A.; Yang, S. Samuel; Miller, Susan S.; Bucciarelli, Bruna; Liu, Junqi; Rydeen, Ariel; Bozsoki, Zoltan; Uhde-Stone, Claudia; Tu, Zheng Jin; Allan, Deborah; Gronwald, John W.; Vance, Carroll P.

    2013-01-01

    Phosphorus, in its orthophosphate form (Pi), is one of the most limiting macronutrients in soils for plant growth and development. However, the whole-genome molecular mechanisms contributing to plant acclimation to Pi deficiency remain largely unknown. White lupin (Lupinus albus) has evolved unique adaptations for growth in Pi-deficient soils, including the development of cluster roots to increase root surface area. In this study, we utilized RNA-Seq technology to assess global gene expression in white lupin cluster roots, normal roots, and leaves in response to Pi supply. We de novo assembled 277,224,180 Illumina reads from 12 complementary DNA libraries to build what is to our knowledge the first white lupin gene index (LAGI 1.0). This index contains 125,821 unique sequences with an average length of 1,155 bp. Of these sequences, 50,734 were transcriptionally active (reads per kilobase per million reads ≥ 3), representing approximately 7.8% of the white lupin genome, using the predicted genome size of Lupinus angustifolius as a reference. We identified a total of 2,128 sequences differentially expressed in response to Pi deficiency with a 2-fold or greater change and P ≤ 0.05. Twelve sequences were consistently differentially expressed due to Pi deficiency stress in three species, Arabidopsis (Arabidopsis thaliana), potato (Solanum tuberosum), and white lupin, making them ideal candidates to monitor the Pi status of plants. Additionally, classic physiological experiments were coupled with RNA-Seq data to examine the role of cytokinin and gibberellic acid in Pi deficiency-induced cluster root development. This global gene expression analysis provides new insights into the biochemical and molecular mechanisms involved in the acclimation to Pi deficiency. PMID:23197803
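
    The activity cutoff quoted above (reads per kilobase per million reads ≥ 3) follows the standard RPKM formula, sketched here in Python. The function names and the example inputs are hypothetical, and the sketch assumes the total used for scaling is the number of mapped reads; the study's exact counting rules are not given in the abstract.

```python
def rpkm(read_count, length_bp, total_mapped_reads):
    """Reads per kilobase of sequence per million mapped reads."""
    return read_count * 1e9 / (length_bp * total_mapped_reads)

def transcriptionally_active(counts, lengths, total_mapped_reads, threshold=3.0):
    """Names of sequences whose RPKM meets or exceeds the threshold."""
    return [
        name for name in counts
        if rpkm(counts[name], lengths[name], total_mapped_reads) >= threshold
    ]
```

    Normalizing by both sequence length and library size is what lets a single cutoff be applied across sequences of different lengths and across libraries of different depth.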

  4. Electron microscopy and image analysis of the GroEL-like protein and its complexes with glutamine synthetase from pea leaves

    NARCIS (Netherlands)

    Tsuprun, Vladimir L.; Boekema, Egbert J.; Pushkin, Alexander V.; Tagunova, Irina V.

    1992-01-01

    The molecular structure of groEL-like protein from pea leaves has been studied by electron microscopy and image analysis of negatively stained particles. Over 1500 molecular projections were selected and classified by multivariate statistical analysis. It was shown that the molecule consists of 14

  5. Comparison of RNA-seq and microarray-based models for clinical endpoint prediction.

    Science.gov (United States)

    Zhang, Wenqian; Yu, Ying; Hertwig, Falk; Thierry-Mieg, Jean; Zhang, Wenwei; Thierry-Mieg, Danielle; Wang, Jian; Furlanello, Cesare; Devanarayan, Viswanath; Cheng, Jie; Deng, Youping; Hero, Barbara; Hong, Huixiao; Jia, Meiwen; Li, Li; Lin, Simon M; Nikolsky, Yuri; Oberthuer, André; Qing, Tao; Su, Zhenqiang; Volland, Ruth; Wang, Charles; Wang, May D; Ai, Junmei; Albanese, Davide; Asgharzadeh, Shahab; Avigad, Smadar; Bao, Wenjun; Bessarabova, Marina; Brilliant, Murray H; Brors, Benedikt; Chierici, Marco; Chu, Tzu-Ming; Zhang, Jibin; Grundy, Richard G; He, Min Max; Hebbring, Scott; Kaufman, Howard L; Lababidi, Samir; Lancashire, Lee J; Li, Yan; Lu, Xin X; Luo, Heng; Ma, Xiwen; Ning, Baitang; Noguera, Rosa; Peifer, Martin; Phan, John H; Roels, Frederik; Rosswog, Carolina; Shao, Susan; Shen, Jie; Theissen, Jessica; Tonini, Gian Paolo; Vandesompele, Jo; Wu, Po-Yen; Xiao, Wenzhong; Xu, Joshua; Xu, Weihong; Xuan, Jiekun; Yang, Yong; Ye, Zhan; Dong, Zirui; Zhang, Ke K; Yin, Ye; Zhao, Chen; Zheng, Yuanting; Wolfinger, Russell D; Shi, Tieliu; Malkas, Linda H; Berthold, Frank; Wang, Jun; Tong, Weida; Shi, Leming; Peng, Zhiyu; Fischer, Matthias

    2015-06-25

    Gene expression profiling is being widely applied in cancer research to identify biomarkers for clinical endpoint prediction. Since RNA-seq provides a powerful tool for transcriptome-based applications beyond the limitations of microarrays, we sought to systematically evaluate the performance of RNA-seq-based and microarray-based classifiers in this MAQC-III/SEQC study for clinical endpoint prediction using neuroblastoma as a model. We generate gene expression profiles from 498 primary neuroblastomas using both RNA-seq and 44 k microarrays. Characterization of the neuroblastoma transcriptome by RNA-seq reveals that more than 48,000 genes and 200,000 transcripts are being expressed in this malignancy. We also find that RNA-seq provides much more detailed information on specific transcript expression patterns in clinico-genetic neuroblastoma subgroups than microarrays. To systematically compare the power of RNA-seq and microarray-based models in predicting clinical endpoints, we divide the cohort randomly into training and validation sets and develop 360 predictive models on six clinical endpoints of varying predictability. Evaluation of factors potentially affecting model performances reveals that prediction accuracies are most strongly influenced by the nature of the clinical endpoint, whereas technological platforms (RNA-seq vs. microarrays), RNA-seq data analysis pipelines, and feature levels (gene vs. transcript vs. exon-junction level) do not significantly affect performances of the models. We demonstrate that RNA-seq outperforms microarrays in determining the transcriptomic characteristics of cancer, while RNA-seq and microarray-based models perform similarly in clinical endpoint prediction. Our findings may be valuable to guide future studies on the development of gene expression-based predictive models and their implementation in clinical practice.

  6. ReliefSeq: a gene-wise adaptive-K nearest-neighbor feature selection tool for finding gene-gene interactions and main effects in mRNA-Seq gene expression data.

    Directory of Open Access Journals (Sweden)

    Brett A McKinney

    Relief-F is a nonparametric, nearest-neighbor machine learning method that has been successfully used to identify relevant variables that may interact in complex multivariate models to explain phenotypic variation. While several tools have been developed for assessing differential expression in sequence-based transcriptomics, the detection of statistical interactions between transcripts has received less attention in the area of RNA-seq analysis. We describe a new extension and assessment of Relief-F for feature selection in RNA-seq data. The ReliefSeq implementation adapts the number of nearest neighbors (k) for each gene to optimize the Relief-F test statistics (importance scores) for finding both main effects and interactions. We compare this gene-wise adaptive-k (gwak) Relief-F method with standard RNA-seq feature selection tools, such as DESeq and edgeR, and with the popular machine learning method Random Forests. We demonstrate performance on a panel of simulated data that have a range of distributional properties reflected in real mRNA-seq data including multiple transcripts with varying sizes of main effects and interaction effects. For simulated main effects, gwak-Relief-F feature selection performs comparably to standard tools DESeq and edgeR for ranking relevant transcripts. For gene-gene interactions, gwak-Relief-F outperforms all comparison methods at ranking relevant genes in all but the highest fold change/highest signal situations where it performs similarly. The gwak-Relief-F algorithm outperforms Random Forests for detecting relevant genes in all simulation experiments. In addition, Relief-F is comparable to the other methods based on computational time. We also apply ReliefSeq to an RNA-Seq study of smallpox vaccine to identify gene expression changes between vaccinia virus-stimulated and unstimulated samples. ReliefSeq is an attractive tool for inclusion in the suite of tools used for analysis of mRNA-Seq data; it has power to
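
    The nearest-neighbor scoring that Relief-F builds on can be shown with a minimal sketch of the original two-class Relief with a single nearest hit and miss (k = 1). This is not the gwak-Relief-F method or the ReliefSeq code: it does not adapt k per gene, does not rescale features, and assumes every class has at least two samples. The function name is hypothetical.

```python
def relief_weights(X, y):
    """Basic Relief importance scores for a two-class problem.

    X: list of samples, each a list of numeric feature values.
    y: list of class labels, one per sample.
    A feature gains weight when it differs at the nearest sample of the
    other class (miss) and loses weight when it differs at the nearest
    sample of the same class (hit).
    """
    n, n_features = len(X), len(X[0])
    weights = [0.0] * n_features

    def dist(a, b):
        # Manhattan distance across all features.
        return sum(abs(p - q) for p, q in zip(a, b))

    for i in range(n):
        hits = [j for j in range(n) if j != i and y[j] == y[i]]
        misses = [j for j in range(n) if y[j] != y[i]]
        hit = min(hits, key=lambda j: dist(X[i], X[j]))
        miss = min(misses, key=lambda j: dist(X[i], X[j]))
        for f in range(n_features):
            weights[f] += (abs(X[i][f] - X[miss][f])
                           - abs(X[i][f] - X[hit][f])) / n
    return weights
```

    Because neighbors are chosen using all features at once, a feature can score well through its joint behavior with others, which is why this family of methods is used to look for interactions as well as main effects.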

  7. Strawberry: Fast and accurate genome-guided transcript reconstruction and quantification from RNA-Seq.

    Science.gov (United States)

    Liu, Ruolin; Dickerson, Julie

    2017-11-01

    We propose a novel method and software tool, Strawberry, for transcript reconstruction and quantification from RNA-Seq data under the guidance of genome alignment and independent of gene annotation. Strawberry consists of two modules: assembly and quantification. The novelty of Strawberry is that the two modules use different optimization frameworks but utilize the same data graph structure, which allows a highly efficient, expandable and accurate algorithm for dealing with large data. The assembly module parses aligned reads into splicing graphs, and uses network flow algorithms to select the most likely transcripts. The quantification module uses a latent class model to assign read counts from the nodes of splicing graphs to transcripts. Strawberry simultaneously estimates the transcript abundances and corrects for sequencing bias through an EM algorithm. Based on simulations, Strawberry outperforms Cufflinks and StringTie in terms of both assembly and quantification accuracies. When evaluated on a real data set, the transcript expression estimated by Strawberry has the highest correlation with Nanostring probe counts, an independent experimental measure of transcript expression. Strawberry is written in C++14, and is available as open source software at https://github.com/ruolin/strawberry under the MIT license.
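
    How an EM algorithm can assign ambiguous read counts to transcripts is sketched below in Python. This is a generic, simplified illustration and not Strawberry's implementation (which is written in C++): it ignores transcript length and sequencing bias, and it treats each group of reads compatible with the same set of transcripts as one class. The function name and data layout are hypothetical.

```python
def em_abundances(class_counts, n_iter=200):
    """Estimate relative transcript abundances by EM.

    class_counts maps a tuple of compatible transcript names to the
    number of reads compatible with exactly that set of transcripts.
    Returns a dict of abundances that sum to 1.
    """
    transcripts = sorted({t for cls in class_counts for t in cls})
    theta = {t: 1.0 / len(transcripts) for t in transcripts}
    total = sum(class_counts.values())
    for _ in range(n_iter):
        # E-step: split each class's reads in proportion to current theta.
        expected = {t: 0.0 for t in transcripts}
        for cls, count in class_counts.items():
            norm = sum(theta[t] for t in cls)
            for t in cls:
                expected[t] += count * theta[t] / norm
        # M-step: abundances are the normalized expected read counts.
        theta = {t: expected[t] / total for t in transcripts}
    return theta
```

    For example, with 30 reads unique to transcript A, 10 unique to B, and 20 compatible with both, the estimates settle at 0.75 for A and 0.25 for B, because the shared reads are split in the same 3:1 ratio as the unique evidence.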

  8. The morphogenesis of herpes simplex virus type 1 in infected parental mouse L fibroblasts and mutant gro29 cells

    DEFF Research Database (Denmark)

    Jensen, Helle Lone; Norrild, Bodil

    2003-01-01

    Mutants of cell lines and viruses are important biological tools. The pathways of herpesvirus particle maturation and egress are contentious issues. The mutant gro29 line of mouse L cells is defective for egress of herpes simplex virus type 1 (HSV-1) virions, and a candidate for studies of virus...

  9. Computational Methods for ChIP-seq Data Analysis and Applications

    KAUST Repository

    Ashoor, Haitham

    2017-04-25

    The development of chromatin immunoprecipitation followed by sequencing (ChIP-seq) technology has enabled the construction of genome-wide maps of protein-DNA interaction. Such maps provide information about transcriptional regulation at the epigenetic level (histone modifications and histone variants) and at the level of transcription factor (TF) activity. This dissertation presents novel computational methods for ChIP-seq data analysis and applications. The work of this dissertation addresses four main challenges. First, I address the problem of detecting histone modifications from ChIP-seq cancer samples. The presence of copy number variations (CNVs) in cancer samples results in statistical biases that lead to inaccurate predictions when standard methods are used. To overcome this issue I developed HMCan, a specially designed algorithm to handle ChIP-seq cancer data by accounting for the presence of CNVs. When using ChIP-seq data from cancer cells, HMCan demonstrates unbiased and accurate predictions compared to standard state-of-the-art methods. Second, I address the problem of identifying changes in histone modifications between two ChIP-seq samples with different genetic backgrounds (for example cancer vs. normal). In addition to CNVs, differences in antibody efficiency between samples and the presence of sample replicates are challenges for this problem. To overcome these issues, I developed the HMCan-diff algorithm as an extension to HMCan. HMCan-diff implements robust normalization methods to address the challenges listed above. HMCan-diff significantly outperforms other state-of-the-art methods on data containing cancer samples. Third, I investigate and analyze predictions of different methods for enhancer prediction based on ChIP-seq data. The analysis shows that predictions generated by different methods overlap poorly. To overcome this issue, I developed DENdb, a database that integrates enhancer predictions from different methods. DENdb also

  10. TRANSIT--A Software Tool for Himar1 TnSeq Analysis.

    Directory of Open Access Journals (Sweden)

    Michael A DeJesus

    2015-10-01

    TnSeq has become a popular technique for determining the essentiality of genomic regions in bacterial organisms. Several methods have been developed to analyze the wealth of data that has been obtained through TnSeq experiments. We developed a tool for analyzing Himar1 TnSeq data called TRANSIT. TRANSIT provides a graphical interface to three different statistical methods for analyzing TnSeq data. These methods cover a variety of approaches capable of identifying essential genes in individual datasets as well as comparative analysis between conditions. We demonstrate the utility of this software by analyzing TnSeq datasets of M. tuberculosis grown on glycerol and cholesterol. We show that TRANSIT can be used to discover genes which have been previously implicated for growth on these carbon sources. TRANSIT is written in Python, and thus can be run on Windows, OSX and Linux platforms. The source code is distributed under the GNU GPL v3 license and can be obtained from the following GitHub repository: https://github.com/mad-lab/transit.

  11. Simultaneous and complete genome sequencing of influenza A and B with high coverage by Illumina MiSeq Platform.

    Science.gov (United States)

    Rutvisuttinunt, Wiriya; Chinnawirotpisan, Piyawan; Simasathien, Sriluck; Shrestha, Sanjaya K; Yoon, In-Kyu; Klungthong, Chonticha; Fernandez, Stefan

    2013-11-01

    Active global surveillance and characterization of influenza viruses are essential for better preparation against possible pandemic events. Obtaining comprehensive information about the influenza genome can improve our understanding of the evolution of influenza viruses and emergence of new strains, and improve the accuracy when designing preventive vaccines. This study investigated the use of deep sequencing by the next-generation sequencing (NGS) Illumina MiSeq Platform to obtain complete genome sequence information from influenza virus isolates. The influenza virus isolates were cultured from 6 acute respiratory clinical specimens collected in Thailand and Nepal. DNA libraries obtained from each viral isolate were mixed and all were sequenced simultaneously. A total of 2.6 Gbases of sequence was obtained at a cluster density of 455±14 K/mm2, with 95.76% (8,571,655/8,950,724 clusters) of the clusters passing quality control (QC) filters. Approximately 93.7% of all sequences from Read1 and 83.5% from Read2 contained high quality sequences that were ≥Q30, a base calling QC score standard. Alignment analysis identified three seasonal influenza A H3N2 strains, one 2009 pandemic influenza A H1N1 strain and two influenza B strains. The nearly complete genomes of all six virus isolates were covered at a sequence depth of 600-fold or greater. The MiSeq Platform efficiently identified seasonal influenza A H3N2, 2009 pandemic influenza A H1N1 and influenza B in the DNA library mixtures. Copyright © 2013 The Authors. Published by Elsevier B.V. All rights reserved.
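
    The pass-filter percentage reported above can be checked from the two cluster counts given in the abstract. The short Python sketch below uses a hypothetical helper name; only the two counts come from the source.

```python
def percent_passing_filter(passing_clusters, total_clusters):
    """Percentage of sequencing clusters that passed the QC filters."""
    return 100.0 * passing_clusters / total_clusters

# Cluster counts as quoted in the abstract.
PASSING_CLUSTERS = 8_571_655
TOTAL_CLUSTERS = 8_950_724
pass_rate = round(percent_passing_filter(PASSING_CLUSTERS, TOTAL_CLUSTERS), 2)
```

    Rounded to two decimals, the computed rate agrees with the 95.76% stated in the abstract.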

  12. An Integrated Approach for RNA-seq Data Normalization.

    Science.gov (United States)

    Yang, Shengping; Mercante, Donald E; Zhang, Kun; Fang, Zhide

    2016-01-01

    DNA copy number alteration is common in many cancers. Studies have shown that insertion or deletion of DNA sequences can directly alter gene expression, and significant correlation exists between DNA copy number and gene expression. Data normalization is a critical step in the analysis of gene expression generated by RNA-seq technology. Successful normalization reduces or removes unwanted nonbiological variations in the data, while keeping meaningful information intact. However, as far as we know, no attempt has been made to adjust for the variation due to DNA copy number changes in RNA-seq data normalization. In this article, we propose an integrated approach for RNA-seq data normalization. Comparisons show that the proposed normalization can improve power for downstream differentially expressed gene detection and generate more biologically meaningful results in gene profiling. In addition, our findings show that due to the effects of copy number changes, some housekeeping genes are not always suitable internal controls for studying gene expression. By using information from DNA copy number, the integrated approach succeeds in reducing noise due to both biological and nonbiological causes in RNA-seq data, thus increasing the accuracy of gene profiling.

  13. A combination of luxR1 and luxR2 genes activates Pr-promoters of psychrophilic Aliivibrio logei lux-operon independently of chaperonin GroEL/ES and protease Lon at high concentrations of autoinducer.

    Science.gov (United States)

    Konopleva, Maria N; Khrulnova, Svetlana A; Baranova, Ancha; Ekimov, Leonid V; Bazhenov, Sergey V; Goryanin, Ignatiy I; Manukhov, Ilya V

    2016-05-13

    The lux-operon of the psychrophilic bacterium Aliivibrio logei contains two copies of luxR and is regulated by Type I quorum sensing (QS). Activation of the A. logei lux-operon by LuxR1 requires about 100 times higher concentrations of autoinducer (AI) than activation by LuxR2. On the other hand, LuxR1 does not require GroEL/ES chaperonin for its folding and cannot be degraded by protease Lon, while LuxR2 is sensitive to Lon and requires GroEL/ES. Here we show that at AI concentrations of 10^-5 to 10^-4 M a combination of luxR1 and luxR2 products is capable of activating the Pr-promoters of A. logei lux-operon in Escherichia coli independently of GroEL/ES and protease Lon. The presence of LuxR1 assists LuxR2 in gro(-) cells when AI is added at high concentration, while at low concentration of AI in a cell LuxR1 decreases the LuxR2 activity. These observations may be explained by the formation of LuxR1/LuxR2 heterodimers that act in complex with AI independently of GroEL/ES and protease Lon. This study expands current understanding of QS regulation in A. logei as it implies cooperative regulation of lux-operon by LuxR1 and LuxR2 proteins. Copyright © 2016 Elsevier Inc. All rights reserved.

  14. Low-coverage MiSeq next generation sequencing reveals the mitochondrial genome of the Eastern Rock Lobster, Sagmariasus verreauxi.

    Science.gov (United States)

    Doyle, Stephen R; Griffith, Ian S; Murphy, Nick P; Strugnell, Jan M

    2015-01-01

    The complete mitochondrial genome of the Eastern Rock lobster, Sagmariasus verreauxi, is reported for the first time. Using low-coverage, long read MiSeq next generation sequencing, we constructed and determined the mtDNA genome organization of the 15,470 bp sequence from two isolates from Eastern Tasmania, Australia and Northern New Zealand, and identified 46 polymorphic nucleotides between the two sequences. This genome sequence and its genetic polymorphisms will likely be useful in understanding the distribution and population connectivity of the Eastern Rock Lobster, and in the fisheries management of this commercially important species.

  15. Co-overexpression of bacterial GroESL chaperonins partly overcomes non-productive folding and tetramer assembly of E. coli-expressed human medium-chain acyl-CoA dehydrogenase (MCAD) carrying the prevalent disease-causing K304E mutation

    DEFF Research Database (Denmark)

    Bross, P; Andresen, B S; Winter, V

    1993-01-01

    , tetramer formation and yield of enzyme activity of wild-type MCAD is largely independent of GroESL co-overexpression; (ii) the larger part of the K304Q mutant is insoluble without and solubility is enhanced with GroESL co-overexpression; solubility correlates with the amount of tetramer detected...... and the enzyme activity measured as observed for the wild-type protein. (iii) Solubility of the K304E mutant is in a similar fashion GroESL responsive as the K304Q mutant, but the amount of tetramer observed and the enzyme activity measured do not correlate with the amount of soluble K304E MCAD protein detected...

  16. Assessing Reliability and Validity of the "GroPromo" Audit Tool for Evaluation of Grocery Store Marketing and Promotional Environments

    Science.gov (United States)

    Kerr, Jacqueline; Sallis, James F.; Bromby, Erica; Glanz, Karen

    2012-01-01

    Objective: To evaluate reliability and validity of a new tool for assessing the placement and promotional environment in grocery stores. Methods: Trained observers used the "GroPromo" instrument in 40 stores to code the placement of 7 products in 9 locations within a store, along with other promotional characteristics. To test construct validity,…

  17. Selecting chemical and ecotoxicological test batteries for risk assessment of trace element-contaminated soils (phyto)managed by gentle remediation options (GRO).

    Science.gov (United States)

    Kumpiene, Jurate; Bert, Valérie; Dimitriou, Ioannis; Eriksson, Jan; Friesl-Hanl, Wolfgang; Galazka, Rafal; Herzig, Rolf; Janssen, Jolien; Kidd, Petra; Mench, Michel; Müller, Ingo; Neu, Silke; Oustriere, Nadège; Puschenreiter, Markus; Renella, Giancarlo; Roumier, Pierre-Hervé; Siebielec, Grzegorz; Vangronsveld, Jaco; Manier, Nicolas

    2014-10-15

    During the past decades a number of field trials with gentle remediation options (GRO) have been established on trace element (TE) contaminated sites throughout Europe. Each research group selects different methods to assess the remediation success making it difficult to compare efficacy between various sites and treatments. This study aimed at selecting a minimum risk assessment battery combining chemical and ecotoxicological assays for assessing and comparing the effectiveness of GRO implemented in seven European case studies. Two test batteries were pre-selected; a chemical one for quantifying TE exposure in untreated soils and GRO-managed soils and a biological one for characterizing soil functionality and ecotoxicity. Soil samples from field studies representing one of the main GROs (phytoextraction in Belgium, Sweden, Germany and Switzerland, aided phytoextraction in France, and aided phytostabilization or in situ stabilization/phytoexclusion in Poland, France and Austria) were collected and assessed using the selected test batteries. The best correlations were obtained between NH4NO3-extractable, followed by NaNO3-extractable TE and the ecotoxicological responses. Biometrical parameters and biomarkers of dwarf beans were the most responsive indicators for the soil treatments and changes in soil TE exposures. Plant growth was inhibited at the higher extractable TE concentrations, while plant stress enzyme activities increased with the higher TE extractability. Based on these results, a minimum risk assessment battery to compare/biomonitor the sites phytomanaged by GROs might consist of the NH4NO3 extraction and the bean Plantox test including the stress enzyme activities. Copyright © 2014 Elsevier B.V. All rights reserved.

  18. Single-tube linear DNA amplification (LinDA) for robust ChIP-seq

    NARCIS (Netherlands)

    Shankaranarayanan, P.; Mendoza-Parra, M.A.; Walia, M.; Wang, L.; Li, N.; Trindade, L.M.; Gronemeyer, H.

    2011-01-01

    Genome-wide profiling of transcription factors based on massive parallel sequencing of immunoprecipitated chromatin (ChIP-seq) requires nanogram amounts of DNA. Here we describe a high-fidelity, single-tube linear DNA amplification method (LinDA) for ChIP-seq and reChIP-seq with picogram DNA amounts

  19. In Silico Pooling of ChIP-seq Control Experiments

    Science.gov (United States)

    Sun, Guannan; Srinivasan, Rajini; Lopez-Anido, Camila; Hung, Holly A.; Svaren, John; Keleş, Sündüz

    2014-01-01

    As next generation sequencing technologies are becoming more economical, large-scale ChIP-seq studies are enabling the investigation of the roles of transcription factor binding and epigenome on phenotypic variation. Studying such variation requires individual level ChIP-seq experiments. Standard designs for ChIP-seq experiments employ a paired control per ChIP-seq sample. Genomic coverage for control experiments is often sacrificed to increase the resources for ChIP samples. However, the quality of ChIP-enriched regions identifiable from a ChIP-seq experiment depends on the quality and the coverage of the control experiments. Insufficient coverage leads to loss of power in detecting enrichment. We investigate the effect of in silico pooling of control samples within multiple biological replicates, multiple treatment conditions, and multiple cell lines and tissues across multiple datasets with varying levels of genomic coverage. Our computational studies suggest guidelines for performing in silico pooling of control experiments. Using vast amounts of ENCODE data, we show that pairwise correlations between control samples originating from multiple biological replicates, treatments, and cell lines/tissues can be grouped into two classes representing whether or not in silico pooling leads to power gain in detecting enrichment between the ChIP and the control samples. Our findings have important implications for multiplexing samples. PMID:25380244

  20. Using RNA-Seq to assemble a rose transcriptome with more than 13,000 full-length expressed genes and to develop the WagRhSNP 68k Axiom SNP array for rose (Rosa L.)

    NARCIS (Netherlands)

    Koning, C.F.S.; Esselink, G.; Vukosavljev, M.; Westende, van 't W.P.C.; Gitonga, V.W.; Krens, F.A.; Voorrips, R.E.; Weg, van de W.E.; Schulz, D.; Debener, T.; Maliepaard, C.A.; Arens, P.F.P.; Smulders, M.J.M.

    2015-01-01

    In order to develop a versatile and large SNP array for rose, we set out to mine ESTs from diverse sets of rose germplasm. For this RNA-Seq libraries containing about 700 million reads were generated from tetraploid cut and garden roses using Illumina paired-end sequencing, and from diploid Rosa

  1. Marginal likelihood estimation of negative binomial parameters with applications to RNA-seq data.

    Science.gov (United States)

    León-Novelo, Luis; Fuentes, Claudio; Emerson, Sarah

    2017-10-01

    RNA-Seq data characteristically exhibits large variances, which need to be appropriately accounted for in any proposed model. We first explore the effects of this variability on the maximum likelihood estimator (MLE) of the dispersion parameter of the negative binomial distribution, and propose instead to use an estimator obtained via maximization of the marginal likelihood in a conjugate Bayesian framework. We show, via simulation studies, that the marginal MLE can better control this variation and produce a more stable and reliable estimator. We then formulate a conjugate Bayesian hierarchical model, and use this new estimator to propose a Bayesian hypothesis test to detect differentially expressed genes in RNA-Seq data. We use numerical studies to show that our much simpler approach is competitive with other negative binomial based procedures, and we use a real data set to illustrate the implementation and flexibility of the procedure. © The Author 2017. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
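
    To make the role of the dispersion parameter concrete, the sketch below uses the common negative binomial parameterization Var = mu + phi * mu^2 and a simple method-of-moments estimate of phi. This is an illustration only: it is neither the MLE nor the marginal-likelihood estimator proposed in the abstract, and the function name is hypothetical.

```python
from statistics import mean, variance


def nb_dispersion_moments(counts):
    """Method-of-moments estimate of the negative binomial dispersion phi.

    Assumes the parameterization Var = mu + phi * mu**2, so that
    phi = (Var - mu) / mu**2. This is NOT the marginal-likelihood
    estimator of the abstract; it only illustrates what phi measures.
    """
    mu = mean(counts)
    if mu == 0:
        raise ValueError("mean count is zero; dispersion is undefined")
    var = variance(counts)  # sample variance (n - 1 denominator)
    # Clip at zero: a sample variance below the mean shows no overdispersion.
    return max((var - mu) / mu ** 2, 0.0)
```

    With only a few replicates per gene, an estimate of this kind varies strongly from sample to sample, which is the instability the abstract sets out to control.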

  2. Cardinality enhancement utilizing Sequential Algorithm (SeQ) code in OCDMA system

    Directory of Open Access Journals (Sweden)

    Fazlina C. A. S.

    2017-01-01

    Optical Code Division Multiple Access (OCDMA) has become important with the increasing demand for high capacity and speed in optical network communication, because of the high efficiency the OCDMA technique can achieve, so that fibre bandwidth is fully used. In this paper we focus on the Sequential Algorithm (SeQ) code with the AND detection technique, using the Optisystem design tool. The results reveal that the SeQ code is capable of eliminating Multiple Access Interference (MAI) and of improving Bit Error Rate (BER), Phase Induced Intensity Noise (PIIN) and orthogonality between users in the system. SeQ shows good BER performance and can accommodate 190 simultaneous users, in contrast with existing codes. Thus, the SeQ code enhances the system by about 36% and 111% relative to the FCC and DCS codes. In addition, SeQ has a good BER performance of 10^-25 at 155 Mbps in comparison with bit rates of 622 Mbps, 1 Gbps and 2 Gbps. From the plotted graph, a bit rate of 155 Mbps is a suitable speed for FTTH and LAN networks. A conclusion can be drawn from the superior performance of the SeQ code. Thus, these codes offer an opportunity for better quality of service in OCDMA systems in optical access networks for future generations' usage

  3. Cardinality enhancement utilizing Sequential Algorithm (SeQ) code in OCDMA system

    Science.gov (United States)

    Fazlina, C. A. S.; Rashidi, C. B. M.; Rahman, A. K.; Aljunid, S. A.

    2017-11-01

    Optical Code Division Multiple Access (OCDMA) has become important with the increasing demand for high capacity and speed in optical network communication, because of the high efficiency the OCDMA technique can achieve, so that fibre bandwidth is fully used. In this paper we focus on the Sequential Algorithm (SeQ) code with the AND detection technique, using the Optisystem design tool. The results reveal that the SeQ code is capable of eliminating Multiple Access Interference (MAI) and of improving Bit Error Rate (BER), Phase Induced Intensity Noise (PIIN) and orthogonality between users in the system. SeQ shows good BER performance and can accommodate 190 simultaneous users, in contrast with existing codes. Thus, the SeQ code enhances the system by about 36% and 111% relative to the FCC and DCS codes. In addition, SeQ has a good BER performance of 10^-25 at 155 Mbps in comparison with bit rates of 622 Mbps, 1 Gbps and 2 Gbps. From the plotted graph, a bit rate of 155 Mbps is a suitable speed for FTTH and LAN networks. A conclusion can be drawn from the superior performance of the SeQ code. Thus, these codes offer an opportunity for better quality of service in OCDMA systems in optical access networks for future generations' usage

  4. Practical guidelines for the comprehensive analysis of ChIP-seq data.

    Directory of Open Access Journals (Sweden)

    Timothy Bailey

    Mapping the chromosomal locations of transcription factors, nucleosomes, histone modifications, chromatin remodeling enzymes, chaperones, and polymerases is one of the key tasks of modern biology, as evidenced by the Encyclopedia of DNA Elements (ENCODE) Project. To this end, chromatin immunoprecipitation followed by high-throughput sequencing (ChIP-seq) is the standard methodology. Mapping such protein-DNA interactions in vivo using ChIP-seq presents multiple challenges not only in sample preparation and sequencing but also for computational analysis. Here, we present step-by-step guidelines for the computational analysis of ChIP-seq data. We address all the major steps in the analysis of ChIP-seq data: sequencing depth selection, quality checking, mapping, data normalization, assessment of reproducibility, peak calling, differential binding analysis, controlling the false discovery rate, peak annotation, visualization, and motif analysis. At each step in our guidelines we discuss some of the software tools most frequently used. We also highlight the challenges and problems associated with each step in ChIP-seq data analysis. We present a concise workflow for the analysis of ChIP-seq data in Figure 1 that complements and expands on the recommendations of the ENCODE and modENCODE projects. Each step in the workflow is described in detail in the following sections.
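
    One of the steps listed above, controlling the false discovery rate, can be illustrated with the Benjamini-Hochberg step-up procedure. The sketch below is a generic version applied to a list of per-peak p-values. The abstract does not say which FDR method the guidelines recommend, so the choice of procedure and the function name are assumptions.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans: True where the hypothesis is rejected.

    Generic Benjamini-Hochberg step-up procedure; shown as an example of
    FDR control, not as the method prescribed by the guidelines.
    """
    m = len(p_values)
    # Indices of the p-values in ascending order.
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        # Track the largest rank k with p_(k) <= alpha * k / m.
        if p_values[idx] <= alpha * rank / m:
            cutoff_rank = rank
    rejected = [False] * m
    for idx in order[:cutoff_rank]:
        rejected[idx] = True
    return rejected
```

    For example, `benjamini_hochberg([0.001, 0.8, 0.03, 0.5])` keeps only the first peak at an FDR of 0.05, because 0.03 exceeds its rank-2 threshold of 0.025.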

  5. Granatum: a graphical single-cell RNA-Seq analysis pipeline for genomics scientists.

    Science.gov (United States)

    Zhu, Xun; Wolfgruber, Thomas K; Tasato, Austin; Arisdakessian, Cédric; Garmire, David G; Garmire, Lana X

    2017-12-05

    Single-cell RNA sequencing (scRNA-Seq) is an increasingly popular platform to study heterogeneity at the single-cell level. Computational methods to process scRNA-Seq data are not very accessible to bench scientists as they require a significant amount of bioinformatic skills. We have developed Granatum, a web-based scRNA-Seq analysis pipeline to make analysis more broadly accessible to researchers. Without a single line of programming code, users can click through the pipeline, setting parameters and visualizing results via the interactive graphical interface. Granatum conveniently walks users through various steps of scRNA-Seq analysis. It has a comprehensive list of modules, including plate merging and batch-effect removal, outlier-sample removal, gene-expression normalization, imputation, gene filtering, cell clustering, differential gene expression analysis, pathway/ontology enrichment analysis, protein network interaction visualization, and pseudo-time cell series construction. Granatum enables broad adoption of scRNA-Seq technology by empowering bench scientists with an easy-to-use graphical interface for scRNA-Seq data analysis. The package is freely available for research use at http://garmiregroup.org/granatum/app.

  6. Discovery of Protein–lncRNA Interactions by Integrating Large-Scale CLIP-Seq and RNA-Seq Datasets

    Energy Technology Data Exchange (ETDEWEB)

    Li, Jun-Hao; Liu, Shun; Zheng, Ling-Ling; Wu, Jie; Sun, Wen-Ju; Wang, Ze-Lin; Zhou, Hui; Qu, Liang-Hu, E-mail: lssqlh@mail.sysu.edu.cn; Yang, Jian-Hua, E-mail: lssqlh@mail.sysu.edu.cn [RNA Information Center, Key Laboratory of Gene Engineering of the Ministry of Education, State Key Laboratory for Biocontrol, Sun Yat-sen University, Guangzhou (China)

    2015-01-14

    Long non-coding RNAs (lncRNAs) are emerging as important regulatory molecules in developmental, physiological, and pathological processes. However, the precise mechanism and functions of most of lncRNAs remain largely unknown. Recent advances in high-throughput sequencing of immunoprecipitated RNAs after cross-linking (CLIP-Seq) provide powerful ways to identify biologically relevant protein–lncRNA interactions. In this study, by analyzing millions of RNA-binding protein (RBP) binding sites from 117 CLIP-Seq datasets generated by 50 independent studies, we identified 22,735 RBP–lncRNA regulatory relationships. We found that one single lncRNA will generally be bound and regulated by one or multiple RBPs, the combination of which may coordinately regulate gene expression. We also revealed the expression correlation of these interaction networks by mining expression profiles of over 6000 normal and tumor samples from 14 cancer types. Our combined analysis of CLIP-Seq data and genome-wide association studies data discovered hundreds of disease-related single nucleotide polymorphisms resided in the RBP binding sites of lncRNAs. Finally, we developed interactive web implementations to provide visualization, analysis, and downloading of the aforementioned large-scale datasets. Our study represented an important step in identification and analysis of RBP–lncRNA interactions and showed that these interactions may play crucial roles in cancer and genetic diseases.

  7. Discovery of Protein–lncRNA Interactions by Integrating Large-Scale CLIP-Seq and RNA-Seq Datasets

    International Nuclear Information System (INIS)

    Li, Jun-Hao; Liu, Shun; Zheng, Ling-Ling; Wu, Jie; Sun, Wen-Ju; Wang, Ze-Lin; Zhou, Hui; Qu, Liang-Hu; Yang, Jian-Hua

    2015-01-01

    Long non-coding RNAs (lncRNAs) are emerging as important regulatory molecules in developmental, physiological, and pathological processes. However, the precise mechanism and functions of most of lncRNAs remain largely unknown. Recent advances in high-throughput sequencing of immunoprecipitated RNAs after cross-linking (CLIP-Seq) provide powerful ways to identify biologically relevant protein–lncRNA interactions. In this study, by analyzing millions of RNA-binding protein (RBP) binding sites from 117 CLIP-Seq datasets generated by 50 independent studies, we identified 22,735 RBP–lncRNA regulatory relationships. We found that one single lncRNA will generally be bound and regulated by one or multiple RBPs, the combination of which may coordinately regulate gene expression. We also revealed the expression correlation of these interaction networks by mining expression profiles of over 6000 normal and tumor samples from 14 cancer types. Our combined analysis of CLIP-Seq data and genome-wide association studies data discovered hundreds of disease-related single nucleotide polymorphisms resided in the RBP binding sites of lncRNAs. Finally, we developed interactive web implementations to provide visualization, analysis, and downloading of the aforementioned large-scale datasets. Our study represented an important step in identification and analysis of RBP–lncRNA interactions and showed that these interactions may play crucial roles in cancer and genetic diseases.

  8. RNA-Seq profiling reveals novel hepatic gene expression pattern in aflatoxin B1 treated rats.

    Directory of Open Access Journals (Sweden)

    B Alex Merrick

    Deep sequencing was used to investigate the subchronic effects of 1 ppm aflatoxin B1 (AFB1), a potent hepatocarcinogen, on the male rat liver transcriptome prior to onset of histopathological lesions or tumors. We hypothesized RNA-Seq would reveal more differentially expressed genes (DEG) than microarray analysis, including low copy and novel transcripts related to AFB1's carcinogenic activity compared to feed controls (CTRL). Paired-end reads were mapped to the rat genome (Rn4) with TopHat and further analyzed by DESeq and Cufflinks-Cuffdiff pipelines to identify differentially expressed transcripts, new exons and unannotated transcripts. PCA and cluster analysis of DEGs showed clear separation between AFB1 and CTRL treatments and concordance among group replicates. qPCR of eight high and medium DEGs and three low DEGs showed good comparability among RNA-Seq and microarray transcripts. DESeq analysis identified 1,026 differentially expressed transcripts at greater than two-fold change (p<0.005) compared to 626 transcripts by microarray due to base pair resolution of transcripts by RNA-Seq, probe placement within transcripts or an absence of probes to detect novel transcripts, splice variants and exons. Pathway analysis among DEGs revealed signaling of Ahr, Nrf2, GSH, xenobiotic, cell cycle, extracellular matrix, and cell differentiation networks consistent with pathways leading to AFB1 carcinogenesis, including almost 200 upregulated transcripts controlled by E2f1-related pathways related to kinetochore structure, mitotic spindle assembly and tissue remodeling. We report 49 novel, differentially-expressed transcripts including confirmation by PCR-cloning of two unique, unannotated, hepatic AFB1-responsive transcripts (HAfT's) on chromosomes 1.q55 and 15.q11, overexpressed by 10 to 25-fold. Several potentially novel exons were found and exon refinements were made including AFB1 exon-specific induction of homologous family members, Ugt1a6 and Ugt1a7c

  9. The Bursting Pulsar GRO J1744-28: the Slowest Transitional Pulsar?

    Science.gov (United States)

    Court, J. M. C.; Altamirano, D.; Sanna, A.

    2018-04-01

    GRO J1744-28 (the Bursting Pulsar) is a neutron star LMXB which shows highly structured X-ray variability near the end of its X-ray outbursts. In this letter we show that this variability is analogous to that seen in Transitional Millisecond Pulsars such as PSR J1023+0038: `missing link' systems consisting of a pulsar nearing the end of its recycling phase. As such, we show that the Bursting Pulsar may also be associated with this class of objects. We discuss the implications of this scenario; in particular, we discuss the fact that the Bursting Pulsar has a significantly higher spin period and magnetic field than any other known Transitional Pulsar. If the Bursting Pulsar is indeed transitional, then this source opens a new window of opportunity to test our understanding of these systems in an entirely unexplored physical regime.

  10. B/Ordering in der Großregion. Mobilitäten – Grenzen – Identitäten

    OpenAIRE

    Wille, Christian

    2014-01-01

    Actors in regional-policy cooperation in the Greater Region often invoke the notion of a cross-border identity when taking stock of progress in cooperation. This contribution discusses the (im)possibility of such an identity on the basis of empirical findings. To this end it considers cross-border commuters, who are especially suspected of developing a cross-border identity. It also looks at the residents of Luxembourg, who owing to ...

  11. SeqAn An efficient, generic C++ library for sequence analysis

    Directory of Open Access Journals (Sweden)

    Rausch Tobias

    2008-01-01

    Background: The use of novel algorithmic techniques is pivotal to many important problems in life science. For example, the sequencing of the human genome [1] would not have been possible without advanced assembly algorithms. However, owing to the high speed of technological progress and the urgent need for bioinformatics tools, there is a widening gap between state-of-the-art algorithmic techniques and the actual algorithmic components of tools that are in widespread use. Results: To remedy this trend we propose the use of SeqAn, a library of efficient data types and algorithms for sequence analysis in computational biology. SeqAn comprises implementations of existing, practical state-of-the-art algorithmic components to provide a sound basis for algorithm testing and development. In this paper we describe the design and content of SeqAn and demonstrate its use by giving two examples. In the first example we show an application of SeqAn as an experimental platform by comparing different exact string matching algorithms. The second example is a simple version of the well-known MUMmer tool rewritten in SeqAn. Results indicate that our implementation is very efficient and versatile to use. Conclusion: We anticipate that SeqAn greatly simplifies the rapid development of new bioinformatics tools by providing a collection of readily usable, well-designed algorithmic components which are fundamental for the field of sequence analysis. This leverages not only the implementation of new algorithms, but also enables a sound analysis and comparison of existing algorithms.
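
    As an illustration of the kind of exact string matching algorithm such a comparison might include, the sketch below implements Horspool's algorithm. It is written in Python for brevity and is not SeqAn code. The abstract does not name the algorithms compared, so the choice of Horspool and the function name are assumptions.

```python
def horspool_search(text, pattern):
    """Return the start positions of all occurrences of pattern in text.

    Horspool's algorithm: after each comparison window, skip ahead by a
    shift that depends only on the text character aligned with the last
    pattern position. Illustrative sketch, not the SeqAn implementation.
    """
    m, n = len(pattern), len(text)
    if m == 0:
        return []
    # Shift table: distance from the rightmost occurrence of each character
    # (excluding the final pattern position) to the end of the pattern.
    shift = {ch: m - 1 - i for i, ch in enumerate(pattern[:-1])}
    hits = []
    pos = 0
    while pos <= n - m:
        if text[pos:pos + m] == pattern:
            hits.append(pos)
        # Characters absent from the table allow a full-length jump.
        pos += shift.get(text[pos + m - 1], m)
    return hits
```

    On DNA-sized alphabets the shifts are usually short, which is one reason a library benefits from making several matching algorithms interchangeable and easy to benchmark.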

  12. NGScloud: RNA-seq analysis of non-model species using cloud computing.

    Science.gov (United States)

    Mora-Márquez, Fernando; Vázquez-Poletti, José Luis; López de Heredia, Unai

    2018-05-03

    RNA-seq analysis usually requires large computing infrastructures. NGScloud is a bioinformatic system developed to analyze RNA-seq data using the cloud computing services of Amazon that permit the access to ad hoc computing infrastructure scaled according to the complexity of the experiment, so its costs and times can be optimized. The application provides a user-friendly front-end to operate Amazon's hardware resources, and to control a workflow of RNA-seq analysis oriented to non-model species, incorporating the cluster concept, which allows parallel runs of common RNA-seq analysis programs in several virtual machines for faster analysis. NGScloud is freely available at https://github.com/GGFHF/NGScloud/. A manual detailing installation and how-to-use instructions is available with the distribution. Contact: unai.lopezdeheredia@upm.es.

  13. Identification of genes related to drought in native potatoes using RNA-Seq

    Directory of Open Access Journals (Sweden)

    Roberto Lozano

    2014-03-01

    The recent advent of RNA sequencing technology (RNA-Seq), a massively parallel sequencing method for transcriptome analysis, provides an opportunity to understand the expression profile of plants in response to biotic and abiotic stress. In this study, mRNA from leaves and roots of two native potato varieties was sequenced at different levels of drought. Fifty-base-pair reads from whole mRNAs were mapped to the potato genomic sequence: 75–82% mapped uniquely to the genome, 6–14% mapped to several locations in the genome and 9–12% had no match in the genome. Comparing expression profiles, 887 to 1925 genes were found to be induced/repressed by drought in the sensitive variety and 998 to 1995 in the tolerant one. This research provides valuable information for future studies and deeper understanding of the molecular mechanism of drought resistance in potato and related species.

  14. The Impact of Normalization Methods on RNA-Seq Data Analysis

    Science.gov (United States)

    Zyprych-Walczak, J.; Szabelska, A.; Handschuh, L.; Górczak, K.; Klamecka, K.; Figlerowicz, M.; Siatkowski, I.

    2015-01-01

    High-throughput sequencing technologies, such as the Illumina Hi-seq, are powerful new tools for investigating a wide range of biological and medical problems. Massive and complex data sets produced by the sequencers create a need for development of statistical and computational methods that can tackle the analysis and management of data. The data normalization is one of the most crucial steps of data processing and this process must be carefully considered as it has a profound effect on the results of the analysis. In this work, we focus on a comprehensive comparison of five normalization methods related to sequencing depth, widely used for transcriptome sequencing (RNA-seq) data, and their impact on the results of gene expression analysis. Based on this study, we suggest a universal workflow that can be applied for the selection of the optimal normalization procedure for any particular data set. The described workflow includes calculation of the bias and variance values for the control genes, sensitivity and specificity of the methods, and classification errors as well as generation of the diagnostic plots. Combining the above information facilitates the selection of the most appropriate normalization method for the studied data sets and determines which methods can be used interchangeably. PMID:26176014
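
    A depth-related normalization can be as simple as rescaling each sample by its library size. The sketch below computes counts per million (CPM) from raw counts. The abstract does not list the five methods it compares, so CPM is shown only as a familiar example of the category, and the function name and data layout are assumptions.

```python
def counts_per_million(counts_by_sample):
    """Scale raw gene counts to counts per million mapped reads.

    counts_by_sample maps sample name -> {gene name: raw count}.
    CPM is one simple depth-based normalization; it is not necessarily
    among the five methods compared in the study.
    """
    normalized = {}
    for sample, gene_counts in counts_by_sample.items():
        library_size = sum(gene_counts.values())
        if library_size == 0:
            raise ValueError(f"sample {sample!r} has no reads")
        normalized[sample] = {
            gene: count * 1e6 / library_size
            for gene, count in gene_counts.items()
        }
    return normalized
```

    Two samples with the same relative expression but a tenfold difference in sequencing depth give identical CPM values, which is the effect any depth-based method aims for.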

  15. Determination of in vivo RNA kinetics using RATE-seq.

    Science.gov (United States)

    Neymotin, Benjamin; Athanasiadou, Rodoniki; Gresham, David

    2014-10-01

    The abundance of a transcript is determined by its rate of synthesis and its rate of degradation; however, global methods for quantifying RNA abundance cannot distinguish variation in these two processes. Here, we introduce RNA approach to equilibrium sequencing (RATE-seq), which uses in vivo metabolic labeling of RNA and approach to equilibrium kinetics, to determine absolute RNA degradation and synthesis rates. RATE-seq does not disturb cellular physiology, uses straightforward normalization with exogenous spike-ins, and can be readily adapted for studies in most organisms. We demonstrate the use of RATE-seq to estimate genome-wide kinetic parameters for coding and noncoding transcripts in Saccharomyces cerevisiae. © 2014 Neymotin et al.; Published by Cold Spring Harbor Laboratory Press for the RNA Society.
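
    The approach-to-equilibrium idea can be written as a standard first-order labeling model. The derivation below is a textbook sketch under simplifying assumptions (constant synthesis rate k_s, first-order degradation rate k_d, no label at t = 0, no dilution by growth or lag in label uptake). The symbols are chosen here for illustration, and the model used in the paper may include further terms.

```latex
% L(t): abundance of labeled transcript at time t after label addition
\frac{dL}{dt} = k_s - k_d\,L(t), \qquad L(0) = 0
% Solving the linear ODE gives the approach to equilibrium:
L(t) = \frac{k_s}{k_d}\left(1 - e^{-k_d t}\right)
% Steady-state abundance and labeled fraction:
L_\infty = \frac{k_s}{k_d}, \qquad
f(t) = \frac{L(t)}{L_\infty} = 1 - e^{-k_d t}
% The time course of f(t) depends only on k_d, so the half-life is
t_{1/2} = \frac{\ln 2}{k_d}
```

    Under these assumptions, fitting f(t) across labeling time points yields k_d, and k_s then follows from the steady-state abundance as k_s = k_d L_\infty.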

  16. De novo RNA-Seq based transcriptome analysis of Papiliotrema laurentii strain RY1 under nitrogen starvation.

    Science.gov (United States)

    Sarkar, Soumyadev; Chakravorty, Somnath; Mukherjee, Avishek; Bhattacharya, Debanjana; Bhattacharya, Semantee; Gachhui, Ratan

    2018-03-01

    Nitrogen is a key nutrient for all cell forms. Most organisms respond to nitrogen scarcity by slowing down their growth rate. On the contrary, our previous studies have shown that Papiliotrema laurentii strain RY1 has a robust growth under nitrogen starvation. To understand the global regulation that leads to such an extraordinary response, we undertook a de novo approach for transcriptome analysis of the yeast. Close to 33 million sequence reads of high quality for nitrogen limited and enriched condition were generated using Illumina NextSeq500. Trinity analysis and clustered transcripts annotation of the reads produced 17,611 unigenes, out of which 14,157 could be annotated. Gene Ontology term analysis generated 44.92% cellular component terms, 39.81% molecular function terms and 15.24% biological process terms. The most over represented pathways in general were translation, carbohydrate metabolism, amino acid metabolism, general metabolism, folding, sorting, degradation followed by transport and catabolism, nucleotide metabolism, replication and repair, transcription and lipid metabolism. A total of 4256 Single Sequence Repeats were identified. Differential gene expression analysis detected 996 P-significant transcripts to reveal transmembrane transport, lipid homeostasis, fatty acid catabolism and translation as the enriched terms which could be essential for Papiliotrema laurentii strain RY1 to adapt during nitrogen deprivation. Transcriptome data was validated by quantitative real-time PCR analysis of twelve transcripts. To the best of our knowledge, this is the first report of Papiliotrema laurentii strain RY1 transcriptome which would play a pivotal role in understanding the biochemistry of the yeast under acute nitrogen stress and this study would be encouraging to initiate extensive investigations into this Papiliotrema system. Copyright © 2017 Elsevier B.V. All rights reserved.

  17. ChIP-seq Analysis in R (CSAR): An R package for the statistical detection of protein-bound genomic regions

    NARCIS (Netherlands)

    Muino, J.M.; Kaufmann, K.; Ham, van R.C.H.J.; Angenent, G.C.; Krajewski, P.

    2011-01-01

    Background In vivo detection of protein-bound genomic regions can be achieved by combining chromatin-immunoprecipitation with next-generation sequencing technology (ChIP-seq). The large amount of sequence data produced by this method needs to be analyzed in a statistically proper and computationally

  18. Genome-wide identification and characterisation of human DNA replication origins by initiation site sequencing (ini-seq).

    Science.gov (United States)

    Langley, Alexander R; Gräf, Stefan; Smith, James C; Krude, Torsten

    2016-12-01

    Next-generation sequencing has enabled the genome-wide identification of human DNA replication origins. However, different approaches to mapping replication origins, namely (i) sequencing isolated small nascent DNA strands (SNS-seq); (ii) sequencing replication bubbles (bubble-seq) and (iii) sequencing Okazaki fragments (OK-seq), show only limited concordance. To address this controversy, we describe here an independent high-resolution origin mapping technique that we call initiation site sequencing (ini-seq). In this approach, newly replicated DNA is directly labelled with digoxigenin-dUTP near the sites of its initiation in a cell-free system. The labelled DNA is then immunoprecipitated and genomic locations are determined by DNA sequencing. Using this technique we identify >25,000 discrete origin sites at sub-kilobase resolution on the human genome, with high concordance between biological replicates. Most activated origins identified by ini-seq are found at transcriptional start sites and contain G-quadruplex (G4) motifs. They tend to cluster in early-replicating domains, providing a correlation between early replication timing and local density of activated origins. Origins identified by ini-seq show highest concordance with sites identified by SNS-seq, followed by OK-seq and bubble-seq. Furthermore, germline origins identified by positive nucleotide distribution skew jumps overlap with origins identified by ini-seq and OK-seq more frequently and more specifically than do sites identified by either SNS-seq or bubble-seq. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.

  19. ToNER: A tool for identifying nucleotide enrichment signals in feature-enriched RNA-seq data.

    Directory of Open Access Journals (Sweden)

    Yuttachon Promworn

    Biochemical methods are available for enriching 5' ends of RNAs in prokaryotes, which are employed in the differential RNA-seq (dRNA-seq) and the more recent Cappable-seq protocols. Computational methods are needed to locate RNA 5' ends from these data by statistical analysis of the enrichment. Although statistical-based analysis methods have been developed for dRNA-seq, they may not be suitable for Cappable-seq data. The more efficient enrichment method employed in Cappable-seq compared with dRNA-seq could affect data distribution and thus algorithm performance. We present Transformation of Nucleotide Enrichment Ratios (ToNER), a tool for statistical modeling of enrichment from RNA-seq data obtained from enriched and unenriched libraries. The tool calculates nucleotide enrichment scores and determines the global transformation for fitting to the normal distribution using the Box-Cox procedure. From the transformed distribution, sites of significant enrichment are identified. To increase power of detection, meta-analysis across experimental replicates is offered. We tested the tool on Cappable-seq and dRNA-seq data for identifying Escherichia coli transcript 5' ends and compared the results with those from the TSSAR tool, which is designed for analyzing dRNA-seq data. When combining results across Cappable-seq replicates, ToNER detects more known transcript 5' ends than TSSAR. In general, the transcript 5' ends detected by ToNER but not TSSAR occur in regions which cannot be locally modeled by TSSAR. ToNER uses a simple yet robust statistical modeling approach, which can be used for detecting RNA 5' ends from Cappable-seq data, in particular when combining information from experimental replicates. The ToNER tool could potentially be applied for analyzing other RNA-seq datasets in which enrichment for other structural features of RNA is employed. The program is freely available for download at ToNER webpage (http://www4a

  20. ToNER: A tool for identifying nucleotide enrichment signals in feature-enriched RNA-seq data.

    Science.gov (United States)

    Promworn, Yuttachon; Kaewprommal, Pavita; Shaw, Philip J; Intarapanich, Apichart; Tongsima, Sissades; Piriyapongsa, Jittima

    2017-01-01

    Biochemical methods are available for enriching 5' ends of RNAs in prokaryotes, which are employed in the differential RNA-seq (dRNA-seq) and the more recent Cappable-seq protocols. Computational methods are needed to locate RNA 5' ends from these data by statistical analysis of the enrichment. Although statistical-based analysis methods have been developed for dRNA-seq, they may not be suitable for Cappable-seq data. The more efficient enrichment method employed in Cappable-seq compared with dRNA-seq could affect data distribution and thus algorithm performance. We present Transformation of Nucleotide Enrichment Ratios (ToNER), a tool for statistical modeling of enrichment from RNA-seq data obtained from enriched and unenriched libraries. The tool calculates nucleotide enrichment scores and determines the global transformation for fitting to the normal distribution using the Box-Cox procedure. From the transformed distribution, sites of significant enrichment are identified. To increase power of detection, meta-analysis across experimental replicates is offered. We tested the tool on Cappable-seq and dRNA-seq data for identifying Escherichia coli transcript 5' ends and compared the results with those from the TSSAR tool, which is designed for analyzing dRNA-seq data. When combining results across Cappable-seq replicates, ToNER detects more known transcript 5' ends than TSSAR. In general, the transcript 5' ends detected by ToNER but not TSSAR occur in regions which cannot be locally modeled by TSSAR. ToNER uses a simple yet robust statistical modeling approach, which can be used for detecting RNA 5' ends from Cappable-seq data, in particular when combining information from experimental replicates. The ToNER tool could potentially be applied for analyzing other RNA-seq datasets in which enrichment for other structural features of RNA is employed. The program is freely available for download at ToNER webpage (http://www4a.biotec.or.th/GI/tools/toner) and Git
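
    The modeling steps named in this abstract (per-site enrichment ratios, a Box-Cox transformation toward normality, then flagging significantly enriched sites) can be sketched roughly as follows. This is an illustrative sketch only, not the ToNER implementation: the pseudocount, the grid search over lambda, the one-sided normal test, the cutoff alpha, and every function name are assumptions introduced here.

```python
import math
from statistics import NormalDist, mean, pstdev

def enrichment_ratios(enriched, unenriched, pseudocount=1.0):
    """Per-site ratio of enriched to unenriched read counts.
    The pseudocount is an assumption that keeps ratios positive and finite."""
    return [(e + pseudocount) / (u + pseudocount)
            for e, u in zip(enriched, unenriched)]

def box_cox(x, lam):
    """Box-Cox transform of a positive value x."""
    return math.log(x) if abs(lam) < 1e-12 else (x ** lam - 1.0) / lam

def box_cox_loglik(values, lam):
    """Profile log-likelihood of lambda under a normal model
    for the transformed values (constant terms dropped)."""
    transformed = [box_cox(v, lam) for v in values]
    variance = pstdev(transformed) ** 2
    n = len(values)
    return (-0.5 * n * math.log(variance)
            + (lam - 1.0) * sum(math.log(v) for v in values))

def best_lambda(values, grid=None):
    """Pick lambda by grid search (an assumed, simple optimiser)."""
    grid = grid or [i / 10 for i in range(-20, 21)]
    return max(grid, key=lambda lam: box_cox_loglik(values, lam))

def enriched_sites(enriched, unenriched, alpha=0.05):
    """Indices of sites whose transformed ratio lies in the upper tail
    of the fitted normal distribution (one-sided test, assumed cutoff)."""
    ratios = enrichment_ratios(enriched, unenriched)
    lam = best_lambda(ratios)
    transformed = [box_cox(r, lam) for r in ratios]
    mu, sigma = mean(transformed), pstdev(transformed)
    if sigma == 0:
        return []
    fitted = NormalDist(mu, sigma)
    return [i for i, t in enumerate(transformed)
            if 1.0 - fitted.cdf(t) < alpha]
```

    A single global lambda is fitted to all sites here, which mirrors the "global transformation" wording of the abstract; combining replicates by meta-analysis is not covered by this sketch.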

  1. Comprehensive Assessments of RNA-seq by the SEQC Consortium: FDA-Led Efforts Advance Precision Medicine

    Directory of Open Access Journals (Sweden)

    Joshua Xu

    2016-03-01

    Studies on gene expression in response to therapy have led to the discovery of pharmacogenomics biomarkers and advances in precision medicine. Whole transcriptome sequencing (RNA-seq) is an emerging tool for profiling gene expression and has received wide adoption in the biomedical research community. However, its value in regulatory decision making requires rigorous assessment and consensus between various stakeholders, including the research community, regulatory agencies, and industry. The FDA-led SEquencing Quality Control (SEQC) consortium has made considerable progress in this direction, and is the subject of this review. Specifically, three RNA-seq platforms (Illumina HiSeq, Life Technologies SOLiD, and Roche 454) were extensively evaluated at multiple sites to assess cross-site and cross-platform reproducibility. The results demonstrated that relative gene expression measurements were consistently comparable across labs and platforms, but not so for the measurement of absolute expression levels. As part of the quality evaluation several studies were included to evaluate the utility of RNA-seq in clinical settings and safety assessment. The neuroblastoma study profiled tumor samples from 498 pediatric neuroblastoma patients by both microarray and RNA-seq. RNA-seq offers more utilities than microarray in determining the transcriptomic characteristics of cancer. However, RNA-seq and microarray-based models were comparable in clinical endpoint prediction, even when including additional features unique to RNA-seq beyond gene expression. The toxicogenomics study compared microarray and RNA-seq profiles of the liver samples from rats exposed to 27 different chemicals representing multiple toxicity modes of action. Cross-platform concordance was dependent on chemical treatment and transcript abundance. Though both RNA-seq and microarray are suitable for developing gene expression based predictive models with comparable prediction performance, RNA-seq

  2. The immobilization of heavy metals in soil by bioaugmentation of a UV-mutant Bacillus subtilis 38 assisted by NovoGro biostimulation and changes of soil microbial community

    Energy Technology Data Exchange (ETDEWEB)

    Wang, Ting [MOE Key Laboratory of Pollution Processes and Environmental Criteria, College of Environmental Science and Engineering, Nankai University, Tianjin 300071 (China); Urban Transport Emission Control Research Centre, College of Environmental Science and Engineering, Nankai University, Tianjin 300071 (China); Sun, Hongwen, E-mail: sunhongwen@nankai.edu.cn [MOE Key Laboratory of Pollution Processes and Environmental Criteria, College of Environmental Science and Engineering, Nankai University, Tianjin 300071 (China); Mao, Hongjun [Urban Transport Emission Control Research Centre, College of Environmental Science and Engineering, Nankai University, Tianjin 300071 (China); Zhang, Yanfeng; Wang, Cuiping; Zhang, Zhiyuan; Wang, Baolin; Sun, Lei [MOE Key Laboratory of Pollution Processes and Environmental Criteria, College of Environmental Science and Engineering, Nankai University, Tianjin 300071 (China)

    2014-08-15

    Highlights: • A UV-mutated species, Bacillus subtilis 38, is a good sorbent for multi-metals (Cd, Cr, Hg and Pb). • B38 mixed with NovoGro exhibited a synergetic effect on the immobilization of heavy metals in soil. • DTPA, M3 and BCR were suitable for predicting metal bioavailability for specific classes of plant. • The NovoGro could enhance the proliferation of both exotic B38 and native microbes. • It's a practical strategy for the remediation of actual farmland polluted by multi-heavy metals. - Abstract: Bacillus subtilis 38 (B38) is a mutant species of Bacillus subtilis acquired by UV irradiation with high cadmium tolerance. This study revealed that B38 was a good biosorbent for the adsorption of multiple heavy metals (cadmium, chromium, mercury, and lead). Simultaneous application of B38 and NovoGro (SNB) exhibited a synergetic effect on the immobilization of heavy metals in soil. The heavy metal concentrations in the edible part of the tested plants (lettuce, radish, and soybean) under SNB treatment decreased by 55.4–97.9% compared to the control. Three single extraction methods, diethylenetriaminepentaacetic acid (DTPA), Mehlich 3 (M3), and the first step of the Community Bureau of Reference method (BCR1), showed good predictive capacities for metal bioavailability to leafy, rhizome, and leguminous plants, respectively. The polymerase chain reaction–denaturing gradient gel electrophoresis (PCR–DGGE) profiles revealed that NovoGro could enhance the proliferation of both exotic B38 and native microbes. Finally, the technology was checked in the field: the reduction in heavy metal concentrations in the edible part of radish ranged from 30.8% to 96.0% after bioremediation by SNB treatment. This study provides a practical strategy for the remediation of farmland contaminated by multiple heavy metals.

  3. The immobilization of heavy metals in soil by bioaugmentation of a UV-mutant Bacillus subtilis 38 assisted by NovoGro biostimulation and changes of soil microbial community

    International Nuclear Information System (INIS)

    Wang, Ting; Sun, Hongwen; Mao, Hongjun; Zhang, Yanfeng; Wang, Cuiping; Zhang, Zhiyuan; Wang, Baolin; Sun, Lei

    2014-01-01

    Highlights: • A UV-mutated species, Bacillus subtilis 38, is a good sorbent for multi-metals (Cd, Cr, Hg and Pb). • B38 mixed with NovoGro exhibited a synergetic effect on the immobilization of heavy metals in soil. • DTPA, M3 and BCR were suitable for predicting metal bioavailability for specific classes of plant. • The NovoGro could enhance the proliferation of both exotic B38 and native microbes. • It's a practical strategy for the remediation of actual farmland polluted by multi-heavy metals. - Abstract: Bacillus subtilis 38 (B38) is a mutant species of Bacillus subtilis acquired by UV irradiation with high cadmium tolerance. This study revealed that B38 was a good biosorbent for the adsorption of multiple heavy metals (cadmium, chromium, mercury, and lead). Simultaneous application of B38 and NovoGro (SNB) exhibited a synergetic effect on the immobilization of heavy metals in soil. The heavy metal concentrations in the edible part of the tested plants (lettuce, radish, and soybean) under SNB treatment decreased by 55.4–97.9% compared to the control. Three single extraction methods, diethylenetriaminepentaacetic acid (DTPA), Mehlich 3 (M3), and the first step of the Community Bureau of Reference method (BCR1), showed good predictive capacities for metal bioavailability to leafy, rhizome, and leguminous plants, respectively. The polymerase chain reaction–denaturing gradient gel electrophoresis (PCR–DGGE) profiles revealed that NovoGro could enhance the proliferation of both exotic B38 and native microbes. Finally, the technology was checked in the field: the reduction in heavy metal concentrations in the edible part of radish ranged from 30.8% to 96.0% after bioremediation by SNB treatment. This study provides a practical strategy for the remediation of farmland contaminated by multiple heavy metals.

  4. Is this the right normalization? A diagnostic tool for ChIP-seq normalization.

    Science.gov (United States)

    Angelini, Claudia; Heller, Ruth; Volkinshtein, Rita; Yekutieli, Daniel

    2015-05-09

    ChIP-seq experiments are becoming a standard approach for genome-wide profiling of protein-DNA interactions, such as detecting transcription factor binding sites, histone modification marks and RNA Polymerase II occupancy. However, when comparing a ChIP sample versus a control sample, such as Input DNA, normalization procedures have to be applied in order to remove experimental sources of bias. Despite the substantial impact that the choice of the normalization method can have on the results of a ChIP-seq data analysis, the assessment of these methods is not fully explored in the literature. In particular, there are no diagnostic tools that show whether the applied normalization is indeed appropriate for the data being analyzed. In this work we propose a novel diagnostic tool to examine the appropriateness of the estimated normalization procedure. By plotting the empirical densities of log relative risks in bins of equal read count, along with the estimated normalization constant, after logarithmic transformation, the researcher is able to assess the appropriateness of the estimated normalization constant. We use the diagnostic plot to evaluate the appropriateness of the estimates obtained by CisGenome, NCIS and CCAT on several real data examples. Moreover, we show the impact that the choice of the normalization constant can have on standard tools for peak calling such as MACS or SICER. Finally, we propose a novel procedure for controlling the FDR using sample swapping. This procedure makes use of the estimated normalization constant in order to gain power over the naive choice of constant (used in MACS and SICER), which is the ratio of the total number of reads in the ChIP and Input samples. Linear normalization approaches aim to estimate a scale factor, r, to adjust for different sequencing depths when comparing ChIP versus Input samples. The estimated scaling factor can easily be incorporated in many peak caller algorithms to improve the accuracy of the peak identification. The
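
    The naive constant mentioned in this abstract (the ratio of total ChIP reads to total Input reads) and a per-bin log ratio against the scaled Input can be illustrated with a short sketch. The function names, the pseudocount, and the exact form of the log ratio are assumptions made for illustration; they are not taken from the paper or from MACS, SICER, NCIS, CisGenome or CCAT.

```python
import math

def naive_scale_factor(chip_counts, input_counts):
    """Naive normalization constant r: total ChIP reads / total Input reads."""
    return sum(chip_counts) / sum(input_counts)

def log_ratios(chip_counts, input_counts, r, pseudocount=0.5):
    """Per-bin log of ChIP counts over r-scaled Input counts.
    The pseudocount is an assumption that avoids log(0) in empty bins."""
    return [math.log((c + pseudocount) / (r * (i + pseudocount)))
            for c, i in zip(chip_counts, input_counts)]

# Hypothetical bin counts: three background bins and one enriched bin.
chip = [10, 20, 30, 140]
control = [10, 20, 30, 40]
r = naive_scale_factor(chip, control)   # 200 / 100
ratios = log_ratios(chip, control, r)
```

    In this toy input the single enriched bin inflates the ChIP total, so the background bins end up with negative log ratios under the naive constant. That is the kind of mismatch a diagnostic plot of log ratios against the estimated constant is meant to expose.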

  5. Comparison of transcriptomic landscapes of bovine embryos using RNA-Seq

    Directory of Open Access Journals (Sweden)

    Khatib Hasan

    2010-12-01

    Background: Advances in sequencing technologies have opened a new era of high throughput investigations. Although RNA-seq has been demonstrated in many organisms, no study has provided a comprehensive investigation of the bovine transcriptome using RNA-seq. Results: In this study, we provide a deep survey of the bovine embryonic transcriptomes, the first application of RNA-seq in cattle. Embryos cultured in vitro were used as models to study early embryonic development in cattle. RNA amplified from limited amounts of starting total RNA was sequenced and mapped to the reference genome to obtain digital gene expression at single base resolution. In particular, gene expression estimates from more than 1.6 million unannotated bases in 1785 novel transcribed units were obtained. We compared the transcriptomes of embryos showing distinct developmental statuses and found genes that showed differential overall expression as well as alternative splicing. Conclusion: Our study demonstrates the power of RNA-seq and provides further understanding of bovine preimplantation embryonic development at a fine scale.

  6. The psychological sequelae of torture (As seqüelas psicológicas da tortura)

    Directory of Open Access Journals (Sweden)

    Alfredo Guillermo Martín

    This text analyzes the psychological sequelae of torture, understood both as a State institution and as a limit experience in several respects (the three stages of the traumatizing process, the main somatic sequelae, retraumatization). It examines the increase in psychoses, the high percentage of suicides, the difficulties of social reintegration, the chronic transgenerational sequelae, and a mortality rate far higher than normal. The issues surrounding compensation of victims are analyzed in detail. Appropriate diagnostic and therapeutic instruments are proposed, based on a clinical critique of PTSD, on extensive personal experience, and on an up-to-date international bibliography.

  7. Single nucleotide polymorphism discovery in bovine liver using RNA-seq technology

    DEFF Research Database (Denmark)

    Pareek, Chandra Shekhar; Błaszczyk, Paweł; Dziuba, Piotr

    2017-01-01

    Background RNA-seq is a useful next-generation sequencing (NGS) technology that has been widely used to understand mammalian transcriptome architecture and function. In this study, a breed-specific RNA-seq experiment was utilized to detect putative single nucleotide polymorphisms (SNPs) in liver...

  8. Combining multiple ChIP-seq peak detection systems using combinatorial fusion.

    Science.gov (United States)

    Schweikert, Christina; Brown, Stuart; Tang, Zuojian; Smith, Phillip R; Hsu, D Frank

    2012-01-01

    Due to the recent rapid development in ChIP-seq technologies, which use high-throughput next-generation DNA sequencing to identify the targets of Chromatin Immunoprecipitation, there is an increasing amount of sequencing data being generated that provides us with greater opportunity to analyze genome-wide protein-DNA interactions. In particular, we are interested in evaluating and enhancing computational and statistical techniques for locating protein binding sites. Many peak detection systems have been developed; in this study, we utilize the following six: CisGenome, MACS, PeakSeq, QuEST, SISSRs, and TRLocator. We define two methods to merge and rescore the regions of two peak detection systems and analyze the performance based on average precision and coverage of transcription start sites. The results indicate that ChIP-seq peak detection can be improved by fusion using score or rank combination. Our method of combination and fusion analysis would provide a means for generic assessment of available technologies and systems and assist researchers in choosing an appropriate system (or fusion method) for analyzing ChIP-seq data. This analysis offers an alternate approach for increasing true positive rates, while decreasing false positive rates and hence improving the ChIP-seq peak identification process.
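
    Score combination and rank combination of two peak callers, as named in this abstract, can be sketched in a few lines. The abstract does not give the paper's merging and rescoring rules, so the choices below (min-max normalization, simple averaging, keeping only regions reported by both systems) and all names are assumptions for illustration.

```python
def normalize(scores):
    """Min-max scale a {region: score} mapping to the range [0, 1]."""
    lo, hi = min(scores.values()), max(scores.values())
    span = (hi - lo) or 1.0  # avoid division by zero when all scores match
    return {region: (s - lo) / span for region, s in scores.items()}

def ranks(scores):
    """Rank regions by descending score (1 = strongest peak)."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {region: i + 1 for i, region in enumerate(ordered)}

def score_combination(a, b):
    """Average of normalized scores for regions found by both systems."""
    na, nb = normalize(a), normalize(b)
    return {k: (na[k] + nb[k]) / 2 for k in na.keys() & nb.keys()}

def rank_combination(a, b):
    """Average of ranks for regions found by both systems (lower is better)."""
    ra, rb = ranks(a), ranks(b)
    return {k: (ra[k] + rb[k]) / 2 for k in ra.keys() & rb.keys()}

# Hypothetical scores from two peak callers for three merged regions.
caller_a = {"r1": 10, "r2": 5, "r3": 0}
caller_b = {"r1": 1, "r2": 3, "r3": 2}
by_score = score_combination(caller_a, caller_b)
by_rank = rank_combination(caller_a, caller_b)
```

    Here region r2 comes out on top under both combinations even though caller_a alone prefers r1, which shows how fusion can reorder candidates when two systems disagree.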

  9. Evaluation of tools for highly variable gene discovery from single-cell RNA-seq data.

    Science.gov (United States)

    Yip, Shun H; Sham, Pak Chung; Wang, Junwen

    2018-02-21

    Traditional RNA sequencing (RNA-seq) allows the detection of gene expression variations between two or more cell populations through differentially expressed gene (DEG) analysis. However, genes that contribute to cell-to-cell differences are not discoverable with RNA-seq because RNA-seq samples are obtained from a mixture of cells. Single-cell RNA-seq (scRNA-seq) allows the detection of gene expression in each cell. With scRNA-seq, highly variable gene (HVG) discovery allows the detection of genes that contribute strongly to cell-to-cell variation within a homogeneous cell population, such as a population of embryonic stem cells. This analysis is implemented in many software packages. In this study, we compare seven HVG methods from six software packages, including BASiCS, Brennecke, scLVM, scran, scVEGs and Seurat. Our results demonstrate that reproducibility in HVG analysis requires a larger sample size than DEG analysis. Discrepancies between methods and potential issues in these tools are discussed and recommendations are made.

  10. Predicting gene regulatory networks of soybean nodulation from RNA-Seq transcriptome data.

    Science.gov (United States)

    Zhu, Mingzhu; Dahmen, Jeremy L; Stacey, Gary; Cheng, Jianlin

    2013-09-22

    High-throughput RNA sequencing (RNA-Seq) is a revolutionary technique to study the transcriptome of a cell under various conditions at a systems level. Despite the wide application of RNA-Seq techniques to generate experimental data in the last few years, few computational methods are available to analyze this huge amount of transcription data. The computational methods for constructing gene regulatory networks from RNA-Seq expression data of hundreds or even thousands of genes are particularly lacking and urgently needed. We developed an automated bioinformatics method to predict gene regulatory networks from the quantitative expression values of differentially expressed genes based on RNA-Seq transcriptome data of a cell in different stages and conditions, integrating transcriptional, genomic and gene function data. We applied the method to the RNA-Seq transcriptome data generated for soybean root hair cells in three different development stages of nodulation after rhizobium infection. The method predicted a soybean nodulation-related gene regulatory network consisting of 10 regulatory modules common for all three stages, and 24, 49 and 70 modules separately for the first, second and third stage, each containing both a group of co-expressed genes and several transcription factors collaboratively controlling their expression under different conditions. 8 of 10 common regulatory modules were validated by at least two kinds of validations, such as independent DNA binding motif analysis, gene function enrichment test, and previous experimental data in the literature. We developed a computational method to reliably reconstruct gene regulatory networks from RNA-Seq transcriptome data. The method can generate valuable hypotheses for interpreting biological data and designing biological experiments such as ChIP-Seq, RNA interference, and yeast two hybrid experiments.

  11. Targeted Integration of RNA-Seq and Metabolite Data to Elucidate Curcuminoid Biosynthesis in Four Curcuma Species.

    Science.gov (United States)

    Li, Donghan; Ono, Naoaki; Sato, Tetsuo; Sugiura, Tadao; Altaf-Ul-Amin, Md; Ohta, Daisaku; Suzuki, Hideyuki; Arita, Masanori; Tanaka, Ken; Ma, Zhiqiang; Kanaya, Shigehiko

    2015-05-01

    Curcuminoids, namely curcumin and its analogs, are secondary metabolites that act as the primary active constituents of turmeric (Curcuma longa). The contents of these curcuminoids vary among species in the genus Curcuma. For this reason, we compared two wild strains and two cultivars to understand the differences in the synthesis of curcuminoids. Because the fluxes of metabolic reactions depend on the amounts of their substrate and the activity of the catalysts, we analyzed the metabolite concentrations and gene expression of related enzymes. We developed a method based on RNA sequencing (RNA-Seq) analysis that focuses on a specific set of genes to detect expression differences between species in detail. We developed a 'selection-first' method for RNA-Seq analysis in which short reads are mapped to selected enzymes in the target biosynthetic pathways in order to reduce the effect of mapping errors. Using this method, we found that the difference in the contents of curcuminoids among the species, as measured by gas chromatography-mass spectrometry, could be explained by the changes in the expression of genes encoding diketide-CoA synthase, and curcumin synthase at the branching point of the curcuminoid biosynthesis pathway. © The Author 2015. Published by Oxford University Press on behalf of Japanese Society of Plant Physiologists. All rights reserved. For permissions, please email: journals.permissions@oup.com.

  12. GroEL and dnaK genes of Escherichia coli are induced by UV irradiation and nalidixic acid in an htpR+-dependent fashion

    International Nuclear Information System (INIS)

    Krueger, J.H.; Walker, G.C.

    1984-01-01

    Two proteins with molecular weights of 61,000 and 73,000 were found to be induced by UV light in Escherichia coli mutants in which the SOS responses are constitutively expressed. The induction of these proteins by UV light and nalidixic acid was shown to be independent of the recA+ lexA+ regulatory system. Analysis of these proteins by two-dimensional gel electrophoresis and comparison with the heat-shock proteins of E. coli revealed that the M_r 61,000 protein comigrated with the groEL gene product, that the M_r 73,000 protein comigrated with the dnaK gene product, and that other heat-shock proteins were also induced. The induction of groEL and dnaK by UV light and nalidixic acid is controlled by the htpR locus. The results suggest that the regulatory response of E. coli to agents such as UV light and nalidixic acid is more complex than previously thought. 35 references, 6 figures, 1 table

  13. Seq2Ref: a web server to facilitate functional interpretation

    Directory of Open Access Journals (Sweden)

    Li Wenlin

    2013-01-01

    Background: The size of the protein sequence database has been exponentially increasing due to advances in genome sequencing. However, experimentally characterized proteins only constitute a small portion of the database, such that the majority of sequences have been annotated by computational approaches. Current automatic annotation pipelines inevitably introduce errors, making the annotations unreliable. Instead of such error-prone automatic annotations, functional interpretation should rely on annotations of ‘reference proteins’ that have been experimentally characterized or manually curated. Results: The Seq2Ref server uses BLAST to detect proteins homologous to a query sequence and identifies the reference proteins among them. Seq2Ref then reports publications with experimental characterizations of the identified reference proteins that might be relevant to the query. Furthermore, a plurality-based rating system is developed to evaluate the homologous relationships and rank the reference proteins by their relevance to the query. Conclusions: The reference proteins detected by our server will lend insight into proteins of unknown function and provide extensive information to develop in-depth understanding of uncharacterized proteins. Seq2Ref is available at: http://prodata.swmed.edu/seq2ref.

  14. SNP discovery in the bovine milk transcriptome using RNA-Seq technology.

    Science.gov (United States)

    Cánovas, Angela; Rincon, Gonzalo; Islas-Trejo, Alma; Wickramasinghe, Saumya; Medrano, Juan F

    2010-12-01

    High-throughput sequencing of RNA (RNA-Seq) was developed primarily to analyze global gene expression in different tissues. However, it also is an efficient way to discover coding SNPs. The objective of this study was to perform a SNP discovery analysis in the milk transcriptome using RNA-Seq. Seven milk samples from Holstein cows were analyzed by sequencing cDNAs using the Illumina Genome Analyzer system. We detected 19,175 genes expressed in milk samples corresponding to approximately 70% of the total number of genes analyzed. The SNP detection analysis revealed 100,734 SNPs in Holstein samples, and a large number of those corresponded to differences between the Holstein breed and the Hereford bovine genome assembly Btau4.0. The number of polymorphic SNPs within Holstein cows was 33,045. The accuracy of RNA-Seq SNP discovery was tested by comparing SNPs detected in a set of 42 candidate genes expressed in milk that had been resequenced earlier using Sanger sequencing technology. Seventy of 86 SNPs were detected using both RNA-Seq and Sanger sequencing technologies. The KASPar Genotyping System was used to validate unique SNPs found by RNA-Seq but not observed by Sanger technology. Our results confirm that analyzing the transcriptome using RNA-Seq technology is an efficient and cost-effective method to identify SNPs in transcribed regions. This study creates guidelines to maximize the accuracy of SNP discovery and prevention of false-positive SNP detection, and provides more than 33,000 SNPs located in coding regions of genes expressed during lactation that can be used to develop genotyping platforms to perform marker-trait association studies in Holstein cattle.

  15. Nuclear-like Seq in mt Genome - RMG | LSDB Archive [Life Science Database Archive metadata]

    Lifescience Database Archive (English)

    Data detail. Data name: Nuclear-like Seq in mt Genome. DOI 10...

  16. DFI-seq identification of environment-specific gene expression in uropathogenic Escherichia coli

    DEFF Research Database (Denmark)

    Madelung, Michelle; Kronborg, Tina; Doktor, Thomas Koed

    2017-01-01

    response. We combined differential fluorescence induction (DFI) with next-generation sequencing, collectively termed DFI-seq, to identify differentially expressed genes in UPEC strain UTI89 during growth in human urine and bladder cells. RESULTS: DFI-seq eliminates the need for iterative cell sorting...... hypothetical proteins. One such gene UTI89_C5139, displayed increased adhesion and invasion of J82 cells when deleted from UPEC strain UTI89. CONCLUSIONS: We demonstrate the usefulness of DFI-seq for identification of genes required for optimal growth of UPEC in human urine, as well as potential virulence...

  17. Liver Transcriptome Analysis of the Large Yellow Croaker (Larimichthys crocea) during Fasting by Using RNA-Seq

    Science.gov (United States)

    Qian, Baoying; Xue, Liangyi; Huang, Hongli

    2016-01-01

    The large yellow croaker (Larimichthys crocea) is an economically important fish species in the Chinese mariculture industry. To understand the molecular basis underlying the response to fasting, Illumina HiSeq™ 2000 was used to analyze the liver transcriptome of fasting large yellow croakers. A total of 54,933,550 clean reads were obtained and assembled into 110,364 contigs. Annotation to the NCBI database identified a total of 38,728 unigenes, of which 19,654 were classified into Gene Ontology and 22,683 were found in Kyoto Encyclopedia of Genes and Genomes (KEGG). Comparative analysis of the expression profiles between fasting fish and normal-feeding fish identified a total of 7,623 differentially expressed genes (P fasting as well as identified areas that require further investigation. PMID:26967898

  18. rMATS: robust and flexible detection of differential alternative splicing from replicate RNA-Seq data.

    Science.gov (United States)

    Shen, Shihao; Park, Juw Won; Lu, Zhi-xiang; Lin, Lan; Henry, Michael D; Wu, Ying Nian; Zhou, Qing; Xing, Yi

    2014-12-23

    Ultra-deep RNA sequencing (RNA-Seq) has become a powerful approach for genome-wide analysis of pre-mRNA alternative splicing. We previously developed multivariate analysis of transcript splicing (MATS), a statistical method for detecting differential alternative splicing between two RNA-Seq samples. Here we describe a new statistical model and computer program, replicate MATS (rMATS), designed for detection of differential alternative splicing from replicate RNA-Seq data. rMATS uses a hierarchical model to simultaneously account for sampling uncertainty in individual replicates and variability among replicates. In addition to the analysis of unpaired replicates, rMATS also includes a model specifically designed for paired replicates between sample groups. The hypothesis-testing framework of rMATS is flexible and can assess the statistical significance over any user-defined magnitude of splicing change. The performance of rMATS is evaluated by the analysis of simulated and real RNA-Seq data. rMATS outperformed two existing methods for replicate RNA-Seq data in all simulation settings, and RT-PCR yielded a high validation rate (94%) in an RNA-Seq dataset of prostate cancer cell lines. Our data also provide guiding principles for designing RNA-Seq studies of alternative splicing. We demonstrate that it is essential to incorporate biological replicates in the study design. Of note, pooling RNAs or merging RNA-Seq data from multiple replicates is not an effective approach to account for variability, and the result is particularly sensitive to outliers. The rMATS source code is freely available at rnaseq-mats.sourceforge.net/. As the popularity of RNA-Seq continues to grow, we expect rMATS will be useful for studies of alternative splicing in diverse RNA-Seq projects.

  19. RNA-Seq Analysis of Abdominal Fat Reveals Differences between Modern Commercial Broiler Chickens with High and Low Feed Efficiencies.

    Directory of Open Access Journals (Sweden)

    Zhu Zhuo

    For economic and environmental reasons, chickens with superior feed efficiency (FE) are preferred in the broiler chicken industry. High FE (HFE) chickens typically have reduced abdominal fat, the major adipose tissue in chickens. In addition to its function of energy storage, adipose tissue is a metabolically active organ that also possesses endocrine and immune regulatory functions. It plays a central role in maintaining energy homeostasis. Comprehensive understanding of the gene expression in the adipose tissue and the biological basis of FE are of significance to optimize selection and breeding strategies. Through gene expression profiling of abdominal fat from high and low FE (LFE) commercial broiler chickens, the present study aimed to characterize the differences of gene expression between HFE and LFE chickens. mRNA-seq analysis was carried out on the total RNA of abdominal fat from 10 HFE and 12 LFE commercial broiler chickens, and 1.48 billion 75-base sequence reads were generated in total. On average, 11,565 genes were expressed (>5 reads/gene/sample) in the abdominal fat tissue, of which 286 genes were differentially expressed (DE) at q (False Discovery Rate 1.3 between HFE and LFE chickens. Expression levels from RNA-seq were confirmed with the NanoString nCounter analysis system. Functional analysis showed that the DE genes were significantly (p < 0.01) enriched in lipid metabolism, coagulation, and immune regulation pathways. Specifically, the LFE chickens had higher expression of lipid synthesis genes and lower expression of triglyceride hydrolysis and cholesterol transport genes. In conclusion, our study reveals the overall differences of gene expression in the abdominal fat from HFE and LFE chickens, and the results suggest that the divergent expression of lipid metabolism genes represents the major differences.

  20. Simultaneous fits in ISIS on the example of GRO J1008-57

    Science.gov (United States)

    Kühnel, Matthias; Müller, Sebastian; Kreykenbohm, Ingo; Schwarm, Fritz-Walter; Grossberger, Christoph; Dauser, Thomas; Pottschmidt, Katja; Ferrigno, Carlo; Rothschild, Richard E.; Klochkov, Dmitry; Staubert, Rüdiger; Wilms, Joern

    2015-04-01

    Parallel computing and steadily increasing computation speed have led to a new tool for analyzing multiple datasets and datatypes: fitting several datasets simultaneously. With this technique, physically connected parameters of individual data can be treated as a single parameter by implementing this connection into the fit directly. We discuss the terminology, implementation, and possible issues of simultaneous fits based on the X-ray data analysis tool Interactive Spectral Interpretation System (ISIS). While all data modeling tools in X-ray astronomy allow in principle fitting data from multiple data sets individually, the syntax used in these tools is often not well suited for this task. Applying simultaneous fits to the transient X-ray binary GRO J1008-57, we find that the spectral shape is only dependent on X-ray flux. We determine time-independent parameters, such as the folding energy E_fold, with unprecedented precision.
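
    The idea of tying a physically connected parameter across datasets can be illustrated with a toy example. The sketch below is not ISIS syntax; it assumes two hypothetical datasets modelled as straight lines that share one slope but keep their own intercepts, and solves the joint least-squares problem in closed form. The function name and data layout are illustrative assumptions.

```python
def fit_shared_slope(datasets):
    """Joint least-squares fit of y = a_i + b * x to several datasets.

    The slope b is a single parameter shared by all datasets, while each
    dataset i keeps its own intercept a_i.
    datasets: list of (xs, ys) pairs. Returns (b, [a_0, a_1, ...]).
    """
    num = 0.0    # sum over datasets of the centred cross products
    den = 0.0    # sum over datasets of the centred squares of x
    means = []
    for xs, ys in datasets:
        n = len(xs)
        mx = sum(xs) / n
        my = sum(ys) / n
        means.append((mx, my))
        num += sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        den += sum((x - mx) ** 2 for x in xs)
    b = num / den
    # Given the shared slope, each intercept follows from its dataset's means.
    intercepts = [my - b * mx for mx, my in means]
    return b, intercepts


if __name__ == "__main__":
    low_flux = ([0, 1, 2], [1, 3, 5])
    high_flux = ([0, 1, 2], [10, 12, 14])
    print(fit_shared_slope([low_flux, high_flux]))
```

    Because every dataset contributes to the shared slope, that parameter is constrained by all of the data at once, which is the effect the abstract attributes to simultaneous fits.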

  1. Seeing the forest for the trees: annotating small RNA producing genes in plants.

    Science.gov (United States)

    Coruh, Ceyda; Shahid, Saima; Axtell, Michael J

    2014-04-01

    A key goal in genomics is the complete annotation of the expressed regions of the genome. In plants, substantial portions of the genome make regulatory small RNAs produced by Dicer-Like (DCL) proteins and utilized by Argonaute (AGO) proteins. These include miRNAs and various types of endogenous siRNAs. Small RNA-seq, enabled by cheap and fast DNA sequencing, has produced an enormous volume of data on plant miRNA and siRNA expression in recent years. In this review, we discuss recent progress in using small RNA-seq data to produce stable and reliable annotations of miRNA and siRNA genes in plants. In addition, we highlight key goals for the future of small RNA gene annotation in plants. Copyright © 2014 Elsevier Ltd. All rights reserved.

  2. smallWig: parallel compression of RNA-seq WIG files.

    Science.gov (United States)

    Wang, Zhiying; Weissman, Tsachy; Milenkovic, Olgica

    2016-01-15

    We developed a new lossless compression method for WIG data, named smallWig, offering the best known compression rates for RNA-seq data and featuring random access functionalities that enable visualization, summary statistics analysis and fast queries from the compressed files. Our approach results in order of magnitude improvements compared with bigWig and ensures compression rates only a fraction of those produced by cWig. The key features of the smallWig algorithm are statistical data analysis and a combination of source coding methods that ensure high flexibility and make the algorithm suitable for different applications. Furthermore, for general-purpose file compression, the compression rate of smallWig approaches the empirical entropy of the tested WIG data. For compression with random query features, smallWig uses a simple block-based compression scheme that introduces only a minor overhead in the compression rate. For archival or storage space-sensitive applications, the method relies on context mixing techniques that lead to further improvements of the compression rate. Implementations of smallWig can be executed in parallel on different sets of chromosomes using multiple processors, thereby enabling desirable scaling for future transcriptome Big Data platforms. The development of next-generation sequencing technologies has led to a dramatic decrease in the cost of DNA/RNA sequencing and expression profiling. RNA-seq has emerged as an important and inexpensive technology that provides information about whole transcriptomes of various species and organisms, as well as different organs and cellular communities. The vast volume of data generated by RNA-seq experiments has significantly increased data storage costs and communication bandwidth requirements. Current compression tools for RNA-seq data such as bigWig and cWig either use general-purpose compressors (gzip) or suboptimal compression schemes that leave significant room for improvement. 
To substantiate
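
    The block-based scheme for random queries mentioned above can be sketched as follows. This is only an illustration of the general idea, not smallWig's actual coding: zlib stands in for the compressor, and the function names, the block size, and the fixed 32-bit value encoding are assumptions made for the example.

```python
import struct
import zlib


def compress_blocks(values, block_size):
    """Compress a list of non-negative integers (< 2**32) block by block."""
    blocks = []
    for i in range(0, len(values), block_size):
        chunk = values[i:i + block_size]
        raw = struct.pack(f"<{len(chunk)}I", *chunk)  # little-endian uint32
        blocks.append(zlib.compress(raw))
    return blocks


def query(blocks, block_size, index):
    """Return values[index], decompressing only the block that holds it."""
    raw = zlib.decompress(blocks[index // block_size])
    return struct.unpack_from("<I", raw, 4 * (index % block_size))[0]


if __name__ == "__main__":
    coverage = [0] * 50 + [7] * 30 + [3] * 20
    blocks = compress_blocks(coverage, 32)
    print(len(blocks), query(blocks, 32, 60))
```

    Compressing each block independently costs some compression rate compared with compressing the whole track at once, but a query then touches a single block rather than the full file.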

  3. Discovery and Orbital Determination of the Transient X-Ray Pulsar GRO J1750-27

    Science.gov (United States)

    Scott, D. M.; Finger, M. H.; Wilson, R. B.; Koh, D. T.; Prince, T. A.; Vaughan, B. A.; Chakrabarty, D.

    1997-01-01

    We report on the discovery and hard X-ray (20 - 70 keV) observations of the 4.45 s period transient X-ray pulsar GRO J1750-27 with the BATSE all-sky monitor on board CGRO. A relatively faint outburst (less than 30 mcrab peak) lasting at least 60 days was observed during which the spin-up rate peaked at 38 pHz/s and was correlated with the pulsed intensity. An orbit with a period of 29.8 days was found. The large spin-up rate, spin period, and orbital period together suggest that accretion is occurring from a disk and that the outburst is a "giant" outburst typical of a Be/X-ray transient system. No optical counterpart has yet been reported.

  4. A model-based approach to identify binding sites in CLIP-Seq data.

    Directory of Open Access Journals (Sweden)

    Tao Wang

    Cross-linking immunoprecipitation coupled with high-throughput sequencing (CLIP-Seq) has made it possible to identify the targeting sites of RNA-binding proteins in various cell culture systems and tissue types on a genome-wide scale. Here we present a novel model-based approach (MiClip) to identify high-confidence protein-RNA binding sites from CLIP-seq datasets. This approach assigns a probability score for each potential binding site to help prioritize subsequent validation experiments. The MiClip algorithm has been tested in both HITS-CLIP and PAR-CLIP datasets. In the HITS-CLIP dataset, the signal/noise ratios of miRNA seed motif enrichment produced by the MiClip approach are between 17% and 301% higher than those by the ad hoc method for the top 10 most enriched miRNAs. In the PAR-CLIP dataset, the MiClip approach can identify ∼50% more validated binding targets than the original ad hoc method and two recently published methods. To facilitate the application of the algorithm, we have released an R package, MiClip (http://cran.r-project.org/web/packages/MiClip/index.html), and a public web-based graphical user interface software (http://galaxy.qbrc.org/tool_runner?tool_id=mi_clip) for customized analysis.

  5. A Fast Approximate Algorithm for Mapping Long Reads to Large Reference Databases.

    Science.gov (United States)

    Jain, Chirag; Dilthey, Alexander; Koren, Sergey; Aluru, Srinivas; Phillippy, Adam M

    2018-04-30

    Emerging single-molecule sequencing technologies from Pacific Biosciences and Oxford Nanopore have revived interest in long-read mapping algorithms. Alignment-based seed-and-extend methods demonstrate good accuracy, but face limited scalability, while faster alignment-free methods typically trade decreased precision for efficiency. In this article, we combine a fast approximate read mapping algorithm based on minimizers with a novel MinHash identity estimation technique to achieve both scalability and precision. In contrast to prior methods, we develop a mathematical framework that defines the types of mapping targets we uncover, establish probabilistic estimates of p-value and sensitivity, and demonstrate tolerance for alignment error rates up to 20%. With this framework, our algorithm automatically adapts to different minimum length and identity requirements and provides both positional and identity estimates for each mapping reported. For mapping human PacBio reads to the hg38 reference, our method is 290 × faster than Burrows-Wheeler Aligner-MEM with a lower memory footprint and recall rate of 96%. We further demonstrate the scalability of our method by mapping noisy PacBio reads (each ≥5 kbp in length) to the complete NCBI RefSeq database containing 838 Gbp of sequence and >60,000 genomes.
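
    The two building blocks named in this abstract, minimizers and MinHash, can be sketched in a few lines. This is a toy illustration under stated assumptions, not the authors' implementation: the hash function, the parameter names k, w, and s, and the helper names are all hypothetical, and the step that converts a Jaccard estimate into a sequence identity estimate is not reproduced here.

```python
import hashlib


def kmer_hash(kmer):
    # Stable 64-bit hash of a k-mer (Python's built-in hash() of str is
    # salted per process, so it is not reproducible across runs).
    digest = hashlib.blake2b(kmer.encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def minimizers(seq, k, w):
    """Set of minimizer hashes: the smallest k-mer hash in every window
    of w consecutive k-mers."""
    hashes = [kmer_hash(seq[i:i + k]) for i in range(len(seq) - k + 1)]
    return {min(hashes[i:i + w]) for i in range(len(hashes) - w + 1)}


def minhash_jaccard(a, b, s):
    """Estimate the Jaccard similarity of two hash sets from the s
    smallest hashes of their union."""
    sketch = sorted(a | b)[:s]
    if not sketch:
        return 0.0
    shared = sum(1 for h in sketch if h in a and h in b)
    return shared / len(sketch)


if __name__ == "__main__":
    read = "ACGTTGCATGCCGATAGCTAGCTAGGATCCA"
    region = "TTGCATGCCGATAGCTAGCTAGGATCCAGGT"
    print(minhash_jaccard(minimizers(read, 5, 4), minimizers(region, 5, 4), 10))
```

    Sampling only minimizers shrinks the set of k-mers that must be compared, and the bottom-s sketch bounds the work per comparison; both choices trade some resolution for speed.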

  6. shRNA-seq data analysis with edgeR [v1; ref status: indexed, http://f1000r.es/38s

    Directory of Open Access Journals (Sweden)

    Zhiyin Dai

    2014-04-01

    Pooled short hairpin RNA sequencing (shRNA-seq) screens are becoming increasingly popular in functional genomics research, and there is a need to establish optimal analysis tools to handle such data. Our open-source shRNA processing pipeline in edgeR provides a complete analysis solution for shRNA-seq screen data that begins with the raw sequence reads and ends with a ranked list of candidate shRNAs for downstream biological validation. We first summarize the raw data contained in a fastq file into a matrix of counts (samples in the columns, hairpins in the rows) with options for allowing mismatches and small shifts in hairpin position. Diagnostic plots, normalization and differential representation analysis can then be performed using established methods to prioritize results in a statistically rigorous way, with the choice of either the classic exact testing methodology or generalized linear modelling that can handle complex experimental designs. A detailed users’ guide that demonstrates how to analyze screen data in edgeR, along with a point-and-click implementation of this workflow in Galaxy, is also provided. The edgeR package is freely available from http://www.bioconductor.org.
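
    The first step described here, turning raw reads into a hairpin-by-sample count matrix, can be sketched as below. This is a Python illustration rather than edgeR's own R functions; it assumes the hairpin sits at a fixed offset in every read, that all hairpins have the same length, and that mismatches are counted as Hamming distance. Hairpin names, sample names, and function names are hypothetical, and positional shifts are not handled.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def count_hairpins(reads_by_sample, hairpins, start, max_mismatch=0):
    """Build a count matrix with hairpins in rows and samples in columns.

    reads_by_sample: dict sample name -> list of read sequences
    hairpins:        dict hairpin name -> hairpin sequence (equal lengths)
    start:           0-based offset of the hairpin within each read
    """
    samples = list(reads_by_sample)
    length = len(next(iter(hairpins.values())))
    counts = {name: [0] * len(samples) for name in hairpins}
    for col, sample in enumerate(samples):
        for read in reads_by_sample[sample]:
            window = read[start:start + length]
            if len(window) < length:
                continue  # read too short to contain a hairpin
            # Assign the read to its closest hairpin, if close enough.
            best = min(hairpins, key=lambda name: hamming(window, hairpins[name]))
            if hamming(window, hairpins[best]) <= max_mismatch:
                counts[best][col] += 1
    return samples, counts


if __name__ == "__main__":
    hairpins = {"hp1": "ACGT", "hp2": "TTGG"}
    reads = {"s1": ["GGACGTAA", "GGACGAAA", "GGTTGGAA"], "s2": ["GGTTGGCC", "GG"]}
    print(count_hairpins(reads, hairpins, start=2, max_mismatch=1))
```

    The resulting table has the orientation the abstract describes (hairpins in rows, samples in columns) and could be handed to any downstream normalization or differential representation step.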

  7. TruSeq Stranded mRNA and Total RNA Sample Preparation Kits

    Science.gov (United States)

    Total RNA-Seq enabled by ribosomal RNA (rRNA) reduction is compatible with formalin-fixed paraffin embedded (FFPE) samples, which contain potentially critical biological information. The family of TruSeq Stranded Total RNA sample preparation kits provides a unique combination of unmatched data quality for both mRNA and whole-transcriptome analyses, robust interrogation of both standard and low-quality samples and workflows compatible with a wide range of study designs.

  8. DETECTION OF BACTERIAL SMALL TRANSCRIPTS FROM RNA-SEQ DATA: A COMPARATIVE ASSESSMENT.

    Science.gov (United States)

    Peña-Castillo, Lourdes; Grüell, Marc; Mulligan, Martin E; Lang, Andrew S

    2016-01-01

    Small non-coding RNAs (sRNAs) are regulatory RNA molecules that have been identified in a multitude of bacterial species and shown to control numerous cellular processes through various regulatory mechanisms. In the last decade, next generation RNA sequencing (RNA-seq) has been used for the genome-wide detection of bacterial sRNAs. Here we describe sRNA-Detect, a novel approach to identify expressed small transcripts from prokaryotic RNA-seq data. Using RNA-seq data from three bacterial species and two sequencing platforms, we performed a comparative assessment of five computational approaches for the detection of small transcripts. We demonstrate that sRNA-Detect improves upon current standalone computational approaches for identifying novel small transcripts in bacteria.

  9. SIMULTANEOUS FITS IN ISIS ON THE EXAMPLE OF GRO J1008–57

    Directory of Open Access Journals (Sweden)

    M. Kühnel

    2015-04-01

    Parallel computing and steadily increasing computation speed have led to a new tool for analyzing multiple datasets and datatypes: fitting several datasets simultaneously. With this technique, physically connected parameters of individual data can be treated as a single parameter by implementing this connection directly into the fit. We discuss the terminology, implementation, and possible issues of simultaneous fits based on the Interactive Spectral Interpretation System (ISIS) X-ray data analysis tool. While all data modeling tools in X-ray astronomy in principle allow data to be fitted individually from multiple data sets, the syntax used in these tools is not often well suited for this task. Applying simultaneous fits to the transient X-ray binary GRO J1008–57, we find that the spectral shape is only dependent on X-ray flux. We determine time independent parameters, e.g., the folding energy E_fold, with unprecedented precision.

  10. SeqVISTA: a graphical tool for sequence feature visualization and comparison

    Directory of Open Access Journals (Sweden)

    Niu Tianhua

    2003-01-01

    Background: Many readers will sympathize with the following story. You are viewing a gene sequence in Entrez, and you want to find whether it contains a particular sequence motif. You reach for the browser's "find in page" button, but those darn spaces every 10 bp get in the way. And what if the motif is on the opposite strand? Subsequently, your favorite sequence analysis software informs you that there is an interesting feature at position 13982–14013. By painstakingly counting the 10 bp blocks, you are able to examine the sequence at this location. But now you want to see what other features have been annotated close by, and this information is buried several screenfuls higher up the web page. Results: SeqVISTA presents a holistic, graphical view of features annotated on nucleotide or protein sequences. This interactive tool highlights the residues in the sequence that correspond to features chosen by the user, and allows easy searching for sequence motifs or extraction of particular subsequences. SeqVISTA is able to display results from diverse sequence analysis tools in an integrated fashion, and aims to provide much-needed unity to the bioinformatics resources scattered around the Internet. Our viewer may be launched on a GenBank record by a single click of a button installed in the web browser. Conclusion: SeqVISTA allows insights to be gained by viewing the totality of sequence annotations and predictions, which may be more revealing than the sum of their parts. SeqVISTA runs on any operating system with a Java 1.4 virtual machine. It is freely available to academic users at http://zlab.bu.edu/SeqVISTA.
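
    The two annoyances in this story, layout spaces inside the sequence and motifs on the opposite strand, are easy to illustrate in code. The sketch below is not part of SeqVISTA; the function names are hypothetical, and it assumes a plain DNA sequence displayed with position numbers and 10 bp blocks.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def strip_layout(text):
    """Drop spaces, digits and line breaks from a flat-file sequence display."""
    return "".join(ch for ch in text if ch.isalpha())


def find_motif(seq, motif):
    """Return (1-based start, strand) for each motif occurrence on either strand."""
    seq = seq.upper()
    motif = motif.upper()
    hits = []
    for strand, pattern in (("+", motif), ("-", reverse_complement(motif))):
        pos = seq.find(pattern)
        while pos != -1:
            hits.append((pos + 1, strand))
            pos = seq.find(pattern, pos + 1)  # allow overlapping matches
    return sorted(hits)


if __name__ == "__main__":
    displayed = "1 acgtacgtac ggtttaaacc"
    print(find_motif(strip_layout(displayed), "GGTTT"))
```

    Minus-strand hits are reported by the position of their leftmost base on the displayed strand, so the coordinates can be compared directly with annotated feature positions.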

  11. Targeted sequencing of large genomic regions with CATCH-Seq.

    Directory of Open Access Journals (Sweden)

    Kenneth Day

    Current target enrichment systems for large-scale next-generation sequencing typically require synthetic oligonucleotides used as capture reagents to isolate sequences of interest. The majority of target enrichment reagents are focused on gene coding regions or promoters en masse. Here we introduce development of a customizable targeted capture system using biotinylated RNA probe baits transcribed from sheared bacterial artificial chromosome clone templates that enables capture of large, contiguous blocks of the genome for sequencing applications. This clone adapted template capture hybridization sequencing (CATCH-Seq) procedure can be used to capture both coding and non-coding regions of a gene, and resolve the boundaries of copy number variations within a genomic target site. Furthermore, libraries constructed with methylated adapters prior to solution hybridization also enable targeted bisulfite sequencing. We applied CATCH-Seq to diverse targets ranging in size from 125 kb to 3.5 Mb. Our approach provides a simple and cost effective alternative to other capture platforms because of template-based, enzymatic probe synthesis and the lack of oligonucleotide design costs. Given its similarity in procedure, CATCH-Seq can also be performed in parallel with commercial systems.

  12. Analysis of ChIP-seq Data in R/Bioconductor.

    Science.gov (United States)

    de Santiago, Ines; Carroll, Thomas

    2018-01-01

    The development of novel high-throughput sequencing methods for ChIP (chromatin immunoprecipitation) has provided a very powerful tool to study gene regulation in multiple conditions at unprecedented resolution and scale. Proactive quality-control and appropriate data analysis techniques are of critical importance to extract the most meaningful results from the data. Over the last years, an array of R/Bioconductor tools has been developed allowing researchers to process and analyze ChIP-seq data. This chapter provides an overview of the methods available to analyze ChIP-seq data based primarily on software packages from the open-source Bioconductor project. Protocols described in this chapter cover basic steps including data alignment, peak calling, quality control and data visualization, as well as more complex methods such as the identification of differentially bound regions and functional analyses to annotate regulatory regions. The steps in the data analysis process were demonstrated on publicly available data sets and will serve as a demonstration of the computational procedures routinely used for the analysis of ChIP-seq data in R/Bioconductor, from which readers can construct their own analysis pipelines.

  13. CMT: a constrained multi-level thresholding approach for ChIP-Seq data analysis.

    Directory of Open Access Journals (Sweden)

    Iman Rezaeian

    Genome-wide profiling of DNA-binding proteins using ChIP-Seq has emerged as an alternative to ChIP-chip methods. ChIP-Seq technology offers many advantages over ChIP-chip arrays, including but not limited to less noise, higher resolution, and more coverage. Several algorithms have been developed to take advantage of these abilities and find enriched regions by analyzing ChIP-Seq data. However, the complexity of analyzing various patterns of ChIP-Seq signals still needs the development of new algorithms. Most current algorithms use various heuristics to detect regions accurately. However, despite how many formulations are available, it is still difficult to accurately determine individual peaks corresponding to each binding event. We developed Constrained Multi-level Thresholding (CMT), an algorithm used to detect enriched regions on ChIP-Seq data. CMT employs a constraint-based module that can target regions within a specific range. We show that CMT has higher accuracy in detecting enriched regions (peaks) by objectively assessing its performance relative to other previously proposed peak finders. This is shown by testing three algorithms on the well-known FoxA1 Data set, four transcription factors (with a total of six antibodies) for Drosophila melanogaster and the H3K4ac antibody dataset.

  14. expVIP: a Customizable RNA-seq Data Analysis and Visualization Platform.

    Science.gov (United States)

    Borrill, Philippa; Ramirez-Gonzalez, Ricardo; Uauy, Cristobal

    2016-04-01

    The majority of transcriptome sequencing (RNA-seq) expression studies in plants remain underutilized and inaccessible due to the use of disparate transcriptome references and the lack of skills and resources to analyze and visualize these data. We have developed expVIP, an expression visualization and integration platform, which allows easy analysis of RNA-seq data combined with an intuitive and interactive interface. Users can analyze public and user-specified data sets with minimal bioinformatics knowledge using the expVIP virtual machine. This generates a custom Web browser to visualize, sort, and filter the RNA-seq data and provides outputs for differential gene expression analysis. We demonstrate expVIP's suitability for polyploid crops and evaluate its performance across a range of biologically relevant scenarios. To exemplify its use in crop research, we developed a flexible wheat (Triticum aestivum) expression browser (www.wheat-expression.com) that can be expanded with user-generated data in a local virtual machine environment. The open-access expVIP platform will facilitate the analysis of gene expression data from a wide variety of species by enabling the easy integration, visualization, and comparison of RNA-seq data across experiments. © 2016 American Society of Plant Biologists. All Rights Reserved.

  15. Linnorm: improved statistical analysis for single cell RNA-seq expression data.

    Science.gov (United States)

    Yip, Shun H; Wang, Panwen; Kocher, Jean-Pierre A; Sham, Pak Chung; Wang, Junwen

    2017-12-15

    Linnorm is a novel normalization and transformation method for the analysis of single cell RNA sequencing (scRNA-seq) data. Linnorm is developed to remove technical noises and simultaneously preserve biological variations in scRNA-seq data, such that existing statistical methods can be improved. Using real scRNA-seq data, we compared Linnorm with existing normalization methods, including NODES, SAMstrt, SCnorm, scran, DESeq and TMM. Linnorm shows advantages in speed, technical noise removal and preservation of cell heterogeneity, which can improve existing methods in the discovery of novel subtypes, pseudo-temporal ordering of cells, clustering analysis, etc. Linnorm also performs better than existing DEG analysis methods, including BASiCS, NODES, SAMstrt, Seurat and DESeq2, in false positive rate control and accuracy. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.

  16. Microbial community structure and activity in trace element-contaminated soils phytomanaged by Gentle Remediation Options (GRO).

    Science.gov (United States)

    Touceda-González, M; Prieto-Fernández, Á; Renella, G; Giagnoni, L; Sessitsch, A; Brader, G; Kumpiene, J; Dimitriou, I; Eriksson, J; Friesl-Hanl, W; Galazka, R; Janssen, J; Mench, M; Müller, I; Neu, S; Puschenreiter, M; Siebielec, G; Vangronsveld, J; Kidd, P S

    2017-12-01

    Gentle remediation options (GRO) are based on the combined use of plants, associated microorganisms and soil amendments, which can potentially restore soil functions and quality. We studied the effects of three GRO (aided-phytostabilisation, in situ stabilisation and phytoexclusion, and aided-phytoextraction) on the soil microbial biomass and respiration, the activities of hydrolase enzymes involved in the biogeochemical cycles of C, N, P, and S, and bacterial community structure of trace element contaminated soils (TECS) from six field trials across Europe. Community structure was studied using denaturing gradient gel electrophoresis (DGGE) fingerprinting of Bacteria, α- and β-Proteobacteria, Actinobacteria and Streptomycetaceae, and sequencing of DGGE bands characteristic of specific treatments. The number of copies of genes involved in ammonia oxidation and denitrification was determined by qPCR. Phytomanagement increased soil microbial biomass at three sites and respiration at the Biogeco site (France). Enzyme activities were consistently higher in treated soils compared to untreated soils at the Biogeco site. At this site, microbial biomass increased from 696 to 2352 mg ATP kg⁻¹ soil, respiration increased from 7.4 to 40.1 mg C-CO₂ kg⁻¹ soil d⁻¹, and enzyme activities were 2-11-fold higher in treated soils compared to untreated soil. Phytomanagement induced shifts in the bacterial community structure at both the total community and functional group levels, and generally increased the number of copies of genes involved in the N cycle (nirK, nirS, nosZ, and amoA). The influence of the main soil physico-chemical properties and trace element availability were assessed and eventual site-specific effects elucidated. Overall, our results demonstrate that phytomanagement of TECS influences soil biological activity in the long term. Copyright © 2017 Elsevier Ltd. All rights reserved.

  17. Joint modeling of ChIP-seq data via a Markov random field model

    NARCIS (Netherlands)

    Bao, Yanchun; Vinciotti, Veronica; Wit, Ernst; 't Hoen, Peter A C

    Chromatin ImmunoPrecipitation-sequencing (ChIP-seq) experiments have now become routine in biology for the detection of protein-binding sites. In this paper, we present a Markov random field model for the joint analysis of multiple ChIP-seq experiments. The proposed model naturally accounts for

  18. SeqKit: A Cross-Platform and Ultrafast Toolkit for FASTA/Q File Manipulation.

    Directory of Open Access Journals (Sweden)

    Wei Shen

    FASTA and FASTQ are basic and ubiquitous formats for storing nucleotide and protein sequences. Common manipulations of FASTA/Q files include converting, searching, filtering, deduplication, splitting, shuffling, and sampling. Existing tools only implement some of these manipulations, and not particularly efficiently, and some are only available for certain operating systems. Furthermore, the complicated installation process of required packages and running environments can render these programs less user friendly. This paper describes a cross-platform ultrafast comprehensive toolkit for FASTA/Q processing. SeqKit provides executable binary files for all major operating systems, including Windows, Linux, and Mac OSX, and can be directly used without any dependencies or pre-configurations. SeqKit demonstrates competitive performance in execution time and memory usage compared to similar tools. The efficiency and usability of SeqKit enable researchers to rapidly accomplish common FASTA/Q file manipulations. SeqKit is open source and available on GitHub at https://github.com/shenwei356/seqkit.

  19. RNA-Seq reveals complex genetic response to deepwater horizon oil release in Fundulus grandis

    Directory of Open Access Journals (Sweden)

    Garcia Tzintzuni I

    2012-09-01

    Background: The release of oil resulting from the blowout of the Deepwater Horizon (DH) drilling platform was one of the largest in history, discharging more than 189 million gallons of oil and subject to widespread application of oil dispersants. This event impacted a wide range of ecological habitats with a complex mix of pollutants whose biological impact is still not yet fully understood. To better understand the effects on a vertebrate genome, we studied gene expression in the salt marsh minnow Fundulus grandis, which is local to the northern coast of the Gulf of Mexico and is a sister species of the ecotoxicological model Fundulus heteroclitus. To assess genomic changes, we quantified mRNA expression using high throughput sequencing technologies (RNA-Seq) in F. grandis populations in the marshes and estuaries impacted by DH oil release. This application of RNA-Seq to a non-model, wild, and ecologically significant organism is an important evaluation of the technology to quickly assess similar events in the future. Results: Our de novo assembly of RNA-Seq data produced a large set of sequences which included many duplicates and fragments. In many cases several of these could be associated with a common reference sequence using blast to query a reference database. This reduced the set of significant genes to 1,070 down-regulated and 1,251 up-regulated genes. These genes indicate a broad and complex genomic response to DH oil exposure including the expected AHR-mediated response and CYP genes. In addition a response to hypoxic conditions and an immune response are also indicated. Several genes in the choriogenin family were down-regulated in the exposed group; a response that is consistent with AH exposure. These analyses are in agreement with oligonucleotide-based microarray analyses, and describe only a subset of significant genes with aberrant regulation in the exposed set. Conclusion: RNA-Seq may be successfully applied to feral and

  20. Multiplexing of ChIP-Seq Samples in an Optimized Experimental Condition Has Minimal Impact on Peak Detection.

    Directory of Open Access Journals (Sweden)

    Thadeous J Kacmarczyk

    Multiplexing samples in sequencing experiments is a common approach to maximize information yield while minimizing cost. In most cases the number of samples that are multiplexed is determined by financial consideration or experimental convenience, with limited understanding of the effects on the experimental results. Here we set out to examine the impact of multiplexing ChIP-seq experiments on the ability to identify a specific epigenetic modification. We performed peak detection analyses to determine the effects of multiplexing. These include false discovery rates, size, position and statistical significance of peak detection, and changes in gene annotation. We found that, for histone marker H3K4me3, one can multiplex up to 8 samples (7 IP + 1 input) at ~21 million single-end reads each and still detect over 90% of all peaks found when using a full lane per sample (~181 million reads). Furthermore, there are no variations introduced by indexing or lane batch effects and importantly there is no significant reduction in the number of genes with neighboring H3K4me3 peaks. We conclude that, for a well characterized antibody and, therefore, model IP condition, multiplexing 8 samples per lane is sufficient to capture most of the biological signal.

  1. Multiplexing of ChIP-Seq Samples in an Optimized Experimental Condition Has Minimal Impact on Peak Detection

    Science.gov (United States)

    Kacmarczyk, Thadeous J.; Bourque, Caitlin; Zhang, Xihui; Jiang, Yanwen; Houvras, Yariv; Alonso, Alicia; Betel, Doron

    2015-01-01

    Multiplexing samples in sequencing experiments is a common approach to maximize information yield while minimizing cost. In most cases the number of samples that are multiplexed is determined by financial consideration or experimental convenience, with limited understanding on the effects on the experimental results. Here we set to examine the impact of multiplexing ChIP-seq experiments on the ability to identify a specific epigenetic modification. We performed peak detection analyses to determine the effects of multiplexing. These include false discovery rates, size, position and statistical significance of peak detection, and changes in gene annotation. We found that, for histone marker H3K4me3, one can multiplex up to 8 samples (7 IP + 1 input) at ~21 million single-end reads each and still detect over 90% of all peaks found when using a full lane for sample (~181 million reads). Furthermore, there are no variations introduced by indexing or lane batch effects and importantly there is no significant reduction in the number of genes with neighboring H3K4me3 peaks. We conclude that, for a well characterized antibody and, therefore, model IP condition, multiplexing 8 samples per lane is sufficient to capture most of the biological signal. PMID:26066343

  2. Multiplexing of ChIP-Seq Samples in an Optimized Experimental Condition Has Minimal Impact on Peak Detection.

    Science.gov (United States)

    Kacmarczyk, Thadeous J; Bourque, Caitlin; Zhang, Xihui; Jiang, Yanwen; Houvras, Yariv; Alonso, Alicia; Betel, Doron

    2015-01-01

    Multiplexing samples in sequencing experiments is a common approach to maximize information yield while minimizing cost. In most cases the number of samples that are multiplexed is determined by financial consideration or experimental convenience, with limited understanding on the effects on the experimental results. Here we set to examine the impact of multiplexing ChIP-seq experiments on the ability to identify a specific epigenetic modification. We performed peak detection analyses to determine the effects of multiplexing. These include false discovery rates, size, position and statistical significance of peak detection, and changes in gene annotation. We found that, for histone marker H3K4me3, one can multiplex up to 8 samples (7 IP + 1 input) at ~21 million single-end reads each and still detect over 90% of all peaks found when using a full lane for sample (~181 million reads). Furthermore, there are no variations introduced by indexing or lane batch effects and importantly there is no significant reduction in the number of genes with neighboring H3K4me3 peaks. We conclude that, for a well characterized antibody and, therefore, model IP condition, multiplexing 8 samples per lane is sufficient to capture most of the biological signal.

  3. Identification of transcripts regulated by CUG-BP, Elav-like family member 1 (CELF1) in primary embryonic cardiomyocytes by RNA-seq

    Directory of Open Access Journals (Sweden)

    Yotam Blech-Hermoni

    2015-12-01

    CUG-BP, Elav-like family member 1 (CELF1) is a multi-functional RNA binding protein that regulates pre-mRNA alternative splicing in the nucleus, as well as polyadenylation status, mRNA stability, and translation in the cytoplasm [1]. Dysregulation of CELF1 has been implicated in cardiomyopathies in myotonic dystrophy type 1 and diabetes [2–5], but the targets of CELF1 regulation in the heart have not been systematically investigated. We previously demonstrated that in the developing heart CELF1 expression is restricted to the myocardium and peaks during embryogenesis [6–8]. To identify transcripts regulated by CELF1 in the embryonic myocardium, RNA-seq was used to compare the transcriptome of primary embryonic cardiomyocytes following siRNA-mediated knockdown of CELF1 to that of controls. Raw data files of the RNA-seq reads have been deposited in NCBI's Gene Expression Omnibus [9] under the GEO Series accession number GSE67360. These data can be used to identify transcripts whose levels or alternative processing (i.e., alternative splicing or polyadenylation site usage) are regulated by CELF1, and should provide insight into the pathways and processes modulated by this important RNA binding protein during normal heart development and during cardiac pathogenesis.

  4. Discovery of transcription factors and regulatory regions driving in vivo tumor development by ATAC-seq and FAIRE-seq open chromatin profiling.

    Directory of Open Access Journals (Sweden)

    Kristofer Davie

    2015-02-01

    Genomic enhancers regulate spatio-temporal gene expression by recruiting specific combinations of transcription factors (TFs). When TFs are bound to active regulatory regions, they displace canonical nucleosomes, making these regions biochemically detectable as nucleosome-depleted regions or accessible/open chromatin. Here we ask whether open chromatin profiling can be used to identify the entire repertoire of active promoters and enhancers underlying tissue-specific gene expression during normal development and oncogenesis in vivo. To this end, we first compare two different approaches to detect open chromatin in vivo using the Drosophila eye primordium as a model system: FAIRE-seq, based on physical separation of open versus closed chromatin; and ATAC-seq, based on preferential integration of a transposon into open chromatin. We find that both methods reproducibly capture the tissue-specific chromatin activity of regulatory regions, including promoters, enhancers, and insulators. Using both techniques, we screened for regulatory regions that become ectopically active during Ras-dependent oncogenesis, and identified 3778 regions that become (over-)activated during tumor development. Next, we applied motif discovery to search for candidate transcription factors that could bind these regions and identified AP-1 and Stat92E as key regulators. We validated the importance of Stat92E in the development of the tumors by introducing a loss of function Stat92E mutant, which was sufficient to rescue the tumor phenotype. Additionally, we tested whether the predicted Stat92E responsive regulatory regions are genuine, using ectopic induction of JAK/STAT signaling in developing eye discs, and observed that similar chromatin changes indeed occurred. Finally, we determine that these are functionally significant regulatory changes, as nearby target genes are up- or down-regulated. In conclusion, we show that FAIRE-seq and ATAC-seq based open chromatin profiling

  5. Successful Reading Strategies To Meet the Texas Reading Initiative Components: A Literary Review and Manual for Administrators, Teachers, and Parents.

    Science.gov (United States)

    Baker, Bridget; Karr-Kidwell, PJ

    This paper provides a literary review and research-based techniques for teaching reading. The paper also examines the different philosophies of reading to ascertain beneficial commonalities. Based on the literature review, a manual was produced to support administrators, teachers, and parents in securing quality reading instruction. Appendix A…

  6. Pipeline for the Analysis of ChIP-seq Data and New Motif Ranking Procedure

    KAUST Repository

    Ashoor, Haitham

    2011-06-01

    This thesis presents a computational methodology for ab-initio identification of transcription factor binding sites based on ChIP-seq data. This method consists of three main steps, namely ChIP-seq data processing, motif discovery and models selection. A novel method for ranking the models of motifs identified in this process is proposed. This method combines multiple factors in order to rank the provided candidate motifs. It combines the model coverage of the ChIP-seq fragments that contain motifs from which that model is built, the suitable background data made up of shuffled ChIP-seq fragments, and the p-value that resulted from evaluating the model on actual and background data. Two ChIP-seq datasets retrieved from ENCODE project are used to evaluate and demonstrate the ability of the method to predict correct TFBSs with high precision. The first dataset relates to neuron-restrictive silencer factor, NRSF, while the second one corresponds to growth-associated binding protein, GABP. The pipeline system shows high precision prediction for both datasets, as in both cases the top ranked motif closely resembles the known motifs for the respective transcription factors.

  7. LipidSeq: a next-generation clinical resequencing panel for monogenic dyslipidemias[S]

    Science.gov (United States)

    Johansen, Christopher T.; Dubé, Joseph B.; Loyzer, Melissa N.; MacDonald, Austin; Carter, David E.; McIntyre, Adam D.; Cao, Henian; Wang, Jian; Robinson, John F.; Hegele, Robert A.

    2014-01-01

    We report the design of a targeted resequencing panel for monogenic dyslipidemias, LipidSeq, for the purpose of replacing Sanger sequencing in the clinical detection of dyslipidemia-causing variants. We also evaluate the performance of the LipidSeq approach versus Sanger sequencing in 84 patients with a range of phenotypes including extreme blood lipid concentrations as well as additional dyslipidemias and related metabolic disorders. The panel performs well, with high concordance (95.2%) in samples with known mutations based on Sanger sequencing and a high detection rate (57.9%) of mutations likely to be causative for disease in samples not previously sequenced. Clinical implementation of LipidSeq has the potential to aid in the molecular diagnosis of patients with monogenic dyslipidemias with a high degree of speed and accuracy and at lower cost than either Sanger sequencing or whole exome sequencing. Furthermore, LipidSeq will help to provide a more focused picture of monogenic and polygenic contributors that underlie dyslipidemia while excluding the discovery of incidental pathogenic clinically actionable variants in nonmetabolism-related genes, such as oncogenes, that would otherwise be identified by a whole exome approach, thus minimizing potential ethical issues. PMID:24503134

  8. Transforming RNA-Seq data to improve the performance of prognostic gene signatures.

    Science.gov (United States)

    Zwiener, Isabella; Frisch, Barbara; Binder, Harald

    2014-01-01

    Gene expression measurements have successfully been used for building prognostic signatures, i.e., for identifying a short list of important genes that can predict patient outcome. Mostly microarray measurements have been considered, and there is little advice available for building multivariable risk prediction models from RNA-Seq data. We specifically consider penalized regression techniques, such as the lasso and componentwise boosting, which can simultaneously consider all measurements and provide both multivariable regression models for prediction and automated variable selection. However, they might be affected by the typical skewness, mean-variance dependency or extreme values of RNA-Seq covariates and therefore could benefit from transformations of the latter. In an analytical part, we highlight preferential selection of covariates with large variances, which is problematic due to the mean-variance dependency of RNA-Seq data. In a simulation study, we compare different transformations of RNA-Seq data for potentially improving detection of important genes. Specifically, we consider standardization, the log transformation, a variance-stabilizing transformation, the Box-Cox transformation, and rank-based transformations. In addition, the prediction performance for real data from patients with kidney cancer and acute myeloid leukemia is considered. We show that signature size, identification performance, and prediction performance critically depend on the choice of a suitable transformation. Rank-based transformations perform well in all scenarios and can even outperform complex variance-stabilizing approaches. Generally, the results illustrate that the distribution and potential transformations of RNA-Seq data need to be considered as a critical step when building risk prediction models by penalized regression techniques.
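
    Three of the transformations named in this abstract (log, standardization, rank-based) can be illustrated on a toy count matrix. The Python sketch below is only an illustration and is not the authors' code: the pseudocount of 1, the use of log base 2, the scaling of ranks to the open interval (0, 1), and all function names are assumptions. Rows are samples and columns are genes.

```python
import numpy as np


def log_transform(counts, pseudocount=1.0):
    """log2(count + pseudocount); the pseudocount keeps zero counts finite."""
    return np.log2(np.asarray(counts, dtype=float) + pseudocount)


def standardize(x):
    """Center each gene (column) to mean 0 and scale to unit sample variance."""
    x = np.asarray(x, dtype=float)
    mean = x.mean(axis=0)
    sd = x.std(axis=0, ddof=1)
    sd[sd == 0] = 1.0  # leave constant genes unscaled instead of dividing by 0
    return (x - mean) / sd


def rank_transform(x):
    """Replace each value by its within-gene rank, scaled to (0, 1).

    Ties are broken by sort order rather than averaged, which is a
    simplification chosen for this sketch.
    """
    x = np.asarray(x, dtype=float)
    n_samples = x.shape[0]
    ranks = x.argsort(axis=0).argsort(axis=0) + 1
    return ranks / (n_samples + 1)


# Toy data: gene 0 is lowly expressed, gene 1 highly expressed and far more variable.
counts = np.array([[0, 10],
                   [5, 1000],
                   [50, 100000]])

raw_var = counts.var(axis=0, ddof=1)
rank_var = rank_transform(counts).var(axis=0, ddof=1)
print("raw variances: ", raw_var)
print("rank variances:", rank_var)
```

    On the raw scale the second gene has a much larger variance than the first, which is the situation in which the abstract reports preferential selection of high-variance covariates. After the rank-based transformation both genes have the same variance by construction.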

  9. Statistical modeling of isoform splicing dynamics from RNA-seq time series data.

    Science.gov (United States)

    Huang, Yuanhua; Sanguinetti, Guido

    2016-10-01

    Isoform quantification is an important goal of RNA-seq experiments, yet it remains problematic for genes with low expression or several isoforms. These difficulties may in principle be ameliorated by exploiting correlated experimental designs, such as time series or dosage response experiments. Time series RNA-seq experiments, in particular, are becoming increasingly popular, yet there are no methods that explicitly leverage the experimental design to improve isoform quantification. Here, we present DICEseq, the first isoform quantification method tailored to correlated RNA-seq experiments. DICEseq explicitly models the correlations between different RNA-seq experiments to aid the quantification of isoforms across experiments. Numerical experiments on simulated datasets show that DICEseq yields more accurate results than state-of-the-art methods, an advantage that can become considerable at low coverage levels. On real datasets, our results show that DICEseq provides substantially more reproducible and robust quantifications, increasing the correlation of estimates from replicate datasets by up to 10% on genes with low or moderate expression levels (bottom third of all genes). Furthermore, DICEseq makes it possible to quantify the trade-off between temporal sampling of RNA and depth of sequencing, frequently an important choice when planning experiments. Our results have strong implications for the design of RNA-seq experiments, and offer a novel tool for improved analysis of such datasets. Python code is freely available at http://diceseq.sf.net. Contact: G.Sanguinetti@ed.ac.uk. Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press.

  10. Transforming RNA-Seq data to improve the performance of prognostic gene signatures.

    Directory of Open Access Journals (Sweden)

    Isabella Zwiener

    Gene expression measurements have successfully been used for building prognostic signatures, i.e., for identifying a short list of important genes that can predict patient outcome. Mostly microarray measurements have been considered, and there is little advice available for building multivariable risk prediction models from RNA-Seq data. We specifically consider penalized regression techniques, such as the lasso and componentwise boosting, which can simultaneously consider all measurements and provide both multivariable regression models for prediction and automated variable selection. However, they might be affected by the typical skewness, mean-variance dependency or extreme values of RNA-Seq covariates and therefore could benefit from transformations of the latter. In an analytical part, we highlight preferential selection of covariates with large variances, which is problematic due to the mean-variance dependency of RNA-Seq data. In a simulation study, we compare different transformations of RNA-Seq data for potentially improving detection of important genes. Specifically, we consider standardization, the log transformation, a variance-stabilizing transformation, the Box-Cox transformation, and rank-based transformations. In addition, the prediction performance for real data from patients with kidney cancer and acute myeloid leukemia is considered. We show that signature size, identification performance, and prediction performance critically depend on the choice of a suitable transformation. Rank-based transformations perform well in all scenarios and can even outperform complex variance-stabilizing approaches. Generally, the results illustrate that the distribution and potential transformations of RNA-Seq data need to be considered as a critical step when building risk prediction models by penalized regression techniques.

  11. Prevalence of artifacts in abdominal magnetic resonance imaging using GRASE sequence: a comparison with TSE sequences

    Directory of Open Access Journals (Sweden)

    Viviane Vieira Francisco

    2005-09-01

    OBJECTIVE: To determine the overall frequency of artifacts in the "gradient and spin echo" (GRASE) sequence, by artifact type and grade, in abdominal magnetic resonance imaging examinations, and to compare the GRASE sequence with two TSE sequences previously selected as those with the best signal-to-noise ratio and the lowest incidence of artifacts. MATERIALS AND METHODS: A prospective, self-paired study was carried out in 86 patients undergoing magnetic resonance imaging of the upper abdomen, acquiring the GRASE sequence with respiratory triggering and fat suppression and six T2-weighted TSE sequences. Among the six TSE sequences, those with the best signal-to-noise ratio and the fewest artifacts were previously selected; these were the sequences acquired with fat suppression and respiratory triggering, one with a body coil (sequence 1) and the other with a synergy coil (sequence 2). Image analysis was performed by two observers in consensus with respect to the presence, grade and type of artifact. The data were subsequently analyzed statistically using the Friedman test and the chi-square test. RESULTS: The absolute frequency of artifacts in the sequences used was 65.02%. The artifacts most frequently found in the three sequences studied were respiratory (30%) and pulsation (33%) artifacts. Only 3% of cases presented some type of artifact that hindered image analysis. The frequencies of artifacts in the individual sequences were: GRASE, 67.2%; TSE sequence 1, 62.2%; TSE sequence 2, 65.5%. There was no statistically significant difference in the frequency of artifacts found in the GRASE and TSE sequences (p = 0.845; NS). CONCLUSION: T2-weighted GRASE and TSE sequences with respiratory triggering and fat suppression, regardless of the coil used, frequently present artifacts, but with incid

  12. De Novo Characterization of the Mung Bean Transcriptome and Transcriptomic Analysis of Adventitious Rooting in Seedlings Using RNA-Seq.

    Science.gov (United States)

    Li, Shi-Weng; Shi, Rui-Fang; Leng, Yan

    2015-01-01

    Adventitious rooting is the most important mechanism underlying vegetative propagation and an important strategy for plant propagation under environmental stress. The present study was conducted to obtain transcriptomic data and examine gene expression using RNA-Seq and bioinformatics analysis, thereby providing a foundation for understanding the molecular mechanisms controlling adventitious rooting. Three cDNA libraries constructed from mRNA samples from mung bean hypocotyls during adventitious rooting were sequenced. These three samples generated a total of 73 million, 60 million, and 59 million 100-bp reads, respectively. These reads were assembled into 78,697 unigenes with an average length of 832 bp, totaling 65 Mb. The unigenes were aligned against six public protein databases, and 29,029 unigenes (36.77%) were annotated using BLASTx. Among them, 28,225 (35.75%) and 28,119 (35.62%) unigenes had homologs in the TrEMBL and NCBI non-redundant (Nr) databases, respectively. Of these unigenes, 21,140 were assigned to gene ontology classes, and a total of 11,990 unigenes were classified into 25 KOG functional categories. A total of 7,357 unigenes were annotated to 4,524 KOs, and 4,651 unigenes were mapped onto 342 KEGG pathways using BLAST comparison against the KEGG database. A total of 11,717 unigenes were differentially expressed (fold change>2) during the root induction stage, with 8,772 unigenes down-regulated and 2,945 unigenes up-regulated. A total of 12,737 unigenes were differentially expressed during the root initiation stage, with 9,303 unigenes down-regulated and 3,434 unigenes up-regulated. A total of 5,334 unigenes were differentially expressed between the root induction and initiation stage, with 2,167 unigenes down-regulated and 3,167 unigenes up-regulated. qRT-PCR validation of the 39 genes with known functions indicated a strong correlation (92.3%) with the RNA-Seq data. The GO enrichment, pathway mapping, and gene expression profiles reveal

  13. De Novo Characterization of the Mung Bean Transcriptome and Transcriptomic Analysis of Adventitious Rooting in Seedlings Using RNA-Seq.

    Directory of Open Access Journals (Sweden)

    Shi-Weng Li

    Adventitious rooting is the most important mechanism underlying vegetative propagation and an important strategy for plant propagation under environmental stress. The present study was conducted to obtain transcriptomic data and examine gene expression using RNA-Seq and bioinformatics analysis, thereby providing a foundation for understanding the molecular mechanisms controlling adventitious rooting. Three cDNA libraries constructed from mRNA samples from mung bean hypocotyls during adventitious rooting were sequenced. These three samples generated a total of 73 million, 60 million, and 59 million 100-bp reads, respectively. These reads were assembled into 78,697 unigenes with an average length of 832 bp, totaling 65 Mb. The unigenes were aligned against six public protein databases, and 29,029 unigenes (36.77%) were annotated using BLASTx. Among them, 28,225 (35.75%) and 28,119 (35.62%) unigenes had homologs in the TrEMBL and NCBI non-redundant (Nr) databases, respectively. Of these unigenes, 21,140 were assigned to gene ontology classes, and a total of 11,990 unigenes were classified into 25 KOG functional categories. A total of 7,357 unigenes were annotated to 4,524 KOs, and 4,651 unigenes were mapped onto 342 KEGG pathways using BLAST comparison against the KEGG database. A total of 11,717 unigenes were differentially expressed (fold change>2) during the root induction stage, with 8,772 unigenes down-regulated and 2,945 unigenes up-regulated. A total of 12,737 unigenes were differentially expressed during the root initiation stage, with 9,303 unigenes down-regulated and 3,434 unigenes up-regulated. A total of 5,334 unigenes were differentially expressed between the root induction and initiation stage, with 2,167 unigenes down-regulated and 3,167 unigenes up-regulated. qRT-PCR validation of the 39 genes with known functions indicated a strong correlation (92.3%) with the RNA-Seq data. The GO enrichment, pathway mapping, and gene expression profiles

  14. Science teacher's discourse about reading

    Directory of Open Access Journals (Sweden)

    Isabel Martins

    2006-08-01

    In this research we start from the assumption that teachers act as mediators of reading practices in school and problematise their practices, meanings and representations of reading. We have investigated meanings constructed by a group of teachers of Physics, Chemistry and Biology, working at a federal technical school. Having French discourse analysis as our theoretical-methodological framework, we considered that meanings, concepts and conceptions of reading are built historically through discourses, which produce meanings that determine ideological practices. Our results show that, for that group of teachers, there were no opportunities during either initial training or on-going education for reflecting upon the role of reading in science teaching and learning. Moreover, there seems to be an association between the type of discourse and modes of reading, so that unique meanings are attributed to scientific texts and their reading is linked to the search for and assimilation of information.

  15. MultiSeq: unifying sequence and structure data for evolutionary analysis

    Directory of Open Access Journals (Sweden)

    Wright Dan

    2006-08-01

    Background: Since the publication of the first draft of the human genome in 2000, bioinformatic data have been accumulating at an overwhelming pace. Currently, more than 3 million sequences and 35 thousand structures of proteins and nucleic acids are available in public databases. Finding correlations in and between these data to answer critical research questions is extremely challenging. This problem needs to be approached from several directions: information science to organize and search the data; information visualization to assist in recognizing correlations; mathematics to formulate statistical inferences; and biology to analyze chemical and physical properties in terms of sequence and structure changes. Results: Here we present MultiSeq, a unified bioinformatics analysis environment that allows one to organize, display, align and analyze both sequence and structure data for proteins and nucleic acids. While special emphasis is placed on analyzing the data within the framework of evolutionary biology, the environment is also flexible enough to accommodate other usage patterns. The evolutionary approach is supported by the use of predefined metadata, adherence to standard ontological mappings, and the ability for the user to adjust these classifications using an electronic notebook. MultiSeq contains a new algorithm to generate complete evolutionary profiles that represent the topology of the molecular phylogenetic tree of a homologous group of distantly related proteins. The method, based on the multidimensional QR factorization of multiple sequence and structure alignments, removes redundancy from the alignments and orders the protein sequences by increasing linear dependence, resulting in the identification of a minimal basis set of sequences that spans the evolutionary space of the homologous group of proteins. Conclusion: MultiSeq is a major extension of the Multiple Alignment tool that is provided as part of VMD, a structural

  16. What a Girl! Fighting Gentleness in the Picture Book World: An Analysis of the Norwegian Picture Book "What a Girl!" by Gro Dahle and Svein Nyhus

    Science.gov (United States)

    Maagerø, Eva; Østbye, Guri Lorentzen

    2017-01-01

    The Norwegian picture book "What a Girl!" (original title "Snill") by Gro Dahle and Svein Nyhus was published in 2011 and immediately gained a large audience. The book tells the story of a girl who always behaves in the ways expected of her: she never confronts her parents, her teacher or her classmates. This behaviour makes…

  17. Integration of ATAC-seq and RNA-seq identifies human alpha cell and beta cell signature genes.

    Science.gov (United States)

    Ackermann, Amanda M; Wang, Zhiping; Schug, Jonathan; Naji, Ali; Kaestner, Klaus H

    2016-03-01

    Although glucagon-secreting α-cells and insulin-secreting β-cells have opposing functions in regulating plasma glucose levels, the two cell types share a common developmental origin and exhibit overlapping transcriptomes and epigenomes. Notably, destruction of β-cells can stimulate repopulation via transdifferentiation of α-cells, at least in mice, suggesting plasticity between these cell fates. Furthermore, dysfunction of both α- and β-cells contributes to the pathophysiology of type 1 and type 2 diabetes, and β-cell de-differentiation has been proposed to contribute to type 2 diabetes. Our objective was to delineate the molecular properties that maintain islet cell type specification yet allow for cellular plasticity. We hypothesized that correlating cell type-specific transcriptomes with an atlas of open chromatin will identify novel genes and transcriptional regulatory elements such as enhancers involved in α- and β-cell specification and plasticity. We sorted human α- and β-cells and performed the "Assay for Transposase-Accessible Chromatin with high throughput sequencing" (ATAC-seq) and mRNA-seq, followed by integrative analysis to identify cell type-selective gene regulatory regions. We identified numerous transcripts with either α-cell- or β-cell-selective expression and discovered the cell type-selective open chromatin regions that correlate with these gene activation patterns. We confirmed cell type-selective expression on the protein level for two of the top hits from our screen. The "group specific protein" (GC; or vitamin D binding protein) was restricted to α-cells, while CHODL (chondrolectin) immunoreactivity was only present in β-cells. Furthermore, α-cell- and β-cell-selective ATAC-seq peaks were identified to overlap with known binding sites for islet transcription factors, as well as with single nucleotide polymorphisms (SNPs) previously identified as risk loci for type 2 diabetes. We have determined the genetic landscape of

  18. WaveSeq: a novel data-driven method of detecting histone modification enrichments using wavelets.

    Directory of Open Access Journals (Sweden)

    Apratim Mitra

    BACKGROUND: Chromatin immunoprecipitation followed by next-generation sequencing is a genome-wide analysis technique that can be used to detect various epigenetic phenomena such as transcription factor binding sites and histone modifications. Histone modification profiles can be either punctate or diffuse, which makes it difficult to distinguish regions of enrichment from background noise. With the discovery of histone marks having a wide variety of enrichment patterns, there is an urgent need for analysis methods that are robust to various data characteristics and capable of detecting a broad range of enrichment patterns. RESULTS: To address these challenges we propose WaveSeq, a novel data-driven method of detecting regions of significant enrichment in ChIP-Seq data. Our approach utilizes the wavelet transform, is free of distributional assumptions and is robust to diverse data characteristics such as low signal-to-noise ratios and broad enrichment patterns. Using publicly available datasets we showed that WaveSeq compares favorably with other published methods, exhibiting high sensitivity and precision for both punctate and diffuse enrichment regions even in the absence of a control data set. The application of our algorithm to a complex histone modification data set helped make novel functional discoveries which further underlined its utility in such an experimental setup. CONCLUSIONS: WaveSeq is a highly sensitive method capable of accurate identification of enriched regions in a broad range of data sets. WaveSeq can detect both narrow and broad peaks with a high degree of accuracy even in low signal-to-noise ratio data sets. WaveSeq is also suited for application in complex experimental scenarios, helping make biologically relevant functional discoveries.

  19. RAP: RNA-Seq Analysis Pipeline, a new cloud-based NGS web application.

    Science.gov (United States)

    D'Antonio, Mattia; D'Onorio De Meo, Paolo; Pallocca, Matteo; Picardi, Ernesto; D'Erchia, Anna Maria; Calogero, Raffaele A; Castrignanò, Tiziana; Pesole, Graziano

    2015-01-01

    The study of RNA has been dramatically improved by the introduction of Next Generation Sequencing platforms allowing massive and cheap sequencing of selected RNA fractions, also providing information on strand orientation (RNA-Seq). The complexity of transcriptomes and of their regulatory pathways makes RNA-Seq one of the most complex fields of NGS applications, addressing several aspects of the expression process (e.g. identification and quantification of expressed genes and transcripts, alternative splicing and polyadenylation, fusion genes and trans-splicing, post-transcriptional events, etc.). In order to provide researchers with an effective and friendly resource for analyzing RNA-Seq data, we present here RAP (RNA-Seq Analysis Pipeline), a cloud computing web application implementing a complete but modular analysis workflow. This pipeline integrates both state-of-the-art bioinformatics tools for RNA-Seq analysis and in-house developed scripts to offer the user a comprehensive strategy for data analysis. RAP is able to perform quality checks (adopting FastQC and NGS QC Toolkit), identify and quantify expressed genes and transcripts (with Tophat, Cufflinks and HTSeq), and detect alternative splicing events (using SpliceTrap) and chimeric transcripts (with ChimeraScan). This pipeline is also able to identify splicing junctions and constitutive or alternative polyadenylation sites (implementing custom analysis modules) and call statistically significant differences in gene and transcript expression, splicing pattern and polyadenylation site usage (using Cuffdiff2 and DESeq). Through a user-friendly web interface, the RAP workflow can be suitably customized by the user and is automatically executed on our cloud computing environment. This strategy allows access to bioinformatics tools and computational resources without specific bioinformatics and IT skills. RAP provides a set of tabular and graphical results that can be helpful to browse, filter and export

  20. Using RAD-seq to recognize sex-specific markers and sex chromosome systems.

    Science.gov (United States)

    Gamble, Tony

    2016-05-01

    Next-generation sequencing methods have initiated a revolution in molecular ecology and evolution (Tautz et al. ). Among the most impressive of these sequencing innovations is restriction site-associated DNA sequencing or RAD-seq (Baird et al. ; Andrews et al. ). RAD-seq uses the Illumina sequencing platform to sequence fragments of DNA cut by a specific restriction enzyme and can generate tens of thousands of molecular genetic markers for analysis. One of the many uses of RAD-seq data has been to identify sex-specific genetic markers, markers found in one sex but not the other (Baxter et al. ; Gamble & Zarkower ). Sex-specific markers are a powerful tool for biologists. At their most basic, they can be used to identify the sex of an individual via PCR. This is useful in cases where a species lacks obvious sexual dimorphism at some or all life history stages. For example, such tests have been important for studying sex differences in life history (Sheldon ; Mossman & Waser ), the management and breeding of endangered species (Taberlet et al. ; Griffiths & Tiwari ; Robertson et al. ) and sexing embryonic material (Hacker et al. ; Smith et al. ). Furthermore, sex-specific markers allow recognition of the sex chromosome system in cases where standard cytogenetic methods fail (Charlesworth & Mank ; Gamble & Zarkower ). Thus, species with male-specific markers have male heterogamety (XY) while species with female-specific markers have female heterogamety (ZW). In this issue, Fowler & Buonaccorsi () illustrate the ease by which RAD-seq data can generate sex-specific genetic markers in rockfish (Sebastes). Moreover, by examining RAD-seq data from two closely related rockfish species, Sebastes chrysomelas and Sebastes carnatus (Fig. ), Fowler & Buonaccorsi () uncover shared sex-specific markers and a conserved sex chromosome system. © 2016 John Wiley & Sons Ltd.

  1. Construction of an SNP-based high-density linkage map for flax (Linum usitatissimum L.) using specific length amplified fragment sequencing (SLAF-seq) technology.

    Science.gov (United States)

    Yi, Liuxi; Gao, Fengyun; Siqin, Bateer; Zhou, Yu; Li, Qiang; Zhao, Xiaoqing; Jia, Xiaoyun; Zhang, Hui

    2017-01-01

    Flax is an important crop for oil and fiber; however, no high-density genetic maps have been reported for this species. Specific length amplified fragment sequencing (SLAF-seq) is a high-resolution strategy for large scale de novo discovery and genotyping of single nucleotide polymorphisms. In this study, SLAF-seq was employed to develop SNP markers in an F2 population to construct a high-density genetic map for flax. In total, 196.29 million paired-end reads were obtained. The average sequencing depth was 25.08 in the male parent, 32.17 in the female parent, and 9.64 in each F2 progeny. In total, 389,288 polymorphic SLAFs were detected, from which 260,380 polymorphic SNPs were developed. After filtering, 4,638 SNPs were found suitable for genetic map construction. The final genetic map included 4,145 SNP markers on 15 linkage groups and was 2,632.94 cM in length, with an average distance of 0.64 cM between adjacent markers. To our knowledge, this map is the densest SNP-based genetic map for flax. The SNP markers and genetic map reported here will serve as a foundation for the fine mapping of quantitative trait loci (QTLs), map-based gene cloning and marker assisted selection (MAS) for flax.
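
    The reported average spacing can be checked from the other figures in the abstract. The sketch below assumes that the quoted map length is the sum of the 15 linkage group lengths and that every group carries at least two markers, so that the number of marker intervals is the marker count minus the group count; the function name is illustrative only.

```python
# Figures quoted in the abstract above.
MAP_LENGTH_CM = 2632.94
N_MARKERS = 4145
N_LINKAGE_GROUPS = 15


def mean_marker_spacing(length_cm: float, n_markers: int, n_groups: int) -> float:
    """Mean distance between adjacent markers, in cM.

    A linkage group with m markers has m - 1 intervals, so the whole map
    has n_markers - n_groups intervals (assumption: no single-marker groups).
    """
    intervals = n_markers - n_groups
    return length_cm / intervals


spacing = mean_marker_spacing(MAP_LENGTH_CM, N_MARKERS, N_LINKAGE_GROUPS)
print(f"{spacing:.2f} cM")  # rounds to the 0.64 cM stated in the abstract
```

    Dividing by the raw marker count instead of the interval count gives a value that also rounds to 0.64 cM, so the abstract is consistent under either convention.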

  2. DMS-Seq for In Vivo Genome-wide Mapping of Protein-DNA Interactions and Nucleosome Centers.

    Science.gov (United States)

    Umeyama, Taichi; Ito, Takashi

    2017-10-03

    Protein-DNA interactions provide the basis for chromatin structure and gene regulation. Comprehensive identification of protein-occupied sites is thus vital to an in-depth understanding of genome function. Dimethyl sulfate (DMS) is a chemical probe that has long been used to detect footprints of DNA-bound proteins in vitro and in vivo. Here, we describe a genomic footprinting method, dimethyl sulfate sequencing (DMS-seq), which exploits the cell-permeable nature of DMS to obviate the need for nuclear isolation. This feature makes DMS-seq simple in practice and removes the potential risk of protein re-localization during nuclear isolation. DMS-seq successfully detects transcription factors bound to cis-regulatory elements and non-canonical chromatin particles in nucleosome-free regions. Furthermore, an unexpected preference of DMS confers on DMS-seq a unique potential to directly detect nucleosome centers without using genetic manipulation. We expect that DMS-seq will serve as a characteristic method for genome-wide interrogation of in vivo protein-DNA interactions. Copyright © 2017 The Author(s). Published by Elsevier Inc. All rights reserved.

  3. FMLRC: Hybrid long read error correction using an FM-index.

    Science.gov (United States)

    Wang, Jeremy R; Holt, James; McMillan, Leonard; Jones, Corbin D

    2018-02-09

    Long read sequencing is changing the landscape of genomic research, especially de novo assembly. Despite the high error rate inherent to long read technologies, increased read lengths dramatically improve the continuity and accuracy of genome assemblies. However, the cost and throughput of these technologies limit their application to complex genomes. One solution is to decrease the cost and time to assemble novel genomes by leveraging "hybrid" assemblies that use long reads for scaffolding and short reads for accuracy. We describe a novel method leveraging a multi-string Burrows-Wheeler Transform with auxiliary FM-index to correct errors in long read sequences using a set of complementary short reads. We demonstrate that our method efficiently produces significantly more high quality corrected sequence than existing hybrid error-correction methods. We also show that our method produces more contiguous assemblies, in many cases, than existing state-of-the-art hybrid and long-read only de novo assembly methods. Our method accurately corrects long read sequence data using complementary short reads. We demonstrate higher total throughput of corrected long reads and a corresponding increase in contiguity of the resulting de novo assemblies. Improved throughput and computational efficiency relative to existing methods will help make more economical use of emerging long read sequencing technologies.
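
    To illustrate the data structure named in the abstract, the following is a minimal sketch of a Burrows-Wheeler Transform with an FM-index that counts occurrences of a pattern by backward search. It is not the FMLRC implementation: it indexes a single string rather than a multi-string collection of short reads, builds the suffix array naively, and the class and function names are assumptions made for this example.

```python
from collections import Counter


def bwt(text: str) -> tuple[str, list[int]]:
    """Return the Burrows-Wheeler transform and suffix array of text + '$'."""
    text += "$"  # sentinel, sorts before A/C/G/T
    sa = sorted(range(len(text)), key=lambda i: text[i:])  # naive suffix array
    return "".join(text[i - 1] for i in sa), sa


class FMIndex:
    """Toy FM-index supporting exact-match counting."""

    def __init__(self, text: str):
        self.bwt, self.sa = bwt(text)
        counts = Counter(self.bwt)
        # first[c] = number of characters in the text strictly smaller than c
        self.first = {}
        total = 0
        for c in sorted(counts):
            self.first[c] = total
            total += counts[c]
        # occ[c][i] = number of occurrences of c in bwt[:i]
        self.occ = {c: [0] * (len(self.bwt) + 1) for c in counts}
        for i, ch in enumerate(self.bwt):
            for c in self.occ:
                self.occ[c][i + 1] = self.occ[c][i] + (ch == c)

    def count(self, pattern: str) -> int:
        """Count occurrences of pattern using backward search."""
        lo, hi = 0, len(self.bwt)
        for c in reversed(pattern):
            if c not in self.first:
                return 0
            lo = self.first[c] + self.occ[c][lo]
            hi = self.first[c] + self.occ[c][hi]
            if lo >= hi:
                return 0
        return hi - lo


index = FMIndex("ACGTACGTAC")
print(index.count("ACG"), index.count("AC"), index.count("GG"))
```

    In a hybrid correction setting, counts of this kind over the short reads could indicate which k-mers in a long read are well supported; how FMLRC actually uses them is not described in the abstract.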

  4. HMCan: A method for detecting chromatin modifications in cancer samples using ChIP-seq data

    KAUST Repository

    Ashoor, Haitham; Hérault, Aurélie; Kamoun, Aurélie; Radvanyi, François; Bajic, Vladimir B.; Barillot, Emmanuel; Boeva, Valentina

    2013-01-01

    genes. Though several tools have been created to enable detection of histone marks in ChIP-seq data from normal samples, it is unclear whether these tools can be efficiently applied to ChIP-seq data generated from cancer samples. Indeed, cancer genomes

  5. How Reading Volume Affects both Reading Fluency and Reading Achievement

    Directory of Open Access Journals (Sweden)

    Richard L. ALLINGTON

    2014-10-01

    Long overlooked, reading volume is actually central to the development of reading proficiencies, especially in the development of fluent reading proficiency. Generally no one in schools monitors the actual volume of reading that children engage in. We know that the commonly used commercial core reading programs provide only material that requires about 15 minutes of reading activity daily. The remaining 75 minutes of reading lessons are filled with many other activities such as completing workbook pages or responding to low-level literal questions about what has been read. Studies designed to enhance the volume of reading that children do during their reading lessons demonstrate one way to enhance reading development. Repeated readings have been widely used in fostering reading fluency but wide reading options seem to work faster and more broadly in developing reading proficiencies, including oral reading fluency.

  6. Integrated RNA-Seq and sRNA-Seq Analysis Identifies Chilling and Freezing Responsive Key Molecular Players and Pathways in Tea Plant (Camellia sinensis)

    Science.gov (United States)

    Zheng, Chao; Zhao, Lei; Wang, Yu; Shen, Jiazhi; Zhang, Yinfei; Jia, Sisi; Li, Yusheng; Ding, Zhaotang

    2015-01-01

    Tea [Camellia sinensis (L) O. Kuntze, Theaceae] is one of the most popular non-alcoholic beverages worldwide. Cold stress is one of the most severe abiotic stresses that limit tea plants’ growth, survival and geographical distribution. However, the genetic regulatory network and signaling pathways involved in cold stress responses in tea plants remain to be unearthed. Using RNA-Seq, DGE and sRNA-Seq technologies, we performed an integrative analysis of miRNA and mRNA expression profiling and their regulatory network of tea plants under chilling (4℃) and freezing (-5℃) stress. Differentially expressed (DE) miRNA and mRNA profiles were obtained based on fold change analysis; miRNAs and target mRNAs were found to show both coherent and incoherent relationships in the regulatory network. Furthermore, we compared several key pathways (e.g., ‘Photosynthesis’), GO terms (e.g., ‘response to karrikin’) and transcriptional factors (TFs, e.g., DREB1b/CBF1) which were identified as involved in the early chilling and/or freezing response of tea plants. Intriguingly, we found that karrikins, a new group of plant growth regulators, and β-primeverosidase (BPR), a key enzyme functionally relevant to the formation of tea aroma, might play an important role in both early chilling and freezing response of tea plants. Quantitative reverse transcriptase-polymerase chain reaction (qRT-PCR) analysis further confirmed the results from RNA-Seq and sRNA-Seq analysis. This is the first study to simultaneously profile the expression patterns of both miRNAs and mRNAs on a genome-wide scale to elucidate the molecular mechanisms of early responses of tea plants to cold stress. In addition to gaining a deeper insight into the cold resistant characteristics of tea plants, we provide a good case study to analyse mRNA/miRNA expression and profiling of non-model plant species using next-generation sequencing technology. PMID:25901577

  7. Relationships within Cladobranchia (Gastropoda: Nudibranchia) based on RNA-Seq data: an initial investigation.

    Science.gov (United States)

    Goodheart, Jessica A; Bazinet, Adam L; Collins, Allen G; Cummings, Michael P

    2015-09-01

    Cladobranchia (Gastropoda: Nudibranchia) is a diverse (approx. 1000 species) but understudied group of sea slug molluscs. In order to fully comprehend the diversity of nudibranchs and the evolution of character traits within Cladobranchia, a solid understanding of evolutionary relationships is necessary. To date, only two direct attempts have been made to understand the evolutionary relationships within Cladobranchia, neither of which resulted in well-supported phylogenetic hypotheses. In addition to these studies, several others have addressed some of the relationships within this clade while investigating the evolutionary history of more inclusive groups (Nudibranchia and Euthyneura). However, all of the resulting phylogenetic hypotheses contain conflicting topologies within Cladobranchia. In this study, we address some of these long-standing issues regarding the evolutionary history of Cladobranchia using RNA-Seq data (transcriptomes). We sequenced 16 transcriptomes and combined these with four transcriptomes from the NCBI Sequence Read Archive. Transcript assembly using Trinity and orthology determination using HaMStR yielded 839 orthologous groups for analysis. These data provide a well-supported and almost fully resolved phylogenetic hypothesis for Cladobranchia. Our results support the monophyly of Cladobranchia and the sub-clade Aeolidida, but reject the monophyly of Dendronotida.

  8. Accurate clinical genetic testing for autoinflammatory diseases using the next-generation sequencing platform MiSeq.

    Science.gov (United States)

    Nakayama, Manabu; Oda, Hirotsugu; Nakagawa, Kenji; Yasumi, Takahiro; Kawai, Tomoki; Izawa, Kazushi; Nishikomori, Ryuta; Heike, Toshio; Ohara, Osamu

    2017-03-01

    Autoinflammatory diseases form one group of primary immunodeficiency diseases and are generally thought to be caused by mutations in genes responsible for innate immunity rather than acquired immunity. Mutations related to autoinflammatory diseases occur in 12 genes. For example, low-level somatic mosaic NLRP3 mutations underlie chronic infantile neurologic, cutaneous, articular syndrome (CINCA), also known as neonatal-onset multisystem inflammatory disease (NOMID). In current clinical practice, clinical genetic testing plays an important role in providing patients with quick, definite diagnoses. To increase the availability of such testing, low-cost high-throughput gene-analysis systems are required, ones that not only have the sensitivity to detect even low-level somatic mosaic mutations, but also can operate simply in a clinical setting. To this end, we developed a simple method that employs two-step tailed PCR and an NGS system, the MiSeq platform, to detect mutations in all coding exons of the 12 genes responsible for autoinflammatory diseases. Using this amplicon sequencing system, we amplified a total of 234 amplicons derived from the 12 genes with multiplex PCR. This was done simultaneously and in one test tube. Each sample was distinguished by an index sequence of second PCR primers following PCR amplification. With our procedure and tips for reducing PCR amplification bias, we were able to analyze 12 genes from 25 clinical samples in one MiSeq run. Moreover, with the certified primers designed by our short program, which detects and avoids common SNPs in gene-specific PCR primers, we used this system for routine genetic testing. Our optimized procedure uses a simple protocol, which can easily be followed by virtually any office medical staff. Because of the small PCR amplification bias, we can simultaneously analyze several clinical DNA samples at low cost and can obtain sufficient read numbers to detect a low level of somatic mosaic mutations.

  9. Robust Identification of Developmentally Active Endothelial Enhancers in Zebrafish Using FANS-Assisted ATAC-Seq.

    Science.gov (United States)

    Quillien, Aurelie; Abdalla, Mary; Yu, Jun; Ou, Jianhong; Zhu, Lihua Julie; Lawson, Nathan D

    2017-07-18

    Identification of tissue-specific and developmentally active enhancers provides insights into mechanisms that control gene expression during embryogenesis. However, robust detection of these regulatory elements remains challenging, especially in vertebrate genomes. Here, we apply fluorescent-activated nuclei sorting (FANS) followed by Assay for Transposase-Accessible Chromatin with high-throughput sequencing (ATAC-seq) to identify developmentally active endothelial enhancers in the zebrafish genome. ATAC-seq of nuclei from Tg(fli1a:egfp) y1 transgenic embryos revealed expected patterns of nucleosomal positioning at transcriptional start sites throughout the genome and association with active histone modifications. Comparison of ATAC-seq from GFP-positive and -negative nuclei identified more than 5,000 open elements specific to endothelial cells. These elements flanked genes functionally important for vascular development and that displayed endothelial-specific gene expression. Importantly, a majority of tested elements drove endothelial gene expression in zebrafish embryos. Thus, FANS-assisted ATAC-seq using transgenic zebrafish embryos provides a robust approach for genome-wide identification of active tissue-specific enhancer elements. Copyright © 2017 The Author(s). Published by Elsevier Inc. All rights reserved.

  10. Multi-tissue RNA-seq and transcriptome characterisation of the spiny dogfish shark (Squalus acanthias) provides a molecular tool for biological research and reveals new genes involved in osmoregulation

    DEFF Research Database (Denmark)

    Chana Munoz, Andres; Jendroszek, Agnieszka; Sønnichsen, Malene

    2017-01-01

    The spiny dogfish shark (Squalus acanthias) is one of the most commonly used cartilaginous fishes in biological research, especially in the fields of nitrogen metabolism, ion transporters and osmoregulation. Nonetheless, transcriptomic data for this organism is scarce. In the present study, a multi......-tissue RNA-seq experiment and de novo transcriptome assembly was performed in four different spiny dogfish tissues (brain, liver, kidney and ovary), providing an annotated sequence resource. The characterization of the transcriptome greatly increases the scarce sequence information for shark species. Reads...... and provides a new molecular tool to assist biological research in cartilaginous fishes....

  11. The Teaching Reading With Virtual Technology in Education Context

    Directory of Open Access Journals (Sweden)

    Vera Wannmacher Pereira

    2015-06-01

    This research had the following specific objectives: to produce virtual reading materials for 7th grade elementary school students; to develop workshops based on these materials for these students; to check the benefits of these workshops for the students' reading comprehension development and linguistic knowledge; to produce an e-book for teachers; and to make this e-book available on the publisher's website at PUCRS. The methodology involved cooperative work around educational actions (application of the virtual materials produced in workshops), research (development and implementation of research instruments) and extension (production and availability of the e-book). The results were: a productive network between the university and schools; a set of reading comprehension materials for 7th grade elementary school students; development of the students' reading comprehension and language learning skills, as measured by the test applied before and after the workshops; and the e-book, which was developed and is available on the EDIPUCRS site.

  12. Eye Movement during Silent and Oral Reading: How Can we Compensate the Loss of Multisensory Process during Silent Reading?

    Directory of Open Access Journals (Sweden)

    Maiko Takahashi

    2011-10-01

    While reading texts orally, we process multisensory language information. Accordingly, in the context of reading aloud, we process the visually presented text and produce the auditory information of the text through articulatory movement. These multisensory processing activities are assumed to facilitate the memory and comprehension of textual information. Conversely, while reading silently, we process only the visual information of the text. Although we cannot use multisensory language information while reading silently, several researchers have found that there is little difference between the degree of comprehension based on silent and oral reading for adult readers. The purpose of this study is to explain how we compensate for the loss of multisensory processing during silent reading by comparing visual processing during silent and oral reading. By conducting two experiments, we measured and compared eye movement during silent and oral reading. The results showed that silent reading took less time for comprehension than oral reading, and that readers had more visual fixation points and read back more frequently when reading silently than when reading orally. These reading strategies during silent reading seemed to compensate for the loss of multisensory processing and support text comprehension.

  13. A new way of producing pediocin in Pediococcus acidilactici through intracellular stimulation by internalized inulin nanoparticles.

    Science.gov (United States)

    Kim, Whee-Soo; Lee, Jun-Yeong; Singh, Bijay; Maharjan, Sushila; Hong, Liang; Lee, Sang-Mok; Cui, Lian-Hua; Lee, Ki-June; Kim, GiRak; Yun, Cheol-Heui; Kang, Sang-Kee; Choi, Yun-Jaie; Cho, Chong-Su

    2018-04-12

    One of the most challenging aspects of probiotics as a replacement for antibiotics is to enhance their antimicrobial activity against pathogens. Given that prebiotics stimulate the growth and/or activity of probiotics, we developed phthalyl inulin nanoparticles (PINs) as prebiotics and observed their effects on the cellular and antimicrobial activities of Pediococcus acidilactici (PA). First, we assessed the internalization of PINs into PA. The internalization of PINs was largely regulated by glucose transporters in PA, and the process was energy-dependent. Once internalized, PINs induced PA to produce substantial amounts of antimicrobial peptide (pediocin), which is effective against both Gram-negative (Salmonella Gallinarum) and Gram-positive (Listeria monocytogenes) pathogens. When treated with small-sized PINs, PA showed a nine-fold increase in antimicrobial activity. The rise in pediocin activity in PA treated with PINs was accompanied by enhanced expression of stress response genes (groEL, groES, dnaK) and pediocin biosynthesis genes (pedA, pedD). Although the mechanism is not clear, it appears that the internalization of PINs by PA causes mild stress to activate the PA defense system, leading to increased production of pediocin. Overall, we identified a prebiotic in nanoparticle form for intracellular stimulation of probiotics, demonstrating a new avenue for the biological production of antimicrobial peptides.

  14. Using quality scores and longer reads improves accuracy of Solexa read mapping

    Directory of Open Access Journals (Sweden)

    Xuan Zhenyu

    2008-02-01

    Background: Second-generation sequencing has the potential to revolutionize genomics and impact all areas of biomedical science. New technologies will make re-sequencing widely available for such applications as identifying genome variations or interrogating the oligonucleotide content of a large sample (e.g. ChIP-sequencing). The increase in speed, sensitivity and availability of sequencing technology brings demand for advances in computational technology to perform associated analysis tasks. The Solexa/Illumina 1G sequencer can produce tens of millions of reads, ranging in length from ~25–50 nt, in a single experiment. Accurately mapping the reads back to a reference genome is a critical task in almost all applications. Two sources of information that are often ignored when mapping reads from the Solexa technology are the 3' ends of longer reads, which contain a much higher frequency of sequencing errors, and the base-call quality scores. Results: To investigate whether these sources of information can be used to improve accuracy when mapping reads, we developed the RMAP tool, which can map reads having a wide range of lengths and allows base-call quality scores to determine which positions in each read are more important when mapping. We applied RMAP to analyze data re-sequenced from two human BAC regions for varying read lengths, and varying criteria for use of quality scores. RMAP is freely available for downloading at http://rulai.cshl.edu/rmap/. Conclusion: Our results indicate that significant gains in Solexa read mapping performance can be achieved by considering the information in 3' ends of longer reads, and appropriately using the base-call quality scores. The RMAP tool we have developed will enable researchers to effectively exploit this information in targeted re-sequencing projects.
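
    The idea of letting base-call quality decide which read positions matter can be sketched as follows. This is an illustrative brute-force scan, not RMAP's algorithm: it assumes quality scores are already decoded to integers, treats any base below a hypothetical cutoff `min_quality` as uninformative, and tries every reference offset.

```python
def quality_aware_mismatches(read, quals, window, min_quality):
    """Count mismatches, skipping read positions whose quality is below the cutoff."""
    return sum(
        1
        for base, q, ref_base in zip(read, quals, window)
        if base != ref_base and q >= min_quality
    )


def find_hits(read, quals, reference, max_mismatches=2, min_quality=20):
    """Return (start, mismatches) for every reference window within the mismatch budget."""
    n = len(read)
    hits = []
    for start in range(len(reference) - n + 1):
        mm = quality_aware_mismatches(read, quals, reference[start:start + n], min_quality)
        if mm <= max_mismatches:
            hits.append((start, mm))
    return hits


reference = "TTACGTACGGATCCA"
read = "ACGTACGT"            # last base disagrees with the reference
quals = [30] * 7 + [5]       # ...and has a low quality score

masked = find_hits(read, quals, reference, max_mismatches=0, min_quality=20)
unmasked = find_hits(read, quals, reference, max_mismatches=0, min_quality=0)
print(masked, unmasked)
```

    With the low-quality 3' base masked, the read maps exactly at offset 2; when every position counts equally, no exact match is found. This mirrors the abstract's point that error-prone 3' ends need not disqualify an otherwise good alignment.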

  15. Flexible taxonomic assignment of ambiguous sequencing reads

    Directory of Open Access Journals (Sweden)

    Jansson Jesper

    2011-01-01

    Background: To characterize the diversity of bacterial populations in metagenomic studies, sequencing reads need to be accurately assigned to taxonomic units in a given reference taxonomy. Reads that cannot be reliably assigned to a unique leaf in the taxonomy (ambiguous reads) are typically assigned to the lowest common ancestor of the set of species that match it. This introduces a potentially severe error in the estimation of bacteria present in the sample due to false positives, since all species in the subtree rooted at the ancestor are implicitly assigned to the read even though many of them may not match it. Results: We present a method that maps each read to a node in the taxonomy that minimizes a penalty score while balancing the relevance of precision and recall in the assignment through a parameter q. This mapping can be obtained in time linear in the number of matching sequences, because LCA queries to the reference taxonomy take constant time. When applied to six different metagenomic datasets, our algorithm produces different taxonomic distributions depending on whether coverage or precision is maximized. Including information on the quality of the reads reduces the number of unassigned reads but increases the number of ambiguous reads, stressing the relevance of our method. Finally, two measures of performance are described and results with a set of artificially generated datasets are discussed. Conclusions: The assignment strategy of sequencing reads introduced in this paper is a versatile and quick method to study bacterial communities. The bacterial composition of the analyzed samples can vary significantly depending on how ambiguous reads are assigned, that is, on the value of the q parameter. Validation of our results in an artificial dataset confirms that a combination of values of q produces the most accurate results.
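
    The trade-off between LCA assignment and a penalty-minimizing assignment can be sketched on a toy taxonomy. The penalty used here, q times the number of matching leaves missed plus (1 - q) times the number of non-matching leaves covered, is an assumption chosen for illustration; the abstract does not give the paper's penalty score. The LCA routine is also a naive one rather than the constant-time query the abstract refers to, and all names are hypothetical.

```python
def path_to_root(node, parent):
    """List of nodes from `node` up to the root."""
    path = []
    while node is not None:
        path.append(node)
        node = parent[node]
    return path


def lca(nodes, parent):
    """Lowest common ancestor of a set of nodes (naive root-first comparison)."""
    paths = [path_to_root(n, parent)[::-1] for n in nodes]
    common = None
    for level in zip(*paths):
        if len(set(level)) != 1:
            break
        common = level[0]
    return common


def leaves_under(node, parent):
    """All leaves in the subtree rooted at `node`."""
    internal = {p for p in parent.values() if p is not None}
    leaves = [n for n in parent if n not in internal]
    return {leaf for leaf in leaves if node in path_to_root(leaf, parent)}


def assign(matches, parent, q):
    """Pick the node minimizing q * false negatives + (1 - q) * false positives."""
    matches = set(matches)

    def penalty(node):
        covered = leaves_under(node, parent)
        return q * len(matches - covered) + (1 - q) * len(covered - matches)

    return min(sorted(parent), key=penalty)  # sorted() makes tie-breaking deterministic


# Toy taxonomy given as child -> parent links.
parent = {"root": None, "A": "root", "B": "root",
          "a1": "A", "a2": "A", "a3": "A", "b1": "B"}
matches = {"a1", "a2"}  # the read matches two of the three species under A

print(lca(matches, parent), assign(matches, parent, q=0.8), assign(matches, parent, q=0.2))
```

    With a high q (recall favoured) the read goes to the ancestor A, which also covers the non-matching species a3; with a low q (precision favoured) it goes to a single matching leaf.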

  16. SeqHound: biological sequence and structure database as a platform for bioinformatics research

    Directory of Open Access Journals (Sweden)

    Dumontier Michel

    2002-10-01

    Background: SeqHound has been developed as an integrated biological sequence, taxonomy, annotation and 3-D structure database system. It provides a high-performance server platform for bioinformatics research in a locally-hosted environment. Results: SeqHound is based on the National Center for Biotechnology Information data model and programming tools. It offers daily updated contents of all Entrez sequence databases in addition to 3-D structural data and information about sequence redundancies, sequence neighbours, taxonomy, complete genomes, functional annotation including Gene Ontology terms and literature links to PubMed. SeqHound is accessible via a web server through a Perl, C or C++ remote API or an optimized local API. It provides functionality necessary to retrieve specialized subsets of sequences, structures and structural domains. Sequences may be retrieved in FASTA, GenBank, ASN.1 and XML formats. Structures are available in ASN.1, XML and PDB formats. Emphasis has been placed on complete genomes, taxonomy, domain and functional annotation as well as 3-D structural functionality in the API, while fielded text indexing functionality remains under development. SeqHound also offers a streamlined WWW interface for simple web-user queries. Conclusions: The system has proven useful in several published bioinformatics projects such as the BIND database and offers a cost-effective infrastructure for research. SeqHound will continue to develop and be provided as a service of the Blueprint Initiative at the Samuel Lunenfeld Research Institute. The source code and examples are available under the terms of the GNU public license at the Sourceforge site http://sourceforge.net/projects/slritools/ in the SLRI Toolkit.

  17. ASAP: a web-based platform for the analysis and interactive visualization of single-cell RNA-seq data.

    Science.gov (United States)

    Gardeux, Vincent; David, Fabrice P A; Shajkofci, Adrian; Schwalie, Petra C; Deplancke, Bart

    2017-10-01

    Single-cell RNA-sequencing (scRNA-seq) allows whole transcriptome profiling of thousands of individual cells, enabling the molecular exploration of tissues at the cellular level. Such analytical capacity is of great interest to many research groups in the world, yet these groups often lack the expertise to handle complex scRNA-seq datasets. We developed a fully integrated, web-based platform aimed at the complete analysis of scRNA-seq data post genome alignment: from the parsing, filtering and normalization of the input count data files, to the visual representation of the data, identification of cell clusters, differentially expressed genes (including cluster-specific marker genes), and functional gene set enrichment. This Automated Single-cell Analysis Pipeline (ASAP) combines a wide range of commonly used algorithms with sophisticated visualization tools. Compared with existing scRNA-seq analysis platforms, researchers (including those lacking computational expertise) are able to interact with the data in a straightforward fashion and in real time. Furthermore, given the overlap between scRNA-seq and bulk RNA-seq analysis workflows, ASAP should conceptually be broadly applicable to any RNA-seq dataset. As a validation, we demonstrate how we can use ASAP to simply reproduce the results from a single-cell study of 91 mouse cells involving five distinct cell types. The tool is freely available at asap.epfl.ch and R/Python scripts are available at github.com/DeplanckeLab/ASAP. Contact: bart.deplancke@epfl.ch. Supplementary data are available at Bioinformatics online. © The Author(s) 2017. Published by Oxford University Press.

  18. Selective amplification and sequencing of cyclic phosphate-containing RNAs by the cP-RNA-seq method.

    Science.gov (United States)

    Honda, Shozo; Morichika, Keisuke; Kirino, Yohei

    2016-03-01

    RNA digestions catalyzed by many ribonucleases generate RNA fragments that contain a 2',3'-cyclic phosphate (cP) at their 3' termini. However, standard RNA-seq methods are unable to accurately capture cP-containing RNAs because the cP inhibits the adapter ligation reaction. We recently developed a method named cP-RNA-seq that is able to selectively amplify and sequence cP-containing RNAs. Here we describe the cP-RNA-seq protocol in which the 3' termini of all RNAs, except those containing a cP, are cleaved through a periodate treatment after phosphatase treatment; hence, subsequent adapter ligation and cDNA amplification steps are exclusively applied to cP-containing RNAs. cP-RNA-seq takes ∼6 d, excluding the time required for sequencing and bioinformatics analyses, which are not covered in detail in this protocol. Biochemical validation of the existence of cP in the identified RNAs takes ∼3 d. Even though the cP-RNA-seq method was developed to identify angiogenin-generating 5'-tRNA halves as a proof of principle, the method should be applicable to global identification of cP-containing RNA repertoires in various transcriptomes.

  19. Sequence-specific bias correction for RNA-seq data using recurrent neural networks.

    Science.gov (United States)

    Zhang, Yao-Zhong; Yamaguchi, Rui; Imoto, Seiya; Miyano, Satoru

    2017-01-25

    The recent success of deep learning techniques in machine learning and artificial intelligence has stimulated a great deal of interest among bioinformaticians, who now wish to bring the power of deep learning to bear on a host of bioinformatical problems. Deep learning is ideally suited for biological problems that require automatic or hierarchical feature representation for biological data when prior knowledge is limited. In this work, we address the sequence-specific bias correction problem for RNA-seq data using Recurrent Neural Networks (RNNs) to model nucleotide sequences without pre-determining sequence structures. The sequence-specific bias of a read is then calculated based on the sequence probabilities estimated by RNNs, and used in the estimation of gene abundance. We explore the application of two popular RNN recurrent units for this task and demonstrate that RNN-based approaches provide a flexible way to model nucleotide sequences without knowledge of predetermined sequence structures. Our experiments show that training an RNN-based nucleotide sequence model is efficient and RNN-based bias correction methods compare well with the state-of-the-art sequence-specific bias correction method on the commonly used MAQC-III data set. RNNs provide an alternative and flexible way to calculate sequence-specific bias without explicitly pre-determining sequence structures.

  20. ChIP-PIT: Enhancing the Analysis of ChIP-Seq Data Using Convex-Relaxed Pair-Wise Interaction Tensor Decomposition.

    Science.gov (United States)

    Zhu, Lin; Guo, Wei-Li; Deng, Su-Ping; Huang, De-Shuang

    2016-01-01

    In recent years, thanks to the efforts of individual scientists and research consortiums, a huge amount of chromatin immunoprecipitation followed by high-throughput sequencing (ChIP-seq) experimental data has been accumulated. Instead of investigating them independently, several recent studies have convincingly demonstrated that a wealth of scientific insights can be gained by integrative analysis of these ChIP-seq data. However, when used for the purpose of integrative analysis, a serious drawback of the current ChIP-seq technique is that it is still expensive and time-consuming to generate ChIP-seq datasets of high standard. Most researchers are therefore unable to obtain complete ChIP-seq data for several TFs in a wide variety of cell lines, which considerably limits the understanding of transcriptional regulation patterns. In this paper, we propose a novel method called ChIP-PIT to overcome the aforementioned limitation. In ChIP-PIT, ChIP-seq data corresponding to a diverse collection of cell types, TFs and genes are fused together using the three-mode pair-wise interaction tensor (PIT) model, and the prediction of unperformed ChIP-seq experimental results is formulated as a tensor completion problem. Computationally, we propose an efficient first-order method based on extensions of the coordinate descent method to learn the optimal solution of ChIP-PIT, which makes it particularly suitable for the analysis of massive scale ChIP-seq data. Experimental evaluation on the ENCODE data illustrates the usefulness of the proposed model.

  1. MOCCS: Clarifying DNA-binding motif ambiguity using ChIP-Seq data.

    Science.gov (United States)

    Ozaki, Haruka; Iwasaki, Wataru

    2016-08-01

    As a key mechanism of gene regulation, transcription factors (TFs) bind to DNA by recognizing specific short sequence patterns that are called DNA-binding motifs. A single TF can accept ambiguity within its DNA-binding motifs, which comprise both canonical (typical) and non-canonical motifs. Clarification of such DNA-binding motif ambiguity is crucial for revealing gene regulatory networks and evaluating mutations in cis-regulatory elements. Although chromatin immunoprecipitation sequencing (ChIP-seq) now provides abundant data on the genomic sequences to which a given TF binds, existing motif discovery methods are unable to directly answer whether a given TF can bind to a specific DNA-binding motif. Here, we report a method for clarifying the DNA-binding motif ambiguity, MOCCS. Given ChIP-Seq data of any TF, MOCCS comprehensively analyzes and describes every k-mer to which that TF binds. Analysis of simulated datasets revealed that MOCCS is applicable to various ChIP-Seq datasets, requiring only a few minutes per dataset. Application to the ENCODE ChIP-Seq datasets proved that MOCCS directly evaluates whether a given TF binds to each DNA-binding motif, even if known position weight matrix models do not provide sufficient information on DNA-binding motif ambiguity. Furthermore, users are not required to provide numerous parameters or background genomic sequence models that are typically unavailable. MOCCS is implemented in Perl and R and is freely available via https://github.com/yuifu/moccs. By complementing existing motif-discovery software, MOCCS will contribute to the basic understanding of how the genome controls diverse cellular processes via DNA-protein interactions. Copyright © 2016 Elsevier Ltd. All rights reserved.
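    The abstract above says that every k-mer in the bound sequences is analyzed. A minimal sketch of the enumeration step alone, assuming the peak sequences are already available as plain strings, could look like the following. The function name `count_kmers` and the example sequences are hypothetical, and MOCCS's own scoring of each k-mer is not described in the abstract and is not reproduced here.

```python
from collections import Counter


def count_kmers(sequences, k):
    """Count every k-mer across a list of DNA sequences.

    Hypothetical helper for illustration only; windows containing
    characters other than A, C, G, T (for example N) are skipped.
    """
    if k < 1:
        raise ValueError("k must be a positive integer")
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        # Slide a window of width k over the sequence.
        for start in range(len(seq) - k + 1):
            kmer = seq[start:start + k]
            if set(kmer) <= set("ACGT"):
                counts[kmer] += 1
    return counts


# Made-up example sequences standing in for ChIP-Seq peak regions.
example_peaks = ["ACGTAC", "TTACGT"]
kmer_counts = count_kmers(example_peaks, 4)
for kmer, n in sorted(kmer_counts.items()):
    print(kmer, n)
```

    Counting is only the first step of any k-mer based analysis; ranking k-mers by how they are distributed around peak centers would need additional position information.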

  2. Genome-wide RNA-seq analysis of human and mouse platelet transcriptomes

    Science.gov (United States)

    Rowley, Jesse W.; Oler, Andrew J.; Tolley, Neal D.; Hunter, Benjamin N.; Low, Elizabeth N.; Nix, David A.; Yost, Christian C.; Zimmerman, Guy A.

    2011-01-01

    Inbred mice are a useful tool for studying the in vivo functions of platelets. Nonetheless, the mRNA signature of mouse platelets is not known. Here, we use paired-end next-generation RNA sequencing (RNA-seq) to characterize the polyadenylated transcriptomes of human and mouse platelets. We report that RNA-seq provides unprecedented resolution of mRNAs that are expressed across the entire human and mouse genomes. Transcript expression and abundance are often conserved between the 2 species. Several mRNAs, however, are differentially expressed in human and mouse platelets. Moreover, previously described functional disparities between mouse and human platelets are reflected in differences at the transcript level, including protease activated receptor-1, protease activated receptor-3, platelet activating factor receptor, and factor V. This suggests that RNA-seq is a useful tool for predicting differences in platelet function between mice and humans. Our next-generation sequencing analysis provides new insights into the human and murine platelet transcriptomes. The sequencing dataset will be useful in the design of mouse models of hemostasis and a catalyst for discovery of new functions of platelets. Access to the dataset is found in the “Introduction.” PMID:21596849

  3. Kurzweil Reading Machine: A Partial Evaluation of Its Optical Character Recognition Error Rate.

    Science.gov (United States)

    Goodrich, Gregory L.; And Others

    1979-01-01

    A study designed to assess the ability of the Kurzweil reading machine (a speech reading device for the visually handicapped) to read three different type styles produced by five different means indicated that the machines tested had different error rates depending upon the means of producing the copy and upon the type style used. (Author/CL)

  4. Performance Evaluation of a Novel Optimization Sequential Algorithm (SeQ) Code for FTTH Network

    Directory of Open Access Journals (Sweden)

    Fazlina C.A.S.

    2017-01-01

    The SeQ code has advantages such as a variable cross-correlation property at any given number of users and weights, effective suppression of the impact of phase-induced intensity noise (PIIN), and a multiple access interference (MAI) cancellation property. The results revealed that, at a system performance of BER = 10^-9, the SeQ code is capable of achieving 1 Gbps up to 60 km.

  5. SeqAPASS: Predicting chemical susceptibility to threatened/endangered species

    Science.gov (United States)

    Conservation of a molecular target across species can be used as a line-of-evidence to predict the likelihood of chemical susceptibility. The web-based Sequence Alignment to Predict Across Species Susceptibility (SeqAPASS; https://seqapass.epa.gov/seqapass/) application was devel...

  6. 4C-ker: A Method to Reproducibly Identify Genome-Wide Interactions Captured by 4C-Seq Experiments.

    Science.gov (United States)

    Raviram, Ramya; Rocha, Pedro P; Müller, Christian L; Miraldi, Emily R; Badri, Sana; Fu, Yi; Swanzey, Emily; Proudhon, Charlotte; Snetkova, Valentina; Bonneau, Richard; Skok, Jane A

    2016-03-01

    4C-Seq has proven to be a powerful technique to identify genome-wide interactions with a single locus of interest (or "bait") that can be important for gene regulation. However, analysis of 4C-Seq data is complicated by the many biases inherent to the technique. An important consideration when dealing with 4C-Seq data is the differences in resolution of signal across the genome that result from differences in 3D distance separation from the bait. This leads to the highest signal in the region immediately surrounding the bait and increasingly lower signals in far-cis and trans. Another important aspect of 4C-Seq experiments is the resolution, which is greatly influenced by the choice of restriction enzyme and the frequency at which it can cut the genome. Thus, it is important that a 4C-Seq analysis method is flexible enough to analyze data generated using different enzymes and to identify interactions across the entire genome. Current methods for 4C-Seq analysis only identify interactions in regions near the bait or in regions located in far-cis and trans, but no method comprehensively analyzes 4C signals of different length scales. In addition, some methods also fail in experiments where chromatin fragments are generated using frequent cutter restriction enzymes. Here, we describe 4C-ker, a Hidden-Markov Model based pipeline that identifies regions throughout the genome that interact with the 4C bait locus. In addition, we incorporate methods for the identification of differential interactions in multiple 4C-seq datasets collected from different genotypes or experimental conditions. Adaptive window sizes are used to correct for differences in signal coverage in near-bait regions, far-cis and trans chromosomes. Using several datasets, we demonstrate that 4C-ker outperforms all existing 4C-Seq pipelines in its ability to reproducibly identify interaction domains at all genomic ranges with different resolution enzymes.

  7. 4C-ker: A Method to Reproducibly Identify Genome-Wide Interactions Captured by 4C-Seq Experiments.

    Directory of Open Access Journals (Sweden)

    Ramya Raviram

    2016-03-01

    4C-Seq has proven to be a powerful technique to identify genome-wide interactions with a single locus of interest (or "bait") that can be important for gene regulation. However, analysis of 4C-Seq data is complicated by the many biases inherent to the technique. An important consideration when dealing with 4C-Seq data is the differences in resolution of signal across the genome that result from differences in 3D distance separation from the bait. This leads to the highest signal in the region immediately surrounding the bait and increasingly lower signals in far-cis and trans. Another important aspect of 4C-Seq experiments is the resolution, which is greatly influenced by the choice of restriction enzyme and the frequency at which it can cut the genome. Thus, it is important that a 4C-Seq analysis method is flexible enough to analyze data generated using different enzymes and to identify interactions across the entire genome. Current methods for 4C-Seq analysis only identify interactions in regions near the bait or in regions located in far-cis and trans, but no method comprehensively analyzes 4C signals of different length scales. In addition, some methods also fail in experiments where chromatin fragments are generated using frequent cutter restriction enzymes. Here, we describe 4C-ker, a Hidden-Markov Model based pipeline that identifies regions throughout the genome that interact with the 4C bait locus. In addition, we incorporate methods for the identification of differential interactions in multiple 4C-seq datasets collected from different genotypes or experimental conditions. Adaptive window sizes are used to correct for differences in signal coverage in near-bait regions, far-cis and trans chromosomes. Using several datasets, we demonstrate that 4C-ker outperforms all existing 4C-Seq pipelines in its ability to reproducibly identify interaction domains at all genomic ranges with different resolution enzymes.

  8. A survey of motif finding Web tools for detecting binding site motifs in ChIP-Seq data.

    Science.gov (United States)

    Tran, Ngoc Tam L; Huang, Chun-Hsi

    2014-02-20

    ChIP-Seq (chromatin immunoprecipitation sequencing) has made motif finding easier because ChIP-Seq experiments narrow the search down to binding site locations. Recent motif finding tools facilitate motif detection by providing user-friendly Web interfaces. In this work, we reviewed nine motif finding Web tools that are capable of detecting binding site motifs in ChIP-Seq data. We showed that each motif finding Web tool has its own advantages for detecting motifs that other tools may not discover. We recommended that users apply multiple motif finding Web tools that implement different algorithms to obtain significant motifs, overlapping similar motifs, and non-overlapping motifs. Finally, we provided our suggestions for the future development of motif finding Web tools that better assist researchers in finding motifs in ChIP-Seq data.

  9. VoSeq: a voucher and DNA sequence web application.

    Directory of Open Access Journals (Sweden)

    Carlos Peña

    There is an ever growing number of molecular phylogenetic studies published, due in part to the advent of new techniques that allow cheap and quick DNA sequencing. Hence, the demand for relational databases with which to manage and annotate the amassing DNA sequences, genes, voucher specimens and associated biological data is increasing. In addition, a user-friendly interface is necessary for easy integration and management of the data stored in the database back-end. Available databases allow management of a wide variety of biological data. However, most database systems are not specifically constructed with the aim of being an organizational tool for researchers working in phylogenetic inference. We here report new software facilitating easy management of voucher and sequence data, consisting of a relational database as back-end for a graphic user interface accessed via a web browser. The application, VoSeq, includes tools for creating molecular datasets of DNA or amino acid sequences ready to be used in commonly used phylogenetic software such as RAxML, TNT, MrBayes and PAUP, as well as for creating tables ready for publishing. It also has inbuilt BLAST capabilities against all DNA sequences stored in VoSeq as well as sequences in NCBI GenBank. By using mash-ups and calls to web services, VoSeq allows easy integration with public services such as Yahoo! Maps, Flickr, Encyclopedia of Life (EOL) and GBIF (by generating data-dumps that can be processed with GBIF's Integrated Publishing Toolkit).

  10. 12038_2016_9630_Supplementary 1..4

    Indian Academy of Sciences (India)

    Supplementary file 2. a. HiSeq dataset with SRA accession number SRR892664 and read length 150 bases (H. sapiens), b. MiSeq dataset with SRA accession number ERP000362 and read length 250 bases (H. sapiens), c. GA IIx dataset with SRA accession number SRR660877 and read length 91 bases (H. sapiens), ...

  11. A comprehensive simulation study on classification of RNA-Seq data.

    Directory of Open Access Journals (Sweden)

    Gökmen Zararsız

    RNA sequencing (RNA-Seq) is a powerful technique for the gene-expression profiling of organisms that uses the capabilities of next-generation sequencing technologies. Developing gene-expression-based classification algorithms is an emerging powerful method for diagnosis, disease classification and monitoring at molecular level, as well as providing potential markers of diseases. Most of the statistical methods proposed for the classification of gene-expression data are either based on a continuous scale (e.g., microarray data) or require a normal distribution assumption. Hence, these methods cannot be directly applied to RNA-Seq data since they violate both data structure and distributional assumptions. However, it is possible to apply these algorithms with appropriate modifications to RNA-Seq data. One way is to develop count-based classifiers, such as Poisson linear discriminant analysis (PLDA) and negative binomial linear discriminant analysis (NBLDA). Another way is to bring the data closer to microarrays and apply microarray-based classifiers. In this study, we compared several classifiers including PLDA with and without power transformation, NBLDA, single SVM, bagging SVM (bagSVM), classification and regression trees (CART), and random forests (RF). We also examined the effect of several parameters such as overdispersion, sample size, number of genes, number of classes, differential-expression rate, and the transformation method on model performances. A comprehensive simulation study is conducted and the results are compared with the results of two miRNA and two mRNA experimental datasets. The results revealed that increasing the sample size and differential-expression rate, and decreasing the dispersion parameter and number of groups, lead to an increase in classification accuracy. As in differential-expression studies, the classification of RNA-Seq data requires careful attention when handling data overdispersion. We conclude that, as a count

  12. Spatio-temporal model for multiple ChIP-seq experiments

    NARCIS (Netherlands)

    Ranciati, Saverio; Viroli, Cinzia; Wit, Ernst

    2015-01-01

    The increasing availability of ChIP-seq data demands for advanced statistical tools to analyze the results of such experiments. The inherent features of high-throughput sequencing output call for a modelling framework that can account for the spatial dependency between neighboring regions of the

  13. Tracing the Jet Contribution to the Mid-IR over the 2005 Outburst of GRO J1655-40 via Broadband Spectral Modeling

    Science.gov (United States)

    Migliari, S.; Tomsick, J. A.; Markoff, S.; Kalemci, E.; Bailyn, C. D.; Buxton, M.; Corbel, S; Fender, R. P.; Kaaret, P.

    2007-01-01

    We present new results from a multi-wavelength (radio/infrared/optical/X-ray) study of the black hole X-ray binary GRO J1655-40 during its 2005 outburst. We detected, for the first time, mid-infrared emission at 24 micron from the compact jet of a black hole X-ray binary during its hard state, when the source shows emission from a radio compact jet, as well as a strong non-thermal hard X-ray component. These detections strongly constrain the optically thick part of the synchrotron spectrum of the compact jet, which is consistent with it being flat over 4 orders of magnitude in frequency. Moreover, using this unprecedented coverage, and especially thanks to the new Spitzer observations, we can test broadband disk and jet models during the hard state. Two of the hard-state broadband spectra are reasonably well fitted using a jet model with parameters that overall are similar to those previously found for Cyg X-1 and GX 339-4. Differences are also present; most notably, the jet power in GRO J1655-40 appears to be a factor of at least approximately 3-5 higher (depending on the distance) than those of Cyg X-1 and GX 339-4 at comparable disk luminosities. Furthermore, a few discrepancies between the model and the data, previously not found for the other two black hole systems for which there was no mid-IR/IR and optical coverage, are evident, and will help to constrain and refine theoretical models.

  14. RNA-Seq Atlas of Glycine max: A guide to the soybean transcriptome

    Directory of Open Access Journals (Sweden)

    Severin Andrew J

    2010-08-01

    Background: Next generation sequencing is transforming our understanding of transcriptomes. It can determine the expression level of transcripts with a dynamic range of over six orders of magnitude from multiple tissues, developmental stages or conditions. Patterns of gene expression provide insight into functions of genes with unknown annotation. Results: The RNA Seq-Atlas presented here provides a record of high-resolution gene expression in a set of fourteen diverse tissues. Hierarchical clustering of transcriptional profiles for these tissues suggests three clades with similar profiles: aerial, underground and seed tissues. We also investigate the relationship between gene structure and gene expression and find a correlation between gene length and expression. Additionally, we find dramatic tissue-specific gene expression of both the most highly-expressed genes and the genes specific to legumes in seed development and nodule tissues. Analysis of the gene expression profiles of over 2,000 genes with preferential gene expression in seed suggests there are more than 177 genes with functional roles that are involved in the economically important seed filling process. Finally, the Seq-atlas also provides a means of evaluating existing gene model annotations for the Glycine max genome. Conclusions: This RNA-Seq atlas extends the analyses of previous gene expression atlases performed using Affymetrix GeneChip technology and provides an example of new methods to accommodate the increase in transcriptome data obtained from next generation sequencing. Data contained within this RNA-Seq atlas of Glycine max can be explored at http://www.soybase.org/soyseq.
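    The abstract reports a correlation between gene length and expression without saying which statistic was used. As a hedged illustration only, assuming a plain Pearson correlation on per-gene values, the calculation could be sketched as below. The function name `pearson` and all numbers are made up for the example and are not taken from the atlas.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical gene lengths (bp) and normalized expression values.
gene_lengths = [800, 1500, 2300, 3100, 4200]
expression = [5.0, 9.5, 12.0, 20.5, 24.0]
print(round(pearson(gene_lengths, expression), 3))
```

    A rank-based statistic such as Spearman's could be a more robust choice for expression data, which is often heavily skewed.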

  15. Differential Gene Expression in Ovaries of Qira Black Sheep and Hetian Sheep Using RNA-Seq Technique

    Science.gov (United States)

    Jia, Bin; Zhang, Yong Sheng; Wang, Xu Hai; Zeng, Xian Cun

    2015-01-01

    The Qira black sheep and the Hetian sheep are two local breeds in the Northwest of China, which are characterized as high-fecundity and low-fecundity breeds, respectively. The elucidation of mRNA expression profiles in the ovaries among different sheep breeds representing fecundity extremes will be helpful for the identification and utilization of major prolificacy genes in sheep. In the present study, we used RNA-seq technology to compare the ovarian mRNA expression profiles of Qira black sheep and Hetian sheep. From the Qira black sheep and the Hetian sheep libraries, we obtained a total of 11,747,582 and 11,879,968 sequencing reads, respectively. After aligning to the reference sequences, the two libraries included 16,763 and 16,814 genes, respectively. A total of 1,252 genes were significantly differentially expressed in Hetian sheep compared with Qira black sheep. Eight differentially expressed genes were randomly selected for validation by real-time RT-PCR. This study provides basic data for future research on sheep reproduction. PMID:25790350
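    Because the two libraries differ slightly in depth, raw read counts are usually scaled before they are compared between breeds. The sketch below uses the two library sizes quoted in the abstract and assumes a simple counts-per-million (CPM) scaling with a pseudocount. The abstract does not state which normalization or statistical test the authors used, and the per-gene counts in the example are hypothetical.

```python
import math

# Library sizes (total sequencing reads) quoted in the abstract.
QIRA_READS = 11_747_582
HETIAN_READS = 11_879_968


def cpm(count, library_size):
    """Scale a raw read count to counts per million reads."""
    return count * 1_000_000 / library_size


def log2_fold_change(count_a, lib_a, count_b, lib_b, pseudocount=1.0):
    """log2 ratio of CPM in sample B over sample A.

    The pseudocount (an assumption of this sketch) avoids division
    by zero for genes with no reads in one sample.
    """
    return math.log2(
        (cpm(count_b, lib_b) + pseudocount) / (cpm(count_a, lib_a) + pseudocount)
    )


# Hypothetical raw counts for one gene in each library.
qira_count, hetian_count = 200, 800
lfc = log2_fold_change(qira_count, QIRA_READS, hetian_count, HETIAN_READS)
print(f"log2 fold change (Hetian vs Qira): {lfc:.2f}")
```

    A fold change alone does not establish significance; calling a gene differentially expressed also requires a statistical model of count variability.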

  16. Differential gene expression in ovaries of Qira black sheep and Hetian sheep using RNA-Seq technique.

    Directory of Open Access Journals (Sweden)

    Han Ying Chen

    The Qira black sheep and the Hetian sheep are two local breeds in the Northwest of China, which are characterized as high-fecundity and low-fecundity breeds, respectively. The elucidation of mRNA expression profiles in the ovaries among different sheep breeds representing fecundity extremes will be helpful for the identification and utilization of major prolificacy genes in sheep. In the present study, we used RNA-seq technology to compare the ovarian mRNA expression profiles of Qira black sheep and Hetian sheep. From the Qira black sheep and the Hetian sheep libraries, we obtained a total of 11,747,582 and 11,879,968 sequencing reads, respectively. After aligning to the reference sequences, the two libraries included 16,763 and 16,814 genes, respectively. A total of 1,252 genes were significantly differentially expressed in Hetian sheep compared with Qira black sheep. Eight differentially expressed genes were randomly selected for validation by real-time RT-PCR. This study provides basic data for future research on sheep reproduction.

  17. A SUPER-EDDINGTON, COMPTON-THICK WIND IN GRO J1655–40?

    Energy Technology Data Exchange (ETDEWEB)

    Neilsen, J.; Homan, J. [MIT Kavli Institute for Astrophysics and Space Research, Cambridge, MA 02139 (United States); Rahoui, F. [European Southern Observatory, Karl Schwarzschild-Strasse 2, D-85748 Garching bei Munchen (Germany); Buxton, M., E-mail: jneilsen@space.mit.edu [Department of Astronomy, Yale University, P.O. Box 208101, New Haven, CT 06520-8101 (United States)

    2016-05-01

    During its 2005 outburst, GRO J1655–40 was observed at high spectral resolution with the Chandra High-Energy Transmission Grating Spectrometer, revealing a spectrum rich with blueshifted absorption lines indicative of an accretion disk wind—apparently too hot, too dense, and too close to the black hole to be driven by radiation pressure or thermal pressure (Miller et al.). However, this exotic wind represents just one piece of the puzzle in this outburst, as its presence coincides with an extremely soft and curved X-ray continuum spectrum, remarkable X-ray variability (Uttley and Klein-Wolt), and a bright, unexpected optical/infrared blackbody component that varies on the orbital period. Focusing on the X-ray continuum and the optical/infrared/UV spectral energy distribution, we argue that the unusual features of this “hypersoft state” are natural consequences of a super-Eddington Compton-thick wind from the disk: the optical/infrared blackbody represents the cool photosphere of a dense, extended outflow, while the X-ray emission is explained as Compton scattering by the relatively cool, optically thick wind. This wind obscures the intrinsic luminosity of the inner disk, which we suggest may have been at or above the Eddington limit.

  18. Identification of innate lymphoid cells in single-cell RNA-Seq data.

    Science.gov (United States)

    Suffiotti, Madeleine; Carmona, Santiago J; Jandus, Camilla; Gfeller, David

    2017-07-01

    Innate lymphoid cells (ILCs) consist of natural killer (NK) cells and non-cytotoxic ILCs that are broadly classified into ILC1, ILC2, and ILC3 subtypes. These cells recently emerged as important early effectors of innate immunity for their roles in tissue homeostasis and inflammation. Over the last few years, ILCs have been extensively studied in mouse and human at the functional and molecular level, including gene expression profiling. However, sorting ILCs with flow cytometry for gene expression analysis is a delicate and time-consuming process. Here we propose and validate a novel framework for studying ILCs at the transcriptomic level using single-cell RNA-Seq data. Our approach combines unsupervised clustering and a new cell type classifier trained on mouse ILC gene expression data. We show that this approach can accurately identify different ILCs, especially ILC2 cells, in human lymphocyte single-cell RNA-Seq data. Our new model relies only on genes conserved across vertebrates, thereby making it in principle applicable in any vertebrate species. Considering the rapid increase in throughput of single-cell RNA-Seq technology, our work provides a computational framework for studying ILC2 cells in single-cell transcriptomic data and may help exploring their conservation in distant vertebrate species.

  19. Use of a Chimeric Hsp70 to Enhance the Quality of Recombinant Plasmodium falciparum S-Adenosylmethionine Decarboxylase Protein Produced in Escherichia coli

    Science.gov (United States)

    Makhoba, Xolani Henry; Burger, Adélle; Coertzen, Dina; Zininga, Tawanda; Birkholtz, Lyn-Marie; Shonhai, Addmore

    2016-01-01

    S-adenosylmethionine decarboxylase (PfAdoMetDC) from Plasmodium falciparum is a prospective antimalarial drug target. The production of recombinant PfAdoMetDC for biochemical validation as a drug target is important. The production of PfAdoMetDC in Escherichia coli has been reported to result in unsatisfactory yields and poor quality product. The co-expression of recombinant proteins with molecular chaperones has been proposed as one way to improve the production of the former in E. coli. E. coli heat shock proteins DnaK, GroEL-GroES and DnaJ have previously been used to enhance production of some recombinant proteins. However, the outcomes were inconsistent. An Hsp70 chimeric protein, KPf, which is made up of the ATPase domain of E. coli DnaK and the substrate binding domain of P. falciparum Hsp70 (PfHsp70) has been previously shown to exhibit chaperone function when it was expressed in E. coli cells whose resident Hsp70 (DnaK) function was impaired. We proposed that because of its domain constitution, KPf would most likely be recognised by E. coli Hsp70 co-chaperones. Furthermore, because it possesses a substrate binding domain of plasmodial origin, KPf would be primed to recognise recombinant PfAdoMetDC expressed in E. coli. First, using site-directed mutagenesis, followed by complementation assays, we established that KPf with a mutation in the hydrophobic residue located in its substrate binding cavity was functionally compromised. We further co-expressed PfAdoMetDC with KPf, PfHsp70 and DnaK in E. coli cells either in the absence or presence of over-expressed GroEL-GroES chaperonin. The folded and functional status of the produced PfAdoMetDC was assessed using limited proteolysis and enzyme assays. PfAdoMetDC co-expressed with KPf and PfHsp70 exhibited improved activity compared to protein co-expressed with over-expressed DnaK. Our findings suggest that chimeric KPf may be an ideal Hsp70 co-expression partner for the production of recombinant plasmodial

  20. Transcriptomic analysis of ‘Suli’ pear (Pyrus pyrifolia white pear group) buds during the dormancy by RNA-Seq

    Directory of Open Access Journals (Sweden)

    Liu Guoqin

    2012-12-01

    Background: Bud dormancy is a critical developmental process that allows perennial plants to survive unfavorable environmental conditions. Pear is one of the most important deciduous fruit trees in the world, but the mechanisms regulating bud dormancy in this species are unknown. Because genomic information for pear is currently unavailable, transcriptome and digital gene expression data for this species would be valuable resources to better understand the molecular and biological mechanisms regulating its bud dormancy. Results: We performed de novo transcriptome assembly and digital gene expression (DGE) profiling analyses of ‘Suli’ pear (Pyrus pyrifolia white pear group) using the Illumina RNA-seq system. RNA-Seq generated approximately 100 M high-quality reads that were assembled into 69,393 unigenes (mean length = 853 bp), including 14,531 clusters and 34,194 singletons. A total of 51,448 (74.1%) unigenes were annotated using public protein databases with a cut-off E-value above 10^-5. We mainly compared gene expression levels at four time-points during bud dormancy. Between Nov. 15 and Dec. 15, Dec. 15 and Jan. 15, and Jan. 15 and Feb. 15, 1,978, 1,024, and 3,468 genes were differentially expressed, respectively. Hierarchical clustering analysis arranged 190 significantly differentially-expressed genes into seven groups. Seven genes were randomly selected to confirm their expression levels using quantitative real-time PCR. Conclusions: The new transcriptomes offer comprehensive sequence and DGE profiling data for a dynamic view of transcriptomic variation during bud dormancy in pear. These data provide a basis for future studies of metabolism during bud dormancy in non-model but economically-important perennial species.

  1. RNA-seq analysis of overexpressing ovine AANAT gene of melatonin biosynthesis in switchgrass

    Directory of Open Access Journals (Sweden)

    Shan Yuan

    2016-08-01

    Melatonin serves important functions in the promotion of growth and anti-stress regulation by efficient radical scavenging and regulation of antioxidant enzyme activity in various plants. To investigate its regulatory roles and metabolism pathways, the transcriptomic profile of switchgrass overexpressing the ovine arylalkylamine N-acetyltransferase (oAANAT) gene, encoding the penultimate enzyme in melatonin biosynthesis, was compared with an empty vector (EV) control using RNA-seq. Switchgrass is a model plant for cellulosic ethanol conversion. Illumina sequencing of transgenic oAANAT switchgrass generated 85.22 million high quality reads, which were assembled into 135,684 unigenes with an average sequence length of 716 bp. A total of 946 differentially expressed genes (DEGs) in the transgenic line compared with control switchgrass, including 737 up-regulated and 209 down-regulated genes, were mainly enriched in two main functional patterns of melatonin identified by gene ontology analysis: growth regulation and stress tolerance. Furthermore, KEGG maps indicated that the biosynthetic pathways of secondary metabolites (phenylpropanoids, flavonoids, steroids, stilbenoid, diarylheptanoid and gingerol) and signaling pathways (MAPK signaling pathway, estrogen signaling pathway) were involved in melatonin metabolism. This study substantially expands the transcriptome information for switchgrass and provides valuable clues for identifying candidate genes involved in melatonin biosynthesis and elucidating the mechanism of melatonin metabolism.

  2. Computational Methods for ChIP-seq Data Analysis and Applications

    KAUST Repository

    Ashoor, Haitham

    2017-01-01

    four main challenges. First, I address the problem of detecting histone modifications from ChIP-seq cancer samples. The presence of copy number variations (CNVs) in cancer samples results in statistical biases that lead to inaccurate predictions when

  3. Computer and Statistical Analysis of Transcription Factor Binding and Chromatin Modifications by ChIP-seq data in Embryonic Stem Cell

    Directory of Open Access Journals (Sweden)

    Orlov Yuriy

    2012-06-01

    Full Text Available Advances in high-throughput sequencing technology have enabled the identification of transcription factor (TF) binding sites at genome scale. TF binding studies are important for medical applications and stem cell research. Somatic cells can be reprogrammed to a pluripotent state by the combined introduction of factors such as Oct4, Sox2, c-Myc, and Klf4. These reprogrammed cells share many characteristics with embryonic stem cells (ESCs) and are known as induced pluripotent stem cells (iPSCs). The signaling requirements for maintenance of human and murine ESCs differ considerably. Genome-wide ChIP-seq TF binding maps in mouse stem cells include Oct4, Sox2, Nanog, Tbx3, and Smad2, as well as a group of other factors. ChIP-seq allows the study of new candidate transcription factors for reprogramming. It was shown that Nr5a2 could replace Oct4 for reprogramming. Epigenetic modifications play an important role in the regulation of gene expression, adding complexity to the functioning of the transcription network. We have studied associations between different histone modifications using published data together with RNA Pol II sites. We found strong associations between activation marks and TF binding sites and present them qualitatively. To address the statistical analysis of genome-wide ChIP-seq maps, we developed a computer program to filter out noise signals and find significant associations between binding site affinity and the number of sequence reads. The data provide new insights into the function of chromatin organization and regulation in stem cells.

  4. Massively parallel sequencing, aCGH, and RNA-Seq technologies provide a comprehensive molecular diagnosis of Fanconi anemia.

    Science.gov (United States)

    Chandrasekharappa, Settara C; Lach, Francis P; Kimble, Danielle C; Kamat, Aparna; Teer, Jamie K; Donovan, Frank X; Flynn, Elizabeth; Sen, Shurjo K; Thongthip, Supawat; Sanborn, Erica; Smogorzewska, Agata; Auerbach, Arleen D; Ostrander, Elaine A

    2013-05-30

    Current methods for detecting mutations in Fanconi anemia (FA)-suspected patients are inefficient and often miss mutations. We have applied recent advances in DNA sequencing and genomic capture to the diagnosis of FA. Specifically, we used custom molecular inversion probes or TruSeq-enrichment oligos to capture and sequence FA and related genes, including introns, from 27 samples from the International Fanconi Anemia Registry at The Rockefeller University. DNA sequencing was complemented with custom array comparative genomic hybridization (aCGH) and RNA sequencing (RNA-seq) analysis. aCGH identified deletions/duplications in 4 different FA genes. RNA-seq analysis revealed lack of allele specific expression associated with a deletion and splicing defects caused by missense, synonymous, and deep-in-intron variants. The combination of TruSeq-targeted capture, aCGH, and RNA-seq enabled us to identify the complementation group and biallelic germline mutations in all 27 families: FANCA (7), FANCB (3), FANCC (3), FANCD1 (1), FANCD2 (3), FANCF (2), FANCG (2), FANCI (1), FANCJ (2), and FANCL (3). FANCC mutations are often the cause of FA in patients of Ashkenazi Jewish (AJ) ancestry, and we identified 2 novel FANCC mutations in 2 patients of AJ ancestry. We describe here a strategy for efficient molecular diagnosis of FA.

  5. Inference of chromosomal inversion dynamics from Pool-Seq data in natural and laboratory populations of Drosophila melanogaster.

    Science.gov (United States)

    Kapun, Martin; van Schalkwyk, Hester; McAllister, Bryant; Flatt, Thomas; Schlötterer, Christian

    2014-04-01

    Sequencing of pools of individuals (Pool-Seq) represents a reliable and cost-effective approach for estimating genome-wide SNP and transposable element insertion frequencies. However, Pool-Seq does not provide direct information on haplotypes so that, for example, obtaining inversion frequencies has not been possible until now. Here, we have developed a new set of diagnostic marker SNPs for seven cosmopolitan inversions in Drosophila melanogaster that can be used to infer inversion frequencies from Pool-Seq data. We applied our novel marker set to Pool-Seq data from an experimental evolution study and from North American and Australian latitudinal clines. In the experimental evolution data, we find evidence that positive selection has driven the frequencies of In(3R)C and In(3R)Mo to increase over time. In the clinal data, we confirm the existence of frequency clines for In(2L)t, In(3L)P and In(3R)Payne in both North America and Australia and detect a previously unknown latitudinal cline for In(3R)Mo in North America. The inversion markers developed here provide a versatile and robust tool for characterizing inversion frequencies and their dynamics in Pool-Seq data from diverse D. melanogaster populations. © 2013 The Authors. Molecular Ecology Published by John Wiley & Sons Ltd.
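
    The abstract above does not spell out how marker SNP frequencies are turned into an inversion frequency. One plausible reading, sketched here in Python under that assumption, is to average the frequency of the inversion-specific allele across all diagnostic markers for an inversion. The function name and the read counts are hypothetical and are not taken from the paper.

```python
from statistics import mean


def inversion_frequency(marker_counts):
    """Estimate an inversion's frequency in a pool of individuals.

    marker_counts: list of (inverted_allele_reads, total_reads) tuples,
    one per diagnostic marker SNP. Markers without coverage are skipped.
    Assumption: the estimate is the plain mean of per-marker frequencies.
    """
    freqs = [inv / total for inv, total in marker_counts if total > 0]
    if not freqs:
        raise ValueError("no marker SNP has coverage")
    return mean(freqs)


# Hypothetical read counts at four marker SNPs (one has no coverage).
example = [(12, 100), (9, 90), (0, 0), (15, 120)]
estimate = inversion_frequency(example)
print(f"estimated inversion frequency: {estimate:.3f}")
```

    Averaging over several markers dampens the sampling noise of any single SNP in pooled data; a median would be a reasonable alternative if some markers are unreliable.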

  6. Empirical methods for controlling false positives and estimating confidence in ChIP-Seq peaks

    Directory of Open Access Journals (Sweden)

    Courdy Samir J

    2008-12-01

    Full Text Available Background: High-throughput signature sequencing holds many promises, one of which is the ready identification of in vivo transcription factor binding sites, histone modifications, changes in chromatin structure and patterns of DNA methylation across entire genomes. In these experiments, chromatin immunoprecipitation is used to enrich for particular DNA sequences of interest and signature sequencing is used to map the regions to the genome (ChIP-Seq). Elucidation of these sites of DNA-protein binding/modification is proving instrumental in reconstructing networks of gene regulation and chromatin remodelling that direct development, response to cellular perturbation, and neoplastic transformation. Results: Here we present a package of algorithms and software that makes use of control input data to reduce false positives and estimate confidence in ChIP-Seq peaks. Several different methods were compared using two simulated spike-in datasets. Use of control input data and a normalized difference score were found to more than double the recovery of ChIP-Seq peaks at a 5% false discovery rate (FDR). Moreover, both a binomial p-value/q-value and an empirical FDR were found to predict the true FDR within 2–3 fold and are more reliable estimators of confidence than a global Poisson p-value. These methods were then used to reanalyze Johnson et al.'s neuron-restrictive silencer factor (NRSF) ChIP-Seq data without relying on extensive qPCR validated NRSF sites and the presence of NRSF binding motifs for setting thresholds. Conclusion: The methods developed and tested here show considerable promise for reducing false positives and estimating confidence in ChIP-Seq data without any prior knowledge of the ChIP target. They are part of a larger open source package freely available from http://useq.sourceforge.net/.
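
    To make the statistics named in this abstract concrete, the following Python sketch scores windows from ChIP and control read counts. It is an illustration only, not the USeq implementation: it assumes the two libraries have been scaled to equal depth (so the null probability is 0.5), uses one simple form of a normalized difference score that the abstract does not define, and converts p-values to q-values with the standard Benjamini-Hochberg procedure. All function names and counts are hypothetical.

```python
from math import comb, sqrt


def binomial_pvalue(chip, control, p=0.5):
    """One-sided P(X >= chip) for X ~ Binomial(chip + control, p)."""
    n = chip + control
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(chip, n + 1))


def normalized_difference(chip, control):
    """Assumed score: (chip - control) / sqrt(chip + control); 0.0 if empty."""
    n = chip + control
    return (chip - control) / sqrt(n) if n else 0.0


def benjamini_hochberg(pvalues):
    """Convert p-values to q-values, preserving the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        qvalues[i] = running_min
    return qvalues


# Hypothetical (ChIP reads, control reads) per window.
windows = [(30, 5), (12, 10), (8, 9), (50, 4)]
pvals = [binomial_pvalue(t, c) for t, c in windows]
qvals = benjamini_hochberg(pvals)
for (t, c), q in zip(windows, qvals):
    print(t, c, round(normalized_difference(t, c), 2), f"q={q:.3g}")
```

    Windows whose q-value falls below a chosen threshold (for example 0.05) would be reported as peaks; because the test is made against the matched control, regions that are high in both libraries are not called.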

  7. Long-read sequencing data analysis for yeasts.

    Science.gov (United States)

    Yue, Jia-Xing; Liti, Gianni

    2018-06-01

    Long-read sequencing technologies have become increasingly popular due to their strengths in resolving complex genomic regions. As a leading model organism with small genome size and great biotechnological importance, the budding yeast Saccharomyces cerevisiae has many isolates currently being sequenced with long reads. However, analyzing long-read sequencing data to produce high-quality genome assembly and annotation remains challenging. Here, we present a modular computational framework named long-read sequencing data analysis for yeasts (LRSDAY), the first one-stop solution that streamlines this process. Starting from the raw sequencing reads, LRSDAY can produce chromosome-level genome assembly and comprehensive genome annotation in a highly automated manner with minimal manual intervention, which is not possible using any alternative tool available to date. The annotated genomic features include centromeres, protein-coding genes, tRNAs, transposable elements (TEs), and telomere-associated elements. Although tailored for S. cerevisiae, we designed LRSDAY to be highly modular and customizable, making it adaptable to virtually any eukaryotic organism. When applying LRSDAY to an S. cerevisiae strain, it takes ∼41 h to generate a complete and well-annotated genome from ∼100× Pacific Biosciences (PacBio) running the basic workflow with four threads. Basic experience working within the Linux command-line environment is recommended for carrying out the analysis using LRSDAY.

  8. Role of SeqA and Dam in Escherichia coli gene expression: A global/microarray analysis

    DEFF Research Database (Denmark)

    Løbner-Olesen, Anders; Marinus, M.G.; Hansen, Flemming G.

    2003-01-01

    High-density oligonucleotide arrays were used to monitor global transcription patterns in Escherichia coli with various levels of Dam and SeqA proteins. Cells lacking Dam methyltransferase showed a modest increase in transcription of the genes belonging to the SOS regulon. Bacteria devoid...... of the SeqA protein, which preferentially binds hemimethylated DNA, were found to have a transcriptional profile almost identical to WT bacteria overexpressing Dam methyltransferase. The latter two strains differed from WT in two ways. First, the origin proximal genes were transcribed with increased...... frequency due to increased gene dosage. Second, chromosomal domains of high transcriptional activity alternate with regions of low activity, and our results indicate that the activity in each domain is modulated in the same way by SeqA deficiency or Dam overproduction. We suggest that the methylation status...

  9. Transcriptomic Identification of Drought-Related Genes and SSR Markers in Sudan Grass Based on RNA-Seq

    Directory of Open Access Journals (Sweden)

    Yongqun Zhu

    2017-05-01

    Full Text Available Sudan grass (Sorghum sudanense) is an annual warm-season gramineous forage grass that is widely used as pasture, hay, and silage. However, drought stress severely impacts its yield, and there is limited information about the mechanisms of drought tolerance in Sudan grass. In this study, we used next-generation sequencing to identify differentially expressed genes (DEGs) in the Sudan grass variety Wulate No.1, and we developed simple sequence repeat (SSR) markers associated with drought stress. From 852,543,826 raw reads, 816,854,366 clean reads were identified and used for analysis. A total of 80,686 unigenes were obtained via de novo assembly of the clean reads, including 45,065 unigenes (55.9%) that were identified as coding sequences (CDSs). According to Gene Ontology analysis, 31,444 unigenes were annotated, 11,778 unigenes were assigned to 25 categories in the clusters of orthologous groups of proteins (KOG) classification, and 11,223 unigenes were assigned to 280 Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways. Additionally, there were 2,329 DEGs under short-term 25% polyethylene glycol (PEG) treatment, while 5,101 DEGs were identified under long-term 25% PEG treatment. Under short-term drought stress, DEGs were enriched in the pathways of carbon fixation in photosynthetic organisms and plant hormone signal transduction, which played a leading role. Under long-term drought stress, however, DEGs were mainly enriched in the plant hormone signal transduction pathway, which played an important role. To increase accuracy, we excluded all DEGs found in the controls; specifically, five DEGs that were associated with high PEG concentrations were found through RNA-Seq. All five genes were up-regulated under drought stress, but the functions of the genes remain unclear. In addition, we identified 17,548 SSRs from the 80,686 unigenes. The newly identified drought tolerance DEGs will contribute to transgenic breeding efforts, while
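
    SSR identification of the kind mentioned above can be illustrated with a short regular-expression scan. The Python sketch below is not the tool used in the study, which this record does not name; the minimum repeat counts are assumed values chosen for illustration, and only perfect (uninterrupted) repeats are reported.

```python
import re

# Minimum number of repeats per motif length (assumed thresholds).
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def find_ssrs(sequence, min_repeats=MIN_REPEATS):
    """Return (start, motif, repeat_count) for each perfect SSR found."""
    sequence = sequence.upper()
    hits = []
    for motif_len, min_rep in sorted(min_repeats.items()):
        # A motif of motif_len bases followed by at least min_rep - 1 copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (motif_len, min_rep - 1))
        for match in pattern.finditer(sequence):
            motif = match.group(1)
            # Skip motifs that are repeats of a shorter unit, e.g. "ATAT".
            if any(motif == motif[:d] * (motif_len // d)
                   for d in range(1, motif_len) if motif_len % d == 0):
                continue
            hits.append((match.start(), motif, len(match.group(0)) // motif_len))
    return hits


# Hypothetical unigene fragment containing an (AG)7 and a (CTT)5 repeat.
unigene = "GGCC" + "AG" * 7 + "TTGACC" + "CTT" * 5 + "GA"
ssrs = find_ssrs(unigene)
print(ssrs)
```

    Start positions are 0-based. Flanking sequence around each hit would then be used for primer design when developing markers.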

  10. iMir: an integrated pipeline for high-throughput analysis of small non-coding RNA data obtained by smallRNA-Seq.

    Science.gov (United States)

    Giurato, Giorgio; De Filippo, Maria Rosaria; Rinaldi, Antonio; Hashim, Adnan; Nassa, Giovanni; Ravo, Maria; Rizzo, Francesca; Tarallo, Roberta; Weisz, Alessandro

    2013-12-13

    RNAs. In addition, iMir allowed also the identification of ~70 piRNAs (piwi-interacting RNAs), some of which differentially expressed in proliferating vs growth arrested cells. The integrated data analysis pipeline described here is based on a reliable, flexible and fully automated workflow, useful to rapidly and efficiently analyze high-throughput smallRNA-Seq data, such as those produced by the most recent high-performance next generation sequencers. iMir is available at http://www.labmedmolge.unisa.it/inglese/research/imir.

  11. Developing reading literacy by reading badge

    OpenAIRE

    Rejc, Blanka

    2017-01-01

    Reading is a fundamental activity of our society and is present in all areas of a person’s life. Authors who deal with reading define reading with different definitions, some of them I also presented in my master’s degree thesis. The ways of reading, typology of readers and knowledge of different reading models are only some of the important theoretical facts that serve as a basis for the research and defining reading. Reading motivation is an important motivational factor, which encourages a...

  12. Exploring the Relationship between Adolescent's Reading Skills, Reading Motivation and Reading Habits

    Science.gov (United States)

    McGeown, Sarah P.; Duncan, Lynne G.; Griffiths, Yvonne M.; Stothard, Sue E.

    2015-01-01

    The present study examines the extent to which adolescents' reading affect (reading motivation) and behaviour (reading habits) predict different components of reading (word reading, comprehension, summarisation and text reading speed) and also adds to the limited research examining group differences (gender, age, ability) in adolescents' reading…

  13. Accurate clinical genetic testing for autoinflammatory diseases using the next-generation sequencing platform MiSeq

    Directory of Open Access Journals (Sweden)

    Manabu Nakayama

    2017-03-01

    Full Text Available Autoinflammatory diseases constitute one group of primary immunodeficiency diseases and are generally thought to be caused by mutations in genes responsible for innate immunity, rather than acquired immunity. Mutations related to autoinflammatory diseases occur in 12 genes. For example, low-level somatic mosaic NLRP3 mutations underlie chronic infantile neurologic, cutaneous, articular syndrome (CINCA), also known as neonatal-onset multisystem inflammatory disease (NOMID). In current clinical practice, clinical genetic testing plays an important role in providing patients with quick, definite diagnoses. To increase the availability of such testing, low-cost high-throughput gene-analysis systems are required, ones that not only have the sensitivity to detect even low-level somatic mosaic mutations, but also can operate simply in a clinical setting. To this end, we developed a simple method that employs two-step tailed PCR and an NGS system, the MiSeq platform, to detect mutations in all coding exons of the 12 genes responsible for autoinflammatory diseases. Using this amplicon sequencing system, we amplified a total of 234 amplicons derived from the 12 genes with multiplex PCR. This was done simultaneously and in one test tube. Each sample was distinguished by an index sequence in the second-PCR primers following PCR amplification. With our procedure and tips for reducing PCR amplification bias, we were able to analyze 12 genes from 25 clinical samples in one MiSeq run. Moreover, with the certified primers designed by our short program, which detects and avoids common SNPs in gene-specific PCR primers, we used this system for routine genetic testing. Our optimized procedure uses a simple protocol, which can easily be followed by virtually any office medical staff. Because of the small PCR amplification bias, we can analyze several clinical DNA samples simultaneously at low cost and can obtain sufficient read numbers to detect a low level of

  14. Composite transcriptome assembly of RNA-seq data in a sheep model for delayed bone healing.

    Science.gov (United States)

    Jäger, Marten; Ott, Claus-Eric; Grünhagen, Johannes; Hecht, Jochen; Schell, Hanna; Mundlos, Stefan; Duda, Georg N; Robinson, Peter N; Lienau, Jasmin

    2011-03-24

    The sheep is an important model organism for many types of medically relevant research, but molecular genetic experiments in the sheep have been limited by the lack of knowledge about ovine gene sequences. Prior to our study, mRNA sequences for only 1,556 partial or complete ovine genes were publicly available. Therefore, we developed a composite de novo transcriptome assembly method for next-generation sequence data to combine known ovine mRNA and EST sequences, mRNA sequences from mouse and cow, and sequences assembled de novo from short read RNA-Seq data into a composite reference transcriptome, and identified transcripts from over 12 thousand previously undescribed ovine genes. Gene expression analysis based on these data revealed substantially different expression profiles in standard versus delayed bone healing in an ovine tibial osteotomy model. Hundreds of transcripts were differentially expressed between standard and delayed healing and between the time points of the standard and delayed healing groups. We used the sheep sequences to design quantitative RT-PCR assays with which we validated the differential expression of 26 genes that had been identified by RNA-seq analysis. A number of clusters of characteristic expression profiles could be identified, some of which showed striking differences between the standard and delayed healing groups. Gene Ontology (GO) analysis showed that the differentially expressed genes were enriched in terms including extracellular matrix, cartilage development, contractile fiber, and chemokine activity. Our results provide a first atlas of gene expression profiles and differentially expressed genes in standard and delayed bone healing in a large-animal model and provide a number of clues as to the shifts in gene expression that underlie delayed bone healing. In the course of our study, we identified transcripts of 13,987 ovine genes, including 12,431 genes for which no sequence information was previously available. This

  15. Fish the ChIPs: a pipeline for automated genomic annotation of ChIP-Seq data

    Directory of Open Access Journals (Sweden)

    Minucci Saverio

    2011-10-01

    Full Text Available Background: High-throughput sequencing is generating massive amounts of data at a pace that largely exceeds the throughput of data analysis routines. Here we introduce Fish the ChIPs (FC), a computational pipeline aimed at a broad public of users and designed to perform complete ChIP-Seq data analysis of an unlimited number of samples, thus increasing throughput and reproducibility and saving time. Results: Starting from short read sequences, FC performs the following steps: 1) quality controls, 2) alignment to a reference genome, 3) peak calling, 4) genomic annotation, 5) generation of raw signal tracks for visualization on the UCSC and IGV genome browsers. FC exploits some of the fastest and most effective tools available today. Installation on a Mac platform requires very basic computational skills, while configuration and usage are supported by a user-friendly graphical user interface. Alternatively, FC can be compiled from the source code on any Unix machine and then run with the possibility of customizing every parameter through a simple configuration text file that can be generated using a dedicated user-friendly web form. In terms of execution time, FC can be run on a desktop machine, even though the use of a computer cluster is recommended for analyses of large batches of data. FC is well suited to data coming from Illumina Solexa Genome Analyzers or ABI SOLiD, and its usage can potentially be extended to any sequencing platform. Conclusions: Compared to existing tools, FC has two main advantages that make it suitable for a broad range of users. First, it can be installed and run by wet-lab biologists on a Mac machine. Second, it can handle an unlimited number of samples, which makes it convenient for large analyses. In this context, computational biologists can increase the reproducibility of their ChIP-Seq data analyses while saving time for downstream analyses. Reviewers: This article was reviewed by Gavin Huttley, George

  16. High-sensitivity HLA typing by Saturated Tiling Capture Sequencing (STC-Seq).

    Science.gov (United States)

    Jiao, Yang; Li, Ran; Wu, Chao; Ding, Yibin; Liu, Yanning; Jia, Danmei; Wang, Lifeng; Xu, Xiang; Zhu, Jing; Zheng, Min; Jia, Junling

    2018-01-15

    Highly polymorphic human leukocyte antigen (HLA) genes are responsible for fine-tuning the adaptive immune system. High-resolution HLA typing is important for the treatment of autoimmune and infectious diseases. Additionally, it is routinely performed for identifying matched donors in transplantation medicine. Although many HLA typing approaches have been developed, the complexity, low-efficiency and high-cost of current HLA-typing assays limit their application in population-based high-throughput HLA typing for donors, which is required for creating large-scale databases for transplantation and precision medicine. Here, we present a cost-efficient Saturated Tiling Capture Sequencing (STC-Seq) approach to capturing 14 HLA class I and II genes. The highly efficient capture (an approximately 23,000-fold enrichment) of these genes allows for simplified allele calling. Tests on five genes (HLA-A/B/C/DRB1/DQB1) from 31 human samples and 351 datasets using STC-Seq showed results that were 98% consistent with the known two sets of digitals (field1 and field2) genotypes. Additionally, STC can capture genomic DNA fragments longer than 3 kb from HLA loci, making the library compatible with the third-generation sequencing. STC-Seq is a highly accurate and cost-efficient method for HLA typing which can be used to facilitate the establishment of population-based HLA databases for the precision and transplantation medicine.

  17. A validated pipeline for detection of SNVs and short InDels from RNA Sequencing

    Directory of Open Access Journals (Sweden)

    Nitin Mandloi

    2017-12-01

    In this study, we have developed a pipeline to detect germline variants from RNA-seq data. The pipeline steps include pre-processing, alignment, GATK best practices for RNA-seq, and variant filtering. The pre-processing step includes base and adapter trimming and removal of contamination reads from rRNA, tRNA, mitochondrial DNA and repeat regions. The pre-processed reads are aligned using STAR/HiSAT. After this, we used GATK best practices for the RNA-seq dataset to call germline variants. We benchmarked our pipeline on NA12878 RNA-seq data downloaded from SRA (SRR1258218). After variant calling, the quality-passed variants were compared against the gold standard variants provided by the GIAB consortium. Of the total ~3.6 million high-quality variants reported as gold standard variants for this sample (considering the whole genome), our pipeline identified ~58,104 variants to be expressed in RNA-seq. Our pipeline achieved more than 99% sensitivity in the detection of germline variants.
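
    The sensitivity figure quoted above is the fraction of gold standard variants that the pipeline recovers, TP / (TP + FN). A minimal Python sketch of that comparison follows. It assumes the truth set has already been restricted to variants in expressed regions and that variants are matched exactly on chromosome, position, reference and alternate allele; the abstract states neither detail, and the variants shown are made up.

```python
def sensitivity(called, truth):
    """Fraction of truth-set variants present in the call set: TP / (TP + FN)."""
    truth = set(truth)
    if not truth:
        raise ValueError("truth set is empty")
    true_positives = len(truth & set(called))
    return true_positives / len(truth)


# Hypothetical variants keyed as (chromosome, position, ref, alt).
truth_expressed = {
    ("chr1", 1000, "A", "G"),
    ("chr1", 2500, "C", "T"),
    ("chr2", 300, "G", "GA"),
    ("chr3", 42, "T", "C"),
}
called = {
    ("chr1", 1000, "A", "G"),
    ("chr1", 2500, "C", "T"),
    ("chr2", 300, "G", "GA"),
    ("chr5", 77, "A", "T"),  # not in the truth set: does not affect sensitivity
}
result = sensitivity(called, truth_expressed)
print(f"sensitivity = {result:.2f}")
```

    Calls absent from the truth set lower precision rather than sensitivity, so a complete benchmark would report both.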

  18. SignalSpider: Probabilistic pattern discovery on multiple normalized ChIP-Seq signal profiles

    KAUST Repository

    Wong, Kachun

    2014-09-05

    Motivation: Chromatin immunoprecipitation (ChIP) followed by high-throughput sequencing (ChIP-Seq) measures the genome-wide occupancy of transcription factors in vivo. Different combinations of DNA-binding protein occupancies may result in a gene being expressed in different tissues or at different developmental stages. To fully understand the functions of genes, it is essential to develop probabilistic models on multiple ChIP-Seq profiles to decipher the combinatorial regulatory mechanisms by multiple transcription factors. Results: In this work, we describe a probabilistic model (SignalSpider) to decipher the combinatorial binding events of multiple transcription factors. Compared with similar existing methods, we found that SignalSpider performs better in clustering promoter and enhancer regions. Notably, SignalSpider can learn higher-order combinatorial patterns from multiple ChIP-Seq profiles. We have applied SignalSpider to the normalized ChIP-Seq profiles from the ENCODE consortium and learned model instances. We observed different higher-order enrichment and depletion patterns across sets of proteins. Those clustering patterns are supported by Gene Ontology (GO) enrichment, evolutionary conservation and chromatin interaction enrichment, offering biological insights for further focused studies. We also proposed a specific enrichment map visualization method to reveal the genome-wide transcription factor combinatorial patterns from the models built, which extends our existing fine-scale knowledge of gene regulation to a genome-wide level. Availability and implementation: The matrix-algebra-optimized executables and source codes are available at the authors' website: http://www.cs.toronto.edu/∼wkc/SignalSpider. Supplementary information: Supplementary data are available at Bioinformatics online.

  19. SeqReporter: automating next-generation sequencing result interpretation and reporting workflow in a clinical laboratory.

    Science.gov (United States)

    Roy, Somak; Durso, Mary Beth; Wald, Abigail; Nikiforov, Yuri E; Nikiforova, Marina N

    2014-01-01

    A wide repertoire of bioinformatics applications exist for next-generation sequencing data analysis; however, certain requirements of the clinical molecular laboratory limit their use: i) comprehensive report generation, ii) compatibility with existing laboratory information systems and computer operating system, iii) knowledgebase development, iv) quality management, and v) data security. SeqReporter is a web-based application developed using ASP.NET framework version 4.0. The client-side was designed using HTML5, CSS3, and Javascript. The server-side processing (VB.NET) relied on interaction with a customized SQL server 2008 R2 database. Overall, 104 cases (1062 variant calls) were analyzed by SeqReporter. Each variant call was classified into one of five report levels: i) known clinical significance, ii) uncertain clinical significance, iii) pending pathologists' review, iv) synonymous and deep intronic, and v) platform and panel-specific sequence errors. SeqReporter correctly annotated and classified 99.9% (859 of 860) of sequence variants, including 68.7% synonymous single-nucleotide variants, 28.3% nonsynonymous single-nucleotide variants, 1.7% insertions, and 1.3% deletions. One variant of potential clinical significance was re-classified after pathologist review. Laboratory information system-compatible clinical reports were generated automatically. SeqReporter also facilitated quality management activities. SeqReporter is an example of a customized and well-designed informatics solution to optimize and automate the downstream analysis of clinical next-generation sequencing data. We propose it as a model that may envisage the development of a comprehensive clinical informatics solution. Copyright © 2014 American Society for Investigative Pathology and the Association for Molecular Pathology. Published by Elsevier Inc. All rights reserved.

  20. voomDDA: discovery of diagnostic biomarkers and classification of RNA-seq data

    Directory of Open Access Journals (Sweden)

    Gokmen Zararsiz

    2017-10-01

    Full Text Available RNA-Seq is a recent and efficient technique that uses the capabilities of next-generation sequencing technology for characterizing and quantifying transcriptomes. One important task using gene-expression data is to identify a small subset of genes that can be used to build diagnostic classifiers particularly for cancer diseases. Microarray based classifiers are not directly applicable to RNA-Seq data due to its discrete nature. Overdispersion is another problem that requires careful modeling of mean and variance relationship of the RNA-Seq data. In this study, we present voomDDA classifiers: variance modeling at the observational level (voom) extensions of the nearest shrunken centroids (NSC) and the diagonal discriminant classifiers. VoomNSC is one of these classifiers and brings voom and NSC approaches together for the purpose of gene-expression based classification. For this purpose, we propose weighted statistics and put these weighted statistics into the NSC algorithm. The VoomNSC is a sparse classifier that models the mean-variance relationship using the voom method and incorporates voom's precision weights into the NSC classifier via weighted statistics. A comprehensive simulation study was designed and four real datasets are used for performance assessment. The overall results indicate that voomNSC performs as the sparsest classifier. It also provides the most accurate results together with power-transformed Poisson linear discriminant analysis, rlog transformed support vector machines and random forests algorithms. In addition to prediction purposes, the voomNSC classifier can be used to identify the potential diagnostic biomarkers for a condition of interest. Through this work, statistical learning methods proposed for microarrays can be reused for RNA-Seq data. An interactive web application is freely available at http://www.biosoft.hacettepe.edu.tr/voomDDA/.

  1. voomDDA: discovery of diagnostic biomarkers and classification of RNA-seq data.

    Science.gov (United States)

    Zararsiz, Gokmen; Goksuluk, Dincer; Klaus, Bernd; Korkmaz, Selcuk; Eldem, Vahap; Karabulut, Erdem; Ozturk, Ahmet

    2017-01-01

    RNA-Seq is a recent and efficient technique that uses the capabilities of next-generation sequencing technology for characterizing and quantifying transcriptomes. One important task using gene-expression data is to identify a small subset of genes that can be used to build diagnostic classifiers particularly for cancer diseases. Microarray based classifiers are not directly applicable to RNA-Seq data due to its discrete nature. Overdispersion is another problem that requires careful modeling of mean and variance relationship of the RNA-Seq data. In this study, we present voomDDA classifiers: variance modeling at the observational level (voom) extensions of the nearest shrunken centroids (NSC) and the diagonal discriminant classifiers. VoomNSC is one of these classifiers and brings voom and NSC approaches together for the purpose of gene-expression based classification. For this purpose, we propose weighted statistics and put these weighted statistics into the NSC algorithm. The VoomNSC is a sparse classifier that models the mean-variance relationship using the voom method and incorporates voom's precision weights into the NSC classifier via weighted statistics. A comprehensive simulation study was designed and four real datasets are used for performance assessment. The overall results indicate that voomNSC performs as the sparsest classifier. It also provides the most accurate results together with power-transformed Poisson linear discriminant analysis, rlog transformed support vector machines and random forests algorithms. In addition to prediction purposes, the voomNSC classifier can be used to identify the potential diagnostic biomarkers for a condition of interest. Through this work, statistical learning methods proposed for microarrays can be reused for RNA-Seq data. An interactive web application is freely available at http://www.biosoft.hacettepe.edu.tr/voomDDA/.
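
    The idea of putting weighted statistics into a nearest shrunken centroids classifier can be sketched for a single gene as follows. This Python example is a simplified illustration and not the authors' voomNSC implementation: it computes precision-weighted class means and soft-thresholds their offsets from the weighted overall mean, and it deliberately omits the standardization by a pooled standard deviation used in the full NSC algorithm. The weights, expression values and threshold are hypothetical.

```python
def weighted_mean(values, weights):
    """Mean of values with non-negative weights (e.g. precision weights)."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(v * w for v, w in zip(values, weights)) / total


def soft_threshold(d, delta):
    """Shrink d toward zero by delta; anything within delta of zero becomes 0."""
    magnitude = max(abs(d) - delta, 0.0)
    return magnitude if d >= 0 else -magnitude


def shrunken_offsets(expr, weights, labels, delta):
    """Per-class shrunken offset of one gene's weighted class mean."""
    overall = weighted_mean(expr, weights)
    offsets = {}
    for cls in sorted(set(labels)):
        idx = [i for i, lab in enumerate(labels) if lab == cls]
        cls_mean = weighted_mean([expr[i] for i in idx], [weights[i] for i in idx])
        offsets[cls] = soft_threshold(cls_mean - overall, delta)
    return offsets


# Hypothetical log-expression values and precision weights for one gene.
expr = [5.0, 7.0, 1.0, 3.0]
weights = [1.0, 3.0, 1.0, 1.0]
labels = ["tumor", "tumor", "normal", "normal"]
offsets = shrunken_offsets(expr, weights, labels, delta=1.0)
print(offsets)
```

    A gene whose offsets are shrunk to zero in every class no longer influences classification, which is how the threshold yields a sparse classifier; the genes that survive are the candidate biomarkers.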

  2. Comparative Transcriptome Analysis of a Toxin-Producing Dinoflagellate Alexandrium catenella and Its Non-Toxic Mutant

    Directory of Open Access Journals (Sweden)

    Yong Zhang

    2014-11-01

    Full Text Available The dinoflagellates and cyanobacteria are two major kingdoms of life producing paralytic shellfish toxins (PSTs), a large group of neurotoxic alkaloids causing paralytic shellfish poisonings around the world. In contrast to the well elucidated PST biosynthetic genes in cyanobacteria, little is known about the dinoflagellates. This study compared transcriptome profiles of a toxin-producing dinoflagellate, Alexandrium catenella (ACHK-T), and its non-toxic mutant form (ACHK-NT) using RNA-seq. All clean reads were assembled de novo into a total of 113,674 unigenes, and 66,812 unigenes were annotated in the known databases. Out of them, 35 genes were found to express differentially between the two strains. The up-regulated genes in ACHK-NT were involved in photosynthesis, carbon fixation and amino acid metabolism processes, indicating that more carbon and energy were utilized for cell growth. Among the down-regulated genes, expression of a unigene assigned to the long isoform of sxtA, the initiator of toxin biosynthesis in cyanobacteria, was significantly depressed, suggesting that this long transcript of sxtA might be directly involved in toxin biosynthesis and its depression resulted in the loss of the ability to synthesize PSTs in ACHK-NT. In addition, 101 putative homologs of 12 cyanobacterial sxt genes were identified, and the sxtO and sxtZ genes were identified in dinoflagellates for the first time. The findings of this study should shed light on the biosynthesis of PSTs in the dinoflagellates.

  3. A strand-specific RNA-Seq analysis of the transcriptome of the typhoid bacillus Salmonella typhi.

    Directory of Open Access Journals (Sweden)

    Timothy T Perkins

    2009-07-01

    Full Text Available High-density, strand-specific cDNA sequencing (ssRNA-seq) was used to analyze the transcriptome of Salmonella enterica serovar Typhi (S. Typhi). By mapping sequence data to the entire S. Typhi genome, we analyzed the transcriptome in a strand-specific manner and further defined transcribed regions encoded within prophages, pseudogenes, previously un-annotated, and 3'- or 5'-untranslated regions (UTR). An additional 40 novel candidate non-coding RNAs were identified beyond those previously annotated. Proteomic analysis was combined with transcriptome data to confirm and refine the annotation of a number of hypothetical genes. ssRNA-seq was also combined with microarray and proteome analysis to further define the S. Typhi OmpR regulon and identify novel OmpR regulated transcripts. Thus, ssRNA-seq provides a novel and powerful approach to the characterization of the bacterial transcriptome.

  4. I read, you read, we read: the history of reading in Slovenia

    Directory of Open Access Journals (Sweden)

    Anja Dular

    2013-03-01

    Full Text Available ABSTRACT. Purpose: The aim of the article is to research reading habits in Slovenia in the period between the 16th and 19th centuries and to find similarities with Austria and other European countries of that time. Methodology/approach: For the purpose of the analysis, different resources were used: study books, catechisms, prayer books and manuals. We focused on introductions in which readers are advised how to read, explaining for whom the work is intended and emphasizing the importance of meditation on the texts. Results: Historically, reading aloud was preferred, as it continued the folk tradition. However, the 16th century texts were transmitted by women while the folk tradition was narrated by males. In the 18th century, the higher level of literacy and greater book production and availability meant that books were no longer the privilege of a few. At that time more texts were intended for silent, individual reading. Interestingly, the authors emphasized the importance of meditation on the texts, too. It was also advised when to read: it was recommended to read in leisure time on Sundays and on holidays. The role of books was also to offer a break from reality and to help readers forget everyday problems. Due to the overproduction of books in the 17th century, there was concern that books were misleading the crowds. The church considered the reading of books inappropriate, and criticized fiction, novels and adventure stories, mostly read by women. Research limitation: The study is based on Slovenian texts only, although foreign literature, especially in German, was generally available, too. Originality/practical implications: The study fills a gap in the history of reading in Slovenia.

  5. QTL-seq approach identified genomic regions and diagnostic markers for rust and late leaf spot resistance in groundnut (Arachis hypogaea L.).

    Science.gov (United States)

    Pandey, Manish K; Khan, Aamir W; Singh, Vikas K; Vishwakarma, Manish K; Shasidhar, Yaduru; Kumar, Vinay; Garg, Vanika; Bhat, Ramesh S; Chitikineni, Annapurna; Janila, Pasupuleti; Guo, Baozhu; Varshney, Rajeev K

    2017-08-01

    Rust and late leaf spot (LLS) are the two major foliar fungal diseases in groundnut, and their co-occurrence leads to significant yield loss in addition to the deterioration of fodder quality. To identify candidate genomic regions controlling resistance to rust and LLS, a whole-genome resequencing (WGRS)-based approach referred to as 'QTL-seq' was deployed. A total of 231.67 Gb raw and 192.10 Gb of clean sequence data were generated through WGRS of resistant parent and the resistant and susceptible bulks for rust and LLS. Sequence analysis of bulks for rust and LLS with reference-guided resistant parent assembly identified 3136 single-nucleotide polymorphisms (SNPs) for rust and 66 SNPs for LLS with the read depth of ≥7 in the identified genomic region on pseudomolecule A03. Detailed analysis identified 30 nonsynonymous SNPs affecting 25 candidate genes for rust resistance, while 14 intronic and three synonymous SNPs affecting nine candidate genes for LLS resistance. Subsequently, allele-specific diagnostic markers were identified for three SNPs for rust resistance and one SNP for LLS resistance. Genotyping of one RIL population (TAG 24 × GPBD 4) with these four diagnostic markers revealed higher phenotypic variation for these two diseases. These results suggest usefulness of QTL-seq approach in precise and rapid identification of candidate genomic regions and development of diagnostic markers for breeding applications. © 2016 The Authors. Plant Biotechnology Journal published by Society for Experimental Biology and The Association of Applied Biologists and John Wiley & Sons Ltd.

  6. Developmental, Component-Based Model of Reading Fluency: An Investigation of Predictors of Word-Reading Fluency, Text-Reading Fluency, and Reading Comprehension.

    Science.gov (United States)

    Kim, Young-Suk Grace

    2015-01-01

    The primary goal was to expand our understanding of text reading fluency (efficiency or automaticity): how its relation to other constructs (e.g., word reading fluency and reading comprehension) changes over time and how it is different from word reading fluency and reading comprehension. We examined (1) developmentally changing relations among word reading fluency, listening comprehension, text reading fluency, and reading comprehension; (2) the relation of reading comprehension to text reading fluency; (3) unique emergent literacy predictors (i.e., phonological awareness, orthographic awareness, morphological awareness, letter name knowledge, vocabulary) of text reading fluency vs. word reading fluency; and (4) unique language and cognitive predictors (e.g., vocabulary, grammatical knowledge, theory of mind) of text reading fluency vs. reading comprehension. These questions were addressed using longitudinal data (two timepoints; Mean age = 5;24 & 6;08) from Korean-speaking children (N = 143). Results showed that listening comprehension was related to text reading fluency at time 2, but not at time 1. At both times text reading fluency was related to reading comprehension, and reading comprehension was related to text reading fluency over and above word reading fluency and listening comprehension. Orthographic awareness was related to text reading fluency over and above other emergent literacy skills and word reading fluency. Vocabulary and grammatical knowledge were independently related to text reading fluency and reading comprehension whereas theory of mind was related to reading comprehension, but not text reading fluency. These results reveal the developmental nature of the relations and mechanisms of text reading fluency in reading development.

  7. Does Extensive Reading Promote Reading Speed?

    Science.gov (United States)

    He, Mu

    2014-01-01

    Research has shown a wide range of learning benefits accruing from extensive reading. Not only is there improvement in reading, but also in a wide range of language uses and areas of language knowledge. However, few research studies have examined reading speed. The existing literature on reading speed focused on students' reading speed without…

  8. Illuminating choices for library prep: a comparison of library preparation methods for whole genome sequencing of Cryptococcus neoformans using Illumina HiSeq.

    Directory of Open Access Journals (Sweden)

    Johanna Rhodes

    Full Text Available The industry of next-generation sequencing is constantly evolving, with novel library preparation methods and new sequencing machines being released by the major sequencing technology companies annually. The Illumina TruSeq v2 library preparation method was the most widely used kit and the market leader; however, it has now been discontinued, and in 2013 was replaced by the TruSeq Nano and TruSeq PCR-free methods, leaving a gap in knowledge regarding which is the most appropriate library preparation method to use. Here, we used isolates from the pathogenic fungi Cryptococcus neoformans var. grubii and sequenced them using the existing TruSeq DNA v2 kit (Illumina), along with two new kits: the TruSeq Nano DNA kit (Illumina) and the NEBNext Ultra DNA kit (New England Biolabs) to provide a comparison. Compared to the original TruSeq DNA v2 kit, both newer kits gave equivalent or better sequencing data, with increased coverage. When comparing the two newer kits, we found little difference in cost and workflow, with the NEBNext Ultra both slightly cheaper and faster than the TruSeq Nano. However, the quality of data generated using the TruSeq Nano DNA kit was superior due to higher coverage at regions of low GC content, and more SNPs identified. Researchers should therefore evaluate their resources and the type of application (and hence data quality) being considered when ultimately deciding on which library prep method to use.

  9. Illuminating choices for library prep: a comparison of library preparation methods for whole genome sequencing of Cryptococcus neoformans using Illumina HiSeq.

    Science.gov (United States)

    Rhodes, Johanna; Beale, Mathew A; Fisher, Matthew C

    2014-01-01

    The industry of next-generation sequencing is constantly evolving, with novel library preparation methods and new sequencing machines being released by the major sequencing technology companies annually. The Illumina TruSeq v2 library preparation method was the most widely used kit and the market leader; however, it has now been discontinued, and in 2013 was replaced by the TruSeq Nano and TruSeq PCR-free methods, leaving a gap in knowledge regarding which is the most appropriate library preparation method to use. Here, we used isolates from the pathogenic fungi Cryptococcus neoformans var. grubii and sequenced them using the existing TruSeq DNA v2 kit (Illumina), along with two new kits: the TruSeq Nano DNA kit (Illumina) and the NEBNext Ultra DNA kit (New England Biolabs) to provide a comparison. Compared to the original TruSeq DNA v2 kit, both newer kits gave equivalent or better sequencing data, with increased coverage. When comparing the two newer kits, we found little difference in cost and workflow, with the NEBNext Ultra both slightly cheaper and faster than the TruSeq Nano. However, the quality of data generated using the TruSeq Nano DNA kit was superior due to higher coverage at regions of low GC content, and more SNPs identified. Researchers should therefore evaluate their resources and the type of application (and hence data quality) being considered when ultimately deciding on which library prep method to use.

  10. DeepBound: accurate identification of transcript boundaries via deep convolutional neural fields

    KAUST Repository

    Shao, Mingfu; Ma, Jianzhu; Wang, Sheng

    2017-01-01

    Motivation: Reconstructing the full-length expressed transcripts (a.k.a. the transcript assembly problem) from the short sequencing reads produced by RNA-seq protocol plays a central role in identifying novel genes and transcripts as well as in studying gene expressions and gene functions. A crucial step in transcript assembly is to accurately determine the splicing junctions and boundaries of the expressed transcripts from the reads alignment. In contrast to the splicing junctions that can be efficiently detected from spliced reads, the problem of identifying boundaries remains open and challenging, due to the fact that the signal related to boundaries is noisy and weak.

  11. DeepBound: accurate identification of transcript boundaries via deep convolutional neural fields

    KAUST Repository

    Shao, Mingfu

    2017-04-20

    Motivation: Reconstructing the full-length expressed transcripts (a.k.a. the transcript assembly problem) from the short sequencing reads produced by RNA-seq protocol plays a central role in identifying novel genes and transcripts as well as in studying gene expressions and gene functions. A crucial step in transcript assembly is to accurately determine the splicing junctions and boundaries of the expressed transcripts from the reads alignment. In contrast to the splicing junctions that can be efficiently detected from spliced reads, the problem of identifying boundaries remains open and challenging, due to the fact that the signal related to boundaries is noisy and weak.

  12. Allosteric transitions of supramolecular systems explored by network models: application to chaperonin GroEL.

    Directory of Open Access Journals (Sweden)

    Zheng Yang

    2009-04-01

    Full Text Available Identification of pathways involved in the structural transitions of biomolecular systems is often complicated by the transient nature of the conformations visited across energy barriers and the multiplicity of paths accessible in the multidimensional energy landscape. This task becomes even more challenging in exploring molecular systems on the order of megadaltons. Coarse-grained models that lend themselves to analytical solutions appear to be the only possible means of approaching such cases. Motivated by the utility of elastic network models for describing the collective dynamics of biomolecular systems and by the growing theoretical and experimental evidence in support of the intrinsic accessibility of functional substates, we introduce a new method, adaptive anisotropic network model (aANM), for exploring functional transitions. Application to bacterial chaperonin GroEL and comparisons with experimental data, results from action minimization algorithm, and previous simulations support the utility of aANM as a computationally efficient, yet physically plausible, tool for unraveling potential transition pathways sampled by large complexes/assemblies. An important outcome is the assessment of the critical inter-residue interactions formed/broken near the transition state(s), most of which involve conserved residues.

  13. Genotype-driven identification of a molecular network predictive of advanced coronary calcium in ClinSeq® and Framingham Heart Study cohorts.

    Science.gov (United States)

    Oguz, Cihan; Sen, Shurjo K; Davis, Adam R; Fu, Yi-Ping; O'Donnell, Christopher J; Gibbons, Gary H

    2017-10-26

    One goal of personalized medicine is leveraging the emerging tools of data science to guide medical decision-making. Achieving this using disparate data sources is most daunting for polygenic traits. To this end, we employed random forests (RFs) and neural networks (NNs) for predictive modeling of coronary artery calcium (CAC), which is an intermediate endo-phenotype of coronary artery disease (CAD). Model inputs were derived from advanced cases in the ClinSeq® discovery cohort (n=16) and the FHS replication cohort (n=36) from the 89th-99th CAC score percentile range, and age-matched controls (ClinSeq® n=16, FHS n=36) with no detectable CAC (all subjects were Caucasian males). These inputs included clinical variables and genotypes of 56 single nucleotide polymorphisms (SNPs) ranked highest in terms of their nominal correlation with the advanced CAC state in the discovery cohort. Predictive performance was assessed by computing the areas under receiver operating characteristic curves (ROC-AUC). RF models trained and tested with clinical variables generated ROC-AUC values of 0.69 and 0.61 in the discovery and replication cohorts, respectively. In contrast, in both cohorts, the set of SNPs derived from the discovery cohort were highly predictive (ROC-AUC ≥0.85) with no significant change in predictive performance upon integration of clinical and genotype variables. Using the 21 SNPs that produced optimal predictive performance in both cohorts, we developed NN models trained with ClinSeq® data and tested with FHS data and obtained high predictive accuracy (ROC-AUC=0.80-0.85) with several topologies. Several CAD and "vascular aging" related biological processes were enriched in the network of genes constructed from the predictive SNPs. We identified a molecular network predictive of advanced coronary calcium using genotype data from ClinSeq® and FHS cohorts. Our results illustrate that machine learning tools, which utilize complex interactions between disease

  14. Comparative RNA-Seq and microarray analysis of gene expression changes in B-cell lymphomas of Canis familiaris.

    Directory of Open Access Journals (Sweden)

    Marie Mooney

    Full Text Available Comparative oncology is a developing research discipline that is being used to assist our understanding of human neoplastic diseases. Companion canines are a preferred animal oncology model due to spontaneous tumor development and similarity to human disease at the pathophysiological level. We use a paired RNA sequencing (RNA-Seq)/microarray analysis of a set of four normal canine lymph nodes and ten canine lymphoma fine needle aspirates to identify technical biases and variation between the technologies and convergence on biological disease pathways. Surrogate Variable Analysis (SVA) provides a formal multivariate analysis of the combined RNA-Seq/microarray data set. Applying SVA to the data allows us to decompose variation into contributions associated with transcript abundance, differences between the technology, and latent variation within each technology. A substantial and highly statistically significant component of the variation reflects transcript abundance, and RNA-Seq appeared more sensitive for detection of transcripts expressed at low levels. Latent random variation among RNA-Seq samples is also distinct in character from that impacting microarray samples. In particular, we observed variation between RNA-Seq samples that reflects transcript GC content. Platform-independent variable decomposition without a priori knowledge of the sources of variation using SVA represents a generalizable method for accomplishing cross-platform data analysis. We identified genes differentially expressed between normal lymph nodes of disease free dogs and a subset of the diseased dogs diagnosed with B-cell lymphoma using each technology. There is statistically significant overlap between the RNA-Seq and microarray sets of differentially expressed genes. Analysis of overlapping genes in the context of biological systems suggests elevated expression and activity of PI3K signaling in B-cell lymphoma biopsies compared with normal biopsies, consistent with

  15. Nascent-Seq reveals novel features of mouse circadian transcriptional regulation

    Science.gov (United States)

    Menet, Jerome S; Rodriguez, Joseph; Abruzzi, Katharine C; Rosbash, Michael

    2012-01-01

    A substantial fraction of the metazoan transcriptome undergoes circadian oscillations in many cells and tissues. Based on the transcription feedback loops important for circadian timekeeping, it is commonly assumed that this mRNA cycling reflects widespread transcriptional regulation. To address this issue, we directly measured the circadian dynamics of mouse liver transcription using Nascent-Seq (genome-wide sequencing of nascent RNA). Although many genes are rhythmically transcribed, many rhythmic mRNAs manifest poor transcriptional rhythms, indicating a prominent contribution of post-transcriptional regulation to circadian mRNA expression. This analysis of rhythmic transcription also showed that the rhythmic DNA binding profile of the transcription factors CLOCK and BMAL1 does not determine the transcriptional phase of most target genes. This likely reflects gene-specific collaborations of CLK:BMAL1 with other transcription factors. These insights from Nascent-Seq indicate that it should have broad applicability to many other gene expression regulatory issues. DOI: http://dx.doi.org/10.7554/eLife.00011.001 PMID:23150795

  16. RiboMeth-seq

    DEFF Research Database (Denmark)

    Krogh, Nicolai; Birkedal, Ulf; Nielsen, Henrik

    2017-01-01

    combines alkaline fragmentation and a specialized library construction protocol based on 5'-OH and 2',3' cyclic phosphate ends to prepare RNA for sequencing. The read-ends of library fragments are used for mapping with nucleotide resolution and calculation of the fraction of molecules methylated at the 2...

  17. RNA-seq reveals transcriptome changes in goats following myostatin gene knockout

    Science.gov (United States)

    Cai, Bei; Zhou, Shiwei; Zhu, Haijing; Qu, Lei; Wang, Xiaolong

    2017-01-01

    Myostatin (MSTN) is a powerful negative regulator of skeletal muscle mass in mammalian species that is primarily expressed in skeletal muscles, and mutations of its encoding gene can result in the double-muscling trait. In this study, the CRISPR/Cas9 technique was used to edit MSTN in Shaanbei Cashmere goats and generate knockout animals. RNA sequencing was used to determine and compare the transcriptome profiles of the muscles from three wild-type (WT) goats, three fibroblast growth factor 5 (FGF5) knockout goats (FGF5+/- group) and three goats with disrupted expression of both the FGF5 and MSTN genes (FM+/- group). The sequence reads were obtained using the Illumina HiSeq 2000 system and mapped to the Capra hircus reference genome using TopHat (v2.0.9). In total, 68.93, 62.04 and 66.26 million clean sequencing reads were obtained from the WT, FM+/- and FGF5+/- groups, respectively. There were 201 differentially expressed genes (DEGs) between the WT and FGF5+/- groups, with 86 down- and 115 up-regulated genes in the FGF5+/- group. Between the WT and FM+/- groups, 121 DEGs were identified, including 81 down- and 40 up-regulated genes in the FM+/- group. A total of 198 DEGs were detected between the FGF5+/- group and FM+/- group, with 128 down- and 70 up-regulated genes in the FM+/- group. At the transcriptome level, we found substantial changes in genes involved in fatty acid metabolism and the biosynthesis of unsaturated fatty acids, such as stearoyl-CoA dehydrogenase, 3-hydroxyacyl-CoA dehydratase 2, ELOVL fatty acid elongase 6 and fatty acid synthase, suggesting that the expression levels of these genes may be directly regulated by MSTN and that these genes are likely downstream targets of MSTN with potential roles in lipid metabolism in goats. Moreover, five randomly selected DEGs were further validated with qRT-PCR, and the results were consistent with the transcriptome analysis. The present study provides insight into the unique transcriptome profile of the

  18. ORF Sequence: Ca19AnnotatedDec2004aaSeq [GENIUS II[Archive

    Lifescience Database Archive (English)

    Full Text Available Ca19AnnotatedDec2004aaSeq orf19.7258 >orf19.7258; Contig19-2507; 88880..89851; DDI1*; response to DNA alkyl...ation; MQLTISLDHSGDIISVDVPDSLCLEDFKAYLSAETGLEASVQVLKFNGRELVGNATLSELQIHDNDLLQLSKKQVA

  19. ORF Sequence: Ca19AnnotatedDec2004aaSeq [GENIUS II[Archive

    Lifescience Database Archive (English)

    Full Text Available Ca19AnnotatedDec2004aaSeq orf19.1278 >orf19.1278; Contig19-10104; complement(13162...4..>132028); ; conserved hypothetical protein; truncated protein IQNNKCSGCNLKLDFPVIHFKCKHSFHQKCLSTNLIATSTESS

  20. ORF Sequence: Ca19AnnotatedDec2004aaSeq [GENIUS II[Archive

    Lifescience Database Archive (English)

    Full Text Available Ca19AnnotatedDec2004aaSeq orf19.4711 >orf19.4711; Contig19-10212; complement(29836...7..>300616); ; acidic repetitive protein; truncated protein DRSDYNEEDNNDFTRKLNEIQSKESNHEDLAQSEVQEGQKDEPDSVNQ

  1. ORF Sequence: Ca19AnnotatedDec2004aaSeq [GENIUS II[Archive

    Lifescience Database Archive (English)

    Full Text Available Ca19AnnotatedDec2004aaSeq orf19.124 >orf19.124; Contig19-10035; 67601..68698; CIC1*; protease substrate recruitment factor; MAKTRSKSAATAAATSPKASPTAAKVTKNKVTKPSTASPSKTTKTKAVKKTTTKKATPKKEEEEKK...

  2. XplorSeq: a software environment for integrated management and phylogenetic analysis of metagenomic sequence data.

    Science.gov (United States)

    Frank, Daniel N

    2008-10-07

    Advances in automated DNA sequencing technology have accelerated the generation of metagenomic DNA sequences, especially environmental ribosomal RNA gene (rDNA) sequences. As the scale of rDNA-based studies of microbial ecology has expanded, need has arisen for software that is capable of managing, annotating, and analyzing the plethora of diverse data accumulated in these projects. XplorSeq is a software package that facilitates the compilation, management and phylogenetic analysis of DNA sequences. XplorSeq was developed for, but is not limited to, high-throughput analysis of environmental rRNA gene sequences. XplorSeq integrates and extends several commonly used UNIX-based analysis tools by use of a Macintosh OS-X-based graphical user interface (GUI). Through this GUI, users may perform basic sequence import and assembly steps (base-calling, vector/primer trimming, contig assembly), perform BLAST (Basic Local Alignment and Search Tool; 123) searches of NCBI and local databases, create multiple sequence alignments, build phylogenetic trees, assemble Operational Taxonomic Units, estimate biodiversity indices, and summarize data in a variety of formats. Furthermore, sequences may be annotated with user-specified meta-data, which then can be used to sort data and organize analyses and reports. A document-based architecture permits parallel analysis of sequence data from multiple clones or amplicons, with sequences and other data stored in a single file. XplorSeq should benefit researchers who are engaged in analyses of environmental sequence data, especially those with little experience using bioinformatics software. Although XplorSeq was developed for management of rDNA sequence data, it can be applied to most any sequencing project. The application is available free of charge for non-commercial use at http://vent.colorado.edu/phyloware.

  3. SeqAnt: A web service to rapidly identify and annotate DNA sequence variations

    Directory of Open Access Journals (Sweden)

    Patel Viren

    2010-09-01

    Full Text Available Abstract. Background: The enormous throughput and low cost of second-generation sequencing platforms now allow research and clinical geneticists to routinely perform single experiments that identify tens of thousands to millions of variant sites. Existing methods to annotate variant sites using information from publicly available databases via web browsers are too slow to be useful for the large sequencing datasets being routinely generated by geneticists. Because sequence annotation of variant sites is required before functional characterization can proceed, the lack of a high-throughput pipeline to efficiently annotate variant sites can act as a significant bottleneck in genetics research. Results: SeqAnt (Sequence Annotator) is an open source web service and software package that rapidly annotates DNA sequence variants and identifies recessive or compound heterozygous loci in human, mouse, fly, and worm genome sequencing experiments. Variants are characterized with respect to their functional type, frequency, and evolutionary conservation. Annotated variants can be viewed on a web browser, downloaded in a tab-delimited text file, or directly uploaded in a BED format to the UCSC genome browser. To demonstrate the speed of SeqAnt, we annotated a series of publicly available datasets that ranged in size from 37 to 3,439,107 variant sites. The total time to completely annotate these data ranged from 0.17 seconds to 28 minutes 49.8 seconds. Conclusion: SeqAnt is an open source web service and software package that overcomes a critical bottleneck facing research and clinical geneticists using second-generation sequencing platforms. SeqAnt will prove especially useful for those investigators who lack dedicated bioinformatics personnel or infrastructure in their laboratories.

  4. The maize glossy13 gene, cloned via BSR-Seq and Seq-walking encodes a putative ABC transporter required for the normal accumulation of epicuticular waxes.

    Directory of Open Access Journals (Sweden)

    Li Li

    Full Text Available Aerial plant surfaces are covered by epicuticular waxes that among other purposes serve to control water loss. Maize glossy mutants originally identified by their "glossy" phenotypes exhibit alterations in the accumulation of epicuticular waxes. By combining data from a BSR-Seq experiment and the newly developed Seq-Walking technology, GRMZM2G118243 was identified as a strong candidate for being the glossy13 gene. The finding that multiple EMS-induced alleles contain premature stop codons in GRMZM2G118243, and the one knockout allele of gl13, validates the hypothesis that gene GRMZM2G118243 is gl13. Consistent with this, GRMZM2G118243 is an ortholog of AtABCG32 (Arabidopsis thaliana), HvABCG31 (barley) and OsABCG31 (rice), which encode ABCG subfamily transporters involved in the trans-membrane transport of various secondary metabolites. We therefore hypothesize that gl13 is involved in the transport of epicuticular waxes onto the surfaces of seedling leaves.

  5. csaw: a Bioconductor package for differential binding analysis of ChIP-seq data using sliding windows

    Science.gov (United States)

    Lun, Aaron T.L.; Smyth, Gordon K.

    2016-01-01

    Chromatin immunoprecipitation with massively parallel sequencing (ChIP-seq) is widely used to identify binding sites for a target protein in the genome. An important scientific application is to identify changes in protein binding between different treatment conditions, i.e. to detect differential binding. This can reveal potential mechanisms through which changes in binding may contribute to the treatment effect. The csaw package provides a framework for the de novo detection of differentially bound genomic regions. It uses a window-based strategy to summarize read counts across the genome. It exploits existing statistical software to test for significant differences in each window. Finally, it clusters windows into regions for output and controls the false discovery rate properly over all detected regions. The csaw package can handle arbitrarily complex experimental designs involving biological replicates. It can be applied to both transcription factor and histone mark datasets, and, more generally, to any type of sequencing data measuring genomic coverage. csaw performs favorably against existing methods for de novo DB analyses on both simulated and real data. csaw is implemented as an R software package and is freely available from the open-source Bioconductor project. PMID:26578583
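
    The window-based summarization step described in this abstract can be illustrated with a minimal sketch. It is written in Python rather than R and is not csaw's code: the function name, the default width and step, and the choice to count reads by start position only are all assumptions made for illustration.

```python
from bisect import bisect_left


def window_counts(read_starts, genome_length, width=150, step=50):
    """Count reads whose start position falls in each sliding window.

    Windows are half-open intervals [start, start + width) placed every
    `step` bases. Returns a list of (start, end, count) tuples.
    Hypothetical helper for illustration only.
    """
    starts = sorted(read_starts)
    counts = []
    last_start = max(genome_length - width, 0)
    for s in range(0, last_start + 1, step):
        lo = bisect_left(starts, s)           # first read at or after s
        hi = bisect_left(starts, s + width)   # first read at or after window end
        counts.append((s, s + width, hi - lo))
    return counts


if __name__ == "__main__":
    for window in window_counts([10, 20, 160, 170, 400], 500, width=100, step=100):
        print(window)
```

    Per-window counts like these, collected for each sample, would then be passed to a statistical test for differential binding, with neighbouring windows merged into regions afterwards.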

  6. RNA-Seq Highlights High Clonal Variation in Monoclonal Antibody Producing CHO Cells

    DEFF Research Database (Denmark)

    Orellana, Camila A.; Marcellin, Esteban; Palfreyman, Robin W.

    2018-01-01

    The development of next-generation sequencing technologies has opened new opportunities to better characterize complex eukaryotic cells. Chinese hamster ovary (CHO) cells play a primary role in therapeutic protein production, with currently five of the top ten blockbuster drugs produced in CHO......-regulation of genes encoding secreted glycoproteins is found to be the most significant change. The large number of significant differences even between subclones challenges the notion of identifying and manipulating a few key genes to generate high production CHO cell lines....

  7. ORF Sequence: Ca19AnnotatedDec2004aaSeq [GENIUS II[Archive

    Lifescience Database Archive (English)

    Full Text Available Ca19AnnotatedDec2004aaSeq orf19.3361 >orf19.3361; Contig19-10173; 157397..>158185;... YAT2*; carnitine acetyltransferase; gene family | truncated protein MSTYRFQETLEKLPIPDLVQTCNAYLEALKPLQTEQEHE

  8. Beta-Poisson model for single-cell RNA-seq data analyses.

    Science.gov (United States)

    Vu, Trung Nghia; Wills, Quin F; Kalari, Krishna R; Niu, Nifang; Wang, Liewei; Rantalainen, Mattias; Pawitan, Yudi

    2016-07-15

    Single-cell RNA-sequencing technology allows detection of gene expression at the single-cell level. One typical feature of the data is a bimodality in the cellular distribution even for highly expressed genes, primarily caused by a proportion of non-expressing cells. The standard and the over-dispersed gamma-Poisson models that are commonly used in bulk-cell RNA-sequencing are not able to capture this property. We introduce a beta-Poisson mixture model that can capture the bimodality of the single-cell gene expression distribution. We further integrate the model into the generalized linear model framework in order to perform differential expression analyses. The whole analytical procedure is called BPSC. The results from several real single-cell RNA-seq datasets indicate that ∼90% of the transcripts are well characterized by the beta-Poisson model; the model-fit from BPSC is better than the fit of the standard gamma-Poisson model in > 80% of the transcripts. Moreover, in differential expression analyses of simulated and real datasets, BPSC performs well against edgeR, a conventional method widely used in bulk-cell RNA-sequencing data, and against scde and MAST, two recent methods specifically designed for single-cell RNA-seq data. An R package BPSC for model fitting and differential expression analyses of single-cell RNA-seq data is available under GPL-3 license at https://github.com/nghiavtr/BPSC. Contact: yudi.pawitan@ki.se or mattias.rantalainen@ki.se. Supplementary data are available at Bioinformatics online.
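
    To see why a beta-Poisson mixture can produce the bimodality mentioned above, the following Python sketch draws counts by first sampling a per-cell expression fraction from a beta distribution and then sampling a Poisson count whose rate is scaled by that fraction. It is only an illustration under assumed parameters (alpha = beta = 0.2, maximum rate 50); it is not the BPSC package, and the exact parameterization used by BPSC may differ.

```python
import math
import random

def sample_poisson(rng, lam):
    """Draw one Poisson(lam) value with Knuth's method; adequate for moderate rates."""
    threshold = math.exp(-lam)
    k, prod = 0, 1.0
    while True:
        prod *= rng.random()
        if prod <= threshold:
            return k
        k += 1

def sample_beta_poisson(rng, alpha, beta, lam):
    """Draw p ~ Beta(alpha, beta), then a count ~ Poisson(lam * p)."""
    p = rng.betavariate(alpha, beta)
    return sample_poisson(rng, lam * p)

# Assumed illustrative parameters: a U-shaped beta puts most cells near p = 0 or p = 1.
rng = random.Random(0)
counts = [sample_beta_poisson(rng, alpha=0.2, beta=0.2, lam=50.0) for _ in range(5000)]

zero_fraction = sum(1 for c in counts if c == 0) / len(counts)
mean_count = sum(counts) / len(counts)
print(f"fraction of zero counts: {zero_fraction:.2f}, mean count: {mean_count:.1f}")
```

    With these parameters a sizeable share of simulated cells shows zero counts while others sit near the maximum rate, which a single gamma-Poisson component with the same mean would struggle to reproduce.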

  9. ORF Sequence: Ca19AnnotatedDec2004aaSeq [GENIUS II[Archive

    Lifescience Database Archive (English)

    Full Text Available Ca19AnnotatedDec2004aaSeq orf19.4748 >orf19.4748; Contig19-10215; complement(47336.....47731); MSL1*; U2 snRNA-associated protein; MPSTKRSSSTEYSHKDSKKKVKLDYVNLKPSQTLYVKNLNTKINKKILLHNLYLLFSAFGDIISINLQNGFAFIIFSNLNSATLALRNLKNQDFFDKPLVLNYAVKESKAISQEKQKLQDENDEEVMPSYE*

  10. ORF Sequence: Ca19AnnotatedDec2004aaSeq [GENIUS II[Archive

    Lifescience Database Archive (English)

    Full Text Available Ca19AnnotatedDec2004aaSeq orf19.2370 >orf19.2370; Contig19-10147; complement(50671..52716); DSL1*; retrogra...de ER-to-golgi transport; MPSIEQQLEDQELYLKDIEQNINKTLSKINKTTLENDNDFRKQFEEIPQDSNTTESN

  11. Developmental, Component-Based Model of Reading Fluency: An Investigation of Predictors of Word-Reading Fluency, Text-Reading Fluency, and Reading Comprehension

    OpenAIRE

    Kim, Young-Suk Grace

    2015-01-01

    The primary goal was to expand our understanding of text reading fluency (efficiency or automaticity)—how its relation to other constructs (e.g., word reading fluency and reading comprehension) changes over time and how it is different from word reading fluency and reading comprehension. We examined (1) developmentally changing relations among word reading fluency, listening comprehension, text reading fluency, and reading comprehension; (2) the relation of reading comprehension to text readi...

  12. The reading teacher as a trainer of citizens

    Directory of Open Access Journals (Sweden)

    Jennie Brand Barajas

    2017-07-01

    Full Text Available This is a qualitative, theoretical study that approaches reading as a basic resource for forming citizens through education. It starts from a definition of reading capacity, followed by a review of the general characteristics of the reading brain proposed by Stanislas Dehaene (2014), the revolutions in the materials and devices used for writing, and the changes in forms of reading, from Sumerian tablets to digital technologies. The process of Education for Development and the distinctive features of digital citizenship are presented: immediacy in the production, transmission and reception of messages; interactivity between receiver and producer; multi-authorship, which gives rise to "prosumers"; the accessibility of the environment; freedom of expression; the democratization of access; and the appropriation of a public space. All this makes it possible to contextualize new forms of reading and new profiles of readers, as well as the generation of virtual reading spaces where communities of dialogue and exchange are formed. The study then turns to teachers and their reading biographies, which largely define their competence to encourage reading among their students and their ability to mobilize them towards civic responsibility through reading.

  13. Developmental relations between reading comprehension and reading strategies

    OpenAIRE

    Muijselaar, M.M.L.; Swart, N.M.; Steenbeek-Planting, E.G.; Droop, W.; Verhoeven, L.T.W.; Jong, P.F. de

    2017-01-01

    We examined the developmental relations between knowledge of reading strategies and reading comprehension in a longitudinal study of 312 Dutch children from the beginning of fourth grade to the end of fifth grade. Measures for reading comprehension, reading strategies, reading fluency, vocabulary, and working memory were administered. A structural equation model was constructed to estimate the unique relations between reading strategies and reading comprehension, while controlling for reading...

  14. Hybridization-based reconstruction of small non-coding RNA transcripts from deep sequencing data.

    Science.gov (United States)

    Ragan, Chikako; Mowry, Bryan J; Bauer, Denis C

    2012-09-01

    Recent advances in RNA sequencing technology (RNA-Seq) enable comprehensive profiling of RNAs by producing millions of short sequence reads from size-fractionated RNA libraries. Although conventional tools for detecting and distinguishing non-coding RNAs (ncRNAs) from reference-genome data can be applied to sequence data, ncRNA detection can be improved by harnessing the full information content provided by this new technology. Here we present NorahDesk, the first unbiased and universally applicable method for small ncRNA detection from RNA-Seq data. NorahDesk utilizes the coverage distribution of small RNA sequence data as well as thermodynamic assessments of secondary structure to reliably predict and annotate ncRNA classes. Using publicly available mouse sequence data from brain, skeletal muscle, testis and ovary, we evaluated our method with an emphasis on the performance for microRNAs (miRNAs) and piwi-interacting small RNAs (piRNAs). We compared our method with Dario and mirDeep2 and found that NorahDesk produces longer transcripts with higher read coverage. This feature makes it the first method particularly suitable for the prediction of both known and novel piRNAs.

  15. Properties of the Second Outburst of the Bursting Pulsar (GRO J1744-28) as Observed with BATSE

    Science.gov (United States)

    Woods, Peter M.; Kouveliotou, Chryssa; van Paradijs, Jan; Briggs, Michael S.; Wilson, C. A.; Deal, Kim; Harmon, B. A.; Fishman, G. J.; Lewin, W. H. G.; Kommers, J.

    1999-01-01

    One year after its discovery, the Bursting Pulsar (GRO J1744-28) went into outburst again, displaying the hard X-ray bursts and pulsations that make this source unique. We report on BATSE (Burst and Transient Source Experiment) observations of both the persistent and burst emission for this second outburst and draw comparisons with the first. The second outburst was smaller than the first in both duration and peak luminosity. The persistent flux, burst peak flux, and burst fluence were all reduced in amplitude by a factor of approximately 1.7. Despite these differences, the two outbursts were very similar with respect to the burst occurrence rate, the durations and spectra of bursts, the absence of spectral evolution during bursts, and the evolution of the ratio alpha of average persistent to burst luminosity. Although no spectral evolution was found within individual bursts, we find evidence for a small (20%) variation of the spectral temperature during the course of the second outburst.

  16. The RNA-Seq based high resolution gene expression atlas of chickpea (Cicer arietinum L.) reveals dynamic spatio-temporal changes associated with growth and development.

    Science.gov (United States)

    Kudapa, Himabindu; Garg, Vanika; Chitikineni, Annapurna; Varshney, Rajeev K

    2018-04-10

    Chickpea is one of the world's largest cultivated food legumes and is an excellent source of high-quality protein in the human diet. Plant growth and development are controlled by programmed expression of a suite of genes at a given time, stage and tissue. To understand how the underlying genome sequence translates into specific plant phenotypes at key developmental stages, information on gene expression patterns is crucial. Here we present a comprehensive Cicer arietinum Gene Expression Atlas (CaGEA) across the plant developmental stages and organs covering the entire life cycle of chickpea. ICC 4958, one of the widely used drought-tolerant cultivars, was used to generate RNA-Seq data from 27 samples at five major developmental stages of the plant. A total of 816 million raw reads were generated; of these, 794 million reads that passed QC filtering were subjected to downstream analysis. A total of 15,947 unique differentially expressed genes were identified across different pairwise tissue combinations. Significant differences in gene expression patterns contributing to flowering, nodulation, and seed and root development were inferred in this study. Furthermore, differentially expressed candidate genes from the "QTL-hotspot" region associated with drought stress response in chickpea were validated.

  17. Clustering of reads with alignment-free measures and quality values.

    Science.gov (United States)

    Comin, Matteo; Leoni, Andrea; Schimd, Michele

    2015-01-01

    The data volume generated by Next-Generation Sequencing (NGS) technologies is growing at a pace that is now challenging the storage and data processing capacities of modern computer systems. In this context an important aspect is the reduction of data complexity by collapsing redundant reads in a single cluster to improve the run time, memory requirements, and quality of post-processing steps like assembly and error correction. Several alignment-free measures, based on k-mer counts, have been used to cluster reads. Quality scores produced by NGS platforms are fundamental for various analyses of NGS data, such as read mapping and error detection. Moreover, future-generation sequencing platforms will produce long reads but with a large number of erroneous bases (up to 15%). In this scenario it will be fundamental to exploit quality value information within the alignment-free framework. To the best of our knowledge this is the first study that incorporates quality value information and k-mer counts, in the context of alignment-free measures, for the comparison of read data. Based on these principles, in this paper we present a family of alignment-free measures called D^q-type. A set of experiments on simulated and real read data confirms that the new measures are superior to other classical alignment-free statistics, especially when erroneous reads are considered. Results on de novo assembly and metagenomic read classification also show that the introduction of quality values improves over standard alignment-free measures. These statistics are implemented in software called QCluster (http://www.dei.unipd.it/~ciompin/main/qcluster.html).
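
    As a rough illustration of how k-mer counts and quality values can be combined in an alignment-free measure, the Python sketch below computes the classical D2 statistic (the inner product of two k-mer count vectors) and a quality-weighted variant. The weighting used here, where each k-mer occurrence counts as the product of its per-base probabilities of being correct, is an assumption chosen for illustration; it is not claimed to be the exact D^q-type definition implemented in QCluster.

```python
from collections import Counter

def kmer_counts(seq, k):
    """Count every overlapping k-mer in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

def d2(x, y, k):
    """Classical D2: sum over k-mers of count_in_x * count_in_y."""
    cx, cy = kmer_counts(x, k), kmer_counts(y, k)
    return sum(n * cy[w] for w, n in cx.items())

def weighted_kmer_counts(seq, quals, k):
    """Count k-mers, weighting each occurrence by the probability that all
    of its bases are correct (Phred score Q means error probability 10^(-Q/10))."""
    counts = Counter()
    for i in range(len(seq) - k + 1):
        weight = 1.0
        for q in quals[i:i + k]:
            weight *= 1.0 - 10 ** (-q / 10)
        counts[seq[i:i + k]] += weight
    return counts

def d2_quality(x, qx, y, qy, k):
    """Quality-weighted analogue of D2 (illustrative weighting, see lead-in)."""
    cx = weighted_kmer_counts(x, qx, k)
    cy = weighted_kmer_counts(y, qy, k)
    return sum(v * cy[w] for w, v in cx.items())

read = "ACGTACGT"
print(d2(read, read, 2))                               # plain k-mer similarity
print(d2_quality(read, [40] * 8, read, [40] * 8, 2))   # high-quality bases
print(d2_quality(read, [2] * 8, read, [2] * 8, 2))     # low-quality bases
```

    High-quality reads give a score close to the plain D2 value, while low-quality reads are strongly down-weighted, which is the intuition behind using quality values when clustering error-prone reads.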

  18. Teaching Reading

    Science.gov (United States)

    Day, Richard R.

    2013-01-01

    "Teaching Reading" uncovers the interactive processes that happen when people learn to read and translates them into a comprehensive easy-to-follow guide on how to teach reading. Richard Day's revelations on the nature of reading, reading strategies, reading fluency, reading comprehension, and reading objectives make fascinating…

  19. Slow Reading: Reading along "Lectio" Lines

    Science.gov (United States)

    Badley, K. Jo-Ann; Badley, Ken

    2011-01-01

    The medieval monastic movement preserved and developed reading practices--lectio--from ancient Greek pedagogy as a slow, mindful approach to reading for formation. This ancient way of reading, now better known as lectio divina, challenges the fast, pragmatic reading so characteristic of our time. We propose that the present moment may be ripe for…

  20. ORF Sequence: Ca19AnnotatedDec2004aaSeq [GENIUS II[Archive

    Lifescience Database Archive (English)

    Full Text Available Ca19AnnotatedDec2004aaSeq orf19.710 >orf19.710; Contig19-10065; complement(47186.....>47710); LSC2*; succinate-CoA ligase beta subunit; truncated protein | overlap LGFDDNASFRQEEVFSWRDPTQEDPQEAE

  1. Male-biased genes in catfish as revealed by RNA-Seq analysis of the testis transcriptome.

    Directory of Open Access Journals (Sweden)

    Fanyue Sun

    Full Text Available BACKGROUND: Catfish has a male-heterogametic (XY) sex determination system, but genes involved in gonadogenesis, spermatogenesis, testicular determination, and sex determination are poorly understood. As a first step of understanding the transcriptome of the testis, here, we conducted RNA-Seq analysis using high throughput Illumina sequencing. METHODOLOGY/PRINCIPAL FINDINGS: A total of 269.6 million high quality reads were assembled into 193,462 contigs with an N50 length of 806 bp. Of these contigs, 67,923 contigs had hits to a set of 25,307 unigenes, including 167 unique genes that had not been previously identified in catfish. A meta-analysis of expressed genes in the testis and in the gynogen (double haploid female) allowed the identification of 5,450 genes that are preferentially expressed in the testis, providing a pool of putative male-biased genes. Gene ontology and annotation analysis suggested that many of these male-biased genes were involved in gonadogenesis, spermatogenesis, testicular determination, gametogenesis, gonad differentiation, and possibly sex determination. CONCLUSION/SIGNIFICANCE: We provide the first transcriptome-level analysis of the catfish testis. Our analysis would lay the basis for sequential follow-up studies of genes involved in sex determination and differentiation in catfish.
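
    The N50 length quoted above is a standard assembly summary: the contig length at which contigs of that length or longer contain at least half of the assembled bases. A minimal Python sketch follows; the contig lengths in the example are made up and are not taken from the catfish assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    Contigs are visited from longest to shortest; the N50 is the length of
    the contig at which the running total first reaches half of all bases.
    """
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("n50() requires at least one contig")

# Hypothetical contig lengths in bp (total 20; half is reached at the 5 bp contig).
example = [2, 3, 4, 5, 6]
print(n50(example))
```

    Because the statistic is weighted by bases rather than by contig count, a few long contigs can raise N50 even when most contigs are short.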

  2. Ancestry prediction in Singapore population samples using the Illumina ForenSeq kit.

    Science.gov (United States)

    Ramani, Anantharaman; Wong, Yongxun; Tan, Si Zhen; Shue, Bing Hong; Syn, Christopher

    2017-11-01

    The ability to predict bio-geographic ancestry can be valuable to generate investigative leads towards solving crimes. Ancestry informative marker (AIM) sets include large numbers of SNPs to predict an ancestral population. Massively parallel sequencing has enabled forensic laboratories to genotype a large number of such markers in a single assay. Illumina's ForenSeq DNA Signature Kit includes the ancestry informative SNPs reported by Kidd et al. In this study, the ancestry prediction capabilities of the ForenSeq kit through sequencing on the MiSeq FGx were evaluated in 1030 unrelated Singapore population samples of Chinese, Malay and Indian origin. A total of 59 ancestry SNPs and phenotypic SNPs with AIM properties were selected. The bio-geographic ancestry of the 1030 samples, as predicted by Illumina's ForenSeq Universal Analysis Software (UAS), was determined. 712 of the genotyped samples were used as a training sample set for the generation of an ancestry prediction model using STRUCTURE and Snipper. The performance of the prediction model was tested by both methods with the remaining 318 samples. Ancestry prediction in UAS was able to correctly classify the Singapore Chinese as part of the East Asian cluster, while Indians clustered with Ad-mixed Americans and Malays clustered in-between these two reference populations. Principal component analyses showed that the 59 SNPs were only able to account for 26% of the variation between the Singapore sub-populations. Their discriminatory potential was also found to be lower (G_ST = 0.085) than that reported in ALFRED (F_ST = 0.357). The Snipper algorithm was able to correctly predict bio-geographic ancestry in 91% of Chinese and Indian, and 88% of Malay individuals, while the success rates for the STRUCTURE algorithm were 94% in Chinese, 80% in Malay, and 91% in Indian individuals. Both these algorithms were able to provide admixture proportions when present.
Ancestry prediction accuracy (in terms of likelihood ratio

  3. The immobilization of heavy metals in soil by bioaugmentation of a UV-mutant Bacillus subtilis 38 assisted by NovoGro biostimulation and changes of soil microbial community.

    Science.gov (United States)

    Wang, Ting; Sun, Hongwen; Mao, Hongjun; Zhang, Yanfeng; Wang, Cuiping; Zhang, Zhiyuan; Wang, Baolin; Sun, Lei

    2014-08-15

    Bacillus subtilis 38 (B38) is a mutant species of Bacillus subtilis acquired by UV irradiation with high cadmium tolerance. This study revealed that B38 was a good biosorbent for the adsorption of multiple heavy metals (cadmium, chromium, mercury, and lead). Simultaneous application of B38 and NovoGro (SNB) exhibited a synergetic effect on the immobilization of heavy metals in soil. The heavy metal concentrations in the edible part of the tested plants (lettuce, radish, and soybean) under SNB treatment decreased by 55.4-97.9% compared to the control. Three single extraction methods, diethylenetriaminepentaacetic acid (DTPA), Mehlich 3 (M3), and the first step of the Community Bureau of Reference method (BCR1), showed good predictive capacities for metal bioavailability to leafy, rhizome, and leguminous plants, respectively. The polymerase chain reaction-denaturing gradient gel electrophoresis (PCR-DGGE) profiles revealed that NovoGro could enhance the proliferation of both exotic B38 and native microbes. Finally, the technology was tested in the field: the reduction in heavy metal concentrations in the edible part of radish ranged from 30.8% to 96.0% after bioremediation by SNB treatment. This study provides a practical strategy for the remediation of farmland contaminated by multiple heavy metals.

  4. Carbapenemase-producing Organism in Food

    Centers for Disease Control (CDC) Podcasts

    2014-08-06

    Dr. Mike Miller reads an abridged version of the article, Carbapenemase-producing Organism in Food.  Created: 8/6/2014 by National Center for Emerging and Zoonotic Infectious Diseases (NCEZID).   Date Released: 8/13/2014.

  5. RNA-Seq analysis and gene discovery of Andrias davidianus using Illumina short read sequencing.

    Directory of Open Access Journals (Sweden)

    Fenggang Li

    Full Text Available The Chinese giant salamander, Andrias davidianus, is an important species in the course of evolution; however, there is insufficient genomic data in public databases for understanding its immunologic mechanisms. High-throughput transcriptome sequencing is necessary to generate an enormous number of transcript sequences from A. davidianus for gene discovery. In this study, we generated more than 40 million reads from samples of spleen and skin tissue using the Illumina paired-end sequencing technology. De novo assembly yielded 87,297 transcripts with a mean length of 734 base pairs (bp). Based on sequence similarity searches against known proteins, 38,916 genes were identified. Gene enrichment analysis determined that 981 transcripts were assigned to the immune system. Tissue-specific expression analysis indicated that 443 transcripts were specifically expressed in the spleen and skin. Among these transcripts, 147 were found to be involved in immune responses and inflammatory reactions, such as fucolectin, β-defensins and lymphotoxin beta. Eight tissue-specific genes were selected for validation using real-time reverse transcription quantitative PCR (qRT-PCR). The results showed that these genes were significantly more expressed in spleen and skin than in other tissues, suggesting that these genes have vital roles in the immune response. This work provides a comprehensive genomic sequence resource for A. davidianus and lays the foundation for future research on the immunologic and disease resistance mechanisms of A. davidianus and other amphibians.

  6. Biosynthesis of platelet activating factor (PAF) via alternate pathways: subcellular distribution of products in HL-60 cells

    International Nuclear Information System (INIS)

    Record, M.; Snyder, F.

    1986-01-01

    Final steps in the biosynthesis of PAF can be catalyzed by two different routes: CDP-choline:1-alkyl-2-acetyl-Gro cholinephosphotransferase [dithiothreitol (DTT)-insensitive] or acetyl-CoA:1-alkyl-2-lyso-GroPCho acetyltransferase. The authors have investigated the conversion of tritium-labeled 1-alkyl-2-acetyl-Gro and 1-alkyl-2-lyso-GroPCho (lyso-PAF) to PAF and other lipid products in HL-60 cells and in subcellular organelles isolated by centrifugation in a Percoll gradient. When cells are incubated with the labeled precursors (2 μM), the total amount of labeled PAF and 1-alkyl-2-acyl-GroPCho formed was similar from both precursors (60 pmol from 1-alkyl-2-acetyl-Gro and 50 pmol from lyso-PAF). However, PAF formed from 1-alkyl-2-acetyl-Gro represented 70% of the total products, whereas with lyso-PAF the major labeled product was 1-alkyl-2-acyl-GroPCho. Formation of PAF from 1-[3H]alkyl-2-acetyl-Gro was linear to at least 30 min at 20 °C. After a 15-min incubation of this neutral lipid with HL-60 cells, the labeled PAF produced was located exclusively in the plasma membrane fraction as opposed to the label in the 1-alkyl-2-acyl-GroPCho, which was found only in the endoplasmic reticulum; none of the labeled PAF product was released to the media. The authors' results suggest PAF might be synthesized by the DTT-insensitive cholinephosphotransferase at the site of the plasma membrane in HL-60 cells.

  7. Specialist medical training in Great Britain, or the "modernisation of medical careers" [Modernising Medical Careers]

    Directory of Open Access Journals (Sweden)

    du Moulin, Marcel

    2007-05-01

    Full Text Available This paper analyses the reform of postgraduate medical education introduced in the United Kingdom in 2003 under the heading "Modernising Medical Careers". This nationally standardised and structured postgraduate medical training was started in 2005. The analysis includes the transition from undergraduate to postgraduate education, the new responsibilities in postgraduate education, the application process and the mode of distribution of the positions, and the relationship of posts with and without postgraduate training opportunities. Possible consequences for postgraduate training and for the trainees are discussed.

  8. Combining laser microdissection and RNA-seq to chart the transcriptional landscape of fungal development

    Science.gov (United States)

    2012-01-01

    Background: During sexual development, filamentous ascomycetes form complex, three-dimensional fruiting bodies for the protection and dispersal of sexual spores. Fruiting bodies contain a number of cell types not found in vegetative mycelium, and these morphological differences are thought to be mediated by changes in gene expression. However, little is known about the spatial distribution of gene expression in fungal development. Here, we used laser microdissection (LM) and RNA-seq to determine gene expression patterns in young fruiting bodies (protoperithecia) and non-reproductive mycelia of the ascomycete Sordaria macrospora. Results: Quantitative analysis showed major differences in the gene expression patterns between protoperithecia and total mycelium. Among the genes strongly up-regulated in protoperithecia were the pheromone precursor genes ppg1 and ppg2. The up-regulation was confirmed by fluorescence microscopy of egfp expression under the control of ppg1 regulatory sequences. RNA-seq analysis of protoperithecia from the sterile mutant pro1 showed that many genes that are differentially regulated in these structures are under the genetic control of transcription factor PRO1. Conclusions: We have generated transcriptional profiles of young fungal sexual structures using a combination of LM and RNA-seq. This allowed a high spatial resolution and sensitivity, and yielded a detailed picture of gene expression during development. Our data revealed significant differences in gene expression between protoperithecia and non-reproductive mycelia, and showed that the transcription factor PRO1 is involved in the regulation of many genes expressed specifically in sexual structures. The LM/RNA-seq approach will also be relevant to other eukaryotic systems in which multicellular development is investigated. PMID:23016559

  9. Combining laser microdissection and RNA-seq to chart the transcriptional landscape of fungal development

    Directory of Open Access Journals (Sweden)

    Teichert Ines

    2012-09-01

    Full Text Available Background: During sexual development, filamentous ascomycetes form complex, three-dimensional fruiting bodies for the protection and dispersal of sexual spores. Fruiting bodies contain a number of cell types not found in vegetative mycelium, and these morphological differences are thought to be mediated by changes in gene expression. However, little is known about the spatial distribution of gene expression in fungal development. Here, we used laser microdissection (LM) and RNA-seq to determine gene expression patterns in young fruiting bodies (protoperithecia) and non-reproductive mycelia of the ascomycete Sordaria macrospora. Results: Quantitative analysis showed major differences in the gene expression patterns between protoperithecia and total mycelium. Among the genes strongly up-regulated in protoperithecia were the pheromone precursor genes ppg1 and ppg2. The up-regulation was confirmed by fluorescence microscopy of egfp expression under the control of ppg1 regulatory sequences. RNA-seq analysis of protoperithecia from the sterile mutant pro1 showed that many genes that are differentially regulated in these structures are under the genetic control of transcription factor PRO1. Conclusions: We have generated transcriptional profiles of young fungal sexual structures using a combination of LM and RNA-seq. This allowed a high spatial resolution and sensitivity, and yielded a detailed picture of gene expression during development. Our data revealed significant differences in gene expression between protoperithecia and non-reproductive mycelia, and showed that the transcription factor PRO1 is involved in the regulation of many genes expressed specifically in sexual structures. The LM/RNA-seq approach will also be relevant to other eukaryotic systems in which multicellular development is investigated.

  10. Transcriptator: An Automated Computational Pipeline to Annotate Assembled Reads and Identify Non Coding RNA.

    Directory of Open Access Journals (Sweden)

    Kumar Parijat Tripathi

    Full Text Available RNA-seq is a new tool to measure RNA transcript counts using high-throughput sequencing with extraordinary accuracy. It provides quantitative means to explore the transcriptome of an organism of interest. However, interpreting this extremely large volume of data as biological knowledge is a problem, and biologist-friendly tools are lacking. In our lab, we developed Transcriptator, a web application based on a computational Python pipeline with a user-friendly Java interface. This pipeline uses the web services available for the BLAST (Basic Local Alignment Search Tool), QuickGO and DAVID (Database for Annotation, Visualization and Integrated Discovery) tools. It offers a report on statistical analysis of functional and Gene Ontology (GO) annotation enrichment. It helps users to identify enriched biological themes, particularly GO terms, pathways, domains, gene/protein features and information related to protein-protein interactions. It clusters the transcripts based on functional annotations and generates a tabular report of functional and gene ontology annotations for each transcript submitted to the web server. The implementation of QuickGO web services in our pipeline enables users to carry out GO-Slim analysis, whereas the integration of PORTRAIT (Prediction of transcriptomic non-coding RNA (ncRNA) by ab initio methods) helps to identify non-coding RNAs and their regulatory role in the transcriptome. In summary, Transcriptator is useful software for both NGS and array data. It helps users to characterize the de novo assembled reads obtained from NGS experiments for non-referenced organisms, and it also performs functional enrichment analysis of differentially expressed transcripts/genes for both RNA-seq and microarray experiments. It generates easy-to-read tables and interactive charts for better understanding of the data. The pipeline is modular in nature and provides an opportunity to add new plugins in the future.
Web application is

  11. Reading comprehension and reading related abilities in adolescents with reading disabilities and attention-deficit/hyperactivity disorder.

    Science.gov (United States)

    Ghelani, Karen; Sidhu, Robindra; Jain, Umesh; Tannock, Rosemary

    2004-11-01

    Reading comprehension is a very complex task that requires different cognitive processes and reading abilities over the life span. There are fewer studies of reading comprehension relative to investigations of word reading abilities. Reading comprehension difficulties, however, have been identified in two common and frequently overlapping childhood disorders: reading disability (RD) and attention-deficit/hyperactivity disorder (ADHD). The nature of reading comprehension difficulties in these groups remains unclear. The performance of four groups of adolescents (RD, ADHD, comorbid ADHD and RD, and normal controls) was compared on reading comprehension tasks as well as on reading rate and accuracy tasks. Adolescents with RD showed difficulties across most reading tasks, although their comprehension scores were average. Adolescents with ADHD exhibited adequate single word reading abilities. Subtle difficulties were observed, however, on measures of text reading rate and accuracy as well as on silent reading comprehension, but scores remained in the average range. The comorbid group demonstrated similar difficulties to the RD group on word reading accuracy and on reading rate but experienced problems on only silent reading comprehension. Implications for reading interventions are outlined, as well as the clinical relevance for diagnosis.

  12. The Explicit Instruction of Reading Strategies: Directed Reading Thinking Activity vs. Guided Reading Strategies

    Directory of Open Access Journals (Sweden)

    Mohammad Mehdi Yazdani

    2015-05-01

    Investigating the efficiencies and deficiencies of reading strategies is one of the noticeable issues in the related theory and research in reading comprehension instruction. This study examined the impact of Directed Reading Thinking Activity (DRTA) and Guided Reading (GR) on reading comprehension. Sixty-three Iranian grade-one students at Shahed high school in the city of Bojnourd took part in the study. They were assigned to three groups: one control group and two experimental groups. The instruction lasted for ten weeks. The study used a quantitative quasi-experimental pretest-posttest control group design. The same reading comprehension test was administered as pre-test and post-test. The results were twofold. First, the instruction of learning strategies could foster reading comprehension skill. Second, while the explicit instruction of both strategies could improve the students' reading comprehension skill, Directed Reading Thinking Activity had a more significant positive effect than Guided Reading.

  13. Quantitative ChIP-Seq Normalization Reveals Global Modulation of the Epigenome

    Directory of Open Access Journals (Sweden)

    David A. Orlando

    2014-11-01

    Epigenomic profiling by chromatin immunoprecipitation coupled with massively parallel DNA sequencing (ChIP-seq) is a prevailing methodology used to investigate chromatin-based regulation in biological systems such as human disease, but the lack of an empirical methodology to enable normalization among experiments has limited the precision and usefulness of this technique. Here, we describe a method called ChIP with reference exogenous genome (ChIP-Rx) that allows one to perform genome-wide quantitative comparisons of histone modification status across cell populations using defined quantities of a reference epigenome. ChIP-Rx enables the discovery and quantification of dynamic epigenomic profiles across mammalian cells that would otherwise remain hidden using traditional normalization methods. We demonstrate the utility of this method for measuring epigenomic changes following chemical perturbations and show how reference normalization of ChIP-seq experiments enables the discovery of disease-relevant changes in histone modification occupancy.

  14. MDMA enhances "mind reading" of positive emotions and impairs "mind reading" of negative emotions.

    Science.gov (United States)

    Hysek, Cédric M; Domes, Gregor; Liechti, Matthias E

    2012-07-01

    3,4-Methylenedioxymethamphetamine (MDMA, ecstasy) increases sociability. The prosocial effects of MDMA may result from the release of the "social hormone" oxytocin and associated alterations in the processing of socioemotional stimuli. We investigated the effects of MDMA (125 mg) on the ability to infer the mental states of others from social cues of the eye region in the Reading the Mind in the Eyes Test. The study included 48 healthy volunteers (24 men, 24 women) and used a double-blind, placebo-controlled, within-subjects design. A choice reaction time test was used to exclude impairments in psychomotor function. We also measured circulating oxytocin and cortisol levels and subjective drug effects. MDMA differentially affected mind reading depending on the emotional valence of the stimuli. MDMA enhanced the accuracy of mental state decoding for positive stimuli (e.g., friendly), impaired mind reading for negative stimuli (e.g., hostile), and had no effect on mind reading for neutral stimuli (e.g., reflective). MDMA did not affect psychomotor performance, increased circulating oxytocin and cortisol levels, and produced subjective prosocial effects, including feelings of being more open, talkative, and closer to others. The shift in the ability to correctly read socioemotional information toward stimuli associated with positive emotional valence, together with the prosocial feelings elicited by MDMA, may enhance social approach behavior and sociability when MDMA is used recreationally and facilitate therapeutic relationships in MDMA-assisted psychotherapeutic settings.

  15. Developmental Relations Between Reading Comprehension and Reading Strategies

    NARCIS (Netherlands)

    Muijselaar, M.; Swart, N.M.; Steenbeek-Planting, E.G.; Droop, M.; Verhoeven, L.; de Jong, P.F.

    2017-01-01

    We examined the developmental relations between knowledge of reading strategies and reading comprehension in a longitudinal study of 312 Dutch children from the beginning of fourth grade to the end of fifth grade. Measures for reading comprehension, reading strategies, reading fluency, vocabulary,

  16. Investigation of the biological properties of Cinnulin PF in the context of diabetes: mechanistic insights by genome-wide mRNA-Seq analysis

    Directory of Open Access Journals (Sweden)

    Katherine Ververis

    2012-02-01

    The accumulating evidence of the beneficial effects of cinnamon (Cinnamomum burmanni) in type-2 diabetes, a chronic age-associated disease, has prompted the commercialisation of various supplemental forms of the spice. One such supplement, Cinnulin PF®, represents the water-soluble fraction containing relatively high levels of the double-linked procyanidin type-A polymers of flavonoids. The overall aim of this study was to utilize genome-wide mRNA-Seq analysis to characterise the changes in gene expression caused by Cinnulin PF in immortalised human keratinocytes and microvascular endothelial cells, which are relevant with respect to diabetic complications. In summary, our findings provide insights into the mechanisms of action of Cinnulin PF in diabetes and diabetic complications. More generally, we identify relevant candidate genes which could provide the basis for further investigation. To access the supplementary material to this article: ‘Supplementary tables 1–3’ please see Supplementary files under Reading Tools online.

  17. Forecasting Reading Anxiety for Promoting English-Language Reading Performance Based on Reading Annotation Behavior

    Science.gov (United States)

    Chen, Chih-Ming; Wang, Jung-Ying; Chen, Yong-Ting; Wu, Jhih-Hao

    2016-01-01

    To reduce effectively the reading anxiety of learners while reading English articles, a C4.5 decision tree, a widely used data mining technique, was used to develop a personalized reading anxiety prediction model (PRAPM) based on individual learners' reading annotation behavior in a collaborative digital reading annotation system (CDRAS). In…

  18. Seasonal differences in the testicular transcriptome profile of free-living European beavers (Castor fiber L.) determined by the RNA-Seq method.

    Directory of Open Access Journals (Sweden)

    Iwona Bogacka

    The European beaver (Castor fiber L.) is an important free-living rodent that inhabits Eurasian temperate forests. Beavers are often referred to as ecosystem engineers because they create or change existing habitats, enhance biodiversity and prepare the environment for diverse plant and animal species. Beavers are protected in most European Union countries, but their genomic background remains unknown. In this study, gene expression patterns in beaver testes and the variations in gene expression between breeding and non-breeding seasons were determined by high-throughput transcriptome sequencing. Paired-end sequencing on the Illumina HiSeq 2000 sequencer produced a total of 373.06 million high-quality reads. De novo assembly of contigs yielded 130,741 unigenes with an average length of 1,369.3 nt, an N50 value of 1,734, and an average GC content of 46.51%. A comprehensive analysis of the testicular transcriptome revealed more than 26,000 highly expressed unigenes which exhibited the highest homology with Rattus norvegicus and Ictidomys tridecemlineatus genomes. More than 8,000 highly expressed genes were found to be involved in fundamental biological processes, cellular components or molecular pathways. The study also revealed 42 genes whose regulation differed between breeding and non-breeding seasons. During the non-breeding period, the expression of 37 genes was up-regulated, and the expression of 5 genes was down-regulated relative to the breeding season. The identified genes encode molecules which are involved in signal transduction, DNA repair, stress responses, inflammatory processes, metabolism and steroidogenesis. Our results pave the way for further research into season-dependent variations in beaver testes.
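
    The assembly statistics quoted above (average length, N50, GC content) can be illustrated with a minimal sketch. The function names `n50` and `gc_content` and the toy inputs are hypothetical and are not taken from the study; the N50 convention assumed here is the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the total assembled bases.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    Assumed convention: walk the lengths from longest to shortest and
    return the length at which the running total first reaches half of
    the summed length of all sequences.
    """
    if not lengths:
        raise ValueError("need at least one sequence length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


def gc_content(sequence):
    """Return the percentage of G and C bases in a nucleotide string."""
    if not sequence:
        raise ValueError("empty sequence")
    upper = sequence.upper()
    gc = upper.count("G") + upper.count("C")
    return 100.0 * gc / len(upper)


if __name__ == "__main__":
    toy_lengths = [2, 3, 4, 5, 6, 7, 8, 9, 10]  # illustrative values only
    print(n50(toy_lengths))
    print(gc_content("ATGC"))
```

    Because N50 is weighted by sequence length, it is usually larger than the plain average length, which is consistent with the two figures reported in the abstract.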

  19. Insights into bacterioplankton community structure from Sundarbans mangrove ecoregion using Sanger and Illumina MiSeq sequencing approaches: A comparative analysis

    Directory of Open Access Journals (Sweden)

    Anwesha Ghosh

    2017-03-01

    Next-generation sequencing using platforms such as Illumina MiSeq provides a deeper insight into the structure and function of bacterioplankton communities in coastal ecosystems compared to traditional molecular techniques such as the clone library approach, which incorporates Sanger sequencing. In this study, the structure of bacterioplankton communities was investigated from two stations of the Sundarbans mangrove ecoregion using both Sanger and Illumina MiSeq sequencing approaches. The Illumina MiSeq data are available under the BioProject ID PRJNA35180 and the Sanger sequencing data under accession numbers KX014101-KX014140 (Stn1) and KX014372-KX014410 (Stn3). Proteobacteria-, Firmicutes- and Bacteroidetes-like sequences retrieved from both approaches appeared to be abundant in the studied ecosystem. The Illumina MiSeq data (2.1 GB) provided a deeper insight into the structure of bacterioplankton communities and revealed the presence of bacterial phyla such as Actinobacteria, Cyanobacteria, Tenericutes and Verrucomicrobia, which were not recovered by Sanger sequencing. A comparative analysis of bacterioplankton communities from both stations highlighted the presence of genera that appear in both stations and genera that occur exclusively in either station. However, the Sanger sequencing and Illumina MiSeq data were coherent at broader taxonomic levels. Pseudomonas, Devosia, Hyphomonas and Erythrobacter-like sequences were the abundant bacterial genera found in the studied ecosystem. Both sequencing methods showed broad coherence, although, as expected, the Illumina MiSeq data helped identify rarer bacterioplankton groups and also showed the presence of unassigned OTUs, indicating the possible presence of novel bacterioplankton in the studied mangrove ecosystem.

  20. Relating genes to function: identifying enriched transcription factors using the ENCODE ChIP-Seq significance tool.

    Science.gov (United States)

    Auerbach, Raymond K; Chen, Bin; Butte, Atul J

    2013-08-01

    Biological analysis has shifted from identifying genes and transcripts to mapping these genes and transcripts to biological functions. The ENCODE Project has generated hundreds of ChIP-Seq experiments spanning multiple transcription factors and cell lines for public use, but tools for a biomedical scientist to analyze these data are either non-existent or tailored to narrow biological questions. We present the ENCODE ChIP-Seq Significance Tool, a flexible web application leveraging public ENCODE data to identify enriched transcription factors in a gene or transcript list for comparative analyses. The ENCODE ChIP-Seq Significance Tool is written in JavaScript on the client side and has been tested on Google Chrome, Apple Safari and Mozilla Firefox browsers. Server-side scripts are written in PHP and leverage R and a MySQL database. The tool is available at http://encodeqt.stanford.edu. Contact: abutte@stanford.edu. Supplementary material is available at Bioinformatics online.

  1. StrainSeeker: fast identification of bacterial strains from raw sequencing reads using user-provided guide trees.

    Science.gov (United States)

    Roosaare, Märt; Vaher, Mihkel; Kaplinski, Lauris; Möls, Märt; Andreson, Reidar; Lepamets, Maarja; Kõressaar, Triinu; Naaber, Paul; Kõljalg, Siiri; Remm, Maido

    2017-01-01

    Fast, accurate and high-throughput identification of bacterial isolates is in great demand. The present work was conducted to investigate the possibility of identifying isolates from unassembled next-generation sequencing reads using custom-made guide trees. A tool named StrainSeeker was developed that constructs a list of specific k-mers for each node of any given Newick-format tree and enables the identification of bacterial isolates in 1-2 min. It uses a novel algorithm, which analyses the observed and expected fractions of node-specific k-mers to test the presence of each node in the sample. This allows StrainSeeker to determine where the isolate branches off the guide tree and assign it to a clade, whereas other tools assign each read to a reference genome. Using a dataset of 100 Escherichia coli isolates, we demonstrate that StrainSeeker can predict the clades of E. coli with 92% accuracy and correct tree branch assignment with 98% accuracy. Twenty-five thousand Illumina HiSeq reads are sufficient for identification of the strain. StrainSeeker is a software program that identifies bacterial isolates by assigning them to nodes or leaves of a custom-made guide tree. StrainSeeker's web interface and pre-computed guide trees are available at http://bioinfo.ut.ee/strainseeker. Source code is stored at GitHub: https://github.com/bioinfo-ut/StrainSeeker.
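
    The idea of an observed fraction of node-specific k-mers can be sketched as follows. This is only an illustration of the general concept and not StrainSeeker's algorithm: the abstract does not describe how the expected fraction or the statistical test is computed, so neither is attempted here. The helper names `kmers` and `observed_fraction` and the toy sequences are assumptions.

```python
def kmers(sequence, k):
    """Return the set of all k-length substrings of a sequence."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def observed_fraction(node_kmers, reads, k):
    """Fraction of a node's specific k-mers seen in at least one read.

    node_kmers: set of k-mers assumed to be specific to one tree node.
    reads: iterable of read sequences (strings).
    """
    if not node_kmers:
        return 0.0
    seen = set()
    for read in reads:
        # Keep only the k-mers of this read that belong to the node.
        seen |= kmers(read, k) & node_kmers
    return len(seen) / len(node_kmers)


if __name__ == "__main__":
    node = {"ACG", "CGT", "TTT"}   # hypothetical node-specific 3-mers
    sample_reads = ["ACGT"]        # hypothetical read
    print(observed_fraction(node, sample_reads, k=3))
```

    A real tool would compare such an observed fraction against the fraction expected at the given sequencing depth before deciding whether the node is present in the sample.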

  2. Genomics and systems biology of boar taint and meat quality in pigs

    DEFF Research Database (Denmark)

    Drag, Markus; Kogelman, Lisette JA; Meinert, Lene

    2015-01-01

    , economic losses associated with castrated pigs and a ban on castration in the EU effective by 2018. The main objective of the PhD project is to unravel the underlying mechanisms of BT at the genomic, transcriptomic and phenotypic levels as well as its connection with sensory meat quality (SMQ) in order...... to enable optimized breeding strategies as alternative to castration. Male pigs with different genetic merit of BT were selected and tissue from liver and testes were subjected to transcriptomic profiling by stranded paired end RNA-Seq which produced ~30 mio. reads per sample. The reads were subjected...

  3. The RNASeq-er API-a gateway to systematically updated analysis of public RNA-seq data.

    Science.gov (United States)

    Petryszak, Robert; Fonseca, Nuno A; Füllgrabe, Anja; Huerta, Laura; Keays, Maria; Tang, Y Amy; Brazma, Alvis

    2017-07-15

    The exponential growth of publicly available RNA-sequencing (RNA-Seq) data poses an increasing challenge to researchers wishing to discover, analyse and store such data, particularly those based in institutions with limited computational resources. EMBL-EBI is in an ideal position to address these challenges and to allow the scientific community easy access to not just raw, but also processed RNA-Seq data. We present a Web service to access the results of a systematically and continually updated standardized alignment as well as gene and exon expression quantification of all public bulk (and in the near future also single-cell) RNA-Seq runs in 264 species in the European Nucleotide Archive (ENA), using Representational State Transfer. The RNASeq-er API (Application Programming Interface) enables ontology-powered search for and retrieval of CRAM, bigwig and bedGraph files, gene and exon expression quantification matrices (Fragments Per Kilobase Of Exon Per Million Fragments Mapped, Transcripts Per Million, raw counts) as well as sample attributes annotated with ontology terms. To date over 270 000 RNA-Seq runs in nearly 10 000 studies (1 PB of raw FASTQ data) in 264 species in ENA have been processed and made available via the API. The RNASeq-er API can be accessed at http://www.ebi.ac.uk/fg/rnaseq/api. The commands used to analyse the data are available in supplementary materials and at https://github.com/nunofonseca/irap/wiki/iRAP-single-library. Contact: rnaseq@ebi.ac.uk; rpetry@ebi.ac.uk. Supplementary data are available at Bioinformatics online. © The Author(s) 2017. Published by Oxford University Press.
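
    The expression units named above can be related to raw counts with a short sketch. It uses the standard textbook definitions of FPKM and TPM and does not reproduce the pipeline behind the RNASeq-er API; the function names, the use of a single length per gene, and the toy numbers are assumptions made for illustration.

```python
def fpkm(counts, lengths_bp):
    """Fragments Per Kilobase of exon per Million fragments mapped.

    counts: fragments assigned to each gene.
    lengths_bp: exonic length of each gene in base pairs.
    """
    total = sum(counts)
    if total == 0:
        raise ValueError("no mapped fragments")
    # count / (length in kb) / (total in millions) == count * 1e9 / (total * length)
    return [c * 1e9 / (total * length) for c, length in zip(counts, lengths_bp)]


def tpm(counts, lengths_bp):
    """Transcripts Per Million: length-normalised rates rescaled to sum to 1e6."""
    rates = [c / length for c, length in zip(counts, lengths_bp)]
    scale = sum(rates)
    if scale == 0:
        raise ValueError("no mapped fragments")
    return [r / scale * 1e6 for r in rates]


if __name__ == "__main__":
    toy_counts = [10, 20, 30]          # illustrative values only
    toy_lengths = [1000, 2000, 3000]   # illustrative values only
    print(fpkm(toy_counts, toy_lengths))
    print(tpm(toy_counts, toy_lengths))
```

    TPM values always sum to one million within a sample, which makes them easier to compare across samples than FPKM values, whose sum depends on the length distribution of the expressed genes.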

  4. Microfluidic PCR Amplification and MiSeq Amplicon Sequencing Techniques for High-Throughput Detection and Genotyping of Human Pathogenic RNA Viruses in Human Feces, Sewage, and Oysters

    Directory of Open Access Journals (Sweden)

    Mamoru Oshiki

    2018-04-01

    Detection and genotyping of pathogenic RNA viruses in human and environmental samples are useful for monitoring the circulation and prevalence of these pathogens, whereas a conventional PCR assay followed by Sanger sequencing is time-consuming and laborious. The present study aimed to develop a high-throughput detection-and-genotyping tool for 11 human RNA viruses [Aichi virus; astrovirus; enterovirus; norovirus genogroup I (GI), GII, and GIV; hepatitis A virus; hepatitis E virus; rotavirus; sapovirus; and human parechovirus] using a microfluidic device and next-generation sequencer. Microfluidic nested PCR was carried out on a 48.48 Access Array chip, and the amplicons were recovered and used for MiSeq sequencing (Illumina, Tokyo, Japan); genotyping was conducted by homology searching and phylogenetic analysis of the obtained sequence reads. The detection limit of the 11 tested viruses ranged from 10^0 to 10^3 copies/μL in cDNA sample, corresponding to 10^1–10^4 copies/mL-sewage, 10^5–10^8 copies/g-human feces, and 10^2–10^5 copies/g-digestive tissues of oyster. The developed assay was successfully applied for simultaneous detection and genotyping of RNA viruses in samples of human feces, sewage, and artificially contaminated oysters. Microfluidic nested PCR followed by MiSeq sequencing enables efficient tracking of the fate of multiple RNA viruses in various environments, which is essential for a better understanding of the circulation of human pathogenic RNA viruses in the human population.

  5. Child-centered reading intervention: See, talk, dictate, read, write!

    Directory of Open Access Journals (Sweden)

    Muhammet BAŞTUĞ

    2016-06-01

    Poor reading achievement of children in elementary schools has been one of the major concerns in education. The aim of this study is to examine the effectiveness of a child-centered reading intervention in eliminating the reading problems of a student with poor reading achievement. The research was conducted with a student having difficulty in reading. A reading intervention was designed that targeted multiple areas of reading and aimed to improve reading skills through the use of multiple strategies. This intervention is child-centered and includes visual aids, talking, dictating, reading and writing stages. The study was performed in 35 sessions consisting of stages of a single sentence (5 sessions), two sentences (5 sessions), three sentences (20 sessions) and the text stage (5 sessions). The intervention sessions were audio-taped. These recordings and the written responses to the reading comprehension questions provided the data for analysis. The findings on the reading intervention revealed positive outcomes. The student exhibited certain improvements at the levels of reading, reading rate and reading comprehension. These results were discussed in light of the literature, and the findings suggest that child-centered reading strategies such as talking, dictating and writing should be the main focus of instruction for students with low reading literacy achievement to enable these students to meet the demands of the curriculum.

  6. Multiplexed ChIP-Seq Using Direct Nucleosome Barcoding: A Tool for High-Throughput Chromatin Analysis.

    Science.gov (United States)

    Chabbert, Christophe D; Adjalley, Sophie H; Steinmetz, Lars M; Pelechano, Vicent

    2018-01-01

    Chromatin immunoprecipitation followed by sequencing (ChIP-Seq) or microarray hybridization (ChIP-on-chip) are standard methods for the study of transcription factor binding sites and histone chemical modifications. However, these approaches only allow profiling of a single factor or protein modification at a time. In this chapter, we present Bar-ChIP, a higher throughput version of ChIP-Seq that relies on the direct ligation of molecular barcodes to chromatin fragments. Bar-ChIP enables the concurrent profiling of multiple DNA-protein interactions and is therefore amenable to experimental scale-up, without the need for any robotic instrumentation.

  7. A comparative study of techniques for differential expression analysis on RNA-Seq data.

    Directory of Open Access Journals (Sweden)

    Zong Hong Zhang

    Recent advances in next-generation sequencing technology allow high-throughput cDNA sequencing (RNA-Seq) to be widely applied in transcriptomic studies, in particular for detecting differentially expressed genes between groups. Many software packages have been developed for the identification of differentially expressed genes (DEGs) between treatment groups based on RNA-Seq data. However, there is a lack of consensus on how to approach an optimal study design and choice of suitable software for the analysis. In this comparative study we evaluate the performance of three of the most frequently used software tools: Cufflinks-Cuffdiff2, DESeq and edgeR. A number of important parameters of RNA-Seq technology were taken into consideration, including the number of replicates, sequencing depth, and balanced vs. unbalanced sequencing depth within and between groups. We benchmarked results relative to sets of DEGs identified through either quantitative RT-PCR or microarray. We observed that edgeR performs slightly better than DESeq and Cuffdiff2 in terms of the ability to uncover true positives. Overall, DESeq or taking the intersection of DEGs from two or more tools is recommended if the number of false positives is a major concern in the study. In other circumstances, edgeR is slightly preferable for differential expression analysis at the expense of potentially introducing more false positives.
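
    The recommendation to take the intersection of DEGs from two or more tools can be sketched with plain set operations. The function name `consensus_degs`, the `min_tools` parameter, and the gene identifiers below are hypothetical; the DEG lists are assumed to have been exported from each tool beforehand.

```python
from collections import Counter


def consensus_degs(calls, min_tools=2):
    """Return genes called differentially expressed by at least `min_tools` tools.

    calls: mapping of tool name -> iterable of gene identifiers that the
           tool reported as differentially expressed.
    """
    votes = Counter()
    for genes in calls.values():
        # set() ensures that a gene listed twice by one tool counts once.
        votes.update(set(genes))
    return {gene for gene, n in votes.items() if n >= min_tools}


if __name__ == "__main__":
    toy_calls = {  # illustrative gene lists, not results from the study
        "edgeR": {"geneA", "geneB", "geneC"},
        "DESeq": {"geneB", "geneC"},
        "Cuffdiff2": {"geneC", "geneD"},
    }
    print(sorted(consensus_degs(toy_calls, min_tools=2)))
    print(sorted(consensus_degs(toy_calls, min_tools=3)))
```

    Raising `min_tools` trades sensitivity for fewer false positives, which mirrors the trade-off described in the abstract.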

  8. Methylation Sensitive Amplification Polymorphism Sequencing (MSAP-Seq)-A Method for High-Throughput Analysis of Differentially Methylated CCGG Sites in Plants with Large Genomes.

    Science.gov (United States)

    Chwialkowska, Karolina; Korotko, Urszula; Kosinska, Joanna; Szarejko, Iwona; Kwasniewski, Miroslaw

    2017-01-01

    Epigenetic mechanisms, including histone modifications and DNA methylation, mutually regulate chromatin structure, maintain genome integrity, and affect gene expression and transposon mobility. Variations in DNA methylation within plant populations, as well as methylation in response to internal and external factors, are of increasing interest, especially in the crop research field. Methylation Sensitive Amplification Polymorphism (MSAP) is one of the most commonly used methods for assessing DNA methylation changes in plants. This method involves gel-based visualization of PCR fragments from selectively amplified DNA that are cleaved using methylation-sensitive restriction enzymes. In this study, we developed and validated a new method based on the conventional MSAP approach called Methylation Sensitive Amplification Polymorphism Sequencing (MSAP-Seq). We improved the MSAP-based approach by replacing the conventional separation of amplicons on polyacrylamide gels with direct, high-throughput sequencing using Next Generation Sequencing (NGS) and automated data analysis. MSAP-Seq allows for global sequence-based identification of changes in DNA methylation. This technique was validated in Hordeum vulgare. However, MSAP-Seq can be straightforwardly implemented in different plant species, including crops with large, complex and highly repetitive genomes. The incorporation of high-throughput sequencing into MSAP-Seq enables parallel and direct analysis of DNA methylation in hundreds of thousands of sites across the genome. MSAP-Seq provides direct genomic localization of changes and enables quantitative evaluation. We have shown that the MSAP-Seq method specifically targets gene-containing regions and that a single analysis can cover three-quarters of all genes in large genomes. Moreover, MSAP-Seq's simplicity, cost effectiveness, and high-multiplexing capability make this method highly affordable. Therefore, MSAP-Seq can be used for DNA methylation analysis in crop

  9. IMPROVING STUDENTS’ READING COMPREHENSION THROUGH INTERACTIVE READ-ALOUD TECHNIQUE

    Directory of Open Access Journals (Sweden)

    Edi Santoso

    2015-10-01

    The present study, entitled Improving Students’ Reading Comprehension through Interactive Read-Aloud, attempts to unlock problems found in teaching and reading comprehension through interactive read-aloud in a Senior High School of Sport (SMAN Olah Raga Lampung) in Metro. The findings revealed that students’ reading comprehension improved through interactive read-aloud. The improvement can be seen from the increase in test results, meaning construction, and motivation. The process of reading activities showed that the teacher’s gesture and body language, 20 questions, and explain-and-guess activities helped the students construct meaning from the given texts. In addition, interactive read-aloud is effective in boosting students’ motivation to comprehend the texts. Key words: Reading comprehension, interactive read-aloud.

  10. Reading faster

    Directory of Open Access Journals (Sweden)

    Paul Nation

    2009-12-01

    This article describes the visual nature of the reading process as it relates to reading speed. It points out that there is a physical limit on normal reading speed and beyond this limit the reading process will be different from normal reading where almost every word is attended to. The article describes a range of activities for developing reading fluency, and suggests how the development of fluency can become part of a reading programme.

  11. SeqWare Query Engine: storing and searching sequence data in the cloud

    Directory of Open Access Journals (Sweden)

    Merriman Barry

    2010-12-01

    Background: Since the introduction of next-generation DNA sequencers the rapid increase in sequencer throughput, and associated drop in costs, has resulted in more than a dozen human genomes being resequenced over the last few years. These efforts are merely a prelude for a future in which genome resequencing will be commonplace for both biomedical research and clinical applications. The dramatic increase in sequencer output strains all facets of computational infrastructure, especially databases and query interfaces. The advent of cloud computing, and a variety of powerful tools designed to process petascale datasets, provide a compelling solution to these ever increasing demands. Results: In this work, we present the SeqWare Query Engine which has been created using modern cloud computing technologies and designed to support databasing information from thousands of genomes. Our backend implementation was built using the highly scalable, NoSQL HBase database from the Hadoop project. We also created a web-based frontend that provides both a programmatic and interactive query interface and integrates with widely used genome browsers and tools. Using the query engine, users can load and query variants (SNVs, indels, translocations, etc) with a rich level of annotations including coverage and functional consequences. As a proof of concept we loaded several whole genome datasets including the U87MG cell line. We also used a glioblastoma multiforme tumor/normal pair to both profile performance and provide an example of using the Hadoop MapReduce framework within the query engine. This software is open source and freely available from the SeqWare project (http://seqware.sourceforge.net). Conclusions: The SeqWare Query Engine provided an easy way to make the U87MG genome accessible to programmers and non-programmers alike.
This enabled a faster and more open exploration of results, quicker tuning of parameters for heuristic variant calling filters

  12. SeqWare Query Engine: storing and searching sequence data in the cloud

    Science.gov (United States)

    2010-01-01

    Background: Since the introduction of next-generation DNA sequencers the rapid increase in sequencer throughput, and associated drop in costs, has resulted in more than a dozen human genomes being resequenced over the last few years. These efforts are merely a prelude for a future in which genome resequencing will be commonplace for both biomedical research and clinical applications. The dramatic increase in sequencer output strains all facets of computational infrastructure, especially databases and query interfaces. The advent of cloud computing, and a variety of powerful tools designed to process petascale datasets, provide a compelling solution to these ever increasing demands. Results: In this work, we present the SeqWare Query Engine which has been created using modern cloud computing technologies and designed to support databasing information from thousands of genomes. Our backend implementation was built using the highly scalable, NoSQL HBase database from the Hadoop project. We also created a web-based frontend that provides both a programmatic and interactive query interface and integrates with widely used genome browsers and tools. Using the query engine, users can load and query variants (SNVs, indels, translocations, etc) with a rich level of annotations including coverage and functional consequences. As a proof of concept we loaded several whole genome datasets including the U87MG cell line. We also used a glioblastoma multiforme tumor/normal pair to both profile performance and provide an example of using the Hadoop MapReduce framework within the query engine. This software is open source and freely available from the SeqWare project (http://seqware.sourceforge.net). Conclusions: The SeqWare Query Engine provided an easy way to make the U87MG genome accessible to programmers and non-programmers alike. This enabled a faster and more open exploration of results, quicker tuning of parameters for heuristic variant calling filters, and a common data

  13. SeqWare Query Engine: storing and searching sequence data in the cloud.

    Science.gov (United States)

    O'Connor, Brian D; Merriman, Barry; Nelson, Stanley F

    2010-12-21

    Since the introduction of next-generation DNA sequencers the rapid increase in sequencer throughput, and associated drop in costs, has resulted in more than a dozen human genomes being resequenced over the last few years. These efforts are merely a prelude for a future in which genome resequencing will be commonplace for both biomedical research and clinical applications. The dramatic increase in sequencer output strains all facets of computational infrastructure, especially databases and query interfaces. The advent of cloud computing, and a variety of powerful tools designed to process petascale datasets, provide a compelling solution to these ever increasing demands. In this work, we present the SeqWare Query Engine which has been created using modern cloud computing technologies and designed to support databasing information from thousands of genomes. Our backend implementation was built using the highly scalable, NoSQL HBase database from the Hadoop project. We also created a web-based frontend that provides both a programmatic and interactive query interface and integrates with widely used genome browsers and tools. Using the query engine, users can load and query variants (SNVs, indels, translocations, etc) with a rich level of annotations including coverage and functional consequences. As a proof of concept we loaded several whole genome datasets including the U87MG cell line. We also used a glioblastoma multiforme tumor/normal pair to both profile performance and provide an example of using the Hadoop MapReduce framework within the query engine. This software is open source and freely available from the SeqWare project (http://seqware.sourceforge.net). The SeqWare Query Engine provided an easy way to make the U87MG genome accessible to programmers and non-programmers alike. This enabled a faster and more open exploration of results, quicker tuning of parameters for heuristic variant calling filters, and a common data interface to simplify development of

  14. A Comparison of Reading Response Methods to Increase Student Learning

    Directory of Open Access Journals (Sweden)

    Cheryl J. Davis

    2016-01-01

    It is common in college courses to test students on the required readings for that course. With a rise in online education it is often the case that students are required to provide evidence of reading the material. However, there is little empirical research stating the best written means to assess that students read the materials. This study experimentally compared the effect of assigned reading summaries or study questions on student test performance. The results revealed that study questions produced higher quiz scores and higher preparation for the quiz, based on student feedback. Limitations of the study included a small sample size and extraneous activities that may have affected general knowledge on a topic. Results suggest that study questions focusing students on critical information in the required readings improve student learning.

  15. Evaluation of normalization methods in mammalian microRNA-Seq data

    Science.gov (United States)

    Garmire, Lana Xia; Subramaniam, Shankar

    2012-01-01

    Simple total tag count normalization is inadequate for microRNA sequencing data generated from the next generation sequencing technology. However, so far systematic evaluation of normalization methods on microRNA sequencing data is lacking. We comprehensively evaluate seven commonly used normalization methods including global normalization, Lowess normalization, Trimmed Mean Method (TMM), quantile normalization, scaling normalization, variance stabilization, and invariant method. We assess these methods on two individual experimental data sets with the empirical statistical metrics of mean square error (MSE) and Kolmogorov-Smirnov (K-S) statistic. Additionally, we evaluate the methods with results from quantitative PCR validation. Our results consistently show that Lowess normalization and quantile normalization perform the best, whereas TMM, a method applied to the RNA-Sequencing normalization, performs the worst. The poor performance of TMM normalization is further evidenced by abnormal results from the test of differential expression (DE) of microRNA-Seq data. Compared with the models used for DE, the choice of normalization method is the primary factor that affects the results of DE. In summary, Lowess normalization and quantile normalization are recommended for normalizing microRNA-Seq data, whereas the TMM method should be used with caution. PMID:22532701
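
    As an illustration of one of the recommended methods, quantile normalization can be sketched as below. This is a minimal pure-Python sketch under stated assumptions, not the authors' implementation: the function name `quantile_normalize` and the toy counts are hypothetical, and ties are broken by position rather than averaged, which is a simplification.

```python
def quantile_normalize(samples):
    """Quantile-normalize equal-length count vectors (one list per sample).

    Assumption: ties within a sample are broken by feature position,
    not averaged as some implementations do.
    """
    n = len(samples[0])
    if any(len(s) != n for s in samples):
        raise ValueError("all samples must have the same number of features")

    # Sort each sample, then average across samples at each rank.
    sorted_samples = [sorted(s) for s in samples]
    rank_means = [
        sum(col[rank] for col in sorted_samples) / len(samples)
        for rank in range(n)
    ]

    normalized = []
    for s in samples:
        # Feature indices ordered from the smallest to the largest value.
        order = sorted(range(n), key=lambda i: s[i])
        out = [0.0] * n
        for rank, idx in enumerate(order):
            # Replace each value with the cross-sample mean for its rank.
            out[idx] = rank_means[rank]
        normalized.append(out)
    return normalized


# Toy example: two samples, three features each (hypothetical counts).
toy = [[5, 2, 3], [4, 1, 6]]
result = quantile_normalize(toy)
```

    After normalization every sample shares the same set of values, so differences in overall count distribution between samples are removed while each feature keeps its within-sample rank.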

  16. SOAPsplice: genome-wide ab initio detection of splice junctions from RNA-Seq data

    Directory of Open Access Journals (Sweden)

    Songbo Huang

    2011-07-01

    RNA-Seq, a method using next generation sequencing technologies to sequence the transcriptome, facilitates genome-wide analysis of splice junction sites. In this paper, we introduce SOAPsplice, a robust tool to detect splice junctions using RNA-Seq data without using any information of known splice junctions. SOAPsplice uses a novel two-step approach consisting of first identifying as many reasonable splice junction candidates as possible, and then, filtering the false positives with two effective filtering strategies. In both simulated and real datasets, SOAPsplice is able to detect many reliable splice junctions with low false positive rate. The improvement gained by SOAPsplice, when compared to other existing tools, becomes more obvious when the depth of sequencing is low. SOAPsplice is freely available at http://soap.genomics.org.cn/soapsplice.html.

  17. Promoting preschool reading

    OpenAIRE

    Istenič, Vesna

    2013-01-01

    The thesis titled Promoting preschool reading consists of a theoretical and an empirical part. In the theoretical part I wrote about reading, the importance of reading, types of reading, about reading motivation, promoting reading motivation, internal and external motivation, influence of reading motivation on the child's reading activity, reading and familial literacy, the role of adults in promoting reading literacy, reading to a child and promoting reading in pre-school years, where I ...

  18. Gene Ranking of RNA-Seq Data via Discriminant Non-Negative Matrix Factorization.

    Science.gov (United States)

    Jia, Zhilong; Zhang, Xiang; Guan, Naiyang; Bo, Xiaochen; Barnes, Michael R; Luo, Zhigang

    2015-01-01

    RNA-sequencing is rapidly becoming the method of choice for studying the full complexity of transcriptomes; however, with increasing dimensionality, accurate gene ranking is becoming increasingly challenging. This paper proposes an accurate and sensitive gene ranking method that implements discriminant non-negative matrix factorization (DNMF) for RNA-seq data. To the best of our knowledge, this is the first work to explore the utility of DNMF for gene ranking. When incorporating Fisher's discriminant criteria and setting the reduced dimension as two, DNMF learns two factors to approximate the original gene expression data, abstracting the up-regulated or down-regulated metagene by using the sample label information. The first factor denotes all the genes' weights of two metagenes as the additive combination of all genes, while the second learned factor represents the expression values of two metagenes. In the gene ranking stage, all the genes are ranked as a descending sequence according to the differential values of the metagene weights. Leveraging the nature of NMF and Fisher's criterion, DNMF can robustly boost the gene ranking performance. The Area Under the Curve analysis of differential expression analysis on two benchmarking tests of four RNA-seq data sets with similar phenotypes showed that our proposed DNMF-based gene ranking method outperforms other widely used methods. Moreover, the Gene Set Enrichment Analysis also showed that DNMF outperforms the others. DNMF is also computationally efficient, substantially outperforming all other benchmarked methods. Consequently, we suggest DNMF is an effective method for the analysis of differential gene expression and gene ranking for RNA-seq data.
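
    The ranking stage described in this abstract (ordering genes by the difference between their two metagene weights) can be sketched as below. This is a hedged illustration only: it assumes the factorization has already produced one pair of weights per gene, and the function name `rank_genes`, the gene labels, and the weight values are all hypothetical. The DNMF factorization itself is not shown.

```python
def rank_genes(weights):
    """Rank genes in descending order of (up weight - down weight).

    `weights` maps a gene name to a pair (w_up, w_down): the gene's weight
    in the up-regulated and down-regulated metagene. These are assumed to
    come from a factorization step that is not implemented here.
    """
    diffs = {gene: w_up - w_down for gene, (w_up, w_down) in weights.items()}
    return sorted(diffs, key=lambda gene: diffs[gene], reverse=True)


# Hypothetical metagene weights for three genes.
toy_weights = {
    "geneA": (0.9, 0.1),
    "geneB": (0.2, 0.7),
    "geneC": (0.5, 0.4),
}
ranking = rank_genes(toy_weights)
```

    Genes at the top of the list are those weighted most strongly toward the up-regulated metagene, and genes at the bottom toward the down-regulated one.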

  19. Pulmonary sequestration: a series of nine operated cases

    Directory of Open Access Journals (Sweden)

    PÊGO-FERNANDES PAULO M.

    2002-01-01

    Pulmonary sequestration is a congenital anomaly that involves the lung parenchyma and pulmonary vasculature and presents as either extralobar or intralobar. Objective: To describe the cases of pulmonary sequestration treated at InCor and the Hospital das Clínicas of FMUSP from 1987 to 1996. Method: Retrospective analysis of medical records. Results: Nine patients were treated: four women and five men; two children and seven adults. Recurrent respiratory infection and hemoptysis were frequent clinical findings in these patients. All cases were intralobar. The most common location was the left lower lobe (66%). Only one diagnosis was made intraoperatively. In the other eight cases, the diagnosis was suspected on chest radiography (100%) and confirmed by arteriography (77%) and/or computed tomography (66%). Lobectomy (77%) was the main surgical treatment, with low postoperative morbidity and no mortality. Anatomopathological examination was performed in seven cases and confirmed the disease. Conclusions: Pulmonary sequestration is an uncommon entity for which computed tomography and arteriography are the examinations that provide the most information for a definitive and safe diagnosis. Resection of the involved tissue leads to excellent results.

  20. SignalSpider: Probabilistic pattern discovery on multiple normalized ChIP-Seq signal profiles

    KAUST Repository

    Wong, Kachun; Li, Yue; Peng, Chengbin; Zhang, Zhaolei

    2014-01-01

    Motivation: Chromatin immunoprecipitation (ChIP) followed by high-throughput sequencing (ChIP-Seq) measures the genome-wide occupancy of transcription factors in vivo. Different combinations of DNA-binding protein occupancies may result in a gene

  1. RNA CoMPASS: a dual approach for pathogen and host transcriptome analysis of RNA-seq datasets.

    Directory of Open Access Journals (Sweden)

    Guorong Xu

    High-throughput RNA sequencing (RNA-seq) has become an instrumental assay for the analysis of multiple aspects of an organism's transcriptome. Further, the analysis of a biological specimen's associated microbiome can also be performed using RNA-seq data and this application is gaining interest in the scientific community. There are many existing bioinformatics tools designed for analysis and visualization of transcriptome data. Despite the availability of an array of next generation sequencing (NGS) analysis tools, the analysis of RNA-seq data sets poses a challenge for many biomedical researchers who are not familiar with command-line tools. Here we present RNA CoMPASS, a comprehensive RNA-seq analysis pipeline for the simultaneous analysis of transcriptomes and metatranscriptomes from diverse biological specimens. RNA CoMPASS leverages existing tools and parallel computing technology to facilitate the analysis of even very large datasets. RNA CoMPASS has a web-based graphical user interface with intrinsic queuing to control a distributed computational pipeline. RNA CoMPASS was evaluated by analyzing RNA-seq data sets from 45 B-cell samples. Twenty-two of these samples were derived from lymphoblastoid cell lines (LCLs) generated by the infection of naïve B-cells with the Epstein Barr virus (EBV), while another 23 samples were derived from Burkitt's lymphomas (BL), some of which arose in part through infection with EBV. Appropriately, RNA CoMPASS identified EBV in all LCLs and in a fraction of the BLs. Cluster analysis of the human transcriptome component of the RNA CoMPASS output clearly separated the BLs (which have a germinal center-like phenotype) from the LCLs (which have a blast-like phenotype) with evidence of activated MYC signaling and lower interferon and NF-kB signaling in the BLs. Together, this analysis illustrates the utility of RNA CoMPASS in the simultaneous analysis of transcriptome and metatranscriptome data. RNA CoMPASS is freely

  2. Properties of the Second Outburst of the Bursting Pulsar (GRO J1744-28) as Observed with BATSE

    Science.gov (United States)

    Woods, P.; Kouveliotou, C.; vanParadijs, J.; Briggs, M. S.; Wilson, C. A.; Deal, K. J.; Harmon, B. A.; Fishman, G. J.; Lewin, W. H.; Kommers, J.

    1998-01-01

    One year after its discovery, the Bursting Pulsar (GRO J1744-28) went into outburst again, displaying the hard X-ray bursts and pulsations that make this source unique. We report on Burst and Transient Source Experiment (BATSE) observations of both the persistent and burst emission for this second outburst and draw comparisons to the first. The second outburst was smaller than the first in both duration and peak luminosity. The persistent flux, burst peak flux and burst fluence were all reduced in amplitude by a factor of approximately 1.7. Despite these differences, the average burst occurrence rate and average burst durations were roughly the same through each outburst. Similar to the first outburst, no spectral evolution was found within bursts and the parameter alpha was very small at the start of the outburst (alpha = 2.1 +/- 1.7 on 1996 December 2). Although no spectral evolution was found within individual bursts, we find evidence for a small (20%) variation of the spectral temperature during the course of the second outburst.

  3. Promoting toddlers' vegetable consumption through interactive reading and puppetry.

    Science.gov (United States)

    de Droog, Simone M; van Nee, Roselinde; Govers, Mieke; Buijzen, Moniek

    2017-09-01

    Picture books with characters that promote healthy eating are increasingly being used to make this behavior more attractive. The first aim of this study was to investigate whether the effect of vegetable-promoting picture books on toddlers' vegetable consumption differed according to the reading style and the use of a hand puppet during reading. The second aim was to investigate whether these effects were mediated by toddlers' narrative involvement and character imitation. In a 2 (reading style: interactive vs. passive) x 2 (puppet use: with vs. without puppet) between-subjects design, 163 toddlers (2-3 years) were randomly assigned to one of the four reading conditions. The story was about a rabbit that loves to eat carrots. After the fourth reading day, the eating task was conducted in which children could eat freely from four different snacks, including carrots. The main finding was that interactive reading produced the greatest carrot consumption. The explanation for this effect was that interactive reading stimulated toddlers to imitate poses of the book characters, even more when interactive reading was supported by the use of a hand puppet. The findings underline that young children should be actively involved with health interventions in order for them to be effective. Copyright © 2017 Elsevier Ltd. All rights reserved.

  4. Parents' reading-related knowledge and children's reading acquisition.

    Science.gov (United States)

    Ladd, Megan; Martin-Chang, Sandra; Levesque, Kyle

    2011-12-01

    Teacher reading-related knowledge (phonological awareness and phonics knowledge) predicts student reading; however, little is known about the reading-related knowledge of parents. Participants comprised 70 dyads (children from kindergarten and grade 1 and their parents). Parents were administered a questionnaire tapping into reading-related knowledge, print exposure, storybook reading, and general cultural knowledge. Children were tested on measures of letter-word knowledge, sound awareness, receptive vocabulary, oral expression, and mathematical skill. Parent reading-related knowledge showed significant positive links with child letter-word knowledge and sound awareness, but showed no correlations with child measures of mathematical skill or vocabulary. Furthermore, parent reading-related knowledge was not associated with parents' own print exposure or cultural knowledge, indicating that knowledge about English word structure may be separate from other cognitive skills. Implications are discussed in terms of improving parent reading-related knowledge to promote child literacy.

  5. Evaluating whole transcriptome amplification for gene profiling experiments using RNA-Seq.

    Science.gov (United States)

    Faherty, Sheena L; Campbell, C Ryan; Larsen, Peter A; Yoder, Anne D

    2015-07-30

    RNA-Seq has enabled high-throughput gene expression profiling to provide insight into the functional link between genotype and phenotype. Low quantities of starting RNA can be a severe hindrance for studies that aim to utilize RNA-Seq. To mitigate this bottleneck, whole transcriptome amplification (WTA) technologies have been developed to generate sufficient sequencing targets from minute amounts of RNA. Successful WTA requires accurate replication of transcript abundance without the loss or distortion of specific mRNAs. Here, we test the efficacy of NuGEN's Ovation RNA-Seq V2 system, which uses linear isothermal amplification with a unique chimeric primer for amplification, using white adipose tissue from standard laboratory rats (Rattus norvegicus). Our goal was to investigate potential biological artifacts introduced through WTA approaches by establishing comparisons between matched raw and amplified RNA libraries derived from biological replicates. We found that 93% of expressed genes were identical between all unamplified versus matched amplified comparisons, also finding that gene density is similar across all comparisons. Our sequencing experiment and downstream bioinformatic analyses using the Tuxedo analysis pipeline resulted in the assembly of 25,543 high-quality transcripts. Libraries constructed from raw RNA and WTA samples averaged 15,298 and 15,253 expressed genes, respectively. Although significant differentially expressed genes (P < 0.05) were identified in all matched samples, each of these represents less than 0.15% of all shared genes for each comparison. Transcriptome amplification is efficient at maintaining relative transcript frequencies with no significant bias when using this NuGEN linear isothermal amplification kit under ideal laboratory conditions as presented in this study. This methodology has broad applications, from clinical and diagnostic, to field-based studies when sample acquisition, or sample preservation, methods prove

  6. Why should I read? - A cross-cultural investigation into adolescents' reading socialisation and reading attitude

    Science.gov (United States)

    Broeder, Peter; Stokmans, Mia

    2013-06-01

    While reading behaviour of adolescents is a frequent object of research, most studies in this field are restricted to a single country. This study investigates reading as a leisure-time activity across social groups from three regions differing in reading tradition as well as in the facilities available for reading. The authors analyse the reading behaviour of a total of 2,173 adolescents in the Netherlands, in Beijing (China), and in Cape Town (South Africa). Taking Icek Ajzen's Theory of Planned Behaviour as a starting point, the authors adjusted it to model the three most important determinants of reading behaviour, namely (1) reading attitude; (2) subjective norms (implicit and explicit social pressure to read); and (3) perceived behavioural control, which includes reading proficiency and appropriateness of the available books (book supply). While they found the adjusted model to fit the Dutch and Beijing situation quite well, it appeared to be inappropriate for the Cape Town situation. Despite considerable cultural and situational differences between the Netherlands and Beijing, the results show a similar pattern for these two environments. The most important determinants turn out to be: the hedonic reading attitude, the implicit norm of family and friends, the attractiveness of the available choice of books, and the perceived reading proficiency.

  7. Single-Cell mRNA-Seq Using the Fluidigm C1 System and Integrated Fluidics Circuits.

    Science.gov (United States)

    Gong, Haibiao; Do, Devin; Ramakrishnan, Ramesh

    2018-01-01

    Single-cell mRNA-seq is a valuable tool to dissect expression profiles and to understand the regulatory network of genes. Microfluidics is well suited for single-cell analysis owing both to the small volume of the reaction chambers and to the ease of automation. Here we describe the workflow of single-cell mRNA-seq using the C1 IFC, which can isolate and process up to 96 cells. Both the on-chip procedure (lysis, reverse transcription, and preamplification PCR) and the off-chip sequencing library preparation protocols are described. The workflow generates full-length mRNA information, which is more valuable than that from 3' end counting methods for many applications.

  8. Clock domain crossing modules for OCP-style read/write interfaces

    DEFF Research Database (Denmark)

    Herlev, Mathias; Sparsø, Jens

    The open core protocol (OCP) is an openly licensed, configurable, and scalable interface protocol for on-chip subsystem communications. The protocol defines read and write transactions from a master towards a slave across a point-to-point connection and the protocol assumes a single common clock....... This paper presents the design of two OCP clock domain crossing interface modules, that can be used to construct systems with multiple clock domains. One module (called OCPio) supports a single word read-write interface and the other module (called OCPburst) supports a four word burst read-write interface......-style read-write transaction interfaces. An OCP interface typically has control signals related to both the master issuing a read or write request and the slave producing a response. If all these control signals are passed across the clock domain boundary and synchronized it may add significant latency...

  9. Estimation of particle transit kinetics parameters in grazing cattle using different sampling sequences

    Directory of Open Access Journals (Sweden)

    Detmann Edenio

    2001-01-01

    The objective of this study was to evaluate the goodness of fit of a nonlinear model in estimating the parameters of particle transit kinetics in grazing cattle, using different fecal sampling sequences. Five F1 Limousin x Nelore steers, fistulated in the esophagus and rumen, were used, grazing Brachiaria decumbens with concentrate supplementation during the rainy season. The experiment comprised three experimental periods and was conducted in a randomized block design. Mordant chromium, produced from extrusa samples, was used as the marker. The time-dependent double-exponential model was fitted to the fecal excretion curves of the marker, using the following sampling sequences: SEQ 1 - 22 collections (complete sampling sequence); SEQ 2 and SEQ 3 - 17 collections (reduction of collection points in the ascending and descending phases of the curve, respectively); and SEQ 4 and SEQ 5 - 13 and 10 fecal collections (reduction of the number of collections across the whole curve profile). The reduced sequences were produced by omitting specific points from the complete sampling sequence (SEQ 1). The comparison of the estimates of the kinetic parameters and of fecal excretion; the descriptive analysis of the number of iterations required for model convergence and of the coefficient of determination; and the lack-of-fit evaluation showed no differences among the sequences. The residual analysis, however, indicated improvements in the graphical behavior and in the sign-run profile of the residuals as the number of collections was reduced for SEQ 4 and SEQ 5. Because of a small loss of efficiency, assessed by the residual variance, observed in SEQ 5, the use of 13 fecal collections (SEQ 4) is recommended for evaluating the fecal excretion curve of the marker in studies of this nature.

  10. Methylation Sensitive Amplification Polymorphism Sequencing (MSAP-Seq)—A Method for High-Throughput Analysis of Differentially Methylated CCGG Sites in Plants with Large Genomes

    Science.gov (United States)

    Chwialkowska, Karolina; Korotko, Urszula; Kosinska, Joanna; Szarejko, Iwona; Kwasniewski, Miroslaw

    2017-01-01

    Epigenetic mechanisms, including histone modifications and DNA methylation, mutually regulate chromatin structure, maintain genome integrity, and affect gene expression and transposon mobility. Variations in DNA methylation within plant populations, as well as methylation in response to internal and external factors, are of increasing interest, especially in the crop research field. Methylation Sensitive Amplification Polymorphism (MSAP) is one of the most commonly used methods for assessing DNA methylation changes in plants. This method involves gel-based visualization of PCR fragments from selectively amplified DNA that are cleaved using methylation-sensitive restriction enzymes. In this study, we developed and validated a new method based on the conventional MSAP approach called Methylation Sensitive Amplification Polymorphism Sequencing (MSAP-Seq). We improved the MSAP-based approach by replacing the conventional separation of amplicons on polyacrylamide gels with direct, high-throughput sequencing using Next Generation Sequencing (NGS) and automated data analysis. MSAP-Seq allows for global sequence-based identification of changes in DNA methylation. This technique was validated in Hordeum vulgare. However, MSAP-Seq can be straightforwardly implemented in different plant species, including crops with large, complex and highly repetitive genomes. The incorporation of high-throughput sequencing into MSAP-Seq enables parallel and direct analysis of DNA methylation in hundreds of thousands of sites across the genome. MSAP-Seq provides direct genomic localization of changes and enables quantitative evaluation. We have shown that the MSAP-Seq method specifically targets gene-containing regions and that a single analysis can cover three-quarters of all genes in large genomes. Moreover, MSAP-Seq's simplicity, cost effectiveness, and high-multiplexing capability make this method highly affordable. Therefore, MSAP-Seq can be used for DNA methylation analysis in crop

  11. ORF Alignment: Ca19AnnotatedDec2004aaSeq [GENIUS II[Archive

    Lifescience Database Archive (English)

    Ca19AnnotatedDec2004aaSeq orf19.6649; Contig19-10251; complement(36800..38461); BR...87398.1| ... TFIIB related subunit of TFIIIB (BRF1) [Candida ... albicans] pir||B55483 transcr...L Transcription factor IIIB 70 kDa ... subunit (TFIIIB) (B-related factor)

  12. ORF Alignment: Ca19AnnotatedDec2004aaSeq [GENIUS II[Archive

    Lifescience Database Archive (English)

    Ca19AnnotatedDec2004aaSeq orf19.2029; Contig19-10139; 79190..80278; RFC5*; DNA replication factor C | lead...ing strand elongation mismatch repair ... (ATPase); >1a5t0 2 329 7 339 1e-22 ... gb|EAL00

  13. Patterns of Reading Performance in Acute Stroke: A Descriptive Analysis

    Directory of Open Access Journals (Sweden)

    Lauren L. Cloutman

    2010-01-01

    One of the main sources of information regarding the underlying processes involved in both normal and impaired reading has been the study of reading deficits that occur as a result of brain damage. However, patterns of reading deficits found acutely after brain injury have been little explored. The observed patterns of performance in chronic stroke patients might reflect reorganization of the cognitive processes underlying reading or development of compensatory strategies that are not normally used to read. Method: 112 acute left hemisphere stroke patients were administered a task of oral reading of words and pseudowords within 1–2 days of hospital admission; performance was examined for error rate and type, and compared to that on tasks involving visual lexical decision, visual/auditory comprehension, and naming. Results: Several distinct patterns of performance were identified. Although similarities were found between the patterns of reading performance observed acutely and the classical acquired dyslexias generally identified more chronically, some notable differences were observed. Of interest was the finding that no patient produced any pure semantic errors in reading, despite finding such errors in comprehension and naming.

  14. Hyb-Seq: combining target enrichment and genome skimming for plant phylogenomics

    Science.gov (United States)

    Kevin Weitemier; Shannon C.K. Straub; Richard C. Cronn; Mark Fishbein; Roswitha Schmickl; Angela McDonnell; Aaron Liston

    2014-01-01

    • Premise of the study: Hyb-Seq, the combination of target enrichment and genome skimming, allows simultaneous data collection for low-copy nuclear genes and high-copy genomic targets for plant systematics and evolution studies. • Methods and Results: Genome and transcriptome assemblies for milkweed (Asclepias syriaca) were used to design enrichment probes for 3385...

  15. Simultaneous characterization of cellular RNA structure and function with in-cell SHAPE-Seq.

    Science.gov (United States)

    Watters, Kyle E; Abbott, Timothy R; Lucks, Julius B

    2016-01-29

    Many non-coding RNAs form structures that interact with cellular machinery to control gene expression. A central goal of molecular and synthetic biology is to uncover design principles linking RNA structure to function to understand and engineer this relationship. Here we report a simple, high-throughput method called in-cell SHAPE-Seq that combines in-cell probing of RNA structure with a measurement of gene expression to simultaneously characterize RNA structure and function in bacterial cells. We use in-cell SHAPE-Seq to study the structure-function relationship of two RNA mechanisms that regulate translation in Escherichia coli. We find that nucleotides that participate in RNA-RNA interactions are highly accessible when their binding partner is absent and that changes in RNA structure due to RNA-RNA interactions can be quantitatively correlated to changes in gene expression. We also characterize the cellular structures of three endogenously expressed non-coding RNAs: 5S rRNA, RNase P and the btuB riboswitch. Finally, a comparison between in-cell and in vitro folded RNA structures revealed remarkable similarities for synthetic RNAs, but significant differences for RNAs that participate in complex cellular interactions. Thus, in-cell SHAPE-Seq represents an easily approachable tool for biologists and engineers to uncover relationships between sequence, structure and function of RNAs in the cell. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.

  16. Comparison of reading speed with 3 different log-scaled reading charts.

    Science.gov (United States)

    Buari, Noor Halilah; Chen, Ai-Hong; Musa, Nuraini

    2014-01-01

    A reading chart that resembles real reading conditions is important to evaluate the quality of life in terms of reading performance. The purpose of this study was to compare the reading speed of the UiTM Malay related words (UiTM-Mrw) reading chart with that of the MNread Acuity Chart and the Colenbrander Reading Chart. Fifty subjects with normal sight were recruited through random sampling (mean age = 22.98 ± 1.65 years). Subjects were asked to read three different near charts aloud and as quickly as possible in random sequence. The charts were the UiTM-Mrw Reading Chart, the MNread Acuity Chart and the Colenbrander Reading Chart. The time taken to read each chart was recorded and any errors while reading were noted. Reading performance was quantified in terms of reading speed as words per minute (wpm). The mean reading speed for the UiTM-Mrw Reading Chart, MNread Acuity Chart and Colenbrander Reading Chart was 200 ± 30 wpm, 196 ± 28 wpm and 194 ± 31 wpm, respectively. Comparison of reading speed between the UiTM-Mrw Reading Chart and the MNread Acuity Chart showed no significant difference (t=-0.73, p=0.72). The same was true of the comparison between the UiTM-Mrw Reading Chart and the Colenbrander Reading Chart (t=-0.97, p=0.55). Bland and Altman plots showed good agreement between the reading speed of the UiTM-Mrw Reading Chart and those of the MNread Acuity Chart and the Colenbrander Reading Chart. The UiTM-Mrw Reading Chart in the Malay language is highly comparable with standardized charts and can be used for evaluating reading speed. Copyright © 2013 Spanish General Council of Optometry. Published by Elsevier Espana. All rights reserved.
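
    The words-per-minute measure used in this abstract can be computed as sketched below. This is an illustrative sketch, not the authors' scoring procedure: the function name `reading_speed_wpm` is hypothetical, and subtracting misread words before dividing by the reading time is an assumption, since the abstract only states that errors were noted.

```python
def reading_speed_wpm(words_on_chart, errors, seconds):
    """Return reading speed in words per minute.

    Assumption: words read incorrectly are excluded from the count.
    The abstract does not say how noted errors entered the calculation.
    """
    if seconds <= 0:
        raise ValueError("reading time must be positive")
    if not 0 <= errors <= words_on_chart:
        raise ValueError("errors must be between 0 and the number of words")
    correct_words = words_on_chart - errors
    return 60.0 * correct_words / seconds


# Hypothetical readings: 60 words in 18 s without errors,
# and 60 words in 20 s with 3 misread words.
speed_clean = reading_speed_wpm(60, 0, 18)
speed_with_errors = reading_speed_wpm(60, 3, 20)
```

    With these hypothetical inputs, the first reading works out to 200 wpm and the second to 171 wpm.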

  17. esATAC: An Easy-to-use Systematic pipeline for ATAC-seq data analysis.

    Science.gov (United States)

    Wei, Zheng; Zhang, Wei; Fang, Huan; Li, Yanda; Wang, Xiaowo

    2018-03-07

    ATAC-seq is rapidly emerging as one of the major experimental approaches to probe chromatin accessibility genome-wide. Here, we present "esATAC", a highly integrated easy-to-use R/Bioconductor package, for systematic ATAC-seq data analysis. It covers the essential steps of a full analysis procedure, including raw data processing, quality control and downstream statistical analysis such as peak calling, enrichment analysis and transcription factor footprinting. esATAC supports one-command-line execution for preset pipelines and provides flexible interfaces for building customized pipelines. The esATAC package is open source under the GPL-3.0 license. It is implemented in R and C++. Source code and binaries for Linux, Mac OS X and Windows are available through Bioconductor (https://www.bioconductor.org/packages/release/bioc/html/esATAC.html). Contact: xwwang@tsinghua.edu.cn. Supplementary data are available at Bioinformatics online.

  18. Subjective Impressions Do Not Mirror Online Reading Effort: Concurrent EEG-Eyetracking Evidence from the Reading of Books and Digital Media

    Science.gov (United States)

    Kretzschmar, Franziska; Pleimling, Dominique; Hosemann, Jana; Füssel, Stephan; Bornkessel-Schlesewsky, Ina; Schlesewsky, Matthias

    2013-01-01

    In the rapidly changing circumstances of our increasingly digital world, reading is also becoming an increasingly digital experience: electronic books (e-books) are now outselling print books in the United States and the United Kingdom. Nevertheless, many readers still view e-books as less readable than print books. The present study thus used combined EEG and eyetracking measures in order to test whether reading from digital media requires higher cognitive effort than reading conventional books. Young and elderly adults read short texts on three different reading devices: a paper page, an e-reader and a tablet computer and answered comprehension questions about them while their eye movements and EEG were recorded. The results of a debriefing questionnaire replicated previous findings in that participants overwhelmingly chose the paper page over the two electronic devices as their preferred reading medium. Online measures, by contrast, showed shorter mean fixation durations and lower EEG theta band voltage density – known to covary with memory encoding and retrieval – for the older adults when reading from a tablet computer in comparison to the other two devices. Young adults showed comparable fixation durations and theta activity for all three devices. Comprehension accuracy did not differ across the three media for either group. We argue that these results can be explained in terms of the better text discriminability (higher contrast) produced by the backlit display of the tablet computer. Contrast sensitivity decreases with age and degraded contrast conditions lead to longer reading times, thus supporting the conclusion that older readers may benefit particularly from the enhanced contrast of the tablet. Our findings thus indicate that people's subjective evaluation of digital reading media must be dissociated from the cognitive and neural effort expended in online information processing while reading from such devices. PMID:23405265

  19. Subjective impressions do not mirror online reading effort: concurrent EEG-eyetracking evidence from the reading of books and digital media.

    Science.gov (United States)

    Kretzschmar, Franziska; Pleimling, Dominique; Hosemann, Jana; Füssel, Stephan; Bornkessel-Schlesewsky, Ina; Schlesewsky, Matthias

    2013-01-01

    In the rapidly changing circumstances of our increasingly digital world, reading is also becoming an increasingly digital experience: electronic books (e-books) are now outselling print books in the United States and the United Kingdom. Nevertheless, many readers still view e-books as less readable than print books. The present study thus used combined EEG and eyetracking measures in order to test whether reading from digital media requires higher cognitive effort than reading conventional books. Young and elderly adults read short texts on three different reading devices: a paper page, an e-reader and a tablet computer and answered comprehension questions about them while their eye movements and EEG were recorded. The results of a debriefing questionnaire replicated previous findings in that participants overwhelmingly chose the paper page over the two electronic devices as their preferred reading medium. Online measures, by contrast, showed shorter mean fixation durations and lower EEG theta band voltage density--known to covary with memory encoding and retrieval--for the older adults when reading from a tablet computer in comparison to the other two devices. Young adults showed comparable fixation durations and theta activity for all three devices. Comprehension accuracy did not differ across the three media for either group. We argue that these results can be explained in terms of the better text discriminability (higher contrast) produced by the backlit display of the tablet computer. Contrast sensitivity decreases with age and degraded contrast conditions lead to longer reading times, thus supporting the conclusion that older readers may benefit particularly from the enhanced contrast of the tablet. Our findings thus indicate that people's subjective evaluation of digital reading media must be dissociated from the cognitive and neural effort expended in online information processing while reading from such devices.

  20. Subjective impressions do not mirror online reading effort: concurrent EEG-eyetracking evidence from the reading of books and digital media.

    Directory of Open Access Journals (Sweden)

    Franziska Kretzschmar

    In the rapidly changing circumstances of our increasingly digital world, reading is also becoming an increasingly digital experience: electronic books (e-books) are now outselling print books in the United States and the United Kingdom. Nevertheless, many readers still view e-books as less readable than print books. The present study thus used combined EEG and eyetracking measures in order to test whether reading from digital media requires higher cognitive effort than reading conventional books. Young and elderly adults read short texts on three different reading devices: a paper page, an e-reader and a tablet computer and answered comprehension questions about them while their eye movements and EEG were recorded. The results of a debriefing questionnaire replicated previous findings in that participants overwhelmingly chose the paper page over the two electronic devices as their preferred reading medium. Online measures, by contrast, showed shorter mean fixation durations and lower EEG theta band voltage density--known to covary with memory encoding and retrieval--for the older adults when reading from a tablet computer in comparison to the other two devices. Young adults showed comparable fixation durations and theta activity for all three devices. Comprehension accuracy did not differ across the three media for either group. We argue that these results can be explained in terms of the better text discriminability (higher contrast) produced by the backlit display of the tablet computer. Contrast sensitivity decreases with age and degraded contrast conditions lead to longer reading times, thus supporting the conclusion that older readers may benefit particularly from the enhanced contrast of the tablet. Our findings thus indicate that people's subjective evaluation of digital reading media must be dissociated from the cognitive and neural effort expended in online information processing while reading from such devices.

  1. Dissecting Cell-Type Composition and Activity-Dependent Transcriptional State in Mammalian Brains by Massively Parallel Single-Nucleus RNA-Seq.

    Science.gov (United States)

    Hu, Peng; Fabyanic, Emily; Kwon, Deborah Y; Tang, Sheng; Zhou, Zhaolan; Wu, Hao

    2017-12-07

    Massively parallel single-cell RNA sequencing can precisely resolve cellular diversity in a high-throughput manner at low cost, but unbiased isolation of intact single cells from complex tissues such as adult mammalian brains is challenging. Here, we integrate sucrose-gradient-assisted purification of nuclei with droplet microfluidics to develop a highly scalable single-nucleus RNA-seq approach (sNucDrop-seq), which is free of enzymatic dissociation and nucleus sorting. By profiling ∼18,000 nuclei isolated from cortical tissues of adult mice, we demonstrate that sNucDrop-seq not only accurately reveals neuronal and non-neuronal subtype composition with high sensitivity but also enables in-depth analysis of transient transcriptional states driven by neuronal activity, at single-cell resolution, in vivo. Copyright © 2017 Elsevier Inc. All rights reserved.

  2. What oral text reading fluency can reveal about reading comprehension

    NARCIS (Netherlands)

    Veenendaal, N.J.; Groen, M.A.; Verhoeven, L.T.W.

    2015-01-01

    Text reading fluency – the ability to read quickly, accurately and with a natural intonation – has been proposed as a predictor of reading comprehension. In the current study, we examined the role of oral text reading fluency, defined as text reading rate and text reading prosody, as a contributor

  3. Designing Reading Materials for the Faculty of Social and Political Sciences at UIN Syarif Hidayatullah Jakarta

    Directory of Open Access Journals (Sweden)

    Devi Yusnita

    2016-01-01

    This research aimed to design reading materials for the Faculty of Social and Political Sciences, UIN Syarif Hidayatullah Jakarta, because no such specific materials were available on the market. To produce satisfactory teaching materials, the researcher followed several steps: conducting a needs analysis, reviewing the principles of materials design and reading strategies, designing the course framework, designing the syllabus, designing the reading materials, and implementing the sample lessons. The needs analysis was intended to find out what the students needed and which subjects they studied at the institution, in order to produce adequate reading materials. Based on the needs analysis, the researcher identified the global aims of the course and designed the course framework accordingly. This course framework contained the general reading themes and topics, information on the classroom activities that followed up the reading, the length of each study session, the number of course meetings, and the number of participants. The course framework became the basis for writing the syllabus. Finally, the syllabus became the basis for designing the reading materials.

  4. Read length and repeat resolution: Exploring prokaryote genomes using next-generation sequencing technologies

    KAUST Repository

    Cahill, Matt J.

    2010-07-12

    Background: There are a growing number of next-generation sequencing technologies. At present, the most cost-effective options also produce the shortest reads. However, even for prokaryotes, there is uncertainty concerning the utility of these technologies for the de novo assembly of complete genomes. This reflects an expectation that short reads will be unable to resolve small, but presumably abundant, repeats. Methodology/Principal Findings: Using a simple model of repeat assembly, we develop and test a technique that, for any read length, can estimate the occurrence of unresolvable repeats in a genome, and thus predict the number of gaps that would need to be closed to produce a complete sequence. We apply this technique to 818 prokaryote genome sequences. This provides a quantitative assessment of the relative performance of various lengths. Notably, unpaired reads of only 150nt can reconstruct approximately 50% of the analysed genomes with fewer than 96 repeat-induced gaps. Nonetheless, there is considerable variation amongst prokaryotes. Some genomes can be assembled to near contiguity using very short reads while others require much longer reads. Conclusions: Given the diversity of prokaryote genomes, a sequencing strategy should be tailored to the organism under study. Our results will provide researchers with a practical resource to guide the selection of the appropriate read length. 2010 Cahill et al.
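
    The idea that a repeat becomes unresolvable once it is at least as long as a read can be illustrated with a rough sketch. This is not the authors' model, which the abstract does not specify; it simply counts the distinct substrings of read length that occur more than once in a sequence, as a stand-in for "stretches a read of that length cannot place uniquely". The function name and the toy sequence are hypothetical.

```python
from collections import Counter


def count_repeated_kmers(genome: str, read_length: int) -> int:
    """Number of distinct substrings of length `read_length` seen more than once.

    Illustrative stand-in only: a read lying entirely inside such a
    substring cannot be assigned to a single location.
    """
    kmers = (
        genome[i:i + read_length]
        for i in range(len(genome) - read_length + 1)
    )
    counts = Counter(kmers)
    return sum(1 for c in counts.values() if c > 1)


# Toy sequence containing the 7-base repeat "GATTACA" twice.
toy_genome = "GATTACACCGATTACATT"
for length in (4, 7, 8):
    print(length, count_repeated_kmers(toy_genome, length))
```

    In this toy case the count falls to zero once the read length exceeds the repeat length, which is the qualitative behaviour the abstract describes for longer reads.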

  5. Read length and repeat resolution: exploring prokaryote genomes using next-generation sequencing technologies.

    Directory of Open Access Journals (Sweden)

    Matt J Cahill

    BACKGROUND: There are a growing number of next-generation sequencing technologies. At present, the most cost-effective options also produce the shortest reads. However, even for prokaryotes, there is uncertainty concerning the utility of these technologies for the de novo assembly of complete genomes. This reflects an expectation that short reads will be unable to resolve small, but presumably abundant, repeats. METHODOLOGY/PRINCIPAL FINDINGS: Using a simple model of repeat assembly, we develop and test a technique that, for any read length, can estimate the occurrence of unresolvable repeats in a genome, and thus predict the number of gaps that would need to be closed to produce a complete sequence. We apply this technique to 818 prokaryote genome sequences. This provides a quantitative assessment of the relative performance of various lengths. Notably, unpaired reads of only 150nt can reconstruct approximately 50% of the analysed genomes with fewer than 96 repeat-induced gaps. Nonetheless, there is considerable variation amongst prokaryotes. Some genomes can be assembled to near contiguity using very short reads while others require much longer reads. CONCLUSIONS: Given the diversity of prokaryote genomes, a sequencing strategy should be tailored to the organism under study. Our results will provide researchers with a practical resource to guide the selection of the appropriate read length.

  6. Read length and repeat resolution: Exploring prokaryote genomes using next-generation sequencing technologies

    KAUST Repository

    Cahill, Matt J.; Köser, Claudio U.; Ross, Nicholas E.; Archer, John A.C.

    2010-01-01

    Background: There are a growing number of next-generation sequencing technologies. At present, the most cost-effective options also produce the shortest reads. However, even for prokaryotes, there is uncertainty concerning the utility of these technologies for the de novo assembly of complete genomes. This reflects an expectation that short reads will be unable to resolve small, but presumably abundant, repeats. Methodology/Principal Findings: Using a simple model of repeat assembly, we develop and test a technique that, for any read length, can estimate the occurrence of unresolvable repeats in a genome, and thus predict the number of gaps that would need to be closed to produce a complete sequence. We apply this technique to 818 prokaryote genome sequences. This provides a quantitative assessment of the relative performance of various lengths. Notably, unpaired reads of only 150nt can reconstruct approximately 50% of the analysed genomes with fewer than 96 repeat-induced gaps. Nonetheless, there is considerable variation amongst prokaryotes. Some genomes can be assembled to near contiguity using very short reads while others require much longer reads. Conclusions: Given the diversity of prokaryote genomes, a sequencing strategy should be tailored to the organism under study. Our results will provide researchers with a practical resource to guide the selection of the appropriate read length. 2010 Cahill et al.

  7. The Effects of Extensive Reading on Reading Comprehension, Reading Rate, and Vocabulary Acquisition

    Science.gov (United States)

    Suk, Namhee

    2017-01-01

    Several empirical studies and syntheses of extensive reading have concluded that extensive reading has positive impacts on language learning in second- and foreign-language settings. However, many of the studies contained methodological or curricular limitations, raising questions about the asserted positive effects of extensive reading. The…

  8. RNA-Seq analysis of chikungunya virus infection and identification of granzyme A as a major promoter of arthritic inflammation.

    Directory of Open Access Journals (Sweden)

    Jane A C Wilson

    2017-02-01

    Chikungunya virus (CHIKV) is an arthritogenic alphavirus causing epidemics of acute and chronic arthritic disease. Herein we describe a comprehensive RNA-Seq analysis of feet and lymph nodes at peak viraemia (day 2 post infection), acute arthritis (day 7) and chronic disease (day 30) in the CHIKV adult wild-type mouse model. Genes previously shown to be up-regulated in CHIKV patients were also up-regulated in the mouse model. CHIKV sequence information was also obtained with up to ≈8% of the reads mapping to the viral genome; however, no adaptive viral genome changes were apparent. Although day 2, 7 and 30 represent distinct stages of infection and disease, there was a pronounced overlap in up-regulated host genes and pathways. Type I interferon response genes (IRGs) represented up to ≈50% of up-regulated genes, even after loss of type I interferon induction on days 7 and 30. Bioinformatic analyses suggested a number of interferon response factors were primarily responsible for maintaining type I IRG induction. A group of genes prominent in the RNA-Seq analysis and hitherto unexplored in viral arthropathies were granzymes A, B and K. Granzyme A-/- and to a lesser extent granzyme K-/-, but not granzyme B-/-, mice showed a pronounced reduction in foot swelling and arthritis, with analysis of granzyme A-/- mice showing no reductions in viral loads but reduced NK and T cell infiltrates post CHIKV infection. Treatment with Serpinb6b, a granzyme A inhibitor, also reduced arthritic inflammation in wild-type mice. In non-human primates circulating granzyme A levels were elevated after CHIKV infection, with the increase correlating with viral load. Elevated granzyme A levels were also seen in a small cohort of human CHIKV patients. Taken together these results suggest granzyme A is an important driver of arthritic inflammation and a potential target for therapy. ClinicalTrials.gov NCT00281294.

  9. oPOSSUM-3: advanced analysis of regulatory motif over-representation across genes or ChIP-Seq datasets.

    Science.gov (United States)

    Kwon, Andrew T; Arenillas, David J; Worsley Hunt, Rebecca; Wasserman, Wyeth W

    2012-09-01

    oPOSSUM-3 is a web-accessible software system for identification of over-represented transcription factor binding sites (TFBS) and TFBS families in either DNA sequences of co-expressed genes or sequences generated from high-throughput methods, such as ChIP-Seq. Validation of the system with known sets of co-regulated genes and published ChIP-Seq data demonstrates the capacity for oPOSSUM-3 to identify mediating transcription factors (TF) for co-regulated genes or co-recovered sequences. oPOSSUM-3 is available at http://opossum.cisreg.ca.

  10. Imitated prosodic fluency predicts reading comprehension ability in good and poor high school readers

    Directory of Open Access Journals (Sweden)

    Mara Breen

    2016-07-01

    Researchers have established a relationship between beginning readers’ silent comprehension ability and their prosodic fluency, such that readers who read aloud with appropriate prosody tend to have higher scores on silent reading comprehension assessments. The current study was designed to investigate this relationship in two groups of high school readers: Specifically Poor Comprehenders (SPCs), who have adequate word level and phonological skills but poor reading comprehension ability, and a group of age- and decoding skill-matched controls. We compared the prosodic fluency of the two groups by determining how effectively they produced prosodic cues to syntactic and semantic structure in imitations of a model speaker’s production of syntactically and semantically varied sentences. Analyses of pitch and duration patterns revealed that speakers in both groups produced the expected prosodic patterns; however, controls provided stronger durational cues to syntactic structure. These results demonstrate that the relationship between prosodic fluency and reading comprehension continues past the stage of early reading instruction. Moreover, they suggest that prosodically fluent speakers may also generate more fluent implicit prosodic representations during silent reading, leading to more effective comprehension.

  11. Transcriptomic analysis across nasal, temporal, and macular regions of human neural retina and RPE/choroid by RNA-Seq

    Science.gov (United States)

    Whitmore, S. Scott; Wagner, Alex H.; DeLuca, Adam P.; Drack, Arlene V.; Stone, Edwin M.; Tucker, Budd A.; Zeng, Shemin; Braun, Terry A.; Mullins, Robert F.; Scheetz, Todd E.

    2014-01-01

    Proper spatial differentiation of retinal cell types is necessary for normal human vision. Many retinal diseases, such as Best disease and male germ cell associated kinase (MAK)-associated retinitis pigmentosa, preferentially affect distinct topographic regions of the retina. While much is known about the distribution of cell-types in the retina, the distribution of molecular components across the posterior pole of the eye has not been well-studied. To investigate regional difference in molecular composition of ocular tissues, we assessed differential gene expression across the temporal, macular, and nasal retina and retinal pigment epithelium (RPE)/choroid of human eyes using RNA-Seq. RNA from temporal, macular, and nasal retina and RPE/choroid from four human donor eyes was extracted, poly-A selected, fragmented, and sequenced as 100 bp read pairs. Digital read files were mapped to the human genome and analyzed for differential expression using the Tuxedo software suite. Retina and RPE/choroid samples were clearly distinguishable at the transcriptome level. Numerous transcription factors were differentially expressed between regions of the retina and RPE/choroid. Photoreceptor-specific genes were enriched in the peripheral samples, while ganglion cell and amacrine cell genes were enriched in the macula. Within the RPE/choroid, RPE-specific genes were upregulated at the periphery while endothelium associated genes were upregulated in the macula. Consistent with previous studies, BEST1 expression was lower in macular than extramacular regions. The MAK gene was expressed at lower levels in macula than in extramacular regions, but did not exhibit a significant difference between nasal and temporal retina. The regional molecular distinction is greatest between macula and periphery and decreases between different peripheral regions within a tissue. Datasets such as these can be used to prioritize candidate genes for possible involvement in retinal diseases with

  12. Transcriptomic analysis across nasal, temporal, and macular regions of human neural retina and RPE/choroid by RNA-Seq.

    Science.gov (United States)

    Whitmore, S Scott; Wagner, Alex H; DeLuca, Adam P; Drack, Arlene V; Stone, Edwin M; Tucker, Budd A; Zeng, Shemin; Braun, Terry A; Mullins, Robert F; Scheetz, Todd E

    2014-12-01

    Proper spatial differentiation of retinal cell types is necessary for normal human vision. Many retinal diseases, such as Best disease and male germ cell associated kinase (MAK)-associated retinitis pigmentosa, preferentially affect distinct topographic regions of the retina. While much is known about the distribution of cell types in the retina, the distribution of molecular components across the posterior pole of the eye has not been well-studied. To investigate regional difference in molecular composition of ocular tissues, we assessed differential gene expression across the temporal, macular, and nasal retina and retinal pigment epithelium (RPE)/choroid of human eyes using RNA-Seq. RNA from temporal, macular, and nasal retina and RPE/choroid from four human donor eyes was extracted, poly-A selected, fragmented, and sequenced as 100 bp read pairs. Digital read files were mapped to the human genome and analyzed for differential expression using the Tuxedo software suite. Retina and RPE/choroid samples were clearly distinguishable at the transcriptome level. Numerous transcription factors were differentially expressed between regions of the retina and RPE/choroid. Photoreceptor-specific genes were enriched in the peripheral samples, while ganglion cell and amacrine cell genes were enriched in the macula. Within the RPE/choroid, RPE-specific genes were upregulated at the periphery while endothelium associated genes were upregulated in the macula. Consistent with previous studies, BEST1 expression was lower in macular than extramacular regions. The MAK gene was expressed at lower levels in macula than in extramacular regions, but did not exhibit a significant difference between nasal and temporal retina. The regional molecular distinction is greatest between macula and periphery and decreases between different peripheral regions within a tissue. Datasets such as these can be used to prioritize candidate genes for possible involvement in retinal diseases with

  13. Improving small RNA-seq by using a synthetic spike-in set for size-range quality control together with a set for data normalization.

    Science.gov (United States)

    Locati, Mauro D; Terpstra, Inez; de Leeuw, Wim C; Kuzak, Mateusz; Rauwerda, Han; Ensink, Wim A; van Leeuwen, Selina; Nehrdich, Ulrike; Spaink, Herman P; Jonker, Martijs J; Breit, Timo M; Dekker, Rob J

    2015-08-18

    There is an increasing interest in complementing RNA-seq experiments with small-RNA (sRNA) expression data to obtain a comprehensive view of a transcriptome. Currently, two main experimental challenges concerning sRNA-seq exist: how to check the size distribution of isolated sRNAs, given the sensitive size-selection steps in the protocol; and how to normalize data between samples, given the low complexity of sRNA types. We here present two separate sets of synthetic RNA spike-ins for monitoring size-selection and for performing data normalization in sRNA-seq. The size-range quality control (SRQC) spike-in set, consisting of 11 oligoribonucleotides (10-70 nucleotides), was tested by intentionally altering the size-selection protocol and verified via several comparative experiments. We demonstrate that the SRQC set is useful to reproducibly track down biases in the size-selection in sRNA-seq. The external reference for data-normalization (ERDN) spike-in set, consisting of 19 oligoribonucleotides, was developed for sample-to-sample normalization in differential-expression analysis of sRNA-seq data. Testing and applying the ERDN set showed that it can reproducibly detect differential expression over a dynamic range of 2^18. Hence, biological variation in sRNA composition and content between samples is preserved while technical variation is effectively minimized. Together, both spike-in sets can significantly improve the technical reproducibility of sRNA-seq. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.
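
    Sample-to-sample normalization against an external spike-in set can be sketched generically as follows. The abstract does not describe the actual ERDN procedure, so this is only an assumed, simple scheme: each sample is scaled so that its total spike-in count matches the mean spike-in total across samples. All names and counts are hypothetical.

```python
from typing import Dict, List


def spikein_scale_factors(spike_counts: Dict[str, List[int]]) -> Dict[str, float]:
    """Per-sample scale factors from spike-in read counts.

    Assumed scheme (not taken from the abstract): factor = mean spike-in
    total across samples / spike-in total of this sample.
    """
    totals = {sample: sum(counts) for sample, counts in spike_counts.items()}
    if not totals or any(total == 0 for total in totals.values()):
        raise ValueError("every sample needs a non-zero spike-in total")
    reference = sum(totals.values()) / len(totals)
    return {sample: reference / total for sample, total in totals.items()}


# Hypothetical spike-in counts for two samples sequenced at different depths.
spikes = {"sample_A": [100, 200, 700], "sample_B": [50, 100, 350]}
factors = spikein_scale_factors(spikes)
print(factors)  # {'sample_A': 0.75, 'sample_B': 1.5}

# A hypothetical sRNA with raw counts 40 (A) and 20 (B) is equal after scaling.
print(40 * factors["sample_A"], 20 * factors["sample_B"])  # 30.0 30.0

# The dynamic range quoted in the abstract, written out.
print(2 ** 18)  # 262144
```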

  14. Efficient alignment of pyrosequencing reads for re-sequencing applications

    Directory of Open Access Journals (Sweden)

    Russo Luis MS

    2011-05-01

    Background: Over the past few years, new massively parallel DNA sequencing technologies have emerged. These platforms generate massive amounts of data per run, greatly reducing the cost of DNA sequencing. However, these techniques also raise important computational difficulties mostly due to the huge volume of data produced, but also because of some of their specific characteristics such as read length and sequencing errors. Among the most critical problems is that of efficiently and accurately mapping reads to a reference genome in the context of re-sequencing projects. Results: We present an efficient method for the local alignment of pyrosequencing reads produced by the GS FLX (454) system against a reference sequence. Our approach explores the characteristics of the data in these re-sequencing applications and uses state of the art indexing techniques combined with a flexible seed-based approach, leading to a fast and accurate algorithm which needs very little user parameterization. An evaluation performed using real and simulated data shows that our proposed method outperforms a number of mainstream tools on the quantity and quality of successful alignments, as well as on the execution time. Conclusions: The proposed methodology was implemented in a software tool called TAPyR--Tool for the Alignment of Pyrosequencing Reads--which is publicly available from http://www.tapyr.net.

  15. Enhancing academic reading skills through extensive reading ...

    African Journals Online (AJOL)

    Abstract. The current study explores the feasibility of an extensive reading programme in the context of a low-income country (Mozambique), as well as the influence of extensive reading on academic reading. The programme took over 4 months and was conducted among 30 students majoring in Journalism at the Eduardo ...

  16. Illumina MiSeq Sequencing for Preliminary Analysis of Microbiome Causing Primary Endodontic Infections in Egypt

    Directory of Open Access Journals (Sweden)

    Sally Ali Tawfik

    2018-01-01

    The use of high-throughput next-generation technologies has allowed more comprehensive analysis than traditional Sanger sequencing. The specific aim of this study was to investigate the microbial diversity of primary endodontic infections in Egyptian patients using the Illumina MiSeq sequencing platform. Samples were collected from 19 patients in Suez Canal University Hospital (Endodontic Department) using sterile #15 K-files and paper points. DNA was extracted using the Mo Bio PowerSoil DNA isolation kit, followed by PCR amplification and agarose gel electrophoresis. The microbiome was characterized on the basis of the V3 and V4 hypervariable regions of the 16S rRNA gene by paired-end sequencing on an Illumina MiSeq device. MOTHUR software was used for sequence filtration and analysis of the sequenced data. A total of 1858 operational taxonomic units at 97% similarity were assigned to 26 phyla, 245 families, and 705 genera. Four main phyla, Firmicutes, Bacteroidetes, Proteobacteria, and Synergistetes, were predominant in all samples. At the genus level, Prevotella, Bacillus, Porphyromonas, Streptococcus, and Bacteroides were the most abundant. Illumina MiSeq platform sequencing can be used to investigate oral microbiome composition of endodontic infections. Elucidating the ecology of endodontic infections is a necessary step in developing effective intracanal antimicrobials.

  17. Pseudo-synesthesia through reading books with colored letters.

    Directory of Open Access Journals (Sweden)

    Olympia Colizoli

    BACKGROUND: Synesthesia is a phenomenon where a stimulus produces consistent extraordinary subjective experiences. A relatively common type of synesthesia involves perception of color when viewing letters (e.g. the letter 'a' always appears as light blue). In this study, we examine whether traits typically regarded as markers of synesthesia can be acquired by simply reading in color. METHODOLOGY/PRINCIPAL FINDINGS: Non-synesthetes were given specially prepared colored books to read. A modified Stroop task was administered before and after reading. A perceptual crowding task was administered after reading. Reading one book (>49,000 words) was sufficient to induce effects regarded as behavioral markers for synesthesia. The results of the Stroop tasks indicate that it is possible to learn letter-color associations through reading in color (F(1, 14) = 5.85, p = .030). Furthermore, Stroop effects correlated with subjective reports about experiencing letters in color (r(13) = 0.51, p = .05). The frequency of viewing letters is related to the level of association as seen by the difference in the Stroop effect size between upper- and lower-case letters (t(14) = 2.79, p = .014) and in a subgroup of participants whose Stroop effects increased as they continued to read in color. Readers did not show significant performance advantages on the crowding task compared to controls. Acknowledging the many differences between trainees and synesthetes, results suggest that it may be possible to acquire a subset of synesthetic behavioral traits in adulthood through training. CONCLUSION/SIGNIFICANCE: To our knowledge, this is the first evidence of acquiring letter-color associations through reading in color. Reading in color appears to be a promising avenue in which we may explore the differences and similarities between synesthetes and non-synesthetes. Additionally, reading in color is a plausible method for a long-term 'synesthetic' training program.

  18. Reading: Time

    NARCIS (Netherlands)

    Annemarie Wennekers; Frank Huysmans; Jos de Haan

    2018-01-01

    Original title: Lees:Tijd. The amount of time that Dutch people spend reading has been declining steadily since the 1950s. This decline contrasts starkly with the personal and social benefits of reading that a large body of research has documented. The Reading:

  19. Antimicrobial resistance and prevalence of CvfB, SEK and SEQ genes among Staphylococcus aureus isolates from paediatric patients with bloodstream infections.

    Science.gov (United States)

    Liang, Bing-Shao; Huang, Yan-Mei; Chen, Yin-Shuang; Dong, Hui; Mai, Jia-Liang; Xie, Yong-Qiang; Zhong, Hua-Min; Deng, Qiu-Lian; Long, Yan; Yang, Yi-Yu; Gong, Si-Tang; Zhou, Zhen-Wen

    2017-11-01

    Staphylococcus aureus (S. aureus) is one of the most frequently isolated pathogens in neonatal cases of early and late-onset sepsis. Drug resistance profiles and carriage of toxin genes may affect the treatment and outcome of an infection. The present study aimed to determine the antimicrobial resistance patterns and frequencies of the toxin-associated genes conserved virulence factor B (CvfB), staphylococcal enterotoxin Q (SEQ) and staphylococcal enterotoxin K (SEK) among S. aureus isolates recovered from paediatric patients with bloodstream infections (BSIs) in Guangzhou (China). Of the 53 isolates, 43.4% were methicillin-resistant S. aureus (MRSA), and resistance rates to penicillin, erythromycin, clindamycin, trimethoprim/sulfamethoxazole, tetracycline, and ciprofloxacin were 92.5, 66.0, 62.3, 13.2, 20.8 and 1.9%, respectively. However, no resistance to nitrofurantoin, dalfopristin/quinupristin, rifampicin, gentamicin, linezolid or vancomycin was detected. Resistance to erythromycin, clindamycin and tetracycline in the MRSA group was significantly higher than that in the methicillin-susceptible S. aureus (MSSA) group. No significant differences in antimicrobial resistance patterns were noted between two age groups (≤1 year and >1 year). The proportion of S. aureus isolates positive for CvfB, SEQ and SEK was 100, 34.0 and 35.8%, respectively, with 24.5% (13/53) of strains carrying all three genes. The rates of SEK, SEQ and SEK + SEQ carriage were significantly higher among MRSA isolates than among MSSA isolates. Correlations were identified between the carriage of SEQ, SEK and SEQ + SEK genes and MRSA (contingency coefficient 0.500, 0.416, 0.546, respectively; P < ...). The present study clarified the characteristics of BSI-associated S. aureus and enhanced the current understanding of the pathogenicity and treatment of MRSA.
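
    The percentages in this abstract can be cross-checked against the stated total of 53 isolates. The sketch below is not from the paper: it converts each reported percentage into the nearest whole number of isolates and confirms that the count rounds back to the reported figure. The dictionary labels and function names are illustrative assumptions; only the percentages and the total come from the abstract.

```python
# Sanity check: do the reported percentages correspond to whole isolate counts?
# Only the percentages and N_ISOLATES come from the abstract; the labels and
# helper names are illustrative.
N_ISOLATES = 53

reported_percent = {
    "MRSA": 43.4,
    "penicillin": 92.5,
    "erythromycin": 66.0,
    "clindamycin": 62.3,
    "trimethoprim/sulfamethoxazole": 13.2,
    "tetracycline": 20.8,
    "ciprofloxacin": 1.9,
    "SEQ positive": 34.0,
    "SEK positive": 35.8,
    "all three genes": 24.5,
}


def implied_count(percent, total):
    """Nearest whole number of isolates matching a reported percentage."""
    return round(percent * total / 100)


def round_trip(reported, total):
    """Map each label to (implied count, percentage recomputed from it)."""
    result = {}
    for label, percent in reported.items():
        count = implied_count(percent, total)
        result[label] = (count, round(100 * count / total, 1))
    return result


checked = round_trip(reported_percent, N_ISOLATES)
for label, (count, recomputed) in checked.items():
    print(f"{label}: {count}/{N_ISOLATES} = {recomputed}%")
```

    The "all three genes" entry recovers the 13/53 that the abstract states explicitly, which suggests the other percentages were computed over the same denominator.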

  20. ChIPWig: a random access-enabling lossless and lossy compression method for ChIP-seq data.

    Science.gov (United States)

    Ravanmehr, Vida; Kim, Minji; Wang, Zhiying; Milenkovic, Olgica

    2018-03-15

    Chromatin immunoprecipitation sequencing (ChIP-seq) experiments are inexpensive and time-efficient, and result in massive datasets that introduce significant storage and maintenance challenges. To address the resulting Big Data problems, we propose a lossless and lossy compression framework specifically designed for ChIP-seq Wig data, termed ChIPWig. ChIPWig enables random access and summary statistics lookups, and it is based on the asymptotic theory of optimal point density design for nonuniform quantizers. We tested the ChIPWig compressor on 10 ChIP-seq datasets generated by the ENCODE consortium. On average, lossless ChIPWig reduced the file sizes to merely 6% of the original, and offered a 6-fold improvement in compression rate compared to bigWig. The lossy feature further reduced file sizes 2-fold compared to the lossless mode, with little or no effect on peak calling and motif discovery using specialized NarrowPeaks methods. The compression and decompression speeds are of the order of 0.2 sec/MB on general-purpose computers. The source code and binaries are freely available for download at https://github.com/vidarmehr/ChIPWig-v2, implemented in C++. Contact: milenkov@illinois.edu. Supplementary data are available at Bioinformatics online.
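
    To illustrate what a nonuniform quantizer does in a lossy mode like the one described above, here is a minimal sketch. It is not ChIPWig's algorithm, whose details the abstract does not give. It is a generic quantile-based scalar quantizer that places more reconstruction levels where signal values are dense. All function names and the sample coverage values are hypothetical.

```python
from bisect import bisect_right
from statistics import mean


def build_quantizer(values, levels):
    """Build a quantile-based nonuniform quantizer.

    Sorted samples are split into `levels` roughly equally populated bins,
    so densely populated value ranges get finer resolution.
    Returns (edges, centers): decision thresholds and reconstruction values.
    """
    ordered = sorted(values)
    n = len(ordered)
    bounds = [round(i * n / levels) for i in range(levels + 1)]
    bins = [ordered[bounds[i]:bounds[i + 1]] for i in range(levels)]
    bins = [b for b in bins if b]  # drop empty bins when levels > n
    centers = [mean(b) for b in bins]
    # Each threshold sits midway between two adjacent reconstruction values.
    edges = [(lo + hi) / 2 for lo, hi in zip(centers, centers[1:])]
    return edges, centers


def quantize(values, edges):
    """Map each value to the index of its quantization cell."""
    return [bisect_right(edges, v) for v in values]


def dequantize(codes, centers):
    """Replace each code by its cell's reconstruction value (lossy)."""
    return [centers[c] for c in codes]


# Hypothetical coverage values: many small counts and a few large peaks.
coverage = [0, 0, 0, 1, 1, 2, 3, 5, 8, 40, 120, 300]
edges, centers = build_quantizer(coverage, levels=4)
codes = quantize(coverage, edges)
approx = dequantize(codes, centers)
print(codes)
print(approx)
```

    With 4 levels each value is stored as a 2-bit code plus a small shared codebook, at the cost of reconstruction error that is largest in the sparsely populated high-value range.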