WorldWideScience

Sample records for rna-seq-based transcript quantitation

  1. RNA-Seq-Based Transcript Structure Analysis with TrBorderExt.

    Science.gov (United States)

    Wang, Yejun; Sun, Ming-An; White, Aaron P

    2018-01-01

    RNA-Seq has become a routine strategy for genome-wide gene expression comparisons in bacteria. Despite lower resolution in transcript border parsing compared with dRNA-Seq, TSS-EMOTE, Cappable-seq, Term-seq, and others, directional RNA-Seq still illustrates its advantages: low cost, quantification and transcript border analysis with a medium resolution (±10-20 nt). To facilitate mining of directional RNA-Seq datasets especially with respect to transcript structure analysis, we developed a tool, TrBorderExt, which can parse transcript start sites and termination sites accurately in bacteria. A detailed protocol is described in this chapter for how to use the software package step by step to identify bacterial transcript borders from raw RNA-Seq data. The package was developed with Perl and R programming languages, and is accessible freely through the website: http://www.szu-bioinf.org/TrBorderExt .

  2. MITIE: Simultaneous RNA-Seq-based transcript identification and quantification in multiple samples.

    Science.gov (United States)

    Behr, Jonas; Kahles, André; Zhong, Yi; Sreedharan, Vipin T; Drewe, Philipp; Rätsch, Gunnar

    2013-10-15

    High-throughput sequencing of mRNA (RNA-Seq) has led to tremendous improvements in the detection of expressed genes and reconstruction of RNA transcripts. However, the extensive dynamic range of gene expression, technical limitations and biases, as well as the observed complexity of the transcriptional landscape, pose profound computational challenges for transcriptome reconstruction. We present the novel framework MITIE (Mixed Integer Transcript IdEntification) for simultaneous transcript reconstruction and quantification. We define a likelihood function based on the negative binomial distribution, use a regularization approach to select a few transcripts collectively explaining the observed read data and show how to find the optimal solution using Mixed Integer Programming. MITIE can (i) take advantage of known transcripts, (ii) reconstruct and quantify transcripts simultaneously in multiple samples, and (iii) resolve the location of multi-mapping reads. It is designed for genome- and assembly-based transcriptome reconstruction. We present an extensive study based on realistic simulated RNA-Seq data. When compared with state-of-the-art approaches, MITIE proves to be significantly more sensitive and overall more accurate. Moreover, MITIE yields substantial performance gains when used with multiple samples. We applied our system to 38 Drosophila melanogaster modENCODE RNA-Seq libraries and estimated the sensitivity of reconstructing omitted transcript annotations and the specificity with respect to annotated transcripts. Our results corroborate that a well-motivated objective paired with appropriate optimization techniques leads to significant improvements over the state-of-the-art in transcriptome reconstruction. MITIE is implemented in C++ and is available from http://bioweb.me/mitie under the GPL license.
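
The abstract above names two ingredients: a negative binomial likelihood and a regularization term that favours few transcripts. The sketch below illustrates only the general shape of such an objective. It is not MITIE's actual formulation, which the abstract does not spell out: the mean/size parameterization, the function names `nb_log_pmf` and `penalised_objective`, and the simple per-transcript penalty are all assumptions made for illustration.

```python
import math

def nb_log_pmf(k, mu, r):
    """Log-probability of count k under a negative binomial with mean mu and size r."""
    return (math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
            + r * math.log(r / (r + mu)) + k * math.log(mu / (r + mu)))

def penalised_objective(counts, expected, r, n_transcripts, penalty):
    """Negative log-likelihood plus a per-transcript penalty (illustrative form only)."""
    nll = -sum(nb_log_pmf(k, mu, r) for k, mu in zip(counts, expected))
    return nll + penalty * n_transcripts

# Hypothetical observed counts and model-predicted means for three segments.
counts = [3, 7, 0]
expected = [4.0, 6.0, 0.5]
score = penalised_objective(counts, expected, r=2.0, n_transcripts=2, penalty=1.5)
```

A lower score means a better trade-off between fit and the number of transcripts used. Choosing which transcripts to include is the discrete part that the abstract says is solved with Mixed Integer Programming, and that part is not attempted here.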

  3. EMSAR: estimation of transcript abundance from RNA-seq data by mappability-based segmentation and reclustering.

    Science.gov (United States)

    Lee, Soohyun; Seo, Chae Hwa; Alver, Burak Han; Lee, Sanghyuk; Park, Peter J

    2015-09-03

    RNA-seq has been widely used for genome-wide expression profiling. RNA-seq data typically consists of tens of millions of short sequenced reads from different transcripts. However, due to sequence similarity among genes and among isoforms, the source of a given read is often ambiguous. Existing approaches for estimating expression levels from RNA-seq reads tend to compromise between accuracy and computational cost. We introduce a new approach for quantifying transcript abundance from RNA-seq data. EMSAR (Estimation by Mappability-based Segmentation And Reclustering) groups reads according to the set of transcripts to which they are mapped and finds maximum likelihood estimates using a joint Poisson model for each optimal set of segments of transcripts. The method uses nearly all mapped reads, including those mapped to multiple genes. With an efficient transcriptome indexing based on modified suffix arrays, EMSAR minimizes the use of CPU time and memory while achieving accuracy comparable to the best existing methods. EMSAR is a method for quantifying transcripts from RNA-seq data with high accuracy and low computational cost. EMSAR is available at https://github.com/parklab/emsar.
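
The idea of grouping reads by the set of transcripts they map to can be shown with a toy example. The code below is not EMSAR's implementation, which according to the abstract uses a joint Poisson model over transcript segments and a suffix-array index. It is a generic equivalence-class EM, with hypothetical names `group_reads` and `em_abundance` and the simplifying assumption that all transcripts have the same effective length.

```python
from collections import Counter

def group_reads(alignments):
    """Count reads per set of compatible transcripts (hypothetical helper)."""
    return Counter(frozenset(ts) for ts in alignments)

def em_abundance(classes, transcripts, n_iter=200):
    """Estimate relative abundances by EM, assuming equal transcript lengths."""
    theta = {t: 1.0 / len(transcripts) for t in transcripts}
    total = sum(classes.values())
    for _ in range(n_iter):
        expected = dict.fromkeys(transcripts, 0.0)
        for ts, count in classes.items():
            weight = sum(theta[t] for t in ts)
            for t in ts:
                # E-step: split the class count in proportion to the current estimates
                expected[t] += count * theta[t] / weight
        # M-step: renormalise expected counts into proportions
        theta = {t: expected[t] / total for t in transcripts}
    return theta

# Toy data: 30 reads unique to A, 10 unique to B, 20 compatible with both.
alignments = [("A",)] * 30 + [("B",)] * 10 + [("A", "B")] * 20
classes = group_reads(alignments)
theta = em_abundance(classes, ["A", "B"])
```

Working on the three read classes instead of the 60 individual reads is what keeps this kind of estimation cheap. The ambiguous reads are shared out according to the evidence from the unique ones.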

  4. An Annotation Agnostic Algorithm for Detecting Nascent RNA Transcripts in GRO-Seq.

    Science.gov (United States)

    Azofeifa, Joseph G; Allen, Mary A; Lladser, Manuel E; Dowell, Robin D

    2017-01-01

    We present a fast and simple algorithm to detect nascent RNA transcription in global nuclear run-on sequencing (GRO-seq). GRO-seq is a relatively new protocol that captures nascent transcripts from actively engaged polymerase, providing a direct read-out on bona fide transcription. Most traditional assays, such as RNA-seq, measure steady state RNA levels which are affected by transcription, post-transcriptional processing, and RNA stability. GRO-seq data, however, presents unique analysis challenges that are only beginning to be addressed. Here, we describe a new algorithm, Fast Read Stitcher (FStitch), that takes advantage of two popular machine-learning techniques, hidden Markov models and logistic regression, to classify which regions of the genome are transcribed. Given a small user-defined training set, our algorithm is accurate, robust to varying read depth, annotation agnostic, and fast. Analysis of GRO-seq data without a priori need for annotation uncovers surprising new insights into several aspects of the transcription process.
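
To illustrate how a hidden Markov model can smooth per-bin classifier output into contiguous transcribed regions, here is a minimal two-state Viterbi sketch. It is not FStitch's code. The function name `viterbi`, the `p_stay` transition parameter, and the use of a per-bin probability of being "active" (as a logistic regression might produce) directly as the emission score are simplifying assumptions.

```python
import math

def viterbi(emission_probs, p_stay=0.9):
    """Most likely 0/1 state path given per-bin P(active), each strictly in (0, 1)."""
    log = math.log
    trans = [[log(p_stay), log(1 - p_stay)], [log(1 - p_stay), log(p_stay)]]
    # Emission log-scores for state 0 (inactive) and state 1 (active)
    emis = [(log(1 - p), log(p)) for p in emission_probs]
    score = [log(0.5) + emis[0][0], log(0.5) + emis[0][1]]
    back = []
    for e in emis[1:]:
        new_score, pointers = [], []
        for s in (0, 1):
            # Best predecessor state for landing in state s at this bin
            prev = max((0, 1), key=lambda q: score[q] + trans[q][s])
            pointers.append(prev)
            new_score.append(score[prev] + trans[prev][s] + e[s])
        score = new_score
        back.append(pointers)
    # Trace back from the best final state
    state = max((0, 1), key=lambda q: score[q])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]

# Hypothetical per-bin probabilities; the fourth bin dips just below 0.5.
bins = [0.1, 0.1, 0.9, 0.45, 0.9, 0.9, 0.1, 0.1]
path = viterbi(bins)
```

Thresholding each bin at 0.5 would split the active region at the fourth bin. The transition penalty lets the model keep it as one region, which is the "stitching" behaviour the tool's name suggests.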

  5. Detection of bacterial small transcripts from RNA-seq data: a comparative assessment.

    Science.gov (United States)

    Peña-Castillo, Lourdes; Grüell, Marc; Mulligan, Martin E; Lang, Andrew S

    2016-01-01

    Small non-coding RNAs (sRNAs) are regulatory RNA molecules that have been identified in a multitude of bacterial species and shown to control numerous cellular processes through various regulatory mechanisms. In the last decade, next generation RNA sequencing (RNA-seq) has been used for the genome-wide detection of bacterial sRNAs. Here we describe sRNA-Detect, a novel approach to identify expressed small transcripts from prokaryotic RNA-seq data. Using RNA-seq data from three bacterial species and two sequencing platforms, we performed a comparative assessment of five computational approaches for the detection of small transcripts. We demonstrate that sRNA-Detect improves upon current standalone computational approaches for identifying novel small transcripts in bacteria.

  6. Comparison of RNA-seq and microarray-based models for clinical endpoint prediction.

    Science.gov (United States)

    Zhang, Wenqian; Yu, Ying; Hertwig, Falk; Thierry-Mieg, Jean; Zhang, Wenwei; Thierry-Mieg, Danielle; Wang, Jian; Furlanello, Cesare; Devanarayan, Viswanath; Cheng, Jie; Deng, Youping; Hero, Barbara; Hong, Huixiao; Jia, Meiwen; Li, Li; Lin, Simon M; Nikolsky, Yuri; Oberthuer, André; Qing, Tao; Su, Zhenqiang; Volland, Ruth; Wang, Charles; Wang, May D; Ai, Junmei; Albanese, Davide; Asgharzadeh, Shahab; Avigad, Smadar; Bao, Wenjun; Bessarabova, Marina; Brilliant, Murray H; Brors, Benedikt; Chierici, Marco; Chu, Tzu-Ming; Zhang, Jibin; Grundy, Richard G; He, Min Max; Hebbring, Scott; Kaufman, Howard L; Lababidi, Samir; Lancashire, Lee J; Li, Yan; Lu, Xin X; Luo, Heng; Ma, Xiwen; Ning, Baitang; Noguera, Rosa; Peifer, Martin; Phan, John H; Roels, Frederik; Rosswog, Carolina; Shao, Susan; Shen, Jie; Theissen, Jessica; Tonini, Gian Paolo; Vandesompele, Jo; Wu, Po-Yen; Xiao, Wenzhong; Xu, Joshua; Xu, Weihong; Xuan, Jiekun; Yang, Yong; Ye, Zhan; Dong, Zirui; Zhang, Ke K; Yin, Ye; Zhao, Chen; Zheng, Yuanting; Wolfinger, Russell D; Shi, Tieliu; Malkas, Linda H; Berthold, Frank; Wang, Jun; Tong, Weida; Shi, Leming; Peng, Zhiyu; Fischer, Matthias

    2015-06-25

    Gene expression profiling is being widely applied in cancer research to identify biomarkers for clinical endpoint prediction. Since RNA-seq provides a powerful tool for transcriptome-based applications beyond the limitations of microarrays, we sought to systematically evaluate the performance of RNA-seq-based and microarray-based classifiers in this MAQC-III/SEQC study for clinical endpoint prediction using neuroblastoma as a model. We generate gene expression profiles from 498 primary neuroblastomas using both RNA-seq and 44 k microarrays. Characterization of the neuroblastoma transcriptome by RNA-seq reveals that more than 48,000 genes and 200,000 transcripts are being expressed in this malignancy. We also find that RNA-seq provides much more detailed information on specific transcript expression patterns in clinico-genetic neuroblastoma subgroups than microarrays. To systematically compare the power of RNA-seq and microarray-based models in predicting clinical endpoints, we divide the cohort randomly into training and validation sets and develop 360 predictive models on six clinical endpoints of varying predictability. Evaluation of factors potentially affecting model performances reveals that prediction accuracies are most strongly influenced by the nature of the clinical endpoint, whereas technological platforms (RNA-seq vs. microarrays), RNA-seq data analysis pipelines, and feature levels (gene vs. transcript vs. exon-junction level) do not significantly affect performances of the models. We demonstrate that RNA-seq outperforms microarrays in determining the transcriptomic characteristics of cancer, while RNA-seq and microarray-based models perform similarly in clinical endpoint prediction. Our findings may be valuable to guide future studies on the development of gene expression-based predictive models and their implementation in clinical practice.

  7. Network-Based Isoform Quantification with RNA-Seq Data for Cancer Transcriptome Analysis.

    Directory of Open Access Journals (Sweden)

    Wei Zhang

    2015-12-01

    High-throughput mRNA sequencing (RNA-Seq) is widely used for transcript quantification of gene isoforms. Since RNA-Seq data alone is often not sufficient to accurately identify the read origins from the isoforms for quantification, we propose to explore protein domain-domain interactions as prior knowledge for integrative analysis with RNA-Seq data. We introduce a Network-based method for RNA-Seq-based Transcript Quantification (Net-RSTQ) to integrate a protein domain-domain interaction network with short read alignments for transcript abundance estimation. Based on our observation that the abundances of the neighboring isoforms by domain-domain interactions in the network are positively correlated, Net-RSTQ models the expression of the neighboring transcripts as Dirichlet priors on the likelihood of the observed read alignments against the transcripts in one gene. The transcript abundances of all the genes are then jointly estimated with alternating optimization of multiple EM problems. In simulation, Net-RSTQ effectively improved isoform transcript quantifications when isoform co-expressions correlate with their interactions. qRT-PCR results on 25 multi-isoform genes in a stem cell line, an ovarian cancer cell line, and a breast cancer cell line also showed that Net-RSTQ estimated more consistent isoform proportions with RNA-Seq data. In the experiments on the RNA-Seq data in The Cancer Genome Atlas (TCGA), the transcript abundances estimated by Net-RSTQ are more informative for patient sample classification of ovarian cancer, breast cancer and lung cancer. All experimental results collectively support that Net-RSTQ is a promising approach for isoform quantification. The Net-RSTQ toolbox is available at http://compbio.cs.umn.edu/Net-RSTQ/.
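
The effect of a Dirichlet prior on isoform proportions can be seen in a few lines. This sketch is not the Net-RSTQ algorithm, which alternates between several EM problems. It only computes the posterior-mean proportions for one gene, given read counts that are already assigned to isoforms. The function name `dirichlet_smoothed_proportions` and the prior weights are hypothetical. In the method described above, the prior would be derived from the expression of neighbouring isoforms in the interaction network.

```python
def dirichlet_smoothed_proportions(read_counts, prior_weights):
    """Posterior-mean isoform proportions under a Dirichlet prior (pseudocounts)."""
    total = sum(read_counts) + sum(prior_weights)
    return [(n + a) / total for n, a in zip(read_counts, prior_weights)]

# Two isoforms of one gene with 8 and 2 assigned reads (toy numbers).
flat_prior = dirichlet_smoothed_proportions([8, 2], [1, 1])
# A prior that favours the second isoform, e.g. because its neighbours are highly expressed.
network_prior = dirichlet_smoothed_proportions([8, 2], [1, 5])
```

With few reads the prior shifts the estimate noticeably. As the counts grow, the data dominate and the prior matters less.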

  8. An empirical strategy to detect bacterial transcript structure from directional RNA-seq transcriptome data.

    Science.gov (United States)

    Wang, Yejun; MacKenzie, Keith D; White, Aaron P

    2015-05-07

    As sequencing costs are being lowered continuously, RNA-seq has gradually been adopted as the first choice for comparative transcriptome studies with bacteria. Unlike microarrays, RNA-seq can directly detect cDNA derived from mRNA transcripts at a single nucleotide resolution. Not only does this allow researchers to determine the absolute expression level of genes, but it also conveys information about transcript structure. Few automatic software tools have yet been established to investigate large-scale RNA-seq data for bacterial transcript structure analysis. In this study, 54 directional RNA-seq libraries from Salmonella serovar Typhimurium (S. Typhimurium) 14028s were examined for potential relationships between read mapping patterns and transcript structure. We developed an empirical method, combined with statistical tests, to automatically detect key transcript features, including transcriptional start sites (TSSs), transcriptional termination sites (TTSs) and operon organization. Using our method, we obtained 2,764 TSSs and 1,467 TTSs for 1,331 and 844 different genes, respectively. Identification of TSSs facilitated further discrimination of 215 putative sigma 38 regulons and 863 potential sigma 70 regulons. Combining the TSSs and TTSs with intergenic distance and co-expression information, we comprehensively annotated the operon organization in S. Typhimurium 14028s. Our results show that directional RNA-seq can be used to detect transcriptional borders at an acceptable resolution of ±10-20 nucleotides. Technical limitations of the RNA-seq procedure may prevent single nucleotide resolution. The automatic transcript border detection methods, statistical models and operon organization pipeline that we have described could be widely applied to RNA-seq studies in other bacteria. Furthermore, the TSSs, TTSs, operons, promoters and untranslated regions that we have defined for S. Typhimurium 14028s may constitute valuable resources that can be used for

  9. Using Poisson mixed-effects model to quantify transcript-level gene expression in RNA-Seq.

    Science.gov (United States)

    Hu, Ming; Zhu, Yu; Taylor, Jeremy M G; Liu, Jun S; Qin, Zhaohui S

    2012-01-01

    RNA sequencing (RNA-Seq) is a powerful new technology for mapping and quantifying transcriptomes using ultra high-throughput next-generation sequencing technologies. Using deep sequencing, gene expression levels of all transcripts including novel ones can be quantified digitally. Although extremely promising, the massive amounts of data generated by RNA-Seq, substantial biases and uncertainty in short read alignment pose challenges for data analysis. In particular, large base-specific variation and between-base dependence make simple approaches, such as those that use averaging to normalize RNA-Seq data and quantify gene expressions, ineffective. In this study, we propose a Poisson mixed-effects (POME) model to characterize base-level read coverage within each transcript. The underlying expression level is included as a key parameter in this model. Since the proposed model is capable of incorporating base-specific variation as well as between-base dependence that affect read coverage profile throughout the transcript, it can lead to improved quantification of the true underlying expression level. POME can be freely downloaded at http://www.stat.purdue.edu/~yuzhu/pome.html. Contact: yuzhu@purdue.edu; zhaohui.qin@emory.edu. Supplementary data are available at Bioinformatics online.

  10. Combining laser microdissection and RNA-seq to chart the transcriptional landscape of fungal development

    Science.gov (United States)

    2012-01-01

    Background: During sexual development, filamentous ascomycetes form complex, three-dimensional fruiting bodies for the protection and dispersal of sexual spores. Fruiting bodies contain a number of cell types not found in vegetative mycelium, and these morphological differences are thought to be mediated by changes in gene expression. However, little is known about the spatial distribution of gene expression in fungal development. Here, we used laser microdissection (LM) and RNA-seq to determine gene expression patterns in young fruiting bodies (protoperithecia) and non-reproductive mycelia of the ascomycete Sordaria macrospora. Results: Quantitative analysis showed major differences in the gene expression patterns between protoperithecia and total mycelium. Among the genes strongly up-regulated in protoperithecia were the pheromone precursor genes ppg1 and ppg2. The up-regulation was confirmed by fluorescence microscopy of egfp expression under the control of ppg1 regulatory sequences. RNA-seq analysis of protoperithecia from the sterile mutant pro1 showed that many genes that are differentially regulated in these structures are under the genetic control of transcription factor PRO1. Conclusions: We have generated transcriptional profiles of young fungal sexual structures using a combination of LM and RNA-seq. This allowed a high spatial resolution and sensitivity, and yielded a detailed picture of gene expression during development. Our data revealed significant differences in gene expression between protoperithecia and non-reproductive mycelia, and showed that the transcription factor PRO1 is involved in the regulation of many genes expressed specifically in sexual structures. The LM/RNA-seq approach will also be relevant to other eukaryotic systems in which multicellular development is investigated. PMID: 23016559

  11. Combining laser microdissection and RNA-seq to chart the transcriptional landscape of fungal development

    Directory of Open Access Journals (Sweden)

    Teichert Ines

    2012-09-01

    Background: During sexual development, filamentous ascomycetes form complex, three-dimensional fruiting bodies for the protection and dispersal of sexual spores. Fruiting bodies contain a number of cell types not found in vegetative mycelium, and these morphological differences are thought to be mediated by changes in gene expression. However, little is known about the spatial distribution of gene expression in fungal development. Here, we used laser microdissection (LM) and RNA-seq to determine gene expression patterns in young fruiting bodies (protoperithecia) and non-reproductive mycelia of the ascomycete Sordaria macrospora. Results: Quantitative analysis showed major differences in the gene expression patterns between protoperithecia and total mycelium. Among the genes strongly up-regulated in protoperithecia were the pheromone precursor genes ppg1 and ppg2. The up-regulation was confirmed by fluorescence microscopy of egfp expression under the control of ppg1 regulatory sequences. RNA-seq analysis of protoperithecia from the sterile mutant pro1 showed that many genes that are differentially regulated in these structures are under the genetic control of transcription factor PRO1. Conclusions: We have generated transcriptional profiles of young fungal sexual structures using a combination of LM and RNA-seq. This allowed a high spatial resolution and sensitivity, and yielded a detailed picture of gene expression during development. Our data revealed significant differences in gene expression between protoperithecia and non-reproductive mycelia, and showed that the transcription factor PRO1 is involved in the regulation of many genes expressed specifically in sexual structures. The LM/RNA-seq approach will also be relevant to other eukaryotic systems in which multicellular development is investigated.

  12. RSEM: accurate transcript quantification from RNA-Seq data with or without a reference genome

    Directory of Open Access Journals (Sweden)

    Dewey Colin N

    2011-08-01

    Background: RNA-Seq is revolutionizing the way transcript abundances are measured. A key challenge in transcript quantification from RNA-Seq data is the handling of reads that map to multiple genes or isoforms. This issue is particularly important for quantification with de novo transcriptome assemblies in the absence of sequenced genomes, as it is difficult to determine which transcripts are isoforms of the same gene. A second significant issue is the design of RNA-Seq experiments, in terms of the number of reads, read length, and whether reads come from one or both ends of cDNA fragments. Results: We present RSEM, a user-friendly software package for quantifying gene and isoform abundances from single-end or paired-end RNA-Seq data. RSEM outputs abundance estimates, 95% credibility intervals, and visualization files and can also simulate RNA-Seq data. In contrast to other existing tools, the software does not require a reference genome. Thus, in combination with a de novo transcriptome assembler, RSEM enables accurate transcript quantification for species without sequenced genomes. On simulated and real data sets, RSEM has superior or comparable performance to quantification methods that rely on a reference genome. Taking advantage of RSEM's ability to effectively use ambiguously-mapping reads, we show that accurate gene-level abundance estimates are best obtained with large numbers of short single-end reads. On the other hand, estimates of the relative frequencies of isoforms within single genes may be improved through the use of paired-end reads, depending on the number of possible splice forms for each gene. Conclusions: RSEM is an accurate and user-friendly software tool for quantifying transcript abundances from RNA-Seq data. As it does not rely on the existence of a reference genome, it is particularly useful for quantification with de novo transcriptome assemblies. In addition, RSEM has enabled valuable guidance for cost

  13. Characterization and Improvement of RNA-Seq Precision in Quantitative Transcript Expression Profiling

    Energy Technology Data Exchange (ETDEWEB)

    Labaj, Pawel P.; Leparc, German G.; Linggi, Bryan E.; Markillie, Lye Meng; Wiley, H. S.; Kreil, David P.

    2011-07-01

    Measurement precision determines the power of any analysis to reliably identify significant signals, such as in screens for differential expression, independent of whether the experimental design incorporates replicates or not. With the compilation of large scale RNA-Seq data sets with technical replicate samples, however, we can now, for the first time, perform a systematic analysis of the precision of expression level estimates from massively parallel sequencing technology. This then allows considerations for its improvement by computational or experimental means. Results: We report on a comprehensive study of target coverage and measurement precision, including their dependence on transcript expression levels, read depth and other parameters. In particular, an impressive target coverage of 84% of the estimated true transcript population could be achieved with 331 million 50 bp reads, with diminishing returns from longer read lengths and even smaller gains from increased sequencing depths. Most of the measurement power (75%) is spent on only 7% of the known transcriptome, however, making less strongly expressed transcripts harder to measure. Consequently, less than 30% of all transcripts could be quantified reliably with a relative error < 20%. Based on established tools, we then introduce a new approach for mapping and analyzing sequencing reads that yields substantially improved performance in gene expression profiling, increasing the number of transcripts that can reliably be quantified to over 40%. Extrapolations to higher sequencing depths highlight the need for efficient complementary steps. In the discussion, we outline possible experimental and computational strategies for further improvements in quantification precision.
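
A criterion such as "quantified reliably with a relative error < 20%" can be made concrete with technical replicates. The sketch below uses the coefficient of variation across replicates as a stand-in for relative error. That choice is an assumption, since the abstract does not define the measure the authors used, and the function names and toy expression values are hypothetical.

```python
import statistics

def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean (infinite if the mean is zero)."""
    mean = statistics.mean(values)
    return statistics.stdev(values) / mean if mean > 0 else float("inf")

def fraction_reliable(replicate_table, max_cv=0.20):
    """Share of transcripts whose replicate-to-replicate CV is below max_cv."""
    cvs = [coefficient_of_variation(v) for v in replicate_table.values()]
    return sum(cv < max_cv for cv in cvs) / len(cvs)

# Toy expression estimates from three technical replicates per transcript.
table = {
    "tx1": [100, 104, 98],   # highly expressed, stable
    "tx2": [5, 9, 2],        # weakly expressed, noisy
    "tx3": [50, 49, 52],     # moderately expressed, stable
    "tx4": [0, 3, 1],        # near the detection limit
}
reliable = fraction_reliable(table)
```

In this toy table only the two well-expressed transcripts pass the threshold, which mirrors the abstract's point that weakly expressed transcripts are the hard ones to measure precisely.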

  14. RNA-Seq for gene identification and transcript profiling of three Stevia rebaudiana genotypes.

    Science.gov (United States)

    Chen, Junwen; Hou, Kai; Qin, Peng; Liu, Hongchang; Yi, Bin; Yang, Wenting; Wu, Wei

    2014-07-07

    Stevia (Stevia rebaudiana) is an important medicinal plant that yields diterpenoid steviol glycosides (SGs). SGs are currently used in the preparation of medicines, food products and nutraceuticals because of their sweetening properties (zero calories and about 300 times sweeter than sugar). Recently, some progress has been made in understanding the biosynthesis of SGs in Stevia, but little is known about the molecular mechanisms underlying this process. Additionally, the genomics of Stevia, a non-model species, remains uncharacterized. The recent advent of RNA-Seq, a next generation sequencing technology, provides an opportunity to expand the identification of Stevia genes through in-depth transcript profiling. We present a comprehensive landscape of the transcriptome profiles of three genotypes of Stevia with divergent SG compositions characterized using RNA-seq. 191,590,282 high-quality reads were generated and then assembled into 171,837 transcripts with an average sequence length of 969 base pairs. A total of 80,160 unigenes were annotated, and 14,211 of the unique sequences were assigned to specific metabolic pathways by the Kyoto Encyclopedia of Genes and Genomes. Gene sequences of all enzymes known to be involved in SG synthesis were examined. A total of 143 UDP-glucosyltransferase (UGT) unigenes were identified, some of which might be involved in SG biosynthesis. The expression patterns of eight of these genes were further confirmed by RT-QPCR. RNA-seq analysis identified candidate genes encoding enzymes responsible for the biosynthesis of SGs in Stevia, a non-model plant without a reference genome. The transcriptome data from this study yielded new insights into the process of SG accumulation in Stevia. Our results demonstrate that RNA-Seq can be successfully used for gene identification and transcript profiling in a non-model species.

  15. Predicting gene regulatory networks of soybean nodulation from RNA-Seq transcriptome data.

    Science.gov (United States)

    Zhu, Mingzhu; Dahmen, Jeremy L; Stacey, Gary; Cheng, Jianlin

    2013-09-22

    High-throughput RNA sequencing (RNA-Seq) is a revolutionary technique to study the transcriptome of a cell under various conditions at a systems level. Despite the wide application of RNA-Seq techniques to generate experimental data in the last few years, few computational methods are available to analyze this huge amount of transcription data. The computational methods for constructing gene regulatory networks from RNA-Seq expression data of hundreds or even thousands of genes are particularly lacking and urgently needed. We developed an automated bioinformatics method to predict gene regulatory networks from the quantitative expression values of differentially expressed genes based on RNA-Seq transcriptome data of a cell in different stages and conditions, integrating transcriptional, genomic and gene function data. We applied the method to the RNA-Seq transcriptome data generated for soybean root hair cells in three different development stages of nodulation after rhizobium infection. The method predicted a soybean nodulation-related gene regulatory network consisting of 10 regulatory modules common to all three stages, and 24, 49 and 70 modules separately for the first, second and third stage, each containing both a group of co-expressed genes and several transcription factors collaboratively controlling their expression under different conditions. Eight of the 10 common regulatory modules were validated by at least two kinds of validations, such as independent DNA binding motif analysis, gene function enrichment test, and previous experimental data in the literature. We developed a computational method to reliably reconstruct gene regulatory networks from RNA-Seq transcriptome data. The method can generate valuable hypotheses for interpreting biological data and designing biological experiments such as ChIP-Seq, RNA interference, and yeast two-hybrid experiments.

  16. CEL-Seq: Single-Cell RNA-Seq by Multiplexed Linear Amplification

    Directory of Open Access Journals (Sweden)

    Tamar Hashimshony

    2012-09-01

    High-throughput sequencing has allowed for unprecedented detail in gene expression analyses, yet its efficient application to single cells is challenged by the small starting amounts of RNA. We have developed CEL-Seq, a method for overcoming this limitation by barcoding and pooling samples before linearly amplifying mRNA with the use of one round of in vitro transcription. We show that CEL-Seq gives more reproducible, linear, and sensitive results than a PCR-based amplification method. We demonstrate the power of this method by studying early C. elegans embryonic development at single-cell resolution. Differential distribution of transcripts between sister cells is seen as early as the two-cell stage embryo, and zygotic expression in the somatic cell lineages is enriched for transcription factors. The robust transcriptome quantifications enabled by CEL-Seq will be useful for transcriptomic analyses of complex tissues containing populations of diverse cell types.

  17. Characterizing and annotating the genome using RNA-seq data.

    Science.gov (United States)

    Chen, Geng; Shi, Tieliu; Shi, Leming

    2017-02-01

    Bioinformatics methods for various RNA-seq data analyses are evolving rapidly with the improvement of sequencing technologies. However, many challenges still exist in how to efficiently process the RNA-seq data to obtain accurate and comprehensive results. Here we review the strategies for improving diverse transcriptomic studies and the annotation of genetic variants based on RNA-seq data. Mapping RNA-seq reads to the genome and to the transcriptome represent two distinct methods for quantifying the expression of genes/transcripts. Besides the known genes annotated in current databases, many novel genes/transcripts (especially long noncoding RNAs) can still be identified on the reference genome using RNA-seq. Moreover, owing to the incompleteness of current reference genomes, some novel genes are missing from them. Genome-guided and de novo transcriptome reconstruction are two effective and complementary strategies for identifying those novel genes/transcripts on or beyond the reference genome. In addition, integrating the genes of distinct databases to conduct transcriptomics and genetics studies can improve the results of corresponding analyses.

  18. AtRTD2: A Reference Transcript Dataset for accurate quantification of alternative splicing and expression changes in Arabidopsis thaliana RNA-seq data

    KAUST Repository

    Zhang, Runxuan

    2016-05-06

    Background Alternative splicing is the major post-transcriptional mechanism by which gene expression is regulated and affects a wide range of processes and responses in most eukaryotic organisms. RNA-sequencing (RNA-seq) can generate genome-wide quantification of individual transcript isoforms to identify changes in expression and alternative splicing. RNA-seq is an essential modern tool but its ability to accurately quantify transcript isoforms depends on the diversity, completeness and quality of the transcript information. Results We have developed a new Reference Transcript Dataset for Arabidopsis (AtRTD2) for RNA-seq analysis containing over 82k non-redundant transcripts, whereby 74,194 transcripts originate from 27,667 protein-coding genes. A total of 13,524 protein-coding genes have at least one alternatively spliced transcript in AtRTD2 such that about 60% of the 22,453 protein-coding, intron-containing genes in Arabidopsis undergo alternative splicing. More than 600 putative U12 introns were identified in more than 2,000 transcripts. AtRTD2 was generated from transcript assemblies of ca. 8.5 billion pairs of reads from 285 RNA-seq data sets obtained from 129 RNA-seq libraries and merged along with the previous version, AtRTD, and Araport11 transcript assemblies. AtRTD2 increases the diversity of transcripts and through application of stringent filters represents the most extensive and accurate transcript collection for Arabidopsis to date. We have demonstrated a generally good correlation of alternative splicing ratios from RNA-seq data analysed by Salmon and experimental data from high resolution RT-PCR. However, we have observed inaccurate quantification of transcript isoforms for genes with multiple transcripts which have variation in the lengths of their UTRs. This variation is not effectively corrected in RNA-seq analysis programmes and will therefore impact RNA-seq analyses generally. To address this, we have tested different genome

  19. RAP: RNA-Seq Analysis Pipeline, a new cloud-based NGS web application.

    Science.gov (United States)

    D'Antonio, Mattia; D'Onorio De Meo, Paolo; Pallocca, Matteo; Picardi, Ernesto; D'Erchia, Anna Maria; Calogero, Raffaele A; Castrignanò, Tiziana; Pesole, Graziano

    2015-01-01

    The study of RNA has been dramatically improved by the introduction of Next Generation Sequencing platforms allowing massive and cheap sequencing of selected RNA fractions, also providing information on strand orientation (RNA-Seq). The complexity of transcriptomes and of their regulatory pathways makes RNA-Seq one of the most complex fields of NGS applications, addressing several aspects of the expression process (e.g. identification and quantification of expressed genes and transcripts, alternative splicing and polyadenylation, fusion genes and trans-splicing, post-transcriptional events, etc.). In order to provide researchers with an effective and friendly resource for analyzing RNA-Seq data, we present here RAP (RNA-Seq Analysis Pipeline), a cloud computing web application implementing a complete but modular analysis workflow. This pipeline integrates both state-of-the-art bioinformatics tools for RNA-Seq analysis and in-house developed scripts to offer the user a comprehensive strategy for data analysis. RAP is able to perform quality checks (adopting FastQC and NGS QC Toolkit), identify and quantify expressed genes and transcripts (with Tophat, Cufflinks and HTSeq), detect alternative splicing events (using SpliceTrap) and chimeric transcripts (with ChimeraScan). This pipeline is also able to identify splicing junctions and constitutive or alternative polyadenylation sites (implementing custom analysis modules) and call statistically significant differences in gene and transcript expression, splicing pattern and polyadenylation site usage (using Cuffdiff2 and DESeq). Through a user friendly web interface, the RAP workflow can be suitably customized by the user and it is automatically executed on our cloud computing environment. This strategy gives access to bioinformatics tools and computational resources without requiring specific bioinformatics and IT skills. RAP provides a set of tabular and graphical results that can be helpful to browse, filter and export

  20. Nascent-Seq reveals novel features of mouse circadian transcriptional regulation

    Science.gov (United States)

    Menet, Jerome S; Rodriguez, Joseph; Abruzzi, Katharine C; Rosbash, Michael

    2012-01-01

    A substantial fraction of the metazoan transcriptome undergoes circadian oscillations in many cells and tissues. Based on the transcription feedback loops important for circadian timekeeping, it is commonly assumed that this mRNA cycling reflects widespread transcriptional regulation. To address this issue, we directly measured the circadian dynamics of mouse liver transcription using Nascent-Seq (genome-wide sequencing of nascent RNA). Although many genes are rhythmically transcribed, many rhythmic mRNAs manifest poor transcriptional rhythms, indicating a prominent contribution of post-transcriptional regulation to circadian mRNA expression. This analysis of rhythmic transcription also showed that the rhythmic DNA binding profile of the transcription factors CLOCK and BMAL1 does not determine the transcriptional phase of most target genes. This likely reflects gene-specific collaborations of CLK:BMAL1 with other transcription factors. These insights from Nascent-Seq indicate that it should have broad applicability to many other gene expression regulatory issues. DOI: http://dx.doi.org/10.7554/eLife.00011.001 PMID:23150795

  1. ChIPBase: a database for decoding the transcriptional regulation of long non-coding RNA and microRNA genes from ChIP-Seq data.

    Science.gov (United States)

    Yang, Jian-Hua; Li, Jun-Hao; Jiang, Shan; Zhou, Hui; Qu, Liang-Hu

    2013-01-01

    Long non-coding RNAs (lncRNAs) and microRNAs (miRNAs) represent two classes of important non-coding RNAs in eukaryotes. Although these non-coding RNAs have been implicated in organismal development and in various human diseases, surprisingly little is known about their transcriptional regulation. Recent advances in chromatin immunoprecipitation with next-generation DNA sequencing (ChIP-Seq) have provided methods of detecting transcription factor binding sites (TFBSs) with unprecedented sensitivity. In this study, we describe ChIPBase (http://deepbase.sysu.edu.cn/chipbase/), a novel database that we have developed to facilitate the comprehensive annotation and discovery of transcription factor binding maps and transcriptional regulatory relationships of lncRNAs and miRNAs from ChIP-Seq data. The current release of ChIPBase includes high-throughput sequencing data that were generated by 543 ChIP-Seq experiments in diverse tissues and cell lines from six organisms. By analysing millions of TFBSs, we identified tens of thousands of TF-lncRNA and TF-miRNA regulatory relationships. Furthermore, two web-based servers were developed to annotate and discover transcriptional regulatory relationships of lncRNAs and miRNAs from ChIP-Seq data. In addition, we developed two genome browsers, deepView and genomeView, to provide integrated views of multidimensional data. Moreover, our web implementation supports diverse query types and the exploration of TFs, lncRNAs, miRNAs, gene ontologies and pathways.

  2. RNA-Seq of Bacillus licheniformis: active regulatory RNA features expressed within a productive fermentation

    Science.gov (United States)

    2013-01-01

    Background: The production of enzymes by an industrial strain requires a complex adaption of the bacterial metabolism to the conditions within the fermenter. Regulatory events within the process result in a dynamic change of the transcriptional activity of the genome. This complex network of genes is orchestrated by proteins as well as regulatory RNA elements. Here we present an RNA-Seq based study considering selected phases of an industry-oriented fermentation of Bacillus licheniformis. Results: A detailed analysis of 20 strand-specific RNA-Seq datasets revealed a multitude of transcriptionally active genomic regions. 3314 RNA features encoded by such active loci have been identified and sorted into ten functional classes. The identified sequences include the expected RNA features like housekeeping sRNAs, metabolic riboswitches and RNA switches well known from studies on Bacillus subtilis as well as a multitude of completely new candidates for regulatory RNAs. An unexpectedly high number of 855 RNA features are encoded antisense to annotated protein and RNA genes, in addition to 461 independently transcribed small RNAs. These antisense transcripts contain molecules with a remarkable size range variation from 38 to 6348 base pairs in length. The genome of the type strain B. licheniformis DSM13 was completely reannotated using data obtained from RNA-Seq analyses and from public databases. Conclusion: The hereby generated data-sets represent a solid amount of knowledge on the dynamic transcriptional activities during the investigated fermentation stages. The identified regulatory elements enable research on the understanding and the optimization of crucial metabolic activities during a productive fermentation of Bacillus licheniformis strains. PMID:24079885

  3. RNA-Seq of Bacillus licheniformis: active regulatory RNA features expressed within a productive fermentation.

    Science.gov (United States)

    Wiegand, Sandra; Dietrich, Sascha; Hertel, Robert; Bongaerts, Johannes; Evers, Stefan; Volland, Sonja; Daniel, Rolf; Liesegang, Heiko

    2013-10-01

    The production of enzymes by an industrial strain requires a complex adaption of the bacterial metabolism to the conditions within the fermenter. Regulatory events within the process result in a dynamic change of the transcriptional activity of the genome. This complex network of genes is orchestrated by proteins as well as regulatory RNA elements. Here we present an RNA-Seq based study considering selected phases of an industry-oriented fermentation of Bacillus licheniformis. A detailed analysis of 20 strand-specific RNA-Seq datasets revealed a multitude of transcriptionally active genomic regions. 3314 RNA features encoded by such active loci have been identified and sorted into ten functional classes. The identified sequences include the expected RNA features like housekeeping sRNAs, metabolic riboswitches and RNA switches well known from studies on Bacillus subtilis as well as a multitude of completely new candidates for regulatory RNAs. An unexpectedly high number of 855 RNA features are encoded antisense to annotated protein and RNA genes, in addition to 461 independently transcribed small RNAs. These antisense transcripts contain molecules with a remarkable size range variation from 38 to 6348 base pairs in length. The genome of the type strain B. licheniformis DSM13 was completely reannotated using data obtained from RNA-Seq analyses and from public databases. The hereby generated data-sets represent a solid amount of knowledge on the dynamic transcriptional activities during the investigated fermentation stages. The identified regulatory elements enable research on the understanding and the optimization of crucial metabolic activities during a productive fermentation of Bacillus licheniformis strains.

  4. Annotating and quantifying pri-miRNA transcripts using RNA-Seq data of wild type and serrate-1 globular stage embryos of Arabidopsis thaliana

    Directory of Open Access Journals (Sweden)

    Daniel Lepe-Soltero

    2017-12-01

    The genome annotation for the model plant Arabidopsis thaliana does not include the primary transcripts from which MIRNAs are processed. Here we present and analyze the raw mRNA sequencing data from wild type and serrate-1 globular stage embryos of A. thaliana, ecotype Columbia. Because SERRATE is required for pri-miRNA processing, these precursors accumulate in serrate-1 mutants, facilitating their detection using standard RNA-Seq protocols. We first use the mapping of the RNA-Seq reads to the reference genome to annotate the potential primary transcripts of MIRNAs expressed in the embryo. We then quantify these pri-miRNAs in wild type and serrate-1 mutants. Finally, we use differential expression analysis to determine which are up-regulated in serrate-1 compared to wild type, to select the best candidates for bona fide pri-miRNAs expressed in the globular stage embryos. In addition, we analyze a previously published RNA-Seq dataset of wild type and dicer-like 1 mutant embryos at the globular stage [1]. Our data are interpreted and discussed in a separate article [2].

  5. Annotating and quantifying pri-miRNA transcripts using RNA-Seq data of wild type and serrate-1 globular stage embryos of Arabidopsis thaliana.

    Science.gov (United States)

    Lepe-Soltero, Daniel; Armenta-Medina, Alma; Xiang, Daoquan; Datla, Raju; Gillmor, C Stewart; Abreu-Goodger, Cei

    2017-12-01

    The genome annotation for the model plant Arabidopsis thaliana does not include the primary transcripts from which MIRNAs are processed. Here we present and analyze the raw mRNA sequencing data from wild type and serrate-1 globular stage embryos of A. thaliana, ecotype Columbia. Because SERRATE is required for pri-miRNA processing, these precursors accumulate in serrate-1 mutants, facilitating their detection using standard RNA-Seq protocols. We first use the mapping of the RNA-Seq reads to the reference genome to annotate the potential primary transcripts of MIRNAs expressed in the embryo. We then quantify these pri-miRNAs in wild type and serrate-1 mutants. Finally, we use differential expression analysis to determine which are up-regulated in serrate-1 compared to wild type, to select the best candidates for bona fide pri-miRNAs expressed in the globular stage embryos. In addition, we analyze a previously published RNA-Seq dataset of wild type and dicer-like 1 mutant embryos at the globular stage [1]. Our data are interpreted and discussed in a separate article [2].

  6. ORMAN: optimal resolution of ambiguous RNA-Seq multimappings in the presence of novel isoforms.

    Science.gov (United States)

    Dao, Phuong; Numanagić, Ibrahim; Lin, Yen-Yi; Hach, Faraz; Karakoc, Emre; Donmez, Nilgun; Collins, Colin; Eichler, Evan E; Sahinalp, S Cenk

    2014-03-01

    RNA-Seq technology is promising to uncover many novel alternative splicing events, gene fusions and other variations in RNA transcripts. For an accurate detection and quantification of transcripts, it is important to resolve the mapping ambiguity for those RNA-Seq reads that can be mapped to multiple loci: >17% of the reads from mouse RNA-Seq data and 50% of the reads from some plant RNA-Seq data have multiple mapping loci. In this study, we show how to resolve the mapping ambiguity in the presence of novel transcriptomic events such as exon skipping and novel indels towards accurate downstream analysis. We introduce ORMAN (Optimal Resolution of Multimapping Ambiguity of RNA-Seq Reads), which aims to compute the minimum number of potential transcript products for each gene and to assign each multimapping read to one of these transcripts based on the estimated distribution of the region covering the read. ORMAN achieves this objective through a combinatorial optimization formulation, which is solved through well-known approximation algorithms, integer linear programs and heuristics. On a simulated RNA-Seq dataset including a random subset of transcripts from the UCSC database, the performance of several state-of-the-art methods for identifying and quantifying novel transcripts, such as Cufflinks, IsoLasso and CLIIQ, is significantly improved through the use of ORMAN. Furthermore, in an experiment using real RNA-Seq reads, we show that ORMAN is able to resolve multimapping to produce coverage values that are similar to the original distribution, even in genes with highly non-uniform coverage. ORMAN is available at http://orman.sf.net
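
    The objective described above (explain all reads with as few transcripts as possible, then give each multimapping read a single home) has the shape of a set-cover problem. The following is a minimal sketch using the classic greedy set-cover heuristic. It is not ORMAN's own formulation, which the abstract only describes as a mix of approximation algorithms, integer linear programs and heuristics, and it ignores the coverage-distribution criterion entirely. The function names and the toy read and transcript identifiers are hypothetical.

```python
def greedy_transcript_cover(read_to_transcripts):
    """Greedy set cover: pick few transcripts so that every read is
    compatible with at least one chosen transcript."""
    # Invert the mapping: transcript -> set of reads it can explain.
    transcript_to_reads = {}
    for read, transcripts in read_to_transcripts.items():
        for t in transcripts:
            transcript_to_reads.setdefault(t, set()).add(read)

    uncovered = set(read_to_transcripts)
    chosen = []
    while uncovered:
        # Sorting makes tie-breaking deterministic (alphabetical order).
        best = max(
            sorted(transcript_to_reads),
            key=lambda t: len(transcript_to_reads[t] & uncovered),
            default=None,
        )
        gained = transcript_to_reads[best] & uncovered if best is not None else set()
        if not gained:
            raise ValueError("some reads are compatible with no transcript")
        chosen.append(best)
        uncovered -= gained
    return chosen


def assign_reads(read_to_transcripts, chosen):
    """Assign each read to the first chosen transcript it is compatible with."""
    return {
        read: next(t for t in chosen if t in transcripts)
        for read, transcripts in read_to_transcripts.items()
    }


# Toy input: each read lists the transcripts it maps to.
toy_reads = {
    "r1": {"A", "B"},
    "r2": {"B"},
    "r3": {"B", "C"},
    "r4": {"C"},
}
toy_chosen = greedy_transcript_cover(toy_reads)
toy_assignment = assign_reads(toy_reads, toy_chosen)
```

    Greedy selection gives a logarithmic approximation guarantee for set cover in general; an exact minimum would need an integer program, which is one of the solution routes the abstract mentions.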

  7. PRAPI: post-transcriptional regulation analysis pipeline for Iso-Seq.

    Science.gov (United States)

    Gao, Yubang; Wang, Huiyuan; Zhang, Hangxiao; Wang, Yongsheng; Chen, Jinfeng; Gu, Lianfeng

    2018-05-01

    The single-molecule real-time (SMRT) isoform sequencing (Iso-Seq) based on the Pacific Biosciences (PacBio) platform has received increasing attention for its ability to explore full-length isoforms. Thus, comprehensive tools for Iso-Seq bioinformatics analysis are extremely useful. Here, we present a one-stop solution for Iso-Seq analysis, called PRAPI, to analyze alternative transcription initiation (ATI), alternative splicing (AS), alternative cleavage and polyadenylation (APA), natural antisense transcripts (NAT), and circular RNAs (circRNAs) comprehensively. PRAPI is capable of combining Iso-Seq full-length isoforms with short read data, such as RNA-Seq or polyadenylation site sequencing (PAS-seq) for differential expression analysis of NAT, AS, APA and circRNAs. Furthermore, PRAPI can annotate new genes and correct mis-annotated genes when gene annotation is available. Finally, PRAPI generates high-quality vector graphics to visualize and highlight the Iso-Seq results. The Dockerfile of PRAPI is available at http://www.bioinfor.org/tool/PRAPI.

  8. Determination of in vivo RNA kinetics using RATE-seq.

    Science.gov (United States)

    Neymotin, Benjamin; Athanasiadou, Rodoniki; Gresham, David

    2014-10-01

    The abundance of a transcript is determined by its rate of synthesis and its rate of degradation; however, global methods for quantifying RNA abundance cannot distinguish variation in these two processes. Here, we introduce RNA approach to equilibrium sequencing (RATE-seq), which uses in vivo metabolic labeling of RNA and approach to equilibrium kinetics, to determine absolute RNA degradation and synthesis rates. RATE-seq does not disturb cellular physiology, uses straightforward normalization with exogenous spike-ins, and can be readily adapted for studies in most organisms. We demonstrate the use of RATE-seq to estimate genome-wide kinetic parameters for coding and noncoding transcripts in Saccharomyces cerevisiae. © 2014 Neymotin et al.; Published by Cold Spring Harbor Laboratory Press for the RNA Society.
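
    The abstract does not give the kinetic equations, so the following is only a sketch under a textbook assumption: after label is added, the labeled fraction of a transcript approaches equilibrium as f(t) = 1 - exp(-k t), where k is a single first-order rate constant. In a real experiment k may lump together degradation and dilution by growth, and the spike-in normalization mentioned above is not modeled here. The function names and the example time points are hypothetical.

```python
import math


def estimate_rate_constant(times, labeled_fractions):
    """Estimate k in f(t) = 1 - exp(-k t) by least squares through the
    origin on the linearized form  -ln(1 - f) = k * t."""
    num = 0.0
    den = 0.0
    for t, f in zip(times, labeled_fractions):
        if not 0.0 <= f < 1.0:
            raise ValueError("labeled fractions must lie in [0, 1)")
        y = -math.log(1.0 - f)
        num += t * y
        den += t * t
    if den == 0.0:
        raise ValueError("need at least one non-zero time point")
    return num / den


def half_life(k):
    """Half-life implied by a first-order rate constant k."""
    return math.log(2.0) / k


# Noise-free toy data generated with k = 0.1 per minute (assumed units).
toy_times = [5.0, 10.0, 20.0, 40.0]
toy_fractions = [1.0 - math.exp(-0.1 * t) for t in toy_times]
toy_k = estimate_rate_constant(toy_times, toy_fractions)
```

    The linearized fit weights late time points heavily and is sensitive to noise when f is close to 1; a nonlinear fit on the original scale would be the more careful choice for real data.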

  9. A statistical method for the detection of alternative splicing using RNA-seq.

    Directory of Open Access Journals (Sweden)

    Liguo Wang

    2010-01-01

    Deep sequencing of the transcriptome (RNA-seq) provides an unprecedented opportunity to interrogate plausible mRNA splicing patterns by mapping RNA-seq reads to exon junctions (hereafter junction reads). In most previous studies, exon junctions were detected by using the quantitative information of junction reads. The quantitative criterion (e.g. a minimum of two junction reads), although straightforward and widely used, usually results in high false positive and false negative rates, owing to the complexity of the transcriptome. Here, we introduced a new metric, namely Minimal Match on Either Side of exon junction (MMES), to measure the quality of each junction read, and subsequently implemented an empirical statistical model to detect exon junctions. When applied to a large dataset (>200M reads) consisting of mouse brain, liver and muscle mRNA sequences, and using independent transcript databases as positive control, our method proved to be considerably more accurate than previous ones, especially for detecting junctions originating from low-abundance transcripts. Our results were also confirmed by real time RT-PCR assay. The MMES metric can be used either in this empirical statistical model or in other more sophisticated classifiers, such as logistic regression.

  10. ToNER: A tool for identifying nucleotide enrichment signals in feature-enriched RNA-seq data.

    Directory of Open Access Journals (Sweden)

    Yuttachon Promworn

    Biochemical methods are available for enriching 5' ends of RNAs in prokaryotes, which are employed in the differential RNA-seq (dRNA-seq) and the more recent Cappable-seq protocols. Computational methods are needed to locate RNA 5' ends from these data by statistical analysis of the enrichment. Although statistical-based analysis methods have been developed for dRNA-seq, they may not be suitable for Cappable-seq data. The more efficient enrichment method employed in Cappable-seq compared with dRNA-seq could affect data distribution and thus algorithm performance. We present Transformation of Nucleotide Enrichment Ratios (ToNER), a tool for statistical modeling of enrichment from RNA-seq data obtained from enriched and unenriched libraries. The tool calculates nucleotide enrichment scores and determines the global transformation for fitting to the normal distribution using the Box-Cox procedure. From the transformed distribution, sites of significant enrichment are identified. To increase power of detection, meta-analysis across experimental replicates is offered. We tested the tool on Cappable-seq and dRNA-seq data for identifying Escherichia coli transcript 5' ends and compared the results with those from the TSSAR tool, which is designed for analyzing dRNA-seq data. When combining results across Cappable-seq replicates, ToNER detects more known transcript 5' ends than TSSAR. In general, the transcript 5' ends detected by ToNER but not TSSAR occur in regions which cannot be locally modeled by TSSAR. ToNER uses a simple yet robust statistical modeling approach, which can be used for detecting RNA 5' ends from Cappable-seq data, in particular when combining information from experimental replicates. The ToNER tool could potentially be applied for analyzing other RNA-seq datasets in which enrichment for other structural features of RNA is employed. The program is freely available for download at ToNER webpage (http://www4a

  11. ToNER: A tool for identifying nucleotide enrichment signals in feature-enriched RNA-seq data.

    Science.gov (United States)

    Promworn, Yuttachon; Kaewprommal, Pavita; Shaw, Philip J; Intarapanich, Apichart; Tongsima, Sissades; Piriyapongsa, Jittima

    2017-01-01

    Biochemical methods are available for enriching 5' ends of RNAs in prokaryotes, which are employed in the differential RNA-seq (dRNA-seq) and the more recent Cappable-seq protocols. Computational methods are needed to locate RNA 5' ends from these data by statistical analysis of the enrichment. Although statistical-based analysis methods have been developed for dRNA-seq, they may not be suitable for Cappable-seq data. The more efficient enrichment method employed in Cappable-seq compared with dRNA-seq could affect data distribution and thus algorithm performance. We present Transformation of Nucleotide Enrichment Ratios (ToNER), a tool for statistical modeling of enrichment from RNA-seq data obtained from enriched and unenriched libraries. The tool calculates nucleotide enrichment scores and determines the global transformation for fitting to the normal distribution using the Box-Cox procedure. From the transformed distribution, sites of significant enrichment are identified. To increase power of detection, meta-analysis across experimental replicates is offered. We tested the tool on Cappable-seq and dRNA-seq data for identifying Escherichia coli transcript 5' ends and compared the results with those from the TSSAR tool, which is designed for analyzing dRNA-seq data. When combining results across Cappable-seq replicates, ToNER detects more known transcript 5' ends than TSSAR. In general, the transcript 5' ends detected by ToNER but not TSSAR occur in regions which cannot be locally modeled by TSSAR. ToNER uses a simple yet robust statistical modeling approach, which can be used for detecting RNA 5'ends from Cappable-seq data, in particular when combining information from experimental replicates. The ToNER tool could potentially be applied for analyzing other RNA-seq datasets in which enrichment for other structural features of RNA is employed. The program is freely available for download at ToNER webpage (http://www4a.biotec.or.th/GI/tools/toner) and Git
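
    The pipeline outlined above (per-nucleotide enrichment scores, a global Box-Cox transformation toward normality, then selection of significantly enriched sites) can be sketched with the standard library alone. This is an illustrative reconstruction, not ToNER's code: the pseudocount ratio used as the enrichment score, the grid search over the Box-Cox parameter, and all function names are assumptions, and the meta-analysis across replicates is not covered.

```python
import math


def box_cox(x, lam):
    """Box-Cox transform of a positive value x with parameter lam."""
    if abs(lam) < 1e-12:
        return math.log(x)
    return (x ** lam - 1.0) / lam


def box_cox_loglik(values, lam):
    """Profile log-likelihood of lam under a normal model for the transformed data."""
    n = len(values)
    y = [box_cox(v, lam) for v in values]
    mean = sum(y) / n
    var = sum((yi - mean) ** 2 for yi in y) / n
    if var <= 0.0:
        raise ValueError("values must not all be identical")
    return -0.5 * n * math.log(var) + (lam - 1.0) * sum(math.log(v) for v in values)


def enrichment_z_scores(enriched, unenriched, pseudocount=1.0):
    """Return (lam, z_scores) for per-position read counts from an enriched
    and an unenriched library. The pseudocount ratio is an assumed score."""
    ratios = [(e + pseudocount) / (u + pseudocount)
              for e, u in zip(enriched, unenriched)]
    # Choose lam by maximum likelihood on a coarse grid from -2 to 2.
    grid = [i / 100.0 for i in range(-200, 201)]
    lam = max(grid, key=lambda l: box_cox_loglik(ratios, l))
    y = [box_cox(r, lam) for r in ratios]
    mean = sum(y) / len(y)
    sd = math.sqrt(sum((yi - mean) ** 2 for yi in y) / (len(y) - 1))
    return lam, [(yi - mean) / sd for yi in y]


def significant_sites(z_scores, z_cutoff=3.0):
    """Indices whose transformed enrichment exceeds the z-score cutoff."""
    return [i for i, z in enumerate(z_scores) if z > z_cutoff]


# Toy counts: position 7 is strongly enriched, the rest are background.
toy_enriched = [12, 9, 11, 10, 8, 13, 10, 500, 9, 11, 12, 10]
toy_unenriched = [10, 11, 9, 10, 12, 9, 11, 5, 10, 10, 9, 12]
toy_lam, toy_z = enrichment_z_scores(toy_enriched, toy_unenriched)
```

    Because the Box-Cox transform is monotone, the ranking of sites by enrichment is preserved; what the transform changes is how extreme a given site looks relative to the fitted normal background, so the cutoff should be chosen with that in mind.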

  12. Integrated RNA-Seq and sRNA-Seq Analysis Identifies Chilling and Freezing Responsive Key Molecular Players and Pathways in Tea Plant (Camellia sinensis)

    Science.gov (United States)

    Zheng, Chao; Zhao, Lei; Wang, Yu; Shen, Jiazhi; Zhang, Yinfei; Jia, Sisi; Li, Yusheng; Ding, Zhaotang

    2015-01-01

    Tea [Camellia sinensis (L) O. Kuntze, Theaceae] is one of the most popular non-alcoholic beverages worldwide. Cold stress is one of the most severe abiotic stresses that limit tea plants’ growth, survival and geographical distribution. However, the genetic regulatory network and signaling pathways involved in cold stress responses in tea plants remain to be unearthed. Using RNA-Seq, DGE and sRNA-Seq technologies, we performed an integrative analysis of miRNA and mRNA expression profiling and their regulatory network of tea plants under chilling (4℃) and freezing (-5℃) stress. Differentially expressed (DE) miRNA and mRNA profiles were obtained based on fold change analysis; miRNAs and target mRNAs were found to show both coherent and incoherent relationships in the regulatory network. Furthermore, we compared several key pathways (e.g., ‘Photosynthesis’), GO terms (e.g., ‘response to karrikin’) and transcriptional factors (TFs, e.g., DREB1b/CBF1) which were identified as involved in the early chilling and/or freezing response of tea plants. Intriguingly, we found that karrikins, a new group of plant growth regulators, and β-primeverosidase (BPR), a key enzyme functionally relevant to the formation of tea aroma, might play an important role in both early chilling and freezing response of tea plants. Quantitative reverse transcriptase-polymerase chain reaction (qRT-PCR) analysis further confirmed the results from RNA-Seq and sRNA-Seq analysis. This is the first study to simultaneously profile the expression patterns of both miRNAs and mRNAs on a genome-wide scale to elucidate the molecular mechanisms of early responses of tea plants to cold stress. In addition to gaining a deeper insight into the cold resistant characteristics of tea plants, we provide a good case study to analyse mRNA/miRNA expression and profiling of non-model plant species using next-generation sequencing technology. PMID:25901577

  13. Strawberry: Fast and accurate genome-guided transcript reconstruction and quantification from RNA-Seq.

    Science.gov (United States)

    Liu, Ruolin; Dickerson, Julie

    2017-11-01

    We propose a novel method and software tool, Strawberry, for transcript reconstruction and quantification from RNA-Seq data under the guidance of genome alignment and independent of gene annotation. Strawberry consists of two modules: assembly and quantification. The novelty of Strawberry is that the two modules use different optimization frameworks but utilize the same data graph structure, which allows a highly efficient, expandable and accurate algorithm for dealing with large data. The assembly module parses aligned reads into splicing graphs, and uses network flow algorithms to select the most likely transcripts. The quantification module uses a latent class model to assign read counts from the nodes of splicing graphs to transcripts. Strawberry simultaneously estimates the transcript abundances and corrects for sequencing bias through an EM algorithm. Based on simulations, Strawberry outperforms Cufflinks and StringTie in terms of both assembly and quantification accuracies. Under the evaluation of a real data set, the estimated transcript expression by Strawberry has the highest correlation with Nanostring probe counts, an independent experimental measure for transcript expression. Strawberry is written in C++14, and is available as open source software at https://github.com/ruolin/strawberry under the MIT license.
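
    The idea of assigning ambiguous read counts to transcripts with an EM algorithm can be illustrated with the generic mixture-model version of the technique. This sketch is not Strawberry's latent class model: it has no splicing graph, no transcript-length normalization and no bias correction, and the function name and toy counts are hypothetical. Reads are grouped into compatibility classes (the set of transcripts each read could have come from), and EM alternates between splitting each class's reads in proportion to the current abundances and re-estimating the abundances from those fractional assignments.

```python
def em_abundances(class_counts, transcripts, n_iter=200):
    """Estimate relative transcript abundances from ambiguous read counts.

    class_counts -- dict mapping a tuple of compatible transcripts to the
                    number of reads compatible with exactly that set
    transcripts  -- list of all transcript names
    """
    total = float(sum(class_counts.values()))
    if total == 0.0:
        raise ValueError("no reads to assign")
    # Start from a uniform guess.
    theta = {t: 1.0 / len(transcripts) for t in transcripts}
    for _ in range(n_iter):
        expected = dict.fromkeys(transcripts, 0.0)
        # E-step: split each class's reads in proportion to current abundances.
        for compatible, count in class_counts.items():
            norm = sum(theta[t] for t in compatible)
            if norm == 0.0:
                continue  # no compatible transcript has support; skip this class
            for t in compatible:
                expected[t] += count * theta[t] / norm
        # M-step: abundances are the normalized expected read counts.
        assigned = sum(expected.values())
        theta = {t: expected[t] / assigned for t in transcripts}
    return theta


# Toy data: 30 reads unique to A, 10 unique to B, 60 compatible with both.
toy_counts = {("A",): 30, ("B",): 10, ("A", "B"): 60}
toy_theta = em_abundances(toy_counts, ["A", "B"])
```

    In the toy data the 60 shared reads end up split 3:1, following the ratio of the unique reads, which is the behavior that distinguishes EM-based quantification from simply discarding or evenly splitting ambiguous reads.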

  14. PASSion: a pattern growth algorithm-based pipeline for splice junction detection in paired-end RNA-Seq data.

    Science.gov (United States)

    Zhang, Yanju; Lameijer, Eric-Wubbo; 't Hoen, Peter A C; Ning, Zemin; Slagboom, P Eline; Ye, Kai

    2012-02-15

    RNA-seq is a powerful technology for the study of transcriptome profiles that uses deep-sequencing technologies. Moreover, it may be used for cellular phenotyping and help establish the etiology of diseases characterized by abnormal splicing patterns. In RNA-Seq, the exact nature of splicing events is buried in the reads that span exon-exon boundaries. The accurate and efficient mapping of these reads to the reference genome is a major challenge. We developed PASSion, a pattern growth algorithm-based pipeline for splice site detection in paired-end RNA-Seq reads. Comparing the performance of PASSion to three existing RNA-Seq analysis pipelines, TopHat, MapSplice and HMMSplicer, revealed that PASSion is competitive with these packages. Moreover, the performance of PASSion is not affected by read length and coverage. It performs better than the other three approaches when detecting junctions in highly abundant transcripts. PASSion has the ability to detect junctions that do not have known splicing motifs, which cannot be found by the other tools. For the two public RNA-Seq datasets, PASSion predicted ≈ 137,000 and 173,000 splicing events, of which on average 82% are known junctions annotated in the Ensembl transcript database and 18% are novel. In addition, our package can discover differential and shared splicing patterns among multiple samples. The code and utilities can be freely downloaded from https://trac.nbic.nl/passion and ftp://ftp.sanger.ac.uk/pub/zn1/passion.

  15. A dual transcript-discovery approach to improve the delimitation of gene features from RNA-seq data in the chicken model

    Directory of Open Access Journals (Sweden)

    Mickael Orgeur

    2018-01-01

    The sequence of the chicken genome, like several other draft genome sequences, is presently not fully covered. Gaps, contigs assigned with low confidence and uncharacterized chromosomes result in gene fragmentation and imprecise gene annotation. Transcript abundance estimation from RNA sequencing (RNA-seq) data relies on read quality, library complexity and expression normalization. In addition, the quality of the genome sequence used to map sequencing reads, and the gene annotation that defines gene features, must also be taken into account. A partially covered genome sequence causes the loss of sequencing reads from the mapping step, while an inaccurate definition of gene features induces imprecise read counts from the assignment step. Both steps can significantly bias interpretation of RNA-seq data. Here, we describe a dual transcript-discovery approach combining a genome-guided gene prediction and a de novo transcriptome assembly. This dual approach enabled us to increase the assignment rate of RNA-seq data by nearly 20% as compared to when using only the chicken reference annotation, contributing therefore to a more accurate estimation of transcript abundance. More generally, this strategy could be applied to any organism with partial genome sequence and/or lacking a manually-curated reference annotation in order to improve the accuracy of gene expression studies.

  16. Transcription-factor occupancy at HOT regions quantitatively predicts RNA polymerase recruitment in five human cell lines.

    KAUST Repository

    Foley, Joseph W; Sidow, Arend

    2013-01-01

    BACKGROUND: High-occupancy target (HOT) regions are compact genome loci occupied by many different transcription factors (TFs). HOT regions were initially defined in invertebrate model organisms, and we here show that they are a ubiquitous feature of the human gene-regulation landscape. RESULTS: We identified HOT regions by a comprehensive analysis of ChIP-seq data from 96 DNA-associated proteins in 5 human cell lines. Most HOT regions co-localize with RNA polymerase II binding sites, but many are not near the promoters of annotated genes. At HOT promoters, TF occupancy is strongly predictive of transcription preinitiation complex recruitment and moderately predictive of initiating Pol II recruitment, but only weakly predictive of elongating Pol II and RNA transcript abundance. TF occupancy varies quantitatively within human HOT regions; we used this variation to discover novel associations between TFs. The sequence motif associated with any given TF's direct DNA binding is somewhat predictive of its empirical occupancy, but a great deal of occupancy occurs at sites without the TF's motif, implying indirect recruitment by another TF whose motif is present. CONCLUSIONS: Mammalian HOT regions are regulatory hubs that integrate the signals from diverse regulatory pathways to quantitatively tune the promoter for RNA polymerase II recruitment.

  17. Transcription-factor occupancy at HOT regions quantitatively predicts RNA polymerase recruitment in five human cell lines.

    KAUST Repository

    Foley, Joseph W

    2013-10-20

    BACKGROUND: High-occupancy target (HOT) regions are compact genome loci occupied by many different transcription factors (TFs). HOT regions were initially defined in invertebrate model organisms, and we here show that they are a ubiquitous feature of the human gene-regulation landscape. RESULTS: We identified HOT regions by a comprehensive analysis of ChIP-seq data from 96 DNA-associated proteins in 5 human cell lines. Most HOT regions co-localize with RNA polymerase II binding sites, but many are not near the promoters of annotated genes. At HOT promoters, TF occupancy is strongly predictive of transcription preinitiation complex recruitment and moderately predictive of initiating Pol II recruitment, but only weakly predictive of elongating Pol II and RNA transcript abundance. TF occupancy varies quantitatively within human HOT regions; we used this variation to discover novel associations between TFs. The sequence motif associated with any given TF's direct DNA binding is somewhat predictive of its empirical occupancy, but a great deal of occupancy occurs at sites without the TF's motif, implying indirect recruitment by another TF whose motif is present. CONCLUSIONS: Mammalian HOT regions are regulatory hubs that integrate the signals from diverse regulatory pathways to quantitatively tune the promoter for RNA polymerase II recruitment.

  18. Intergenic and repeat transcription in human, chimpanzee and macaque brains measured by RNA-Seq.

    Directory of Open Access Journals (Sweden)

    Augix Guohua Xu

    Transcription is the first step connecting genetic information with an organism's phenotype. While expression of annotated genes in the human brain has been characterized extensively, our knowledge about the scope and the conservation of transcripts located outside of the known genes' boundaries is limited. Here, we use high-throughput transcriptome sequencing (RNA-Seq) to characterize the total non-ribosomal transcriptome of human, chimpanzee, and rhesus macaque brain. In all species, only 20-28% of non-ribosomal transcripts correspond to annotated exons and 20-23% to introns. By contrast, transcripts originating within intronic and intergenic repetitive sequences constitute 40-48% of the total brain transcriptome. Notably, some repeat families show elevated transcription. In non-repetitive intergenic regions, we identify and characterize 1,093 distinct regions highly expressed in the human brain. These regions are conserved at the RNA expression level across the primates studied and at the DNA sequence level across mammals. A large proportion of these transcripts (20%) represents 3'UTR extensions of known genes and may play roles in alternative microRNA-directed regulation. Finally, we show that while transcriptome divergence between species increases with evolutionary time, intergenic transcripts show more expression differences among species and exons show fewer. Our results show that many as yet uncharacterized, evolutionarily conserved transcripts exist in the human brain. Some of these transcripts may play roles in transcriptional regulation and contribute to the evolution of human-specific phenotypic traits.

  19. Simultaneous RNA-seq based transcriptional profiling of intracellular Brucella abortus and B. abortus-infected murine macrophages.

    Science.gov (United States)

    Hop, Huynh Tan; Arayan, Lauren Togonon; Reyes, Alisha Wehdnesday Bernardo; Huy, Tran Xuan Ngoc; Min, WonGi; Lee, Hu Jang; Son, Jee Soo; Kim, Suk

    2017-12-01

    Brucella is a zoonotic pathogen that survives within macrophages; however, the replicative mechanisms involved are not fully understood. We describe the isolation of sufficient Brucella abortus RNA from the primary host cell environment using modified reported methods for RNA-seq analysis, and simultaneously characterize the transcriptional profiles of intracellular B. abortus and bone marrow-derived macrophages (BMM) from BALB/c mice at 24 h (replicative phase) post-infection. Our results revealed that 25.12% (801/3190) and 16.16% (515/3190) of the total B. abortus genes were up-regulated and down-regulated at >2-fold, respectively, as compared to free-living B. abortus. Among >5-fold differentially expressed genes, the up-regulated genes are mostly involved in DNA and RNA manipulations as well as protein biosynthesis and secretion, while the down-regulated genes are mainly involved in energy production and metabolism. On the other hand, the host responses during B. abortus infection revealed that 14.01% (6071/43,346) of BMM genes were reproducibly transcribed at >5-fold during infection. Transcription of cytokines, chemokines and transcription factors that may contribute to host defense, such as tumor necrosis factor (Tnf), interleukin-1α (Il1α), interleukin-1β (Il1β), interleukin-6 (Il6), interleukin-12 (Il12), chemokine C-X-C motif (CXCL) family, nuclear factor kappa B (Nf-κb), and signal transducer and activator of transcription 1 (Stat1), was markedly induced, while transcription of various genes involved in cell proliferation and metabolism was suppressed upon B. abortus infection. In conclusion, these data suggest that Brucella modulates its gene expression in the hostile intracellular environment while simultaneously altering host pathways, which may lead to the pathogen's intracellular survival and infection. Copyright © 2017 Elsevier Ltd. All rights reserved.

  20. Strand-Specific RNA-Seq Analyses of Fruiting Body Development in Coprinopsis cinerea.

    Directory of Open Access Journals (Sweden)

    Hajime Muraguchi

    The basidiomycete fungus Coprinopsis cinerea is an important model system for multicellular development. Fruiting bodies of C. cinerea are typical mushrooms, which can be produced synchronously on defined media in the laboratory. To investigate the transcriptome in detail during fruiting body development, high-throughput sequencing (RNA-seq) was performed using cDNA libraries strand-specifically constructed from 13 points (stages/tissues) with two biological replicates. The reads were aligned to 14,245 predicted transcripts, and counted for forward and reverse transcripts. Differentially expressed genes (DEGs) between two adjacent points and between vegetative mycelium and each point were detected by Tag Count Comparison (TCC). To validate RNA-seq data, expression levels of selected genes were compared using RPKM values in RNA-seq data and qRT-PCR data, and DEGs detected in microarray data were examined in MA plots of RNA-seq data by TCC. We discuss events deduced from GO analysis of DEGs. In addition, we uncovered both transcription factor candidates and antisense transcripts that are likely to be involved in developmental regulation for fruiting.
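
    The abstract above compares expression levels using RPKM values. As a reminder of what that unit means, here is a minimal Python sketch of the standard RPKM definition (reads per kilobase of transcript per million mapped reads). The function name and the example numbers are illustrative assumptions and are not taken from the study.

```python
def rpkm(read_count: int, transcript_length_bp: int, total_mapped_reads: int) -> float:
    """Reads per kilobase of transcript per million mapped reads.

    RPKM = read_count * 1e9 / (transcript_length_bp * total_mapped_reads)
    """
    if transcript_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("length and total mapped reads must be positive")
    return read_count * 1e9 / (transcript_length_bp * total_mapped_reads)


# Hypothetical example: 500 reads on a 2,000 bp transcript
# in a library with 10 million mapped reads.
example_value = rpkm(500, 2000, 10_000_000)
```

    Because the count is divided by both transcript length and library size, values can be compared across transcripts of different lengths and across libraries of different depths, which is what makes a comparison against qRT-PCR measurements meaningful.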

  1. [Identifying transcription factors involved in Arabidopsis adventious shoot regeneration by RNA-Seq technology].

    Science.gov (United States)

    Wang, Xingchun; Chen, Zhao; Fan, Juan; He, Miaomiao; Han, Yuanhuai; Yang, Zhirong

    2015-04-01

    Transcriptional regulation is one of the major modes of regulation in plant adventitious shoot regeneration, but the exact mechanism remains unclear. In our study, RNA-seq technology based on the Illumina HiSeq 2000 sequencing platform was used to identify differentially expressed transcription factor (TF) encoding genes during the callus formation stage and the adventitious shoot regeneration stage between wild type and the adventitious shoot formation defective mutant be1-3, and during the transition from the dedifferentiation to the redifferentiation stage in wild type WS. Results show that 155 TFs were differentially expressed between the be1-3 mutant and wild type during callus formation, of which 97 genes were up-regulated and 58 genes were down-regulated; and that 68 genes were differentially expressed during the redifferentiation stage, with 40 genes up-regulated and 28 genes down-regulated; whereas at the transition stage from dedifferentiation to redifferentiation in WS wild type explants, a total of 231 differentially expressed TF genes were identified, including 160 up-regulated genes and 71 down-regulated genes. Among these TF genes, the adventitious shoot related transcription factor 1 (ART1) gene, encoding a MYB-related (v-myb avian myeloblastosis viral oncogene homolog) TF, was up-regulated 3,217-fold and was the most highly up-regulated gene during be1-3 callus formation. Overexpression of the ART1 gene caused defects in callus formation and shoot regeneration and inhibited seedling growth, indicating that the ART1 gene is a negative regulator of callus formation and shoot regeneration. This work not only enriches our knowledge about the transcriptional regulation mechanism of adventitious shoot regeneration, but also provides valuable information on candidate TF genes associated with adventitious shoot regeneration for future research.

  2. Strand-specific RNA-seq reveals widespread occurrence of novel cis-natural antisense transcripts in rice

    Directory of Open Access Journals (Sweden)

    Lu Tingting

    2012-12-01

    Background: Cis-natural antisense transcripts (cis-NATs) are RNAs transcribed from the antisense strand of a gene locus, and are complementary to the RNA transcribed from the sense strand. Common techniques, including the microarray approach and analysis of transcriptome databases, are the major ways to globally identify cis-NATs in various eukaryotic organisms. Genome-wide in silico analysis has identified a large number of cis-NATs that may generate endogenous short interfering RNAs (nat-siRNAs), which participate in important biogenesis mechanisms for transcriptional and post-transcriptional regulation in rice. However, the transcriptomes are yet to be deeply sequenced to comprehensively investigate cis-NATs. Results: We applied high-throughput strand-specific complementary DNA sequencing technology (ssRNA-seq) to deeply sequence mRNA for assessing sense and antisense transcripts that were derived under salt, drought and cold stresses, and normal conditions, in the model plant rice (Oryza sativa). Combined with RAP-DB genome annotation (the Rice Annotation Project Database build-5 data set), 76,013 transcripts corresponding to 45,844 unique gene loci were assembled, in which 4873 gene loci were newly identified. Of 3819 putative rice cis-NATs, 2292 were detected as expressed and giving rise to small RNAs from their overlapping regions through integrated analysis of ssRNA-seq data and small RNA data. Among them, 503 cis-NATs seemed to be associated with specific conditions. The deep sequence data from isolated epidermal cells of rice seedlings further showed that 54.0% of cis-NATs were expressed simultaneously in a population of homogenous cells. Nearly 9.7% of rice transcripts were involved in one-to-one or many-to-many cis-NATs formation. Furthermore, only 17.4-34.7% of 223 many-to-many cis-NAT groups were all expressed and generated nat-siRNAs, indicating that only some cis-NAT groups may be involved in complex regulatory networks.
Conclusions

  3. The integrated analysis of RNA-seq and microRNA-seq depicts miRNA-mRNA networks involved in Japanese flounder (Paralichthys olivaceus) albinism.

    Science.gov (United States)

    Wang, Na; Wang, Ruoqing; Wang, Renkai; Tian, Yongsheng; Shao, Changwei; Jia, Xiaodong; Chen, Songlin

    2017-01-01

    Albinism, a phenomenon characterized by pigmentation deficiency on the ocular side of Japanese flounder (Paralichthys olivaceus), has caused significant damage. Limited mRNA and microRNA (miRNA) information is available on fish pigmentation deficiency. In this study, a high-throughput sequencing strategy was employed to identify the mRNAs and miRNAs involved in P. olivaceus albinism. Based on the P. olivaceus genome, RNA-seq identified 21,787 known genes and 711 new genes by transcript assembly. Of those, 235 genes exhibited significantly different expression patterns (fold change ≥2 or ≤0.5 and q-value≤0.05), including 194 down-regulated genes and 41 up-regulated genes in albino versus normally pigmented individuals. These genes were enriched in 81 GO terms and 9 KEGG pathways (p≤0.05). Among those were the pigmentation-related pathways melanogenesis and tyrosine metabolism. High-throughput miRNA sequencing identified a total of 475 miRNAs, including 64 novel miRNAs. Furthermore, 33 differentially expressed miRNAs, comprising 13 up-regulated and 20 down-regulated miRNAs, were identified in albino versus normally pigmented individuals (fold change ≥1.5 or ≤0.67 and p≤0.05). Subsequent target prediction identified a variety of putative target genes, of which 134 genes, including Tyrosinase (TYR), Tyrosinase-related protein 1 (TYRP1), and Microphthalmia-associated transcription factor (MITF), overlapped with differentially expressed genes derived from RNA-seq. These target genes were significantly enriched in 254 GO terms and 103 KEGG pathways (p<0.001). Of those, tyrosine metabolism, lysosomes, phototransduction pathways, etc., attracted considerable attention due to their involvement in regulating skin pigmentation. Expression patterns of differentially expressed mRNAs and miRNAs were validated in 10 mRNAs and 10 miRNAs by qRT-PCR. With high-throughput mRNA and miRNA sequencing and analysis, a series of mRNAs and miRNAs of interest involved in fish
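
    The differential-expression criterion quoted in the abstract above (fold change ≥2 or ≤0.5 and q-value ≤0.05) can be written as a small filter. The following Python sketch only illustrates that thresholding rule; the class name, function name, and example records are hypothetical, and it does not reproduce the statistical testing used in the study.

```python
from dataclasses import dataclass


@dataclass
class GeneResult:
    gene_id: str        # hypothetical identifier
    fold_change: float  # albino / normally pigmented expression ratio
    q_value: float      # multiple-testing-adjusted p-value


def classify(gene: GeneResult, up: float = 2.0, down: float = 0.5,
             q_max: float = 0.05) -> str:
    """Label a gene as 'up', 'down', or 'unchanged' using the quoted thresholds."""
    if gene.q_value > q_max:
        return "unchanged"
    if gene.fold_change >= up:
        return "up"
    if gene.fold_change <= down:
        return "down"
    return "unchanged"


# Hypothetical records, not data from the study.
results = [
    GeneResult("g1", 2.5, 0.01),   # passes both thresholds, up
    GeneResult("g2", 0.4, 0.03),   # passes both thresholds, down
    GeneResult("g3", 3.0, 0.20),   # large change but fails the q-value cutoff
    GeneResult("g4", 1.2, 0.001),  # significant but change too small
]
labels = [classify(g) for g in results]
```

    Both conditions must hold at once: a gene with a large fold change but a poor q-value, or a highly significant gene with a small fold change, is left out of the differentially expressed set.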

  4. ReliefSeq: a gene-wise adaptive-K nearest-neighbor feature selection tool for finding gene-gene interactions and main effects in mRNA-Seq gene expression data.

    Directory of Open Access Journals (Sweden)

    Brett A McKinney

    Relief-F is a nonparametric, nearest-neighbor machine learning method that has been successfully used to identify relevant variables that may interact in complex multivariate models to explain phenotypic variation. While several tools have been developed for assessing differential expression in sequence-based transcriptomics, the detection of statistical interactions between transcripts has received less attention in the area of RNA-seq analysis. We describe a new extension and assessment of Relief-F for feature selection in RNA-seq data. The ReliefSeq implementation adapts the number of nearest neighbors (k) for each gene to optimize the Relief-F test statistics (importance scores) for finding both main effects and interactions. We compare this gene-wise adaptive-k (gwak) Relief-F method with standard RNA-seq feature selection tools, such as DESeq and edgeR, and with the popular machine learning method Random Forests. We demonstrate performance on a panel of simulated data that have a range of distributional properties reflected in real mRNA-seq data including multiple transcripts with varying sizes of main effects and interaction effects. For simulated main effects, gwak-Relief-F feature selection performs comparably to standard tools DESeq and edgeR for ranking relevant transcripts. For gene-gene interactions, gwak-Relief-F outperforms all comparison methods at ranking relevant genes in all but the highest fold change/highest signal situations where it performs similarly. The gwak-Relief-F algorithm outperforms Random Forests for detecting relevant genes in all simulation experiments. In addition, Relief-F is comparable to the other methods based on computational time. We also apply ReliefSeq to an RNA-Seq study of smallpox vaccine to identify gene expression changes between vaccinia virus-stimulated and unstimulated samples.
ReliefSeq is an attractive tool for inclusion in the suite of tools used for analysis of mRNA-Seq data; it has power to

  5. A Transcriptome Map of Actinobacillus pleuropneumoniae at Single-Nucleotide Resolution Using Deep RNA-Seq.

    Directory of Open Access Journals (Sweden)

    Zhipeng Su

    Actinobacillus pleuropneumoniae is the pathogen of porcine contagious pleuropneumonia, a highly contagious respiratory disease of swine. Although the genome of A. pleuropneumoniae was sequenced several years ago, limited information is available on the genome-wide transcriptional analysis to accurately annotate the gene structures and regulatory elements. High-throughput RNA sequencing (RNA-seq) has been applied to study the transcriptional landscape of bacteria, and can efficiently and accurately identify gene expression regions and unknown transcriptional units, especially small non-coding RNAs (sRNAs), UTRs and regulatory regions. The aim of this study is to comprehensively analyze the transcriptome of A. pleuropneumoniae by RNA-seq in order to improve the existing genome annotation and promote our understanding of A. pleuropneumoniae gene structures and RNA-based regulation. In this study, we utilized RNA-seq to construct a single nucleotide resolution transcriptome map of A. pleuropneumoniae. More than 3.8 million high-quality reads (average length ~90 bp) from a cDNA library were generated and aligned to the reference genome. We identified 32 open reading frames encoding novel proteins that were mis-annotated in the previous genome annotations. The start sites for 35 genes based on the current genome annotation were corrected. Furthermore, 51 sRNAs in the A. pleuropneumoniae genome were discovered, of which 40 sRNAs were never reported in previous studies. The transcriptome map also enabled visualization of 5'- and 3'-UTR regions, which contained 11 sRNAs. In addition, 351 operons covering 1230 genes throughout the whole genome were identified. The RNA-Seq based transcriptome map validated annotated genes and corrected annotations of open reading frames in the genome, and led to the identification of many functional elements (e.g. regions encoding novel proteins, non-coding sRNAs and operon structures). The transcriptional units

  6. NSR-seq transcriptional profiling enables identification of a gene signature of Plasmodium falciparum parasites infecting children.

    Science.gov (United States)

    Vignali, Marissa; Armour, Christopher D; Chen, Jingyang; Morrison, Robert; Castle, John C; Biery, Matthew C; Bouzek, Heather; Moon, Wonjong; Babak, Tomas; Fried, Michal; Raymond, Christopher K; Duffy, Patrick E

    2011-03-01

    Malaria caused by Plasmodium falciparum results in approximately 1 million annual deaths worldwide, with young children and pregnant mothers at highest risk. Disease severity might be related to parasite virulence factors, but expression profiling studies of parasites to test this hypothesis have been hindered by extensive sequence variation in putative virulence genes and a preponderance of host RNA in clinical samples. We report here the application of RNA sequencing to clinical isolates of P. falciparum, using not-so-random (NSR) primers to successfully exclude human ribosomal RNA and globin transcripts and enrich for parasite transcripts. Using NSR-seq, we confirmed earlier microarray studies showing upregulation of a distinct subset of genes in parasites infecting pregnant women, including that encoding the well-established pregnancy malaria vaccine candidate var2csa. We also describe a subset of parasite transcripts that distinguished parasites infecting children from those infecting pregnant women and confirmed this observation using quantitative real-time PCR and mass spectrometry proteomic analyses. Based on their putative functional properties, we propose that these proteins could have a role in childhood malaria pathogenesis. Our study provides proof of principle that NSR-seq represents an approach that can be used to study clinical isolates of parasites causing severe malaria syndromes as well as other blood-borne pathogens and blood-related diseases.

  7. NSR-seq transcriptional profiling enables identification of a gene signature of Plasmodium falciparum parasites infecting children

    Science.gov (United States)

    Vignali, Marissa; Armour, Christopher D.; Chen, Jingyang; Morrison, Robert; Castle, John C.; Biery, Matthew C.; Bouzek, Heather; Moon, Wonjong; Babak, Tomas; Fried, Michal; Raymond, Christopher K.; Duffy, Patrick E.

    2011-01-01

    Malaria caused by Plasmodium falciparum results in approximately 1 million annual deaths worldwide, with young children and pregnant mothers at highest risk. Disease severity might be related to parasite virulence factors, but expression profiling studies of parasites to test this hypothesis have been hindered by extensive sequence variation in putative virulence genes and a preponderance of host RNA in clinical samples. We report here the application of RNA sequencing to clinical isolates of P. falciparum, using not-so-random (NSR) primers to successfully exclude human ribosomal RNA and globin transcripts and enrich for parasite transcripts. Using NSR-seq, we confirmed earlier microarray studies showing upregulation of a distinct subset of genes in parasites infecting pregnant women, including that encoding the well-established pregnancy malaria vaccine candidate var2csa. We also describe a subset of parasite transcripts that distinguished parasites infecting children from those infecting pregnant women and confirmed this observation using quantitative real-time PCR and mass spectrometry proteomic analyses. Based on their putative functional properties, we propose that these proteins could have a role in childhood malaria pathogenesis. Our study provides proof of principle that NSR-seq represents an approach that can be used to study clinical isolates of parasites causing severe malaria syndromes as well as other blood-borne pathogens and blood-related diseases. PMID:21317536

  8. Dissecting Cell-Type Composition and Activity-Dependent Transcriptional State in Mammalian Brains by Massively Parallel Single-Nucleus RNA-Seq.

    Science.gov (United States)

    Hu, Peng; Fabyanic, Emily; Kwon, Deborah Y; Tang, Sheng; Zhou, Zhaolan; Wu, Hao

    2017-12-07

    Massively parallel single-cell RNA sequencing can precisely resolve cellular diversity in a high-throughput manner at low cost, but unbiased isolation of intact single cells from complex tissues such as adult mammalian brains is challenging. Here, we integrate sucrose-gradient-assisted purification of nuclei with droplet microfluidics to develop a highly scalable single-nucleus RNA-seq approach (sNucDrop-seq), which is free of enzymatic dissociation and nucleus sorting. By profiling ∼18,000 nuclei isolated from cortical tissues of adult mice, we demonstrate that sNucDrop-seq not only accurately reveals neuronal and non-neuronal subtype composition with high sensitivity but also enables in-depth analysis of transient transcriptional states driven by neuronal activity, at single-cell resolution, in vivo. Copyright © 2017 Elsevier Inc. All rights reserved.

  9. iMir: an integrated pipeline for high-throughput analysis of small non-coding RNA data obtained by smallRNA-Seq.

    Science.gov (United States)

    Giurato, Giorgio; De Filippo, Maria Rosaria; Rinaldi, Antonio; Hashim, Adnan; Nassa, Giovanni; Ravo, Maria; Rizzo, Francesca; Tarallo, Roberta; Weisz, Alessandro

    2013-12-13

    Qualitative and quantitative analysis of small non-coding RNAs by next generation sequencing (smallRNA-Seq) represents a novel technology increasingly used to investigate, with high sensitivity and specificity, RNA populations comprising microRNAs and other regulatory small transcripts. Analysis of smallRNA-Seq data to gather biologically relevant information, i.e. detection and differential expression analysis of known and novel non-coding RNAs, target prediction, etc., requires implementation of multiple statistical and bioinformatics tools from different sources, each focusing on a specific step of the analysis pipeline. As a consequence, the analytical workflow is slowed down by the need for continuous interventions by the operator, a critical factor when large numbers of datasets need to be analyzed at once. We designed a novel modular pipeline (iMir) for comprehensive analysis of smallRNA-Seq data, comprising specific tools for adapter trimming, quality filtering, differential expression analysis, biological target prediction and other useful options by integrating multiple open source modules and resources in an automated workflow. As statistics is crucial in deep-sequencing data analysis, we devised and integrated in iMir tools based on different statistical approaches to allow the operator to analyze data rigorously. The pipeline created here proved to be more efficient and time-saving than currently available methods and, in addition, flexible enough to allow the user to select the preferred combination of analytical steps.
We present here the results obtained by applying this pipeline to analyze simultaneously 6 smallRNA-Seq datasets from either exponentially growing or growth-arrested human breast cancer MCF-7 cells, that led to the rapid and accurate identification, quantitation and differential expression analysis of ~450 miRNAs, including several novel miRNAs and isomiRs, as well as identification of the putative mRNA targets of differentially expressed mi

  10. Using RNA-seq to determine the transcriptional landscape and the hypoxic response of the pathogenic yeast Candida parapsilosis

    LENUS (Irish Health Repository)

    Guida, Alessandro

    2011-12-22

    Background: Candida parapsilosis is one of the most common causes of Candida infection worldwide. However, the genome sequence annotation was made without experimental validation and little is known about the transcriptional landscape. The transcriptional response of C. parapsilosis to hypoxic (low oxygen) conditions, such as those encountered in the host, is also relatively unexplored. Results: We used next generation sequencing (RNA-seq) to determine the transcriptional profile of C. parapsilosis growing in several conditions including different media, temperatures and oxygen concentrations. We identified 395 novel protein-coding sequences that had not previously been annotated. We removed > 300 unsupported gene models, and corrected approximately 900. We mapped the 5' and 3' UTR for thousands of genes. We also identified 422 introns, including two introns in the 3' UTR of one gene. This is the first report of 3' UTR introns in the Saccharomycotina. Comparing the introns in coding sequences with other species shows that small numbers have been gained and lost throughout evolution. Our analysis also identified a number of novel transcriptional active regions (nTARs). We used both RNA-seq and microarray analysis to determine the transcriptional profile of cells grown in normoxic and hypoxic conditions in rich media, and we showed that there was a high correlation between the approaches. We also generated a knockout of the UPC2 transcriptional regulator, and we found that similar to C. albicans, Upc2 is required for conferring resistance to azole drugs, and for regulation of expression of the ergosterol pathway in hypoxia. Conclusion: We provide the first detailed annotation of the C. parapsilosis genome, based on gene predictions and transcriptional analysis. We identified a number of novel ORFs and other transcribed regions, and detected transcripts from approximately 90% of the annotated protein coding genes.
We found that the transcription factor

  11. Mapping RNA-seq Reads with STAR.

    Science.gov (United States)

    Dobin, Alexander; Gingeras, Thomas R

    2015-09-03

    Mapping of large sets of high-throughput sequencing reads to a reference genome is one of the foundational steps in RNA-seq data analysis. The STAR software package performs this task with high levels of accuracy and speed. In addition to detecting annotated and novel splice junctions, STAR is capable of discovering more complex RNA sequence arrangements, such as chimeric and circular RNA. STAR can align spliced sequences of any length with moderate error rates, providing scalability for emerging sequencing technologies. STAR generates output files that can be used for many downstream analyses such as transcript/gene expression quantification, differential gene expression, novel isoform reconstruction, and signal visualization. In this unit, we describe computational protocols that produce various output files, use different RNA-seq datatypes, and utilize different mapping strategies. STAR is open source software that can be run on Unix, Linux, or Mac OS X systems. Copyright © 2015 John Wiley & Sons, Inc.

  12. Thiol-linked alkylation of RNA to assess expression dynamics.

    Science.gov (United States)

    Herzog, Veronika A; Reichholf, Brian; Neumann, Tobias; Rescheneder, Philipp; Bhat, Pooja; Burkard, Thomas R; Wlotzka, Wiebke; von Haeseler, Arndt; Zuber, Johannes; Ameres, Stefan L

    2017-12-01

    Gene expression profiling by high-throughput sequencing reveals qualitative and quantitative changes in RNA species at steady state but obscures the intracellular dynamics of RNA transcription, processing and decay. We developed thiol(SH)-linked alkylation for the metabolic sequencing of RNA (SLAMseq), an orthogonal-chemistry-based RNA sequencing technology that detects 4-thiouridine (s4U) incorporation in RNA species at single-nucleotide resolution. In combination with well-established metabolic RNA labeling protocols and coupled to standard, low-input, high-throughput RNA sequencing methods, SLAMseq enabled rapid access to RNA-polymerase-II-dependent gene expression dynamics in the context of total RNA. We validated the method in mouse embryonic stem cells by showing that the RNA-polymerase-II-dependent transcriptional output scaled with Oct4/Sox2/Nanog-defined enhancer activity, and we provide quantitative and mechanistic evidence for transcript-specific RNA turnover mediated by post-transcriptional gene regulatory pathways initiated by microRNAs and N6-methyladenosine. SLAMseq facilitates the dissection of fundamental mechanisms that control gene expression in an accessible, cost-effective and scalable manner.

  13. Elucidating MicroRNA Regulatory Networks Using Transcriptional, Post-transcriptional, and Histone Modification Measurements

    Directory of Open Access Journals (Sweden)

    Sara J.C. Gosline

    2016-01-01

    MicroRNAs (miRNAs) regulate diverse biological processes by repressing mRNAs, but their modest effects on direct targets, together with their participation in larger regulatory networks, make it challenging to delineate miRNA-mediated effects. Here, we describe an approach to characterizing miRNA-regulatory networks by systematically profiling transcriptional, post-transcriptional and epigenetic activity in a pair of isogenic murine fibroblast cell lines with and without Dicer expression. By RNA sequencing (RNA-seq) and CLIP (crosslinking followed by immunoprecipitation) sequencing (CLIP-seq), we found that most of the changes induced by global miRNA loss occur at the level of transcription. We then introduced a network modeling approach that integrated these data with epigenetic data to identify specific miRNA-regulated transcription factors that explain the impact of miRNA perturbation on gene expression. In total, we demonstrate that combining multiple genome-wide datasets spanning diverse regulatory modes enables accurate delineation of the downstream miRNA-regulated transcriptional network and establishes a model for studying similar networks in other systems.

  14. Evaluating whole transcriptome amplification for gene profiling experiments using RNA-Seq.

    Science.gov (United States)

    Faherty, Sheena L; Campbell, C Ryan; Larsen, Peter A; Yoder, Anne D

    2015-07-30

    RNA-Seq has enabled high-throughput gene expression profiling to provide insight into the functional link between genotype and phenotype. Low quantities of starting RNA can be a severe hindrance for studies that aim to utilize RNA-Seq. To mitigate this bottleneck, whole transcriptome amplification (WTA) technologies have been developed to generate sufficient sequencing targets from minute amounts of RNA. Successful WTA requires accurate replication of transcript abundance without the loss or distortion of specific mRNAs. Here, we test the efficacy of NuGEN's Ovation RNA-Seq V2 system, which uses linear isothermal amplification with a unique chimeric primer for amplification, using white adipose tissue from standard laboratory rats (Rattus norvegicus). Our goal was to investigate potential biological artifacts introduced through WTA approaches by establishing comparisons between matched raw and amplified RNA libraries derived from biological replicates. We found that 93% of expressed genes were identical between all unamplified versus matched amplified comparisons, also finding that gene density is similar across all comparisons. Our sequencing experiment and downstream bioinformatic analyses using the Tuxedo analysis pipeline resulted in the assembly of 25,543 high-quality transcripts. Libraries constructed from raw RNA and WTA samples averaged 15,298 and 15,253 expressed genes, respectively. Although significant differentially expressed genes (P < 0.05) were identified in all matched samples, each of these represents less than 0.15% of all shared genes for each comparison. Transcriptome amplification is efficient at maintaining relative transcript frequencies with no significant bias when using this NuGEN linear isothermal amplification kit under ideal laboratory conditions as presented in this study. This methodology has broad applications, from clinical and diagnostic, to field-based studies when sample acquisition, or sample preservation, methods prove

  15. RNA-Seq profiling reveals novel hepatic gene expression pattern in aflatoxin B1 treated rats.

    Science.gov (United States)

    Merrick, B Alex; Phadke, Dhiral P; Auerbach, Scott S; Mav, Deepak; Stiegelmeyer, Suzy M; Shah, Ruchir R; Tice, Raymond R

    2013-01-01

    Deep sequencing was used to investigate the subchronic effects of 1 ppm aflatoxin B1 (AFB1), a potent hepatocarcinogen, on the male rat liver transcriptome prior to onset of histopathological lesions or tumors. We hypothesized RNA-Seq would reveal more differentially expressed genes (DEG) than microarray analysis, including low copy and novel transcripts related to AFB1's carcinogenic activity compared to feed controls (CTRL). Paired-end reads were mapped to the rat genome (Rn4) with TopHat and further analyzed by DESeq and Cufflinks-Cuffdiff pipelines to identify differentially expressed transcripts, new exons and unannotated transcripts. PCA and cluster analysis of DEGs showed clear separation between AFB1 and CTRL treatments and concordance among group replicates. qPCR of eight high and medium DEGs and three low DEGs showed good comparability among RNA-Seq and microarray transcripts. DESeq analysis identified 1,026 differentially expressed transcripts at greater than two-fold change (p<0.005) compared to 626 transcripts by microarray due to base pair resolution of transcripts by RNA-Seq, probe placement within transcripts or an absence of probes to detect novel transcripts, splice variants and exons. Pathway analysis among DEGs revealed signaling of Ahr, Nrf2, GSH, xenobiotic, cell cycle, extracellular matrix, and cell differentiation networks consistent with pathways leading to AFB1 carcinogenesis, including almost 200 upregulated transcripts controlled by E2f1-related pathways related to kinetochore structure, mitotic spindle assembly and tissue remodeling. We report 49 novel, differentially-expressed transcripts including confirmation by PCR-cloning of two unique, unannotated, hepatic AFB1-responsive transcripts (HAfT's) on chromosomes 1.q55 and 15.q11, overexpressed by 10 to 25-fold. Several potentially novel exons were found and exon refinements were made including AFB1 exon-specific induction of homologous family members, Ugt1a6 and Ugt1a7c. We find the

  16. Comprehensive Assessments of RNA-seq by the SEQC Consortium: FDA-Led Efforts Advance Precision Medicine

    Directory of Open Access Journals (Sweden)

    Joshua Xu

    2016-03-01

    Studies on gene expression in response to therapy have led to the discovery of pharmacogenomics biomarkers and advances in precision medicine. Whole transcriptome sequencing (RNA-seq) is an emerging tool for profiling gene expression and has received wide adoption in the biomedical research community. However, its value in regulatory decision making requires rigorous assessment and consensus between various stakeholders, including the research community, regulatory agencies, and industry. The FDA-led SEquencing Quality Control (SEQC) consortium has made considerable progress in this direction, and is the subject of this review. Specifically, three RNA-seq platforms (Illumina HiSeq, Life Technologies SOLiD, and Roche 454) were extensively evaluated at multiple sites to assess cross-site and cross-platform reproducibility. The results demonstrated that relative gene expression measurements were consistently comparable across labs and platforms, but not so for the measurement of absolute expression levels. As part of the quality evaluation several studies were included to evaluate the utility of RNA-seq in clinical settings and safety assessment. The neuroblastoma study profiled tumor samples from 498 pediatric neuroblastoma patients by both microarray and RNA-seq. RNA-seq offers more utilities than microarray in determining the transcriptomic characteristics of cancer. However, RNA-seq and microarray-based models were comparable in clinical endpoint prediction, even when including additional features unique to RNA-seq beyond gene expression. The toxicogenomics study compared microarray and RNA-seq profiles of the liver samples from rats exposed to 27 different chemicals representing multiple toxicity modes of action. Cross-platform concordance was dependent on chemical treatment and transcript abundance. Though both RNA-seq and microarray are suitable for developing gene expression based predictive models with comparable prediction performance, RNA-seq

  17. Mixture models reveal multiple positional bias types in RNA-Seq data and lead to accurate transcript concentration estimates.

    Directory of Open Access Journals (Sweden)

    Andreas Tuerk

    2017-05-01

    Accuracy of transcript quantification with RNA-Seq is negatively affected by positional fragment bias. This article introduces Mix2 (rd. "mixquare"), a transcript quantification method which uses a mixture of probability distributions to model and thereby neutralize the effects of positional fragment bias. The parameters of Mix2 are trained by Expectation Maximization resulting in simultaneous transcript abundance and bias estimates. We compare Mix2 to Cufflinks, RSEM, eXpress and PennSeq; state-of-the-art quantification methods implementing some form of bias correction. On four synthetic biases we show that the accuracy of Mix2 overall exceeds the accuracy of the other methods and that its bias estimates converge to the correct solution. We further evaluate Mix2 on real RNA-Seq data from the Microarray and Sequencing Quality Control (MAQC, SEQC) Consortia. On MAQC data, Mix2 achieves improved correlation to qPCR measurements with a relative increase in R² between 4% and 50%. Mix2 also yields repeatable concentration estimates across technical replicates with a relative increase in R² between 8% and 47% and reduced standard deviation across the full concentration range. We further observe more accurate detection of differential expression with a relative increase in true positives between 74% and 378% for 5% false positives. In addition, Mix2 reveals 5 dominant biases in MAQC data deviating from the common assumption of a uniform fragment distribution. On SEQC data, Mix2 yields higher consistency between measured and predicted concentration ratios. A relative error of 20% or less is obtained for 51% of transcripts by Mix2, 40% of transcripts by Cufflinks and RSEM and 30% by eXpress. Titration order consistency is correct for 47% of transcripts for Mix2, 41% for Cufflinks and RSEM and 34% for eXpress. We further observe improved repeatability across laboratory sites with a relative increase in R² between 8% and 44% and reduced standard deviation.
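The "relative error of 20% or less" criterion in the abstract above can be illustrated with a minimal sketch. The function name, the tolerance argument, and the toy ratios are assumptions made for illustration; the abstract does not state how the authors computed this figure, so this is only one plausible reading.

```python
def fraction_within_relative_error(measured, expected, tol=0.20):
    """Fraction of transcripts whose measured ratio is within `tol`
    relative error of the expected ratio (hypothetical helper)."""
    if len(measured) != len(expected) or not measured:
        raise ValueError("inputs must be non-empty and of equal length")
    hits = 0
    for m, e in zip(measured, expected):
        # Relative error is taken with respect to the expected value.
        if abs(m - e) / abs(e) <= tol:
            hits += 1
    return hits / len(measured)


# Toy concentration ratios, not data from the study.
measured = [1.0, 1.1, 2.0, 0.5]
expected = [1.0, 1.0, 1.0, 1.0]
print(fraction_within_relative_error(measured, expected))
```

A per-method comparison would apply the same function to each tool's estimates against the known titration ratios.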

  18. Identification of transcripts regulated by CUG-BP, Elav-like family member 1 (CELF1) in primary embryonic cardiomyocytes by RNA-seq

    Directory of Open Access Journals (Sweden)

    Yotam Blech-Hermoni

    2015-12-01

    CUG-BP, Elav-like family member 1 (CELF1) is a multi-functional RNA binding protein that regulates pre-mRNA alternative splicing in the nucleus, as well as polyadenylation status, mRNA stability, and translation in the cytoplasm [1]. Dysregulation of CELF1 has been implicated in cardiomyopathies in myotonic dystrophy type 1 and diabetes [2–5], but the targets of CELF1 regulation in the heart have not been systematically investigated. We previously demonstrated that in the developing heart CELF1 expression is restricted to the myocardium and peaks during embryogenesis [6–8]. To identify transcripts regulated by CELF1 in the embryonic myocardium, RNA-seq was used to compare the transcriptome of primary embryonic cardiomyocytes following siRNA-mediated knockdown of CELF1 to that of controls. Raw data files of the RNA-seq reads have been deposited in NCBI's Gene Expression Omnibus [9] under the GEO Series accession number GSE67360. These data can be used to identify transcripts whose levels or alternative processing (i.e., alternative splicing or polyadenylation site usage) are regulated by CELF1, and should provide insight into the pathways and processes modulated by this important RNA binding protein during normal heart development and during cardiac pathogenesis.

  19. Distributed biotin-streptavidin transcription roadblocks for mapping cotranscriptional RNA folding.

    Science.gov (United States)

    Strobel, Eric J; Watters, Kyle E; Nedialkov, Yuri; Artsimovitch, Irina; Lucks, Julius B

    2017-07-07

    RNA folding during transcription directs an order of folding that can determine RNA structure and function. However, the experimental study of cotranscriptional RNA folding has been limited by the lack of easily approachable methods that can interrogate nascent RNA structure at nucleotide resolution. To address this, we previously developed cotranscriptional selective 2′-hydroxyl acylation analyzed by primer extension sequencing (SHAPE-Seq) to simultaneously probe all intermediate RNA transcripts during transcription by stalling elongation complexes at catalytically dead EcoRI E111Q roadblocks. While effective, the distribution of elongation complexes using EcoRI E111Q requires laborious PCR using many different oligonucleotides for each sequence analyzed. Here, we improve the broad applicability of cotranscriptional SHAPE-Seq by developing a sequence-independent biotin-streptavidin (SAv) roadblocking strategy that simplifies the preparation of roadblocking DNA templates. We first determine the properties of biotin-SAv roadblocks. We then show that randomly distributed biotin-SAv roadblocks can be used in cotranscriptional SHAPE-Seq experiments to identify the same RNA structural transitions related to a riboswitch decision-making process that we previously identified using EcoRI E111Q. Lastly, we find that EcoRI E111Q maps nascent RNA structure to specific transcript lengths more precisely than biotin-SAv and propose guidelines to leverage the complementary strengths of each transcription roadblock in cotranscriptional SHAPE-Seq. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.

  20. Genome-wide RNA-seq analysis of human and mouse platelet transcriptomes

    Science.gov (United States)

    Rowley, Jesse W.; Oler, Andrew J.; Tolley, Neal D.; Hunter, Benjamin N.; Low, Elizabeth N.; Nix, David A.; Yost, Christian C.; Zimmerman, Guy A.

    2011-01-01

    Inbred mice are a useful tool for studying the in vivo functions of platelets. Nonetheless, the mRNA signature of mouse platelets is not known. Here, we use paired-end next-generation RNA sequencing (RNA-seq) to characterize the polyadenylated transcriptomes of human and mouse platelets. We report that RNA-seq provides unprecedented resolution of mRNAs that are expressed across the entire human and mouse genomes. Transcript expression and abundance are often conserved between the 2 species. Several mRNAs, however, are differentially expressed in human and mouse platelets. Moreover, previously described functional disparities between mouse and human platelets are reflected in differences at the transcript level, including protease activated receptor-1, protease activated receptor-3, platelet activating factor receptor, and factor V. This suggests that RNA-seq is a useful tool for predicting differences in platelet function between mice and humans. Our next-generation sequencing analysis provides new insights into the human and murine platelet transcriptomes. The sequencing dataset will be useful in the design of mouse models of hemostasis and a catalyst for discovery of new functions of platelets. Access to the dataset is found in the “Introduction.” PMID:21596849

  1. RNA-Seq profiling reveals novel hepatic gene expression pattern in aflatoxin B1 treated rats.

    Directory of Open Access Journals (Sweden)

    B Alex Merrick

    Deep sequencing was used to investigate the subchronic effects of 1 ppm aflatoxin B1 (AFB1), a potent hepatocarcinogen, on the male rat liver transcriptome prior to onset of histopathological lesions or tumors. We hypothesized RNA-Seq would reveal more differentially expressed genes (DEG) than microarray analysis, including low copy and novel transcripts related to AFB1's carcinogenic activity compared to feed controls (CTRL). Paired-end reads were mapped to the rat genome (Rn4) with TopHat and further analyzed by DESeq and Cufflinks-Cuffdiff pipelines to identify differentially expressed transcripts, new exons and unannotated transcripts. PCA and cluster analysis of DEGs showed clear separation between AFB1 and CTRL treatments and concordance among group replicates. qPCR of eight high and medium DEGs and three low DEGs showed good comparability among RNA-Seq and microarray transcripts. DESeq analysis identified 1,026 differentially expressed transcripts at greater than two-fold change (p<0.005) compared to 626 transcripts by microarray due to base pair resolution of transcripts by RNA-Seq, probe placement within transcripts or an absence of probes to detect novel transcripts, splice variants and exons. Pathway analysis among DEGs revealed signaling of Ahr, Nrf2, GSH, xenobiotic, cell cycle, extracellular matrix, and cell differentiation networks consistent with pathways leading to AFB1 carcinogenesis, including almost 200 upregulated transcripts controlled by E2f1-related pathways related to kinetochore structure, mitotic spindle assembly and tissue remodeling. We report 49 novel, differentially-expressed transcripts including confirmation by PCR-cloning of two unique, unannotated, hepatic AFB1-responsive transcripts (HAfT's) on chromosomes 1.q55 and 15.q11, overexpressed by 10 to 25-fold. Several potentially novel exons were found and exon refinements were made including AFB1 exon-specific induction of homologous family members, Ugt1a6 and Ugt1a7c

  2. MetaRNA-Seq: An Interactive Tool to Browse and Annotate Metadata from RNA-Seq Studies

    Directory of Open Access Journals (Sweden)

    Pankaj Kumar

    2015-01-01

    The number of RNA-Seq studies has grown in recent years. The design of RNA-Seq studies varies from very simple (e.g., two-condition case-control) to very complicated (e.g., time series involving multiple samples at each time point with separate drug treatments). Most of these publicly available RNA-Seq studies are deposited in NCBI databases, but their metadata are scattered throughout four different databases: Sequence Read Archive (SRA), Biosample, Bioprojects, and Gene Expression Omnibus (GEO). Although the NCBI web interface is able to provide all of the metadata information, it often requires significant effort to retrieve study- or project-level information by traversing through multiple hyperlinks and going to another page. Moreover, project- and study-level metadata lack manual or automatic curation by categories, such as disease type, time series, case-control, or replicate type, which are vital to comprehending any RNA-Seq study. Here we describe “MetaRNA-Seq,” a new tool for interactively browsing, searching, and annotating RNA-Seq metadata with the capability of semiautomatic curation at the study level.

  3. Stormbow: A Cloud-Based Tool for Reads Mapping and Expression Quantification in Large-Scale RNA-Seq Studies.

    Science.gov (United States)

    Zhao, Shanrong; Prenger, Kurt; Smith, Lance

    2013-01-01

    RNA-Seq is becoming a promising replacement for microarrays in transcriptome profiling and differential gene expression studies. Technical improvements have decreased sequencing costs and, as a result, the size and number of RNA-Seq datasets have increased rapidly. However, the increasing volume of data from large-scale RNA-Seq studies poses a practical challenge for data analysis in a local environment. To meet this challenge, we developed Stormbow, a cloud-based software package, to process large volumes of RNA-Seq data in parallel. The performance of Stormbow has been tested by practically applying it to analyse 178 RNA-Seq samples in the cloud. In our test, it took 6 to 8 hours to process an RNA-Seq sample with 100 million reads, and the average cost was $3.50 per sample. Utilizing Amazon Web Services as the infrastructure for Stormbow allows us to easily scale up to handle large datasets with on-demand computational resources. Stormbow is a scalable, cost-effective, and open-source based tool for large-scale RNA-Seq data analysis. Stormbow can be freely downloaded and can be used out of the box to process Illumina RNA-Seq datasets.
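As a rough worked example of the figures quoted in this abstract, the sketch below scales the reported per-sample cost and runtime to the 178-sample test. The function name is hypothetical, and the calculation assumes every sample resembles the 100-million-read sample described; the abstract does not report a total cost, so the result is an extrapolation rather than a published number.

```python
def estimate_batch(n_samples, cost_per_sample=3.50, hours_per_sample=(6, 8)):
    """Return (total cost in USD, (min, max) total compute hours).

    Defaults are the per-sample figures quoted in the abstract; treating
    them as constant across samples is an assumption.
    """
    low, high = hours_per_sample
    total_cost = n_samples * cost_per_sample
    return total_cost, (n_samples * low, n_samples * high)


cost, (hours_low, hours_high) = estimate_batch(178)
print(f"178 samples: about ${cost:.2f}, {hours_low}-{hours_high} compute hours")
```

Because the samples are processed in parallel, the wall-clock time can stay close to the 6 to 8 hours of a single sample even though the summed compute hours grow with the number of samples.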

  4. RNA-SeQC: RNA-seq metrics for quality control and process optimization.

    Science.gov (United States)

    DeLuca, David S; Levin, Joshua Z; Sivachenko, Andrey; Fennell, Timothy; Nazaire, Marc-Danie; Williams, Chris; Reich, Michael; Winckler, Wendy; Getz, Gad

    2012-06-01

    RNA-seq, the application of next-generation sequencing to RNA, provides transcriptome-wide characterization of cellular activity. Assessment of sequencing performance and library quality is critical to the interpretation of RNA-seq data, yet few tools exist to address this issue. We introduce RNA-SeQC, a program which provides key measures of data quality. These metrics include yield, alignment and duplication rates; GC bias, rRNA content, regions of alignment (exon, intron and intragenic), continuity of coverage, 3'/5' bias and count of detectable transcripts, among others. The software provides multi-sample evaluation of library construction protocols, input materials and other experimental parameters. The modularity of the software enables pipeline integration and the routine monitoring of key measures of data quality such as the number of alignable reads, duplication rates and rRNA contamination. RNA-SeQC allows investigators to make informed decisions about sample inclusion in downstream analysis. In summary, RNA-SeQC provides quality control measures critical to experiment design, process optimization and downstream computational analysis. See www.genepattern.org to run online, or www.broadinstitute.org/rna-seqc/ for a command line tool.

  5. Transcriptator: An Automated Computational Pipeline to Annotate Assembled Reads and Identify Non Coding RNA.

    Directory of Open Access Journals (Sweden)

    Kumar Parijat Tripathi

    RNA-seq is a new tool to measure RNA transcript counts, using high-throughput sequencing at an extraordinary accuracy. It provides quantitative means to explore the transcriptome of an organism of interest. However, interpreting this extremely large data into biological knowledge is a problem, and biologist-friendly tools are lacking. In our lab, we developed Transcriptator, a web application based on a computational Python pipeline with a user-friendly Java interface. This pipeline uses the web services available for BLAST (Basic Local Alignment Search Tool), QuickGO and DAVID (Database for Annotation, Visualization and Integrated Discovery) tools. It offers a report on statistical analysis of functional and Gene Ontology (GO) annotation enrichment. It helps users to identify enriched biological themes, particularly GO terms, pathways, domains, gene/protein features and information related to protein-protein interactions. It clusters the transcripts based on functional annotations and generates a tabular report for functional and gene ontology annotations for each submitted transcript to the web server. The implementation of QuickGO web services in our pipeline enables the users to carry out GO-Slim analysis, whereas the integration of PORTRAIT (Prediction of transcriptomic non coding RNA (ncRNA) by ab initio methods) helps to identify the non coding RNAs and their regulatory role in the transcriptome. In summary, Transcriptator is useful software for both NGS and array data. It helps the users to characterize the de novo assembled reads, obtained from NGS experiments for non-referenced organisms, while it also performs the functional enrichment analysis of differentially expressed transcripts/genes for both RNA-seq and micro-array experiments. It generates easy to read tables and interactive charts for better understanding of the data. The pipeline is modular in nature, and provides an opportunity to add new plugins in the future. Web application is

  6. RNA-seq analysis of early hepatic response to handling and confinement stress in rainbow trout.

    Directory of Open Access Journals (Sweden)

    Sixin Liu

    Fish under intensive rearing conditions experience various stressors which have negative impacts on survival, growth, reproduction and fillet quality. Identifying and characterizing the molecular mechanisms underlying stress responses will facilitate the development of strategies that aim to improve animal welfare and aquaculture production efficiency. In this study, we used RNA-seq to identify transcripts which are differentially expressed in the rainbow trout liver in response to handling and confinement stress. These stressors were selected due to their relevance in aquaculture production. Total RNA was extracted from the livers of individual fish in five tanks having eight fish each, including three tanks of fish subjected to a 3-hour handling and confinement stress and two control tanks. Equal amounts of total RNA from six individual fish were pooled by tank to create five RNA-seq libraries which were sequenced in one lane of Illumina HiSeq 2000. Three sequencing runs were conducted to obtain a total of 491,570,566 reads which were mapped onto the previously generated stress reference transcriptome to identify 316 differentially expressed transcripts (DETs). Twenty-one DETs were selected for qPCR to validate the RNA-seq approach. The fold changes in gene expression identified by RNA-seq and qPCR were highly correlated (R² = 0.88). Several gene ontology terms including transcription factor activity and biological process such as glucose metabolic process were enriched among these DETs. Pathways involved in response to handling and confinement stress were implicated by mapping the DETs to reference pathways in the KEGG database. Raw RNA-seq reads have been submitted to the NCBI Short Read Archive under accession number SRP022881. All customized scripts described in this paper are available from Dr. Guangtu Gao or the corresponding author.
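The reported agreement between RNA-seq and qPCR fold changes can be illustrated with a minimal sketch that computes the squared Pearson correlation between two lists of log2 fold changes. The function name and the toy values are assumptions; the abstract does not say whether the correlation was computed on raw or log-transformed fold changes, so the log2 scale here is also an assumption.

```python
def r_squared(x, y):
    """Squared Pearson correlation between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return (sxy * sxy) / (sxx * syy)


# Hypothetical log2 fold changes for five transcripts (not study data).
rnaseq_log2fc = [2.1, -1.4, 0.8, 3.0, -2.2]
qpcr_log2fc = [1.8, -1.0, 1.1, 2.6, -2.5]
print(round(r_squared(rnaseq_log2fc, qpcr_log2fc), 3))
```

Squaring the correlation discards its sign, so a plot of the paired fold changes is still worth checking to confirm that both platforms agree on the direction of change.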

  7. RNA-seq: technical variability and sampling

    Science.gov (United States)

    2011-01-01

    Background: RNA-seq is revolutionizing the way we study transcriptomes. mRNA can be surveyed without prior knowledge of gene transcripts. Alternative splicing of transcript isoforms and the identification of previously unknown exons are being reported. Initial reports of differences in exon usage and splicing between samples, as well as quantitative differences among samples, are beginning to surface. Biological variation has been reported to be larger than technical variation. In addition, technical variation has been reported to be in line with expectations due to random sampling. However, strategies for dealing with technical variation will differ depending on the magnitude. The size of technical variance and the role of sampling are examined in this manuscript. Results: In this study, three independent Solexa/Illumina experiments containing technical replicates are analyzed. When coverage is low, large disagreements between technical replicates are apparent. Exon detection between technical replicates is highly variable when the coverage is less than 5 reads per nucleotide, and estimates of gene expression are more likely to disagree when coverage is low, although large disagreements in the estimates of expression are observed at all levels of coverage. Conclusions: Technical variability is too high to ignore. Technical variability results in inconsistent detection of exons at low levels of coverage. Further, the estimate of the relative abundance of a transcript can substantially disagree, even when coverage levels are high. This may be due to the low sampling fraction and, if so, it will persist as an issue needing to be addressed in experimental design even as the next wave of technology produces larger numbers of reads. We provide practical recommendations for dealing with the technical variability, without dramatic cost increases. PMID:21645359
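The "expectations due to random sampling" mentioned in this abstract are often approximated with a Poisson model, in which a count with expected value λ has standard deviation √λ and therefore a coefficient of variation of 1/√λ. The sketch below tabulates that baseline for a few coverage levels. It is an illustration under the Poisson assumption only and does not reproduce the authors' analysis, which reports disagreement beyond what is seen at low coverage alone.

```python
import math


def poisson_cv(expected_count):
    """Coefficient of variation (sd / mean) of a Poisson count."""
    if expected_count <= 0:
        raise ValueError("expected_count must be positive")
    return 1.0 / math.sqrt(expected_count)


# Relative sampling noise shrinks only with the square root of coverage.
for coverage in (1, 5, 25, 100):
    print(f"expected count {coverage:>3}: CV = {poisson_cv(coverage):.2f}")
```

Under this baseline, a position covered by 5 reads on average has relative sampling noise near 45%, which is consistent in spirit with the unstable exon detection the abstract describes below that threshold.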

  8. Comparative RNA-Seq and microarray analysis of gene expression changes in B-cell lymphomas of Canis familiaris.

    Directory of Open Access Journals (Sweden)

    Marie Mooney

    Comparative oncology is a developing research discipline that is being used to assist our understanding of human neoplastic diseases. Companion canines are a preferred animal oncology model due to spontaneous tumor development and similarity to human disease at the pathophysiological level. We use a paired RNA sequencing (RNA-Seq)/microarray analysis of a set of four normal canine lymph nodes and ten canine lymphoma fine needle aspirates to identify technical biases and variation between the technologies and convergence on biological disease pathways. Surrogate Variable Analysis (SVA) provides a formal multivariate analysis of the combined RNA-Seq/microarray data set. Applying SVA to the data allows us to decompose variation into contributions associated with transcript abundance, differences between the technology, and latent variation within each technology. A substantial and highly statistically significant component of the variation reflects transcript abundance, and RNA-Seq appeared more sensitive for detection of transcripts expressed at low levels. Latent random variation among RNA-Seq samples is also distinct in character from that impacting microarray samples. In particular, we observed variation between RNA-Seq samples that reflects transcript GC content. Platform-independent variable decomposition without a priori knowledge of the sources of variation using SVA represents a generalizable method for accomplishing cross-platform data analysis. We identified genes differentially expressed between normal lymph nodes of disease free dogs and a subset of the diseased dogs diagnosed with B-cell lymphoma using each technology. There is statistically significant overlap between the RNA-Seq and microarray sets of differentially expressed genes. Analysis of overlapping genes in the context of biological systems suggests elevated expression and activity of PI3K signaling in B-cell lymphoma biopsies compared with normal biopsies, consistent with

  9. Comparison of transcriptional profiles of Clostridium thermocellum grown on cellobiose and pretreated yellow poplar using RNA-Seq

    Directory of Open Access Journals (Sweden)

    Hui Wei

    2014-04-01

    The anaerobic, thermophilic bacterium, Clostridium thermocellum, secretes multi-protein enzyme complexes, termed cellulosomes, which synergistically interact with the microbial cell surface and efficiently disassemble plant cell wall biomass. C. thermocellum has also been considered a potential consolidated bioprocessing (CBP) organism due to its ability to produce the biofuel products, hydrogen and ethanol. We found that C. thermocellum fermentation of pretreated yellow poplar (PYP) produced 30% and 39% of ethanol and hydrogen product concentrations, respectively, compared to fermentation of cellobiose. RNA-seq was used to analyze the transcriptional profiles of these cells. The PYP-grown cells taken for analysis at the late stationary phase showed 1211 genes up-regulated and 314 down-regulated by more than 2-fold compared to the cellobiose-grown cells. These affected genes cover a broad spectrum of specific functional categories. The transcriptional analysis was further validated by sub-proteomics data taken from the literature, as well as by quantitative reverse transcription-PCR (qRT-PCR) analyses of selected genes. Specifically, 47 cellulosomal protein-encoding genes, genes for 4 pairs of SigI-RsgI for polysaccharide sensing, 7 cellodextrin ABC transporter genes, and a set of NAD(P)H hydrogenase and alcohol dehydrogenase genes were up-regulated for cells growing on PYP compared to cellobiose. These genes could be potential candidates for future studies aimed at gaining insight into the regulatory mechanism of this organism as well as for improvement of C. thermocellum in its role as a CBP organism.

  10. Composite transcriptome assembly of RNA-seq data in a sheep model for delayed bone healing.

    Science.gov (United States)

    Jäger, Marten; Ott, Claus-Eric; Grünhagen, Johannes; Hecht, Jochen; Schell, Hanna; Mundlos, Stefan; Duda, Georg N; Robinson, Peter N; Lienau, Jasmin

    2011-03-24

    The sheep is an important model organism for many types of medically relevant research, but molecular genetic experiments in the sheep have been limited by the lack of knowledge about ovine gene sequences. Prior to our study, mRNA sequences for only 1,556 partial or complete ovine genes were publicly available. Therefore, we developed a composite de novo transcriptome assembly method for next-generation sequence data to combine known ovine mRNA and EST sequences, mRNA sequences from mouse and cow, and sequences assembled de novo from short read RNA-Seq data into a composite reference transcriptome, and identified transcripts from over 12 thousand previously undescribed ovine genes. Gene expression analysis based on these data revealed substantially different expression profiles in standard versus delayed bone healing in an ovine tibial osteotomy model. Hundreds of transcripts were differentially expressed between standard and delayed healing and between the time points of the standard and delayed healing groups. We used the sheep sequences to design quantitative RT-PCR assays with which we validated the differential expression of 26 genes that had been identified by RNA-seq analysis. A number of clusters of characteristic expression profiles could be identified, some of which showed striking differences between the standard and delayed healing groups. Gene Ontology (GO) analysis showed that the differentially expressed genes were enriched in terms including extracellular matrix, cartilage development, contractile fiber, and chemokine activity. Our results provide a first atlas of gene expression profiles and differentially expressed genes in standard and delayed bone healing in a large-animal model and provide a number of clues as to the shifts in gene expression that underlie delayed bone healing. In the course of our study, we identified transcripts of 13,987 ovine genes, including 12,431 genes for which no sequence information was previously available. This

  11. Beta-Poisson model for single-cell RNA-seq data analyses.

    Science.gov (United States)

    Vu, Trung Nghia; Wills, Quin F; Kalari, Krishna R; Niu, Nifang; Wang, Liewei; Rantalainen, Mattias; Pawitan, Yudi

    2016-07-15

    Single-cell RNA-sequencing technology allows detection of gene expression at the single-cell level. One typical feature of the data is a bimodality in the cellular distribution even for highly expressed genes, primarily caused by a proportion of non-expressing cells. The standard and the over-dispersed gamma-Poisson models that are commonly used in bulk-cell RNA-sequencing are not able to capture this property. We introduce a beta-Poisson mixture model that can capture the bimodality of the single-cell gene expression distribution. We further integrate the model into the generalized linear model framework in order to perform differential expression analyses. The whole analytical procedure is called BPSC. The results from several real single-cell RNA-seq datasets indicate that ∼90% of the transcripts are well characterized by the beta-Poisson model; the model-fit from BPSC is better than the fit of the standard gamma-Poisson model in > 80% of the transcripts. Moreover, in differential expression analyses of simulated and real datasets, BPSC performs well against edgeR, a conventional method widely used in bulk-cell RNA-sequencing data, and against scde and MAST, two recent methods specifically designed for single-cell RNA-seq data. An R package BPSC for model fitting and differential expression analyses of single-cell RNA-seq data is available under GPL-3 license at https://github.com/nghiavtr/BPSC CONTACT: yudi.pawitan@ki.se or mattias.rantalainen@ki.se Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
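To make the beta-Poisson idea concrete, the sketch below uses one common parameterization in which a count X is Poisson with rate λp and p follows a Beta(α, β) distribution. Its mean and variance follow from the laws of total expectation and total variance. This parameterization and the function names are assumptions for illustration; the exact form implemented in BPSC may differ (for example, by including additional scaling parameters).

```python
def beta_poisson_mean(lam, alpha, beta):
    """E[X] for X | p ~ Poisson(lam * p), p ~ Beta(alpha, beta)."""
    return lam * alpha / (alpha + beta)


def beta_poisson_variance(lam, alpha, beta):
    """Var[X] = E[lam * p] + Var(lam * p), by the law of total variance."""
    total = alpha + beta
    beta_var = alpha * beta / (total ** 2 * (total + 1))
    return beta_poisson_mean(lam, alpha, beta) + lam ** 2 * beta_var


# Illustrative parameters: alpha, beta < 1 give a U-shaped Beta density,
# so many cells have p near 0 (non-expressing) and others p near 1.
lam, alpha, beta = 50.0, 0.5, 0.5
mean = beta_poisson_mean(lam, alpha, beta)
variance = beta_poisson_variance(lam, alpha, beta)
print(f"mean = {mean:.1f}, variance = {variance:.1f}, Fano = {variance / mean:.1f}")
```

A plain Poisson model has a variance-to-mean ratio of 1, so the much larger ratio obtained here shows how mixing over p inflates dispersion. The two-peaked shape of the count distribution itself comes from the U-shaped Beta density.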

  12. SpliceSeq: a resource for analysis and visualization of RNA-Seq data on alternative splicing and its functional impacts.

    Science.gov (United States)

    Ryan, Michael C; Cleland, James; Kim, RyangGuk; Wong, Wing Chung; Weinstein, John N

    2012-09-15

    SpliceSeq is a resource for RNA-Seq data that provides a clear view of alternative splicing and identifies potential functional changes that result from splice variation. It displays intuitive visualizations and prioritized lists of results that highlight splicing events and their biological consequences. SpliceSeq unambiguously aligns reads to gene splice graphs, facilitating accurate analysis of large, complex transcript variants that cannot be adequately represented in other formats. SpliceSeq is freely available at http://bioinformatics.mdanderson.org/main/SpliceSeq:Overview. The application is a Java program that can be launched via a browser or installed locally. Local installation requires MySQL and Bowtie. Contact: mryan@insilico.us.com. Supplementary data are available at Bioinformatics online.

  13. Accurate RNA consensus sequencing for high-fidelity detection of transcriptional mutagenesis-induced epimutations.

    Science.gov (United States)

    Reid-Bayliss, Kate S; Loeb, Lawrence A

    2017-08-29

    Transcriptional mutagenesis (TM) due to misincorporation during RNA transcription can result in mutant RNAs, or epimutations, that generate proteins with altered properties. TM has long been hypothesized to play a role in aging, cancer, and viral and bacterial evolution. However, inadequate methodologies have limited progress in elucidating a causal association. We present a high-throughput, highly accurate RNA sequencing method to measure epimutations with single-molecule sensitivity. Accurate RNA consensus sequencing (ARC-seq) uniquely combines RNA barcoding and generation of multiple cDNA copies per RNA molecule to eliminate errors introduced during cDNA synthesis, PCR, and sequencing. The stringency of ARC-seq can be scaled to accommodate the quality of input RNAs. We apply ARC-seq to directly assess transcriptome-wide epimutations resulting from RNA polymerase mutants and oxidative stress.

  14. Improving RNA-Seq expression estimates by correcting for fragment bias

    Science.gov (United States)

    2011-01-01

    The biochemistry of RNA-Seq library preparation results in cDNA fragments that are not uniformly distributed within the transcripts they represent. This non-uniformity must be accounted for when estimating expression levels, and we show how to perform the needed corrections using a likelihood based approach. We find improvements in expression estimates as measured by correlation with independently performed qRT-PCR and show that correction of bias leads to improved replicability of results across libraries and sequencing technologies. PMID:21410973

  15. Rnnotator: an automated de novo transcriptome assembly pipeline from stranded RNA-Seq reads

    Energy Technology Data Exchange (ETDEWEB)

    Martin, Jeffrey; Bruno, Vincent M.; Fang, Zhide; Meng, Xiandong; Blow, Matthew; Zhang, Tao; Sherlock, Gavin; Snyder, Michael; Wang, Zhong

    2010-11-19

    Background: Comprehensive annotation and quantification of transcriptomes are outstanding problems in functional genomics. While high throughput mRNA sequencing (RNA-Seq) has emerged as a powerful tool for addressing these problems, its success is dependent upon the availability and quality of reference genome sequences, thus limiting the organisms to which it can be applied. Results: Here, we describe Rnnotator, an automated software pipeline that generates transcript models by de novo assembly of RNA-Seq data without the need for a reference genome. We have applied the Rnnotator assembly pipeline to two yeast transcriptomes and compared the results to the reference gene catalogs of these organisms. The contigs produced by Rnnotator are highly accurate (95%) and reconstruct full-length genes for the majority of the existing gene models (54.3%). Furthermore, our analyses revealed many novel transcribed regions that are absent from well annotated genomes, suggesting Rnnotator serves as a complementary approach to analysis based on a reference genome for comprehensive transcriptomics. Conclusions: These results demonstrate that the Rnnotator pipeline is able to reconstruct full-length transcripts in the absence of a complete reference genome.

  16. CLIP-seq analysis of multi-mapped reads discovers novel functional RNA regulatory sites in the human transcriptome.

    Science.gov (United States)

    Zhang, Zijun; Xing, Yi

    2017-09-19

    Crosslinking or RNA immunoprecipitation followed by sequencing (CLIP-seq or RIP-seq) allows transcriptome-wide discovery of RNA regulatory sites. As CLIP-seq/RIP-seq reads are short, existing computational tools focus on uniquely mapped reads, while reads mapped to multiple loci are discarded. We present CLAM (CLIP-seq Analysis of Multi-mapped reads). CLAM uses an expectation-maximization algorithm to assign multi-mapped reads and calls peaks combining uniquely and multi-mapped reads. To demonstrate the utility of CLAM, we applied it to a wide range of public CLIP-seq/RIP-seq datasets involving numerous splicing factors, microRNAs and m6A RNA methylation. CLAM recovered a large number of novel RNA regulatory sites inaccessible by uniquely mapped reads. The functional significance of these sites was demonstrated by consensus motif patterns and association with alternative splicing (splicing factors), transcript abundance (AGO2) and mRNA half-life (m6A). CLAM provides a useful tool to discover novel protein-RNA interactions and RNA modification sites from CLIP-seq and RIP-seq data, and reveals the significant contribution of repetitive elements to the RNA regulatory landscape of the human transcriptome. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.
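
    The idea of assigning multi-mapped reads by expectation-maximization can be sketched generically in Python. This is a textbook-style illustration under simplifying assumptions (each read lists its candidate loci, and a locus's abundance is the only parameter). It is not CLAM's actual model, and the function name and toy read set are hypothetical.

```python
def em_assign(reads, n_iter=50):
    """Estimate relative locus abundances from uniquely and multi-mapped reads.

    reads: list of tuples, each holding the candidate loci of one read.
    Returns a dict mapping each locus to its estimated share of all reads.
    """
    loci = sorted({locus for read in reads for locus in read})
    theta = {locus: 1.0 / len(loci) for locus in loci}  # uniform start
    for _ in range(n_iter):
        counts = {locus: 0.0 for locus in loci}
        # E-step: split every read across its candidates in proportion to theta.
        for read in reads:
            total = sum(theta[locus] for locus in read)
            for locus in read:
                counts[locus] += theta[locus] / total
        # M-step: the new abundances are the normalized fractional counts.
        theta = {locus: counts[locus] / len(reads) for locus in loci}
    return theta


if __name__ == "__main__":
    # Toy data: 8 reads unique to A, 2 unique to B, 10 that map to both.
    toy_reads = [("A",)] * 8 + [("B",)] * 2 + [("A", "B")] * 10
    print(em_assign(toy_reads))
```

    In this toy case the uniquely mapped reads pull the ambiguous reads toward locus A, so the estimate settles near 0.8 for A and 0.2 for B, whereas discarding multi-mapped reads would throw away half of the data.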

  17. rMATS: robust and flexible detection of differential alternative splicing from replicate RNA-Seq data.

    Science.gov (United States)

    Shen, Shihao; Park, Juw Won; Lu, Zhi-xiang; Lin, Lan; Henry, Michael D; Wu, Ying Nian; Zhou, Qing; Xing, Yi

    2014-12-23

    Ultra-deep RNA sequencing (RNA-Seq) has become a powerful approach for genome-wide analysis of pre-mRNA alternative splicing. We previously developed multivariate analysis of transcript splicing (MATS), a statistical method for detecting differential alternative splicing between two RNA-Seq samples. Here we describe a new statistical model and computer program, replicate MATS (rMATS), designed for detection of differential alternative splicing from replicate RNA-Seq data. rMATS uses a hierarchical model to simultaneously account for sampling uncertainty in individual replicates and variability among replicates. In addition to the analysis of unpaired replicates, rMATS also includes a model specifically designed for paired replicates between sample groups. The hypothesis-testing framework of rMATS is flexible and can assess the statistical significance over any user-defined magnitude of splicing change. The performance of rMATS is evaluated by the analysis of simulated and real RNA-Seq data. rMATS outperformed two existing methods for replicate RNA-Seq data in all simulation settings, and RT-PCR yielded a high validation rate (94%) in an RNA-Seq dataset of prostate cancer cell lines. Our data also provide guiding principles for designing RNA-Seq studies of alternative splicing. We demonstrate that it is essential to incorporate biological replicates in the study design. Of note, pooling RNAs or merging RNA-Seq data from multiple replicates is not an effective approach to account for variability, and the result is particularly sensitive to outliers. The rMATS source code is freely available at rnaseq-mats.sourceforge.net/. As the popularity of RNA-Seq continues to grow, we expect rMATS will be useful for studies of alternative splicing in diverse RNA-Seq projects.

  18. Discovery of Organophosphate Resistance-Related Genes Associated With Well-known Resistance Mechanisms of Plutella xylostella (L.) (Lepidoptera: Plutellidae) by RNA-Seq.

    Science.gov (United States)

    Hsu, Ju-Chun; Lin, Yu-Yu; Chang, Chia-Che; Hua, Kuo-Hsun; Chen, Mei-Ju May; Huang, Li-Hsin; Chen, Chien-Yu

    2016-04-22

    Pesticide resistance poses many challenges for pest control, particularly for destructive pests such as diamondback moths (Plutella xylostella). Organophosphates have been used in the field since the 1950s, leading to selection for resistance-related gene variants and the development of resistance to new insecticides in the diamondback moth. Identifying actual and potential genes involved in resistance could offer solutions for control. This study established resistant diamondback moth strains from two different collections using mevinphos. Two sets of transcriptome sequencing (RNA-Seq) data were generated for pairs of mevinphos-resistant versus susceptible (wild-type) strains. One susceptible strain containing 14 giga base pairs was assembled into a reference-based assembly using published scaffold sequences as reference. Differential expression data between resistant and susceptible strains revealed 944 transcripts (803 with annotations) showing upregulation and 427 transcripts (150 with annotations) showing downregulation. Around 6.8% of the differentially expressed transcripts (65) could be categorized as associated with well-known resistance mechanisms such as penetration, detoxification, and behavior response; of these 65 transcripts, 38 showed upregulation, and 12 relating to penetration were upregulated when the transcripts of 19 cytochrome P450s, 2 zeta-class glutathione S-transferases, and 4 ATP-binding cassette transporters showed upregulation. In addition, 11 groups of transcripts related to olfactory perception appeared to be downregulated in trade-off situations. Quantitative polymerase chain reaction expression results were consistent with RNA-Seq data. Possible roles of these differentially expressed genes in resistance mechanisms are discussed in this study. © The Authors 2016. Published by Oxford University Press on behalf of Entomological Society of America.

  19. Transcriptome Analysis of the Thymus in Short-Term Calorie-Restricted Mice Using RNA-seq

    Directory of Open Access Journals (Sweden)

    Zehra Omeroğlu Ulu

    2018-01-01

    Calorie restriction (CR), which is a factor that extends lifespan and an important player in immune response, is an effective protective method against cancer development. Thymus, which plays a critical role in the development of the immune system, reacts to nutrition deficiency quickly. RNA-seq-based transcriptome sequencing was performed on thymus tissues of MMTV-TGF-α mice subjected to ad libitum (AL), chronic calorie restriction (CCR), and intermittent calorie restriction (ICR) diets in this study. Three cDNA libraries were sequenced using Illumina HiSeq™ 4000 to produce 100-base paired-end reads. On average, 105 million clean reads were mapped and in total 6091 significantly differentially expressed genes (DEGs) were identified (p<0.05). These DEGs were clustered into Gene Ontology (GO) categories. The expression pattern revealed by RNA-seq was validated by quantitative real-time PCR (qPCR) analysis of four important genes, which are leptin, ghrelin, Igf1, and adiponectin. RNA-seq data has been deposited in NCBI Gene Expression Omnibus (GEO) database (GSE95371). We report the use of RNA sequencing to find DEGs that are affected by different feeding regimes in the thymus.

  20. Transcriptome Analysis of the Thymus in Short-Term Calorie-Restricted Mice Using RNA-seq

    Science.gov (United States)

    Omeroğlu Ulu, Zehra; Ulu, Salih; Dogan, Soner; Guvenc Tuna, Bilge

    2018-01-01

    Calorie restriction (CR), which is a factor that extends lifespan and an important player in immune response, is an effective protective method against cancer development. Thymus, which plays a critical role in the development of the immune system, reacts to nutrition deficiency quickly. RNA-seq-based transcriptome sequencing was performed on thymus tissues of MMTV-TGF-α mice subjected to ad libitum (AL), chronic calorie restriction (CCR), and intermittent calorie restriction (ICR) diets in this study. Three cDNA libraries were sequenced using Illumina HiSeq™ 4000 to produce 100-base paired-end reads. On average, 105 million clean reads were mapped and in total 6091 significantly differentially expressed genes (DEGs) were identified (p < 0.05). These DEGs were clustered into Gene Ontology (GO) categories. The expression pattern revealed by RNA-seq was validated by quantitative real-time PCR (qPCR) analysis of four important genes, which are leptin, ghrelin, Igf1, and adiponectin. RNA-seq data has been deposited in NCBI Gene Expression Omnibus (GEO) database (GSE95371). We report the use of RNA sequencing to find DEGs that are affected by different feeding regimes in the thymus. PMID:29511668

  1. SEASTAR: systematic evaluation of alternative transcription start sites in RNA.

    Science.gov (United States)

    Qin, Zhiyi; Stoilov, Peter; Zhang, Xuegong; Xing, Yi

    2018-05-04

    Alternative first exons diversify the transcriptomes of eukaryotes by producing variants of the 5' Untranslated Regions (5'UTRs) and N-terminal coding sequences. Accurate transcriptome-wide detection of alternative first exons typically requires specialized experimental approaches that are designed to identify the 5' ends of transcripts. We developed a computational pipeline SEASTAR that identifies first exons from RNA-seq data alone then quantifies and compares alternative first exon usage across multiple biological conditions. The exons inferred by SEASTAR coincide with transcription start sites identified directly by CAGE experiments and bear epigenetic hallmarks of active promoters. To determine if differential usage of alternative first exons can yield insights into the mechanism controlling gene expression, we applied SEASTAR to an RNA-seq dataset that tracked the reprogramming of mouse fibroblasts into induced pluripotent stem cells. We observed dynamic temporal changes in the usage of alternative first exons, along with correlated changes in transcription factor expression. Using a combined sequence motif and gene set enrichment analysis we identified N-Myc as a regulator of alternative first exon usage in the pluripotent state. Our results demonstrate that SEASTAR can leverage the available RNA-seq data to gain insights into the control of gene expression and alternative transcript variation in eukaryotic transcriptomes.

  2. InFusion: Advancing Discovery of Fusion Genes and Chimeric Transcripts from Deep RNA-Sequencing Data.

    Directory of Open Access Journals (Sweden)

    Konstantin Okonechnikov

    Analysis of fusion transcripts has become increasingly important due to their link with cancer development. Since high-throughput sequencing approaches survey fusion events exhaustively, several computational methods for the detection of gene fusions from RNA-seq data have been developed. This kind of analysis, however, is complicated by native trans-splicing events, the splicing-induced complexity of the transcriptome and biases and artefacts introduced in experiments and data analysis. There are a number of tools available for the detection of fusions from RNA-seq data; however, certain differences in specificity and sensitivity between commonly used approaches have been found. The ability to detect gene fusions of different types, including isoform fusions and fusions involving non-coding regions, has not been thoroughly studied yet. Here, we propose a novel computational toolkit called InFusion for fusion gene detection from RNA-seq data. InFusion introduces several unique features, such as discovery of fusions involving intergenic regions, and detection of anti-sense transcription in chimeric RNAs based on strand-specificity. Our approach demonstrates superior detection accuracy on simulated data and several public RNA-seq datasets. This improved performance was also evident when evaluating data from RNA deep-sequencing of two well-established prostate cancer cell lines. InFusion identified 26 novel fusion events that were validated in vitro, including alternatively spliced gene fusion isoforms and chimeric transcripts that include intergenic regions. The toolkit is freely available to download from http://bitbucket.org/kokonech/infusion.

  3. Comparison of transcriptomic landscapes of bovine embryos using RNA-Seq

    Directory of Open Access Journals (Sweden)

    Khatib Hasan

    2010-12-01

    Background: Advances in sequencing technologies have opened a new era of high throughput investigations. Although RNA-seq has been demonstrated in many organisms, no study has provided a comprehensive investigation of the bovine transcriptome using RNA-seq. Results: In this study, we provide a deep survey of the bovine embryonic transcriptomes, the first application of RNA-seq in cattle. Embryos cultured in vitro were used as models to study early embryonic development in cattle. RNA amplified from limited amounts of starting total RNA was sequenced and mapped to the reference genome to obtain digital gene expression at single base resolution. In particular, gene expression estimates from more than 1.6 million unannotated bases in 1785 novel transcribed units were obtained. We compared the transcriptomes of embryos showing distinct developmental statuses and found genes that showed differential overall expression as well as alternative splicing. Conclusion: Our study demonstrates the power of RNA-seq and provides further understanding of bovine preimplantation embryonic development at a fine scale.

  4. Towards the integration, annotation and association of historical microarray experiments with RNA-seq.

    Science.gov (United States)

    Chavan, Shweta S; Bauer, Michael A; Peterson, Erich A; Heuck, Christoph J; Johann, Donald J

    2013-01-01

    Transcriptome analysis by microarrays has produced important advances in biomedicine. For instance in multiple myeloma (MM), microarray approaches led to the development of an effective disease subtyping via cluster assignment, and a 70 gene risk score. Both enabled an improved molecular understanding of MM, and have provided prognostic information for the purposes of clinical management. Many researchers are now transitioning to Next Generation Sequencing (NGS) approaches and RNA-seq in particular, due to its discovery-based nature, improved sensitivity, and dynamic range. Additionally, RNA-seq allows for the analysis of gene isoforms, splice variants, and novel gene fusions. Given the voluminous amounts of historical microarray data, there is now a need to associate and integrate microarray and RNA-seq data via advanced bioinformatic approaches. Custom software was developed following a model-view-controller (MVC) approach to integrate Affymetrix probe set-IDs, and gene annotation information from a variety of sources. The tool/approach employs an assortment of strategies to integrate, cross reference, and associate microarray and RNA-seq datasets. Output from a variety of transcriptome reconstruction and quantitation tools (e.g., Cufflinks) can be directly integrated, and/or associated with Affymetrix probe set data, as well as necessary gene identifiers and/or symbols from a diversity of sources. Strategies are employed to maximize the annotation and cross referencing process. Custom gene sets (e.g., MM 70 risk score (GEP-70)) can be specified, and the tool can be directly assimilated into an RNA-seq pipeline. A novel bioinformatic approach to aid in the facilitation of both annotation and association of historic microarray data, in conjunction with richer RNA-seq data, is now assisting with the study of MM cancer biology.

  5. BRAKER1: Unsupervised RNA-Seq-Based Genome Annotation with GeneMark-ET and AUGUSTUS.

    Science.gov (United States)

    Hoff, Katharina J; Lange, Simone; Lomsadze, Alexandre; Borodovsky, Mark; Stanke, Mario

    2016-03-01

    Gene finding in eukaryotic genomes is notoriously difficult to automate. The task is to design a work flow with a minimal set of tools that would reach state-of-the-art performance across a wide range of species. GeneMark-ET is a gene prediction tool that incorporates RNA-Seq data into unsupervised training and subsequently generates ab initio gene predictions. AUGUSTUS is a gene finder that usually requires supervised training and uses information from RNA-Seq reads in the prediction step. Complementary strengths of GeneMark-ET and AUGUSTUS provided motivation for designing a new combined tool for automatic gene prediction. We present BRAKER1, a pipeline for unsupervised RNA-Seq-based genome annotation that combines the advantages of GeneMark-ET and AUGUSTUS. As input, BRAKER1 requires a genome assembly file and a file in bam-format with spliced alignments of RNA-Seq reads to the genome. First, GeneMark-ET performs iterative training and generates initial gene structures. Second, AUGUSTUS uses predicted genes for training and then integrates RNA-Seq read information into final gene predictions. In our experiments, we observed that BRAKER1 was more accurate than MAKER2 when using RNA-Seq as the sole source for training and prediction. BRAKER1 does not require pre-trained parameters or a separate expert-prepared training step. BRAKER1 is available for download at http://bioinf.uni-greifswald.de/bioinf/braker/ and http://exon.gatech.edu/GeneMark/. Contact: katharina.hoff@uni-greifswald.de or borodovsky@gatech.edu. Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press.

  6. Hybridization-based reconstruction of small non-coding RNA transcripts from deep sequencing data.

    Science.gov (United States)

    Ragan, Chikako; Mowry, Bryan J; Bauer, Denis C

    2012-09-01

    Recent advances in RNA sequencing technology (RNA-Seq) enable comprehensive profiling of RNAs by producing millions of short sequence reads from size-fractionated RNA libraries. Although conventional tools for detecting and distinguishing non-coding RNAs (ncRNAs) from reference-genome data can be applied to sequence data, ncRNA detection can be improved by harnessing the full information content provided by this new technology. Here we present NorahDesk, the first unbiased and universally applicable method for small ncRNA detection from RNA-Seq data. NorahDesk utilizes the coverage-distribution of small RNA sequence data as well as thermodynamic assessments of secondary structure to reliably predict and annotate ncRNA classes. Using publicly available mouse sequence data from brain, skeletal muscle, testis and ovary, we evaluated our method with an emphasis on the performance for microRNAs (miRNAs) and piwi-interacting small RNA (piRNA). We compared our method with Dario and mirDeep2 and found that NorahDesk produces longer transcripts with higher read coverage. This feature makes it the first method particularly suitable for the prediction of both known and novel piRNAs.

  7. Using RNA-Seq Data to Evaluate Reference Genes Suitable for Gene Expression Studies in Soybean.

    Directory of Open Access Journals (Sweden)

    Aldrin Kay-Yuen Yim

    Differential gene expression profiles often provide important clues for gene functions. While reverse transcription quantitative real-time polymerase chain reaction (RT-qPCR) is an important tool, the validity of the results depends heavily on the choice of proper reference genes. In this study, we employed new and published RNA-sequencing (RNA-Seq) datasets (26 sequencing libraries in total) to evaluate reference genes reported in previous soybean studies. In silico PCR showed that 13 out of 37 previously reported primer sets have multiple targets, and 4 of them have amplicons with different sizes. Using a probabilistic approach, we identified new and improved candidate reference genes. We further performed 2 validation tests (with 26 RNA samples) on 8 commonly used reference genes and 7 newly identified candidates, using RT-qPCR. In general, the new candidate reference genes exhibited more stable expression levels under the tested experimental conditions. The three newly identified candidate reference genes Bic-C2, F-box protein2, and VPS-like gave the best overall performance, together with the commonly used ELF1b. It is expected that the proposed probabilistic model could serve as an important tool to identify stable reference genes when more soybean RNA-Seq data from different growth stages and treatments are used.

  8. RNA-Seq Data: A Complexity Journey

    Directory of Open Access Journals (Sweden)

    Enrico Capobianco

    2014-09-01

    A paragraph from the highlights of “Transcriptomics: Throwing light on dark matter” by L. Flintoft (Nature Reviews Genetics 11, 455, 2010) says: “Reports over the past few years of extensive transcription throughout eukaryotic genomes have led to considerable excitement. However, doubts have been raised about the methods that have detected this pervasive transcription and about how much of it is functional.” Since the appearance of the ENCODE project and due to follow-up work, a shift from the pervasive transcription observed from RNA-Seq data to its functional validation is gradually occurring. However, much less attention has been turned to the problem of deciphering the complexity of transcriptome data, which determines uncertainty with regard to identification, quantification and differential expression of genes and non-coding RNAs. The aim of this mini-review is to emphasize transcriptome-related problems of direct and inverse nature for which novel inference approaches are needed.

  9. Optimizing de novo common wheat transcriptome assembly using short-read RNA-Seq data

    Directory of Open Access Journals (Sweden)

    Duan Jialei

    2012-08-01

    Background: Rapid advances in next-generation sequencing methods have provided new opportunities for transcriptome sequencing (RNA-Seq). The unprecedented sequencing depth provided by RNA-Seq makes it a powerful and cost-efficient method for transcriptome study, and it has been widely used in model organisms and non-model organisms to identify and quantify RNA. For non-model organisms lacking well-defined genomes, de novo assembly is typically required for downstream RNA-Seq analyses, including SNP discovery and identification of genes differentially expressed by phenotypes. Although RNA-Seq has been successfully used to sequence many non-model organisms, the results of de novo assembly from short reads can still be improved by using recent bioinformatic developments. Results: In this study, we used 212.6 million paired-end reads, which accounted for 16.2 Gb, to assemble the hexaploid wheat transcriptome. Two state-of-the-art assemblers, Trinity and Trans-ABySS, which use the single and multiple k-mer methods, respectively, were used, and the whole de novo assembly process was divided into the following four steps: pre-assembly, merging different samples, removal of redundancy and scaffolding. We documented every detail of these steps and how these steps influenced assembly performance to gain insight into transcriptome assembly from short reads. After optimization, the assembled transcripts were comparable to Sanger-derived ESTs in terms of both continuity and accuracy. We also provided considerable new wheat transcript data to the community. Conclusions: It is feasible to assemble the hexaploid wheat transcriptome from short reads. Special attention should be paid to dealing with multiple samples to balance the spectrum of expression levels and redundancy. To obtain an accurate overview of RNA profiling, removal of redundancy may be crucial in de novo assembly.

  10. ASAP: a web-based platform for the analysis and interactive visualization of single-cell RNA-seq data.

    Science.gov (United States)

    Gardeux, Vincent; David, Fabrice P A; Shajkofci, Adrian; Schwalie, Petra C; Deplancke, Bart

    2017-10-01

    Single-cell RNA-sequencing (scRNA-seq) allows whole transcriptome profiling of thousands of individual cells, enabling the molecular exploration of tissues at the cellular level. Such analytical capacity is of great interest to many research groups in the world, yet these groups often lack the expertise to handle complex scRNA-seq datasets. We developed a fully integrated, web-based platform aimed at the complete analysis of scRNA-seq data post genome alignment: from the parsing, filtering and normalization of the input count data files, to the visual representation of the data, identification of cell clusters, differentially expressed genes (including cluster-specific marker genes), and functional gene set enrichment. This Automated Single-cell Analysis Pipeline (ASAP) combines a wide range of commonly used algorithms with sophisticated visualization tools. Compared with existing scRNA-seq analysis platforms, researchers (including those lacking computational expertise) are able to interact with the data in a straightforward fashion and in real time. Furthermore, given the overlap between scRNA-seq and bulk RNA-seq analysis workflows, ASAP should conceptually be broadly applicable to any RNA-seq dataset. As a validation, we demonstrate how we can use ASAP to simply reproduce the results from a single-cell study of 91 mouse cells involving five distinct cell types. The tool is freely available at asap.epfl.ch and R/Python scripts are available at github.com/DeplanckeLab/ASAP. Contact: bart.deplancke@epfl.ch. Supplementary data are available at Bioinformatics online. © The Author(s) 2017. Published by Oxford University Press.

  11. Quantitative RNA-Seq analysis in non-model species: assessing transcriptome assemblies as a scaffold and the utility of evolutionary divergent genomic reference species

    Directory of Open Access Journals (Sweden)

    Hornett Emily A

    2012-08-01

    Background: How well does RNA-Seq data perform for quantitative whole gene expression analysis in the absence of a genome? This is one unanswered question facing the rapidly growing number of researchers studying non-model species. Using Homo sapiens data and resources, we compared the direct mapping of sequencing reads to predicted genes from the genome with mapping to de novo transcriptomes assembled from RNA-Seq data. Gene coverage and expression analysis was further investigated in the non-model context by using increasingly divergent genomic reference species to group assembled contigs by unique genes. Results: Eight transcriptome sets, composed of varying amounts of Illumina and 454 data, were assembled and assessed. Hybrid 454/Illumina assemblies had the highest transcriptome and individual gene coverage. Quantitative whole gene expression levels were highly similar between using a de novo hybrid assembly and the predicted genes as a scaffold, although mapping to the de novo transcriptome assembly provided data on fewer genes. Using non-target species as reference scaffolds does result in some loss of sequence and expression data, and bias and error increase with evolutionary distance. However, within a 100 million year window these effect sizes are relatively small. Conclusions: Predicted gene sets from sequenced genomes of related species can provide a powerful method for grouping RNA-Seq reads and annotating contigs. Gene expression results can be produced that are similar to results obtained using gene models derived from a high quality genome, though biased towards conserved genes. Our results demonstrate the power and limitations of conducting RNA-Seq in non-model species.

  12. NBLDA: negative binomial linear discriminant analysis for RNA-Seq data.

    Science.gov (United States)

    Dong, Kai; Zhao, Hongyu; Tong, Tiejun; Wan, Xiang

    2016-09-13

    RNA-sequencing (RNA-Seq) has become a powerful technology to characterize gene expression profiles because it is more accurate and comprehensive than microarrays. Although statistical methods that have been developed for microarray data can be applied to RNA-Seq data, they are not ideal due to the discrete nature of RNA-Seq data. The Poisson distribution and negative binomial distribution are commonly used to model count data. Recently, Witten (Annals Appl Stat 5:2493-2518, 2011) proposed a Poisson linear discriminant analysis for RNA-Seq data. The Poisson assumption may not be as appropriate as the negative binomial distribution when biological replicates are available and in the presence of overdispersion (i.e., when the variance is larger than the mean). However, it is more complicated to model negative binomial variables because they involve a dispersion parameter that needs to be estimated. In this paper, we propose a negative binomial linear discriminant analysis for RNA-Seq data. By Bayes' rule, we construct the classifier by fitting a negative binomial model, and propose some plug-in rules to estimate the unknown parameters in the classifier. The relationship between the negative binomial classifier and the Poisson classifier is explored, with a numerical investigation of the impact of dispersion on the discriminant score. Simulation results show the superiority of our proposed method. We also analyze two real RNA-Seq data sets to demonstrate the advantages of our method in real-world applications. We have developed a new classifier using the negative binomial model for RNA-seq data classification. Our simulation results show that our proposed classifier has a better performance than existing works. The proposed classifier can serve as an effective tool for classifying RNA-seq data. Based on the comparison results, we have provided some guidelines for scientists to decide which method should be used in the discriminant analysis of RNA-Seq data.
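
    The idea of a Bayes-rule classifier built on a negative binomial model can be illustrated with a minimal sketch. It is not the NBLDA implementation: the function names, the single shared dispersion `phi`, and the toy class means are assumptions made for illustration, and the parameter estimation ("plug-in rules") described in the abstract is not reproduced here.

```python
import math

def nb_logpmf(x, mu, phi):
    """Log-probability of count x under a negative binomial with mean mu
    and dispersion phi, so that variance = mu + phi * mu**2."""
    r = 1.0 / phi  # "size" parameter of the negative binomial
    return (math.lgamma(x + r) - math.lgamma(r) - math.lgamma(x + 1)
            + r * math.log(r / (r + mu)) + x * math.log(mu / (r + mu)))

def poisson_logpmf(x, mu):
    """Log-probability of count x under a Poisson with mean mu."""
    return x * math.log(mu) - mu - math.lgamma(x + 1)

def classify(sample, class_means, phi, priors):
    """Return the class with the highest score, where the score is the
    log prior plus the summed per-gene negative binomial log-likelihoods."""
    scores = {}
    for label, means in class_means.items():
        loglik = sum(nb_logpmf(x, mu, phi) for x, mu in zip(sample, means))
        scores[label] = math.log(priors[label]) + loglik
    return max(scores, key=scores.get)

# Hypothetical toy data: three genes, two classes.
class_means = {"A": [10.0, 50.0, 5.0], "B": [40.0, 12.0, 5.0]}
priors = {"A": 0.5, "B": 0.5}
print(classify([12, 45, 4], class_means, phi=0.2, priors=priors))  # prints "A"
```

    As `phi` approaches zero the negative binomial log-probability approaches the Poisson one, which is one way to see the relationship between the two classifiers that the abstract mentions.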

  13. The bench scientist's guide to statistical analysis of RNA-Seq data

    OpenAIRE

    Yendrek, Craig R.; Ainsworth, Elizabeth A.; Thimmapuram, Jyothi

    2012-01-01

    Abstract Background RNA sequencing (RNA-Seq) is emerging as a highly accurate method to quantify transcript abundance. However, analyses of the large data sets obtained by sequencing the entire transcriptome of organisms have generally been performed by bioinformatics specialists. Here we provide a step-by-step guide and outline a strategy using currently available statistical tools that results in a conservative list of differentially expressed genes. We also discuss potential sources of err...

  14. RNA-Seq Atlas of Glycine max: A guide to the soybean transcriptome

    Directory of Open Access Journals (Sweden)

    Severin Andrew J

    2010-08-01

    Full Text Available Abstract Background Next generation sequencing is transforming our understanding of transcriptomes. It can determine the expression level of transcripts with a dynamic range of over six orders of magnitude from multiple tissues, developmental stages or conditions. Patterns of gene expression provide insight into functions of genes with unknown annotation. Results The RNA Seq-Atlas presented here provides a record of high-resolution gene expression in a set of fourteen diverse tissues. Hierarchical clustering of transcriptional profiles for these tissues suggests three clades with similar profiles: aerial, underground and seed tissues. We also investigate the relationship between gene structure and gene expression and find a correlation between gene length and expression. Additionally, we find dramatic tissue-specific gene expression of both the most highly-expressed genes and the genes specific to legumes in seed development and nodule tissues. Analysis of the gene expression profiles of over 2,000 genes with preferential gene expression in seed suggests there are more than 177 genes with functional roles that are involved in the economically important seed filling process. Finally, the Seq-atlas also provides a means of evaluating existing gene model annotations for the Glycine max genome. Conclusions This RNA-Seq atlas extends the analyses of previous gene expression atlases performed using Affymetrix GeneChip technology and provides an example of new methods to accommodate the increase in transcriptome data obtained from next generation sequencing. Data contained within this RNA-Seq atlas of Glycine max can be explored at http://www.soybase.org/soyseq.

  15. RNA-seq based identification and mutant validation of gene targets related to ethanol resistance in cyanobacterial Synechocystis sp. PCC 6803

    Directory of Open Access Journals (Sweden)

    Wang Jiangxin

    2012-12-01

    Full Text Available Abstract Background Fermentation production of biofuel ethanol consumes agricultural crops, which will compete directly with the food supply. As an alternative, photosynthetic cyanobacteria have been proposed as microbial factories to produce ethanol directly from solar energy and CO2. However, the ethanol productivity from photoautotrophic cyanobacteria is still very low, mostly due to the low tolerance of cyanobacterial systems to ethanol stress. Results To build a foundation necessary to engineer robust ethanol-producing cyanobacterial hosts, in this study we applied a quantitative transcriptomics approach with a next-generation sequencing technology, combined with quantitative reverse-transcript PCR (RT-PCR) analysis, to reveal the global metabolic responses to ethanol in model cyanobacterial Synechocystis sp. PCC 6803. The results showed that ethanol exposure induced genes involved in common stress responses, transporting and cell envelope modification. In addition, the cells can also utilize enhanced polyhydroxyalkanoates (PHA) accumulation and glyoxalase detoxication pathway as means against ethanol stress. The up-regulation of photosynthesis by ethanol was also further confirmed at transcriptional level. Finally, we used gene knockout strains to validate the potential target genes related to ethanol tolerance. Conclusion RNA-Seq based global transcriptomic analysis provided a comprehensive view of cellular response to ethanol exposure. The analysis provided a list of gene targets for engineering ethanol tolerance in cyanobacterium Synechocystis.

  16. RNA-Seq-based toxicogenomic assessment of fresh frozen and formalin-fixed tissues yields similar mechanistic insights.

    Science.gov (United States)

    Auerbach, Scott S; Phadke, Dhiral P; Mav, Deepak; Holmgren, Stephanie; Gao, Yuan; Xie, Bin; Shin, Joo Heon; Shah, Ruchir R; Merrick, B Alex; Tice, Raymond R

    2015-07-01

    Formalin-fixed, paraffin-embedded (FFPE) pathology specimens represent a potentially vast resource for transcriptomic-based biomarker discovery. We present here a comparison of results from a whole transcriptome RNA-Seq analysis of RNA extracted from fresh frozen and FFPE livers. The samples were derived from rats exposed to aflatoxin B1 (AFB1) and a corresponding set of control animals. Principal components analysis indicated that samples were separated in the two groups representing presence or absence of chemical exposure, both in fresh frozen and FFPE sample types. Sixty-five percent of the differentially expressed transcripts (AFB1 vs. controls) in fresh frozen samples were also differentially expressed in FFPE samples (overlap significance: P < 0.0001). Genomic signature and gene set analysis of AFB1 differentially expressed transcript lists indicated highly similar results between fresh frozen and FFPE at the level of chemogenomic signatures (i.e., single chemical/dose/duration elicited transcriptomic signatures), mechanistic and pathology signatures, biological processes, canonical pathways and transcription factor networks. Overall, our results suggest that similar hypotheses about the biological mechanism of toxicity would be formulated from fresh frozen and FFPE samples. These results indicate that phenotypically anchored archival specimens represent a potentially informative resource for signature-based biomarker discovery and mechanistic characterization of toxicity. Copyright © 2014 John Wiley & Sons, Ltd.
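
    The abstract reports an overlap significance without naming the test. One common way to assess whether two lists of differentially expressed transcripts overlap more than expected by chance is a hypergeometric upper-tail probability, sketched below. The choice of test and all counts in the example are assumptions for illustration and are not taken from the study.

```python
from math import comb

def hypergeom_upper_tail(k, N, K, n):
    """P(X >= k) when n items are drawn without replacement from a
    population of N items of which K are 'successes'."""
    total = comb(N, n)
    upper = min(K, n)
    # math.comb returns 0 when the second argument exceeds the first,
    # so impossible configurations contribute nothing to the sum.
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / total

# Hypothetical counts: 1000 transcripts tested, 100 differentially
# expressed in one sample type, 120 in the other, 65 shared.
p = hypergeom_upper_tail(k=65, N=1000, K=100, n=120)
print(p < 1e-4)  # prints True
```

    With these toy numbers about 12 shared transcripts would be expected by chance, so an overlap of 65 gives a very small tail probability.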

  17. Effect of chronic uremia on the transcriptional profile of the calcified aorta analyzed by RNA sequencing

    DEFF Research Database (Denmark)

    Rukov, Jakob Lewin; Gravesen, Eva; Mace, Maria L.

    2016-01-01

    The development of vascular calcification (VC) in chronic uremia (CU) is a tightly regulated process controlled by factors promoting and inhibiting mineralization. Next-generation high-throughput RNA sequencing (RNA-seq) is a powerful and sensitive tool for quantitative gene expression profiling...... with an expression level of >1 reads/kilobase transcript/million mapped reads, 2,663 genes were differentially expressed with 47% upregulated genes and 53% downregulated genes in uremic rats. Significantly deregulated genes were enriched for ontologies related to the extracellular matrix, response to wounding...

  18. An RNA-Seq strategy to detect the complete coding and non-coding transcriptome including full-length imprinted macro ncRNAs.

    Directory of Open Access Journals (Sweden)

    Ru Huang

    Full Text Available Imprinted macro non-protein-coding (nc) RNAs are cis-repressor transcripts that silence multiple genes in at least three imprinted gene clusters in the mouse genome. Similar macro or long ncRNAs are abundant in the mammalian genome. Here we present the full coding and non-coding transcriptome of two mouse tissues: differentiated ES cells and fetal head using an optimized RNA-Seq strategy. The data produced is highly reproducible in different sequencing locations and is able to detect the full length of imprinted macro ncRNAs such as Airn and Kcnq1ot1, whose length ranges between 80-118 kb. Transcripts show a more uniform read coverage when RNA is fragmented with RNA hydrolysis compared with cDNA fragmentation by shearing. Irrespective of the fragmentation method, all coding and non-coding transcripts longer than 8 kb show a gradual loss of sequencing tags towards the 3' end. Comparisons to published RNA-Seq datasets show that the strategy presented here is more efficient in detecting known functional imprinted macro ncRNAs and also indicate that standardization of RNA preparation protocols would increase the comparability of the transcriptome between different RNA-Seq datasets.

  19. A technical assessment of the porcine ejaculated spermatozoa for a sperm-specific RNA-seq analysis.

    Science.gov (United States)

    Gòdia, Marta; Mayer, Fabiana Quoos; Nafissi, Julieta; Castelló, Anna; Rodríguez-Gil, Joan Enric; Sánchez, Armand; Clop, Alex

    2018-04-26

    The study of the boar sperm transcriptome by RNA-seq can provide relevant information on sperm quality and fertility and might contribute to animal breeding strategies. However, the analysis of the spermatozoa RNA is challenging as these cells harbor very low amounts of highly fragmented RNA, and the ejaculates also contain other cell types with larger amounts of non-fragmented RNA. Here, we describe a strategy for a successful boar sperm purification, RNA extraction and RNA-seq library preparation. Using these approaches our objectives were: (i) to evaluate the sperm recovery rate (SRR) after boar spermatozoa purification by density centrifugation using the non-porcine-specific commercial reagent BoviPure™; (ii) to assess the correlation between SRR and sperm quality characteristics; (iii) to evaluate the relationship between sperm cell RNA load and sperm quality traits and (iv) to compare different library preparation kits for both total RNA-seq (SMARTer Universal Low Input RNA and TruSeq RNA Library Prep kit) and small RNA-seq (NEBNext Small RNA and TailorMix miRNA Sample Prep v2) for high-throughput sequencing. Our results show that pig SRR (~22%) is lower than in other mammalian species and that it is not significantly dependent on the sperm quality parameters analyzed in our study. Moreover, no relationship between the RNA yield per sperm cell and sperm phenotypes was found. We compared an RNA-seq library preparation kit optimized for low amounts of fragmented RNA with a standard kit designed for high amount and quality of input RNA and found that for sperm, a protocol designed to work on low-quality RNA is essential. We also compared two small RNA-seq kits and did not find substantial differences in their performance. We propose the methodological workflow described for the RNA-seq screening of the boar spermatozoa transcriptome. 
FPKM: fragments per kilobase of transcript per million mapped reads; KRT1: keratin 1; miRNA: micro-RNA; miscRNA: miscellaneous

  20. Granatum: a graphical single-cell RNA-Seq analysis pipeline for genomics scientists.

    Science.gov (United States)

    Zhu, Xun; Wolfgruber, Thomas K; Tasato, Austin; Arisdakessian, Cédric; Garmire, David G; Garmire, Lana X

    2017-12-05

    Single-cell RNA sequencing (scRNA-Seq) is an increasingly popular platform to study heterogeneity at the single-cell level. Computational methods to process scRNA-Seq data are not very accessible to bench scientists as they require a significant amount of bioinformatic skills. We have developed Granatum, a web-based scRNA-Seq analysis pipeline to make analysis more broadly accessible to researchers. Without a single line of programming code, users can click through the pipeline, setting parameters and visualizing results via the interactive graphical interface. Granatum conveniently walks users through various steps of scRNA-Seq analysis. It has a comprehensive list of modules, including plate merging and batch-effect removal, outlier-sample removal, gene-expression normalization, imputation, gene filtering, cell clustering, differential gene expression analysis, pathway/ontology enrichment analysis, protein network interaction visualization, and pseudo-time cell series construction. Granatum enables broad adoption of scRNA-Seq technology by empowering bench scientists with an easy-to-use graphical interface for scRNA-Seq data analysis. The package is freely available for research use at http://garmiregroup.org/granatum/app.

  1. RNA-seq of 272 gliomas revealed a novel, recurrent PTPRZ1-MET fusion transcript in secondary glioblastomas.

    Science.gov (United States)

    Bao, Zhao-Shi; Chen, Hui-Min; Yang, Ming-Yu; Zhang, Chuan-Bao; Yu, Kai; Ye, Wan-Lu; Hu, Bo-Qiang; Yan, Wei; Zhang, Wei; Akers, Johnny; Ramakrishnan, Valya; Li, Jie; Carter, Bob; Liu, Yan-Wei; Hu, Hui-Min; Wang, Zheng; Li, Ming-Yang; Yao, Kun; Qiu, Xiao-Guang; Kang, Chun-Sheng; You, Yong-Ping; Fan, Xiao-Long; Song, Wei Sonya; Li, Rui-Qiang; Su, Xiao-Dong; Chen, Clark C; Jiang, Tao

    2014-11-01

    Studies of gene rearrangements and the consequent oncogenic fusion proteins have laid the foundation for targeted cancer therapy. To identify oncogenic fusions associated with glioma progression, we catalogued fusion transcripts by RNA-seq of 272 gliomas. Fusion transcripts were more frequently found in high-grade gliomas, in the classical subtype of gliomas, and in gliomas treated with radiation/temozolomide. Sixty-seven in-frame fusion transcripts were identified, including three recurrent fusion transcripts: FGFR3-TACC3, RNF213-SLC26A11, and PTPRZ1-MET (ZM). Interestingly, the ZM fusion was found only in grade III astrocytomas (1/13; 7.7%) or secondary GBMs (sGBMs, 3/20; 15.0%). In an independent cohort of sGBMs, the ZM fusion was found in three of 20 (15%) specimens. Genomic analysis revealed that the fusion arose from translocation events involving introns 3 or 8 of PTPRZ and intron 1 of MET. ZM fusion transcripts were found in GBMs irrespective of isocitrate dehydrogenase 1 (IDH1) mutation status. sGBMs harboring ZM fusion showed higher expression of genes required for PIK3CA signaling and lowered expression of genes that suppressed RB1 or TP53 function. Expression of the ZM fusion was mutually exclusive with EGFR overexpression in sGBMs. Exogenous expression of the ZM fusion in the U87MG glioblastoma line enhanced cell migration and invasion. Clinically, patients afflicted with ZM fusion harboring glioblastomas survived poorly relative to those afflicted with non-ZM-harboring sGBMs (P < 0.001). Our study profiles the shifting RNA landscape of gliomas during progression and revealed ZM as a novel, recurrent fusion transcript in sGBMs. © 2014 Bao et al.; Published by Cold Spring Harbor Laboratory Press.

  2. RNA-Seq analysis of D. radiodurans find non coding RNAs expressed in response to radiation stress

    International Nuclear Information System (INIS)

    Gadewal, Nikhil; Mukhopadhyaya, Rita

    2015-01-01

    In bacteria, the discovery of functional RNA molecules that are not translated into protein (noncoding RNAs) became possible with the advent of Next Generation Sequencing technology. Bacterial noncoding RNAs are typically 50-300 nucleotides long and work as internal signals controlling various levels of gene expression. Deep sequencing of total cellular RNA captures all coding and noncoding transcripts with their differential levels of expression in the transcriptome. It provides a powerful approach to study bacterial gene expression and mechanisms of gene regulation. We subjected the 3 h transcriptome of Deinococcus radiodurans R1 cells post exposure to 6 kGy gamma radiation to 100 x 2 cycles of deep sequencing on the Illumina HiSeq 2000 to look for ncRNA transcripts. The bioinformatics pipeline for analysis and interpretation of the RNA-Seq data was built in house using software available in the public domain. Our sequence data aligned with 21 putative ncRNAs expressed in the intergenic regions of the annotated genome of D. radiodurans. Verification of 2 ncRNA candidates and 3 transcription factor genes by real-time PCR confirmed the presence of these transcripts in the 3 h transcriptome sequenced by us. Any relationship between ncRNAs and control of radiation-induced gene expression in D. radiodurans can be proved only after specific gene knockouts in future. (author)

  3. TruSeq Stranded mRNA and Total RNA Sample Preparation Kits

    Science.gov (United States)

    Total RNA-Seq enabled by ribosomal RNA (rRNA) reduction is compatible with formalin-fixed paraffin embedded (FFPE) samples, which contain potentially critical biological information. The family of TruSeq Stranded Total RNA sample preparation kits provides a unique combination of unmatched data quality for both mRNA and whole-transcriptome analyses, robust interrogation of both standard and low-quality samples and workflows compatible with a wide range of study designs.

  4. Comparative RNA-seq analysis in the unsequenced axolotl: the oncogene burst highlights early gene expression in the blastema.

    Directory of Open Access Journals (Sweden)

    Ron Stewart

    Full Text Available The salamander has the remarkable ability to regenerate its limb after amputation. Cells at the site of amputation form a blastema and then proliferate and differentiate to regrow the limb. To better understand this process, we performed deep RNA sequencing of the blastema over a time course in the axolotl, a species whose genome has not been sequenced. Using a novel comparative approach to analyzing RNA-seq data, we characterized the transcriptional dynamics of the regenerating axolotl limb with respect to the human gene set. This approach involved de novo assembly of axolotl transcripts, RNA-seq transcript quantification without a reference genome, and transformation of abundances from axolotl contigs to human genes. We found a prominent burst in oncogene expression during the first day and blastemal/limb bud genes peaking at 7 to 14 days. In addition, we found that limb patterning genes, SALL genes, and genes involved in angiogenesis, wound healing, defense/immunity, and bone development are enriched during blastema formation and development. Finally, we identified a category of genes with no prior literature support for limb regeneration that are candidates for further evaluation based on their expression pattern during the regenerative process.
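
    The step of transforming abundances from assembled contigs to genes of a reference species can be pictured as a simple aggregation. The sketch below assumes each contig has already been assigned to at most one human gene and that contig abundances are summed per gene. Both are simplifying assumptions, and the function name, contig identifiers and values are hypothetical rather than taken from the study.

```python
from collections import defaultdict

def contigs_to_genes(contig_abundance, contig_to_gene):
    """Sum contig-level abundances into gene-level abundances.
    Contigs without a gene assignment are skipped."""
    gene_abundance = defaultdict(float)
    for contig, abundance in contig_abundance.items():
        gene = contig_to_gene.get(contig)
        if gene is None:
            continue  # no ortholog assigned to this contig
        gene_abundance[gene] += abundance
    return dict(gene_abundance)

# Hypothetical contig abundances and contig-to-gene assignments.
contig_abundance = {"contig_1": 12.0, "contig_2": 3.5, "contig_3": 7.0, "contig_4": 1.0}
contig_to_gene = {"contig_1": "SALL4", "contig_2": "SALL4", "contig_3": "MYC"}
print(contigs_to_genes(contig_abundance, contig_to_gene))
# prints {'SALL4': 15.5, 'MYC': 7.0}
```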

  5. RNA-Seq as an Emerging Tool for Marine Dinoflagellate Transcriptome Analysis: Process and Challenges

    Directory of Open Access Journals (Sweden)

    Muhamad Afiq Akbar

    2018-01-01

    Full Text Available Dinoflagellates are a large group of marine phytoplankton studied primarily for their symbiosis with coral reefs and their ability to form harmful algal blooms (HABs). Toxins produced by dinoflagellates during HABs have severe negative impacts on both the economy and the health sector. However, attempts to understand dinoflagellate genomic features are hindered by their complex genome organization. Transcriptomics has been employed to understand dinoflagellate genome structure and to profile genes and gene expression. RNA-seq is one of the latest methods for transcriptomic studies. It is capable of profiling dinoflagellate transcriptomes and has several advantages, including high sensitivity, cost-effectiveness and deeper sequence coverage. This review discusses the current workflow of dinoflagellate RNA-seq, which starts with the extraction of high-quality RNA and is followed by cDNA sequencing on a next-generation sequencing platform, transcriptome assembly and computational analysis. It also highlights considerations such as the difficulty of dinoflagellate sequence annotation, post-transcriptional activity and the effect of RNA pooling when using RNA-seq.

  6. Variation in RNA-Seq transcriptome profiles of peripheral whole blood from healthy individuals with and without globin depletion.

    Directory of Open Access Journals (Sweden)

    Heesun Shin

    Full Text Available BACKGROUND: The molecular profile of circulating blood can reflect physiological and pathological events occurring in other tissues and organs of the body and delivers a comprehensive view of the status of the immune system. Blood has been useful in studying the pathobiology of many diseases. It is accessible and easily collected making it ideally suited to the development of diagnostic biomarker tests. The blood transcriptome has a high complement of globin RNA that could potentially saturate next-generation sequencing platforms, masking lower abundance transcripts. Methods to deplete globin mRNA are available, but their effect has not been comprehensively studied in peripheral whole blood RNA-Seq data. In this study we aimed to assess technical variability associated with globin depletion in addition to assessing general technical variability in RNA-Seq from whole blood derived samples. RESULTS: We compared technical and biological replicates having undergone globin depletion or not and found that the experimental globin depletion protocol employed removed approximately 80% of globin transcripts, improved the correlation of technical replicates, allowed for reliable detection of thousands of additional transcripts and generally increased transcript abundance measures. Differential expression analysis revealed thousands of genes significantly up-regulated as a result of globin depletion. In addition, globin depletion resulted in the down-regulation of genes involved in both iron and zinc metal ion bonding. CONCLUSIONS: Globin depletion appears to meaningfully improve the quality of peripheral whole blood RNA-Seq data, and may improve our ability to detect true biological variation. Some concerns remain, however. Key amongst them is the significant reduction in RNA yields following globin depletion. 
More generally, our investigation of technical and biological variation with and without globin depletion finds that high-throughput sequencing by RNA-Seq

  7. Single-Cell mRNA-Seq Using the Fluidigm C1 System and Integrated Fluidics Circuits.

    Science.gov (United States)

    Gong, Haibiao; Do, Devin; Ramakrishnan, Ramesh

    2018-01-01

    Single-cell mRNA-seq is a valuable tool to dissect expression profiles and to understand the regulatory network of genes. Microfluidics is well suited for single-cell analysis owing both to the small volume of the reaction chambers and ease of automation. Here we describe the workflow of single-cell mRNA-seq using C1 IFC, which can isolate and process up to 96 cells. Both on-chip procedure (lysis, reverse transcription, and preamplification PCR) and off-chip sequencing library preparation protocols are described. The workflow generates full-length mRNA information, which is more valuable than 3' end counting methods for many applications.

  8. The RNASeq-er API-a gateway to systematically updated analysis of public RNA-seq data.

    Science.gov (United States)

    Petryszak, Robert; Fonseca, Nuno A; Füllgrabe, Anja; Huerta, Laura; Keays, Maria; Tang, Y Amy; Brazma, Alvis

    2017-07-15

    The exponential growth of publicly available RNA-sequencing (RNA-Seq) data poses an increasing challenge to researchers wishing to discover, analyse and store such data, particularly those based in institutions with limited computational resources. EMBL-EBI is in an ideal position to address these challenges and to allow the scientific community easy access to not just raw, but also processed RNA-Seq data. We present a Web service to access the results of a systematically and continually updated standardized alignment as well as gene and exon expression quantification of all public bulk (and in the near future also single-cell) RNA-Seq runs in 264 species in European Nucleotide Archive, using Representational State Transfer. The RNASeq-er API (Application Programming Interface) enables ontology-powered search for and retrieval of CRAM, bigwig and bedGraph files, gene and exon expression quantification matrices (Fragments Per Kilobase Of Exon Per Million Fragments Mapped, Transcripts Per Million, raw counts) as well as sample attributes annotated with ontology terms. To date over 270 000 RNA-Seq runs in nearly 10 000 studies (1PB of raw FASTQ data) in 264 species in ENA have been processed and made available via the API. The RNASeq-er API can be accessed at http://www.ebi.ac.uk/fg/rnaseq/api . The commands used to analyse the data are available in supplementary materials and at https://github.com/nunofonseca/irap/wiki/iRAP-single-library . Contact: rnaseq@ebi.ac.uk; rpetry@ebi.ac.uk. Supplementary data are available at Bioinformatics online. © The Author(s) 2017. Published by Oxford University Press.
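
    The quantification units named in this record (FPKM, TPM, raw counts) are related by simple arithmetic, sketched below. This is a generic illustration and not code from the RNASeq-er pipeline. It assumes the total number of mapped fragments equals the sum of the counts supplied and uses annotated gene length rather than effective length. The gene names, counts and lengths are hypothetical.

```python
def fpkm(counts, lengths_bp):
    """Fragments per kilobase of exon per million fragments mapped."""
    total = sum(counts.values())
    # count / (length in kb) / (total in millions) = count * 1e9 / (length_bp * total)
    return {g: c * 1e9 / (lengths_bp[g] * total) for g, c in counts.items()}

def tpm(counts, lengths_bp):
    """Transcripts per million: length-normalised rates rescaled to sum to 1e6."""
    rate = {g: c / lengths_bp[g] for g, c in counts.items()}
    denom = sum(rate.values())
    return {g: r * 1e6 / denom for g, r in rate.items()}

# Hypothetical raw fragment counts and gene lengths in base pairs.
counts = {"g1": 100, "g2": 300, "g3": 600}
lengths_bp = {"g1": 1000, "g2": 3000, "g3": 2000}
print(fpkm(counts, lengths_bp))  # g1 and g2: 100000.0, g3: 300000.0
print(tpm(counts, lengths_bp))   # g1 and g2: about 200000, g3: about 600000
```

    In this toy example g1 and g2 have different raw counts but the same FPKM and TPM because g2 is three times longer. TPM values always sum to one million within a sample, whereas FPKM values do not have a fixed sum.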

  9. A strand-specific RNA-Seq analysis of the transcriptome of the typhoid bacillus Salmonella typhi.

    Directory of Open Access Journals (Sweden)

    Timothy T Perkins

    2009-07-01

    Full Text Available High-density, strand-specific cDNA sequencing (ssRNA-seq) was used to analyze the transcriptome of Salmonella enterica serovar Typhi (S. Typhi). By mapping sequence data to the entire S. Typhi genome, we analyzed the transcriptome in a strand-specific manner and further defined transcribed regions encoded within prophages, pseudogenes, previously un-annotated, and 3'- or 5'-untranslated regions (UTR). An additional 40 novel candidate non-coding RNAs were identified beyond those previously annotated. Proteomic analysis was combined with transcriptome data to confirm and refine the annotation of a number of hypothetical genes. ssRNA-seq was also combined with microarray and proteome analysis to further define the S. Typhi OmpR regulon and identify novel OmpR regulated transcripts. Thus, ssRNA-seq provides a novel and powerful approach to the characterization of the bacterial transcriptome.

  10. RNA-Seq reveals spliceosome and proteasome genes as most consistent transcripts in human cancer cells.

    Directory of Open Access Journals (Sweden)

    Tara Macrae

    Full Text Available Accurate quantification of gene expression by qRT-PCR relies on normalization against a consistently expressed control gene. However, control genes in common use often vary greatly between samples, especially in cancer. The advent of Next Generation Sequencing technology offers the possibility to better select control genes with the least cell to cell variability in steady state transcript levels. Here we analyze the transcriptomes of 55 leukemia samples to identify the most consistent genes. This list is enriched for components of the proteasome (ex. PSMA1) and spliceosome (ex. SF3B2), and also includes the translation initiation factor EIF4H, and many heterogeneous nuclear ribonucleoprotein genes (ex. HNRNPL). We have validated the consistency of our new control genes in 1933 cancer and normal tissues using publicly available RNA-seq data, and their usefulness in qRT-PCR analysis is clearly demonstrated.
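
    Selecting the most consistently expressed genes across samples can be sketched by ranking genes on a variability measure. The abstract does not state which measure was used. The example below uses the coefficient of variation (standard deviation divided by mean) as an assumed stand-in, and the expression values and the `variable_gene` entry are invented for illustration.

```python
from statistics import mean, pstdev

def rank_by_cv(expression):
    """Return gene names ordered from most to least consistent,
    using the coefficient of variation across samples."""
    cv = {}
    for gene, values in expression.items():
        m = mean(values)
        if m > 0:  # skip genes with no expression to avoid dividing by zero
            cv[gene] = pstdev(values) / m
    return sorted(cv, key=cv.get)

# Hypothetical normalised expression values in four samples.
expression = {
    "PSMA1": [100, 105, 98, 102],
    "EIF4H": [50, 52, 49, 51],
    "variable_gene": [500, 900, 300, 1200],
}
print(rank_by_cv(expression))  # prints ['EIF4H', 'PSMA1', 'variable_gene']
```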

  11. Rcorrector: efficient and accurate error correction for Illumina RNA-seq reads.

    Science.gov (United States)

    Song, Li; Florea, Liliana

    2015-01-01

    Next-generation sequencing of cellular RNA (RNA-seq) is rapidly becoming the cornerstone of transcriptomic analysis. However, sequencing errors in the already short RNA-seq reads complicate bioinformatics analyses, in particular alignment and assembly. Error correction methods have been highly effective for whole-genome sequencing (WGS) reads, but are unsuitable for RNA-seq reads, owing to the variation in gene expression levels and alternative splicing. We developed a k-mer based method, Rcorrector, to correct random sequencing errors in Illumina RNA-seq reads. Rcorrector uses a De Bruijn graph to compactly represent all trusted k-mers in the input reads. Unlike WGS read correctors, which use a global threshold to determine trusted k-mers, Rcorrector computes a local threshold at every position in a read. Rcorrector has an accuracy higher than or comparable to existing methods, including the only other method (SEECER) designed for RNA-seq reads, and is more time and memory efficient. With a 5 GB memory footprint for 100 million reads, it can be run on virtually any desktop or server. The software is available free of charge under the GNU General Public License from https://github.com/mourisl/Rcorrector/.
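
    The contrast between a global and a local k-mer threshold can be shown with a toy example. This sketch is not Rcorrector's algorithm: it uses a plain counting table instead of a De Bruijn graph, it only flags suspicious k-mers rather than correcting them, and the rule "below half of the highest k-mer count in the same read" is an assumption chosen for illustration.

```python
from collections import Counter

def count_kmers(reads, k):
    """Count every k-mer occurring in the reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts

def weak_positions(read, counts, k, fraction=0.5):
    """Start positions of k-mers whose count falls below a threshold
    derived from this read's own k-mers (a local threshold)."""
    kmer_counts = [counts[read[i:i + k]] for i in range(len(read) - k + 1)]
    if not kmer_counts:
        return []
    threshold = fraction * max(kmer_counts)
    return [i for i, c in enumerate(kmer_counts) if c < threshold]

# Hypothetical reads: a highly expressed transcript (10 copies), one copy
# with a single-base error, and a lowly expressed transcript (2 copies).
reads = ["ACGTTGCA"] * 10 + ["ACGTTGGA"] + ["TTTTCCCC"] * 2
counts = count_kmers(reads, k=4)
print(weak_positions("ACGTTGGA", counts, k=4))  # prints [3, 4]
print(weak_positions("TTTTCCCC", counts, k=4))  # prints []
```

    A single global cutoff of, say, 3 would also flag every k-mer of the lowly expressed transcript, which illustrates why uneven expression levels make global thresholds a poor fit for RNA-seq reads.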

  12. Resolving candidate genes of mouse skeletal muscle QTL via RNA-Seq and expression network analyses

    Directory of Open Access Journals (Sweden)

    Lionikas Arimantas

    2012-11-01

    Full Text Available Abstract Background We have recently identified a number of Quantitative Trait Loci (QTL) contributing to the 2-fold muscle weight difference between the LG/J and SM/J mouse strains and refined their confidence intervals. To facilitate nomination of the candidate genes responsible for these differences we examined the transcriptome of the tibialis anterior (TA) muscle of each strain by RNA-Seq. Results 13,726 genes were expressed in mouse skeletal muscle. Intersection of a set of 1061 differentially expressed transcripts with a mouse muscle Bayesian Network identified a coherent set of differentially expressed genes that we term the LG/J and SM/J Regulatory Network (LSRN). The integration of the QTL, transcriptome and the network analyses identified eight key drivers of the LSRN (Kdr, Plbd1, Mgp, Fah, Prss23, 2310014F06Rik, Grtp1, Stk10) residing within five QTL regions, which were either polymorphic or differentially expressed between the two strains and are strong candidates for quantitative trait genes (QTGs) underlying muscle mass. The insight gained from network analysis including the ability to make testable predictions is illustrated by annotating the LSRN with knowledge-based signatures and showing that the SM/J state of the network corresponds to a more oxidative state. We validated this prediction by NADH tetrazolium reductase staining in the TA muscle revealing higher oxidative potential of the SM/J compared to the LG/J strain (p Conclusion Thus, integration of fine resolution QTL mapping, RNA-Seq transcriptome information and mouse muscle Bayesian Network analysis provides a novel and unbiased strategy for nomination of muscle QTGs.

  13. ExpEdit: a webserver to explore human RNA editing in RNA-Seq experiments.

    Science.gov (United States)

    Picardi, Ernesto; D'Antonio, Mattia; Carrabino, Danilo; Castrignanò, Tiziana; Pesole, Graziano

    2011-05-01

    ExpEdit is a web application for assessing RNA editing in humans at known or user-specified sites supported by transcript data obtained by RNA-Seq experiments. Mapping data (in SAM/BAM format) or sequence reads directly [in FASTQ/short read archive (SRA) format] can be provided as input to carry out a comparative analysis against a large collection of known editing sites collected in the DARNED database as well as other user-provided potentially edited positions. Results are shown as dynamic tables containing University of California, Santa Cruz (UCSC) links for a quick examination of the genomic context. ExpEdit is freely available on the web at http://www.caspur.it/ExpEdit/.

  14. Rapid Genome-wide Recruitment of RNA Polymerase II Drives Transcription, Splicing, and Translation Events during T Cell Responses

    Directory of Open Access Journals (Sweden)

    Kathrin Davari

    2017-04-01

    Summary: Activation of immune cells results in rapid functional changes, but how such fast changes are accomplished remains enigmatic. By combining time courses of 4sU-seq, RNA-seq, ribosome profiling (RP), and RNA polymerase II (RNA Pol II) ChIP-seq during T cell activation, we illustrate genome-wide temporal dynamics for ∼10,000 genes. This approach reveals not only immediate-early and posttranscriptionally regulated genes but also coupled changes in transcription and translation for >90% of genes. Recruitment, rather than release of paused RNA Pol II, primarily mediates transcriptional changes. This coincides with a genome-wide temporary slowdown in cotranscriptional splicing, even for polyadenylated mRNAs that are localized at the chromatin. Subsequent splicing optimization correlates with increasing Ser-2 phosphorylation of the RNA Pol II carboxy-terminal domain (CTD) and activation of the positive transcription elongation factor (pTEFb). Thus, rapid de novo recruitment of RNA Pol II dictates the course of events during T cell activation, particularly transcription, splicing, and consequently translation. Davari et al. visualize global changes in RNA Pol II binding, transcription, splicing, and translation. T cells change their functional program by rapid de novo recruitment of RNA Pol II and coupled changes in transcription and translation. This coincides with fluctuations in RNA Pol II phosphorylation and a temporary reduction in cotranscriptional splicing. Keywords: RNA Pol II, cotranscriptional splicing, T cell activation, ribosome profiling, 4sU, H3K36, Ser-5 RNA Pol II, Ser-2 RNA Pol II, immune response, immediate-early genes

  15. Examination of Csr regulatory circuitry using epistasis analysis with RNA-seq (Epi-seq) confirms that CsrD affects gene expression via CsrA, CsrB and CsrC.

    Science.gov (United States)

    Potts, Anastasia H; Leng, Yuanyuan; Babitzke, Paul; Romeo, Tony

    2018-03-29

    The Csr global regulatory system coordinates gene expression in response to metabolic status. This system utilizes the RNA binding protein CsrA to regulate gene expression by binding to transcripts of structural and regulatory genes, thus affecting their structure, stability, translation, and/or transcription elongation. CsrA activity is controlled by the sRNAs CsrB and CsrC, which sequester CsrA away from other transcripts. CsrB/C levels are partly determined by their rates of turnover, which requires CsrD to render them susceptible to RNase E cleavage. Previous epistasis analysis suggested that CsrD affects gene expression through the other Csr components, CsrB/C and CsrA. However, those conclusions were based on a limited analysis of reporters. Here, we reassessed the global behavior of the Csr circuitry using epistasis analysis with RNA-seq (Epi-seq). Because CsrD effects on mRNA levels were entirely lost in the csrA mutant and largely eliminated in a csrB/C mutant under our experimental conditions, while the majority of CsrA effects persisted in the absence of csrD, the original model accounts for the global behavior of the Csr system. Our present results also reflect a more nuanced role of CsrA as terminal regulator of the Csr system than has been recognized.

  16. Parallel factor ChIP provides essential internal control for quantitative differential ChIP-seq.

    Science.gov (United States)

    Guertin, Michael J; Cullen, Amy E; Markowetz, Florian; Holding, Andrew N

    2018-04-17

    A key challenge in quantitative ChIP combined with high-throughput sequencing (ChIP-seq) is the normalization of data in the presence of genome-wide changes in occupancy. Analysis-based normalization methods were developed for transcriptomic data and these are dependent on the underlying assumption that total transcription does not change between conditions. For genome-wide changes in transcription factor (TF) binding, these assumptions do not hold true. The challenges in normalization are confounded by experimental variability during sample preparation, processing and recovery. We present a novel normalization strategy utilizing an internal standard of unchanged peaks for reference. Our method can be readily applied to monitor genome-wide changes by ChIP-seq that are otherwise lost or misrepresented through analytical normalization. We compare our approach to normalization by total read depth and two alternative methods that utilize external experimental controls to study TF binding. We successfully resolve the key challenges in quantitative ChIP-seq analysis and demonstrate its application by monitoring the loss of Estrogen Receptor-alpha (ER) binding upon fulvestrant treatment, ER binding in response to estradiol, ER-mediated change in H4K12 acetylation and profiling ER binding in patient-derived xenografts. This is supported by an adaptable pipeline to normalize and quantify differential TF binding genome-wide and generate metrics for differential binding at individual sites.
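The idea of normalizing against an internal standard of unchanged peaks can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the use of a median ratio, the function names, and the toy counts are all assumptions.

```python
from statistics import median


def scaling_factor(ref_counts_sample, ref_counts_baseline):
    """Median ratio baseline/sample over the reference (unchanged) peaks.

    Reference peaks with zero reads in the sample are skipped to avoid
    division by zero (an assumption of this sketch).
    """
    ratios = [b / s for s, b in zip(ref_counts_sample, ref_counts_baseline) if s > 0]
    if not ratios:
        raise ValueError("no usable reference peaks")
    return median(ratios)


def normalize(target_counts, factor):
    """Rescale the counts at the peaks of interest by the reference-derived factor."""
    return [c * factor for c in target_counts]


# Toy counts (hypothetical): the treated sample recovered half as much
# material, so its reference peaks are uniformly halved.
ref_control = [100, 200, 400]
ref_treated = [50, 100, 200]
target_treated = [30, 10]

factor = scaling_factor(ref_treated, ref_control)
print(factor, normalize(target_treated, factor))
```

Because the factor is estimated only from peaks assumed not to change, a genuine genome-wide loss of binding at the target peaks is preserved after rescaling, whereas scaling by total read depth would partly absorb it.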

  17. Evaluation of normalization methods in mammalian microRNA-Seq data

    Science.gov (United States)

    Garmire, Lana Xia; Subramaniam, Shankar

    2012-01-01

    Simple total tag count normalization is inadequate for microRNA sequencing data generated from next-generation sequencing technology. However, so far systematic evaluation of normalization methods on microRNA sequencing data is lacking. We comprehensively evaluate seven commonly used normalization methods including global normalization, Lowess normalization, Trimmed Mean Method (TMM), quantile normalization, scaling normalization, variance stabilization, and invariant method. We assess these methods on two individual experimental data sets with the empirical statistical metrics of mean square error (MSE) and Kolmogorov-Smirnov (K-S) statistic. Additionally, we evaluate the methods with results from quantitative PCR validation. Our results consistently show that Lowess normalization and quantile normalization perform the best, whereas TMM, a method applied to RNA-Sequencing normalization, performs the worst. The poor performance of TMM normalization is further evidenced by abnormal results from the test of differential expression (DE) of microRNA-Seq data. Compared with the models used for DE, the choice of normalization method is the primary factor that affects the results of DE. In summary, Lowess normalization and quantile normalization are recommended for normalizing microRNA-Seq data, whereas the TMM method should be used with caution. PMID:22532701
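Quantile normalization, one of the two methods recommended above, forces every sample to share the same empirical distribution. The sketch below shows the textbook procedure in plain Python; it is not the code used in the study, ties are broken by input order rather than averaged (a simplification), and the function name and toy values are assumptions.

```python
def quantile_normalize(samples):
    """Quantile-normalize a list of samples (each a list of per-feature values).

    1. Sort each sample.
    2. Average the sorted values across samples at each rank.
    3. Replace each original value by the average for its rank.
    """
    n = len(samples[0])
    if any(len(s) != n for s in samples):
        raise ValueError("all samples must have the same number of features")
    sorted_cols = [sorted(s) for s in samples]
    rank_means = [sum(col[r] for col in sorted_cols) / len(samples) for r in range(n)]
    normalized = []
    for s in samples:
        order = sorted(range(n), key=lambda i: s[i])  # feature indices by rank
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = rank_means[rank]
        normalized.append(out)
    return normalized


# Two toy samples with three features each (hypothetical counts).
print(quantile_normalize([[5, 2, 3], [4, 1, 6]]))
```

After normalization both samples contain exactly the same set of values, differing only in which feature holds which value.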

  18. Broad distribution spectrum from Gaussian to power law appears in stochastic variations in RNA-seq data.

    Science.gov (United States)

    Awazu, Akinori; Tanabe, Takahiro; Kamitani, Mari; Tezuka, Ayumi; Nagano, Atsushi J

    2018-05-29

    Gene expression levels exhibit stochastic variations among genetically identical organisms under the same environmental conditions. In many recent transcriptome analyses based on RNA sequencing (RNA-seq), variations in gene expression levels among replicates were assumed to follow a negative binomial distribution, although the physiological basis of this assumption remains unclear. In this study, RNA-seq data were obtained from Arabidopsis thaliana under eight conditions (21-27 replicates), and the characteristics of gene-dependent empirical probability density function (ePDF) profiles of gene expression levels were analyzed. For A. thaliana and Saccharomyces cerevisiae, various types of ePDF of gene expression levels were obtained that were classified as Gaussian, power law-like containing a long tail, or intermediate. These ePDF profiles were well fitted with a Gauss-power mixing distribution function derived from a simple model of a stochastic transcriptional network containing a feedback loop. The fitting function suggested that gene expression levels with long-tailed ePDFs would be strongly influenced by feedback regulation. Furthermore, the features of gene expression levels are correlated with their functions, with the levels of essential genes tending to follow a Gaussian-like ePDF while those of genes encoding nucleic acid-binding proteins and transcription factors exhibit long-tailed ePDF.

  19. Properties of the reverse transcription reaction in mRNA quantification

    DEFF Research Database (Denmark)

    Ståhlberg, Anders; Håkansson, Joakim; Xian, Xiaojie

    2004-01-01

    BACKGROUND: In most measurements of gene expression, mRNA is first reverse-transcribed into cDNA. We studied the reverse transcription reaction and its consequences for quantitative measurements of gene expression. METHODS: We used SYBR green I-based quantitative real-time PCR (QPCR) to measure...... the properties of reverse transcription reaction for the beta-tubulin, glyceraldehyde-3-phosphate dehydrogenase, Glut2, CaV1D, and insulin II genes, using random hexamers, oligo(dT), and gene-specific reverse transcription primers. RESULTS: Experimental variation in reverse transcription-QPCR (RT......-QPCR) was mainly attributable to the reverse transcription step. Reverse transcription efficiency depended on priming strategy, and the dependence was different for the five genes studied. Reverse transcription yields also depended on total RNA concentration. CONCLUSIONS: RT-QPCR gene expression measurements...

  20. RNA-Seq-based analysis of the physiologic cold shock-induced changes in Moraxella catarrhalis gene expression.

    Directory of Open Access Journals (Sweden)

    Violeta Spaniol

    BACKGROUND: Moraxella catarrhalis, a major nasopharyngeal pathogen of the human respiratory tract, is exposed to rapid downshifts of environmental temperature when humans breathe cold air. The prevalence of pharyngeal colonization and respiratory tract infections caused by M. catarrhalis is greatest in winter. We investigated how M. catarrhalis uses the physiologic exposure to cold air to regulate pivotal survival systems that may contribute to M. catarrhalis virulence. RESULTS: In this study we used RNA-seq to quantitatively catalogue the transcriptome of M. catarrhalis exposed to a 26 °C cold shock or to continuous growth at 37 °C. Validation of RNA-seq data using quantitative RT-PCR analysis demonstrated the RNA-seq results to be highly reliable. We observed that a 26 °C cold shock induces the expression of genes that in other bacteria have been related to virulence. A strong induction was observed for genes involved in high affinity phosphate transport and iron acquisition, indicating that M. catarrhalis makes better use of both phosphate and iron resources after exposure to cold shock. We detected the induction of genes involved in nitrogen metabolism, as well as several outer membrane proteins, including ompA, m35-like porin and multidrug efflux pump (acrAB), indicating that M. catarrhalis remodels its membrane components in response to a downshift of temperature. Furthermore, we demonstrate that a 26 °C cold shock enhances the induction of genes encoding the type IV pili that are essential for natural transformation, and increases the genetic competence of M. catarrhalis, which may facilitate the rapid spread and acquisition of novel virulence-associated genes. CONCLUSION: Cold shock at a physiologically relevant temperature of 26 °C induces in M. catarrhalis a complex of adaptive mechanisms that could convey novel pathogenic functions and may contribute to enhanced colonization and virulence.

  1. A comprehensive simulation study on classification of RNA-Seq data.

    Directory of Open Access Journals (Sweden)

    Gökmen Zararsız

    RNA sequencing (RNA-Seq) is a powerful technique for the gene-expression profiling of organisms that uses the capabilities of next-generation sequencing technologies. Developing gene-expression-based classification algorithms is an emerging powerful method for diagnosis, disease classification and monitoring at molecular level, as well as providing potential markers of diseases. Most of the statistical methods proposed for the classification of gene-expression data are either based on a continuous scale (e.g., microarray data) or require a normal distribution assumption. Hence, these methods cannot be directly applied to RNA-Seq data since they violate both data structure and distributional assumptions. However, it is possible to apply these algorithms with appropriate modifications to RNA-Seq data. One way is to develop count-based classifiers, such as Poisson linear discriminant analysis (PLDA) and negative binomial linear discriminant analysis (NBLDA). Another way is to bring the data closer to microarrays and apply microarray-based classifiers. In this study, we compared several classifiers including PLDA with and without power transformation, NBLDA, single SVM, bagging SVM (bagSVM), classification and regression trees (CART), and random forests (RF). We also examined the effect of several parameters such as overdispersion, sample size, number of genes, number of classes, differential-expression rate, and the transformation method on model performances. A comprehensive simulation study is conducted and the results are compared with the results of two miRNA and two mRNA experimental datasets. The results revealed that increasing the sample size and differential-expression rate and decreasing the dispersion parameter and number of groups lead to an increase in classification accuracy. Similar to differential-expression studies, the classification of RNA-Seq data requires careful attention when handling data overdispersion. We conclude that, as a count-based
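Count-based classifiers of the kind mentioned above model each gene's read count with a discrete distribution per class. As a rough illustration only, the sketch below fits a Poisson classifier that treats genes as independent and assigns a sample to the class with the highest log-likelihood plus log prior. It leaves out the size-factor normalization and shrinkage used by PLDA and is not the implementation evaluated in the study; the function names, the pseudocount, and the toy counts are assumptions.

```python
from math import log


def fit_poisson_classifier(samples, labels, pseudocount=0.5):
    """Estimate per-class, per-gene Poisson rates and class priors.

    The pseudocount keeps every rate strictly positive so that log(rate)
    is defined even for genes with zero reads in a class.
    """
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)
    model = {}
    for y, rows in by_class.items():
        n_genes = len(rows[0])
        rates = [(sum(r[j] for r in rows) + pseudocount) / len(rows)
                 for j in range(n_genes)]
        model[y] = (rates, len(rows) / len(samples))
    return model


def predict(model, x):
    """Return the class maximizing the Poisson log-likelihood plus log prior.

    The log(x!) term is dropped because it is the same for every class.
    """
    def score(rates, prior):
        return sum(c * log(r) - r for c, r in zip(x, rates)) + log(prior)
    return max(model, key=lambda y: score(*model[y]))


# Toy training data: two genes, two samples per class (hypothetical counts).
train = [[10, 1], [12, 0], [1, 9], [0, 11]]
labels = ["A", "A", "B", "B"]
model = fit_poisson_classifier(train, labels)
print(predict(model, [11, 1]), predict(model, [0, 10]))
```

A plain Poisson model assumes the variance equals the mean, so it does not account for the overdispersion that the abstract highlights; a negative binomial model adds a dispersion parameter for that purpose.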

  2. BrAD-seq: Breath Adapter Directional sequencing: a streamlined, ultra-simple and fast library preparation protocol for strand specific mRNA library construction.

    Directory of Open Access Journals (Sweden)

    Brad Thomas Townsley

    2015-05-01

    Next Generation Sequencing (NGS) is driving rapid advancement in biological understanding and RNA-sequencing (RNA-seq) has become an indispensable tool for biology and medicine. There is a growing need for access to these technologies although preparation of NGS libraries remains a bottleneck to wider adoption. Here we report a novel method for the production of strand specific RNA-seq libraries utilizing inherent properties of double-stranded cDNA to capture and incorporate a sequencing adapter. Breath Adapter Directional sequencing (BrAD-seq) reduces sample handling and requires far fewer enzymatic steps than most available methods to produce high quality strand-specific RNA-seq libraries. The method we present is optimized for 3-prime Digital Gene Expression (DGE) libraries and can easily extend to full transcript coverage shotgun (SHO) type strand-specific libraries and is modularized to accommodate a diversity of RNA and DNA input materials. BrAD-seq offers a highly streamlined and inexpensive option for RNA-seq libraries.

  3. Deep RNA-Seq profile reveals biodiversity, plant-microbe interactions and a large family of NBS-LRR resistance genes in walnut (Juglans regia) tissues.

    Science.gov (United States)

    Chakraborty, Sandeep; Britton, Monica; Martínez-García, P J; Dandekar, Abhaya M

    2016-03-01

    Deep RNA-Seq profiling, a revolutionary method used for quantifying transcriptional levels, often includes non-specific transcripts from other co-existing organisms in spite of stringent protocols. Using the recently published walnut genome sequence as a filter, we present a broad analysis of the RNA-Seq derived transcriptome profiles obtained from twenty different tissues to extract the biodiversity and possible plant-microbe interactions in the walnut ecosystem in California. Since the residual nature of the transcripts being analyzed does not provide sufficient information to identify the exact strain, inferences made are constrained to the genus level. The presence of the pathogenic oomycete Phytophthora was detected in the root through the presence of a glyceraldehyde-3-phosphate dehydrogenase. Cryptococcus, the causal agent of cryptococcosis, was found in the catkins and vegetative buds, corroborating previous work indicating that the plant surface supported the sexual cycle of this human pathogen. The RNA-Seq profile revealed several species of the endophytic nitrogen fixing Actinobacteria. Another bacterial species implicated in aerobic biodegradation of methyl tert-butyl ether (Methylibium petroleiphilum) was also found in the root. RNA encoding proteins from the pea aphid was found in the leaves and vegetative buds, while a serine protease from mosquito with significant homology to a female reproductive tract protease from Drosophila mojavensis in the vegetative bud suggests egg-laying activities. The comprehensive analysis of the RNA-seq data presented here also unraveled detailed, tissue-specific information on ~400 transcripts encoded by the largest family of resistance (R) genes (NBS-LRR), which possibly rationalizes the resistance of the specific walnut plant to the pathogens detected. Thus, we elucidate the biodiversity and possible plant-microbe interactions in several walnut (Juglans regia) tissues in California using deep RNA-Seq profiling.

  4. Integration of ATAC-seq and RNA-seq identifies human alpha cell and beta cell signature genes.

    Science.gov (United States)

    Ackermann, Amanda M; Wang, Zhiping; Schug, Jonathan; Naji, Ali; Kaestner, Klaus H

    2016-03-01

    Although glucagon-secreting α-cells and insulin-secreting β-cells have opposing functions in regulating plasma glucose levels, the two cell types share a common developmental origin and exhibit overlapping transcriptomes and epigenomes. Notably, destruction of β-cells can stimulate repopulation via transdifferentiation of α-cells, at least in mice, suggesting plasticity between these cell fates. Furthermore, dysfunction of both α- and β-cells contributes to the pathophysiology of type 1 and type 2 diabetes, and β-cell de-differentiation has been proposed to contribute to type 2 diabetes. Our objective was to delineate the molecular properties that maintain islet cell type specification yet allow for cellular plasticity. We hypothesized that correlating cell type-specific transcriptomes with an atlas of open chromatin will identify novel genes and transcriptional regulatory elements such as enhancers involved in α- and β-cell specification and plasticity. We sorted human α- and β-cells and performed the "Assay for Transposase-Accessible Chromatin with high throughput sequencing" (ATAC-seq) and mRNA-seq, followed by integrative analysis to identify cell type-selective gene regulatory regions. We identified numerous transcripts with either α-cell- or β-cell-selective expression and discovered the cell type-selective open chromatin regions that correlate with these gene activation patterns. We confirmed cell type-selective expression on the protein level for two of the top hits from our screen. The "group specific protein" (GC; or vitamin D binding protein) was restricted to α-cells, while CHODL (chondrolectin) immunoreactivity was only present in β-cells. Furthermore, α-cell- and β-cell-selective ATAC-seq peaks were identified to overlap with known binding sites for islet transcription factors, as well as with single nucleotide polymorphisms (SNPs) previously identified as risk loci for type 2 diabetes. We have determined the genetic landscape of

  5. Determination of sRNA expressions by RNA-seq in Yersinia pestis grown in vitro and during infection.

    Directory of Open Access Journals (Sweden)

    Yanfeng Yan

    BACKGROUND: Small non-coding RNAs (sRNAs) facilitate host-microbe interactions. They have a central function in post-transcriptional regulation during pathogenic lifestyles. Hfq, an RNA-binding protein that many sRNAs act in conjunction with, is required for Y. pestis pathogenesis. However, how Yersinia pestis modulates the expression of sRNAs during infection is largely unknown. METHODOLOGY AND PRINCIPAL FINDINGS: We used RNA-seq technology to identify the sRNA candidates expressed from Y. pestis grown in vitro and in the infected lungs of mice. A total of 104 sRNAs were found, including 26 previously annotated sRNAs identified by searching against the Rfam database and 78 novel sRNA candidates. Approximately 89% (93/104) of these sRNAs from Y. pestis are shared with its ancestor Y. pseudotuberculosis. Ninety-seven percent of these sRNAs (101/104) are shared among more than 80 sequenced genomes of 135 Y. pestis strains. These 78 novel sRNAs include 62 intergenic and 16 antisense sRNAs. Fourteen sRNAs were selected for verification by independent Northern blot analysis. Results showed that nine selected sRNA transcripts were Hfq-dependent. Interestingly, three novel sRNAs were identified as new members of the transcription factor CRP regulon. Semi-quantitative analysis revealed that Y. pestis from the infected lungs induced the expression of six sRNAs, including RyhB1, RyhB2, CyaR/RyeE, 6S RNA, RybB and sR039, and repressed the expression of four sRNAs, including CsrB, CsrC, 4.5S RNA and sR027. CONCLUSIONS AND SIGNIFICANCE: This study is the first attempt to subject RNA from Y. pestis-infected samples to direct high-throughput sequencing. Many novel sRNAs were identified and the expression patterns of relevant sRNAs in Y. pestis during in vitro growth and in vivo infection were revealed. The annotated sRNAs accounted for the most abundant sRNAs either expressed in bacteria grown in vitro or differentially expressed in the infected lungs

  6. Co-LncRNA: investigating the lncRNA combinatorial effects in GO annotations and KEGG pathways based on human RNA-Seq data.

    Science.gov (United States)

    Zhao, Zheng; Bai, Jing; Wu, Aiwei; Wang, Yuan; Zhang, Jinwen; Wang, Zishan; Li, Yongsheng; Xu, Juan; Li, Xia

    2015-01-01

    Long non-coding RNAs (lncRNAs) are emerging as key regulators of diverse biological processes and diseases. However, the combinatorial effects of these molecules in a specific biological function are poorly understood. Identifying co-expressed protein-coding genes of lncRNAs would provide ample insight into lncRNA functions. To facilitate such an effort, we have developed Co-LncRNA, which is a web-based computational tool that allows users to identify GO annotations and KEGG pathways that may be affected by co-expressed protein-coding genes of a single or multiple lncRNAs. LncRNA co-expressed protein-coding genes were first identified in publicly available human RNA-Seq datasets, including 241 datasets across 6560 total individuals representing 28 tissue types/cell lines. Then, the lncRNA combinatorial effects in given GO annotations or KEGG pathways are taken into account by the simultaneous analysis of multiple lncRNAs in user-selected individual or multiple datasets, which is realized by enrichment analysis. In addition, this software provides a graphical overview of pathways that are modulated by lncRNAs, as well as a specific tool to display the relevant networks between lncRNAs and their co-expressed protein-coding genes. Co-LncRNA also supports users in uploading their own lncRNA and protein-coding gene expression profiles to investigate the lncRNA combinatorial effects. It will be continuously updated with more human RNA-Seq datasets on an annual basis. Taken together, Co-LncRNA provides a web-based application for investigating lncRNA combinatorial effects, which could shed light on their biological roles and could be a valuable resource for this community. Database URL: http://www.bio-bigdata.com/Co-LncRNA/. © The Author(s) 2015. Published by Oxford University Press.
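The abstract does not say which statistic Co-LncRNA uses for its enrichment analysis. A common choice for GO and KEGG over-representation is the one-sided hypergeometric test, sketched below under that assumption; the function name and the example numbers are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(population, annotated, drawn, overlap):
    """P(X >= overlap) for a hypergeometric X.

    population: number of genes in the background
    annotated:  background genes belonging to the GO term or pathway
    drawn:      size of the co-expressed gene set being tested
    overlap:    co-expressed genes that belong to the term or pathway
    """
    upper = min(annotated, drawn)
    total = comb(population, drawn)
    tail = sum(comb(annotated, i) * comb(population - annotated, drawn - i)
               for i in range(overlap, upper + 1))
    return tail / total


# Toy example: 10 background genes, 4 in the pathway, 3 co-expressed genes,
# 2 of which fall in the pathway.
print(hypergeom_enrichment_p(10, 4, 3, 2))
```

In practice such p-values are computed for many terms at once and then adjusted for multiple testing, a step this sketch leaves out.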

  7. Discovery of Protein–lncRNA Interactions by Integrating Large-Scale CLIP-Seq and RNA-Seq Datasets

    Energy Technology Data Exchange (ETDEWEB)

    Li, Jun-Hao; Liu, Shun; Zheng, Ling-Ling; Wu, Jie; Sun, Wen-Ju; Wang, Ze-Lin; Zhou, Hui; Qu, Liang-Hu, E-mail: lssqlh@mail.sysu.edu.cn; Yang, Jian-Hua, E-mail: lssqlh@mail.sysu.edu.cn [RNA Information Center, Key Laboratory of Gene Engineering of the Ministry of Education, State Key Laboratory for Biocontrol, Sun Yat-sen University, Guangzhou (China)

    2015-01-14

    Long non-coding RNAs (lncRNAs) are emerging as important regulatory molecules in developmental, physiological, and pathological processes. However, the precise mechanism and functions of most lncRNAs remain largely unknown. Recent advances in high-throughput sequencing of immunoprecipitated RNAs after cross-linking (CLIP-Seq) provide powerful ways to identify biologically relevant protein–lncRNA interactions. In this study, by analyzing millions of RNA-binding protein (RBP) binding sites from 117 CLIP-Seq datasets generated by 50 independent studies, we identified 22,735 RBP–lncRNA regulatory relationships. We found that one single lncRNA will generally be bound and regulated by one or multiple RBPs, the combination of which may coordinately regulate gene expression. We also revealed the expression correlation of these interaction networks by mining expression profiles of over 6000 normal and tumor samples from 14 cancer types. Our combined analysis of CLIP-Seq data and genome-wide association studies data discovered hundreds of disease-related single nucleotide polymorphisms residing in the RBP binding sites of lncRNAs. Finally, we developed interactive web implementations to provide visualization, analysis, and downloading of the aforementioned large-scale datasets. Our study represents an important step in identification and analysis of RBP–lncRNA interactions and shows that these interactions may play crucial roles in cancer and genetic diseases.

  8. Discovery of Protein–lncRNA Interactions by Integrating Large-Scale CLIP-Seq and RNA-Seq Datasets

    International Nuclear Information System (INIS)

    Li, Jun-Hao; Liu, Shun; Zheng, Ling-Ling; Wu, Jie; Sun, Wen-Ju; Wang, Ze-Lin; Zhou, Hui; Qu, Liang-Hu; Yang, Jian-Hua

    2015-01-01

    Long non-coding RNAs (lncRNAs) are emerging as important regulatory molecules in developmental, physiological, and pathological processes. However, the precise mechanism and functions of most lncRNAs remain largely unknown. Recent advances in high-throughput sequencing of immunoprecipitated RNAs after cross-linking (CLIP-Seq) provide powerful ways to identify biologically relevant protein–lncRNA interactions. In this study, by analyzing millions of RNA-binding protein (RBP) binding sites from 117 CLIP-Seq datasets generated by 50 independent studies, we identified 22,735 RBP–lncRNA regulatory relationships. We found that one single lncRNA will generally be bound and regulated by one or multiple RBPs, the combination of which may coordinately regulate gene expression. We also revealed the expression correlation of these interaction networks by mining expression profiles of over 6000 normal and tumor samples from 14 cancer types. Our combined analysis of CLIP-Seq data and genome-wide association studies data discovered hundreds of disease-related single nucleotide polymorphisms residing in the RBP binding sites of lncRNAs. Finally, we developed interactive web implementations to provide visualization, analysis, and downloading of the aforementioned large-scale datasets. Our study represents an important step in identification and analysis of RBP–lncRNA interactions and shows that these interactions may play crucial roles in cancer and genetic diseases.

  9. Identification of cytokinin-responsive genes using microarray meta-analysis and RNA-Seq in Arabidopsis.

    Science.gov (United States)

    Bhargava, Apurva; Clabaugh, Ivory; To, Jenn P; Maxwell, Bridey B; Chiang, Yi-Hsuan; Schaller, G Eric; Loraine, Ann; Kieber, Joseph J

    2013-05-01

    Cytokinins are N(6)-substituted adenine derivatives that play diverse roles in plant growth and development. We sought to define a robust set of genes regulated by cytokinin as well as to query the response of genes not represented on microarrays. To this end, we performed a meta-analysis of microarray data from a variety of cytokinin-treated samples and used RNA-seq to examine cytokinin-regulated gene expression in Arabidopsis (Arabidopsis thaliana). Microarray meta-analysis using 13 microarray experiments combined with empirically defined filtering criteria identified a set of 226 genes differentially regulated by cytokinin, a subset of which has previously been validated by other methods. RNA-seq validated about 73% of the up-regulated genes identified by this meta-analysis. In silico promoter analysis indicated an overrepresentation of type-B Arabidopsis response regulator binding elements, consistent with the role of type-B Arabidopsis response regulators as primary mediators of cytokinin-responsive gene expression. RNA-seq analysis identified 73 cytokinin-regulated genes that were not represented on the ATH1 microarray. Representative genes were verified using quantitative reverse transcription-polymerase chain reaction and NanoString analysis. Analysis of the genes identified reveals a substantial effect of cytokinin on genes encoding proteins involved in secondary metabolism, particularly those acting in flavonoid and phenylpropanoid biosynthesis, as well as in the regulation of redox state of the cell, particularly a set of glutaredoxin genes. Novel splicing events were found in members of some gene families that are known to play a role in cytokinin signaling or metabolism. The genes identified in this analysis represent a robust set of cytokinin-responsive genes that are useful in the analysis of cytokinin function in plants.

  10. Genome-wide dynamic transcriptional profiling in Clostridium beijerinckii NCIMB 8052 using single-nucleotide resolution RNA-Seq

    Directory of Open Access Journals (Sweden)

    Wang Yi

    2012-03-01

    Background: Clostridium beijerinckii is a prominent solvent-producing microbe that has great potential for biofuel and chemical industries. Although transcriptional analysis is essential to understand gene functions and regulation and thus elucidate proper strategies for further strain improvement, limited information is available on the genome-wide transcriptional analysis for C. beijerinckii. Results: The genome-wide transcriptional dynamics of C. beijerinckii NCIMB 8052 over a batch fermentation process was investigated using high-throughput RNA-Seq technology. The gene expression profiles indicated that the glycolysis genes were highly expressed throughout the fermentation, with comparatively more active expression during the acidogenesis phase. The expression of acid formation genes was down-regulated at the onset of solvent formation, in accordance with the metabolic pathway shift from acidogenesis to solventogenesis. The acetone formation gene (adc), as a part of the sol operon, exhibited highly-coordinated expression with the other sol genes. Out of the > 20 genes encoding alcohol dehydrogenase in C. beijerinckii, Cbei_1722 and Cbei_2181 were highly up-regulated at the onset of solventogenesis, corresponding to their key roles in primary alcohol production. Most sporulation genes in C. beijerinckii 8052 demonstrated similar temporal expression patterns to those observed in B. subtilis and C. acetobutylicum, while sporulation sigma factor genes sigE and sigG exhibited accelerated and stronger expression in C. beijerinckii 8052, which is consistent with the more rapid forespore and endospore development in this strain. Global expression patterns for specific gene functional classes were examined using self-organizing map analysis. The genes associated with specific functional classes demonstrated global expression profiles corresponding to the cell physiological variation and metabolic pathway switch. Conclusions: The results from this

  11. A comparative study of techniques for differential expression analysis on RNA-Seq data.

    Directory of Open Access Journals (Sweden)

    Zong Hong Zhang

    Recent advances in next-generation sequencing technology allow high-throughput cDNA sequencing (RNA-Seq) to be widely applied in transcriptomic studies, in particular for detecting differentially expressed genes between groups. Many software packages have been developed for the identification of differentially expressed genes (DEGs) between treatment groups based on RNA-Seq data. However, there is a lack of consensus on how to approach an optimal study design and choice of suitable software for the analysis. In this comparative study we evaluate the performance of three of the most frequently used software tools: Cufflinks-Cuffdiff2, DESeq and edgeR. A number of important parameters of RNA-Seq technology were taken into consideration, including the number of replicates, sequencing depth, and balanced vs. unbalanced sequencing depth within and between groups. We benchmarked results relative to sets of DEGs identified through either quantitative RT-PCR or microarray. We observed that edgeR performs slightly better than DESeq and Cuffdiff2 in terms of the ability to uncover true positives. Overall, DESeq or taking the intersection of DEGs from two or more tools is recommended if the number of false positives is a major concern in the study. In other circumstances, edgeR is slightly preferable for differential expression analysis at the expense of potentially introducing more false positives.
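
    The benchmarking idea in this abstract (counting true and false positives against a validated gene set, and intersecting the calls of several tools) can be sketched as follows. The gene identifiers, tool names and the `benchmark` helper are hypothetical and only illustrate the bookkeeping; they are not taken from the study.

```python
def benchmark(calls, validated_de, validated_not_de):
    """Count true and false positives of a DEG call set against validated genes."""
    true_pos = len(calls & validated_de)
    false_pos = len(calls & validated_not_de)
    return true_pos, false_pos


# Hypothetical validation sets (e.g. genes checked by qRT-PCR).
validated_de = {"g1", "g2", "g3", "g4"}
validated_not_de = {"g5", "g6", "g7"}

# Hypothetical DEG calls from two tools.
tool_calls = {
    "tool_a": {"g1", "g2", "g3", "g5", "g6"},
    "tool_b": {"g1", "g2", "g6", "g7"},
}

results = {name: benchmark(calls, validated_de, validated_not_de)
           for name, calls in tool_calls.items()}

# Intersecting the calls trades sensitivity for fewer false positives.
consensus = tool_calls["tool_a"] & tool_calls["tool_b"]
results["intersection"] = benchmark(consensus, validated_de, validated_not_de)
```

    In this toy setting the intersection keeps fewer false positives than either tool alone, at the cost of one true positive, which mirrors the trade-off the abstract describes.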

  12. Simultaneous characterization of cellular RNA structure and function with in-cell SHAPE-Seq.

    Science.gov (United States)

    Watters, Kyle E; Abbott, Timothy R; Lucks, Julius B

    2016-01-29

    Many non-coding RNAs form structures that interact with cellular machinery to control gene expression. A central goal of molecular and synthetic biology is to uncover design principles linking RNA structure to function to understand and engineer this relationship. Here we report a simple, high-throughput method called in-cell SHAPE-Seq that combines in-cell probing of RNA structure with a measurement of gene expression to simultaneously characterize RNA structure and function in bacterial cells. We use in-cell SHAPE-Seq to study the structure-function relationship of two RNA mechanisms that regulate translation in Escherichia coli. We find that nucleotides that participate in RNA-RNA interactions are highly accessible when their binding partner is absent and that changes in RNA structure due to RNA-RNA interactions can be quantitatively correlated to changes in gene expression. We also characterize the cellular structures of three endogenously expressed non-coding RNAs: 5S rRNA, RNase P and the btuB riboswitch. Finally, a comparison between in-cell and in vitro folded RNA structures revealed remarkable similarities for synthetic RNAs, but significant differences for RNAs that participate in complex cellular interactions. Thus, in-cell SHAPE-Seq represents an easily approachable tool for biologists and engineers to uncover relationships between sequence, structure and function of RNAs in the cell. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.

  13. Radiation-induced alternative transcripts as detected in total and polysome-bound mRNA.

    Science.gov (United States)

    Wahba, Amy; Ryan, Michael C; Shankavaram, Uma T; Camphausen, Kevin; Tofilon, Philip J

    2018-01-02

    Alternative splicing is a critical event in the posttranscriptional regulation of gene expression. To investigate whether this process influences radiation-induced gene expression we defined the effects of ionizing radiation on the generation of alternative transcripts in total cellular mRNA (the transcriptome) and polysome-bound mRNA (the translatome) of the human glioblastoma stem-like cell line NSC11. For these studies, RNA-Seq profiles from control and irradiated cells were compared using the program SpliceSeq to identify transcripts and splice variations induced by radiation. As compared to the transcriptome (total RNA) of untreated cells, the radiation-induced transcriptome contained 92 splice events suggesting that radiation induced alternative splicing. As compared to the translatome (polysome-bound RNA) of untreated cells, the radiation-induced translatome contained 280 splice events of which only 24 were overlapping with the radiation-induced transcriptome. These results suggest that radiation not only modifies alternative splicing of precursor mRNA, but also results in the selective association of existing mRNA isoforms with polysomes. Comparison of radiation-induced alternative transcripts to radiation-induced gene expression in total RNA revealed little overlap (about 3%). In contrast, in the radiation-induced translatome, about 38% of the induced alternative transcripts corresponded to genes whose expression level was affected in the translatome. This study suggests that whereas radiation induces alternate splicing, the alternative transcripts present at the time of irradiation may play a role in the radiation-induced translational control of gene expression and thus cellular radioresponse.

  14. MicroRNA transfection and AGO-bound CLIP-seq data sets reveal distinct determinants of miRNA action

    DEFF Research Database (Denmark)

    Wen, Jiayu; Parker, Brian J; Jacobsen, Anders

    2011-01-01

    Microarray expression analyses following miRNA transfection/inhibition and, more recently, Argonaute cross-linked immunoprecipitation (CLIP)-seq assays have been used to detect miRNA target sites. CLIP and expression approaches measure differing stages of miRNA functioning-initial binding of the mi... the predictive effect of target flanking features. We observe distinct target determinants between expression-based and CLIP-based data. Target flanking features such as flanking region conservation are an important AGO-binding determinant; we hypothesize that CLIP experiments have a preference for strongly bound miRNP-target interactions involving adjacent RNA-binding proteins that increase the strength of cross-linking. In contrast, seed-related features are major determinants in expression-based studies, but less so for CLIP-seq studies, and increased miRNA concentrations typical of transfection studies...

  15. HAfTs are novel lncRNA transcripts from aflatoxin exposure.

    Directory of Open Access Journals (Sweden)

    B Alex Merrick

    The transcriptome can reveal insights into precancer biology. We recently conducted RNA-Seq analysis on liver RNA from male rats exposed to the carcinogen, aflatoxin B1 (AFB1), for 90 days prior to liver tumor onset. Among >1,000 differentially expressed transcripts, several novel, unannotated Cufflinks-assembled transcripts, or HAfTs (Hepatic Aflatoxin Transcripts), were found. We hypothesized PCR-cloning and RACE (rapid amplification of cDNA ends) could further HAfT identification. Sanger data was obtained for 6 transcripts by PCR and 16 transcripts by 5'- and 3'-RACE. BLAST alignments showed, with two exceptions, HAfT transcripts were lncRNAs, >200nt without apparent long open reading frames. Six rat HAfT transcripts were classified as 'novel' without RefSeq annotation. Sequence alignment and genomic synteny showed each rat lncRNA had a homologous locus in the mouse genome and over half had homologous loci in the human genome, including at least two loci (and possibly three others) that were previously unannotated. While HAfT functions are not yet clear, coregulatory roles may be possible from their adjacent orientation to known coding genes with altered expression that include 8 HAfT-gene pairs. For example, a unique rat HAfT, homologous to Pvt1, was adjacent to known genes controlling cell proliferation. Additionally, PCR and RACE Sanger sequencing showed many alternative splice variants and refinements of exon sequences compared to Cufflinks assembled transcripts and gene prediction algorithms. Presence of multiple splice variants and short tandem repeats found in some HAfTs may be consequential for secondary structure, transcriptional regulation, and function. In summary, we report novel, differentially expressed lncRNAs after exposure to the genotoxicant, AFB1, prior to neoplastic lesions. Complete cloning and sequencing of such transcripts could pave the way for a new set of sensitive and early prediction markers for chemical

  16. RNA-seq analysis of unintended effects in transgenic wheat overexpressing the transcription factor GmDREB1

    Directory of Open Access Journals (Sweden)

    Qiyan Jiang

    2017-06-01

    The engineering of plants with enhanced tolerance to abiotic stresses typically involves complex multigene networks and may therefore have a greater potential to introduce unintended effects than the genetic modification for simple monogenic traits. For this reason, it is essential to study the unintended effects in transgenic plants engineered for stress tolerance. We selected drought- and salt-tolerant transgenic wheat overexpressing the transcription factor, GmDREB1, to investigate unintended pleiotropic effects using RNA-seq analysis. We compared the transcriptome alteration of transgenic plants with that of wild-type plants subjected to salt stress as a control. We found that GmDREB1 overexpression had a minimal impact on gene expression under normal conditions. GmDREB1 overexpression resulted in transcriptional reprogramming of the salt response, but many of the genes with differential expression are known to mitigate salt stress and contribute incrementally to the enhanced stress tolerance of transgenic wheat. GmDREB1 overexpression did not activate unintended gene networks with respect to gene expression in the roots of transgenic wheat. This work is important for establishing a method of detecting unintended effects of genetic engineering and the safety of such traits with the development of marketable transgenic crops in the near future.

  17. rSeqNP: a non-parametric approach for detecting differential expression and splicing from RNA-Seq data.

    Science.gov (United States)

    Shi, Yang; Chinnaiyan, Arul M; Jiang, Hui

    2015-07-01

    High-throughput sequencing of transcriptomes (RNA-Seq) has become a powerful tool to study gene expression. Here we present an R package, rSeqNP, which implements a non-parametric approach to test for differential expression and splicing from RNA-Seq data. rSeqNP uses permutation tests to assess statistical significance and can be applied to a variety of experimental designs. By combining information across isoforms, rSeqNP is able to detect more differentially expressed or spliced genes from RNA-Seq data. The R package with its source code and documentation are freely available at http://www-personal.umich.edu/~jianghui/rseqnp/. Contact: jianghui@umich.edu. Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press.
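
    The permutation-testing idea mentioned in this abstract can be illustrated with a minimal two-group sketch for a single gene. The test statistic (absolute difference in group means) and the function name `permutation_pvalue` are assumptions chosen for illustration; the statistics actually used by rSeqNP are not described in the abstract and may differ.

```python
from itertools import combinations


def permutation_pvalue(group_a, group_b):
    """Exact two-sided permutation p-value for a difference in group means."""
    values = list(group_a) + list(group_b)
    n_a = len(group_a)

    def abs_mean_diff(idx_a):
        a = [values[i] for i in idx_a]
        b = [values[i] for i in range(len(values)) if i not in idx_a]
        return abs(sum(a) / len(a) - sum(b) / len(b))

    observed = abs_mean_diff(set(range(n_a)))
    # Enumerate every way of relabelling the samples into two groups.
    splits = [set(c) for c in combinations(range(len(values)), n_a)]
    # Small tolerance so that the observed labelling always counts itself.
    extreme = sum(abs_mean_diff(s) >= observed - 1e-12 for s in splits)
    return extreme / len(splits)
```

    Exhaustive enumeration is only practical for small sample sizes; with more replicates one would typically sample random relabellings instead.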

  18. RNA CoMPASS: a dual approach for pathogen and host transcriptome analysis of RNA-seq datasets.

    Directory of Open Access Journals (Sweden)

    Guorong Xu

    High-throughput RNA sequencing (RNA-seq) has become an instrumental assay for the analysis of multiple aspects of an organism's transcriptome. Further, the analysis of a biological specimen's associated microbiome can also be performed using RNA-seq data and this application is gaining interest in the scientific community. There are many existing bioinformatics tools designed for analysis and visualization of transcriptome data. Despite the availability of an array of next generation sequencing (NGS) analysis tools, the analysis of RNA-seq data sets poses a challenge for many biomedical researchers who are not familiar with command-line tools. Here we present RNA CoMPASS, a comprehensive RNA-seq analysis pipeline for the simultaneous analysis of transcriptomes and metatranscriptomes from diverse biological specimens. RNA CoMPASS leverages existing tools and parallel computing technology to facilitate the analysis of even very large datasets. RNA CoMPASS has a web-based graphical user interface with intrinsic queuing to control a distributed computational pipeline. RNA CoMPASS was evaluated by analyzing RNA-seq data sets from 45 B-cell samples. Twenty-two of these samples were derived from lymphoblastoid cell lines (LCLs) generated by the infection of naïve B-cells with the Epstein Barr virus (EBV), while another 23 samples were derived from Burkitt's lymphomas (BL), some of which arose in part through infection with EBV. Appropriately, RNA CoMPASS identified EBV in all LCLs and in a fraction of the BLs. Cluster analysis of the human transcriptome component of the RNA CoMPASS output clearly separated the BLs (which have a germinal center-like phenotype) from the LCLs (which have a blast-like phenotype), with evidence of activated MYC signaling and lower interferon and NF-kB signaling in the BLs. Together, this analysis illustrates the utility of RNA CoMPASS in the simultaneous analysis of transcriptome and metatranscriptome data. RNA CoMPASS is freely

  19. GC-Content Normalization for RNA-Seq Data

    Science.gov (United States)

    2011-01-01

    Background: Transcriptome sequencing (RNA-Seq) has become the assay of choice for high-throughput studies of gene expression. However, as is the case with microarrays, major technology-related artifacts and biases affect the resulting expression measures. Normalization is therefore essential to ensure accurate inference of expression levels and subsequent analyses thereof. Results: We focus on biases related to GC-content and demonstrate the existence of strong sample-specific GC-content effects on RNA-Seq read counts, which can substantially bias differential expression analysis. We propose three simple within-lane gene-level GC-content normalization approaches and assess their performance on two different RNA-Seq datasets, involving different species and experimental designs. Our methods are compared to state-of-the-art normalization procedures in terms of bias and mean squared error for expression fold-change estimation and in terms of Type I error and p-value distributions for tests of differential expression. The exploratory data analysis and normalization methods proposed in this article are implemented in the open-source Bioconductor R package EDASeq. Conclusions: Our within-lane normalization procedures, followed by between-lane normalization, reduce GC-content bias and lead to more accurate estimates of expression fold-changes and tests of differential expression. Such results are crucial for the biological interpretation of RNA-Seq experiments, where downstream analyses can be sensitive to the supplied lists of genes. PMID:22177264
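
    To make the notion of within-lane, gene-level GC-content normalization concrete, here is a minimal sketch of one plausible scheme: genes from a single lane are stratified by GC content, and counts in each stratum are rescaled so that the stratum median matches the lane-wide median. This is an illustrative assumption, not necessarily one of the three approaches proposed in the article, and the function name `gc_bin_median_scale` is hypothetical.

```python
from statistics import median


def gc_bin_median_scale(counts, gc, n_bins=4):
    """Rescale one lane's gene counts so every GC stratum has the same median."""
    order = sorted(range(len(counts)), key=lambda i: gc[i])
    # Split genes, ordered by GC content, into roughly equal-sized strata.
    bins = [order[k * len(order) // n_bins:(k + 1) * len(order) // n_bins]
            for k in range(n_bins)]
    target = median(counts)
    normalized = [float(c) for c in counts]
    for stratum in bins:
        if not stratum:
            continue
        stratum_median = median(counts[i] for i in stratum)
        if stratum_median > 0:  # leave all-zero strata untouched
            for i in stratum:
                normalized[i] = counts[i] * target / stratum_median
    return normalized


# Toy lane in which high-GC genes have ten times the counts of low-GC genes.
lane_counts = [10, 20, 30, 100, 200, 300]
lane_gc = [0.30, 0.31, 0.32, 0.60, 0.61, 0.62]
lane_normalized = gc_bin_median_scale(lane_counts, lane_gc, n_bins=2)
```

    After rescaling, the low-GC and high-GC strata of the toy lane share the same median, so a GC-driven offset of this kind would no longer masquerade as an expression difference.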

  20. RNA-seq transcriptional profiling of Herbaspirillum seropedicae colonizing wheat (Triticum aestivum) roots.

    Science.gov (United States)

    Pankievicz, V C S; Camilios-Neto, D; Bonato, P; Balsanelli, E; Tadra-Sfeir, M Z; Faoro, H; Chubatsu, L S; Donatti, L; Wajnberg, G; Passetti, F; Monteiro, R A; Pedrosa, F O; Souza, E M

    2016-04-01

    Herbaspirillum seropedicae is a diazotrophic and endophytic bacterium that associates with economically important grasses, promoting plant growth and increasing productivity. To identify genes related to bacterial ability to colonize plants, wheat seedlings growing hydroponically in Hoagland's medium were inoculated with H. seropedicae and incubated for 3 days. Total mRNA from the bacteria present on the root surface and in the plant medium was purified, depleted of rRNA and used for RNA-seq profiling. RT-qPCR analyses were conducted to confirm regulation of selected genes. Comparison of RNA profile of root attached and planktonic bacteria revealed extensive metabolic adaptations to the epiphytic lifestyle. These adaptations include expression of specific adhesins and cell wall re-modeling to attach to the root. Additionally, the metabolism was adapted to the microxic environment and nitrogen-fixation genes were expressed. Polyhydroxybutyrate (PHB) synthesis was activated, and PHB granules were stored as observed by microscopy. Genes related to plant growth promotion, such as auxin production, were expressed. Many ABC transporter genes were regulated in the bacteria attached to the roots. The results provide new insights into the adaptation of H. seropedicae to the interaction with the plant.

  1. SERE: single-parameter quality control and sample comparison for RNA-Seq.

    Science.gov (United States)

    Schulze, Stefan K; Kanwar, Rahul; Gölzenleuchter, Meike; Therneau, Terry M; Beutler, Andreas S

    2012-10-03

    Assessing the reliability of experimental replicates (or global alterations corresponding to different experimental conditions) is a critical step in analyzing RNA-Seq data. Pearson's correlation coefficient r has been widely used in the RNA-Seq field even though its statistical characteristics may be poorly suited to the task. Here we present a single-parameter test procedure for count data, the Simple Error Ratio Estimate (SERE), that can determine whether two RNA-Seq libraries are faithful replicates or globally different. Benchmarking shows that the interpretation of SERE is unambiguous regardless of the total read count or the range of expression differences among bins (exons or genes), a score of 1 indicating faithful replication (i.e., samples are affected only by Poisson variation of individual counts), a score of 0 indicating data duplication, and scores >1 corresponding to true global differences between RNA-Seq libraries. On the contrary the interpretation of Pearson's r is generally ambiguous and highly dependent on sequencing depth and the range of expression levels inherent to the sample (difference between lowest and highest bin count). Cohen's simple Kappa results are also ambiguous and are highly dependent on the choice of bins. For quantifying global sample differences SERE performs similarly to a measure based on the negative binomial distribution yet is simpler to compute. SERE can therefore serve as a straightforward and reliable statistical procedure for the global assessment of pairs or large groups of RNA-Seq datasets by a single statistical parameter.
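
    The behaviour described for SERE (0 for duplicated data, about 1 for Poisson-only variation, above 1 for global differences) is consistent with a ratio of observed to Poisson-expected variation. The sketch below is written under that assumption: expected counts are taken as each gene's total split in proportion to library size, and the per-gene variance ratios are averaged before taking the square root. The exact formula should be checked against the original publication.

```python
import math


def sere(counts):
    """counts: per-gene rows, each holding one read count per sample (library)."""
    n_samples = len(counts[0])
    lib_sizes = [sum(row[j] for row in counts) for j in range(n_samples)]
    total = sum(lib_sizes)
    per_gene = []
    for row in counts:
        gene_total = sum(row)
        if gene_total == 0:
            continue  # genes never observed carry no information
        ratio = 0.0
        for j, observed in enumerate(row):
            # Expected count if the samples differ only by sequencing depth.
            expected = gene_total * lib_sizes[j] / total
            ratio += (observed - expected) ** 2 / expected
        per_gene.append(ratio / (n_samples - 1))
    return math.sqrt(sum(per_gene) / len(per_gene))
```

    For example, two identical columns give exactly 0, whereas `sere([[10, 30], [30, 10]])` gives the square root of 10 (about 3.16), well above 1.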

  2. An Integrated Approach for RNA-seq Data Normalization.

    Science.gov (United States)

    Yang, Shengping; Mercante, Donald E; Zhang, Kun; Fang, Zhide

    2016-01-01

    DNA copy number alteration is common in many cancers. Studies have shown that insertion or deletion of DNA sequences can directly alter gene expression, and significant correlation exists between DNA copy number and gene expression. Data normalization is a critical step in the analysis of gene expression generated by RNA-seq technology. Successful normalization reduces/removes unwanted nonbiological variations in the data, while keeping meaningful information intact. However, as far as we know, no attempt has been made to adjust for the variation due to DNA copy number changes in RNA-seq data normalization. In this article, we propose an integrated approach for RNA-seq data normalization. Comparisons show that the proposed normalization can improve power for downstream differentially expressed gene detection and generate more biologically meaningful results in gene profiling. In addition, our findings show that due to the effects of copy number changes, some housekeeping genes are not always suitable internal controls for studying gene expression. Using information from DNA copy number, the integrated approach successfully reduces noise due to both biological and nonbiological causes in RNA-seq data, thus increasing the accuracy of gene profiling.

  3. Evaluation of Human Adipose Tissue Stromal Heterogeneity in Metabolic Disease Using Single Cell RNA-Seq

    Science.gov (United States)

    2017-09-01

    Award number: W81XWH-15-1-0251. Author: Linus Tzu-Yen... Abstract: We have developed a robust protocol to generate single cell transcriptional profiles from subcutaneous adipose tissue samples of both human

  4. Transforming RNA-Seq data to improve the performance of prognostic gene signatures.

    Science.gov (United States)

    Zwiener, Isabella; Frisch, Barbara; Binder, Harald

    2014-01-01

    Gene expression measurements have successfully been used for building prognostic signatures, i.e., for identifying a short list of important genes that can predict patient outcome. Mostly microarray measurements have been considered, and there is little advice available for building multivariable risk prediction models from RNA-Seq data. We specifically consider penalized regression techniques, such as the lasso and componentwise boosting, which can simultaneously consider all measurements and provide both multivariable regression models for prediction and automated variable selection. However, they might be affected by the typical skewness, mean-variance-dependency or extreme values of RNA-Seq covariates and therefore could benefit from transformations of the latter. In an analytical part, we highlight preferential selection of covariates with large variances, which is problematic due to the mean-variance dependency of RNA-Seq data. In a simulation study, we compare different transformations of RNA-Seq data for potentially improving detection of important genes. Specifically, we consider standardization, the log transformation, a variance-stabilizing transformation, the Box-Cox transformation, and rank-based transformations. In addition, the prediction performance for real data from patients with kidney cancer and acute myeloid leukemia is considered. We show that signature size, identification performance, and prediction performance critically depend on the choice of a suitable transformation. Rank-based transformations perform well in all scenarios and can even outperform complex variance-stabilizing approaches. Generally, the results illustrate that the distribution and potential transformations of RNA-Seq data need to be considered as a critical step when building risk prediction models by penalized regression techniques.
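
    Two of the transformations named in this abstract, the log transformation and a rank-based transformation, can be sketched for the counts of a single gene across patients. The pseudocount of 1 and the rescaling of ranks to the interval (0, 1) are illustrative assumptions; the study may define these transformations differently.

```python
import math


def log_transform(values):
    """log2 with a pseudocount of 1 so that zero counts stay defined."""
    return [math.log2(v + 1) for v in values]


def rank_transform(values):
    """Average ranks (ties share a rank), rescaled to the open interval (0, 1)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        average_rank = (pos + end) / 2 + 1  # 1-based rank shared by the tied run
        for k in range(pos, end + 1):
            ranks[order[k]] = average_rank
        pos = end + 1
    return [r / (len(values) + 1) for r in ranks]
```

    The rank transform maps an extreme count such as 100000 to the same value as any other largest observation, which is one way to see why it is insensitive to the skewness and extreme values the abstract mentions.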

  5. Transforming RNA-Seq data to improve the performance of prognostic gene signatures.

    Directory of Open Access Journals (Sweden)

    Isabella Zwiener

    Gene expression measurements have successfully been used for building prognostic signatures, i.e., for identifying a short list of important genes that can predict patient outcome. Mostly microarray measurements have been considered, and there is little advice available for building multivariable risk prediction models from RNA-Seq data. We specifically consider penalized regression techniques, such as the lasso and componentwise boosting, which can simultaneously consider all measurements and provide both multivariable regression models for prediction and automated variable selection. However, they might be affected by the typical skewness, mean-variance-dependency or extreme values of RNA-Seq covariates and therefore could benefit from transformations of the latter. In an analytical part, we highlight preferential selection of covariates with large variances, which is problematic due to the mean-variance dependency of RNA-Seq data. In a simulation study, we compare different transformations of RNA-Seq data for potentially improving detection of important genes. Specifically, we consider standardization, the log transformation, a variance-stabilizing transformation, the Box-Cox transformation, and rank-based transformations. In addition, the prediction performance for real data from patients with kidney cancer and acute myeloid leukemia is considered. We show that signature size, identification performance, and prediction performance critically depend on the choice of a suitable transformation. Rank-based transformations perform well in all scenarios and can even outperform complex variance-stabilizing approaches. Generally, the results illustrate that the distribution and potential transformations of RNA-Seq data need to be considered as a critical step when building risk prediction models by penalized regression techniques.

  6. Transcriptional profiling of cells sorted by RNA abundance

    NARCIS (Netherlands)

    Klemm, Sandy; Semrau, Stefan; Wiebrands, Kay; Mooijman, Dylan; Faddah, Dina A; Jaenisch, Rudolf; van Oudenaarden, Alexander

    We have developed a quantitative technique for sorting cells on the basis of endogenous RNA abundance, with a molecular resolution of 10-20 transcripts. We demonstrate efficient and unbiased RNA extraction from transcriptionally sorted cells and report a high-fidelity transcriptome measurement of

  7. voom: Precision weights unlock linear model analysis tools for RNA-seq read counts.

    Science.gov (United States)

    Law, Charity W; Chen, Yunshun; Shi, Wei; Smyth, Gordon K

    2014-02-03

    New normal linear modeling strategies are presented for analyzing read counts from RNA-seq experiments. The voom method estimates the mean-variance relationship of the log-counts, generates a precision weight for each observation and enters these into the limma empirical Bayes analysis pipeline. This opens access for RNA-seq analysts to a large body of methodology developed for microarrays. Simulation studies show that voom performs as well as or better than count-based RNA-seq methods even when the data are generated according to the assumptions of the earlier methods. Two case studies illustrate the use of linear modeling and gene set testing methods.
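
    As a rough illustration of the first steps behind a mean-variance analysis of log-counts, the sketch below computes log2 counts-per-million with small offsets (0.5 on the count and 1 on the library size, which to our understanding matches the published voom transform) and then, for each gene, the mean log-value paired with the square root of its standard deviation. It deliberately stops there: the smooth trend fitted through such points and the per-observation precision weights (the inverse of the predicted variance) are not implemented, and treating all samples as a single group is a simplifying assumption.

```python
import math


def log_cpm(counts):
    """counts: per-gene rows, one count per sample. Returns log2 counts-per-million."""
    n_samples = len(counts[0])
    lib_sizes = [sum(row[j] for row in counts) for j in range(n_samples)]
    return [[math.log2((r + 0.5) / (lib_sizes[j] + 1.0) * 1e6)
             for j, r in enumerate(row)]
            for row in counts]


def mean_sd_points(logcpm):
    """Per-gene (mean, sqrt(sd)) pairs to which a mean-variance trend could be fitted."""
    points = []
    for row in logcpm:
        mean = sum(row) / len(row)
        variance = sum((v - mean) ** 2 for v in row) / (len(row) - 1)
        points.append((mean, math.sqrt(math.sqrt(variance))))
    return points
```

    In practice the limma package in R provides the full method; this sketch only shows where the mean-variance points come from.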

  8. A comprehensive evaluation of alignment algorithms in the context of RNA-seq.

    Directory of Open Access Journals (Sweden)

    Robert Lindner

    Transcriptome sequencing (RNA-Seq) overcomes limitations of previously used RNA quantification methods and provides one experimental framework for both high-throughput characterization and quantification of transcripts at the nucleotide level. The first step and a major challenge in the analysis of such experiments is the mapping of sequencing reads to a transcriptomic origin including the identification of splicing events. In recent years, a large number of such mapping algorithms have been developed, all of which have in common that they require algorithms for aligning a vast number of reads to genomic or transcriptomic sequences. Although the FM-index based aligner Bowtie has become a de facto standard within mapping pipelines, a much larger number of possible alignment algorithms have been developed also including other variants of FM-index based aligners. Accordingly, developers and users of RNA-seq mapping pipelines have the choice among a large number of available alignment algorithms. To provide guidance in the choice of alignment algorithms for these purposes, we evaluated the performance of 14 widely used alignment programs from three different algorithmic classes: algorithms using either hashing of the reference transcriptome, hashing of reads, or a compressed FM-index representation of the genome. Here, special emphasis was placed on both precision and recall and the performance for different read lengths and numbers of mismatches and indels in a read. Our results clearly showed the significant reduction in memory footprint and runtime provided by FM-index based aligners at a precision and recall comparable to the best hash table based aligners. Furthermore, the recently developed Bowtie 2 alignment algorithm shows a remarkable tolerance to both sequencing errors and indels, thus, essentially making hash-based aligners obsolete.
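
    For readers unfamiliar with the FM-index mentioned in this abstract, the following toy sketch shows exact-match lookup by backward search over a Burrows-Wheeler transform. It is uncompressed, builds the suffix array naively and handles no mismatches or indels, so it illustrates the principle only and is not how Bowtie or any of the evaluated aligners is implemented. The function names are hypothetical.

```python
def build_fm_index(text):
    """Build a toy FM-index: suffix array, C table and occurrence table."""
    text += "$"  # sentinel that sorts before every nucleotide
    suffix_array = sorted(range(len(text)), key=lambda i: text[i:])
    bwt = "".join(text[i - 1] for i in suffix_array)
    alphabet = sorted(set(text))
    # first[c]: number of characters in the text strictly smaller than c.
    first, running = {}, 0
    for c in alphabet:
        first[c] = running
        running += text.count(c)
    # occ[c][k]: number of occurrences of c in bwt[:k].
    occ = {c: [0] for c in alphabet}
    for ch in bwt:
        for c in alphabet:
            occ[c].append(occ[c][-1] + (ch == c))
    return suffix_array, first, occ


def locate(pattern, suffix_array, first, occ):
    """Return sorted start positions of exact matches via backward search."""
    lo, hi = 0, len(suffix_array)  # half-open interval of matching suffixes
    for c in reversed(pattern):
        if c not in first:
            return []
        lo = first[c] + occ[c][lo]
        hi = first[c] + occ[c][hi]
        if lo >= hi:
            return []
    return sorted(suffix_array[lo:hi])


index = build_fm_index("ACGTACGA")
hits = locate("ACG", *index)
```

    Each pattern character narrows the suffix interval with two table lookups, so the search time depends on the read length rather than on the size of the reference, which is part of why FM-index aligners have the small runtime and memory footprint the abstract reports.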

  9. Biotechnological applications of mobile group II introns and their reverse transcriptases: gene targeting, RNA-seq, and non-coding RNA analysis.

    Science.gov (United States)

    Enyeart, Peter J; Mohr, Georg; Ellington, Andrew D; Lambowitz, Alan M

    2014-01-13

    Mobile group II introns are bacterial retrotransposons that combine the activities of an autocatalytic intron RNA (a ribozyme) and an intron-encoded reverse transcriptase to insert site-specifically into DNA. They recognize DNA target sites largely by base pairing of sequences within the intron RNA and achieve high DNA target specificity by using the ribozyme active site to couple correct base pairing to RNA-catalyzed intron integration. Algorithms have been developed to program the DNA target site specificity of several mobile group II introns, allowing them to be made into 'targetrons.' Targetrons function for gene targeting in a wide variety of bacteria and typically integrate at efficiencies high enough to be screened easily by colony PCR, without the need for selectable markers. Targetrons have found wide application in microbiological research, enabling gene targeting and genetic engineering of bacteria that had been intractable to other methods. Recently, a thermostable targetron has been developed for use in bacterial thermophiles, and new methods have been developed for using targetrons to position recombinase recognition sites, enabling large-scale genome-editing operations, such as deletions, inversions, insertions, and 'cut-and-pastes' (that is, translocation of large DNA segments), in a wide range of bacteria at high efficiency. Using targetrons in eukaryotes presents challenges due to the difficulties of nuclear localization and sub-optimal magnesium concentrations, although supplementation with magnesium can increase integration efficiency, and directed evolution is being employed to overcome these barriers. Finally, spurred by new methods for expressing group II intron reverse transcriptases that yield large amounts of highly active protein, thermostable group II intron reverse transcriptases from bacterial thermophiles are being used as research tools for a variety of applications, including qRT-PCR and next-generation RNA sequencing (RNA-seq). 

  10. Analysis Of Transcriptomes In A Porcine Tissue Collection Using RNA-Seq And Genome Assembly 10

    DEFF Research Database (Denmark)

    Hornshøj, Henrik; Thomsen, Bo; Hedegaard, Jakob

    2011-01-01

    The release of Sus scrofa genome assembly 10 supports improvement of the pig genome annotation and in depth transcriptome analyses using next-generation sequencing technologies. In this study we analyze RNA-seq reads from a tissue collection, including 10 separate tissues from Duroc boars and 10 ... short read alignment software we mapped the reads to the genome assembly 10. We extracted contig sequences of gene transcripts using the Cufflinks software. Based on this information we identified expressed genes that are present in the genome assembly. The portion of these genes being previously known ... was roughly estimated by sequence comparison to known genes. Similarly, we searched for genes that are expressed in the tissues but not present in the genome assembly by aligning the non-genome-mapped reads to known gene transcripts. For the genes predicted to have alternative transcript variants by Cufflinks...

  11. NGScloud: RNA-seq analysis of non-model species using cloud computing.

    Science.gov (United States)

    Mora-Márquez, Fernando; Vázquez-Poletti, José Luis; López de Heredia, Unai

    2018-05-03

    RNA-seq analysis usually requires large computing infrastructures. NGScloud is a bioinformatic system developed to analyze RNA-seq data using the cloud computing services of Amazon, which permit access to ad hoc computing infrastructure scaled according to the complexity of the experiment, so that its costs and times can be optimized. The application provides a user-friendly front-end to operate Amazon's hardware resources, and to control a workflow of RNA-seq analysis oriented to non-model species, incorporating the cluster concept, which allows parallel runs of common RNA-seq analysis programs in several virtual machines for faster analysis. NGScloud is freely available at https://github.com/GGFHF/NGScloud/. A manual detailing installation and how-to-use instructions is available with the distribution. Contact: unai.lopezdeheredia@upm.es.

  12. A comprehensive comparison of RNA-Seq-based transcriptome analysis from reads to differential gene expression and cross-comparison with microarrays: a case study in Saccharomyces cerevisiae

    DEFF Research Database (Denmark)

    Nookaew, Intawat; Papini, Marta; Pornputtapong, Natapol

    2012-01-01

    RNA-seq has recently become an attractive method of choice in the studies of transcriptomes, promising several advantages compared with microarrays. In this study, we sought to assess the contribution of the different analytical steps involved in the analysis of RNA-seq data generated with the I...... gene expression identification derived from the different statistical methods, as well as their integrated analysis results based on gene ontology annotation are in good agreement. Overall, our study provides a useful and comprehensive comparison between the two platforms (RNA-seq and microarrays...

  13. RNA-SEQ reveals transcriptional level changes of poplar roots in different forms of nitrogen treatments

    Directory of Open Access Journals (Sweden)

    Chunpu Qu

    2016-02-01

    Poplar has emerged as a model plant for understanding molecular mechanisms of tree growth, development and response to environment. Long-term application of different forms of nitrogen (such as NO3--N and NH4+-N) may cause morphological changes of poplar roots; however, the molecular level changes are still not well known. In this study, we analyzed the expression profiles of poplar roots treated with three forms of nitrogen: S1 (NH4+), S2 (NH4NO3) and S3 (NO3-), using the RNA-SEQ technique. We found 463 genes significantly differentially expressed in roots under the different N treatments, of which 116 genes were differentially expressed between S1 and S2, 173 genes between S2 and S3, and 327 genes between S1 and S3. A cluster analysis shows significant differences in many transcription factor families and functional gene families under different N forms. Through an analysis of Mapman metabolic pathways, we found that the significantly differentially expressed genes are associated with fermentation, glycolysis and the tricarboxylic acid cycle (TCA), secondary metabolism, hormone metabolism, and transport processing. Interestingly, we did not find significantly differentially expressed genes in the N metabolism pathway, mitochondrial electron transport / ATP synthesis and mineral nutrition. We also found abundant candidate genes (20 transcription factors and 30 functional genes) regulating morphology changes of poplar roots under the three N forms. The results obtained are beneficial to a better understanding of the potential molecular and cellular mechanisms regulating root morphology changes under different N treatments.

  14. RNA-Seq reveals complex genetic response to deepwater horizon oil release in Fundulus grandis

    Directory of Open Access Journals (Sweden)

    Garcia Tzintzuni I

    2012-09-01

    Background: The release of oil resulting from the blowout of the Deepwater Horizon (DH) drilling platform was one of the largest in history, discharging more than 189 million gallons of oil and subject to widespread application of oil dispersants. This event impacted a wide range of ecological habitats with a complex mix of pollutants whose biological impact is not yet fully understood. To better understand the effects on a vertebrate genome, we studied gene expression in the salt marsh minnow Fundulus grandis, which is local to the northern coast of the Gulf of Mexico and is a sister species of the ecotoxicological model Fundulus heteroclitus. To assess genomic changes, we quantified mRNA expression using high throughput sequencing technologies (RNA-Seq) in F. grandis populations in the marshes and estuaries impacted by DH oil release. This application of RNA-Seq to a non-model, wild, and ecologically significant organism is an important evaluation of the technology to quickly assess similar events in the future. Results: Our de novo assembly of RNA-Seq data produced a large set of sequences which included many duplicates and fragments. In many cases several of these could be associated with a common reference sequence using blast to query a reference database. This reduced the set of significant genes to 1,070 down-regulated and 1,251 up-regulated genes. These genes indicate a broad and complex genomic response to DH oil exposure including the expected AHR-mediated response and CYP genes. In addition, a response to hypoxic conditions and an immune response are also indicated. Several genes in the choriogenin family were down-regulated in the exposed group; a response that is consistent with AH exposure. These analyses are in agreement with oligonucleotide-based microarray analyses, and describe only a subset of significant genes with aberrant regulation in the exposed set. Conclusion: RNA-Seq may be successfully applied to feral and

  15. SimFuse: A Novel Fusion Simulator for RNA Sequencing (RNA-Seq) Data

    Directory of Open Access Journals (Sweden)

    Yuxiang Tan

    2015-01-01

    The performance evaluation of fusion detection algorithms from high-throughput sequencing data crucially relies on the availability of data with known positive and negative cases of gene rearrangements. The use of simulated data circumvents some shortcomings of real data by generation of an unlimited number of true and false positive events, and the consequent robust estimation of accuracy measures, such as precision and recall. Although a few simulated fusion datasets from RNA Sequencing (RNA-Seq) are available, they are of limited sample size. This makes it difficult to systematically evaluate the performance of RNA-Seq based fusion-detection algorithms. Here, we present SimFuse to address this problem. SimFuse utilizes real sequencing data as the fusions’ background to closely approximate the distribution of reads from a real sequencing library and uses a reference genome as the template from which to simulate fusions’ supporting reads. To assess the supporting read-specific performance, SimFuse generates multiple datasets with various numbers of fusion supporting reads. Compared to an extant simulated dataset, SimFuse gives users control over the supporting read features and the sample size of the simulated library, based on which the performance metrics needed for the validation and comparison of alternative fusion-detection algorithms can be rigorously estimated.

  16. RNA-seq reveals more consistent reference genes for gene expression studies in human non-melanoma skin cancers

    Directory of Open Access Journals (Sweden)

    Van L.T. Hoang

    2017-08-01

    Identification of appropriate reference genes (RGs) is critical to accurate data interpretation in quantitative real-time PCR (qPCR) experiments. In this study, we have utilised next generation RNA sequencing (RNA-seq) to analyse the transcriptome of a panel of non-melanoma skin cancer lesions, identifying genes that are consistently expressed across all samples. Genes encoding ribosomal proteins were amongst the most stable in this dataset. Validation of this RNA-seq data was examined using qPCR to confirm the suitability of a set of highly stable genes for use as qPCR RGs. These genes will provide a valuable resource for the normalisation of qPCR data for the analysis of non-melanoma skin cancer.

  17. Incorporation of unique molecular identifiers in TruSeq adapters improves the accuracy of quantitative sequencing.

    Science.gov (United States)

    Hong, Jungeui; Gresham, David

    2017-11-01

    Quantitative analysis of next-generation sequencing (NGS) data requires discriminating duplicate reads generated by PCR from identical molecules that are of unique origin. Typically, PCR duplicates are identified as sequence reads that align to the same genomic coordinates using reference-based alignment. However, identical molecules can be independently generated during library preparation. Misidentification of these molecules as PCR duplicates can introduce unforeseen biases during analyses. Here, we developed a cost-effective sequencing adapter design by modifying Illumina TruSeq adapters to incorporate a unique molecular identifier (UMI) while maintaining the capacity to undertake multiplexed, single-index sequencing. Incorporation of UMIs into TruSeq adapters (TrUMIseq adapters) enables identification of bona fide PCR duplicates as identically mapped reads with identical UMIs. Using TrUMIseq adapters, we show that accurate removal of PCR duplicates results in improved accuracy of both allele frequency (AF) estimation in heterogeneous populations using DNA sequencing and gene expression quantification using RNA-Seq.
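
    The distinction drawn in this abstract, between collapsing reads by mapping position alone and collapsing them by position plus UMI, can be illustrated with a minimal sketch. It is not the authors' pipeline: the `Read` record, both function names, and the toy reads are hypothetical, and the sketch assumes exact UMI matches with no sequencing-error correction.

```python
from typing import NamedTuple


class Read(NamedTuple):
    """Hypothetical minimal record for an aligned read."""
    name: str
    chrom: str
    pos: int
    strand: str
    umi: str


def dedup_by_position_only(reads):
    """Keep the first read seen at each (chrom, pos, strand)."""
    seen = {}
    for r in reads:
        seen.setdefault((r.chrom, r.pos, r.strand), r)
    return list(seen.values())


def dedup_by_position_and_umi(reads):
    """Keep the first read seen for each (chrom, pos, strand, UMI)."""
    seen = {}
    for r in reads:
        seen.setdefault((r.chrom, r.pos, r.strand, r.umi), r)
    return list(seen.values())


# Toy reads (made up for illustration).
reads = [
    Read("r1", "chr1", 100, "+", "ACGTAC"),
    Read("r2", "chr1", 100, "+", "ACGTAC"),  # same position and UMI as r1
    Read("r3", "chr1", 100, "+", "TTGCAA"),  # same position, different UMI
    Read("r4", "chr2", 250, "-", "ACGTAC"),
]

position_only = dedup_by_position_only(reads)
position_and_umi = dedup_by_position_and_umi(reads)
print(len(position_only), len(position_and_umi))
```

    In this toy input, position-only collapsing discards r3 even though its UMI marks it as an independent molecule, whereas the UMI-aware version removes only r2.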

  18. Influenza Virus Mounts a Two-Pronged Attack on Host RNA Polymerase II Transcription.

    Science.gov (United States)

    Bauer, David L V; Tellier, Michael; Martínez-Alonso, Mónica; Nojima, Takayuki; Proudfoot, Nick J; Murphy, Shona; Fodor, Ervin

    2018-05-15

    Influenza virus intimately associates with host RNA polymerase II (Pol II) and mRNA processing machinery. Here, we use mammalian native elongating transcript sequencing (mNET-seq) to examine Pol II behavior during viral infection. We show that influenza virus executes a two-pronged attack on host transcription. First, viral infection causes decreased Pol II gene occupancy downstream of transcription start sites. Second, virus-induced cellular stress leads to a catastrophic failure of Pol II termination at poly(A) sites, with transcription often continuing for tens of kilobases. Defective Pol II termination occurs independently of the ability of the viral NS1 protein to interfere with host mRNA processing. Instead, this termination defect is a common effect of diverse cellular stresses and underlies the production of previously reported downstream-of-gene transcripts (DoGs). Our work has implications for understanding not only host-virus interactions but also fundamental aspects of mammalian transcription.

  19. Comparing the normalization methods for the differential analysis of Illumina high-throughput RNA-Seq data.

    Science.gov (United States)

    Li, Peipei; Piao, Yongjun; Shon, Ho Sun; Ryu, Keun Ho

    2015-10-28

    Recently, rapid improvements in technology and decrease in sequencing costs have made RNA-Seq a widely used technique to quantify gene expression levels. Various normalization approaches have been proposed, owing to the importance of normalization in the analysis of RNA-Seq data. A comparison of recently proposed normalization methods is required to generate suitable guidelines for the selection of the most appropriate approach for future experiments. In this paper, we compared eight non-abundance (RC, UQ, Med, TMM, DESeq, Q, RPKM, and ERPKM) and two abundance estimation normalization methods (RSEM and Sailfish). The experiments were based on real Illumina high-throughput RNA-Seq of 35- and 76-nucleotide sequences produced in the MAQC project and simulation reads. Reads were mapped to the human genome obtained from the UCSC Genome Browser Database. For precise evaluation, we investigated the Spearman correlation between the normalization results from RNA-Seq and MAQC qRT-PCR values for 996 genes. Based on this work, we showed that out of the eight non-abundance estimation normalization methods, RC, UQ, Med, TMM, DESeq, and Q gave similar normalization results for all data sets. For RNA-Seq of a 35-nucleotide sequence, RPKM showed the highest correlation results, but for RNA-Seq of a 76-nucleotide sequence, it showed lower correlation than the other methods. ERPKM did not improve on the results of RPKM. Between the two abundance estimation normalization methods, for RNA-Seq of a 35-nucleotide sequence, higher correlation was obtained with Sailfish than with RSEM, which was better than without using abundance estimation methods. However, for RNA-Seq of a 76-nucleotide sequence, the results achieved by RSEM were similar to those obtained without applying abundance estimation methods, and were much better than those with Sailfish. Furthermore, we found that adding a poly-A tail increased alignment numbers, but did not improve normalization results. Spearman correlation analysis revealed that RC, UQ
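
    The evaluation strategy described in this abstract (normalize read counts, then compare against qRT-PCR values by Spearman correlation) can be sketched as follows. This is a minimal illustration, not the authors' code: only RPKM is shown, using its usual definition of reads per kilobase of transcript per million mapped reads, and the counts, gene lengths, and qRT-PCR values are made-up toy numbers. The function names are hypothetical.

```python
def rpkm(counts, lengths_bp):
    """Reads per kilobase of transcript per million mapped reads."""
    total = sum(counts)
    return [c * 1e9 / (length * total) for c, length in zip(counts, lengths_bp)]


def average_ranks(values):
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Toy data for four genes (all values are invented for illustration).
counts = [100, 400, 50, 1000]        # raw read counts (RC)
lengths_bp = [500, 4000, 1000, 2000]  # transcript lengths in bp
qpcr = [2.0, 1.0, 0.5, 5.0]           # stand-in for qRT-PCR measurements

rpkm_values = rpkm(counts, lengths_bp)
rho_raw = spearman(counts, qpcr)
rho_rpkm = spearman(rpkm_values, qpcr)
print(round(rho_raw, 3), round(rho_rpkm, 3))
```

    In this toy case the length correction changes the gene ranking, so the two correlations differ; with real data the direction and size of the effect depend on the library, as the abstract's 35- versus 76-nucleotide results indicate.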

  20. Evaluation of tools for highly variable gene discovery from single-cell RNA-seq data.

    Science.gov (United States)

    Yip, Shun H; Sham, Pak Chung; Wang, Junwen

    2018-02-21

    Traditional RNA sequencing (RNA-seq) allows the detection of gene expression variations between two or more cell populations through differentially expressed gene (DEG) analysis. However, genes that contribute to cell-to-cell differences are not discoverable with RNA-seq because RNA-seq samples are obtained from a mixture of cells. Single-cell RNA-seq (scRNA-seq) allows the detection of gene expression in each cell. With scRNA-seq, highly variable gene (HVG) discovery allows the detection of genes that contribute strongly to cell-to-cell variation within a homogeneous cell population, such as a population of embryonic stem cells. This analysis is implemented in many software packages. In this study, we compare seven HVG methods from six software packages, including BASiCS, Brennecke, scLVM, scran, scVEGs and Seurat. Our results demonstrate that reproducibility in HVG analysis requires a larger sample size than DEG analysis. Discrepancies between methods and potential issues in these tools are discussed and recommendations are made.
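
    As a rough illustration of what HVG discovery computes, the sketch below ranks genes by their variance-to-mean ratio across cells. This is a deliberately simple stand-in and not the method of BASiCS, Brennecke, scLVM, scran, scVEGs, or Seurat, each of which models technical noise in its own way. The function name and the toy expression matrix are hypothetical.

```python
from statistics import mean, pvariance


def rank_by_dispersion(expression):
    """Rank genes by variance-to-mean ratio across cells, highest first.

    `expression` maps a gene name to its per-cell counts. Genes with
    zero mean expression are skipped because the ratio is undefined.
    """
    scores = {}
    for gene, values in expression.items():
        m = mean(values)
        if m == 0:
            continue
        scores[gene] = pvariance(values) / m
    return sorted(scores, key=scores.get, reverse=True)


# Toy counts for four cells (invented for illustration).
expression = {
    "geneA": [10, 10, 10, 10],  # constant across cells
    "geneB": [0, 20, 0, 20],    # strongly variable
    "geneC": [9, 11, 10, 10],   # mildly variable
    "geneD": [0, 0, 0, 0],      # not expressed
}

ranking = rank_by_dispersion(expression)
print(ranking)
```

    All three expressed genes have the same mean, so only their cell-to-cell spread separates them; a plain ratio like this cannot tell technical from biological variation, which is one motivation for the model-based tools the study compares.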

  1. TSSer: an automated method to identify transcription start sites in prokaryotic genomes from differential RNA sequencing data.

    Science.gov (United States)

    Jorjani, Hadi; Zavolan, Mihaela

    2014-04-01

    Accurate identification of transcription start sites (TSSs) is an essential step in the analysis of transcription regulatory networks. In higher eukaryotes, the capped analysis of gene expression technology enabled comprehensive annotation of TSSs in genomes such as those of mice and humans. In bacteria, an equivalent approach, termed differential RNA sequencing (dRNA-seq), has recently been proposed, but the application of this approach to a large number of genomes is hindered by the paucity of computational analysis methods. With few exceptions, when the method has been used, annotation of TSSs has been largely done manually. In this work, we present a computational method called 'TSSer' that enables the automatic inference of TSSs from dRNA-seq data. The method rests on a probabilistic framework for identifying both genomic positions that are preferentially enriched in the dRNA-seq data as well as preferentially captured relative to neighboring genomic regions. Evaluating our approach for TSS calling on several publicly available datasets, we find that TSSer achieves high consistency with the curated lists of annotated TSSs, but identifies many additional TSSs. Therefore, TSSer can accelerate genome-wide identification of TSSs in bacterial genomes and can aid in further characterization of bacterial transcription regulatory networks. TSSer is freely available under GPL license at http://www.clipz.unibas.ch/TSSer/index.php

  2. Dataset of transcriptional landscape of B cell early activation

    Directory of Open Access Journals (Sweden)

    Alexander S. Garruss

    2015-09-01

    Signaling via B cell receptors (BCR) and Toll-like receptors (TLRs) results in activation of B cells with distinct physiological outcomes, but transcriptional regulatory mechanisms that drive activation and distinguish these pathways remain unknown. At early time points after BCR and TLR ligand exposure, 0.5 and 2 h, RNA-seq was performed allowing observations on rapid transcriptional changes. At 2 h, ChIP-seq was performed to allow observations on important regulatory mechanisms potentially driving transcriptional change. The dataset includes RNA-seq, ChIP-seq of control (Input), RNA Pol II, H3K4me3, H3K27me3, and a separate RNA-seq for miRNA expression, which can be found at Gene Expression Omnibus Dataset GSE61608. Here, we provide details on the experimental and analysis methods used to obtain and analyze this dataset and to examine the transcriptional landscape of B cell early activation.

  3. Transcriptional landscape of ncRNA and Repeat elements in somatic cells

    KAUST Repository

    Ghosheh, Yanal

    2016-12-01

    The advancement of nucleic acid (DNA and RNA) sequencing technology has enabled many projects targeted towards the identification of genome structure and transcriptome complexity of organisms. The first conclusions of the human and mouse projects have underscored two important, yet unexpected, findings. First, while almost the entire genome is transcribed, only 5% of it encodes proteins. Thereby, most transcripts are noncoding RNA. This includes both short RNA (<200 nucleotides (nt)), comprising piRNAs, microRNAs (miRNAs) and endogenous Short Interfering RNAs (siRNAs) among others, and lncRNA (>200 nt). Second, a significant portion of the mammalian genome (45%) is composed of Repeat Elements (REs). REs are mostly relics of ancestral viruses that during evolution have invaded the host genome by producing thousands of copies. Their roles within their host genomes have yet to be fully explored considering that they sometimes produce lncRNA, and have been shown to influence expression at the transcriptional and post-transcriptional levels. Moreover, because some REs can still mobilize within host genomes, host genomes have evolved mechanisms, mainly epigenetic, to maintain REs under tight control. Recent reports indicate that RE activity is regulated in somatic cells, particularly in the brain, suggesting a physiological role of RE mobilization during normal development. In this thesis, I focus on the analysis of ncRNAs, specifically REs, piRNAs and lncRNAs, in human and mouse post-mitotic somatic cells. The main aspects of this analysis are: Using sRNA-Seq, I show that piRNAs, a class of ncRNAs responsible for the silencing of Transposable elements (TEs) in testes, are present also in adult mouse brain. Furthermore, their regulation shows only a subset of testes piRNAs are expressed in the brain and may be controlled by known neurogenesis factors. To investigate the dynamics of the transcriptome during cellular differentiation, I examined deep RNA-Seq and Cap

  4. Single nucleotide polymorphism discovery in bovine liver using RNA-seq technology

    DEFF Research Database (Denmark)

    Pareek, Chandra Shekhar; Błaszczyk, Paweł; Dziuba, Piotr

    2017-01-01

    Background: RNA-seq is a useful next-generation sequencing (NGS) technology that has been widely used to understand mammalian transcriptome architecture and function. In this study, a breed-specific RNA-seq experiment was utilized to detect putative single nucleotide polymorphisms (SNPs) in liver...

  5. SNP discovery in the bovine milk transcriptome using RNA-Seq technology.

    Science.gov (United States)

    Cánovas, Angela; Rincon, Gonzalo; Islas-Trejo, Alma; Wickramasinghe, Saumya; Medrano, Juan F

    2010-12-01

    High-throughput sequencing of RNA (RNA-Seq) was developed primarily to analyze global gene expression in different tissues. However, it also is an efficient way to discover coding SNPs. The objective of this study was to perform a SNP discovery analysis in the milk transcriptome using RNA-Seq. Seven milk samples from Holstein cows were analyzed by sequencing cDNAs using the Illumina Genome Analyzer system. We detected 19,175 genes expressed in milk samples corresponding to approximately 70% of the total number of genes analyzed. The SNP detection analysis revealed 100,734 SNPs in Holstein samples, and a large number of those corresponded to differences between the Holstein breed and the Hereford bovine genome assembly Btau4.0. The number of polymorphic SNPs within Holstein cows was 33,045. The accuracy of RNA-Seq SNP discovery was tested by comparing SNPs detected in a set of 42 candidate genes expressed in milk that had been resequenced earlier using Sanger sequencing technology. Seventy of 86 SNPs were detected using both RNA-Seq and Sanger sequencing technologies. The KASPar Genotyping System was used to validate unique SNPs found by RNA-Seq but not observed by Sanger technology. Our results confirm that analyzing the transcriptome using RNA-Seq technology is an efficient and cost-effective method to identify SNPs in transcribed regions. This study creates guidelines to maximize the accuracy of SNP discovery and prevention of false-positive SNP detection, and provides more than 33,000 SNPs located in coding regions of genes expressed during lactation that can be used to develop genotyping platforms to perform marker-trait association studies in Holstein cattle.

  6. mRNA-Seq of single prostate cancer circulating tumor cells reveals recapitulation of gene expression and pathways found in prostate cancer.

    Science.gov (United States)

    Cann, Gordon M; Gulzar, Zulfiqar G; Cooper, Samantha; Li, Robin; Luo, Shujun; Tat, Mai; Stuart, Sarah; Schroth, Gary; Srinivas, Sandhya; Ronaghi, Mostafa; Brooks, James D; Talasaz, Amirali H

    2012-01-01

    Circulating tumor cells (CTC) mediate metastatic spread of many solid tumors and enumeration of CTCs is currently used as a prognostic indicator of survival in metastatic prostate cancer patients. Some evidence suggests that it is possible to derive additional information about tumors from expression analysis of CTCs, but the technical difficulty of isolating and analyzing individual CTCs has limited progress in this area. To assess the ability of a new generation of MagSweeper to isolate intact CTCs for downstream analysis, we performed mRNA-Seq on single CTCs isolated from the blood of patients with metastatic prostate cancer and on single prostate cancer cell line LNCaP cells spiked into the blood of healthy donors. We found that the MagSweeper effectively isolated CTCs with a capture efficiency that matched the CellSearch platform. However, unlike CellSearch, the MagSweeper facilitates isolation of individual live CTCs without contaminating leukocytes. Importantly, mRNA-Seq analysis showed that the MagSweeper isolation process did not have a discernible impact on the transcriptional profile of single LNCaPs isolated from spiked human blood, suggesting that any perturbations caused by the MagSweeper process on the transcriptional signature of isolated cells are modest. Although the RNA from patient CTCs showed signs of significant degradation, consistent with reports of short half-lives and apoptosis amongst CTCs, transcriptional signatures of prostate tissue and of cancer were readily detectable with single CTC mRNA-Seq. These results demonstrate that the MagSweeper provides access to intact CTCs and that these CTCs can potentially supply clinically relevant information.

  7. Deep RNA sequencing reveals hidden features and dynamics of early gene transcription in Paramecium bursaria chlorella virus 1.

    Directory of Open Access Journals (Sweden)

    Guillaume Blanc

    Paramecium bursaria chlorella virus 1 (PBCV-1) is the prototype of the genus Chlorovirus (family Phycodnaviridae) that infects the unicellular, eukaryotic green alga Chlorella variabilis NC64A. The 331-kb PBCV-1 genome contains 416 major open reading frames. A mRNA-seq approach was used to analyze PBCV-1 transcriptomes at 6 progressive times during the first hour of infection. The alignment of 17 million reads to the PBCV-1 genome allowed the construction of single-base transcriptome maps. Significant transcription was detected for a subset of 50 viral genes as soon as 7 min after infection. By 20 min post infection (p.i.), transcripts were detected for most PBCV-1 genes and transcript levels continued to increase globally up to 60 min p.i., at which time 41% of the poly(A)+-containing RNAs in the infected cells mapped to the PBCV-1 genome. For some viral genes, the number of transcripts in the latter time points (20 to 60 min p.i.) was much higher than that of the most highly expressed host genes. RNA-seq data revealed putative polyadenylation signal sequences in PBCV-1 genes that were identical to the polyadenylation signal AAUAAA of green algae. Several transcripts have an RNA fragment excised. However, the frequency of excision and the resulting putative shortened protein products suggest that most of these excision events have no functional role but are probably the result of the activity of misled spliceosomes.

  8. Mapping RNA Structure In Vitro with SHAPE Chemistry and Next-Generation Sequencing (SHAPE-Seq).

    Science.gov (United States)

    Watters, Kyle E; Lucks, Julius B

    2016-01-01

    Mapping RNA structure with selective 2'-hydroxyl acylation analyzed by primer extension (SHAPE) chemistry has proven to be a versatile method for characterizing RNA structure in a variety of contexts. SHAPE reagents covalently modify RNAs in a structure-dependent manner to create adducts at the 2'-OH group of the ribose backbone at nucleotides that are structurally flexible. The positions of these adducts are detected using reverse transcriptase (RT) primer extension, which stops one nucleotide before the modification, to create a pool of cDNAs whose lengths reflect the location of SHAPE modification. Quantification of the cDNA pools is used to estimate the "reactivity" of each nucleotide in an RNA molecule to the SHAPE reagent. High reactivities indicate nucleotides that are structurally flexible, while low reactivities indicate nucleotides that are inflexible. These SHAPE reactivities can then be used to infer RNA structures by restraining RNA structure prediction algorithms. Here, we provide a state-of-the-art protocol describing how to perform in vitro RNA structure probing with SHAPE chemistry using next-generation sequencing to quantify cDNA pools and estimate reactivities (SHAPE-Seq). The use of next-generation sequencing allows for higher throughput, more consistent data analysis, and multiplexing capabilities. The technique described herein, SHAPE-Seq v2.0, uses a universal reverse transcription priming site that is ligated to the RNA after SHAPE modification. The introduced priming site allows for the structural analysis of an RNA independent of its sequence.

  9. ChIPBase v2.0: decoding transcriptional regulatory networks of non-coding RNAs and protein-coding genes from ChIP-seq data.

    Science.gov (United States)

    Zhou, Ke-Ren; Liu, Shun; Sun, Wen-Ju; Zheng, Ling-Ling; Zhou, Hui; Yang, Jian-Hua; Qu, Liang-Hu

    2017-01-04

    The abnormal transcriptional regulation of non-coding RNAs (ncRNAs) and protein-coding genes (PCGs) contributes to various biological processes and is linked with human diseases, but the underlying mechanisms remain elusive. In this study, we developed ChIPBase v2.0 (http://rna.sysu.edu.cn/chipbase/) to explore the transcriptional regulatory networks of ncRNAs and PCGs. ChIPBase v2.0 has been expanded with ∼10 200 curated ChIP-seq datasets, which represents an approximately 20-fold expansion compared with the previously released version. We identified thousands of binding motif matrices and their binding sites from ChIP-seq data of DNA-binding proteins and predicted millions of transcriptional regulatory relationships between transcription factors (TFs) and genes. We constructed the 'Regulator' module to predict hundreds of TFs and histone modifications that were involved in or affected transcription of ncRNAs and PCGs. Moreover, we built a web-based tool, Co-Expression, to explore the co-expression patterns between DNA-binding proteins and various types of genes by integrating the gene expression profiles of ∼10 000 tumor samples and ∼9100 normal tissues and cell lines. ChIPBase also provides a ChIP-Function tool and a genome browser to predict functions of diverse genes and visualize various ChIP-seq data. This study will greatly expand our understanding of the transcriptional regulation of ncRNAs and PCGs.

  10. Fully automated pipeline for detection of sex linked genes using RNA-Seq data.

    Science.gov (United States)

    Michalovova, Monika; Kubat, Zdenek; Hobza, Roman; Vyskot, Boris; Kejnovsky, Eduard

    2015-03-11

    Sex chromosomes present a genomic region which, to some extent, differs between the genders of a single species. Reliable high-throughput methods for detection of sex chromosome-specific markers are needed, especially in species where genome information is limited. Next generation sequencing (NGS) opens the door for identification of unique sequences or searching for nucleotide polymorphisms between datasets. A combination of classical genetic segregation analysis along with RNA-Seq data can present an ideal tool to map and identify sex chromosome-specific expressed markers. To address this challenge, we established a genetic cross of the dioecious plant Rumex acetosa and generated RNA-Seq data from both the parental generation and male and female offspring. We present a pipeline for detection of sex linked genes based on nucleotide polymorphism analysis. In our approach, tracking of nucleotide polymorphisms is carried out using a cross of preferably distant populations. For this reason, only 4 datasets are needed: reads from high-throughput sequencing platforms for the parent generation (mother and father) and the F1 generation (male and female progeny). Our pipeline uses custom scripts together with external assembly, mapping and variant calling software. Given the resource-intensive nature of the computation, servers with high capacity are a requirement. Therefore, in order to keep this pipeline easily accessible and reproducible, we implemented it in Galaxy, an open, web-based platform for data-intensive biomedical research. Our tools are present in the Galaxy Tool Shed, from which they can be installed to any local Galaxy instance. As an output of the pipeline, the user gets a FASTA file with candidate transcriptionally active sex-linked genes, sorted by their relevance. At the same time, a BAM file with identified genes and alignment of reads is also provided. Thus, polymorphisms following segregation pattern can be easily visualized, which significantly enhances primer design

  11. miRge - A Multiplexed Method of Processing Small RNA-Seq Data to Determine MicroRNA Entropy.

    Directory of Open Access Journals (Sweden)

    Alexander S Baras

    Small RNA RNA-seq for microRNAs (miRNAs) is a rapidly developing field where opportunities still exist to create better bioinformatics tools to process these large datasets and generate new, useful analyses. We built miRge to be a fast, smart small RNA-seq solution to process samples in a highly multiplexed fashion. miRge employs a Bayesian alignment approach, whereby reads are sequentially aligned against customized mature miRNA, hairpin miRNA, noncoding RNA and mRNA sequence libraries. miRNAs are summarized at the level of raw reads in addition to reads per million (RPM). Reads for all other RNA species (tRNA, rRNA, snoRNA, mRNA) are provided, which is useful for identifying potential contaminants and optimizing small RNA purification strategies. miRge was designed to optimally identify miRNA isomiRs and employs an entropy based statistical measurement to identify differential production of isomiRs. This allowed us to identify decreasing entropy in isomiRs as stem cells mature into retinal pigment epithelial cells. Conversely, we show that pancreatic tumor miRNAs have similar entropy to matched normal pancreatic tissues. In a head-to-head comparison with other miRNA analysis tools (miRExpress 2.0, sRNAbench, omiRAs, miRDeep2, Chimira, UEA small RNA Workbench), miRge was faster (4 to 32-fold) and was among the top-two methods in maximally aligning miRNAs reads per sample. Moreover, miRge has no inherent limits to its multiplexing. miRge was capable of simultaneously analyzing 100 small RNA-Seq samples in 52 minutes, providing an integrated analysis of miRNA expression across all samples. As miRge was designed for analysis of single as well as multiple samples, miRge is an ideal tool for high and low-throughput users. miRge is freely available at http://atlas.pathology.jhu.edu/baras/miRge.html.
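
    Two quantities named in this abstract, reads per million (RPM) and an entropy measure over isomiRs, can be sketched in a few lines. The abstract does not give miRge's exact entropy formula, so the sketch assumes plain Shannon entropy over the read counts of one miRNA's isomiRs; the function names and the toy counts are hypothetical.

```python
import math


def reads_per_million(count, total_mapped_reads):
    """Scale a raw read count to reads per million mapped reads."""
    return count * 1_000_000 / total_mapped_reads


def shannon_entropy(isomir_counts):
    """Shannon entropy (in bits) of the isomiR read-count distribution.

    Assumption: miRge's own measure may be defined differently.
    """
    total = sum(isomir_counts)
    if total == 0:
        return 0.0
    entropy = 0.0
    for c in isomir_counts:
        if c > 0:
            p = c / total
            entropy -= p * math.log2(p)
    return entropy


# Toy isomiR counts for one miRNA in two samples (invented numbers).
evenly_spread = [25, 25, 25, 25]  # reads split evenly over four isomiRs
one_dominant = [97, 1, 1, 1]      # one isomiR accounts for most reads

h_even = shannon_entropy(evenly_spread)
h_dominant = shannon_entropy(one_dominant)
rpm = reads_per_million(50, 2_000_000)
print(round(h_even, 3), round(h_dominant, 3), rpm)
```

    Under this assumed definition, a drop in entropy means that reads concentrate on fewer isomiR forms, which is how the reported decrease during stem cell maturation would be read.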

  12. mRNA-Seq of single prostate cancer circulating tumor cells reveals recapitulation of gene expression and pathways found in prostate cancer.

    Directory of Open Access Journals (Sweden)

    Gordon M Cann

    Circulating tumor cells (CTC) mediate metastatic spread of many solid tumors and enumeration of CTCs is currently used as a prognostic indicator of survival in metastatic prostate cancer patients. Some evidence suggests that it is possible to derive additional information about tumors from expression analysis of CTCs, but the technical difficulty of isolating and analyzing individual CTCs has limited progress in this area. To assess the ability of a new generation of MagSweeper to isolate intact CTCs for downstream analysis, we performed mRNA-Seq on single CTCs isolated from the blood of patients with metastatic prostate cancer and on single prostate cancer cell line LNCaP cells spiked into the blood of healthy donors. We found that the MagSweeper effectively isolated CTCs with a capture efficiency that matched the CellSearch platform. However, unlike CellSearch, the MagSweeper facilitates isolation of individual live CTCs without contaminating leukocytes. Importantly, mRNA-Seq analysis showed that the MagSweeper isolation process did not have a discernible impact on the transcriptional profile of single LNCaPs isolated from spiked human blood, suggesting that any perturbations caused by the MagSweeper process on the transcriptional signature of isolated cells are modest. Although the RNA from patient CTCs showed signs of significant degradation, consistent with reports of short half-lives and apoptosis amongst CTCs, transcriptional signatures of prostate tissue and of cancer were readily detectable with single CTC mRNA-Seq. These results demonstrate that the MagSweeper provides access to intact CTCs and that these CTCs can potentially supply clinically relevant information.

  13. It's DE-licious: A Recipe for Differential Expression Analyses of RNA-seq Experiments Using Quasi-Likelihood Methods in edgeR.

    Science.gov (United States)

    Lun, Aaron T L; Chen, Yunshun; Smyth, Gordon K

    2016-01-01

    RNA sequencing (RNA-seq) is widely used to profile transcriptional activity in biological systems. Here we present an analysis pipeline for differential expression analysis of RNA-seq experiments using the Rsubread and edgeR software packages. The basic pipeline includes read alignment and counting, filtering and normalization, modelling of biological variability and hypothesis testing. For hypothesis testing, we describe particularly the quasi-likelihood features of edgeR. Some more advanced downstream analysis steps are also covered, including complex comparisons, gene ontology enrichment analyses and gene set testing. The code required to run each step is described, along with an outline of the underlying theory. The chapter includes a case study in which the pipeline is used to study the expression profiles of mammary gland cells in virgin, pregnant and lactating mice.
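The filtering and normalization step of such a pipeline can be pictured with a small plain-Python sketch. It is not edgeR or Rsubread code: it only computes counts per million from library sizes and keeps genes that reach a minimum CPM in a minimum number of samples. The thresholds, gene names and counts are hypothetical.

```python
def counts_per_million(counts, lib_sizes):
    """Convert raw counts to counts per million (CPM).

    counts: {gene: [raw count in each sample]}
    lib_sizes: total mapped reads in each sample, in the same order
    """
    return {
        gene: [c * 1e6 / n for c, n in zip(row, lib_sizes)]
        for gene, row in counts.items()
    }


def filter_low_expression(counts, lib_sizes, min_cpm=1.0, min_samples=2):
    """Keep genes whose CPM reaches min_cpm in at least min_samples samples.

    Both thresholds are illustrative defaults, not values from the chapter.
    """
    cpm = counts_per_million(counts, lib_sizes)
    return {
        gene: row
        for gene, row in counts.items()
        if sum(v >= min_cpm for v in cpm[gene]) >= min_samples
    }


# Hypothetical count table: three samples with different sequencing depths.
counts = {"geneA": [100, 120, 90], "geneB": [0, 1, 0]}
lib_sizes = [1_000_000, 2_000_000, 1_500_000]

cpm = counts_per_million(counts, lib_sizes)
kept = filter_low_expression(counts, lib_sizes)
```

Filtering on CPM rather than raw counts keeps the rule comparable across samples sequenced to different depths.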

  14. RNA-seq for gene identification and transcript profiling in relation to root growth of bermudagrass (Cynodon dactylon) under salinity stress.

    Science.gov (United States)

    Hu, Longxing; Li, Huiying; Chen, Liang; Lou, Yanhong; Amombo, Erick; Fu, Jinmin

    2015-08-04

    Soil salinity is one of the most significant abiotic stresses affecting plant shoot and root growth. The adjustment of root architecture to spatio-temporal heterogeneity in salinity is particularly critical for plant growth and survival. Bermudagrass (Cynodon dactylon) is a widely used turf and forage perennial grass with a high degree of salinity tolerance. Salinity appears to stimulate the growth of roots and decrease their mortality in tolerant bermudagrass. To estimate a broad spectrum of genes related to root elongation affected by salt stress and the molecular mechanisms that control the positive response of root architecture to salinity, we analyzed the transcriptome of bermudagrass root tips in response to salinity. RNA-sequencing was performed in root tips of two bermudagrass genotypes contrasting in salt tolerance. A total of 237,850,130 high quality clean reads were generated and 250,359 transcripts were assembled with an average length of 1115 bp. In total, 103,324 unigenes were obtained, of which 53,765 (52%) were successfully annotated in databases. Bioinformatics analysis indicated that major transcription factor (TF) families linked to stress responses and growth regulation (MYB, bHLH, WRKY) were differentially expressed in root tips of bermudagrass under salinity. In addition, genes related to cell wall loosening and stiffening (xyloglucan endotransglucosylase/hydrolases, peroxidases) were identified. RNA-seq analysis identified candidate genes encoding TFs involved in the regulation of lignin synthesis, reactive oxygen species (ROS) homeostasis controlled by peroxidases, and the regulation of phytohormone signaling that promote cell wall loosening and therefore root growth under salinity.

  15. Linnorm: improved statistical analysis for single cell RNA-seq expression data.

    Science.gov (United States)

    Yip, Shun H; Wang, Panwen; Kocher, Jean-Pierre A; Sham, Pak Chung; Wang, Junwen

    2017-12-15

    Linnorm is a novel normalization and transformation method for the analysis of single cell RNA sequencing (scRNA-seq) data. Linnorm was developed to remove technical noise while preserving biological variation in scRNA-seq data, such that existing statistical methods can be improved. Using real scRNA-seq data, we compared Linnorm with existing normalization methods, including NODES, SAMstrt, SCnorm, scran, DESeq and TMM. Linnorm shows advantages in speed, technical noise removal and preservation of cell heterogeneity, which can improve existing methods in the discovery of novel subtypes, pseudo-temporal ordering of cells, clustering analysis, etc. Linnorm also performs better than existing DEG analysis methods, including BASiCS, NODES, SAMstrt, Seurat and DESeq2, in false positive rate control and accuracy. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.

  16. Guidance for RNA-seq co-expression network construction and analysis: safety in numbers.

    Science.gov (United States)

    Ballouz, S; Verleyen, W; Gillis, J

    2015-07-01

    RNA-seq co-expression analysis is in its infancy and reasonable practices remain poorly defined. We assessed a variety of RNA-seq expression data to determine factors affecting functional connectivity and topology in co-expression networks. We examined RNA-seq co-expression data generated from 1970 RNA-seq samples using a Guilt-By-Association framework, in which genes are assessed for the tendency of co-expression to reflect shared function. Minimal experimental criteria to obtain performance on par with microarrays were >20 samples with read depth >10 M per sample. While the aggregate network constructed shows good performance (area under the receiver operator characteristic curve ∼0.71), the dependency on number of experiments used is nearly identical to that present in microarrays, suggesting thousands of samples are required to obtain 'gold-standard' co-expression. We find a major topological difference between RNA-seq and microarray co-expression in the form of low overlaps between hub-like genes from each network due to changes in the correlation of expression noise within each technology. Contact: jgillis@cshl.edu or sballouz@cshl.edu. Networks are available at: http://gillislab.labsites.cshl.edu/supplements/rna-seq-networks/ and supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press.
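The Guilt-By-Association idea, scoring how well co-expression recovers shared function and summarizing it as an AUROC, can be sketched in a few lines. This is a simplified illustration rather than the authors' procedure: it uses Pearson correlation, scores each gene by its mean correlation with the annotated genes other than itself, and skips the cross-validation a real evaluation would need. Gene names and expression values are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def neighbor_vote_scores(expr, annotated):
    """Score each gene by its mean correlation with annotated genes, excluding itself."""
    scores = {}
    for gene, profile in expr.items():
        others = [h for h in annotated if h != gene]
        scores[gene] = sum(pearson(profile, expr[h]) for h in others) / len(others)
    return scores


def auroc(scores, positives):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for g, s in scores.items() if g in positives]
    neg = [s for g, s in scores.items() if g not in positives]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical expression profiles over four samples; g1-g3 share a function.
expr = {
    "g1": [1.0, 2.0, 3.0, 4.0],
    "g2": [2.0, 4.0, 6.0, 8.5],
    "g3": [1.5, 2.5, 3.4, 4.6],
    "g4": [4.0, 3.0, 2.0, 1.0],
    "g5": [3.0, 1.0, 4.0, 1.0],
}
annotated = {"g1", "g2", "g3"}

scores = neighbor_vote_scores(expr, annotated)
performance = auroc(scores, annotated)
```

An AUROC of 0.5 would mean co-expression carries no information about the function, so the ∼0.71 reported for the aggregate network sits between chance and perfect recovery.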

  17. Nuclear RNA sequencing of the mouse erythroid cell transcriptome.

    Directory of Open Access Journals (Sweden)

    Jennifer A Mitchell

    In addition to protein coding genes a substantial proportion of mammalian genomes are transcribed. However, most transcriptome studies investigate steady-state mRNA levels, ignoring a considerable fraction of the transcribed genome. In addition, steady-state mRNA levels are influenced by both transcriptional and posttranscriptional mechanisms, and thus do not provide a clear picture of transcriptional output. Here, using deep sequencing of nuclear RNAs (nucRNA-Seq) in parallel with chromatin immunoprecipitation sequencing (ChIP-Seq) of active RNA polymerase II, we compared the nuclear transcriptome of mouse anemic spleen erythroid cells with polymerase occupancy on a genome-wide scale. We demonstrate that unspliced transcripts quantified by nucRNA-seq correlate with primary transcript frequencies measured by RNA FISH, but differ from steady-state mRNA levels measured by poly(A)-enriched RNA-seq. Highly expressed protein coding genes showed good correlation between RNAPII occupancy and transcriptional output; however, genome-wide we observed a poor correlation between transcriptional output and RNAPII association. This poor correlation is due to intergenic regions associated with RNAPII which correspond with transcription factor bound regulatory regions and a group of stable, nuclear-retained long non-coding transcripts. In conclusion, sequencing the nuclear transcriptome provides an opportunity to investigate the transcriptional landscape in a given cell type through quantification of unspliced primary transcripts and the identification of nuclear-retained long non-coding RNAs.

  18. Playing hide and seek with repeats in local and global de novo transcriptome assembly of short RNA-seq reads.

    Science.gov (United States)

    Lima, Leandro; Sinaimeri, Blerina; Sacomoto, Gustavo; Lopez-Maestre, Helene; Marchet, Camille; Miele, Vincent; Sagot, Marie-France; Lacroix, Vincent

    2017-01-01

    The main challenge in de novo genome assembly of DNA-seq data is certainly to deal with repeats that are longer than the reads. In de novo transcriptome assembly of RNA-seq reads, on the other hand, this problem has been underestimated so far. Even though we have fewer and shorter repeated sequences in transcriptomics, they do create ambiguities and confuse assemblers if not addressed properly. Most transcriptome assemblers of short reads are based on de Bruijn graphs (DBG) and have no clear and explicit model for repeats in RNA-seq data, relying instead on heuristics to deal with them. The results of this work are threefold. First, we introduce a formal model for representing high copy-number and low-divergence repeats in RNA-seq data and exploit its properties to infer a combinatorial characteristic of repeat-associated subgraphs. We show that the problem of identifying such subgraphs in a DBG is NP-complete. Second, we show that in the specific case of local assembly of alternative splicing (AS) events, we can implicitly avoid such subgraphs, and we present an efficient algorithm to enumerate AS events that are not included in repeats. Using simulated data, we show that this strategy is significantly more sensitive and precise than the previous version of KisSplice (Sacomoto et al. in WABI, pp 99-111, 1), Trinity (Grabherr et al. in Nat Biotechnol 29(7):644-652, 2), and Oases (Schulz et al. in Bioinformatics 28(8):1086-1092, 3), for the specific task of calling AS events. Third, we turn our focus to full-length transcriptome assembly, and we show that exploring the topology of DBGs can improve de novo transcriptome evaluation methods. Based on the observation that repeats create complicated regions in a DBG, and when assemblers try to traverse these regions, they can infer erroneous transcripts, we propose a measure to flag transcripts traversing such troublesome regions, thereby giving a confidence level for each transcript. The originality of our work when

  19. eRNA: a graphic user interface-based tool optimized for large data analysis from high-throughput RNA sequencing.

    Science.gov (United States)

    Yuan, Tiezheng; Huang, Xiaoyi; Dittmar, Rachel L; Du, Meijun; Kohli, Manish; Boardman, Lisa; Thibodeau, Stephen N; Wang, Liang

    2014-03-05

    RNA sequencing (RNA-seq) is emerging as a critical approach in biological research. However, its high-throughput advantage is significantly limited by the capacity of bioinformatics tools. The research community urgently needs user-friendly tools to efficiently analyze the complicated data generated by high throughput sequencers. We developed a standalone tool with graphic user interface (GUI)-based analytic modules, known as eRNA. The capacity of performing parallel processing and sample management facilitates large data analyses by maximizing hardware usage and freeing users from tediously handling sequencing data. The module "miRNA identification" includes GUIs for raw data reading, adapter removal, sequence alignment, and read counting. The module "mRNA identification" includes GUIs for reference sequences, genome mapping, transcript assembling, and differential expression. The module "Target screening" provides expression profiling analyses and graphic visualization. The module "Self-testing" offers the directory setups, sample management, and a check for third-party package dependency. Integration of other GUIs including Bowtie, miRDeep2, and miRspring extends the program's functionality. eRNA focuses on the common tools required for the mapping and quantification analysis of miRNA-seq and mRNA-seq data. The software package provides an additional choice for scientists who require a user-friendly computing environment and high-throughput capacity for large data analysis. eRNA is available for free download at https://sourceforge.net/projects/erna/?source=directory.

  20. The nuclear receptor ERβ engages AGO2 in regulation of gene transcription, RNA splicing and RISC loading.

    Science.gov (United States)

    Tarallo, Roberta; Giurato, Giorgio; Bruno, Giuseppina; Ravo, Maria; Rizzo, Francesca; Salvati, Annamaria; Ricciardi, Luca; Marchese, Giovanna; Cordella, Angela; Rocco, Teresa; Gigantino, Valerio; Pierri, Biancamaria; Cimmino, Giovanni; Milanesi, Luciano; Ambrosino, Concetta; Nyman, Tuula A; Nassa, Giovanni; Weisz, Alessandro

    2017-10-06

    The RNA-binding protein Argonaute 2 (AGO2) is a key effector of RNA-silencing pathways. It exerts a pivotal role in microRNA maturation and activity and can modulate chromatin remodeling, transcriptional gene regulation and RNA splicing. Estrogen receptor beta (ERβ) is endowed with oncosuppressive activities, antagonizing hormone-induced carcinogenesis and inhibiting growth and oncogenic functions in luminal-like breast cancers (BCs), where its expression correlates with a better prognosis of the disease. Applying interaction proteomics coupled to mass spectrometry to characterize nuclear factors cooperating with ERβ in gene regulation, we identify AGO2 as a novel partner of ERβ in human BC cells. ERβ-AGO2 association was confirmed in vitro and in vivo in both the nucleus and cytoplasm and is shown to be RNA-mediated. ChIP-Seq demonstrates AGO2 association with a large number of ERβ binding sites, and total and nascent RNA-Seq in ERβ+ vs ERβ- cells, and before and after AGO2 knock-down in ERβ+ cells, reveals a widespread involvement of this factor in ERβ-mediated regulation of gene transcription rate and RNA splicing. Moreover, isolation and sequencing by RIP-Seq of ERβ-associated long and small RNAs in the cytoplasm suggests involvement of the nuclear receptor in RISC loading, indicating that it may also be able to directly control mRNA translation efficiency and stability. These results demonstrate that AGO2 can act as a pleiotropic functional partner of ERβ, indicating that both factors are endowed with multiple roles in the control of key cellular functions.

  1. An improved ChIP-seq peak detection system for simultaneously identifying post-translational modified transcription factors by combinatorial fusion, using SUMOylation as an example.

    Science.gov (United States)

    Cheng, Chia-Yang; Chu, Chia-Han; Hsu, Hung-Wei; Hsu, Fang-Rong; Tang, Chung Yi; Wang, Wen-Ching; Kung, Hsing-Jien; Chang, Pei-Ching

    2014-01-01

    Post-translational modification (PTM) of transcriptional factors and chromatin remodelling proteins is recognized as a major mechanism by which transcriptional regulation occurs. Chromatin immunoprecipitation (ChIP) in combination with high-throughput sequencing (ChIP-seq) is being applied as a gold standard when studying the genome-wide binding sites of transcription factors (TFs). This has greatly improved our understanding of protein-DNA interactions on a genome-wide scale. However, current ChIP-seq peak calling tools are not sufficiently sensitive and are unable to simultaneously identify post-translationally modified TFs based on ChIP-seq analysis; this is largely due to the widespread presence of multiple modified TFs. Using SUMO-1 modification as an example, we describe here an improved approach that allows the simultaneous identification of the particular genomic binding regions of all TFs with SUMO-1 modification. Traditional peak calling methods are inadequate when identifying multiple TF binding sites that involve long genomic regions and therefore we designed a ChIP-seq processing pipeline for the detection of peaks via a combinatorial fusion method. Then, we annotate the peaks with known transcription factor binding sites (TFBS) using the Transfac Matrix Database (v7.0), which predicts potential SUMOylated TFs. Next, the peak calling result was further analyzed based on the promoter proximity, TFBS annotation, a literature review, and was validated by ChIP-real-time quantitative PCR (qPCR) and ChIP-reChIP real-time qPCR. The results show clearly that SUMOylated TFs are able to be pinpointed using our pipeline. A methodology is presented that analyzes SUMO-1 ChIP-seq patterns and predicts related TFs. Our analysis uses three peak calling tools. The fusion of these different tools increases the precision of the peak calling results. The TFBS annotation method is able to predict potential SUMOylated TFs. Here, we offer a new approach that enhances ChIP-seq

  2. An RNA-seq transcriptome analysis of histone modifiers and RNA silencing genes in soybean during floral initiation process.

    Directory of Open Access Journals (Sweden)

    Lim Chee Liew

    Epigenetics has been recognised to play vital roles in many plant developmental processes, including floral initiation through the epigenetic regulation of gene expression. The histone modifying proteins that mediate these modifications involve the SET domain-containing histone methyltransferases, JmjC domain-containing demethylases, acetylases and deacetylases. In addition, RNA interference (RNAi)-associated genes are also involved in epigenetic regulation via RNA-directed DNA methylation and post-transcriptional gene silencing. Soybean, a major crop legume, requires a short day to induce flowering. How histone modifications regulate the plant response to external cues that initiate flowering is still largely unknown. Here, we used RNA-seq to address the dynamics of transcripts that are potentially involved in the epigenetic programming and RNAi mediated gene silencing during the floral initiation of soybean. Soybean is a paleopolyploid that has been subjected to at least two rounds of whole genome duplication events. We report that the expanded genomic repertoire of histone modifiers and RNA silencing genes in soybean includes 14 histone acetyltransferases, 24 histone deacetylases, 47 histone methyltransferases, 15 protein arginine methyltransferases, 24 JmjC domain-containing demethylases and 47 RNAi-associated genes. To investigate the role of these histone modifiers and RNA silencing genes during floral initiation, we compared the transcriptional dynamics of the leaf and shoot apical meristem at different time points after a short-day treatment. Our data reveal that the extensive activation of genes that are usually involved in the epigenetic programming and RNAi gene silencing in the soybean shoot apical meristem are reprogrammed for floral development following an exposure to inductive conditions.

  3. voomDDA: discovery of diagnostic biomarkers and classification of RNA-seq data

    Directory of Open Access Journals (Sweden)

    Gokmen Zararsiz

    2017-10-01

    RNA-Seq is a recent and efficient technique that uses the capabilities of next-generation sequencing technology for characterizing and quantifying transcriptomes. One important task using gene-expression data is to identify a small subset of genes that can be used to build diagnostic classifiers particularly for cancer diseases. Microarray based classifiers are not directly applicable to RNA-Seq data due to its discrete nature. Overdispersion is another problem that requires careful modeling of mean and variance relationship of the RNA-Seq data. In this study, we present voomDDA classifiers: variance modeling at the observational level (voom) extensions of the nearest shrunken centroids (NSC) and the diagonal discriminant classifiers. VoomNSC is one of these classifiers and brings voom and NSC approaches together for the purpose of gene-expression based classification. For this purpose, we propose weighted statistics and put these weighted statistics into the NSC algorithm. The VoomNSC is a sparse classifier that models the mean-variance relationship using the voom method and incorporates voom's precision weights into the NSC classifier via weighted statistics. A comprehensive simulation study was designed and four real datasets are used for performance assessment. The overall results indicate that voomNSC performs as the sparsest classifier. It also provides the most accurate results together with power-transformed Poisson linear discriminant analysis, rlog transformed support vector machines and random forests algorithms. In addition to prediction purposes, the voomNSC classifier can be used to identify the potential diagnostic biomarkers for a condition of interest. Through this work, statistical learning methods proposed for microarrays can be reused for RNA-Seq data. An interactive web application is freely available at http://www.biosoft.hacettepe.edu.tr/voomDDA/.

  4. voomDDA: discovery of diagnostic biomarkers and classification of RNA-seq data.

    Science.gov (United States)

    Zararsiz, Gokmen; Goksuluk, Dincer; Klaus, Bernd; Korkmaz, Selcuk; Eldem, Vahap; Karabulut, Erdem; Ozturk, Ahmet

    2017-01-01

    RNA-Seq is a recent and efficient technique that uses the capabilities of next-generation sequencing technology for characterizing and quantifying transcriptomes. One important task using gene-expression data is to identify a small subset of genes that can be used to build diagnostic classifiers particularly for cancer diseases. Microarray based classifiers are not directly applicable to RNA-Seq data due to its discrete nature. Overdispersion is another problem that requires careful modeling of mean and variance relationship of the RNA-Seq data. In this study, we present voomDDA classifiers: variance modeling at the observational level (voom) extensions of the nearest shrunken centroids (NSC) and the diagonal discriminant classifiers. VoomNSC is one of these classifiers and brings voom and NSC approaches together for the purpose of gene-expression based classification. For this purpose, we propose weighted statistics and put these weighted statistics into the NSC algorithm. The VoomNSC is a sparse classifier that models the mean-variance relationship using the voom method and incorporates voom's precision weights into the NSC classifier via weighted statistics. A comprehensive simulation study was designed and four real datasets are used for performance assessment. The overall results indicate that voomNSC performs as the sparsest classifier. It also provides the most accurate results together with power-transformed Poisson linear discriminant analysis, rlog transformed support vector machines and random forests algorithms. In addition to prediction purposes, the voomNSC classifier can be used to identify the potential diagnostic biomarkers for a condition of interest. Through this work, statistical learning methods proposed for microarrays can be reused for RNA-Seq data. An interactive web application is freely available at http://www.biosoft.hacettepe.edu.tr/voomDDA/.
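The combination of precision weights with shrunken centroids can be pictured with a deliberately simplified sketch for a single gene. It assumes per-observation weights are already available (in voom they come from the fitted mean-variance trend), uses weighted means in place of ordinary means, and soft-thresholds each class offset. The standardization by pooled standard deviation used in nearest shrunken centroids is left out, so this is not the authors' exact statistic; all values are hypothetical.

```python
def weighted_mean(values, weights):
    """Mean of values where each observation contributes in proportion to its weight."""
    return sum(w * v for v, w in zip(values, weights)) / sum(weights)


def soft_threshold(d, delta):
    """Shrink d toward zero by delta; anything within delta of zero becomes exactly 0."""
    if d > delta:
        return d - delta
    if d < -delta:
        return d + delta
    return 0.0


def shrunken_offsets(values, weights, labels, delta):
    """Per-class offset of the weighted class mean from the weighted overall mean,
    after soft-thresholding. A gene whose offsets are all zero drops out of the classifier."""
    overall = weighted_mean(values, weights)
    offsets = {}
    for k in sorted(set(labels)):
        idx = [i for i, lab in enumerate(labels) if lab == k]
        class_mean = weighted_mean(
            [values[i] for i in idx], [weights[i] for i in idx]
        )
        offsets[k] = soft_threshold(class_mean - overall, delta)
    return offsets


labels = ["a", "a", "b", "b"]
weights = [2.0, 1.0, 1.0, 2.0]  # hypothetical precision weights

# A gene that separates the classes keeps non-zero offsets...
informative = shrunken_offsets([1.0, 1.0, 5.0, 5.0], weights, labels, delta=1.0)
# ...while a gene with near-identical class means is shrunk away entirely.
flat = shrunken_offsets([3.0, 3.2, 3.1, 2.9], weights, labels, delta=1.0)
```

Soft-thresholding is what makes the classifier sparse: raising `delta` removes more genes, which matches the abstract's description of voomNSC as the sparsest classifier.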

  5. Defining the maize transcriptome de novo using deep RNA-Seq

    Energy Technology Data Exchange (ETDEWEB)

    Martin, Jeffrey; Gross, Stephen; Choi, Cindy; Zhang, Tao; Lindquist, Erika; Wei, Chia-Lin; Wang, Zhong

    2011-06-01

    De novo assembly of the transcriptome is crucial for functional genomics studies in bioenergy research, since many of the organisms lack high quality reference genomes. In a previous study we successfully de novo assembled simple eukaryote transcriptomes exclusively from short Illumina RNA-Seq reads [1]. However, extensive alternative splicing, present in most of the higher eukaryotes, poses a significant challenge for current short read assembly processes. Furthermore, the size of next-generation datasets, often large for plant genomes, presents an informatics challenge. To tackle these challenges we present a combined experimental and informatics strategy for de novo assembly in higher eukaryotes. Using maize as a test case, preliminary results suggest our approach can resolve transcript variants and improve gene annotations.

  6. Defining the maize transcriptome de novo using deep RNA-Seq

    Energy Technology Data Exchange (ETDEWEB)

    Martin, Jeffrey; Gross, Stephen; Choi, Cindy; Zhang, Tao; Lindquist, Erika; Wei, Chia-Lin; Wang, Zhong

    2011-06-02

    De novo assembly of the transcriptome is crucial for functional genomics studies in bioenergy research, since many of the organisms lack high quality reference genomes. In a previous study we successfully de novo assembled simple eukaryote transcriptomes exclusively from short Illumina RNA-Seq reads [1]. However, extensive alternative splicing, present in most of the higher eukaryotes, poses a significant challenge for current short read assembly processes. Furthermore, the size of next-generation datasets, often large for plant genomes, presents an informatics challenge. To tackle these challenges we present a combined experimental and informatics strategy for de novo assembly in higher eukaryotes. Using maize as a test case, preliminary results suggest our approach can resolve transcript variants and improve gene annotations.

  7. Transcriptional profiling of endocrine cerebro-osteodysplasia using microarray and next-generation sequencing.

    Directory of Open Access Journals (Sweden)

    Piya Lahiry

    BACKGROUND: Transcriptome profiling of patterns of RNA expression is a powerful approach to identify networks of genes that play a role in disease. To date, most mRNA profiling of tissues has been accomplished using microarrays, but next-generation sequencing can offer a richer and more comprehensive picture. METHODOLOGY/PRINCIPAL FINDINGS: ECO is a rare multi-system developmental disorder caused by a homozygous mutation in ICK encoding intestinal cell kinase. We performed gene expression profiling using both cDNA microarrays and next-generation mRNA sequencing (mRNA-seq) of skin fibroblasts from ECO-affected subjects. We then validated a subset of differentially expressed transcripts identified by each method using quantitative reverse transcription-polymerase chain reaction (qRT-PCR). Finally, we used gene ontology (GO) to identify critical pathways and processes that were abnormal according to each technical platform. Methodologically, mRNA-seq identifies a much larger number of differentially expressed genes with much better correlation to qRT-PCR results than the microarray (r² = 0.794 and 0.137, respectively). Biologically, cDNA microarray identified functional pathways focused on anatomical structure and development, while the mRNA-seq platform identified a higher proportion of genes involved in cell division and DNA replication pathways. CONCLUSIONS/SIGNIFICANCE: Transcriptome profiling with mRNA-seq had greater sensitivity, range and accuracy than the microarray. The two platforms generated different but complementary hypotheses for further evaluation.
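The r² values quoted above compare each platform's measurements with qRT-PCR. A minimal sketch of that comparison, assuming r² is the squared Pearson correlation between paired fold-change estimates, follows. The numbers are invented for illustration and are not data from the study.

```python
def r_squared(x, y):
    """Squared Pearson correlation between two paired measurement series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("r squared is undefined when a series is constant")
    return (sxy * sxy) / (sxx * syy)


# Hypothetical log fold changes for four validated transcripts.
qpcr = [1.0, 2.0, 3.0, 4.0]
platform_a = [2.0, 4.0, 6.0, 8.0]  # tracks qRT-PCR closely
platform_b = [1.0, 3.0, 2.0, 2.0]  # tracks qRT-PCR poorly

agreement_a = r_squared(qpcr, platform_a)
agreement_b = r_squared(qpcr, platform_b)
```

Read this way, the reported 0.794 for mRNA-seq means that about 79% of the variance in the qRT-PCR measurements is linearly accounted for by the sequencing estimates, against about 14% for the microarray.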

  8. Mining RNA-seq data for infections and contaminations.

    Directory of Open Access Journals (Sweden)

    Thomas Bonfert

    RNA sequencing (RNA-seq) provides novel opportunities for transcriptomic studies at nucleotide resolution, including transcriptomics of viruses or microbes infecting a cell. However, standard approaches for mapping the resulting sequencing reads generally ignore alternative sources of expression other than the host cell and are little equipped to address the problems arising from redundancies and gaps among sequenced microbe and virus genomes. We show that screening of sequencing reads for contaminations and infections can be performed easily using ContextMap, our recently developed mapping software. Based on mapping-derived statistics, mapping confidence, similarities and misidentifications (e.g. due to missing genome sequences of species/strains) can be assessed. Performance of our approach is evaluated on three real-life sequencing data sets and compared to state-of-the-art metagenomics tools. In particular, ContextMap vastly outperformed GASiC and GRAMMy in terms of runtime. In contrast to MEGAN4, it was capable of providing individual read mappings to species and resolving non-unique mappings, thus allowing the identification of misalignments caused by sequence similarities between genomes and missing genome sequences. Our study illustrates the importance and potentials of routinely mining RNA-seq experiments for infections or contaminations by microbes and viruses. By using ContextMap, gene expression of infecting agents can be analyzed and novel insights in infection processes and tumorigenesis can be obtained.

  9. Directional RNA deep sequencing sheds new light on the transcriptional response of Anabaena sp. strain PCC 7120 to combined-nitrogen deprivation

    Directory of Open Access Journals (Sweden)

    Head Steven R

    2011-06-01

    Background: Cyanobacteria are potential sources of renewable chemicals and biofuels and serve as model organisms for bacterial photosynthesis, nitrogen fixation, and responses to environmental changes. Anabaena (Nostoc) sp. strain PCC 7120 (hereafter Anabaena) is a multicellular filamentous cyanobacterium that can "fix" atmospheric nitrogen into ammonia when grown in the absence of a source of combined nitrogen. Because the nitrogenase enzyme is oxygen sensitive, Anabaena forms specialized cells called heterocysts that create a microoxic environment for nitrogen fixation. We have employed directional RNA-seq to map the Anabaena transcriptome during vegetative cell growth and in response to combined-nitrogen deprivation, which induces filaments to undergo heterocyst development. Our data provide an unprecedented view of transcriptional changes in Anabaena filaments during the induction of heterocyst development and transition to diazotrophic growth. Results: Using the Illumina short read platform and a directional RNA-seq protocol, we obtained deep sequencing data for RNA extracted from filaments at 0, 6, 12, and 21 hours after the removal of combined nitrogen. The RNA-seq data provided information on transcript abundance and boundaries for the entire transcriptome. From these data, we detected novel antisense transcripts within the UTRs (untranslated regions) and coding regions of key genes involved in heterocyst development, suggesting that antisense RNAs may be important regulators of the nitrogen response. In addition, many 5' UTRs were longer than anticipated, sometimes extending into upstream open reading frames (ORFs), and operons often showed complex structure and regulation. Finally, many genes that had not been previously identified as being involved in heterocyst development showed regulation, providing new candidates for future studies in this model organism. Conclusions: Directional RNA-seq data were obtained that provide

  10. Genome-wide mRNA processing in methanogenic archaea reveals post-transcriptional regulation of ribosomal protein synthesis.

    Science.gov (United States)

    Qi, Lei; Yue, Lei; Feng, Deqin; Qi, Fengxia; Li, Jie; Dong, Xiuzhu

    2017-07-07

    Unlike stable RNAs that require processing for maturation, prokaryotic cellular mRNAs generally follow an 'all-or-none' pattern. Herein, we used a 5΄ monophosphate transcript sequencing (5΄P-seq) that specifically captured the 5΄-end of processed transcripts and mapped the genome-wide RNA processing sites (PSSs) in a methanogenic archaeon. Following statistical analysis and stringent filtration, we identified 1429 PSSs, among which 23.5% and 5.4% were located in 5΄ untranslated region (uPSS) and intergenic region (iPSS), respectively. A predominant uridine downstream PSSs served as a processing signature. Remarkably, 5΄P-seq detected overrepresented uPSS and iPSS in the polycistronic operons encoding ribosomal proteins, and the majority upstream and proximal ribosome binding sites, suggesting a regulatory role of processing on translation initiation. The processed transcripts showed increased stability and translation efficiency. Particularly, processing within the tricistronic transcript of rplA-rplJ-rplL enhanced the translation of rplL, which can provide a driving force for the 1:4 stoichiometry of L10 to L12 in the ribosome. Growth-associated mRNA processing intensities were also correlated with the cellular ribosomal protein levels, thereby suggesting that mRNA processing is involved in tuning growth-dependent ribosome synthesis. In conclusion, our findings suggest that mRNA processing-mediated post-transcriptional regulation is a potential mechanism of ribosomal protein synthesis and stoichiometry. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.

  11. RNA-Seq analysis and annotation of a draft blueberry genome assembly identifies candidate genes involved in fruit ripening, biosynthesis of bioactive compounds, and stage-specific alternative splicing.

    Science.gov (United States)

    Gupta, Vikas; Estrada, April D; Blakley, Ivory; Reid, Rob; Patel, Ketan; Meyer, Mason D; Andersen, Stig Uggerhøj; Brown, Allan F; Lila, Mary Ann; Loraine, Ann E

    2015-01-01

    Blueberries are a rich source of antioxidants and other beneficial compounds that can protect against disease. Identifying genes involved in synthesis of bioactive compounds could enable the breeding of berry varieties with enhanced health benefits. Toward this end, we annotated a previously sequenced draft blueberry genome assembly using RNA-Seq data from five stages of berry fruit development and ripening. Genome-guided assembly of RNA-Seq read alignments combined with output from ab initio gene finders produced around 60,000 gene models, of which more than half were similar to proteins from other species, typically the grape Vitis vinifera. Comparison of gene models to the PlantCyc database of metabolic pathway enzymes identified candidate genes involved in synthesis of bioactive compounds, including bixin, an apocarotenoid with potential disease-fighting properties, and defense-related cyanogenic glycosides, which are toxic. Cyanogenic glycoside (CG) biosynthetic enzymes were highly expressed in green fruit, and a candidate CG detoxification enzyme was up-regulated during fruit ripening. Candidate genes for ethylene, anthocyanin, and 400 other biosynthetic pathways were also identified. Homology-based annotation using Blast2GO and InterPro assigned Gene Ontology terms to around 15,000 genes. RNA-Seq expression profiling showed that blueberry growth, maturation, and ripening involve dynamic gene expression changes, including coordinated up- and down-regulation of metabolic pathway enzymes and transcriptional regulators. Analysis of RNA-seq alignments identified developmentally regulated alternative splicing, promoter use, and 3' end formation. We report genome sequence, gene models, functional annotations, and RNA-Seq expression data that provide an important new resource enabling high throughput studies in blueberry.

  12. RNA-Seq Based Transcriptome Analysis of the Type I Interferon Host Response upon Vaccinia Virus Infection of Mouse Cells

    Directory of Open Access Journals (Sweden)

    Bruno Hernáez

    2017-01-01

    Vaccinia virus (VACV) encodes the soluble type I interferon (IFN) binding protein B18 that is secreted from infected cells and also attaches to the cell surface, as an immunomodulatory strategy to inhibit the host IFN response. By using next generation sequencing technologies, we performed a detailed RNA-seq study to dissect at the transcriptional level the modulation of the IFN based host response by VACV and B18. Transcriptome profiling of L929 cells after incubation with purified recombinant B18 protein showed that attachment of B18 to the cell surface does not trigger cell signalling leading to transcriptional activation. Consistent with its ability to bind type I IFN, B18 completely inhibited the IFN-mediated modulation of host gene expression. Addition of UV-inactivated virus particles to cell cultures altered the expression of a set of 53 cellular genes, including genes involved in innate immunity. Differential gene expression analyses of cells infected with replication competent VACV identified the activation of a broad range of host genes involved in multiple cellular pathways. Interestingly, we did not detect an IFN-mediated response among the transcriptional changes induced by VACV, even after the addition of IFN to cells infected with a mutant VACV lacking B18. This is consistent with additional viral mechanisms acting at different levels to block IFN responses during VACV infection.

  13. Profiling microRNA expression in bovine alveolar macrophages using RNA-seq.

    Science.gov (United States)

    Vegh, Peter; Foroushani, Amir B K; Magee, David A; McCabe, Matthew S; Browne, John A; Nalpas, Nicolas C; Conlon, Kevin M; Gordon, Stephen V; Bradley, Daniel G; MacHugh, David E; Lynn, David J

    2013-10-01

    MicroRNAs (miRNAs) are important regulators of gene expression and are known to play a key role in regulating both adaptive and innate immunity. Bovine alveolar macrophages (BAMs) help maintain lung homeostasis and constitute the front line of host defense against several infectious respiratory diseases, such as bovine tuberculosis. Little is known, however, about the role miRNAs play in these cells. In this study, we used a high-throughput sequencing approach, RNA-seq, to determine the expression levels of known and novel miRNAs in unchallenged BAMs isolated from lung lavages of eight different healthy Holstein-Friesian male calves. Approximately 80 million sequence reads were generated from eight BAM miRNA Illumina sequencing libraries, and 80 miRNAs were identified as being expressed in BAMs at a threshold of at least 100 reads per million (RPM). The expression levels of miRNAs varied over a large dynamic range, with a few miRNAs expressed at very high levels (up to 800,000 RPM), and the majority lowly expressed. Notably, many of the most highly expressed miRNAs in BAMs have known roles in regulating immunity in other species (e.g. bta-let-7i, bta-miR-21, bta-miR-27, bta-miR-99b, bta-miR-146, bta-miR-147, bta-miR-155 and bta-miR-223). The most highly expressed miRNA in BAMs was miR-21, which has been shown to regulate the expression of antimicrobial peptides in Mycobacterium leprae-infected human monocytes. Furthermore, the predicted target genes of BAM-expressed miRNAs were found to be statistically enriched for roles in innate immunity. In addition to profiling the expression of known miRNAs, the RNA-seq data was also analysed to identify potentially novel bovine miRNAs. One putatively novel bovine miRNA was identified. To the best of our knowledge, this is the first RNA-seq study to profile miRNA expression in BAMs and provides an important reference dataset for investigating the regulatory roles miRNAs play in this important immune cell type.
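
    The reads-per-million (RPM) threshold mentioned in this record can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names and the read counts are hypothetical, and RPM is assumed here to mean a miRNA's read count scaled by the total number of counted reads in the library.

```python
def reads_per_million(counts):
    """Scale raw read counts so that the library total is one million."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("library contains no reads")
    return {name: count * 1_000_000 / total for name, count in counts.items()}


def expressed_mirnas(counts, threshold=100.0):
    """Return the names whose RPM is at or above the threshold, sorted by name."""
    rpm = reads_per_million(counts)
    return sorted(name for name, value in rpm.items() if value >= threshold)


# Hypothetical counts for one library (illustrative values only).
example_counts = {"bta-miR-21": 800_000, "bta-let-7i": 199_950, "novel-1": 50}
print(expressed_mirnas(example_counts))
```

    With these made-up counts the library total is exactly one million reads, so each count equals its RPM and only the 50-read entry falls below the 100 RPM cut-off.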

  14. Pervasive, Genome-Wide Transcription in the Organelle Genomes of Diverse Plastid-Bearing Protists

    Directory of Open Access Journals (Sweden)

    Matheus Sanitá Lima

    2017-11-01

    Organelle genomes are among the most sequenced kinds of chromosome. This is largely because they are small and widely used in molecular studies, but also because next-generation sequencing technologies made sequencing easier, faster, and cheaper. However, studies of organelle RNA have not kept pace with those of DNA, despite huge amounts of freely available eukaryotic RNA-sequencing (RNA-seq) data. Little is known about organelle transcription in nonmodel species, and most of the available eukaryotic RNA-seq data have not been mined for organelle transcripts. Here, we use publicly available RNA-seq experiments to investigate organelle transcription in 30 diverse plastid-bearing protists with varying organelle genomic architectures. Mapping RNA-seq data to organelle genomes revealed pervasive, genome-wide transcription, regardless of the taxonomic grouping, gene organization, or noncoding content. For every species analyzed, transcripts covered ≥85% of the mitochondrial and/or plastid genomes (all of which were ≤105 kb), indicating that most of the organelle DNA, coding and noncoding, is transcriptionally active. These results follow earlier studies of model species showing that organellar transcription is coupled and ubiquitous across the genome, requiring significant downstream processing of polycistronic transcripts. Our findings suggest that noncoding organelle DNA can be transcriptionally active, raising questions about the underlying function of these transcripts and underscoring the utility of publicly available RNA-seq data for recovering complete genome sequences. If pervasive transcription is also found in bigger organelle genomes (>105 kb) and across a broader range of eukaryotes, this could indicate that noncoding organelle RNAs are regulating fundamental processes within eukaryotic cells.

  15. Statistical modeling of isoform splicing dynamics from RNA-seq time series data.

    Science.gov (United States)

    Huang, Yuanhua; Sanguinetti, Guido

    2016-10-01

    Isoform quantification is an important goal of RNA-seq experiments, yet it remains problematic for genes with low expression or several isoforms. These difficulties may in principle be ameliorated by exploiting correlated experimental designs, such as time series or dosage response experiments. Time series RNA-seq experiments, in particular, are becoming increasingly popular, yet there are no methods that explicitly leverage the experimental design to improve isoform quantification. Here, we present DICEseq, the first isoform quantification method tailored to correlated RNA-seq experiments. DICEseq explicitly models the correlations between different RNA-seq experiments to aid the quantification of isoforms across experiments. Numerical experiments on simulated datasets show that DICEseq yields more accurate results than state-of-the-art methods, an advantage that can become considerable at low coverage levels. On real datasets, our results show that DICEseq provides substantially more reproducible and robust quantifications, increasing the correlation of estimates from replicate datasets by up to 10% on genes with low or moderate expression levels (bottom third of all genes). Furthermore, DICEseq makes it possible to quantify the trade-off between temporal sampling of RNA and depth of sequencing, frequently an important choice when planning experiments. Our results have strong implications for the design of RNA-seq experiments, and offer a novel tool for improved analysis of such datasets. Python code is freely available at http://diceseq.sf.net. Contact: G.Sanguinetti@ed.ac.uk. Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press.

  16. expVIP: a Customizable RNA-seq Data Analysis and Visualization Platform.

    Science.gov (United States)

    Borrill, Philippa; Ramirez-Gonzalez, Ricardo; Uauy, Cristobal

    2016-04-01

    The majority of transcriptome sequencing (RNA-seq) expression studies in plants remain underutilized and inaccessible due to the use of disparate transcriptome references and the lack of skills and resources to analyze and visualize these data. We have developed expVIP, an expression visualization and integration platform, which allows easy analysis of RNA-seq data combined with an intuitive and interactive interface. Users can analyze public and user-specified data sets with minimal bioinformatics knowledge using the expVIP virtual machine. This generates a custom Web browser to visualize, sort, and filter the RNA-seq data and provides outputs for differential gene expression analysis. We demonstrate expVIP's suitability for polyploid crops and evaluate its performance across a range of biologically relevant scenarios. To exemplify its use in crop research, we developed a flexible wheat (Triticum aestivum) expression browser (www.wheat-expression.com) that can be expanded with user-generated data in a local virtual machine environment. The open-access expVIP platform will facilitate the analysis of gene expression data from a wide variety of species by enabling the easy integration, visualization, and comparison of RNA-seq data across experiments. © 2016 American Society of Plant Biologists. All Rights Reserved.

  17. Reverse transcription-quantitative polymerase chain reaction: description of a RIN-based algorithm for accurate data normalization

    Directory of Open Access Journals (Sweden)

    Boissière-Michot Florence

    2009-04-01

    Background: Reverse transcription-quantitative polymerase chain reaction (RT-qPCR) is the gold standard technique for mRNA quantification, but appropriate normalization is required to obtain reliable data. Normalization to accurately quantitated RNA has been proposed as the most reliable method for in vivo biopsies. However, this approach does not correct differences in RNA integrity. Results: In this study, we evaluated the effect of RNA degradation on the quantification of the relative expression of nine genes (18S, ACTB, ATUB, B2M, GAPDH, HPRT, POLR2L, PSMB6 and RPLP0) that cover a wide expression spectrum. Our results show that RNA degradation could introduce up to 100% error in gene expression measurements when RT-qPCR data were normalized to total RNA. To achieve greater resolution of small differences in transcript levels in degraded samples, we improved this normalization method by developing a corrective algorithm that compensates for the loss of RNA integrity. This approach allowed us to achieve higher accuracy, since the average error for quantitative measurements was reduced to 8%. Finally, we applied our normalization strategy to the quantification of EGFR, HER2 and HER3 in 104 rectal cancer biopsies. Taken together, our data show that normalizing gene expression measurements while also taking RNA degradation into account allows much more reliable sample comparison. Conclusion: We developed a new normalization method of RT-qPCR data that compensates for loss of RNA integrity and therefore allows accurate gene expression quantification in human biopsies.

  18. miRNA Enriched in Human Neuroblast Nuclei Bind the MAZ Transcription Factor and Their Precursors Contain the MAZ Consensus Motif.

    Science.gov (United States)

    Goldie, Belinda J; Fitzsimmons, Chantel; Weidenhofer, Judith; Atkins, Joshua R; Wang, Dan O; Cairns, Murray J

    2017-01-01

    While the cytoplasmic function of microRNA (miRNA) as post-transcriptional regulators of mRNA has been the subject of significant research effort, their activity in the nucleus is less well characterized. Here we use a human neuronal cell model to show that some mature miRNA are preferentially enriched in the nucleus. These molecules were predominantly primate-specific and contained a sequence motif with homology to the consensus MAZ transcription factor binding element. Precursor miRNA containing this motif were shown to have affinity for MAZ protein in nuclear extract. We then used Ago1/2 RIP-Seq to explore nuclear miRNA-associated mRNA targets. Interestingly, the genes for Ago2-associated transcripts were also significantly enriched with MAZ binding sites and neural function, whereas Ago1-transcripts were associated with general metabolic processes and localized with SC35 spliceosomes. These findings suggest the MAZ transcription factor is associated with miRNA in the nucleus and may influence the regulation of neuronal development through Ago2-associated miRNA induced silencing complexes. The MAZ transcription factor may therefore be important for organizing higher order integration of transcriptional and post-transcriptional processes in primate neurons.

  19. Single-Cell RNA-Seq Reveals Transcriptional Heterogeneity in Latent and Reactivated HIV-Infected Cells.

    Science.gov (United States)

    Golumbeanu, Monica; Cristinelli, Sara; Rato, Sylvie; Munoz, Miguel; Cavassini, Matthias; Beerenwinkel, Niko; Ciuffi, Angela

    2018-04-24

    Despite effective treatment, HIV can persist in latent reservoirs, which represent a major obstacle toward HIV eradication. Targeting and reactivating latent cells is challenging due to the heterogeneous nature of HIV-infected cells. Here, we used a primary model of HIV latency and single-cell RNA sequencing to characterize transcriptional heterogeneity during HIV latency and reactivation. Our analysis identified transcriptional programs leading to successful reactivation of HIV expression. Copyright © 2018 The Author(s). Published by Elsevier Inc. All rights reserved.

  20. Small RNA-Seq analysis reveals microRNA-regulation of the Imd pathway during Escherichia coli infection in Drosophila.

    Science.gov (United States)

    Li, Shengjie; Shen, Li; Sun, Lianjie; Xu, Jiao; Jin, Ping; Chen, Liming; Ma, Fei

    2017-05-01

    Drosophila have served as a model for research on innate immunity for decades. However, knowledge of the post-transcriptional regulation of immune gene expression by microRNAs (miRNAs) remains rudimentary. In the present study, using small RNA-seq and bioinformatics analysis, we identified 67 differentially expressed miRNAs in Drosophila infected with Escherichia coli compared to injured flies at three time-points. Furthermore, we found that 21 of these miRNAs were potentially involved in the regulation of Imd pathway-related genes. Strikingly, based on UAS-miRNAs line screening and Dual-luciferase assay, we identified that miR-9a and miR-981 could both negatively regulate Drosophila antibacterial defenses and decrease the level of the antibacterial peptide, Diptericin. Taken together, these data support the involvement of miRNAs in the regulation of the Drosophila Imd pathway. Copyright © 2017 Elsevier Ltd. All rights reserved.

  1. Identification of reference genes for quantitative expression analysis using large-scale RNA-seq data of Arabidopsis thaliana and model crop plants.

    Science.gov (United States)

    Kudo, Toru; Sasaki, Yohei; Terashima, Shin; Matsuda-Imai, Noriko; Takano, Tomoyuki; Saito, Misa; Kanno, Maasa; Ozaki, Soichi; Suwabe, Keita; Suzuki, Go; Watanabe, Masao; Matsuoka, Makoto; Takayama, Seiji; Yano, Kentaro

    2016-10-13

    In quantitative gene expression analysis, normalization using a reference gene as an internal control is frequently performed for appropriate interpretation of the results. Efforts have been devoted to exploring superior novel reference genes using microarray transcriptomic data and to evaluating commonly used reference genes by targeting analysis. However, because the number of specifically detectable genes is totally dependent on probe design in the microarray analysis, exploration using microarray data may miss some of the best choices for the reference genes. Recently emerging RNA sequencing (RNA-seq) provides an ideal resource for comprehensive exploration of reference genes since this method is capable of detecting all expressed genes, in principle including even unknown genes. We report the results of a comprehensive exploration of reference genes using public RNA-seq data from plants such as Arabidopsis thaliana (Arabidopsis), Glycine max (soybean), Solanum lycopersicum (tomato) and Oryza sativa (rice). To select reference genes suitable for the broadest experimental conditions possible, candidates were surveyed by the following four steps: (1) evaluation of the basal expression level of each gene in each experiment; (2) evaluation of the expression stability of each gene in each experiment; (3) evaluation of the expression stability of each gene across the experiments; and (4) selection of top-ranked genes, after ranking according to the number of experiments in which the gene was expressed stably. Employing this procedure, 13, 10, 12 and 21 top candidates for reference genes were proposed in Arabidopsis, soybean, tomato and rice, respectively. Microarray expression data confirmed that the expression of the proposed reference genes under broad experimental conditions was more stable than that of commonly used reference genes. These novel reference genes will be useful for analyzing gene expression profiles across experiments carried out under various

  2. ISVASE: identification of sequence variant associated with splicing event using RNA-seq data.

    Science.gov (United States)

    Aljohi, Hasan Awad; Liu, Wanfei; Lin, Qiang; Yu, Jun; Hu, Songnian

    2017-06-28

    Precise and efficient exon recognition and splicing by the spliceosome is key to generating mature mRNAs. Between one third and one half of disease-related mutations affect RNA splicing. The software PVAAS was developed to identify variants associated with aberrant splicing directly from RNA-seq data. However, it assumes that annotated splice sites represent normal splicing, which is not necessarily true. We developed ISVASE, a tool for specifically identifying sequence variants associated with splicing events (SVASE) using RNA-seq data. Compared with PVAAS, our tool has several advantages: multi-pass stringent rule-dependent filters and statistical filters; use of split reads only; independent sequence variant identification in each part of a splice junction; sequence variant detection for both known and novel splicing events; additional detection of exon-exon junction shift events when known splicing events are provided; splicing signal evaluation; support for known DNA mutation and/or RNA editing data; higher precision and consistency; and short running time. Using a realistic RNA-seq dataset, we performed a case study to illustrate the functionality and effectiveness of our method. Moreover, the output of SVASEs can be used for downstream analysis such as splicing regulatory element study and sequence variant functional analysis. ISVASE is useful for researchers interested in sequence variants (DNA mutation and/or RNA editing) associated with splicing events. The package is freely available at https://sourceforge.net/projects/isvase/ .

  3. Using RNA-Seq for gene identification, polymorphism detection and transcript profiling in two alfalfa genotypes with divergent cell wall composition in stems

    Science.gov (United States)

    2011-01-01

    Background: Alfalfa [Medicago sativa (L.) sativa], a widely grown perennial forage, has potential for development as a cellulosic ethanol feedstock. However, the genomics of alfalfa, a non-model species, is still in its infancy. The recent advent of RNA-Seq, a massively parallel sequencing method for transcriptome analysis, provides an opportunity to expand the identification of alfalfa genes and polymorphisms, and conduct in-depth transcript profiling. Results: Cell walls in stems of alfalfa genotype 708 have higher cellulose and lower lignin concentrations compared to cell walls in stems of genotype 773. Using the Illumina GA-II platform, a total of 198,861,304 expression sequence tags (ESTs, 76 bp in length) were generated from cDNA libraries derived from elongating stem (ES) and post-elongation stem (PES) internodes of 708 and 773. In addition, 341,984 ESTs were generated from ES and PES internodes of genotype 773 using the GS FLX Titanium platform. The first alfalfa (Medicago sativa) gene index (MSGI 1.0) was assembled using the Sanger ESTs available from GenBank, the GS FLX Titanium EST sequences, and the de novo assembled Illumina sequences. MSGI 1.0 contains 124,025 unique sequences including 22,729 tentative consensus sequences (TCs), 22,315 singletons and 78,981 pseudo-singletons. We identified a total of 1,294 simple sequence repeats (SSR) among the sequences in MSGI 1.0. In addition, a total of 10,826 single nucleotide polymorphisms (SNPs) were predicted between the two genotypes. Out of 55 SNPs randomly selected for experimental validation, 47 (85%) were polymorphic between the two genotypes. We also identified numerous allelic variations within each genotype. Digital gene expression analysis identified numerous candidate genes that may play a role in stem development as well as candidate genes that may contribute to the differences in cell wall composition in stems of the two genotypes. Conclusions: Our results demonstrate that RNA-Seq can be

  4. Pervasive, Genome-Wide Transcription in the Organelle Genomes of Diverse Plastid-Bearing Protists.

    Science.gov (United States)

    Sanitá Lima, Matheus; Smith, David Roy

    2017-11-06

    Organelle genomes are among the most sequenced kinds of chromosome. This is largely because they are small and widely used in molecular studies, but also because next-generation sequencing technologies made sequencing easier, faster, and cheaper. However, studies of organelle RNA have not kept pace with those of DNA, despite huge amounts of freely available eukaryotic RNA-sequencing (RNA-seq) data. Little is known about organelle transcription in nonmodel species, and most of the available eukaryotic RNA-seq data have not been mined for organelle transcripts. Here, we use publicly available RNA-seq experiments to investigate organelle transcription in 30 diverse plastid-bearing protists with varying organelle genomic architectures. Mapping RNA-seq data to organelle genomes revealed pervasive, genome-wide transcription, regardless of the taxonomic grouping, gene organization, or noncoding content. For every species analyzed, transcripts covered ≥85% of the mitochondrial and/or plastid genomes (all of which were ≤105 kb), indicating that most of the organelle DNA, coding and noncoding, is transcriptionally active. These results follow earlier studies of model species showing that organellar transcription is coupled and ubiquitous across the genome, requiring significant downstream processing of polycistronic transcripts. Our findings suggest that noncoding organelle DNA can be transcriptionally active, raising questions about the underlying function of these transcripts and underscoring the utility of publicly available RNA-seq data for recovering complete genome sequences. If pervasive transcription is also found in bigger organelle genomes (>105 kb) and across a broader range of eukaryotes, this could indicate that noncoding organelle RNAs are regulating fundamental processes within eukaryotic cells. Copyright © 2017 Sanitá Lima and Smith.
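
    The "transcripts covered ≥85% of the genome" figure in this record is a breadth-of-coverage measure. The Python sketch below shows one way such a fraction could be computed from mapped-read intervals; it is an illustration only, not the authors' method. The function name, the half-open (start, end) interval convention, and the example coordinates are all assumptions.

```python
def covered_fraction(intervals, genome_length):
    """Fraction of genome positions covered by at least one interval.

    Intervals are half-open (start, end) pairs in 0-based coordinates;
    overlapping intervals are counted once.
    """
    if genome_length <= 0:
        raise ValueError("genome_length must be positive")
    covered = 0
    current_end = 0  # right edge of the region already counted
    for start, end in sorted(intervals):
        start = max(start, current_end, 0)  # skip positions already counted
        end = min(end, genome_length)       # clip to the genome
        if end > start:
            covered += end - start
            current_end = end
    return covered / genome_length


# Hypothetical alignments on a 100-position genome (illustrative values only).
example_intervals = [(40, 90), (0, 50), (95, 100), (10, 20)]
fraction = covered_fraction(example_intervals, 100)
print(f"{fraction:.0%} of positions covered")
```

    Sorting the intervals first lets a single pass merge overlaps, so positions hit by many reads are not double counted.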

  5. Combined analysis of mRNA and miRNA identifies dehydration and salinity responsive key molecular players in citrus roots.

    Science.gov (United States)

    Xie, Rangjin; Zhang, Jin; Ma, Yanyan; Pan, Xiaoting; Dong, Cuicui; Pang, Shaoping; He, Shaolan; Deng, Lie; Yi, Shilai; Zheng, Yongqiang; Lv, Qiang

    2017-02-06

    Citrus is one of the most economically important fruit crops around the world. Drought and salinity stresses adversely affect its productivity and fruit quality. However, the genetic regulatory networks and signaling pathways involved in drought and salinity remain to be elucidated. With RNA-seq and sRNA-seq, an integrative analysis of miRNA and mRNA expression profiling and their regulatory networks was conducted using citrus roots subjected to dehydration and salt treatment. Differentially expressed (DE) mRNA and miRNA profiles were obtained according to fold change analysis and the relationships between miRNAs and target mRNAs were found to be coherent and incoherent in the regulatory networks. GO enrichment analysis revealed that some crucial biological processes related to signal transduction (e.g. 'MAPK cascade'), hormone-mediated signaling pathways (e.g. 'abscisic acid-activated signaling pathway'), reactive oxygen species (ROS) metabolic process (e.g. 'hydrogen peroxide catabolic process') and transcription factors (e.g. 'MYB, ZFP and bZIP') were involved in dehydration and/or salt treatment. The molecular players in response to dehydration and salt treatment were partially overlapping. Quantitative reverse transcriptase-polymerase chain reaction (qRT-PCR) analysis further confirmed the results from RNA-seq and sRNA-seq analysis. This study provides new insights into the molecular mechanisms by which citrus roots respond to dehydration and salt treatment.

  6. Dissecting the expression relationships between RNA-binding proteins and their cognate targets in eukaryotic post-transcriptional regulatory networks

    Science.gov (United States)

    Nishtala, Sneha; Neelamraju, Yaseswini; Janga, Sarath Chandra

    2016-05-01

    RNA-binding proteins (RBPs) are pivotal in orchestrating several steps in the metabolism of RNA in eukaryotes, thereby controlling an extensive network of RBP-RNA interactions. Here, we employed CLIP (cross-linking immunoprecipitation)-seq datasets for 60 human RBPs and RIP-ChIP (RNP immunoprecipitation-microarray) data for 69 yeast RBPs to construct a network of genome-wide RBP-target RNA interactions for each RBP. We show in humans that the majority (~78%) of the RBPs are strongly associated with their target transcripts at the transcript level, while ~95% of the studied RBPs were also found to be strongly associated with expression levels of target transcripts when protein expression levels of RBPs were employed. At the transcript level, RBP-RNA interaction data for the yeast genome exhibited a strong association for 63% of the RBPs, confirming the association to be conserved across large phylogenetic distances. Analysis to uncover the features contributing to these associations revealed the number of target transcripts and the length of the selected protein-coding transcript of an RBP to be significant at the transcript level, and the intensity of the CLIP signal, the number of RNA-binding domains, and the location of the binding site on the transcript to be significant at the protein level. Our analysis will contribute to improved modelling and prediction of post-transcriptional networks.

  7. Relationships within Cladobranchia (Gastropoda: Nudibranchia) based on RNA-Seq data: an initial investigation.

    Science.gov (United States)

    Goodheart, Jessica A; Bazinet, Adam L; Collins, Allen G; Cummings, Michael P

    2015-09-01

    Cladobranchia (Gastropoda: Nudibranchia) is a diverse (approx. 1000 species) but understudied group of sea slug molluscs. In order to fully comprehend the diversity of nudibranchs and the evolution of character traits within Cladobranchia, a solid understanding of evolutionary relationships is necessary. To date, only two direct attempts have been made to understand the evolutionary relationships within Cladobranchia, neither of which resulted in well-supported phylogenetic hypotheses. In addition to these studies, several others have addressed some of the relationships within this clade while investigating the evolutionary history of more inclusive groups (Nudibranchia and Euthyneura). However, all of the resulting phylogenetic hypotheses contain conflicting topologies within Cladobranchia. In this study, we address some of these long-standing issues regarding the evolutionary history of Cladobranchia using RNA-Seq data (transcriptomes). We sequenced 16 transcriptomes and combined these with four transcriptomes from the NCBI Sequence Read Archive. Transcript assembly using Trinity and orthology determination using HaMStR yielded 839 orthologous groups for analysis. These data provide a well-supported and almost fully resolved phylogenetic hypothesis for Cladobranchia. Our results support the monophyly of Cladobranchia and the sub-clade Aeolidida, but reject the monophyly of Dendronotida.

  8. RNA-Seq of human neurons derived from iPS cells reveals candidate long non-coding RNAs involved in neurogenesis and neuropsychiatric disorders.

    Directory of Open Access Journals (Sweden)

    Mingyan Lin

    Genome-wide expression analysis using next generation sequencing (RNA-Seq) provides an opportunity for in-depth molecular profiling of fundamental biological processes, such as cellular differentiation and malignant transformation. Differentiating human neurons derived from induced pluripotent stem cells (iPSCs) provide an ideal system for RNA-Seq since defective neurogenesis caused by abnormalities in transcription factors, DNA methylation, and chromatin modifiers lie at the heart of some neuropsychiatric disorders. As a preliminary step towards applying next generation sequencing using neurons derived from patient-specific iPSCs, we have carried out an RNA-Seq analysis on control human neurons. Dramatic changes in the expression of coding genes, long non-coding RNAs (lncRNAs), pseudogenes, and splice isoforms were seen during the transition from pluripotent stem cells to early differentiating neurons. A number of genes that undergo radical changes in expression during this transition include candidates for schizophrenia (SZ), bipolar disorder (BD) and autism spectrum disorders (ASD) that function as transcription factors and chromatin modifiers, such as POU3F2 and ZNF804A, and genes coding for cell adhesion proteins implicated in these conditions including NRXN1 and NLGN1. In addition, a number of novel lncRNAs were found to undergo dramatic changes in expression, one of which is HOTAIRM1, a regulator of several HOXA genes during myelopoiesis. The increase we observed in differentiating neurons suggests a role in neurogenesis as well. Finally, several lncRNAs that map near SNPs associated with SZ in genome wide association studies also increase during neuronal differentiation, suggesting that these novel transcripts may be abnormally regulated in a subgroup of patients.

  9. The Mechanisms of Maize Resistance to Fusarium verticillioides by comprehensive analysis of RNA-seq Data

    Directory of Open Access Journals (Sweden)

    Yanping Wang

    2016-11-01

    Fusarium verticillioides is the most commonly reported fungal species responsible for ear rot of maize, which substantially reduces grain yield. It also results in a substantial accumulation of mycotoxins that give rise to a toxic response when ingested by animals and humans. Because chemical and agronomic control measures are inefficient, selecting more resistant varieties becomes more desirable. However, the molecular mechanisms underlying the infection process remain poorly understood, which hampers the application of quantitative resistance in breeding programs. Here, we reveal the disease-resistance mechanism of the maize inbred line BT-1, which displays high resistance to ear rot, using RNA high-throughput sequencing. By analyzing RNA-seq data from the BT-1 kernels before and after F. verticillioides inoculation, we found that transcript levels of genes associated with key pathways are dramatically changed compared with the control treatment. Differential gene expression in ear rot resistant and susceptible maize was confirmed by RNA microarray and qRT-PCR analyses. Further investigation suggests that the small heat shock protein family, some secondary metabolites, and the signaling pathways of abscisic acid (ABA), jasmonic acid (JA) or salicylic acid (SA) may be involved in the pathogen-associated molecular pattern-triggered immunity against F. verticillioides. These data will not only provide new insights into the molecular mechanisms of resistance against invading fungi, but may also result in the identification of key molecular factors associated with ear rot resistance in maize.

  10. Selective amplification and sequencing of cyclic phosphate-containing RNAs by the cP-RNA-seq method.

    Science.gov (United States)

    Honda, Shozo; Morichika, Keisuke; Kirino, Yohei

    2016-03-01

    RNA digestions catalyzed by many ribonucleases generate RNA fragments that contain a 2',3'-cyclic phosphate (cP) at their 3' termini. However, standard RNA-seq methods are unable to accurately capture cP-containing RNAs because the cP inhibits the adapter ligation reaction. We recently developed a method named cP-RNA-seq that is able to selectively amplify and sequence cP-containing RNAs. Here we describe the cP-RNA-seq protocol in which the 3' termini of all RNAs, except those containing a cP, are cleaved through a periodate treatment after phosphatase treatment; hence, subsequent adapter ligation and cDNA amplification steps are exclusively applied to cP-containing RNAs. cP-RNA-seq takes ∼6 d, excluding the time required for sequencing and bioinformatics analyses, which are not covered in detail in this protocol. Biochemical validation of the existence of cP in the identified RNAs takes ∼3 d. Even though the cP-RNA-seq method was developed to identify angiogenin-generating 5'-tRNA halves as a proof of principle, the method should be applicable to global identification of cP-containing RNA repertoires in various transcriptomes.

  11. Discovering transcription factor binding sites in highly repetitive regions of genomes with multi-read analysis of ChIP-Seq data.

    Science.gov (United States)

    Chung, Dongjun; Kuan, Pei Fen; Li, Bo; Sanalkumar, Rajendran; Liang, Kun; Bresnick, Emery H; Dewey, Colin; Keleş, Sündüz

    2011-07-01

    Chromatin immunoprecipitation followed by high-throughput sequencing (ChIP-seq) is rapidly replacing chromatin immunoprecipitation combined with genome-wide tiling array analysis (ChIP-chip) as the preferred approach for mapping transcription-factor binding sites and chromatin modifications. The state of the art for analyzing ChIP-seq data relies on using only reads that map uniquely to a relevant reference genome (uni-reads). This can lead to the omission of up to 30% of alignable reads. We describe a general approach for utilizing reads that map to multiple locations on the reference genome (multi-reads). Our approach is based on allocating multi-reads as fractional counts using a weighted alignment scheme. Using human STAT1 and mouse GATA1 ChIP-seq datasets, we illustrate that incorporation of multi-reads significantly increases sequencing depths, leads to detection of novel peaks that are not otherwise identifiable with uni-reads, and improves detection of peaks in mappable regions. We investigate various genome-wide characteristics of peaks detected only by utilization of multi-reads via computational experiments. Overall, peaks from multi-read analysis have similar characteristics to peaks that are identified by uni-reads except that the majority of them reside in segmental duplications. We further validate a number of GATA1 multi-read only peaks by independent quantitative real-time ChIP analysis and identify novel target genes of GATA1. These computational and experimental results establish that multi-reads can be of critical importance for studying transcription factor binding in highly repetitive regions of genomes with ChIP-seq experiments.
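
    The fractional-count idea in this abstract can be illustrated with a minimal sketch. It assumes, purely for illustration, that each multi-read is split across its candidate bins in proportion to the uni-read count already observed in each bin plus a pseudocount. This weighting rule, the function name `allocate_multireads`, and the bin labels are hypothetical and are not taken from the paper's weighted alignment scheme.

```python
from collections import defaultdict


def allocate_multireads(uni_counts, multi_reads, pseudocount=1.0):
    """Split each multi-read across its candidate bins as fractional counts.

    uni_counts:  dict mapping a bin label to its uni-read count.
    multi_reads: list of lists; each inner list holds the candidate bins
                 one multi-read aligns to.
    The weight of a bin is its uni-read count plus a pseudocount
    (an illustrative assumption, not the published scheme).
    """
    counts = defaultdict(float)
    for bin_label, count in uni_counts.items():
        counts[bin_label] += count

    for candidate_bins in multi_reads:
        weights = [uni_counts.get(b, 0) + pseudocount for b in candidate_bins]
        total = sum(weights)
        for bin_label, weight in zip(candidate_bins, weights):
            # Each multi-read contributes exactly one read in total.
            counts[bin_label] += weight / total

    return dict(counts)


if __name__ == "__main__":
    uni = {"chr1:100": 3, "chr2:500": 1}
    multi = [["chr1:100", "chr2:500"]]
    print(allocate_multireads(uni, multi))
```

    Because every multi-read is distributed as fractions that sum to one, the total read count is preserved while bins in repetitive regions still receive signal.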

  12. Discovering transcription factor binding sites in highly repetitive regions of genomes with multi-read analysis of ChIP-Seq data.

    Directory of Open Access Journals (Sweden)

    Dongjun Chung

    2011-07-01

    Chromatin immunoprecipitation followed by high-throughput sequencing (ChIP-seq) is rapidly replacing chromatin immunoprecipitation combined with genome-wide tiling array analysis (ChIP-chip) as the preferred approach for mapping transcription-factor binding sites and chromatin modifications. The state of the art for analyzing ChIP-seq data relies on using only reads that map uniquely to a relevant reference genome (uni-reads). This can lead to the omission of up to 30% of alignable reads. We describe a general approach for utilizing reads that map to multiple locations on the reference genome (multi-reads). Our approach is based on allocating multi-reads as fractional counts using a weighted alignment scheme. Using human STAT1 and mouse GATA1 ChIP-seq datasets, we illustrate that incorporation of multi-reads significantly increases sequencing depths, leads to detection of novel peaks that are not otherwise identifiable with uni-reads, and improves detection of peaks in mappable regions. We investigate various genome-wide characteristics of peaks detected only by utilization of multi-reads via computational experiments. Overall, peaks from multi-read analysis have similar characteristics to peaks that are identified by uni-reads except that the majority of them reside in segmental duplications. We further validate a number of GATA1 multi-read only peaks by independent quantitative real-time ChIP analysis and identify novel target genes of GATA1. These computational and experimental results establish that multi-reads can be of critical importance for studying transcription factor binding in highly repetitive regions of genomes with ChIP-seq experiments.

  13. Study on the regulatory mechanism of the lipid metabolism pathways during chicken male germ cell differentiation based on RNA-seq.

    Science.gov (United States)

    Zuo, Qisheng; Li, Dong; Zhang, Lei; Elsayed, Ahmed Kamel; Lian, Chao; Shi, Qingqing; Zhang, Zhentao; Zhu, Rui; Wang, Yinjie; Jin, Kai; Zhang, Yani; Li, Bichun

    2015-01-01

    Here, we explore the regulatory mechanism of lipid metabolic signaling pathways and related genes during differentiation of male germ cells in chickens, with the hope that better understanding of these pathways may improve in vitro induction. Fluorescence-activated cell sorting was used to obtain highly purified cultures of embryonic stem cells (ESCs), primitive germ cells (PGCs), and spermatogonial stem cells (SSCs). The total RNA was then extracted from each type of cell. High-throughput analysis methods (RNA-seq) were used to sequence the transcriptome of these cells. Gene Ontology (GO) analysis and the KEGG database were used to identify lipid metabolism pathways and related genes. Retinoic acid (RA), the end-product of the retinol metabolism pathway, induced in vitro differentiation of ESC into male germ cells. Quantitative real-time PCR (qRT-PCR) was used to detect changes in the expression of the genes involved in the retinol metabolic pathways. From the results of RNA-seq and the database analyses, we concluded that there are 328 genes in 27 lipid metabolic pathways continuously involved in lipid metabolism during the differentiation of ESC into SSC in vivo, including retinol metabolism. Alcohol dehydrogenase 5 (ADH5) and aldehyde dehydrogenase 1 family member A1 (ALDH1A1) are involved in RA synthesis in the cell. ADH5 was specifically expressed in PGC in our experiments and aldehyde dehydrogenase 1 family member A1 (ALDH1A1) persistently increased throughout development. CYP26b1, a member of the cytochrome P450 superfamily, is involved in the degradation of RA. Expression of CYP26b1, in contrast, decreased throughout development. Exogenous RA in the culture medium induced differentiation of ESC to SSC-like cells. The expression patterns of ADH5, ALDH1A1, and CYP26b1 were consistent with RNA-seq results. We conclude that the retinol metabolism pathway plays an important role in the process of chicken male germ cell differentiation.

  14. miRNA Enriched in Human Neuroblast Nuclei Bind the MAZ Transcription Factor and Their Precursors Contain the MAZ Consensus Motif

    Directory of Open Access Journals (Sweden)

    Belinda J. Goldie

    2017-08-01

    While the cytoplasmic function of microRNA (miRNA) as post-transcriptional regulators of mRNA has been the subject of significant research effort, their activity in the nucleus is less well characterized. Here we use a human neuronal cell model to show that some mature miRNA are preferentially enriched in the nucleus. These molecules were predominantly primate-specific and contained a sequence motif with homology to the consensus MAZ transcription factor binding element. Precursor miRNA containing this motif were shown to have affinity for MAZ protein in nuclear extract. We then used Ago1/2 RIP-Seq to explore nuclear miRNA-associated mRNA targets. Interestingly, the genes for Ago2-associated transcripts were also significantly enriched with MAZ binding sites and neural function, whereas Ago1-transcripts were associated with general metabolic processes and localized with SC35 spliceosomes. These findings suggest the MAZ transcription factor is associated with miRNA in the nucleus and may influence the regulation of neuronal development through Ago2-associated miRNA induced silencing complexes. The MAZ transcription factor may therefore be important for organizing higher order integration of transcriptional and post-transcriptional processes in primate neurons.

  15. Single nucleotide polymorphism discovery in bovine liver using RNA-seq technology.

    Directory of Open Access Journals (Sweden)

    Chandra Shekhar Pareek

    RNA-seq is a useful next-generation sequencing (NGS) technology that has been widely used to understand mammalian transcriptome architecture and function. In this study, a breed-specific RNA-seq experiment was utilized to detect putative single nucleotide polymorphisms (SNPs) in liver tissue of young bulls of the Polish Red, Polish Holstein-Friesian (HF) and Hereford breeds, and to understand the genomic variation in the three cattle breeds that may reflect differences in production traits. The RNA-seq experiment on bovine liver produced 1,071,144,072 raw paired-end reads, with an average of approximately 60 million paired-end reads per library. Breed-wise, a total of 345.06, 290.04 and 436.03 million paired-end reads were obtained from the Polish Red, Polish HF, and Hereford breeds, respectively. Burrows-Wheeler Aligner (BWA) read alignments showed that 81.35%, 82.81% and 84.21% of the mapped sequencing reads were properly paired to the Polish Red, Polish HF, and Hereford breeds, respectively. This study identified 5,641,401 SNPs and insertion and deletion (indel) positions expressed in the bovine liver with an average of 313,411 SNPs and indels per young bull. Following the removal of the indel mutations, a total of 1,953,804, 1,527,120 and 2,053,184 raw SNPs expressed in bovine liver were identified for the Polish Red, Polish HF, and Hereford breeds, respectively. Breed-wise, three highly reliable breed-specific SNP-databases (SNP-dbs) with 31,562, 24,945 and 28,194 SNP records were constructed for the Polish Red, Polish HF, and Hereford breeds, respectively. Using a combination of stringent parameters of a minimum depth of ≥10 mapping reads that support the polymorphic nucleotide base and 100% SNP ratio, 4,368, 3,780 and 3,800 SNP records were detected in the Polish Red, Polish HF, and Hereford breeds, respectively. The SNP detections using RNA-seq data were successfully validated by kompetitive allele-specific PCR (KASP™) SNP genotyping assay. The

  16. Single nucleotide polymorphism discovery in bovine liver using RNA-seq technology.

    Science.gov (United States)

    Pareek, Chandra Shekhar; Błaszczyk, Paweł; Dziuba, Piotr; Czarnik, Urszula; Fraser, Leyland; Sobiech, Przemysław; Pierzchała, Mariusz; Feng, Yaping; Kadarmideen, Haja N; Kumar, Dibyendu

    2017-01-01

    RNA-seq is a useful next-generation sequencing (NGS) technology that has been widely used to understand mammalian transcriptome architecture and function. In this study, a breed-specific RNA-seq experiment was utilized to detect putative single nucleotide polymorphisms (SNPs) in liver tissue of young bulls of the Polish Red, Polish Holstein-Friesian (HF) and Hereford breeds, and to understand the genomic variation in the three cattle breeds that may reflect differences in production traits. The RNA-seq experiment on bovine liver produced 1,071,144,072 raw paired-end reads, with an average of approximately 60 million paired-end reads per library. Breed-wise, a total of 345.06, 290.04 and 436.03 million paired-end reads were obtained from the Polish Red, Polish HF, and Hereford breeds, respectively. Burrows-Wheeler Aligner (BWA) read alignments showed that 81.35%, 82.81% and 84.21% of the mapped sequencing reads were properly paired to the Polish Red, Polish HF, and Hereford breeds, respectively. This study identified 5,641,401 SNPs and insertion and deletion (indel) positions expressed in the bovine liver with an average of 313,411 SNPs and indels per young bull. Following the removal of the indel mutations, a total of 1,953,804, 1,527,120 and 2,053,184 raw SNPs expressed in bovine liver were identified for the Polish Red, Polish HF, and Hereford breeds, respectively. Breed-wise, three highly reliable breed-specific SNP-databases (SNP-dbs) with 31,562, 24,945 and 28,194 SNP records were constructed for the Polish Red, Polish HF, and Hereford breeds, respectively. Using a combination of stringent parameters of a minimum depth of ≥10 mapping reads that support the polymorphic nucleotide base and 100% SNP ratio, 4,368, 3,780 and 3,800 SNP records were detected in the Polish Red, Polish HF, and Hereford breeds, respectively. The SNP detections using RNA-seq data were successfully validated by kompetitive allele-specific PCR (KASP™) SNP genotyping assay. The comprehensive
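
    The stringent filter mentioned in this abstract (at least 10 reads supporting the polymorphic base and a 100% SNP ratio) can be sketched as below. The sketch assumes that "SNP ratio" means the fraction of reads at a site that carry the alternative base; that reading, the record layout, and the function name `filter_snps` are illustrative assumptions rather than the authors' pipeline.

```python
def filter_snps(records, min_alt_depth=10, min_ratio=1.0):
    """Keep SNP records that pass a depth and allele-ratio threshold.

    records: list of dicts with hypothetical keys 'id', 'ref_reads'
             and 'alt_reads' (read counts at the site).
    A record passes when the alternative base is supported by at least
    min_alt_depth reads and alt_reads / total reads >= min_ratio.
    """
    kept = []
    for rec in records:
        total = rec["ref_reads"] + rec["alt_reads"]
        if total == 0:
            continue  # no coverage, nothing to evaluate
        ratio = rec["alt_reads"] / total
        if rec["alt_reads"] >= min_alt_depth and ratio >= min_ratio:
            kept.append(rec)
    return kept


if __name__ == "__main__":
    example = [
        {"id": "snp_a", "ref_reads": 0, "alt_reads": 12},  # passes both tests
        {"id": "snp_b", "ref_reads": 0, "alt_reads": 9},   # too shallow
        {"id": "snp_c", "ref_reads": 1, "alt_reads": 20},  # ratio below 100%
    ]
    print([rec["id"] for rec in filter_snps(example)])
```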

  17. A Deconvolution Protocol for ChIP-Seq Reveals Analogous Enhancer Structures on the Mouse and Human Ribosomal RNA Genes

    Directory of Open Access Journals (Sweden)

    Jean-Clement Mars

    2018-01-01

    The combination of Chromatin Immunoprecipitation and Massively Parallel Sequencing, or ChIP-Seq, has greatly advanced our genome-wide understanding of chromatin and enhancer structures. However, its resolution at any given genetic locus is limited by several factors. In applying ChIP-Seq to the study of the ribosomal RNA genes, we found that a major limitation to resolution was imposed by the underlying variability in sequence coverage that very often dominates the protein–DNA interaction profiles. Here, we describe a simple numerical deconvolution approach that, in large part, corrects for this variability, and significantly improves both the resolution and quantitation of protein–DNA interaction maps deduced from ChIP-Seq data. This approach has allowed us to determine the in vivo organization of the RNA polymerase I preinitiation complexes that form at the promoters and enhancers of the mouse (Mus musculus) and human (Homo sapiens) ribosomal RNA genes, and to reveal a phased binding of the HMG-box factor UBF across the rDNA. The data identify and map a “Spacer Promoter” and associated stalled polymerase in the intergenic spacer of the human ribosomal RNA genes, and reveal a very similar enhancer structure to that found in rodents and lower vertebrates.

  18. Inference of RNA polymerase II transcription dynamics from chromatin immunoprecipitation time course data.

    Directory of Open Access Journals (Sweden)

    Ciira wa Maina

    2014-05-01

    Gene transcription mediated by RNA polymerase II (pol-II) is a key step in gene expression. The dynamics of pol-II moving along the transcribed region influence the rate and timing of gene expression. In this work, we present a probabilistic model of transcription dynamics which is fitted to pol-II occupancy time course data measured using ChIP-Seq. The model can be used to estimate transcription speed and to infer the temporal pol-II activity profile at the gene promoter. Model parameters are estimated using either maximum likelihood estimation or via Bayesian inference using Markov chain Monte Carlo sampling. The Bayesian approach provides confidence intervals for parameter estimates and allows the use of priors that capture domain knowledge, e.g. the expected range of transcription speeds, based on previous experiments. The model describes the movement of pol-II down the gene body and can be used to identify the time of induction for transcriptionally engaged genes. By clustering the inferred promoter activity time profiles, we are able to determine which genes respond quickly to stimuli and group genes that share activity profiles and may therefore be co-regulated. We apply our methodology to biological data obtained using ChIP-seq to measure pol-II occupancy genome-wide when MCF-7 human breast cancer cells are treated with estradiol (E2). The transcription speeds we obtain agree with those obtained previously for smaller numbers of genes with the advantage that our approach can be applied genome-wide. We validate the biological significance of the pol-II promoter activity clusters by investigating cluster-specific transcription factor binding patterns and determining canonical pathway enrichment. We find that rapidly induced genes are enriched for both estrogen receptor alpha (ERα) and FOXA1 binding in their proximal promoter regions.

  19. smallWig: parallel compression of RNA-seq WIG files.

    Science.gov (United States)

    Wang, Zhiying; Weissman, Tsachy; Milenkovic, Olgica

    2016-01-15

    We developed a new lossless compression method for WIG data, named smallWig, offering the best known compression rates for RNA-seq data and featuring random access functionalities that enable visualization, summary statistics analysis and fast queries from the compressed files. Our approach results in order of magnitude improvements compared with bigWig and ensures compression rates only a fraction of those produced by cWig. The key features of the smallWig algorithm are statistical data analysis and a combination of source coding methods that ensure high flexibility and make the algorithm suitable for different applications. Furthermore, for general-purpose file compression, the compression rate of smallWig approaches the empirical entropy of the tested WIG data. For compression with random query features, smallWig uses a simple block-based compression scheme that introduces only a minor overhead in the compression rate. For archival or storage space-sensitive applications, the method relies on context mixing techniques that lead to further improvements of the compression rate. Implementations of smallWig can be executed in parallel on different sets of chromosomes using multiple processors, thereby enabling desirable scaling for future transcriptome Big Data platforms. The development of next-generation sequencing technologies has led to a dramatic decrease in the cost of DNA/RNA sequencing and expression profiling. RNA-seq has emerged as an important and inexpensive technology that provides information about whole transcriptomes of various species and organisms, as well as different organs and cellular communities. The vast volume of data generated by RNA-seq experiments has significantly increased data storage costs and communication bandwidth requirements. Current compression tools for RNA-seq data such as bigWig and cWig either use general-purpose compressors (gzip) or suboptimal compression schemes that leave significant room for improvement. 
To substantiate
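
    The block-based trade-off between compression rate and random access described in this abstract can be sketched as follows. This is not smallWig's codec: it substitutes zlib from the Python standard library for the paper's source-coding methods, and the function names and block size are hypothetical. The point is only that compressing fixed-size blocks independently lets a query decompress a single block instead of the whole track.

```python
import struct
import zlib


def compress_blocks(values, block_size=4):
    """Compress a list of coverage values in independent fixed-size blocks.

    Each block is packed as little-endian 32-bit floats and compressed
    with zlib (a stand-in for a specialised coder).
    """
    blocks = []
    for start in range(0, len(values), block_size):
        chunk = values[start:start + block_size]
        raw = struct.pack(f"<{len(chunk)}f", *chunk)
        blocks.append(zlib.compress(raw))
    return blocks


def query(blocks, index, block_size=4):
    """Return the value at a position by decompressing only its block."""
    raw = zlib.decompress(blocks[index // block_size])
    count = len(raw) // 4  # 4 bytes per 32-bit float
    return struct.unpack(f"<{count}f", raw)[index % block_size]


if __name__ == "__main__":
    coverage = [0.0, 0.0, 1.5, 2.0, 2.0, 3.25, 0.0]
    packed = compress_blocks(coverage)
    print(query(packed, 5))
```

    Smaller blocks make queries cheaper but give the compressor less context, which is one way to picture the minor overhead in compression rate that the abstract attributes to random-query support.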

  20. An integrative analysis of DNA methylation and RNA-Seq data for human heart, kidney and liver

    Directory of Open Access Journals (Sweden)

    Xie Linglin

    2011-12-01

    Background: Many groups, including our own, have proposed the use of DNA methylation profiles as biomarkers for various disease states. While much research has been done identifying DNA methylation signatures in cancer vs. normal etc., we still lack sufficient knowledge of the role that differential methylation plays during normal cellular differentiation and tissue specification. We also need thorough, genome level studies to determine the meaning of methylation of individual CpG dinucleotides in terms of gene expression. Results: In this study, we have used (insert statistical method here) to compile unique DNA methylation signatures from normal human heart, lung, and kidney using the Illumina Infinium 27K methylation arrays and compared those to gene expression by RNA sequencing. We have identified unique signatures of global DNA methylation for human heart, kidney and liver, and showed that DNA methylation data can be used to correctly classify various tissues. It indicates that DNA methylation reflects tissue specificity and may play an important role in tissue differentiation. The integrative analysis of methylation and RNA-Seq data showed that gene methylation and its transcriptional levels were comprehensively correlated. The location of methylation markers in terms of distance to transcription start site and CpG island showed no effects on the regulation of gene expression by DNA methylation in normal tissues. Conclusions: This study showed that an integrative analysis of methylation array and RNA-Seq data can be utilized to discover the global regulation of gene expression by DNA methylation and suggests that DNA methylation plays an important role in normal tissue differentiation via modulation of gene expression.

  1. Getting the most out of RNA-seq data analysis

    Directory of Open Access Journals (Sweden)

    Tsung Fei Khang

    2015-10-01

    Background. A common research goal in transcriptome projects is to find genes that are differentially expressed in different phenotype classes. Biologists might wish to validate such gene candidates experimentally, or use them for downstream systems biology analysis. Producing a coherent differential gene expression analysis from RNA-seq count data requires an understanding of how numerous sources of variation such as the replicate size, the hypothesized biological effect size, and the specific method for making differential expression calls interact. We believe an explicit demonstration of such interactions in real RNA-seq data sets is of practical interest to biologists. Results. Using two large public RNA-seq data sets (one representing a strong and the other a mild biological effect size), we simulated different replicate size scenarios, and tested the performance of several commonly-used methods for calling differentially expressed genes in each of them. We found that, when biological effect size was mild, RNA-seq experiments should focus on experimental validation of differentially expressed gene candidates. Importantly, at least triplicates must be used, and the differentially expressed genes should be called using methods with high positive predictive value (PPV), such as NOISeq or GFOLD. In contrast, when biological effect size was strong, differentially expressed genes mined from unreplicated experiments using NOISeq, ASC and GFOLD had between 30 to 50% mean PPV, an increase of more than 30-fold compared to the cases of mild biological effect size. Among methods with good PPV performance, having triplicates or more substantially improved mean PPV to over 90% for GFOLD, 60% for DESeq2, 50% for NOISeq, and 30% for edgeR. At a replicate size of six, we found DESeq2 and edgeR to be reasonable methods for calling differentially expressed genes at systems level analysis, as their PPV and sensitivity trade-off were superior to the other methods
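
    The two metrics compared in this abstract follow the standard definitions PPV = TP / (TP + FP) and sensitivity = TP / (TP + FN). A minimal sketch of how they could be computed for one set of differential expression calls is given below; the gene identifiers and the function name `ppv_and_sensitivity` are hypothetical, and the reference set stands in for whatever ground truth a benchmark uses.

```python
def ppv_and_sensitivity(called, truth):
    """Compute PPV and sensitivity for a set of differential expression calls.

    called: set of gene IDs a method reports as differentially expressed.
    truth:  set of gene IDs treated as truly differentially expressed.
    Returns (ppv, sensitivity); a metric is None when its denominator is zero.
    """
    true_pos = len(called & truth)
    false_pos = len(called - truth)
    false_neg = len(truth - called)

    ppv = true_pos / (true_pos + false_pos) if called else None
    sensitivity = true_pos / (true_pos + false_neg) if truth else None
    return ppv, sensitivity


if __name__ == "__main__":
    calls = {"geneA", "geneB", "geneC", "geneD"}
    reference = {"geneA", "geneB", "geneE"}
    print(ppv_and_sensitivity(calls, reference))
```

    A method that calls few genes can reach a high PPV with low sensitivity, which is why the abstract discusses the two as a trade-off.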

  2. Gene Ranking of RNA-Seq Data via Discriminant Non-Negative Matrix Factorization.

    Science.gov (United States)

    Jia, Zhilong; Zhang, Xiang; Guan, Naiyang; Bo, Xiaochen; Barnes, Michael R; Luo, Zhigang

    2015-01-01

    RNA-sequencing is rapidly becoming the method of choice for studying the full complexity of transcriptomes; however, with increasing dimensionality, accurate gene ranking is becoming increasingly challenging. This paper proposes an accurate and sensitive gene ranking method that implements discriminant non-negative matrix factorization (DNMF) for RNA-seq data. To the best of our knowledge, this is the first work to explore the utility of DNMF for gene ranking. When incorporating Fisher's discriminant criteria and setting the reduced dimension as two, DNMF learns two factors to approximate the original gene expression data, abstracting the up-regulated or down-regulated metagene by using the sample label information. The first factor denotes all the genes' weights of two metagenes as the additive combination of all genes, while the second learned factor represents the expression values of two metagenes. In the gene ranking stage, all the genes are ranked as a descending sequence according to the differential values of the metagene weights. Leveraging the nature of NMF and Fisher's criterion, DNMF can robustly boost the gene ranking performance. The Area Under the Curve analysis of differential expression analysis on two benchmarking tests of four RNA-seq data sets with similar phenotypes showed that our proposed DNMF-based gene ranking method outperforms other widely used methods. Moreover, the Gene Set Enrichment Analysis also showed that DNMF outperforms the others. DNMF is also computationally efficient, substantially outperforming all other benchmarked methods. Consequently, we suggest DNMF is an effective method for the analysis of differential gene expression and gene ranking for RNA-seq data.
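
    Only the final ranking stage described in this abstract is sketched here; the factorization itself is not reproduced. The sketch assumes that each gene already has two learned metagene weights and that the "differential value" is the first weight minus the second. That interpretation, the input layout, and the function name `rank_genes` are illustrative assumptions.

```python
def rank_genes(metagene_weights):
    """Rank genes in descending order of the difference between two weights.

    metagene_weights: dict mapping a gene ID to a (weight_1, weight_2) pair,
                      e.g. the gene's weights in two learned metagenes.
    Returns gene IDs sorted from the largest to the smallest difference.
    """
    differences = {
        gene: first - second
        for gene, (first, second) in metagene_weights.items()
    }
    return sorted(differences, key=differences.get, reverse=True)


if __name__ == "__main__":
    weights = {"geneA": (0.9, 0.1), "geneB": (0.2, 0.7), "geneC": (0.5, 0.5)}
    print(rank_genes(weights))
```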

  3. Single-Cell RNA-Seq Reveals the Transcriptional Landscape and Heterogeneity of Aortic Macrophages in Murine Atherosclerosis.

    Science.gov (United States)

    Cochain, Clément; Vafadarnejad, Ehsan; Arampatzi, Panagiota; Jaroslav, Pelisek; Winkels, Holger; Ley, Klaus; Wolf, Dennis; Saliba, Antoine-Emmanuel; Zernecke, Alma

    2018-03-15

    Rationale: It is assumed that atherosclerotic arteries contain several macrophage subsets endowed with specific functions. The precise identity of these subsets is poorly characterized as they have been defined by the expression of a restricted number of markers. Objective: We have applied single-cell RNA-seq as an unbiased profiling strategy to interrogate and classify aortic macrophage heterogeneity at the single-cell level in atherosclerosis. Methods and Results: We performed single-cell RNA sequencing of total aortic CD45+ cells extracted from the non-diseased (chow fed) and atherosclerotic (11 weeks of high fat diet) aorta of Ldlr-/- mice. Unsupervised clustering singled out 13 distinct aortic cell clusters. Among the myeloid cell populations, Resident-like macrophages with a gene expression profile similar to aortic resident macrophages were found in healthy and diseased aortae, whereas monocytes, monocyte-derived dendritic cells (MoDC), and two populations of macrophages were almost exclusively detectable in atherosclerotic aortae, comprising Inflammatory macrophages showing enrichment in Il1b, and previously undescribed TREM2hi macrophages. Differential gene expression and gene ontology enrichment analyses revealed specific gene expression patterns distinguishing these three macrophage subsets and MoDC, and uncovered putative functions of each cell type. Notably, TREM2hi macrophages appeared to be endowed with specialized functions in lipid metabolism and catabolism, and presented a gene expression signature reminiscent of osteoclasts, suggesting a role in lesion calcification. TREM2 expression was moreover detected in human lesional macrophages. Importantly, these macrophage populations were present also in advanced atherosclerosis and in Apoe-/- aortae, indicating relevance of our findings in different stages of atherosclerosis and mouse models. Conclusions: These data unprecedentedly uncovered the transcriptional landscape and phenotypic

  4. Read-Split-Run: an improved bioinformatics pipeline for identification of genome-wide non-canonical spliced regions using RNA-Seq data.

    Science.gov (United States)

    Bai, Yongsheng; Kinne, Jeff; Donham, Brandon; Jiang, Feng; Ding, Lizhong; Hassler, Justin R; Kaufman, Randal J

    2016-08-22

    Most existing tools for detecting next-generation sequencing-based splicing events focus on generic splicing events. Consequently, special types of non-canonical splicing events of short mRNA regions (IRE1α targeted) have not yet been thoroughly addressed at a genome-wide level using bioinformatics approaches in conjunction with next-generation technologies. During endoplasmic reticulum (ER) stress, the gene encoding the RNase Ire1α is known to splice out a short 26 nt region from the mRNA of the transcription factor Xbp1 non-canonically within the cytosol. This causes an open reading frame-shift that induces expression of many downstream genes in reaction to ER stress as part of the unfolded protein response (UPR). We previously published an algorithm termed "Read-Split-Walk" (RSW) to identify non-canonical splicing regions using RNA-Seq data and applied it to ER stress-induced Ire1α heterozygote and knockout mouse embryonic fibroblast cell lines. In this study, we have developed an improved algorithm "Read-Split-Run" (RSR) for detecting genome-wide Ire1α-targeted genes with non-canonical spliced regions at a faster speed. We applied the RSR algorithm using different combinations of several parameters to the previously RSW tested mouse embryonic fibroblast cells (MEF) and the human Encyclopedia of DNA Elements (ENCODE) RNA-Seq data. We also compared the performance of RSR with two other alternative splicing events identification tools (TopHat (Trapnell et al., Bioinformatics 25:1105-1111, 2009) and Alt Event Finder (Zhou et al., BMC Genomics 13:S10, 2012)) utilizing the context of the spliced Xbp1 mRNA as a positive control in the data sets we identified it to be the top cleavage target present in Ire1α (+/-) but absent in Ire1α (-/-) MEF samples and this comparison was also extended to human ENCODE RNA-Seq data. Proof of principle came in our results by the fact that the 26 nt non-conventional splice site in Xbp1 was detected as the top hit by our new RSR

  5. From root to fruit: RNA-Seq analysis shows that arbuscular mycorrhizal symbiosis may affect tomato fruit metabolism.

    Science.gov (United States)

    Zouari, Inès; Salvioli, Alessandra; Chialva, Matteo; Novero, Mara; Miozzi, Laura; Tenore, Gian Carlo; Bagnaresi, Paolo; Bonfante, Paola

    2014-03-21

    Tomato (Solanum lycopersicum) establishes a beneficial symbiosis with arbuscular mycorrhizal (AM) fungi. The formation of the mycorrhizal association in the roots leads to plant-wide modulation of gene expression. To understand the systemic effect of the fungal symbiosis on the tomato fruit, we used RNA-Seq to perform global transcriptome profiling on Moneymaker tomato fruits at the turning ripening stage. Fruits were collected at 55 days after flowering, from plants colonized with Funneliformis mosseae and from control plants, which were fertilized to avoid responses related to nutrient deficiency. Transcriptome analysis identified 712 genes that are differentially expressed in fruits from mycorrhizal and control plants. Gene Ontology (GO) enrichment analysis of these genes showed 81 overrepresented functional GO classes. Up-regulated GO classes include photosynthesis, stress response, transport, amino acid synthesis and carbohydrate metabolism functions, suggesting a general impact of fungal symbiosis on primary metabolisms and, particularly, on mineral nutrition. Down-regulated GO classes include cell wall, metabolism and ethylene response pathways. Quantitative RT-PCR validated the RNA-Seq results for 12 genes out of 14 when tested at three fruit ripening stages, mature green, breaker and turning. Quantification of fruit nutraceutical and mineral contents produced values consistent with the expression changes observed by RNA-Seq analysis. This RNA-Seq profiling produced a novel data set that explores the intersection of mycorrhization and fruit development. We found that the fruits of mycorrhizal plants show two transcriptomic "signatures": genes characteristic of a climacteric fleshy fruit, and genes characteristic of mycorrhizal status, like phosphate and sulphate transporters. Moreover, mycorrhizal plants under low nutrient conditions produce fruits with a nutrient content similar to those from non-mycorrhizal plants under high nutrient conditions

  6. SSP: an interval integer linear programming for de novo transcriptome assembly and isoform discovery of RNA-seq reads.

    Science.gov (United States)

    Safikhani, Zhaleh; Sadeghi, Mehdi; Pezeshk, Hamid; Eslahchi, Changiz

    2013-01-01

    Recent advances in the sequencing technologies have provided a handful of RNA-seq datasets for transcriptome analysis. However, reconstruction of full-length isoforms and estimation of the expression level of transcripts with a low cost are challenging tasks. We propose a novel de novo method named SSP that incorporates interval integer linear programming to resolve alternatively spliced isoforms and reconstruct the whole transcriptome from short reads. Experimental results show that SSP is fast and precise in determining different alternatively spliced isoforms along with the estimation of reconstructed transcript abundances. The SSP software package is available at http://www.bioinf.cs.ipm.ir/software/ssp.

  7. Relating genes to function: identifying enriched transcription factors using the ENCODE ChIP-Seq significance tool.

    Science.gov (United States)

    Auerbach, Raymond K; Chen, Bin; Butte, Atul J

    2013-08-01

    Biological analysis has shifted from identifying genes and transcripts to mapping these genes and transcripts to biological functions. The ENCODE Project has generated hundreds of ChIP-Seq experiments spanning multiple transcription factors and cell lines for public use, but tools for a biomedical scientist to analyze these data are either non-existent or tailored to narrow biological questions. We present the ENCODE ChIP-Seq Significance Tool, a flexible web application leveraging public ENCODE data to identify enriched transcription factors in a gene or transcript list for comparative analyses. The ENCODE ChIP-Seq Significance Tool is written in JavaScript on the client side and has been tested on Google Chrome, Apple Safari and Mozilla Firefox browsers. Server-side scripts are written in PHP and leverage R and a MySQL database. The tool is available at http://encodeqt.stanford.edu. Contact: abutte@stanford.edu. Supplementary material is available at Bioinformatics online.

  8. Site-Specific Incorporation of Functional Components into RNA by an Unnatural Base Pair Transcription System

    Directory of Open Access Journals (Sweden)

    Rie Kawai

    2012-03-01

    Toward the expansion of the genetic alphabet, an unnatural base pair between 7-(2-thienyl)imidazo[4,5-b]pyridine (Ds) and pyrrole-2-carbaldehyde (Pa) functions as a third base pair in replication and transcription, and provides a useful tool for the site-specific, enzymatic incorporation of functional components into nucleic acids. We have synthesized several modified-Pa substrates, such as alkylamino-, biotin-, TAMRA-, FAM-, and digoxigenin-linked PaTPs, and examined their transcription by T7 RNA polymerase using Ds-containing DNA templates with various sequences. The Pa substrates modified with relatively small functional groups, such as alkylamino and biotin, were efficiently incorporated into RNA transcripts at the internal positions, except for those less than 10 bases from the 3′-terminus. We found that the efficient incorporation into a position close to the 3′-terminus of a transcript depended on the natural base contexts neighboring the unnatural base, and that pyrimidine-Ds-pyrimidine sequences in templates were generally favorable, relative to purine-Ds-purine sequences. The unnatural base pair transcription system provides a method for the site-specific functionalization of large RNA molecules.

  9. Discovery of transcription factors and regulatory regions driving in vivo tumor development by ATAC-seq and FAIRE-seq open chromatin profiling.

    Directory of Open Access Journals (Sweden)

    Kristofer Davie

    2015-02-01

    Genomic enhancers regulate spatio-temporal gene expression by recruiting specific combinations of transcription factors (TFs). When TFs are bound to active regulatory regions, they displace canonical nucleosomes, making these regions biochemically detectable as nucleosome-depleted regions or accessible/open chromatin. Here we ask whether open chromatin profiling can be used to identify the entire repertoire of active promoters and enhancers underlying tissue-specific gene expression during normal development and oncogenesis in vivo. To this end, we first compare two different approaches to detect open chromatin in vivo using the Drosophila eye primordium as a model system: FAIRE-seq, based on physical separation of open versus closed chromatin; and ATAC-seq, based on preferential integration of a transposon into open chromatin. We find that both methods reproducibly capture the tissue-specific chromatin activity of regulatory regions, including promoters, enhancers, and insulators. Using both techniques, we screened for regulatory regions that become ectopically active during Ras-dependent oncogenesis, and identified 3778 regions that become (over-)activated during tumor development. Next, we applied motif discovery to search for candidate transcription factors that could bind these regions and identified AP-1 and Stat92E as key regulators. We validated the importance of Stat92E in the development of the tumors by introducing a loss of function Stat92E mutant, which was sufficient to rescue the tumor phenotype. Additionally we tested if the predicted Stat92E responsive regulatory regions are genuine, using ectopic induction of JAK/STAT signaling in developing eye discs, and observed that similar chromatin changes indeed occurred. Finally, we determine that these are functionally significant regulatory changes, as nearby target genes are up- or down-regulated. In conclusion, we show that FAIRE-seq and ATAC-seq based open chromatin profiling

  10. Zipper plot: visualizing transcriptional activity of genomic regions.

    Science.gov (United States)

    Avila Cobos, Francisco; Anckaert, Jasper; Volders, Pieter-Jan; Everaert, Celine; Rombaut, Dries; Vandesompele, Jo; De Preter, Katleen; Mestdagh, Pieter

    2017-05-02

    Reconstructing transcript models from RNA-sequencing (RNA-seq) data and establishing these as independent transcriptional units can be a challenging task. Current state-of-the-art tools for long non-coding RNA (lncRNA) annotation are mainly based on evolutionary constraints, which may result in false negatives due to the overall limited conservation of lncRNAs. To tackle this problem we have developed the Zipper plot, a novel visualization and analysis method that enables users to simultaneously interrogate thousands of human putative transcription start sites (TSSs) in relation to various features that are indicative for transcriptional activity. These include publicly available CAGE-sequencing, ChIP-sequencing and DNase-sequencing datasets. Our method only requires three tab-separated fields (chromosome, genomic coordinate of the TSS and strand) as input and generates a report that includes a detailed summary table, a Zipper plot and several statistics derived from this plot. Using the Zipper plot, we found evidence of transcription for a set of well-characterized lncRNAs and observed that fewer mono-exonic lncRNAs have CAGE peaks overlapping with their TSSs compared to multi-exonic lncRNAs. Using publicly available RNA-seq data, we found more than one hundred cases where junction reads connected protein-coding gene exons with a downstream mono-exonic lncRNA, revealing the need for a careful evaluation of lncRNA 5'-boundaries. Our method is implemented using the statistical programming language R and is freely available as a webtool.
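
    The three-field input described above (chromosome, genomic coordinate of the TSS, strand) can be illustrated with a small parser. This is only a sketch of reading such a file, not part of the Zipper plot tool itself: the names `TSS` and `parse_tss_lines`, the "+"/"-" strand encoding, and the absence of a header row are assumptions made for the example.

```python
import csv
import io
from typing import List, NamedTuple


class TSS(NamedTuple):
    """One putative transcription start site (hypothetical record type)."""
    chromosome: str
    position: int
    strand: str


def parse_tss_lines(text: str) -> List[TSS]:
    """Parse tab-separated lines of the form: chromosome, coordinate, strand.

    Assumes no header row and strands written as "+" or "-".
    """
    records = []
    for row in csv.reader(io.StringIO(text), delimiter="\t"):
        if not row:  # skip blank lines
            continue
        if len(row) != 3:
            raise ValueError(f"expected 3 tab-separated fields, got {len(row)}: {row}")
        chromosome, position, strand = row
        if strand not in {"+", "-"}:
            raise ValueError(f"unexpected strand value: {strand!r}")
        records.append(TSS(chromosome, int(position), strand))
    return records


if __name__ == "__main__":
    example = "chr1\t12345\t+\nchr2\t678\t-\n"
    for record in parse_tss_lines(example):
        print(record)
```

    Validating the field count and strand up front keeps malformed lines from propagating silently into any downstream summary.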

  11. QuickRNASeq lifts large-scale RNA-seq data analyses to the next level of automation and interactive visualization.

    Science.gov (United States)

    Zhao, Shanrong; Xi, Li; Quan, Jie; Xi, Hualin; Zhang, Ying; von Schack, David; Vincent, Michael; Zhang, Baohong

    2016-01-08

    RNA sequencing (RNA-seq), a next-generation sequencing technique for transcriptome profiling, is being increasingly used, in part driven by the decreasing cost of sequencing. Nevertheless, the analysis of the massive amounts of data generated by large-scale RNA-seq remains a challenge. Multiple algorithms pertinent to basic analyses have been developed, and there is an increasing need to automate the use of these tools so as to obtain results in an efficient and user-friendly manner. Increased automation and improved visualization of the results will help make the results and findings of the analyses readily available to experimental scientists. By combining the best open source tools developed for RNA-seq data analyses and the most advanced web 2.0 technologies, we have implemented QuickRNASeq, a pipeline for large-scale RNA-seq data analyses and visualization. The QuickRNASeq workflow consists of three main steps. In Step #1, each individual sample is processed, including mapping RNA-seq reads to a reference genome, counting the numbers of mapped reads, quality control of the aligned reads, and SNP (single nucleotide polymorphism) calling. Step #1 is computationally intensive, and can be processed in parallel. In Step #2, the results from individual samples are merged, and an integrated and interactive project report is generated. All analysis results in the report are accessible via a single HTML entry webpage. Step #3 is the data interpretation and presentation step. The rich visualization features implemented here allow end users to interactively explore the results of RNA-seq data analyses, and to gain more insights into RNA-seq datasets. In addition, we used a real world dataset to demonstrate the simplicity and efficiency of QuickRNASeq in RNA-seq data analyses and interactive visualizations. The seamless integration of automated capabilities with interactive visualizations in QuickRNASeq is not available in other published RNA-seq pipelines. The high degree

  12. Dual RNA-seq reveals no plastic transcriptional response of the coccidian parasite Eimeria falciformis to host immune defenses.

    Science.gov (United States)

    Ehret, Totta; Spork, Simone; Dieterich, Christoph; Lucius, Richard; Heitlinger, Emanuel

    2017-09-05

    Parasites can either respond to differences in immune defenses that exist between individual hosts plastically or, alternatively, follow a genetically canalized ("hard wired") program of infection. Assuming that large-scale functional plasticity would be discernible in the parasite transcriptome we have performed a dual RNA-seq study of the lifecycle of Eimeria falciformis using infected mice with different immune status as models for coccidian infections. We compared parasite and host transcriptomes (dual transcriptome) between naïve and challenge infected mice, as well as between immune competent and immune deficient ones. Mice with different immune competence show transcriptional differences as well as differences in parasite reproduction (oocyst shedding). Broad gene categories represented by differently abundant host genes indicate enrichments for immune reaction and tissue repair functions. More specifically, TGF-beta, EGF, TNF and IL-1 and IL-6 are examples of functional annotations represented differently depending on host immune status. Much in contrast, parasite transcriptomes were neither different between Coccidia isolated from immune competent and immune deficient mice, nor between those harvested from naïve and challenge infected mice. Instead, parasite transcriptomes have distinct profiles early and late in infection, characterized largely by biosynthesis or motility associated functional gene groups, respectively. Extracellular sporozoite and oocyst stages showed distinct transcriptional profiles and sporozoite transcriptomes were found enriched for species specific genes and likely pathogenicity factors. We propose that the niche and host-specific parasite E. falciformis uses a genetically canalized program of infection. This program is likely fixed in an evolutionary process rather than employing phenotypic plasticity to interact with its host. This in turn might limit the potential of the parasite to adapt to new host species or niches, forcing

  13. Full-length single-cell RNA-seq applied to a viral human cancer: applications to HPV expression and splicing analysis in HeLa S3 cells.

    Science.gov (United States)

    Wu, Liang; Zhang, Xiaolong; Zhao, Zhikun; Wang, Ling; Li, Bo; Li, Guibo; Dean, Michael; Yu, Qichao; Wang, Yanhui; Lin, Xinxin; Rao, Weijian; Mei, Zhanlong; Li, Yang; Jiang, Runze; Yang, Huan; Li, Fuqiang; Xie, Guoyun; Xu, Liqin; Wu, Kui; Zhang, Jie; Chen, Jianghao; Wang, Ting; Kristiansen, Karsten; Zhang, Xiuqing; Li, Yingrui; Yang, Huanming; Wang, Jian; Hou, Yong; Xu, Xun

    2015-01-01

    Viral infection causes multiple forms of human cancer, and HPV infection is the primary factor in cervical carcinomas. Recent single-cell RNA-seq studies highlight the tumor heterogeneity present in most cancers, but virally induced tumors have not been studied. HeLa is a well characterized HPV+ cervical cancer cell line. We developed a new high throughput platform to prepare single-cell RNA on a nanoliter scale based on a customized microwell chip. Using this method, we successfully amplified full-length transcripts of 669 single HeLa S3 cells and 40 of them were randomly selected to perform single-cell RNA sequencing. Based on these data, we obtained a comprehensive understanding of the heterogeneity of HeLa S3 cells in gene expression, alternative splicing and fusions. Furthermore, we identified a high diversity of HPV-18 expression and splicing at the single-cell level. By co-expression analysis we identified 283 E6, E7 co-regulated genes, including CDC25, PCNA, PLK4, BUB1B and IRF1 known to interact with HPV viral proteins. Our results reveal the heterogeneity of a virus-infected cell line. It not only provides a transcriptome characterization of HeLa S3 cells at the single cell level, but is a demonstration of the power of single cell RNA-seq analysis of virally infected cells and cancers.

  14. m6A level and isoform characterization sequencing (m6A-LAIC-seq) reveals the census and complexity of the m6A epitranscriptome

    OpenAIRE

    Molinie, Benoit; Wang, Jinkai; Lim, Kok-Seong; Hillebrand, Roman; Lu, Zhi-xiang; Van Wittenberghe, Nicholas; Howard, Benjamin D.; Daneshvar, Kaveh; Mullen, Alan C.; Dedon, Peter; Xing, Yi; Giallourakis, Cosmas C.

    2016-01-01

    N6-Methyladenosine (m6A) is a widespread, reversible chemical modification of RNA molecules, implicated in many aspects of RNA metabolism. Little quantitative information exists as to either how many transcript copies of particular genes are m6A modified (‘m6A levels’) or the relationship of m6A modification(s) to alternative RNA isoforms. To deconvolute the m6A epitranscriptome, we developed m6A-level and isoform-characterization sequencing (m6A-LAIC-seq). We found that cells exhibit a broad...

  15. RNA-Seq for enrichment and analysis of IRF5 transcript expression in SLE.

    Directory of Open Access Journals (Sweden)

    Rivka C Stone

    Polymorphisms in the interferon regulatory factor 5 (IRF5) gene have been consistently replicated and shown to confer risk for or protection from the development of systemic lupus erythematosus (SLE). IRF5 expression is significantly upregulated in SLE patients and upregulation associates with IRF5-SLE risk haplotypes. IRF5 alternative splicing has also been shown to be elevated in SLE patients. Given that human IRF5 exists as multiple alternatively spliced transcripts with distinct function(s), it is important to determine whether the IRF5 transcript profile expressed in healthy donor immune cells is different from that expressed in SLE patients. Moreover, it is not currently known whether an IRF5-SLE risk haplotype defines the profile of IRF5 transcripts expressed. Using standard molecular cloning techniques, we identified and isolated 14 new differentially spliced IRF5 transcript variants from purified monocytes of healthy donors and SLE patients to generate an IRF5 variant transcriptome. Next-generation sequencing was then used to perform in-depth and quantitative analysis of full-length IRF5 transcript expression in primary immune cells of SLE patients and healthy donors. Evidence for additional alternatively spliced transcripts was obtained from de novo junction discovery. Data from these studies support the overall complexity of IRF5 alternative splicing in SLE. Results from next-generation sequencing correlated with cloning and gave similar abundance rankings in SLE patients, thus supporting the use of this new technology for in-depth single gene transcript profiling. Results from this study provide the first proof that (1) SLE patients express an IRF5 transcript signature that is distinct from healthy donors, (2) an IRF5-SLE risk haplotype defines the top four most abundant IRF5 transcripts expressed in SLE patients, and (3) an IRF5 transcript signature enables clustering of SLE patients with the H2 risk haplotype.

  16. Detecting Differential Transcription Factor Activity from ATAC-Seq Data

    Directory of Open Access Journals (Sweden)

    Ignacio J. Tripodi

    2018-05-01

    Transcription factors are managers of the cellular factory, and key components to many diseases. Many non-coding single nucleotide polymorphisms affect transcription factors, either by directly altering the protein or its functional activity at individual binding sites. Here we first briefly summarize high-throughput approaches to studying transcription factor activity. We then demonstrate, using published chromatin accessibility data (specifically ATAC-seq), that the genome-wide profile of TF recognition motifs relative to regions of open chromatin can determine the key transcription factor altered by a perturbation. Our method of determining which TFs are altered by a perturbation is simple, is quick to implement, and can be used when biological samples are limited. In the future, we envision that this method could be applied to determine which TFs show altered activity in response to a wide variety of drugs and diseases.
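
    As a rough illustration of relating motif positions to regions of open chromatin, the sketch below computes the fraction of motif hits that lie within a fixed window of the nearest peak center. It is not the authors' implementation: the function name `fraction_near_peaks`, the 150 bp window, the single-chromosome simplification, and the toy coordinates are all assumptions made for the example.

```python
from bisect import bisect_left
from typing import Sequence


def fraction_near_peaks(
    motif_positions: Sequence[int],
    peak_centers: Sequence[int],
    window: int = 150,
) -> float:
    """Fraction of motif hits within `window` bp of the nearest peak center.

    All coordinates are assumed to lie on one chromosome (a simplification).
    """
    centers = sorted(peak_centers)
    if not motif_positions or not centers:
        return 0.0
    near = 0
    for pos in motif_positions:
        i = bisect_left(centers, pos)
        # The nearest center is either just left or just right of the insertion point.
        candidates = centers[max(i - 1, 0): i + 1]
        if min(abs(pos - c) for c in candidates) <= window:
            near += 1
    return near / len(motif_positions)


if __name__ == "__main__":
    motifs = [100, 1000, 5000]        # hypothetical motif hit coordinates
    control_peaks = [120, 5100]       # hypothetical peak centers, control
    treated_peaks = [120, 980, 5100]  # hypothetical peak centers, perturbed
    print(fraction_near_peaks(motifs, control_peaks))
    print(fraction_near_peaks(motifs, treated_peaks))
```

    Comparing such a fraction between a control and a perturbed sample gives one crude per-motif summary of whether a motif's co-localization with open chromatin has changed; a real analysis would also need a statistical test and per-chromosome handling.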

  17. Single-Cell RNA-Seq Analysis of Infiltrating Neoplastic Cells at the Migrating Front of Human Glioblastoma

    Directory of Open Access Journals (Sweden)

    Spyros Darmanis

    2017-10-01

    Summary: Glioblastoma (GBM) is the most common primary brain cancer in adults and is notoriously difficult to treat because of its diffuse nature. We performed single-cell RNA sequencing (RNA-seq) on 3,589 cells in a cohort of four patients. We obtained cells from the tumor core as well as surrounding peripheral tissue. Our analysis revealed cellular variation in the tumor’s genome and transcriptome. We were also able to identify infiltrating neoplastic cells in regions peripheral to the core lesions. Despite the existence of significant heterogeneity among neoplastic cells, we found that infiltrating GBM cells share a consistent gene signature between patients, suggesting a common mechanism of infiltration. Additionally, in investigating the immunological response to the tumors, we found transcriptionally distinct myeloid cell populations residing in the tumor core and the surrounding peritumoral space. Our data provide a detailed dissection of GBM cell types, revealing an abundance of information about tumor formation and migration. Darmanis et al. perform single-cell transcriptomic analyses of neoplastic and stromal cells within and proximal to primary glioblastomas. The authors describe a population of neoplastic-infiltrating glioblastoma cells as well as a putative role of tumor-infiltrating immune cells in supporting tumor growth. Keywords: single cell, RNA-seq, glioma, glioblastoma, GBM, brain, heterogeneity, infiltrating, diffuse, checkpoint

  18. A high quality Arabidopsis transcriptome for accurate transcript-level analysis of alternative splicing

    KAUST Repository

    Zhang, Runxuan

    2017-04-05

    Alternative splicing generates multiple transcript and protein isoforms from the same gene and thus is important in gene expression regulation. To date, RNA-sequencing (RNA-seq) is the standard method for quantifying changes in alternative splicing on a genome-wide scale. Understanding the current limitations of RNA-seq is crucial for reliable analysis and the lack of high quality, comprehensive transcriptomes for most species, including model organisms such as Arabidopsis, is a major constraint in accurate quantification of transcript isoforms. To address this, we designed a novel pipeline with stringent filters and assembled a comprehensive Reference Transcript Dataset for Arabidopsis (AtRTD2) containing 82,190 non-redundant transcripts from 34,212 genes. Extensive experimental validation showed that AtRTD2 and its modified version, AtRTD2-QUASI, for use in Quantification of Alternatively Spliced Isoforms, outperform other available transcriptomes in RNA-seq analysis. This strategy can be implemented in other species to build a pipeline for transcript-level expression and alternative splicing analyses.

  19. A high quality Arabidopsis transcriptome for accurate transcript-level analysis of alternative splicing

    KAUST Repository

    Zhang, Runxuan; Calixto, Cristiane P. G.; Marquez, Yamile; Venhuizen, Peter; Tzioutziou, Nikoleta A.; Guo, Wenbin; Spensley, Mark; Entizne, Juan Carlos; Lewandowska, Dominika; ten Have, Sara; Frei dit Frey, Nicolas; Hirt, Heribert; James, Allan B.; Nimmo, Hugh G.; Barta, Andrea; Kalyna, Maria; Brown, John W. S.

    2017-01-01

    Alternative splicing generates multiple transcript and protein isoforms from the same gene and thus is important in gene expression regulation. To date, RNA-sequencing (RNA-seq) is the standard method for quantifying changes in alternative splicing on a genome-wide scale. Understanding the current limitations of RNA-seq is crucial for reliable analysis and the lack of high quality, comprehensive transcriptomes for most species, including model organisms such as Arabidopsis, is a major constraint in accurate quantification of transcript isoforms. To address this, we designed a novel pipeline with stringent filters and assembled a comprehensive Reference Transcript Dataset for Arabidopsis (AtRTD2) containing 82,190 non-redundant transcripts from 34,212 genes. Extensive experimental validation showed that AtRTD2 and its modified version, AtRTD2-QUASI, for use in Quantification of Alternatively Spliced Isoforms, outperform other available transcriptomes in RNA-seq analysis. This strategy can be implemented in other species to build a pipeline for transcript-level expression and alternative splicing analyses.

  20. Time-Course Transcriptome Analysis Reveals Resistance Genes of Panax ginseng Induced by Cylindrocarpon destructans Infection Using RNA-Seq.

    Directory of Open Access Journals (Sweden)

    Yuan Gao

    Panax ginseng C. A. Meyer is a highly valued medicinal plant. Cylindrocarpon destructans is a destructive pathogen that causes root rot and significantly reduces the quality and yield of P. ginseng. However, an efficient method to control root rot remains unavailable because of insufficient understanding of the molecular mechanism underlying C. destructans-P. ginseng interaction. In this study, C. destructans-induced transcriptomes at different time points were investigated using RNA sequencing (RNA-Seq). De novo assembly produced 73,335 unigenes for the P. ginseng transcriptome after C. destructans infection, in which 3,839 unigenes were up-regulated. Notably, the abundance of the up-regulated unigenes sharply increased at 0.5 d postinoculation to provide effector-triggered immunity. In total, 24 of 26 randomly selected unigenes could be validated using quantitative reverse transcription PCR (qRT-PCR). Gene ontology enrichment analysis of these unigenes showed that "defense response to fungus", "defense response" and "response to stress" were enriched. In addition, differentially expressed transcription factors involved in the hormone signaling pathways after C. destructans infection were identified. Finally, differentially expressed unigenes involved in reactive oxygen species and ginsenoside biosynthetic pathway during C. destructans infection were identified. To our knowledge, this study is the first to report on the dynamic transcriptome triggered by C. destructans. These results improve our understanding of disease resistance in P. ginseng and provide a useful resource for quick detection of induced markers in P. ginseng before the comprehensive outbreak of this disease caused by C. destructans.

  1. Inference of RNA decay rate from transcriptional profiling highlights the regulatory programs of Alzheimer's disease.

    Science.gov (United States)

    Alkallas, Rached; Fish, Lisa; Goodarzi, Hani; Najafabadi, Hamed S

    2017-10-13

    The abundance of mRNA is mainly determined by the rates of RNA transcription and decay. Here, we present a method for unbiased estimation of differential mRNA decay rate from RNA-sequencing data by modeling the kinetics of mRNA metabolism. We show that in all primary human tissues tested, and particularly in the central nervous system, many pathways are regulated at the mRNA stability level. We present a parsimonious regulatory model consisting of two RNA-binding proteins and four microRNAs that modulate the mRNA stability landscape of the brain, which suggests a new link between RBFOX proteins and Alzheimer's disease. We show that downregulation of RBFOX1 leads to destabilization of mRNAs encoding for synaptic transmission proteins, which may contribute to the loss of synaptic function in Alzheimer's disease. RBFOX1 downregulation is more likely to occur in older and female individuals, consistent with the association of Alzheimer's disease with age and gender. "mRNA abundance is determined by the rates of transcription and decay. Here, the authors propose a method for estimating the rate of differential mRNA decay from RNA-seq data and model mRNA stability in the brain, suggesting a link between mRNA stability and Alzheimer's disease."
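
    The statement that mRNA abundance is set by the rates of transcription and decay can be made concrete with a standard first-order kinetic model. This is an illustrative sketch and not necessarily the formulation used by the authors; the symbols $m$ (mature mRNA abundance), $\alpha$ (transcription rate), $\beta$ (decay rate), and the condition indices 1 and 2 are assumptions introduced for the example.

```latex
\begin{align}
\frac{dm}{dt} &= \alpha - \beta\, m, \\
m^{*} &= \frac{\alpha}{\beta}
  \quad \text{(steady state, } dm/dt = 0\text{)}, \\
\log\frac{m^{*}_{1}}{m^{*}_{2}}
  &= \log\frac{\alpha_{1}}{\alpha_{2}} - \log\frac{\beta_{1}}{\beta_{2}}.
\end{align}
```

    Under this simplified model, any difference in steady-state abundance between two conditions that is not accounted for by a change in transcription rate is attributed to a change in decay rate.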

  2. High-Throughput Sequencing Based Methods of RNA Structure Investigation

    DEFF Research Database (Denmark)

    Kielpinski, Lukasz Jan

    In this thesis we describe the development of four related methods for RNA structure probing that utilize massive parallel sequencing. Using them, we were able to gather structural data for multiple, long molecules simultaneously. First, we have established an easy to follow experimental...... and computational protocol for detecting the reverse transcription termination sites (RTTS-Seq). This protocol was subsequently applied to hydroxyl radical footprinting of three dimensional RNA structures to give a probing signal that correlates well with the RNA backbone solvent accessibility. Moreover, we applied...

  3. Double-stranded RNA transcribed from vector-based oligodeoxynucleotide acts as transcription factor decoy

    International Nuclear Information System (INIS)

    Xiao, Xiao; Gang, Yi; Wang, Honghong; Wang, Jiayin; Zhao, Lina; Xu, Li; Liu, Zhiguo

    2015-01-01

    Highlights:
    • A shRNA vector based transcription factor decoy, VB-ODN, was designed.
    • VB-ODN for NF-κB inhibited cell viability in HEK293 cells.
    • VB-ODN inhibited expression of downstream genes of target transcription factors.
    • VB-ODN may enhance nuclear entry ratio for its feasibility of virus production.

    Abstract: In this study, we designed a short hairpin RNA vector-based oligodeoxynucleotide (VB-ODN) carrying transcription factor (TF) consensus sequence which could function as a decoy to block TF activity. Specifically, VB-ODN for Nuclear factor-κB (NF-κB) could inhibit cell viability and decrease downstream gene expression in HEK293 cells without affecting expression of NF-κB itself. The specific binding between VB-ODN produced double-stranded RNA and NF-κB was evidenced by electrophoretic mobility shift assay. Moreover, similar VB-ODNs designed for three other TFs also inhibit their downstream gene expression but not that of themselves. Our study provides a new design of decoy for blocking TF activity.

  4. Double-stranded RNA transcribed from vector-based oligodeoxynucleotide acts as transcription factor decoy

    Energy Technology Data Exchange (ETDEWEB)

    Xiao, Xiao [State Key Laboratory of Cancer Biology and Xijing Hospital of Digestive Diseases, Xijing Hospital, Fourth Military Medical University, Xi’an 710032, Shaanxi Province (China); Gang, Yi [State Key Laboratory of Cancer Biology and Xijing Hospital of Digestive Diseases, Xijing Hospital, Fourth Military Medical University, Xi’an 710032, Shaanxi Province (China); Department of Infectious Diseases, Tangdu Hospital, Fourth Military Medical University, Xi’an 710038, Shaanxi Province (China); Wang, Honghong [No. 518 Hospital of Chinese People’s Liberation Army, Xi’an 710043, Shaanxi Province (China); Wang, Jiayin [The Genome Institute, Washington University in St. Louis, St. Louis, MO 63108 (United States); Zhao, Lina [Department of Radiation Oncology, Xijing Hospital, Fourth Military Medical University, Xi’an 710032, Shaanxi Province (China); Xu, Li, E-mail: lxuhelen@163.com [State Key Laboratory of Cancer Biology and Xijing Hospital of Digestive Diseases, Xijing Hospital, Fourth Military Medical University, Xi’an 710032, Shaanxi Province (China); Liu, Zhiguo, E-mail: liuzhiguo@fmmu.edu.cn [State Key Laboratory of Cancer Biology and Xijing Hospital of Digestive Diseases, Xijing Hospital, Fourth Military Medical University, Xi’an 710032, Shaanxi Province (China)

    2015-02-06

    Highlights: • A shRNA vector based transcription factor decoy, VB-ODN, was designed. • VB-ODN for NF-κB inhibited cell viability in HEK293 cells. • VB-ODN inhibited expression of downstream genes of target transcription factors. • VB-ODN may enhance nuclear entry ratio for its feasibility of virus production. - Abstract: In this study, we designed a short hairpin RNA vector-based oligodeoxynucleotide (VB-ODN) carrying transcription factor (TF) consensus sequence which could function as a decoy to block TF activity. Specifically, VB-ODN for Nuclear factor-κB (NF-κB) could inhibit cell viability and decrease downstream gene expression in HEK293 cells without affecting expression of NF-κB itself. The specific binding between VB-ODN-produced double-stranded RNA and NF-κB was evidenced by electrophoretic mobility shift assay. Moreover, similar VB-ODNs designed for three other TFs also inhibit their downstream gene expression but not that of themselves. Our study provides a new design of decoy for blocking TF activity.

  5. Complementarity of SOMAscan to LC-MS/MS and RNA-seq for quantitative profiling of human embryonic and mesenchymal stem cells.

    Science.gov (United States)

    Billing, Anja M; Ben Hamidane, Hisham; Bhagwat, Aditya M; Cotton, Richard J; Dib, Shaima S; Kumar, Pankaj; Hayat, Shahina; Goswami, Neha; Suhre, Karsten; Rafii, Arash; Graumann, Johannes

    2017-01-06

    -derived), we included the aptamer-based SOMAscan assay, complementing LC-MS/MS and RNA-seq data. Furthermore, SOMAscan, a targeted proteomics platform developed for analyzing clinical samples, has been benchmarked against established analytical platforms (LC-MS/MS and RNA-seq) using stem cell comparisons as a model. Copyright © 2016 The Authors. Published by Elsevier B.V. All rights reserved.

  6. Transcriptomic profiling of linolenic acid-responsive genes in ROS signalling from RNA-seq data in Arabidopsis

    Directory of Open Access Journals (Sweden)

    Capilla Mata-Pérez

    2015-03-01

    Linolenic acid (Ln) released from chloroplast membrane galactolipids is a precursor of the phytohormone jasmonic acid (JA). The involvement of this hormone in different plant biological processes, such as responses to biotic stress conditions, has been extensively studied. However, the role of Ln in the regulation of gene expression during abiotic stress situations mediated by cellular redox changes and/or by oxidative stress processes remains poorly understood. An RNA-seq approach has increased our knowledge of the interplay among Ln, oxidative stress and ROS signalling that mediates abiotic stress conditions. Transcriptome analysis with the aid of RNA-seq in the absence of oxidative stress revealed that the incubation of Arabidopsis thaliana cell suspension cultures (ACSC) with Ln resulted in the modulation of 7525 genes, of which 3034 genes showed a 2-fold change, with 533 up-regulated and 2501 down-regulated. RNA-seq data analysis showed that an important set of these genes were associated with the jasmonic acid biosynthetic pathway, including lipoxygenases (LOXs) and allene oxide cyclases (AOCs). In addition, several transcription factor families involved in the response to biotic stress conditions (pathogen attacks or herbivore feeding), such as WRKY, JAZ, MYC and LRR, were also modified in response to Ln. However, this study also shows that Ln has the capacity to modulate the expression of genes involved in the response to abiotic stress conditions, particularly those mediated by ROS signalling. In this regard, we were able to identify new targets such as galactinol synthase 1 (GOLS1), methionine sulfoxide reductase (MSR) and alkenal reductase in ACSC. It is therefore possible to suggest that, in the absence of any oxidative stress, Ln is capable of modulating new sets of genes involved in the signalling mechanism mediated by additional abiotic stresses (salinity, UV and high light intensity) and especially in stresses mediated by ROS.

  7. Rcount: simple and flexible RNA-Seq read counting.

    Science.gov (United States)

    Schmid, Marc W; Grossniklaus, Ueli

    2015-02-01

    Analysis of differential gene expression by RNA sequencing (RNA-Seq) is frequently done using feature counts, i.e. the number of reads mapping to a gene. However, commonly used count algorithms (e.g. HTSeq) do not address the problem of reads aligning with multiple locations in the genome (multireads) or reads aligning with positions where two or more genes overlap (ambiguous reads). Rcount specifically addresses these issues. Furthermore, Rcount allows the user to assign priorities to certain feature types (e.g. higher priority for protein-coding genes compared to rRNA-coding genes) or to add flanking regions. Rcount provides a fast and easy-to-use graphical user interface requiring no command line or programming skills. It is implemented in C++ using the SeqAn (www.seqan.de) and the Qt libraries (qt-project.org). Source code and 64 bit binaries for (Ubuntu) Linux, Windows (7) and MacOSX are released under the GPLv3 license and are freely available on github.com/MWSchmid/Rcount. Contact: marcschmid@gmx.ch. Supplementary information: Test data, genome annotation files, useful Python and R scripts and a step-by-step user guide (including run-time and memory usage tests) are available on github.com/MWSchmid/Rcount. © The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

  8. Increased frequency of single base substitutions in a population of transcripts expressed in cancer cells

    Directory of Open Access Journals (Sweden)

    Bianchetti Laurent

    2012-11-01

    Background: Single Base Substitutions (SBS) that alter transcripts expressed in cancer originate from somatic mutations. However, recent studies report SBS in transcripts that are not supported by the genomic DNA of tumor cells. Methods: We used sequence based whole genome expression profiling, namely Long-SAGE (L-SAGE) and Tag-seq (a combination of L-SAGE and deep sequencing), and computational methods to identify transcripts with greater SBS frequencies in cancer. Millions of tags produced by 40 healthy and 47 cancer L-SAGE experiments were compared to 1,959 Reference Tags (RT), i.e. tags matching the human genome exactly once. Similarly, tens of millions of tags produced by 7 healthy and 8 cancer Tag-seq experiments were compared to 8,572 RT. For each transcript, SBS frequencies in healthy and cancer cells were statistically tested for equality. Results: In the L-SAGE and Tag-seq experiments, 372 and 4,289 transcripts, respectively, showed greater SBS frequencies in cancer. Increased SBS frequencies could not be attributed to known Single Nucleotide Polymorphisms (SNP), catalogued somatic mutations or RNA-editing enzymes. Hypothesizing that Single Tags (ST), i.e. tags sequenced only once, were indicators of SBS, we observed that ST proportions were heterogeneously distributed across Embryonic Stem Cells (ESC), healthy differentiated and cancer cells. ESC had the lowest ST proportions, whereas cancer cells had the greatest. Finally, in a series of experiments carried out on a single patient at 1 healthy and 3 consecutive tumor stages, we could show that SBS frequencies increased during cancer progression. Conclusion: If the mechanisms generating the base substitutions could be known, increased SBS frequency in transcripts would be a new useful biomarker of cancer. With the reduction of sequencing cost, sequence based whole genome expression profiling could be used to characterize increased SBS frequency in a patient’s tumor and aid diagnosis.

  9. Increased frequency of single base substitutions in a population of transcripts expressed in cancer cells

    International Nuclear Information System (INIS)

    Bianchetti, Laurent; Kieffer, David; Féderkeil, Rémi; Poch, Olivier

    2012-01-01

    Single Base Substitutions (SBS) that alter transcripts expressed in cancer originate from somatic mutations. However, recent studies report SBS in transcripts that are not supported by the genomic DNA of tumor cells. We used sequence based whole genome expression profiling, namely Long-SAGE (L-SAGE) and Tag-seq (a combination of L-SAGE and deep sequencing), and computational methods to identify transcripts with greater SBS frequencies in cancer. Millions of tags produced by 40 healthy and 47 cancer L-SAGE experiments were compared to 1,959 Reference Tags (RT), i.e. tags matching the human genome exactly once. Similarly, tens of millions of tags produced by 7 healthy and 8 cancer Tag-seq experiments were compared to 8,572 RT. For each transcript, SBS frequencies in healthy and cancer cells were statistically tested for equality. In the L-SAGE and Tag-seq experiments, 372 and 4,289 transcripts, respectively, showed greater SBS frequencies in cancer. Increased SBS frequencies could not be attributed to known Single Nucleotide Polymorphisms (SNP), catalogued somatic mutations or RNA-editing enzymes. Hypothesizing that Single Tags (ST), i.e. tags sequenced only once, were indicators of SBS, we observed that ST proportions were heterogeneously distributed across Embryonic Stem Cells (ESC), healthy differentiated and cancer cells. ESC had the lowest ST proportions, whereas cancer cells had the greatest. Finally, in a series of experiments carried out on a single patient at 1 healthy and 3 consecutive tumor stages, we could show that SBS frequencies increased during cancer progression. If the mechanisms generating the base substitutions could be known, increased SBS frequency in transcripts would be a new useful biomarker of cancer. With the reduction of sequencing cost, sequence based whole genome expression profiling could be used to characterize increased SBS frequency in a patient’s tumor and aid diagnosis.

  10. Identification of innate lymphoid cells in single-cell RNA-Seq data.

    Science.gov (United States)

    Suffiotti, Madeleine; Carmona, Santiago J; Jandus, Camilla; Gfeller, David

    2017-07-01

    Innate lymphoid cells (ILCs) consist of natural killer (NK) cells and non-cytotoxic ILCs that are broadly classified into ILC1, ILC2, and ILC3 subtypes. These cells recently emerged as important early effectors of innate immunity for their roles in tissue homeostasis and inflammation. Over the last few years, ILCs have been extensively studied in mouse and human at the functional and molecular level, including gene expression profiling. However, sorting ILCs with flow cytometry for gene expression analysis is a delicate and time-consuming process. Here we propose and validate a novel framework for studying ILCs at the transcriptomic level using single-cell RNA-Seq data. Our approach combines unsupervised clustering and a new cell type classifier trained on mouse ILC gene expression data. We show that this approach can accurately identify different ILCs, especially ILC2 cells, in human lymphocyte single-cell RNA-Seq data. Our new model relies only on genes conserved across vertebrates, thereby making it in principle applicable in any vertebrate species. Considering the rapid increase in throughput of single-cell RNA-Seq technology, our work provides a computational framework for studying ILC2 cells in single-cell transcriptomic data and may help explore their conservation in distant vertebrate species.

  11. Enhancement of single guide RNA transcription for efficient CRISPR/Cas-based genomic engineering.

    Science.gov (United States)

    Ui-Tei, Kumiko; Maruyama, Shohei; Nakano, Yuko

    2017-06-01

    Genomic engineering using clustered regularly interspaced short palindromic repeats (CRISPR) and CRISPR-associated (Cas) protein is a promising approach for targeting the genomic DNA of virtually any organism in a sequence-specific manner. Recent remarkable advances in CRISPR/Cas technology have made it a feasible system for use in therapeutic applications and biotechnology. In the CRISPR/Cas system, a guide RNA (gRNA), interacting with the Cas protein, recognizes a genomic region with sequence complementarity, and the double-stranded DNA at the target site is cleaved by the Cas protein. A widely used gRNA is an RNA polymerase III (pol III)-driven single gRNA (sgRNA), which is produced by artificial fusion of CRISPR RNA (crRNA) and trans-activating crRNA (tracrRNA). However, we identified a TTTT stretch, known as a termination signal of RNA pol III, in the scaffold region of the sgRNA. Here, we revealed that sgRNA carrying a TTTT stretch reduces the efficiency of sgRNA transcription due to premature transcriptional termination, and decreases the efficiency of genome editing. Unexpectedly, it was also shown that the prematurely terminated sgRNA may have an adverse effect of inducing RNA interference. Such disadvantageous effects were avoided by substituting one base in the TTTT stretch.
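
    The TTTT check described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' code: the function names, the choice of which T to replace, the replacement base "C", and the toy sequence are all assumptions made for illustration, since the abstract does not state which base was substituted.

```python
import re


def find_pol3_terminators(seq, min_run=4):
    """Return (start, length) for every run of at least `min_run` T/U bases."""
    dna = seq.upper().replace("U", "T")
    pattern = re.compile("T{%d,}" % min_run)
    return [(m.start(), len(m.group())) for m in pattern.finditer(dna)]


def break_run(seq, run_start, replacement="C"):
    """Replace the fourth base of a T run with `replacement`.

    Hypothetical choice: the abstract only says that one base was changed.
    """
    idx = run_start + 3
    return seq[:idx] + replacement + seq[idx + 1:]


# Toy sequence (made up for this example, not a real sgRNA scaffold).
toy = "GTTTTAGAGCTAGAAATAGC"
runs = find_pol3_terminators(toy)
fixed = break_run(toy, runs[0][0])
print(runs, fixed, find_pol3_terminators(fixed))
```

    In a real scaffold the replaced base may sit in a base-paired stem, so the substitution would need to be checked against the RNA structure; the sketch only covers the sequence scan.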

  12. Effect of method of deduplication on estimation of differential gene expression using RNA-seq

    Directory of Open Access Journals (Sweden)

    Anna V. Klepikova

    2017-03-01

    Background: RNA-seq is a useful tool for analysis of gene expression. However, its robustness is greatly affected by a number of artifacts. One of them is the presence of duplicated reads. Results: To infer the influence of different methods of removal of duplicated reads on estimation of gene expression in cancer genomics, we analyzed paired samples of hepatocellular carcinoma (HCC) and non-tumor liver tissue. Four protocols of data analysis were applied to each sample: processing without deduplication, deduplication using a method implemented in SAMtools, and deduplication based on one or two molecular indices (MI). We also analyzed the influence of sequencing layout (single read or paired end) and read length. We found that deduplication without MI greatly affects estimated expression values; this effect is the most pronounced for highly expressed genes. Conclusion: The use of unique molecular identifiers greatly improves accuracy of RNA-seq analysis, especially for highly expressed genes. We developed a set of scripts that enable handling of MI and their incorporation into RNA-seq analysis pipelines. Deduplication without MI affects results of differential gene expression analysis, producing a high proportion of false negative results. Omitting duplicate read removal biases results towards false positives. In those cases where using MI is not possible, we recommend using paired-end sequencing layout.
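
    The difference between position-only deduplication and deduplication with a molecular index can be sketched in Python as follows. This is an illustrative toy under stated assumptions, not the authors' scripts and not the SAMtools algorithm: the `Read` record, the `dedup` function, and the example reads are hypothetical.

```python
from collections import namedtuple

# Hypothetical minimal read record: mapping coordinates plus a molecular index.
Read = namedtuple("Read", "chrom pos strand umi")


def dedup(reads, use_umi):
    """Keep the first read seen for each duplicate key.

    Without a molecular index, the key is the mapping position only, so
    distinct molecules that map to the same position collapse into one.
    With a molecular index, reads collapse only if the index matches too.
    """
    seen = set()
    kept = []
    for r in reads:
        key = (r.chrom, r.pos, r.strand)
        if use_umi:
            key = key + (r.umi,)
        if key not in seen:
            seen.add(key)
            kept.append(r)
    return kept


# Three reads at one position: two PCR copies of one molecule (index AAC)
# and one read from a second molecule (index GTT).
example_reads = [
    Read("chr1", 1000, "+", "AAC"),
    Read("chr1", 1000, "+", "AAC"),
    Read("chr1", 1000, "+", "GTT"),
]
print(len(example_reads), len(dedup(example_reads, False)), len(dedup(example_reads, True)))
```

    In this toy case the raw count is 3, position-only deduplication keeps 1 read, and index-aware deduplication keeps 2, which matches the number of original molecules. Highly expressed genes have many distinct molecules sharing a start position, which is one way to read the abstract's finding that the effect is strongest for them.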

  13. Differential stress transcriptome landscape of historic and recently emerged hypervirulent strains of Clostridium difficile strains determined using RNA-seq.

    Directory of Open Access Journals (Sweden)

    Joy Scaria

    C. difficile is the most common cause of nosocomial diarrhea in North America and Europe. Genomes of individual strains of C. difficile are highly divergent. To determine how divergent strains respond to environmental changes, the transcriptomes of two historic and two recently isolated hypervirulent strains were analyzed following nutrient shift and osmotic shock. Illumina-based RNA-seq was used to sequence these transcriptomes. Our results reveal that although C. difficile strains contain a large number of shared and strain-specific genes, the majority of the differentially expressed genes were core genes. We also detected a number of transcriptionally active regions that were not part of the primary genome annotation. Some of these are likely to be small regulatory RNAs.

  14. Marginal likelihood estimation of negative binomial parameters with applications to RNA-seq data.

    Science.gov (United States)

    León-Novelo, Luis; Fuentes, Claudio; Emerson, Sarah

    2017-10-01

    RNA-Seq data characteristically exhibits large variances, which need to be appropriately accounted for in any proposed model. We first explore the effects of this variability on the maximum likelihood estimator (MLE) of the dispersion parameter of the negative binomial distribution, and propose instead to use an estimator obtained via maximization of the marginal likelihood in a conjugate Bayesian framework. We show, via simulation studies, that the marginal MLE can better control this variation and produce a more stable and reliable estimator. We then formulate a conjugate Bayesian hierarchical model, and use this new estimator to propose a Bayesian hypothesis test to detect differentially expressed genes in RNA-Seq data. We use numerical studies to show that our much simpler approach is competitive with other negative binomial based procedures, and we use a real data set to illustrate the implementation and flexibility of the procedure. © The Author 2017. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.

  15. An integrated one-chip-sensor system for microRNA quantitative analysis based on digital droplet polymerase chain reaction

    Science.gov (United States)

    Tsukuda, Masahiko; Wiederkehr, Rodrigo Sergio; Cai, Qing; Majeed, Bivragh; Fiorini, Paolo; Stakenborg, Tim; Matsuno, Toshinobu

    2016-04-01

    A silicon microfluidic chip was developed for microRNA (miRNA) quantitative analysis. It performs sequentially reverse transcription and polymerase chain reaction in a digital droplet format. Individual processes take place on different cavities, and reagent and sample mixing is carried out on a chip, prior to entering each compartment. The droplets are generated on a T-junction channel before the polymerase chain reaction step. Also, a miniaturized fluorescence detector was developed, based on an optical pick-up head of digital versatile disc (DVD) and a micro-photomultiplier tube. The chip integrated in the detection system was tested using synthetic miRNA with known concentrations, ranging from 300 to 3,000 templates/µL. Results proved the functionality of the system.
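
    The abstract does not say how template concentration is computed from droplet counts. A common approach in digital PCR is a Poisson correction on the fraction of positive droplets, sketched below in Python. The function name, the droplet volume, and the example counts are assumptions for illustration and are not taken from the paper.

```python
import math


def copies_per_microliter(positive, total, droplet_volume_nl):
    """Estimate template concentration from digital PCR droplet counts.

    Assumes templates are Poisson-distributed across droplets, so the mean
    number of copies per droplet is lambda = -ln(1 - positive / total).
    """
    if total <= 0 or not 0 <= positive < total:
        raise ValueError("need 0 <= positive < total")
    lam = -math.log(1.0 - positive / total)
    return lam / (droplet_volume_nl * 1e-3)  # convert nL to µL


# Hypothetical run: 5,000 of 10,000 droplets fluoresce, 1 nL per droplet.
estimate = copies_per_microliter(5000, 10000, 1.0)
print(round(estimate, 1))
```

    The correction matters because a positive droplet can hold more than one template; counting positives directly would underestimate the concentration as the positive fraction grows.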

  16. Integrative analysis of histone ChIP-seq and transcription data using Bayesian mixture models

    DEFF Research Database (Denmark)

    Klein, Hans-Ulrich; Schäfer, Martin; Porse, Bo T

    2014-01-01

    Histone modifications are a key epigenetic mechanism to activate or repress the transcription of genes. Datasets of matched transcription data and histone modification data obtained by ChIP-seq exist, but methods for integrative analysis of both data types are still rare. Here, we present a novel...

  17. De novo RNA-Seq based transcriptome analysis of Papiliotrema laurentii strain RY1 under nitrogen starvation.

    Science.gov (United States)

    Sarkar, Soumyadev; Chakravorty, Somnath; Mukherjee, Avishek; Bhattacharya, Debanjana; Bhattacharya, Semantee; Gachhui, Ratan

    2018-03-01

    Nitrogen is a key nutrient for all cell forms. Most organisms respond to nitrogen scarcity by slowing down their growth rate. On the contrary, our previous studies have shown that Papiliotrema laurentii strain RY1 has a robust growth under nitrogen starvation. To understand the global regulation that leads to such an extraordinary response, we undertook a de novo approach for transcriptome analysis of the yeast. Close to 33 million sequence reads of high quality for nitrogen-limited and nitrogen-enriched conditions were generated using Illumina NextSeq500. Trinity analysis and clustered transcripts annotation of the reads produced 17,611 unigenes, out of which 14,157 could be annotated. Gene Ontology term analysis generated 44.92% cellular component terms, 39.81% molecular function terms and 15.24% biological process terms. The most overrepresented pathways in general were translation, carbohydrate metabolism, amino acid metabolism, general metabolism, folding, sorting, degradation followed by transport and catabolism, nucleotide metabolism, replication and repair, transcription and lipid metabolism. A total of 4256 Simple Sequence Repeats were identified. Differential gene expression analysis detected 996 P-significant transcripts to reveal transmembrane transport, lipid homeostasis, fatty acid catabolism and translation as the enriched terms which could be essential for Papiliotrema laurentii strain RY1 to adapt during nitrogen deprivation. Transcriptome data was validated by quantitative real-time PCR analysis of twelve transcripts. To the best of our knowledge, this is the first report of Papiliotrema laurentii strain RY1 transcriptome which would play a pivotal role in understanding the biochemistry of the yeast under acute nitrogen stress and this study would be encouraging to initiate extensive investigations into this Papiliotrema system. Copyright © 2017 Elsevier B.V. All rights reserved.

  18. Connections between Transcription Downstream of Genes and cis-SAGe Chimeric RNA.

    Science.gov (United States)

    Chwalenia, Katarzyna; Qin, Fujun; Singh, Sandeep; Tangtrongstittikul, Panjapon; Li, Hui

    2017-11-22

    cis-Splicing between adjacent genes (cis-SAGe) is being recognized as one way to produce chimeric fusion RNAs. However, its detailed mechanism is not clear. A recent study revealed induction of transcription downstream of genes (DoGs) under osmotic stress. Here, we investigated the influence of osmotic stress on cis-SAGe chimeric RNAs and their connection to DoGs. We found the absence of induction of at least some cis-SAGe fusions and/or their corresponding DoGs at early time point(s). In fact, these DoGs and their cis-SAGe fusions are inversely correlated. This negative correlation changed to positive at a later time point. These results suggest a direct competition between the two categories of transcripts when the total pool of readthrough transcripts is limited at an early time point. At a later time point, DoGs and corresponding cis-SAGe fusions are both induced, indicating that total readthrough transcripts become more abundant. Finally, we observed overall enhancement of cis-SAGe chimeric RNAs in KCl-treated samples by RNA-Seq analysis.

  19. FARNA: knowledgebase of inferred functions of non-coding RNA transcripts

    KAUST Repository

    Alam, Tanvir

    2016-10-12

    Non-coding RNA (ncRNA) genes play a major role in control of heterogeneous cellular behavior. Yet, their functions are largely uncharacterized. Currently available databases lack in-depth information on ncRNA functions across the spectrum of various cells/tissues. Here, we present FARNA, a knowledgebase of inferred functions of 10,289 human ncRNA transcripts (2,734 microRNA and 7,555 long ncRNA) in 119 tissues and 177 primary cells of human. Since transcription factors (TFs) and TF co-factors (TcoFs) are crucial components of regulatory machinery for activation of gene transcription, cellular processes and diseases in which TFs and TcoFs are involved suggest functions of the transcripts they regulate. In FARNA, functions of a transcript are inferred from TFs and TcoFs whose genes co-express with the transcript controlled by these TFs and TcoFs in a considered cell/tissue. Transcripts were annotated using statistically enriched GO terms, pathways and diseases across cells/tissues based on the guilt-by-association principle. Expression profiles across cells/tissues based on Cap Analysis of Gene Expression (CAGE) are provided. FARNA, having the most comprehensive function annotation of considered ncRNAs across the widest spectrum of human cells/tissues, has the potential to greatly contribute to our understanding of ncRNA roles and their regulatory mechanisms in human. FARNA can be accessed at: http://cbrc.kaust.edu.sa/farna

  20. FARNA: knowledgebase of inferred functions of non-coding RNA transcripts

    KAUST Repository

    Alam, Tanvir; Uludag, Mahmut; Essack, Magbubah; Salhi, Adil; Ashoor, Haitham; Hanks, John B.; Kapfer, Craig Eric; Mineta, Katsuhiko; Gojobori, Takashi; Bajic, Vladimir B.

    2016-01-01

    Non-coding RNA (ncRNA) genes play a major role in control of heterogeneous cellular behavior. Yet, their functions are largely uncharacterized. Currently available databases lack in-depth information on ncRNA functions across the spectrum of various cells/tissues. Here, we present FARNA, a knowledgebase of inferred functions of 10,289 human ncRNA transcripts (2,734 microRNA and 7,555 long ncRNA) in 119 tissues and 177 primary cells of human. Since transcription factors (TFs) and TF co-factors (TcoFs) are crucial components of regulatory machinery for activation of gene transcription, cellular processes and diseases in which TFs and TcoFs are involved suggest functions of the transcripts they regulate. In FARNA, functions of a transcript are inferred from TFs and TcoFs whose genes co-express with the transcript controlled by these TFs and TcoFs in a considered cell/tissue. Transcripts were annotated using statistically enriched GO terms, pathways and diseases across cells/tissues based on the guilt-by-association principle. Expression profiles across cells/tissues based on Cap Analysis of Gene Expression (CAGE) are provided. FARNA, having the most comprehensive function annotation of considered ncRNAs across the widest spectrum of human cells/tissues, has the potential to greatly contribute to our understanding of ncRNA roles and their regulatory mechanisms in human. FARNA can be accessed at: http://cbrc.kaust.edu.sa/farna

  1. A structured sparse regression method for estimating isoform expression level from multi-sample RNA-seq data.

    Science.gov (United States)

    Zhang, L; Liu, X J

    2016-06-03

    With the rapid development of next-generation high-throughput sequencing technology, RNA-seq has become a standard and important technique for transcriptome analysis. For multi-sample RNA-seq data, the existing expression estimation methods usually deal with each RNA-seq sample separately, and ignore that the read distributions are consistent across multiple samples. In the current study, we propose a structured sparse regression method, SSRSeq, to estimate isoform expression using multi-sample RNA-seq data. SSRSeq uses a nonparametric model to capture the general tendency of the non-uniform read distribution for all genes across multiple samples. Additionally, our method adds a structured sparse regularization, which not only incorporates the sparse specificity between a gene and its corresponding isoform expression levels, but also reduces the effects of noisy reads, especially for lowly expressed genes and isoforms. Four real datasets were used to evaluate our method on isoform expression estimation. Compared with other popular methods, SSRSeq reduced the variance between multiple samples, and produced more accurate isoform expression estimations, and thus more meaningful biological interpretations.

  2. RNA-Seq and iTRAQ Reveal the Dwarfing Mechanism of Dwarf Polish Wheat (Triticum polonicum L.).

    Science.gov (United States)

    Wang, Yi; Xiao, Xue; Wang, Xiaolu; Zeng, Jian; Kang, Houyang; Fan, Xing; Sha, Lina; Zhang, Haiqin; Zhou, Yonghong

    2016-01-01

    The dwarfing mechanism of Rht-dp in dwarf Polish wheat (DPW) is unknown. Each internode of DPW was significantly shorter than that in high Polish wheat (HPW), and the dwarfism was insensitive to photoperiod, abscisic acid (ABA), gibberellin (GA), cytokinin (CK), auxin and brassinolide (BR). To understand the mechanism, three sets of transcripts, DPW, HPW, and a chimeric set (a combination of DPW and HPW), were constructed using RNA sequencing (RNA-Seq). Based on the chimeric transcripts, 2,446 proteins were identified using isobaric tags for relative and absolute quantification (iTRAQ). A total of 108 unigenes and 12 proteins were considered as dwarfism-related differentially expressed genes (DEGs) and differentially expressed proteins (DEPs), respectively. Among these DEGs and DEPs, 6 DEGs and 6 DEPs were found to be involved in flavonoid and S-adenosyl-methionine (SAM) metabolisms; 5 DEGs and 3 DEPs were involved in cellulose metabolism, cell wall plasticity and cell expansion; 2 DEGs were auxin transporters; 2 DEPs were histones; 1 DEP was a peroxidase. These DEGs and DEPs reduced lignin and cellulose contents, increased flavonoid content, possibly decreased S-adenosyl-methionine (SAM) and polyamine contents and increased S-adenosyl-L-homocysteine hydrolase (SAHH) content in DPW stems, which could limit auxin transport and reduce extensibility of the cell wall, ultimately limiting cell expansion (DPW cells were significantly smaller than HPW cells) and causing dwarfism in DPW.

  3. Analysis of the bovine monocyte-derived macrophage response to Mycobacterium avium subspecies paratuberculosis infection using RNA-seq

    Directory of Open Access Journals (Sweden)

    Maura E Casey

    2015-02-01

    Johne’s disease, caused by infection with Mycobacterium avium subsp. paratuberculosis (MAP), is a chronic intestinal disease of ruminants with serious economic consequences for cattle production in the United States and elsewhere. During infection, MAP bacilli are phagocytosed and subvert host macrophage processes, resulting in subclinical infections that can lead to immunopathology and dissemination of disease. Analysis of the host macrophage transcriptome during infection can therefore shed light on the molecular mechanisms and host-pathogen interplay associated with Johne’s disease. Here we describe results of an in vitro study of the bovine monocyte-derived macrophage (MDM) transcriptome response during MAP infection using RNA-seq. MDM were obtained from seven age- and sex-matched Holstein-Friesian cattle and were infected with MAP across a six-hour infection time course with non-infected controls. We observed 245 and 574 differentially expressed genes in MAP-infected versus non-infected control samples (adjusted P value ≤ 0.05) at 2 and 6 hours post-infection, respectively. Functional analyses of these differentially expressed genes, including biological pathway enrichment, highlighted potential functional roles for genes that have not been previously described in the host response to infection with MAP bacilli. In addition, differential expression of pro- and anti-inflammatory cytokine genes, such as those associated with the IL-10 signaling pathway, and other immune-related genes that encode proteins involved in the bovine macrophage response to MAP infection emphasizes the balance between protective host immunity and bacilli survival and proliferation. Systematic comparisons of RNA-seq gene expression results with Affymetrix® microarray data generated from the same experimental samples also demonstrated that RNA-seq represents a superior technology for studying host transcriptional responses to intracellular infection.

  4. Gene expression profiling of non-polyadenylated RNA-seq across species

    Directory of Open Access Journals (Sweden)

    Xiao-Ou Zhang

    2014-12-01

    Transcriptomes are dynamic and unique, with each cell type/tissue, developmental stage and species expressing a different repertoire of RNA transcripts. Most mRNAs and well-characterized long noncoding RNAs are shaped with a 5′ cap and 3′ poly(A) tail, thus conventional transcriptome analyses typically start with the enrichment of poly(A)+ RNAs by oligo(dT) selection, followed by deep sequencing approaches. However, accumulated lines of evidence suggest that many RNA transcripts are processed by alternative mechanisms without 3′ poly(A) tails and, therefore, fail to be enriched by oligo(dT) purification and are absent following deep sequencing analyses. We have described an enrichment strategy to purify non-polyadenylated (poly(A)−/ribo−) RNAs from human total RNAs by removal of both poly(A)+ RNA transcripts and ribosomal RNAs, which led to the identification of many novel RNA transcripts with non-canonical 3′ ends in human. Here, we describe the application of non-polyadenylated RNA-sequencing in rhesus monkey and mouse cell lines/tissue, and further profile the transcription of non-polyadenylated RNAs across species, providing new resources for non-polyadenylated RNA identification and comparison across species.

  5. Comprehensive analysis of RNA-Seq data reveals extensive RNA editing in a human transcriptome

    DEFF Research Database (Denmark)

    Peng, Zhiyu; Cheng, Yanbing; Tan, Bertrand Chin-Ming

    2012-01-01

    RNA editing is a post-transcriptional event that recodes hereditary information. Here we describe a comprehensive profile of the RNA editome of a male Han Chinese individual based on analysis of ∼767 million sequencing reads from poly(A)(+), poly(A)(-) and small RNA samples. We developed a computational pipeline that carefully controls for false positives while calling RNA editing events from genome and whole-transcriptome data of the same individual. We identified 22,688 RNA editing events in noncoding genes and introns, untranslated regions and coding sequences of protein-coding genes. Most changes (∼93%) converted A to I(G), consistent with known editing mechanisms based on adenosine deaminase acting on RNA (ADAR). We also found evidence of other types of nucleotide changes; however, these were validated at lower rates. We found 44 editing sites in microRNAs (miRNAs), suggesting a potential...

  6. A Herpesviral Immediate Early Protein Promotes Transcription Elongation of Viral Transcripts

    Directory of Open Access Journals (Sweden)

    Hannah L. Fox

    2017-06-01

    Herpes simplex virus 1 (HSV-1) genes are transcribed by cellular RNA polymerase II (RNA Pol II). While four viral immediate early proteins (ICP4, ICP0, ICP27, and ICP22) function in some capacity in viral transcription, the mechanism by which ICP22 functions remains unclear. We observed that the FACT complex (comprised of SSRP1 and Spt16) was relocalized in infected cells as a function of ICP22. ICP22 was also required for the association of FACT and the transcription elongation factors SPT5 and SPT6 with viral genomes. We further demonstrated that the FACT complex interacts with ICP22 throughout infection. We therefore hypothesized that ICP22 recruits cellular transcription elongation factors to viral genomes for efficient transcription elongation of viral genes. We reevaluated the phenotype of an ICP22 mutant virus by determining the abundance of all viral mRNAs throughout infection by transcriptome sequencing (RNA-seq). The accumulation of almost all viral mRNAs late in infection was reduced compared to the wild type, regardless of kinetic class. Using chromatin immunoprecipitation sequencing (ChIP-seq), we mapped the location of RNA Pol II on viral genes and found that RNA Pol II levels on the bodies of viral genes were reduced in the ICP22 mutant compared to wild-type virus. In contrast, the association of RNA Pol II with transcription start sites in the mutant was not reduced. Taken together, our results indicate that ICP22 plays a role in recruiting elongation factors like the FACT complex to the HSV-1 genome to allow for efficient viral transcription elongation late in viral infection and ultimately infectious virion production.

  7. Identification of Differentially Expressed Genes Related to Dehydration Resistance in a Highly Drought-Tolerant Pear, Pyrus betulaefolia, as through RNA-Seq.

    Science.gov (United States)

    Li, Kong-Qing; Xu, Xiao-Yong; Huang, Xiao-San

    2016-01-01

    Drought is a major abiotic stress that affects plant growth, development and productivity. Pear is one of the most important deciduous fruit trees in the world, but the mechanisms of drought tolerance in this plant are still unclear. To better understand the molecular basis of the drought stress response, RNA-seq was performed on samples collected before and after dehydration in Pyrus betulaefolia. In total, 19,532 differentially expressed genes (DEGs) were identified. These genes were annotated into 144 Gene Ontology (GO) terms and 18 clusters of orthologous groups (COG) involved in 129 Kyoto Encyclopedia of Genes and Genomes (KEGG) defined pathways. These DEGs comprised 49 (26 up-regulated, 23 down-regulated), 248 (166 up-regulated, 82 down-regulated), 3483 (1295 up-regulated, 2188 down-regulated), 1455 (1065 up-regulated, 390 down-regulated) genes from the 1 h, 3 h and 6 h dehydration-treated samples and the 24 h recovery samples, respectively. RNA-seq was validated by analyzing the expression patterns of 16 randomly selected DEGs by quantitative real-time PCR. Photosynthesis, signal transduction, innate immune response, protein phosphorylation, response to water, response to biotic stimulus, and plant hormone signal transduction were the most significantly enriched GO categories amongst the DEGs. A total of 637 transcription factors were shown to be dehydration responsive. In addition, a number of genes involved in the metabolism and signaling of hormones were significantly affected by the dehydration stress. This dataset provides valuable information regarding the Pyrus betulaefolia transcriptome changes in response to dehydration and may promote identification and functional analysis of potential genes that could be used for improving drought tolerance via genetic engineering of non-model, but economically-important, perennial species.

  8. Quartz-Seq2: a high-throughput single-cell RNA-sequencing method that effectively uses limited sequence reads.

    Science.gov (United States)

    Sasagawa, Yohei; Danno, Hiroki; Takada, Hitomi; Ebisawa, Masashi; Tanaka, Kaori; Hayashi, Tetsutaro; Kurisaki, Akira; Nikaido, Itoshi

    2018-03-09

    High-throughput single-cell RNA-seq methods assign limited unique molecular identifier (UMI) counts as gene expression values to single cells from shallow sequence reads and detect limited gene counts. We thus developed a high-throughput single-cell RNA-seq method, Quartz-Seq2, to overcome these issues. Our improvements in the reaction steps make it possible to effectively convert initial reads to UMI counts, at a rate of 30-50%, and detect more genes. To demonstrate the power of Quartz-Seq2, we analyzed approximately 10,000 transcriptomes from in vitro embryonic stem cells and an in vivo stromal vascular fraction with a limited number of reads.

  9. Quantitative multi-target RNA profiling in Epstein-Barr virus infected tumor cells.

    Science.gov (United States)

    Greijer, A E; Ramayanti, O; Verkuijlen, S A W M; Novalić, Z; Juwana, H; Middeldorp, J M

    2017-03-01

    Epstein-Barr virus (EBV) is etiologically linked to multiple acute, chronic and malignant diseases. Detection of EBV-RNA transcripts in tissues or biofluids besides EBV-DNA can help in diagnosing EBV related syndromes. Sensitive EBV transcription profiling yields new insights on its pathogenic role and may be useful for monitoring virus targeted therapy. Here we describe a multi-gene quantitative RT-PCR profiling method that simultaneously detects a broad spectrum (n=16) of crucial latent and lytic EBV transcripts. These transcripts include (but are not restricted to) EBNA1, EBNA2, LMP1, LMP2, BARTs, EBER1, BARF1 and ZEBRA, Rta, BGLF4 (PK), BXLF1 (TK) and BFRF3 (VCAp18), all of which have been implicated in EBV-driven oncogenesis and viral replication. With this method we determine the number of RNA copies per infected (tumor) cell in bulk populations of various origins. While we confirm the expected RNA profiles within classic EBV latency programs, this sensitive quantitative approach revealed the presence of rare cells undergoing lytic replication. Inducing lytic replication in EBV tumor cells supports apoptosis and is considered a therapeutic approach to treat EBV-driven malignancies. This sensitive multi-primed quantitative RT-PCR approach can provide broader understanding of transcriptional activity in latent and lytic EBV infection and is suitable for monitoring virus-specific therapy responses in patients with EBV associated cancers. Copyright © 2016 The Authors. Published by Elsevier B.V. All rights reserved.

  10. Web-based bioinformatics workflows for end-to-end RNA-seq data computation and analysis in agricultural animal species

    Science.gov (United States)

    Remarkable advances in next-generation sequencing (NGS) technologies, bioinformatics algorithms, and computational technologies have significantly accelerated genomic research. However, complicated NGS data analysis still remains as a major bottleneck. RNA-seq, as one of the major area in the NGS fi...

  11. High-throughput sequencing of human plasma RNA by using thermostable group II intron reverse transcriptases

    Science.gov (United States)

    Qin, Yidan; Yao, Jun; Wu, Douglas C.; Nottingham, Ryan M.; Mohr, Sabine; Hunicke-Smith, Scott; Lambowitz, Alan M.

    2016-01-01

    Next-generation RNA-sequencing (RNA-seq) has revolutionized transcriptome profiling, gene expression analysis, and RNA-based diagnostics. Here, we developed a new RNA-seq method that exploits thermostable group II intron reverse transcriptases (TGIRTs) and used it to profile human plasma RNAs. TGIRTs have higher thermostability, processivity, and fidelity than conventional reverse transcriptases, plus a novel template-switching activity that can efficiently attach RNA-seq adapters to target RNA sequences without RNA ligation. The new TGIRT-seq method enabled construction of RNA-seq libraries from plasma RNA. Analysis of RNA in 1-mL plasma samples from a healthy individual revealed RNA fragments mapping to a diverse population of protein-coding gene and long ncRNAs, which are enriched in intron and antisense sequences, as well as nearly all known classes of small ncRNAs, some of which have never before been seen in plasma. Surprisingly, many of the small ncRNA species were present as full-length transcripts, suggesting that they are protected from plasma RNases in ribonucleoprotein (RNP) complexes and/or exosomes. This TGIRT-seq method is readily adaptable for profiling of whole-cell, exosomal, and miRNAs, and for related procedures, such as HITS-CLIP and ribosome profiling. PMID:26554030

  12. Quantitative miRNA expression analysis: comparing microarrays with next-generation sequencing

    DEFF Research Database (Denmark)

    Willenbrock, Hanni; Salomon, Jesper; Søkilde, Rolf

    2009-01-01

    Recently, next-generation sequencing has been introduced as a promising, new platform for assessing the copy number of transcripts, while the existing microarray technology is considered less reliable for absolute, quantitative expression measurements. Nonetheless, so far, results from the two technologies have only been compared based on biological data, leading to the conclusion that, although they are somewhat correlated, expression values differ significantly. Here, we use synthetic RNA samples, resembling human microRNA samples, to find that microarray expression measures actually correlate better with sample RNA content than expression measures obtained from sequencing data. In addition, microarrays appear highly sensitive and perform equivalently to next-generation sequencing in terms of reproducibility and relative ratio quantification.

  13. Bioinformatics analysis of RNA-seq data revealed critical genes in colon adenocarcinoma.

    Science.gov (United States)

    Xi, W-D; Liu, Y-J; Sun, X-B; Shan, J; Yi, L; Zhang, T-T

    2017-07-01

    RNA-seq data of colon adenocarcinoma (COAD) were analyzed with bioinformatics tools to discover critical genes in the disease. Relevant small molecule drugs, transcription factors (TFs) and microRNAs (miRNAs) were also investigated. RNA-seq data of COAD were downloaded from The Cancer Genome Atlas (TCGA). Differential analysis was performed with package edgeR. False positive discovery (FDR) 1 were set as the cut-offs to screen out differentially expressed genes (DEGs). A gene coexpression network was constructed with package EBcoexpress. GO enrichment analysis was performed for the DEGs in the gene coexpression network with DAVID. Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analysis was also performed for the genes with KOBAS 2.0. Modules were identified with MCODE of Cytoscape. Relevant small molecule drugs were predicted by Connectivity Map. Relevant miRNAs and TFs were searched by WebGestalt. A total of 457 DEGs, including 255 up-regulated and 202 down-regulated genes, were identified from 437 COAD and 39 control samples. A gene coexpression network was constructed containing 40 DEGs and 101 edges. The genes were mainly associated with collagen fibril organization, extracellular matrix organization and translation. Two modules were identified from the gene coexpression network, which were implicated in muscle contraction and extracellular matrix organization, respectively. Several critical genes were disclosed, such as MYH11, COL5A2 and ribosomal proteins. Nine relevant small molecule drugs were identified, such as scriptaid and STOCK1N-35874. Accordingly, a total of 17 TFs and 10 miRNAs related to COAD were acquired, such as ETS2, NFAT, AP4, miR-124A, MiR-9, miR-96 and let-7. Several critical genes and relevant drugs, TFs and miRNAs were revealed in COAD. These findings could advance the understanding of the disease and benefit therapy development.
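
    The DEG screening step described in this abstract (an FDR cut-off combined with a fold-change cut-off) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes Benjamini-Hochberg adjustment, and because the abstract's actual threshold values are not legible in this record, the `fdr_cutoff` and `lfc_cutoff` defaults and the function names below are hypothetical placeholders.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values, in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that the adjusted
    # values stay monotone (each is the minimum over all larger ranks).
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def select_degs(genes, pvalues, log2fc, fdr_cutoff=0.05, lfc_cutoff=1.0):
    """Keep genes passing both cut-offs (placeholder thresholds, see lead-in)."""
    adjusted = benjamini_hochberg(pvalues)
    return [
        gene
        for gene, q, lfc in zip(genes, adjusted, log2fc)
        if q < fdr_cutoff and abs(lfc) > lfc_cutoff
    ]


if __name__ == "__main__":
    genes = ["A", "B", "C", "D"]          # toy gene identifiers
    pvals = [0.01, 0.04, 0.03, 0.005]     # toy raw p-values
    lfcs = [2.0, 0.5, -1.5, -0.2]         # toy log2 fold changes
    print(select_degs(genes, pvals, lfcs))
```

    Filtering on the adjusted value and the absolute log fold change together keeps genes that are both statistically supported and large in effect, in either direction.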

  14. Single Nucleotide Polymorphism Discovery in Bovine Pituitary Gland Using RNA-Seq Technology.

    Science.gov (United States)

    Pareek, Chandra Shekhar; Smoczyński, Rafał; Kadarmideen, Haja N; Dziuba, Piotr; Błaszczyk, Paweł; Sikora, Marcin; Walendzik, Paulina; Grzybowski, Tomasz; Pierzchała, Mariusz; Horbańczuk, Jarosław; Szostak, Agnieszka; Ogluszka, Magdalena; Zwierzchowski, Lech; Czarnik, Urszula; Fraser, Leyland; Sobiech, Przemysław; Wąsowicz, Krzysztof; Gelfand, Brian; Feng, Yaping; Kumar, Dibyendu

    2016-01-01

    Examination of the bovine pituitary gland transcriptome by strand-specific RNA-seq allows detection of putative single nucleotide polymorphisms (SNPs) within potential candidate genes (CGs) or QTLs regions, as well as an understanding of the genomic variations that contribute to economic traits. Here we report a breed-specific model to successfully perform the detection of SNPs in the pituitary gland of young growing bulls representing Polish Holstein-Friesian (HF), Polish Red, and Hereford breeds at three developmental ages viz., six months, nine months, and twelve months. A total of 18 bovine pituitary gland polyA transcriptome libraries were prepared and sequenced using the Illumina NextSeq 500 platform. Sequenced FastQ databases of all 18 young bulls were submitted to NCBI-SRA database with NCBI-SRA accession numbers SRS1296732. For the investigated young bulls, a total of 1,138,823,098 raw paired-end reads with a length of 156 bases were obtained, resulting in approximately 63 million paired-end reads per library. Breed-wise, a total of 515.38, 215.39, and 408.04 million paired-end reads were obtained for Polish HF, Polish Red, and Hereford breeds, respectively. Burrows-Wheeler Aligner (BWA) read alignments showed 93.04%, 94.39%, and 83.46% of the mapped sequencing reads were properly paired to the Polish HF, Polish Red, and Hereford breeds, respectively. Constructed breed-specific SNP-db of three cattle breeds yielded 13,775,885 SNPs. On average 765,326 breed-specific SNPs per young bull were identified. Using two stringent filtering parameters, i.e., a minimum 10 SNP reads per base with an accuracy ≥ 90% and a minimum 10 SNP reads per base with an accuracy = 100%, SNP-db records were trimmed to construct a highly reliable SNP-db. This resulted in a reduction of 95.7% and 96.4% cut-off mark of constructed raw SNP-db. Finally, SNP discoveries using RNA-Seq data were validated by KASP™ SNP genotyping assay. The comprehensive QTLs/CGs analysis of 76 QTLs
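
    The read counts quoted in this abstract can be cross-checked with simple arithmetic: the three breed-wise totals should add up to the overall number of raw paired-end reads, and dividing by the 18 libraries should give roughly 63 million reads per library. The sketch below uses only figures stated in the abstract; the variable names are illustrative.

```python
# Breed-wise paired-end read totals from the abstract, in millions.
breed_totals_million = {
    "Polish HF": 515.38,
    "Polish Red": 215.39,
    "Hereford": 408.04,
}
n_libraries = 18  # number of pituitary polyA libraries stated in the abstract

total_million = sum(breed_totals_million.values())
per_library_million = total_million / n_libraries

print(f"total reads: {total_million:.2f} million")
print(f"reads per library: {per_library_million:.2f} million")
```

    The sum is about 1,138.81 million reads, that is, on the order of 1.14 billion, which is consistent with the stated figure of approximately 63 million paired-end reads per library.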

  15. Single Nucleotide Polymorphism Discovery in Bovine Pituitary Gland Using RNA-Seq Technology.

    Directory of Open Access Journals (Sweden)

    Chandra Shekhar Pareek

    Examination of the bovine pituitary gland transcriptome by strand-specific RNA-seq allows detection of putative single nucleotide polymorphisms (SNPs) within potential candidate genes (CGs) or QTLs regions, as well as an understanding of the genomic variations that contribute to economic traits. Here we report a breed-specific model to successfully perform the detection of SNPs in the pituitary gland of young growing bulls representing Polish Holstein-Friesian (HF), Polish Red, and Hereford breeds at three developmental ages viz., six months, nine months, and twelve months. A total of 18 bovine pituitary gland polyA transcriptome libraries were prepared and sequenced using the Illumina NextSeq 500 platform. Sequenced FastQ databases of all 18 young bulls were submitted to NCBI-SRA database with NCBI-SRA accession numbers SRS1296732. For the investigated young bulls, a total of 1,138,823,098 raw paired-end reads with a length of 156 bases were obtained, resulting in approximately 63 million paired-end reads per library. Breed-wise, a total of 515.38, 215.39, and 408.04 million paired-end reads were obtained for Polish HF, Polish Red, and Hereford breeds, respectively. Burrows-Wheeler Aligner (BWA) read alignments showed 93.04%, 94.39%, and 83.46% of the mapped sequencing reads were properly paired to the Polish HF, Polish Red, and Hereford breeds, respectively. Constructed breed-specific SNP-db of three cattle breeds yielded 13,775,885 SNPs. On average 765,326 breed-specific SNPs per young bull were identified. Using two stringent filtering parameters, i.e., a minimum 10 SNP reads per base with an accuracy ≥ 90% and a minimum 10 SNP reads per base with an accuracy = 100%, SNP-db records were trimmed to construct a highly reliable SNP-db. This resulted in a reduction of 95.7% and 96.4% cut-off mark of constructed raw SNP-db. Finally, SNP discoveries using RNA-Seq data were validated by KASP™ SNP genotyping assay. The comprehensive QTLs/CGs analysis

  16. Massively parallel sequencing, aCGH, and RNA-Seq technologies provide a comprehensive molecular diagnosis of Fanconi anemia.

    Science.gov (United States)

    Chandrasekharappa, Settara C; Lach, Francis P; Kimble, Danielle C; Kamat, Aparna; Teer, Jamie K; Donovan, Frank X; Flynn, Elizabeth; Sen, Shurjo K; Thongthip, Supawat; Sanborn, Erica; Smogorzewska, Agata; Auerbach, Arleen D; Ostrander, Elaine A

    2013-05-30

    Current methods for detecting mutations in Fanconi anemia (FA)-suspected patients are inefficient and often miss mutations. We have applied recent advances in DNA sequencing and genomic capture to the diagnosis of FA. Specifically, we used custom molecular inversion probes or TruSeq-enrichment oligos to capture and sequence FA and related genes, including introns, from 27 samples from the International Fanconi Anemia Registry at The Rockefeller University. DNA sequencing was complemented with custom array comparative genomic hybridization (aCGH) and RNA sequencing (RNA-seq) analysis. aCGH identified deletions/duplications in 4 different FA genes. RNA-seq analysis revealed lack of allele specific expression associated with a deletion and splicing defects caused by missense, synonymous, and deep-in-intron variants. The combination of TruSeq-targeted capture, aCGH, and RNA-seq enabled us to identify the complementation group and biallelic germline mutations in all 27 families: FANCA (7), FANCB (3), FANCC (3), FANCD1 (1), FANCD2 (3), FANCF (2), FANCG (2), FANCI (1), FANCJ (2), and FANCL (3). FANCC mutations are often the cause of FA in patients of Ashkenazi Jewish (AJ) ancestry, and we identified 2 novel FANCC mutations in 2 patients of AJ ancestry. We describe here a strategy for efficient molecular diagnosis of FA.

  17. Quark enables semi-reference-based compression of RNA-seq data.

    Science.gov (United States)

    Sarkar, Hirak; Patro, Rob

    2017-11-01

    The past decade has seen an exponential increase in biological sequencing capacity, and there has been a simultaneous effort to help organize and archive some of the vast quantities of sequencing data that are being generated. Although these developments are tremendous from the perspective of maximizing the scientific utility of available data, they come with heavy costs. The storage and transmission of such vast amounts of sequencing data is expensive. We present Quark, a semi-reference-based compression tool designed for RNA-seq data. Quark makes use of a reference sequence when encoding reads, but produces a representation that can be decoded independently, without the need for a reference. This allows Quark to achieve markedly better compression rates than existing reference-free schemes, while still relieving the burden of assuming a specific, shared reference sequence between the encoder and decoder. We demonstrate that Quark achieves state-of-the-art compression rates, and that, typically, only a small fraction of the reference sequence must be encoded along with the reads to allow reference-free decompression. Quark is implemented in C++11, and is available under a GPLv3 license at www.github.com/COMBINE-lab/quark. Contact: rob.patro@cs.stonybrook.edu. Supplementary data are available at Bioinformatics online. © The Author 2017. Published by Oxford University Press.

  18. SPIRE, a modular pipeline for eQTL analysis of RNA-Seq data, reveals a regulatory hotspot controlling miRNA expression in C. elegans.

    Science.gov (United States)

    Kel, Ivan; Chang, Zisong; Galluccio, Nadia; Romeo, Margherita; Beretta, Stefano; Diomede, Luisa; Mezzelani, Alessandra; Milanesi, Luciano; Dieterich, Christoph; Merelli, Ivan

    2016-10-18

    The interpretation of genome-wide association study is difficult, as it is hard to understand how polymorphisms can affect gene regulation, in particular for trans-regulatory elements located far from their controlling gene. Using RNA or protein expression data as phenotypes, it is possible to correlate their variations with specific genotypes. This technique is usually referred to as expression Quantitative Trait Loci (eQTLs) analysis and only few packages exist for the integration of genotype patterns and expression profiles. In particular, tools are needed for the analysis of next-generation sequencing (NGS) data on a genome-wide scale, which is essential to identify eQTLs able to control a large number of genes (hotspots). Here we present SPIRE (Software for Polymorphism Identification Regulating Expression), a generic, modular and functionally highly flexible pipeline for eQTL processing. SPIRE integrates different univariate and multivariate approaches for eQTL analysis, paying particular attention to the scalability of the procedure in order to support cis- as well as trans-mapping, thus allowing the identification of hotspots in NGS data. In particular, we demonstrated how SPIRE can handle big association study datasets, reproducing published results and improving the identification of trans-eQTLs. Furthermore, we employed the pipeline to analyse novel data concerning the genotypes of two different C. elegans strains (N2 and Hawaii) and related miRNA expression data, obtained using RNA-Seq. A miRNA regulatory hotspot was identified in chromosome 1, overlapping the transcription factor grh-1, known to be involved in the early phases of embryonic development of C. elegans. In a follow-up qPCR experiment we were able to verify most of the predicted eQTLs, as well as to show, for a novel miRNA, a significant difference in the sequences of the two analysed strains of C. elegans. 
SPIRE is publicly available as open source software at , together with some example
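
    The core idea of eQTL analysis described in this abstract, correlating expression variation with genotype, can be illustrated with a single-marker, single-transcript test. This is a minimal sketch under stated assumptions and not SPIRE's implementation: genotypes are coded 0/1 for the two parental strains, the association is an ordinary least-squares regression, and the function name `eqtl_association` is hypothetical.

```python
import math


def eqtl_association(genotypes, expression):
    """Regress expression on genotype; return (slope, Pearson r, t statistic)."""
    n = len(genotypes)
    mean_g = sum(genotypes) / n
    mean_e = sum(expression) / n
    s_gg = sum((g - mean_g) ** 2 for g in genotypes)
    s_ee = sum((e - mean_e) ** 2 for e in expression)
    s_ge = sum((g - mean_g) * (e - mean_e) for g, e in zip(genotypes, expression))
    slope = s_ge / s_gg                      # expression change per allele
    r = s_ge / math.sqrt(s_gg * s_ee)        # strength of the association
    if abs(r) >= 1.0:
        t = math.copysign(math.inf, r)       # perfect fit: t is unbounded
    else:
        t = r * math.sqrt((n - 2) / (1.0 - r * r))  # t with n - 2 d.o.f.
    return slope, r, t


if __name__ == "__main__":
    # Toy data: 0 = first parental allele, 1 = second parental allele.
    genotypes = [0, 0, 0, 1, 1, 1]
    expression = [1.0, 1.2, 0.8, 2.0, 2.2, 1.8]
    print(eqtl_association(genotypes, expression))
```

    A genome-wide scan repeats this test for every marker-transcript pair, which is why scalability and multiple-testing correction matter for trans-eQTL and hotspot detection.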

  19. Analysis of allelic expression patterns in clonal somatic cells by single-cell RNA-seq.

    Science.gov (United States)

    Reinius, Björn; Mold, Jeff E; Ramsköld, Daniel; Deng, Qiaolin; Johnsson, Per; Michaëlsson, Jakob; Frisén, Jonas; Sandberg, Rickard

    2016-11-01

    Cellular heterogeneity can emerge from the expression of only one parental allele. However, it has remained controversial whether, or to what degree, random monoallelic expression of autosomal genes (aRME) is mitotically inherited (clonal) or stochastic (dynamic) in somatic cells, particularly in vivo. Here we used allele-sensitive single-cell RNA-seq on clonal primary mouse fibroblasts and freshly isolated human CD8+ T cells to dissect clonal and dynamic monoallelic expression patterns. Dynamic aRME affected a considerable portion of the cells' transcriptomes, with levels dependent on the cells' transcriptional activity. Notably, clonal aRME was detected, but it was surprisingly scarce. aRME occurs transiently within individual cells, and patterns of aRME are thus primarily scattered throughout somatic cell populations rather than, as previously hypothesized, confined to patches of clonally related cells.

  20. Genome-wide specificity of DNA binding, gene regulation, and chromatin remodeling by TALE- and CRISPR/Cas9-based transcriptional activators.

    Science.gov (United States)

    Polstein, Lauren R; Perez-Pinera, Pablo; Kocak, D Dewran; Vockley, Christopher M; Bledsoe, Peggy; Song, Lingyun; Safi, Alexias; Crawford, Gregory E; Reddy, Timothy E; Gersbach, Charles A

    2015-08-01

    Genome engineering technologies based on the CRISPR/Cas9 and TALE systems are enabling new approaches in science and biotechnology. However, the specificity of these tools in complex genomes and the role of chromatin structure in determining DNA binding are not well understood. We analyzed the genome-wide effects of TALE- and CRISPR-based transcriptional activators in human cells using ChIP-seq to assess DNA-binding specificity and RNA-seq to measure the specificity of perturbing the transcriptome. Additionally, DNase-seq was used to assess genome-wide chromatin remodeling that occurs as a result of their action. Our results show that these transcription factors are highly specific in both DNA binding and gene regulation and are able to open targeted regions of closed chromatin independent of gene activation. Collectively, these results underscore the potential for these technologies to make precise changes to gene expression for gene and cell therapies or fundamental studies of gene function. © 2015 Polstein et al.; Published by Cold Spring Harbor Laboratory Press.

  1. RNA-Seq reveals MicroRNA expression signature and genetic polymorphism associated with growth and muscle quality traits in rainbow trout

    Science.gov (United States)

    The role of microRNA expression and genetic variation in microRNA-binding sites of target genes on growth and muscle quality traits is poorly characterized. We used RNA-Seq approach to investigate their importance on 5 growth and muscle quality traits: whole body weight (WBW), muscle yield, muscle c...

  2. Identification of the miRNA-mRNA regulatory network of small cell osteosarcoma based on RNA-seq.

    Science.gov (United States)

    Xie, Lin; Liao, Yedan; Shen, Lida; Hu, Fengdi; Yu, Sunlin; Zhou, Yonghong; Zhang, Ya; Yang, Yihao; Li, Dongqi; Ren, Minyan; Yuan, Zhongqin; Yang, Zuozhang

    2017-06-27

    Small cell osteosarcoma (SCO) is a rare subtype of osteosarcoma characterized by highly aggressive progression and a poor prognosis. The miRNA and mRNA expression profiles of peripheral blood mononuclear cells (PBMCs) were obtained in 3 patients with SCO and 10 healthy individuals using high-throughput RNA-sequencing. We identified 37 dysregulated miRNAs and 1636 dysregulated mRNAs in patients with SCO compared to the healthy controls. Specifically, the 37 dysregulated miRNAs consisted of 27 up-regulated miRNAs and 10 down-regulated miRNAs; the 1636 dysregulated mRNAs consisted of 555 up-regulated mRNAs and 1081 down-regulated mRNAs. The target genes of miRNAs were predicted, and 1334 negative correlations between miRNAs and mRNAs were used to construct an miRNA-mRNA regulatory network. Dysregulated genes were significantly enriched in pathways related to cancer, mTOR signaling and cell cycle signaling. Specifically, hsa-miR-26b-5p, hsa-miR-221-3p and hsa-miR-125b-2-3p were significantly dysregulated miRNAs and exhibited a high degree of connectivity with target genes. Overall, the expression of dysregulated genes in tumor tissues and peripheral blood samples of patients with SCO measured by quantitative real-time polymerase chain reaction corroborated our bioinformatics analyses based on the expression profiles of PBMCs from patients with SCO. Thus, hsa-miR-26b-5p, hsa-miR-221-3p and hsa-miR-125b-2-3p may be involved in SCO tumorigenesis.

  3. Processivity and coupling in messenger RNA transcription.

    Directory of Open Access Journals (Sweden)

    Stuart Aitken

    2010-01-01

    The complexity of messenger RNA processing is now being uncovered by experimental techniques that are capable of detecting individual copies of mRNA in cells, and by quantitative real-time observations that reveal the kinetics. This processing is commonly modelled by permitting mRNA to be transcribed only when the promoter is in the on state. In this simple on/off model, the many processes involved in active transcription are represented by a single reaction. These processes include elongation, which has a minimum time for completion and processing that is not captured in the model. In this paper, we explore the impact on the mRNA distribution of representing the elongation process in more detail. Consideration of the mechanisms of elongation leads to two alternative models of the coupling between the elongating polymerase and the state of the promoter: Processivity allows polymerases to complete elongation irrespective of the promoter state, whereas coupling requires the promoter to be active to produce a full-length transcript. We demonstrate that these alternatives have a significant impact on the predicted distributions. Models are simulated by the Gillespie algorithm, and the third and fourth moments of the resulting distribution are computed in order to characterise the length of the tail, and sharpness of the peak. By this methodology, we show that the moments provide a concise summary of the distribution, showing statistically-significant differences across much of the feasible parameter range. We conclude that processivity is not fully consistent with the on/off model unless the probability of successfully completing elongation is low--as has been observed. The results also suggest that some form of coupling between the promoter and a rate-limiting step in transcription may explain the cell's inability to maintain high mRNA levels at low noise--a prediction of the on/off model that has no supporting evidence.
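
    The simple on/off model and the moment-based summary mentioned in this abstract can be sketched as below. This is an illustrative sketch only: it simulates the basic on/off (two-state promoter) model with the Gillespie algorithm and does not include the paper's detailed elongation, processivity or coupling variants. The rate names (`k_on`, `k_off`, `k_tx`, `k_deg`) and all parameter values are assumptions chosen for the example.

```python
import random


def gillespie_on_off(k_on, k_off, k_tx, k_deg, t_end, rng):
    """Simulate one cell up to t_end; return its final mRNA copy number."""
    t, promoter_on, mrna = 0.0, False, 0
    while True:
        # Propensities: promoter switch, transcription (only when on), decay.
        rates = [k_off if promoter_on else k_on,
                 k_tx if promoter_on else 0.0,
                 k_deg * mrna]
        total = sum(rates)
        t += rng.expovariate(total)          # waiting time to next reaction
        if t > t_end:
            return mrna
        u = rng.random() * total             # choose which reaction fires
        if u < rates[0]:
            promoter_on = not promoter_on
        elif u < rates[0] + rates[1]:
            mrna += 1
        else:
            mrna -= 1


def standardized_moments(samples):
    """Return mean, variance, skewness (3rd) and kurtosis (4th moment)."""
    n = len(samples)
    mean = sum(samples) / n
    var = sum((x - mean) ** 2 for x in samples) / n
    skew = sum((x - mean) ** 3 for x in samples) / n / var ** 1.5
    kurt = sum((x - mean) ** 4 for x in samples) / n / var ** 2
    return mean, var, skew, kurt


if __name__ == "__main__":
    rng = random.Random(1)                   # fixed seed for repeatability
    counts = [gillespie_on_off(1.0, 1.0, 20.0, 1.0, 20.0, rng)
              for _ in range(500)]
    print(standardized_moments(counts))
```

    Skewness reflects the length of the distribution's tail and kurtosis the sharpness of its peak, which is how the abstract uses the third and fourth moments to compare model variants.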

  4. Identification and validation of differentially expressed transcripts by RNA-sequencing of formalin-fixed, paraffin-embedded (FFPE) lung tissue from patients with Idiopathic Pulmonary Fibrosis.

    Science.gov (United States)

    Vukmirovic, Milica; Herazo-Maya, Jose D; Blackmon, John; Skodric-Trifunovic, Vesna; Jovanovic, Dragana; Pavlovic, Sonja; Stojsic, Jelena; Zeljkovic, Vesna; Yan, Xiting; Homer, Robert; Stefanovic, Branko; Kaminski, Naftali

    2017-01-12

    Idiopathic Pulmonary Fibrosis (IPF) is a lethal lung disease of unknown etiology. A major limitation in transcriptomic profiling of lung tissue in IPF has been a dependence on snap-frozen fresh tissues (FF). In this project we sought to determine whether genome scale transcript profiling using RNA Sequencing (RNA-Seq) could be applied to archived Formalin-Fixed Paraffin-Embedded (FFPE) IPF tissues. We isolated total RNA from 7 IPF and 5 control FFPE lung tissues and performed 50 base pair paired-end sequencing on Illumina HiSeq 2000. TopHat2 was used to map sequencing reads to the human genome. On average ~62 million reads (53.4% of ~116 million reads) were mapped per sample. 4,131 genes were differentially expressed between IPF and controls (1,920 increased and 2,211 decreased; FDR < 0.05). We compared our results to differentially expressed genes calculated from a previously published dataset generated from FF tissues analyzed on Agilent microarrays (GSE47460). The overlap of differentially expressed genes was very high (760 increased and 1,413 decreased, FDR < 0.05). Only 92 differentially expressed genes changed in opposite directions. Pathway enrichment analysis performed using MetaCore confirmed numerous IPF relevant genes and pathways including extracellular remodeling, TGF-beta, and WNT. Gene network analysis of MMP7, a highly differentially expressed gene in both datasets, revealed the same canonical pathways and gene network candidates in RNA-Seq and microarray data. For validation by NanoString nCounter® we selected 35 genes that had a fold change of 2 in at least one dataset (10 discordant, 10 significantly differentially expressed in one dataset only and 15 concordant genes). High concordance of fold change and FDR was observed for each type of the samples (FF vs FFPE) with both microarrays (r = 0.92) and RNA-Seq (r = 0.90) and the number of discordant genes was reduced to four. Our results demonstrate that RNA sequencing of RNA

  5. Quantitative analysis of dengue-2 virus RNA during the extrinsic incubation period in individual Aedes aegypti.

    Science.gov (United States)

    Richardson, Jason; Molina-Cruz, Alvaro; Salazar, Ma Isabel; Black, William

    2006-01-01

    Dengue virus-2 (DENV-2) RNA was quantified from the midgut and legs of individual Aedes aegypti at each of 14 days postinfectious blood meal (dpi) in a DENV-2 susceptible strain from Chetumal, Mexico. A SYBR Green I based strand-specific, quantitative real-time reverse transcription-polymerase chain reaction (RT-PCR) assay was developed. The lower detection and quantitation limits were 20 and 200 copies per reaction, respectively. Amounts of positive- and negative-strand viral RNA were correlated. Numbers of plaque-forming units (PFU) were correlated with DENV-2 RNA copy number in both C6/36 cell cultures and mosquitoes. PFU were consistently lower than RNA copy number by 2-3 log(10). Midgut levels of DENV-2 RNA peaked 8 dpi and fluctuated erratically between 6 and 9 dpi. Copies of DENV-2 RNA varied significantly among infected mosquitoes at each time point. Quantitative real-time RT-PCR is a convenient and reliable method that provides new insights into virus-vector interactions.

  6. Informatics for RNA Sequencing: A Web Resource for Analysis on the Cloud.

    Directory of Open Access Journals (Sweden)

    Malachi Griffith

    2015-08-01

    Massively parallel RNA sequencing (RNA-seq) has rapidly become the assay of choice for interrogating RNA transcript abundance and diversity. This article provides a detailed introduction to fundamental RNA-seq molecular biology and informatics concepts. We make available open-access RNA-seq tutorials that cover cloud computing, tool installation, relevant file formats, reference genomes, transcriptome annotations, quality-control strategies, expression, differential expression, and alternative splicing analysis methods. These tutorials and additional training resources are accompanied by complete analysis pipelines and test datasets made available without encumbrance at www.rnaseq.wiki.

  7. RNA sequencing analysis reveals new findings of hyperbaric oxygen treatment on rats with acute carbon monoxide poisoning.

    Science.gov (United States)

    Wang, Wenlan; Xue, Li; Li, Ya; Li, Rong; Xie, Xiaoping; Bao, Junxiang; Hai, Chunxu; Li, Jinsheng

    2016-01-01

    To elucidate the altered gene network in the brains of carbon monoxide (CO) poisoned rats after treatment with hyperbaric oxygen (HBO₂). RNA sequencing (RNA-seq) analysis was performed to examine differentially expressed genes (DEGs) in brain tissue samples from nine male rats: a normal control group; a CO poisoning group; and an HBO₂ treatment group (three rats/group). Reverse transcription polymerase chain reaction (RT-PCR) and real-time quantitative PCR were used for validation of the DEGs in another 18 male rats (six rats/group). RNA-seq revealed that two genes were upregulated (4.18 and 8.76 log to the base 2 fold change) (p < 0.05) in the CO-poisoned rats relative to the control rats; two genes were upregulated (3.88 and 7.69 log to the base 2 fold change); and 23 genes were downregulated (3.49-15.12 log to the base 2 fold change) (p < 0.05) in the brains of the HBO₂-treated rats relative to the CO-poisoned rats. Target prediction of DEGs by gene network analysis and analysis of pathways affected suggested that regulation of gene expressions of dopamine metabolism and nitric oxide (NO) synthesis were significantly affected by CO poisoning and HBO₂ treatment. Results of RT-PCR and real-time quantitative PCR indicated that four genes (Pomc, GH-1, Pr1 and Fshβ) associated with hormone secretion in the hypothalamic-pituitary system have potential as markers for prognosis of CO. This study is the first RNA-seq analysis profile of HBO₂ treatment on rats with acute CO poisoning. It concludes that changes of hormone secretion in the hypothalamic-pituitary system, dopamine metabolism and NO synthesis involved in brain damage and behavior abnormalities after CO poisoning and HBO₂ therapy may regulate these changes.

  8. Computational Investigations of Post-Transcriptional Regulation

    DEFF Research Database (Denmark)

    Rasmussen, Simon Horskjær

    and miRNA regulation was studied by cross-linking immunoprecipitation (CLIP) and RBP double knockdown experiments. A comprehensive analysis of 107 CLIP datasets of 49 RBPs demonstrated that RBPs modulate miRNA regulation. Results suggest it is mediated by RBP-binding hotspots that likely...... investigated using high-throughput data. Analysis of IMP RIP-seq, iCLIP and RNA-seq datasets identified transcripts associated with cytoplasmic IMP ribonucleoproteins. Many of these transcripts were functionally involved in actin cytoskeletal remodeling. Further analyses of this data permitted estimation...... of a bipartite motif, composed of an AU-rich and a CA-rich domain. In addition, a regulatory motif discovery method was developed and applied to identify motifs using differential expression data and CLIP-data in the above investigations. This thesis increased the understanding of the role of RBPs in mi...

  9. An Enumerative Combinatorics Model for Fragmentation Patterns in RNA Sequencing Provides Insights into Nonuniformity of the Expected Fragment Starting-Point and Coverage Profile.

    Science.gov (United States)

    Prakash, Celine; Haeseler, Arndt Von

    2017-03-01

    RNA sequencing (RNA-seq) has emerged as the method of choice for measuring the expression of RNAs in a given cell population. In most RNA-seq technologies, sequencing the full length of RNA molecules requires fragmentation into smaller pieces. Unfortunately, the issue of nonuniform sequencing coverage across a genomic feature has been a concern in RNA-seq and is attributed to biases for certain fragments in RNA-seq library preparation and sequencing. To investigate the expected coverage obtained from fragmentation, we develop a simple fragmentation model that is independent of bias from the experimental method and is not specific to the transcript sequence. Essentially, we enumerate all configurations for maximal placement of a given fragment length, F, on transcript length, T, to represent every possible fragmentation pattern, from which we compute the expected coverage profile across a transcript. We extend this model to incorporate general empirical attributes such as read length, fragment length distribution, and number of molecules of the transcript. We further introduce the fragment starting-point, fragment coverage, and read coverage profiles. We find that the expected profiles are not uniform and that factors such as fragment length to transcript length ratio, read length to fragment length ratio, fragment length distribution, and number of molecules influence the variability of coverage across a transcript. Finally, we explore a potential application of the model where, with simulations, we show that it is possible to correctly estimate the transcript copy number for any transcript in the RNA-seq experiment.
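    The enumeration idea in the abstract above can be illustrated with a minimal sketch. It rests on assumptions that are not taken from the paper: "maximal placement" is read here as a set of non-overlapping fragments of length F on a transcript of length T such that no further fragment fits into any gap, and every such configuration is weighted equally. The function names are hypothetical.

```python
def maximal_placements(T, F):
    """Yield tuples of 0-based fragment start positions such that fragments of
    length F do not overlap and every unused gap is shorter than F."""
    def extend(pos, starts):
        # pos is the first base not yet covered or skipped.
        # The next fragment may begin after skipping 0..F-1 bases.
        for gap in range(F):
            start = pos + gap
            if start + F <= T:
                yield from extend(start + F, starts + (start,))
        # Stopping here is allowed only if the remaining tail cannot hold a fragment.
        if T - pos < F:
            yield starts
    yield from extend(0, ())


def expected_coverage(T, F):
    """Per-base coverage averaged over all maximal placements (equal weights assumed)."""
    configs = list(maximal_placements(T, F))
    cov = [0] * T
    for starts in configs:
        for s in starts:
            for p in range(s, s + F):
                cov[p] += 1
    return [c / len(configs) for c in cov]


profile = expected_coverage(4, 2)
print(profile)
```

    Even in this toy setting the profile is lower at the transcript ends than in the middle, which is in line with the abstract's statement that the expected profiles are not uniform.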

  10. Computational design of RNA parts, devices, and transcripts with kinetic folding algorithms implemented on multiprocessor clusters.

    Science.gov (United States)

    Thimmaiah, Tim; Voje, William E; Carothers, James M

    2015-01-01

    With progress toward inexpensive, large-scale DNA assembly, the demand for simulation tools that allow the rapid construction of synthetic biological devices with predictable behaviors continues to increase. By combining engineered transcript components, such as ribosome binding sites, transcriptional terminators, ligand-binding aptamers, catalytic ribozymes, and aptamer-controlled ribozymes (aptazymes), gene expression in bacteria can be fine-tuned, with many corollaries and applications in yeast and mammalian cells. The successful design of genetic constructs that implement these kinds of RNA-based control mechanisms requires modeling and analyzing kinetically determined co-transcriptional folding pathways. Transcript design methods using stochastic kinetic folding simulations to search spacer sequence libraries for motifs enabling the assembly of RNA component parts into static ribozyme- and dynamic aptazyme-regulated expression devices with quantitatively predictable functions (rREDs and aREDs, respectively) have been described (Carothers et al., Science 334:1716-1719, 2011). Here, we provide a detailed practical procedure for computational transcript design by illustrating a high throughput, multiprocessor approach for evaluating spacer sequences and generating functional rREDs. This chapter is written as a tutorial, complete with pseudo-code and step-by-step instructions for setting up a computational cluster with an Amazon, Inc. web server and performing the large numbers of kinefold-based stochastic kinetic co-transcriptional folding simulations needed to design functional rREDs and aREDs. The method described here should be broadly applicable for designing and analyzing a variety of synthetic RNA parts, devices and transcripts.

  11. Transcriptomics-based analysis using RNA-Seq of the coconut (Cocos nucifera) leaf in response to yellow decline phytoplasma infection.

    Science.gov (United States)

    Nejat, Naghmeh; Cahill, David M; Vadamalai, Ganesan; Ziemann, Mark; Rookes, James; Naderali, Neda

    2015-10-01

    Invasive phytoplasmas wreak havoc on coconut palms worldwide, leading to high loss of income, food insecurity and extreme poverty of farmers in producing countries. Phytoplasmas as strictly biotrophic insect-transmitted bacterial pathogens instigate distinct changes in developmental processes and defence responses of the infected plants and manipulate plants to their own advantage; however, little is known about the cellular and molecular mechanisms underlying host-phytoplasma interactions. Further, phytoplasma-mediated transcriptional alterations in coconut palm genes have not yet been identified. This study evaluated the whole transcriptome profiles of naturally infected leaves of Cocos nucifera ecotype Malayan Red Dwarf in response to yellow decline phytoplasma from group 16SrXIV, using RNA-Seq technique. Transcriptomics-based analysis reported here identified genes involved in coconut innate immunity. The number of down-regulated genes in response to phytoplasma infection exceeded the number of genes up-regulated. Of the 39,873 differentially expressed unigenes, 21,860 unigenes were suppressed and 18,013 were induced following infection. Comparative analysis revealed that genes associated with defence signalling against biotic stimuli were significantly overexpressed in phytoplasma-infected leaves versus healthy coconut leaves. Genes involving cell rescue and defence, cellular transport, oxidative stress, hormone stimulus and metabolism, photosynthesis reduction, transcription and biosynthesis of secondary metabolites were differentially represented. Our transcriptome analysis unveiled a core set of genes associated with defence of coconut in response to phytoplasma attack, although several novel defence response candidate genes with unknown function have also been identified. This study constitutes valuable sequence resource for uncovering the resistance genes and/or susceptibility genes which can be used as genetic tools in disease resistance breeding.

  12. Downregulation of rRNA transcription triggers cell differentiation.

    Directory of Open Access Journals (Sweden)

    Yuki Hayashi

    Responding to various stimuli is indispensable for the maintenance of homeostasis. The downregulation of ribosomal RNA (rRNA) transcription is one of the mechanisms involved in the response to stimuli by various cellular processes, such as cell cycle arrest and apoptosis. Cell differentiation is caused by intra- and extracellular stimuli and is associated with the downregulation of rRNA transcription as well as reduced cell growth. The downregulation of rRNA transcription during differentiation is considered to contribute to reduced cell growth. However, the downregulation of rRNA transcription can induce various cellular processes; therefore, it may positively regulate cell differentiation. To test this possibility, we specifically downregulated rRNA transcription using actinomycin D or a siRNA for Pol I-specific transcription factor IA (TIF-IA) in HL-60 and THP-1 cells, both of which have differentiation potential. The inhibition of rRNA transcription induced cell differentiation in both cell lines, which was demonstrated by the expression of the common differentiation marker CD11b. Furthermore, TIF-IA knockdown in an ex vivo culture of mouse hematopoietic stem cells increased the percentage of myeloid cells and reduced the percentage of immature cells. We also evaluated whether differentiation was induced via the inhibition of cell cycle progression because rRNA transcription is tightly coupled to cell growth. We found that cell cycle arrest without affecting rRNA transcription did not induce differentiation. To the best of our knowledge, our results demonstrate for the first time that the downregulation of rRNA levels could be a trigger for the induction of differentiation in mammalian cells. Furthermore, this phenomenon was not simply a reflection of cell cycle arrest. Our results provide a novel insight into the relationship between rRNA transcription and cell differentiation.

  13. RNA-seq analyses reveal insights into the function of respiratory nitrate reductase of the diazotroph Herbaspirillum seropedicae.

    Science.gov (United States)

    Bonato, Paloma; Batista, Marcelo B; Camilios-Neto, Doumit; Pankievicz, Vânia C S; Tadra-Sfeir, Michelle Z; Monteiro, Rose Adele; Pedrosa, Fabio O; Souza, Emanuel M; Chubatsu, Leda S; Wassem, Roseli; Rigo, Liu Un

    2016-09-01

    Herbaspirillum seropedicae is a nitrogen-fixing β-proteobacterium that associates with roots of gramineous plants. In silico analyses revealed that H. seropedicae genome has genes encoding a putative respiratory (NAR) and an assimilatory nitrate reductase (NAS). To date, little is known about nitrate metabolism in H. seropedicae, and, as this bacterium cannot respire nitrate, the function of NAR remains unknown. This study aimed to investigate the function of NAR in H. seropedicae and how it metabolizes nitrate in a low aerated-condition. RNA-seq transcriptional profiling in the presence of nitrate allowed us to pinpoint genes important for nitrate metabolism in H. seropedicae, including nitrate transporters and regulatory proteins. Additionally, both RNA-seq data and physiological characterization of a mutant in the catalytic subunit of NAR (narG mutant) showed that NAR is not required for nitrate assimilation but is required for: (i) production of high levels of nitrite, (ii) production of NO and (iii) dissipation of redox power, which in turn lead to an increase in carbon consumption. In addition, wheat plants showed an increase in shoot dry weight only when inoculated with H. seropedicae wild type, but not with the narG mutant, suggesting that NAR is important to H. seropedicae-wheat interaction. © 2016 Society for Applied Microbiology and John Wiley & Sons Ltd.

  14. Selecting between-sample RNA-Seq normalization methods from the perspective of their assumptions.

    Science.gov (United States)

    Evans, Ciaran; Hardin, Johanna; Stoebel, Daniel M

    2017-02-27

    RNA-Seq is a widely used method for studying the behavior of genes under different biological conditions. An essential step in an RNA-Seq study is normalization, in which raw data are adjusted to account for factors that prevent direct comparison of expression measures. Errors in normalization can have a significant impact on downstream analysis, such as inflated false positives in differential expression analysis. An underemphasized feature of normalization is the assumptions on which the methods rely and how the validity of these assumptions can have a substantial impact on the performance of the methods. In this article, we explain how assumptions provide the link between raw RNA-Seq read counts and meaningful measures of gene expression. We examine normalization methods from the perspective of their assumptions, as an understanding of methodological assumptions is necessary for choosing methods appropriate for the data at hand. Furthermore, we discuss why normalization methods perform poorly when their assumptions are violated and how this causes problems in subsequent analysis. To analyze a biological experiment, researchers must select a normalization method with assumptions that are met and that produces a meaningful measure of expression for the given experiment. © The Author 2017. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
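    To make the role of assumptions concrete, the following sketch contrasts two common between-sample scaling schemes on a toy count matrix. It is an illustration only and is not taken from the article: the function names are hypothetical, the data are invented, and the median-of-ratios calculation is a simplified version of the idea used by DESeq-style methods. Total-count scaling assumes that total RNA output is comparable across samples, whereas median-of-ratios assumes that most genes are not differentially expressed.

```python
import numpy as np


def total_count_factors(counts):
    """Scale factors proportional to library size (columns are samples)."""
    lib = counts.sum(axis=0)
    return lib / lib.mean()


def median_of_ratios_factors(counts):
    """Simplified median-of-ratios factors: for each sample, the median ratio of
    its counts to the per-gene geometric mean across samples."""
    expressed = (counts > 0).all(axis=1)          # skip genes with any zero count
    logc = np.log(counts[expressed])
    log_geo_mean = logc.mean(axis=1, keepdims=True)
    return np.exp(np.median(logc - log_geo_mean, axis=0))


# Invented toy data: genes x samples. Only the last gene differs between samples.
counts = np.array([[100, 100],
                   [200, 200],
                   [300, 300],
                   [50, 1000]])

print(total_count_factors(counts))       # differs between samples because of one gene
print(median_of_ratios_factors(counts))  # equal factors: most genes are unchanged
```

    In this toy case a single strongly induced gene inflates the library size of the second sample, so total-count scaling would shrink the three unchanged genes, while the median-based factors leave them equal.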

  15. Regulatory complexity revealed by integrated cytological and RNA-seq analyses of meiotic substages in mouse spermatocytes.

    Science.gov (United States)

    Ball, Robyn L; Fujiwara, Yasuhiro; Sun, Fengyun; Hu, Jianjun; Hibbs, Matthew A; Handel, Mary Ann; Carter, Gregory W

    2016-08-12

    The continuous and non-synchronous nature of postnatal male germ-cell development has impeded stage-specific resolution of molecular events of mammalian meiotic prophase in the testis. Here the juvenile onset of spermatogenesis in mice is analyzed by combining cytological and transcriptomic data in a novel computational analysis that allows decomposition of the transcriptional programs of spermatogonia and meiotic prophase substages. Germ cells from testes of individual mice were obtained at two-day intervals from 8 to 18 days post-partum (dpp), prepared as surface-spread chromatin and immunolabeled for meiotic stage-specific protein markers (STRA8, SYCP3, phosphorylated H2AFX, and HISTH1T). Eight stages were discriminated cytologically by combinatorial antibody labeling, and RNA-seq was performed on the same samples. Independent principal component analyses of cytological and transcriptomic data yielded similar patterns for both data types, providing strong evidence for substage-specific gene expression signatures. A novel permutation-based maximum covariance analysis (PMCA) was developed to map co-expressed transcripts to one or more of the eight meiotic prophase substages, thereby linking distinct molecular programs to cytologically defined cell states. Expression of meiosis-specific genes is not substage-limited, suggesting regulation of substage transitions at other levels. This integrated analysis provides a general method for resolving complex cell populations. Here it revealed not only features of meiotic substage-specific gene expression, but also a network of substage-specific transcription factors and relationships to potential target genes.

  16. Dual RNA-seq transcriptional analysis of wheat roots colonized by Azospirillum brasilense reveals up-regulation of nutrient acquisition and cell cycle genes.

    Science.gov (United States)

    Camilios-Neto, Doumit; Bonato, Paloma; Wassem, Roseli; Tadra-Sfeir, Michelle Z; Brusamarello-Santos, Liziane C C; Valdameri, Glaucio; Donatti, Lucélia; Faoro, Helisson; Weiss, Vinicius A; Chubatsu, Leda S; Pedrosa, Fábio O; Souza, Emanuel M

    2014-05-16

    The rapid growth of the world's population demands an increase in food production that no longer can be reached by increasing amounts of nitrogenous fertilizers. Plant growth promoting bacteria (PGPB) might be an alternative to increase nitrogen use efficiency (NUE) in important crops such as wheat. Azospirillum brasilense is one of the most promising PGPB, and wheat roots colonized by A. brasilense are a good model to investigate the molecular basis of plant-PGPB interaction including improvement in plant-NUE promoted by PGPB. We performed a dual RNA-Seq transcriptional profiling of wheat roots colonized by A. brasilense strain FP2. cDNA libraries from biological replicates of colonized and non-inoculated wheat roots were sequenced and mapped to wheat and A. brasilense reference sequences. The unmapped reads were assembled de novo. Overall, we identified 23,215 wheat expressed ESTs and 702 A. brasilense expressed transcripts. Bacterial colonization caused changes in the expression of 776 wheat ESTs belonging to various functional categories, ranging from transport activity to biological regulation as well as defense mechanism, production of phytohormones and phytochemicals. In addition, genes encoding proteins related to bacterial chemotaxis, biofilm formation and nitrogen fixation were highly expressed in the sub-set of A. brasilense expressed genes. PGPB colonization enhanced the expression of plant genes related to nutrient up-take, nitrogen assimilation, DNA replication and regulation of cell division, which is consistent with a higher proportion of colonized root cells in the S-phase. Our data support the use of PGPB as an alternative to improve nutrient acquisition in important crops such as wheat, enhancing plant productivity and sustainability.

  17. Mapping Mammalian Cell-type-specific Transcriptional Regulatory Networks Using KD-CAGE and ChIP-seq Data in the TC-YIK Cell Line

    Science.gov (United States)

    Lizio, Marina; Ishizu, Yuri; Itoh, Masayoshi; Lassmann, Timo; Hasegawa, Akira; Kubosaki, Atsutaka; Severin, Jessica; Kawaji, Hideya; Nakamura, Yukio; Suzuki, Harukazu; Hayashizaki, Yoshihide; Carninci, Piero; Forrest, Alistair R. R.

    2015-01-01

    Mammals are composed of hundreds of different cell types with specialized functions. Each of these cellular phenotypes is controlled by different combinations of transcription factors. Using a human non islet cell insulinoma cell line (TC-YIK) which expresses insulin and the majority of known pancreatic beta cell specific genes as an example, we describe a general approach to identify key cell-type-specific transcription factors (TFs) and their direct and indirect targets. By ranking all human TFs by their level of enriched expression in TC-YIK relative to a broad collection of samples (FANTOM5), we confirmed known key regulators of pancreatic function and development. Systematic siRNA mediated perturbation of these TFs followed by qRT-PCR revealed their interconnections with NEUROD1 at the top of the regulation hierarchy and its depletion drastically reducing insulin levels. For 15 of the TF knock-downs (KD), we then used Cap Analysis of Gene Expression (CAGE) to identify thousands of their targets genome-wide (KD-CAGE). The data confirm NEUROD1 as a key positive regulator in the transcriptional regulatory network (TRN), and ISL1 and PROX1 as antagonists. As a complementary approach we used ChIP-seq on four of these factors to identify NEUROD1, LMX1A, PAX6, and RFX6 binding sites in the human genome. Examining the overlap between genes perturbed in the KD-CAGE experiments and genes with a ChIP-seq peak within 50 kb of their promoter, we identified direct transcriptional targets of these TFs. Integration of KD-CAGE and ChIP-seq data shows that both NEUROD1 and LMX1A work as the main transcriptional activators. In the core TRN (i.e., TF-TF only), NEUROD1 directly transcriptionally activates the pancreatic TFs HSF4, INSM1, MLXIPL, MYT1, NKX6-3, ONECUT2, PAX4, PROX1, RFX6, ST18, DACH1, and SHOX2, while LMX1A directly transcriptionally activates DACH1, SHOX2, PAX6, and PDX1. Analysis of these complementary datasets suggests the need for caution in interpreting ChIP-seq
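    The overlap step described in this abstract (genes perturbed by a knock-down that also carry a ChIP-seq peak within 50 kb of their promoter) can be sketched as a simple set operation. This is a hedged illustration, not the authors' pipeline: the function name, the data layout, the gene labels and all coordinates are invented, and the promoter is reduced to a single transcription start site position.

```python
def direct_targets(perturbed_genes, promoters, peaks, window=50_000):
    """Return perturbed genes that have at least one peak within `window` bp.

    promoters: dict mapping gene -> (chromosome, promoter position)
    peaks:     list of (chromosome, peak position) tuples
    """
    targets = set()
    for gene in perturbed_genes:
        if gene not in promoters:
            continue  # no promoter annotation available for this gene
        chrom, tss = promoters[gene]
        if any(c == chrom and abs(pos - tss) <= window for c, pos in peaks):
            targets.add(gene)
    return targets


# Invented example data, for illustration only.
promoters = {
    "geneA": ("chr1", 100_000),
    "geneB": ("chr1", 500_000),
    "geneC": ("chr2", 100_000),
}
peaks = [("chr1", 130_000), ("chr2", 400_000)]
perturbed = {"geneA", "geneB", "geneC"}

print(direct_targets(perturbed, promoters, peaks))
```

    Genes that are perturbed but lack a nearby peak would be candidates for indirect targets under this simplified reading.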

  18. Transcriptional and post-transcriptional regulation of nucleotide excision repair genes in human cells

    Energy Technology Data Exchange (ETDEWEB)

    Lefkofsky, Hailey B. [Translational Oncology Program, University of Michigan Medical School, Ann Arbor, MI (United States); Veloso, Artur [Translational Oncology Program, University of Michigan Medical School, Ann Arbor, MI (United States); Department of Radiation Oncology, University of Michigan Medical School, Ann Arbor, MI (United States); Bioinformatics Program, Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI (United States); Ljungman, Mats, E-mail: ljungman@umich.edu [Translational Oncology Program, University of Michigan Medical School, Ann Arbor, MI (United States); Department of Radiation Oncology, University of Michigan Medical School, Ann Arbor, MI (United States); Department of Environmental Health Sciences, School of Public Health, University of Michigan, Ann Arbor, MI (United States)

    2015-06-15

    Nucleotide excision repair (NER) removes DNA helix-distorting lesions induced by UV light and various chemotherapeutic agents such as cisplatin. These lesions efficiently block the elongation of transcription and need to be rapidly removed by transcription-coupled NER (TC-NER) to avoid the induction of apoptosis. Twenty-nine genes have been classified to code for proteins participating in nucleotide excision repair (NER) in human cells. Here we explored the transcriptional and post-transcriptional regulation of these NER genes across 13 human cell lines using Bru-seq and BruChase-seq, respectively. Many NER genes are relatively large in size and therefore will be easily inactivated by UV-induced transcription-blocking lesions. Furthermore, many of these genes produce transcripts that are rather unstable. Thus, these genes are expected to rapidly lose expression leading to a diminished function of NER. One such gene is ERCC6 that codes for the CSB protein critical for TC-NER. Due to its large gene size and high RNA turnover rate, the ERCC6 gene may act as a dosimeter of DNA damage so that at high levels of damage, ERCC6 RNA levels would be diminished leading to the loss of CSB expression, inhibition of TC-NER and the promotion of cell death.

  19. General lack of global dosage compensation in ZZ/ZW systems? Broadening the perspective with RNA-seq

    Directory of Open Access Journals (Sweden)

    Wolf Jochen BW

    2011-02-01

    Background: Species with heteromorphic sex chromosomes face the challenge of large-scale imbalance in gene dose. Microarray-based studies in several independent male heterogametic XX/XY systems suggest that dosage compensation mechanisms are in place to mitigate the detrimental effects of gene dose differences. However, recent genomic research on female heterogametic ZZ/ZW systems has generated surprising results. In two bird species and one lepidopteran no evidence for a global dosage compensating mechanism has been found. The recent advent of massively parallel RNA sequencing now opens up the possibility to gauge the generality of this observation with a broader phylogenetic sampling. It further allows assessing the validity of microarray-based inference on dosage compensation with a novel technology. Results: We here exemplify this approach using massively parallel sequencing on barcoded individuals of a bird species, the European crow (Corvus corone), where previously no genetic resources were available. Testing for Z-linkage with quantitative PCR (qPCR), we first establish that orthology with distantly related species (chicken, zebra finch) can be used as a good predictor for chromosomal affiliation of a gene. We then use a digital measure of gene expression (RNA-seq) on brain transcriptome and confirm a global lack of dosage compensation on the Z chromosome. RNA-seq estimates of male-to-female (m:f) expression difference on the Z compare well to previous microarray-based estimates in birds and lepidopterans. The data further lend support that an up-regulation of female Z-linked genes conveys partial compensation and suggest a relationship between sex-bias and absolute expression level of a gene. Correlation of sex-biased gene expression on the Z chromosome across all three bird species further suggests that the degree of compensation has been partly conserved across 100 million years of avian evolution. Conclusions: This work
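    The male-to-female (m:f) expression comparison mentioned in this abstract can be illustrated with a short sketch. It is not the authors' analysis: the function name, the pseudocount and the numbers are assumptions made for illustration. In a ZZ/ZW system males carry two Z copies and females one, so without any compensation Z-linked genes would be expected to approach a log2 m:f ratio of 1, while autosomal genes would centre near 0.

```python
import numpy as np


def log2_mf_ratio(male, female, pseudocount=1.0):
    """Per-gene log2 ratio of mean male to mean female expression.

    male, female: arrays of shape (genes, individuals) holding normalized expression.
    pseudocount:  small constant (an assumption here) that avoids division by zero.
    """
    m = male.mean(axis=1) + pseudocount
    f = female.mean(axis=1) + pseudocount
    return np.log2(m / f)


# Invented normalized expression values: row 0 mimics an uncompensated Z-linked
# gene, row 1 an autosomal gene.
male = np.array([[20.0, 20.0], [15.0, 15.0]])
female = np.array([[10.0, 10.0], [15.0, 15.0]])

print(log2_mf_ratio(male, female, pseudocount=0.0))
```

    Comparing the distribution of such ratios for Z-linked genes against autosomal genes is one simple way to look for a global lack of compensation; partial compensation would show up as Z-linked ratios between 0 and 1.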

  20. A Comparison of RNA-Seq Results from Paired Formalin-Fixed Paraffin-Embedded and Fresh-Frozen Glioblastoma Tissue Samples.

    Directory of Open Access Journals (Sweden)

    Anna Esteve-Codina

    The molecular classification of glioblastoma (GBM) based on gene expression might better explain outcome and response to treatment than clinical factors. Whole transcriptome sequencing using next-generation sequencing platforms is rapidly becoming accepted as a tool for measuring gene expression for both research and clinical use. Fresh frozen (FF) tissue specimens of GBM are difficult to obtain since tumor tissue obtained at surgery is often scarce and necrotic and diagnosis is prioritized over freezing. After diagnosis, leftover tissue is usually stored as formalin-fixed paraffin-embedded (FFPE) tissue. However, RNA from FFPE tissues is usually degraded, which could hamper gene expression analysis. We compared RNA-Seq data obtained from matched pairs of FF and FFPE GBM specimens. Only three FFPE out of eleven FFPE-FF matched samples yielded informative results. Several quality-control measurements showed that RNA from FFPE samples was highly degraded but maintained transcriptomic similarities to RNA from FF samples. Certain issues regarding mutation analysis and subtype prediction were detected. Nevertheless, our results suggest that RNA-Seq of FFPE GBM specimens provides reliable gene expression data that can be used in molecular studies of GBM if the RNA is sufficiently preserved.

  1. A Poisson Log-Normal Model for Constructing Gene Covariation Network Using RNA-seq Data.

    Science.gov (United States)

    Choi, Yoonha; Coram, Marc; Peng, Jie; Tang, Hua

    2017-07-01

    Constructing expression networks using transcriptomic data is an effective approach for studying gene regulation. A popular approach for constructing such a network is based on the Gaussian graphical model (GGM), in which an edge between a pair of genes indicates that the expression levels of these two genes are conditionally dependent, given the expression levels of all other genes. However, GGMs are not appropriate for non-Gaussian data, such as those generated in RNA-seq experiments. We propose a novel statistical framework that maximizes a penalized likelihood, in which the observed count data follow a Poisson log-normal distribution. To overcome the computational challenges, we use Laplace's method to approximate the likelihood and its gradients, and apply the alternating directions method of multipliers to find the penalized maximum likelihood estimates. The proposed method is evaluated and compared with GGMs using both simulated and real RNA-seq data. The proposed method shows improved performance in detecting edges that represent covarying pairs of genes, particularly for edges connecting low-abundant genes and edges around regulatory hubs.
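    As a rough illustration of the data model named in this abstract, the sketch below simulates counts from a Poisson log-normal distribution: a latent multivariate normal vector per sample, exponentiated to give Poisson rates. It covers only the generative side and does not implement the penalized likelihood, the Laplace approximation or the ADMM solver. The function name and all parameter values are assumptions chosen for illustration.

```python
import numpy as np


def simulate_pln(n_samples, mu, sigma, seed=0):
    """Draw counts with z ~ N(mu, sigma) and y | z ~ Poisson(exp(z)), per gene."""
    rng = np.random.default_rng(seed)
    z = rng.multivariate_normal(mu, sigma, size=n_samples)  # latent log-rates
    return rng.poisson(np.exp(z))                           # observed counts


# Two genes whose latent log-expression is positively correlated (invented values).
mu = np.array([2.0, 2.0])
sigma = np.array([[0.5, 0.4],
                  [0.4, 0.5]])
counts = simulate_pln(2000, mu, sigma)

print(counts.mean(axis=0), counts.var(axis=0))
print(np.corrcoef(counts.T)[0, 1])
```

    The simulated counts are overdispersed (variance above the mean) and inherit a positive correlation from the latent covariance, which is the kind of structure a Gaussian graphical model applied directly to counts would be poorly suited to.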

  2. Identification of genes related to drought in native potatoes using RNA-Seq

    Directory of Open Access Journals (Sweden)

    Roberto Lozano

    2014-03-01

    The recent advent of RNA sequencing technology (RNA-Seq), a massively parallel sequencing method for transcriptome analysis, provides an opportunity to understand the expression profile of plants in response to biotic and abiotic stress. In this study, mRNA was sequenced from leaves and roots of two native potato varieties at different levels of drought. Fifty-base-pair reads from whole mRNAs were mapped to the potato genomic sequence: 75 – 82% mapped uniquely to the genome, 6 – 14% mapped to several locations in the genome and 9 – 12% had no match in the genome. Comparing expression profiles, 887 to 1925 genes were found to be induced/repressed by drought in the sensitive variety and 998 to 1995 in the tolerant variety. This research provides valuable information for future studies and deeper understanding of the molecular mechanism of drought resistance in potato and related species.

  3. Altered minor-groove hydrogen bonds in DNA block transcription elongation by T7 RNA polymerase.

    Science.gov (United States)

    Tanasova, Marina; Goeldi, Silvan; Meyer, Fabian; Hanawalt, Philip C; Spivak, Graciela; Sturla, Shana J

    2015-05-26

    DNA transcription depends upon the highly efficient and selective function of RNA polymerases (RNAPs). Modifications in the template DNA can impact the progression of RNA synthesis, and a number of DNA adducts, as well as abasic sites, arrest or stall transcription. Nonetheless, data are needed to understand why certain modifications to the structure of DNA bases stall RNA polymerases while others are efficiently bypassed. In this study, we evaluate the impact that alterations in dNTP/rNTP base-pair geometry have on transcription. T7 RNA polymerase was used to study transcription over modified purines and pyrimidines with altered H-bonding capacities. The results suggest that introducing wobble base-pairs into the DNA:RNA heteroduplex interferes with transcriptional elongation and stalls RNA polymerase. However, transcriptional stalling is not observed if mismatched base-pairs do not H-bond. Together, these studies show that RNAP is able to discriminate mismatches resulting in wobble base-pairs, and suggest that, in cases of modifications with minor steric impact, DNA:RNA heteroduplex geometry could serve as a controlling factor for initiating transcription-coupled DNA repair. © 2015 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  4. Improving small RNA-seq by using a synthetic spike-in set for size-range quality control together with a set for data normalization.

    Science.gov (United States)

    Locati, Mauro D; Terpstra, Inez; de Leeuw, Wim C; Kuzak, Mateusz; Rauwerda, Han; Ensink, Wim A; van Leeuwen, Selina; Nehrdich, Ulrike; Spaink, Herman P; Jonker, Martijs J; Breit, Timo M; Dekker, Rob J

    2015-08-18

    There is an increasing interest in complementing RNA-seq experiments with small-RNA (sRNA) expression data to obtain a comprehensive view of a transcriptome. Currently, two main experimental challenges concerning sRNA-seq exist: how to check the size distribution of isolated sRNAs, given the sensitive size-selection steps in the protocol; and how to normalize data between samples, given the low complexity of sRNA types. We here present two separate sets of synthetic RNA spike-ins for monitoring size-selection and for performing data normalization in sRNA-seq. The size-range quality control (SRQC) spike-in set, consisting of 11 oligoribonucleotides (10-70 nucleotides), was tested by intentionally altering the size-selection protocol and verified via several comparative experiments. We demonstrate that the SRQC set is useful to reproducibly track down biases in the size-selection in sRNA-seq. The external reference for data-normalization (ERDN) spike-in set, consisting of 19 oligoribonucleotides, was developed for sample-to-sample normalization in differential-expression analysis of sRNA-seq data. Testing and applying the ERDN set showed that it can reproducibly detect differential expression over a dynamic range of 2(18). Hence, biological variation in sRNA composition and content between samples is preserved while technical variation is effectively minimized. Together, both spike-in sets can significantly improve the technical reproducibility of sRNA-seq. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.
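
    The idea of sample-to-sample normalization with an external spike-in set can be sketched as follows. The abstract does not give the ERDN procedure itself, so the scaling rule used here (total spike-in reads per sample divided by their geometric mean across samples) is an assumed, generic choice, and all names and counts are hypothetical.

```python
import math

def spike_in_size_factors(spike_counts):
    """Return one scaling factor per sample from its spike-in read counts.

    spike_counts maps a sample name to the read counts of the spike-in oligos.
    Each factor is the sample's spike-in total divided by the geometric mean
    of the spike-in totals over all samples.
    """
    totals = {sample: sum(counts) for sample, counts in spike_counts.items()}
    log_mean = sum(math.log(t) for t in totals.values()) / len(totals)
    geo_mean = math.exp(log_mean)
    return {sample: total / geo_mean for sample, total in totals.items()}

def normalize(counts, factor):
    """Divide raw sRNA counts by the sample's spike-in factor."""
    return [c / factor for c in counts]

# Hypothetical data: sample B was sequenced twice as deeply as sample A.
spikes = {"A": [100, 200, 300], "B": [200, 400, 600]}
srna = {"A": [50, 10], "B": [100, 20]}

factors = spike_in_size_factors(spikes)
normalized = {s: normalize(srna[s], factors[s]) for s in srna}
print(normalized)
```

    Because the same amount of spike-in is assumed to have been added to every sample, the depth difference cancels and the two samples end up with matching normalized counts.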

  5. Analysis of miRNA expression profiles in melatonin-exposed GC-1 spg cell line.

    Science.gov (United States)

    Zhu, Xiaoling; Chen, Shuxiong; Jiang, Yanwen; Xu, Ying; Zhao, Yun; Chen, Lu; Li, Chunjin; Zhou, Xu

    2018-02-05

    Melatonin is an endocrine neurohormone secreted by pinealocytes in the pineal gland. It exerts diverse physiological effects, such as circadian rhythm regulation and antioxidant activity. However, the functional importance of melatonin in spermatogenesis regulation remains unclear. The objectives of this study are to: (1) detect the effect of melatonin on miRNA expression profiles in GC-1 spg cells by miRNA deep sequencing (DeepSeq) and (2) define melatonin-affected miRNA-mRNA interactions and associated biological processes using bioinformatics analysis. GC-1 spg cells were cultured with melatonin (10⁻⁷ M) for 24 h. DeepSeq data were validated using quantitative real-time reverse transcription polymerase chain reaction analysis (qRT-PCR). A total of 176 miRNA expressions were found to be significantly different between two groups (fold change of >2 or melatonin could regulate the expression of miRNA to perform its physiological effects in GC-1 spg cells. These results should be useful to investigate the biological function of miRNAs regulated by melatonin in spermatogenesis and testicular germ cell tumor. Copyright © 2017 Elsevier B.V. All rights reserved.

  6. A deep learning method for lincRNA detection using auto-encoder algorithm.

    Science.gov (United States)

    Yu, Ning; Yu, Zeng; Pan, Yi

    2017-12-06

    RNA sequencing technique (RNA-seq) enables scientists to develop novel data-driven methods for discovering more unidentified lincRNAs. Meanwhile, knowledge-based technologies are experiencing a potential revolution ignited by the new deep learning methods. By scanning the newly found data set from RNA-seq, scientists have found that: (1) the expression of lincRNAs appears to be regulated, that is, the relevance exists along the DNA sequences; (2) lincRNAs contain some conserved patterns/motifs tethered together by non-conserved regions. These two lines of evidence give the reasoning for adopting knowledge-based deep learning methods in lincRNA detection. Similar to coding region transcription, non-coding regions are split at transcriptional sites. However, regulatory RNAs rather than messenger RNAs are generated. That is, the transcribed RNAs participate in the biological process as regulatory units instead of generating proteins. Identifying these transcriptional regions from non-coding regions is the first step towards lincRNA recognition. The auto-encoder method achieves 100% and 92.4% prediction accuracy on transcription sites over the putative data sets. The experimental results also show the excellent performance of the predictive deep neural network on the lincRNA data sets compared with support vector machine and traditional neural network. In addition, it is validated through the newly discovered lincRNA data set and one unreported transcription site is found by feeding the whole annotated sequences through the deep learning machine, which indicates that the deep learning method has extensive ability for lincRNA prediction. The transcriptional sequences of lincRNAs are collected from the annotated human DNA genome data. Subsequently, a two-layer deep neural network is developed for the lincRNA detection, which adopts the auto-encoder algorithm and utilizes different encoding schemes to obtain the best performance over intergenic DNA sequence data. Driven by those newly

  7. A step-by-step workflow for low-level analysis of single-cell RNA-seq data with Bioconductor [version 2; referees: 1 approved, 4 approved with reservations]

    Directory of Open Access Journals (Sweden)

    Aaron T.L. Lun

    2016-10-01

    Full Text Available Single-cell RNA sequencing (scRNA-seq) is widely used to profile the transcriptome of individual cells. This provides biological resolution that cannot be matched by bulk RNA sequencing, at the cost of increased technical noise and data complexity. The differences between scRNA-seq and bulk RNA-seq data mean that the analysis of the former cannot be performed by recycling bioinformatics pipelines for the latter. Rather, dedicated single-cell methods are required at various steps to exploit the cellular resolution while accounting for technical noise. This article describes a computational workflow for low-level analyses of scRNA-seq data, based primarily on software packages from the open-source Bioconductor project. It covers basic steps including quality control, data exploration and normalization, as well as more complex procedures such as cell cycle phase assignment, identification of highly variable and correlated genes, clustering into subpopulations and marker gene detection. Analyses were demonstrated on gene-level count data from several publicly available datasets involving haematopoietic stem cells, brain-derived cells, T-helper cells and mouse embryonic stem cells. This will provide a range of usage scenarios from which readers can construct their own analysis pipelines.

  8. Transcriptional activation of ribosomal RNA genes during compensatory renal hypertrophy

    International Nuclear Information System (INIS)

    Ouellette, A.J.; Moonka, R.; Zelenetz, A.; Malt, R.A.

    1986-01-01

    The overall rate of rDNA transcription increases by 50% during the first 24 hours of compensatory renal hypertrophy in the mouse. To study mechanisms of ribosome accumulation after uninephrectomy, transcription rates were measured in isolated kidneys by transcriptional runoff. 32P-labeled nascent transcripts were hybridized to blots containing linearized, denatured cloned rDNA, and hybridization was quantitated autoradiographically and by direct counting. Overall transcriptional activity of rDNA was increased by 30% above control levels at 6 hrs after nephrectomy and by 50% at 12, 18, and 24 hrs after operation. Hybridizing RNA was insensitive to inhibition by alpha-amanitin, and no hybridization was detected to vector DNA. Thus, accelerated rDNA transcription is one regulatory element in the accretion of ribosomes in renal growth, and the regulatory event is an early event. Mechanisms of activation may include enhanced transcription of active genes or induction of inactive DNA.

  9. Elucidating the 16S rRNA 3' boundaries and defining optimal SD/aSD pairing in Escherichia coli and Bacillus subtilis using RNA-Seq data.

    Science.gov (United States)

    Wei, Yulong; Silke, Jordan R; Xia, Xuhua

    2017-12-15

    Bacterial translation initiation is influenced by base pairing between the Shine-Dalgarno (SD) sequence in the 5' UTR of mRNA and the anti-SD (aSD) sequence at the free 3' end of the 16S rRNA (3' TAIL) due to: 1) the SD/aSD sequence binding location and 2) SD/aSD binding affinity. In order to understand what makes an SD/aSD interaction optimal, we must define: 1) the terminus of the 3' TAIL and 2) the extent of the core aSD sequence within the 3' TAIL. Our approach to characterize these components in Escherichia coli and Bacillus subtilis involves 1) mapping the 3' boundary of the mature 16S rRNA using high-throughput RNA sequencing (RNA-Seq), and 2) identifying the segment within the 3' TAIL that is strongly preferred in SD/aSD pairing. Using RNA-Seq data, we resolve previous discrepancies in the reported 3' TAIL in B. subtilis and recover the established 3' TAIL in E. coli. Furthermore, we extend previous studies to suggest that both highly and lowly expressed genes favor SD sequences with intermediate binding affinity, but this trend is exclusive to SD sequences that complement the core aSD sequences defined herein.

  10. Differential transcriptomic analysis by RNA-Seq of GSNO-responsive genes between Arabidopsis roots and leaves.

    Science.gov (United States)

    Begara-Morales, Juan C; Sánchez-Calvo, Beatriz; Luque, Francisco; Leyva-Pérez, María O; Leterrier, Marina; Corpas, Francisco J; Barroso, Juan B

    2014-06-01

    S-Nitrosoglutathione (GSNO) is a nitric oxide-derived molecule that can regulate protein function by a post-translational modification designated S-nitrosylation. GSNO has also been detected in different plant organs under physiological and stress conditions, and it can also modulate gene expression. Thirty-day-old Arabidopsis plants were grown under hydroponic conditions, and exogenous 1 mM GSNO was applied to the root systems for 3 h. Differential gene expression analyses were carried out both in roots and in leaves by RNA sequencing (RNA-seq). A total of 3,263 genes were identified as being modulated by GSNO. Most of the genes identified were associated with the mechanism of protection against stress situations, many of these having previously been identified as target genes of GSNO by array-based methods. However, new genes were identified, such as that for methionine sulfoxide reductase (MSR) in leaves or different miscellaneous RNA (miscRNA) genes in Arabidopsis roots. As a result, 1,945 GSNO-responsive genes expressed differently in leaves and roots were identified, and 114 of these corresponded exclusively to one of these organs. In summary, it is demonstrated that RNA-seq extends our knowledge of GSNO as a signaling molecule which differentially modulates gene expression in roots and leaves under non-stress conditions. © The Author 2014. Published by Oxford University Press on behalf of Japanese Society of Plant Physiologists. All rights reserved. For permissions, please email: journals.permissions@oup.com.

  11. Genome-wide mapping of infection-induced SINE RNAs reveals a role in selective mRNA export.

    Science.gov (United States)

    Karijolich, John; Zhao, Yang; Alla, Ravi; Glaunsinger, Britt

    2017-06-02

    Short interspersed nuclear elements (SINEs) are retrotransposons evolutionarily derived from endogenous RNA Polymerase III RNAs. Though SINE elements have undergone exaptation into gene regulatory elements, how transcribed SINE RNA impacts transcriptional and post-transcriptional regulation is largely unknown. This is partly due to a lack of information regarding which of the loci have transcriptional potential. Here, we present an approach (short interspersed nuclear element sequencing, SINE-seq), which selectively profiles RNA Polymerase III-derived SINE RNA, thereby identifying transcriptionally active SINE loci. Applying SINE-seq to monitor murine B2 SINE expression during a gammaherpesvirus infection revealed transcription from 28 270 SINE loci, with ∼50% of active SINE elements residing within annotated RNA Polymerase II loci. Furthermore, B2 RNA can form intermolecular RNA-RNA interactions with complementary mRNAs, leading to nuclear retention of the targeted mRNA via a mechanism involving p54nrb. These findings illuminate a pathway for the selective regulation of mRNA export during stress via retrotransposon activation. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.

  12. A Herpesviral Immediate Early Protein Promotes Transcription Elongation of Viral Transcripts.

    Science.gov (United States)

    Fox, Hannah L; Dembowski, Jill A; DeLuca, Neal A

    2017-06-13

    Herpes simplex virus 1 (HSV-1) genes are transcribed by cellular RNA polymerase II (RNA Pol II). While four viral immediate early proteins (ICP4, ICP0, ICP27, and ICP22) function in some capacity in viral transcription, the mechanism by which ICP22 functions remains unclear. We observed that the FACT complex (comprised of SSRP1 and Spt16) was relocalized in infected cells as a function of ICP22. ICP22 was also required for the association of FACT and the transcription elongation factors SPT5 and SPT6 with viral genomes. We further demonstrated that the FACT complex interacts with ICP22 throughout infection. We therefore hypothesized that ICP22 recruits cellular transcription elongation factors to viral genomes for efficient transcription elongation of viral genes. We reevaluated the phenotype of an ICP22 mutant virus by determining the abundance of all viral mRNAs throughout infection by transcriptome sequencing (RNA-seq). The accumulation of almost all viral mRNAs late in infection was reduced compared to the wild type, regardless of kinetic class. Using chromatin immunoprecipitation sequencing (ChIP-seq), we mapped the location of RNA Pol II on viral genes and found that RNA Pol II levels on the bodies of viral genes were reduced in the ICP22 mutant compared to wild-type virus. In contrast, the association of RNA Pol II with transcription start sites in the mutant was not reduced. Taken together, our results indicate that ICP22 plays a role in recruiting elongation factors like the FACT complex to the HSV-1 genome to allow for efficient viral transcription elongation late in viral infection and ultimately infectious virion production. IMPORTANCE HSV-1 interacts with many cellular proteins throughout productive infection. Here, we demonstrate the interaction of a viral protein, ICP22, with a subset of cellular proteins known to be involved in transcription elongation. We determined that ICP22 is required to recruit the FACT complex and other transcription

  13. A multi-Poisson dynamic mixture model to cluster developmental patterns of gene expression by RNA-seq.

    Science.gov (United States)

    Ye, Meixia; Wang, Zhong; Wang, Yaqun; Wu, Rongling

    2015-03-01

    Dynamic changes of gene expression reflect an intrinsic mechanism of how an organism responds to developmental and environmental signals. With the increasing availability of expression data across a time-space scale by RNA-seq, the classification of genes as per their biological function using RNA-seq data has become one of the most significant challenges in contemporary biology. Here we develop a clustering mixture model to discover distinct groups of genes expressed during a period of organ development. By integrating the density function of the multivariate Poisson distribution, the model accommodates the discrete property of read counts characteristic of RNA-seq data. The temporal dependence of gene expression is modeled by a first-order autoregressive process. The model is implemented with the Expectation-Maximization algorithm and model selection to determine the optimal number of gene clusters and obtain the estimates of Poisson parameters that describe the pattern of time-dependent expression of genes from each cluster. The model has been demonstrated by analyzing real data from an experiment aimed at linking the pattern of gene expression to catkin development in white poplar. The usefulness of the model has been validated through computer simulation. The model provides a valuable tool for clustering RNA-seq data, facilitating our global view of expression dynamics and understanding of gene regulation mechanisms. © The Author 2014. Published by Oxford University Press. For Permissions, please email: journals.permissions@oup.com.
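
    The clustering step described in this abstract can be sketched with a much simpler stand-in: an Expectation-Maximization fit of a mixture of independent Poisson distributions over time points. This sketch drops the first-order autoregressive dependence and the model selection used by the authors, takes the number of clusters and the starting rates as given, and uses hypothetical function names and toy counts.

```python
import math

def poisson_logpmf(k, lam):
    """Log-probability of count k under a Poisson distribution with rate lam."""
    return k * math.log(lam) - lam - math.lgamma(k + 1)

def em_poisson_mixture(profiles, init_rates, n_iter=50):
    """Fit a mixture of independent Poissons to per-gene count profiles.

    profiles: one list of counts per gene (one count per time point).
    init_rates: one starting rate vector per cluster.
    Returns (weights, rates, responsibilities).
    """
    rates = [list(r) for r in init_rates]
    k = len(rates)
    weights = [1.0 / k] * k
    resp = []
    for _ in range(n_iter):
        # E-step: posterior probability of each cluster for each gene.
        resp = []
        for x in profiles:
            logp = [math.log(weights[j])
                    + sum(poisson_logpmf(c, lam) for c, lam in zip(x, rates[j]))
                    for j in range(k)]
            top = max(logp)
            unnorm = [math.exp(v - top) for v in logp]
            total = sum(unnorm)
            resp.append([v / total for v in unnorm])
        # M-step: update mixing weights and per-time-point rates.
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-12)
            weights[j] = max(nj / len(profiles), 1e-12)
            rates[j] = [max(sum(r[j] * x[t] for r, x in zip(resp, profiles)) / nj, 1e-9)
                        for t in range(len(rates[j]))]
    return weights, rates, resp

# Toy data: three genes rising over time, three genes falling.
profiles = [[1, 5, 20], [2, 6, 18], [0, 4, 22],
            [20, 6, 1], [19, 5, 2], [21, 4, 0]]
weights, rates, resp = em_poisson_mixture(profiles, [[5.0, 5.0, 10.0], [10.0, 5.0, 5.0]])
labels = [max(range(len(r)), key=r.__getitem__) for r in resp]
print(labels, [[round(v, 1) for v in r] for r in rates])
```

    On this toy input the rising and falling genes separate into two clusters, and each cluster's rate vector approaches the average profile of its members.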

  14. SOAPsplice: genome-wide ab initio detection of splice junctions from RNA-Seq data

    Directory of Open Access Journals (Sweden)

    Songbo Huang

    2011-07-01

    Full Text Available RNA-Seq, a method using next generation sequencing technologies to sequence the transcriptome, facilitates genome-wide analysis of splice junction sites. In this paper, we introduce SOAPsplice, a robust tool to detect splice junctions using RNA-Seq data without using any information of known splice junctions. SOAPsplice uses a novel two-step approach consisting of first identifying as many reasonable splice junction candidates as possible, and then, filtering the false positives with two effective filtering strategies. In both simulated and real datasets, SOAPsplice is able to detect many reliable splice junctions with low false positive rate. The improvement gained by SOAPsplice, when compared to other existing tools, becomes more obvious when the depth of sequencing is low. SOAPsplice is freely available at http://soap.genomics.org.cn/soapsplice.html.

  15. Transcription arrest caused by long nascent RNA chains

    DEFF Research Database (Denmark)

    Bentin, Thomas; Cherny, Dmitry; Larsen, H Jakob

    2004-01-01

    The transcription process is highly processive. However, specific sequence elements encoded in the nascent RNA may signal transcription pausing and/or termination. We find that under certain conditions nascent RNA chains can have a strong and apparently sequence-independent inhibitory effect on transcription. Using phage T3 RNA polymerase (T3 RNAP) and covalently closed circular (cccDNA) DNA templates that did not contain any strong termination signal, transcription was severely inhibited after a short period of time. Less than approximately 10% residual transcriptional activity remained after 10 min of incubation. The addition of RNase A almost fully restored transcription in a dose dependent manner. Throughout RNase A rescue, an elongation rate of approximately 170 nt/s was maintained and this velocity was independent of RNA transcript length, at least up to 6 kb. Instead, RNase A rescue increased...

  16. Computer and Statistical Analysis of Transcription Factor Binding and Chromatin Modifications by ChIP-seq data in Embryonic Stem Cell

    Directory of Open Access Journals (Sweden)

    Orlov Yuriy

    2012-06-01

    Full Text Available Advances in high throughput sequencing technology have enabled the identification of transcription factor (TF) binding sites at genome scale. TF binding studies are important for medical applications and stem cell research. Somatic cells can be reprogrammed to a pluripotent state by the combined introduction of factors such as Oct4, Sox2, c-Myc, Klf4. These reprogrammed cells share many characteristics with embryonic stem cells (ESCs) and are known as induced pluripotent stem cells (iPSCs). The signaling requirements for maintenance of human and murine embryonic stem cells (ESCs) differ considerably. Genome-wide ChIP-seq TF binding maps in mouse stem cells include Oct4, Sox2, Nanog, Tbx3, Smad2 as well as a group of other factors. ChIP-seq allows the study of new candidate transcription factors for reprogramming. It was shown that Nr5a2 could replace Oct4 for reprogramming. Epigenetic modifications play an important role in the regulation of gene expression, adding additional complexity to transcription network functioning. We have studied associations between different histone modifications using published data together with RNA Pol II sites. We found strong associations between activation marks and TF binding sites and present them qualitatively. To address issues in the statistical analysis of genome ChIP-sequencing maps, we developed a computer program to filter out noise signals and find significant associations between binding site affinity and number of sequence reads. The data provide new insights into the function of chromatin organization and regulation in stem cells.

  17. Quantitative ChIP-Seq Normalization Reveals Global Modulation of the Epigenome

    Directory of Open Access Journals (Sweden)

    David A. Orlando

    2014-11-01

    Full Text Available Epigenomic profiling by chromatin immunoprecipitation coupled with massively parallel DNA sequencing (ChIP-seq) is a prevailing methodology used to investigate chromatin-based regulation in biological systems such as human disease, but the lack of an empirical methodology to enable normalization among experiments has limited the precision and usefulness of this technique. Here, we describe a method called ChIP with reference exogenous genome (ChIP-Rx) that allows one to perform genome-wide quantitative comparisons of histone modification status across cell populations using defined quantities of a reference epigenome. ChIP-Rx enables the discovery and quantification of dynamic epigenomic profiles across mammalian cells that would otherwise remain hidden using traditional normalization methods. We demonstrate the utility of this method for measuring epigenomic changes following chemical perturbations and show how reference normalization of ChIP-seq experiments enables the discovery of disease-relevant changes in histone modification occupancy.
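
    The reference-normalization idea can be sketched as follows. The abstract does not state the formula, so the scaling used here (experimental signal divided by the number of reads aligned to the spiked-in reference genome, in millions) is an assumption for illustration, and the function names and read counts are hypothetical.

```python
def reference_scale_factor(reference_reads):
    """Scale factor per million reads aligned to the exogenous reference genome."""
    if reference_reads <= 0:
        raise ValueError("reference_reads must be positive")
    return 1e6 / reference_reads

def rx_normalize(signal, reference_reads):
    """Rescale experimental-genome coverage by the reference-derived factor."""
    factor = reference_scale_factor(reference_reads)
    return [s * factor for s in signal]

# Hypothetical case: both samples show the same raw coverage at two loci,
# but in the treated sample a larger share of reads comes from the reference,
# as expected if the histone mark is globally reduced in the treated cells.
control = rx_normalize([100, 200], reference_reads=2_000_000)
treated = rx_normalize([100, 200], reference_reads=8_000_000)
print(control, treated)  # [50.0, 100.0] [12.5, 25.0]
```

    Normalizing to total read depth alone would leave the two samples looking identical, whereas anchoring to the constant reference input exposes the fourfold global decrease.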

  18. Comparative analysis of response to selection with three insecticides in the dengue mosquito Aedes aegypti using mRNA sequencing.

    Science.gov (United States)

    David, Jean-Philippe; Faucon, Frédéric; Chandor-Proust, Alexia; Poupardin, Rodolphe; Riaz, Muhammad Asam; Bonin, Aurélie; Navratil, Vincent; Reynaud, Stéphane

    2014-03-05

    selection strongly affected the polymorphism of several transcripts encoding cytochrome P450 monooxygenases likely involved in insecticide biodegradation. The present study confirmed the power of RNA-seq for identifying concomitantly quantitative and qualitative transcriptome changes associated with insecticide resistance in mosquitoes. Our results suggest that transcriptome modifications can be selected rapidly by insecticides and affect multiple biological functions. Previously neglected by molecular screenings, polymorphism variations of detoxification enzymes may play an important role in the adaptive response of mosquitoes to insecticides.

  19. From reads to genes to pathways: differential expression analysis of RNA-Seq experiments using Rsubread and the edgeR quasi-likelihood pipeline.

    Science.gov (United States)

    Chen, Yunshun; Lun, Aaron T L; Smyth, Gordon K

    2016-01-01

    In recent years, RNA sequencing (RNA-seq) has become a very widely used technology for profiling gene expression. One of the most common aims of RNA-seq profiling is to identify genes or molecular pathways that are differentially expressed (DE) between two or more biological conditions. This article demonstrates a computational workflow for the detection of DE genes and pathways from RNA-seq data by providing a complete analysis of an RNA-seq experiment profiling epithelial cell subsets in the mouse mammary gland. The workflow uses R software packages from the open-source Bioconductor project and covers all steps of the analysis pipeline, including alignment of read sequences, data exploration, differential expression analysis, visualization and pathway analysis. Read alignment and count quantification is conducted using the Rsubread package and the statistical analyses are performed using the edgeR package. The differential expression analysis uses the quasi-likelihood functionality of edgeR.

  20. Transcriptome Analysis of Ceriops tagal in Saline Environments Using RNA-Sequencing.

    Directory of Open Access Journals (Sweden)

    Xiaorong Xiao

    Full Text Available Identification of genes involved in mangrove species' adaptation to salt stress can provide valuable information for developing salt-tolerant crops and understanding the molecular evolution of salt tolerance in halophiles. Ceriops tagal is a salt-tolerant mangrove tree growing in mudflats and marshes in tropical and subtropical areas, without any prior genome information. In this study, we assessed the biochemical and transcriptional responses of C. tagal to high salt treatment (500 mmol/L NaCl) by hydroponic experiments and RNA-seq. In C. tagal root tissues under salt stress, proline accumulated strongly from 3 to 12 h of treatment; meanwhile, malondialdehyde content progressively increased from 0 to 9 h, then dropped to lower than control levels by 24 h. These implied that C. tagal plants could survive salt stress through biochemical modification. Using the Illumina sequencing platform, approximately 27.39 million RNA-seq reads were obtained from three salt-treated and control (untreated) root samples. These reads were assembled into 47,111 transcripts with an average length of 514 bp and an N50 of 632 bp. Approximately 78% of the transcripts were annotated, and a total of 437 genes were putative transcription factors. Digital gene expression analysis was conducted by comparing transcripts from the untreated control to the three salt treated samples, and 7,330 differentially expressed transcripts were identified. Using k-means clustering, these transcripts were divided into six clusters that differed in their expression patterns across four treatment time points. The genes identified as being up- or downregulated are involved in salt stress responses, signal transduction, and DNA repair. Our study shows the main adaptive pathway of C. tagal in saline environments, under short-term and long-term treatments of salt stress. This provides vital clues as to which genes may be candidates for breeding salt-tolerant crops and clarifying molecular
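
    The assembly statistics quoted in this abstract (average length and N50) can be computed as in the short sketch below. The transcript lengths are made-up values, not data from the study, and the function name is illustrative.

```python
def n50(lengths):
    """Return the N50: the length L such that sequences of length >= L
    together cover at least half of the total assembled bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    raise ValueError("lengths must be non-empty")

# Hypothetical transcript lengths in bp.
lengths = [100, 200, 300, 400, 500]
print(sum(lengths) / len(lengths), n50(lengths))  # 300.0 400
```

    N50 weights long sequences more heavily than the arithmetic mean does, which is why an assembly's N50 typically exceeds its average transcript length, as with the 632 bp and 514 bp figures reported above.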

  1. Elucidation of terpenoid metabolism in Scoparia dulcis by RNA-seq analysis.

    Science.gov (United States)

    Yamamura, Yoshimi; Kurosaki, Fumiya; Lee, Jung-Bum

    2017-03-07

    Scoparia dulcis biosynthesizes bioactive diterpenes, such as scopadulcic acid B (SDB), which are known for their unique molecular skeleton. Although the biosynthesis of bioactive diterpenes is catalyzed by a sequence of class II and class I diterpene synthases (diTPSs), the mechanisms underlying this process are yet to be fully identified. To elucidate this biosynthetic machinery, we performed a high-throughput RNA-seq analysis, and de novo assembly of clean reads revealed 46,332 unique transcripts and 40,503 two unigenes. We found diTPS genes including a putative syn-copalyl diphosphate synthase (SdCPS2) and two kaurene synthase-like (SdKSLs) genes. In addition, a total of 79 full-length cytochrome P450 (CYP450) genes were discovered. The expression analyses showed that selected CYP450s were associated with the expression patterns of SdCPS2 and SdKSL1, suggesting that these CYP450 candidates are involved in diterpene modification. SdCPS2 represents the first predicted gene to produce syn-copalyl diphosphate in dicots. In addition, SdKSL1 potentially contributes to the SDB biosynthetic pathway. Therefore, these identified genes associated with diterpene biosynthesis may lead to the development of genetic engineering focused on diterpene metabolism in S. dulcis.

  2. Differential Regulation of rRNA and tRNA Transcription from the rRNA-tRNA Composite Operon in Escherichia coli.

    Directory of Open Access Journals (Sweden)

    Hiraku Takada

    Full Text Available Escherichia coli contains seven rRNA operons, each consisting of the genes for three rRNAs (16S, 23S and 5S rRNA, in this order), one or two tRNA genes in the spacer between the 16S and 23S rRNA genes, and one or two tRNA genes in the 3' proximal region. All of these rRNA and tRNA genes are transcribed from two promoters, P1 and P2, into single large precursors that are afterward processed to individual rRNAs and tRNAs by a set of RNases. In the course of Genomic SELEX screening of promoters recognized by RNA polymerase (RNAP) holoenzyme containing RpoD sigma, a strong binding site was identified within the 16S rRNA gene in each of the seven rRNA operons. The binding in vitro of RNAP RpoD holoenzyme to an internal promoter, referred to as the promoter of riRNA (an internal RNA of the rRNA operon), within each 16S rRNA gene was confirmed by gel shift assay and AFM observation. Using this riRNA promoter within the rrnD operon as a representative, transcription in vitro was detected with use of the purified RpoD holoenzyme, confirming the presence of a constitutive promoter in this region. LacZ reporter assay indicated that this riRNA promoter is functional in vivo. The location of the riRNA promoter in vivo as identified using a set of reporter plasmids agrees well with that identified in vitro. Based on the transcription profile in vitro and Northern blot analysis in vivo, the majority of transcripts initiated from this riRNA promoter were estimated to terminate near the beginning of the 23S rRNA gene, indicating that riRNA gives rise to the spacer-encoded tRNA. Under starved conditions, transcription of the rRNA operon is markedly repressed to reduce the intracellular level of ribosomes, but the levels of both riRNA and its processed tRNAGlu stayed unaffected, implying that riRNA plays a role in the continued steady-state synthesis of tRNAs from the spacers of rRNA operons. We then propose that the tRNA genes organized within the spacers of rRNA-tRNA composite operons

  3. RNA-seq analysis and de novo transcriptome assembly of Jerusalem artichoke (Helianthus tuberosus Linne).

    Science.gov (United States)

    Jung, Won Yong; Lee, Sang Sook; Kim, Chul Wook; Kim, Hyun-Soon; Min, Sung Ran; Moon, Jae Sun; Kwon, Suk-Yoon; Jeon, Jae-Heung; Cho, Hye Sun

    2014-01-01

    Jerusalem artichoke (Helianthus tuberosus L.) has long been cultivated as a vegetable and as a source of fructans (inulin) for pharmaceutical applications in diabetes and obesity prevention. However, transcriptomic and genomic data for Jerusalem artichoke remain scarce. In this study, Illumina RNA sequencing (RNA-Seq) was performed on samples from Jerusalem artichoke leaves, roots, stems and two different tuber tissues (early and late tuber development). Data were used for de novo assembly and characterization of the transcriptome. In total 206,215,632 paired-end reads were generated. These were assembled into 66,322 loci with 272,548 transcripts. Loci were annotated by querying against the NCBI non-redundant, Phytozome and UniProt databases, and 40,215 loci were homologous to existing database sequences. Gene Ontology terms were assigned to 19,848 loci, 15,434 loci were matched to 25 Clusters of Eukaryotic Orthologous Groups classifications, and 11,844 loci were classified into 142 Kyoto Encyclopedia of Genes and Genomes pathways. The assembled loci also contained 10,778 potential simple sequence repeats. The newly assembled transcriptome was used to identify loci with tissue-specific differential expression patterns. In total, 670 loci exhibited tissue-specific expression, and a subset of these were confirmed using RT-PCR and qRT-PCR. Gene expression related to inulin biosynthesis in tuber tissue was also investigated. Existing genetic and genomic data for H. tuberosus are scarce. The sequence resources developed in this study will enable the analysis of thousands of transcripts and will thus accelerate marker-assisted breeding studies and studies of inulin biosynthesis in Jerusalem artichoke.

  4. RNA-Seq analysis during the life cycle of Cryptosporidium parvum reveals significant differential gene expression between proliferating stages in the intestine and infectious sporozoites.

    Science.gov (United States)

    Lippuner, Christoph; Ramakrishnan, Chandra; Basso, Walter U; Schmid, Marc W; Okoniewski, Michal; Smith, Nicholas C; Hässig, Michael; Deplazes, Peter; Hehl, Adrian B

    2018-05-01

    Cryptosporidium parvum is a major cause of diarrhoea in humans and animals. There are no vaccines and few drugs available to control C. parvum. In this study, we used RNA-Seq to compare gene expression in sporozoites and intracellular stages of C. parvum to identify genes likely to be important for successful completion of the parasite's life cycle and, thereby, possible targets for drugs or vaccines. We identified 3774 protein-encoding transcripts in C. parvum. Applying a stringent cut-off of eight fold for determination of differential expression, we identified 173 genes (26 coding for predicted secreted proteins) upregulated in sporozoites. On the other hand, expression of 1259 genes was upregulated in intestinal stages (merozoites/gamonts) with a gene ontology enrichment for 63 biological processes and upregulation of 117 genes in 23 metabolic pathways. There was no clear stage specificity of expression of AP2-domain containing transcription factors, although sporozoites had a relatively small repertoire of these important regulators. Our RNA-Seq analysis revealed a new calcium-dependent protein kinase, bringing the total number of known calcium-dependent protein kinases (CDPKs) in C. parvum to 11. One of these, CDPK1, was expressed in all stages, strengthening the notion that it is a valid drug target. By comparing parasites grown in vivo (which produce bona fide thick-walled oocysts) and in vitro (which are arrested in sexual development prior to oocyst generation) we were able to confirm that genes encoding oocyst wall proteins are expressed in gametocytes and that the proteins are stockpiled rather than generated de novo in zygotes. RNA-Seq analysis of C. parvum revealed genes expressed in a stage-specific manner and others whose expression is required at all stages of development. The functional significance of these can now be addressed through recent advances in transgenics for C. parvum, and may lead to the identification of viable drug and vaccine

  5. Physiological and Pathological Transcriptional Activation of Endogenous Retroelements Assessed by RNA-Sequencing of B Lymphocytes

    Directory of Open Access Journals (Sweden)

    Jan Attig

    2017-12-01

    In addition to evolutionarily-accrued sequence mutation or deletion, endogenous retroelements (EREs) in eukaryotic genomes are subject to epigenetic silencing, preventing or reducing their transcription, particularly in the germplasm. Nevertheless, transcriptional activation of EREs, including endogenous retroviruses (ERVs) and long interspersed nuclear elements (LINEs), is observed in somatic cells, variably upon cellular differentiation and frequently upon cellular transformation. ERE transcription is modulated during physiological and pathological immune cell activation, as well as in immune cell cancers. However, our understanding of the potential consequences of such modulation remains incomplete, partly due to the relative scarcity of information regarding genome-wide ERE transcriptional patterns in immune cells. Here, we describe a methodology that allows probing RNA-sequencing (RNA-seq) data for genome-wide expression of EREs in murine and human cells. Our analysis of B cells reveals that their transcriptional response during immune activation is dominated by induction of gene transcription, and that EREs respond to a much lesser extent. The transcriptional activity of the majority of EREs is either unaffected or reduced by B cell activation both in mice and humans, albeit LINEs appear considerably more responsive in the latter host. Nevertheless, a small number of highly distinct ERVs are strongly and consistently induced during B cell activation. Importantly, this pattern contrasts starkly with B cell transformation, which exhibits widespread induction of EREs, including ERVs that minimally overlap with those responsive to immune stimulation. The distinctive patterns of ERE induction suggest different underlying mechanisms and will help separate physiological from pathological expression.

  6. Evaluation of logistic regression models and effect of covariates for case-control study in RNA-Seq analysis.

    Science.gov (United States)

    Choi, Seung Hoan; Labadorf, Adam T; Myers, Richard H; Lunetta, Kathryn L; Dupuis, Josée; DeStefano, Anita L

    2017-02-06

    Next generation sequencing provides a count of RNA molecules in the form of short reads, yielding discrete, often highly non-normally distributed gene expression measurements. Although Negative Binomial (NB) regression has been generally accepted in the analysis of RNA sequencing (RNA-Seq) data, its appropriateness has not been exhaustively evaluated. We explore logistic regression as an alternative method for RNA-Seq studies designed to compare cases and controls, where disease status is modeled as a function of RNA-Seq reads using simulated and Huntington disease data. We evaluate the effect of adjusting for covariates that have an unknown relationship with gene expression. Finally, we incorporate the data adaptive method in order to compare false positive rates. When the sample size is small or the expression levels of a gene are highly dispersed, the NB regression shows inflated Type-I error rates but the Classical logistic and Bayes logistic (BL) regressions are conservative. Firth's logistic (FL) regression performs well or is slightly conservative. Large sample size and low dispersion generally make Type-I error rates of all methods close to nominal alpha levels of 0.05 and 0.01. However, Type-I error rates are controlled after applying the data adaptive method. The NB, BL, and FL regressions gain increased power with large sample size, large log2 fold-change, and low dispersion. The FL regression has comparable power to NB regression. We conclude that implementing the data adaptive method appropriately controls Type-I error rates in RNA-Seq analysis. Firth's logistic regression provides a concise statistical inference process and reduces spurious associations from inaccurately estimated dispersion parameters in the negative binomial framework.

  7. Integration of RNA-Seq and RPPA data for survival time prediction in cancer patients.

    Science.gov (United States)

    Isik, Zerrin; Ercan, Muserref Ece

    2017-10-01

    Integration of several types of patient data in a computational framework can accelerate the identification of more reliable biomarkers, especially for prognostic purposes. This study aims to identify biomarkers that can successfully predict the potential survival time of a cancer patient by integrating the transcriptomic (RNA-Seq), proteomic (RPPA), and protein-protein interaction (PPI) data. The proposed method, RPBioNet, employs a random walk-based algorithm that works on a PPI network to identify a limited number of protein biomarkers. Later, the method uses gene expression measurements of the selected biomarkers to train a classifier for the survival time prediction of patients. RPBioNet was applied to classify kidney renal clear cell carcinoma (KIRC), glioblastoma multiforme (GBM), and lung squamous cell carcinoma (LUSC) patients based on their survival time classes (long- or short-term). The RPBioNet method correctly identified the survival time classes of patients with between 66% and 78% average accuracy for three data sets. RPBioNet operates with only 20 to 50 biomarkers and can achieve on average 6% higher accuracy compared to the closest alternative method, which uses only RNA-Seq data in the biomarker selection. Further analysis of the most predictive biomarkers highlighted genes that are common for both cancer types, as they may be driver proteins responsible for cancer progression. The novelty of this study is the integration of a PPI network with mRNA and protein expression data to identify more accurate prognostic biomarkers that can be used for clinical purposes in the future. Copyright © 2017 Elsevier Ltd. All rights reserved.

  8. Sentence‐Chain Based Seq2seq Model for Corpus Expansion

    Directory of Open Access Journals (Sweden)

    Euisok Chung

    2017-08-01

    This study focuses on a method for sequential data augmentation in order to alleviate data sparseness problems. Specifically, we present corpus expansion techniques for enhancing the coverage of a language model. Recent recurrent neural network studies show that a seq2seq model can be applied for addressing language generation issues; it has the ability to generate new sentences from given input sentences. We present a method of corpus expansion using a sentence‐chain based seq2seq model. For training the seq2seq model, sentence chains are used as triples. The first two sentences in a triple are used for the encoder of the seq2seq model, while the last sentence becomes a target sequence for the decoder. Using only internal resources, evaluation results show an improvement of approximately 7.6% relative perplexity over a baseline language model of Korean text. Additionally, from a comparison with a previous study, the sentence chain approach reduces the size of the training data by 38.4% while generating 1.4‐times the number of n‐grams with superior performance for English text.
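
    The triple construction described in this abstract can be sketched roughly as follows. This is only an illustration under stated assumptions: the abstract does not say how sentence chains are selected, so the sketch assumes a sliding window over consecutive sentences and joins the two encoder sentences with a space. The function name `sentence_chain_triples` is hypothetical and is not taken from the paper.

```python
from typing import List, Tuple


def sentence_chain_triples(sentences: List[str]) -> List[Tuple[str, str]]:
    """Build (encoder_input, decoder_target) pairs from sentence triples.

    Assumption (not from the paper): a chain is any three consecutive
    sentences. The first two are joined as the encoder input and the
    third is used as the decoder target.
    """
    pairs = []
    for i in range(len(sentences) - 2):
        encoder_input = sentences[i] + " " + sentences[i + 1]
        decoder_target = sentences[i + 2]
        pairs.append((encoder_input, decoder_target))
    return pairs


if __name__ == "__main__":
    doc = ["It rained.", "The road flooded.", "Traffic stopped.", "People walked."]
    for src, tgt in sentence_chain_triples(doc):
        print(src, "->", tgt)
```

    A document with fewer than three sentences yields no pairs under this windowing assumption.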

  9. Identifying TF-MiRNA Regulatory Relationships Using Multiple Features.

    Directory of Open Access Journals (Sweden)

    Mingyu Shao

    MicroRNAs are known to play important roles in the transcriptional and post-transcriptional regulation of gene expression. While intensive research has been conducted to identify miRNAs and their target genes in various genomes, there is only limited knowledge about how microRNAs are regulated. In this study, we construct a pipeline that can infer the regulatory relationships between transcription factors and microRNAs from ChIP-Seq data with high confidence. In particular, after identifying candidate peaks from ChIP-Seq data, we formulate the inference as a PU learning (learning from only positive and unlabeled examples) problem. Multiple features including the statistical significance of the peaks, the location of the peaks, the transcription factor binding site motifs, and the evolutionary conservation are derived from peaks for training and prediction. To further improve the accuracy of our inference, we also apply a mean reciprocal rank (MRR)-based method to the candidate peaks. We apply our pipeline to infer TF-miRNA regulatory relationships in mouse embryonic stem cells. The experimental results show that our approach provides very specific findings of TF-miRNA regulatory relationships.

  10. GENE-counter: a computational pipeline for the analysis of RNA-Seq data for gene expression differences.

    Science.gov (United States)

    Cumbie, Jason S; Kimbrel, Jeffrey A; Di, Yanming; Schafer, Daniel W; Wilhelm, Larry J; Fox, Samuel E; Sullivan, Christopher M; Curzon, Aron D; Carrington, James C; Mockler, Todd C; Chang, Jeff H

    2011-01-01

    GENE-counter is a complete Perl-based computational pipeline for analyzing RNA-Sequencing (RNA-Seq) data for differential gene expression. In addition to its use in studying transcriptomes of eukaryotic model organisms, GENE-counter is applicable for prokaryotes and non-model organisms without an available genome reference sequence. For alignments, GENE-counter is configured for CASHX, Bowtie, and BWA, but an end user can use any Sequence Alignment/Map (SAM)-compliant program of preference. To analyze data for differential gene expression, GENE-counter can be run with any one of three statistics packages that are based on variations of the negative binomial distribution. The default method is a new and simple statistical test we developed based on an over-parameterized version of the negative binomial distribution. GENE-counter also includes three different methods for assessing differentially expressed features for enriched gene ontology (GO) terms. Results are transparent and data are systematically stored in a MySQL relational database to facilitate additional analyses as well as quality assessment. We used next generation sequencing to generate a small-scale RNA-Seq dataset derived from the heavily studied defense response of Arabidopsis thaliana and used GENE-counter to process the data. Collectively, the support from analysis of microarrays as well as the observed and substantial overlap in results from each of the three statistics packages demonstrates that GENE-counter is well suited for handling the unique characteristics of small sample sizes and high variability in gene counts.

  11. GENE-counter: a computational pipeline for the analysis of RNA-Seq data for gene expression differences.

    Directory of Open Access Journals (Sweden)

    Jason S Cumbie

    GENE-counter is a complete Perl-based computational pipeline for analyzing RNA-Sequencing (RNA-Seq) data for differential gene expression. In addition to its use in studying transcriptomes of eukaryotic model organisms, GENE-counter is applicable for prokaryotes and non-model organisms without an available genome reference sequence. For alignments, GENE-counter is configured for CASHX, Bowtie, and BWA, but an end user can use any Sequence Alignment/Map (SAM)-compliant program of preference. To analyze data for differential gene expression, GENE-counter can be run with any one of three statistics packages that are based on variations of the negative binomial distribution. The default method is a new and simple statistical test we developed based on an over-parameterized version of the negative binomial distribution. GENE-counter also includes three different methods for assessing differentially expressed features for enriched gene ontology (GO) terms. Results are transparent and data are systematically stored in a MySQL relational database to facilitate additional analyses as well as quality assessment. We used next generation sequencing to generate a small-scale RNA-Seq dataset derived from the heavily studied defense response of Arabidopsis thaliana and used GENE-counter to process the data. Collectively, the support from analysis of microarrays as well as the observed and substantial overlap in results from each of the three statistics packages demonstrates that GENE-counter is well suited for handling the unique characteristics of small sample sizes and high variability in gene counts.

  12. Genome-scale transcriptional activation by an engineered CRISPR-Cas9 complex.

    Science.gov (United States)

    Konermann, Silvana; Brigham, Mark D; Trevino, Alexandro E; Joung, Julia; Abudayyeh, Omar O; Barcena, Clea; Hsu, Patrick D; Habib, Naomi; Gootenberg, Jonathan S; Nishimasu, Hiroshi; Nureki, Osamu; Zhang, Feng

    2015-01-29

    Systematic interrogation of gene function requires the ability to perturb gene expression in a robust and generalizable manner. Here we describe structure-guided engineering of a CRISPR-Cas9 complex to mediate efficient transcriptional activation at endogenous genomic loci. We used these engineered Cas9 activation complexes to investigate single-guide RNA (sgRNA) targeting rules for effective transcriptional activation, to demonstrate multiplexed activation of ten genes simultaneously, and to upregulate long intergenic non-coding RNA (lincRNA) transcripts. We also synthesized a library consisting of 70,290 guides targeting all human RefSeq coding isoforms to screen for genes that, upon activation, confer resistance to a BRAF inhibitor. The top hits included genes previously shown to be able to confer resistance, and novel candidates were validated using individual sgRNA and complementary DNA overexpression. A gene expression signature based on the top screening hits correlated with markers of BRAF inhibitor resistance in cell lines and patient-derived samples. These results collectively demonstrate the potential of Cas9-based activators as a powerful genetic perturbation technology.

  13. The Impact of Normalization Methods on RNA-Seq Data Analysis

    Science.gov (United States)

    Zyprych-Walczak, J.; Szabelska, A.; Handschuh, L.; Górczak, K.; Klamecka, K.; Figlerowicz, M.; Siatkowski, I.

    2015-01-01

    High-throughput sequencing technologies, such as the Illumina Hi-seq, are powerful new tools for investigating a wide range of biological and medical problems. Massive and complex data sets produced by the sequencers create a need for development of statistical and computational methods that can tackle the analysis and management of data. The data normalization is one of the most crucial steps of data processing and this process must be carefully considered as it has a profound effect on the results of the analysis. In this work, we focus on a comprehensive comparison of five normalization methods related to sequencing depth, widely used for transcriptome sequencing (RNA-seq) data, and their impact on the results of gene expression analysis. Based on this study, we suggest a universal workflow that can be applied for the selection of the optimal normalization procedure for any particular data set. The described workflow includes calculation of the bias and variance values for the control genes, sensitivity and specificity of the methods, and classification errors as well as generation of the diagnostic plots. Combining the above information facilitates the selection of the most appropriate normalization method for the studied data sets and determines which methods can be used interchangeably. PMID:26176014
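
    As a rough illustration of what a sequencing-depth normalization does, the sketch below scales raw counts to counts per million (CPM). The abstract does not name the five methods it compares, so CPM is used here only as a generic, widely known example and is not claimed to be one of them. The function name `counts_per_million` and the toy data are hypothetical.

```python
from typing import Dict, List


def counts_per_million(counts: Dict[str, List[int]]) -> Dict[str, List[float]]:
    """Scale raw read counts so that each sample (column) sums to one million.

    counts maps a gene name to its raw counts, one value per sample.
    Samples with zero total reads are returned as all zeros.
    """
    if not counts:
        return {}
    n_samples = len(next(iter(counts.values())))
    # Library size = total reads per sample (column sum).
    lib_sizes = [sum(row[j] for row in counts.values()) for j in range(n_samples)]
    return {
        gene: [
            row[j] * 1e6 / lib_sizes[j] if lib_sizes[j] > 0 else 0.0
            for j in range(n_samples)
        ]
        for gene, row in counts.items()
    }


if __name__ == "__main__":
    raw = {"geneA": [10, 20], "geneB": [30, 60]}
    # Sample 2 was sequenced twice as deeply; after scaling, both samples agree.
    print(counts_per_million(raw))
```

    In this toy case the second sample has twice the depth of the first, and the scaled values for each gene become identical across samples, which is the effect a depth normalization is meant to achieve.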

  14. Next Generation Sequencing Analysis of Human Platelet PolyA+ mRNAs and rRNA-Depleted Total RNA

    Science.gov (United States)

    Kissopoulou, Antheia; Jonasson, Jon; Lindahl, Tomas L.; Osman, Abdimajid

    2013-01-01

    Background: Platelets are small anucleate cells circulating in the blood vessels where they play a key role in hemostasis and thrombosis. Here, we compared platelet RNA-Seq results obtained from polyA+ mRNA and rRNA-depleted total RNA. Materials and Methods: We used purified, CD45 depleted, human blood platelets collected by apheresis from three male and one female healthy blood donors. The Illumina HiSeq 2000 platform was employed to sequence cDNA converted either from oligo(dT) isolated polyA+ RNA or from rRNA-depleted total RNA. The reads were aligned to the GRCh37 reference assembly with the TopHat/Cufflinks alignment package using Ensembl annotations. A de novo assembly of the platelet transcriptome using the Trinity software package and RSEM was also performed. The bioinformatic tools HTSeq and DESeq from Bioconductor were employed for further statistical analyses of read counts. Results: Consistent with previous findings our data suggests that mitochondrially expressed genes comprise a substantial fraction of the platelet transcriptome. We also identified high transcript levels for protein coding genes related to the cytoskeleton function, chemokine signaling, cell adhesion, aggregation, as well as receptor interaction between cells. Certain transcripts were particularly abundant in platelets compared with other cell and tissue types represented by RNA-Seq data from the Illumina Human Body Map 2.0 project. Irrespective of the different library preparation and sequencing protocols, there was good agreement between samples from the 4 individuals. Eighteen differentially expressed genes were identified in the two sexes at 10% false discovery rate using DESeq. Conclusion: The present data suggests that platelets may have a unique transcriptome profile characterized by a relative over-expression of mitochondrially encoded genes and also of genomic transcripts related to the cytoskeleton function, chemokine signaling and surface components compared with other cell and

  15. Next generation sequencing analysis of human platelet PolyA+ mRNAs and rRNA-depleted total RNA.

    Directory of Open Access Journals (Sweden)

    Antheia Kissopoulou

    BACKGROUND: Platelets are small anucleate cells circulating in the blood vessels where they play a key role in hemostasis and thrombosis. Here, we compared platelet RNA-Seq results obtained from polyA+ mRNA and rRNA-depleted total RNA. MATERIALS AND METHODS: We used purified, CD45 depleted, human blood platelets collected by apheresis from three male and one female healthy blood donors. The Illumina HiSeq 2000 platform was employed to sequence cDNA converted either from oligo(dT) isolated polyA+ RNA or from rRNA-depleted total RNA. The reads were aligned to the GRCh37 reference assembly with the TopHat/Cufflinks alignment package using Ensembl annotations. A de novo assembly of the platelet transcriptome using the Trinity software package and RSEM was also performed. The bioinformatic tools HTSeq and DESeq from Bioconductor were employed for further statistical analyses of read counts. RESULTS: Consistent with previous findings our data suggests that mitochondrially expressed genes comprise a substantial fraction of the platelet transcriptome. We also identified high transcript levels for protein coding genes related to the cytoskeleton function, chemokine signaling, cell adhesion, aggregation, as well as receptor interaction between cells. Certain transcripts were particularly abundant in platelets compared with other cell and tissue types represented by RNA-Seq data from the Illumina Human Body Map 2.0 project. Irrespective of the different library preparation and sequencing protocols, there was good agreement between samples from the 4 individuals. Eighteen differentially expressed genes were identified in the two sexes at 10% false discovery rate using DESeq. CONCLUSION: The present data suggests that platelets may have a unique transcriptome profile characterized by a relative over-expression of mitochondrially encoded genes and also of genomic transcripts related to the cytoskeleton function, chemokine signaling and surface components

  16. Identification of Nitrogen Consumption Genetic Variants in Yeast Through QTL Mapping and Bulk Segregant RNA-Seq Analyses.

    Science.gov (United States)

    Cubillos, Francisco A; Brice, Claire; Molinet, Jennifer; Tisné, Sebastién; Abarca, Valentina; Tapia, Sebastián M; Oporto, Christian; García, Verónica; Liti, Gianni; Martínez, Claudio

    2017-06-07

    Saccharomyces cerevisiae is responsible for wine must fermentation. In this process, nitrogen represents a limiting nutrient and its scarcity results in important economic losses for the wine industry. Yeast isolates use different strategies to grow in poor nitrogen environments and their genomic plasticity enables adaptation to multiple habitats through improvements in nitrogen consumption. Here, we used a highly recombinant S. cerevisiae multi-parent population (SGRP-4X) derived from the intercross of four parental strains of different origins to identify new genetic variants responsible for nitrogen consumption differences during wine fermentation. Analysis of 165 fully sequenced F12 segregants allowed us to map 26 QTL in narrow intervals for 14 amino acid sources and ammonium, the majority of which represent genomic regions previously unmapped for these traits. To complement this strategy, we performed Bulk segregant RNA-seq (BSR-seq) analysis in segregants exhibiting extremely high and low ammonium consumption levels. This identified several QTL overlapping differentially expressed genes and refined the gene candidate search. Based on these approaches, we were able to validate ARO1, PDC1, CPS1, ASI2, LYP1, and ALP1 allelic variants underlying nitrogen consumption differences between strains, providing evidence of many genes with small phenotypic effects. Altogether, these variants significantly shape yeast nitrogen consumption with important implications for evolution, ecological, and quantitative genomics. Copyright © 2017 Cubillos et al.

  17. Identification of Nitrogen Consumption Genetic Variants in Yeast Through QTL Mapping and Bulk Segregant RNA-Seq Analyses

    Directory of Open Access Journals (Sweden)

    Francisco A. Cubillos

    2017-06-01

    Saccharomyces cerevisiae is responsible for wine must fermentation. In this process, nitrogen represents a limiting nutrient and its scarcity results in important economic losses for the wine industry. Yeast isolates use different strategies to grow in poor nitrogen environments and their genomic plasticity enables adaptation to multiple habitats through improvements in nitrogen consumption. Here, we used a highly recombinant S. cerevisiae multi-parent population (SGRP-4X) derived from the intercross of four parental strains of different origins to identify new genetic variants responsible for nitrogen consumption differences during wine fermentation. Analysis of 165 fully sequenced F12 segregants allowed us to map 26 QTL in narrow intervals for 14 amino acid sources and ammonium, the majority of which represent genomic regions previously unmapped for these traits. To complement this strategy, we performed Bulk segregant RNA-seq (BSR-seq) analysis in segregants exhibiting extremely high and low ammonium consumption levels. This identified several QTL overlapping differentially expressed genes and refined the gene candidate search. Based on these approaches, we were able to validate ARO1, PDC1, CPS1, ASI2, LYP1, and ALP1 allelic variants underlying nitrogen consumption differences between strains, providing evidence of many genes with small phenotypic effects. Altogether, these variants significantly shape yeast nitrogen consumption with important implications for evolution, ecological, and quantitative genomics.

  18. Post-transcriptional bursting in genes regulated by small RNA molecules

    Science.gov (United States)

    Rodrigo, Guillermo

    2018-03-01

    Gene expression programs in living cells are highly dynamic due to spatiotemporal molecular signaling and inherent biochemical stochasticity. Here we study a mechanism based on molecule-to-molecule variability at the RNA level for the generation of bursts of protein production, which can lead to heterogeneity in a cell population. We develop a mathematical framework to show numerically and analytically that genes regulated post-transcriptionally by small RNA molecules can exhibit such bursts due to different states of translation activity (on or off), mostly revealed in a regime of few molecules. We exploit this framework to compare transcriptional and post-transcriptional bursting and also to illustrate how to tune the resulting protein distribution with additional post-transcriptional regulations. Moreover, because RNA-RNA interactions are predictable with an energy model, we define the kinetic constants of on-off switching as functions of the two characteristic free-energy differences of the system, activation and formation, with a nonequilibrium scheme. Overall, post-transcriptional bursting represents a distinctive principle linking gene regulation to gene expression noise, which highlights the importance of the RNA layer beyond the simple information transfer paradigm and significantly contributes to the understanding of the intracellular processes from a first-principles perspective.

  19. Linking transcriptional and genetic tumor heterogeneity through allele analysis of single-cell RNA-seq data.

    Science.gov (United States)

    Fan, Jean; Lee, Hae-Ock; Lee, Soohyun; Ryu, Da-Eun; Lee, Semin; Xue, Catherine; Kim, Seok Jin; Kim, Kihyun; Barkas, Nikolas; Park, Peter J; Park, Woong-Yang; Kharchenko, Peter V

    2018-06-13

    Characterization of intratumoral heterogeneity is critical to cancer therapy, as presence of phenotypically diverse cell populations commonly fuels relapse and resistance to treatment. Although genetic variation is a well-studied source of intratumoral heterogeneity, the functional impact of most genetic alterations remains unclear. Even less understood is the relative importance of other factors influencing heterogeneity, such as epigenetic state or tumor microenvironment. To investigate the relationship between genetic and transcriptional heterogeneity in a context of cancer progression, we devised a computational approach called HoneyBADGER to identify copy number variation and loss-of-heterozygosity in individual cells from single-cell RNA-sequencing data. By integrating allele and normalized expression information, HoneyBADGER is able to identify and infer the presence of subclone-specific alterations in individual cells and reconstruct underlying subclonal architecture. Examining several tumor types, we show that HoneyBADGER is effective at identifying deletions, amplifications, and copy-neutral loss-of-heterozygosity events, and is capable of robustly identifying subclonal focal alterations as small as 10 megabases. We further apply HoneyBADGER to analyze single cells from a progressive multiple myeloma patient to identify major genetic subclones that exhibit distinct transcriptional signatures relevant to cancer progression. Surprisingly, other prominent transcriptional subpopulations within these tumors did not line up with the genetic subclonal structure, and were likely driven by alternative, non-clonal mechanisms. These results highlight the need for integrative analysis to understand the molecular and phenotypic heterogeneity in cancer. Published by Cold Spring Harbor Laboratory Press.

  20. Accurate identification of RNA editing sites from primitive sequence with deep neural networks.

    Science.gov (United States)

    Ouyang, Zhangyi; Liu, Feng; Zhao, Chenghui; Ren, Chao; An, Gaole; Mei, Chuan; Bo, Xiaochen; Shu, Wenjie

    2018-04-16

    RNA editing is a post-transcriptional RNA sequence alteration. Current methods have identified editing sites and facilitated research but require sufficient genomic annotations and prior-knowledge-based filtering steps, resulting in a cumbersome, time-consuming identification process. Moreover, these methods have limited generalizability and applicability in species with insufficient genomic annotations or in conditions of limited prior knowledge. We developed DeepRed, a deep learning-based method that identifies RNA editing from primitive RNA sequences without prior-knowledge-based filtering steps or genomic annotations. DeepRed achieved 98.1% and 97.9% area under the curve (AUC) in training and test sets, respectively. We further validated DeepRed using experimentally verified U87 cell RNA-seq data, achieving 97.9% positive predictive value (PPV). We demonstrated that DeepRed offers better prediction accuracy and computational efficiency than current methods with large-scale, mass RNA-seq data. We used DeepRed to assess the impact of multiple factors on editing identification with RNA-seq data from the Association of Biomolecular Resource Facilities and Sequencing Quality Control projects. We explored developmental RNA editing pattern changes during human early embryogenesis and evolutionary patterns in Drosophila species and the primate lineage using DeepRed. Our work illustrates DeepRed's state-of-the-art performance; it may decipher the hidden principles behind RNA editing, making editing detection convenient and effective.

  1. Identification of mRNAs that move over long distances using an RNA-Seq analysis of Arabidopsis/Nicotiana benthamiana heterografts.

    Science.gov (United States)

    Notaguchi, Michitaka; Higashiyama, Tetsuya; Suzuki, Takamasa

    2015-02-01

    Phloem is a conductive tissue that allocates nutrients from mature source leaves to sinks such as young developing tissues. Phloem also delivers proteins and RNA species, such as small RNAs and mRNAs. Intensive studies on plant systemic signaling revealed the essential roles of proteins and RNA species. However, many of their functions are still largely unknown, with the roles of transported mRNAs being particularly poorly understood. A major difficulty is the absence of an accurate and comprehensive list of mobile transcripts. In this study, we used a hetero-graft system with Nicotiana benthamiana as the recipient scion and Arabidopsis as the donor stock, to identify transcripts that moved long distances across the graft union. We identified 138 Arabidopsis transcripts as mobile mRNAs, which we collectively termed the mRNA mobilome. Reverse transcription-PCR, quantitative real-time PCR and droplet digital PCR analyses confirmed the mobility. The transcripts included potential signaling factors and, unexpectedly, more general factors. In our investigations, we found no preferred transcript length, no previously known sequence motifs in promoter or transcript sequences and no similarities between the level of the transcripts and that in the source leaves. Grafting experiments regarding the function of ERECTA, an identified transcript, revealed no detectable function for the mobilized transcript. To our knowledge, this is the first report identifying transcripts that move over long distances using a hetero-graft system between different plant taxa. © The Author 2014. Published by Oxford University Press on behalf of Japanese Society of Plant Physiologists.

  2. The clinical value of lncRNA NEAT1 in digestive system malignancies: A comprehensive investigation based on 57 microarray and RNA-seq datasets.

    Science.gov (United States)

    Xiong, Dan-Dan; Feng, Zhen-Bo; Cen, Wei-Luan; Zeng, Jing-Jing; Liang, Lu; Tang, Rui-Xue; Gan, Xiao-Ning; Liang, Hai-Wei; Li, Zu-Yun; Chen, Gang; Luo, Dian-Zhong

    2017-03-14

    This comprehensive investigation was performed to evaluate the expression level and potential clinical value of NEAT1 in digestive system malignancies. A total of 57 lncRNA datasets of microarray or RNA-seq and 5 publications were included. The pooled standard mean deviation (SMD) indicated that NEAT1 was down-regulated in esophageal carcinoma (ESCA, SMD = -0.35, 95% CI: -0.5~-0.20, P digestive system malignancies (HR: 1.50, 95% CI: 1.28-1.76, P digestive system cancers and could be a potential diagnostic and prognostic biomarker in patients with digestive system carcinomas. Further and stricter studies with a larger number of cases are necessary to strengthen our conclusions.

  3. Mycobacterial RNA isolation optimized for non-coding RNA: high fidelity isolation of 5S rRNA from Mycobacterium bovis BCG reveals novel post-transcriptional processing and a complete spectrum of modified ribonucleosides.

    Science.gov (United States)

    Hia, Fabian; Chionh, Yok Hian; Pang, Yan Ling Joy; DeMott, Michael S; McBee, Megan E; Dedon, Peter C

    2015-03-11

    A major challenge in the study of mycobacterial RNA biology is the lack of a comprehensive RNA isolation method that overcomes the unusual cell wall to faithfully yield the full spectrum of non-coding RNA (ncRNA) species. Here, we describe a simple and robust procedure optimized for the isolation of total ncRNA, including 5S, 16S and 23S ribosomal RNA (rRNA) and tRNA, from mycobacteria, using Mycobacterium bovis BCG to illustrate the method. Based on a combination of mechanical disruption and liquid and solid-phase technologies, the method produces all major species of ncRNA in high yield and with high integrity, enabling direct chemical and sequence analysis of the ncRNA species. The reproducibility of the method with BCG was evident in bioanalyzer electrophoretic analysis of isolated RNA, which revealed quantitatively significant differences in the ncRNA profiles of exponentially growing and non-replicating hypoxic bacilli. The method also overcame an historical inconsistency in 5S rRNA isolation, with direct sequencing revealing a novel post-transcriptional processing of 5S rRNA to its functional form and with chemical analysis revealing seven post-transcriptional ribonucleoside modifications in the 5S rRNA. This optimized RNA isolation procedure thus provides a means to more rigorously explore the biology of ncRNA species in mycobacteria. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.

  4. Effect of Green Tea Extract on Systemic Metabolic Homeostasis in Diet-Induced Obese Mice Determined via RNA-Seq Transcriptome Profiles

    Directory of Open Access Journals (Sweden)

    Ji-Young Choi

    2016-10-01

    Green tea (GT) has various health effects, including anti-obesity properties. However, the multiple molecular mechanisms of these effects have not been fully determined. The aim of this study was to elucidate the anti-obesity effects of GT via the analysis of its metabolic and transcriptional responses based on RNA-seq profiles. C57BL/6J mice were fed a normal, high-fat (60% energy as fat), or high-fat + 0.25% (w/w) GT diet for 12 weeks. The GT extract ameliorated obesity, hepatic steatosis, dyslipidemia, and insulin resistance in diet-induced obesity (DIO) mice. GT supplementation reduced body weight gain relative to mice fed the high-fat diet through enhanced energy expenditure, and reduced adiposity. The transcriptome profiles of epididymal white adipose tissue (eWAT) suggested that GT augments transcriptional responses to the degradation of branched chain amino acids (BCAAs), as well as AMP-activated protein kinase (AMPK) signaling, which suggests enhanced energy homeostasis. Our findings provide some significant insights into the effects of GT for the prevention of obesity and its comorbidities. We demonstrated that the GT extract contributed to the regulation of systemic metabolic homeostasis via transcriptional responses to not only lipid and glucose metabolism, but also amino acid metabolism via BCAA degradation in the adipose tissue of DIO mice.

  5. Global effects of the CSR-1 RNA interference pathway on transcriptional landscape

    Science.gov (United States)

    Cecere, Germano; Hoersch, Sebastian; O’Keeffe, Sean; Sachidanandam, Ravi; Grishok, Alla

    2014-01-01

    Argonaute proteins and their small RNA co-factors short interfering RNAs (siRNAs) are known to inhibit gene expression at the transcriptional and post-transcriptional levels. In Caenorhabditis elegans, the Argonaute CSR-1 binds thousands of endogenous siRNAs (endo-siRNAs) antisense to germline transcripts and associates with chromatin in a siRNA-dependent manner. However, its role in gene expression regulation remains controversial. Here, we used a genome-wide profiling of nascent RNA transcripts to demonstrate that the CSR-1 RNAi pathway promotes sense-oriented Pol II transcription. Moreover, a loss of CSR-1 function resulted in global increase in antisense transcription and ectopic transcription of silent chromatin domains, which led to reduced chromatin incorporation of centromere-specific histone H3. Based on these findings, we propose that the CSR-1 pathway has a role in maintaining the directionality of active transcription thereby propagating the distinction between transcriptionally active and silent genomic regions. PMID:24681887

  6. Optimization Of A High-Throughput Transcriptomic (HTTr) Bioactivity Screen In MCF7 Cells Using Targeted RNA-Seq (SOT)

    Science.gov (United States)

    Recent advances in targeted RNA-Seq technology allow researchers to efficiently and cost-effectively obtain whole transcriptome profiles using picograms of mRNA from human cell lysates. Low mRNA input requirements and sample multiplexing capabilities has made time- and concentrat...

  7. SplicingTypesAnno: annotating and quantifying alternative splicing events for RNA-Seq data.

    Science.gov (United States)

    Sun, Xiaoyong; Zuo, Fenghua; Ru, Yuanbin; Guo, Jiqiang; Yan, Xiaoyan; Sablok, Gaurav

    2015-04-01

    Alternative splicing plays a key role in the regulation of the central dogma. Four major types of alternative splicing have been classified as intron retention, exon skipping, alternative 5′ splice sites or alternative donor sites, and alternative 3′ splice sites or alternative acceptor sites. A few algorithms have been developed to detect splice junctions from RNA-Seq reads. However, there are few tools targeting the major alternative splicing types at the exon/intron level. This type of analysis may reveal subtle, yet important events of alternative splicing, and thus help gain deeper understanding of the mechanism of alternative splicing. This paper describes a user-friendly R package for extracting, annotating and analyzing alternative splicing types for sequence alignment files from RNA-Seq. SplicingTypesAnno can: (1) provide annotation for major alternative splicing at the exon/intron level and, by comparison with the annotation from a GTF/GFF file, identify novel alternative splicing sites; (2) offer a convenient two-level analysis: genome-scale annotation for users with a high performance computing environment, and gene-scale annotation for users with personal computers; (3) generate a user-friendly web report and additional BED files for IGV visualization. SplicingTypesAnno is a user-friendly R package for extracting, annotating and analyzing alternative splicing types at the exon/intron level for sequence alignment files from RNA-Seq. It is publicly available at https://sourceforge.net/projects/splicingtypes/files/ or http://genome.sdau.edu.cn/research/software/SplicingTypesAnno.html. Copyright © 2015 Elsevier Ireland Ltd. All rights reserved.

  8. Development and validation of genic-SSR markers in sesame by RNA-seq.

    Science.gov (United States)

    Zhang, Haiyang; Wei, Libin; Miao, Hongmei; Zhang, Tide; Wang, Cuiying

    2012-07-16

    Sesame (Sesamum indicum L.) is one of the most important oil crops; however, a lack of useful molecular markers hinders current genetic research. We performed transcriptome sequencing of samples from different sesame growth and developmental stages, and mining of genic-SSR markers to identify valuable markers for sesame molecular genetics research. In this study, 75 bp and 100 bp paired-end RNA-seq was used to sequence 24 cDNA libraries, and 42,566 uni-transcripts were assembled from more than 260 million filtered reads. The total length of uni-transcript sequences was 47.99 Mb, and 7,324 SSRs (SSRs ≥15 bp) and 4,440 SSRs (SSRs ≥18 bp) were identified. On average, there was one genic-SSR per 6.55 kb (SSRs ≥15 bp) or 10.81 kb (SSRs ≥18 bp). Among perfect SSRs (≥18 bp), di-nucleotide motifs (48.01%) were the most abundant, followed by tri- (20.96%), hexa- (25.37%), penta- (2.97%), tetra- (2.12%), and mono-nucleotides (0.57%). The top four motif repeats were (AG/CT)n [1,268 (34.51%)], (CA/TG)n [281 (7.65%)], (AT/AT)n [215 (5.85%)], and (GAA/TTC)n [131 (3.57%)]. A total of 2,164 SSR primer pairs were identified in the 4,440 SSR-containing sequences (≥18 bp), and 300 SSR primer pairs were randomly chosen for validation. These SSR markers were amplified and validated in 25 sesame accessions (24 cultivated accessions, one wild species). 276 (92.0%) primer pairs yielded PCR amplification products in 24 cultivars. Thirty two primer pairs (11.59%) exhibited polymorphisms. Moreover, 203 primer pairs (67.67%) yielded PCR amplicons in the wild accession and 167 (60.51%) were polymorphic between species. A UPGMA dendrogram based on genetic similarity coefficients showed that the correlation between genotype and geographical source was low and that the genetic basis of sesame in China is narrow, as previously reported. The 32 polymorphic primer pairs were validated using an F2 mapping population; 18 primer pairs exhibited polymorphisms between the parents, and 14
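
    The SSR densities and validation percentages reported in the abstract above follow from simple ratios. A minimal Python sketch reproducing them from the quoted figures is given below; the variable and function names are illustrative, and the choice of denominators (300 tested pairs versus 276 amplifying pairs) is inferred from the arithmetic, not stated explicitly in the abstract.

```python
# Figures quoted in the abstract.
total_length_kb = 47.99 * 1000          # 47.99 Mb of uni-transcript sequence, in kb
ssr_counts = {15: 7324, 18: 4440}       # minimum SSR length (bp) -> number of SSRs


def kb_per_ssr(total_kb, n_ssrs):
    """Average amount of sequence (kb) containing one SSR."""
    return total_kb / n_ssrs


def percent(part, whole):
    """Percentage rounded to two decimals, as in the abstract."""
    return round(100.0 * part / whole, 2)


densities = {min_len: round(kb_per_ssr(total_length_kb, n), 2)
             for min_len, n in ssr_counts.items()}

tested_pairs = 300
amplified_in_cultivars = 276

amplification_rate = percent(amplified_in_cultivars, tested_pairs)   # of tested pairs
polymorphic_rate = percent(32, amplified_in_cultivars)               # of amplifying pairs
wild_amplification_rate = percent(203, tested_pairs)                 # of tested pairs
interspecies_polymorphic_rate = percent(167, amplified_in_cultivars) # of amplifying pairs
```

    Under these assumed denominators the computed values match the reported 6.55 kb and 10.81 kb per SSR and the 92.0%, 11.59%, 67.67% and 60.51% rates.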

  9. Whole transcriptome expression analysis and comparison of two different strains of Plasmodium falciparum using RNA-Seq

    Directory of Open Access Journals (Sweden)

    Hiasindh Ashmi Antony

    2016-06-01

    The emergence and distribution of drug resistance in malaria are serious public health concerns in tropical and subtropical regions of the world. However, the molecular mechanism of drug resistance remains unclear. In the present study, we performed high-throughput RNA-Seq to identify and characterize the differentially expressed genes between the chloroquine (CQ)-sensitive (3D7) and resistant (Dd2) strains of Plasmodium falciparum. The parasite cells were cultured in vitro in the presence and absence of CQ. Total RNA was isolated from the harvested parasite cells using TRIzol, and RNA-Seq was conducted using an Illumina HiSeq 2500 sequencing platform with paired-end reads and annotated using TopHat. The transcriptome analysis of P. falciparum revealed the expression of ~5000 genes, of which ~60% have unknown function. The Cuffdiff program was used to identify the differentially expressed genes between the CQ-sensitive and resistant strains. Here, we furnish a detailed description of the experimental design, procedure, and analysis of the transcriptome sequencing data, which have been deposited in the National Center for Biotechnology Information (accession nos. PRJNA308455 and GSE77499).

  10. An open RNA-Seq data analysis pipeline tutorial with an example of reprocessing data from a recent Zika virus study.

    Science.gov (United States)

    Wang, Zichen; Ma'ayan, Avi

    2016-01-01

    RNA-seq analysis is becoming a standard method for global gene expression profiling. However, open and standard pipelines to perform RNA-seq analysis by non-experts remain challenging due to the large size of the raw data files and the hardware requirements for running the alignment step. Here we introduce a reproducible open source RNA-seq pipeline delivered as an IPython notebook and a Docker image. The pipeline uses state-of-the-art tools and can run on various platforms with minimal configuration overhead. The pipeline enables the extraction of knowledge from typical RNA-seq studies by generating interactive principal component analysis (PCA) and hierarchical clustering (HC) plots, performing enrichment analyses against over 90 gene set libraries, and obtaining lists of small molecules that are predicted to either mimic or reverse the observed changes in mRNA expression. We apply the pipeline to a recently published RNA-seq dataset collected from human neuronal progenitors infected with the Zika virus (ZIKV). In addition to confirming the presence of cell cycle genes among the genes that are downregulated by ZIKV, our analysis uncovers significant overlap with upregulated genes that when knocked out in mice induce defects in brain morphology. This result potentially points to the molecular processes associated with the microcephaly phenotype observed in newborns from pregnant mothers infected with the virus. In addition, our analysis predicts small molecules that can either mimic or reverse the expression changes induced by ZIKV. The IPython notebook and Docker image are freely available at:  http://nbviewer.jupyter.org/github/maayanlab/Zika-RNAseq-Pipeline/blob/master/Zika.ipynb and  https://hub.docker.com/r/maayanlab/zika/.

  11. RNA-Seq and molecular docking reveal multi-level pesticide resistance in the bed bug

    Directory of Open Access Journals (Sweden)

    Mamidala Praveen

    2012-01-01

    Background: Bed bugs (Cimex lectularius) are hematophagous nocturnal parasites of humans that have attained high impact status due to their worldwide resurgence. The sudden and rampant resurgence of C. lectularius has been attributed to numerous factors including frequent international travel, narrower pest management practices, and insecticide resistance. Results: We performed a next-generation RNA sequencing (RNA-Seq) experiment to find differentially expressed genes between pesticide-resistant (PR) and pesticide-susceptible (PS) strains of C. lectularius. A reference transcriptome database of 51,492 expressed sequence tags (ESTs) was created by combining the databases derived from de novo assembled mRNA-Seq tags (30,404 ESTs) and our previous 454 pyrosequenced database (21,088 ESTs). The two-way GLMseq analysis revealed ~15,000 highly significant differentially expressed ESTs between the PR and PS strains. Among the top 5,000 differentially expressed ESTs, 109 putative defense genes (cuticular proteins, cytochrome P450s, antioxidant genes, ABC transporters, glutathione S-transferases, carboxylesterases and acetyl cholinesterase) involved in penetration resistance and metabolic resistance were identified. Tissue and development-specific expression of P450 CYP3 clan members showed high mRNA levels in the cuticle, Malpighian tubules, and midgut; and in early instar nymphs, respectively. Lastly, molecular modeling and docking of a candidate cytochrome P450 (CYP397A1V2) revealed the flexibility of the deduced protein to metabolize a broad range of insecticide substrates including DDT, deltamethrin, permethrin, and imidacloprid. Conclusions: We developed significant molecular resources for C. lectularius putatively involved in metabolic resistance as well as those participating in other modes of insecticide resistance. RNA-Seq profiles of PR strains combined with tissue-specific profiles and molecular docking revealed multi-level insecticide

  12. RNA Editing During Sexual Development Occurs in Distantly Related Filamentous Ascomycetes.

    Science.gov (United States)

    Teichert, Ines; Dahlmann, Tim A; Kück, Ulrich; Nowrousian, Minou

    2017-04-01

    RNA editing is a post-transcriptional process that modifies RNA molecules leading to transcript sequences that differ from their template DNA. A-to-I editing was found to be widely distributed in nuclear transcripts of metazoa, but was detected in fungi only recently in a study of the filamentous ascomycete Fusarium graminearum that revealed extensive A-to-I editing of mRNAs in sexual structures (fruiting bodies). Here, we searched for putative RNA editing events in RNA-seq data from Sordaria macrospora and Pyronema confluens, two distantly related filamentous ascomycetes, and in data from the Taphrinomycete Schizosaccharomyces pombe. Like F. graminearum, S. macrospora is a member of the Sordariomycetes, whereas P. confluens belongs to the early-diverging group of Pezizomycetes. We found extensive A-to-I editing in RNA-seq data from sexual mycelium from both filamentous ascomycetes, but not in vegetative structures. A-to-I editing was not detected in different stages of meiosis of S. pombe. A comparison of A-to-I editing in S. macrospora with F. graminearum and P. confluens, respectively, revealed little conservation of individual editing sites. An analysis of RNA-seq data from two sterile developmental mutants of S. macrospora showed that A-to-I editing is strongly reduced in these strains. Sequencing of cDNA fragments containing more than one editing site from P. confluens showed that at the beginning of sexual development, transcripts were incompletely edited or unedited, whereas in later stages transcripts were more extensively edited. Taken together, these data suggest that A-to-I RNA editing is an evolutionary conserved feature during fruiting body development in filamentous ascomycetes. © The Author(s) 2017. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.

  13. The RNA Polymerase II C-Terminal Domain Phosphatase-Like Protein FIERY2/CPL1 Interacts with eIF4AIII and Is Essential for Nonsense-Mediated mRNA Decay in Arabidopsis

    KAUST Repository

    Cui, Peng; Chen, Tao; Qin, Tao; Ding, Feng; Wang, Zhenyu; Chen, Hao; Xiong, Liming

    2016-01-01

    © 2016 American Society of Plant Biologists. All rights reserved. Nonsense-mediated decay (NMD) is a posttranscriptional surveillance mechanism in eukaryotes that recognizes and degrades transcripts with premature translation-termination codons. The RNA polymerase II C-terminal domain phosphatase-like protein FIERY2 (FRY2; also known as C-TERMINAL DOMAIN PHOSPHATASE-LIKE1 [CPL1]) plays multiple roles in RNA processing in Arabidopsis thaliana. Here, we found that FRY2/CPL1 interacts with two NMD factors, eIF4AIII and UPF3, and is involved in the dephosphorylation of eIF4AIII. This dephosphorylation retains eIF4AIII in the nucleus and limits its accumulation in the cytoplasm. By analyzing RNA-seq data combined with quantitative RT-PCR validation, we found that a subset of alternatively spliced transcripts and 5′-extended mRNAs with NMD-eliciting features accumulated in the fry2-1 mutant, cycloheximide-treated wild type, and upf3 mutant plants, indicating that FRY2 is essential for the degradation of these NMD transcripts.

  14. The RNA Polymerase II C-Terminal Domain Phosphatase-Like Protein FIERY2/CPL1 Interacts with eIF4AIII and Is Essential for Nonsense-Mediated mRNA Decay in Arabidopsis

    KAUST Repository

    Cui, Peng

    2016-02-18

    © 2016 American Society of Plant Biologists. All rights reserved. Nonsense-mediated decay (NMD) is a posttranscriptional surveillance mechanism in eukaryotes that recognizes and degrades transcripts with premature translation-termination codons. The RNA polymerase II C-terminal domain phosphatase-like protein FIERY2 (FRY2; also known as C-TERMINAL DOMAIN PHOSPHATASE-LIKE1 [CPL1]) plays multiple roles in RNA processing in Arabidopsis thaliana. Here, we found that FRY2/CPL1 interacts with two NMD factors, eIF4AIII and UPF3, and is involved in the dephosphorylation of eIF4AIII. This dephosphorylation retains eIF4AIII in the nucleus and limits its accumulation in the cytoplasm. By analyzing RNA-seq data combined with quantitative RT-PCR validation, we found that a subset of alternatively spliced transcripts and 5′-extended mRNAs with NMD-eliciting features accumulated in the fry2-1 mutant, cycloheximide-treated wild type, and upf3 mutant plants, indicating that FRY2 is essential for the degradation of these NMD transcripts.

  15. Identification of transcriptional biomarkers by RNA-sequencing for improved detection of β2-agonists abuse in goat skeletal muscle.

    Directory of Open Access Journals (Sweden)

    Luyao Zhao

    In this paper, high-throughput RNA-sequencing (RNA-seq) was used to search for transcriptional biomarkers for β2-agonists. In combination with drug mechanisms, a smaller group of genes with higher detection accuracy was screened out. Unknown samples were first predicted by this group of genes, and liquid chromatography-tandem mass spectrometry (LC-MS/MS) was applied to positive samples to validate the biomarkers. The results of principal component analysis (PCA), hierarchical cluster analysis (HCA) and discriminant analysis (DA) indicated that the eight genes screened by high-throughput RNA-seq were able to distinguish samples in the experimental group and control group. Compared with the nine genes selected from the earlier literature, 17 genes including these nine genes were proven to have a more satisfactory effect, which validated the accuracy of gene selection by RNA-seq. Then, six key genes were selected from the 17 genes according to a variable importance in projection (VIP) value greater than 1. The test results using the six genes and 17 genes were similar, revealing that the six genes were critical genes. By using the six genes, three positive samples possibly treated with drugs were screened out from 25 unknown samples through DA and partial least squares discriminant analysis (PLS-DA). Then, the three samples were verified by a standard method, and mapenterol was detected in one sample. Therefore, the six genes can be used as biomarkers to detect β2-agonists. Compared with the previous study, detection of β2-agonist abuse using six key genes is an improved method of great significance for monitoring β2-agonist abuse in animal husbandry.

  16. Comparison of miRNA quantitation by Nanostring in serum and plasma samples.

    Directory of Open Access Journals (Sweden)

    Catherine Foye

    Circulating microRNAs that are associated with specific diseases have garnered much attention for use in diagnostic assays. However, detection of disease-associated miRNA can be affected by several factors such as release of contaminating cellular miRNA during sample collection, variations due to amplification of transcript for detection, or controls used for normalization for accurate quantitation. We analyzed circulating miRNA in serum and plasma samples obtained concurrently from 28 patients, using a Nanostring quantitative assay platform. Total RNA concentration ranged from 32-125 μg/ml from serum and 30-220 μg/ml from plasma. Of 798 miRNAs, 371 miRNAs were not detected in either serum or plasma samples. 427 were detected in either serum or plasma but not both, whereas 151 miRNA were detected in both serum and plasma samples. The diversity of miRNA detected was greater in plasma than in serum samples. In serum samples, the number of detected miRNA ranged from 3 to 82 with a median of 17, whereas in plasma samples, the number of miRNA detected ranged from 25 to 221 with a median of 91. Several miRNA such as miR-451a, miR-16-5p, miR-223-3p, and miR-25-3p were highly abundant and differentially expressed between serum and plasma. The detection of endogenous and exogenous control miRNAs varied in serum and plasma, with higher levels observed in plasma. Gene expression stability identified candidate invariant microRNA that were highly stable across all samples, and could be used for normalization. In conclusion, there are significant differences in both the number of miRNA detected and the amount of miRNA detected between serum and plasma. Normalization using miRNA with constant expression is essential to minimize the impact of technical variations. Given the challenges involved, ideal candidates for blood based biomarkers would be those that are indifferent to type of body fluid, are detectable and can be reliably quantitated.
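
    The detection counts in the abstract above can be related with simple set arithmetic. The Python sketch below rests on one reading that is an assumption, not something the abstract confirms: that the 427 figure counts miRNAs detected in at least one sample type (since 798 - 371 = 427), with the 151 shared miRNAs forming a subset of them. The abstract's own wording ("but not both") may intend something different, and the variable names are illustrative.

```python
# Counts quoted in the abstract.
total_mirnas = 798
not_detected_anywhere = 371
detected_in_both = 151

# Assumed reading: everything not missing from both fluids was seen in at least one.
detected_in_at_least_one = total_mirnas - not_detected_anywhere

# Under that reading, the remainder were seen in serum only or in plasma only.
detected_in_exactly_one = detected_in_at_least_one - detected_in_both
```

    Under this assumed reading, 427 miRNAs were seen in at least one fluid and 276 in exactly one.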

  17. Analysis of transcript and protein overlap in a human osteosarcoma cell line

    Directory of Open Access Journals (Sweden)

    Emanuelsson Olof

    2010-12-01

    Background: An interesting field of research in genomics and proteomics is to compare the overlap between the transcriptome and the proteome. Recently, the tools to analyse gene and protein expression on a whole-genome scale have been improved, including the availability of the new generation sequencing instruments and high-throughput antibody-based methods to analyze the presence and localization of proteins. In this study, we used massive transcriptome sequencing (RNA-seq) to investigate the transcriptome of a human osteosarcoma cell line and compared the expression levels with in situ protein data obtained from antibody-based immunohistochemistry (IHC) and immunofluorescence microscopy (IF). Results: A large-scale analysis based on 2749 genes was performed, corresponding to approximately 13% of the protein coding genes in the human genome. We found both RNA and protein for a large fraction of the analyzed genes, with 60% of the analyzed human genes detected by all three methods. Only 34 genes (1.2%) were not detected on the transcriptional or protein level with any method. Our data suggest that the majority of the human genes are expressed at detectable transcript or protein levels in this cell line. Since the reliability of antibodies depends on possible cross-reactivity, we compared the RNA and protein data using antibodies with different reliability scores based on various criteria, including Western blot analysis. Gene products detected in all three platforms generally have good antibody validation scores, while those detected only by antibodies, but not by RNA sequencing, generally consist of more low-scoring antibodies. Conclusion: This suggests that some antibodies are staining the cells in an unspecific manner, and that assessment of transcript presence by RNA-seq can provide guidance for validation of the corresponding antibodies.

  18. A unique enhancer boundary complex on the mouse ribosomal RNA genes persists after loss of Rrn3 or UBF and the inactivation of RNA polymerase I transcription.

    Science.gov (United States)

    Herdman, Chelsea; Mars, Jean-Clement; Stefanovsky, Victor Y; Tremblay, Michel G; Sabourin-Felix, Marianne; Lindsay, Helen; Robinson, Mark D; Moss, Tom

    2017-07-01

    Transcription of the several hundred mouse and human ribosomal RNA (rRNA) genes accounts for the majority of RNA synthesis in the cell nucleus and is the determinant of cytoplasmic ribosome abundance, a key factor in regulating gene expression. The rRNA genes, referred to globally as the rDNA, are clustered as direct repeats at the Nucleolar Organiser Regions, NORs, of several chromosomes, and in many cells the active repeats are transcribed at near saturation levels. The rDNA is also a hotspot of recombination and chromosome breakage, and hence understanding its control has broad importance. Despite the need for a high level of rDNA transcription, typically only a fraction of the rDNA is transcriptionally active, and some NORs are permanently silenced by CpG methylation. Various chromatin-remodelling complexes have been implicated in counteracting silencing to maintain rDNA activity. However, the chromatin structure of the active rDNA fraction is still far from clear. Here we have combined a high-resolution ChIP-Seq protocol with conditional inactivation of key basal factors to better understand what determines active rDNA chromatin. The data resolve questions concerning the interdependence of the basal transcription factors, show that preinitiation complex formation is driven by the architectural factor UBF (UBTF) independently of transcription, and that RPI termination and release correspond with the site of TTF1 binding. They further reveal the existence of an asymmetric Enhancer Boundary Complex formed by CTCF and Cohesin and flanked upstream by phased nucleosomes and downstream by an arrested RNA Polymerase I complex. We find that the Enhancer Boundary Complex is the only site of active histone modification in the 45 kbp rDNA repeat. Strikingly, it not only delimits each functional rRNA gene, but also is stably maintained after gene inactivation and the re-establishment of surrounding repressive chromatin. Our data define a poised state of rDNA chromatin

  19. A unique enhancer boundary complex on the mouse ribosomal RNA genes persists after loss of Rrn3 or UBF and the inactivation of RNA polymerase I transcription.

    Directory of Open Access Journals (Sweden)

    Chelsea Herdman

    2017-07-01

    Transcription of the several hundred mouse and human ribosomal RNA (rRNA) genes accounts for the majority of RNA synthesis in the cell nucleus and is the determinant of cytoplasmic ribosome abundance, a key factor in regulating gene expression. The rRNA genes, referred to globally as the rDNA, are clustered as direct repeats at the Nucleolar Organiser Regions, NORs, of several chromosomes, and in many cells the active repeats are transcribed at near saturation levels. The rDNA is also a hotspot of recombination and chromosome breakage, and hence understanding its control has broad importance. Despite the need for a high level of rDNA transcription, typically only a fraction of the rDNA is transcriptionally active, and some NORs are permanently silenced by CpG methylation. Various chromatin-remodelling complexes have been implicated in counteracting silencing to maintain rDNA activity. However, the chromatin structure of the active rDNA fraction is still far from clear. Here we have combined a high-resolution ChIP-Seq protocol with conditional inactivation of key basal factors to better understand what determines active rDNA chromatin. The data resolve questions concerning the interdependence of the basal transcription factors, show that preinitiation complex formation is driven by the architectural factor UBF (UBTF) independently of transcription, and that RPI termination and release correspond with the site of TTF1 binding. They further reveal the existence of an asymmetric Enhancer Boundary Complex formed by CTCF and Cohesin and flanked upstream by phased nucleosomes and downstream by an arrested RNA Polymerase I complex. We find that the Enhancer Boundary Complex is the only site of active histone modification in the 45 kbp rDNA repeat. Strikingly, it not only delimits each functional rRNA gene, but also is stably maintained after gene inactivation and the re-establishment of surrounding repressive chromatin. Our data define a poised state

  20. An Increase of Abundance and Transcriptional Activity for Acinetobacter junii Post Wastewater Treatment

    Directory of Open Access Journals (Sweden)

    Muhammad Raihan Jumat

    2018-04-01

    A membrane bioreactor (MBR)-based wastewater treatment plant (WWTP) in Saudi Arabia is assessed over a five-month period in 2015 and once in 2017 for bacterial diversity and transcriptional activity using metagenomics, metatranscriptomics and real time quantitative polymerase chain reaction (RT-qPCR). Acinetobacter spp. are shown to be enriched in the chlorinated effluent. Members of the Acinetobacter genus are the most abundant in the effluent and chlorinated effluent. At the species level, Acinetobacter junii has higher relative abundances post MBR and chlorination. RNA-seq analysis shows that, in A. junii, 288 genes and 378 genes are significantly upregulated in the effluent and chlorinated effluent, respectively, with 98 genes being upregulated in both. RT-qPCR of samples in 2015 and 2017 confirms the upregulation observed in RNA-seq. Analysis of the 98 genes shows that the majority of the upregulated genes are involved in cellular repair and metabolism, followed by resistance, virulence, and signaling. Additionally, two different subpopulations of A. junii are observed in the effluent and chlorinated effluent. The upregulation of cellular repair and metabolism genes, and the formation of different subpopulations of A. junii in both effluents provide insights into the mechanisms employed by A. junii to persist in the conditions of a WWTP.

  1. An Increase of Abundance and Transcriptional Activity for Acinetobacter junii Post Wastewater Treatment

    KAUST Repository

    Jumat, Muhammad; Haroon, Muhammad; Aljassim, Nada I.; Cheng, Hong; Hong, Pei-Ying

    2018-01-01

    A membrane bioreactor (MBR)-based wastewater treatment plant (WWTP) in Saudi Arabia is assessed over a five-month period in 2015 and once in 2017 for bacterial diversity and transcriptional activity using metagenomics, metatranscriptomics and real time quantitative polymerase chain reaction (RT-qPCR). Acinetobacter spp. are shown to be enriched in the chlorinated effluent. Members of the Acinetobacter genus are the most abundant in the effluent and chlorinated effluent. At the species level, Acinetobacter junii has higher relative abundances post MBR and chlorination. RNA-seq analysis shows that, in A. junii, 288 genes and 378 genes are significantly upregulated in the effluent and chlorinated effluent, respectively, with 98 genes being upregulated in both. RT-qPCR of samples in 2015 and 2017 confirms the upregulation observed in RNA-seq. Analysis of the 98 genes shows that the majority of the upregulated genes are involved in cellular repair and metabolism, followed by resistance, virulence, and signaling. Additionally, two different subpopulations of A. junii are observed in the effluent and chlorinated effluent. The upregulation of cellular repair and metabolism genes, and the formation of different subpopulations of A. junii in both effluents provide insights into the mechanisms employed by A. junii to persist in the conditions of a WWTP.

  2. An Increase of Abundance and Transcriptional Activity for Acinetobacter junii Post Wastewater Treatment

    KAUST Repository

    Jumat, Muhammad

    2018-04-11

    A membrane bioreactor (MBR)-based wastewater treatment plant (WWTP) in Saudi Arabia is assessed over a five-month period in 2015 and once in 2017 for bacterial diversity and transcriptional activity using metagenomics, metatranscriptomics and real time quantitative polymerase chain reaction (RT-qPCR). Acinetobacter spp. are shown to be enriched in the chlorinated effluent. Members of the Acinetobacter genus are the most abundant in the effluent and chlorinated effluent. At the species level, Acinetobacter junii has higher relative abundances post MBR and chlorination. RNA-seq analysis shows that, in A. junii, 288 genes and 378 genes are significantly upregulated in the effluent and chlorinated effluent, respectively, with 98 genes being upregulated in both. RT-qPCR of samples in 2015 and 2017 confirms the upregulation observed in RNA-seq. Analysis of the 98 genes shows that the majority of the upregulated genes are involved in cellular repair and metabolism, followed by resistance, virulence, and signaling. Additionally, two different subpopulations of A. junii are observed in the effluent and chlorinated effluent. The upregulation of cellular repair and metabolism genes, and the formation of different subpopulations of A. junii in both effluents provide insights into the mechanisms employed by A. junii to persist in the conditions of a WWTP.

  3. Amplification of pico-scale DNA mediated by bacterial carrier DNA for small-cell-number transcription factor ChIP-seq

    DEFF Research Database (Denmark)

    Jakobsen, Janus S; Bagger, Frederik O; Hasemann, Marie S

    2015-01-01

    BACKGROUND: Chromatin-Immunoprecipitation coupled with deep sequencing (ChIP-seq) is used to map transcription factor occupancy and generate epigenetic profiles genome-wide. The requirement of nano-scale ChIP DNA for generation of sequencing libraries has impeded ChIP-seq on in vivo tissues of low cell numbers. RESULTS: We describe a robust, simple and scalable methodology for ChIP-seq of low-abundant cell populations, verified down to 10,000 cells. By employing non-mammalian genome mapping bacterial carrier DNA during amplification, we reliably amplify down to 50 pg of ChIP DNA from transcription factor (CEBPA) and histone mark (H3K4me3) ChIP. We further demonstrate that genomic profiles are highly resilient to changes in carrier DNA to ChIP DNA ratios. CONCLUSIONS: This represents a significant advance compared to existing technologies, which involve either complex steps of pre...

  4. Calling genotypes from public RNA-sequencing data enables identification of genetic variants that affect gene-expression levels

    NARCIS (Netherlands)

    Deelen, Patrick; Zhernakova, Daria V.; de Haan, Mark; van der Sijde, Marijke; Bonder, Marc Jan; Karjalainen, Juha; van der Velde, K. Joeri; Abbott, Kristin M.; Fu, Jingyuan; Wijmenga, Cisca; Sinke, Richard J.; Swertz, Morris A.; Franke, Lude

    2015-01-01

    Background: RNA-sequencing (RNA-seq) is a powerful technique for the identification of genetic variants that affect gene-expression levels, either through expression quantitative trait locus (eQTL) mapping or through allele-specific expression (ASE) analysis. Given increasing numbers of RNA-seq

  5. Simultaneous DNA-RNA Extraction from Coastal Sediments and Quantification of 16S rRNA Genes and Transcripts by Real-time PCR.

    Science.gov (United States)

    Tatti, Enrico; McKew, Boyd A; Whitby, Corrine; Smith, Cindy J

    2016-06-11

    Real Time Polymerase Chain Reaction, also known as quantitative PCR (q-PCR), is a widely used tool in microbial ecology to quantify gene abundances of taxonomic and functional groups in environmental samples. Used in combination with a reverse transcriptase reaction (RT-q-PCR), it can also be employed to quantify gene transcripts. q-PCR makes use of highly sensitive fluorescent detection chemistries that allow quantification of PCR amplicons during the exponential phase of the reaction. Therefore, the biases associated with 'end-point' PCR detected in the plateau phase of the PCR reaction are avoided. A protocol to quantify bacterial 16S rRNA genes and transcripts from coastal sediments via real-time PCR is provided. First, a method for the co-extraction of DNA and RNA from coastal sediments, including the additional steps required for the preparation of DNA-free RNA, is outlined. Second, a step-by-step guide for the quantification of 16S rRNA genes and transcripts from the extracted nucleic acids via q-PCR and RT-q-PCR is outlined. This includes details for the construction of DNA and RNA standard curves. Key considerations for the use of RT-q-PCR assays in microbial ecology are included.
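
    The standard-curve quantification mentioned in this record can be illustrated with a short sketch. It is not taken from the protocol itself: the dilution series, the Ct values and all function names below are hypothetical, and the only relations assumed are the usual linear model Ct = slope * log10(copies) + intercept and the amplification efficiency E = 10^(-1/slope) - 1.

```python
# Minimal sketch of q-PCR standard-curve quantification.
# All numbers and names are hypothetical, for illustration only.

def fit_standard_curve(log10_copies, ct_values):
    """Ordinary least-squares fit of Ct = slope * log10(copies) + intercept."""
    n = len(log10_copies)
    mean_x = sum(log10_copies) / n
    mean_y = sum(ct_values) / n
    sxx = sum((x - mean_x) ** 2 for x in log10_copies)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(log10_copies, ct_values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def amplification_efficiency(slope):
    """Efficiency as a fraction (1.0 means perfect doubling per cycle)."""
    return 10 ** (-1.0 / slope) - 1.0


def copies_from_ct(ct, slope, intercept):
    """Invert the standard curve to estimate the starting copy number."""
    return 10 ** ((ct - intercept) / slope)


# Hypothetical ten-fold dilution series of a 16S rRNA gene standard.
standards_log10 = [3.0, 4.0, 5.0, 6.0, 7.0]
standards_ct = [30.00, 26.68, 23.36, 20.04, 16.72]

slope, intercept = fit_standard_curve(standards_log10, standards_ct)
efficiency = amplification_efficiency(slope)
unknown_copies = copies_from_ct(23.36, slope, intercept)

print(f"slope={slope:.2f} intercept={intercept:.2f} efficiency={efficiency:.3f}")
print(f"estimated copies for Ct 23.36: {unknown_copies:.3g}")
```

    A slope close to -3.32 corresponds to roughly 100% efficiency; the same fit would be applied separately to the DNA and RNA standards.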

  6. Post-transcriptional Mechanisms Contribute Little to Phenotypic Variation in Snake Venoms.

    Science.gov (United States)

    Rokyta, Darin R; Margres, Mark J; Calvin, Kate

    2015-09-09

    Protein expression is a major link in the genotype-phenotype relationship, and processes affecting protein abundances, such as rates of transcription and translation, could contribute to phenotypic evolution if they generate heritable variation. Recent work has suggested that mRNA abundances do not accurately predict final protein abundances, which would imply that post-transcriptional regulatory processes contribute significantly to phenotypes. Post-transcriptional processes also appear to buffer changes in transcriptional patterns as species diverge, suggesting that the transcriptional changes have little or no effect on the phenotypes undergoing study. We tested for concordance between mRNA and protein expression levels in snake venoms by means of mRNA-seq and quantitative mass spectrometry for 11 snakes representing 10 species, six genera, and three families. In contrast to most previous work, we found high correlations between venom gland transcriptomes and venom proteomes for 10 of our 11 comparisons. We tested for protein-level buffering of transcriptional changes during species divergence by comparing the difference between transcript abundance and protein abundance for three pairs of species and one intraspecific pair. We found no evidence for buffering during divergence of our three species pairs but did find evidence for protein-level buffering for our single intraspecific comparison, suggesting that buffering, if present, was a transient phenomenon in venom divergence. Our results demonstrated that post-transcriptional mechanisms did not contribute significantly to phenotypic evolution in venoms and suggest a more prominent and direct role for cis-regulatory evolution in phenotypic variation, particularly for snake venoms. Copyright © 2015 Rokyta et al.

  7. Identification of circulating miRNA biomarkers based on global quantitative real-time PCR profiling

    Directory of Open Access Journals (Sweden)

    Kang Kang

    2012-02-01

    MicroRNAs (miRNAs) are small noncoding RNAs (18-25 nucleotides) that regulate gene expression at the post-transcriptional level. Recent studies have demonstrated the presence of miRNAs in the blood circulation. Deregulation of miRNAs in serum or plasma has been associated with many diseases including cancers and cardiovascular diseases, suggesting the possible use of miRNAs as diagnostic biomarkers. However, the detection of the small amount of miRNAs found in serum or plasma requires a method with high sensitivity and accuracy. Therefore, the current study describes polymerase chain reaction (PCR)-based methods for measuring circulating miRNAs. Briefly, the procedure involves four major steps: (1) sample collection and preparation; (2) global miRNA profiling using quantitative real-time PCR (qRT-PCR); (3) data normalization and analysis; and (4) selection and validation of miRNA biomarkers. In conclusion, qRT-PCR is a promising method for profiling of circulating miRNAs as biomarkers.

  8. HOCOMOCO: towards a complete collection of transcription factor binding models for human and mouse via large-scale ChIP-Seq analysis

    KAUST Repository

    Kulakovskiy, Ivan V.; Vorontsov, Ilya E.; Yevshin, Ivan S.; Sharipov, Ruslan N.; Fedorova, Alla D.; Rumynskiy, Eugene I.; Medvedeva, Yulia A.; Magana-Mora, Arturo; Bajic, Vladimir B.; Papatsenko, Dmitry A.; Kolpakov, Fedor A.; Makeev, Vsevolod J.

    2017-01-01

    We present a major update of the HOCOMOCO collection that consists of patterns describing DNA binding specificities for human and mouse transcription factors. In this release, we profited from a nearly doubled volume of published in vivo experiments on transcription factor (TF) binding to expand the repertoire of binding models, replace low-quality models previously based on in vitro data only and cover more than a hundred TFs with previously unknown binding specificities. This was achieved by systematic motif discovery from more than five thousand ChIP-Seq experiments uniformly processed within the BioUML framework with several ChIP-Seq peak calling tools and aggregated in the GTRD database. HOCOMOCO v11 contains binding models for 453 mouse and 680 human transcription factors and includes 1302 mononucleotide and 576 dinucleotide position weight matrices, which describe primary binding preferences of each transcription factor and reliable alternative binding specificities. An interactive interface and bulk downloads are available on the web: http://hocomoco.autosome.ru and http://www.cbrc.kaust.edu.sa/hocomoco11. In this release, we complement HOCOMOCO by MoLoTool (Motif Location Toolbox, http://molotool.autosome.ru) that applies HOCOMOCO models for visualization of binding sites in short DNA sequences.
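
    How a mononucleotide position weight matrix is applied to locate a binding site in a short DNA sequence can be sketched as follows. This is an illustration only: the matrix values are invented and are not a HOCOMOCO model, the function names are hypothetical, and the code does not reproduce MoLoTool. It assumes log-odds scores that are simply summed across motif positions.

```python
# Toy mononucleotide PWM (log-odds scores), one dict per motif position.
# The numbers are invented for illustration and are NOT a HOCOMOCO model.
TOY_PWM = [
    {"A": 1.2, "C": -1.0, "G": -0.8, "T": -1.5},
    {"A": -1.1, "C": 1.0, "G": -0.9, "T": -0.7},
    {"A": -1.3, "C": -0.6, "G": 1.4, "T": -1.2},
    {"A": -0.9, "C": -1.4, "G": -1.0, "T": 1.1},
]


def score_window(window, pwm):
    """Sum the per-position scores of one motif-length window."""
    return sum(column[base] for column, base in zip(pwm, window))


def best_hit(sequence, pwm):
    """Return (score, start) of the best-scoring window on the given strand.

    The sequence must be upper-case A/C/G/T and at least as long as the motif.
    """
    width = len(pwm)
    hits = [
        (score_window(sequence[start:start + width], pwm), start)
        for start in range(len(sequence) - width + 1)
    ]
    return max(hits)


best_score, best_start = best_hit("TTACGTGG", TOY_PWM)
print(f"best window starts at {best_start} with score {best_score:.1f}")
```

    A real scan would also score the reverse complement and compare each score against a motif-specific threshold; both are omitted here for brevity.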

  9. HOCOMOCO: towards a complete collection of transcription factor binding models for human and mouse via large-scale ChIP-Seq analysis

    KAUST Repository

    Kulakovskiy, Ivan V.

    2017-10-31

    We present a major update of the HOCOMOCO collection that consists of patterns describing DNA binding specificities for human and mouse transcription factors. In this release, we profited from a nearly doubled volume of published in vivo experiments on transcription factor (TF) binding to expand the repertoire of binding models, replace low-quality models previously based on in vitro data only and cover more than a hundred TFs with previously unknown binding specificities. This was achieved by systematic motif discovery from more than five thousand ChIP-Seq experiments uniformly processed within the BioUML framework with several ChIP-Seq peak calling tools and aggregated in the GTRD database. HOCOMOCO v11 contains binding models for 453 mouse and 680 human transcription factors and includes 1302 mononucleotide and 576 dinucleotide position weight matrices, which describe primary binding preferences of each transcription factor and reliable alternative binding specificities. An interactive interface and bulk downloads are available on the web: http://hocomoco.autosome.ru and http://www.cbrc.kaust.edu.sa/hocomoco11. In this release, we complement HOCOMOCO by MoLoTool (Motif Location Toolbox, http://molotool.autosome.ru) that applies HOCOMOCO models for visualization of binding sites in short DNA sequences.

  10. RNA-seq Reveals the Overexpression of IGSF9 in Endometrial Cancer

    Directory of Open Access Journals (Sweden)

    Zonggao Shi

    2018-01-01

    We performed RNA-seq on an Illumina platform for 7 patients with endometrioid endometrial carcinoma for which both tumor tissue and adjacent noncancer tissue were available. A total of 66 genes were differentially expressed with significance level at adjusted p value < 0.01. Using the gene functional classification tool in the NIH DAVID bioinformatics resource, 5 genes were found to be the only enriched group out of that list of genes. The gene IGSF9 was chosen for further characterization with immunohistochemical staining of a larger cohort of human endometrioid carcinoma tissues. The expression level of IGSF9 in cancer cells was significantly higher than that in control glandular cells in paired tissue samples from the same patients (p=0.008) or in overall comparison between cancer and the control (p=0.003). IGSF9 expression is higher in patients with myometrium invasion relative to those without invasion (p=0.015). Reanalysis of RNA-seq dataset from The Cancer Genome Atlas shows higher expression of IGSF9 in endometrial cancer versus normal control and expression was associated with poor prognosis. These results suggest IGSF9 as a new biomarker in endometrial cancer and warrant further studies on its function, mechanism of action, and potential clinical utility.
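
    The record reports an adjusted p value cutoff without naming the correction used. As an illustration of what such an adjustment involves, the sketch below implements the Benjamini-Hochberg procedure, which is an assumption here and may not be the method used in the study; the p values and function name are hypothetical.

```python
# Sketch of a Benjamini-Hochberg adjustment (assumed method, see lead-in).

def benjamini_hochberg(p_values):
    """Return BH-adjusted p values in the same order as the input."""
    m = len(p_values)
    # Indices of the p values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw p values for five genes.
raw_p = [0.001, 0.008, 0.039, 0.041, 0.9]
adj_p = benjamini_hochberg(raw_p)
significant = [i for i, p in enumerate(adj_p) if p < 0.01]

print(adj_p)
print("genes passing adjusted p < 0.01:", significant)
```

    With these toy values only the first gene passes the 0.01 cutoff after adjustment, even though two raw p values are below 0.01.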

  11. Transcription factor trapping by RNA in gene regulatory elements.

    Science.gov (United States)

    Sigova, Alla A; Abraham, Brian J; Ji, Xiong; Molinie, Benoit; Hannett, Nancy M; Guo, Yang Eric; Jangi, Mohini; Giallourakis, Cosmas C; Sharp, Phillip A; Young, Richard A

    2015-11-20

    Transcription factors (TFs) bind specific sequences in promoter-proximal and -distal DNA elements to regulate gene transcription. RNA is transcribed from both of these DNA elements, and some DNA binding TFs bind RNA. Hence, RNA transcribed from regulatory elements may contribute to stable TF occupancy at these sites. We show that the ubiquitously expressed TF Yin-Yang 1 (YY1) binds to both gene regulatory elements and their associated RNA species across the entire genome. Reduced transcription of regulatory elements diminishes YY1 occupancy, whereas artificial tethering of RNA enhances YY1 occupancy at these elements. We propose that RNA makes a modest but important contribution to the maintenance of certain TFs at gene regulatory elements and suggest that transcription of regulatory elements produces a positive-feedback loop that contributes to the stability of gene expression programs. Copyright © 2015, American Association for the Advancement of Science.

  12. A model-based approach to identify binding sites in CLIP-Seq data.

    Directory of Open Access Journals (Sweden)

    Tao Wang

    Cross-linking immunoprecipitation coupled with high-throughput sequencing (CLIP-Seq) has made it possible to identify the targeting sites of RNA-binding proteins in various cell culture systems and tissue types on a genome-wide scale. Here we present a novel model-based approach (MiClip) to identify high-confidence protein-RNA binding sites from CLIP-seq datasets. This approach assigns a probability score for each potential binding site to help prioritize subsequent validation experiments. The MiClip algorithm has been tested in both HITS-CLIP and PAR-CLIP datasets. In the HITS-CLIP dataset, the signal/noise ratios of miRNA seed motif enrichment produced by the MiClip approach are between 17% and 301% higher than those by the ad hoc method for the top 10 most enriched miRNAs. In the PAR-CLIP dataset, the MiClip approach can identify ∼50% more validated binding targets than the original ad hoc method and two recently published methods. To facilitate the application of the algorithm, we have released an R package, MiClip (http://cran.r-project.org/web/packages/MiClip/index.html), and a public web-based graphical user interface software (http://galaxy.qbrc.org/tool_runner?tool_id=mi_clip) for customized analysis.

  13. Full-length mRNA sequencing uncovers a widespread coupling between transcription initiation and mRNA processing.

    Science.gov (United States)

    Anvar, Seyed Yahya; Allard, Guy; Tseng, Elizabeth; Sheynkman, Gloria M; de Klerk, Eleonora; Vermaat, Martijn; Yin, Raymund H; Johansson, Hans E; Ariyurek, Yavuz; den Dunnen, Johan T; Turner, Stephen W; 't Hoen, Peter A C

    2018-03-29

    The multifaceted control of gene expression requires tight coordination of regulatory mechanisms at transcriptional and post-transcriptional level. Here, we studied the interdependence of transcription initiation, splicing and polyadenylation events on single mRNA molecules by full-length mRNA sequencing. In MCF-7 breast cancer cells, we find 2700 genes with interdependent alternative transcription initiation, splicing and polyadenylation events, both in proximal and distant parts of mRNA molecules, including examples of coupling between transcription start sites and polyadenylation sites. The analysis of three human primary tissues (brain, heart and liver) reveals similar patterns of interdependency between transcription initiation and mRNA processing events. We predict thousands of novel open reading frames from full-length mRNA sequences and obtained evidence for their translation by shotgun proteomics. The mapping database rescues 358 previously unassigned peptides and improves the assignment of others. By recognizing sample-specific amino-acid changes and novel splicing patterns, full-length mRNA sequencing improves proteogenomics analysis of MCF-7 cells. Our findings demonstrate that our understanding of transcriptome complexity is far from complete and provides a basis to reveal largely unresolved mechanisms that coordinate transcription initiation and mRNA processing.

  14. Single-tube linear DNA amplification (LinDA) for robust ChIP-seq

    NARCIS (Netherlands)

    Shankaranarayanan, P.; Mendoza-Parra, M.A.; Walia, M.; Wang, L.; Li, N.; Trindade, L.M.; Gronemeyer, H.

    2011-01-01

    Genome-wide profiling of transcription factors based on massive parallel sequencing of immunoprecipitated chromatin (ChIP-seq) requires nanogram amounts of DNA. Here we describe a high-fidelity, single-tube linear DNA amplification method (LinDA) for ChIP-seq and reChIP-seq with picogram DNA amounts

  15. Inferring Molecular Processes Heterogeneity from Transcriptional Data.

    Science.gov (United States)

    Gogolewski, Krzysztof; Wronowska, Weronika; Lech, Agnieszka; Lesyng, Bogdan; Gambin, Anna

    2017-01-01

    RNA microarrays and RNA-seq are nowadays standard technologies to study the transcriptional activity of cells. Most studies focus on tracking transcriptional changes caused by specific experimental conditions. Information on gene up- and downregulation is evaluated by analyzing the behaviour of a relatively large population of cells and averaging its properties. However, even assuming perfect sample homogeneity, different subpopulations of cells can exhibit diverse transcriptomic profiles, as they may follow different regulatory/signaling pathways. The purpose of this study is to provide a novel methodological scheme to account for possible internal, functional heterogeneity in homogeneous cell lines, including cancer ones. We propose a novel computational method to infer the proportion between subpopulations of cells that manifest various functional behaviour in a given sample. Our method was validated using two datasets from RNA microarray experiments. Both experiments aimed to examine cell viability in specific experimental conditions. The presented methodology can be easily extended to RNA-seq data as well as other molecular processes. Moreover, it complements standard tools to indicate the most important networks from transcriptomic data and in particular could be useful in the analysis of cancer cell lines affected by biologically active compounds or drugs.

  16. Targeted Integration of RNA-Seq and Metabolite Data to Elucidate Curcuminoid Biosynthesis in Four Curcuma Species.

    Science.gov (United States)

    Li, Donghan; Ono, Naoaki; Sato, Tetsuo; Sugiura, Tadao; Altaf-Ul-Amin, Md; Ohta, Daisaku; Suzuki, Hideyuki; Arita, Masanori; Tanaka, Ken; Ma, Zhiqiang; Kanaya, Shigehiko

    2015-05-01

    Curcuminoids, namely curcumin and its analogs, are secondary metabolites that act as the primary active constituents of turmeric (Curcuma longa). The contents of these curcuminoids vary among species in the genus Curcuma. For this reason, we compared two wild strains and two cultivars to understand the differences in the synthesis of curcuminoids. Because the fluxes of metabolic reactions depend on the amounts of their substrate and the activity of the catalysts, we analyzed the metabolite concentrations and gene expression of related enzymes. We developed a method based on RNA sequencing (RNA-Seq) analysis that focuses on a specific set of genes to detect expression differences between species in detail. We developed a 'selection-first' method for RNA-Seq analysis in which short reads are mapped to selected enzymes in the target biosynthetic pathways in order to reduce the effect of mapping errors. Using this method, we found that the difference in the contents of curcuminoids among the species, as measured by gas chromatography-mass spectrometry, could be explained by the changes in the expression of genes encoding diketide-CoA synthase, and curcumin synthase at the branching point of the curcuminoid biosynthesis pathway. © The Author 2015. Published by Oxford University Press on behalf of Japanese Society of Plant Physiologists. All rights reserved. For permissions, please email: journals.permissions@oup.com.

  17. RNA binding specificity of Ebola virus transcription factor VP30.

    Science.gov (United States)

    Schlereth, Julia; Grünweller, Arnold; Biedenkopf, Nadine; Becker, Stephan; Hartmann, Roland K

    2016-09-01

    The transcription factor VP30 of the non-segmented RNA negative strand Ebola virus balances viral transcription and replication. Here, we comprehensively studied RNA binding by VP30. Using a novel VP30:RNA electrophoretic mobility shift assay, we tested truncated variants of 2 potential natural RNA substrates of VP30 - the genomic Ebola viral 3'-leader region and its complementary antigenomic counterpart (each ∼155 nt in length) - and a series of other non-viral RNAs. Based on oligonucleotide interference, the major VP30 binding region on the genomic 3'-leader substrate was assigned to the internal expanded single-stranded region (∼ nt 125-80). Best binding to VP30 was obtained with ssRNAs of optimally ∼ 40 nt and mixed base composition; underrepresentation of purines or pyrimidines was tolerated, but homopolymeric sequences impaired binding. A stem-loop structure, particularly at the 3'-end or positioned internally, supports stable binding to VP30. In contrast, dsRNA or RNAs exposing large internal loops flanked by entirely helical arms on both sides are not bound. Introduction of a 5´-Cap(0) structure impaired VP30 binding. Also, ssDNAs bind substantially weaker than isosequential ssRNAs and heparin competes with RNA for binding to VP30, indicating that ribose 2'-hydroxyls and electrostatic contacts of the phosphate groups contribute to the formation of VP30:RNA complexes. Our results indicate a rather relaxed RNA binding specificity of filoviral VP30, which largely differs from that of the functionally related transcription factor of the Paramyxoviridae which binds to ssRNAs as short as 13 nt with a preference for oligo(A) sequences.

  18. RNA polymerase II mediated transcription from the polymerase III promoters in short hairpin RNA expression vector

    International Nuclear Information System (INIS)

    Rumi, Mohammad; Ishihara, Shunji; Aziz, Monowar; Kazumori, Hideaki; Ishimura, Norihisa; Yuki, Takafumi; Kadota, Chikara; Kadowaki, Yasunori; Kinoshita, Yoshikazu

    2006-01-01

    RNA polymerase III promoters of human ribonuclease P RNA component H1, human U6, and mouse U6 small nuclear RNA genes are commonly used in short hairpin RNA (shRNA) expression vectors due to their precise initiation and termination sites. During transient transfection of shRNA vectors, we observed that H1 or U6 promoters also express transcripts long enough to express several reporter genes including firefly luciferase, green fluorescent protein EGFP, and red fluorescent protein JRed. Expression of such longer transcripts was augmented by upstream RNA polymerase II enhancers and completely inhibited by downstream polyA signal sequences. Moreover, the transcription of firefly luciferase from human H1 promoter was sensitive to RNA polymerase II inhibitor α-amanitin. Our findings suggest that commonly used polymerase III promoters in shRNA vectors are also prone to RNA polymerase II mediated transcription, which may have negative impacts on their targeted use.

  19. Post-transcriptional generation of miRNA variants by multiple nucleotidyl transferases contributes to miRNA transcriptome complexity.

    Science.gov (United States)

    Wyman, Stacia K; Knouf, Emily C; Parkin, Rachael K; Fritz, Brian R; Lin, Daniel W; Dennis, Lucas M; Krouse, Michael A; Webster, Philippa J; Tewari, Muneesh

    2011-09-01

    Modification of microRNA sequences by the 3' addition of nucleotides to generate so-called "isomiRs" adds to the complexity of miRNA function, with recent reports showing that 3' modifications can influence miRNA stability and efficiency of target repression. Here, we show that the 3' modification of miRNAs is a physiological and common post-transcriptional event that shows selectivity for specific miRNAs and is observed across species ranging from C. elegans to human. The modifications result predominantly from adenylation and uridylation and are seen across tissue types, disease states, and developmental stages. To quantitatively profile 3' nucleotide additions, we developed and validated a novel assay based on NanoString Technologies' nCounter platform. For certain miRNAs, the frequency of modification was altered by processes such as cell differentiation, indicating that 3' modification is a biologically regulated process. To investigate the mechanism of 3' nucleotide additions, we used RNA interference to screen a panel of eight candidate miRNA nucleotidyl transferases for 3' miRNA modification activity in human cells. Multiple enzymes, including MTPAP, PAPD4, PAPD5, ZCCHC6, ZCCHC11, and TUT1, were found to govern 3' nucleotide addition to miRNAs in a miRNA-specific manner. Three of these enzymes-MTPAP, ZCCHC6, and TUT1-have not previously been known to modify miRNAs. Collectively, our results indicate that 3' modification observed in next-generation small RNA sequencing data is a biologically relevant process, and identify enzymatic mechanisms that may lead to new approaches for modulating miRNA activity in vivo.

  20. mRNA-seq analysis of the Gossypium arboreum transcriptome reveals tissue selective signaling in response to water stress during seedling stage.

    Directory of Open Access Journals (Sweden)

    Xueyan Zhang

    The cotton diploid species, Gossypium arboreum, shows important properties of stress tolerance and good genetic stability. In this study, through mRNA-seq, we de novo assembled the unigenes of multiple samples with 3 h H2O, NaCl, or PEG treatments in leaf, stem and root tissues and successfully obtained 123,579 transcripts of G. arboreum, 89,128 of which had BLAST hits against known cotton ESTs and the draft genome of G. raimondii. About 36,961 transcripts (including 1,958 possible transcription factor members) were identified with differential expression under water stresses. Principal component analysis of differential expression levels in multiple samples suggested tissue selective signalling responding to water stresses. Venn diagram analysis showed the specificity and intersection of transcripts' response to NaCl and PEG treatments in different tissues. Self-organized mapping and hierarchical cluster analysis of the data also revealed strong tissue selectivity of transcripts under salt and osmotic stresses. In addition, the enriched gene ontology (GO) terms for the selected tissue groups differed, including some unique enriched GO terms such as photosynthesis and tetrapyrrole binding only in leaf tissues, while the stem-specific genes showed unique GO terms related to plant-type cell wall biogenesis, and root-specific genes showed unique GO terms such as monooxygenase activity. Furthermore, there were multiple hormone cross-talks in response to osmotic and salt stress. In summary, our multidimensional mRNA sequencing revealed tissue selective signalling and hormone crosstalk in response to salt and osmotic stresses in G. arboreum. To our knowledge, this is the first such report of spatial resolution of transcriptome analysis in G. arboreum. Our study will potentially advance understanding of possible transcriptional networks associated with water stress in cotton and other crop species.

  1. The effects of age-in-block on RNA-seq analysis of archival formalin-fixed paraffin-embedded (FFPE) samples

    Science.gov (United States)

    Archival samples represent a vast resource for identification of chemical and pharmaceutical targets. Previous use of formalin-fixed paraffin-embedded (FFPE) samples has been limited due to changes in RNA introduced by fixation and embedding procedures. Recent advances in RNA-seq...

  2. RNA-Seq Study of Microbially Induced Hemocyte Transcripts from Larval Heliothis virescens (Lepidoptera: Noctuidae)

    Directory of Open Access Journals (Sweden)

    Kent S. Shelby

    2012-08-01

    Full Text Available Larvae of the tobacco budworm are major polyphagous pests throughout the Americas. Development of effective microbial biopesticides for this and related noctuid pests has been stymied by the natural resistance mediated innate immune response. Hemocytes play an early and central role in activating and coordinating immune responses to entomopathogens. To approach this problem we completed RNA-seq expression profiling of hemocytes collected from larvae following an in vivo challenge with bacterial and fungal cell wall components to elicit an immune response. A de novo exome assembly was constructed by combination of sequence tags from all treatments. Sequence tags from each treatment were aligned separately with the assembly to measure expression. The resulting table of differential expression had > 22,000 assemblies each with a distinct combination of annotation and expression. Within these assemblies > 1,400 were upregulated and > 1,500 downregulated by immune activation with bacteria or fungi. Orthologs to innate immune components of other insects were identified including pattern recognition, signal transduction pathways, antimicrobial peptides and enzymes, melanization and coagulation. Additionally orthologs of components regulating hemocytic functions such as autophagy, apoptosis, phagocytosis and nodulation were identified. Associated cellular oxidative defenses and detoxification responses were identified providing a comprehensive snapshot of the early response to elicitation.

  3. RNA-seq Transcriptional Profiling of an Arbuscular Mycorrhiza Provides Insights into Regulated and Coordinated Gene Expression in Lotus japonicus and Rhizophagus irregularis.

    Science.gov (United States)

    Handa, Yoshihiro; Nishide, Hiroyo; Takeda, Naoya; Suzuki, Yutaka; Kawaguchi, Masayoshi; Saito, Katsuharu

    2015-08-01

    Gene expression during arbuscular mycorrhizal development is highly orchestrated in both plants and arbuscular mycorrhizal fungi. To elucidate the gene expression profiles of the symbiotic association, we performed a digital gene expression analysis of Lotus japonicus and Rhizophagus irregularis using a HiSeq 2000 next-generation sequencer with a Cufflinks assembly and de novo transcriptome assembly. There were 3,641 genes differentially expressed during arbuscular mycorrhizal development in L. japonicus, approximately 80% of which were up-regulated. The up-regulated genes included secreted proteins, transporters, proteins involved in lipid and amino acid metabolism, ribosomes and histones. We also detected many genes that were differentially expressed in small-secreted peptides and transcription factors, which may be involved in signal transduction or transcription regulation during symbiosis. Co-regulated genes between arbuscular mycorrhizal and root nodule symbiosis were not particularly abundant, but transcripts encoding for membrane traffic-related proteins, transporters and iron transport-related proteins were found to be highly co-up-regulated. In transcripts of arbuscular mycorrhizal fungi, expansion of cytochrome P450 was observed, which may contribute to various metabolic pathways required to accommodate roots and soil. The comprehensive gene expression data of both plants and arbuscular mycorrhizal fungi provide a powerful platform for investigating the functional and molecular mechanisms underlying arbuscular mycorrhizal symbiosis. © The Author 2015. Published by Oxford University Press on behalf of Japanese Society of Plant Physiologists. All rights reserved. For permissions, please email: journals.permissions@oup.com.

  4. HPeak: an HMM-based algorithm for defining read-enriched regions in ChIP-Seq data

    Directory of Open Access Journals (Sweden)

    Maher Christopher A

    2010-07-01

    Full Text Available Background: Protein-DNA interaction constitutes a basic mechanism for the genetic regulation of target gene expression. Deciphering this mechanism has been a daunting task due to the difficulty in characterizing protein-bound DNA on a large scale. A powerful technique has recently emerged that couples chromatin immunoprecipitation (ChIP) with next-generation sequencing (ChIP-Seq). This technique provides a direct survey of the cistrome of transcription factors and other chromatin-associated proteins. In order to realize the full potential of this technique, increasingly sophisticated statistical algorithms have been developed to analyze the massive amount of data generated by this method. Results: Here we introduce HPeak, a Hidden Markov model (HMM)-based Peak-finding algorithm for analyzing ChIP-Seq data to identify protein-interacting genomic regions. In contrast to the majority of available ChIP-Seq analysis software packages, HPeak is a model-based approach allowing for rigorous statistical inference. This approach enables HPeak to accurately infer genomic regions enriched with sequence reads by assuming realistic probability distributions, in conjunction with a novel weighting scheme on the sequencing read coverage. Conclusions: Using biologically relevant data collections, we found that HPeak showed a higher prevalence of the expected transcription factor binding motifs in ChIP-enriched sequences relative to the control sequences when compared to other currently available ChIP-Seq analysis approaches. Additionally, in comparison to the ChIP-chip assay, ChIP-Seq provides higher resolution along with improved sensitivity and specificity of binding site detection. An additional file and the HPeak program are freely available at http://www.sph.umich.edu/csg/qin/HPeak.
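
    The general idea of HMM-based peak finding can be illustrated with a minimal sketch. This is not HPeak's actual model: the two states (background, enriched), the plain Poisson emissions, the rates and the transition probability below are all assumptions chosen for illustration, and the function name is hypothetical. The sketch decodes the most likely state path over binned read counts with the Viterbi algorithm.

```python
import math
from typing import List, Sequence


def poisson_logpmf(k: int, lam: float) -> float:
    """Log-probability of observing count k under a Poisson(lam) model."""
    return k * math.log(lam) - lam - math.lgamma(k + 1)


def viterbi_two_state(
    counts: Sequence[int],
    lam_bg: float = 1.0,        # assumed mean reads per bin in background
    lam_enriched: float = 8.0,  # assumed mean reads per bin in enriched regions
    p_stay: float = 0.9,        # assumed probability of staying in the same state
) -> List[int]:
    """Return the most likely state per bin: 0 = background, 1 = enriched."""
    if not counts:
        return []
    lams = (lam_bg, lam_enriched)
    log_stay, log_switch = math.log(p_stay), math.log(1.0 - p_stay)

    # Uniform prior over the two states for the first bin.
    score = [math.log(0.5) + poisson_logpmf(counts[0], lam) for lam in lams]
    backpointers = []
    for k in counts[1:]:
        new_score, ptr = [], []
        for state in (0, 1):
            cand = [
                score[prev] + (log_stay if prev == state else log_switch)
                for prev in (0, 1)
            ]
            best_prev = 0 if cand[0] >= cand[1] else 1
            ptr.append(best_prev)
            new_score.append(cand[best_prev] + poisson_logpmf(k, lams[state]))
        score = new_score
        backpointers.append(ptr)

    # Trace back from the best final state.
    state = 0 if score[0] >= score[1] else 1
    path = [state]
    for ptr in reversed(backpointers):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return path


if __name__ == "__main__":
    bins = [0, 1, 0, 9, 10, 8, 1, 0]
    print(viterbi_two_state(bins))
```

    Runs of consecutive 1s in the returned path would correspond to candidate enriched regions; a real tool would also estimate the emission and transition parameters from the data rather than fixing them.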

  5. MAJIQ-SPEL: Web-tool to interrogate classical and complex splicing variations from RNA-Seq data.

    Science.gov (United States)

    Green, Christopher J; Gazzara, Matthew R; Barash, Yoseph

    2017-09-11

    Analysis of RNA sequencing (RNA-Seq) data has highlighted the fact that most genes undergo alternative splicing (AS) and that these patterns are tightly regulated. Many of these events are complex, resulting in numerous possible isoforms that quickly become difficult to visualize, interpret, and experimentally validate. To address these challenges we developed MAJIQ-SPEL, a web-tool that takes as input local splicing variations (LSVs) quantified from RNA-Seq data and provides users with visualization and quantification of gene isoforms associated with those. Importantly, MAJIQ-SPEL is able to handle both classical (binary) and complex, non-binary, splicing variations. Using a matching primer design algorithm it also suggests possible primers to users for experimental validation by RT-PCR and displays those, along with the matching protein domains affected by the LSV, on UCSC Genome Browser for further downstream analysis. Program and code will be available at http://majiq.biociphers.org/majiq-spel. Supplementary data are available at Bioinformatics online. © The Author (2017). Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com

  6. Transcriptomic analysis of the stress response to weaning at housing in bovine leukocytes using RNA-seq technology

    Directory of Open Access Journals (Sweden)

    O’Loughlin Aran

    2012-06-01

    Full Text Available Background: Weaning of beef calves is a necessary husbandry practice and involves separating the calf from its mother, resulting in numerous stressful events including dietary change, social reorganisation and the cessation of the maternal-offspring bond, and is often accompanied by housing. While much recent research has focused on the physiological response of the bovine immune system to stress, little is known about the molecular mechanisms modulating the immune response. Therefore, the objective of this study was to provide new insights into the molecular mechanisms underlying the physiological response to weaning at housing in beef calves using Illumina RNA-seq. Results: The leukocyte transcriptome was significantly altered for at least 7 days following either housing or weaning at housing. Analysis of differentially expressed genes revealed that four main pathways, cytokine signalling, transmembrane transport, haemostasis and G-protein-coupled receptor (GPCR) signalling, were differentially regulated between control and weaned calves and underwent significant transcriptomic alterations in response to weaning stress on days 1, 2 and 7. Of particular note, chemokines, cytokines and integrins were consistently found to be up-regulated on each day following weaning. Evidence for alternative splicing of genes was also detected, indicating a number of genes involved in the innate and adaptive immune response may be alternatively transcribed, including those responsible for toll receptor cascades and T cell receptor signalling. Conclusions: This study represents the first application of RNA-Seq technology for genomic studies in bovine leukocytes in response to weaning stress. Weaning stress induces the activation of a number of cytokine, chemokine and integrin transcripts and may alter the immune system whereby the ability of a number of cells of the innate and adaptive immune system to locate and destroy pathogens is...

  7. RNA-seq analysis of Brachypodium distachyon responses to Barley stripe mosaic virus infection

    Directory of Open Access Journals (Sweden)

    Guoxin Wang

    2017-02-01

    Full Text Available Barley stripe mosaic virus (BSMV) is the type member of the genus Hordeivirus. Brachypodium distachyon line Bd3-1 shows resistance to the BSMV ND18 strain, but is susceptible to an ND18 double mutant (β NDTGB1R390K, T392K) in which lysine is substituted for an arginine at position 390 and for threonine at position 392 of the triple gene block 1 (TGB1) protein. In order to understand differences in gene expression following infection with ND18 and double mutant ND18, Bd3-1 seedlings were subjected to RNA-seq analyses at 1, 6, and 14 days post inoculation (dpi). The results revealed that basal immunity genes involved in cellulose synthesis and pathogenesis-related protein biosynthesis were enhanced in incompatible interactions between Bd3-1 and ND18. Most of the differentially expressed transcripts are related to trehalose biosynthesis, ethylene, jasmonic acid metabolism, protein phosphorylation, protein ubiquitination, transcriptional regulation, and transport process, as well as pathogenesis-related protein biosynthesis. In compatible interactions between Bd3-1 and ND18 mutant, Bd3-1 developed weak basal resistance responses to the virus. Many genes involved in cellulose biosynthesis, protein amino acid phosphorylation, protein biosynthesis, protein glycosylation, glycolysis and cellular macromolecular complex assembly that may be related to virus replication, assembly and movement were up-regulated. Some genes involved in oxidative stress responses were also up-regulated at 14 dpi. BSMV ND18 mutant infection suppressed expression of genes functioning in regulation of transcription, protein kinase, cellular nitrogen compound biosynthetic process and photosynthesis. Differential expression patterns between compatible and incompatible interactions in Bd3-1 to the two BSMV strains provide important clues for understanding the mechanism of resistance to BSMV in the model plant Brachypodium.

  8. Single-Cell RNA-Seq of Mouse Dopaminergic Neurons Informs Candidate Gene Selection for Sporadic Parkinson Disease.

    Science.gov (United States)

    Hook, Paul W; McClymont, Sarah A; Cannon, Gabrielle H; Law, William D; Morton, A Jennifer; Goff, Loyal A; McCallion, Andrew S

    2018-03-01

    Genetic variation modulating risk of sporadic Parkinson disease (PD) has been primarily explored through genome-wide association studies (GWASs). However, like many other common genetic diseases, the impacted genes remain largely unknown. Here, we used single-cell RNA-seq to characterize dopaminergic (DA) neuron populations in the mouse brain at embryonic and early postnatal time points. These data facilitated unbiased identification of DA neuron subpopulations through their unique transcriptional profiles, including a postnatal neuroblast population and substantia nigra (SN) DA neurons. We use these population-specific data to develop a scoring system to prioritize candidate genes in all 49 GWAS intervals implicated in PD risk, including genes with known PD associations and many with extensive supporting literature. As proof of principle, we confirm that the nigrostriatal pathway is compromised in Cplx1-null mice. Ultimately, this systematic approach establishes biologically pertinent candidates and testable hypotheses for sporadic PD, informing a new era of PD genetic research. Copyright © 2018 American Society of Human Genetics. All rights reserved.

  9. miRNA-mediated 'tug-of-war' model reveals ceRNA propensity of genes in cancers.

    Science.gov (United States)

    Swain, Arpit Chandan; Mallick, Bibekanand

    2018-06-01

    Competing endogenous RNA (ceRNA) are transcripts that cross-regulate each other at the post-transcriptional level by competing for shared microRNA response elements (MREs). These have been implicated in various biological processes impacting cell-fate decisions and diseases including cancer. There are several studies that predict possible ceRNA pairs by adopting various machine-learning and mathematical approaches; however, there is no method that enables us to gauge as well as compare the propensity of the ceRNA of a gene and precisely envisages which among a pair exerts a stronger pull on the shared miRNA pool. In this study, we developed a method that uses the 'tug of war of genes' concept to predict and quantify ceRNA potential of a gene for the shared miRNA pool in cancers based on a score represented by SoCeR (score of competing endogenous RNA). The method was executed on the RNA-Seq transcriptional profiles of genes and miRNA available at TCGA along with CLIP-supported miRNA-target sites to predict ceRNA in 32 cancer types which were validated with already reported cases. The proposed method can be used to determine the sequestering capability of the gene of interest as well as in ranking the probable ceRNA candidates of a gene. Finally, we developed standalone applications (SoCeR tool) to aid researchers in easier implementation of the method in analysing different data sets or diseases. © 2018 The Authors. Published by FEBS Press and John Wiley & Sons Ltd.

  10. RNA-seq analysis of Quercus pubescens Leaves: de novo transcriptome assembly, annotation and functional markers development.

    Directory of Open Access Journals (Sweden)

    Sara Torre

    Full Text Available Quercus pubescens Willd., a species distributed from Spain to southwest Asia, ranks high for drought tolerance among European oaks. Q. pubescens performs a role of outstanding significance in most Mediterranean forest ecosystems, but few mechanistic studies have been conducted to explore its response to environmental constraints, due to the lack of genomic resources. In our study, we performed a deep transcriptomic sequencing in Q. pubescens leaves, including de novo assembly, functional annotation and the identification of new molecular markers. Our results are a pre-requisite for undertaking molecular functional studies, and may give support in population and association genetic studies. 254,265,700 clean reads were generated by the Illumina HiSeq 2000 platform, with an average length of 98 bp. De novo assembly, using CLC Genomics, produced 96,006 contigs, having a mean length of 618 bp. Sequence similarity analyses against seven public databases (Uniprot, NR, RefSeq and KOGs at NCBI, Pfam, InterPro and KEGG) resulted in 83,065 transcripts annotated with gene descriptions, conserved protein domains, or gene ontology terms. These annotations and local BLAST allowed identification of genes specifically associated with mechanisms of drought avoidance. Finally, 14,202 microsatellite markers and 18,425 single nucleotide polymorphisms (SNPs) were discovered in silico in assembled and annotated sequences. We completed a successful global analysis of the Q. pubescens leaf transcriptome using RNA-seq. The assembled and annotated sequences together with newly discovered molecular markers provide genomic information for functional genomic studies in Q. pubescens, with special emphasis on response mechanisms to the severe constraints of the Mediterranean climate. Our tools enable comparative genomics studies on other Quercus species taking advantage of large intra-specific ecophysiological differences.
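
    In silico microsatellite discovery of the kind mentioned above can be sketched with a regular expression scan for perfect tandem repeats. The abstract does not say which tool or thresholds were used, so the function name, the motif lengths (2 to 6 nt) and the minimum of five repeat units below are assumptions for illustration only.

```python
import re
from typing import List, Tuple


def find_ssrs(
    seq: str,
    min_repeats: int = 5,  # assumed minimum number of repeat units
    max_motif: int = 6,    # assumed maximum motif length in nucleotides
) -> List[Tuple[int, str, int]]:
    """Return (0-based start, motif, repeat count) for perfect tandem repeats.

    The lazy quantifier makes the regex try the shortest motif first, so
    ATATATATAT is reported as (AT)5 rather than as a longer compound motif.
    """
    pattern = re.compile(
        r"([ACGT]{2,%d}?)\1{%d,}" % (max_motif, min_repeats - 1)
    )
    hits = []
    for match in pattern.finditer(seq.upper()):
        motif = match.group(1)
        if len(set(motif)) == 1:
            # Skip homopolymer runs such as AAAAAAAAAA matched as (AA)n.
            continue
        hits.append((match.start(), motif, len(match.group(0)) // len(motif)))
    return hits


if __name__ == "__main__":
    contig = "GGC" + "AT" * 6 + "CCT" + "CAG" * 5 + "T"
    for start, motif, n in find_ssrs(contig):
        print(start, motif, n)
```

    Dedicated SSR finders also handle mononucleotide, compound and imperfect repeats; this sketch covers only perfect repeats of a single motif.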

  11. Genome-wide profiling of transcription factor binding and epigenetic marks in adipocytes by ChIP-seq

    DEFF Research Database (Denmark)

    Nielsen, Ronni; Mandrup, Susanne

    2014-01-01

    of the most widely used of these technologies. Using these methods, association of transcription factors, cofactors, and epigenetic marks can be mapped to DNA in a genome-wide manner. Here, we provide a detailed protocol for performing ChIP-seq analyses in preadipocytes and adipocytes. We have focused mainly...

  12. Environmental contaminants and microRNA regulation: Transcription factors as regulators of toxicant-altered microRNA expression

    Energy Technology Data Exchange (ETDEWEB)

    Sollome, James; Martin, Elizabeth [Department of Environmental Science & Engineering, Gillings School of Global Public Health, University of North Carolina, Chapel Hill (United States); Sethupathy, Praveen [Department of Genetics, School of Medicine, University of North Carolina, Chapel Hill, NC (United States); Fry, Rebecca C., E-mail: rfry@unc.edu [Department of Environmental Science & Engineering, Gillings School of Global Public Health, University of North Carolina, Chapel Hill (United States); Curriculum in Toxicology, School of Medicine, University of North Carolina, Chapel Hill, NC (United States)

    2016-12-01

    MicroRNAs (miRNAs) regulate gene expression by binding mRNA and inhibiting translation and/or inducing degradation of the associated transcripts. Expression levels of miRNAs have been shown to be altered in response to environmental toxicants, thus impacting cellular function and influencing disease risk. Transcription factors (TFs) are known to be altered in response to environmental toxicants and play a critical role in the regulation of miRNA expression. To date, environmentally-responsive TFs that are important for regulating miRNAs remain understudied. In a state-of-the-art analysis, we utilized an in silico bioinformatic approach to characterize potential transcriptional regulators of environmentally-responsive miRNAs. Using the miRStart database, genomic sequences of promoter regions for all available human miRNAs (n = 847) were identified and promoter regions were defined as −1000/+500 base pairs from the transcription start site. Subsequently, the promoter region sequences of environmentally-responsive miRNAs (n = 128) were analyzed using enrichment analysis to determine overrepresented TF binding sites (TFBS). While most (56/73) TFs differed across environmental contaminants, a set of 17 TFs was enriched for promoter binding among miRNAs responsive to numerous environmental contaminants. Of these, one TF was common to miRNAs altered by the majority of environmental contaminants, namely SWI/SNF-related, matrix-associated, actin-dependent regulator of chromatin, subfamily A, member 3 (SMARCA3). These identified TFs represent candidate common transcriptional regulators of miRNAs perturbed by environmental toxicants. Highlights: • Transcription factors that regulate environmentally-modulated miRNA expression are understudied. • Transcription factor binding sites (TFBS) located within DNA promoter regions of miRNAs were identified. • Specific transcription factors may serve as master regulators of environmentally-mediated microRNA expression.
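
    The −1000/+500 promoter definition can be made concrete with a small strand-aware coordinate calculation. This is a sketch, not the authors' pipeline: the function name, the 1-based inclusive coordinate convention and the clipping at position 1 are assumptions.

```python
from typing import Tuple


def promoter_window(
    tss: int,
    strand: str,
    upstream: int = 1000,   # bases upstream of the TSS, as in the abstract
    downstream: int = 500,  # bases downstream of the TSS, as in the abstract
) -> Tuple[int, int]:
    """Return (start, end) of the promoter window in genomic coordinates.

    Coordinates are assumed to be 1-based and inclusive. On the minus
    strand, "upstream" lies at higher genomic coordinates than the TSS.
    """
    if strand == "+":
        start, end = tss - upstream, tss + downstream
    elif strand == "-":
        start, end = tss - downstream, tss + upstream
    else:
        raise ValueError("strand must be '+' or '-'")
    # Clip at the chromosome start so the window never goes below position 1.
    return max(1, start), end


if __name__ == "__main__":
    print(promoter_window(5000, "+"))  # (4000, 5500)
    print(promoter_window(5000, "-"))  # (4500, 6000)
```

    Ignoring strand is a common source of error here: for a minus-strand miRNA the 1000-bp arm extends toward higher coordinates, not lower ones.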

  14. Quantitative Proteomics Analysis Reveals Novel Insights into Mechanisms of Action of Long Noncoding RNA Hox Transcript Antisense Intergenic RNA (HOTAIR) in HeLa Cells

    Science.gov (United States)

    Zheng, Peng; Xiong, Qian; Wu, Ying; Chen, Ying; Chen, Zhuo; Fleming, Joy; Gao, Ding; Bi, Lijun; Ge, Feng

    2015-01-01

    Long noncoding RNAs (lncRNAs), which have emerged in recent years as a new and crucial layer of gene regulators, regulate various biological processes such as carcinogenesis and metastasis. HOTAIR (Hox transcript antisense intergenic RNA), a lncRNA overexpressed in most human cancers, has been shown to be an oncogenic lncRNA. Here, we explored the role of HOTAIR in HeLa cells and searched for proteins regulated by HOTAIR. To understand the mechanism of action of HOTAIR from a systems perspective, we employed a quantitative proteomic strategy to systematically identify potential targets of HOTAIR. The expression of 170 proteins was significantly dys-regulated after inhibition of HOTAIR, implying that they could be potential targets of HOTAIR. Analysis of this data at the systems level revealed major changes in proteins involved in diverse cellular components, including the cytoskeleton and the respiratory chain. Further functional studies on vimentin (VIM), a key protein involved in the cytoskeleton, revealed that HOTAIR exerts its effects on migration and invasion of HeLa cells, at least in part, through the regulation of VIM expression. Inhibition of HOTAIR leads to mitochondrial dysfunction and ultrastructural alterations, suggesting a novel role of HOTAIR in maintaining mitochondrial function in cancer cells. Our results provide novel insights into the mechanisms underlying the function of HOTAIR in cancer cells. We expect that the methods used in this study will become an integral part of functional studies of lncRNAs. PMID:25762744

  15. Normalization of RNA-seq data using factor analysis of control genes or samples

    Science.gov (United States)

    Risso, Davide; Ngai, John; Speed, Terence P.; Dudoit, Sandrine

    2015-01-01

    Normalization of RNA-seq data has proven essential to ensure accurate inference of expression levels. Here we show that usual normalization approaches mostly account for sequencing depth and fail to correct for library preparation and other more-complex unwanted effects. We evaluate the performance of the External RNA Control Consortium (ERCC) spike-in controls and investigate the possibility of using them directly for normalization. We show that the spike-ins are not reliable enough to be used in standard global-scaling or regression-based normalization procedures. We propose a normalization strategy, remove unwanted variation (RUV), that adjusts for nuisance technical effects by performing factor analysis on suitable sets of control genes (e.g., ERCC spike-ins) or samples (e.g., replicate libraries). Our approach leads to more-accurate estimates of expression fold-changes and tests of differential expression compared to state-of-the-art normalization methods. In particular, RUV promises to be valuable for large collaborative projects involving multiple labs, technicians, and/or platforms. PMID:25150836

  16. Pyrosequencing data reveals tissue-specific expression of lineage-specific transcripts in chickpea

    OpenAIRE

    Garg, Rohini; Jain, Mukesh

    2011-01-01

    Chickpea is a very important crop legume plant, which provides a protein-rich supplement to cereal-based diets and has the ability to fix atmospheric nitrogen. Despite its economic importance, the functional genomic resources for chickpea are very limited. Recently, we reported the complete transcriptome of chickpea using next generation sequencing technologies. We analyzed the tissue-specific expression of chickpea transcripts based on RNA-seq data. In addition, we identified two sets of lin...

  17. HALO--a Java framework for precise transcript half-life determination.

    Science.gov (United States)

    Friedel, Caroline C; Kaufmann, Stefanie; Dölken, Lars; Zimmer, Ralf

    2010-05-01

    Recent improvements in experimental technologies now allow measurements of de novo transcription and/or RNA decay at whole transcriptome level and determination of precise transcript half-lives. Such transcript half-lives provide important insights into the regulation of biological processes and the relative contributions of RNA decay and de novo transcription to differential gene expression. In this article, we present HALO (Half-life Organizer), the first software for the precise determination of transcript half-lives from measurements of RNA de novo transcription or decay determined with microarrays or RNA-seq. In addition, methods for quality control, filtering and normalization are supplied. HALO provides a graphical user interface, command-line tools and a well-documented Java application programming interface (API). Thus, it can be used both by biologists to determine transcript half-lives fast and reliably with the provided user interfaces as well as software developers integrating transcript half-life analysis into other gene expression profiling pipelines. Source code, executables and documentation are available at http://www.bio.ifi.lmu.de/software/halo.
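
    How a half-life follows from such measurements can be shown with a short worked example. It is a sketch under stated assumptions, not HALO's API (HALO itself is a Java framework): the function names are hypothetical, and the calculation assumes first-order decay, N(t) = N0 · exp(−λt), so that t½ = ln 2 / λ, plus steady-state transcript levels for the newly transcribed case.

```python
import math


def half_life_from_decay(remaining_fraction: float, t: float) -> float:
    """Half-life from the fraction of RNA still present after time t.

    Assumes first-order decay: remaining_fraction = exp(-lambda * t),
    hence half-life = -t * ln(2) / ln(remaining_fraction).
    """
    if not 0.0 < remaining_fraction < 1.0:
        raise ValueError("remaining_fraction must lie strictly between 0 and 1")
    return -t * math.log(2) / math.log(remaining_fraction)


def half_life_from_new_fraction(new_fraction: float, t: float) -> float:
    """Half-life from the newly transcribed / total RNA ratio after labeling time t.

    Assumes steady state, so the pre-existing fraction (1 - new_fraction)
    is what remains of the RNA that was present at the start of labeling.
    """
    return half_life_from_decay(1.0 - new_fraction, t)


if __name__ == "__main__":
    # Half of the transcript left after 60 min gives a 60-min half-life.
    print(round(half_life_from_decay(0.5, 60.0), 6))
    # 75% newly transcribed after 60 min gives a 30-min half-life.
    print(round(half_life_from_new_fraction(0.75, 60.0), 6))
```

    The result is in the same time unit as t. Measured ratios usually need normalization before such a formula is applied, which is one of the steps the abstract says the software supplies.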

  18. Cytochrome c oxidase subunit 1-based human RNA quantification to enhance mRNA profiling in forensic biology

    Directory of Open Access Journals (Sweden)

    Dong Zhao

    2017-01-01

    Full Text Available RNA analysis offers many potential applications in forensic science, and molecular identification of body fluids by analysis of cell-specific RNA markers represents a new technique for use in forensic cases. However, because forensic materials are often admixed with nonhuman cellular components, human-specific RNA quantification is required for forensic RNA assays. In the present study, a quantification assay for human RNA has been developed for body fluid samples in forensic biology. The quantitative assay is based on real-time reverse transcription-polymerase chain reaction of mitochondrial RNA cytochrome c oxidase subunit I and is capable of RNA quantification with high reproducibility and a wide dynamic range. The human RNA quantification improves the quality of mRNA profiling in the identification of the body fluids saliva and semen because the quantification assay can exclude the influence of nonhuman components and reduce the adverse effect of degraded RNA fragments.

  19. Accelerating the design of biomimetic materials by integrating RNA-seq with proteomics and materials science.

    Science.gov (United States)

    Guerette, Paul A; Hoon, Shawn; Seow, Yiqi; Raida, Manfred; Masic, Admir; Wong, Fong T; Ho, Vincent H B; Kong, Kiat Whye; Demirel, Melik C; Pena-Francesch, Abdon; Amini, Shahrouz; Tay, Gavin Z; Ding, Dawei; Miserez, Ali

    2013-10-01

    Efforts to engineer new materials inspired by biological structures are hampered by the lack of genomic data from many model organisms studied in biomimetic research. Here we show that biomimetic engineering can be accelerated by integrating high-throughput RNA-seq with proteomics and advanced materials characterization. This approach can be applied to a broad range of systems, as we illustrate by investigating diverse high-performance biological materials involved in embryo protection, adhesion and predation. In one example, we rapidly engineer recombinant squid sucker ring teeth proteins into a range of structural and functional materials, including nanopatterned surfaces and photo-cross-linked films that exceed the mechanical properties of most natural and synthetic polymers. Integrating RNA-seq with proteomics and materials science facilitates the molecular characterization of natural materials and the effective translation of their molecular designs into a wide range of bio-inspired materials.

  20. RNA-Seq using two populations reveals genes and alleles controlling wood traits and growth in Eucalyptus nitens.

    Directory of Open Access Journals (Sweden)

    Saravanan Thavamanikumar

    Full Text Available Eucalyptus nitens is a perennial forest tree species grown mainly for kraft pulp production in many parts of the world. Kraft pulp yield (KPY) is a key determinant of plantation profitability and increasing the KPY of trees grown in plantations is a major breeding objective. To speed up the breeding process, molecular markers that can predict KPY are desirable. To achieve this goal, we carried out RNA-Seq studies on trees at extremes of KPY in two different trials to identify genes and alleles whose expression correlated with KPY. KPY is positively correlated with growth measured as diameter at breast height (DBH) in both trials. In total, six RNA bulks from two treatments were sequenced on an Illumina HiSeq platform. At 5% false discovery rate level, 3953 transcripts showed differential expression in the same direction in both trials; 2551 (65%) were down-regulated and 1402 (35%) were up-regulated in low KPY samples. The genes up-regulated in low KPY trees were largely involved in biotic and abiotic stress response, reflecting the low growth among low KPY trees. Genes down-regulated in low KPY trees mainly belonged to gene categories involved in wood formation and growth. Differential allelic expression was observed in 2103 SNPs (in 1068 genes), and of these 640 SNPs (30%) occurred in 313 unique genes that were also differentially expressed. These SNPs may represent the cis-acting regulatory variants that influence total gene expression. In addition, we also identified 196 genes which had Ka/Ks ratios greater than 1.5, suggesting that these genes are under positive selection. Candidate genes and alleles identified in this study will provide a valuable resource for future association studies aimed at identifying molecular markers for KPY and growth.

  1. RNA synthetic biology inspired from bacteria: construction of transcription attenuators under antisense regulation.

    Science.gov (United States)

    Dawid, Alexandre; Cayrol, Bastien; Isambert, Hervé

    2009-07-01

    Among all biopolymers, ribonucleic acids or RNA have unique functional versatility, which led to the early suggestion that RNA alone (or a closely related biopolymer) might have once sustained a primitive form of life based on a single type of biopolymer. This has been supported by the demonstration of processive RNA-based replication and the discovery of 'riboswitches' or RNA switches, which directly sense their metabolic environment. In this paper, we further explore the plausibility of this 'RNA world' scenario and show, through synthetic molecular design guided by advanced RNA simulations, that RNA can also perform elementary regulation tasks on its own. We demonstrate that RNA synthetic regulatory modules directly inspired from bacterial transcription attenuators can efficiently activate or repress the expression of other RNA by merely controlling their folding paths 'on the fly' during transcription through simple RNA-RNA antisense interaction. Factors, such as NTP concentration and RNA synthesis rate, affecting the efficiency of this kinetic regulation mechanism are also studied and discussed in the light of evolutionary constraints. Overall, this suggests that direct coupling among synthesis, folding and regulation of RNAs may have enabled the early emergence of autonomous RNA-based regulation networks in absence of both DNA and protein partners.

  2. Reverse Transcription Errors and RNA-DNA Differences at Short Tandem Repeats.

    Science.gov (United States)

    Fungtammasan, Arkarachai; Tomaszkiewicz, Marta; Campos-Sánchez, Rebeca; Eckert, Kristin A; DeGiorgio, Michael; Makova, Kateryna D

    2016-10-01

    Transcript variation has important implications for organismal function in health and disease. Most transcriptome studies focus on assessing variation in gene expression levels and isoform representation. Variation at the level of transcript sequence is caused by RNA editing and transcription errors, and leads to nongenetically encoded transcript variants, or RNA-DNA differences (RDDs). Such variation has been understudied, in part because its detection is obscured by reverse transcription (RT) and sequencing errors. It has only been evaluated for intertranscript base substitution differences. Here, we investigated transcript sequence variation for short tandem repeats (STRs). We developed the first maximum-likelihood estimator (MLE) to infer RT error and RDD rates, taking next generation sequencing error rates into account. Using the MLE, we empirically evaluated RT error and RDD rates for STRs in a large-scale DNA and RNA replicated sequencing experiment conducted in a primate species. The RT error rates increased exponentially with STR length and were biased toward expansions. The RDD rates were approximately 1 order of magnitude lower than the RT error rates. The RT error rates estimated with the MLE from a primate data set were concordant with those estimated with an independent method, barcoded RNA sequencing, from a Caenorhabditis elegans data set. Our results have important implications for medical genomics, as STR allelic variation is associated with >40 diseases. STR nonallelic transcript variation can also contribute to disease phenotype. The MLE and empirical rates presented here can be used to evaluate the probability of disease-associated transcripts arising due to RDD. © The Author 2016. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.

  3. An sRNA and Cold Shock Protein Homolog-Based Feedforward Loop Post-transcriptionally Controls Cell Cycle Master Regulator CtrA.

    Science.gov (United States)

    Robledo, Marta; Schlüter, Jan-Philip; Loehr, Lars O; Linne, Uwe; Albaum, Stefan P; Jiménez-Zurdo, José I; Becker, Anke

    2018-01-01

    Adjustment of cell cycle progression is crucial for bacterial survival and adaptation under adverse conditions. However, the understanding of modulation of cell cycle control in response to environmental changes is rather incomplete. In α-proteobacteria, the broadly conserved cell cycle master regulator CtrA underlies multiple levels of control, including coupling of cell cycle and cell differentiation. CtrA levels are known to be tightly controlled through diverse transcriptional and post-translational mechanisms. Here, small RNA (sRNA)-mediated post-transcriptional regulation is uncovered as an additional level of CtrA fine-tuning. Computational predictions as well as transcriptome and proteome studies consistently suggested targeting of ctrA and the putative cold shock chaperone cspA5 mRNAs by the trans-encoded sRNA (trans-sRNA) GspR (formerly SmelC775) in several Sinorhizobium species. GspR strongly accumulated in the stationary growth phase, especially in minimal medium (MM) cultures. Lack of the gspR locus confers a fitness disadvantage in competition with the wild type, while its overproduction hampers cell growth, suggesting that this riboregulator interferes with cell cycle progression. An eGFP-based reporter in vivo assay, involving wild-type and mutant sRNA and mRNA pairs, experimentally confirmed GspR-dependent post-transcriptional down-regulation of ctrA and cspA5 expression, which most likely occurs through base-pairing to the respective mRNA. The energetically favored secondary structure of GspR is predicted to comprise three stem-loop domains, with stem-loop 1 and stem-loop 3 targeting ctrA and cspA5 mRNA, respectively. Moreover, this work reports evidence for post-transcriptional control of ctrA by CspA5. Thus, this regulation and GspR-mediated post-transcriptional repression of ctrA and cspA5 expression constitute a coherent feed-forward loop, which may enhance the negative effect of GspR on CtrA levels. This novel regulatory circuit involving

  4. First insight into the viral community of the cnidarian model metaorganism Aiptasia using RNA-Seq data

    KAUST Repository

    Brüwer, Jan D.; Voolstra, Christian R.

    2018-01-01

    of the globally threatened coral reef ecosystems. To gain first insight into viruses associated with the coral model system Aiptasia (sensu Exaiptasia pallida), we analyzed an existing RNA-Seq dataset of aposymbiotic, partially populated, and fully symbiotic

  5. RNA synthetic biology inspired from bacteria: construction of transcription attenuators under antisense regulation

    International Nuclear Information System (INIS)

    Dawid, Alexandre; Cayrol, Bastien; Isambert, Hervé

    2009-01-01

    Among all biopolymers, ribonucleic acids or RNA have unique functional versatility, which led to the early suggestion that RNA alone (or a closely related biopolymer) might have once sustained a primitive form of life based on a single type of biopolymer. This has been supported by the demonstration of processive RNA-based replication and the discovery of 'riboswitches' or RNA switches, which directly sense their metabolic environment. In this paper, we further explore the plausibility of this 'RNA world' scenario and show, through synthetic molecular design guided by advanced RNA simulations, that RNA can also perform elementary regulation tasks on its own. We demonstrate that RNA synthetic regulatory modules directly inspired from bacterial transcription attenuators can efficiently activate or repress the expression of other RNA by merely controlling their folding paths 'on the fly' during transcription through simple RNA–RNA antisense interaction. Factors, such as NTP concentration and RNA synthesis rate, affecting the efficiency of this kinetic regulation mechanism are also studied and discussed in the light of evolutionary constraints. Overall, this suggests that direct coupling among synthesis, folding and regulation of RNAs may have enabled the early emergence of autonomous RNA-based regulation networks in absence of both DNA and protein partners

  6. Quantitative profiling of selective Sox/POU pairing on hundreds of sequences in parallel by Coop-seq.

    Science.gov (United States)

    Chang, Yiming K; Srivastava, Yogesh; Hu, Caizhen; Joyce, Adam; Yang, Xiaoxiao; Zuo, Zheng; Havranek, James J; Stormo, Gary D; Jauch, Ralf

    2017-01-25

    Cooperative binding of transcription factors is known to be important in the regulation of gene expression programs conferring cellular identities. However, current methods to measure cooperativity parameters have been laborious and therefore limited to studying only a few sequence variants at a time. We developed Coop-seq (cooperativity by sequencing) that is capable of efficiently and accurately determining the cooperativity parameters for hundreds of different DNA sequences in a single experiment. We apply Coop-seq to 12 dimer pairs from the Sox and POU families of transcription factors using 324 unique sequences with changed half-site orientation, altered spacing and discrete randomization within the binding elements. The study reveals specific dimerization profiles of different Sox factors with Oct4. By contrast, Oct4 and the three neural class III POU factors Brn2, Brn4 and Oct6 assemble with Sox2 in a surprisingly indistinguishable manner. Two novel half-site configurations can support functional Sox/Oct dimerization in addition to known composite motifs. Moreover, Coop-seq uncovers a nucleotide switch within the POU half-site when spacing is altered, which is mirrored in genomic loci bound by Sox2/Oct4 complexes. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.

  7. Improvement of genome assembly completeness and identification of novel full-length protein-coding genes by RNA-seq in the giant panda genome.

    Science.gov (United States)

    Chen, Meili; Hu, Yibo; Liu, Jingxing; Wu, Qi; Zhang, Chenglin; Yu, Jun; Xiao, Jingfa; Wei, Fuwen; Wu, Jiayan

    2015-12-11

    High-quality and complete gene models are the basis of whole genome analyses. The giant panda (Ailuropoda melanoleuca) genome was the first genome sequenced on the basis of solely short reads, but the genome annotation had lacked the support of transcriptomic evidence. In this study, we applied RNA-seq to globally improve the genome assembly completeness and to detect novel expressed transcripts in 12 tissues from giant pandas, by using a transcriptome reconstruction strategy that combined reference-based and de novo methods. Several aspects of genome assembly completeness in the transcribed regions were effectively improved by the de novo assembled transcripts, including genome scaffolding, the detection of small-size assembly errors, the extension of scaffold/contig boundaries, and gap closure. Through expression and homology validation, we detected three groups of novel full-length protein-coding genes. A total of 12.62% of the novel protein-coding genes were validated by proteomic data. GO annotation analysis showed that some of the novel protein-coding genes were involved in pigmentation, anatomical structure formation and reproduction, which might be related to the development and evolution of the black-white pelage, pseudo-thumb and delayed embryonic implantation of giant pandas. The updated genome annotation will help further giant panda studies from both structural and functional perspectives.

  8. RNA-seq analyses of blood-induced changes in gene expression in the mosquito vector species, Aedes aegypti

    Directory of Open Access Journals (Sweden)

    Olson Ken E

    2011-01-01

    Background: Hematophagy is a common trait of insect vectors of disease. Extensive genome-wide transcriptional changes occur in mosquitoes after blood meals, and these are related to digestive and reproductive processes, among others. Studies of these changes are expected to reveal molecular targets for novel vector control and pathogen transmission-blocking strategies. The mosquito Aedes aegypti (Diptera, Culicidae), a vector of Dengue viruses, Yellow Fever Virus (YFV) and Chikungunya virus (CV), is the subject of this study to look at genome-wide changes in gene expression following a blood meal. Results: Transcriptional changes that follow a blood meal in Ae. aegypti females were explored using RNA-seq technology. Over 30% of more than 18,000 investigated transcripts accumulate differentially in mosquitoes at five hours after a blood meal when compared to those fed only on sugar. Forty transcripts accumulate only in blood-fed mosquitoes. The list of regulated transcripts correlates with an enhancement of digestive activity and a suppression of environmental stimuli perception and innate immunity. The alignment of more than 65 million high-quality short reads to the Ae. aegypti reference genome permitted the refinement of the current annotation of transcript boundaries, as well as the discovery of novel transcripts, exons and splicing variants. Cis-regulatory elements (CRE) and cis-regulatory modules (CRM) enriched significantly at the 5'end flanking sequences of blood meal-regulated genes were identified. Conclusions: This study provides the first global view of the changes in transcript accumulation elicited by a blood meal in Ae. aegypti females. This information permitted the identification of classes of potentially co-regulated genes and a description of biochemical and physiological events that occur immediately after blood feeding. The data presented here serve as a basis for novel vector control and pathogen transmission

  9. From reads to genes to pathways: differential expression analysis of RNA-Seq experiments using Rsubread and the edgeR quasi-likelihood pipeline [version 2; referees: 5 approved]

    Directory of Open Access Journals (Sweden)

    Yunshun Chen

    2016-08-01

    In recent years, RNA sequencing (RNA-seq) has become a very widely used technology for profiling gene expression. One of the most common aims of RNA-seq profiling is to identify genes or molecular pathways that are differentially expressed (DE) between two or more biological conditions. This article demonstrates a computational workflow for the detection of DE genes and pathways from RNA-seq data by providing a complete analysis of an RNA-seq experiment profiling epithelial cell subsets in the mouse mammary gland. The workflow uses R software packages from the open-source Bioconductor project and covers all steps of the analysis pipeline, including alignment of read sequences, data exploration, differential expression analysis, visualization and pathway analysis. Read alignment and count quantification are conducted using the Rsubread package and the statistical analyses are performed using the edgeR package. The differential expression analysis uses the quasi-likelihood functionality of edgeR.
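
    As a rough illustration of the library-size scaling that typically precedes a differential expression test, the sketch below computes counts per million (CPM) and a log2 fold change in plain Python. It is not the edgeR quasi-likelihood method and does not use Rsubread or edgeR; the gene names, counts, sample labels and the prior count of 0.5 are hypothetical values chosen only for the example.

```python
import math


def counts_per_million(counts, library_size):
    """Scale raw per-gene read counts to counts per million mapped reads."""
    return {gene: c * 1e6 / library_size for gene, c in counts.items()}


def log2_fold_change(cpm_a, cpm_b, prior=0.5):
    """Per-gene log2(B / A); a small prior avoids taking the log of zero."""
    return {
        gene: math.log2((cpm_b[gene] + prior) / (cpm_a[gene] + prior))
        for gene in cpm_a
    }


# Hypothetical read counts for two libraries (not data from the article).
subset_a = {"GeneA": 100, "GeneB": 900}
subset_b = {"GeneA": 400, "GeneB": 1600}

cpm_a = counts_per_million(subset_a, sum(subset_a.values()))
cpm_b = counts_per_million(subset_b, sum(subset_b.values()))
lfc = log2_fold_change(cpm_a, cpm_b)
```

    In this toy input GeneA doubles its share of the library while GeneB's share shrinks, so the two genes get log2 fold changes of opposite sign even though both raw counts went up. A real analysis would add dispersion estimation and a statistical test on replicated samples.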

  10. Comprehensive Identification and Spatial Mapping of Habenular Neuronal Types Using Single-Cell RNA-Seq.

    Science.gov (United States)

    Pandey, Shristi; Shekhar, Karthik; Regev, Aviv; Schier, Alexander F

    2018-04-02

    The identification of cell types and marker genes is critical for dissecting neural development and function, but the size and complexity of the brain has hindered the comprehensive discovery of cell types. We combined single-cell RNA-seq (scRNA-seq) with anatomical brain registration to create a comprehensive map of the zebrafish habenula, a conserved forebrain hub involved in pain processing and learning. Single-cell transcriptomes of ∼13,000 habenular cells with 4× cellular coverage identified 18 neuronal types and dozens of marker genes. Registration of marker genes onto a reference atlas created a resource for anatomical and functional studies and enabled the mapping of active neurons onto neuronal types following aversive stimuli. Strikingly, despite brain growth and functional maturation, cell types were retained between the larval and adult habenula. This study provides a gene expression atlas to dissect habenular development and function and offers a general framework for the comprehensive characterization of other brain regions. Copyright © 2018 Elsevier Ltd. All rights reserved.

  11. mRNA Transcript Diversity Creates New Opportunities for Pharmacological Intervention

    OpenAIRE

    Barrie, Elizabeth S.; Smith, Ryan M.; Sanford, Jonathan C.; Sadee, Wolfgang

    2012-01-01

    Most protein coding genes generate multiple RNA transcripts through alternative splicing, variable 3′ and 5′UTRs, and RNA editing. Although drug design typically targets the main transcript, alternative transcripts can have profound physiological effects, encoding proteins with distinct functions or regulatory properties. Formation of these alternative transcripts is tissue-selective and context-dependent, creating opportunities for more effective and targeted therapies with reduced adverse e...

  12. Customized workflow development and data modularization concepts for RNA-Sequencing and metatranscriptome experiments.

    Science.gov (United States)

    Lott, Steffen C; Wolfien, Markus; Riege, Konstantin; Bagnacani, Andrea; Wolkenhauer, Olaf; Hoffmann, Steve; Hess, Wolfgang R

    2017-11-10

    RNA-Sequencing (RNA-Seq) has become a widely used approach to study quantitative and qualitative aspects of transcriptome data. The variety of RNA-Seq protocols, experimental study designs and the characteristic properties of the organisms under investigation greatly affect downstream and comparative analyses. In this review, we aim to explain the impact of structured pre-selection, classification and integration of best-performing tools within modularized data analysis workflows and ready-to-use computing infrastructures towards experimental data analyses. We highlight examples for workflows and use cases that are presented for pro-, eukaryotic and mixed dual RNA-Seq (meta-transcriptomics) experiments. In addition, we are summarizing the expertise of the laboratories participating in the project consortium "Structured Analysis and Integration of RNA-Seq experiments" (de.STAIR) and its integration with the Galaxy-workbench of the RNA Bioinformatics Center (RBC). Copyright © 2017 The Authors. Published by Elsevier B.V. All rights reserved.

  13. RNA sequencing analysis of transcriptional change in the freshwater mussel Elliptio complanata after environmentally relevant sodium chloride exposure.

    Science.gov (United States)

    Robertson, Laura S; Galbraith, Heather S; Iwanowicz, Deborah; Blakeslee, Carrie J; Cornman, R Scott

    2017-09-01

    To identify potential biomarkers of salt stress in a freshwater sentinel species, we examined transcriptional responses of the common mussel Elliptio complanata to controlled sodium chloride (NaCl) exposures. Ribonucleic acid sequencing (RNA-Seq) of mantle tissue identified 481 transcripts differentially expressed in adult mussels exposed to 2 ppt NaCl (1.2 ppt chloride) for 7 d, of which 290 had nonoverlapping intervals. Differentially expressed gene categories included ion and transmembrane transport, oxidoreductase activity, maintenance of protein folding, and amino acid metabolism. The rate-limiting enzyme for synthesis of taurine, an amino acid frequently linked to osmotic stress in aquatic species, was upregulated, as was the transmembrane ion pump sodium/potassium adenosine 5'-triphosphatase. These patterns confirm a primary transcriptional response to the experimental dose, albeit likely overlapping with nonspecific secondary stress responses. Substantial involvement of the heat shock protein 70 chaperone family and the water-transporting aquaporin family was not detected, however, in contrast to some studies in other bivalves. A subset of the most significantly regulated genes was confirmed by quantitative polymerase chain reaction in an independent sample. Cluster analysis showed separation of mussels exposed to 2 ppt NaCl from control mussels in multivariate space, but mussels exposed to 1 ppt NaCl were largely indistinguishable from controls. Transcriptome-scale analysis of salt exposure under laboratory conditions efficiently identified candidate biomarkers for further functional analysis and field validation. Environ Toxicol Chem 2017;36:2352-2366. © Published 2017 Wiley Periodicals Inc. on behalf of SETAC. This article is a US government work and, as such, is in the public domain in the United States of America. © 2017 SETAC.

  14. Integration analysis of microRNA and mRNA paired expression profiling identifies deregulated microRNA-transcription factor-gene regulatory networks in ovarian endometriosis.

    Science.gov (United States)

    Zhao, Luyang; Gu, Chenglei; Ye, Mingxia; Zhang, Zhe; Li, Li'an; Fan, Wensheng; Meng, Yuanguang

    2018-01-22

    The etiology and pathophysiology of endometriosis remain unclear. Accumulating evidence suggests that aberrant microRNA (miRNA) and transcription factor (TF) expression may be involved in the pathogenesis and development of endometriosis. This study therefore aims to survey the key miRNAs, TFs and genes and further understand the mechanism of endometriosis. Paired expression profiling of miRNA and mRNA in ectopic endometria compared with eutopic endometria were determined by high-throughput sequencing techniques in eight patients with ovarian endometriosis. Binary interactions and circuits among the miRNAs, TFs, and corresponding genes were identified by the Pearson correlation coefficients. miRNA-TF-gene regulatory networks were constructed using bioinformatic methods. Eleven selected miRNAs and TFs were validated by quantitative reverse transcription-polymerase chain reaction in 22 patients. Overall, 107 differentially expressed miRNAs and 6112 differentially expressed mRNAs were identified by comparing the sequencing of the ectopic endometrium group and the eutopic endometrium group. The miRNA-TF-gene regulatory network consists of 22 miRNAs, 12 TFs and 430 corresponding genes. Specifically, some key regulators from the miR-449 and miR-34b/c cluster, miR-200 family, miR-106a-363 cluster, miR-182/183, FOX family, GATA family, and E2F family as well as CEBPA, SOX9 and HNF4A were suggested to play vital regulatory roles in the pathogenesis of endometriosis. Integration analysis of the miRNA and mRNA expression profiles presents a unique insight into the regulatory network of this enigmatic disorder and possibly provides clues regarding replacement therapy for endometriosis.
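
    The pairwise step described above, scoring miRNA-gene interactions by Pearson correlation across paired samples, can be sketched as follows. This is a minimal plain-Python illustration under stated assumptions: the expression values are invented, and the function name pearson_r is a hypothetical helper, not part of the authors' pipeline.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical paired expression values across four samples:
# a miRNA and a putative target mRNA measured in the same tissues.
mirna_expr = [1.0, 2.0, 3.0, 4.0]
target_expr = [8.0, 6.0, 4.0, 2.0]

r = pearson_r(mirna_expr, target_expr)
```

    A strongly negative coefficient, as in this contrived input, is the pattern usually expected for a repressive miRNA-target pair; with only a handful of samples such coefficients are noisy and would need a significance cutoff.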

  15. FACT facilitates chromatin transcription by RNA polymerases I and III

    DEFF Research Database (Denmark)

    Birch, Joanna L; Tan, Bertrand C-M; Panov, Kostya I

    2009-01-01

    Efficient transcription elongation from a chromatin template requires RNA polymerases (Pols) to negotiate nucleosomes. Our biochemical analyses demonstrate that RNA Pol I can transcribe through nucleosome templates and that this requires structural rearrangement of the nucleosomal core particle....... The subunits of the histone chaperone FACT (facilitates chromatin transcription), SSRP1 and Spt16, co-purify and co-immunoprecipitate with mammalian Pol I complexes. In cells, SSRP1 is detectable at the rRNA gene repeats. Crucially, siRNA-mediated repression of FACT subunit expression in cells results...... in a significant reduction in 47S pre-rRNA levels, whereas synthesis of the first 40 nt of the rRNA is not affected, implying that FACT is important for Pol I transcription elongation through chromatin. FACT also associates with RNA Pol III complexes, is present at the chromatin of genes transcribed by Pol III...

  16. Improved annotation of 3' untranslated regions and complex loci by combination of strand-specific direct RNA sequencing, RNA-Seq and ESTs.

    Directory of Open Access Journals (Sweden)

    Nicholas J Schurch

    The reference annotations made for a genome sequence provide the framework for all subsequent analyses of the genome. Correct and complete annotation in addition to the underlying genomic sequence is particularly important when interpreting the results of RNA-seq experiments where short sequence reads are mapped against the genome and assigned to genes according to the annotation. Inconsistencies in annotations between the reference and the experimental system can lead to incorrect interpretation of the effect on RNA expression of an experimental treatment or mutation in the system under study. Until recently, the genome-wide annotation of 3' untranslated regions received less attention than coding regions and the delineation of intron/exon boundaries. In this paper, data produced for samples in Human, Chicken and A. thaliana by the novel single-molecule, strand-specific, Direct RNA Sequencing technology from Helicos Biosciences which locates 3' polyadenylation sites to within +/- 2 nt, were combined with archival EST and RNA-Seq data. Nine examples are illustrated where this combination of data allowed: (1) gene and 3' UTR re-annotation (including extension of one 3' UTR by 5.9 kb); (2) disentangling of gene expression in complex regions; (3) clearer interpretation of small RNA expression and (4) identification of novel genes. While the specific examples displayed here may become obsolete as genome sequences and their annotations are refined, the principles laid out in this paper will be of general use both to those annotating genomes and those seeking to interpret existing publicly available annotations in the context of their own experimental data.

  17. Matrin 3 binds and stabilizes mRNA.

    Directory of Open Access Journals (Sweden)

    Maayan Salton

    Matrin 3 (MATR3) is a highly conserved, inner nuclear matrix protein with two zinc finger domains and two RNA recognition motifs (RRM), whose function is largely unknown. Recently we found MATR3 to be phosphorylated by the protein kinase ATM, which activates the cellular response to double strand breaks in the DNA. Here, we show that MATR3 interacts in an RNA-dependent manner with several proteins with established roles in RNA processing, and maintains its interaction with RNA via its RRM2 domain. Deep sequencing of the bound RNA (RIP-seq) identified several small noncoding RNA species. Using microarray analysis to explore MATR3's role in transcription, we identified 77 transcripts whose amounts depended on the presence of MATR3. We validated this finding with nine transcripts which were also bound to the MATR3 complex. Finally, we demonstrated the importance of MATR3 for maintaining the stability of several of these mRNA species and conclude that it has a role in mRNA stabilization. The data suggest that the cellular level of MATR3, known to be highly regulated, modulates the stability of a group of gene transcripts.

  18. RNA-Seq Reveals Infection-Related Gene Expression Changes in Phytophthora capsici

    Science.gov (United States)

    Chen, Xiao-Ren; Xing, Yu-Ping; Li, Yan-Peng; Tong, Yun-Hui; Xu, Jing-You

    2013-01-01

    Phytophthora capsici is a soilborne plant pathogen capable of infecting a wide range of plants, including many solanaceous crops. However, genetic resistance and fungicides often fail to manage P. capsici due to limited knowledge on the molecular biology and basis of P. capsici pathogenicity. To begin to rectify this situation, Illumina RNA-Seq was used to perform massively parallel sequencing of three cDNA samples derived from P. capsici mycelia (MY), zoospores (ZO) and germinating cysts with germ tubes (GC). Over 11 million reads were generated for each cDNA library analyzed. After read mapping to the gene models of the P. capsici reference genome, 13,901, 14,633 and 14,695 putative genes were identified from the reads of the MY, ZO and GC libraries, respectively. Comparative analysis between pairs of samples showed major differences between the expressed gene content of the MY, ZO and GC stages. A large number of genes associated with specific stages and pathogenicity were identified, including 98 predicted effector genes. The transcriptional levels of 19 effector genes during the developmental and host infection stages of P. capsici were validated by RT-PCR. Ectopic expression in Nicotiana benthamiana showed that P. capsici RXLR and Crinkler effectors can suppress host cell death triggered by diverse elicitors including P. capsici elicitin and NLP effectors. This study provides a first look at the transcriptome and effector arsenal of P. capsici during the important pre-infection stages. PMID:24019970

  19. Isolation of Blastomyces dermatitidis yeast from lung tissue during murine infection for in vivo transcriptional profiling.

    Science.gov (United States)

    Marty, Amber J; Wüthrich, Marcel; Carmen, John C; Sullivan, Thomas D; Klein, Bruce S; Cuomo, Christina A; Gauthier, Gregory M

    2013-07-01

    Blastomyces dermatitidis belongs to a group of thermally dimorphic fungi that grow as sporulating mold in the soil and convert to pathogenic yeast in the lung following inhalation of spores. Knowledge about the molecular events important for fungal adaptation and survival in the host remains limited. The development of high-throughput analytic tools such as RNA sequencing (RNA-Seq) has potential to provide novel insight on fungal pathogenesis especially if applied in vivo during infection. However, in vivo transcriptional profiling is hindered by the low abundance of fungal cells relative to mammalian tissue and difficulty in isolating fungal cells from the tissues they infect. For the purpose of obtaining B. dermatitidis RNA for in vivo transcriptional analysis by RNA-Seq, we developed a simple technique for isolating yeast from murine lung tissue. Using a two-step approach of filtration and centrifugation following lysis of murine lung cells, 91% of yeast cells causing infection were isolated from lung tissue. B. dermatitidis recovered from the lung yielded high-quality RNA with minimal murine contamination and was suitable for RNA-Seq. Approximately 87% of the sequencing reads obtained from the recovered yeast aligned with the B. dermatitidis genome. This was similar to 93% alignment for yeast grown in vitro. The use of near-freezing temperature along with short ex vivo time minimized transcriptional changes that would have otherwise occurred with higher temperature or longer processing time. In conclusion, we have developed a technique that recovers the majority of yeast causing pulmonary infection and yields high-quality fungal RNA with minimal contamination by mammalian RNA. Copyright © 2013 Elsevier Inc. All rights reserved.

  20. RNA polymerase II collision interrupts convergent transcription

    DEFF Research Database (Denmark)

    Hobson, David J; Wei, Wu; Steinmetz, Lars M

    2012-01-01

    Antisense noncoding transcripts, genes-within-genes, and convergent gene pairs are prevalent among eukaryotes. The existence of such transcription units raises the question of what happens when RNA polymerase II (RNAPII) molecules collide head-to-head. Here we use a combination of biochemical...

  1. Transcription and translation of the rpsJ, rplN and rRNA operons of the tubercle bacillus.

    Science.gov (United States)

    Cortes, Teresa; Cox, Robert Ashley

    2015-04-01

    Several species of the genus Mycobacterium are human pathogens, notably the tubercle bacillus (Mycobacterium tuberculosis). The rate of proliferation of a bacterium is reflected in the rate of ribosome synthesis. This report describes a quantitative analysis of the early stages of the synthesis of ribosomes of M. tuberculosis. Specifically, the roles of three large operons, namely: the rrn operon (1.7 microns) encoding rrs (16S rRNA), rrl (23S rRNA) and rrf (5S rRNA); the rpsJ operon (1.93 microns), which encodes 11 ribosomal proteins; and the rplN operon (1.45 microns), which encodes 10 ribosomal proteins. A mathematical framework based on properties of population-average cells was developed to identify the number of transcripts of the rpsJ and rplN operons needed to maintain exponential growth. The values obtained were supported by RNaseq data. The motif 5'-gcagac-3' was found close to 5' end of transcripts of mycobacterial rplN operons, suggesting it may form part of the RpsH feedback binding site because the same motif is present in the ribosome within the region of rrs that forms the binding site for RpsH. Medical Research Council.

  2. An RNA-Seq transcriptome analysis of orthophosphate-deficient white lupin reveals novel insights into phosphorus acclimation in plants.

    Science.gov (United States)

    O'Rourke, Jamie A; Yang, S Samuel; Miller, Susan S; Bucciarelli, Bruna; Liu, Junqi; Rydeen, Ariel; Bozsoki, Zoltan; Uhde-Stone, Claudia; Tu, Zheng Jin; Allan, Deborah; Gronwald, John W; Vance, Carroll P

    2013-02-01

    Phosphorus, in its orthophosphate form (P(i)), is one of the most limiting macronutrients in soils for plant growth and development. However, the whole-genome molecular mechanisms contributing to plant acclimation to P(i) deficiency remain largely unknown. White lupin (Lupinus albus) has evolved unique adaptations for growth in P(i)-deficient soils, including the development of cluster roots to increase root surface area. In this study, we utilized RNA-Seq technology to assess global gene expression in white lupin cluster roots, normal roots, and leaves in response to P(i) supply. We de novo assembled 277,224,180 Illumina reads from 12 complementary DNA libraries to build what is to our knowledge the first white lupin gene index (LAGI 1.0). This index contains 125,821 unique sequences with an average length of 1,155 bp. Of these sequences, 50,734 were transcriptionally active (reads per kilobase per million reads ≥ 3), representing approximately 7.8% of the white lupin genome, using the predicted genome size of Lupinus angustifolius as a reference. We identified a total of 2,128 sequences differentially expressed in response to P(i) deficiency with a 2-fold or greater change and P ≤ 0.05. Twelve sequences were consistently differentially expressed due to P(i) deficiency stress in three species, Arabidopsis (Arabidopsis thaliana), potato (Solanum tuberosum), and white lupin, making them ideal candidates to monitor the P(i) status of plants. Additionally, classic physiological experiments were coupled with RNA-Seq data to examine the role of cytokinin and gibberellic acid in P(i) deficiency-induced cluster root development. This global gene expression analysis provides new insights into the biochemical and molecular mechanisms involved in the acclimation to P(i) deficiency.
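    The activity threshold quoted in the abstract above (reads per kilobase per million reads ≥ 3) can be illustrated with a minimal sketch. The function names and the example counts are hypothetical and are not taken from the study; only the standard RPKM definition and the threshold of 3 are assumed.

```python
def rpkm(read_count, transcript_length_bp, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads.

    RPKM = reads / (length in kb * total mapped reads in millions)
         = reads * 1e9 / (length in bp * total mapped reads)
    """
    if transcript_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("length and library size must be positive")
    return read_count * 1e9 / (transcript_length_bp * total_mapped_reads)


def is_transcriptionally_active(read_count, transcript_length_bp,
                                total_mapped_reads, threshold=3.0):
    """Apply an RPKM >= threshold cut-off (threshold of 3 as in the abstract)."""
    return rpkm(read_count, transcript_length_bp, total_mapped_reads) >= threshold


if __name__ == "__main__":
    # Hypothetical sequence: 40 reads, 1,155 bp long, library of 10 million reads.
    value = rpkm(40, 1155, 10_000_000)
    print(f"RPKM = {value:.2f}, active = {value >= 3.0}")
```

    Because RPKM divides by both sequence length and library size, the same raw read count can fall above or below the cut-off depending on those two quantities.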

  3. Development of mRNA-based body fluid identification using reverse transcription loop-mediated isothermal amplification.

    Science.gov (United States)

    Satoh, Tetsuya; Kouroki, Seiya; Ogawa, Keita; Tanaka, Yorika; Matsumura, Kazutoshi; Iwase, Susumu

    2018-04-25

    Identifying body fluids from forensic samples can provide valuable evidence for criminal investigations. Messenger RNA (mRNA)-based body fluid identification was recently developed, and highly sensitive parallel identification using reverse transcription polymerase chain reaction (RT-PCR) has been described. In this study, we developed reverse transcription loop-mediated isothermal amplification (RT-LAMP) as a simple, rapid assay for identifying three common forensic body fluids, namely blood, semen, and saliva, and evaluated its specificity and sensitivity. Hemoglobin beta (HBB), transglutaminase 4 (TGM4), and statherin (STATH) were selected as marker genes for blood, semen, and saliva, respectively. RT-LAMP could be performed in a single step including both reverse transcription and DNA amplification under an isothermal condition within 60 min, and detection could be conveniently performed via visual fluorescence. Marker-specific amplification was performed in each assay, and no cross-reaction was observed among five representative forensically relevant body fluids. The detection limits of the assays were 0.3 nL, 30 nL, and 0.3 μL for blood, semen, and saliva, respectively, and their sensitivities were comparable with those of RT-PCR. Furthermore, RT-LAMP assays were applicable to forensic casework samples. We conclude that RT-LAMP is useful for body fluid identification.

  4. RNA-Seq Analysis Reveals MAPKKK Family Members Related to Drought Tolerance in Maize

    Science.gov (United States)

    Ren, Wen; Yang, Fengling; He, Hang; Zhao, Jiuran

    2015-01-01

    The mitogen-activated protein kinase (MAPK) cascade is an evolutionarily conserved signal transduction pathway that is involved in plant development and stress responses. As the first component of this phosphorelay cascade, mitogen-activated protein kinase kinase kinases (MAPKKKs) act as adaptors linking upstream signaling steps to the core MAPK cascade to promote the appropriate cellular responses; however, the functions of MAPKKKs in maize are unclear. Here, we identified 71 MAPKKK genes, of which 14 were novel, based on a computational analysis of the maize (Zea mays L.) genome. Using an RNA-seq analysis in the leaf, stem and root of maize under well-watered and drought-stress conditions, we identified 5,866 differentially expressed genes (DEGs), including 8 MAPKKK genes responsive to drought stress. Many of the DEGs were enriched in processes such as drought stress, abiotic stimulus, oxidation-reduction, and metabolic processes. Conversely, DEGs involved in processes such as oxidation, photosynthesis, and starch, proline, ethylene, and salicylic acid metabolism were clearly co-expressed with the MAPKKK genes. Furthermore, a quantitative real-time PCR (qRT-PCR) analysis was performed to assess the relative expression levels of MAPKKKs. Correlation analysis revealed a significant correlation between the expression levels of two MAPKKKs and relative biomass responsive to drought in 8 inbred lines. Our results indicate that MAPKKKs may have important regulatory functions in drought tolerance in maize. PMID:26599013

  5. COBRA-Seq: Sensitive and Quantitative Methylome Profiling

    Directory of Open Access Journals (Sweden)

    Hilal Varinli

    2015-10-01

    Combined Bisulfite Restriction Analysis (COBRA) quantifies DNA methylation at a specific locus. It does so via digestion of PCR amplicons produced from bisulfite-treated DNA, using a restriction enzyme that contains a cytosine within its recognition sequence, such as TaqI. Here, we introduce COBRA-seq, a genome-wide reduced methylome method that requires minimal DNA input (0.1–1.0 mg) and can use either PCR or linear amplification to amplify the sequencing library. Variants of COBRA-seq can be used to explore CpG-depleted as well as CpG-rich regions in vertebrate DNA. The choice of enzyme influences enrichment for specific genomic features, such as CpG-rich promoters and CpG islands, or enrichment for less CpG-dense regions such as enhancers. COBRA-seq coupled with linear amplification has the additional advantage of reduced PCR bias by producing full-length fragments at high abundance. Unlike other reduced representation methylome methods, COBRA-seq has great flexibility in the choice of enzyme and can be multiplexed and tuned to reduce sequencing costs and to interrogate different numbers of sites. Moreover, COBRA-seq is applicable to non-model organisms without a reference genome and is compatible with the investigation of non-CpG methylation by using restriction enzymes containing CpA, CpT, and CpC in their recognition site.

  6. iSmaRT: a toolkit for a comprehensive analysis of small RNA-Seq data.

    Science.gov (United States)

    Panero, Riccardo; Rinaldi, Antonio; Memoli, Domenico; Nassa, Giovanni; Ravo, Maria; Rizzo, Francesca; Tarallo, Roberta; Milanesi, Luciano; Weisz, Alessandro; Giurato, Giorgio

    2017-03-15

    Interest in investigating the biological roles of small non-coding RNAs (sncRNAs) is increasing, due to the pleiotropic effects these molecules exert in many biological contexts. While several methods and tools are available to study microRNAs (miRNAs), only a few focus on novel classes of sncRNAs, in particular PIWI-interacting RNAs (piRNAs). To overcome these limitations, we implemented iSmaRT (integrative Small RNA Tool-kit), an automated pipeline to analyze smallRNA-Seq data. iSmaRT is a collection of bioinformatics tools and custom algorithms, interconnected through a Graphical User Interface (GUI). In addition to performing comprehensive analyses on miRNAs, it implements specific computational modules to analyze piRNAs, predicting novel ones and identifying their RNA targets. A smallRNA-Seq dataset generated from brain samples of Huntington's Disease patients was used here to illustrate iSmaRT's performance, demonstrating how the pipeline can provide, in a rapid and user-friendly way, a comprehensive analysis of different classes of sncRNAs. iSmaRT is freely available on the web at ftp://labmedmolge-1.unisa.it (User: iSmart - Password: password). Contact: aweisz@unisa.it or ggiurato@unisa.it. Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com

  7. Identification of Mediator Kinase Substrates in Human Cells using Cortistatin A and Quantitative Phosphoproteomics.

    Science.gov (United States)

    Poss, Zachary C; Ebmeier, Christopher C; Odell, Aaron T; Tangpeerachaikul, Anupong; Lee, Thomas; Pelish, Henry E; Shair, Matthew D; Dowell, Robin D; Old, William M; Taatjes, Dylan J

    2016-04-12

    Cortistatin A (CA) is a highly selective inhibitor of the Mediator kinases CDK8 and CDK19. Using CA, we now report a large-scale identification of Mediator kinase substrates in human cells (HCT116). We identified over 16,000 quantified phosphosites including 78 high-confidence Mediator kinase targets within 64 proteins, including DNA-binding transcription factors and proteins associated with chromatin, DNA repair, and RNA polymerase II. Although RNA-seq data correlated with Mediator kinase targets, the effects of CA on gene expression were limited and distinct from CDK8 or CDK19 knockdown. Quantitative proteome analyses, tracking around 7,000 proteins across six time points (0-24 hr), revealed that CA selectively affected pathways implicated in inflammation, growth, and metabolic regulation. Contrary to expectations, increased turnover of Mediator kinase targets was not generally observed. Collectively, these data support Mediator kinases as regulators of chromatin and RNA polymerase II activity and suggest their roles extend beyond transcription to metabolism and DNA repair. Copyright © 2016 The Authors. Published by Elsevier Inc. All rights reserved.

  8. MIDAS: Mining differentially activated subpaths of KEGG pathways from multi-class RNA-seq data.

    Science.gov (United States)

    Lee, Sangseon; Park, Youngjune; Kim, Sun

    2017-07-15

    Pathway-based analysis of high-throughput transcriptome data is a widely used approach to investigate biological mechanisms. Since a pathway consists of multiple functions, the recent approach is to determine condition-specific sub-pathways, or subpaths. However, there are several challenges. First, few existing methods utilize explicit gene expression information from RNA-seq. More importantly, subpath activity is usually an average of statistical scores, e.g., correlations, of edges in a candidate subpath, which fails to reflect gene expression quantity information. In addition, none of the existing methods can handle multiple phenotypes. To address these technical problems, we designed and implemented an algorithm, MIDAS, that determines condition-specific subpaths, each of which has different activities across multiple phenotypes. MIDAS fully utilizes gene expression quantity information and network centrality information to determine condition-specific subpaths. To test the performance of our tool, we used TCGA breast cancer RNA-seq gene expression profiles with five molecular subtypes. In total, 36 differentially activated subpaths were determined. The utility of our method, MIDAS, was demonstrated in four ways. All 36 subpaths are well supported by the literature. Subsequently, we showed that these subpaths had good discriminant power for classification of the five cancer subtypes and also had prognostic power in terms of survival analysis. Finally, in a performance comparison of MIDAS with a recent subpath prediction method, PATHOME, our method identified more subpaths and many more genes that are well supported by the literature. http://biohealth.snu.ac.kr/software/MIDAS/. Copyright © 2017 The Authors. Published by Elsevier Inc. All rights reserved.

  9. A powerful method for transcriptional profiling of specific cell types in eukaryotes: laser-assisted microdissection and RNA sequencing.

    Directory of Open Access Journals (Sweden)

    Marc W Schmid

    The acquisition of distinct cell fates is central to the development of multicellular organisms and is largely mediated by gene expression patterns specific to individual cells and tissues. A spatially and temporally resolved analysis of gene expression facilitates the elucidation of transcriptional networks linked to cellular identity and function. We present an approach that allows cell type-specific transcriptional profiling of distinct target cells, which are rare and difficult to access, with unprecedented sensitivity and resolution. We combined laser-assisted microdissection (LAM), linear amplification starting from <1 ng of total RNA, and RNA-sequencing (RNA-Seq). As a model we used the central cell of the Arabidopsis thaliana female gametophyte, one of the female gametes harbored in the reproductive organs of the flower. We estimated the number of expressed genes to be more than twice the number reported previously in a study using LAM and ATH1 microarrays, and identified several classes of genes that were systematically underrepresented in the transcriptome measured with the ATH1 microarray. Among them are many genes that are likely to be important for developmental processes and specific cellular functions. In addition, we identified several intergenic regions, which are likely to be transcribed, and describe a considerable fraction of reads mapping to introns and regions flanking annotated loci, which may represent alternative transcript isoforms. Finally, we performed a de novo assembly of the transcriptome and show that the method is suitable for studying individual cell types of organisms lacking reference sequence information, demonstrating that this approach can be applied to most eukaryotic organisms.

  10. Predicting stimulation-dependent enhancer-promoter interactions from ChIP-Seq time course data

    Directory of Open Access Journals (Sweden)

    Tomasz Dzida

    2017-09-01

    We have developed a machine learning approach to predict stimulation-dependent enhancer-promoter interactions using evidence from changes in genomic protein occupancy over time. The occupancy of estrogen receptor alpha (ERα), RNA polymerase II (Pol II) and the histone marks H2AZ and H3K4me3 was measured over time using ChIP-Seq experiments in MCF7 cells stimulated with estrogen. A Bayesian classifier was developed which uses the correlation of temporal binding patterns at enhancers and promoters, together with genomic proximity, as features to predict interactions. This method was trained using experimentally determined interactions from the same system and was shown to achieve much higher precision than predictions based on the genomic proximity of the nearest ERα binding. We use the method to identify a confident genome-wide set of ERα target genes and their regulatory enhancers. Validation with publicly available GRO-Seq data demonstrates that our predicted targets are much more likely to show early nascent transcription than predictions based on genomic ERα binding proximity alone.
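    The two feature types named in the abstract above (correlation of temporal binding patterns and genomic proximity) can be sketched as follows. This is only an illustration under stated assumptions: the choice of Pearson correlation, the function names, and the example signals and coordinates are hypothetical, and the classifier itself is not reproduced here.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length time courses."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two series of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat signal carries no temporal information
    return cov / sqrt(var_x * var_y)


def interaction_features(enhancer_signal, promoter_signal,
                         enhancer_pos, promoter_pos):
    """Build the two features for one candidate enhancer-promoter pair."""
    return {
        "temporal_correlation": pearson(enhancer_signal, promoter_signal),
        "distance_bp": abs(enhancer_pos - promoter_pos),
    }


if __name__ == "__main__":
    # Hypothetical occupancy at four time points after stimulation.
    enhancer = [1.0, 3.0, 5.0, 4.0]
    promoter = [2.0, 6.0, 10.0, 8.0]
    print(interaction_features(enhancer, promoter, 1_250_000, 1_275_000))
```

    A classifier trained on known interactions would then take such feature dictionaries as input; how the features are weighted is a modelling decision that the abstract does not specify.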

  11. Error baseline rates of five sample preparation methods used to characterize RNA virus populations.

    Directory of Open Access Journals (Sweden)

    Jeffrey R Kugelman

    Individual RNA viruses typically occur as populations of genomes that differ slightly from each other due to mutations introduced by the error-prone viral polymerase. Understanding the variability of RNA virus genome populations is critical for understanding virus evolution because individual mutant genomes may gain evolutionary selective advantages and give rise to dominant subpopulations, possibly even leading to the emergence of viruses resistant to medical countermeasures. Reverse transcription of virus genome populations followed by next-generation sequencing is the only available method to characterize variation for RNA viruses. However, both steps may lead to the introduction of artificial mutations, thereby skewing the data. To better understand how such errors are introduced during sample preparation, we determined and compared error baseline rates of five different sample preparation methods by analyzing in vitro transcribed Ebola virus RNA from an artificial plasmid-based system. These methods included: shotgun sequencing from plasmid DNA or in vitro transcribed RNA as a basic "no amplification" method, amplicon sequencing from the plasmid DNA or in vitro transcribed RNA as a "targeted" amplification method, sequence-independent single-primer amplification (SISPA) as a "random" amplification method, rolling circle reverse transcription sequencing (CirSeq) as an advanced "no amplification" method, and Illumina TruSeq RNA Access as a "targeted" enrichment method. The measured error frequencies indicate that RNA Access offers the best tradeoff between sensitivity and sample preparation error (1.4−5) of all compared methods.

  12. Error baseline rates of five sample preparation methods used to characterize RNA virus populations

    Science.gov (United States)

    Kugelman, Jeffrey R.; Wiley, Michael R.; Nagle, Elyse R.; Reyes, Daniel; Pfeffer, Brad P.; Kuhn, Jens H.; Sanchez-Lockhart, Mariano; Palacios, Gustavo F.

    2017-01-01

    Individual RNA viruses typically occur as populations of genomes that differ slightly from each other due to mutations introduced by the error-prone viral polymerase. Understanding the variability of RNA virus genome populations is critical for understanding virus evolution because individual mutant genomes may gain evolutionary selective advantages and give rise to dominant subpopulations, possibly even leading to the emergence of viruses resistant to medical countermeasures. Reverse transcription of virus genome populations followed by next-generation sequencing is the only available method to characterize variation for RNA viruses. However, both steps may lead to the introduction of artificial mutations, thereby skewing the data. To better understand how such errors are introduced during sample preparation, we determined and compared error baseline rates of five different sample preparation methods by analyzing in vitro transcribed Ebola virus RNA from an artificial plasmid-based system. These methods included: shotgun sequencing from plasmid DNA or in vitro transcribed RNA as a basic “no amplification” method, amplicon sequencing from the plasmid DNA or in vitro transcribed RNA as a “targeted” amplification method, sequence-independent single-primer amplification (SISPA) as a “random” amplification method, rolling circle reverse transcription sequencing (CirSeq) as an advanced “no amplification” method, and Illumina TruSeq RNA Access as a “targeted” enrichment method. The measured error frequencies indicate that RNA Access offers the best tradeoff between sensitivity and sample preparation error (1.4−5) of all compared methods. PMID:28182717

  13. An empirical likelihood ratio test robust to individual heterogeneity for differential expression analysis of RNA-seq.

    Science.gov (United States)

    Xu, Maoqi; Chen, Liang

    2018-01-01

    Individual sample heterogeneity is one of the biggest obstacles in biomarker identification for complex diseases such as cancers. Current statistical models to identify differentially expressed genes between disease and control groups often overlook the substantial human sample heterogeneity. Meanwhile, traditional nonparametric tests lose detailed data information and sacrifice analysis power, although they are distribution-free and robust to heterogeneity. Here, we propose an empirical likelihood ratio test with a mean-variance relationship constraint (ELTSeq) for the differential expression analysis of RNA sequencing (RNA-seq). As a distribution-free nonparametric model, ELTSeq handles individual heterogeneity by estimating an empirical probability for each observation without making any assumption about read-count distribution. It also incorporates a constraint for the read-count overdispersion, which is widely observed in RNA-seq data. ELTSeq demonstrates a significant improvement over existing methods such as edgeR, DESeq, t-tests, Wilcoxon tests and the classic empirical likelihood-ratio test when handling heterogeneous groups. It will significantly advance transcriptomics studies of cancers and other complex diseases. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
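    The read-count overdispersion mentioned in the abstract above refers to variance exceeding the mean. A minimal, distribution-free way to see it is the variance-to-mean ratio; the sketch below is not the ELTSeq constraint itself, and the function names and example counts are hypothetical.

```python
from statistics import mean, variance


def dispersion_index(counts):
    """Sample variance divided by sample mean for one gene's read counts.

    A ratio near 1 is what a Poisson model would predict; values well
    above 1 indicate overdispersion.
    """
    if len(counts) < 2:
        raise ValueError("need at least two samples")
    m = mean(counts)
    if m == 0:
        return 0.0  # no reads in any sample
    return variance(counts) / m


def is_overdispersed(counts, tolerance=1.0):
    """Flag a gene whose variance-to-mean ratio exceeds the tolerance."""
    return dispersion_index(counts) > tolerance


if __name__ == "__main__":
    # Hypothetical counts for one gene across four heterogeneous samples.
    counts = [10, 12, 50, 8]
    print(f"dispersion index = {dispersion_index(counts):.2f}")
```

    A single outlying sample is enough to push the ratio far above 1, which is one way heterogeneous individuals can distort methods that assume a fixed count distribution.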

  14. Identification of targets of miR-200b by a SILAC-based quantitative proteomic approach

    Directory of Open Access Journals (Sweden)

    Arivusudar Marimuthu

    2014-09-01

    miRNAs regulate gene expression by binding to cognate mRNAs, causing mRNA degradation or translational repression. Mass spectrometry-based proteomic analysis is widely used to identify miRNA targets. The miR-200b miRNA cluster is often overexpressed in multiple cancer types, but the identity of its targets remains elusive. Using SILAC-based analysis, we examined the effects of overexpression of a miR-200b mimic or a control miRNA in fibrosarcoma cells. We identified around 300 potential targets of miR-200b based on changes in protein expression levels. We validated a subset of potential targets at the transcript level using quantitative PCR.

  15. Global RNA association with the transcriptionally active chromosome of chloroplasts.

    Science.gov (United States)

    Lehniger, Marie-Kristin; Finster, Sabrina; Melonek, Joanna; Oetke, Svenja; Krupinska, Karin; Schmitz-Linneweber, Christian

    2017-10-01

    Processed chloroplast RNAs are co-enriched with preparations of the chloroplast transcriptionally active chromosome. Chloroplast genomes are organized as a polyploid DNA-protein structure called the nucleoid. Transcriptionally active chloroplast DNA together with tightly bound protein factors can be purified by gel filtration as a functional entity called the transcriptionally active chromosome (TAC). Previous proteomics analyses of nucleoids and of TACs demonstrated a considerable overlap in protein composition including RNA binding proteins. Therefore the RNA content of TAC preparations from Nicotiana tabacum was determined using whole genome tiling arrays. A large number of chloroplast RNAs was found to be associated with the TAC. The pattern of RNAs attached to the TAC consists of RNAs produced by different chloroplast RNA polymerases and differs from the pattern of RNA found in input controls. An analysis of RNA splicing and RNA editing of selected RNA species demonstrated that TAC-associated RNAs are processed to a similar extent as the RNA in input controls. Thus, TAC fractions contain a specific subset of the processed chloroplast transcriptome.

  16. RSEQtools: a modular framework to analyze RNA-Seq data using compact, anonymized data summaries.

    Science.gov (United States)

    Habegger, Lukas; Sboner, Andrea; Gianoulis, Tara A; Rozowsky, Joel; Agarwal, Ashish; Snyder, Michael; Gerstein, Mark

    2011-01-15

    The advent of next-generation sequencing for functional genomics has given rise to quantities of sequence information that are often so large that they are difficult to handle. Moreover, sequence reads from a specific individual can contain sufficient information to potentially identify and genetically characterize that person, raising privacy concerns. In order to address these issues, we have developed the Mapped Read Format (MRF), a compact data summary format for both short and long read alignments that enables the anonymization of confidential sequence information, while allowing one to still carry out many functional genomics studies. We have developed a suite of tools (RSEQtools) that use this format for the analysis of RNA-Seq experiments. These tools consist of a set of modules that perform common tasks such as calculating gene expression values, generating signal tracks of mapped reads and segmenting that signal into actively transcribed regions. Moreover, the tools can readily be used to build customizable RNA-Seq workflows. In addition to the anonymization afforded by MRF, this format also facilitates the decoupling of the alignment of reads from downstream analyses. RSEQtools is implemented in C and the source code is available at http://rseqtools.gersteinlab.org/.

  17. YY1 binding association with sex-biased transcription revealed through X-linked transcript levels and allelic binding analyses.

    Science.gov (United States)

    Chen, Chih-Yu; Shi, Wenqiang; Balaton, Bradley P; Matthews, Allison M; Li, Yifeng; Arenillas, David J; Mathelier, Anthony; Itoh, Masayoshi; Kawaji, Hideya; Lassmann, Timo; Hayashizaki, Yoshihide; Carninci, Piero; Forrest, Alistair R R; Brown, Carolyn J; Wasserman, Wyeth W

    2016-11-18

    Sex differences in susceptibility and progression have been reported in numerous diseases. Female cells have two copies of the X chromosome with X-chromosome inactivation imparting mono-allelic gene silencing for dosage compensation. However, a subset of genes, named escapees, escape silencing and are transcribed bi-allelically resulting in sexual dimorphism. Here we conducted in silico analyses of the sexes using human datasets to gain perspectives into such regulation. We identified transcription start sites of escapees (escTSSs) based on higher transcription levels in female cells using FANTOM5 CAGE data. Significant over-representations of YY1 transcription factor binding motif and ChIP-seq peaks around escTSSs highlighted its positive association with escapees. Furthermore, YY1 occupancy is significantly biased towards the inactive X (Xi) at long non-coding RNA loci that are frequent contacts of Xi-specific superloops. Our study suggests a role for YY1 in transcriptional activity on Xi in general through sequence-specific binding, and its involvement at superloop anchors.

  18. Transcriptomic analysis of ‘Suli’ pear (Pyrus pyrifolia white pear group) buds during the dormancy by RNA-Seq

    Directory of Open Access Journals (Sweden)

    Liu Guoqin

    2012-12-01

    Background: Bud dormancy is a critical developmental process that allows perennial plants to survive unfavorable environmental conditions. Pear is one of the most important deciduous fruit trees in the world, but the mechanisms regulating bud dormancy in this species are unknown. Because genomic information for pear is currently unavailable, transcriptome and digital gene expression data for this species would be valuable resources to better understand the molecular and biological mechanisms regulating its bud dormancy. Results: We performed de novo transcriptome assembly and digital gene expression (DGE) profiling analyses of ‘Suli’ pear (Pyrus pyrifolia white pear group) using the Illumina RNA-seq system. RNA-Seq generated approximately 100 M high-quality reads that were assembled into 69,393 unigenes (mean length = 853 bp), including 14,531 clusters and 34,194 singletons. A total of 51,448 (74.1%) unigenes were annotated using public protein databases with an E-value cut-off of 10^-5. We mainly compared gene expression levels at four time points during bud dormancy. Between Nov. 15 and Dec. 15, Dec. 15 and Jan. 15, and Jan. 15 and Feb. 15, 1,978, 1,024, and 3,468 genes were differentially expressed, respectively. Hierarchical clustering analysis arranged 190 significantly differentially expressed genes into seven groups. Seven genes were randomly selected to confirm their expression levels using quantitative real-time PCR. Conclusions: The new transcriptomes offer comprehensive sequence and DGE profiling data for a dynamic view of transcriptomic variation during bud dormancy in pear. These data provide a basis for future studies of metabolism during bud dormancy in non-model but economically important perennial species.

  19. Involvement of DNA topoisomerase I in transcription of human ribosomal RNA genes

    International Nuclear Information System (INIS)

    Zhang, H.; Wang, J.C.; Liu, L.F.

    1988-01-01

    Treatment of HeLa cells with a DNA topoisomerase I-specific inhibitor, camptothecin, results in rapid cessation of the synthesis of the 45S rRNA precursor. The inhibition of rRNA synthesis is reversible following drug removal and correlates with the presence of camptothecin-trapped topoisomerase I-DNA abortive complexes, which can be detected as topoisomerase I-linked DNA breaks upon lysis with sodium dodecyl sulfate. These breaks were found to be concentrated within the transcribed region of human rRNA genes. No such sites can be detected in the inactive human rRNA genes in mouse-human hybrid cells, suggesting a preferential association of topoisomerase I with actively transcribed genes. The distribution of RNA polymerase molecules along the transcription unit of human rRNA genes in camptothecin-treated HeLa cells, as assayed by nuclear run-on transcription, shows a graded decrease of the RNA polymerase density toward the 3' end of the transcription unit; the density is minimally affected near the 5' start of the transcription unit. These results suggest that DNA topoisomerase I is normally involved in the elongation step of transcription, especially when the transcripts are long, and that camptothecin interferes with this role.

  20. An open RNA-Seq data analysis pipeline tutorial with an example of reprocessing data from a recent Zika virus study [version 1; referees: 2 approved]

    Directory of Open Access Journals (Sweden)

    Zichen Wang

    2016-07-01

    RNA-seq analysis is becoming a standard method for global gene expression profiling. However, open and standard pipelines to perform RNA-seq analysis by non-experts remain challenging due to the large size of the raw data files and the hardware requirements for running the alignment step. Here we introduce a reproducible open source RNA-seq pipeline delivered as an IPython notebook and a Docker image. The pipeline uses state-of-the-art tools and can run on various platforms with minimal configuration overhead. The pipeline enables the extraction of knowledge from typical RNA-seq studies by generating interactive principal component analysis (PCA) and hierarchical clustering (HC) plots, performing enrichment analyses against over 90 gene set libraries, and obtaining lists of small molecules that are predicted to either mimic or reverse the observed changes in mRNA expression. We apply the pipeline to a recently published RNA-seq dataset collected from human neuronal progenitors infected with the Zika virus (ZIKV). In addition to confirming the presence of cell cycle genes among the genes that are downregulated by ZIKV, our analysis uncovers significant overlap with upregulated genes that when knocked out in mice induce defects in brain morphology. This result potentially points to the molecular processes associated with the microcephaly phenotype observed in newborns from pregnant mothers infected with the virus. In addition, our analysis predicts small molecules that can either mimic or reverse the expression changes induced by ZIKV. The IPython notebook and Docker image are freely available at: http://nbviewer.jupyter.org/github/maayanlab/Zika-RNAseq-Pipeline/blob/master/Zika.ipynb and https://hub.docker.com/r/maayanlab/zika/.

  1. Improved methods and resources for paramecium genomics: transcription units, gene annotation and gene expression.

    Science.gov (United States)

    Arnaiz, Olivier; Van Dijk, Erwin; Bétermier, Mireille; Lhuillier-Akakpo, Maoussi; de Vanssay, Augustin; Duharcourt, Sandra; Sallet, Erika; Gouzy, Jérôme; Sperling, Linda

    2017-06-26

    The 15 sibling species of the Paramecium aurelia cryptic species complex emerged after a whole genome duplication that occurred tens of millions of years ago. Given extensive knowledge of the genetics and epigenetics of Paramecium acquired over the last century, this species complex offers a uniquely powerful system to investigate the consequences of whole genome duplication in a unicellular eukaryote as well as the genetic and epigenetic mechanisms that drive speciation. High quality Paramecium gene models are important for research using this system. The major aim of the work reported here was to build an improved gene annotation pipeline for the Paramecium lineage. We generated oriented RNA-Seq transcriptome data across the sexual process of autogamy for the model species Paramecium tetraurelia. We determined, for the first time in a ciliate, candidate P. tetraurelia transcription start sites using an adapted Cap-Seq protocol. We developed TrUC, multi-threaded Perl software that in conjunction with TopHat mapping of RNA-Seq data to a reference genome, predicts transcription units for the annotation pipeline. We used EuGene software to combine annotation evidence. The high quality gene structural annotations obtained for P. tetraurelia were used as evidence to improve published annotations for 3 other Paramecium species. The RNA-Seq data were also used for differential gene expression analysis, providing a gene expression atlas that is more sensitive than the previously established microarray resource. We have developed a gene annotation pipeline tailored for the compact genomes and tiny introns of Paramecium species. A novel component of this pipeline, TrUC, predicts transcription units using Cap-Seq and oriented RNA-Seq data. TrUC could prove useful beyond Paramecium, especially in the case of high gene density. Accurate predictions of 3' and 5' UTR will be particularly valuable for studies of gene expression (e.g. nucleosome positioning, identification of cis

  2. From reads to genes to pathways: differential expression analysis of RNA-Seq experiments using Rsubread and the edgeR quasi-likelihood pipeline [version 1; referees: 2 approved]

    Directory of Open Access Journals (Sweden)

    Yunshun Chen

    2016-06-01

    In recent years, RNA sequencing (RNA-seq) has become a very widely used technology for profiling gene expression. One of the most common aims of RNA-seq profiling is to identify genes or molecular pathways that are differentially expressed (DE) between two or more biological conditions. This article demonstrates a computational workflow for the detection of DE genes and pathways from RNA-seq data by providing a complete analysis of an RNA-seq experiment profiling epithelial cell subsets in the mouse mammary gland. The workflow uses R software packages from the open-source Bioconductor project and covers all steps of the analysis pipeline, including alignment of read sequences, data exploration, differential expression analysis, visualization and pathway analysis. Read alignment and count quantification are conducted using the Rsubread package and the statistical analyses are performed using the edgeR package. The differential expression analysis uses the quasi-likelihood functionality of edgeR.
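
The idea of going from read counts to an expression comparison can be illustrated with a minimal Python sketch. This is not the Rsubread/edgeR workflow described in the record above and does not implement edgeR's quasi-likelihood test; it only shows counts-per-million (CPM) scaling and a log2 fold change between two samples. The function names, the gene labels and the prior count of 0.5 are assumptions made for illustration.

```python
import math


def counts_per_million(counts):
    """Scale raw read counts of one sample to counts per million (CPM)."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("library has no reads")
    return {gene: c * 1e6 / total for gene, c in counts.items()}


def log2_fold_change(cpm_a, cpm_b, prior=0.5):
    """log2(B / A) per gene; 'prior' (an assumed value) avoids log of zero."""
    return {
        gene: math.log2((cpm_b[gene] + prior) / (cpm_a[gene] + prior))
        for gene in cpm_a
    }


# Hypothetical raw counts for two samples (not data from the cited study).
sample_a = {"geneA": 10, "geneB": 30, "geneC": 60}
sample_b = {"geneA": 40, "geneB": 30, "geneC": 30}

cpm_a = counts_per_million(sample_a)
cpm_b = counts_per_million(sample_b)
lfc = log2_fold_change(cpm_a, cpm_b)
```

A real analysis would add library-size normalization factors, dispersion estimation and a statistical test across replicates, which is what the edgeR pipeline provides.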

  3. Total rRNA-Seq Analysis Gives Insight into Bacterial, Fungal, Protozoal and Archaeal Communities in the Rumen Using an Optimized RNA Isolation Method

    Directory of Open Access Journals (Sweden)

    Chijioke O. Elekwachi

    2017-09-01

    Advances in high throughput, next generation sequencing technologies have allowed an in-depth examination of biological environments and phenomena, and are particularly useful for culture-independent microbial community studies. Recently the use of RNA for metatranscriptomic studies has been used to elucidate the role of active microbes in the environment. Extraction of RNA of appropriate quality is critical in these experiments and TRIzol reagent is often used for maintaining stability of RNA molecules during extraction. However, for studies using rumen content there is no consensus on (1) the amount of rumen digesta to use or (2) the amount of TRIzol reagent to be used in RNA extraction procedures. This study evaluated the effect of using various quantities of ground rumen digesta and of TRIzol reagent on the yield and quality of extracted RNA. It also investigated the possibility of using lower masses of solid-phase rumen digesta and lower amounts of TRIzol reagent than is used currently, for extraction of RNA for metatranscriptomic studies. We found that high quality RNA could be isolated from 2 g of ground rumen digesta sample, whilst using 0.6 g of ground matter for RNA extraction and using 3 mL (a 5:1 TRIzol : extraction mass ratio) of TRIzol reagent. This represents a significant savings in the cost of RNA isolation. These lower masses and volumes were then applied in the RNA-Seq analysis of solid-phase rumen samples obtained from 6 Angus X Hereford beef heifers which had been fed a high forage diet (comprised of barley straw in a forage-to-concentrate ratio of 70:30) for 102 days. A bioinformatics analysis pipeline was developed in-house that generated relative abundance values of archaea, protozoa, fungi and bacteria in the rumen and also allowed the extraction of individual rRNA variable regions that could be analyzed in downstream molecular ecology programs. The average relative abundances of rRNA transcripts of archaea, bacteria

  4. Total rRNA-Seq Analysis Gives Insight into Bacterial, Fungal, Protozoal and Archaeal Communities in the Rumen Using an Optimized RNA Isolation Method.

    Science.gov (United States)

    Elekwachi, Chijioke O; Wang, Zuo; Wu, Xiaofeng; Rabee, Alaa; Forster, Robert J

    2017-01-01

    Advances in high throughput, next generation sequencing technologies have allowed an in-depth examination of biological environments and phenomena, and are particularly useful for culture-independent microbial community studies. Recently the use of RNA for metatranscriptomic studies has been used to elucidate the role of active microbes in the environment. Extraction of RNA of appropriate quality is critical in these experiments and TRIzol reagent is often used for maintaining stability of RNA molecules during extraction. However, for studies using rumen content there is no consensus on (1) the amount of rumen digesta to use or (2) the amount of TRIzol reagent to be used in RNA extraction procedures. This study evaluated the effect of using various quantities of ground rumen digesta and of TRIzol reagent on the yield and quality of extracted RNA. It also investigated the possibility of using lower masses of solid-phase rumen digesta and lower amounts of TRIzol reagent than is used currently, for extraction of RNA for metatranscriptomic studies. We found that high quality RNA could be isolated from 2 g of ground rumen digesta sample, whilst using 0.6 g of ground matter for RNA extraction and using 3 mL (a 5:1 TRIzol : extraction mass ratio) of TRIzol reagent. This represents a significant savings in the cost of RNA isolation. These lower masses and volumes were then applied in the RNA-Seq analysis of solid-phase rumen samples obtained from 6 Angus X Hereford beef heifers which had been fed a high forage diet (comprised of barley straw in a forage-to-concentrate ratio of 70:30) for 102 days. A bioinformatics analysis pipeline was developed in-house that generated relative abundance values of archaea, protozoa, fungi and bacteria in the rumen and also allowed the extraction of individual rRNA variable regions that could be analyzed in downstream molecular ecology programs. The average relative abundances of rRNA transcripts of archaea, bacteria, protozoa and fungi in

  5. Direct Regulation of tRNA and 5S rRNA Gene Transcription by Polo-like Kinase 1

    NARCIS (Netherlands)

    Fairley, Jennifer A.; Mitchell, Louise E.; Berg, Tracy; Kenneth, Niall S.; von Schubert, Conrad; Sillje, Herman H. W.; Medema, Rene H.; Nigg, Erich A.; White, Robert J.

    2012-01-01

    Polo-like kinase Plk1 controls numerous aspects of cell-cycle progression. We show that it associates with tRNA and 5S rRNA genes and regulates their transcription by RNA polymerase III (pol III) through direct binding and phosphorylation of transcription factor Brf1. During interphase, Plk1 promotes

  6. Transcriptomic characterization of soybean (Glycine max) roots in response to rhizobium infection by RNA sequencing

    International Nuclear Information System (INIS)

    He, Q.; Li, Z.; Wang, S.; Huang, S.; Yang, H.

    2018-01-01

    The interaction of legumes with rhizobia to convert N2 into ammonia for plant use has attracted worldwide interest. However, the basal plant nitrogen fixation mechanisms induced in response to Rhizobium, which give rise to differential gene expression in plants, are not yet fully understood. Genes differentially expressed between inoculated and mock-inoculated soybean were analyzed by RNA-Seq. The sequencing reads were aligned against the Williams 82 genome sequence, which contains 55,787 transcripts; 280 and 316 transcripts were found to be up- and down-regulated, respectively, between inoculated and mock-inoculated soybean roots at stage V1. Gene ontology (GO) analyses detected 104, 182 and 178 genes associated with the cell component category, molecular function category and biological process category, respectively. Pathway analysis revealed that 98 differentially expressed genes (115 transcripts) were involved in 169 biological pathways. We selected 19 differentially expressed genes and analyzed their expression in mock-inoculated roots and in roots inoculated with USDA110 and CCBAU45436 using qRT-PCR. The results were in accordance with those obtained from the RNA-Seq data of rhizobia-infected roots, indicating that the RNA-Seq results were reliable and general. Additionally, this study identified some novel genes associated with the nitrogen fixation process in comparison to previously identified QTLs. (author)

  7. RNA Seq analysis of the role of calcium chloride stress and electron transport in mitochondria for malachite green decolorization by Aspergillus niger.

    Science.gov (United States)

    Gomaa, Ola M; Selim, Nabila S; Wee, Josephine; Linz, John E

    2017-08-01

    Aspergillus niger was previously demonstrated to decolorize the commercial dye malachite green (MG) and this process was enhanced under calcium chloride (CaCl2) treatment. Previous data also suggested that the decolorization process is related to mitochondrial cytochrome c. In the current work, we analyzed in depth the specific relationship between CaCl2 treatment and MG decolorization. Gene expression analysis (RNA Seq) using Next Generation Sequencing (NGS) revealed up-regulation of 28 genes that are directly or indirectly associated with stress response functions as early as 30 min of CaCl2 treatment; these data further strengthen our previous findings that CaCl2 treatment induces a stress response in A. niger which enhances the ability to decolorize MG. A significant increase in fluorescence observed by MitoTracker dye suggests that CaCl2 treatment also increased mitochondrial membrane potential. Isolated mitochondrial membrane protein fractions obtained from A. niger grown under standard growth conditions decolorized MG in the presence of NADH and decolorization was enhanced in samples isolated from CaCl2-treated A. niger cultures. Treatment of whole mitochondrial fraction with KCN which inhibits electron transport by cytochrome c oxidase and Triton-X 100 which disrupts mitochondrial membrane integrity suggests that cyanide sensitive cytochrome c oxidase activity is a key biochemical step in MG decolorization. This suggestion was confirmed by the addition of palladium α-lipoic acid complex (PLAC) which resulted in an initial increase in decolorization. Although the role of cytochrome c and cytochrome c oxidase was confirmed at the biochemical level, changes in levels of transcripts encoding these enzymes after CaCl2 treatment were not found to be statistically significant in RNA Seq analysis. These data suggest that the regulation of cytochrome c enzymes occurs predominantly at the post-transcriptional level under CaCl2 stress. Thus, using global

  8. RNA-Seq Analysis of Developing Pecan (Carya illinoinensis) Embryos Reveals Parallel Expression Patterns among Allergen and Lipid Metabolism Genes.

    Science.gov (United States)

    Mattison, Christopher P; Rai, Ruhi; Settlage, Robert E; Hinchliffe, Doug J; Madison, Crista; Bland, John M; Brashear, Suzanne; Graham, Charles J; Tarver, Matthew R; Florane, Christopher; Bechtel, Peter J

    2017-02-22

    The pecan nut is a nutrient-rich part of a healthy diet full of beneficial fatty acids and antioxidants, but can also cause allergic reactions in people suffering from food allergy to the nuts. The transcriptome of a developing pecan nut was characterized to identify the gene expression occurring during the process of nut development and to highlight those genes involved in fatty acid metabolism and those that commonly act as food allergens. Pecan samples were collected at several time points during the embryo development process including the water, gel, dough, and mature nut stages. Library preparation and sequencing were performed using Illumina-based mRNA HiSeq with RNA from four time points during the growing season during August and September 2012. Sequence analysis with Trinotate software following the Trinity protocol identified 133,000 unigenes with 52,267 named transcripts and 45,882 annotated genes. A total of 27,312 genes were defined by GO annotation. Gene expression clustering analysis identified 12 different gene expression profiles, each containing a number of genes. Three pecan seed storage proteins that commonly act as allergens, Car i 1, Car i 2, and Car i 4, were significantly up-regulated during the time course. Up-regulated fatty acid metabolism genes that were identified included acyl-[ACP] desaturase and omega-6 desaturase genes involved in oleic and linoleic acid metabolism. Notably, a few of the up-regulated acyl-[ACP] desaturase and omega-6 desaturase genes that were identified have expression patterns similar to the allergen genes based upon gene expression clustering and qPCR analysis. These findings suggest the possibility of coordinated accumulation of lipids and allergens during pecan nut embryogenesis.

  9. RNA-Seq Reveals Extensive Transcriptional Response to Heat Stress in the Stony Coral Galaxea fascicularis

    Science.gov (United States)

    Hou, Jing; Xu, Tao; Su, Dingjia; Wu, Ying; Cheng, Li; Wang, Jun; Zhou, Zhi; Wang, Yan

    2018-01-01

    Galaxea fascicularis, a stony coral belonging to family Oculinidae, is widely distributed in Red Sea, the Gulf of Aden and large areas of the Indo-Pacific oceans. So far there is a lack of gene expression knowledge concerning this massive coral. In the present study, G. fascicularis was subjected to heat stress at 32.0 ± 0.5°C in the lab, we found that the density of symbiotic zooxanthellae decreased significantly; meanwhile apparent bleaching and tissue lysing were observed at 10 h and 18 h after heat stress. The transcriptome responses were investigated in the stony coral G. fascicularis during heat bleaching using RNA-seq. A total of 42,028 coral genes were assembled from over 439 million reads. Gene expressions were compared at 10 and 18 h after heat stress. The significantly upregulated genes found in the Control_10h vs. Heat_10h comparison, presented mainly in GO terms related with DNA integration and unfolded protein response; and for the Control_18h vs. Heat_18h comparison, the GO terms include DNA integration. In addition, comparison between groups of Control_10h vs. Heat_10h and Control_18h vs. Heat_18h revealed that 125 genes were significantly upregulated in common between the two groups, whereas 21 genes were significantly downregulated in common, all these differentially expressed genes were found to be involved in stress response, DNA integration and unfolded protein response. Taken together, our results suggest that high temperature could activate the stress response at the early stage, and subsequently induce the bleaching and lysing through DNA integration and unfolded protein response, which are able to disrupt the balance of coral-zooxanthella symbiosis in the stony coral G. fascicularis. PMID:29487614

  10. RNA-Seq Reveals Extensive Transcriptional Response to Heat Stress in the Stony Coral Galaxea fascicularis

    Directory of Open Access Journals (Sweden)

    Jing Hou

    2018-02-01

    Galaxea fascicularis, a stony coral belonging to family Oculinidae, is widely distributed in Red Sea, the Gulf of Aden and large areas of the Indo-Pacific oceans. So far there is a lack of gene expression knowledge concerning this massive coral. In the present study, G. fascicularis was subjected to heat stress at 32.0 ± 0.5°C in the lab, we found that the density of symbiotic zooxanthellae decreased significantly; meanwhile apparent bleaching and tissue lysing were observed at 10 h and 18 h after heat stress. The transcriptome responses were investigated in the stony coral G. fascicularis during heat bleaching using RNA-seq. A total of 42,028 coral genes were assembled from over 439 million reads. Gene expressions were compared at 10 and 18 h after heat stress. The significantly upregulated genes found in the Control_10h vs. Heat_10h comparison, presented mainly in GO terms related with DNA integration and unfolded protein response; and for the Control_18h vs. Heat_18h comparison, the GO terms include DNA integration. In addition, comparison between groups of Control_10h vs. Heat_10h and Control_18h vs. Heat_18h revealed that 125 genes were significantly upregulated in common between the two groups, whereas 21 genes were significantly downregulated in common, all these differentially expressed genes were found to be involved in stress response, DNA integration and unfolded protein response. Taken together, our results suggest that high temperature could activate the stress response at the early stage, and subsequently induce the bleaching and lysing through DNA integration and unfolded protein response, which are able to disrupt the balance of coral-zooxanthella symbiosis in the stony coral G. fascicularis.

  11. A ChIP-Seq benchmark shows that sequence conservation mainly improves detection of strong transcription factor binding sites.

    Directory of Open Access Journals (Sweden)

    Tony Håndstad

    BACKGROUND: Transcription factors are important controllers of gene expression and mapping transcription factor binding sites (TFBS) is key to inferring transcription factor regulatory networks. Several methods for predicting TFBS exist, but there are no standard genome-wide datasets on which to assess the performance of these prediction methods. Also, it is believed that information about sequence conservation across different genomes can generally improve accuracy of motif-based predictors, but it is not clear under what circumstances use of conservation is most beneficial. RESULTS: Here we use published ChIP-seq data and an improved peak detection method to create comprehensive benchmark datasets for prediction methods which use known descriptors or binding motifs to detect TFBS in genomic sequences. We use this benchmark to assess the performance of five different prediction methods and find that the methods that use information about sequence conservation generally perform better than simpler motif-scanning methods. The difference is greater on high-affinity peaks and when using short and information-poor motifs. However, if the motifs are specific and information-rich, we find that simple motif-scanning methods can perform better than conservation-based methods. CONCLUSIONS: Our benchmark provides a comprehensive test that can be used to rank the relative performance of transcription factor binding site prediction methods. Moreover, our results show that, contrary to previous reports, sequence conservation is better suited for predicting strong than weak transcription factor binding sites.

  12. De novo assembly of the perennial ryegrass transcriptome using an RNA-Seq strategy.

    Directory of Open Access Journals (Sweden)

    Jacqueline D Farrell

    Perennial ryegrass is a highly heterozygous outbreeding grass species used for turf and forage production. Heterozygosity can affect de-Bruijn graph assembly making de novo transcriptome assembly of species such as perennial ryegrass challenging. Creating a reference transcriptome from a homozygous perennial ryegrass genotype can circumvent the challenge of heterozygosity. The goals of this study were to perform RNA-sequencing on multiple tissues from a highly inbred genotype to develop a reference transcriptome. This was complemented with RNA-sequencing of a highly heterozygous genotype for SNP calling. De novo transcriptome assembly of the inbred genotype created 185,833 transcripts with an average length of 830 base pairs. Within the inbred reference transcriptome 78,560 predicted open reading frames were found of which 24,434 were predicted as complete. Functional annotation found 50,890 transcripts with a BLASTp hit from the Swiss-Prot non-redundant database, 58,941 transcripts with a Pfam protein domain and 1,151 transcripts encoding putative secreted peptides. To evaluate the reference transcriptome we targeted the high-affinity K+ transporter gene family and found multiple orthologs. Using the longest unique open reading frames as the reference sequence, 64,242 single nucleotide polymorphisms were found. One thousand sixty one open reading frames from the inbred genotype contained heterozygous sites, confirming the high degree of homozygosity. Our study has developed an annotated, comprehensive transcriptome reference for perennial ryegrass that can aid in determining genetic variation, expression analysis, genome annotation, and gene mapping.

  13. Combining RNA-seq and proteomic profiling to identify seminal fluid proteins in the migratory grasshopper Melanoplus sanguinipes (F).

    Science.gov (United States)

    Bonilla, Martha L; Todd, Christopher; Erlandson, Martin; Andres, Jose

    2015-12-22

    Seminal fluid proteins control many aspects of fertilization and in turn, they play a key role in post-mating sexual selection and possibly reproductive isolation. Because effective proteome profiling relies on the availability of high-quality DNA reference databases, our knowledge of these proteins is still largely limited to model organisms with ample genetic resources. New advances in sequencing technology allow for the rapid characterization of transcriptomes at low cost. By combining high throughput RNA-seq and shotgun proteomic profiling, we have characterized the seminal fluid proteins secreted by the primary male accessory gland of the migratory grasshopper (Melanoplus sanguinipes), one of the main agricultural pests in central North America. Using RNA sequencing, we characterized the transcripts of ~8,100 genes expressed in the long hyaline tubules (LHT) of the accessory glands. Proteomic profiling identified 353 proteins expressed in the long hyaline tubules (LHT). Of special interest are seminal fluid proteins (SFPs), such as EJAC-SP, ACE and prostaglandin synthetases, which are known to regulate female oviposition in insects. Our study provides new insights into the proteomic components of male ejaculate in Orthopterans, and highlights several important patterns. First, the presence of proteins that lack predicted classical secretory tags in accessory gland proteomes is common in male accessory glands. Second, the products of a few highly expressed genes dominate the accessory gland secretions. Third, accessory gland transcriptomes are enriched for novel transcripts. Fourth, there is conservation of SFPs' functional classes across distantly related taxonomic groups with very different life histories, mating systems and sperm transferring mechanisms. The identified SFPs may serve as targets of future efforts to develop species-specific genetic control strategies.

  14. Transcriptome Analysis of Salicylic Acid Treatment in Rehmannia glutinosa Hairy Roots Using RNA-seq Technique for Identification of Genes Involved in Acteoside Biosynthesis

    Directory of Open Access Journals (Sweden)

    Fengqing Wang

    2017-05-01

    Rehmannia glutinosa is a common bulk medicinal material that has been widely used in China due to its active ingredients. Acteoside, one of the ingredients, which has antioxidant, antinephritic, anti-inflammatory, hepatoprotective, immunomodulatory, and neuroprotective effects, is usually selected as a quality-control component for R. glutinosa herb in the Chinese Pharmacopeia. The acteoside biosynthesis pathway in R. glutinosa has not yet been clearly established. Herein, we describe the establishment of a genetic transformation system for R. glutinosa mediated by Agrobacterium rhizogenes. We screened the optimal elicitors that markedly increased acteoside accumulation in R. glutinosa hairy roots. We found that acteoside accumulation dramatically increased with the addition of salicylic acid (SA); the optimal SA dose was 25 μmol/L for hairy roots. RNA-seq was applied to analyze the transcriptomic changes in hairy roots treated with SA for 24 h in comparison with an untreated control. A total of 3,716, 4,018, and 2,715 differentially expressed transcripts (DETs) were identified in 0 h-vs.-12 h, 0 h-vs.-24 h, and 12 h-vs.-24 h libraries, respectively. KEGG pathway-based analysis revealed that 127 DETs were enriched in “phenylpropanoid biosynthesis.” Of 219 putative unigenes involved in acteoside biosynthesis, 54 were found to be up-regulated at at least one of the time points after SA treatment. Selected candidate genes were analyzed by quantitative real-time PCR (qRT-PCR) in hairy roots with SA, methyl jasmonate (MeJA), AgNO3 (Ag+), and putrescine (Put) treatment. All genes investigated were up-regulated by SA treatment, and most candidate genes were weakly increased by MeJA to some degree. Furthermore, transcription abundance of eight candidate genes in tuberous roots of the high-acteoside-content (HA) cultivar QH were higher than those of the low-acteoside-content (LA) cultivar Wen 85-5. These results will pave the way for understanding the molecular

  15. SLAM-seq defines direct gene-regulatory functions of the BRD4-MYC axis.

    Science.gov (United States)

    Muhar, Matthias; Ebert, Anja; Neumann, Tobias; Umkehrer, Christian; Jude, Julian; Wieshofer, Corinna; Rescheneder, Philipp; Lipp, Jesse J; Herzog, Veronika A; Reichholf, Brian; Cisneros, David A; Hoffmann, Thomas; Schlapansky, Moritz F; Bhat, Pooja; von Haeseler, Arndt; Köcher, Thomas; Obenauf, Anna C; Popow, Johannes; Ameres, Stefan L; Zuber, Johannes

    2018-05-18

    Defining direct targets of transcription factors and regulatory pathways is key to understanding their roles in physiology and disease. We combined SLAM-seq [thiol(SH)-linked alkylation for the metabolic sequencing of RNA], a method for direct quantification of newly synthesized messenger RNAs (mRNAs), with pharmacological and chemical-genetic perturbation in order to define regulatory functions of two transcriptional hubs in cancer, BRD4 and MYC, and to interrogate direct responses to BET bromodomain inhibitors (BETis). We found that BRD4 acts as general coactivator of RNA polymerase II-dependent transcription, which is broadly repressed upon high-dose BETi treatment. At doses triggering selective effects in leukemia, BETis deregulate a small set of hypersensitive targets including MYC. In contrast to BRD4, MYC primarily acts as a selective transcriptional activator controlling metabolic processes such as ribosome biogenesis and de novo purine synthesis. Our study establishes a simple and scalable strategy to identify direct transcriptional targets of any gene or pathway.

  16. Methods for RNA Analysis

    DEFF Research Database (Denmark)

    Olivarius, Signe

    of the transcriptome, 5’ end capture of RNA is combined with next-generation sequencing for high-throughput quantitative assessment of transcription start sites by two different methods. The methods presented here allow for functional investigation of coding as well as noncoding RNA and contribute to future...... RNAs rely on interactions with proteins, the establishment of protein-binding profiles is essential for the characterization of RNAs. Aiming to facilitate RNA analysis, this thesis introduces proteomics- as well as transcriptomics-based methods for the functional characterization of RNA. First, RNA...

  17. Global effects of the CSR-1 RNA interference pathway on the transcriptional landscape.

    Science.gov (United States)

    Cecere, Germano; Hoersch, Sebastian; O'Keeffe, Sean; Sachidanandam, Ravi; Grishok, Alla

    2014-04-01

    Argonaute proteins and their small RNA cofactors short interfering RNAs are known to inhibit gene expression at the transcriptional and post-transcriptional levels. In Caenorhabditis elegans, the Argonaute CSR-1 binds thousands of endogenous siRNAs (endo-siRNAs) that are antisense to germline transcripts. However, its role in gene expression regulation remains controversial. Here we used genome-wide profiling of nascent RNA transcripts and found that the CSR-1 RNA interference pathway promoted sense-oriented RNA polymerase II transcription. Moreover, a loss of CSR-1 function resulted in global increase in antisense transcription and ectopic transcription of silent chromatin domains, which led to reduced chromatin incorporation of centromere-specific histone H3. On the basis of these findings, we propose that the CSR-1 pathway helps maintain the directionality of active transcription, thereby propagating the distinction between transcriptionally active and silent genomic regions.

  18. Evaluation of carrier-mediated siRNA delivery

    DEFF Research Database (Denmark)

    Colombo, Stefano; Nielsen, Hanne Mørck; Foged, Camilla

    2013-01-01

    RNA delivery. An in vitro cell culture model system expressing enhanced green fluorescent protein (EGFP) was used to develop the assay, which was based on the intracellular quantification of a full-length double-stranded Dicer substrate siRNA by stem-loop RT qPCR. The result is a well-documented protocol......RNA delivered by use of carriers remains an analytical challenge. The purpose of the present study was to optimize and validate an analytical protocol based on stem-loop reverse transcription quantitative polymerase chain reaction (RT qPCR) to quantitatively monitor the carrier-mediated intracellular si...

  19. RNA polymerase II transcriptional fidelity control and its functional interplay with DNA modifications

    Science.gov (United States)

    Xu, Liang; Wang, Wei; Chong, Jenny; Shin, Ji Hyun; Xu, Jun; Wang, Dong

    2016-01-01

    Accurate genetic information transfer is essential for life. As a key enzyme involved in the first step of gene expression, RNA polymerase II (Pol II) must maintain high transcriptional fidelity while it reads along DNA template and synthesizes RNA transcript in a stepwise manner during transcription elongation. DNA lesions or modifications may lead to significant changes in transcriptional fidelity or transcription elongation dynamics. In this review, we will summarize recent progress towards understanding the molecular basis of RNA Pol II transcriptional fidelity control and impacts of DNA lesions and modifications on Pol II transcription elongation. PMID:26392149

  20. MAGI: a Node.js web service for fast microRNA-Seq analysis in a GPU infrastructure

    OpenAIRE

    Kim, Jihoon; Levy, Eric; Ferbrache, Alex; Stepanowsky, Petra; Farcas, Claudiu; Wang, Shuang; Brunner, Stefan; Bath, Tyler; Wu, Yuan; Ohno-Machado, Lucila

    2014-01-01

    Summary: MAGI is a web service for fast MicroRNA-Seq data analysis in a graphics processing unit (GPU) infrastructure. Using just a browser, users have access to results as web reports in just a few hours, a >600% end-to-end performance improvement over the state of the art. MAGI’s salient features are (i) transfer of large input files in native FASTA with Qualities (FASTQ) format through drag-and-drop operations, (ii) rapid prediction of microRNA target genes leveraging parallel computing with GPU ...

  1. Making the most of RNA-seq: Pre-processing sequencing data with Opossum for reliable SNP variant detection [version 2; referees: 2 approved, 1 approved with reservations]

    Directory of Open Access Journals (Sweden)

    Laura Oikkonen

    2017-03-01

    Identifying variants from RNA-seq (transcriptome sequencing) data is a cost-effective and versatile complement to whole-exome (WES) and whole-genome sequencing (WGS) analysis. RNA-seq is primarily considered a method of gene expression analysis, but it can also be used to detect DNA variants in expressed regions of the genome. However, current variant callers do not generally behave well with RNA-seq data due to reads encompassing intronic regions. We have developed a software programme called Opossum to address this problem. Opossum pre-processes RNA-seq reads prior to variant calling, and although it has been designed to work specifically with Platypus, it can be used equally well with other variant callers such as GATK HaplotypeCaller. In this work, we show that using Opossum in conjunction with either Platypus or GATK HaplotypeCaller maintains precision and improves the sensitivity for SNP detection compared to the GATK Best Practices pipeline. In addition, using it in combination with Platypus offers a substantial reduction in run times compared to the GATK pipeline, so it is ideal when there are only limited time or computational resources available.
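    The abstract does not describe Opossum's internals, so the following is only a rough sketch of one pre-processing step that is commonly applied to spliced RNA-seq alignments before variant calling: splitting a read into exon-only blocks wherever its CIGAR string contains an N (skipped region) operation. The function name `split_at_introns` and the 0-based, half-open coordinates are assumptions made for illustration, not Opossum's actual interface.

```python
import re

# CIGAR operations that advance along the reference inside an aligned block.
_REF_CONSUMING = {"M", "=", "X", "D"}


def split_at_introns(ref_start, cigar):
    """Return (start, end) reference blocks (0-based, half-open),
    split wherever the CIGAR has an N (skipped region) operation."""
    blocks = []
    pos = ref_start
    block_start = None
    for length, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar):
        length = int(length)
        if op in _REF_CONSUMING:
            if block_start is None:
                block_start = pos
            pos += length
        elif op == "N":
            # Close the current exon block and jump over the intron.
            if block_start is not None:
                blocks.append((block_start, pos))
                block_start = None
            pos += length
        # I, S, H and P do not advance the reference position.
    if block_start is not None:
        blocks.append((block_start, pos))
    return blocks


# A read with 50 aligned bases, a 1000-base intron, then 50 more aligned bases.
print(split_at_introns(100, "50M1000N50M"))
```

    Each returned block can then be treated as an independent, intron-free alignment by a caller that was designed for DNA reads.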

  2. High SINE RNA Expression Correlates with Post-Transcriptional Downregulation of BRCA1

    Directory of Open Access Journals (Sweden)

    Giovanni Bosco

    2013-04-01

    Short Interspersed Nuclear Elements (SINEs) are non-autonomous retrotransposons that comprise a large fraction of the human genome. SINEs are demethylated in human disease, but whether SINEs become transcriptionally induced and how the resulting transcripts may affect the expression of protein coding genes is unknown. Here, we show that downregulation of the mRNA of the tumor suppressor gene BRCA1 is associated with increased transcription of SINEs and production of sense and antisense SINE small RNAs. We find that BRCA1 mRNA is post-transcriptionally down-regulated in a Dicer and Drosha dependent manner and that expression of a SINE inverted repeat with sequence identity to a BRCA1 intron is sufficient for downregulation of BRCA1 mRNA. These observations suggest that transcriptional activation of SINEs could contribute to a novel mechanism of RNA mediated post-transcriptional silencing of human genes.

  3. PASSPORT-seq: A Novel High-Throughput Bioassay to Functionally Test Polymorphisms in Micro-RNA Target Sites

    Directory of Open Access Journals (Sweden)

    Joseph Ipe

    2018-06-01

    Next-generation sequencing (NGS) studies have identified large numbers of genetic variants that are predicted to alter miRNA–mRNA interactions. We developed a novel high-throughput bioassay, PASSPORT-seq, that can functionally test in parallel 100s of these variants in miRNA binding sites (mirSNPs). The results are highly reproducible across both technical and biological replicates. The utility of the bioassay was demonstrated by testing 100 mirSNPs in HEK293, HepG2, and HeLa cells. The results of several of the variants were validated in all three cell lines using traditional individual luciferase assays. Fifty-five mirSNPs were functional in at least one of three cell lines (FDR ≤ 0.05); 11, 36, and 27 of them were functional in HEK293, HepG2, and HeLa cells, respectively. Only four of the variants were functional in all three cell lines, which demonstrates the cell-type specific effects of mirSNPs and the importance of testing the mirSNPs in multiple cell lines. Using PASSPORT-seq, we functionally tested 111 variants in the 3′ UTR of 17 pharmacogenes that are predicted to alter miRNA regulation. Thirty-three of the variants tested were functional in at least one cell line.
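    The abstract reports an FDR ≤ 0.05 cut-off but does not say which procedure produced it. As an illustration only, the sketch below applies the widely used Benjamini-Hochberg adjustment to a list of per-variant p-values; the p-values are made up, and the choice of Benjamini-Hochberg is an assumption rather than the authors' stated method.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    # Indices of the p-values, from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so monotonicity can be enforced.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical p-values for four variants (not data from the study).
pvals = [0.01, 0.04, 0.03, 0.005]
qvals = benjamini_hochberg(pvals)
functional = [q <= 0.05 for q in qvals]
print(qvals, functional)
```

    A variant would be called functional at FDR ≤ 0.05 when its adjusted value is at or below 0.05.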

  4. Simultaneous transcriptional profiling of bacteria and their host cells.

    Directory of Open Access Journals (Sweden)

    Michael S Humphrys

    We developed an RNA-Seq-based method to simultaneously capture prokaryotic and eukaryotic expression profiles of cells infected with intracellular bacteria. As proof of principle, this method was applied to Chlamydia trachomatis-infected epithelial cell monolayers in vitro, successfully obtaining transcriptomes of both C. trachomatis and the host cells at 1 and 24 hours post-infection. Chlamydiae are obligate intracellular bacterial pathogens that cause a range of mammalian diseases. In humans chlamydiae are responsible for the most common sexually transmitted bacterial infections and trachoma (infectious blindness). Disease arises by adverse host inflammatory reactions that induce tissue damage and scarring. However, little is known about the mechanisms underlying these outcomes. Chlamydia are genetically intractable as replication outside of the host cell is not yet possible and there are no practical tools for routine genetic manipulation, making genome-scale approaches critical. The early timeframe of infection is poorly understood and the host transcriptional response to chlamydial infection is not well defined. Our simultaneous RNA-Seq method was applied to a simplified in vitro model of chlamydial infection. We discovered a possible chlamydial strategy for early iron acquisition, putative immune dampening effects of chlamydial infection on the host cell, and present a hypothesis for Chlamydia-induced fibrotic scarring through runaway positive feedback loops. In general, simultaneous RNA-Seq helps to reveal the complex interplay between invading bacterial pathogens and their host mammalian cells and is immediately applicable to any bacteria/host cell interaction.

  5. An RNA-Seq Transcriptome Analysis of Orthophosphate-Deficient White Lupin Reveals Novel Insights into Phosphorus Acclimation in Plants

    Science.gov (United States)

    O’Rourke, Jamie A.; Yang, S. Samuel; Miller, Susan S.; Bucciarelli, Bruna; Liu, Junqi; Rydeen, Ariel; Bozsoki, Zoltan; Uhde-Stone, Claudia; Tu, Zheng Jin; Allan, Deborah; Gronwald, John W.; Vance, Carroll P.

    2013-01-01

    Phosphorus, in its orthophosphate form (Pi), is one of the most limiting macronutrients in soils for plant growth and development. However, the whole-genome molecular mechanisms contributing to plant acclimation to Pi deficiency remain largely unknown. White lupin (Lupinus albus) has evolved unique adaptations for growth in Pi-deficient soils, including the development of cluster roots to increase root surface area. In this study, we utilized RNA-Seq technology to assess global gene expression in white lupin cluster roots, normal roots, and leaves in response to Pi supply. We de novo assembled 277,224,180 Illumina reads from 12 complementary DNA libraries to build what is to our knowledge the first white lupin gene index (LAGI 1.0). This index contains 125,821 unique sequences with an average length of 1,155 bp. Of these sequences, 50,734 were transcriptionally active (reads per kilobase per million reads ≥ 3), representing approximately 7.8% of the white lupin genome, using the predicted genome size of Lupinus angustifolius as a reference. We identified a total of 2,128 sequences differentially expressed in response to Pi deficiency with a 2-fold or greater change and P ≤ 0.05. Twelve sequences were consistently differentially expressed due to Pi deficiency stress in three species, Arabidopsis (Arabidopsis thaliana), potato (Solanum tuberosum), and white lupin, making them ideal candidates to monitor the Pi status of plants. Additionally, classic physiological experiments were coupled with RNA-Seq data to examine the role of cytokinin and gibberellic acid in Pi deficiency-induced cluster root development. This global gene expression analysis provides new insights into the biochemical and molecular mechanisms involved in the acclimation to Pi deficiency. PMID:23197803
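    The two thresholds quoted in this abstract (reads per kilobase per million reads ≥ 3 for transcriptional activity, and a 2-fold or greater change with P ≤ 0.05 for differential expression) can be written out as a short sketch. The function names and the example numbers are hypothetical, and the P value is taken as a given input because the abstract does not name the statistical test.

```python
def rpkm(read_count, length_bp, total_mapped_reads):
    """Reads per kilobase of sequence per million mapped reads."""
    return read_count * 1e9 / (length_bp * total_mapped_reads)


def is_active(read_count, length_bp, total_mapped_reads, threshold=3.0):
    """Apply the RPKM >= 3 activity cut-off quoted in the abstract."""
    return rpkm(read_count, length_bp, total_mapped_reads) >= threshold


def passes_de_filter(expr_a, expr_b, p_value, min_fold=2.0, max_p=0.05):
    """Apply the 2-fold and P <= 0.05 filter; p_value comes from an external test."""
    low, high = sorted((expr_a, expr_b))
    if low == 0:
        # Treat expression appearing from zero as an unbounded fold change.
        fold = float("inf") if high > 0 else 1.0
    else:
        fold = high / low
    return fold >= min_fold and p_value <= max_p


# Hypothetical sequence: 1,155 bp long, 40 reads, 10 million mapped reads.
print(rpkm(40, 1155, 10_000_000))
print(is_active(40, 1155, 10_000_000))
print(passes_de_filter(3.5, 9.1, 0.01))
```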

  6. Genome-wide analysis of KAP1, the 7SK snRNP complex, and RNA polymerase II

    Directory of Open Access Journals (Sweden)

    Ryan P. McNamara

    2016-03-01

    The transition of RNA polymerase II (Pol II) from transcription initiation into productive elongation in eukaryotic cells is regulated by the P-TEFb kinase, which phosphorylates the C-terminal domain of paused Pol II at promoter-proximal regions. Our recent study found that P-TEFb (in an inhibited state bound to the 7SK snRNP complex) interacts with the KAP1/TRIM28 transcriptional regulator, and that KAP1 and the 7SK snRNP co-occupy most gene promoters containing paused Pol II. Here we provide a detailed experimental description and analysis of the ChIP-seq datasets that have been deposited into Gene Expression Omnibus (GEO: GS72622), so that independent groups can replicate and expand upon these findings. We propose these datasets would provide valuable information for researchers studying mechanisms of transcriptional regulation including Pol II pausing and pause release. Keywords: P-TEFb/7SK snRNP, KAP1, RNA polymerase II, ChIP-seq, Transcription elongation

  7. Quantitative correlation between promoter methylation and messenger RNA levels of the reduced folate carrier

    Directory of Open Access Journals (Sweden)

    Kheradpour Albert

    2008-05-01

    Background: Methotrexate (MTX) uptake is mediated by the reduced folate carrier (RFC). Defective drug uptake in association with decreased RFC expression is a common mechanism of MTX resistance in many tumor types. Heavy promoter methylation was previously identified as a basis for the complete silencing of RFC in MDA-MB-231 breast cancer cells; its role and prevalence in RFC transcription regulation are, however, not widely studied. Methods: In the current study, RFC promoter methylation was assessed using methylation-specific PCR in a panel of malignant cell lines (n = 8), including MDA-MB-231 and M805, an MTX-resistant cell line directly established from the specimen of a patient with malignant fibrohistocytoma who received multiple doses of MTX. A quantitative real-time PCR approach for measuring the extent of RFC promoter methylation was developed and was validated by direct bisulfite genomic sequencing. RFC mRNA levels were determined by quantitative real-time RT-PCR and were related to the extent of promoter methylation in these cell lines. Results: A partial promoter methylation and RFC mRNA down-regulation were observed in M805. Using the quantitative approach, a reverse correlation (correlation coefficient = -0.59, p [...] Conclusion: This study further suggests that promoter methylation is a potential basis for MTX resistance. The quantitative correlation identified in this study implies that promoter methylation is possibly a mechanism involved in the fine regulation of RFC transcription.
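    The abstract reports a correlation coefficient of -0.59 between promoter methylation and mRNA level without saying how it was computed. The sketch below shows one plausible reading, a Pearson correlation across cell lines; the eight methylation and expression values are invented for illustration and are not the study's measurements.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical values for eight cell lines (illustration only).
methylation_percent = [5, 12, 20, 35, 40, 60, 75, 95]
relative_mrna = [1.00, 0.80, 0.95, 0.50, 0.70, 0.30, 0.45, 0.05]

r = pearson_r(methylation_percent, relative_mrna)
print(round(r, 2))  # negative: more methylation goes with less mRNA
```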

  8. Genome wide predictions of miRNA regulation by transcription factors.

    Science.gov (United States)

    Ruffalo, Matthew; Bar-Joseph, Ziv

    2016-09-01

    Reconstructing regulatory networks from expression and interaction data is a major goal of systems biology. While much work has focused on trying to experimentally and computationally determine the set of transcription-factors (TFs) and microRNAs (miRNAs) that regulate genes in these networks, relatively little work has focused on inferring the regulation of miRNAs by TFs. Such regulation can play an important role in several biological processes including development and disease. The main challenge for predicting such interactions is the very small positive training set currently available. Another challenge is the fact that a large fraction of miRNAs are encoded within genes making it hard to determine the specific way in which they are regulated. To enable genome wide predictions of TF-miRNA interactions, we extended semi-supervised machine-learning approaches to integrate a large set of different types of data including sequence, expression, ChIP-seq and epigenetic data. As we show, the methods we develop achieve good performance on both a labeled test set, and when analyzing general co-expression networks. We next analyze mRNA and miRNA cancer expression data, demonstrating the advantage of using the predicted set of interactions for identifying more coherent and relevant modules, genes, and miRNAs. The complete set of predictions is available on the supporting website and can be used by any method that combines miRNAs, genes, and TFs. Code and full set of predictions are available from the supporting website: http://cs.cmu.edu/~mruffalo/tf-mirna/ Contact: zivbj@cs.cmu.edu Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press.

  9. The Orthology Clause in the Next Generation Sequencing Era: Novel Reference Genes Identified by RNA-seq in Humans Improve Normalization of Neonatal Equine Ovary RT-qPCR Data.

    Directory of Open Access Journals (Sweden)

    Dragos Scarlet

    Vertebrate evolution is accompanied by a substantial conservation of transcriptional programs with more than a third of unique orthologous genes showing constrained levels of expression. Moreover, there are genes and exons exhibiting excellent expression stability according to RNA-seq data across a panel of eighteen tissues including the ovary (Human Body Map 2.0). We hypothesized that orthologs of these exons would also be highly uniformly expressed across neonatal ovaries of the horse, which would render them appropriate reference genes (RGs) for normalization of reverse transcription quantitative PCR (RT-qPCR) data in this context. The expression stability of eleven novel RGs (C1orf43, CHMP2A, EMC7, GPI, PSMB2, PSMB4, RAB7A, REEP5, SNRPD3, VCP and VPS29) was assessed by RT-qPCR in ovaries of seven neonatal fillies and compared to that of the expressed repetitive element ERE-B, two universal (OAZ1 and RPS29) and four traditional RGs (ACTB, GAPDH, UBB and B2M). Expression stability analyzed with the software tool RefFinder top ranked the normalization factor constituted of the genes SNRPD3 and VCP, a gene pair that is not co-expressed according to COEXPRESdb and GeneMANIA. The traditional RGs GAPDH, B2M, ACTB and UBB were only ranked 3rd and 12th to 14th, respectively. The functional diversity of the novel RGs likely facilitates expression studies over a wide range of physiological and pathological contexts related to the neonatal equine ovary. In addition, this study augments the potential for RT-qPCR-based profiling of human samples by introducing seven new human RG assays (C1orf43, CHMP2A, EMC7, GPI, RAB7A, VPS29 and UBB).
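    To make the idea of a two-gene normalization factor concrete, here is a minimal sketch of how RT-qPCR quantification cycles (Cq) might be normalized against a reference pair such as SNRPD3 and VCP. It assumes the common convention of taking the geometric mean of the reference genes' relative quantities and a perfect amplification efficiency of 2; neither detail is stated in the abstract, and the Cq values are hypothetical.

```python
from statistics import geometric_mean


def relative_quantity(cq, efficiency=2.0):
    """Convert a Cq value to a relative quantity (assumes a fixed efficiency)."""
    return efficiency ** (-cq)


def normalized_expression(target_cq, reference_cqs, efficiency=2.0):
    """Divide the target quantity by the geometric mean of the reference quantities."""
    normalization_factor = geometric_mean(
        [relative_quantity(cq, efficiency) for cq in reference_cqs]
    )
    return relative_quantity(target_cq, efficiency) / normalization_factor


# Hypothetical Cq values for one sample (illustration only).
reference_cqs = {"SNRPD3": 21.4, "VCP": 22.6}
print(normalized_expression(24.0, list(reference_cqs.values())))
```

    With an efficiency of 2, this is equivalent to 2 raised to the negative difference between the target Cq and the mean reference Cq.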

  10. Uncovering layers of human RNA polymerase II transcription

    DEFF Research Database (Denmark)

    Jensen, Torben Heick

    In recent years DNA microarray and high-throughput sequencing technologies have challenged the “gene-centric” view that pre-mRNA is the only RNA species transcribed off protein-coding genes. Instead unorthodox transcription from within genic- and intergenic regions has been demonstrated to occur...

  11. QNB: differential RNA methylation analysis for count-based small-sample sequencing data with a quad-negative binomial model.

    Science.gov (United States)

    Liu, Lian; Zhang, Shao-Wu; Huang, Yufei; Meng, Jia

    2017-08-31

    As a newly emerged research area, RNA epigenetics has drawn increasing attention recently because RNA methylation and other modifications participate in a number of crucial biological processes. Thanks to high-throughput sequencing techniques such as MeRIP-Seq, transcriptome-wide RNA methylation profiles are now available in the form of count-based data, with which it is often of interest to study the dynamics of the epitranscriptomic layer. However, the sample size of an RNA methylation experiment is usually very small due to its cost; additionally, there usually exist a large number of genes whose methylation level cannot be accurately estimated due to their low expression level, making differential RNA methylation analysis a difficult task. We present QNB, a statistical approach for differential RNA methylation analysis with count-based small-sample sequencing data. Compared with previous approaches such as the DRME model, which is based on a statistical test covering only the IP samples with 2 negative binomial distributions, QNB is based on 4 independent negative binomial distributions with their variances and means linked by local regressions, and in this way the input control samples are also properly taken care of. In addition, unlike the DRME approach, which relies only on the input control sample for estimating the background, QNB uses a more robust estimator for gene expression by combining information from both input and IP samples, which could largely improve the testing performance for very lowly expressed genes. QNB showed improved performance on both simulated and real MeRIP-Seq datasets when compared with competing algorithms. The QNB model is also applicable to other datasets related to RNA modifications, including but not limited to RNA bisulfite sequencing, m1A-Seq, Par-CLIP, RIP-Seq, etc.
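    As background for the negative binomial distributions mentioned above, the sketch below evaluates the negative binomial log-probability of a read count under a mean and dispersion parameterization, in which the variance is mean + dispersion × mean². This parameterization is a common one for count-based sequencing data and is assumed here; the sketch is not the QNB model itself, and the example numbers are arbitrary.

```python
import math


def nb_log_pmf(count, mean, dispersion):
    """Log-probability of `count` under a negative binomial with the given
    mean and dispersion (variance = mean + dispersion * mean**2)."""
    size = 1.0 / dispersion  # often written r
    return (
        math.lgamma(count + size)
        - math.lgamma(size)
        - math.lgamma(count + 1)
        + size * math.log(size / (size + mean))
        + count * math.log(mean / (size + mean))
    )


def nb_variance(mean, dispersion):
    """Variance implied by the mean and dispersion."""
    return mean + dispersion * mean ** 2


# Arbitrary example: expected count 5 with dispersion 0.5.
print(math.exp(nb_log_pmf(3, 5.0, 0.5)))
print(nb_variance(5.0, 0.5))
```

    Linking the dispersion to the mean, for example through a fitted trend, is what lets such models borrow strength across genes when only a few replicates are available.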

  12. Predictive modelling of gene expression from transcriptional regulatory elements.

    Science.gov (United States)

    Budden, David M; Hurley, Daniel G; Crampin, Edmund J

    2015-07-01

    Predictive modelling of gene expression provides a powerful framework for exploring the regulatory logic underpinning transcriptional regulation. Recent studies have demonstrated the utility of such models in identifying dysregulation of gene and miRNA expression associated with abnormal patterns of transcription factor (TF) binding or nucleosomal histone modifications (HMs). Despite the growing popularity of such approaches, a comparative review of the various modelling algorithms and feature extraction methods is lacking. We define and compare three methods of quantifying pairwise gene-TF/HM interactions and discuss their suitability for integrating the heterogeneous chromatin immunoprecipitation (ChIP)-seq binding patterns exhibited by TFs and HMs. We then construct log-linear and ϵ-support vector regression models from various mouse embryonic stem cell (mESC) and human lymphoblastoid (GM12878) data sets, considering both ChIP-seq- and position weight matrix- (PWM)-derived in silico TF-binding. The two algorithms are evaluated both in terms of their modelling prediction accuracy and ability to identify the established regulatory roles of individual TFs and HMs. Our results demonstrate that TF-binding and HMs are highly predictive of gene expression as measured by mRNA transcript abundance, irrespective of algorithm or cell type selection and considering both ChIP-seq and PWM-derived TF-binding. As we encourage other researchers to explore and develop these results, our framework is implemented using open-source software and made available as a preconfigured bootable virtual environment. © The Author 2014. Published by Oxford University Press.

  13. Real-time observation of the initiation of RNA polymerase II transcription.

    Science.gov (United States)

    Fazal, Furqan M; Meng, Cong A; Murakami, Kenji; Kornberg, Roger D; Block, Steven M

    2015-09-10

    Biochemical and structural studies have shown that the initiation of RNA polymerase II transcription proceeds in the following stages: assembly of the polymerase with general transcription factors and promoter DNA in a 'closed' preinitiation complex (PIC); unwinding of about 15 base pairs of the promoter DNA to form an 'open' complex; scanning downstream to a transcription start site; synthesis of a short transcript, thought to be about 10 nucleotides long; and promoter escape. Here we have assembled a 32-protein, 1.5-megadalton PIC derived from Saccharomyces cerevisiae, and observe subsequent initiation processes in real time with optical tweezers. Contrary to expectation, scanning driven by the transcription factor IIH involved the rapid opening of an extended transcription bubble, averaging 85 base pairs, accompanied by the synthesis of a transcript up to the entire length of the extended bubble, followed by promoter escape. PICs that failed to achieve promoter escape nevertheless formed open complexes and extended bubbles, which collapsed back to closed or open complexes, resulting in repeated futile scanning.

  14. RNA-sequencing-based transcriptome and biochemical analyses of steroidal saponin pathway in a complete set of Allium fistulosum—A. cepa monosomic addition lines

    Science.gov (United States)

    Abdelrahman, Mostafa; El-Sayed, Magdi; Sato, Shusei; Hirakawa, Hideki; Ito, Shin-ichi; Tanaka, Keisuke; Mine, Yoko; Sugiyama, Nobuo; Suzuki, Minoru; Yamauchi, Naoki

    2017-01-01

    The genus Allium is a rich source of steroidal saponins, and its medicinal properties have been attributed to these bioactive compounds. The saponin compounds with diverse structures play a pivotal role in Allium’s defense mechanism. Despite numerous studies on the occurrence and chemical structure of steroidal saponins, their biosynthetic pathway in Allium species is poorly understood. The monosomic addition lines (MALs) of the Japanese bunching onion (A. fistulosum, FF) with an extra chromosome from the shallot (A. cepa Aggregatum group, AA) are powerful genetic resources that enable us to understand many physiological traits of Allium. In the present study, we were able to isolate and identify Alliospiroside A saponin compound in A. fistulosum with extra chromosome 2A from shallot (FF2A) and its role in the defense mechanism against Fusarium pathogens. Furthermore, to gain molecular insight into the Allium saponin biosynthesis pathway, high-throughput RNA-Seq of the root, bulb, and leaf of AA, MALs, and FF was carried out using Illumina's HiSeq 2500 platform. An open access Allium Transcript Database (Allium TDB, http://alliumtdb.kazusa.or.jp) was generated based on RNA-Seq data. The resulting assembled transcripts were functionally annotated, revealing 50 unigenes involved in saponin biosynthesis. Differential gene expression (DGE) analyses of AA and MALs as compared with FF (as a control) revealed a strong up-regulation of the saponin downstream pathway, including cytochrome P450, glycosyltransferase, and beta-glucosidase in chromosome 2A. An understanding of the saponin compounds and biosynthesis-related genes would facilitate the development of plants with unique saponin content and, subsequently, improved disease resistance. PMID:28800607

  15. RNA-sequencing-based transcriptome and biochemical analyses of steroidal saponin pathway in a complete set of Allium fistulosum-A. cepa monosomic addition lines.

    Science.gov (United States)

    Abdelrahman, Mostafa; El-Sayed, Magdi; Sato, Shusei; Hirakawa, Hideki; Ito, Shin-Ichi; Tanaka, Keisuke; Mine, Yoko; Sugiyama, Nobuo; Suzuki, Yutaka; Yamauchi, Naoki; Shigyo, Masayoshi

    2017-01-01

    The genus Allium is a rich source of steroidal saponins, and its medicinal properties have been attributed to these bioactive compounds. The saponin compounds with diverse structures play a pivotal role in Allium's defense mechanism. Despite numerous studies on the occurrence and chemical structure of steroidal saponins, their biosynthetic pathway in Allium species is poorly understood. The monosomic addition lines (MALs) of the Japanese bunching onion (A. fistulosum, FF) with an extra chromosome from the shallot (A. cepa Aggregatum group, AA) are powerful genetic resources that enable us to understand many physiological traits of Allium. In the present study, we were able to isolate and identify Alliospiroside A saponin compound in A. fistulosum with extra chromosome 2A from shallot (FF2A) and its role in the defense mechanism against Fusarium pathogens. Furthermore, to gain molecular insight into the Allium saponin biosynthesis pathway, high-throughput RNA-Seq of the root, bulb, and leaf of AA, MALs, and FF was carried out using Illumina's HiSeq 2500 platform. An open access Allium Transcript Database (Allium TDB, http://alliumtdb.kazusa.or.jp) was generated based on RNA-Seq data. The resulting assembled transcripts were functionally annotated, revealing 50 unigenes involved in saponin biosynthesis. Differential gene expression (DGE) analyses of AA and MALs as compared with FF (as a control) revealed a strong up-regulation of the saponin downstream pathway, including cytochrome P450, glycosyltransferase, and beta-glucosidase in chromosome 2A. An understanding of the saponin compounds and biosynthesis-related genes would facilitate the development of plants with unique saponin content and, subsequently, improved disease resistance.

  16. Omic personality: implications of stable transcript and methylation profiles for personalized medicine.

    Science.gov (United States)

    Tabassum, Rubina; Sivadas, Ambily; Agrawal, Vartika; Tian, Haozheng; Arafat, Dalia; Gibson, Greg

    2015-08-13

    Personalized medicine is predicated on the notion that individual biochemical and genomic profiles are relatively constant in times of good health and to some extent predictive of disease or therapeutic response. We report a pilot study quantifying gene expression and methylation profile consistency over time, addressing the reasons for individual uniqueness, and its relation to N = 1 phenotypes. Whole blood samples from four African American women, four Caucasian women, and four Caucasian men drawn from the Atlanta Center for Health Discovery and Well Being study at three successive 6-month intervals were profiled by RNA-Seq, miRNA-Seq, and Illumina Methylation 450 K arrays. Standard regression approaches were used to evaluate the proportion of variance for each type of omic measure among individuals, and to quantify correlations among measures and with clinical attributes related to wellness. Longitudinal omic profiles were in general highly consistent over time, with an average of 67 % variance in transcript abundance, 42 % in CpG methylation level (but 88 % for the most differentiated CpG per gene), and 50 % in miRNA abundance among individuals, which are all comparable to 74 % variance among individuals for 74 clinical traits. One third of the variance could be attributed to differential blood cell type abundance, which was also fairly stable over time, and a lesser amount to expression quantitative trait loci (eQTL) effects. Seven conserved axes of covariance that capture diverse aspects of immune function explained over half of the variance. These axes also explained a considerable proportion of individually extreme transcript abundance, namely approximately 100 genes that were significantly up-regulated or down-regulated in each person and were in some cases enriched for relevant gene activities that plausibly associate with clinical attributes. A similar fraction of genes had individually divergent methylation levels, but these did not overlap with the

  17. Structure of noncoding RNA is a determinant of function of RNA binding proteins in transcriptional regulation

    Directory of Open Access Journals (Sweden)

    Oyoshi Takanori

    2012-01-01

    The majority of the noncoding regions of mammalian genomes have been found to be transcribed to generate noncoding RNAs (ncRNAs), resulting in intense interest in their biological roles. During the past decade, numerous ncRNAs and aptamers have been identified as regulators of transcription. 6S RNA, first described as a ncRNA in E. coli, mimics an open promoter structure, which has a large bulge with two hairpin/stalk structures that regulate transcription through interactions with RNA polymerase. B2 RNA, which has stem-loops and unstructured single-stranded regions, represses transcription of mRNA in response to various stresses, including heat shock in mouse cells. The interaction of TLS (translocated in liposarcoma) with CBP/p300 was induced by ncRNAs that bind to TLS, and this in turn results in inhibition of CBP/p300 histone acetyltransferase (HAT) activity in human cells. Transcription regulator EWS (Ewing's sarcoma), which is highly related to TLS, and TLS specifically bind to G-quadruplex structures in vitro. The carboxy termini containing the Arg-Gly-Gly (RGG) repeat domains in these proteins are necessary for cis-repression of transcription activation and HAT activity by the N-terminal glutamine-rich domain. In particular, the RGG domain in the carboxy terminus of EWS is important for the G-quadruplex specific binding. Together, these data suggest that functions of EWS and TLS are modulated by specific structures of ncRNAs.

  18. MicroRNA-dependent regulation of transcription in non-small cell lung cancer.

    Directory of Open Access Journals (Sweden)

    Sonia Molina-Pinelo

    Squamous cell lung cancer (SCC) and adenocarcinoma are the most common histological subtypes of non-small cell lung cancer (NSCLC), and have been traditionally managed in the clinic as a single entity. Increasing evidence, however, illustrates the biological diversity of these two histological subgroups of lung cancer, and supports the need to improve our understanding of the molecular basis beyond the different phenotypes if we aim to develop more specific and individualized targeted therapy. The purpose of this study was to identify microRNA (miRNA)-dependent transcriptional regulation differences between SCC and adenocarcinoma histological lung cancer subtypes. In this work, paired miRNA (667 miRNAs by TaqMan Low Density Arrays (TLDA)) and mRNA profiling (Whole Genome 44 K array G112A, Agilent) was performed in tumor samples of 44 NSCLC patients. Nine miRNAs and 56 mRNAs were found to be differentially expressed in SCC versus adenocarcinoma samples. Eleven of these 56 mRNAs were predicted as targets of the miRNAs identified to be differently expressed in these two histological conditions. Of them, 6 miRNAs (miR-149, miR-205, miR-375, miR-378, miR-422a and miR-708) and 9 target genes (CEACAM6, CGN, CLDN3, ABCC3, MLPH, ACSL5, TMEM45B, MUC1) were validated by quantitative PCR in an independent cohort of 41 lung cancer patients. Furthermore, the inverse correlation between mRNAs and microRNAs expression was also validated. These results suggest miRNA-dependent transcriptional regulation differences play an important role in determining key hallmarks of NSCLC, and may provide new biomarkers for personalized treatment strategies.

  19. Transcriptome analysis of Acidovorax avenae subsp. avenae cultivated in vivo and co-culture with Burkholderia seminalis.

    Science.gov (United States)

    Li, Bin; Ibrahim, Muhammad; Ge, Mengyu; Cui, Zhouqi; Sun, Guochang; Xu, Fei; Kube, Michael

    2014-07-16

    Response of bacterial pathogen to environmental bacteria and its host is critical for understanding of microbial adaption and pathogenesis. Here, we used RNA-Seq to comprehensively and quantitatively assess the transcriptional response of Acidovorax avenae subsp. avenae strain RS-1 cultivated in vitro, in vivo and in co-culture with rice rhizobacterium Burkholderia seminalis R456. Results revealed a slight response to other bacteria, but a strong response to host. In particular, a large number of virulence associated genes encoding Type I to VI secretion systems, 118 putative non-coding RNAs, and 7 genomic islands (GIs) were differentially expressed in vivo based on comparative genomic and transcriptomic analyses. Furthermore, the loss of virulence for knockout mutants of 11 differentially expressed T6SS genes emphasized the importance of these genes in bacterial pathogenicity. In addition, the reliability of expression data obtained by RNA-Seq was supported by quantitative real-time PCR of the 25 selected T6SS genes. Overall, this study highlighted the role of differentially expressed genes in elucidating bacterial pathogenesis based on combined analysis of RNA-Seq data and knockout of T6SS genes.

  20. Understanding Strategy of Nitrate and Urea Assimilation in a Chinese Strain of Aureococcus anophagefferens through RNA-Seq Analysis

    Science.gov (United States)

    Dong, Hong-Po; Huang, Kai-Xuan; Wang, Hua-Long; Lu, Song-Hui; Cen, Jing-Yi; Dong, Yue-Lei

    2014-01-01

    Aureococcus anophagefferens is a harmful alga that dominates plankton communities during brown tides in North America, Africa, and Asia. Here, RNA-seq technology was used to profile the transcriptome of a Chinese strain of A. anophagefferens that was grown on urea, nitrate, and a mixture of urea and nitrate, and that was under N-replete, limited and recovery conditions to understand the molecular mechanisms that underlie nitrate and urea utilization. The number of differentially expressed genes between urea-grown and mixture N-grown cells was much lower than that between urea-grown and nitrate-grown cells. Compared with nitrate-grown cells, mixture N-grown cells contained much lower levels of transcripts encoding proteins that are involved in nitrate transport and assimilation. Together with profiles of nutrient changes in media, these results suggest that A. anophagefferens primarily feeds on urea instead of nitrate when urea and nitrate co-exist. Furthermore, we noted that transcripts upregulated by nitrate and N-limitation included those encoding proteins involved in amino acid and nucleotide transport, degradation of amides and cyanates, and the nitrate assimilation pathway. The data suggest that A. anophagefferens possesses an ability to utilize a variety of dissolved organic nitrogen. Moreover, transcripts for synthesis of proteins, glutamate-derived amino acids, spermines and sterols were upregulated by urea. Transcripts encoding key enzymes that are involved in the ornithine-urea cycle (OUC) and the TCA cycle were differentially regulated by urea and nitrogen concentration, which suggests that the OUC may be linked to the TCA cycle and involved in reallocation of intracellular carbon and nitrogen. These genes regulated by urea may be crucial for the rapid proliferation of A. anophagefferens when urea is provided as the N source. PMID:25338000

  1. Drosophila Imp iCLIP identifies an RNA assemblage coordinating F-actin formation

    DEFF Research Database (Denmark)

    Hansen, Heidi Theil; Rasmussen, Simon Horskjær; Adolph, Sidsel Kramshøj

    2015-01-01

    BACKGROUND: Post-transcriptional RNA regulons ensure co-ordinated expression of monocistronic mRNAs encoding functionally related proteins. In this study, we employ a combination of RIP-seq and short- and long-wave individual-nucleotide resolution crosslinking and immunoprecipitation (iCLIP...

  2. The Mediator Complex: At the Nexus of RNA Polymerase II Transcription.

    Science.gov (United States)

    Jeronimo, Célia; Robert, François

    2017-10-01

    Mediator is an essential, large, multisubunit, transcriptional co-activator highly conserved across eukaryotes. Mediator interacts with gene-specific transcription factors at enhancers as well as with the RNA polymerase II (RNAPII) transcription machinery bound at promoters. It also interacts with several other factors involved in various aspects of transcription, chromatin regulation, and mRNA processing. Hence, Mediator is at the nexus of RNAPII transcription, regulating its many steps and connecting transcription with co-transcriptional events. To achieve this flexible role, Mediator, which is divided into several functional modules, reorganizes its conformation and composition while making transient contacts with other components. Here, we review the mechanisms of action of Mediator and propose a unifying model for its function. Copyright © 2017 Elsevier Ltd. All rights reserved.

  3. Boiler: lossy compression of RNA-seq alignments using coverage vectors.

    Science.gov (United States)

    Pritt, Jacob; Langmead, Ben

    2016-09-19

    We describe Boiler, a new software tool for compressing and querying large collections of RNA-seq alignments. Boiler discards most per-read data, keeping only a genomic coverage vector plus a few empirical distributions summarizing the alignments. Since most per-read data is discarded, storage footprint is often much smaller than that achieved by other compression tools. Despite this, the most relevant per-read data can be recovered; we show that Boiler compression has only a slight negative impact on results given by downstream tools for isoform assembly and quantification. Boiler also allows the user to pose fast and useful queries without decompressing the entire file. Boiler is free open source software available from github.com/jpritt/boiler. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.
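
    To make the idea of a genomic coverage vector concrete, here is a minimal sketch that derives per-base read depth from alignment intervals and then run-length encodes it. The function names, the half-open interval convention, and the choice of run-length encoding are assumptions for illustration; the abstract does not specify Boiler's actual storage format.

```python
from itertools import groupby

def coverage_vector(alignments, region_length):
    """Per-base read depth over [0, region_length).

    `alignments` is a list of half-open (start, end) intervals.
    Uses a difference array so each read costs O(1) to add.
    """
    diff = [0] * (region_length + 1)
    for start, end in alignments:
        diff[start] += 1
        diff[end] -= 1
    coverage, depth = [], 0
    for delta in diff[:region_length]:
        depth += delta
        coverage.append(depth)
    return coverage

def run_length_encode(values):
    """Collapse consecutive equal values into (value, run_length) pairs."""
    return [(value, len(list(run))) for value, run in groupby(values)]

# Four hypothetical reads on a 10-base region.
reads = [(0, 4), (2, 6), (2, 6), (8, 10)]
cov = coverage_vector(reads, 10)
rle = run_length_encode(cov)
print(cov)  # [1, 1, 3, 3, 2, 2, 0, 0, 1, 1]
print(rle)  # [(1, 2), (3, 2), (2, 2), (0, 2), (1, 2)]
```

    The sketch shows why this representation is lossy: the coverage vector no longer records which reads produced each depth value, so per-read details can only be estimated afterwards.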

  4. Semi-Nested Real-Time Reverse Transcription Polymerase Chain Reaction Methods for the Successful Quantitation of Cytokeratin mRNA Expression Levels for the Subtyping of Non-Small-Cell Lung Carcinoma Using Paraffin-Embedded and Microdissected Lung Biopsy Specimens

    International Nuclear Information System (INIS)

    Nakanishi, Yoko; Shimizu, Tetsuo; Tsujino, Ichiro; Obana, Yukari; Seki, Toshimi; Fuchinoue, Fumi; Ohni, Sumie; Oinuma, Toshinori; Kusumi, Yoshiaki; Yamada, Tsutomu; Takahashi, Noriaki; Hashimoto, Shu; Nemoto, Norimichi

    2013-01-01

    In patients with inoperable advanced non-small cell lung carcinomas (NSCLCs), histological subtyping using small-mount biopsy specimens was often required to decide the indications for drug treatment. The aim of this study was to assess the utility of highly sensitive mRNA quantitation for the subtyping of advanced NSCLC using small formalin-fixed, paraffin-embedded (FFPE) biopsy samples. Cytokeratin (CK) 6, CK7, CK14, CK18, and thyroid transcription factor (TTF)-1 mRNA expression levels were measured using semi-nested real-time quantitative (snq) reverse-transcribed polymerase chain reaction (RT-PCR) in microdissected tumor cells collected from 52 lung biopsies. Our results using the present snqRT-PCR method showed an improvement in mRNA quantitation from small FFPE samples, and the mRNA expression level using snqRT-PCR was correlated with the immunohistochemical protein expression level. CK7, CK18, and TTF-1 mRNA were expressed at significantly higher levels (P<0.05) in adenocarcinoma (AD) than in squamous cell carcinoma (SQ), while CK6 and CK14 mRNA expression was significantly higher (P<0.05) in SQ than in AD. Each histology-specific CK, particularly CK18 in AD and CK6 in SQ, was shown to be correlated with a poor prognosis (P=0.02, 0.02, respectively). Our results demonstrated that a quantitative CK subtype mRNA analysis from lung biopsy samples can be useful for predicting the histology subtype and prognosis of advanced NSCLC.

  5. OccuPeak: ChIP-Seq peak calling based on internal background modelling

    NARCIS (Netherlands)

    de Boer, Bouke A.; van Duijvenboden, Karel; van den Boogaard, Malou; Christoffels, Vincent M.; Barnett, Phil; Ruijter, Jan M.

    2014-01-01

    ChIP-seq has become a major tool for the genome-wide identification of transcription factor binding or histone modification sites. Most peak-calling algorithms require input control datasets to model the occurrence of background reads to account for local sequencing and GC bias. However, the

  6. Widespread anti-sense transcription in apple is correlated with siRNA production and indicates a large potential for transcriptional and/or post-transcriptional control.

    Science.gov (United States)

    Celton, Jean-Marc; Gaillard, Sylvain; Bruneau, Maryline; Pelletier, Sandra; Aubourg, Sébastien; Martin-Magniette, Marie-Laure; Navarro, Lionel; Laurens, François; Renou, Jean-Pierre

    2014-07-01

    Characterizing the transcriptome of eukaryotic organisms is essential for studying gene regulation and its impact on phenotype. The realization that anti-sense (AS) and noncoding RNA transcription is pervasive in many genomes has emphasized our limited understanding of gene transcription and post-transcriptional regulation. Numerous mechanisms including convergent transcription, anti-correlated expression of sense and AS transcripts, and RNAi remain ill-defined. Here, we have combined microarray analysis and high-throughput sequencing of small RNAs (sRNAs) to unravel the complexity of transcriptional and potential post-transcriptional regulation in eight organs of apple (Malus × domestica). The percentage of AS transcript expression is higher than that identified in annual plants such as rice and Arabidopsis thaliana. Furthermore, we show that a majority of AS transcripts are transcribed beyond 3'UTR regions, and may cover a significant portion of the predicted sense transcripts. Finally we demonstrate at a genome-wide scale that anti-sense transcript expression is correlated with the presence of both short (21-23 nt) and long (> 30 nt) siRNAs, and that the sRNA coverage depth varies with the level of AS transcript expression. Our study provides a new insight on the functional role of anti-sense transcripts at the genome-wide level, and a new basis for the understanding of sRNA biogenesis in plants. © 2014 INRA. New Phytologist © 2014 New Phytologist Trust.

  7. Efficient computation of co-transcriptional RNA-ligand interaction dynamics.

    Science.gov (United States)

    Wolfinger, Michael T; Flamm, Christoph; Hofacker, Ivo L

    2018-05-04

    Riboswitches form an abundant class of cis-regulatory RNA elements that mediate gene expression by binding a small metabolite. For synthetic biology applications, they are becoming cheap and accessible systems for selectively triggering transcription or translation of downstream genes. Many riboswitches are kinetically controlled, hence knowledge of their co-transcriptional mechanisms is essential. We present here an efficient implementation for analyzing co-transcriptional RNA-ligand interaction dynamics. This approach allows for the first time to model concentration-dependent metabolite binding/unbinding kinetics. We exemplify this novel approach by means of the recently studied I-A 2'-deoxyguanosine (2'dG)-sensing riboswitch from Mesoplasma florum. Copyright © 2018 Elsevier Inc. All rights reserved.

  8. Defining Brugia malayi and Wolbachia symbiosis by stage-specific dual RNA-seq.

    Directory of Open Access Journals (Sweden)

    Alexandra Grote

    2017-03-01

    Filarial nematodes currently infect up to 54 million people worldwide, with millions more at risk for infection, representing the leading cause of disability in the developing world. Brugia malayi is one of the causative agents of lymphatic filariasis and remains the only human filarial parasite that can be maintained in small laboratory animals. Many filarial nematode species, including B. malayi, carry an obligate endosymbiont, the alpha-proteobacteria Wolbachia, which can be eliminated through antibiotic treatment. Elimination of the endosymbiont interferes with development, reproduction, and survival of the worms within the mammalian host, a clear indicator that the Wolbachia are crucial for survival of the parasite. Little is understood about the mechanism underlying this symbiosis. To better understand the molecular interplay between these two organisms we profiled the transcriptomes of B. malayi and Wolbachia by dual RNA-seq across the life cycle of the parasite. This helped identify functional pathways involved in this essential symbiotic relationship provided by the co-expression of nematode and bacterial genes. We have identified significant stage-specific and gender-specific differential expression in Wolbachia during the nematode's development. For example, during female worm development we find that Wolbachia upregulate genes involved in ATP production and purine biosynthesis, as well as genes involved in the oxidative stress response. This global transcriptional analysis has highlighted specific pathways to which both Wolbachia and B. malayi contribute concurrently over the life cycle of the parasite, paving the way for the development of novel intervention strategies.

  9. Data in support of transcriptional regulation and function of Fas-antisense long noncoding RNA during human erythropoiesis

    Directory of Open Access Journals (Sweden)

    Olga Villamizar

    2016-06-01

    This paper describes data related to a research article titled, “Fas-antisense long noncoding RNA is differentially expressed during maturation of human erythrocytes and confers resistance to Fas-mediated cell death” [1]. Long noncoding RNAs (lncRNAs) are increasingly appreciated for their capacity to regulate many steps of gene expression. While recent studies suggest that many lncRNAs are functional, the scope of their actions throughout human biology is largely undefined, including human red blood cell development (erythropoiesis). Here we include expression data for 82 lncRNAs during early, intermediate and late stages of human erythropoiesis using a commercial qPCR Array. From these data, we identified lncRNA Fas-antisense 1 (Fas-AS1 or Saf) described in the research article. Also included are 5′ untranslated sequences (UTR) for lncRNA Saf with transcription factor target sequences identified. Quantitative RT-PCR data demonstrate relative levels of critical erythroid transcription factors, GATA-1 and KLF1, in K562 human erythroleukemia cells and maturing erythroblasts derived from human CD34+ cells. End point and quantitative RT-PCR data for cDNA prepared using random hexamers versus oligo(dT)18 revealed that lncRNA Saf is not effectively polyadenylated. Finally, we include flow cytometry histograms demonstrating Fas levels on maturing erythroblasts derived from human CD34+ cells transduced using mock conditions or with lentivirus particles encoding for Saf.

  10. RNA Polymerase II–The Transcription Machine

    Indian Academy of Sciences (India)

    Resonance – Journal of Science Education, Volume 12, Issue 3, March 2007, pp 47-53. General Article: RNA Polymerase II – The Transcription Machine - Nobel Prize in Chemistry 2006. Jiyoti Verma, Aruna Naorem, Anand Kumar, Manimala Sen, Parag Sadhale. ...

  11. Computational prediction of miRNA genes from small RNA sequencing data

    Directory of Open Access Journals (Sweden)

    Wenjing Kang

    2015-01-01

    Next-generation sequencing now for the first time allows researchers to gauge the depth and variation of entire transcriptomes. However, now as rare transcripts can be detected that are present in cells at single copies, more advanced computational tools are needed to accurately annotate and profile them. miRNAs are 22 nucleotide small RNAs (sRNAs) that post-transcriptionally reduce the output of protein coding genes. They have established roles in numerous biological processes, including cancers and other diseases. During miRNA biogenesis, the sRNAs are sequentially cleaved from precursor molecules that have a characteristic hairpin RNA structure. The vast majority of new miRNA genes that are discovered are mined from small RNA sequencing (sRNA-seq), which can detect more than a billion RNAs in a single run. However, given that many of the detected RNAs are degradation products from all types of transcripts, the accurate identification of miRNAs remains a non-trivial computational problem. Here we review the tools available to predict animal miRNAs from sRNA sequencing data. We present tools for generalist and specialist use cases, including prediction from massively pooled data or in species without reference genome. We also present wet-lab methods used to validate predicted miRNAs, and approaches to computationally benchmark prediction accuracy. For each tool, we reference validation experiments and benchmarking efforts. Last, we discuss the future of the field.

  12. Computational Methods for Quality Check, Preprocessing and Normalization of RNA-Seq Data for Systems Biology and Analysis

    DEFF Research Database (Denmark)

    Mazzoni, Gianluca; Kadarmideen, Haja N.

    2016-01-01

    quality control, trimming and filtering procedures, alignment, postmapping quality control, counting, normalization and differential expression test. For each step, we present the most common tools and we give a complete description of their main characteristics and advantages focusing on the statistics......The use of RNA sequencing (RNA-Seq) technologies is increasing mainly due to the development of new next-generation sequencing machines that have reduced the costs and the time needed for data generation. Nevertheless, microarrays are still the more common choice and one of the reasons...

  13. Gene expression change in human dental pulp cells exposed to a low-level toxic concentration of triethylene glycol dimethacrylate: an RNA-seq analysis.

    Science.gov (United States)

    Cho, Sung-Geun; Lee, Jin-Woo; Heo, Jung Sun; Kim, Sun-Young

    2014-09-01

    Dental composite resin restoration of a defective tooth may cause unpolymerized resin monomers to leach into dental pulp tissue. The aim of this study was to investigate the early gene expression change over time of human dental pulp cells (HDPCs) treated with a low-level toxic concentration of Triethylene Glycol Dimethacrylate (TEGDMA), a common dental resin monomer, by adopting the novel high-throughput transcriptome analysis of RNA-seq. The low-level toxic concentration of TEGDMA was determined through MTT assays with serially diluted concentrations. After the HDPCs were exposed to TEGDMA for 6, 12, 24 or 48 hr, the total RNA of the samples was prepared for RNA-seq. qRT-PCR for several genes was performed for validation of RNA-seq results. In the treated group, 1280 genes were differentially expressed compared with the control group. Five patterns of time-series gene expression profiles were identified through k-means clustering analysis. Angiogenesis, cell adhesion and migration, extracellular matrix organization, response to extracellular stimulus, inflammatory response and mineralization-related process were major gene ontology terms in functional annotation clustering. HMOX1, OSGIN1, SMN2, SRXN1, AKR1C1, SPP1 and TOMM40L were highly up-regulated genes, and WRAP53 and CCL2 were highly down-regulated genes over time. qRT-PCR for several genes exhibited a high level of agreement with RNA-seq. TEGDMA induced the HDPCs to show massive and dynamic gene expression changes over time. The previously suggested toxic mechanism of TEGDMA was not only verified, but new genes whose functions have yet to be determined were also found. © 2014 Nordic Association for the Publication of BCPT (former Nordic Pharmacological Society).

  14. Development and validation of quantitative PCR assays to measure cytokine transcript levels in the Florida manatee (Trichechus manatus latirostris)

    Science.gov (United States)

    Ferrante, Jason; Hunter, Margaret; Wellehan, James F.X.

    2018-01-01

    Cytokines have important roles in the mammalian response to viral and bacterial infections, trauma, and wound healing. Because of early cytokine production after physiologic stresses, the regulation of messenger RNA (mRNA) transcripts can be used to assess immunologic responses before changes in protein production. To detect and assess early immune changes in endangered Florida manatees (Trichechus manatus latirostris), we developed and validated a panel of quantitative PCR assays to measure mRNA transcription levels for the cytokines interferon (IFN)-γ; interleukin (IL)-2, -6, and -10; tumor necrosis factor-α, and the housekeeping genes glyceraldehyde-3-phosphate dehydrogenase (GAPDH) and β-actin (reference genes). Assays were successfully validated using blood samples from free-ranging, apparently healthy manatees from the east and west coasts of central Florida. No cytokine or housekeeping gene transcription levels were significantly different among age classes or sexes. However, the transcription levels for GAPDH, IL-2, IL-6, and IFN-γ were significantly higher (Puse as a reference gene in future studies. Our assays can aid in the investigation of manatee immune response to physical trauma and novel or ongoing environmental stressors.

  15. Sequence-specific bias correction for RNA-seq data using recurrent neural networks.

    Science.gov (United States)

    Zhang, Yao-Zhong; Yamaguchi, Rui; Imoto, Seiya; Miyano, Satoru

    2017-01-25

    The recent success of deep learning techniques in machine learning and artificial intelligence has stimulated a great deal of interest among bioinformaticians, who now wish to bring the power of deep learning to bear on a host of bioinformatical problems. Deep learning is ideally suited for biological problems that require automatic or hierarchical feature representation for biological data when prior knowledge is limited. In this work, we address the sequence-specific bias correction problem for RNA-seq data using Recurrent Neural Networks (RNNs) to model nucleotide sequences without pre-determining sequence structures. The sequence-specific bias of a read is then calculated based on the sequence probabilities estimated by RNNs, and used in the estimation of gene abundance. We explore the application of two popular RNN recurrent units for this task and demonstrate that RNN-based approaches provide a flexible way to model nucleotide sequences without knowledge of predetermined sequence structures. Our experiments show that training an RNN-based nucleotide sequence model is efficient and RNN-based bias correction methods compare well with the state-of-the-art sequence-specific bias correction method on the commonly used MAQC-III data set. RNNs provide an alternative and flexible way to calculate sequence-specific bias without explicitly pre-determining sequence structures.

  16. RNA-Seq transcriptomics and pathway analyses reveal potential regulatory genes and molecular mechanisms in high- and low-residual feed intake in Nordic dairy cattle

    DEFF Research Database (Denmark)

    Salleh, M. S.; Mazzoni, G.; Höglund, J. K.

    2017-01-01

    -throughput RNA sequencing data of liver biopsies from 19 dairy cows were used to identify differentially expressed genes (DEGs) between high- and low-FE groups of cows (based on Residual Feed Intake or RFI). Subsequently, a profile of the pathways connecting the DEGs to FE was generated, and a list of candidate...... genes and biomarkers was derived for their potential inclusion in breeding programmes to improve FE. The bovine RNA-Seq gene expression data from the liver was analysed to identify DEGs and, subsequently, identify the molecular mechanisms, pathways and possible candidate biomarkers of feed efficiency....... On average, 57 million reads (short reads or short mRNA sequences ...

  17. mRNA-Seq Reveals Novel Molecular Mechanisms and a Robust Fingerprint in Graves' Disease

    Science.gov (United States)

    Sachidanandam, Ravi; Morshed, Syed; Latif, Rauf; Shi, Ruijin; Davies, Terry F.

    2014-01-01

    Context: The immune response in autoimmune thyroid disease has been shown to occur primarily within the thyroid gland in which the most abundant antigens can be found. A variety of capture molecules are known to be expressed by thyroid epithelial cells and serve to attract and help retain an intrathyroidal immune infiltrate. Objective: To explore the entire repertoire of expressed genes in human thyroid tissue, we have deep sequenced the transcriptome (referred to as mRNA-Seq). Design and Patients: We applied mRNA-Seq to thyroid tissue from nine patients with Graves' disease subjected to total thyroidectomy and compared the data with 12 samples of normal thyroid tissue obtained from patients having a thyroid nodule removed. The expression for each gene was calculated from the sequencing data by taking the median of the coverage across the length of the gene. The expression levels were quantile normalized and a gene signature was derived from these. Results: On comparison of expression levels in tissues derived from Graves' patients and controls, there was clear evidence for overexpression of the antigen presentation pathway consisting of HLA and associated genes. We also found a robust disease signature and discovered active innate and adaptive immune signaling networks. Conclusions: These data reveal an active immune defense system in Graves' disease, which involves novel molecular mechanisms in its pathogenesis and development. PMID:24971664
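
    The two quantification steps named in this abstract (median coverage per gene, then quantile normalization across samples) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the function names and toy numbers are hypothetical, and ties are broken by position rather than averaged.

```python
from statistics import median

def gene_expression(coverage, start, end):
    """Expression of a gene as the median per-base coverage over [start, end)."""
    return median(coverage[start:end])

def quantile_normalize(samples):
    """Quantile-normalize equal-length expression lists, one list per sample.

    Each value is replaced by the mean, across samples, of the values
    holding the same rank. Ties are broken by position (a simplification).
    """
    n_genes = len(samples[0])
    sorted_samples = [sorted(s) for s in samples]
    rank_means = [
        sum(s[rank] for s in sorted_samples) / len(samples)
        for rank in range(n_genes)
    ]
    normalized = []
    for sample in samples:
        order = sorted(range(n_genes), key=lambda i: sample[i])
        out = [0.0] * n_genes
        for rank, gene_index in enumerate(order):
            out[gene_index] = rank_means[rank]
        normalized.append(out)
    return normalized

# Toy per-base coverage with one hypothetical gene spanning positions 1..5.
coverage = [0, 3, 5, 4, 4, 9, 0]
expr = gene_expression(coverage, 1, 6)

# Toy expression values for three genes in two samples.
norm = quantile_normalize([[5, 2, 3], [4, 1, 6]])
print(expr)  # 4
print(norm)  # [[5.5, 1.5, 3.5], [3.5, 1.5, 5.5]]
```

    After normalization every sample shares the same set of values, so differences between samples reflect only changes in gene rank, which is the property that makes the samples comparable.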

  18. Quantitative transcription dynamic analysis reveals candidate genes and key regulators for ethanol tolerance in Saccharomyces cerevisiae

    Directory of Open Access Journals (Sweden)

    Ma Menggen

    2010-06-01

    Background: Derived from our lignocellulosic conversion inhibitor-tolerant yeast, we generated an ethanol-tolerant strain Saccharomyces cerevisiae NRRL Y-50316 by enforced evolutionary adaptation. Using a newly developed robust mRNA reference and a master equation unifying gene expression data analyses, we investigated comparative quantitative transcription dynamics of 175 genes selected from previous studies for an ethanol-tolerant yeast and its closely related parental strain. Results: A highly fitted master equation was established and applied for quantitative gene expression analyses using pathway-based qRT-PCR array assays. The ethanol-tolerant Y-50316 displayed significantly enriched background of mRNA abundance for at least 35 genes without ethanol challenge compared with its parental strain Y-50049. Under the ethanol challenge, the tolerant Y-50316 responded in consistent expressions over time for numerous genes belonging to groups of heat shock proteins, trehalose metabolism, glycolysis, pentose phosphate pathway, fatty acid metabolism, amino acid biosynthesis, pleiotropic drug resistance gene family and transcription factors. The parental strain showed repressed expressions for many genes and was unable to withstand the ethanol stress and establish a viable culture and fermentation. The distinct expression dynamics between the two strains and their close association with cell growth, viability and ethanol fermentation profiles distinguished the tolerance-response from the stress-response in yeast under the ethanol challenge. At least 82 genes were identified as candidate and key genes for ethanol-tolerance and subsequent fermentation under the stress. Among these, 36 genes were newly recognized by the present study. Most of the ethanol-tolerance candidate genes were found to share protein binding motifs of transcription factors Msn4p/Msn2p, Yap1p, Hsf1p and Pdr1p/Pdr3p. Conclusion: Enriched background of transcription abundance

  19. 3USS: a web server for detecting alternative 3'UTRs from RNA-seq experiments.

    KAUST Repository

    Le Pera, Loredana; Mazzapioda, Mariagiovanna; Tramontano, Anna

    2015-01-01

    Protein-coding genes with multiple alternative polyadenylation sites can generate mRNA 3'UTR sequences of different lengths, thereby causing the loss or gain of regulatory elements, which can affect stability, localization and translation efficiency. 3USS is a web-server developed with the aim of giving experimentalists the possibility to automatically identify alternative 3'UTRs (shorter or longer with respect to a reference transcriptome), an option that is not available in standard RNA-seq data analysis procedures. The tool reports as putative novel the 3'UTRs not annotated in available databases. Furthermore, if data from two related samples are uploaded, common and specific alternative 3'UTRs are identified and reported by the server. 3USS is freely available at http://www.biocomputing.it/3uss_server.

  20. 3USS: a web server for detecting alternative 3'UTRs from RNA-seq experiments.

    KAUST Repository

    Le Pera, Loredana

    2015-01-22

    Protein-coding genes with multiple alternative polyadenylation sites can generate mRNA 3'UTR sequences of different lengths, thereby causing the loss or gain of regulatory elements, which can affect stability, localization and translation efficiency. 3USS is a web-server developed with the aim of giving experimentalists the possibility to automatically identify alternative 3'UTRs (shorter or longer with respect to a reference transcriptome), an option that is not available in standard RNA-seq data analysis procedures. The tool reports as putative novel the 3'UTRs not annotated in available databases. Furthermore, if data from two related samples are uploaded, common and specific alternative 3'UTRs are identified and reported by the server. 3USS is freely available at http://www.biocomputing.it/3uss_server.