Bryce, C. F. A.
A very simple computer program written in BASIC produces a very large number of randomly generated DNA or RNA sequences. Students use these sequences to predict complementary sequences and translational products, evaluate base compositions, determine frequencies of particular triplet codons, and suggest possible secondary structures.…
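The exercises described above can be sketched in a few lines of Python. The original program was written in BASIC and is not reproduced in the abstract, so everything here is an illustrative assumption: the function names, the uniform base probabilities, and the use of the standard genetic code are not taken from the source.

```python
import random
from collections import Counter

DNA_BASES = "ACGT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Standard genetic code, codons ordered with bases T, C, A, G
# (assumed here; the abstract does not say which code table was used).
_CODE_ORDER = "TCAG"
_AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"


def random_dna(length, seed=None):
    """Return a random DNA string with equal probability for each base."""
    rng = random.Random(seed)
    return "".join(rng.choice(DNA_BASES) for _ in range(length))


def reverse_complement(seq):
    """Return the complementary strand, written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def base_composition(seq):
    """Return the fraction of each base in the sequence."""
    counts = Counter(seq)
    return {base: counts.get(base, 0) / len(seq) for base in DNA_BASES}


def codon_counts(seq):
    """Count triplet codons in reading frame 0, ignoring trailing bases."""
    usable = len(seq) - len(seq) % 3
    return Counter(seq[i:i + 3] for i in range(0, usable, 3))


def translate(seq):
    """Translate a DNA coding strand in frame 0; '*' marks a stop codon."""
    protein = []
    usable = len(seq) - len(seq) % 3
    for i in range(0, usable, 3):
        a, b, c = (_CODE_ORDER.index(base) for base in seq[i:i + 3])
        protein.append(_AMINO_ACIDS[16 * a + 4 * b + c])
    return "".join(protein)


if __name__ == "__main__":
    seq = random_dna(30, seed=1)
    print(seq, reverse_complement(seq), translate(seq))
    print(base_composition(seq))
    print(codon_counts(seq).most_common(3))
```

Passing a seed makes a given sequence reproducible, which is convenient when the same sequence has to be handed to a whole class.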
Xiong, Yuguang; Soumillon, Magali; Wu, Jie; Hansen, Jens; Hu, Bin; van Hasselt, Johan G C; Jayaraman, Gomathi; Lim, Ryan; Bouhaddou, Mehdi; Ornelas, Loren; Bochicchio, Jim; Lenaeus, Lindsay; Stocksdale, Jennifer; Shim, Jaehee; Gomez, Emilda; Sareen, Dhruv; Svendsen, Clive; Thompson, Leslie M; Mahajan, Milind; Iyengar, Ravi; Sobie, Eric A; Azeloglu, Evren U; Birtwistle, Marc R
Creating a cDNA library for deep mRNA sequencing (mRNAseq) is generally done by random priming, creating multiple sequencing fragments along each transcript. A 3'-end-focused library approach cannot detect differential splicing, but has potentially higher throughput at a lower cost, along with the ability to improve quantification by using transcript molecule counting with unique molecular identifiers (UMI) that correct PCR bias. Here, we compare an implementation of such a 3'-digital gene expression (3'-DGE) approach with "conventional" random primed mRNAseq. Given our particular datasets on cultured human cardiomyocyte cell lines, we find that, while conventional mRNAseq detects ~15% more genes and needs ~500,000 fewer reads per sample for equivalent statistical power, the resulting differentially expressed genes, biological conclusions, and gene signatures are highly concordant between the two techniques. We also find good quantitative agreement at the level of individual genes between the two techniques for both read counts and fold changes between given conditions. We conclude that, for high-throughput applications, the potential cost savings associated with the 3'-DGE approach are likely a reasonable tradeoff for a modest reduction in sensitivity and the inability to observe alternative splicing, and should enable many larger scale studies focusing on not only differential expression analysis, but also quantitative transcriptome profiling.
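The idea of counting transcript molecules with UMIs can be illustrated with a minimal Python sketch. It is not the pipeline used in the study: the function name `umi_counts` and the `(gene, umi)` input format are assumptions made for illustration, and the sketch treats every distinct UMI string as a distinct molecule, whereas practical tools usually also merge UMIs that differ only by sequencing errors.

```python
from collections import defaultdict


def umi_counts(reads):
    """Collapse PCR duplicates: count distinct UMIs observed per gene.

    `reads` is an iterable of (gene, umi) pairs, one pair per aligned read.
    Reads sharing both gene and UMI are assumed to be PCR copies of a
    single original molecule and are counted once.
    """
    umis_per_gene = defaultdict(set)
    for gene, umi in reads:
        umis_per_gene[gene].add(umi)
    return {gene: len(umis) for gene, umis in umis_per_gene.items()}


if __name__ == "__main__":
    reads = [
        ("GAPDH", "AAAC"),
        ("GAPDH", "AAAC"),  # PCR duplicate of the read above
        ("GAPDH", "GGTA"),
        ("ACTB", "TTTT"),
    ]
    print(umi_counts(reads))
```

In this toy input, GAPDH has three reads but only two distinct UMIs, so it is counted as two molecules rather than three.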
Lundblad, Eirik W; Xiao, Gaoping; Ko, Jae-Hyeong; Altman, Sidney
A method of inhibiting the expression of particular genes by using external guide sequences (EGSs) has been improved in its rapidity and specificity. Random EGSs that have 14-nt random sequences are used in the selection procedure for an EGS that attacks the mRNA for a gene in a particular location. A mixture of the random EGSs, the particular target RNA, and RNase P is used in the diagnostic procedure, which, after completion, is analyzed in a gel with suitable control lanes. Within a few hours, the procedure is complete. The action of EGSs designed by an older method is compared with EGSs designed by the random EGS method on mRNAs from two bacterial pathogens.
The prediction of RNA-binding proteins is one of the most challenging problems in computational biology. Although some studies have investigated this problem, the accuracy of prediction is still not sufficient. In this study, a highly accurate method was developed to predict RNA-binding proteins from amino acid sequences using random forests with the minimum redundancy maximum relevance (mRMR) method, followed by incremental feature selection (IFS). We incorporated conjoint triad features and three novel features: binding propensity (BP), nonbinding propensity (NBP), and evolutionary information combined with physicochemical properties (EIPP). The results showed that these novel features have important roles in improving the performance of the predictor. Using the mRMR-IFS method, our predictor achieved the best performance (86.62% accuracy and 0.737 Matthews correlation coefficient). High prediction accuracy and successful prediction performance suggested that our method can be a useful approach to identify RNA-binding proteins from sequence information.
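For readers unfamiliar with the two figures quoted above, the following Python sketch shows how accuracy and the Matthews correlation coefficient are computed from a binary confusion matrix. The function names and the example counts are hypothetical; the abstract does not report the underlying confusion matrix.

```python
import math


def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient for a binary classifier.

    Ranges from -1 (total disagreement) through 0 (no better than chance)
    to +1 (perfect prediction). Returns 0.0 when the denominator is zero,
    a common convention for degenerate confusion matrices.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    print(accuracy(tp=80, tn=90, fp=10, fn=20))
    print(mcc(tp=80, tn=90, fp=10, fn=20))
```

Unlike accuracy, the coefficient stays informative when the two classes are of very different sizes, which is one reason the two are often reported together.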
Majem, Blanca; Li, Feng; Sun, Jie; Wong, David T W
Salivary biomarkers for disease detection, diagnostic and prognostic assessments have become increasingly well established in recent years. In this chapter we explain the current leading technology that has been used to characterize salivary non-coding RNAs (ncRNAs) from the extracellular RNA (exRNA) fraction: HiSeq from Illumina® platform for RNA sequencing. Therefore, the chapter is divided into two main sections regarding the type of the library constructed (small and long ncRNA libraries), from saliva collection, RNA extraction and quantification to cDNA library generation and corresponding QCs. Using these invaluable technical tools, one can identify thousands of ncRNA species in saliva. These methods indicate that salivary exRNA provides an efficient medium for biomarker discovery of oral and systemic diseases.
This book chapter offers an overview of the use of ribosomal RNA sequences. A history of the technology traces the evolution of techniques to measure bacterial phylogenetic relationships and recent advances in obtaining rRNA sequence information. The manual also describes procedu...
In this paper, the central limit theorems for the density estimator and for the integrated square error are proved for the case when the underlying sequence of random variables is nonstationary. Applications to Markov processes and ARMA processes are provided.
Derks, Kasper W J; Misovic, Branislav; van den Hout, Mirjam C G N; Kockx, Christel E M; Gomez, Cesar Payan; Brouwer, Rutger W W; Vrieling, Harry; Hoeijmakers, Jan H J; van IJcken, Wilfred F J; Pothof, Joris
Current RNA expression profiling methods rely on enrichment steps for specific RNA classes, thereby not detecting all RNA species in an unperturbed manner. We report strand-specific RNAome sequencing that determines expression of small and large RNAs from rRNA-depleted total RNA in a single sequence run. Since current analysis pipelines cannot reliably analyze small and large RNAs simultaneously, we developed TRAP, Total Rna Analysis Pipeline, a robust interface that is also compatible with existing RNA sequencing protocols. RNAome sequencing quantitatively preserved all RNA classes, allowing cross-class comparisons that facilitate the identification of relationships between different RNA classes. We demonstrate the strength of RNAome sequencing in mouse embryonic stem cells treated with cisplatin. MicroRNA and mRNA expression in RNAome sequencing significantly correlated between replicates and was in concordance with both existing RNA sequencing methods and gene expression arrays generated from the same samples. Moreover, RNAome sequencing also detected additional RNA classes such as enhancer RNAs, anti-sense RNAs, novel RNA species and numerous differentially expressed RNAs undetectable by other methods. At the level of complete RNA classes, RNAome sequencing also identified a specific global repression of the microRNA and microRNA isoform classes after cisplatin treatment, whereas all other classes such as mRNAs were unchanged. These characteristics of RNAome sequencing will significantly improve expression analysis as well as studies on RNA biology not covered by existing methods.
Derks, Kasper W J; Pothof, Joris
Standard RNA expression profiling methods rely on enrichment steps for specific RNA classes, thereby not detecting all RNA species. For example, small and large RNAs from the same sample cannot be sequenced in a single sequence run. We designed RNAome sequencing, which is a strand-specific method to determine the expression of small and large RNAs from ribosomal RNA-depleted total RNA in a single sequence run. RNAome sequencing quantitatively preserves all RNA classes. This characteristic allows comparisons between RNA classes, thereby facilitating the identification of relationships between different RNA classes. Here, we describe in detail the experimental procedure associated with RNAome sequencing published by Derks and colleagues in RNA Biology (2015). We also provide the R code for the developed Total Rna Analysis Pipeline (TRAP), an algorithm to analyze RNAome sequencing datasets (deposited at the Gene Expression Omnibus data repository, accession number GSE48084).
Raabe, Carsten A; Tang, Thean-Hock; Brosius, Juergen; Rozhdestvensky, Timofey S
High-throughput RNA sequencing (RNA-seq) is considered a powerful tool for novel gene discovery and fine-tuned transcriptional profiling. The digital nature of RNA-seq is also believed to simplify meta-analysis and to reduce background noise associated with hybridization-based approaches. The development of multiplex sequencing enables efficient and economic parallel analysis of gene expression. In addition, RNA-seq is of particular value when low RNA expression or modest changes between samples are monitored. However, recent data uncovered severe bias in the sequencing of small non-protein coding RNA (small RNA-seq or sRNA-seq), such that the expression levels of some RNAs appeared to be artificially enhanced and others diminished or even undetectable. The use of different adapters and barcodes during ligation as well as complex RNA structures and modifications drastically influence cDNA synthesis efficacies and exemplify sources of bias in deep sequencing. In addition, variable specific RNA G/C-content is associated with unequal polymerase chain reaction amplification efficiencies. Given the central importance of RNA-seq to molecular biology and personalized medicine, we review recent findings that challenge small non-protein coding RNA-seq data and suggest approaches and precautions to overcome or minimize bias.
Zirbel, Craig L.; Roll, James; Sweeney, Blake A.; Petrov, Anton I.; Pirrung, Meg; Leontis, Neocles B.
Predicting RNA 3D structure from sequence is a major challenge in biophysics. An important sub-goal is accurately identifying recurrent 3D motifs from RNA internal and hairpin loop sequences extracted from secondary structure (2D) diagrams. We have developed and validated new probabilistic models for 3D motif sequences based on hybrid Stochastic Context-Free Grammars and Markov Random Fields (SCFG/MRF). The SCFG/MRF models are constructed using atomic-resolution RNA 3D structures. To parameterize each model, we use all instances of each motif found in the RNA 3D Motif Atlas and annotations of pairwise nucleotide interactions generated by the FR3D software. Isostericity relations between non-Watson–Crick basepairs are used in scoring sequence variants. SCFG techniques model nested pairs and insertions, while MRF ideas handle crossing interactions and base triples. We use test sets of randomly-generated sequences to set acceptance and rejection thresholds for each motif group and thus control the false positive rate. Validation was carried out by comparing results for four motif groups to RMDetect. The software developed for sequence scoring (JAR3D) is structured to automatically incorporate new motifs as they accumulate in the RNA 3D Motif Atlas when new structures are solved and is available free for download. PMID:26130723
Portal, Maximiliano M; Pavet, Valeria; Erb, Cathie; Gronemeyer, Hinrich
High-throughput transcriptional analysis has unveiled a myriad of novel RNAs. However, technical constraints in RNA sequencing library preparation and platform performance hamper the identification of rare transcripts contained within the RNA repertoire. Herein we present targeted-RNA directional sequencing (TARDIS), a hybridization-based method that allows subsets of RNAs contained within the transcriptome to be interrogated independently of transcript length, function, the presence or absence of poly-A tracts, or the mechanism of biogenesis. TARDIS is a modular protocol that is subdivided into four main phases, including the generation of random DNA traps covering the region of interest, purification of input RNA material, DNA trap-based RNA capture, and finally RNA-sequencing library construction. Importantly, coupling RNA capture to strand-specific RNA sequencing enables robust identification and reconstruction of novel transcripts, the definition of sense and antisense RNA pairs and, by the concomitant analysis of long and natural small RNA pools, it allows the user to infer potential precursor-product relations. TARDIS takes ∼10 d to implement.
't Hoen, Peter A C; Friedländer, Marc R; Almlöf, Jonas; Sammeth, Michael; Pulyakhina, Irina; Anvar, Seyed Yahya; Laros, Jeroen F J; Buermans, Henk P J; Karlberg, Olof; Brännvall, Mathias; den Dunnen, Johan T; van Ommen, Gert-Jan B; Gut, Ivo G; Guigó, Roderic; Estivill, Xavier; Syvänen, Ann-Christine; Dermitzakis, Emmanouil T; Lappalainen, Tuuli
RNA sequencing is an increasingly popular technology for genome-wide analysis of transcript sequence and abundance. However, understanding of the sources of technical and interlaboratory variation is still limited. To address this, the GEUVADIS consortium sequenced mRNAs and small RNAs of lymphoblastoid cell lines of 465 individuals in seven sequencing centers, with a large number of replicates. The variation between laboratories appeared to be considerably smaller than the already limited biological variation. Laboratory effects were mainly seen in differences in insert size and GC content and could be adequately corrected for. In small-RNA sequencing, the microRNA (miRNA) content differed widely between samples owing to competitive sequencing of rRNA fragments. This did not affect relative quantification of miRNAs. We conclude that distributing RNA sequencing among different laboratories is feasible, given proper standardization and randomization procedures. We provide a set of quality measures and guidelines for assessing technical biases in RNA-seq data.
block of four bases or more could be required by the target sites to find suitable interactions with the targeting ... end, and (ii) randomized target UTRs retaining the same nucleotide composition, in order to avoid any ... analysis were the complete target region of 70-base flanking sequences around the microRNA target site.
In this paper, we generalize the permutation entropy (PE) measure, which is based on Shannon’s entropy, to binary sequences and theoretically analyze this measure for random binary sequences. We deduce the theoretical value of PE for random binary sequences, which can be used to measure the randomness of binary sequences. We also reveal the relationship between this PE measure and other randomness measures, such as Shannon’s entropy and Lempel–Ziv complexity. The results show that PE is consistent with these two measures. Furthermore, we use PE as one of the randomness measures to evaluate the randomness of chaotic binary sequences.
Chu, Yongjun; Wang, Tao; Dodd, David; Xie, Yang; Janowski, Bethany A.; Corey, David R.
RNA sequencing (RNA-Seq) is a powerful tool for analyzing the identity of cellular RNAs but is often limited by the amount of material available for analysis. In spite of extensive efforts employing existing protocols, we observed that it was not possible to obtain useful sequencing libraries from nuclear RNA derived from cultured human cells after crosslinking and immunoprecipitation (CLIP). Here, we report a method for obtaining strand-specific small RNA libraries for RNA sequencing that requires picograms of RNA. We employ an intramolecular circularization step that increases the efficiency of library preparation and avoids the need for intermolecular ligations of adaptor sequences. Other key features include random priming for full-length cDNA synthesis and gel-free library purification. Using our method, we generated CLIP-Seq libraries from nuclear RNA that had been UV-crosslinked and immunoprecipitated with anti-Argonaute 2 (Ago2) antibody. Computational protocols were developed to enable analysis of raw sequencing data and we observe substantial differences between recognition by Ago2 of RNA species in the nucleus relative to the cytoplasm. This RNA self-circularization approach to RNA sequencing (RC-Seq) allows data to be obtained using small amounts of input RNA that cannot be sequenced by standard methods. PMID:25813040
Unnithan, Veena V.; Unc, Adrian; Joe, Valerisa; Smith, Geoffrey B.
Short indicator RNA sequences (autoclaving and are recovered intact by molecular amplification. Primers targeting longer sequences are most likely to produce false positives due to amplification errors easily verified by melting curves analyses. If short indicator RNA sequences are used for virus identification and quantification then post autoclave RNA degradation methodology should be employed, which may include further autoclaving. PMID:24518856
Wei, Chenyu; Pohorille, Andrzej; Popovic, Milena; Ditzler, Mark
Emergence of replicable genetic molecules was one of the marking points in the origin of life, evolution of which can be conceptualized as a walk through the space of all possible sequences. A theoretical concept of fitness landscape helps to understand evolutionary processes through assigning a value of fitness to each genotype. Then, evolution of a phenotype is viewed as a series of consecutive, single-point mutations. Natural selection biases evolution toward peaks of high fitness and away from valleys of low fitness, whereas neutral drift occurs in the sequence space without direction as mutations are introduced at random. Large networks of neutral or near-neutral mutations on a fitness landscape, especially for sufficiently long genomes, are possible or even inevitable. Their detection in experiments, however, has been elusive. Although a few near-neutral evolutionary pathways have been found, recent experimental evidence indicates that landscapes consist of largely isolated islands. The generality of these results, however, is not clear, as the genome length or the fraction of functional molecules in the genotypic space might have been insufficient for the emergence of large, neutral networks. Thorough investigation of the structure of the fitness landscape is essential to understand the mechanisms of evolution of early genomes. RNA molecules are commonly assumed to play the pivotal role in the origin of genetic systems. They are widely believed to be early, if not the earliest, genetic and catalytic molecules, with abundant biochemical activities as aptamers and ribozymes, i.e. RNA molecules capable, respectively, of binding small molecules or catalyzing chemical reactions. Here, we present results of our recent studies on the structure of the sequence space of RNA ligase ribozymes selected through in vitro evolution. Several hundred thousand sequences active to a different degree were obtained by way of deep sequencing. Analysis of these sequences revealed
In this thesis I focus on the application of bioinformatics to analyze RNA. The type of experimental data of interest is sequencing data generated with various Next Generation Sequencing techniques: nuclear RNA, cytoplasmic RNA, captured polyadenylated RNA fragments, etc. I highlight the necessity in
Mehta, Jai Prakash
Small RNAs are important transcriptional regulators within cells. With the advent of powerful Next Generation Sequencing platforms, sequencing small RNAs seems to be an obvious choice to understand their expression and its downstream effect. Additionally, sequencing provides an opportunity to identify novel and polymorphic miRNA. However, the biggest challenge is the appropriate data analysis pipeline, which is still in a phase of active development by various academic groups. This chapter describes basic and advanced steps for small RNA sequencing analysis including quality control, small RNA alignment and quantification, differential expression analysis, novel small RNA identification, target prediction, and downstream analysis. We also provide a list of various resources for small RNA analysis.
Gommans, W.M.; Berezikov, E.
High-throughput sequencing has allowed for a comprehensive small RNA (sRNA) expression analysis of numerous tissues in a diverse set of organisms. The computational analysis of the millions of generated sequencing reads has led to the discovery of novel miRNAs and other sRNA species, and resulted in
Nebel Markus E
Background: Random biological sequences are a topic of great interest in genome analysis since, according to a powerful paradigm, they represent the background noise from which the actual biological information must differentiate. Accordingly, the generation of random sequences has been investigated for a long time. Similarly, random objects of a more complicated structure like RNA molecules or proteins are of interest. Results: In this article, we present a new general framework for deriving algorithms for the non-uniform random generation of combinatorial objects according to the encoding and probability distribution implied by a stochastic context-free grammar. Briefly, the framework extends on the well-known recursive method for (uniform) random generation and uses the popular framework of admissible specifications of combinatorial classes, introducing weighted combinatorial classes to allow for the non-uniform generation by means of unranking. This framework is used to derive an algorithm for the generation of RNA secondary structures of a given fixed size. We address the random generation of these structures according to a realistic distribution obtained from real-life data by using a very detailed context-free grammar (that models the class of RNA secondary structures by distinguishing between all known motifs in RNA structure). Compared to well-known sampling approaches used in several structure prediction tools (such as SFold), ours has two major advantages: Firstly, after a preprocessing step in time $O(n^2)$ for the computation of all weighted class sizes needed, with our approach a set of m random secondary structures of a given structure size n can be computed in worst-case time complexity $O(m \cdot n \cdot \log n)$ while other algorithms typically have a runtime in $O(m \cdot n^2)$. Secondly, our approach works with integer arithmetic only which is faster and saves us from all the discomforting details of using floating point arithmetic with
Qin, Li-Xuan; Tuschl, Thomas; Singer, Samuel
The choice of stochasticity distribution for modeling the noise distribution is a fundamental assumption for the analysis of sequencing data and consequently is critical for the accurate assessment of biological heterogeneity and differential expression. The stochasticity of RNA sequencing has been assumed to follow Poisson distributions. We collected microRNA sequencing data and observed that its stochasticity is better approximated by gamma distributions, likely because of the stochastic nature of exponential PCR amplification. We validated our findings with two independent datasets, one for microRNA sequencing and another for RNA sequencing. Motivated by the gamma distributed stochasticity, we provided a simple method for the analysis of RNA sequencing data and showed its superiority to three existing methods for differential expression analysis using three data examples of technical replicate data and biological replicate data.
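The distinction between Poisson and gamma noise can be made concrete with a simple moment check on technical replicates. The sketch below is not the authors' method, which the abstract does not describe in detail; it only uses the textbook facts that a Poisson variable has variance equal to its mean, while a gamma variable with shape k and scale θ has mean kθ and variance kθ². The function names are hypothetical.

```python
import statistics


def variance_to_mean_ratio(counts):
    """Sample variance divided by sample mean of replicate counts.

    A ratio close to 1 is what Poisson noise would give; a ratio that
    grows with the mean points to overdispersed noise such as gamma.
    """
    return statistics.variance(counts) / statistics.mean(counts)


def gamma_moment_fit(counts):
    """Method-of-moments estimate of gamma (shape, scale) parameters.

    Solves mean = shape * scale and variance = shape * scale**2.
    """
    mean = statistics.mean(counts)
    var = statistics.variance(counts)
    return mean ** 2 / var, var / mean


if __name__ == "__main__":
    # Hypothetical read counts for one microRNA across technical replicates.
    replicates = [120, 95, 160, 140, 85]
    print(variance_to_mean_ratio(replicates))
    print(gamma_moment_fit(replicates))
```

For a gamma variable the variance-to-mean ratio equals the scale parameter, so the second value returned by `gamma_moment_fit` is the same quantity as the ratio computed by the first function.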
RNA molecules have emerged as promising therapeutics. Like all other drugs, the safety profile and immune response are important criteria for drug evaluation. However, the literature on RNA immunogenicity has been controversial. Here, we used the approach of RNA nanotechnology to demonstrate that the immune response of RNA nanoparticles is size, shape, and sequence dependent. RNA triangle, square, pentagon, and tetrahedron with the same shape but different sizes, or the same size but different shapes, were used as models to investigate the immune response. The levels of pro-inflammatory cytokines induced by these RNA nanoarchitectures were assessed in macrophage-like cells and animals. It was found that RNA polygons without extension at the vertexes were immune inert. However, when single-stranded RNA with a specific sequence was extended from the vertexes of RNA polygons, strong immune responses were detected. These immunostimulations are sequence specific, because some other extended sequences induced little or no immune response. Additionally, larger-size RNA square induced stronger cytokine secretion. 3D RNA tetrahedron showed stronger immunostimulation than planar RNA triangle. These results suggest that the immunogenicity of RNA nanoparticles is tunable to produce either a minimal immune response that can serve as safe therapeutic vectors, or a strong immune response for cancer immunotherapy or vaccine adjuvants.
Ebhardt, H Alexander; Tsang, Herbert H; Dai, Denny C; Liu, Yifeng; Bostan, Babak; Fahlman, Richard P
Recent advances in DNA-sequencing technology have made it possible to obtain large datasets of small RNA sequences. Here we demonstrate that not all non-perfectly matched small RNA sequences are simple technological sequencing errors, but many hold valuable biological information. Analysis of three small RNA datasets originating from Oryza sativa and Arabidopsis thaliana small RNA-sequencing projects demonstrates that many single nucleotide substitution errors overlap when aligning homologous non-identical small RNA sequences. Investigating the sites and identities of substitution errors reveals that many potentially originate as a result of post-transcriptional modifications or RNA editing. Modifications include N1-methyl modified purine nucleotides in tRNA, potential deamination or base substitutions in micro RNAs, 3' micro RNA uridine extensions and 5' micro RNA deletions. Additionally, further analysis of large sequencing datasets reveals that the combined effects of 5' deletions and 3' uridine extensions can alter the specificity by which micro RNAs associate with different Argonaute proteins. Hence, we demonstrate that not all sequencing errors in small RNA datasets are technical artifacts, but that these actually often reveal valuable biological insights into the sites of post-transcriptional RNA modifications.
Christensen, H.; Nordentoft, Steen; Olsen, J.E.
separated by 16S rRNA analysis and found to be closely related to the Escherichia coli and Shigella complex by both 16S and 23S rRNA analyses. The diphasic serotypes S. enterica subspp. I and VI were separated from the monophasic serotypes subspp. IIIa and IV, including S. bongori, by 23S rRNA sequence...
Goldberg, Robert B.; Galau, Glenn A.; Britten, Roy J.; Davidson, Eric H.
Messenger RNA was prepared from developing sea urchin gastrulae by puromycin release from polyribosomes. Approximately 60% of the total mRNA radioactivity of the postnuclear supernatant was recovered and shown to be free of any other labeled RNA species such as ribosomal and nuclear RNA. The mRNA was examined by hybridization to DNA present in great excess. The mRNA hybridizes almost exclusively with nonrepetitive DNA. Almost all of the messenger RNA molecules of sea urchin gastrulae therefore consist of transcripts from nonrepetitive sequences. It appears that the structural genes expressed at this stage are typically not repeated in the genome and the mRNA does not include recognizable repetitive sequence. PMID:4519642
RNA inverse folding is a computational technology for designing RNA sequences which fold into a user-specified secondary structure. Although pseudoknots are functionally important motifs in RNA structures, less reports concerning the inverse folding of pseudoknotted RNAs have been done compared to those for pseudoknot-free RNA design. In this paper, we present a new version of our multi-objective genetic algorithm (MOGA), MODENA, which we have previously proposed for pseudoknot-free RNA inverse folding. In the new version of MODENA, (i) a new crossover operator is implemented and (ii) pseudoknot prediction methods, IPknot and HotKnots, are used to evaluate the designed RNA sequences, allowing us to perform the inverse folding of pseudoknotted RNAs. The new version of MODENA with the new crossover operator was benchmarked with a dataset composed of natural pseudoknotted RNA secondary structures, and we found that MODENA can successfully design more pseudoknotted RNAs compared to the other pseudoknot design algorithm. In addition, a sequence constraint function newly implemented in the new version of MODENA was tested by designing RNA sequences which fold into the pseudoknotted structure of a hepatitis delta virus ribozyme; as a result, we successfully designed eight RNA sequences. The new version of MODENA is downloadable from http://rna.eit.hirosaki-u.ac.jp/modena/.
Gorodkin, Jan; Heyer, Laurie J.; Stormo, Gary D.
We present a computational scheme to search for the most common motif, composed of a combination of sequence and structure constraints, among a collection of RNA sequences. The method uses a simplified version of the Sankoff algorithm for simultaneous folding and alignment of RNA sequences...
Qin, Li-Xuan; Tuschl, Thomas; Singer, Samuel
The choice of stochasticity distribution for modeling the noise distribution is a fundamental assumption for the analysis of sequencing data and consequently is critical for the accurate assessment of biological heterogeneity and differential expression. The stochasticity of RNA sequencing has been assumed to follow Poisson distributions. We collected microRNA sequencing data and observed that its stochasticity is better approximated by gamma distributions, likely because of the stochastic na...
Zhu, Yanglong; Stribinskis, Vilius; Ramos, Kenneth S.; Li, Yong
RNase MRP is a eukaryote-specific endoribonuclease that generates RNA primers for mitochondrial DNA replication and processes precursor rRNA. RNase P is a ubiquitous endoribonuclease that cleaves precursor tRNA transcripts to produce their mature 5′ termini. We found extensive sequence homology of catalytic domains and specificity domains between their RNA subunits in many organisms. In Candida glabrata, the internal loop of helix P3 is 100% conserved between MRP and P RNAs. The helix P8 of MRP RNA from microsporidia Encephalitozoon cuniculi is identical to that of P RNA. Sequence homology can be widely spread over the whole molecule of MRP RNA and P RNA, such as those from Dictyostelium discoideum. These conserved nucleotides between the MRP and P RNAs strongly support the hypothesis that the MRP RNA is derived from the P RNA molecule in early eukaryote evolution. PMID:16540690
Masliah, Grégoire; Barraud, Pierre; Allain, Frédéric H-T
The double-stranded RNA binding domain (dsRBD) is a small protein domain of 65-70 amino acids adopting an αβββα fold, whose central property is to bind to double-stranded RNA (dsRNA). This domain is present in proteins implicated in many aspects of cellular life, including antiviral response, RNA editing, RNA processing, RNA transport and, last but not least, RNA silencing. Even though proteins containing dsRBDs can bind to very specific dsRNA targets in vivo, the binding of dsRBDs to dsRNA is commonly believed to be shape-dependent rather than sequence-specific. Interestingly, recent structural information on dsRNA recognition by dsRBDs opens the possibility that this domain performs a direct readout of RNA sequence in the minor groove, allowing a global reconsideration of the principles describing dsRNA recognition by dsRBDs. We review in this article the current structural and molecular knowledge on dsRBDs, emphasizing the intricate relationship between the amino acid sequence, the structure of the domain and its RNA recognition capacity. We especially focus on the molecular determinants of dsRNA recognition and describe how sequence discrimination can be achieved by this type of domain.
Huang, Po-Jung; Liu, Yi-Chung; Lee, Chi-Ching; Lin, Wei-Chen; Gan, Richie Ruei-Chi; Lyu, Ping-Chiang; Tang, Petrus
DSAP is an automated multiple-task web service designed to provide a total solution to analyzing deep-sequencing small RNA datasets generated by next-generation sequencing technology. DSAP uses a tab-delimited file as an input format, which holds the unique sequence reads (tags) and their corresponding number of copies generated by the Solexa sequencing platform. The input data will go through four analysis steps in DSAP: (i) cleanup: removal of adaptors and poly-A/T/C/G/N nucleotides; (ii) clustering: grouping of cleaned sequence tags into unique sequence clusters; (iii) non-coding RNA (ncRNA) matching: sequence homology mapping against a transcribed sequence library from the ncRNA database Rfam (http://rfam.sanger.ac.uk/); and (iv) known miRNA matching: detection of known miRNAs in miRBase (http://www.mirbase.org/) based on sequence homology. The expression levels corresponding to matched ncRNAs and miRNAs are summarized in multi-color clickable bar charts linked to external databases. DSAP is also capable of displaying miRNA expression levels from different jobs using a log(2)-scaled color matrix. Furthermore, a cross-species comparative function is also provided to show the distribution of identified miRNAs in different species as deposited in miRBase. DSAP is available at http://dsap.cgu.edu.tw.
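The first two DSAP steps described above (cleanup and clustering) can be sketched roughly as follows. This is only an illustrative sketch, not DSAP's implementation: the adaptor sequence, the minimum length, and the function names are hypothetical, and the homopolymer check is a simplified stand-in for the poly-A/T/C/G/N removal mentioned in the abstract.

```python
from collections import Counter

# Hypothetical 3' adaptor; the abstract does not state which adaptors DSAP trims.
ADAPTOR = "TCGTATGCCGTCTTCTGCTTG"


def clean_tag(seq, adaptor=ADAPTOR, min_len=15):
    """Trim the adaptor if present; reject homopolymer or too-short tags."""
    idx = seq.find(adaptor)
    if idx != -1:
        seq = seq[:idx]
    # Assumed filters: minimum length and single-nucleotide runs (poly-A/T/C/G/N).
    if len(seq) < min_len or len(set(seq)) == 1:
        return None
    return seq


def cluster_tags(lines):
    """Group tab-delimited 'sequence<TAB>copies' records by cleaned sequence."""
    clusters = Counter()
    for line in lines:
        seq, count = line.rstrip("\n").split("\t")
        cleaned = clean_tag(seq.upper())
        if cleaned is not None:
            clusters[cleaned] += int(count)
    return clusters
```

Tags that become identical after trimming fall into the same cluster and their copy numbers are summed, which is the input the later matching steps would work from.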
Busch, Anke; Backofen, Rolf
INFO-RNA is a new web server for designing RNA sequences that fold into a user given secondary structure. Furthermore, constraints on the sequence can be specified, e.g. one can restrict sequence positions to a fixed nucleotide or to a set of nucleotides. Moreover, the user can allow violations of the constraints at some positions, which can be advantageous in complicated cases. The INFO-RNA web server allows biologists to design RNA sequences in an automatic manner. It is clearly and intuitively arranged and easy to use. The procedure is fast, as most applications are completed within seconds and it proceeds better and faster than other existing tools. The INFO-RNA web server is freely available at http://www.bioinf.uni-freiburg.de/Software/INFO-RNA/
RNA silencing functions as an important antiviral defense mechanism in a broad range of eukaryotes. In plants, biogenesis of several classes of endogenous small interfering RNAs (siRNAs) requires RNA-dependent RNA Polymerase (RDR) activities. Members of the RDR family proteins, including RDR1 and RDR6, have also been implicated in antiviral defense, although a direct role for RDRs in viral siRNA biogenesis has yet to be demonstrated. Using a crucifer-infecting strain of Tobacco Mosaic Virus (TMV-Cg) and Arabidopsis thaliana as a model system, we analyzed the viral small RNA profile in wild-type plants as well as rdr mutants by applying small RNA deep sequencing technology. Over 100,000 TMV-Cg-specific small RNA reads, mostly of 21- (78.4%) and 22-nucleotide (12.9%) in size and originating predominately (79.9%) from the genomic sense RNA strand, were captured at an early infection stage, yielding the first high-resolution small RNA map for a plant virus. The TMV-Cg genome harbored multiple, highly reproducible small RNA-generating hot spots that corresponded to regions with no apparent local hairpin-forming capacity. Significantly, both the rdr1 and rdr6 mutants exhibited globally reduced levels of viral small RNA production as well as reduced strand bias in viral small RNA population, revealing an important role for these host RDRs in viral siRNA biogenesis. In addition, an informatics analysis showed that a large set of host genes could be potentially targeted by TMV-Cg-derived siRNAs for posttranscriptional silencing. Two of such predicted host targets, which encode a cleavage and polyadenylation specificity factor (CPSF30) and an unknown protein similar to translocon-associated protein alpha (TRAP alpha), respectively, yielded a positive result in cleavage validation by 5'RACE assays. Our data raised the interesting possibility for viral siRNA-mediated virus-host interactions that may contribute to viral pathogenicity and host specificity.
Tremblay Josselyne; Le Bourhis Guenhael; Schönhuber Wilhelm; Amann Rudolf; Kulakauskas Saulius
Background: Ribosomal RNA molecules are widely used for phylogenetic and in situ identification of bacteria. Nevertheless, their use to distinguish microorganisms within a species is often restricted by the high degree of sequence conservation and limited probe accessibility to the target in fluorescence in situ hybridization (FISH). To overcome these limitations, we examined the use of tmRNA for in situ identification. In E. coli, this stable 363-nucleotide-long RNA is encoded by the ssrA gene, which is involved in the degradation of truncated proteins. Results: Conserved sequences at the 5'- and 3'-ends of tmRNA genes were used to design universal primers that could amplify the internal part of ssrA from Gram-positive bacteria having low G+C content, i.e. genera Bacillus, Enterococcus, Lactococcus, Lactobacillus, Leuconostoc, Listeria, Streptococcus and Staphylococcus. Sequence analysis of tmRNAs showed that this molecule can be used for phylogenetic assignment of bacteria. Compared to 16S rRNA, the tmRNA nucleotide sequences of some bacteria, for example Listeria, display considerable divergence between species. Using E. coli as an example, we have shown that bacteria can be specifically visualized by FISH with tmRNA targeted probes. Conclusions: Features of tmRNA, including its presence in phylogenetically distant bacteria, conserved regions at gene extremities and a potential to serve as target for FISH, make this molecule a possible candidate for identification of bacteria.
Schönhuber, W; Le Bourhis, G; Tremblay, J; Amann, R; Kulakauskas, S
Ribosomal RNA molecules are widely used for phylogenetic and in situ identification of bacteria. Nevertheless, their use to distinguish microorganisms within a species is often restricted by the high degree of sequence conservation and limited probe accessibility to the target in fluorescence in situ hybridization (FISH). To overcome these limitations, we examined the use of tmRNA for in situ identification. In E. coli, this stable 363 nucleotides long RNA is encoded by the ssrA gene, which is involved in the degradation of truncated proteins. Conserved sequences at the 5'- and 3'-ends of tmRNA genes were used to design universal primers that could amplify the internal part of ssrA from Gram-positive bacteria having low G+C content, i.e. genera Bacillus, Enterococcus, Lactococcus, Lactobacillus, Leuconostoc, Listeria, Streptococcus and Staphylococcus. Sequence analysis of tmRNAs showed that this molecule can be used for phylogenetic assignment of bacteria. Compared to 16S rRNA, the tmRNA nucleotide sequences of some bacteria, for example Listeria, display considerable divergence between species. Using E. coli as an example, we have shown that bacteria can be specifically visualized by FISH with tmRNA targeted probes. Features of tmRNA, including its presence in phylogenetically distant bacteria, conserved regions at gene extremities and a potential to serve as target for FISH, make this molecule a possible candidate for identification of bacteria.
Johansen, Steinar D; Emblem, Ase; Karlsen, Bård Ove; Okkenhaug, Siri; Hansen, Hilde; Moum, Truls; Coucheron, Dag H; Seternes, Ole Morten
RNA deep sequencing represents a new complementary approach in marine bioprospecting. Next-generation sequencing platforms have recently been developed for de novo whole transcriptome analysis, small RNA discovery and gene expression profiling. Deep sequencing transcriptomics (sequencing the complete set of cellular transcripts at a specific stage or condition) leads to sequential identification of all expressed genes in a sample. When combined to high-throughput bioinformatics and protein synthesis, RNA deep sequencing represents a new powerful approach in gene product discovery and bioprospecting. Here we summarize recent progress in the analyses of hexacoral transcriptomes with the focus on cold-water sea anemones and related organisms. Copyright (c) 2010 Elsevier B.V. All rights reserved.
Wake, Christian; Labadorf, Adam; Dumitriu, Alexandra; Hoss, Andrew G; Bregu, Joli; Albrecht, Kenneth H; DeStefano, Anita L; Myers, Richard H
MicroRNAs (miRNAs) are short, non-coding RNAs that regulate gene expression mainly through translational repression of target mRNA molecules. More than 2700 human miRNAs have been identified and some are known to be associated with disease phenotypes and to display tissue-specific patterns of expression. We used high-throughput small RNA sequencing to discover novel miRNAs in 93 human post-mortem prefrontal cortex samples from individuals with Huntington's disease (n = 28) or Parkinson's disease (n = 29) and controls without neurological impairment (n = 36). A custom miRNA identification analysis pipeline was built, which utilizes miRDeep* miRNA identification and result filtering based on false positive rate estimates. Ninety-nine novel miRNA candidates with a false positive rate of less than 5 % were identified. Thirty-four of the candidate miRNAs show sequence similarity with known mature miRNA sequences and may be novel members of known miRNA families, while the remaining 65 may constitute previously undiscovered families of miRNAs. Nineteen of the 99 candidate miRNAs were replicated using independent, publicly-available human brain RNA-sequencing samples, and seven were experimentally validated using qPCR. We have used small RNA sequencing to identify 99 putative novel miRNAs that are present in human brain samples.
Gardner, Paul P; Vinther, Jeppe
It has long been hypothesized that changes in non-protein-coding genes and the regulatory sequences controlling expression could undergo positive selection. Here we identify 402 putative microRNA (miRNA) target sequences that have been mutated specifically in the human lineage and show that genes...... containing such deletions are more highly expressed than their mouse orthologs. Our findings indicate that some miRNA target mutations are fixed by positive selection and might have been involved in the evolution of human-specific traits....
George, D. G.; Dayhoff, M. O.
The proposed recognition sites for RNA transcription for E. coli RNA polymerase, bacteriophage T7 RNA polymerase, and eukaryotic RNA polymerase Pol II are evaluated in the light of the requirements for efficient recognition. It is shown that although there is good experimental evidence that specific nucleic acid sequence patterns are involved in transcriptional regulation in bacteria and bacterial viruses, among the sequences now available, only in the case of the promoters recognized by bacteriophage T7 polymerase does it seem likely that the pattern is sufficient. It is concluded that the eukaryotic pattern that is investigated is not restrictive enough to serve as a recognition site.
Jennifer A Mitchell
In addition to protein coding genes a substantial proportion of mammalian genomes are transcribed. However, most transcriptome studies investigate steady-state mRNA levels, ignoring a considerable fraction of the transcribed genome. In addition, steady-state mRNA levels are influenced by both transcriptional and posttranscriptional mechanisms, and thus do not provide a clear picture of transcriptional output. Here, using deep sequencing of nuclear RNAs (nucRNA-Seq) in parallel with chromatin immunoprecipitation sequencing (ChIP-Seq) of active RNA polymerase II, we compared the nuclear transcriptome of mouse anemic spleen erythroid cells with polymerase occupancy on a genome-wide scale. We demonstrate that unspliced transcripts quantified by nucRNA-seq correlate with primary transcript frequencies measured by RNA FISH, but differ from steady-state mRNA levels measured by poly(A)-enriched RNA-seq. Highly expressed protein coding genes showed good correlation between RNAPII occupancy and transcriptional output; however, genome-wide we observed a poor correlation between transcriptional output and RNAPII association. This poor correlation is due to intergenic regions associated with RNAPII which correspond with transcription factor bound regulatory regions and a group of stable, nuclear-retained long non-coding transcripts. In conclusion, sequencing the nuclear transcriptome provides an opportunity to investigate the transcriptional landscape in a given cell type through quantification of unspliced primary transcripts and the identification of nuclear-retained long non-coding RNAs.
Guan, Lirui; Luo, Yiling; Ja, William W; Disney, Matthew D
RNA regulation and maintenance are critical for proper cell function. Small molecules that specifically alter RNA sequence would be exceptionally useful as probes of RNA structure and function or as potential therapeutics. Here, we demonstrate a photochemical approach for altering the trinucleotide expanded repeat causative of myotonic muscular dystrophy type 1 (DM1), r(CUG)exp. The small molecule, 2H-4-Ru, binds to r(CUG)exp and converts guanosine residues to 8-oxo-7,8-dihydroguanosine upon photochemical irradiation. We demonstrate targeted modification upon irradiation in cell culture and in Drosophila larvae provided a diet containing 2H-4-Ru. Our results highlight a general chemical biology approach for altering RNA sequence in vivo by using small molecules and photochemistry. Furthermore, these studies show that addition of 8-oxo-G lesions into RNA 3' untranslated regions does not affect its steady state levels. Copyright © 2017 Elsevier Ltd. All rights reserved.
Schnattinger, Thomas; Schöning, Uwe; Marchfelder, Anita; Kestler, Hans A
Incorporating secondary structure information into the alignment process improves the quality of RNA sequence alignments. Instead of using fixed weighting parameters, sequence and structure components can be treated as different objectives and optimized simultaneously. The result is not a single, but a Pareto-set of equally optimal solutions, which all represent different possible weighting parameters. We now provide the interactive graphical software tool RNA-Pareto, which allows a direct inspection of all feasible results to the pairwise RNA sequence-structure alignment problem and greatly facilitates the exploration of the optimal solution set.
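To make the idea of a Pareto set concrete, the following sketch filters candidate alignments scored on two objectives and keeps only the non-dominated ones. It is an illustration under stated assumptions rather than RNA-Pareto's algorithm: the (sequence score, structure score) tuples and the function name are hypothetical, and higher values are assumed to be better on both objectives.

```python
def pareto_front(candidates):
    """Return the (sequence_score, structure_score) pairs that no other pair dominates.

    A pair dominates another if it is at least as good on both objectives
    and strictly better on at least one (higher is assumed to be better).
    """
    front = []
    for i, (seq_i, str_i) in enumerate(candidates):
        dominated = any(
            seq_j >= seq_i and str_j >= str_i and (seq_j > seq_i or str_j > str_i)
            for j, (seq_j, str_j) in enumerate(candidates)
            if j != i
        )
        if not dominated:
            front.append((seq_i, str_i))
    return front
```

Each surviving pair corresponds to a different trade-off between the two objectives, which is why no single fixed weighting is needed to enumerate them.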
Workman, Christopher; Krogh, Anders Stærmose
This work investigates whether mRNA has a lower estimated folding free energy than random sequences. The free energy estimates are calculated by the mfold program for prediction of RNA secondary structures. For a set of 46 mRNAs it is shown that the predicted free energy is not significantly...... different from random sequences with the same dinucleotide distribution. For random sequences with the same mononucleotide distribution it has previously been shown that the native mRNA sequences have a lower predicted free energy, which indicates a more stable structure than random sequences. However......, dinucleotide content is important when assessing the significance of predicted free energy as the physical stability of RNA secondary structure is known to depend on dinucleotide base stacking energies. Even known RNA secondary structures, like tRNAs, can be shown to have predicted free energies...
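The distinction between mononucleotide and dinucleotide composition can be illustrated with a short sketch. The helper names are hypothetical, and the shuffle shown here is the simple kind that preserves only single-base counts; a dinucleotide-preserving shuffle (the control discussed above) needs a more involved procedure that is not attempted here.

```python
import random
from collections import Counter


def mono_counts(seq):
    """Count single nucleotides."""
    return Counter(seq)


def di_counts(seq):
    """Count overlapping dinucleotides, e.g. 'AUG' gives AU and UG."""
    return Counter(seq[i:i + 2] for i in range(len(seq) - 1))


def mono_shuffle(seq, rng):
    """Permute the bases: keeps mononucleotide counts, but in general
    does not keep dinucleotide counts."""
    bases = list(seq)
    rng.shuffle(bases)
    return "".join(bases)
```

Comparing `di_counts` before and after `mono_shuffle` on a real sequence shows how much of the neighbour information, and hence the base-stacking context, a plain shuffle discards.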
RNA sequencing (RNA-Seq) is becoming the standard for transcriptome analysis. Removal of contaminating ribosomal RNA (rRNA) is a priority in the preparation of libraries suitable for sequencing. rRNAs are commonly removed from total RNA via either mRNA selection or rRNA depletion. These methods have...
Identifying sets of metastable conformations is a major research topic in RNA energy landscape analysis, and recently several methods have been proposed for finding local minima in landscapes spawned by RNA secondary structures. An important and time-critical component of such methods is steepest, or gradient, descent in attraction basins of local minima. We analyse the speed-up achievable by randomised descent in attraction basins in the context of large sample sets where the size has an order of magnitude in the region of ~10^6. While the gain for each individual sample might be marginal, the overall run-time improvement can be significant. Moreover, for the two nongradient methods we analysed for partial energy landscapes induced by ten different RNA sequences, we obtained that the number of observed local minima is on average larger by 7.3% and 3.5%, respectively. The run-time improvement is approximately 16.6% and 6.8% on average over the ten partial energy landscapes. For the large sample size we selected for descent procedures, the coverage of local minima is very high up to energy values of the region where the samples were randomly selected from the partial energy landscapes; that is, the difference to the total set of local minima is mainly due to the upper area of the energy landscapes.
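The contrast between steepest descent and a randomised descent can be sketched on a toy landscape. This is an assumption about what a nongradient descent might look like (take the first lower-energy neighbour found in random order), not the specific methods analysed in the abstract; the integer states, the energy list, and all function names are hypothetical.

```python
import random


def steepest_descent(start, energy, neighbours):
    """Always move to the lowest-energy neighbour until none is lower."""
    current = start
    while True:
        options = list(neighbours(current))
        best = min(options, key=energy) if options else current
        if energy(best) >= energy(current):
            return current
        current = best


def randomised_descent(start, energy, neighbours, rng):
    """Move to the first lower-energy neighbour met in random order,
    so fewer neighbours are evaluated per step on average."""
    current = start
    while True:
        candidates = list(neighbours(current))
        rng.shuffle(candidates)
        for candidate in candidates:
            if energy(candidate) < energy(current):
                current = candidate
                break
        else:
            # No neighbour is lower: current is a local minimum.
            return current


# Toy one-dimensional landscape: states are indices, neighbours are adjacent indices.
ENERGIES = [5, 3, 4, 2, 1, 6]


def toy_energy(i):
    return ENERGIES[i]


def toy_neighbours(i):
    return [j for j in (i - 1, i + 1) if 0 <= j < len(ENERGIES)]
```

On this toy landscape the two procedures can end in different local minima from the same start, which is consistent with the abstract's observation that nongradient descent finds a somewhat different set of minima.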
Evers, Maurits; Huttner, Michael; Dueck, Anne; Meister, Gunter; Engelmann, Julia C
MicroRNAs (miRNAs) are short regulatory RNAs derived from longer precursor RNAs. miRNA biogenesis has been studied in animals and plants, recently elucidating more complex aspects, such as non-conserved, species-specific, and heterogeneous miRNA precursor populations. Small RNA sequencing data can help in computationally identifying genomic loci of miRNA precursors. The challenge is to predict a valid miRNA precursor from inhomogeneous read coverage from a complex RNA library: while the mature miRNA typically produces many sequence reads, the remaining part of the precursor is covered very sparsely. As recent results suggest, alternative miRNA biogenesis pathways may lead to a more diverse miRNA precursor population than previously assumed. In plants, the latter manifests itself in e.g. complex secondary structures and expression from multiple loci within precursors. Current miRNA identification algorithms often depend on already existing gene annotation, and/or make use of specific miRNA precursor features such as precursor lengths, secondary structures etc. Consequently and in view of the emerging new understanding of a more complex miRNA biogenesis in plants, current tools may fail to characterise organism-specific and heterogeneous miRNA populations. miRA is a new tool to identify miRNA precursors in plants, allowing for heterogeneous and complex precursor populations. miRA requires small RNA sequencing data and a corresponding reference genome, and evaluates precursor secondary structures and precursor processing accuracy; key parameters can be adapted based on the specific organism under investigation. We show that miRA outperforms the currently best plant miRNA prediction tools both in sensitivity and specificity, for data involving Arabidopsis thaliana and the Volvocine algae Chlamydomonas reinhardtii; the latter organism has been shown to exhibit a heterogeneous and complex precursor population with little cross-species miRNA sequence conservation, and
Bateman, Alex; Agrawal, Shipra; Birney, Ewan; Bruford, Elspeth A.; Bujnicki, Janusz M.; Cochrane, Guy; Cole, James R.; Dinger, Marcel E.; Enright, Anton J.; Gardner, Paul P.; Gautheret, Daniel; Griffiths-Jones, Sam; Harrow, Jen; Herrero, Javier; Holmes, Ian H.; Huang, Hsien-Da; Kelly, Krystyna A.; Kersey, Paul; Kozomara, Ana; Lowe, Todd M.; Marz, Manja; Moxon, Simon; Pruitt, Kim D.; Samuelsson, Tore; Stadler, Peter F.; Vilella, Albert J.; Vogel, Jan-Hinnerk; Williams, Kelly P.; Wright, Mathew W.; Zwieb, Christian
During the last decade there has been a great increase in the number of noncoding RNA genes identified, including new classes such as microRNAs and piRNAs. There is also a large growth in the amount of experimental characterization of these RNA components. Despite this growth in information, it is still difficult for researchers to access RNA data, because key data resources for noncoding RNAs have not yet been created. The most pressing omission is the lack of a comprehensive RNA sequence database, much like UniProt, which provides a comprehensive set of protein knowledge. In this article we propose the creation of a new open public resource that we term RNAcentral, which will contain a comprehensive collection of RNA sequences and fill an important gap in the provision of biomedical databases. We envision RNA researchers from all over the world joining a federated RNAcentral network, contributing specialized knowledge and databases. RNAcentral would centralize key data that are currently held across a variety of databases, allowing researchers instant access to a single, unified resource. This resource would facilitate the next generation of RNA research and help drive further discoveries, including those that improve food production and human and animal health. We encourage additional RNA database resources and research groups to join this effort. We aim to obtain international network funding to further this endeavor. PMID:21940779
Stik, Grégoire; Muylkens, Benoît; Coupeau, Damien; Laurent, Sylvie; Dambrine, Ginette; Messmer, Mélanie; Chane-Woon-Ming, Béatrice; Pfeffer, Sébastien; Rasschaert, Denis
The establishment of the microRNA (miRNA) expression signatures is the basic element to investigate the role played by these regulatory molecules in the biology of an organism. Marek's disease virus 1 (MDV-1) is an avian herpesvirus that naturally infects chicken and induces T cells lymphomas. During latency, MDV-1, like other herpesviruses, expresses a limited subset of transcripts. These include three miRNA clusters. Several studies identified the expression of virus and host encoded miRNAs from MDV-1 infected cell cultures and chickens. But a high discrepancy was observed when miRNA cloning frequencies obtained from different cloning and sequencing protocols were compared. Thus, we analyzed the effect of small RNA library preparation and sequencing on the miRNA frequencies obtained from the same RNA samples collected during MDV-1 infection of chicken at different steps of the oncoviral pathogenesis. Qualitative and quantitative variations were found in the data, depending on the strategy used. One of the mature miRNA derived from the latency-associated-transcript (LAT), mdv1-miR-M7-5p, showed the highest variation. Its cloning frequency was 50% of the viral miRNA counts when a small scale sequencing approach was used. Its frequency was 100 times less abundant when determined through the deep sequencing approach. Northern blot analysis showed a better correlation with the miRNA frequencies found by the small scale sequencing approach. By analyzing the cellular miRNA repertoire, we also found a gap between the two sequencing approaches. Collectively, our study indicates that next-generation sequencing data considered alone are limited for assessing the absolute copy number of transcripts. Thus, the quantification of small RNA should be addressed by compiling data obtained by using different techniques such as microarrays, qRT-PCR and NB analysis in support of high throughput sequencing data. These observations should be considered when miRNA variations are studied
Background: Aligning multiple RNA sequences is essential for analyzing non-coding RNAs. Although many alignment methods for non-coding RNAs, including Sankoff's algorithm for strict structural alignments, have been proposed, they are either inaccurate or computationally too expensive. Faster methods with reasonable accuracies are required for genome-scale analyses. Results: We propose a fast algorithm for multiple structural alignments of RNA sequences that is an extension of our pairwise structural alignment method (implemented in SCARNA). The accuracies of the implemented software, MXSCARNA, are at least as favorable as those of state-of-art algorithms that are computationally much more expensive in time and memory. Conclusion: The proposed method for structural alignment of multiple RNA sequences is fast enough for large-scale analyses with accuracies at least comparable to those of existing algorithms. The source code of MXSCARNA and its web server are available at http://mxscarna.ncrna.org.
Background: Secondary structures form the scaffold of multiple sequence alignment of non-coding RNA (ncRNA) families. An accurate reconstruction of ancestral ncRNAs must use this structural signal. However, the inference of ancestors of a single ncRNA family with a single consensus structure may bias the results towards sequences with high affinity to this structure, which are far from the true ancestors. Methods: In this paper, we introduce achARNement, a maximum parsimony approach that, given two alignments of homologous ncRNA families with consensus secondary structures and a phylogenetic tree, simultaneously calculates ancestral RNA sequences for these two families. Results: We test our methodology on simulated data sets, and show that achARNement outperforms classical maximum parsimony approaches in terms of accuracy, but also reduces by several orders of magnitude the number of candidate sequences. To conclude this study, we apply our algorithms on the Glm clan and the FinP-traJ clan from the Rfam database. Conclusions: Our results show that our methods reconstruct small sets of high-quality candidate ancestors with better agreement to the two target structures than with classical approaches. Our program is freely available at: http://csb.cs.mcgill.ca/acharnement .
Vallejos, Catalina A; Risso, Davide; Scialdone, Antonio; Dudoit, Sandrine; Marioni, John C
Single-cell transcriptomics is becoming an important component of the molecular biologist's toolkit. A critical step when analyzing data generated using this technology is normalization. However, normalization is typically performed using methods developed for bulk RNA sequencing or even microarray data, and the suitability of these methods for single-cell transcriptomics has not been assessed. We here discuss commonly used normalization approaches and illustrate how these can produce misleading results. Finally, we present alternative approaches and provide recommendations for single-cell RNA sequencing users.
Kielpinski, Lukasz Jan
In this thesis we describe the development of four related methods for RNA structure probing that utilize massive parallel sequencing. Using them, we were able to gather structural data for multiple, long molecules simultaneously. First, we have established an easy to follow experimental and comp...... with known priming sites....
Faridani, Omid R; Abdullayev, Ilgar; Hagemann-Jensen, Michael; Schell, John P; Lanner, Fredrik; Sandberg, Rickard
Little is known about the heterogeneity of small-RNA expression as small-RNA profiling has so far required large numbers of cells. Here we present a single-cell method for small-RNA sequencing and apply it to naive and primed human embryonic stem cells and cancer cells. Analysis of microRNAs and fragments of tRNAs and small nucleolar RNAs (snoRNAs) reveals the potential of microRNAs as markers for different cell types and states.
Huang Yan Zhao
To clarify the randomness of protein sequences, we make a detailed analysis of a set of typical protein sequences representing each structural class, using a nonlinear prediction method. No deterministic structures are found in these protein sequences, which implies that they behave as random sequences. We also give an explanation for the controversial results obtained in previous investigations.
Background: Myocardial recovery with left ventricular assist device (LVAD) therapy is highly variable and difficult to predict. Next generation ribonucleic acid (RNA) sequencing is an innovative, rapid, and quantitative approach to gene expression profiling in small amounts of tissue. Our primary goal was to identify baseline transcriptional profiles in non-ischemic cardiomyopathies that predict myocardial recovery in response to LVAD therapy. We also sought to verify transcriptional differences between failing and non-failing human hearts. Methods: RNA was isolated from failing (n = 16) and non-failing (n = 8) human hearts. RNA from each patient was reverse transcribed and quantitatively sequenced on the personal genome machine (PGM) sequencer (Ion Torrent) for 95 heart failure candidate genes. Coverage analysis as well as mapping the reads and alignment was done using the Ion Torrent Browser Suite™. Differential expression analyses were conducted by empirical analysis of digital gene expression data in R (edgeR) to identify differentially expressed genes between failing and non-failing groups, and between responder and non-responder groups, respectively. Targeted cardiac gene messenger RNA (mRNA) expression was analyzed in proportion to the total number of reads. Gene expression profiles from the PGM sequencer were validated by performing RNA sequencing (RNAseq) with the Illumina Hiseq2500 sequencing system. Results: The failing sample population was 75% male with an average age of 50 and a left ventricular ejection fraction (LVEF) of 16%. Myosin light chain kinase (MYLK) and interleukin (IL)-6 gene expression were significantly higher in LVAD responders compared to non-responders. Thirty-six cardiac genes were expressed differentially between failing and non-failing hearts (23 decreased, 13 elevated). MYLK, Beta-1 adrenergic receptor (ADRB1) and myosin heavy chain (MYH)-6 expression were among those significantly decreased in failing hearts
Ghoshal, Asish; Shankar, Raghavendran; Bagchi, Saurabh; Grama, Ananth; Chaterji, Somali
MicroRNAs (miRNAs) are small regulatory RNAs that mediate RNA interference by binding to various mRNA target regions. There have been several computational methods for the identification of target mRNAs for miRNAs. However, these have considered all contributory features as scalar representations, primarily as thermodynamic or sequence-based features. Further, a majority of these methods solely target canonical sites, which are sites with "seed" complementarity. Here, we present a machine-learning classification scheme, titled Avishkar, which captures the spatial profile of miRNA-mRNA interactions via smooth B-spline curves, separately for various input features, such as thermodynamic and sequence features. Further, we use a principled approach to uniformly model canonical and non-canonical seed matches, using a novel seed enrichment metric. We demonstrate that a large number of seed-match patterns have high enrichment values, conserved across species, and that the majority of miRNA binding sites involve non-canonical matches, corroborating recent findings. Using spatial curves and popular categorical features, such as target site length and location, we train a linear SVM model, utilizing experimental CLIP-seq data. Our model significantly outperforms all established methods, for both canonical and non-canonical sites. We achieve this while using a much larger candidate miRNA-mRNA interaction set than prior work. We have developed an efficient SVM-based model for miRNA target prediction using recent CLIP-seq data, demonstrating superior performance, evaluated using ROC curves, specifically about 20% better than the state-of-the-art, for different species (human or mouse), or different target types (canonical or non-canonical). To the best of our knowledge, we provide the first distributed framework for microRNA target prediction based on Apache Hadoop and Spark. All source code and data are publicly available at https://bitbucket.org/cellsandmachines/avishkar.
Rodrigues, Nureyev F.; Christoff, Ana P.; da Fonseca, Guilherme C.; Kulcheski, Franceli R.; Margis, Rogerio
Organellar RNA editing involves the modification of nucleotide sequences to maintain conserved protein functions, mainly by reverting non-neutral codon mutations. The loss of plastid editing events, resulting from mutations in RNA editing factors or through stress interference, leads to developmental, physiological and photosynthetic alterations. Recently, next generation sequencing technology has generated the massive discovery of sRNA sequences and expanded the number of sRNA data. Here, we present a method to screen chloroplast RNA editing using public sRNA libraries from Arabidopsis, soybean and rice. We mapped the sRNAs against the nuclear, mitochondrial and plastid genomes to confirm predicted cytosine to uracil (C-to-U) editing events and identify new editing sites in plastids. Among the predicted editing sites, 40.57, 34.78, and 25.31% were confirmed using sRNAs from Arabidopsis, soybean and rice, respectively. SNP analysis revealed 58.2, 43.9, and 37.5% new C-to-U changes in the respective species and identified known and new putative adenosine to inosine (A-to-I) RNA editing in tRNAs. The present method and data reveal the potential of sRNA as a reliable source to identify new and confirm known editing sites. PMID:29033962
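The screening idea in this abstract (confirming C-to-U sites from sRNA reads mapped to the plastid genome) can be sketched as a simple pileup check. In sequenced cDNA, an edited U is read as T, so a candidate site is a reference C covered by a sufficient fraction of T reads. The function name, the pileup layout and both thresholds below are illustrative assumptions, not the authors' pipeline; genes on the minus strand (where the change would appear as G-to-A) are left out for brevity.

```python
def call_c_to_u(reference, pileup, min_depth=10, min_fraction=0.1):
    """Return (position, T fraction) for reference C positions supported by T reads.

    reference: plastid reference sequence in the DNA alphabet (plus strand).
    pileup:    one dict per position mapping a read base to its read count.
    Both thresholds are arbitrary example values, not published cutoffs.
    """
    sites = []
    for pos, (ref_base, counts) in enumerate(zip(reference, pileup)):
        if ref_base != "C":
            continue  # only C positions can carry a C-to-U edit
        depth = sum(counts.values())
        if depth < min_depth:
            continue  # too few reads to judge this position
        fraction = counts.get("T", 0) / depth
        if fraction >= min_fraction:
            sites.append((pos, fraction))
    return sites


# Toy reference and pileup (hypothetical counts).
reference = "ACGC"
pileup = [{"A": 20}, {"C": 5, "T": 15}, {"G": 12}, {"C": 30}]
sites = call_c_to_u(reference, pileup)
print(sites)
```

A real screen would also need to separate editing from genomic SNPs and from reads that originate in nuclear or mitochondrial copies of plastid sequence, which is why the abstract describes mapping against all three genomes.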
Guo, Yan; Bosompem, Amma; Mohan, Sanjay; Erdogan, Begum; Ye, Fei; Vickers, Kasey C; Sheng, Quanhu; Zhao, Shilin; Li, Chung-I; Su, Pei-Fang; Jagasia, Madan; Strickland, Stephen A; Griffiths, Elizabeth A; Kim, Annette S
Although advances in sequencing technologies have popularized the use of microRNA (miRNA) sequencing (miRNA-seq) for the quantification of miRNA expression, questions remain concerning the optimal methodologies for analysis and utilization of the data. The construction of a miRNA sequencing library selects RNA by length rather than type. However, as we have previously described, miRNAs represent only a subset of the species obtained by size selection. Consequently, the libraries obtained for miRNA sequencing also contain a variety of additional species of small RNAs. This study looks at the prevalence of these other species obtained from bone marrow aspirate specimens and explores the predictive value of these small RNAs in the determination of response to therapy in myelodysplastic syndromes (MDS). Paired pre- and post-treatment bone marrow aspirate specimens were obtained from patients with MDS who were treated with either azacytidine or decitabine (24 pre-treatment specimens, 23 post-treatment specimens) with 22 additional non-MDS control specimens. Total RNA was extracted from these specimens and submitted for next generation sequencing after an additional size exclusion step to enrich for small RNAs. The species of small RNAs were enumerated, single nucleotide variants (SNVs) identified, and finally the differential expression of tRNA-derived species (tDRs) in the specimens correlated with disease status and response to therapy. Using miRNA sequencing data generated from bone marrow aspirate samples of patients with known MDS (N = 47) and controls (N = 23), we demonstrated that transfer RNA (tRNA) fragments (specifically tRNA halves, tRHs) are one of the most common species of small RNA isolated from size selection. Using tRNA expression values extracted from miRNA sequencing data, we identified six tRNA fragments that are differentially expressed between MDS and normal samples. Using the elastic net method, we identified four tRNAs-derived small RNAs (t
Satellite RNAs (satRNAs) are small noncoding subviral RNA pathogens in plants that depend on helper viruses for replication and spread. Despite many decades of research, the origin of satRNAs remains unknown. In this study we show that a β-glucuronidase (GUS) transgene fused with a Cucumber mosaic virus (CMV) Y satellite RNA (Y-Sat) sequence (35S-GUS:Sat) was transcriptionally repressed in N. tabacum in comparison to a 35S-GUS transgene that did not contain the Y-Sat sequence. This repression was not due to DNA methylation at the 35S promoter, but was associated with specific DNA methylation at the Y-Sat sequence. Both northern blot hybridization and small RNA deep sequencing detected 24-nt siRNAs in wild-type Nicotiana plants with sequence homology to Y-Sat, suggesting that the N. tabacum genome contains Y-Sat-like sequences that give rise to 24-nt sRNAs capable of guiding RNA-directed DNA methylation (RdDM) to the Y-Sat sequence in the 35S-GUS:Sat transgene. Consistent with this, Southern blot hybridization detected multiple DNA bands in Nicotiana plants that had sequence homology to Y-Sat, suggesting that Y-Sat-like sequences exist in the Nicotiana genome as repetitive DNA, a DNA feature associated with 24-nt sRNAs. Our results point to a host genome origin for CMV satRNAs, and suggest a novel approach of using small RNA sequences for finding the origin of other satRNAs.
A quaternion representation of nucleotides is proposed, with representation of RNA sequences by vectors whose elements are quaternions. Structure and transition matrices in quaternion representation are defined. Correspondence between diagrammatic technique in complex-number and quaternion representation of nucleotides is delineated.
Bentin, Thomas; Nielsen, Michael L
A recent paper in Science by Li et al. (2011) reports widespread sequence differences in the human transcriptome between RNAs and their encoding genes termed RNA-DNA differences (RDDs). The findings could add a new layer of complexity to gene expression but the study has been criticized. ...
Nierychlo, Marta; Larsen, Poul; Jørgensen, Mads Koustrup
16S rRNA gene amplicon sequencing has been developed over the past few years and is now ready to use for more comprehensive studies related to plant operation and optimization thanks to short analysis time, low cost, high throughput, and high taxonomic resolution. In this study we show how 16S r...
Kirsty M Danielson
The presence and relative stability of extracellular RNAs (exRNAs) in biofluids has led to an emerging recognition of their promise as 'liquid biopsies' for diseases. Most prior studies on discovery of exRNAs as disease-specific biomarkers have focused on microRNAs (miRNAs) using technologies such as qRT-PCR and microarrays. The recent application of next-generation sequencing to discovery of exRNA biomarkers has revealed the presence of potential novel miRNAs as well as other RNA species such as tRNAs, snoRNAs, piRNAs and lncRNAs in biofluids. At the same time, the use of RNA sequencing for biofluids poses unique challenges, including low amounts of input RNAs, the presence of exRNAs in different compartments with varying degrees of vulnerability to isolation techniques, and the high abundance of specific RNA species (thereby limiting the sensitivity of detection of less abundant species). Moreover, discovery in human diseases often relies on archival biospecimens of varying age and limiting amounts of samples. In this study, we have tested RNA isolation methods to optimize profiling exRNAs by RNA sequencing in individuals without any known diseases. Our findings are consistent with other recent studies that detect microRNAs and ribosomal RNAs as the major exRNA species in plasma. Similar to other recent studies, we found that the landscape of biofluid microRNA transcriptome is dominated by several abundant microRNAs that appear to comprise conserved extracellular miRNAs. There is reasonable correlation of sets of conserved miRNAs across biological replicates, and even across other data sets obtained at different investigative sites. Conversely, the detection of less abundant miRNAs is far more dependent on the exact methodology of RNA isolation and profiling. This study highlights the challenges in detecting and quantifying less abundant plasma miRNAs in health and disease using RNA sequencing platforms.
Burgos, Kasandra L; Van Keuren-Jensen, Kendall
There are a number of considerations when choosing protocols both upstream and downstream of Next-Generation Sequencing experiments. On the front end, purification methods, additives, and residuum can often inhibit the sensitive chemistries by which sequencing-by-synthesis is performed. On the back end, data handling, analysis software packages, and pipelines can also impact sequencing outcomes. The current chapter will describe stepwise how acellular biofluid samples are prepared for small RNA sequencing. With regard to purification methods, we found that small RNA yield can be improved considerably by following the total RNA isolation protocol included with Ambion's mirVana PARIS Kit but modifying the organic extraction step. Specifically, after transferring the upper aqueous phase to a fresh tube, water is added to the residual material (interphase and lower organic layer) and again phase-separated. In contrast, all the protocols provided with the commercially available kits at the time of this chapter publication require only one organic extraction. This simple yet, as it turns out, quite useful modification allows access to previously inaccessible material. Potential benefits from these changes are a more comprehensive sample profiling of small RNA, as well as wider access to small volume samples, such as is typically available for acellular biofluids, which now can be prepared for small RNA sequencing on the Illumina platform.
Myler Peter J
Background: The protozoan pathogens Leishmania major, Trypanosoma brucei and Trypanosoma cruzi (the Tritryps) are parasites that produce devastating human diseases. These organisms show very unusual mechanisms of gene expression, such as polycistronic transcription. We are interested in the study of tRNA genes, which are transcribed by RNA polymerase III (Pol III). To analyze the sequences and genomic organization of tRNA genes and other Pol III-transcribed genes, we have performed an in silico analysis of the Tritryps genome sequences. Results: Our analysis indicated the presence of 83, 66 and 120 genes in L. major, T. brucei and T. cruzi, respectively. These numbers include several previously unannotated selenocysteine (Sec) tRNA genes. Most tRNA genes are organized into clusters of 2 to 10 genes that may contain other Pol III-transcribed genes. The distribution of genes in the L. major genome does not seem to be totally random, as in most organisms. While the majority of the tRNA clusters do not show synteny (conservation of gene order) between the Tritryps, a cluster of 13 Pol III genes that is highly syntenic was identified. We have determined consensus sequences for the putative promoter regions (Boxes A and B) of the Tritryps tRNA genes, and specific changes were found in tRNA-Sec genes. Analysis of transcription termination signals of the tRNAs (clusters of Ts) showed differences between T. cruzi and the other two species. We have also identified several tRNA isodecoder genes (having the same anticodon, but different sequences elsewhere in the tRNA body) in the Tritryps. Conclusion: A low number of tRNA genes is present in Tritryps. The overall weak synteny that they show indicates a reduced importance of genome location of Pol III genes compared to protein-coding genes. The fact that some of the differences between isodecoder genes occur in the internal promoter elements suggests that differential control of the expression of some
BACKGROUND: RNA interference (RNAi), mediated by small interfering RNA (siRNA), is an effective method used to silence gene expression at the post-transcriptional level. Upon introduction into target cells, siRNAs incorporate into the RNA-induced silencing complex (RISC). The antisense strand of the siRNA duplex then "guides" the RISC to the homologous mRNA, leading to target degradation and gene silencing. In recent years, various vector-based siRNA expression systems have been developed which utilize opposing polymerase III promoters to independently drive expression of the sense and antisense strands of the siRNA duplex from the same template. PRINCIPAL FINDINGS: We show here the use of a ligase chain reaction (LCR) to develop a new vector system called pInv-H1 in which a DNA sequence encoding a specific siRNA is placed between two inverted minimal human H1 promoters (approximately 100 bp each). Expression of functional siRNAs from this construct has led to efficient silencing of both reporter and endogenous genes. Furthermore, the inverted H1 promoter-siRNA expression cassette was used to generate a retrovirus vector capable of transducing and silencing expression of the targeted protein by >80% in target cells. CONCLUSIONS: The unique design of this construct allows for the efficient exchange of siRNA sequences by the directional cloning of short oligonucleotides via asymmetric restriction sites. This provides a convenient way to test the functionality of different siRNA sequences. Delivery of the siRNA cassette by retroviral transduction suggests that a single copy of the siRNA expression cassette efficiently knocks down gene expression at the protein level. We note that this vector system can potentially be used to generate a random siRNA library. The flexibility of the ligase chain reaction suggests that additional control elements can easily be introduced into this siRNA expression cassette.
Vitsios, Dimitrios M; Enright, Anton J
Chimira is a web-based system for microRNA (miRNA) analysis from small RNA-Seq data. Sequences are automatically cleaned, trimmed, size selected and mapped directly to miRNA hairpin sequences. This generates count-based miRNA expression data for subsequent statistical analysis. Moreover, it is capable of identifying epi-transcriptomic modifications in the input sequences. Supported modification types include multiple types of 3'-modifications (e.g. uridylation, adenylation), 5'-modifications and also internal modifications or variation (ADAR editing or single nucleotide polymorphisms). Besides cleaning and mapping of input sequences to miRNAs, Chimira provides a simple and intuitive set of tools for the analysis and interpretation of the results (see also Supplementary Material). These allow the visual study of the differential expression between two specific samples or sets of samples, the identification of the most highly expressed miRNAs within sample pairs (or sets of samples) and also the projection of the modification profile for specific miRNAs across all samples. Other tools have already been published for various types of small RNA-Seq analysis, such as UEA workbench, seqBuster, MAGI, OASIS and CAP-miRSeq, as well as CPSS for the identification of modifications. A comprehensive comparison of Chimira with each of these tools is provided in the Supplementary Material. Chimira outperforms all of these tools in total execution speed and aims to facilitate simple, fast and reliable analysis of small RNA-Seq data, also allowing, for the first time, identification of global microRNA modification profiles in a simple intuitive interface. Chimira has been developed as a web application and it is accessible here: http://www.ebi.ac.uk/research/enright/software/chimira. Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press.
Next-generation sequencing now for the first time allows researchers to gauge the depth and variation of entire transcriptomes. However, now that rare transcripts present in cells at single copies can be detected, more advanced computational tools are needed to accurately annotate and profile them. miRNAs are 22 nucleotide small RNAs (sRNAs) that post-transcriptionally reduce the output of protein coding genes. They have established roles in numerous biological processes, including cancers and other diseases. During miRNA biogenesis, the sRNAs are sequentially cleaved from precursor molecules that have a characteristic hairpin RNA structure. The vast majority of new miRNA genes that are discovered are mined from small RNA sequencing (sRNA-seq), which can detect more than a billion RNAs in a single run. However, given that many of the detected RNAs are degradation products from all types of transcripts, the accurate identification of miRNAs remains a non-trivial computational problem. Here we review the tools available to predict animal miRNAs from sRNA sequencing data. We present tools for generalist and specialist use cases, including prediction from massively pooled data or in species without reference genome. We also present wet-lab methods used to validate predicted miRNAs, and approaches to computationally benchmark prediction accuracy. For each tool, we reference validation experiments and benchmarking efforts. Last, we discuss the future of the field.
Wang, R F; Cao, W W; Slavik, M F
In this study, modification of two methods of RNA sequencing resulted in more definitive sequencing bands. In one method of sequencing, the bands of lane A and lane G sometimes were not clear. Modifications of this method by changing the concentrations of ddATP and ddGTP resulted in the bands of lane A and lane G becoming more readable. Although a second sequencing method was found to have clearer bands than the first method, and the bases immediately downstream from the primer binding site could be read by using γ-32P-labeled primer, the bands on the top of lane A still were not clear. Modifications of this second method by changing the ddATP/dATP ratio resulted in the bands of lane A becoming much clearer.
Gómez Lozano, María; Marvig, Rasmus Lykke; Molin, Søren
Small regulatory RNAs (sRNAs) in bacteria are known to modulate gene expression and control a variety of processes including metabolic reactions, stress responses, and pathogenesis in response to environmental signals. A method to identify bacterial sRNAs on a genome-wide scale based on RNA sequencing (RNA-seq) is described that involves the preparation and analysis of three different sequencing libraries. As a significant number of unique sRNAs are identified in each library, the libraries can be used either alone or in combination to increase the number of sRNAs identified. The approach may be applied to identify sRNAs in any bacterium under different growth and stress conditions.
Neonatal dried blood spots (DBS) are routinely collected on standard Guthrie cards for all-comprising national newborn screening programs for inborn errors of metabolism, hypothyroidism and other diseases. In Denmark, the Guthrie cards are stored at −20 °C in the Danish Neonatal Screening Biobank and each sample is linked to elaborate social and medical registries. This provides a unique biospecimen repository to enable large population research at a perinatal level. Here, we demonstrate the feasibility of obtaining gene expression data from DBS using next-generation RNA sequencing (RNA-seq). RNA-seq was performed on five males and five females. Sequencing results have an average of >30 million reads per sample. 26,799 annotated features can be identified, with 64% of features detectable without a fragments per kilobase of transcript per million mapped reads (FPKM) cutoff; the number of detectable features dropped to 18% when FPKM ≥ 1. Sex can be discriminated using a blood-based sex-specific gene set identified by the Genotype-Tissue Expression consortium. Here, we demonstrate the feasibility of acquiring biologically relevant gene expression from DBS using RNA-seq, which provides a new avenue to investigate perinatal diseases in a high throughput manner.
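The FPKM cutoff used in this abstract follows from the standard definition of fragments per kilobase of transcript per million mapped fragments. The short sketch below shows that arithmetic and how a cutoff changes the fraction of "detectable" features; the gene names, counts and lengths are made-up toy values, and the helper names are assumptions rather than part of any published pipeline.

```python
def fpkm(fragments, length_bp, total_mapped_fragments):
    """Fragments per kilobase of transcript per million mapped fragments."""
    return fragments * 1e9 / (length_bp * total_mapped_fragments)


def fraction_detected(values, cutoff=None):
    """Share of features detected: any signal if cutoff is None, else FPKM >= cutoff."""
    if cutoff is None:
        detected = [v for v in values if v > 0]
    else:
        detected = [v for v in values if v >= cutoff]
    return len(detected) / len(values)


# Toy data: feature -> (fragment count, feature length in bp); values are invented.
features = {"geneA": (500, 2000), "geneB": (3, 1500), "geneC": (0, 1000)}
total_mapped = 30_000_000  # roughly the read depth quoted in the abstract

values = [fpkm(n, length, total_mapped) for n, length in features.values()]
print(values)
print(fraction_detected(values), fraction_detected(values, cutoff=1.0))
```

With these toy numbers, two of three features show some signal but only one passes FPKM ≥ 1, which mirrors the kind of drop the abstract reports (64% to 18%) when a cutoff is applied.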
Bartonek, Lukas; Zagrovic, Bojan
It has recently been demonstrated that the nucleobase-density profiles of mRNA coding sequences are related in a complementary manner to the nucleobase-affinity profiles of their cognate protein sequences. Based on this, it has been proposed that cognate mRNA/protein pairs may bind in a co-aligned manner, especially if unstructured. Here, we study the dependence of mRNA/protein sequence complementarity on the properties of the nucleobase/amino-acid affinity scales used. Specifically, we sample the space of randomly generated scales by employing a Monte Carlo strategy with a fitness function that depends directly on the level of complementarity. For model organisms representing all three domains of life, we show that even short searches reproducibly converge upon highly optimized scales, implying that the topology of the underlying fitness landscape is decidedly funnel-like. Furthermore, the optimized scales, generated without any consideration of the physicochemical attributes of nucleobases or amino acids, resemble closely the nucleobase/amino-acid binding affinity scales obtained from experimental structures of RNA-protein complexes. This provides support for the claim that mRNA/protein sequence complementarity may indeed be related to binding between the two. Finally, we characterize suboptimal scales and show that intermediate-to-high complementarity can be reached by substantially diverse scales, but with select amino acids contributing disproportionally. Our results expose the dependence of cognate mRNA/protein sequence complementarity on the properties of the underlying nucleobase/amino-acid affinity scales and provide quantitative constraints that any physical scales need to satisfy for the complementarity to hold.
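The search strategy described here (sampling random amino-acid scales and keeping those that raise complementarity with the mRNA profile) can be sketched as a simple stochastic hill climb. Everything specific below is an assumption for illustration: purine density per codon stands in for the nucleobase-density profile, the absolute Pearson correlation stands in for the fitness function, no window averaging is applied, and moves are accepted greedily rather than with a temperature-dependent criterion. The sequence is a toy example, not data from the study.

```python
import random
from statistics import mean

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def purine_density(mrna):
    """Fraction of purines (A or G) in each codon of a coding sequence."""
    return [sum(b in "AG" for b in mrna[i:i + 3]) / 3
            for i in range(0, len(mrna) - 2, 3)]


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 if either profile is constant."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / (var_x * var_y) ** 0.5


def fitness(scale, protein, mrna_profile):
    """Complementarity proxy: |correlation| of protein and mRNA profiles."""
    return abs(pearson([scale[aa] for aa in protein], mrna_profile))


def optimize_scale(protein, mrna_profile, steps=2000, seed=0):
    """Greedy Monte Carlo search over random scales (zero-temperature sketch)."""
    rng = random.Random(seed)
    scale = {aa: rng.random() for aa in AMINO_ACIDS}
    start = best = fitness(scale, protein, mrna_profile)
    for _ in range(steps):
        aa = rng.choice(AMINO_ACIDS)
        old = scale[aa]
        scale[aa] = rng.random()          # propose a new value for one residue
        trial = fitness(scale, protein, mrna_profile)
        if trial >= best:
            best = trial                  # keep the move
        else:
            scale[aa] = old               # revert the move
    return scale, start, best


# Toy cognate pair: the mRNA below encodes MKTAYIAKQR.
protein = "MKTAYIAKQR"
mrna = "AUGAAAACCGCGUAUAUUGCCAAACAGCGU"
scale, start, best = optimize_scale(protein, purine_density(mrna))
print(round(start, 3), round(best, 3))
```

On a single short sequence such a search overfits easily; the abstract's result concerns whole proteomes, where convergence of independent runs onto similar scales is what carries the argument.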
restriction fragment length polymorphism; RAPD, random amplified polymorphic DNA; An. step, Anopheles stephensi; An. quad,. Anopheles ... interesting feature of the sequences was a stretch of Ts that distinguished between Aedes and Culex on the one hand, and ... genome structure and complexity of mosquito species.
Thomas Birkballe Hansen
BACKGROUND: miRNAs are key players in gene expression regulation. To fully understand the complex nature of cellular differentiation or initiation and progression of disease, it is important to assess the expression patterns of as many miRNAs as possible. Thereby, identifying novel miRNAs is an essential prerequisite to make possible a comprehensive and coherent understanding of cellular biology. METHODOLOGY/PRINCIPAL FINDINGS: Based on two extensive, but previously published, small RNA sequence datasets from human embryonic stem cells and human embryoid bodies, respectively, we identified 112 novel miRNA-like structures and were able to validate miRNA processing in 12 out of 17 investigated cases. Several miRNA candidates were furthermore substantiated by including additional available small RNA datasets, thereby demonstrating the power of combining datasets to identify miRNAs that otherwise may be assigned as experimental noise. CONCLUSIONS/SIGNIFICANCE: Our analysis highlights that existing datasets are not yet exhaustively studied and continuous re-analysis of the available data is important to uncover all features of small RNA sequencing.
Lama, Lodoe; Ryan, Kevin
Many high-throughput small RNA next-generation sequencing protocols use 5' preadenylylated DNA oligonucleotide adapters during cDNA library preparation. Preadenylylation of the DNA adapter's 5' end frees from ATP-dependence the ligation of the adapter to RNA collections, thereby avoiding ATP-dependent side reactions. However, preadenylylation of the DNA adapters can be costly and difficult. The currently available method for chemical adenylylation of DNA adapters is inefficient and uses techniques not typically practiced in laboratories profiling cellular RNA expression. An alternative enzymatic method using a commercial RNA ligase was recently introduced, but this enzyme works best as a stoichiometric adenylylating reagent rather than a catalyst and can therefore prove costly when several variant adapters are needed or during scale-up or high-throughput adenylylation procedures. Here, we describe a simple, scalable, and highly efficient method for the 5' adenylylation of DNA oligonucleotides using the thermostable RNA ligase 1 from bacteriophage TS2126. Adapters with 3' blocking groups are adenylylated at >95% yield at catalytic enzyme-to-adapter ratios and need not be gel purified before ligation to RNA acceptors. Experimental conditions are also reported that enable DNA adapters with free 3' ends to be 5' adenylylated at >90% efficiency. © 2015 Lama and Ryan; Published by Cold Spring Harbor Laboratory Press for the RNA Society.
Fan, Cuiqing; Xiong, Yuan; Zhu, Ning; Lu, Yabin; Zhang, Jiewen; Wang, Song; Liang, Zicai; Shen, Yan; Chen, Meihong
Cancers are characterized by poor differentiation. Differentiation therapy is a strategy to alleviate malignant phenotypes by inducing cancer cell differentiation. Here we carried out a combinatorial high-throughput screen with a random siRNA library on human erythroleukemia K-562 cell differentiation. Two siRNAs screened from the library were validated to be able to induce erythroid differentiation to varying degrees, determined by CD235 and globin up-regulation, GATA-2 down-regulation, and cell growth inhibition. The screen we performed here is the first trial of screening cancer differentiation-inducing agents from a random siRNA library, demonstrating that a random siRNA library can be considered as a new resource in efforts to seek new therapeutic agents for cancers. As a random siRNA library has a broad coverage for the entire genome, including known/unknown genes and protein coding/non-coding sequences, screening using a random siRNA library can be expected to greatly augment the repertoire of therapeutic siRNAs for cancers.
Gorodkin, Jan; Heyer, L.J.; Stormo, G.D.
We present a computational scheme to locally align a collection of RNA sequences using sequence and structure constraints. In addition, the method searches for the resulting alignments with the most significant common motifs, among all possible collections. The first part utilizes a simplified......, but the core algorithm assures that the pairwise alignments are optimized for both sequence and structure conservation. The choice of scoring system and the method of progressively constructing the final solution are important considerations that are discussed. Example solutions, and comparisons with other...
Andersen, Jørgen E; Chekhov, Leonid O.; Penner, Robert C
In the present article, we review a derivation of the numbers of RNA complexes of an arbitrary topology. These numbers are encoded in the free energy of the Hermitian matrix model with potential V(x) = x^2/2 - stx/(1 - tx), where s and t are respective generating parameters for the number of RNA...... molecules and hydrogen bonds in a given complex. The free energies of this matrix model are computed using the so-called topological recursion, which is a powerful new formalism arising from random matrix theory. These numbers of RNA complexes also have profound meaning in mathematics: they provide...
Guo, Li; Zhao, Yang; Zhang, Hui; Yang, Sheng; Chen, Feng
MicroRNAs (miRNAs) are crucial negative regulators of gene expression at the post-transcriptional level. Next-generation sequencing technologies have identified a series of miRNA variants (named isomiRs). In this study, paralogous isomiR assemblies (from the miRNA locus) were systematically analyzed based on data acquired from deep sequencing data sets. Evolutionary analysis of paralogous (members in miRNA gene family in a specific species) and orthologous (across different animal species) miRNAs was also performed. The sequence diversity of paralogous isomiRs was found to be similar to the diversity of paralogous and orthologous miRNAs. Additionally, both isomiRs and paralogous/orthologous miRNAs were implicated in 5' and 3' ends (especially 3' ends), nucleotide substitutions, and insertions and deletions. Generally, multiple isomiRs can be produced from a single miRNA locus, but most of them had lower enrichment levels, and only several dominant isomiR sequences were detected. These dominant isomiR groups were always stable, and one of them would be selected as the most abundant miRNA sequence in specific animal species. Some isomiRs might be consistent with miRNA sequences in some species but not in others. Homologous miRNAs were often detected in similar isomiR repertoires, and showed similar expression patterns, while dominant isomiRs showed complex evolutionary patterns from miRNA sequences across the animal kingdom. These results indicate that the phenomenon of multiple isomiRs is not a random event, but rather the result of evolutionary pressures. The existence of multiple isomiRs enables different species to express advantageous sequences in different environments. Thus, dominant sequences emerge in response to functional and evolutionary pressures, allowing an organism to adapt to complex intra- and extra-cellular events. © 2013.
Persson, Helena; Søkilde, Rolf; Pirona, Anna Chiara; Rovira, Carlos
MicroRNAs (miRNAs) are ~22-nucleotide-long small non-coding RNAs that regulate the expression of protein-coding genes by base pairing to partially complementary target sites, preferentially located in the 3´ untranslated region (UTR) of target mRNAs. The expression and function of miRNAs have been extensively studied in human disease, as well as the possibility of using these molecules as biomarkers for prognostication and treatment guidance. To identify and validate miRNAs as biomarkers, their expression must be screened in large collections of patient samples. Here, we develop a scalable protocol for the rapid and economical preparation of a large number of small RNA sequencing libraries using dual indexing for multiplexing. Combined with the use of off-the-shelf reagents, more samples can be sequenced simultaneously on large-scale sequencing platforms at a considerably lower cost per sample. Sample preparation is simplified by pooling libraries prior to gel purification, which allows for the selection of a narrow size range while minimizing sample variation. A comparison with publicly available data from benchmarking of miRNA analysis platforms showed that this method captures absolute and differential expression as effectively as commercially available alternatives.
Khodakova, Anastasia S.; Smith, Renee J.; Burgoyne, Leigh; Abarno, Damien; Linacre, Adrian
Here we assess the ability of random whole metagenomic sequencing approaches to discriminate between similar soils from two geographically distinct urban sites for application in forensic science. Repeat samples from two parklands in residential areas separated by approximately 3 km were collected and the DNA was extracted. Shotgun, whole genome amplification (WGA) and single arbitrarily primed DNA amplification (AP-PCR) based sequencing techniques were then used to generate soil metagenomic profiles. Full and subsampled metagenomic datasets were then annotated against M5NR/M5RNA (taxonomic classification) and SEED Subsystems (metabolic classification) databases. Further comparative analyses were performed using a number of statistical tools including: hierarchical agglomerative clustering (CLUSTER); similarity profile analysis (SIMPROF); non-metric multidimensional scaling (NMDS); and canonical analysis of principal coordinates (CAP) at all major levels of taxonomic and metabolic classification. Our data showed that shotgun and WGA-based approaches generated highly similar metagenomic profiles for the soil samples such that the soil samples could not be distinguished accurately. An AP-PCR based approach was shown to be successful at obtaining reproducible site-specific metagenomic DNA profiles, which in turn were employed for successful discrimination of visually similar soil samples collected from two different locations. PMID:25111003
Rebecca M. Davidson
Transcriptome sequencing is a powerful method for studying global expression patterns in large, complex genomes. Evaluation of sequence-based expression profiles during reproductive development would provide functional annotation to genes underlying agronomic traits. We generated transcriptome profiles for 12 diverse maize (Zea mays L.) reproductive tissues representing male, female, developing seed, and leaf tissues using high throughput transcriptome sequencing. Overall, ∼80% of annotated genes were expressed. Comparative analysis between sequence and hybridization-based methods demonstrated the utility of ribonucleic acid sequencing (RNA-seq) for expression determination and differentiation of paralogous genes (∼85% of maize genes). Analysis of 4975 gene families across reproductive tissues revealed expression divergence is proportional to family size. In all pairwise comparisons between tissues, 7 (pre- vs. postemergence cobs) to 48% (pollen vs. ovule) of genes were differentially expressed. Genes with expression restricted to a single tissue within this study were identified with the highest numbers observed in leaves, endosperm, and pollen. Coexpression network analysis identified 17 gene modules with complex and shared expression patterns containing many previously described maize genes. The data and analyses in this study provide valuable tools through improved gene annotation, gene family characterization, and a core set of candidate genes to further characterize maize reproductive development and improve grain yield potential.
Mazzoni, Gianluca; Kogelman, Lisette; Suravajhala, Prashanth
Next generation sequencing technologies have enabled the generation of huge quantities of biological data, and nowadays extensive datasets at different ‘omics levels have been generated. Systems genetics is a powerful approach that allows to integrate different ‘omics level and understand...... non-coding RNAs (ncRNAs). The integration of transcriptomics data with genomic data in a systems genetics context represents a valuable possibility to go deep into the causal and regulatory mechanisms that generate complex traits and diseases. However RNA-Seq data have to be treated carefully...... and the choice of the right methodology could have a great impact on the final results. Furthermore the integration of different level is not trivial. Here we give a comprehensive systems genetics overview of the methods and tools for analysis and the integration of RNA-Seq data including ncRNAs. We focused...
Van Goethem, Alan; Mestdagh, Pieter; Van Maerken, Tom; Vandesompele, Jo
miRNAs are small noncoding RNA molecules that function as regulators of gene expression. Deregulated miRNA expression has been reported in various diseases including cancer. Due to their small size and high degree of homology, accurate quantification of miRNA expression is technically challenging. In this chapter, we present two different technologies for miRNA quantification: small RNA sequencing and RT-qPCR.
Colorectal cancer (CRC) is one of the leading causes of cancer related deaths and the search for prognostic biomarkers that might improve treatment decisions is warranted. MicroRNAs (miRNAs) are short non-coding RNA molecules involved in regulating gene expression and have been proposed as possible biomarkers in CRC. In order to characterize the miRNA transcriptome, a large cohort including 88 CRC tumors with long-term follow-up was deep sequenced. 523 mature miRNAs were expressed in our cohort, and they exhibited largely uniform expression patterns across tumor samples. Few associations were found between clinical parameters and miRNA expression, among them, low expression of miR-592 and high expression of miR-10b-5p and miR-615-3p were associated with tumors located in the right colon relative to the left colon and rectum. High expression of miR-615-3p was also associated with poorly differentiated tumors. No prognostic biomarker candidates for overall and metastasis-free survival were identified by applying the LASSO method in a Cox proportional hazards model or univariate Cox. Examination of the five most abundantly expressed miRNAs in the cohort (miR-10a-5p, miR-21-5p, miR-22-3p, miR-143-3p and miR-192-5p) revealed that their collective expression represented 54% of the detected miRNA sequences. Pathway analysis of the target genes regulated by the five most highly expressed miRNAs uncovered a significant number of genes involved in the CRC pathway, including APC, TGFβ and PI3K, thus suggesting that these miRNAs are relevant in CRC.
O'Rourke, Jamie A.; Bolon, Yung-Tsi; Bucciarelli, Bruna; Vance, Carroll P.
Background The legume family (Leguminosae) consists of approx. 17 000 species. A few of these species, including, but not limited to, Phaseolus vulgaris, Cicer arietinum and Cajanus cajan, are important dietary components, providing protein for approx. 300 million people worldwide. Additional species, including soybean (Glycine max) and alfalfa (Medicago sativa), are important crops utilized mainly in animal feed. In addition, legumes are important contributors to biological nitrogen, forming symbiotic relationships with rhizobia to fix atmospheric N2 and providing up to 30 % of available nitrogen for the next season of crops. The application of high-throughput genomic technologies including genome sequencing projects, genome re-sequencing (DNA-seq) and transcriptome sequencing (RNA-seq) by the legume research community has provided major insights into genome evolution, genomic architecture and domestication. Scope and Conclusions This review presents an overview of the current state of legume genomics and explores the role that next-generation sequencing technologies play in advancing legume genomics. The adoption of next-generation sequencing and implementation of associated bioinformatic tools has allowed researchers to turn each species of interest into their own model organism. To illustrate the power of next-generation sequencing, an in-depth overview of the transcriptomes of both soybean and white lupin (Lupinus albus) is provided. The soybean transcriptome focuses on analysing seed development in two near-isogenic lines, examining the role of transporters, oil biosynthesis and nitrogen utilization. The white lupin transcriptome analysis examines how phosphate deficiency alters gene expression patterns, inducing the formation of cluster roots. Such studies illustrate the power of next-generation sequencing and bioinformatic analyses in elucidating the gene networks underlying biological processes. PMID:24769535
Bilichak, Andriy; Golubov, Andrey; Kovalchuk, Igor
The discovery of small RNAs in plants and animals almost two decades ago attracted significant interest in the epigenetic regulation of gene expression and in the practical application of the gained knowledge in applied studies. New and sometimes unexpected functions have been ascribed to sRNAs almost every couple of years since their discovery, indicating that the complete role of sRNAs in plant and animal physiology is still barely understood. Next-generation sequencing technologies make it possible to generate high-resolution profiles of sRNAs for subsequent analysis and possibly to discover novel functions of sRNAs. In this chapter, we provide brief guidelines for sRNA library preparation in plants and a practical approach that can be implemented to overcome possible difficulties with sequencing library generation.
Xie, Fuliang; Jones, Don C; Wang, Qinglian; Sun, Runrun; Zhang, Baohong
MicroRNAs (miRNAs) have been found to be differentially expressed during cotton fibre development. However, which specific miRNAs and how they are involved in fibre development is unclear. Here, using deep sequencing, 65 conserved miRNA families were identified and 32 families were differentially expressed between leaf and ovule. At least 40 miRNAs were either leaf or ovule specific, whereas 62 miRNAs were shared in both leaf and ovule. qRT-PCR confirmed these miRNAs were differentially expressed during fibre early development. A total of 820 genes were potentially targeted by the identified miRNAs, whose functions are involved in a series of biological processes including fibre development, metabolism and signal transduction. Many predicted miRNA-target pairs were subsequently validated by degradome sequencing analysis. GO and KEGG analyses showed that the identified miRNAs and their targets were classified to 1027 GO terms including 568 biological processes, 324 molecular functions and 135 cellular components and were enriched to 78 KEGG pathways. At least seven unique miRNAs participate in the trichome regulatory interaction network. Eleven trans-acting siRNA (tasiRNA) candidate genes were also identified in cotton. One has never been found in other plant species and two of them were derived from MYB and ARF, both of which play important roles in cotton fibre development. Sixteen genes were predicted to be tasiRNA targets, including sucrose synthase and MYB2. Together, this study discovered new miRNAs in cotton and offered evidence that miRNAs play important roles in cotton ovule/fibre development. The identification of tasiRNA genes and their targets broadens our understanding of the complicated regulatory mechanism of miRNAs in cotton. © 2015 Society for Experimental Biology, Association of Applied Biologists and John Wiley & Sons Ltd.
Soreq, Lilach; Guffanti, Alessandro; Salomonis, Nathan; Simchovitz, Alon; Israel, Zvi; Bergman, Hagai; Soreq, Hermona
The continuously prolonged human lifespan is accompanied by an increase in the incidence of neurodegenerative diseases, calling for the development of inexpensive blood-based diagnostics. Analyzing blood cell transcripts by RNA-Seq is a robust means to identify novel biomarkers that is rapidly becoming commonplace. However, there is a lack of tools to discover novel exons, junctions and splicing events and to precisely and sensitively assess differential splicing through RNA-Seq data analysis and across RNA-Seq platforms. Here, we present a new and comprehensive computational workflow for whole-transcriptome RNA-Seq analysis, using an updated version of the software AltAnalyze, to identify both known and novel high-confidence alternative splicing events, and to integrate them with both protein-domains and microRNA binding annotations. We applied the novel workflow on RNA-Seq data from Parkinson's disease (PD) patients' leukocytes pre- and post- Deep Brain Stimulation (DBS) treatment and compared to healthy controls. Disease-mediated changes included decreased usage of alternative promoters and N-termini, 5′-end variations and mutually-exclusive exons. The PD regulated FUS and HNRNP A/B included prion-like domains regulated regions. We also present here a workflow to identify and analyze long non-coding RNAs (lncRNAs) via RNA-Seq data. We identified reduced lncRNA expression and selective PD-induced changes in 13 of over 6,000 detected leukocyte lncRNAs, four of which were inversely altered post-DBS. These included the U1 spliceosomal lncRNA and RP11-462G22.1, each entailing sequence complementarity to numerous microRNAs. Analysis of RNA-Seq from PD and unaffected control brains revealed over 7,000 brain-expressed lncRNAs, of which 3,495 were co-expressed in the leukocytes including U1, which showed both leukocyte and brain increases. Furthermore, qRT-PCR validations confirmed these co-increases in PD leukocytes and two brain regions, the amygdala and substantia
McCormick, Kevin P; Willmann, Matthew R; Meyers, Blake C
Prior to the advent of new, deep sequencing methods, small RNA (sRNA) discovery was dependent on Sanger sequencing, which was time-consuming and limited knowledge to only the most abundant sRNA. The innovation of large-scale, next-generation sequencing has exponentially increased knowledge of the biology, diversity and abundance of sRNA populations. In this review, we discuss issues involved in the design of sRNA sequencing experiments, including choosing a sequencing platform, inherent biase...
Qi, Xiaopeng; Bao, Forrest Sheng; Xie, Zhixin
RNA silencing functions as an important antiviral defense mechanism in a broad range of eukaryotes. In plants, biogenesis of several classes of endogenous small interfering RNAs (siRNAs) requires RNA-dependent RNA Polymerase (RDR) activities. Members of the RDR family proteins, including RDR1 and RDR6, have also been implicated in antiviral defense, although a direct role for RDRs in viral siRNA biogenesis has yet to be demonstrated. Using a crucifer-infecting strain of Tobacco Mosaic Virus (T...
Zhang, Xiaoli; Liu, Shiyong
Detection of RNA-binding proteins (RBPs) is essential since the RNA-binding proteins play critical roles in post-transcriptional regulation and have diverse roles in various biological processes. Moreover, identifying RBPs by computational prediction is much more efficient than experimental methods and may have guiding significance on the experiment design. In this study, we present RBPPred (an RNA-binding protein predictor), a new method based on the support vector machine, to predict whether a protein binds RNAs, based on a comprehensive feature representation. By integrating the physicochemical properties with the evolutionary information of protein sequences, the new approach RBPPred performed much better than state-of-the-art methods. The results show that RBPPred correctly predicted 83% of 2780 RBPs and 96% of 7093 non-RBPs with MCC of 0.808 using 10-fold cross validation. Furthermore, we achieved a sensitivity of 84%, specificity of 97% and MCC of 0.788 on the testing set of the human proteome. In addition we tested the capability of RBPPred to identify new RBPs, which further confirmed the practicability and predictability of the method. The RBPPred program can be accessed at: http://rnabinding.com/RBPPred.html. Supplementary data are available at Bioinformatics online.
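The cross-validation figures quoted in this abstract can be checked against each other with a short calculation. The sketch below is not taken from RBPPred; it assumes that "83% of 2780 RBPs" is the sensitivity and "96% of 7093 non-RBPs" is the specificity, rebuilds an approximate confusion matrix from those rates, and evaluates the standard Matthews correlation coefficient. The function names are illustrative only.

```python
import math


def confusion_from_rates(n_pos, n_neg, sensitivity, specificity):
    """Rebuild approximate confusion-matrix counts from class sizes and rates."""
    tp = sensitivity * n_pos   # positives predicted correctly
    fn = n_pos - tp            # positives missed
    tn = specificity * n_neg   # negatives predicted correctly
    fp = n_neg - tn            # negatives wrongly called positive
    return tp, fp, tn, fn


def mcc(tp, fp, tn, fn):
    """Matthews correlation coefficient; returns 0.0 when it is undefined."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Class sizes and rates as quoted in the abstract (rounded percentages).
tp, fp, tn, fn = confusion_from_rates(2780, 7093, 0.83, 0.96)
cv_mcc = mcc(tp, fp, tn, fn)
print(f"MCC from rounded rates: {cv_mcc:.3f}")
```

Because the percentages in the abstract are rounded, the recomputed value can only be expected to land close to the reported 0.808, not to match it exactly.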
Klerk, Eleonora de
The work described in this thesis focuses on the mechanisms that give rise to alternative mRNAs and their alternative translation into proteins. Each of the described studies has been based on a specific set of high-throughput RNA sequencing technologies. An overview of the available RNA sequencing
Hu, Xihao; Wu, Yang; Lu, Zhi John; Yip, Kevin Y
High-throughput sequencing has been used to study posttranscriptional regulations, where the identification of protein-RNA binding is a major and fast-developing sub-area, which is in turn benefited by the sequencing methods for whole-transcriptome probing of RNA secondary structures. In the study of RNA secondary structures using high-throughput sequencing, bases are modified or cleaved according to their structural features, which alter the resulting composition of sequencing reads. In the study of protein-RNA binding, methods have been proposed to immuno-precipitate (IP) protein-bound RNA transcripts in vitro or in vivo. By sequencing these transcripts, the protein-RNA interactions and the binding locations can be identified. For both types of data, read counts are affected by a combination of confounding factors, including expression levels of transcripts, sequence biases, mapping errors and the probing or IP efficiency of the experimental protocols. Careful processing of the sequencing data and proper extraction of important features are fundamentally important to a successful analysis. Here we review and compare different experimental methods for probing RNA secondary structures and binding sites of RNA-binding proteins (RBPs), and the computational methods proposed for analyzing the corresponding sequencing data. We suggest how these two types of data should be integrated to study the structural properties of RBP binding sites as a systematic way to better understand posttranscriptional regulations. © The Author 2015. Published by Oxford University Press.
Szymanski, Maciej; Karlowski, Wojciech M
In eukaryotes, ribosomal 5S rRNAs are products of multigene families organized within clusters of tandemly repeated units. Accumulation of genomic data obtained from a variety of organisms demonstrated that the potential 5S rRNA coding sequences show a large number of variants, often incompatible with folding into a correct secondary structure. Here, we present results of an analysis of a large set of short RNA sequences generated by the next generation sequencing techniques, to address the problem of heterogeneity of the 5S rRNA transcripts in Arabidopsis and identification of potentially functional rRNA-derived fragments.
Filyukov, Alexander A.
A new strategy to recognize patterns in the DNA sequences with functional significance is proposed. The strategy is based on the general definition of any individual organism as a Gibbsian ensemble of identical personal DNA molecules. This approach provides application of the methods of statistical thermodynamics of irreversible steady processes to genome informatics. The random processes theory and its Markov chains approximation lead in this approach directly to the definition of the generalized concept of evolution entropy and to the genuine measure of text information content in the sequences. Computer-assisted proofs of the existence of the nonequilibrium steady state conditions in genome molecule were obtained by investigation of the special type balance relations in the vesicular stomatitis virus (VSV) RNA sequence. The main maxima of the text information content were decoded and denominated. The established coding principles are connected with deviations from equilibrium conditions and from equipartition.
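To make the Markov chain approximation mentioned in this abstract more concrete, the following sketch estimates the entropy rate of a first-order Markov chain from the observed nucleotide transitions in a sequence. This is the textbook Shannon entropy rate, used here as an assumption; it is not the author's "evolution entropy", whose definition is not given in the abstract, and the function name is hypothetical.

```python
import math
from collections import Counter


def markov_entropy_rate(seq):
    """Entropy rate (bits per symbol) of a first-order Markov chain fitted to seq.

    Computes H = -sum_{a,b} p(a,b) * log2 p(b|a) from the empirical
    transition counts between adjacent symbols.
    """
    if len(seq) < 2:
        return 0.0
    pair_counts = Counter(zip(seq, seq[1:]))   # counts of transitions a -> b
    first_counts = Counter(seq[:-1])           # how often a starts a transition
    n_pairs = len(seq) - 1
    h = 0.0
    for (a, b), count in pair_counts.items():
        p_ab = count / n_pairs                 # joint probability of the pair
        p_b_given_a = count / first_counts[a]  # conditional transition probability
        h -= p_ab * math.log2(p_b_given_a)
    return h


# A strictly periodic RNA string has fully predictable transitions.
periodic = "ACGU" * 50
print(markov_entropy_rate(periodic))
```

A value near 2 bits per symbol would indicate transitions close to uniformly random over four bases, while lower values indicate that the preceding base constrains the next one.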
den Dunnen Johan T; van Ommen Gertjan; Ariyurek Yavuz; Buermans Henk PJ; 't Hoen Peter AC
Background MicroRNAs are small non-coding RNA transcripts that regulate post-transcriptional gene expression. The millions of short sequence reads generated by next generation sequencing technologies make this technique explicitly suitable for profiling of known and novel microRNAs. A modification to the small-RNA expression kit (SREK, Ambion) library preparation method for the SOLiD sequencing platform is described to generate microRNA sequencing libraries that are compatible with t...
Xu, Yong; Li, Wuxian; Liu, Xueyan; Ma, Hualin; Tu, Zhiguang; Dai, Yong
Down syndrome (DS) is caused by trisomy of human chromosome 21 (Hsa21) and is associated with numerous deleterious phenotypes, including cognitive impairment, childhood leukemia and immune defects. Five Hsa21‑derived microRNAs (i.e., hsa-miR-99a, let-7c, miR-125b-2, miR-155 and miR-802) are involved in variable phenotypes of DS. However, the changes involved in the genome-wide microRNA expression of DS fetuses under the influence of trisomy 21 have yet to be determined. To investigate the expression characteristic of microRNAs during the development of DS fetuses and identify whether another microRNA gene resides in the Hsa21, Illumina high-throughput sequencing technology was employed to comprehensively characterize the microRNA expression profiles of the DS and normal fetal cord blood mononuclear cells (CBMCs). In total, 149 of 395 identified microRNAs were significantly differentially expressed (fold change >2.0 and Pgenome-wide microRNA expression profiles in the DS fetus. Differentially expressed microRNAs may be involved in hemopoietic abnormalities and the immune defects of DS fetuses and newborns.
van de Wiel, M.A.; Leday, G.G.R.; Pardo, L.M.; Rue, H; van der Vaart, A.W.; van Wieringen, W.N.
Next generation sequencing is quickly replacing microarrays as a technique to probe different molecular levels of the cell, such as DNA or RNA. The technology provides higher resolution, while reducing bias. RNA sequencing results in counts of RNA strands. This type of data imposes new statistical
Reiman, Mario; Laan, Maris; Rull, Kristiina; Sõber, Siim
RNA degradation is a ubiquitous process that occurs in living and dead cells, as well as during handling and storage of extracted RNA. Reduced RNA quality caused by degradation is an established source of uncertainty for all RNA-based gene expression quantification techniques. RNA sequencing is an increasingly preferred method for transcriptome analyses, and dependence of its results on input RNA integrity is of significant practical importance. This study aimed to characterize the effects of varying input RNA integrity [estimated as RNA integrity number (RIN)] on transcript level estimates and delineate the characteristic differences between transcripts that differ in degradation rate. The study used ribodepleted total RNA sequencing data from a real-life clinically collected set (n = 32) of human solid tissue (placenta) samples. RIN-dependent alterations in gene expression profiles were quantified by using DESeq2 software. Our results indicate that small differences in RNA integrity affect gene expression quantification by introducing a moderate and pervasive bias in expression level estimates that significantly affected 8.1% of studied genes. The rapidly degrading transcript pool was enriched in pseudogenes, short noncoding RNAs, and transcripts with extended 3' untranslated regions. Typical slowly degrading transcripts (median length, 2389 nt) represented protein coding genes with 4-10 exons and high guanine-cytosine content.-Reiman, M., Laan, M., Rull, K., Sõber, S. Effects of RNA integrity on transcript quantification by total RNA sequencing of clinically collected human placental samples. © FASEB.
de Klerk, Eleonora; 't Hoen, Peter A C
The human transcriptome comprises >80,000 protein-coding transcripts and the estimated number of proteins synthesized from these transcripts is in the range of 250,000 to 1 million. These transcripts and proteins are encoded by less than 20,000 genes, suggesting extensive regulation at the transcriptional, post-transcriptional, and translational level. Here we review how RNA sequencing (RNA-seq) technologies have increased our understanding of the mechanisms that give rise to alternative transcripts and their alternative translation. We highlight four different regulatory processes: alternative transcription initiation, alternative splicing, alternative polyadenylation, and alternative translation initiation. We discuss their transcriptome-wide distribution, their impact on protein expression, their biological relevance, and the possible molecular mechanisms leading to their alternative regulation. We conclude with a discussion of the coordination and the interdependence of these four regulatory layers. Copyright © 2015 Elsevier Ltd. All rights reserved.
Perry, George H.; Melsted, Páll; Marioni, John C.; Wang, Ying; Bainer, Russell; Pickrell, Joseph K.; Michelini, Katelyn; Zehr, Sarah; Yoder, Anne D.; Stephens, Matthew; Pritchard, Jonathan K.; Gilad, Yoav
Comparative genomic studies in primates have yielded important insights into the evolutionary forces that shape genetic diversity and revealed the likely genetic basis for certain species-specific adaptations. To date, however, these studies have focused on only a small number of species. For the majority of nonhuman primates, including some of the most critically endangered, genome-level data are not yet available. In this study, we have taken the first steps toward addressing this gap by sequencing RNA from the livers of multiple individuals from each of 16 mammalian species, including humans and 11 nonhuman primates. Of the nonhuman primate species, five are lemurs and two are lorisoids, for which little or no genomic data were previously available. To analyze these data, we developed a method for de novo assembly and alignment of orthologous gene sequences across species. We assembled an average of 5721 gene sequences per species and characterized diversity and divergence of both gene sequences and gene expression levels. We identified patterns of variation that are consistent with the action of positive or directional selection, including an 18-fold enrichment of peroxisomal genes among genes whose regulation likely evolved under directional selection in the ancestral primate lineage. Importantly, we found no relationship between genetic diversity and endangered status, with the two most endangered species in our study, the black and white ruffed lemur and the Coquerel's sifaka, having the highest genetic diversity among all primates. Our observations imply that many endangered lemur populations still harbor considerable genetic variation. Timely efforts to conserve these species alongside their habitats have, therefore, strong potential to achieve long-term success. PMID:22207615
Hafner, Markus; Renwick, Neil; Brown, Miguel; Mihailović, Aleksandra; Holoch, Daniel; Lin, Carolina; Pena, John T.G.; Nusbaum, Jeffrey D.; Morozov, Pavel; Ludwig, Janos; Ojo, Tolulope; Luo, Shujun; Schroth, Gary; Tuschl, Thomas
Sequencing of small RNA cDNA libraries is an important tool for the discovery of new RNAs and the analysis of their mutational status as well as expression changes across samples. It requires multiple enzyme-catalyzed steps, including sequential oligonucleotide adapter ligations to the 3′ and 5′ ends of the small RNAs, reverse transcription (RT), and PCR. We assessed biases in representation of miRNAs relative to their input concentration, using a pool of 770 synthetic miRNAs and 45 calibrator oligoribonucleotides, and tested the influence of Rnl1 and two variants of Rnl2, Rnl2(1–249) and Rnl2(1–249)K227Q, for 3′-adapter ligation. The use of the Rnl2 variants for adapter ligations yielded substantially fewer side products compared with Rnl1; however, the benefits of using Rnl2 remained largely obscured by additional biases in the 5′-adapter ligation step; RT and PCR steps did not have a significant impact on read frequencies. Intramolecular secondary structures of miRNA and/or miRNA/3′-adapter products contributed to these biases, which were highly reproducible under defined experimental conditions. We used the synthetic miRNA cocktail to derive correction factors for approximation of the absolute levels of individual miRNAs in biological samples. Finally, we evaluated the influence of 5′-terminal 5-nt barcode extensions for a set of 20 barcoded 3′ adapters and observed similar biases in miRNA read distribution, thereby enabling cost-saving multiplex analysis for large-scale miRNA profiling. PMID:21775473
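The idea of deriving correction factors from a synthetic pool can be sketched as follows. This is a minimal illustration, not the authors' procedure: it assumes the synthetic miRNAs were pooled in equimolar amounts, and the function names `correction_factors` and `corrected_shares` are hypothetical.

```python
def correction_factors(pool_reads):
    """Factor = expected equimolar share / observed read share per miRNA.

    Assumption: every miRNA in the synthetic pool was added at the same
    concentration, so each is expected to take an equal share of reads.
    """
    total = sum(pool_reads.values())
    expected = 1.0 / len(pool_reads)
    return {
        name: expected / (reads / total)
        for name, reads in pool_reads.items()
        if reads > 0  # a miRNA with zero reads has no defined factor
    }


def corrected_shares(sample_reads, factors):
    """Apply the factors to a sample and renormalise so shares sum to 1."""
    adjusted = {m: r * factors[m] for m, r in sample_reads.items() if m in factors}
    norm = sum(adjusted.values())
    return {m: v / norm for m, v in adjusted.items()}
```

A factor above 1 marks a miRNA that the library preparation under-represents; applying the factors to the pool itself should return equal shares.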
Collins, John E; Wali, Neha; Sealy, Ian M; Morris, James A; White, Richard J; Leonard, Steven R; Jackson, David K; Jones, Matthew C; Smerdon, Nathalie C; Zamora, Jorge; Dooley, Christopher M; Carruthers, Samantha N; Barrett, Jeffrey C; Stemple, Derek L; Busch-Nentwich, Elisabeth M
We present a genome-wide messenger RNA (mRNA) sequencing technique that converts small amounts of RNA from many samples into molecular phenotypes. It encompasses all steps from sample preparation to sequence analysis and is applicable to baseline profiling or perturbation measurements. Multiplex sequencing of transcript 3' ends identifies differential transcript abundance independent of gene annotation. We show that increasing biological replicate number while maintaining the total amount of sequencing identifies more differentially abundant transcripts. This method can be implemented on polyadenylated RNA from any organism with an annotated reference genome and in any laboratory with access to Illumina sequencing.
Álvarez-Martos, Isabel; Ferapontova, Elena
of dopamine is a 57-nucleotide-long RNA sequence reported in 1997 (Biochemistry, 1997, 36, 9726). Later, it was suggested that the DNA homologue of the RNA aptamer retains the specificity of dopamine binding (Biochem. Biophys. Res. Commun., 2009, 388, 732). Here, we show that the DNA sequence obtained...... by the replacement of the RNA aptamer bases with their DNA analogues is not capable of specific biorecognition of dopamine, in contrast to the original RNA aptamer sequence. This DNA sequence binds dopamine and structurally related catecholamine neurotransmitters non-specifically, as any DNA sequence, and, thus...
Background: Transcriptome sequencing using next-generation sequencing platforms will soon be competing with DNA microarray technologies for global gene expression analysis. As a preliminary evaluation of these promising technologies, we performed deep sequencing of cDNA synthesized from the Microarray Quality Control (MAQC) reference RNA samples using Roche's 454 Genome Sequencer FLX. Results: We generated more than 3.6 million sequence reads of average length 250 bp for the MAQC A and B samples and introduced a data analysis pipeline for translating cDNA read counts into gene expression levels. Using BLAST, 90% of the reads mapped to the human genome and 64% of the reads mapped to the RefSeq database of well annotated genes with e-values ≤ 10^-20. We measured gene expression levels in the A and B samples by counting the numbers of reads that mapped to individual RefSeq genes in multiple sequencing runs to evaluate the MAQC quality metrics for reproducibility, sensitivity, specificity, and accuracy and compared the results with DNA microarrays and Quantitative RT-PCR (QRTPCR) from the MAQC studies. In addition, 88% of the reads were successfully aligned directly to the human genome using the AceView alignment programs with an average 90% sequence similarity to identify 137,899 unique exon junctions, including 22,193 new exon junctions not yet contained in the RefSeq database. Conclusion: Using the MAQC metrics for evaluating the performance of gene expression platforms, the ExpressSeq results for gene expression levels showed excellent reproducibility, sensitivity, and specificity that improved systematically with increasing shotgun sequencing depth, and quantitative accuracy that was comparable to DNA microarrays and QRTPCR. In addition, a careful mapping of the reads to the genome using the AceView alignment programs shed new light on the complexity of the human transcriptome including the discovery of thousands of new splice variants.
Nathan D. Olson
This study presents the results from an interlaboratory sequencing study for which we developed a novel high-resolution method for comparing data from different sequencing platforms for a multi-copy, paralogous gene. The combination of PCR amplification and 16S ribosomal RNA gene (16S rRNA) sequencing has revolutionized bacteriology by enabling rapid identification, frequently without the need for culture. To assess variability between laboratories in sequencing 16S rRNA, six laboratories sequenced the gene encoding the 16S rRNA from Escherichia coli O157:H7 strain EDL933 and Listeria monocytogenes serovar 4b strain NCTC11994. Participants performed sequencing methods and protocols available in their laboratories: Sanger sequencing, Roche 454 pyrosequencing®, or Ion Torrent PGM®. The sequencing data were evaluated on three levels: (1) identity of biologically conserved position, (2) ratio of 16S rRNA gene copies featuring identified variants, and (3) the collection of variant combinations in a set of 16S rRNA gene copies. The same set of biologically conserved positions was identified for each sequencing method. Analytical methods using Bayesian and maximum likelihood statistics were developed to estimate variant copy ratios, which describe the ratio of nucleotides at each identified biologically variable position, as well as the likely set of variant combinations present in 16S rRNA gene copies. Our results indicate that estimated variant copy ratios at biologically variable positions were only reproducible for high throughput sequencing methods. Furthermore, the likely variant combination set was only reproducible with increased sequencing depth and longer read lengths. We also demonstrate novel methods for evaluating variable positions when comparing multi-copy gene sequence data from multiple laboratories generated using multiple sequencing technologies.
French, Andrew S
Deep sequencing technology provides efficient and economical production of large numbers of randomly positioned, relatively short, estimates of base identities in DNA molecules. Application of this technology to mRNA samples allows rapid examination of the molecular genetic environment in individual cells or tissues, the transcriptome. However, assembly of such short sequences into complete mRNA creates a challenge that limits the usefulness of the technology, particularly when no, or limited, genomic data is available. Several approaches to this problem have been developed, but there is still no general method to rapidly obtain an mRNA sequence from deep sequence data when a specific molecule, or family of molecules, are of interest. A frequent requirement is to identify specific mRNA molecules from tissues that are being investigated by methods such as electrophysiology, immunocytology and pharmacology. To be widely useful, any approach must be relatively simple to use in the laboratory by operators without extensive statistical or bioinformatics knowledge, and with readily available hardware. An approach was developed that allows de novo assembly of individual mRNA sequences in two linked stages: sequence discovery and sequence completion. Both stages rely on computer assisted, Graphical User Interface (GUI)-guided, user interaction with the data, but proceed relatively efficiently once discovery is complete. The method grows a discovered sequence by repeated passes through the complete raw data in a series of steps, and is hence termed 'transcriptome walking'. All of the operations required for transcriptome analysis are combined in one program that presents a relatively simple user interface and runs on a standard desktop, or laptop computer, but takes advantage of multi-core processors, when available. Complete mRNA sequence identifications usually require less than 24 hours. This approach has already identified previously unknown mRNA sequences in two animal
Wu, Xiaogang; Kim, Taek-Kyun; Baxter, David; Scherler, Kelsey; Gordon, Aaron; Fong, Olivia; Etheridge, Alton; Galas, David J; Wang, Kai
Although many tools have been developed to analyze small RNA sequencing (sRNA-Seq) data, it remains challenging to accurately analyze the small RNA population, mainly due to multiple sequence ID assignment caused by short read length. Additional issues in small RNA analysis include low consistency of microRNA (miRNA) measurement results across different platforms, miRNA mapping associated with miRNA sequence variation (isomiR) and RNA editing, and the origin of those unmapped reads after screening against all endogenous reference sequence databases. To address these issues, we built a comprehensive and customizable sRNA-Seq data analysis pipeline, sRNAnalyzer, which enables: (i) comprehensive miRNA profiling strategies to better handle isomiRs and summarization based on each nucleotide position to detect potential SNPs in miRNAs, (ii) different sequence mapping result assignment approaches to simulate results from microarray/qRT-PCR platforms and a local probabilistic model to assign mapping results to the most-likely IDs, (iii) comprehensive ribosomal RNA filtering for accurate mapping of exogenous RNAs and summarization based on taxonomy annotation. We evaluated our pipeline on both artificial samples (including synthetic miRNA and Escherichia coli cultures) and biological samples (human tissue and plasma). sRNAnalyzer is implemented in Perl and available at: http://srnanalyzer.systemsbiology.net/. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.
Elizabeth M Batty
To date, very large scale sequencing of many clinically important RNA viruses has been complicated by their high population molecular variation, which creates challenges for polymerase chain reaction and sequencing primer design. Many RNA viruses are also difficult or currently not possible to culture, severely limiting the amount and purity of available starting material. Here, we describe a simple, novel, high-throughput approach to Norovirus and Hepatitis C virus whole genome sequence determination based on RNA shotgun sequencing (also known as RNA-Seq). We demonstrate the effectiveness of this method by sequencing three Norovirus samples from faeces and two Hepatitis C virus samples from blood, on an Illumina MiSeq benchtop sequencer. More than 97% of reference genomes were recovered. Compared with Sanger sequencing, our method had no nucleotide differences in 14,019 nucleotides (nt) for Noroviruses (from a total of 2 Norovirus genomes obtained with Sanger sequencing), and 8 variants in 9,542 nt for Hepatitis C virus (1 variant per 1,193 nt). The three Norovirus samples had 2, 3, and 2 distinct positions called as heterozygous, while the two Hepatitis C virus samples had 117 and 131 positions called as heterozygous. To confirm that our sample and library preparation could be scaled to true high-throughput, we prepared and sequenced an additional 77 Norovirus samples in a single batch on an Illumina HiSeq 2000 sequencer, recovering >90% of the reference genome in all but one sample. No discrepancies were observed across 118,757 nt compared between Sanger and our custom RNA-Seq method in 16 samples. By generating viral genomic sequences that are not biased by primer-specific amplification or enrichment, this method offers the prospect of large-scale, affordable studies of RNA viruses which could be adapted to routine diagnostic laboratory workflows in the near future, with the potential to directly characterize within-host viral diversity.
Meyer, Fernando; Kurtz, Stefan; Beckstette, Michael
.... However, current tools for searching with RNA sequence-structure patterns cannot fully handle mutations occurring on both these levels or are simply not fast enough for searching large sequence data...
Melnik, S S
The goal of this paper is to develop an estimate for the entropy of random long-range correlated symbolic sequences with elements belonging to a finite alphabet. As a plausible model, we use the high-order additive stationary ergodic Markov chain. Supposing that the correlations between random elements of the chain are weak we express the differential entropy of the sequence by means of the symbolic pair correlation function. We also examine an algorithm for estimating the differential entropy of finite symbolic sequences. We show that the entropy contains two contributions, the correlation and fluctuation ones. The obtained analytical results are used for numerical evaluation of the entropy of written English texts and DNA nucleotide sequences. The developed theory opens the way for constructing a more consistent and sophisticated approach to describe the systems with strong short- and weak long-range correlations.
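For orientation, the quantity being estimated can be illustrated with a naive plug-in estimator based on block (k-mer) frequencies. This is a sketch only and not the method of the paper, which expresses the entropy through the pair correlation function of an additive Markov chain; the function names and the use of overlapping blocks are assumptions made for illustration.

```python
import math
from collections import Counter


def block_entropy(seq, k):
    """Plug-in Shannon entropy (in bits) of overlapping length-k blocks."""
    if k == 0:
        return 0.0
    n_blocks = len(seq) - k + 1
    if n_blocks <= 0:
        raise ValueError("sequence shorter than block length")
    counts = Counter(seq[i:i + k] for i in range(n_blocks))
    return -sum((c / n_blocks) * math.log2(c / n_blocks) for c in counts.values())


def conditional_entropy(seq, order):
    """Entropy per symbol given the previous `order` symbols: H(order+1) - H(order)."""
    return block_entropy(seq, order + 1) - block_entropy(seq, order)
```

For a four-letter alphabet the per-symbol entropy is at most 2 bits; correlations between neighbouring symbols show up as a conditional entropy below the single-symbol value. The plug-in estimate is biased for short sequences and large `order`, which is one reason a correlation-function approach is attractive for long-range memory.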
Figurska, Małgorzata; Stańczyk, Maciej; Kulesza, Kamil
It is widely believed that randomness exists in Nature. In fact, such an assumption underlies many scientific theories and is embedded in the foundations of quantum mechanics. Assuming that this hypothesis is valid, one can use natural phenomena, like radioactive decay, to generate random numbers. Today, computers are capable of generating so-called pseudorandom numbers. Such series of numbers are only seemingly random (bias in the randomness quality can be observed). The question of whether people can produce random numbers has been investigated by many scientists in recent years. The paper "Humans can consciously generate random numbers sequences..." published recently in Medical Hypotheses made claims that were in many ways contrary to the state of the art; it also stated far-reaching hypotheses. So, we decided to repeat the experiments reported, with special care being taken of proper laboratory procedures. Here, we present the results and discuss possible implications in computer and other sciences.
Piechotta, Michael; Wyler, Emanuel; Ohler, Uwe; Landthaler, Markus; Dieterich, Christoph
RNA editing is a co-transcriptional modification that increases the molecular diversity, alters secondary structure and protein coding sequences by changing the sequence of transcripts. The most common RNA editing modification is the single base substitution (A→I) that is catalyzed by the members of the Adenosine deaminases that act on RNA (ADAR) family. Typically, editing sites are identified as RNA-DNA-differences (RDDs) in a comparison of genome and transcriptome data from next-generation sequencing experiments. However, a method for robust detection of site-specific editing events from replicate RNA-seq data has not been published so far. Even more surprising, condition-specific editing events, which would show up as differences in RNA-RNA comparisons (RRDs) and depend on particular cellular states, are rarely discussed in the literature. We present JACUSA, a versatile one-stop solution to detect single nucleotide variant positions from comparing RNA-DNA and/or RNA-RNA sequencing samples. The performance of JACUSA has been carefully evaluated and compared to other variant callers in an in silico benchmark. JACUSA outperforms other algorithms in terms of the F measure, which combines precision and recall, in all benchmark scenarios. This performance margin is highest for the RNA-RNA comparison scenario. We further validated JACUSA's performance by testing its ability to detect A→I events using sequencing data from a human cell culture experiment and publicly available RNA-seq data from Drosophila melanogaster heads. To this end, we performed whole genome and RNA sequencing of HEK-293 cells on samples with lowered activity of candidate RNA editing enzymes. JACUSA has a higher recall and comparable precision for detecting true editing sites in RDD comparisons of HEK-293 data. Intriguingly, JACUSA captures most A→I events from RRD comparisons of RNA sequencing data derived from Drosophila and HEK-293 data sets. Our software JACUSA detects single nucleotide
Fernandes, Andrew D; Reid, Jennifer Ns; Macklaim, Jean M; McMurrough, Thomas A; Edgell, David R; Gloor, Gregory B
Experimental designs that take advantage of high-throughput sequencing to generate datasets include RNA sequencing (RNA-seq), chromatin immunoprecipitation sequencing (ChIP-seq), sequencing of 16S rRNA gene fragments, metagenomic analysis and selective growth experiments. In each case the underlying data are similar and are composed of counts of sequencing reads mapped to a large number of features in each sample. Despite this underlying similarity, the data analysis methods used for these experimental designs are all different, and do not translate across experiments. Alternative methods have been developed in the physical and geological sciences that treat similar data as compositions. Compositional data analysis methods transform the data to relative abundances with the result that the analyses are more robust and reproducible. Data from an in vitro selective growth experiment, an RNA-seq experiment and the Human Microbiome Project 16S rRNA gene abundance dataset were examined by ALDEx2, a compositional data analysis tool that uses Bayesian methods to infer technical and statistical error. The ALDEx2 approach is shown to be suitable for all three types of data: it correctly identifies both the direction and differential abundance of features in the differential growth experiment, it identifies a substantially similar set of differentially expressed genes in the RNA-seq dataset as the leading tools and it identifies as differential the taxa that distinguish the tongue dorsum and buccal mucosa in the Human Microbiome Project dataset. The design of ALDEx2 reduces the number of false positive identifications that result from datasets composed of many features in few samples. Statistical analysis of high-throughput sequencing datasets composed of per feature counts showed that the ALDEx2 R package is a simple and robust tool, which can be applied to RNA-seq, 16S rRNA gene sequencing and differential growth datasets, and by extension to other techniques that use a
Nicolet, Charles; Paulson, Ariel; Shanker, Savita; Beckloff, N.; Bintzler, D.; Bivens, N. J.; Davis, R. R.; Donnelly, R. J.; Edenberg, H. J.; Gillaspy, A. F.; Grove, D.; Jafari, N.; Kerley-Hamilton, J. S.; Lashley, K.; Lyons, R. H.; Peak, A.; Perera, A.; Thimmapuram, J.; Wang, L.; Wright, C. L.; Alekseyev, Y.
Multiple recent publications on RNA-Seq have demonstrated the power of next generation sequencing technologies in whole transcriptome analysis. The vendor specific protocols used for RNA library construction typically require at least 100ng of total RNA. However, under certain conditions such as single cells, stem cells, difficult to isolate cell types, or fractionated cancer cells, only a small amount of material is available. In these cases, effective transcriptome profiling requires amplification of subnanogram amounts of RNA. Several RNA amplification kits are available for amplification prior to library construction and next generation sequencing but these kits have not been comprehensively field evaluated for accuracy and performance of RNA-Seq for picogram amounts of RNA. This study conducted by the DNA Sequencing Research Group (DSRG) focuses on the evaluation of amplification kits for RNA-Seq. Four commercial amplification kits were chosen: Ovation v2 (NuGEN Technologies), SMARTer (Clontech), Seqplex (Sigma Aldrich), and Super-AMP (Miltenyi Biotech). Starting material was 5ng, 500pg and 50pg of human total reference RNA (Clontech) spiked with Ambion ERCC control mix (Life Technologies) following the manufacturer's protocol. Each kit was tested at 3 different sites to assess reproducibility. Total RNA and ERCC RNA spike-in control mixes from the same lots were sent to 12 ABRF lab sites for amplification and cDNA generation. Ideally, this would have resulted in 36 different amplified samples, 3 from each input RNA. Libraries were constructed at one site from the amplified cDNAs using the TruSeq RNA library preparation kit on the Tecan Freedom EVO Liquid Handling Robot. As an unamplified control, ribosomal depletion and PolyA selection were performed separately using 5ng, 100ng and 1ug of total RNA prior to library construction. All libraries were pooled and sequenced using the Illumina HiSeq platform. An overview of the study and the results will be
Archival formalin-fixed paraffin-embedded (FFPE) samples offer a vast, untapped source of genomic data for biomarker discovery. However, the quality of FFPE samples is often highly variable, and conventional methods to assess RNA quality for RNA-sequencing (RNA-seq) are not infor...
Busch, Anke; Backofen, Rolf
INFO-RNA is a new web server for designing RNA sequences that fold into a user given secondary structure. Furthermore, constraints on the sequence can be specified, e.g. one can restrict sequence positions to a fixed nucleotide or to a set of nucleotides. Moreover, the user can allow violations of the constraints at some positions, which can be advantageous in complicated cases. The INFO-RNA web server allows biologists to design RNA sequences in an automatic manner. It is clearly and intuitively arranged and easy to use. The procedure is fast, as most applications are completed within seconds and it proceeds better and faster than other existing tools. The INFO-RNA web server is freely available at http://www.bioinf.uni-freiburg.de/Software/INFO-RNA/ PMID:17452349
ities of different datasets. Entropy cannot differentiate between chaotic and random sequences, while ApEn and LZ cannot distinguish between weak and strong chaos. Figure 1. 95% confidence interval for mean LZ complexity of 50 samples of length 20 using four bins. Pramana – J. Phys., Vol. 84, No. 3, March 2015 ...
Tamminga, Saskia; van Maarle, Merel; Henneman, Lidewij; Oudejans, Cees B M; Cornel, Martina C; Sistermans, Erik A
Cell-free DNA (cfDNA) testing has recently become indispensable in diagnostic testing and screening. In the prenatal setting, this type of testing is often called noninvasive prenatal testing (NIPT). With a number of techniques, using either next-generation sequencing or single nucleotide polymorphism-based approaches, fetal cfDNA in maternal plasma can be analyzed to screen for rhesus D genotype, common chromosomal aneuploidies, and increasingly for testing other conditions, including monogenic disorders. With regard to screening for common aneuploidies, challenges arise when implementing NIPT in current prenatal settings. Depending on the method used (targeted or nontargeted), chromosomal anomalies other than trisomy 21, 18, or 13 can be detected, either of fetal or maternal origin, also referred to as unsolicited or incidental findings. For various biological reasons, there is a small chance of having either a false-positive or false-negative NIPT result, or no result, also referred to as a "no-call." Both pre- and posttest counseling for NIPT should include discussing potential discrepancies. Since NIPT remains a screening test, a positive NIPT result should be confirmed by invasive diagnostic testing (either by chorionic villus biopsy or by amniocentesis). As the scope of NIPT is widening, professional guidelines need to discuss the ethics of what to offer and how to offer. In this review, we discuss the current biochemical, clinical, and ethical challenges of cfDNA testing in the prenatal setting and its future perspectives including novel applications that target RNA instead of DNA. © 2016 Elsevier Inc. All rights reserved.
The horse is an optimal model organism for studying the genomic response to exercise-induced stress, due to its natural aptitude for athletic performance and the relative homogeneity of its genetic and environmental backgrounds. Here, we applied RNA-sequencing analysis through the use of SOLiD technology in an experimental framework centered on exercise-induced stress during endurance races in equine athletes. We monitored the transcriptional landscape by comparing gene expression levels between animals at rest and after competition. Overall, we observed a shift from coding to non-coding regions, suggesting that the stress response involves the differential expression of unannotated regions. Notably, we observed significant post-race increases of reads that correspond to repeats, especially the intergenic and intronic L1 and L2 transposable elements. We also observed increased expression of the antisense strands compared to the sense strands in intronic and regulatory regions (1 kb up- and downstream of the genes), suggesting that antisense transcription could be one of the main mechanisms for transposon regulation in the horse under stress conditions. We identified a large number of transcripts corresponding to intergenic and intronic regions putatively associated with new transcriptional elements. Gene expression and pathway analysis allowed us to identify several biological processes and molecular functions that may be involved with exercise-induced stress. Ontology clustering reflected mechanisms that are already known to be stress activated (e.g., chemokine-type cytokines, Toll-like receptors, and kinases), as well as "nucleic acid binding" and "signal transduction activity" functions. There was also a general and transient decrease in the global rates of protein synthesis, which would be expected after strenuous global stress. In sum, our network analysis points toward the involvement of specific gene clusters in equine exercise
Krzyzanowski, Paul M.; Price, Feodor D.; Muro, Enrique M.; Rudnicki, Michael A.; Andrade-Navarro, Miguel A.
Many computational methods have been used to predict novel non-coding RNAs (ncRNAs), but none, to our knowledge, have explicitly investigated the impact of integrating existing cDNA-based Expressed Sequence Tag (EST) data that flank structural RNA predictions. To determine whether flanking EST data can assist in microRNA (miRNA) prediction, we identified genomic sites encoding putative miRNAs by combining functional RNA predictions with flanking ESTs data in a model consistent with miRNAs undergoing cleavage during maturation. In both human and mouse genomes, we observed that the inclusion of flanking ESTs adjacent to and not overlapping predicted miRNAs significantly improved the performance of various methods of miRNA prediction, including direct high-throughput sequencing of small RNA libraries. We analyzed the expression of hundreds of miRNAs predicted to be expressed during myogenic differentiation using a customized microarray and identified several known and predicted myogenic miRNA hairpins. Our results indicate that integrating ESTs flanking structural RNA predictions improves the quality of cleaved miRNA predictions and suggest that this strategy can be used to predict other non-coding RNAs undergoing cleavage during maturation. PMID:21698286
Movassagh, Mercedeh; Alomran, Nawaf; Mudvari, Prakriti; Dede, Merve; Dede, Cem; Kowsari, Kamran; Restrepo, Paula; Cauley, Edmund; Bahl, Sonali; Li, Muzi; Waterhouse, Wesley; Tsaneva-Atanasova, Krasimira; Edwards, Nathan; Horvath, Anelia
We introduce RNA2DNAlign, a computational framework for quantitative assessment of allele counts across paired RNA and DNA sequencing datasets. RNA2DNAlign is based on quantitation of the relative abundance of variant and reference read counts, followed by binomial tests for genotype and allelic status at SNV positions between compatible sequences. RNA2DNAlign detects positions with differential allele distribution, suggesting asymmetries due to regulatory/structural events. Based on the type of asymmetry, RNA2DNAlign outlines positions likely to be implicated in RNA editing, allele-specific expression or loss, somatic mutagenesis or loss-of-heterozygosity (the first three also in a tumor-specific setting). We applied RNA2DNAlign to 360 matching normal and tumor exomes and transcriptomes from 90 breast cancer patients from TCGA. Under high-confidence settings, RNA2DNAlign identified 2038 distinct SNV sites associated with one of the aforementioned asymmetries, the majority of which have not been linked to functionality before. The performance assessment shows very high specificity and sensitivity, due to the corroboration of signals across multiple matching datasets. RNA2DNAlign is freely available from http://github.com/HorvathLab/NGS as a self-contained binary package for 64-bit Linux systems. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.
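A binomial test on variant and reference read counts, the statistical building block mentioned above, can be sketched as follows. This is a generic illustration under stated assumptions and not RNA2DNAlign's actual implementation: the function names, the 0.5 expected variant fraction for a heterozygous site, and the 0.05 threshold are all hypothetical choices.

```python
from math import comb


def binom_two_sided_p(k, n, p=0.5):
    """Exact two-sided binomial p-value: total probability of all outcomes
    that are no more likely than the observed count k.

    Intended for modest read depths (a few hundred reads); very large n
    would overflow the integer-to-float conversion.
    """
    pmf = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    threshold = pmf[k] * (1 + 1e-9)  # tolerate floating-point ties
    return min(1.0, sum(x for x in pmf if x <= threshold))


def consistent_with_heterozygous(var_reads, ref_reads, alpha=0.05):
    """True if the variant/reference split is compatible with a 50:50 ratio.

    Assumption: an unbiased heterozygous site yields variant reads with
    probability 0.5; alpha is an illustrative threshold.
    """
    n = var_reads + ref_reads
    return binom_two_sided_p(var_reads, n, 0.5) >= alpha
```

In this simplified picture, a site that looks heterozygous in DNA but rejects the 50:50 hypothesis in RNA would be a candidate for allele-specific expression.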
Seemann, Stefan E; Richter, Andreas S; Gesell, Tanja; Backofen, Rolf; Gorodkin, Jan
Predicting RNA-RNA interactions is essential for determining the function of putative non-coding RNAs. Existing methods for the prediction of interactions are all based on single sequences. Since comparative methods have already been useful in RNA structure determination, we assume that conserved RNA-RNA interactions also imply conserved function. Of these, we further assume that a non-negligible amount of the existing RNA-RNA interactions have also acquired compensating base changes throughout evolution. We implement a method, PETcofold, that can take covariance information in intra-molecular and inter-molecular base pairs into account to predict interactions and secondary structures of two multiple alignments of RNA sequences. PETcofold's ability to predict RNA-RNA interactions was evaluated on a carefully curated dataset of 32 bacterial small RNAs and their targets, which was manually extracted from the literature. For evaluation of both RNA-RNA interaction and structure prediction, we were able to extract only a few high-quality examples: one vertebrate small nucleolar RNA and four bacterial small RNAs. For these we show that the prediction can be improved by our comparative approach. Furthermore, PETcofold was evaluated on controlled data with phylogenetically simulated sequences enriched for covariance patterns at the interaction sites. We observed increased performance with increased amounts of covariance. The program PETcofold is available as source code and can be downloaded from http://rth.dk/resources/petcofold.
Qin, Yidan; Yao, Jun; Wu, Douglas C; Nottingham, Ryan M; Mohr, Sabine; Hunicke-Smith, Scott; Lambowitz, Alan M
Next-generation RNA-sequencing (RNA-seq) has revolutionized transcriptome profiling, gene expression analysis, and RNA-based diagnostics. Here, we developed a new RNA-seq method that exploits thermostable group II intron reverse transcriptases (TGIRTs) and used it to profile human plasma RNAs. TGIRTs have higher thermostability, processivity, and fidelity than conventional reverse transcriptases, plus a novel template-switching activity that can efficiently attach RNA-seq adapters to target RNA sequences without RNA ligation. The new TGIRT-seq method enabled construction of RNA-seq libraries from RNA in RNA in 1-mL plasma samples from a healthy individual revealed RNA fragments mapping to a diverse population of protein-coding gene and long ncRNAs, which are enriched in intron and antisense sequences, as well as nearly all known classes of small ncRNAs, some of which have never before been seen in plasma. Surprisingly, many of the small ncRNA species were present as full-length transcripts, suggesting that they are protected from plasma RNases in ribonucleoprotein (RNP) complexes and/or exosomes. This TGIRT-seq method is readily adaptable for profiling of whole-cell, exosomal, and miRNAs, and for related procedures, such as HITS-CLIP and ribosome profiling. © 2015 Qin et al.; Published by Cold Spring Harbor Laboratory Press for the RNA Society.
Helal, Manal; Kong, Fanrong; Chen, Sharon C. A.; Bain, Michael; Christen, Richard; Sintchenko, Vitali
Background The intra- and inter-species genetic diversity of bacteria and the absence of ‘reference’, or the most representative, sequences of individual species present a significant challenge for sequence-based identification. The aims of this study were to determine the utility, and compare the performance of several clustering and classification algorithms to identify the species of 364 sequences of 16S rRNA gene with a defined species in GenBank, and 110 sequences of 16S rRNA gene with no defined species, all within the genus Nocardia. Methods A total of 364 16S rRNA gene sequences of Nocardia species were studied. In addition, 110 16S rRNA gene sequences assigned only to the Nocardia genus level at the time of submission to GenBank were used for machine learning classification experiments. Different clustering algorithms were compared with a novel algorithm or the linear mapping (LM) of the distance matrix. Principal Components Analysis was used for the dimensionality reduction and visualization. Results The LM algorithm achieved the highest performance and classified the set of 364 16S rRNA sequences into 80 clusters, the majority of which (83.52%) corresponded with the original species. The most representative 16S rRNA sequences for individual Nocardia species have been identified as ‘centroids’ in respective clusters from which the distances to all other sequences were minimized; 110 16S rRNA gene sequences with identifications recorded only at the genus level were classified using machine learning methods. Simple kNN machine learning demonstrated the highest performance and classified Nocardia species sequences with an accuracy of 92.7% and a mean frequency of 0.578. Conclusion The identification of centroids of 16S rRNA gene sequence clusters using novel distance matrix clustering enables the identification of the most representative sequences for each individual species of Nocardia and allows the quantitation of inter- and intra
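The 'centroid' idea in the abstract above (the sequence in a cluster from which the distances to all other members are minimized) can be illustrated with a minimal sketch. It assumes a precomputed symmetric distance matrix; the names `find_centroid`, `dist` and `members` and the toy values are hypothetical. This is not the authors' LM algorithm, only the selection rule as described.

```python
def find_centroid(dist, members):
    """Return the member whose summed distance to the other members is smallest."""
    best, best_total = None, float("inf")
    for i in members:
        total = sum(dist[i][j] for j in members if j != i)
        if total < best_total:  # strict '<' keeps the first member on ties
            best, best_total = i, total
    return best


# Toy, made-up pairwise distances between four sequences (illustrative only).
dist = [
    [0.0, 0.1, 0.2, 0.9],
    [0.1, 0.0, 0.1, 0.8],
    [0.2, 0.1, 0.0, 0.7],
    [0.9, 0.8, 0.7, 0.0],
]
centroid = find_centroid(dist, [0, 1, 2])
```

With real data, `dist` would hold pairwise distances between aligned 16S rRNA sequences and `members` the indices assigned to one cluster.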
Background: In recent years, the number of available RNA structures has grown rapidly, reflecting the increased interest in RNA biology. As with the studies carried out two decades ago for proteins, which laid the groundwork for comparative protein structure prediction methods, we are now able to quantify the relationship between sequence and structure conservation in RNA. Results: Here we introduce an all-against-all sequence- and three-dimensional (3D) structure-based comparison of a representative set of RNA structures, which has allowed us to quantitatively confirm that (i) there is a measurable relationship between sequence and structure conservation that weakens for alignments with below 60% sequence identity, (ii) evolution tends to conserve more RNA structure than sequence, and (iii) there is a twilight zone for RNA homology detection. Discussion: The computational analysis presented here quantitatively describes the relationship between sequence and structure for RNA molecules and defines a twilight zone region for detecting RNA homology. Our work could represent the theoretical basis and limitations for future developments in comparative RNA 3D structure prediction.
Objective: To construct recombinant Japanese encephalitis virus (JEV) carrying brain-specific miRNA target sequences. Methods: The target sequences of brain-specific miR-124 and miR-125 were introduced into the infectious cDNA clone of JEV to generate recombinant plasmids based on reverse genetics technology. The recombinant plasmids were linearized with Xho I and served as templates for transcription with SP6 RNA polymerase to generate infectious viral RNA. The RNA transcripts were then transfected into BHK-21 cells, and the supernatant was collected after incubation at 37 °C, 5% CO2 for 3 days. The cytopathic changes of BHK-21 cells inoculated with the supernatant were observed after one passage. The rescued viruses carrying miRNA target sequences were validated by RT-PCR, a standard plaque-forming test on BHK-21 cells, and growth curve analysis. Results: Two recombinant viruses, carrying the miR-124 or the miR-125 target sequence respectively, were rescued. The insertion of miRNA target sequences was confirmed by DNA sequencing. The rescued viruses showed plaque morphology and replication efficiency similar to those of wild-type JEV. Conclusion: Recombinant JEV containing brain-specific miRNA target sequences can be obtained by the reverse genetics technique and could be used in further studies of the miRNA-mediated tissue-specific attenuation mechanism of JEV. DOI: 10.11855/j.issn.0577-7402.2014.06.01
Walker, W F; Doolittle, W F
The 5S rRNA sequences from the basidiomycetes or fungi imperfecti Rhizoctonia crocorum, Rhizoctonia hiemalis, Exobasidium vaccinii, Trichosporon oryzae, Tilletia controversa, Tilletiaria anomala, Dacrymyces deliquescens and Coprinus radiatus were determined. With the exception of Exobasidium, these sequences conform to the association previously found between septal pore type and sequence. The sequence from the supposed ascomycete anamorph Rhizoctonia hiemalis clearly is allied with basidiomycete sequences.
Fordyce, Sarah Louise; Avila Arcos, Maria del Carmen; Rasmussen, Morten
The characterization of biomolecules from ancient samples can shed otherwise unobtainable insights into the past. Despite the fundamental role of transcriptomal change in evolution, the potential of ancient RNA remains unexploited - perhaps due to dogma associated with the fragility of RNA. We hy...
The heartworm Dirofilaria immitis is the causative agent of cardiopulmonary dirofilariosis in dogs and cats, which also infects a wide range of wild mammals and humans. The complex life cycle of D. immitis with several developmental stages in its invertebrate mosquito vectors and its vertebrate hosts indicates the importance of miRNA in growth and development, and their ability to regulate infection of mammalian hosts. This study identified the miRNA profiles of D. immitis of zoonotic significance by deep sequencing. A total of 1063 conserved miRNA candidates, including 68 anti-sense miRNA (miRNA*) sequences, were predicted by computational methods and could be grouped into 808 miRNA families. A significant bias towards family members, family abundance and sequence nucleotides was observed. Thirteen novel miRNA candidates were predicted by alignment with the Brugia malayi genome. Eleven out of 13 predicted miRNA candidates were verified by using a PCR-based method. Target genes of the novel miRNA candidates were predicted by using the heartworm transcriptome dataset. To our knowledge, this is the first report of miRNA profiles in D. immitis, which will contribute to a better understanding of the complex biology of this zoonotic filarial nematode and the molecular regulation roles of miRNA involved. Our findings may also become a useful resource for small RNA studies in other filarial parasitic nematodes. PMID:23331513
Sarah L Fordyce
The characterization of biomolecules from ancient samples can shed otherwise unobtainable insights into the past. Despite the fundamental role of transcriptomal change in evolution, the potential of ancient RNA remains unexploited - perhaps due to dogma associated with the fragility of RNA. We hypothesize that seeds offer a plausible refuge for long-term RNA survival, due to the fundamental role of RNA during seed germination. Using RNA-Seq on cDNA synthesized from nucleic acid extracts, we validate this hypothesis through demonstration of partial transcriptomal recovery from two sources of ancient maize kernels. The results suggest that ancient seed transcriptomics may offer a powerful new tool with which to study plant domestication.
Ghosh, Pritha; Mathew, Oommen K; Sowdhamini, Ramanathan
RNA-binding proteins (RBPs) interact with their cognate RNA(s) to form large biomolecular assemblies. They are versatile in their functionality and are involved in a myriad of processes inside the cell. RBPs with similar structural features and common biological functions are grouped together into families and superfamilies. An early indication of whether the sequence of a gene product is associated with RNA binding would therefore be useful. Here, we report a web server, RStrucFam, to predict the structure, type of cognate RNA(s) and function(s) of proteins, where possible, from mere sequence information. The web server employs a Hidden Markov Model scan (hmmscan) to enable association with a back-end database of structural and sequence families. The database (HMMRBP) comprises 437 HMMs of RBP families of known structure, generated using structure-based sequence alignments, and 746 sequence-centric RBP family HMMs. The input protein sequence is associated with structural or sequence domain families, if structure or sequence signatures exist. When the protein is associated with a family of known structure, output features such as a multiple structure-based sequence alignment (MSSA) of the query with all other members of that family are provided. Further, cognate RNA partner(s) for that protein, Gene Ontology (GO) annotations, if any, and a homology model of the protein can be obtained. Users can also browse the database for details pertaining to each family, protein or RNA and their related information, based on keyword search or RNA motif search. RStrucFam is a web server that exploits structurally conserved features of RBPs, derived from known family members and imprinted in mathematical profiles, to predict putative RBPs from sequence information. Proteins that fail to associate with such structure-centric families are further queried against the sequence-centric RBP family HMMs in the HMMRBP database. Further, all other essential
Schloss, Patrick D; Jenior, Matthew L; Koumpouras, Charles C; Westcott, Sarah L; Highlander, Sarah K
Over the past 10 years, microbial ecologists have largely abandoned sequencing 16S rRNA genes by the Sanger sequencing method and have instead adopted highly parallelized sequencing platforms. These new platforms, such as 454 and Illumina's MiSeq, have allowed researchers to obtain millions of high quality but short sequences. The result of the added sequencing depth has been significant improvements in experimental design. The tradeoff has been the decline in the number of full-length reference sequences that are deposited into databases. To overcome this problem, we tested the ability of the PacBio Single Molecule, Real-Time (SMRT) DNA sequencing platform to generate sequence reads from the 16S rRNA gene. We generated sequencing data from the V4, V3-V5, V1-V3, V1-V5, V1-V6, and V1-V9 variable regions from within the 16S rRNA gene using DNA from a synthetic mock community and natural samples collected from human feces, mouse feces, and soil. The mock community allowed us to assess the actual sequencing error rate and how that error rate changed when different curation methods were applied. We developed a simple method based on sequence characteristics and quality scores to reduce the observed error rate for the V1-V9 region from 0.69 to 0.027%. This error rate is comparable to what has been observed for the shorter reads generated by 454 and Illumina's MiSeq sequencing platforms. Although the per base sequencing cost is still significantly more than that of MiSeq, the prospect of supplementing reference databases with full-length sequences from organisms below the limit of detection from the Sanger approach is exciting.
Klein, William H.; Murphy, William; Attardi, Giuseppe; Britten, Roy J.; Davidson, Eric H.
Polyadenylated messenger RNA extracted from HeLa cells was hybridized with a mass excess of HeLa DNA. The kinetics of the hybridization reaction demonstrated that most of the messenger RNA is transcribed from nonrepetitive DNA. The amount of messenger RNA hybridized to DNA was measured both with and without prior RNase treatment. Comparison of the results indicates that within the limits of detection, HeLa messenger RNA does not contain repetitive sequence elements covalently linked to nonrepetitive sequence transcripts. However, a small fraction of the HeLa messenger RNA preparation is transcribed entirely from repetitive DNA sequences. This fraction represents about 6% of the total polyadenylated messenger RNA preparation. PMID:4525461
RNA-sequencing is a powerful tool for studying RNomics. However, the highly abundant ribosomal RNAs (rRNA) and transfer RNAs (tRNA) predominate in the sequencing reads, thereby hindering the study of lowly expressed genes. Therefore, rRNA depletion prior to sequencing is often performed in order to preserve subtle alterations in gene expression, especially those at relatively low expression levels. One of the commercially available methods uses DNA or RNA probes that hybridize to the target RNAs. However, there is always a concern about non-specific binding and unintended removal of messenger RNA (mRNA) when the same set of probes is applied to different organisms. The degree of such unintended mRNA removal varies among organisms due to organism-specific genomic variation. We developed a computer-based method to design probes that deplete rRNA in an organism-specific manner. Based on the computation results, biotinylated RNA probes were produced by in vitro transcription and were used to perform rRNA depletion by subtractive hybridization. We demonstrated that the designed probes for 16S and 23S rRNAs can efficiently remove rRNAs from Mycobacterium smegmatis. In comparison with a commercial subtractive hybridization-based rRNA removal kit, organism-specific probes better preserve RNA integrity and abundance. We believe the computer-based design approach can be used as a generic method for preparing RNA from any organism for next-generation sequencing, particularly for the transcriptome analysis of microbes.
Zhao, Yongyun; Chen, Gangyi; Yuan, Yi; Li, Na; Dong, Juan; Huang, Xin; Cui, Xin; Tang, Zhuo
Constant efforts have been made to develop new methods for sequence-specific RNA degradation, which can inhibit the expression of a targeted gene. Herein, by using an unmodified short DNA oligonucleotide for sequence recognition and an endogenous small molecule, vitamin B2 (riboflavin), as photosensitizer, we report a simple strategy to realize the sequence-specific photocleavage of targeted RNA. The DNA strand is complementary to the target sequence and forms a DNA/RNA duplex containing a G • U wobble in the middle. The cleavage reaction proceeds through an oxidative elimination mechanism at the nucleoside downstream of the U of the G • U wobble in the duplex, yielding an unnatural RNA terminus. The whole process is under tight control by using light as a switch, which means the cleavage can be carried out according to specific spatial and temporal requirements. The biocompatibility of this method makes the DNA strand in combination with riboflavin a promising molecular tool for RNA manipulation.
BACKGROUND: Upwards of 1200 miRNA loci have hitherto been annotated in the human genome. The specific features defining a miRNA precursor and deciding its recognition and subsequent processing are not yet exhaustively described and miRNA loci can thus not be computationally identified with sufficient confidence. RESULTS: We rendered pre-miRNA and non-pre-miRNA hairpins as strings of integrated sequence-structure information, and used the software Teiresias to identify sequence-structure motifs (ss-motifs) of variable length in these data sets. Using only ss-motifs as features in a Support Vector Machine (SVM) algorithm for pre-miRNA identification achieved 99.2% specificity and 97.6% sensitivity on a human test data set, which is comparable to previously published algorithms employing combinations of sequence-structure and additional features. Further analysis of the ss-motif information contents revealed strongly significant deviations from those of the respective training sets, revealing important potential clues as to how the sequence and structural information of RNA hairpins are utilized by the miRNA processing apparatus. CONCLUSION: Integrated sequence-structure motifs of variable length apparently capture nearly all information required to distinguish miRNA precursors from other stem-loop structures.
Zhang, Zhengdong; D'Souza, Lisa M.; Lee, Youn-Hyung; Fox, George E.
Over evolutionary time RNA sequences which are successfully fixed in a population are selected from among those that satisfy the structural and chemical requirements imposed by the function of the RNA. These sequences together comprise the structure space of the RNA. In principle, a comprehensive understanding of RNA structure and function would make it possible to enumerate which specific RNA sequences belong to a particular structure space and which do not. We are using bacterial 5S rRNA as a model system to attempt to identify principles that can be used to predict which sequences do or do not belong to the 5S rRNA structure space. One promising idea is the very intuitive notion that frequently seen sequence changes in an aligned data set of naturally occurring 5S rRNAs would be widely accepted in many other 5S rRNA sequence contexts. To test this hypothesis, we first developed well-defined operational definitions for a Vibrio region of the 5S rRNA structure space and what is meant by a highly variable position. Fourteen sequence variants (10 point changes and 4 base-pair changes) were identified in this way, which, by the hypothesis, would be expected to incorporate successfully in any of the known sequences in the Vibrio region. All 14 of these changes were constructed and separately introduced into the Vibrio proteolyticus 5S rRNA sequence where they are not normally found. Each variant was evaluated for its ability to function as a valid 5S rRNA in an E. coli cellular context. It was found that 93% (13/14) of the variants tested are likely valid 5S rRNAs in this context. In addition, seven variants were constructed that, although present in the Vibrio region, did not meet the stringent criteria for a highly variable position. In this case, 86% (6/7) are likely valid. As a control we also examined seven variants that are seldom or never seen in the Vibrio region of 5S rRNA sequence space. In this case only two of seven were found to be potentially valid. The
Daugaard, Iben; Venø, Morten T; Yan, Yan
The majority of lung cancer deaths are caused by metastatic disease. MicroRNAs (miRNAs) are posttranscriptional regulators of gene expression and miRNA dysregulation can contribute to metastatic progression. Here, small RNA sequencing was used to profile the miRNA and piwi-interacting RNA (piRNA......) transcriptomes in relation to lung cancer metastasis. RNA-seq was performed using RNA extracted from formalin-fixed paraffin embedded (FFPE) lung adenocarcinomas (LAC) and brain metastases from 8 patients, and LACs from 8 patients without detectable metastatic disease. Impact on miRNA and piRNA transcriptomes...... was subtle with 9 miRNAs and 8 piRNAs demonstrating differential expression between metastasizing and non-metastasizing LACs. For piRNAs, decreased expression of piR-57125 was the most significantly associated with distant metastasis. Validation by RT-qPCR in a LAC cohort comprising 52 patients confirmed...
Tsuji, Junko; Weng, Zhiping
With the rapid accumulation of publicly available small RNA sequencing datasets, third-party meta-analysis across many datasets is becoming increasingly powerful. Although removing the 3´ adapter is an essential step for small RNA sequencing analysis, the adapter sequence information is not always available in the metadata. The information can be also erroneous even when it is available. In this study, we developed DNApi, a lightweight Python software package that predicts the 3´ adapter sequence de novo and provides the user with cleansed small RNA sequences ready for down stream analysis. Tested on 539 publicly available small RNA libraries accompanied with 3´ adapter sequences in their metadata, DNApi shows near-perfect accuracy (98.5%) with fast runtime (~2.85 seconds per library) and efficient memory usage (~43 MB on average). In addition to 3´ adapter prediction, it is also important to classify whether the input small RNA libraries were already processed, i.e. the 3´ adapters were removed. DNApi perfectly judged that given another batch of datasets, 192 publicly available processed libraries were "ready-to-map" small RNA sequence. DNApi is compatible with Python 2 and 3, and is available at https://github.com/jnktsj/DNApi. The 731 small RNA libraries used for DNApi evaluation were from human tissues and were carefully and manually collected. This study also provides readers with the curated datasets that can be integrated into their studies.
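The adapter-removal step that the abstract above calls essential can be sketched as follows, once a 3´ adapter sequence is known or predicted. This is a deliberately naive illustration with exact matching only; it is not DNApi's algorithm or interface. The function name `trim_3p_adapter`, the `min_overlap` parameter and the example sequences are assumptions chosen for illustration.

```python
def trim_3p_adapter(read, adapter, min_overlap=5):
    """Strip a 3' adapter from a read using exact matching (no mismatches allowed)."""
    # Case 1: the complete adapter occurs inside the read.
    pos = read.find(adapter)
    if pos != -1:
        return read[:pos]
    # Case 2: the adapter runs off the 3' end, so only a prefix of it is present.
    for k in range(min(len(adapter), len(read)) - 1, min_overlap - 1, -1):
        if read.endswith(adapter[:k]):
            return read[:-k]
    # No adapter evidence: return the read unchanged.
    return read


# Example values only: a 22-nt insert followed by an assumed adapter sequence.
adapter = "TGGAATTCTCGGGTGCCAAGG"
insert = "TGAGGTAGTAGGTTGTATAGTT"
trimmed_full = trim_3p_adapter(insert + adapter, adapter)
trimmed_partial = trim_3p_adapter(insert + adapter[:8], adapter)
```

Production trimmers also tolerate sequencing errors within the adapter, which this sketch deliberately omits.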
A reverse transcriptase - polymerase chain reaction (RT-PCR) based assay for the detection of Borrelia species in ticks was developed. The method was based on amplification of a 552-nucleotide sequence of 16S rRNA, targeted by Borrelia-specific primers. In the present study, total RNA extracted from Ixodes ricinus ticks was used as template. The results showed higher sensitivity for Borrelia detection than standard dark-field microscopy. Method specificity was confirmed by cloning and sequencing of the 552-base-pair amplicons obtained. Phylogenetic analysis of the resulting sequences showed that they belong to the B. lusitaniae and B. afzelii genospecies. The RT-PCR based method presented in this paper could be very useful as a screening test for detecting pathogen presence, especially in investigations that require extraction of total RNA from ticks.
Lorenz, Ronny; Wolfinger, Michael T; Tanzer, Andrea; Hofacker, Ivo L
RNA secondary structures have proven essential for understanding the regulatory functions performed by RNA such as microRNAs, bacterial small RNAs, or riboswitches. This success is in part due to the availability of efficient computational methods for predicting RNA secondary structures. Recent advances focus on dealing with the inherent uncertainty of prediction by considering the ensemble of possible structures rather than the single most stable one. Moreover, the advent of high-throughput structural probing has spurred the development of computational methods that incorporate such experimental data as auxiliary information. Copyright © 2016 The Authors. Published by Elsevier Inc. All rights reserved.
Cribbs, D L; Gillam, I C; Tener, G M
The nucleotide sequences of three serine tRNAs from Drosophila melanogaster, together capable of decoding the six serine codons, were determined. tRNA(Ser)2b has the anticodon GCU, tRNA(Ser)4 has CGA and tRNA(Ser)7 has IGA. tRNA(Ser)2b differs from the last two by about 25%. However, tRNA(Ser)4 and tRNA(Ser)7 are 96% homologous, differing only at the first position of the anticodon and two other sites. This unusual sequence relationship suggests, together with similar pairs in the yeasts Schizosaccharomyces pombe and Saccharomyces cerevisiae, that eukaryotic tRNA(Ser)UCN may be undergoing concerted evolution.
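The statement that anticodons GCU, CGA and IGA together decode the six serine codons can be checked with a short worked example. It uses the textbook wobble rules for the first anticodon base, which is a simplifying assumption; base modifications in real tRNAs can alter pairing. The names `WATSON_CRICK`, `WOBBLE` and `codons_read_by` are hypothetical.

```python
# Watson-Crick partners for anticodon positions 2 and 3 (RNA alphabet).
WATSON_CRICK = {"A": "U", "U": "A", "G": "C", "C": "G"}
# Simplified wobble rules for the first (5') anticodon base; I = inosine.
WOBBLE = {"G": "CU", "C": "G", "A": "U", "U": "AG", "I": "UCA"}


def codons_read_by(anticodon):
    """List the codons (5'->3') that an anticodon (written 5'->3') can pair with."""
    first, second, third = anticodon
    # Codon position 1 pairs with anticodon position 3, and so on (antiparallel).
    return sorted(
        WATSON_CRICK[third] + WATSON_CRICK[second] + w for w in WOBBLE[first]
    )


serine_codons = set()
for anticodon in ("GCU", "CGA", "IGA"):
    serine_codons.update(codons_read_by(anticodon))
```

Under these rules GCU reads AGC and AGU, CGA reads UCG, and IGA reads UCU, UCC and UCA, which are the six serine codons of the standard genetic code.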
Holcomb, M; Ding, Y-H; Dai, D; McDonald, R J; McDonald, J S; Kallmes, D F; Kadirvel, R
Rabbit aneurysm models are used for the testing of embolization devices and elucidating the mechanisms of human intracranial aneurysm growth and healing. We used RNA-sequencing technology to identify genes relevant to induced rabbit aneurysm biology and to identify genes and pathways of potential clinical interest. This process included sequencing microRNAs, which are important regulatory noncoding RNAs. Elastase-induced saccular aneurysms were created at the origin of the right common carotid artery in 6 rabbits. Messenger RNA and microRNA were isolated from the aneurysm and from the control left common carotid artery at 12 weeks and processed by using RNA-sequencing technology. The results from RNA sequencing were analyzed by using the Ingenuity Pathway Analysis tool. A total of 9396 genes were analyzed by using RNA sequencing, 648 (6.9%) of which were found to be significantly differentially expressed between the aneurysms and control tissues (P 2 or rabbit aneurysms revealed differential regulation of some key pathways, including inflammation and antigen presentation. ANKRD1 and TACR1 were identified as genes of interest in the regulation of matrix metalloproteinases. © 2015 by American Journal of Neuroradiology.
The presence of high molecular weight double-stranded RNA (dsRNA) within plant cells is an indicator of infection with RNA viruses as these possess genomic or replicative dsRNA. DECS (dsRNA isolation, exhaustive amplification, cloning, and sequencing) analysis has been shown to be capable of detecting unknown viruses. We postulated that a combination of DECS analysis and next-generation sequencing (NGS) would improve detection efficiency and usability of the technique. Here, we describe a model case in which we efficiently detected the presumed genome sequence of Blueberry shoestring virus (BSSV), a member of the genus Sobemovirus, which has not so far been reported. dsRNAs were isolated from BSSV-infected blueberry plants using the dsRNA-binding protein, reverse-transcribed, amplified, and sequenced using NGS. A contig of 4,020 nucleotides (nt) that shared similarities with sequences from other Sobemovirus species was obtained as a candidate of the BSSV genomic sequence. Reverse transcription (RT)-PCR primer sets based on sequences from this contig enabled the detection of BSSV in all BSSV-infected plants tested but not in healthy controls. A recombinant protein encoded by the putative coat protein gene was bound by the BSSV-antibody, indicating that the candidate sequence was that of BSSV itself. Our results suggest that a combination of DECS analysis and NGS, designated here as “DECS-C,” is a powerful method for detecting novel plant viruses.
Boesler, Carsten; Kruse, Janis; Söderbom, Fredrik; Hammann, Christian
The amoeba Dictyostelium discoideum is a well established model organism for studying numerous aspects of cellular and developmental functions. Its ribosomal RNA (rRNA) is encoded in an extrachromosomal palindrome that exists in ∼100 copies in the cell. In this study, we have set out to investigate the sequence of the expressed rRNA. For this, we have ligated the rRNA ends and performed RT-PCR on these circular RNAs. Sequencing revealed that the mature 26 S, 17 S, 5.8 S, and 5 S rRNAs have si...
Harjanto, Dewi; Papamarkou, Theodore; Oates, Chris J; Rayon-Estrada, Violeta; Papavasiliou, F Nina; Papavasiliou, Anastasia
RNA editing is a mutational mechanism that specifically alters the nucleotide content in transcribed RNA. However, editing rates vary widely, and could result from equivalent editing amongst individual cells, or represent an average of variable editing within a population. Here we present a hierarchical Bayesian model that quantifies the variance of editing rates at specific sites using RNA-seq data from both single cells, and a cognate bulk sample to distinguish between these two possibilities. The model predicts high variance for specific edited sites in murine macrophages and dendritic cells, findings that we validated experimentally by using targeted amplification of specific editable transcripts from single cells. The model also predicts changes in variance in editing rates for specific sites in dendritic cells during the course of LPS stimulation. Our data demonstrate substantial variance in editing signatures amongst single cells, supporting the notion that RNA editing generates diversity within cellular populations.
Susan M Huse
Massively parallel pyrosequencing of hypervariable regions from small subunit ribosomal RNA (SSU rRNA) genes can sample a microbial community two or three orders of magnitude more deeply per dollar and per hour than capillary sequencing of full-length SSU rRNA. As with full-length rRNA surveys, each sequence read is a tag surrogate for a single microbe. However, rather than assigning taxonomy by creating gene trees de novo that include all experimental sequences and certain reference taxa, we compare the hypervariable region tags to an extensive database of rRNA sequences and assign taxonomy based on the best match in a Global Alignment for Sequence Taxonomy (GAST) process. The resulting taxonomic census provides information on both composition and diversity of the microbial community. To determine the effectiveness of using only hypervariable region tags for assessing microbial community membership, we compared the taxonomy assigned to the V3 and V6 hypervariable regions with the taxonomy assigned to full-length SSU rRNA sequences isolated from both the human gut and a deep-sea hydrothermal vent. The hypervariable region tags and full-length rRNA sequences provided equivalent taxonomy and measures of relative abundance of microbial communities, even for tags up to 15% divergent from their nearest reference match. The greater sampling depth per dollar afforded by massively parallel pyrosequencing reveals many more members of the "rare biosphere" than does capillary sequencing of the full-length gene. In addition, tag sequencing eliminates cloning bias and the sequences are short enough to be completely sequenced in a single read, maximizing the number of organisms sampled in a run while minimizing chimera formation. This technique allows the cost-effective exploration of changes in microbial community structure, including the rare biosphere, over space and time and can be applied immediately to initiatives such as the Human Microbiome Project.
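The best-match idea in the abstract above can be illustrated with a minimal sketch. This is not the GAST implementation: it assumes a plain edit distance as the alignment score, a tiny in-memory reference list, and a hypothetical `max_divergence` cutoff (set to 0.15 here only because the abstract mentions tags up to 15% divergent). The function names are illustrative.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences (row-by-row dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # match or substitution
        prev = cur
    return prev[-1]


def assign_taxonomy(tag, references, max_divergence=0.15):
    """Return (taxonomy, divergence) for the closest reference sequence.

    `references` is a list of (sequence, taxonomy) pairs. The taxonomy is
    None when even the best match is more divergent than `max_divergence`
    (an assumed cutoff, not taken from the source).
    """
    best_tax, best_div = None, float("inf")
    for ref_seq, taxonomy in references:
        div = edit_distance(tag, ref_seq) / max(len(tag), len(ref_seq))
        if div < best_div:
            best_tax, best_div = taxonomy, div
    if best_div > max_divergence:
        return None, best_div
    return best_tax, best_div
```

A real pipeline would search a large reference database with a fast pre-filter before aligning; the sketch only shows the assignment step for a single tag.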
Background Post-transcriptional 3′ end processing is a key component of RNA regulation. The abundant and essential RNA subunit of RNase MRP has been proposed to function in three distinct cellular compartments and therefore may utilize this mode of regulation. Here we employ 3′ RACE coupled with high-throughput sequencing to characterize the 3′ terminal sequences of human MRP RNA and other noncoding RNAs that form RNP complexes. Results The 3′ terminal sequence of MRP RNA from HEK293T cells has a distinctive distribution of genomically encoded termini (including an assortment of U residues) with a portion of these selectively tagged by oligo(A) tails. This profile contrasts with the relatively homogenous 3′ terminus of an in vitro transcribed MRP RNA control and the differing 3′ terminal profiles of U3 snoRNA, RNase P RNA, and telomerase RNA (hTR). Conclusions 3′ RACE coupled with deep sequencing provides a valuable framework for the functional characterization of 3′ terminal sequences of noncoding RNAs. PMID:24053768
Goldfarb, Katherine C; Cech, Thomas R
Post-transcriptional 3' end processing is a key component of RNA regulation. The abundant and essential RNA subunit of RNase MRP has been proposed to function in three distinct cellular compartments and therefore may utilize this mode of regulation. Here we employ 3' RACE coupled with high-throughput sequencing to characterize the 3' terminal sequences of human MRP RNA and other noncoding RNAs that form RNP complexes. The 3' terminal sequence of MRP RNA from HEK293T cells has a distinctive distribution of genomically encoded termini (including an assortment of U residues) with a portion of these selectively tagged by oligo(A) tails. This profile contrasts with the relatively homogenous 3' terminus of an in vitro transcribed MRP RNA control and the differing 3' terminal profiles of U3 snoRNA, RNase P RNA, and telomerase RNA (hTR). 3' RACE coupled with deep sequencing provides a valuable framework for the functional characterization of 3' terminal sequences of noncoding RNAs.
Background RNA-binding proteins (RBPs) play diverse roles in eukaryotic RNA processing. Despite their pervasive functions in coding and noncoding RNA biogenesis and regulation, elucidating the sequence specificities that define protein-RNA interactions remains a major challenge. Recently, CLIP-seq (cross-linking immunoprecipitation followed by high-throughput sequencing) has been successfully implemented to study the transcriptome-wide binding patterns of SRSF1, PTBP1, NOVA and fox2 proteins. These studies either adopted traditional methods like Multiple EM for Motif Elicitation (MEME) to discover the sequence consensus of RBP binding sites or used Z-score statistics to search for the overrepresented nucleotides of a certain size. We argue that most of these methods are not well-suited for RNA motif identification, as they are unable to incorporate the RNA structural context of protein-RNA interactions, which may affect binding specificity. Here, we describe a novel model-based approach, RNAMotifModeler, to identify the consensus of protein-RNA binding regions by integrating sequence features and RNA secondary structures. Results As an example, we implemented RNAMotifModeler on SRSF1 (SF2/ASF) CLIP-seq data. The sequence-structural consensus we identified is a purine-rich octamer 'AGAAGAAG' in a highly single-stranded RNA context. The unpaired probabilities, the probabilities of not forming pairs, are significantly higher than negative controls and the flanking sequence surrounding the binding site, indicating that SRSF1 proteins tend to bind on single-stranded RNA. Further statistical evaluations revealed that the second and fifth bases of the SRSF1 octamer motif have much stronger sequence specificities, but weaker single-strandedness, while the third, fourth, sixth and seventh bases are far more likely to be single-stranded, but have more degenerate sequence specificities. Therefore, we hypothesize that nucleotide specificity and
Olejniczak, Marta; Urbanek, Martyna O; Jaworska, Edyta; Witucki, Lukasz; Szczesniak, Michal W; Makalowska, Izabela; Krzyzosiak, Wlodzimierz J
RNA interference triggers such as short interfering RNA (siRNA) or genetically encoded short hairpin RNA (shRNA) and artificial miRNA (sh-miR) are widely used to silence the expression of specific genes. In addition to silencing selected targets, RNAi reagents may induce various side effects, including immune responses. To determine the molecular markers of immune response activation when using RNAi reagents, we analyzed the results of experiments gathered in the RNAimmuno (v 2.0) and GEO Profiles databases. To better characterize and compare cellular responses to various RNAi reagents in one experimental system, we designed a reagent series in corresponding siRNA, D-siRNA, shRNA and sh-miR forms. To exclude sequence-specific effects the reagents targeted 3 different transcripts (Luc, ATXN3 and HTT). We demonstrate that RNAi reagents induce a broad variety of sequence-non-specific effects, including the deregulation of cellular miRNA levels. Typical siRNAs are weak stimulators of interferon response but may saturate the miRNA biogenesis pathway, leading to the downregulation of highly expressed miRNAs, whereas plasmid-based reagents induce known markers of immune response and may alter miRNA levels and their isomiR composition. Copyright © 2015 Elsevier B.V. All rights reserved.
Massively parallel RNA sequencing (RNA-seq) has rapidly become the assay of choice for interrogating RNA transcript abundance and diversity. This article provides a detailed introduction to fundamental RNA-seq molecular biology and informatics concepts. We make available open-access RNA-seq tutorials that cover cloud computing, tool installation, relevant file formats, reference genomes, transcriptome annotations, quality-control strategies, expression, differential expression, and alternative splicing analysis methods. These tutorials and additional training resources are accompanied by complete analysis pipelines and test datasets made available without encumbrance at www.rnaseq.wiki.
Burroughs, Alexander Maxwell; Ando, Yoshinari; Aravind, L
Our understanding of the pervasive involvement of small RNAs in regulating diverse biological processes has been greatly augmented by recent application of deep-sequencing technologies to small RNA across diverse eukaryotes. We review the currently-known small RNA classes and place them in context of the reconstructed evolutionary history of the RNAi protein machinery. This synthesis indicates the earliest versions of eukaryotic RNAi systems likely utilized small RNA processed from three types of precursors: 1) sense-antisense transcriptional products, 2) genome-encoded, imperfectly-complementary hairpin sequences, and 3) larger non-coding RNA precursor sequences. Structural dissection of PIWI proteins along with recent discovery of novel families (including Med13 of the Mediator complex) suggest that emergence of a distinct architecture with the N-terminal domains (also occurring separately fused to endoDNases in prokaryotes) formed via duplication of an ancestral unit was key to their recruitment as primary RNAi effectors and use of small RNAs of certain preferred lengths. Prokaryotic PIWI proteins are typically components of several RNA-directed DNA restriction or CRISPR/Cas systems. However, eukaryotic versions appear to have emerged from a subset that evolved RNA-directed RNA interference. They were recruited alongside RNaseIII domains and RdRP domains, also from prokaryotic systems, to form the core eukaryotic RNAi system. Like certain regulatory systems, RNAi diversified into two distinct but linked arms concomitant with eukaryotic nucleo-cytoplasmic compartmentalization. Subsequent elaboration of RNAi proceeded via diversification of the core protein machinery through lineage-specific expansions and recruitment of new components from prokaryotes (nucleases and small RNA-modifying enzymes), allowing for diversification of associating small RNAs. PMID:24311560
Liu, Yuan; Jing, Runyu; Xu, Junmei; Liu, Keqin; Xue, Jiwei; Wen, Zhining; Li, Menglong
Although RNA-sequencing has been widely used to identify differentially expressed genes (DEGs) as biomarkers to guide therapeutic treatment, it is necessary to investigate the concordance of DEGs identified by microarray and RNA-sequencing for clinical prognosis. Using The Cancer Genome Atlas data sets, we thoroughly investigated the concordance of DEGs identified from microarray and RNA-sequencing data and their molecular functions. The DEGs identified by both technologies averaged ~98.6% overlap. The cancer-related gene sets were significantly enriched with the DEGs and consistent between the two technologies. The high consistency of DEGs in their regulation directionality and molecular functions indicated good reproducibility between microarray and RNA-sequencing in identifying potential oncogenes for clinical prognosis.
French, R; Ahlquist, P
The genome of brome mosaic virus (BMV) is divided among messenger polarity RNA1, RNA2, and RNA3 (3.2, 2.9, and 2.1 kilobases, respectively). cis-Acting sequences required for BMV RNA amplification were investigated with RNA3. By using expressible cDNA clones, deletions were constructed throughout RNA3 and tested in barley protoplasts coinoculated with RNA1 and RNA2. In contrast to requirements for 5'- and 3'-terminal noncoding sequences, either of the two RNA3 coding regions can be deleted in...
Gong, Jing; Wu, Yuliang; Zhang, Xiantong; Liao, Yifang; Sibanda, Vusumuzi Leroy; Liu, Wei; Guo, An-Yuan
MicroRNAs (miRNAs) play key regulatory roles in various biological processes and diseases. A comprehensive analysis of large scale small RNA sequencing data (smRNA-seq) will be very helpful to explore tissue or disease specific miRNA markers and uncover miRNA variants. Here, we systematically analyzed 410 human smRNA-seq datasets, whose samples come from 24 tissue/disease/cell lines. We tested the mapping strategies and found that it was necessary to make multiple-round mappings with different mismatch parameters. miRNA expression profiles revealed that on average ∼70% of known miRNAs were expressed at low level or not expressed (RPM 100). About 30% of known miRNAs were not expressed in any of the samples we used. The miRNA expression profiles were compiled into an online database (HMED, http://bioinfo.life.hust.edu.cn/smallRNA/). Dozens of tissue/disease specific miRNAs, disease/control dysregulated miRNAs and miRNAs with arm switching events were discovered. Further, we identified some highly confident editing sites including 24 A-to-I sites and 23 C-to-U sites. About half of them were widespread miRNA editing sites in different tissues. We found that the 2 types of editing sites have different features with regard to location, editing level and frequency. Our analyses of expression profiles, specific miRNA markers, arm switching, and editing sites may provide valuable information for further studies of miRNA function and biomarker finding.
Swee Hoe Ong
The high throughput and cost-effectiveness afforded by short-read sequencing technologies, in principle, enable researchers to perform 16S rRNA profiling of complex microbial communities at unprecedented depth and resolution. Existing Illumina sequencing protocols are, however, limited by the fraction of the 16S rRNA gene that is interrogated and therefore limit the resolution and quality of the profiling. To address this, we present the design of a novel protocol for shotgun Illumina sequencing of the bacterial 16S rRNA gene, optimized to amplify more than 90% of sequences in the Greengenes database and with the ability to distinguish nearly twice as many species-level OTUs compared to existing protocols. Using several in silico and experimental datasets, we demonstrate that despite the presence of multiple variable and conserved regions, the resulting shotgun sequences can be used to accurately quantify the constituents of complex microbial communities. The reconstruction of a significant fraction of the 16S rRNA gene also enabled high precision (>90%) in species-level identification, thereby opening up potential application of this approach for clinical microbial characterization.
Singleton, David R.; Furlong, Michelle A.; Rathbun, Stephen L.; Whitman, William B.
To determine the significance of differences between clonal libraries of environmental rRNA gene sequences, differences between homologous coverage curves, CX(D), and heterologous coverage curves, CXY(D), were calculated by a Cramér-von Mises-type statistic and compared by a Monte Carlo test procedure. This method successfully distinguished rRNA gene sequence libraries from soil and bioreactors and correctly failed to find differences between libraries of the same composition. PMID:11526051
Our methodology attained a higher average accuracy of 0.88, average sensitivity and specificity of 0.81 and 0.94, respectively, and areas under the curves (AUCs) for all the four models scored above 0.9, suggesting better performance by our methodology and a possible role of flanking regions in microRNA targeting ...
Petersen, Christel H.; Hjort, Benjamin Benn; Tvedebrink, Torben
We report a new second generation sequencing method for identification of micro-RNA (miRNA) that can be used to identify body fluids and tissues. Principal component analysis of 10 miRNAs with high expression in 16 samples of blood, saliva and semen showed clear differences in the expression of mi...
Aug 31, 2016 ... from various geographical regions were studied by analysis of the 23S rRNA sequences. .... used in PCR amplification and Big Dye Terminator v3.1 cycle ... Data analysis. The reference 23S rRNA gene of R. leguminosarum (AF207785.1) was retrieved from NCBI site (http://www.ncbi.nlm.nih.gov/). The.
Brown, S; Thon, G; Tolentino, E
A general strategy for cloning the functional homologs of an Escherichia coli gene was used to clone homologs of 4.5S RNA from other bacteria. The genes encoding these homologs were selected by their ability to complement a deletion of the gene for 4.5S RNA. DNA sequences of the regions encoding...
Navalkar, Krupa Arun; Johnston, Stephan Albert; Stafford, Phillip
Diagnostics using peptide ligands have been available for decades. However, their adoption in diagnostics has been limited, not because of poor sensitivity but in many cases due to diminished specificity. Numerous reports suggest that protein-based rather than peptide-based disease detection is more specific. We examined two different approaches to peptide-based diagnostics using Coccidioides (aka Valley Fever) as the disease model. Although the pathogen was discovered more than a century ago, a highly sensitive diagnostic remains unavailable. We present a case study where two different approaches to diagnosing Valley Fever were used: first, overlapping Valley Fever epitopes representing immunodominant Coccidioides antigens were tiled using a microarray format of presynthesized peptides. Second, a set of random sequence peptides identified using a 10,000 peptide immunosignaturing microarray was compared for sensitivity and specificity. The scientific hypothesis tested was that actual epitope peptides from Coccidioides would provide sufficient sensitivity and specificity as a diagnostic. Results demonstrated that random sequence peptides exhibited higher accuracy when classifying different stages of Valley Fever infection vs. epitope peptides. The epitope peptide array did provide better performance than the existing immunodiffusion array, but when directly compared to the random sequence peptides, reported lower overall accuracy. This study suggests that there are competing aspects of antibody recognition that involve conservation of pathogen sequence and aspects of mimotope recognition and amino acid substitutions. These factors may prove critical when developing the next generation of high-performance immunodiagnostics. Copyright © 2014 Elsevier B.V. All rights reserved.
Kuksa, Pavel P; Leung, Yuk Yee; Vandivier, Lee E; Anderson, Zachary; Gregory, Brian D; Wang, Li-San
RNA molecules are often altered post-transcriptionally by the covalent modification of their nucleotides. These modifications are known to modulate the structure, function, and activity of RNAs. When reverse transcribed into cDNA during RNA sequencing library preparation, atypical (modified) ribonucleotides that affect Watson-Crick base pairing will interfere with reverse transcriptase (RT), resulting in cDNA products with mis-incorporated bases or prematurely terminated RNA products. These interactions with RT can therefore be inferred from mismatch patterns in the sequencing reads, and are distinguishable from simple base-calling errors, single-nucleotide polymorphisms (SNPs), or RNA editing sites. Here, we describe a computational protocol for the in silico identification of modified ribonucleotides from RT-based RNA-seq read-out using the High-throughput Analysis of Modified Ribonucleotides (HAMR) software. HAMR can identify these modifications transcriptome-wide with single nucleotide resolution, and also differentiate between different types of modifications to predict modification identity. Researchers can use HAMR to identify and characterize RNA modifications using RNA-seq data from a variety of common RT-based sequencing protocols such as Poly(A), total RNA-seq, and small RNA-seq.
Circular RNAs (circRNAs) are a large class of animal RNAs. To investigate possible circRNA functions, it is important to understand circRNA biogenesis. Besides human ALU repeats, sequence features that promote exon circularization are largely unknown. We experimentally identified circRNAs in C. elegans. Reverse complementary sequences between introns bracketing circRNAs were significantly enriched in comparison to linear controls. By scoring the presence of reverse complementary sequences in human introns, we predicted and experimentally validated circRNAs. We show that introns bracketing circRNAs are highly enriched in RNA editing or hyperediting events. Knockdown of the double-strand RNA-editing enzyme ADAR1 significantly and specifically upregulated circRNA expression. Together, our data support a model of animal circRNA biogenesis in which competing RNA-RNA interactions of introns form larger structures that promote circularization of embedded exons, whereas ADAR1 antagonizes circRNA expression by melting stems within these interactions.
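Scoring reverse complementary sequences between two flanking introns, as the abstract above describes, can be sketched in a few lines. The scoring scheme here (counting shared k-mers, with k = 8 by default) is an assumption chosen for illustration and is not the authors' published score; the function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna):
    """Reverse complement of an RNA string over the alphabet A, C, G, U."""
    return rna.translate(COMPLEMENT)[::-1]


def kmers(seq, k):
    """Set of all distinct substrings of length k."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def rc_match_score(upstream_intron, downstream_intron, k=8):
    """Count distinct upstream k-mers whose reverse complement occurs downstream.

    A high count suggests the two introns could base-pair with each other,
    which would bring the ends of the bracketed exon(s) close together.
    """
    downstream_kmers = kmers(downstream_intron, k)
    return sum(1 for km in kmers(upstream_intron, k)
               if reverse_complement(km) in downstream_kmers)
```

In practice such a score would be compared against a background of intron pairs flanking linear exons, since the abstract reports enrichment relative to linear controls rather than an absolute threshold.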
Vangi, D.; Virga, A.; Gulino, M. S.
Low-power laser generated ultrasounds have lately been gaining importance in research, thanks to the possibility of investigating the structural integrity of a mechanical component through a non-contact, Non-Destructive Testing (NDT) procedure. The ultrasounds are, however, very low in amplitude, making it necessary to use pre-processing and post-processing operations on the signals to detect them. The cross-correlation technique is used in this work, meaning that a random signal must be used as laser input. For this purpose, a highly random and simple-to-create code called T sequence, capable of enhancing the ultrasound detectability, is introduced (not previously available at the state of the art). Several important parameters which characterize the T sequence can influence the process: the number of pulses Npulses, the pulse duration δ and the distance between pulses dpulses. A Finite Element (FE) model of a 3 mm steel disk was initially developed to study the longitudinal ultrasound generation mechanism and the obtainable outputs. Later, experimental tests showed that the T sequence is highly flexible for ultrasound detection purposes, and that it is optimal to use high Npulses and δ but low dpulses. In the end, apart from describing all phenomena that arise in the low-power laser generation process, the results of this study are also important for setting up an effective NDT procedure using this technology.
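The cross-correlation step mentioned in the abstract above can be illustrated with a generic random pulse code. This sketch does not reproduce the T sequence itself, whose construction is not given in the abstract. It assumes unit-amplitude pulses one sample wide, an echo that is a delayed copy of the code, and additive noise supplied by the caller. All names are illustrative.

```python
import random


def random_pulse_train(n_samples, n_pulses, rng):
    """Binary excitation code: n_pulses unit pulses at random sample positions."""
    code = [0.0] * n_samples
    for pos in rng.sample(range(n_samples), n_pulses):
        code[pos] = 1.0
    return code


def cross_correlate(code, signal):
    """c[lag] = sum_i code[i] * signal[i + lag], for every lag where the code fits."""
    n = len(code)
    return [sum(code[i] * signal[i + lag] for i in range(n))
            for lag in range(len(signal) - n + 1)]


def estimate_delay(code, signal):
    """Lag (in samples) at which the received signal best matches the code."""
    corr = cross_correlate(code, signal)
    return max(range(len(corr)), key=corr.__getitem__)
```

Because the pulse positions are random, the code overlaps a shifted copy of itself in fewer pulses than it overlaps itself at zero shift, so the correlation peak marks the arrival time. Longer codes with more pulses raise that peak relative to the noise floor, which is in line with the abstract's preference for a high number of pulses.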
Anahtar, Melis N; Bowman, Brittany A; Kwon, Douglas S
There is a growing appreciation for the role of microbial communities as critical modulators of human health and disease. High throughput sequencing technologies have allowed for the rapid and efficient characterization of bacterial communities using 16S rRNA gene sequencing from a variety of sources. Although readily available tools for 16S rRNA sequence analysis have standardized computational workflows, sample processing for DNA extraction remains a continued source of variability across studies. Here we describe an efficient, robust, and cost effective method for extracting nucleic acid from swabs. We also delineate downstream methods for 16S rRNA gene sequencing, including generation of sequencing libraries, data quality control, and sequence analysis. The workflow can accommodate multiple sample types, including stool and swabs collected from a variety of anatomical locations and host species. Additionally, recovered DNA and RNA can be separated and used for other applications, including whole genome sequencing or RNA-seq. The method described allows for a common processing approach for multiple sample types and accommodates downstream analysis of genomic, metagenomic and transcriptional information.
Jeffrey R Johansen
A highly divergent 16S rRNA gene was found in one of the five ribosomal operons present in a species complex currently circumscribed as Scytonema hyalinum (Nostocales, Cyanobacteria) using clone libraries. If 16S rRNA sequence macroheterogeneity among ribosomal operons due to insertions, deletions or truncation is excluded, the sequence heterogeneity observed in S. hyalinum was the highest observed in any prokaryotic species thus far (7.3-9.0%). The secondary structure of the 16S rRNA molecules encoded by the two divergent operons was nearly identical, indicating possible functionality. The 23S rRNA gene was examined for a few strains in this complex, and it was also found to be highly divergent from the gene in Type 2 operons (8.7%), and likewise had nearly identical secondary structure between the Type 1 and Type 2 operons. Furthermore, the 16S-23S ITS showed marked differences consistent between operons among numerous strains. Both operons have promoter sequences that satisfy consensus requirements for functional prokaryotic transcription initiation. Horizontal gene transfer from another unknown heterocytous cyanobacterium is considered the most likely explanation for the origin of this molecule, but does not explain the ultimate origin of this sequence, which is very divergent from all 16S rRNA sequences found thus far in cyanobacteria. The divergent sequence is highly conserved among numerous strains of S. hyalinum, suggesting adaptive advantage and selective constraint of the divergent sequence.
Background Next-generation sequencing technologies allow researchers to obtain millions of sequence reads in a single experiment. One important use of the technology is the sequencing of small non-coding regulatory RNAs and the identification of the genomic locales from which they originate. Currently, there is a paucity of methods for finding small RNA generative locales. Results We describe and implement an algorithm that can determine small RNA generative locales from high-throughput sequencing data. The algorithm creates a network, or graph, of the small RNAs by creating links between them depending on their proximity on the target genome. For each of the sub-networks in the resulting graph the clustering coefficient, a measure of the interconnectedness of the subnetwork, is used to identify the generative locales. We test the algorithm over a wide range of parameters using RFAM sequences as positive controls and demonstrate that the algorithm has good sensitivity and specificity in a range of Arabidopsis and mouse small RNA sequence sets and that the locales it generates are robust to differences in the choice of parameters. Conclusions NiBLS is a fast, reliable and sensitive method for determining small RNA locales in high-throughput sequence data that is generally applicable to all classes of small RNA.
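The graph construction outlined in the abstract above can be sketched as follows. This is an illustration of the general idea, not the NiBLS code. It assumes reads are given as (start, end) intervals on a single chromosome, links two reads when the gap between them is at most `max_gap`, and uses the mean local clustering coefficient of each connected component. The thresholds `max_gap`, `min_cc` and `min_reads` are hypothetical defaults.

```python
from itertools import combinations


def build_graph(reads, max_gap):
    """Adjacency sets linking reads whose genomic gap is at most max_gap nt."""
    adj = {i: set() for i in range(len(reads))}
    for i, j in combinations(range(len(reads)), 2):
        (s1, e1), (s2, e2) = reads[i], reads[j]
        gap = max(s1, s2) - min(e1, e2)  # zero or negative when the reads overlap
        if gap <= max_gap:
            adj[i].add(j)
            adj[j].add(i)
    return adj


def components(adj):
    """Connected components (the sub-networks) found by depth-first search."""
    seen, comps = set(), []
    for node in adj:
        if node in seen:
            continue
        stack, comp = [node], set()
        while stack:
            n = stack.pop()
            if n not in comp:
                comp.add(n)
                stack.extend(adj[n] - comp)
        seen |= comp
        comps.append(comp)
    return comps


def clustering_coefficient(adj, comp):
    """Mean local clustering coefficient over the nodes of one component."""
    total = 0.0
    for n in comp:
        k = len(adj[n])
        if k < 2:
            continue  # nodes with fewer than two neighbours contribute 0
        links = sum(1 for a, b in combinations(adj[n], 2) if b in adj[a])
        total += 2.0 * links / (k * (k - 1))
    return total / len(comp)


def candidate_locales(reads, max_gap=50, min_cc=0.6, min_reads=3):
    """Genomic spans of densely interconnected sub-networks of reads."""
    adj = build_graph(reads, max_gap)
    spans = []
    for comp in components(adj):
        if len(comp) >= min_reads and clustering_coefficient(adj, comp) >= min_cc:
            spans.append((min(reads[i][0] for i in comp),
                          max(reads[i][1] for i in comp)))
    return sorted(spans)
```

The all-pairs comparison is quadratic in the number of reads, which is fine for a demonstration but would need a sorted sweep or interval index for real sequencing data.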
Lobrich, M.; Rydberg, B.; Cooper, P. K.; Chatterjee, A. (Principal Investigator)
The random-breakage mapping method [Game et al. (1990) Nucleic Acids Res., 18, 4453-4461] was applied to DNA sequences in human fibroblasts. The methodology involves NotI restriction endonuclease digestion of DNA from irradiated cells, followed by pulsed-field gel electrophoresis, Southern blotting and hybridization with DNA probes recognizing the single copy sequences of interest. The Southern blots show a band for the unbroken restriction fragments and a smear below this band due to radiation induced random breaks. This smear pattern contains two discontinuities in intensity at positions that correspond to the distance of the hybridization site to each end of the restriction fragment. By analyzing the positions of those discontinuities we confirmed the previously mapped position of the probe DXS1327 within a NotI fragment on the X chromosome, thus demonstrating the validity of the technique. We were also able to position the probes D21S1 and D21S15 with respect to the ends of their corresponding NotI fragments on chromosome 21. A third chromosome 21 probe, D21S11, has previously been reported to be close to D21S1, although an uncertainty about a second possible location existed. Since both probes D21S1 and D21S11 hybridized to a single NotI fragment and yielded a similar smear pattern, this uncertainty is removed by the random-breakage mapping method.
Background During microRNA (miRNA) maturation in humans and flies, Drosha and Dicer cut the precursor transcript, thereby producing a short RNA duplex. One strand of this duplex becomes a functional component of the RNA-Induced Silencing Complex (RISC), while the other is eliminated. While thermodynamic asymmetry of the duplex ends appears to play a decisive role in the strand selection process, the details of the selection mechanism are not yet understood. Results Here, we assess miRNA strand selection bias in humans and fruit flies by analyzing the sequence composition and relative expression levels of the two strands of the precursor duplex in these species. We find that the sequence elements associated with preferential miRNA strand selection and/or rejection differ between the two species. Further, we identify another feature that distinguishes human and fly miRNA processing machinery: the relative accuracy of the Drosha and Dicer enzymes. Conclusion Our results provide clues to the mechanistic aspects of miRNA strand selection in humans and other mammals. Further, they indicate that human and fly miRNA processing pathways are more distinct than currently recognized. Finally, the observed strand selection determinants are instrumental in the rational design of efficient miRNA-based expression regulators.
Villanueva, Francisco; Sabanadzovic, Sead; Valverde, Rodrigo A.
A number of avocado (Persea americana) cultivars are known to contain high-molecular-weight double-stranded RNA (dsRNA) molecules for which a viral nature has been suggested, although sequence data are not available. Here we report the cloning and complete sequencing of a 13.5-kbp dsRNA virus isolated from avocado and show that it corresponds to the genome of a new species of the genus Endornavirus (family Endornaviridae), tentatively named Persea americana endornavirus (PaEV). PMID:22205720
Taly, Jean-Francois; Magis, Cedrik; Bussotti, Giovanni; Chang, Jia-Ming; Di Tommaso, Paolo; Erb, Ionas; Espinosa-Carrasco, Jose; Kemena, Carsten; Notredame, Cedric
T-Coffee (Tree-based consistency objective function for alignment evaluation) is a versatile multiple sequence alignment (MSA) method suitable for aligning most types of biological sequences. The main strength of T-Coffee is its ability to combine third party aligners and to integrate structural (or homology) information when building MSAs. The series of protocols presented here show how the package can be used to multiply align protein, RNA and DNA sequences. The protein section shows how users can select the most suitable T-Coffee mode for their data set. Detailed protocols include T-Coffee, the default mode; M-Coffee, a meta version able to combine several third party aligners into one; PSI (position-specific iterated)-Coffee, the homology extended mode suitable for remote homologs; and Expresso, the structure-based multiple aligner. We also show how the T-RMSD (tree based on root mean square deviation) option can be used to produce a functionally informative structure-based clustering. RNA alignment procedures are described for using R-Coffee, a mode able to use predicted RNA secondary structures when aligning RNA sequences. DNA alignments are illustrated with Pro-Coffee, a multiple aligner specific to promoter regions. We also present some of the many reformatting utilities bundled with T-Coffee. The package is an open-source freeware available from http://www.tcoffee.org/.
Tamaki, Hideyuki; Wright, Chris L; Li, Xiangzhen; Lin, Qiaoyan; Hwang, Chiachi; Wang, Shiping; Thimmapuram, Jyothi; Kamagata, Yoichi; Liu, Wen-Tso
The 16S rRNA gene pyrosequencing approach has revolutionized studies in microbial ecology. While primer selection and short read length can affect the resulting microbial community profile, little is known about the influence of pyrosequencing methods on the sequencing throughput and the outcome of microbial community analyses. The aim of this study is to compare differences in output, ease, and cost among three different amplicon pyrosequencing methods for the Roche/454 Titanium platform. The following three pyrosequencing methods for 16S rRNA genes were selected in this study: Method-1 (standard method) is the recommended method for bi-directional sequencing using the LIB-A kit; Method-2 is a new option designed in this study for unidirectional sequencing with the LIB-A kit; and Method-3 uses the LIB-L kit for unidirectional sequencing. In our comparison among these three methods using 10 different environmental samples, Method-2 and Method-3 produced 1.5-1.6 times more useable reads than the standard method (Method-1) after quality-based trimming, and did not compromise the outcome of microbial community analyses. Specifically, Method-3 is the most cost-effective unidirectional amplicon sequencing method, as it provided the most reads and required the least effort in consumables management. Our findings clearly demonstrated that alternative pyrosequencing methods for 16S rRNA genes could drastically affect sequencing output (e.g. number of reads before and after trimming) but have little effect on the outcomes of microbial community analysis. This finding is important for both researchers and sequencing facilities utilizing 16S rRNA gene pyrosequencing for microbial ecological studies.
Desvignes, T.; Batzel, P.; Berezikov, E.; Eilbeck, K.; Eppig, J. T.; McAndrews, M. S.; Singer, A.; Postlethwait, J. H.
High-throughput sequencing of miRNAs has revealed the diversity and variability of mature and functional short noncoding RNAs, including their genomic origins, biogenesis pathways, sequence variability, and newly identified products such as miRNA-offset RNAs (moRs). Here we review known cases of
Kiyosawa, Hidenori; Okumura, Akio; Okui, Saya; Ushida, Chisato; Kawai, Gota
In order to find novel structured small RNAs, next-generation sequencing was applied to small RNA fractions with lengths ranging from 40 to 140 nt and secondary structure-based clustering was performed. Sequences of structured RNAs were effectively clustered and analyzed by secondary structure. Although more than 99% of the obtained sequences were known RNAs, 16 candidate mouse structured small non-coding RNAs (MsncRs) were isolated. Based on these results, the merits of secondary structure-based analysis are discussed. Copyright © 2015 Elsevier Inc. All rights reserved.
Aljohi, Hasan Awad; Liu, Wanfei; Lin, Qiang; Yu, Jun; Hu, Songnian
Precise and efficient exon recognition and splicing by the spliceosome is the key to generating mature mRNAs. About one third to one half of disease-related mutations affect RNA splicing. The software PVAAS has been developed to identify variants associated with aberrant splicing directly from RNA-seq data. However, it is based on the assumption that annotated splice sites represent normal splicing, which is not in fact true. We developed ISVASE, a tool for specifically identifying sequence variants associated with splicing events (SVASE) using RNA-seq data. Compared with PVAAS, our tool has several advantages: multi-pass stringent rule-dependent filters and statistical filters; use of split reads only; independent sequence variant identification in each part of a splicing junction; sequence variant detection for both known and novel splicing events; additional exon-exon junction shift event detection if known splicing events are provided; splicing signal evaluation; support for known DNA mutation and/or RNA editing data; higher precision and consistency; and short running time. Using a realistic RNA-seq dataset, we performed a case study to illustrate the functionality and effectiveness of our method. Moreover, the SVASE output can be used for downstream analyses such as splicing regulatory element studies and sequence variant functional analysis. ISVASE is useful for researchers interested in sequence variants (DNA mutation and/or RNA editing) associated with splicing events. The package is freely available at https://sourceforge.net/projects/isvase/ .
Vihang Vithalrao Patil
Aim: This study aimed to identify, at both the molecular and serological levels, Indian field isolates of Avibacterium paragallinarum, which causes infectious coryza in chickens. Materials and Methods: Species-specific polymerase chain reaction (HPG-2 PCR) and 16S ribosomal RNA (rRNA) sequencing were employed for molecular identification, whereas a multiplex PCR technique was used for serological identification of Indian field isolates of A. paragallinarum. Results: All three field isolates were identified as A. paragallinarum using HPG-2 PCR. The species-specific PCR results were validated using 16S rRNA sequencing. The partial 16S rRNA sequences obtained from all three isolates showed 96-99% homology with the NCBI database reference strains of A. paragallinarum. The aligned partial sequences of 16S rRNA were submitted to GenBank, and accession numbers were obtained. Multiplex PCR-based molecular serotyping showed that there are three serotypes among the field isolates of A. paragallinarum: strain IND101 is serovar A, strain IND102 is serovar B, and strain IND103 is serovar C. Conclusion: HPG-2 PCR, 16S rRNA sequencing, and multiplex PCR proved to be more accurate, sensitive, and reliable diagnostic tools for molecular and serological identification of A. paragallinarum field isolates. These diagnostic methods can substitute for conventional cultural characterization and would be very valuable in formulating quick and correct prevention and control measures against this detrimental poultry pathogen.
Chang, Yao-Yin; Lai, Liang-Chuan; Tsai, Mong-Hsun; Chuang, Eric Y
Deep sequencing is an advanced technology in genomic biology for determining the precise order of nucleotides in a strand of DNA or RNA. The analysis of deep sequencing data also requires sophisticated knowledge of both computational software and bioinformatics. In this chapter, the procedures of deep sequencing analysis of the microRNA (miRNA) transcriptome in triple-negative breast cancer and adjacent normal tissue are described in detail. As miRNAs are critical regulators of gene expression and many of them were previously reported to be associated with the malignant progression of human cancer, an analytical method that accurately identifies deregulated miRNAs in a specific type of cancer is important for understanding its tumor behavior. We obtained raw sequence reads of miRNA expression from 24 triple-negative breast cancers and 14 adjacent normal tissues using deep sequencing technology in this work. Expression data of miRNA reads were normalized with the quantile-quantile scaling method and were analyzed statistically. A miRNA expression signature composed of 25 differentially expressed miRNAs proved to be an effective classifier between triple-negative breast cancers and adjacent normal tissues in a hierarchical clustering analysis.
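The normalization step mentioned in this abstract can be illustrated with a minimal sketch. It assumes that "quantile-quantile scaling" refers to standard quantile normalization, in which every sample is mapped onto a common reference distribution; the abstract does not give the exact procedure, so this is an illustration rather than the authors' method. The function name `quantile_normalize` and the toy count matrix are hypothetical.

```python
import numpy as np

def quantile_normalize(counts):
    """Quantile-normalize a (features x samples) matrix.

    Every column (sample) is mapped onto one reference distribution:
    the row-wise mean of the column-sorted matrix. Ties are broken by
    order of appearance, which is a simplification.
    """
    counts = np.asarray(counts, dtype=float)
    order = np.argsort(counts, axis=0, kind="stable")  # sort order within each sample
    ranks = np.argsort(order, axis=0, kind="stable")   # rank of each value within its sample
    reference = np.sort(counts, axis=0).mean(axis=1)   # mean of the k-th smallest values
    return reference[ranks]

# Hypothetical toy matrix: 4 miRNAs (rows) x 3 samples (columns).
toy = np.array([[5, 4, 3],
                [2, 1, 4],
                [3, 4, 6],
                [4, 2, 8]])
normalized = quantile_normalize(toy)
```

After this step all samples share the same set of values, so differences between samples reflect only the ranking of each miRNA within its sample. Real pipelines usually handle ties by averaging, which this sketch omits.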
Li, Huasheng; Lu, Jinying; Sun, Qiao; Chen, Yu; He, Dacheng; Liu, Min
MicroRNA (miRNA) is a non-coding small RNA composed of 20 to 24 nucleotides that influences plant root development. This study analyzed miRNA expression in Arabidopsis root tip cells using Illumina sequencing and real-time PCR before (sample 0) and 15 min after (sample 15) a 3-D clinostat rotational treatment was administered. After stimulation, the expression levels of seven miRNA genes, namely Arabidopsis miR160, miR161, miR394, miR402, miR403, miR408, and miR823, were significantly upregulated. Illumina sequencing results also revealed two novel miRNAs that have not been previously reported. The target genes of these miRNAs included pentatricopeptide repeat-containing protein and diadenosine tetraphosphate hydrolase. An overexpression vector of Arabidopsis miR408 was constructed and transferred into Arabidopsis plants. The roots of plants overexpressing miR408 exhibited slower reorientation upon gravistimulation than those of the wild type. This result indicates that miR408 could play a role in the root gravitropic response.
Tamminga, Saskia; van Maarle, Merel; Henneman, Lidewij; Oudejans, Cees B. M.; Cornel, Martina C.; Sistermans, Erik A.
Cell-free DNA (cfDNA) testing has recently become indispensable in diagnostic testing and screening. In the prenatal setting, this type of testing is often called noninvasive prenatal testing (NIPT). With a number of techniques, using either next-generation sequencing or single nucleotide
Mosquitoes are vectors for the transmission of many human pathogens that include viruses, nematodes and protozoa. For the understanding of their vectorial capacity, identification of disease carrying and refractory strains is essential. Recently, molecular taxonomic techniques have been utilized for this purpose. Sequence ...
Boesler, Carsten; Kruse, Janis; Söderbom, Fredrik; Hammann, Christian
The amoeba Dictyostelium discoideum is a well established model organism for studying numerous aspects of cellular and developmental functions. Its ribosomal RNA (rRNA) is encoded in an extrachromosomal palindrome that exists in ∼100 copies in the cell. In this study, we have set out to investigate the sequence of the expressed rRNA. For this, we have ligated the rRNA ends and performed RT-PCR on these circular RNAs. Sequencing revealed that the mature 26 S, 17 S, 5.8 S, and 5 S rRNAs have sizes of 3741, 1871, 162, and 112 nucleotides, respectively. Unlike the published data, all mature rRNAs of the same type uniformly display the same start and end nucleotides in the analyzed AX2 strain. We show the existence of a short lived primary transcript covering the rRNA transcription unit of 17 S, 5.8 S, and 26 S rRNA. Northern blots and RT-PCR reveal that from this primary transcript two precursor molecules of the 17 S and two precursors of the 26 S rRNA are generated. We have also determined the sequences of these precursor molecules, and based on these data, we propose a model for the maturation of the rRNAs in Dictyostelium discoideum that we compare with the processing of the rRNA transcription unit of Saccharomyces cerevisiae. © 2011 by The American Society for Biochemistry and Molecular Biology, Inc.
Seemann, Ernst Stefan; Menzel, Karl Peter; Backofen, Rolf
gene. We present web servers to analyze multiple RNA sequences for common RNA structure and for RNA interaction sites. The web servers are based on the recent PET (Probabilistic Evolutionary and Thermodynamic) models PETfold and PETcofold, but add user friendly features ranging from a graphical layer...... to interactive usage of the predictors. Additionally, the web servers provide direct access to annotated RNA alignments, such as the Rfam 10.0 database and multiple alignments of 16 vertebrate genomes with human. The web servers are freely available at: http://rth.dk/resources/petfold/...
French, R; Ahlquist, P
The genome of brome mosaic virus (BMV) is divided among messenger polarity RNA1, RNA2, and RNA3 (3.2, 2.9, and 2.1 kilobases, respectively). cis-Acting sequences required for BMV RNA amplification were investigated with RNA3. By using expressible cDNA clones, deletions were constructed throughout RNA3 and tested in barley protoplasts coinoculated with RNA1 and RNA2. In contrast to requirements for 5'- and 3'-terminal noncoding sequences, either of the two RNA3 coding regions can be deleted individually and both can be simultaneously inactivated by N-terminal frameshift mutations without significantly interfering with amplification of RNA3 or production of its subgenomic mRNA. However, simultaneous major deletions in both coding regions greatly attenuate RNA3 accumulation. RNA3 levels can be largely restored by insertion of a heterologous, nonviral sequence in such mutants, suggesting that RNA3 requires physical separation of its terminal domains or a minimum overall size for normal replication or stability. Unexpectedly, deletions in a 150-base segment of the intercistronic noncoding region drastically reduce RNA3 accumulation. This segment contains a sequence element homologous to sequences found near the 5' ends of BMV RNA1 and RNA2 and in analogous positions in the three genomic RNAs of the related cucumber mosaic virus, suggesting a possible role in plus-strand synthesis.
Spornraft, Melanie; Kirchner, Benedikt; Haase, Bettina; Benes, Vladimir; Pfaffl, Michael W; Riedmaier, Irmgard
There are several protocols and kits for the extraction of circulating RNAs from plasma followed by quantification of specific genes via RT-qPCR. Due to the marginal amount of cell-free RNA in plasma samples, the total RNA yield is insufficient to perform Next-Generation Sequencing (NGS), the state-of-the-art technology in massively parallel sequencing that enables a comprehensive characterization of the whole transcriptome. Screening the transcriptome for biomarker signatures accelerates progress in biomarker profiling for molecular diagnostics, early disease detection or food safety. Therefore, the aim was to optimize a method that enables the extraction of sufficient amounts of total RNA from bovine plasma to generate good-quality small RNA sequencing (small RNA-Seq) data. An increased volume of plasma (9 ml) was processed using the Qiagen miRNeasy Serum/Plasma Kit in combination with the QIAvac24 Plus system, a vacuum manifold that enables handling of high volumes during RNA isolation. 35 ng of total RNA were passed on to cDNA library preparation followed by small RNA high-throughput sequencing analysis on the Illumina HiSeq2000 platform. Raw sequencing reads were processed by a data analysis pipeline using different free software solutions. Sequencing data were trimmed, quality checked, gradually selected for miRNAs/piRNAs and aligned to small RNA reference annotation indexes. Mapping to human reference indexes resulted in 4.8±2.8% mature miRNAs and 1.4±0.8% piRNAs, and mapping to Bos taurus indexes in 5.0±2.9% mature miRNAs.
Clemens, Kristina; Bilanchone, Virginia; Beliakova-Bethell, Nadejda; Larsen, Liza S Z; Nguyen, Kim; Sandmeyer, Suzanne
Retroviruses and retrotransposons package genomic RNA into virus-like particles (VLPs) in a poorly understood process. Expression of the budding yeast retrotransposon Ty3 results in the formation of cytoplasmic Ty3 VLP assembly foci comprised of Ty3 RNA and proteins, and cellular factors associated with RNA processing body (PB) components, which modulate translation and effect nonsense-mediated decay (NMD). A series of Ty3 RNA variants were tested to understand the effects of read-through translation via programmed frameshifting on RNA localization and packaging into VLPs, and to identify the roles of coding and non-coding sequences in those processes. These experiments showed that a low level of read-through translation of the downstream open reading frame (as opposed to no translation or translation without frameshifting) is important for localization of full-length Ty3 RNA to foci. Ty3 RNA variants associated with PB components via independent determinants in the native Ty3 untranslated regions (UTRs) and in GAG3-POL3 sequences flanked by UTRs adapted from non-Ty3 transcripts. However, despite localization, RNAs containing GAG3-POL3 but lacking Ty3 UTRs were not packaged efficiently. Surprisingly, sequences within Ty3 UTRs, which bind the initiator tRNA(Met) proposed to provide the dimerization interface, were not required for packaging of full-length Ty3 RNA into VLPs. In summary, our results demonstrate that Gag3 is sufficient and required for localization and packaging of RNAs containing Ty3 UTRs and support a role for POL3 sequences, translation of which is attenuated by programmed frameshifting, in both localization and packaging of the Ty3 full-length gRNA. Copyright © 2012 Elsevier B.V. All rights reserved.
James, T C; Tata, J R
RNA from developing embryos of Artemia salina (5, 10, and 20 h after re-initiation of development) was translated 3-10 times more efficiently in a rabbit reticulocyte lysate cell-free protein synthesizing system than RNA from dormant gastrulae. The latter did not appear to contain any significant amount of translation inhibitor activity. Ninety percent of the translatable activity in dormant gastrulae was recovered as poly(A)--RNA, whereas 80% of that in post-gastrular developing embryos was present as poly(A)+-RNA. The size of most polypeptides coded for by dormant gastrular RNA was less than 130,000 daltons, whereas the size of those coded for by developing embryonic RNA was up to 200,000 daltons, which correlated with a corresponding shift to poly(A)-containing RNA of higher molecular weight. Two major polypeptides of about 37,000 daltons coded for by dormant gastrular RNA disappeared at 20 h after resumption of development. Hybridization of complementary DNA (cDNA) to a 1000-fold excess of the homologous poly(A)+-RNA revealed the presence of three complexity classes of mRNA. Forty-five percent, 30%, and 25% of RNA in dormant gastrulae were present as high, middle, and low abundance classes comprising about 10, 80, and 9700 species, respectively, whereas in the nauplii there were 10, 150, and 7900 species of high, middle, and low abundance sequences, respectively. Heterologous hybridizations using cDNA complementary to the highly abundant messenger population of nauplii (isolated by chromatography on hydroxyapatite) to poly(A)+-RNA from dormant cysts showed considerable divergence in this class of messengers between the two developmental stages. Re-initiation of development of dormant Artemia gastrulae is thus characterized by a "re-programming" seen as a simultaneous and rapid increase in the polyadenylation and translatability of poly(A)+-RNA accompanied by a qualitative change in its sequence complexity.
Xu, Xingye; Liu, Tao; Ren, Xianwen; Liu, Bo; Yang, Jian; Chen, Lihong; Wei, Candong; Zheng, Jianhua; Dong, Jie; Sun, Lilian; Zhu, Yafang; Jin, Qi
Infections caused by dermatophytes, Trichophyton rubrum in particular, are among the most common diseases in humans. In this study, we present a proteogenomic analysis of T. rubrum based on whole-genome proteomics and RNA-Seq studies. We confirmed 4291 expressed proteins in T. rubrum and validated their annotated gene structures based on 35 874 supporting peptides. In addition, we identified 323 novel peptides (not present in the current annotated protein database of T. rubrum) that can be used to enhance current T. rubrum annotations. A total of 104 predicted genes supported by novel peptides were identified, and 127 gene models suggested by the novel peptides that conflicted with existing annotations were manually assigned based on transcriptomic evidence. RNA-Seq confirmed the validity of 95% of the total peptides. Our study provides evidence that confirms and improves the genome annotation of T. rubrum and represents the first survey of T. rubrum genome annotations based on experimental evidence. Additionally, our integrated proteomics and multisourced transcriptomics approach provides stronger evidence for annotation refinement than proteomic data alone, which helps to address the dilemma of one-hit wonders (uncertainties supported by only one peptide).
Nakano, Masataka; Fukami, Tatsuki; Gotoh, Saki; Takamiya, Masataka; Aoki, Yasuhiro; Nakajima, Miki
Adenosine to inosine (A-to-I) RNA editing is the most frequent type of post-transcriptional nucleotide conversion in humans, and it is catalyzed by adenosine deaminase acting on RNA (ADAR) enzymes. In this study we investigated the effect of RNA editing on human aryl hydrocarbon receptor (AhR) expression because the AhR transcript potentially forms double-stranded structures, which are targets of ADAR enzymes. In human hepatocellular carcinoma-derived Huh-7 cells, the ADAR1 knockdown reduced the RNA editing levels in the 3'-untranslated region (3'-UTR) of the AhR transcript and increased the AhR protein levels. The ADAR1 knockdown enhanced the ligand-mediated induction of CYP1A1, a gene downstream of AhR. We investigated the possibility that A-to-I RNA editing creates miRNA targeting sites in the AhR mRNA and found that the miR-378-dependent down-regulation of AhR was abolished by ADAR1 knockdown. These results indicated that the ADAR1-mediated down-regulation of AhR could be attributed to the creation of a miR-378 recognition site in the AhR 3'-UTR. The interindividual differences in the RNA editing levels within the AhR 3'-UTR in a panel of 32 human liver samples were relatively small, whereas the differences in ADAR1 expression were large (220-fold). In the human liver samples a significant inverse association was observed between the miR-378 and AhR protein levels, suggesting that the RNA-editing-dependent down-regulation of AhR by miR-378 contributes to the variability in the constitutive hepatic expression of AhR. In conclusion, this study uncovered for the first time that A-to-I RNA editing modulates the potency of xenobiotic metabolism in the human liver. © 2016 by The American Society for Biochemistry and Molecular Biology, Inc.
Khalifa, Mahmoud E; Varsani, Arvind; Ganley, Austen R D; Pearson, Michael N
The advent of 'next generation sequencing' (NGS) technologies has led to the discovery of many novel mycoviruses, the majority of which are sufficiently different from previously sequenced viruses that there is no appropriate reference sequence on which to base the sequence assembly. Although many new genome sequences are generated by NGS, confirmation of the sequence by Sanger sequencing is still essential for formal classification by the International Committee on Taxonomy of Viruses (ICTV), although this is currently under review. To empirically test the validity of de novo assembled mycovirus genomes from dsRNA extracts, we compared the results from Illumina sequencing with those from random cloning plus targeted PCR coupled with Sanger sequencing for viruses from five Sclerotinia sclerotiorum isolates. Through Sanger sequencing we detected nine viral genomes, while through Illumina sequencing we detected the same nine viruses plus one additional virus from the same samples. Critically, the Illumina-derived sequences share >99.3% identity with those obtained by cloning and Sanger sequencing. Although there is scope for errors in de novo assembled viral genomes, our results demonstrate that by maximising the proportion of viral sequence in the data and using sufficiently rigorous quality controls, it is possible to generate de novo genome sequences from Illumina sequencing of comparable accuracy to those obtained by Sanger sequencing. Copyright © 2015 Elsevier B.V. All rights reserved.
Coevolving residues in a multiple sequence alignment provide evolutionary clues of biophysical interactions in 3D structure. Despite a rich literature describing amino acid coevolution within or between proteins and nucleic acid coevolution within RNA, to date there has been no direct evidence of coevolution between protein and RNA. The ribosome, a structurally conserved macromolecular machine composed of over 50 interacting protein and RNA chains, provides a natural example of RNA/protein interactions that likely coevolved. We provide the first direct evidence of RNA/protein coevolution by characterizing the mutual information in residue triplets from a multiple sequence alignment of ribosomal protein L22 and neighboring 23S RNA. We define residue triplets as three positions in the multiple sequence alignment, where one position is from the 23S RNA and two positions are from the L22 protein. We show that residue triplets with high mutual information are more likely than residue doublets to be proximal in 3D space. Some high mutual information residue triplets cluster in a connected series across the L22 protein structure, similar to patterns seen in protein coevolution. We also describe RNA nucleotides for which switching from one nucleotide to another (or between purines and pyrimidines) results in a change in amino acid distribution for proximal amino acid positions. Multiple crystal structures for evolutionarily distinct ribosome species can provide structural evidence for these differences. For one residue triplet, a pyrimidine in one species is a purine in another, and RNA/protein hydrogen bonds are present in one species but not the other. The results provide the first direct evidence of RNA/protein coevolution by using higher order mutual information, suggesting that biophysical constraints on interacting RNA and protein chains are indeed a driving force in their evolution.
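The mutual-information measure underlying this abstract can be sketched as follows. The example computes mutual information between two alignment columns and then, as one possible reading of "residue triplets", treats the two protein positions as a single joint symbol paired with the RNA position. That joint-symbol formulation, the function names, and the four-sequence toy columns are assumptions made for illustration; the abstract does not specify which higher-order definition was used.

```python
from collections import Counter
from math import log2

def mutual_information(col_x, col_y):
    """Mutual information (in bits) between two equally long alignment columns."""
    n = len(col_x)
    count_x = Counter(col_x)
    count_y = Counter(col_y)
    count_xy = Counter(zip(col_x, col_y))
    mi = 0.0
    for (x, y), c in count_xy.items():
        # p(x,y) * log2( p(x,y) / (p(x) * p(y)) ), written with raw counts
        mi += (c / n) * log2(c * n / (count_x[x] * count_y[y]))
    return mi

def triplet_information(rna_col, prot_col_a, prot_col_b):
    """Information between one RNA column and a pair of protein columns.

    Assumption: the protein pair is treated as one joint symbol.
    """
    return mutual_information(rna_col, list(zip(prot_col_a, prot_col_b)))

# Hypothetical columns from a 4-sequence alignment (one character per species).
rna = "AAGG"      # 23S RNA position
prot_a = "KKRR"   # protein position that covaries perfectly with the RNA
prot_b = "DEDE"   # protein position independent of the RNA

pair_covarying = mutual_information(rna, prot_a)
pair_independent = mutual_information(rna, prot_b)
triplet = triplet_information(rna, prot_a, prot_b)
```

With so few sequences the estimates are heavily biased, so a real analysis would need many more species and some correction for finite sample size and phylogenetic relatedness.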
Wu, Dong-Dong; Ye, Ling-Qun; Li, Yan; Sun, Yan-Bo; Shao, Yi; Chen, Chunyan; Zhu, Zhu; Zhong, Li; Wang, Lu; Irwin, David M; Zhang, Yong E; Zhang, Ya-Ping
Next-generation RNA sequencing has been successfully used for identification of transcript assembly, evaluation of gene expression levels, and detection of post-transcriptional modifications. Despite these large-scale studies, additional comprehensive RNA-seq data from different subregions of the human brain are required to fully evaluate the evolutionary patterns experienced by the human brain transcriptome. Here, we provide a total of 6.5 billion RNA-seq reads from different subregions of the human brain. A significant correlation was observed between the levels of alternative splicing and RNA editing, which might be explained by a competition between the molecular machineries responsible for the splicing and editing of RNA. Young human protein-coding genes demonstrate biased expression to the neocortical and non-neocortical regions during evolution on the lineage leading to humans. We also found that a significantly greater number of young human protein-coding genes are expressed in the putamen, a tissue that was also observed to have the highest level of RNA-editing activity. The putamen, which previously received little attention, plays an important role in cognitive ability, and our data suggest a potential contribution of the putamen to human evolution. © The Author (2015). Published by Oxford University Press on behalf of Journal of Molecular Cell Biology, IBCB, SIBS, CAS. All rights reserved.
Jones, Scott A; Clark, Daniel N; Cao, Feng; Tavis, John E; Hu, Jianming
Hepatitis B virus replicates a DNA genome through reverse transcription of a pregenomic RNA (pgRNA) by using a multifunctional polymerase (HP). A critical function of HP is its specific association with a viral RNA signal, termed ε (Hε), located on pgRNA, which is required for specific packaging of pgRNA into viral nucleocapsids and initiation of viral reverse transcription. HP initiates reverse transcription by using itself as a protein primer (protein priming) and Hε as the obligatory template. HP is made up of four domains, including the terminal protein (TP), the spacer, the reverse transcriptase (RT), and the RNase H domains. A recently developed, Hε-dependent, in vitro protein priming assay was used in this study to demonstrate that almost the entire TP and RT domains and most of the RNase H domain were required for protein priming. Specific residues within TP, RT, and the spacer were identified as being critical for HP-Hε binding and/or protein priming. Comparison of HP sequence requirements for Hε binding, pgRNA packaging, and protein priming allowed the classification of the HP mutants into five groups, each with distinct effects on these complex and related processes. Detailed characterization of HP requirements for these related and essential functions of HP will further elucidate the mechanisms of its multiple functions and aid in the targeting of these functions for antiviral therapy.
In the theory and practice of cryptographic information protection, one of the key problems is forming binary pseudo-random sequences (PRS) of maximum length with acceptable statistical characteristics. PRS generators are usually implemented by a linear shift register (LSR) of maximum period with linear feedback. In this paper we extend the concept of the LSR, assuming that each of its ranks (memory cells) can be in one of the following condition. We call such registers "generalized linear shift registers." The research goal is to develop algorithms for constructing generalized Galois and Fibonacci matrices of n-order over the field, which uniquely determine both the structure of the corresponding generalized n-order LSR of maximal period and the Galois PRS generators of maximum length formed on their basis. Thus the article addresses the formation of primitive generalized Fibonacci and Galois matrices of arbitrary order over the prime field. The synthesis of the matrices is based on the use of irreducible polynomials of degree and primitive elements of the extended field generated by the polynomial. Methods for constructing conjugated primitive Galois and Fibonacci matrices are suggested. The possibilities of using such matrices to construct generalized Galois generators of pseudo-random sequences are discussed.
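For orientation, the classical object this abstract generalizes can be sketched as a Fibonacci-style linear feedback shift register whose cells hold values modulo a prime p. This is only the textbook construction, not the matrix synthesis described in the paper. The function names, the parameter `p`, and the example recurrence s[n+4] = s[n] + s[n+1] mod 2 (characteristic polynomial x^4 + x + 1, which is primitive over GF(2)) are choices made here for illustration.

```python
def lfsr_step(state, coeffs, p=2):
    """Advance the register once: shift left and append the feedback value."""
    feedback = sum(c * s for c, s in zip(coeffs, state)) % p
    return state[1:] + [feedback]

def lfsr_sequence(coeffs, seed, length, p=2):
    """Output `length` symbols of s[n+k] = sum(coeffs[i] * s[n+i]) mod p."""
    state = list(seed)
    out = []
    for _ in range(length):
        out.append(state[0])
        state = lfsr_step(state, coeffs, p)
    return out

def lfsr_period(coeffs, seed, p=2):
    """Number of steps until the seed state recurs, or None if it never does."""
    start = list(seed)
    state = list(seed)
    for steps in range(1, p ** len(start) + 1):
        state = lfsr_step(state, coeffs, p)
        if state == start:
            return steps
    return None

# Binary register of length 4 with feedback s[n+4] = s[n] + s[n+1] (mod 2).
coeffs = [1, 1, 0, 0]
seed = [1, 0, 0, 0]
bits = lfsr_sequence(coeffs, seed, 15)
period = lfsr_period(coeffs, seed)
```

A register of length n over a field with p elements can visit at most p^n - 1 nonzero states, so a period of 15 for n = 4 and p = 2 is the maximum; choosing the feedback polynomial to be primitive is what achieves it.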
Yuan, Y; Altman, S
Any RNA, when in a complex with another oligoribonucleotide known as an external guide sequence (EGS), can become a substrate for ribonuclease P. Simulation of evolution in vitro was used to select EGSs that bind tightly to a target substrate messenger RNA and that increase the efficiency of cleavage of the target by human ribonuclease P to a level equal to that achieved with natural substrates. The most efficient EGSs form transfer RNA precursor-like structures with the target RNA, in which the analog of the anticodon stem has been disrupted, an indication that selection for the optimal substrate for ribonuclease P yields an RNA structure different from that of present-day transfer RNA precursors.
Bai, Baoyan; Laiho, Marikki
The nucleolus is a subcellular compartment with an essential function in ribosome biogenesis. The nucleolus is rich in noncoding RNAs, mostly the ribosomal RNAs and small nucleolar RNAs. Surprisingly, several miRNAs have also been detected in the nucleolus, raising the question as to whether other small RNA species are present and functional in the nucleolus. We have developed a strategy for stepwise enrichment of nucleolar small RNAs from total nucleolar RNA extracts and subsequent construction of nucleolar small RNA libraries that are suitable for deep sequencing. Our method successfully isolates the small RNA population from total RNAs and monitors the RNA quality in each step to ensure that the small RNAs recovered represent the actual small RNA population in the nucleolus and not degradation products from larger RNAs. We have further applied this approach to characterize the distribution of small RNAs in different cellular compartments.
Abdul Fatah A. Samad
Persicaria minor (kesum) is an important medicinal plant commonly found in Southeast Asian countries: Malaysia, Thailand, Indonesia, and Vietnam. This plant is enriched with a variety of secondary metabolites (SMs), and among these SMs, terpenoids are in high abundance. Terpenoids comprise many valuable biomolecules that have well-established roles in the agricultural and pharmaceutical industries. In P. minor, for the first time, we have generated small RNA data sets, which can be used as a tool in deciphering their roles in terpenoid biosynthesis pathways. The fungal pathogen Fusarium oxysporum was used as an elicitor to trigger SM biosynthesis in P. minor. Raw reads and small RNA analysis data have already been deposited at GenBank under the accessions SRX2645684 (Fusarium-treated), SRX2645685 (Fusarium-treated), SRX2645686 (mock-infected), and SRX2645687 (mock-infected).
Conserved plant microRNAs (miRNAs) modulate important biological processes but little is known about conserved cis-regulatory elements (CREs) surrounding MIRNA genes. We developed a solution-based targeted genomic enrichment methodology to capture, enrich, and sequence flanking genomic regions surrounding conserved MIRNA genes with a locked nucleic acid (LNA)-modified, biotinylated probe complementary to the mature miRNA sequence. Genomic DNA bound by the probe is captured by streptavidin-coated magnetic beads, amplified, sequenced and assembled de novo to obtain genomic DNA sequences flanking the MIRNA locus of interest. We demonstrate the sensitivity and specificity of this enrichment methodology in Arabidopsis thaliana to enrich targeted regions spanning 10-20 kb surrounding known MIR166 and MIR165 loci. Assembly of the sequencing reads successfully recovered all targeted loci. While further optimization for larger, more complex genomes is needed, this method may enable determination of flanking genomic DNA sequence surrounding a known core (like a conserved mature miRNA) from multiple species that currently don't have a full genome assembly available.
Zhang, Yuanwei; Xu, Bo; Yang, Yifan; Ban, Rongjun; Zhang, Huan; Jiang, Xiaohua; Cooke, Howard J; Xue, Yu; Shi, Qinghua
Next generation sequencing (NGS) techniques have been widely used to document the small ribonucleic acids (RNAs) implicated in a variety of biological, physiological and pathological processes. An integrated computational tool is needed for handling and analysing the enormous datasets from small RNA deep sequencing approach. Herein, we present a novel web server, CPSS (a computational platform for the analysis of small RNA deep sequencing data), designed to completely annotate and functionally analyse microRNAs (miRNAs) from NGS data on one platform with a single data submission. Small RNA NGS data can be submitted to this server with analysis results being returned in two parts: (i) annotation analysis, which provides the most comprehensive analysis for small RNA transcriptome, including length distribution and genome mapping of sequencing reads, small RNA quantification, prediction of novel miRNAs, identification of differentially expressed miRNAs, piwi-interacting RNAs and other non-coding small RNAs between paired samples and detection of miRNA editing and modifications and (ii) functional analysis, including prediction of miRNA targeted genes by multiple tools, enrichment of gene ontology terms, signalling pathway involvement and protein-protein interaction analysis for the predicted genes. CPSS, a ready-to-use web server that integrates most functions of currently available bioinformatics tools, provides all the information wanted by the majority of users from small RNA deep sequencing datasets. CPSS is implemented in PHP/PERL+MySQL+R and can be freely accessed at http://mcg.ustc.edu.cn/db/cpss/index.html or http://mcg.ustc.edu.cn/sdap1/cpss/index.html.
Álvarez-Martos, Isabel; Ferapontova, Elena E
The unique specificity of aptamer-ligand biorecognition and binding facilitates bioanalysis and biosensor development, contributing to discrimination of structurally related molecules, such as dopamine and other catecholamine neurotransmitters. The aptamer sequence capable of specific binding of dopamine is a 57-nucleotide-long RNA sequence reported in 1997 (Biochemistry, 1997, 36, 9726). Later, it was suggested that the DNA homologue of the RNA aptamer retains the specificity of dopamine binding (Biochem. Biophys. Res. Commun., 2009, 388, 732). Here, we show that the DNA sequence obtained by replacing the RNA aptamer bases with their DNA analogues is not capable of specific biorecognition of dopamine, in contrast to the original RNA aptamer sequence. This DNA sequence binds dopamine and structurally related catecholamine neurotransmitters non-specifically, as any DNA sequence does, and thus is not an aptamer and can be used neither for in vivo nor for in situ analysis of dopamine in the presence of structurally related neurotransmitters. Copyright © 2017 Elsevier Inc. All rights reserved.
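The "DNA homologue" discussed in this abstract is, at the sequence level, the RNA sequence with each base replaced by its DNA analogue. A minimal sketch of that substitution follows; the function name and the short example sequence are hypothetical and are not the 57-nt dopamine aptamer.

```python
def rna_to_dna_homologue(rna: str) -> str:
    """Return the DNA sequence with the same base order (U replaced by T)."""
    rna = rna.upper()
    invalid = set(rna) - set("ACGU")
    if invalid:
        raise ValueError(f"not an RNA sequence, found: {sorted(invalid)}")
    return rna.replace("U", "T")


if __name__ == "__main__":
    # Hypothetical six-base example, for illustration only.
    print(rna_to_dna_homologue("GUCUAG"))
```

The point of the abstract is that this textual equivalence does not carry over to binding behaviour: the substituted sequence did not retain the specificity of the RNA aptamer.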
Xu, Yunpen; Zhou, Xuefeng; Zhang, Weixiong
MicroRNAs (miRNAs) play essential roles in post-transcriptional gene regulation in animals and plants. Several computational approaches have been developed to complement experimental methods in the discovery of miRNAs that are expressed only in specific environmental conditions or cell types. These computational methods require a sufficient number of characterized miRNAs as training samples, and rely on genome annotation to reduce the number of predicted putative miRNAs. However, most sequenced genomes have not been well annotated and many of them have very few experimentally characterized miRNAs. As a result, the existing methods are not effective or even feasible for identifying miRNAs in these genomes. Aiming at identifying miRNAs from genomes with few known miRNAs and/or little annotation, we propose and develop a novel miRNA prediction method, miRank, based on our new random walk-based ranking algorithm. We first tested our method on the Homo sapiens genome; using very few known human miRNAs as samples, our method achieved a prediction accuracy greater than 95%. We then applied our method to predict 200 miRNAs in Anopheles gambiae, which is the most important vector of malaria in Africa. Our further study showed that 78 out of the 200 putative miRNA precursors encode mature miRNAs that are conserved in at least one other animal species. These conserved putative miRNAs are good candidates for further experimental study to understand malaria infection. MiRank is programmed in Matlab on the Windows platform. The source code is available upon request.
Background/Aims: To analyze the long noncoding RNA (lncRNA)-mRNA expression network and its potential roles in rat hepatic stellate cells (HSCs) during activation. Methods: LncRNA expression was analyzed in quiescent and culture-activated HSCs by RNA sequencing, and differentially expressed lncRNAs verified by quantitative reverse transcription polymerase chain reaction (qRT-PCR) were subjected to bioinformatics analysis. In vivo analyses of differential lncRNA-mRNA expression were performed on a rat model of liver fibrosis. Results: We identified upregulation of 12 lncRNAs and 155 mRNAs and downregulation of 12 lncRNAs and 374 mRNAs in activated HSCs. Additionally, we identified the differential expression of upregulated lncRNAs (NONRATT012636.2, NONRATT016788.2, and NONRATT021402.2) and downregulated lncRNAs (NONRATT007863.2, NONRATT019720.2, and NONRATT024061.2) in activated HSCs relative to levels observed in quiescent HSCs, and Gene Ontology and Kyoto Encyclopedia of Genes and Genomes pathway analyses of the predicted targets of lncRNAs associated with HSC activation revealed 11 significantly enriched pathways. Moreover, based on the predicted co-expression network, the relative dynamic levels of NONRATT013819.2 and lysyl oxidase (Lox) were compared during HSC activation both in vitro and in vivo. Our results confirmed the upregulation of lncRNA NONRATT013819.2 and Lox mRNA associated with the extracellular matrix (ECM)-related signaling pathway in HSCs and fibrotic livers. Conclusion: Our results detailing a dysregulated lncRNA-mRNA network might provide new treatment strategies for hepatic fibrosis based on findings indicating potentially critical roles for NONRATT013819.2 and Lox in ECM remodeling during HSC activation.
Nov 27, 2015 ... In the matrix model of RNA [G Vernizzi, H Orland and A Zee, Phys. Rev. Lett. 94, 168103 (2005)] we introduce external interactions on n bases in the action of the partition function where n ≤ L and L is the length of the polymer chain. The RNA structures found in the model can be separated into two ...
Abstract. In the matrix model of RNA [G Vernizzi, H Orland and A Zee, Phys. Rev. Lett. 94, 168103 (2005)] we introduce external interactions on n bases in the action of the partition function where n ≤ L and L is the length of the polymer chain. The RNA structures found in the model can be separated into two regimes: (i) 0 ...
Nolte-'t Hoen, Esther N M; Buermans, Henk P J; Waasdorp, Maaike; Stoorvogel, Willem; Wauben, Marca H M; 't Hoen, Peter A C
Cells release RNA-carrying vesicles and membrane-free RNA/protein complexes into the extracellular milieu. Horizontal vesicle-mediated transfer of such shuttle RNA between cells allows dissemination of genetically encoded messages, which may modify the function of target cells. Other studies used array analysis to establish the presence of microRNAs and mRNA in cell-derived vesicles from many sources. Here, we used an unbiased approach by deep sequencing of small RNA released by immune cells. We found a large variety of small non-coding RNA species representing pervasive transcripts or RNA cleavage products overlapping with protein coding regions, repeat sequences or structural RNAs. Many of these RNAs were enriched relative to cellular RNA, indicating that cells destine specific RNAs for extracellular release. Among the most abundant small RNAs in shuttle RNA were sequences derived from vault RNA, Y-RNA and specific tRNAs. Many of the highly abundant small non-coding transcripts in shuttle RNA are evolutionarily well conserved and have previously been associated with gene regulatory functions. These findings allude to a wider range of biological effects that could be mediated by shuttle RNA than previously expected. Moreover, the data present leads for unraveling how cells modify the function of other cells via transfer of specific non-coding RNA species.
Jensen, R C; Wang, Y; Hardin, S B; Stumph, W E
Most small nuclear RNA (snRNA) genes are transcribed by RNA polymerase II, but some (e.g., U6) are transcribed by RNA polymerase III. In vertebrates a TATA box at a fixed distance downstream of the proximal sequence element (PSE) acts as a dominant determinant for recruiting RNA polymerase III to U6 gene promoters. In contrast, vertebrate snRNA genes that contain a PSE but lack a TATA box are transcribed by RNA polymerase II. In plants, transcription of both classes of snRNA genes requires a TATA box in addition to an upstream sequence element (USE), and polymerase specificity is determined by the spacing between these two core promoter elements. In these examples, the PSE (or USE) is interchangeable between the two classes of snRNA genes. Here we report the surprising finding that the Drosophila U1 and U6 PSEs cannot functionally substitute for each other; rather, determination of RNA polymerase specificity is an intrinsic property of the PSE sequence itself. The alteration of two or three base pairs near the 3'-end of the U1 and U6 PSEs was sufficient to switch the RNA polymerase specificity of Drosophila snRNA promoters in vitro. These findings reveal a novel mechanism for achieving RNA polymerase specificity at insect snRNA promoters.
Cheng, Wei-Chung; Chung, I-Fang; Tsai, Cheng-Fong; Huang, Tse-Shun; Chen, Chen-Yang; Wang, Shao-Chuan; Chang, Ting-Yu; Sun, Hsing-Jen; Chao, Jeffrey Yung-Chuan; Cheng, Cheng-Chung; Wu, Cheng-Wen; Wang, Hsei-Wei
We previously presented YM500, which is an integrated database for miRNA quantification, isomiR identification, arm switching discovery and novel miRNA prediction from 468 human smRNA-seq datasets. Here in this updated YM500v2 database (http://ngs.ym.edu.tw/ym500/), we focus on the cancer miRNome to make the database more disease-orientated. New miRNA-related algorithms developed after YM500 were included in YM500v2, and, more significantly, more than 8000 cancer-related smRNA-seq datasets (including those of primary tumors, paired normal tissues, PBMC, recurrent tumors, and metastatic tumors) were incorporated into YM500v2. Novel miRNAs (miRNAs not included in the miRBase R21) were not only predicted by three independent algorithms but also cleaned by a new in silico filtration strategy and validated by wetlab data such as Cross-Linked ImmunoPrecipitation sequencing (CLIP-seq) to reduce the false-positive rate. A new function, 'Meta-analysis', is additionally provided to allow users to identify differentially expressed miRNAs and arm-switching events in real time according to user-defined sample groups and dozens of clinical criteria curated by proficient clinicians. Cancer miRNAs identified hold the potential for both basic research and biotech applications. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.
Lambert, Nicole; Robertson, Alex; Jangi, Mohini; McGeary, Sean; Sharp, Phillip A.; Burge, Christopher B.
Summary Specific protein-RNA interactions guide post-transcriptional gene regulation. Here we describe RNA Bind-n-Seq (RBNS), a method that comprehensively characterizes sequence and structural specificity of RNA binding proteins (RBPs), and its application to the developmental alternative splicing factors RBFOX2, CELF1/CUGBP1 and MBNL1. For each factor, we recovered both canonical motifs and additional near-optimal binding motifs. RNA secondary structure inhibits binding of RBFOX2 and CELF1, while MBNL1 favors unpaired Us but tolerates C/G pairing in motifs containing UGC and/or GCU. Dissociation constants calculated from RBNS data using a novel algorithm correlated highly with values measured by surface plasmon resonance. Motifs identified by RBNS were conserved, were bound and active in vivo, and distinguished the subset of motifs enriched by CLIP-Seq that had regulatory activity. Together, our data demonstrate that RBNS complements crosslinking-based methods and show that in vivo binding and activity of these splicing factors is driven largely by intrinsic RNA affinity. PMID:24837674
Willenbrock, Hanni; Salomon, Jesper; Søkilde, Rolf
Recently, next-generation sequencing has been introduced as a promising, new platform for assessing the copy number of transcripts, while the existing microarray technology is considered less reliable for absolute, quantitative expression measurements. Nonetheless, so far, results from the two technologies have only been compared based on biological data, leading to the conclusion that, although they are somewhat correlated, expression values differ significantly. Here, we use synthetic RNA samples, resembling human microRNA samples, to find that microarray expression measures actually correlate better with sample RNA content than expression measures obtained from sequencing data. In addition, microarrays appear highly sensitive and perform equivalently to next-generation sequencing in terms of reproducibility and relative ratio quantification.
RNA viruses have extremely high mutation rates that enable the virus to adapt to new host environments and even jump from one species to another. As part of a viral transmission study, three viral samples collected from naturally infected animals were sequenced using Illumina paired-end technology at ultra-deep coverage. In order to determine the mutant spectra within the viral quasispecies, it is critical to understand the sequencing error rates and control for false positive calls of viral variants (point mutations). I will estimate the sequencing error rate from two control sequences and characterize the mutant spectra in the natural samples with this error rate.
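One common way to use a control-derived error rate when calling low-frequency variants is a one-sided binomial test at each position. The sketch below illustrates that idea only; the abstract does not say which statistical test is used, so the binomial model, the function names, and the significance threshold are all assumptions made for this example.

```python
import math


def estimate_error_rate(mismatches: int, total_bases: int) -> float:
    """Per-base error rate from a control of known sequence."""
    if total_bases <= 0:
        raise ValueError("total_bases must be positive")
    return mismatches / total_bases


def binom_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p), summed in log space for deep coverage."""
    if k <= 0:
        return 1.0
    if p <= 0.0:
        return 0.0
    if p >= 1.0:
        return 1.0
    total = 0.0
    for i in range(k, n + 1):
        log_term = (
            math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
            + i * math.log(p) + (n - i) * math.log1p(-p)
        )
        total += math.exp(log_term)
    return min(total, 1.0)


def is_probable_variant(alt_count: int, depth: int, error_rate: float,
                        alpha: float = 1e-6) -> bool:
    """Flag a position whose alternate-base count is unlikely under error alone.

    alpha is an illustrative threshold, not a recommended value.
    """
    return binom_tail(alt_count, depth, error_rate) < alpha


if __name__ == "__main__":
    rate = estimate_error_rate(mismatches=50, total_bases=100_000)
    print(rate, is_probable_variant(alt_count=200, depth=10_000, error_rate=rate))
```

A real analysis would also need to account for position- and base-specific error rates and for multiple testing across the genome, neither of which this sketch attempts.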
Nielsen, M; Hansen, J H; Hedegaard, J
MicroRNAs (miRNA) are short single-stranded RNA molecules that regulate gene expression post-transcriptionally by binding to complementary sequences in the 3' untranslated region (3' UTR) of target mRNAs. MiRNAs participate in the regulation of myogenesis, and identification of the complete set o...... that highly expressed miRNAs are involved in skeletal muscle development and regeneration, signal transduction, cell-cell and cell-extracellular matrix communication and neural development and function....
Wang Xinguo; Li Lang; Shen Changyu; Wang Guohua; Wang Xin; Mooney Sean D; Edenberg Howard J; Sanford Jeremy R; Liu Yunlong
Abstract Massively parallel pyrosequencing is a high-throughput technology that can sequence hundreds of thousands of DNA/RNA fragments in a single experiment. Combining it with immunoprecipitation-based biochemical assays, such as cross-linking immunoprecipitation (CLIP), provides a genome-wide method to detect the sites at which proteins bind DNA or RNA. In a CLIP-pyrosequencing experiment, the resolutions of the detected protein binding regions are partially determined by the length of the...
The main objectives of this work have been to use next generation sequencing (NGS) and develop bioinformatics tools for plant virus diagnostics and genome reconstruction as well as for investigation of RNA silencing-based antiviral defense. In virus-infected plants, the host Dicer-like (DCL) enzymes process viral double-stranded RNAs into 21-24 nucleotide (nt) short interfering RNAs (siRNAs) which can potentially associate with Argonaute (AGO) proteins and guide the resulting RNA-induce silen...
Magdy S Alabady
We describe restriction site associated RNA sequencing (RARseq), an RNAseq-based genotype by sequencing (GBS) method. It includes the construction of RNAseq libraries from double stranded cDNA digested with selected restriction enzymes. To test this, we constructed six single- and six dual-digested RARseq libraries from six F2 pitcher plant individuals and sequenced them on half of a MiSeq run. On average, the de novo approach of population genome analysis detected 544 and 570 RNA SNPs, whereas the reference transcriptome-based approach revealed an average of 1907 and 1876 RNA SNPs per individual, from single- and dual-digested RARseq data, respectively. The average numbers of RNA SNPs and alleles per locus are 1.89 and 2.17, respectively. Our results suggest that the RARseq protocol allows good depth of coverage per locus for detecting RNA SNPs and polymorphic loci for population genomics and mapping analyses. In non-model systems where complete genome sequences are not always available, RARseq data can be analyzed in reference to the transcriptome. In addition to enriching for functional markers, this method may prove particularly useful in organisms where the genomes are not favorable for DNA GBS.
Drory Retwitzer, Matan; Reinharz, Vladimir; Ponty, Yann; Waldispühl, Jérôme; Barash, Danny
In recent years, new methods for computational RNA design have been developed and applied to various problems in synthetic biology and nanotechnology. Lately, there is considerable interest in incorporating essential biological information when solving the inverse RNA folding problem. Correspondingly, RNAfbinv aims at including biologically meaningful constraints and is the only program to-date that performs a fragment-based design of RNA sequences. In doing so it allows the design of sequences that do not necessarily exactly fold into the target, as long as the overall coarse-grained tree graph shape is preserved. Augmented by the weighted sampling algorithm of incaRNAtion, our web server called incaRNAfbinv implements the method devised in RNAfbinv and offers an interactive environment for the inverse folding of RNA using a fragment-based design approach. It takes as input: a target RNA secondary structure; optional sequence and motif constraints; optional target minimum free energy, neutrality and GC content. In addition to the design of synthetic regulatory sequences, it can be used as a pre-processing step for the detection of novel natural occurring RNAs. The two complementary methodologies RNAfbinv and incaRNAtion are merged together and fully implemented in our web server incaRNAfbinv, available at http://www.cs.bgu.ac.il/incaRNAfbinv. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.
Streets, Aaron M.; Cao, Chen; Zhang, Xiannian; Huang, Yanyi
Phenotype classification of single cells reveals biological variation that is masked in ensemble measurement. This heterogeneity is found in gene and protein expression as well as in cell morphology. Many techniques are available to probe phenotypic heterogeneity at the single cell level, for example quantitative imaging and single-cell RNA sequencing, but it is difficult to perform multiple assays on the same single cell. In order to directly track correlation between morphology and gene expression at the single cell level, we developed a microfluidic platform for quantitative coherent Raman imaging and immediate RNA sequencing (RNA-Seq) of single cells. With this device we actively sort and trap cells for analysis with stimulated Raman scattering microscopy (SRS). The cells are then processed in parallel pipelines for lysis, and preparation of cDNA for high-throughput transcriptome sequencing. SRS microscopy offers three-dimensional imaging with chemical specificity for quantitative analysis of protein and lipid distribution in single cells. Meanwhile, the microfluidic platform facilitates single-cell manipulation, minimizes contamination, and furthermore, provides improved RNA-Seq detection sensitivity and measurement precision, which is necessary for differentiating biological variability from technical noise. By combining coherent Raman microscopy with RNA sequencing, we can better understand the relationship between cellular morphology and gene expression at the single-cell level.
Seemann, Ernst Stefan; Richter, Andreas S.; Gorodkin, Jan
Background: Many regulatory non-coding RNAs (ncRNAs) function through complementary binding with mRNAs or other ncRNAs, e.g., microRNAs, snoRNAs and bacterial sRNAs. Predicting these RNA interactions is essential for functional studies of putative ncRNAs or for the design of artificial RNAs. Many...... pairs. As a proof of concept, we show an example and discuss the strengths and weaknesses of the approach....
Xu, Pinsan; Li, Huangai; Liu, Jiwen; Luan, Yushi; Yin, Yalei; Bai, Jianfang
The DNA sequence of the RNA-dependent RNA polymerase (RdRp) gene of lily symptomless virus (LSV), a lily-infecting member of the genus Carlavirus, was determined from nine overlapping cDNA fragments of different sizes. The complete sequence of this RdRp gene (HM070294) consisted of 5,847 nucleotides coding for a protein of 220 kDa. It had 97-98% sequence identity with RdRps of other known isolates at both the DNA and the amino acid level. Phylogenetic analysis indicated that this RdRp (designated as RdRp-DL) was closely related to the RdRp of the Korean isolate (AM516059), as well as to the RdRps from Passiflora latent virus (PLV) and Kalanchoe latent virus (KLV) of the genus Carlavirus. Hydrophobic analysis of RdRp-DL revealed a hydrophobic N-terminus and a hydrophilic C-terminus. Helices and Loops were the major secondary structures of RdRp-DL. In addition, RdRp-DL also had three coil structures. Four conserved domains were identified: typoviral methyltransferase, RNA-dependent RNA polymerase, P-loop-containing nucleoside triphosphate hydrolases and carlavirus endopeptidase. A model of the tertiary structure predicted by I-TASSER was obtained for each of these conserved domains. This is the first report of a detailed phylogenetic analysis of LSV RdRp with those of other members of the genus Carlavirus, and the first to predict the domain structures of LSV RdRp.
Chen, Ho-Ming; Wu, Shu-Hsing
Small nucleolar RNAs (snoRNAs) are noncoding RNAs that direct 2'-O-methylation or pseudouridylation on ribosomal RNAs or spliceosomal small nuclear RNAs. These modifications are needed to modulate the activity of ribosomes and spliceosomes. A comprehensive repertoire of snoRNAs is needed to expand the knowledge of these modifications. The sequences corresponding to snoRNAs in 18-26-nt small RNA sequencing data have been rarely explored and remain as a hidden treasure for snoRNA annotation. He...
Jiang, Yuchao; Zhang, Nancy R; Li, Mingyao
Allele-specific expression is traditionally studied by bulk RNA sequencing, which measures average expression across cells. Single-cell RNA sequencing allows the comparison of expression distribution between the two alleles of a diploid organism and the characterization of allele-specific bursting. Here, we propose SCALE to analyze genome-wide allele-specific bursting, with adjustment of technical variability. SCALE detects genes exhibiting allelic differences in bursting parameters and genes whose alleles burst non-independently. We apply SCALE to mouse blastocyst and human fibroblast cells and find that cis control in gene expression overwhelmingly manifests as differences in burst frequency.
Chung, I-Fang; Chang, Shing-Jyh; Chen, Chen-Yang; Liu, Shu-Hsuan; Li, Chia-Yang; Chan, Chia-Hao; Shih, Chuan-Chi; Cheng, Wei-Chung
We previously presented the YM500 database, which contains >8000 small RNA sequencing (smRNA-seq) data sets and integrated analysis results for various cancer miRNome studies. In the updated YM500v3 database (http://ngs.ym.edu.tw/ym500/) presented herein, we not only focus on miRNAs but also on other functional small non-coding RNAs (sncRNAs), such as PIWI-interacting RNAs (piRNAs), tRNA-derived fragments (tRFs), small nuclear RNAs (snRNAs) and small nucleolar RNAs (snoRNAs). There is growing knowledge of the role of sncRNAs in gene regulation and tumorigenesis. We have also incorporated >10 000 cancer-related RNA-seq and >3000 more smRNA-seq data sets into the YM500v3 database. Furthermore, there are two main new sections, 'Survival' and 'Cancer', in this updated version. The 'Survival' section provides the survival analysis results in all cancer types or in a user-defined group of samples for a specific sncRNA. The 'Cancer' section provides the results of differential expression analyses, miRNA-gene interactions and cancer miRNA-related pathways. In the 'Expression' section, sncRNA expression profiles across cancer and sample types are newly provided. Cancer-related sncRNAs hold potential for both biotech applications and basic research. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.
Kumar, Ashwini; Kankainen, Matti; Parsons, Alun; Kallioniemi, Olli; Mattila, Pirkko; Heckman, Caroline A
RNA sequencing (RNA-seq) has become an indispensable tool to identify disease associated transcriptional profiles and determine the molecular underpinnings of diseases. However, the broad adaptation of the methodology into the clinic is still hampered by inconsistent results from different RNA-seq protocols and involves further evaluation of its analytical reliability using patient samples. Here, we applied two commonly used RNA-seq library preparation protocols to samples from acute leukemia patients to understand how poly-A-tailed mRNA selection (PA) and ribo-depletion (RD) based RNA-seq library preparation protocols affect gene fusion detection, variant calling, and gene expression profiling. Overall, the protocols produced similar results with consistent outcomes. Nevertheless, the PA protocol was more efficient in quantifying expression of leukemia marker genes and showed better performance in the expression-based classification of leukemia. Independent qRT-PCR experiments verified that the PA protocol better represented total RNA compared to the RD protocol. In contrast, the RD protocol detected a higher number of non-coding RNA features and had better alignment efficiency. The RD protocol also recovered more known fusion-gene events, although variability was seen in fusion gene predictions. The overall findings provide a framework for the use of RNA-seq in a precision medicine setting with limited number of samples and suggest that selection of the library preparation protocol should be based on the objectives of the analysis.
Skovgaard, Alf; Meyer, Stefan; Overton, Julia Lynne
An enigmatic protistan endoparasite found in eggs and larvae of cod Gadus morhua and turbot Psetta maxima was isolated from Baltic cod larvae, and DNA was extracted for sequencing of the parasite's small subunit ribosomal RNA (SSU rRNA) gene. The endoparasite has previously been suggested to be related to Ichthyodinium chabelardi, a dinoflagellate-like protist that parasitizes yolk sacs of embryos and larvae of a variety of fish species. Comparison of a 1535 bp long fragment of the SSU rRNA gene of the cod endoparasite showed absolute identity with I. chabelardi, demonstrating that the 2...
Gautam, Aarti; Kumar, Raina; Dimitrov, George; Hoke, Allison; Hammamieh, Rasha; Jett, Marti
miRNAs act as important regulators of gene expression by promoting mRNA degradation or by attenuating protein translation. Since miRNAs are stably expressed in bodily fluids, there is growing interest in profiling these miRNAs, as bodily fluids are a minimally invasive and cost-effective diagnostic matrix. A technical hurdle in studying miRNA dynamics is reliably extracting miRNA, as small sample volumes and low RNA abundance create challenges for extraction and downstream applications. The purpose of this study was to develop a pipeline for the recovery of miRNA using small volumes of archived serum samples. The RNA was extracted employing several widely utilized RNA isolation kits/methods with and without addition of a carrier. The small RNA library preparation was carried out using the Illumina TruSeq small RNA kit and sequencing was carried out using the Illumina platform. A fraction of five microliters of total RNA was used for library preparation as quantification is below the detection limit. We were able to profile miRNA levels in serum from all the methods tested. We found that addition of nucleic acid-based carrier molecules yielded higher numbers of processed reads but did not enhance the mapping of any miRBase annotated sequences. However, some of the extraction procedures offer certain advantages: RNA extracted by TRIzol seemed to align to the miRBase best; extractions using TRIzol with carrier yielded higher miRNA-to-small RNA ratios. Nuclease-free glycogen can be the carrier of choice for miRNA sequencing. Our findings illustrate that miRNA extraction and quantification is influenced by the choice of methodologies. Addition of nucleic acid-based carrier molecules during the extraction procedure is not a good choice when assaying miRNA using sequencing. The careful selection of an extraction method permits the archived serum samples to become valuable resources for high-throughput applications.
Argyropoulos, Christos; Etheridge, Alton; Sakhanenko, Nikita; Galas, David
The use of RNA-seq as the preferred method for the discovery and validation of small RNA biomarkers has been hindered by high quantitative variability and biased sequence counts. In this paper we develop a statistical model for sequence counts that accounts for ligase bias and stochastic variation in sequence counts. This model implies a linear quadratic relation between the mean and variance of sequence counts. Using a large number of sequencing datasets, we demonstrate how one can use the generalized additive models for location, scale and shape (GAMLSS) distributional regression framework to calculate and apply empirical correction factors for ligase bias. Bias correction could remove more than 40% of the bias for miRNAs. Empirical bias correction factors appear to be nearly constant over at least one and up to four orders of magnitude of total RNA input and independent of sample composition. Using synthetic mixes of known composition, we show that the GAMLSS approach can analyze differential expression with greater accuracy, higher sensitivity and specificity than six existing algorithms (DESeq2, edgeR, EBSeq, limma, DSS, voom) for the analysis of small RNA-seq data. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.
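The linear-quadratic mean-variance relation described in the abstract above can be illustrated with a minimal sketch. It assumes the common form Var = a*mu + b*mu^2; the abstract does not give the exact parameterization, and the function name and example numbers are hypothetical. This is not the GAMLSS procedure itself, only an ordinary least-squares fit of the two coefficients from per-sequence means and variances.

```python
def fit_linear_quadratic(means, variances):
    """Least-squares fit of variance = a*mean + b*mean**2 (no intercept).

    Assumed model form for illustration; solves the 2x2 normal equations
    in closed form using only the standard library.
    """
    s11 = sum(m ** 2 for m in means)
    s12 = sum(m ** 3 for m in means)
    s22 = sum(m ** 4 for m in means)
    y1 = sum(m * v for m, v in zip(means, variances))
    y2 = sum(m ** 2 * v for m, v in zip(means, variances))
    det = s11 * s22 - s12 * s12
    if det == 0:
        raise ValueError("need at least two distinct non-zero means")
    a = (y1 * s22 - y2 * s12) / det
    b = (s11 * y2 - s12 * y1) / det
    return a, b


# Hypothetical per-sequence means with variances generated from a=1, b=0.1.
example_means = [1.0, 2.0, 5.0, 10.0]
example_vars = [m + 0.1 * m * m for m in example_means]
a_hat, b_hat = fit_linear_quadratic(example_means, example_vars)
```

With a = 1 the linear term matches Poisson sampling noise, and b then plays the role of an overdispersion coefficient; that interpretation is a standard reading of such models rather than something stated in the abstract.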
The Chinese swamp buffalo (Bubalus bubalis) is vital to the lives of small farmers and has tremendous economic importance. However, a lack of genomic information has hampered research on augmenting marker-assisted breeding programs in this species. Thus, high-throughput transcriptomic sequencing of B. bubalis was conducted to generate a transcriptomic sequence dataset for gene discovery and molecular marker development. Illumina paired-end sequencing generated a total of 54,109,173 raw reads. After trimming, de novo assembly was performed, which yielded 86,017 unigenes, with an average length of 972.41 bp, an N50 of 1,505 bp, and an average GC content of 49.92%. A total of 62,337 unigenes were successfully annotated. Among the annotated unigenes, 27,025 (43.35%) and 23,232 (37.27%) unigenes showed significant similarity to known proteins in the NCBI non-redundant protein and Swiss-Prot databases (E-value < 1.0E-5), respectively. Of these annotated unigenes, 14,439 and 15,813 unigenes were assigned to the Gene Ontology (GO) categories and EuKaryotic Ortholog Group (KOG) clusters, respectively. In addition, a total of 14,167 unigenes were assigned to 331 Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways. Furthermore, 17,401 simple sequence repeats (SSRs) were identified as potential molecular markers. One hundred and fifteen primer pairs were randomly selected for amplification to detect polymorphisms. The results revealed that 110 primer pairs (95.65%) yielded PCR amplicons and 69 primer pairs (60.00%) presented polymorphisms in 35 individual buffaloes. A phylogenetic analysis showed that the five swamp buffalo populations were clustered together, whereas two river buffalo breeds clustered separately. In the present study, Illumina RNA-seq technology was utilized to perform transcriptome analysis and SSR marker discovery in the swamp buffalo without using a reference genome. Our findings will enrich the current SSR markers resources and help spearhead ...
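The assembly statistics quoted above include an N50, which can be computed as sketched below. The sketch uses the usual definition (the length L such that sequences of length at least L contain at least half of all assembled bases); the function name and the toy lengths are hypothetical and are not taken from the buffalo dataset.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    Walk the lengths from longest to shortest and stop at the first
    length where the running total reaches half of all bases.
    """
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("lengths must be non-empty")


# Toy example: 20 bases in total; the two longest sequences (6 + 5)
# already cover at least half of them, so the N50 is 5.
toy_n50 = n50([2, 3, 4, 5, 6])
```

Because N50 is weighted by sequence length, it is typically larger than the average length, which is consistent with the 1,505 bp N50 and 972.41 bp mean reported above.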
Background: The prediction of secondary structure, i.e. the set of canonical base pairs between nucleotides, is a first step in developing an understanding of the function of an RNA sequence. The most accurate computational methods predict conserved structures for a set of homologous RNA sequences. These methods usually suffer from high computational complexity. In this paper, TurboFold, a novel and efficient method for secondary structure prediction for multiple RNA sequences, is presented. Results: TurboFold takes, as input, a set of homologous RNA sequences and outputs estimates of the base pairing probabilities for each sequence. The base pairing probabilities for a sequence are estimated by combining intrinsic information, derived from the sequence itself via the nearest neighbor thermodynamic model, with extrinsic information, derived from the other sequences in the input set. For a given sequence, the extrinsic information is computed by using pairwise-sequence-alignment-based probabilities for co-incidence with each of the other sequences, along with estimated base pairing probabilities, from the previous iteration, for the other sequences. The extrinsic information is introduced as free energy modifications for base pairing in a partition function computation based on the nearest neighbor thermodynamic model. This process yields updated estimates of base pairing probability. The updated base pairing probabilities in turn are used to recompute extrinsic information, resulting in the overall iterative estimation procedure that defines TurboFold. TurboFold is benchmarked on a number of ncRNA datasets and compared against alternative secondary structure prediction methods. The iterative procedure in TurboFold is shown to improve estimates of base pairing probability with each iteration, though only small gains are obtained beyond three iterations. Secondary structures composed of base pairs with estimated probabilities higher than a ...
Small RNAs (sRNAs) of 20 to 25 nucleotides (nt) in length maintain genome integrity and control gene expression in a multitude of developmental and physiological processes. Although RNA silencing has been primarily studied in model plants, the advent of high-throughput sequencing technologies has enabled profiling of the sRNA component of more than 40 plant species. Here, we used deep sequencing and molecular methods to report the first inventory of sRNAs in olive (Olea europaea L.). sRNA libraries prepared from juvenile and adult shoots revealed that the 24-nt class dominates the sRNA transcriptome and atypically accumulates to levels never seen in other plant species, suggesting an active role of heterochromatin silencing in the maintenance and integrity of its large genome. A total of 18 known miRNA families were identified in the libraries. Also, 5 other sRNAs derived from potential hairpin-like precursors remain as plausible miRNA candidates. RNA blots confirmed miRNA expression and suggested tissue- and/or developmental-specific expression patterns. Target mRNAs of conserved miRNAs were computationally predicted among the olive cDNA collection and experimentally validated through endonucleolytic cleavage assays. Finally, we use expression data to uncover genetic components of the miR156, miR172 and miR390/TAS3-derived trans-acting small interfering RNA (tasiRNA) regulatory nodes, suggesting that these interactive networks controlling developmental transitions are fully operational in olive.
Dewey Jonathan D
Background: tmRNA acts first as a tRNA and then as an mRNA to rescue stalled ribosomes in eubacteria. Two unanswered questions about tmRNA function remain: how does tmRNA, lacking an anticodon, bypass the decoding machinery and enter the ribosome? Secondly, how does the ribosome choose the proper codon to resume translation on tmRNA? According to the -1 triplet hypothesis, the answer to both questions lies in the unique properties of the three nucleotides upstream of the first tmRNA codon. These nucleotides assume an A-form conformation that mimics the codon-anticodon interaction, leading to recognition by the decoding center and choice of the reading frame. The -1 triplet hypothesis is important because it is the most credible model in which direct binding and recognition by the ribosome sets the reading frame on tmRNA. Results: Conformational analysis predicts that 18 triplets cannot form the correct structure to function as the -1 triplet of tmRNA. We tested the tmRNA activity of all possible -1 triplet mutants using a genetic assay in Escherichia coli. While many mutants displayed reduced activity, our findings do not match the predictions of this model. Additional mutagenesis identified sequences further upstream that are required for tmRNA function. An immunoblot assay for translation of the tmRNA tag revealed that certain mutations in U85, A86, and the -1 triplet sequence result in improper selection of the first codon and translation in the wrong frame (-1 or +1) in vivo. Conclusion: Our findings disprove the -1 triplet hypothesis. The -1 triplet is not required for accommodation of tmRNA into the ribosome, although it plays a minor role in frame selection. Our results strongly disfavor direct ribosomal recognition of the upstream sequence, instead supporting a model in which the binding of a separate ligand to A86 is primarily responsible for frame selection.
Stocks, Matthew B; Moxon, Simon; Mapleson, Daniel; Woolfenden, Hugh C; Mohorianu, Irina; Folkes, Leighton; Schwach, Frank; Dalmay, Tamas; Moulton, Vincent
RNA silencing is a complex, highly conserved mechanism mediated by small RNAs (sRNAs), such as microRNAs (miRNAs), that is known to be involved in a diverse set of biological functions including development, pathogen control, genome maintenance and response to environmental change. Advances in next generation sequencing technologies are producing increasingly large numbers of sRNA reads per sample at a fraction of the cost of previous methods. However, many bioinformatics tools do not scale accordingly, are cumbersome, or require extensive support from bioinformatics experts. Therefore, researchers need user-friendly, robust tools, capable of not only processing large sRNA datasets in a reasonable time frame but also presenting the results in an intuitive fashion and visualizing sRNA genomic features. Herein, we present the UEA sRNA workbench, a suite of tools that is a successor to the web-based UEA sRNA Toolkit, but in downloadable format and with several enhanced and additional features. The program and help pages are available at http://srna-workbench.cmp.uea.ac.uk.
Kumar, Shiva; Ansari, Faraz A; Scaria, Vinod
MicroRNAs (small approximately 22 nucleotide long non-coding endogenous RNAs) have recently attracted immense attention as critical regulators of gene expression in multi-cellular eukaryotes, especially in humans. Recent studies have proved that viruses also express microRNAs, which are thought to contribute to the intricate mechanisms of host-pathogen interactions. Computational predictions have greatly accelerated the discovery of microRNAs. However, most of these widely used tools are dependent on structural features and sequence conservation which limits their use in discovering novel virus expressed microRNAs and non-conserved eukaryotic microRNAs. In this work an efficient prediction method is developed based on the hypothesis that sequence and structure features which discriminate between host microRNA precursor hairpins and pseudo microRNAs are shared by viral microRNA as they depend on host machinery for the processing of microRNA precursors. The proposed method has been found to be more efficient than recently reported ab-initio methods for predicting viral microRNAs and microRNAs expressed by mammals.
Kadri, Sabah; Hinman, Veronica F; Benos, Panayiotis V
microRNAs (miRNAs) are small (20-23 nt), non-coding single stranded RNA molecules that act as post-transcriptional regulators of mRNA gene expression. They have been implicated in regulation of developmental processes in diverse organisms. The echinoderms, Strongylocentrotus purpuratus (sea urchin) and Patiria miniata (sea star) are excellent model organisms for studying development with well-characterized transcriptional networks. However, to date, nothing is known about the role of miRNAs during development in these organisms, except that the genes that are involved in the miRNA biogenesis pathway are expressed during their developmental stages. In this paper, we used Illumina Genome Analyzer (Illumina, Inc.) to sequence small RNA libraries in mixed stage population of embryos from one to three days after fertilization of sea urchin and sea star (total of 22,670,000 reads). Analysis of these data revealed the miRNA populations in these two species. We found that 47 and 38 known miRNAs are expressed in sea urchin and sea star, respectively, during early development (32 in common). We also found 13 potentially novel miRNAs in the sea urchin embryonic library. miRNA expression is generally conserved between the two species during development, but 7 miRNAs are highly expressed in only one species. We expect that our two datasets will be a valuable resource for everyone working in the field of developmental biology and the regulatory networks that affect it. The computational pipeline to analyze Illumina reads is available at http://www.benoslab.pitt.edu/services.html. © 2011 Kadri et al. PMID:22216218
Lee, E; Nestorowicz, A; Marshall, I D; Weir, R C; Dalgarno, L
A method is described for direct sequence analysis of selected regions of dengue virus genomic RNA in infected tissues. Using specific primers, total high-molecular-weight infected-cell RNA is reverse transcribed to single-stranded (ss) complementary DNA, amplified using the polymerase chain reaction (PCR) and sequenced using ssDNA obtained after lambda exonuclease digestion of one strand of the PCR product (R.G. Higuchi and H. Ochman, Nucleic Acids Research, 17, 5865, 1989). Sequence data for the envelope protein gene of two dengue-3 virus isolates were obtained using RNA from small numbers (10^5) of cultured mosquito or monkey kidney cells, from one mg of infected mouse brain and from 1/300th of an infected Toxorhynchites amboinensis mosquito. Independent determinations showed that errors occurring during reverse transcription or PCR were not represented to a significant degree in the sequence of the amplified DNA. The method does not depend on extensive passaging of virus or large-scale growth to generate material for sequencing and therefore provides a means of obtaining sequence data for unadapted dengue virus isolates.
Garcia, A; Montoya, R; Bello, H; Gonzalez, G; Dominguez, M; Zemelman, R
Isolates of Acinetobacter baumannii (32 strains) from blood samples obtained from patients in five Chilean hospitals were identified and biotyped according to their phenotypic properties. They were also submitted to random amplified polymorphic DNA (RAPD) using eight randomly designed 10-mers and the core sequence of M13 phage (15-mers) as well as amplification of the spacer regions between 16S and 23S genes in the prokaryotic rRNA genetic loci. With some primers, RAPD discriminated between biotypes, whereas with others each isolate showed a particular profile. When amplification of spacer regions was performed, a clear correlation between patterns and biotypes was found. This last technique allowed correct biotyping of clinical isolates. Both genetic methods might be used for the identification of A. baumannii biotypes.
My project focuses on characterizing different cyanobacteria in thrombolitic mats found on the island of Highborn Cay, Bahamas. Thrombolites are interesting ecosystems because of the ability of bacteria in these mats to remove carbon dioxide from the atmosphere and mineralize it as calcium carbonate. In the future they may be used as models to develop carbon sequestration technologies, which could be used as part of regenerative life systems in space. These thrombolitic communities are also significant because of their similarities to early communities of life on Earth. I targeted two cyanobacteria in my research, Dichothrix spp. and whatever black is, since they are believed to be important to carbon sequestration in these thrombolitic mats. The goal of my summer research project was to molecularly identify these two cyanobacteria. DNA was isolated from each organism through mat dissections and DNA extractions. I ran polymerase chain reactions (PCR) to amplify the 16S ribosomal RNA (rRNA) gene in each cyanobacterium. This specific gene is found in almost all bacteria and is highly conserved, meaning any changes in the sequence are most likely due to evolution. As a result, the 16S rRNA gene sequence can be used to identify different bacterial species. Since the exact sequence of the Dichothrix gene was unknown, I designed different primers that flanked the gene based on the known sequences from other taxonomically similar cyanobacteria. Once the 16S rRNA gene was amplified, I cloned the gene into specialized Escherichia coli cells and sent the gene products for sequencing. Once the sequence is obtained, it will be added to a genetic database for future reference to and classification of other Dichothrix sp.
Zhang, Yao-Zhong; Yamaguchi, Rui; Imoto, Seiya; Miyano, Satoru
The recent success of deep learning techniques in machine learning and artificial intelligence has stimulated a great deal of interest among bioinformaticians, who now wish to bring the power of deep learning to bear on a host of bioinformatical problems. Deep learning is ideally suited for biological problems that require automatic or hierarchical feature representation for biological data when prior knowledge is limited. In this work, we address the sequence-specific bias correction problem for RNA-seq data by using Recurrent Neural Networks (RNNs) to model nucleotide sequences without pre-determining sequence structures. The sequence-specific bias of a read is then calculated based on the sequence probabilities estimated by RNNs, and used in the estimation of gene abundance. We explore the application of two popular RNN recurrent units for this task and demonstrate that RNN-based approaches provide a flexible way to model nucleotide sequences without knowledge of predetermined sequence structures. Our experiments show that training an RNN-based nucleotide sequence model is efficient and that RNN-based bias correction methods compare well with the state-of-the-art sequence-specific bias correction method on the commonly used MAQC-III data set. RNNs provide an alternative and flexible way to calculate sequence-specific bias without explicitly pre-determining sequence structures.
Wang, Ruiying; Zheng, Han; Preamplume, Gan; Shao, Yaming; Li, Hong [FSU]
The repeat-associated mysterious proteins (RAMPs) comprise the most abundant family of proteins involved in prokaryotic immunity against invading genetic elements conferred by the clustered regularly interspaced short palindromic repeat (CRISPR) system. Cas6 is one of the first characterized RAMP proteins and is a key enzyme required for CRISPR RNA maturation. Despite a strong structural homology with other RAMP proteins that bind hairpin RNA, Cas6 distinctly recognizes single-stranded RNA. Previous structural and biochemical studies show that Cas6 captures the 5' end while cleaving the 3' end of the CRISPR RNA. Here, we describe three structures and complementary biochemical analysis of a noncatalytic Cas6 homolog from Pyrococcus horikoshii bound to CRISPR repeat RNA of different sequences. Our study confirms the specificity of the Cas6 protein for single-stranded RNA and further reveals the importance of the bases at Positions 5-7 in Cas6-RNA interactions. Substitutions of these bases result in structural changes in the protein-RNA complex including its oligomerization state.
Ee Uli, Joey; Yong, Christina Seok Yien; Yeap, Swee Keong; Rovie-Ryan, Jeffrine J; Mat Isa, Nurulfiza; Tan, Soon Guan; Alitheen, Noorjahan Banu
The cynomolgus macaque (Macaca fascicularis) is an extensively utilised nonhuman primate model for biomedical research due to its biological, behavioural, and genetic similarities to humans. Genomic information of cynomolgus macaque is vital for research in various fields; however, there is presently a shortage of genomic information on the Malaysian cynomolgus macaque. This study aimed to sequence, assemble, annotate, and profile the Peninsular Malaysian cynomolgus macaque transcriptome derived from three tissues (lymph node, spleen, and thymus) using RNA sequencing (RNA-Seq) technology. A total of 174,208,078 paired-end 70 base pair sequencing reads were obtained from the Illumina Hi-Seq 2500 sequencer. The overall mapping percentage of the sequencing reads to the M. fascicularis reference genome ranged from 53-63%. Categorisation of expressed genes to Gene Ontology (GO) and KEGG pathway categories revealed that GO terms with the highest number of associated expressed genes include Cellular process, Catalytic activity, and Cell part, while for pathway categorisation, the majority of expressed genes in lymph node, spleen, and thymus fall under the Global overview and maps pathway category, while 266, 221, and 138 genes from lymph node, spleen, and thymus were respectively enriched in the Immune system category. Enriched Immune system pathways include Platelet activation pathway, Antigen processing and presentation, B cell receptor signalling pathway, and Intestinal immune network for IgA production. Differential gene expression analysis among the three tissues revealed 574 differentially expressed genes (DEG) between lymph and spleen, 5402 DEGs between lymph and thymus, and 7008 DEGs between spleen and thymus. Venn diagram analysis of expressed genes revealed a total of 2,630, 253, and 279 tissue-specific genes respectively for lymph node, spleen, and thymus tissues. This is the first time the lymph node, spleen, and thymus transcriptome of the Peninsular ...
Extensive genome-wide transcriptome studies enabled by high-throughput sequencing have revolutionized the study of genetics and epigenetics at unprecedented resolution. This research has revealed that, besides protein-coding RNAs, a large proportion of the mammalian transcriptome consists of regulatory non-protein-coding RNAs, whose number in the human genome remains unknown. These non-coding RNAs were long dismissed as "dark matter" and "junk". RNA-seq, a recently developed experimental technique, is now widely used to study non-coding RNAs, which have come into the limelight because of their physiological and pathological significance. Long non-coding RNAs, the longest members of the ncRNA family, act as stable and functional parts of a genome and provide important clues about varied biological events, such as the cellular and structural processes governing the complexity of an organism. Here, we review the most recent and influential computational approaches developed to identify and quantify long non-coding RNAs, as a guide for users choosing appropriate tools for their specific research. Keywords: Transcriptome, High throughput sequencing, Genetic and epigenetic, Long non-coding RNA, RNA-sequencing, RNA-seq
Leung, Yuk Yee; Ryvkin, Paul; Ungar, Lyle H; Gregory, Brian D; Wang, Li-San
The surprising observation that virtually the entire human genome is transcribed means we know little about the function of many emerging classes of RNAs, except their astounding diversities. Traditional RNA function prediction methods rely on sequence or alignment information, which are limited in their abilities to classify the various collections of non-coding RNAs (ncRNAs). To address this, we developed Classification of RNAs by Analysis of Length (CoRAL), a machine learning-based approach for classification of RNA molecules. CoRAL uses biologically interpretable features including fragment length and cleavage specificity to distinguish between different ncRNA populations. We evaluated CoRAL using genome-wide small RNA sequencing data sets from four human tissue types and were able to classify six different types of RNAs with ∼80% cross-validation accuracy. Analysis by CoRAL revealed that microRNAs, small nucleolar and transposon-derived RNAs are highly discernible and consistent across all human tissue types assessed, whereas long intergenic ncRNAs, small cytoplasmic RNAs and small nuclear RNAs show less consistent patterns. The ability to reliably annotate loci across tissue types demonstrates the potential of CoRAL to characterize ncRNAs using small RNA sequencing data in less well-characterized organisms.
Animal miRNAs are a large class of small regulatory RNAs that are known to directly and negatively regulate the expression of a large fraction of all protein encoding genes. The identification and characterization of miRNA targets is thus a fundamental problem in biology. miRNAs regulate target genes by binding to 3' untranslated regions (3'UTRs) of target mRNAs, and multiple binding sites for the same miRNA in 3'UTRs can strongly enhance the degree of regulation. Recent experiments have demonstrated that a large fraction of miRNA binding sites reside in coding sequences. Overall, miRNA binding sites in coding regions were shown to mediate smaller regulation than 3'UTR binding. However, possible interactions between target sites in coding sequences and 3'UTRs have not been studied. Using transcriptomics and proteomics data of ten miRNA mis-expression experiments as well as transcriptome-wide experimentally identified miRNA target sites, we found that mRNA and protein expression of genes containing target sites both in coding regions and 3'UTRs were in general mildly but significantly more regulated than those containing target sites in 3'UTRs only. These effects were stronger for conserved target sites of length 7-8 nt in coding regions compared to non-conserved sites. Combined with our other finding that miRNA target sites in coding regions are under negative selection, our results shed light on the functional importance of miRNA targeting in coding regions.
Christensen, Henrik; Dziva, Francis; Olsen, John Elmerdahl; Bisgaard, Magne
Forty-five strains mainly isolated from chickens in Zimbabwe and Denmark, two pig and three rat isolates all identified as Pasteurella gallinarum by conventional phenotypic tests were characterized by ribotyping, and selected strains were subsequently analysed by 16S rRNA gene sequencing. High genotypic diversity was observed, the number of ribotypes totalling 24. A major group of 47 isolates including the type strain of P. gallinarum clustered at 56% similarity and included 21 ribotypes. Ribotyping showed that some genotypes of P. gallinarum seem to be globally distributed. The three isolates from rodents did not share even a single common ribotype fragment with strains from birds and the pig isolates. Two avian isolates from Denmark and Zimbabwe and the pig strain showed from 97.6 to 99.8% 16S rRNA sequence similarity with the type strain of P. gallinarum and with type strains of Pasteurella volantium and Pasteurella avium. Two rat strains showed 98.6% 16S rRNA gene sequence similarity with each other, but were only related with P. gallinarum at 93% similarity. These isolates showed the highest similarity with [Actinobacillus] muris at 96.4 to 95.0% similarity. We suggest that conventional identification of P. gallinarum consequently should consider the source of isolation to obtain a correct diagnosis, and that isolation from animals other than fowl should be confirmed by genotypic analysis such as 16S rRNA gene sequence comparison.
De novo assembly of RNA-seq data enables researchers to study transcriptomes without the need for a genome sequence; this approach can be usefully applied, for instance, in research on 'non-model organisms' of ecological and evolutionary importance, cancer samples or the microbiome. In this protocol...
Birkedal, Ulf; Christensen-Dalsgaard, Mikkel; Krogh, Nicolai
Ribose methylations are the most abundant chemical modifications of ribosomal RNA and are critical for ribosome assembly and fidelity of translation. Many aspects of ribose methylations have been difficult to study due to lack of efficient mapping methods. Here, we present a sequencing-based meth...
The phylogenetic relationships among thirteen Rhizobium leguminosarum bv. viciae isolates collected from various geographical regions were studied by analysis of the 23S rRNA sequences. The average of genetic distance among the studied isolates was very narrow (ranged from 0.00 to 0.04) and the studied isolates ...
Fantini, Elio; Gianese, Giulio; Giuliano, Giovanni; Fiore, Alessia
Ion Torrent is a next generation sequencing technology based on the detection of hydrogen ions produced during DNA chain elongation; this technology allows analyzing and characterizing genomes, genes, and species. Here, we describe an Ion Torrent procedure applied to the metagenomic analysis of 16S rRNA gene amplicons to study the bacterial diversity in food and environmental samples.
Yehdego, Daniel T; Zhang, Boyu; Kodimala, Vikram K R; Johnson, Kyle L; Taufer, Michela; Leung, Ming-Ying
Secondary structures of ribonucleic acid (RNA) molecules play important roles in many biological processes including gene expression and regulation. Experimental observations and computing limitations suggest that we can approach the secondary structure prediction problem for long RNA sequences by segmenting them into shorter chunks, predicting the secondary structures of each chunk individually using existing prediction programs, and then assembling the results to give the structure of the original sequence. The selection of cutting points is a crucial component of the segmenting step. Noting that stem-loops and pseudoknots always contain an inversion, i.e., a stretch of nucleotides followed closely by its inverse complementary sequence, we developed two cutting methods for segmenting long RNA sequences based on inversion excursions: the centered and optimized method. Each step of searching for inversions, chunking, and predictions can be performed in parallel. In this paper we use a MapReduce framework, i.e., Hadoop, to extensively explore meaningful inversion stem lengths and gap sizes for the segmentation and identify correlations between chunking methods and prediction accuracy. We show that for a set of long RNA sequences in the RFAM database, whose secondary structures are known to contain pseudoknots, our approach predicts secondary structures more accurately than methods that do not segment the sequence, when the latter predictions are possible computationally. We also show that, as sequences exceed certain lengths, some programs cannot computationally predict pseudoknots while our chunking methods can. Overall, our predicted structures still retain the accuracy level of the original prediction programs when compared with known experimental secondary structure.
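The inversion search described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the restriction to strict Watson-Crick pairs (no G-U wobble), and the brute-force scan are all assumptions made here. An inversion is taken to be a stretch of `stem_len` nucleotides followed, within at most `max_gap` nucleotides, by its reverse complement.

```python
# Hypothetical helper names; strict Watson-Crick pairing only (an assumption).
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(seq):
    """Return the reverse complement of an RNA string."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def find_inversions(seq, stem_len, max_gap):
    """List (left_start, right_start) pairs where the stem at left_start is
    followed, after a gap of 0..max_gap nucleotides, by its reverse complement."""
    hits = []
    n = len(seq)
    for i in range(n - 2 * stem_len + 1):
        target = reverse_complement(seq[i:i + stem_len])
        for gap in range(max_gap + 1):
            j = i + stem_len + gap
            if j + stem_len > n:
                break  # the right arm would run past the end of the sequence
            if seq[j:j + stem_len] == target:
                hits.append((i, j))
    return hits
```

For instance, `find_inversions("GGGAAAACCC", 3, 4)` reports the stem `GGG` at position 0 paired with `CCC` at position 7. Because each start position is examined independently, a scan of this kind splits naturally across workers, which is consistent with the parallel, MapReduce-style processing the abstract mentions.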
Candice N Hansey
Maize is rich in genetic and phenotypic diversity. Understanding the sequence, structural, and expression variation that contributes to phenotypic diversity would facilitate more efficient varietal improvement. RNA-based sequencing (RNA-seq) is a powerful approach for transcriptional analysis, assessing sequence variation, and identifying novel transcript sequences, particularly in large, complex, repetitive genomes such as maize. In this study, we sequenced RNA from whole seedlings of 21 maize inbred lines representing diverse North American and exotic germplasm. Single nucleotide polymorphism (SNP) detection identified 351,710 polymorphic loci distributed throughout the genome covering 22,830 annotated genes. Tight clustering of two distinct heterotic groups and exotic lines was evident using these SNPs as genetic markers. Transcript abundance analysis revealed minimal variation in the total number of genes expressed across these 21 lines (57.1% to 66.0%). However, the transcribed gene set among the 21 lines varied, with 48.7% expressed in all of the lines, 27.9% expressed in one to 20 lines, and 23.4% expressed in none of the lines. De novo assembly of RNA-seq reads that did not map to the reference B73 genome sequence revealed 1,321 high confidence novel transcripts, of which, 564 loci were present in all 21 lines, including B73, and 757 loci were restricted to a subset of the lines. RT-PCR validation demonstrated 87.5% concordance with the computational prediction of these expressed novel transcripts. Intriguingly, 145 of the novel de novo assembled loci were present in lines from only one of the two heterotic groups consistent with the hypothesis that, in addition to sequence polymorphisms and transcript abundance, transcript presence/absence variation is present and, thereby, may be a mechanism contributing to the genetic basis of heterosis.
Li, Sung-Chou; Chan, Wen-Ching; Ho, Meng-Ru; Tsai, Kuo-Wang; Hu, Ling-Yueh; Lai, Chun-Hung; Hsu, Chun-Nan; Hwang, Pung-Pung; Lin, Wen-chang
MicroRNAs (miRNAs) are endogenous non-protein-coding RNA genes which exist in a wide variety of organisms, including animals, plants, viruses and even unicellular organisms. Medaka (Oryzias latipes) is a useful model organism among vertebrate animals. However, no medaka miRNAs have been investigated systematically. It is beneficial to conduct a genome-wide miRNA discovery study using the next generation sequencing (NGS) technology, which has emerged as a powerful sequencing tool for high-throughput analysis. In this study, we adopted the ABI SOLiD platform to generate small RNA sequence reads from medaka tissues, followed by mapping these sequence reads back to the medaka genome. The mapped genomic loci were considered as candidate miRNAs and further processed by a support vector machine (SVM) classifier. As a result, we identified 599 novel medaka pre-miRNAs, many of which were found to encode more than one isomiR. Besides, additional minor miRNAs (also called miRNA star) can also be detected with the improvement of sequencing depth. These quantifiable isomiRs and minor miRNAs enable us to further characterize medaka miRNA genes in many aspects. First of all, many medaka candidate pre-miRNAs are positioned close to each other, forming many miRNA clusters, some of which are also conserved across other vertebrate animals. Secondly, during miRNA maturation, there is an arm selection preference of mature miRNAs within precursors. We observed differences in arm selection preference between our candidate pre-miRNAs and their orthologous ones. We classified these differences into three categories based on the distribution of NGS reads. Finally, we also investigated the relationship between conservation status and expression level of miRNA genes. We concluded that the evolutionarily conserved miRNAs were usually the most abundant ones. Medaka is a widely used model animal and usually involved in many biomedical studies, including the ones on developmental biology. Identifying and
Le Rhun, Anaïs; Beer, Yan Yan; Reimegård, Johan; Chylinski, Krzysztof; Charpentier, Emmanuelle
Streptococcus pyogenes is a human pathogen responsible for a wide spectrum of diseases ranging from mild to life-threatening infections. During the infectious process, the temporal and spatial expression of pathogenicity factors is tightly controlled by a complex network of protein and RNA regulators acting in response to various environmental signals. Here, we focus on the class of small RNA regulators (sRNAs) and present the first complete analysis of sRNA sequencing data in S. pyogenes. In the SF370 clinical isolate (M1 serotype), we identified 197 and 428 putative regulatory RNAs by visual inspection and bioinformatics screening of the sequencing data, respectively. Only 35 from the 197 candidates identified by visual screening were assigned a predicted function (T-boxes, ribosomal protein leaders, characterized riboswitches or sRNAs), indicating how little is known about sRNA regulation in S. pyogenes. By comparing our list of predicted sRNAs with previous S. pyogenes sRNA screens using bioinformatics or microarrays, 92 novel sRNAs were revealed, including antisense RNAs that are for the first time shown to be expressed in this pathogen. We experimentally validated the expression of 30 novel sRNAs and antisense RNAs. We show that the expression profile of 9 sRNAs including 2 predicted regulatory elements is affected by the endoribonucleases RNase III and/or RNase Y, highlighting the critical role of these enzymes in sRNA regulation.
Shi, Jieming; Li, Xi; Dong, Min; Graham, Mitchell; Yadav, Nehul; Liang, Chun
Sun, Ye; Fung, Kwok-Pui; Leung, Ping-Chung; Shi, Dawen; Shaw, Pang-Chui
Sequences of 5S rRNA gene spacer were used to identify Epimedium brevicornu Maxim., E. sagittatum (Sieb. et Zucc.) Maxim., E. wushanense T. S. Ying, E. pubescens Maxim., and E. koreanum Nakai. These species are listed as source plants of Chinese medicine 'Ying Yang Huo' in the Chinese Pharmacopoeia. The neighbor-joining method was used in a sequence analysis of Epimedium species. A position-specific nucleotide was found in the 5S rRNA gene spacer for E. pubescens, E. wushanense, and E. brevicornu. A 19-bp deletion was found for E. koreanum in the 5S rRNA gene spacer. E. koreanum was most divergent from the other four endemic Chinese species of Epimedium.
Yuan, Tiezheng; Huang, Xiaoyi; Dittmar, Rachel L; Du, Meijun; Kohli, Manish; Boardman, Lisa; Thibodeau, Stephen N; Wang, Liang
RNA sequencing (RNA-seq) is emerging as a critical approach in biological research. However, its high-throughput advantage is significantly limited by the capacity of bioinformatics tools. The research community urgently needs user-friendly tools to efficiently analyze the complicated data generated by high throughput sequencers. We developed a standalone tool with graphic user interface (GUI)-based analytic modules, known as eRNA. The capacity of performing parallel processing and sample management facilitates large data analyses by maximizing hardware usage and freeing users from tediously handling sequencing data. The module "miRNA identification" includes GUIs for raw data reading, adapter removal, sequence alignment, and read counting. The module "mRNA identification" includes GUIs for reference sequences, genome mapping, transcript assembling, and differential expression. The module "Target screening" provides expression profiling analyses and graphic visualization. The module "Self-testing" offers the directory setups, sample management, and a check for third-party package dependency. Integration of other GUIs including Bowtie, miRDeep2, and miRspring extends the program's functionality. eRNA focuses on the common tools required for the mapping and quantification analysis of miRNA-seq and mRNA-seq data. The software package provides an additional choice for scientists who require a user-friendly computing environment and high-throughput capacity for large data analysis. eRNA is available for free download at https://sourceforge.net/projects/erna/?source=directory.
Hicks, Stephanie C; Townes, F William; Teng, Mingxiang; Irizarry, Rafael A
Until recently, high-throughput gene expression technology, such as RNA-Sequencing (RNA-seq) required hundreds of thousands of cells to produce reliable measurements. Recent technical advances permit genome-wide gene expression measurement at the single-cell level. Single-cell RNA-Seq (scRNA-seq) is the most widely used and numerous publications are based on data produced with this technology. However, RNA-seq and scRNA-seq data are markedly different. In particular, unlike RNA-seq, the majority of reported expression levels in scRNA-seq are zeros, which could be either biologically-driven, genes not expressing RNA at the time of measurement, or technically-driven, genes expressing RNA, but not at a sufficient level to be detected by sequencing technology. Another difference is that the proportion of genes reporting the expression level to be zero varies substantially across single cells compared to RNA-seq samples. However, it remains unclear to what extent this cell-to-cell variation is being driven by technical rather than biological variation. Furthermore, while systematic errors, including batch effects, have been widely reported as a major challenge in high-throughput technologies, these issues have received minimal attention in published studies based on scRNA-seq technology. Here, we use an assessment experiment to examine data from published studies and demonstrate that systematic errors can explain a substantial percentage of observed cell-to-cell expression variability. Specifically, we present evidence that some of these reported zeros are driven by technical variation by demonstrating that scRNA-seq produces more zeros than expected and that this bias is greater for lower expressed genes. In addition, this missing data problem is exacerbated by the fact that this technical variation varies cell-to-cell. Then, we show how this technical cell-to-cell variability can be confused with novel biological results. Finally, we demonstrate and discuss how batch
Murray, M T; Schiller, D L; Franke, W W
Storage of maternal mRNAs as nontranslated ribonucleoprotein (RNP) complexes is an adaptive strategy in various vertebrate and invertebrate oocytes, for rapid translational recruitment during embryonic development. Previously, we showed that Xenopus laevis oocytes have a soluble cytoplasmic pool of mRNA-binding proteins and particles competent for messenger RNP assembly in vitro. Here we report the isolation of cDNAs for the most abundant messenger RNPs, the 54- and 56-kDa polypeptide (p54/p56) components of the approximately 6S mRNA-binding particle, from an ovarian expression library. The nucleotide sequence of p56 cDNA is almost identical to that recently reported for the putative Xenopus transcription factor FRG Y2. p54 and p56 are highly homologous and are smaller than expected by SDS/PAGE (36 kDa and 37 kDa) due to anomalous electrophoretic mobility. They lack the "RNP consensus motif" but contain four arginine-rich "basic/aromatic islands" that are similar to the RNA-binding domain of bacteriophage mRNA antiterminator proteins and of tat protein of human immunodeficiency virus. The basic/aromatic regions and a second conspicuous 100-amino acid "domain C" of p54 and p56 are conserved in the following DNA-binding proteins: human proteins dpbA, dpbB, and YB-1, rat protein EFIA, and Xenopus protein FRG Y1, all reported to bind to DNA; domain C is homologous to the major Escherichia coli cold-stress-response protein reportedly involved in translational control. Antibodies raised against a peptide of domain C have identified similar proteins in Xenopus somatic cells and in some mammalian cells and tissues. We conclude that p54 and p56 define a family of RNA-binding proteins, at least some of which may be involved in translational regulation.
Kangaspeska, Sara; Hultsch, Susanne; Edgren, Henrik; Nicorici, Daniel; Murumägi, Astrid; Kallioniemi, Olli
RNA-sequencing and tailored bioinformatic methodologies have paved the way for identification of expressed fusion genes from the chaotic genomes of solid tumors. We have recently successfully exploited RNA-sequencing for the discovery of 24 novel fusion genes in breast cancer. Here, we demonstrate the importance of continuous optimization of the bioinformatic methodology for this purpose, and report the discovery and experimental validation of 13 additional fusion genes from the same samples. Integration of copy number profiling with the RNA-sequencing results revealed that the majority of the gene fusions were promoter-donating events that occurred at copy number transition points or involved high-level DNA-amplifications. Sequencing of genomic fusion break points confirmed that DNA-level rearrangements underlie selected fusion transcripts. Furthermore, a significant portion (>60%) of the fusion genes were alternatively spliced. This illustrates the importance of reanalyzing sequencing data as gene definitions change and bioinformatic methods improve, and highlights the previously unforeseen isoform diversity among fusion transcripts.
Rukov, Jakob Lewin; Gravesen, Eva; Mace, Maria L.
The development of vascular calcification (VC) in chronic uremia (CU) is a tightly regulated process controlled by factors promoting and inhibiting mineralization. Next-generation high-throughput RNA sequencing (RNA-seq) is a powerful and sensitive tool for quantitative gene expression profiling...... and the detection of differentially expressed genes. In the present study, we, for the first time, used RNA-seq to examine rat aorta transcriptomes from CU rats compared with control rats. Severe VC was induced in CU rats, which led to extensive changes in the transcriptional profile. Among the 10,153 genes...... by circulating Klotho only or if Klotho is produced locally in the vasculature. We found by RNA-seq that Klotho was expressed in neither the normal aorta nor the calcified aorta. In conclusion, we demonstrated extensive changes in the transcriptional profile of the uremic calcified aorta, which were consistent...
The complete convergence for weighted sums of sequences of negatively dependent random variables is investigated. By applying moment inequality and truncation methods, the equivalent conditions of complete convergence for weighted sums of sequences of negatively dependent random variables are established. These results not only extend the corresponding results obtained by Li et al. (1995), Gut (1993), and Liang (2000) to sequences of negatively dependent random variables, but also improve them.
Yan, Jing; Friedrich, Stefanie; Kurgan, Lukasz
Motivated by the pressing need to characterize protein-DNA and protein-RNA interactions on large scale, we review a comprehensive set of 30 computational methods for high-throughput prediction of RNA- or DNA-binding residues from protein sequences. We summarize these predictors from several significant perspectives including their design, outputs and availability. We perform empirical assessment of methods that offer web servers using a new benchmark data set characterized by a more complete annotation that includes binding residues transferred from the same or similar proteins. We show that predictors of DNA-binding (RNA-binding) residues offer relatively strong predictive performance but they are unable to properly separate DNA- from RNA-binding residues. We design and empirically assess several types of consensuses and demonstrate that machine learning (ML)-based approaches provide improved predictive performance when compared with the individual predictors of DNA-binding residues or RNA-binding residues. We also formulate and execute a first-of-its-kind study that targets combined prediction of DNA- and RNA-binding residues. We design and test three types of consensuses for this prediction and conclude that this novel approach that relies on ML design provides better predictive quality than individual predictors when tested on prediction of DNA- and RNA-binding residues individually. It also substantially improves discrimination between these two types of nucleic acids. Our results suggest that development of a new generation of predictors would benefit from using training data sets that combine both RNA- and DNA-binding proteins, designing new inputs that specifically target either DNA- or RNA-binding residues and pursuing combined prediction of DNA- and RNA-binding residues. © The Author 2015. Published by Oxford University Press.
ZENG, YAN; CULLEN, BRYAN R.
Most eukaryotes encode a substantial number of small noncoding RNAs termed micro RNAs (miRNAs). Previously, we have demonstrated that miR-30, a 22-nucleotide human miRNA, can be processed from a longer transcript bearing the proposed miR-30 stem-loop precursor and can translationally inhibit an mRNA bearing artificial target sites. We also demonstrated that the miR-30 precursor stem can be substituted with a heterologous stem, which can be processed to yield novel miRNAs and can block the expression of endogenous mRNAs. Here, we show that a second human miRNA, termed miR-21, can also be effectively expressed when its precursor forms part of a longer mRNA. For both miR-30 and miR-21, mature miRNA production was highly dependent on the integrity of the precursor RNA stem, although the underlying sequence had little effect. In contrast, the sequence of the terminal loop affected miRNA production only moderately. Processing of the initial, miR-30-containing transcript led not only to the production of mature miR-30 but also to the largely nuclear excision of an ∼65-nucleotide RNA that is likely to represent an important intermediate in miR-30 processing. Consistent with this hypothesis, mutations that affected mature miR-30 production inhibited expression of this miR-30 pre-miRNA to an equivalent degree. Although point mutations could block the ability of both miR-30 and miR-21 to inhibit the translation of mRNAs bearing multiple artificial miRNA target sites, single point mutations only attenuated the miRNA-mediated inhibition of genes bearing single, fully complementary targets. These results suggest that miRNAs, and the closely similar small interfering RNAs, cannot totally discriminate between RNA targets differing by a single nucleotide. PMID:12554881
Jia, Cheng; Hu, Yu; Kelly, Derek; Kim, Junhyong; Li, Mingyao; Zhang, Nancy R
Recent technological breakthroughs have made it possible to measure RNA expression at the single-cell level, thus paving the way for exploring expression heterogeneity among individual cells. Current single-cell RNA sequencing (scRNA-seq) protocols are complex and introduce technical biases that vary across cells, which can bias downstream analysis without proper adjustment. To account for cell-to-cell technical differences, we propose a statistical framework, TASC (Toolkit for Analysis of Single Cell RNA-seq), an empirical Bayes approach to reliably model the cell-specific dropout rates and amplification bias by use of external RNA spike-ins. TASC incorporates the technical parameters, which reflect cell-to-cell batch effects, into a hierarchical mixture model to estimate the biological variance of a gene and detect differentially expressed genes. More importantly, TASC is able to adjust for covariates to further eliminate confounding that may originate from cell size and cell cycle differences. In simulation and real scRNA-seq data, TASC achieves accurate Type I error control and displays competitive sensitivity and improved robustness to batch effects in differential expression analysis, compared to existing methods. TASC is programmed to be computationally efficient, taking advantage of multi-threaded parallelization. We believe that TASC will provide a robust platform for researchers to leverage the power of scRNA-seq. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.
Pseudo-random sequences with good correlation properties and large linear span are widely used in code division multiple access (CDMA) communication systems and cryptology for reliable and secure information transmission. In this paper, sequences with long period, large complexity, balance statistics, and low cross-correlation property are constructed from the addition of m-sequences with pairwise-prime linear spans (AMPLS). Using m-sequences as building blocks, the proposed method proved to be an efficient and flexible approach to construct long period pseudo-random sequences with desirable properties from short period sequences. Applying the proposed method to $\mathbb{F}_2$, a signal set $\left((2^n-1)(2^m-1),\ (2^n+1)(2^m+1),\ (2^{(n+1)/2}+1)(2^{(m+1)/2}+1)\right)$ is constructed.
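The period-lengthening effect of adding m-sequences with coprime linear spans can be illustrated with a small sketch. It assumes two toy primitive polynomials chosen here for illustration, x^2 + x + 1 (period 3) and x^3 + x + 1 (period 7), and it demonstrates only the period of the sum, not the correlation or complexity properties claimed in the abstract. The function names are hypothetical.

```python
def lfsr_sequence(taps, state, length):
    """Generate `length` bits from a linear feedback shift register over GF(2).

    The recurrence is s[k + n] = XOR of s[k + t] for t in `taps`, where n is
    the register length and `state` holds the n initial bits (not all zero).
    """
    state = list(state)
    out = []
    for _ in range(length):
        out.append(state[0])
        feedback = 0
        for t in taps:
            feedback ^= state[t]
        state = state[1:] + [feedback]
    return out


def least_period(bits):
    """Smallest shift p under which the observed bits repeat.

    Reliable when `bits` covers at least two full periods of the sequence.
    """
    n = len(bits)
    for p in range(1, n):
        if all(bits[i] == bits[i + p] for i in range(n - p)):
            return p
    return n


def add_sequences(a, b):
    """Termwise addition over GF(2), i.e. bitwise XOR."""
    return [x ^ y for x, y in zip(a, b)]
```

Generating 42 bits with `lfsr_sequence((0, 1), [1, 0], 42)` and `lfsr_sequence((0, 1), [1, 0, 0], 42)` gives sequences of period 3 and 7; their termwise sum has period 21 = (2^2 - 1)(2^3 - 1), matching the first entry of the signal set for n = 2, m = 3.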
Poulsen, Line Dahl; Kielpinski, Lukasz Jan; Salama, Sofie R.; Krogh, Anders; Vinther, Jeppe
Selective 2′ Hydroxyl Acylation analyzed by Primer Extension (SHAPE) is an accurate method for probing of RNA secondary structure. In existing SHAPE methods, the SHAPE probing signal is normalized to a no-reagent control to correct for the background caused by premature termination of the reverse transcriptase. Here, we introduce a SHAPE Selection (SHAPES) reagent, N-propanone isatoic anhydride (NPIA), which retains the ability of SHAPE reagents to accurately probe RNA structure, but also allows covalent coupling between the SHAPES reagent and a biotin molecule. We demonstrate that SHAPES-based selection of cDNA–RNA hybrids on streptavidin beads effectively removes the large majority of background signal present in SHAPE probing data and that sequencing-based SHAPES data contain the same amount of RNA structure data as regular sequencing-based SHAPE data obtained through normalization to a no-reagent control. Moreover, the selection efficiently enriches for probed RNAs, suggesting that the SHAPES strategy will be useful for applications with high-background and low-probing signal such as in vivo RNA structure probing. PMID:25805860
RNA polymerase is an enzyme that transcribes genes from DNA onto strands of RNA and the transcription is a processive, accurate but discontinuous process. Despite extensive structural, biochemical and biophysical studies, the transcription elongation mechanism by the RNA polymerase is still not well determined. Here a new Brownian ratchet model is presented for this transcription elongation by the RNA polymerase. The structure's conformational changes observed in the RNAP translocation cycle are incorporated into the model. Using the model, the dynamic behaviors of continuous transcription elongation between two pauses and inhibition of next nucleotide addition after misincorporation are well explained. Moreover, the sequence-dependent short pauses result from site-specific interactions of RNAP with dsDNA and/or RNA-DNA hybrid. With this model, it is demonstrated that, at a given sequence, the lifetime distribution of the short pause has the single-exponential form at saturating nucleotide concentration, which is in contrast to the multi-exponential distribution of the dwell time during the continuous transcription elongation.
Zhang, Yu-Chen; Zhang, Shao-Wu; Liu, Lian; Liu, Hui; Zhang, Lin; Cui, Xiaodong; Huang, Yufei; Meng, Jia
With the development of new sequencing technology, the entire N6-methyl-adenosine (m(6)A) RNA methylome can now be profiled in an unbiased manner with the methylated RNA immunoprecipitation sequencing technique (MeRIP-Seq), making it possible to detect differential methylation states of RNA between two conditions, for example, between normal and cancerous tissue. However, as an affinity-based method, MeRIP-Seq has yet to provide base-pair resolution; that is, a single methylation site determined from MeRIP-Seq data can in practice contain multiple RNA methylation residues, some of which can be regulated by different enzymes and thus differentially methylated between two conditions. Since existing peak-based methods could not effectively differentiate multiple methylation residues located within a single methylation site, we propose a hidden Markov model (HMM) based approach to address this issue. Specifically, the detected RNA methylation site is further divided into multiple adjacent small bins and then scanned with higher resolution using a hidden Markov model to model the dependency between spatially adjacent bins for improved accuracy. We tested the proposed algorithm on both simulated data and real data. Results suggest that the proposed algorithm clearly outperforms the existing peak-based approach on simulated systems and detects differential methylation regions with higher statistical significance on the real dataset.
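The idea of scanning adjacent bins with an HMM can be sketched with a generic two-state Viterbi decoder. This is only an illustration under assumptions made here: the abstract does not specify the emission model, the transition probabilities, or the decoding procedure, so the per-bin probabilities, the helper `bin_log_emissions`, and the state coding (0 = not differential, 1 = differential) are all hypothetical.

```python
import math


def bin_log_emissions(p_diff):
    """Turn a per-bin probability of the 'differential' state into log scores
    for states (0 = not differential, 1 = differential). Hypothetical helper."""
    return [[math.log(1.0 - p), math.log(p)] for p in p_diff]


def viterbi(log_emis, log_trans, log_init):
    """Most likely state path given log_emis[t][s], log_trans[r][s], log_init[s]."""
    n_states = len(log_init)
    score = [log_init[s] + log_emis[0][s] for s in range(n_states)]
    back = []
    for t in range(1, len(log_emis)):
        new_score, ptr = [], []
        for s in range(n_states):
            # Best predecessor state for landing in state s at bin t.
            best_r = max(range(n_states), key=lambda r: score[r] + log_trans[r][s])
            ptr.append(best_r)
            new_score.append(score[best_r] + log_trans[best_r][s] + log_emis[t][s])
        score = new_score
        back.append(ptr)
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    return path[::-1]
```

With "sticky" transitions (for example a 0.9 probability of staying in the same state), a single weakly supported bin is absorbed into its neighbours, whereas a run of strongly supported bins is called as a separate region. That dependency between spatially adjacent bins is the property the abstract attributes to the HMM.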
Matthew D MacManes
The widespread and rapid adoption of high-throughput sequencing technologies has afforded researchers the opportunity to gain a deep understanding of genome level processes that underlie evolutionary change, and perhaps more importantly, the links between genotype and phenotype. In particular, researchers interested in functional biology and adaptation have used these technologies to sequence mRNA transcriptomes of specific tissues, which in turn are often compared to other tissues, or other individuals with different phenotypes. While these techniques are extremely powerful, careful attention to data quality is required. In particular, because high-throughput sequencing is more error-prone than traditional Sanger sequencing, quality trimming of sequence reads should be an important step in all data processing pipelines. While several software packages for quality trimming exist, no general guidelines for the specifics of trimming have been developed. Here, using empirically derived sequence data, I provide general recommendations regarding the optimal strength of trimming, specifically in mRNA-Seq studies. Although very aggressive quality trimming is common, this study suggests that a more gentle trimming, specifically of those nucleotides whose Phred score < 2 or < 5, is optimal for most studies across a wide variety of metrics.
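To make the notion of gentle trimming concrete, the sketch below removes trailing bases whose Phred score falls below a threshold. It is an illustrative assumption rather than the procedure evaluated in the study: the abstract does not name a trimming algorithm, and both the 3'-end-only strategy and the Phred+33 quality encoding are choices made here.

```python
def phred_scores(quality, offset=33):
    """Decode a FASTQ quality string into integer Phred scores (Phred+33 assumed)."""
    return [ord(ch) - offset for ch in quality]


def trim_3prime(seq, quality, threshold, offset=33):
    """Drop bases from the 3' end while their Phred score is below `threshold`."""
    scores = phred_scores(quality, offset)
    end = len(seq)
    while end > 0 and scores[end - 1] < threshold:
        end -= 1
    return seq[:end], quality[:end]
```

For a read ending in two bases of quality 2 and 3, a threshold of 5 removes both, while a threshold of 2 removes neither, since a score of 3 is not below 2. Gentle thresholds of this kind discard only the least reliable base calls.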
Cas9/CRISPR has been reported to efficiently induce targeted gene disruption and homologous recombination in both prokaryotic and eukaryotic cells. Thus, we developed a Guide RNA Sequence Design Platform for the Cas9/CRISPR silencing system for model organisms. The platform is easy to use for gRNA design with input query sequences. It finds potential targets by PAM and ranks them according to factors including uniqueness, SNP, RNA secondary structure, and AT content. The platform allows users to upload and share their experimental results. In addition, most guide RNA sequences from published papers have been put into our database.
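Finding candidate targets by PAM can be sketched as a scan of the query sequence. The example assumes a 20-nt protospacer followed by an NGG PAM on the forward strand, which is the motif recognised by Streptococcus pyogenes Cas9; the platform's actual PAM, strand handling, and ranking formula are not given in the abstract, so the function name and the reported AT fraction are illustrative only.

```python
import re


def find_candidates(seq, spacer_len=20):
    """Return forward-strand protospacers that are immediately followed by NGG."""
    seq = seq.upper()
    # A lookahead is used so that overlapping sites are all reported.
    pattern = re.compile(r"(?=([ACGT]{%d})([ACGT]GG))" % spacer_len)
    candidates = []
    for match in pattern.finditer(seq):
        spacer, pam = match.group(1), match.group(2)
        at_fraction = sum(base in "AT" for base in spacer) / spacer_len
        candidates.append({
            "start": match.start(),
            "spacer": spacer,
            "pam": pam,
            "at_content": at_fraction,
        })
    return candidates
```

A fuller design tool would also scan the reverse strand and check each spacer for uniqueness in the genome, overlap with known SNPs, and secondary structure, as the abstract lists; those steps need genome-scale data and are omitted here.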
Phytoplankton is the basis for aquatic food webs and mirrors the water quality. Conventionally, phytoplankton analysis has been done using time consuming and partly subjective microscopic observations, but next generation sequencing (NGS) technologies provide promising potential for rapid automated examination of environmental samples. Because many phytoplankton species have tough cell walls, methods for cell lysis and DNA or RNA isolation need to be efficient to allow unbiased nucleic acid retrieval. Here, we analyzed how two phytoplankton preservation methods, three commercial DNA extraction kits and their improvements, three RNA extraction methods, and two data analysis procedures affected the results of the NGS analysis. A mock community was pooled from phytoplankton species with variation in nucleus size and cell wall hardness. Although the study showed potential for studying Lugol-preserved sample collections, it demonstrated critical challenges in DNA-based phytoplankton analysis overall. The 18S rRNA gene sequencing output was highly affected by the variation in the rRNA gene copy numbers per cell, while sample preservation and nucleic acid extraction methods formed another source of variation. On top of that, sequence-specific variation in the data quality introduced unexpected bioinformatics bias when the sliding-window method was used for the quality trimming of the Ion Torrent data. While DNA-based analyses did not correlate with biomasses or cell numbers of the mock community, rRNA-based analyses were less affected by different RNA extraction procedures and had a better match with the biomasses, dry weight and carbon contents, and are therefore recommended for quantitative phytoplankton analyses.
Background: In addition to genome sequencing, accurate functional annotation of genomes is required in order to carry out comparative and evolutionary analyses between species. Among primates, the human genome is the most extensively annotated. Human miRNA gene annotation is based on multiple lines of evidence including evidence for expression as well as prediction of the characteristic hairpin structure. In contrast, most miRNA genes in non-human primates are annotated based on homology without any expression evidence. We have sequenced small-RNA libraries from chimpanzee, gorilla, orangutan and rhesus macaque from multiple individuals and tissues. Using patterns of miRNA expression in conjunction with a model of miRNA biogenesis, we used these high-throughput sequencing data to identify novel miRNAs in non-human primates. Results: We predicted 47 new miRNAs in chimpanzee, 240 in gorilla, 55 in orangutan and 47 in rhesus macaque. The algorithm we used was able to predict 64% of the previously known miRNAs in chimpanzee, 94% in gorilla, 61% in orangutan and 71% in rhesus macaque. We therefore added evidence for expression in between one and five tissues to miRNAs that were previously annotated based only on homology to human miRNAs. We increased from 60 to 175 the number of miRNAs that are located in orthologous regions in humans and the four non-human primate species studied here. Conclusions: In this study we provide expression evidence for homology-based annotated miRNAs and predict de novo miRNAs in four non-human primate species. We increased the number of annotated miRNA genes and provided evidence for their expression in four non-human primates. Similar approaches using different individuals and tissues would improve annotation in non-human primates and allow for further comparative studies in the future.
No special studies have focused on the microRNAs (miRNAs) in the fifth-instar posterior silk gland of Bombyx mori. Here, using next-generation sequencing, we acquired 93.2 million processed reads from 10 small RNA libraries. In this paper, we describe in detail how we generated our deep-sequencing dataset, which was recently published in BMC Genomics. The results show that our findings substantially enrich the silkworm miRNA repository and may help reveal miRNA functions in the process of silk production.
Gavrilov, Kseniya; Seo, Young-Eun; Tietjen, Gregory T; Cui, Jiajia; Cheng, Christopher J; Saltzman, W Mark
Canonical siRNA design algorithms have become remarkably effective at predicting favorable binding regions within a target mRNA, but in some cases (e.g., a fusion junction site) region choice is restricted. In these instances, alternative approaches are necessary to obtain a highly potent silencing molecule. Here we focus on strategies for rational optimization of two siRNAs that target the junction sites of fusion oncogenes BCR-ABL and TMPRSS2-ERG. We demonstrate that modifying the termini of these siRNAs with a terminal G-U wobble pair or a carefully selected pair of terminal asymmetry-enhancing mismatches can result in an increase in potency at low doses. Importantly, we observed that improvements in silencing at the mRNA level do not necessarily translate to reductions in protein level and/or cell death. Decline in protein level is also heavily influenced by targeted protein half-life, and delivery vehicle toxicity can confound measures of cell death due to silencing. Therefore, for BCR-ABL, which has a long protein half-life that is difficult to overcome using siRNA, we also developed a nontoxic transfection vector: poly(lactic-co-glycolic acid) nanoparticles that release siRNA over many days. We show that this system can achieve effective killing of leukemic cells. These findings provide insights into the implications of siRNA sequence for potency and suggest strategies for the design of more effective therapeutic siRNA molecules. Furthermore, this work points to the importance of integrating studies of siRNA design and delivery, while heeding and addressing potential limitations such as restricted targetable mRNA regions, long protein half-lives, and nonspecific toxicities.
Stephen J Coleman
Sequencing of equine mRNA (RNA-seq) identified 428 putative transcripts which do not map to any previously annotated or predicted horse genes. Most of these encode the equine homologs of known protein-coding genes described in other species, yet the potential exists to identify novel and perhaps equine-specific gene structures. A set of 36 transcripts was prioritized for further study by filtering for levels of expression (depth of RNA-seq read coverage), distance from annotated features in the equine genome, the number of putative exons, and patterns of gene expression between tissues. From these, four were selected for further investigation based on predicted open reading frames of greater than or equal to 50 amino acids and lack of detectable homology to known genes across species. Sanger sequencing of RT-PCR amplicons from additional equine samples confirmed expression and structural annotation of each transcript. Functional predictions were made by conserved domain searches. A single transcript, expressed in the cerebellum, contains a putative Kruppel-associated box (KRAB) domain, suggesting a potential function associated with zinc finger proteins and transcriptional regulation. Overall levels of conserved synteny and sequence conservation across a 1 Mb region surrounding each transcript were approximately 73% compared to the human, canine, and bovine genomes; however, the four loci display some areas of low conservation and sequence inversion in regions that immediately flank these previously unannotated equine transcripts. Taken together, the evidence suggests that these four transcripts are likely to be equine-specific.
Wiedenheft, Blake; van Duijn, Esther; Bultema, Jelle B.; Waghmare, Sakharam P.; Zhou, Kaihong; Barendregt, Arjan; Westphal, Wiebke; Heck, Albert J. R.; Boekema, Egbert J.; Dickman, Mark J.; Doudna, Jennifer A.
Prokaryotes have evolved multiple versions of an RNA-guided adaptive immune system that targets foreign nucleic acids. In each case, transcripts derived from clustered regularly interspaced short palindromic repeats (CRISPRs) are thought to selectively target invading phage and plasmids in a sequence-specific process involving a variable cassette of CRISPR-associated (cas) genes. The CRISPR locus in Pseudomonas aeruginosa (PA14) includes four cas genes that are unique to and conserved in microorganisms harboring the Csy-type (CRISPR system yersinia) immune system. Here we show that the Csy proteins (Csy1–4) assemble into a 350 kDa ribonucleoprotein complex that facilitates target recognition by enhancing sequence-specific hybridization between the CRISPR RNA and complementary target sequences. Target recognition is enthalpically driven and localized to a “seed sequence” at the 5′ end of the CRISPR RNA spacer. Structural analysis of the complex by small-angle X-ray scattering and single particle electron microscopy reveals a crescent-shaped particle that bears striking resemblance to the architecture of a large CRISPR-associated complex from Escherichia coli, termed Cascade. Although similarity between these two complexes is not evident at the sequence level, their unequal subunit stoichiometry and quaternary architecture reveal conserved structural features that may be common among diverse CRISPR-mediated defense systems. PMID:21536913
Li, Xin-Feng; Cao, Rui-Bing; Luo, Jun; Fan, Jian-Ming; Wang, Jing-Man; Zhang, Yuan-Peng; Gu, Jin-Yan; Feng, Xiu-Li; Zhou, Bin; Chen, Pu-Yan
Japanese encephalitis (JE) is a mosquito borne viral disease, caused by Japanese encephalitis virus (JEV) infection producing severe neuroinflammation in the central nervous system (CNS) with the associated disruption of the blood brain barrier. MicroRNAs (miRNAs) are a family of 21-24 nt small non-coding RNAs that play important post-transcriptional regulatory roles in gene expression and have critical roles in virus pathogenesis. We examined the potential roles of miRNAs in JEV-infected suckling mice brains and found that JEV infection changed miRNA expression profiles when the suckling mice began showing nervous symptoms. A total of 1062 known and 71 novel miRNAs were detected in JEV-infected group, accompanied with 1088 known and 75 novel miRNAs in mock controls. Among these miRNAs, one novel and 25 known miRNAs were significantly differentially expressed, including 18 up-regulated and 8 down-regulated miRNAs which were further confirmed by real-time PCR. Gene ontology (GO) and signaling pathway analysis of the predicted target mRNAs of the modulated miRNAs showed that they are correlated with the regulation of apoptosis, neuron differentiation, antiviral immunity and infiltration of mouse brain, and the validated targets of 12 differentially expressed miRNAs were enriched for the regulation of cell programmed death, proliferation, transcription, muscle organ development, erythrocyte differentiation, gene expression, plasma membrane and protein domain specific binding. KEGG analysis further reveals that the validated target genes were involved in the Pathways in cancer, Neurotrophin signaling pathway, Toll like receptor signaling pathway, Endometrial cancer and Jak-STAT signaling pathway. We constructed the interaction networks of miRNAs and their target genes according to GO terms and KEGG pathways and the expression levels of several target genes were examined. Our data provides a valuable basis for further studies on the regulatory roles of miRNAs in JE
Bai, Yu; Ni, Min; Cooper, Blerta; Wei, Yi; Fury, Wen
Accurate HLA typing at amino acid level (four-digit resolution) is critical in hematopoietic and organ transplantations, pathogenesis studies of autoimmune and infectious diseases, as well as the development of immunoncology therapies. With the rapid adoption of genome-wide sequencing in biomedical research, HLA typing based on transcriptome and whole exome/genome sequencing data becomes increasingly attractive due to its high throughput and convenience. However, unlike targeted amplicon sequencing, genome-wide sequencing often employs a reduced read length and coverage that impose great challenges in resolving the highly homologous HLA alleles. Though several algorithms exist and have been applied to four-digit typing, some deliver low to moderate accuracies, some output ambiguous predictions. Moreover, few methods suit diverse read lengths and depths, and both RNA and DNA sequencing inputs. New algorithms are therefore needed to leverage the accuracy and flexibility of HLA typing at high resolution using genome-wide sequencing data. We have developed a new algorithm named PHLAT to discover the most probable pair of HLA alleles at four-digit resolution or higher, via a unique integration of a candidate allele selection and a likelihood scoring. Over a comprehensive set of benchmarking data (a total of 768 HLA alleles) from both RNA and DNA sequencing and with a broad range of read lengths and coverage, PHLAT consistently achieves a high accuracy at four-digit (92%-95%) and two-digit resolutions (96%-99%), outcompeting most of the existing methods. It also supports targeted amplicon sequencing data from Illumina Miseq. PHLAT significantly leverages the accuracy and flexibility of high resolution HLA typing based on genome-wide sequencing data. It may benefit both basic and applied research in immunology and related fields as well as numerous clinical applications.
Rao, A L N; Cheng Kao, C
The 3' untranslated region in each of the three genomic RNAs of Brome mosaic virus (BMV) is highly homologous and contains a sequence that folds into a tRNA-like structure (TLS). Experiments performed over the past four decades revealed that the BMV 3' TLS regulates many important steps in BMV infection. This review summarizes in vitro and in vivo studies of the roles of the BMV 3' TLS functioning as a minus-strand promoter, in RNA recombination, and to nucleate virion assembly. Copyright © 2015 Elsevier B.V. All rights reserved.
Zheng, H-Y; Chen, J; Adams, M J; Chen, J-P
The complete sequence of an isolate of Narcissus common latent virus (NCLV) from Zhangzhou city, Fujian, China was determined from amplified fragments of purified viral RNA. Excluding the poly(A) tail, the genomic RNA of NCLV was 8539 nucleotides (nt) long and had the typical organization for a member of the genus Carlavirus. The most closely related species were Potato virus M, Hop latent virus and Aconitum latent virus, which had 58-59% nt identity to NCLV in their entire genomes. These relationships were confirmed by a phylogenetic analysis using a composite nucleotide alignment of all the open reading frames.
Butler Margaret I
Background: Inteins are self-splicing protein elements. They are translated as inserts within host proteins that excise themselves and ligate the flanking portions of the host protein (exteins) with a peptide bond. They are encoded as in-frame insertions within the genes for the host proteins. Inteins are found in all three domains of life and in viruses, but have a very sporadic distribution. Only a small number of intein coding sequences have been identified in eukaryotic nuclear genes, and all of these are from ascomycete or basidiomycete fungi. Results: We identified seven intein coding sequences within nuclear genes coding for the second largest subunits of RNA polymerase. These sequences were found in diverse eukaryotes: one is in the second largest subunit of RNA polymerase I (RPA2) from the ascomycete fungus Phaeosphaeria nodorum, one is in the RNA polymerase III (RPC2) of the slime mould Dictyostelium discoideum and four intein coding sequences are in RNA polymerase II genes (RPB2), one each from the green alga Chlamydomonas reinhardtii, the zygomycete fungus Spiromyces aspiralis and the chytrid fungi Batrachochytrium dendrobatidis and Coelomomyces stegomyiae. The remaining intein coding sequence is in a viral relic embedded within the genome of the oomycete Phytophthora ramorum. The Chlamydomonas and Dictyostelium inteins are the first nuclear-encoded inteins found outside of the fungi. These new inteins represent a unique dataset: they are found in homologous proteins that form a paralogous group. Although these paralogues diverged early in eukaryotic evolution, their sequences can be aligned over most of their length. The inteins are inserted at multiple distinct sites, each of which corresponds to a highly conserved region of RNA polymerase. This dataset supports earlier work suggesting that inteins preferentially occur in highly conserved regions of their host proteins. Conclusion: The identification of these new inteins
Liu, Zhan-Lin; Zhang, Da-Ming; Wang, Xiao-Ru
In higher plants the primary and the secondary structures of the 5S ribosomal RNA gene are considered highly conserved. Little is known about the 5S rRNA gene structure, organization and variation in gymnosperms. In this study we analyzed sequence and structure variation of the 5S rRNA gene in Pinus through cloning and sequencing multiple copies of 5S rDNA repeats from individual trees of five pines, P. bungeana, P. tabulaeformis, P. yunnanensis, P. massoniana and P. densata. Pinus bungeana is from the subgenus Strobus while the other four are from the subgenus Pinus (diploxylon pines). Our results revealed variations in both primary and secondary structure among copies of 5S rDNA within individual genomes and between species. The 5S rRNA gene in Pinus is 120 bp long in most of the 122 clones we sequenced except for one or two deletions in three clones. Among these clones 50 unique sequences were identified and they were shared by different pine species. Our sequences were compared to 13 sequences each representing a different gymnosperm species, and to six sequences representing both angiosperm monocots and dicots. Average sequence similarity was 97.1% among Pinus species and 94.3% between Pinus and other gymnosperms. Between gymnosperms and angiosperms the sequence similarity decreased to 88.1%. Similar to other molecular data, significant sequence divergence was found between the two Pinus subgenera. The 5S gene tree (neighbor-joining tree) grouped the four diploxylon pines together and separated them distinctly from P. bungeana. Comparison of sequence divergence within individuals and between species suggested that concerted evolution has been very weak especially after the divergence of the four diploxylon pines. The phylogenetic information contained in the 5S rRNA gene is limited due to its short length, and the difficulties in identifying orthologous and paralogous copies of the rDNA multigene family further complicate its phylogenetic application. Pinus densata is a
Der, Evan; Ranabothu, Saritha; Suryawanshi, Hemant; Akat, Kemal M.; Clancy, Robert; Kustagi, Manjunath; Czuppa, Mareike; Izmirly, Peter; Belmont, H. Michael; Wang, Tao; Jordan, Nicole; Bornkamp, Nicole; Nwaukoni, Janet; Martinez, July; Buyon, Jill P.; Tuschl, Thomas
Lupus nephritis is a leading cause of mortality among systemic lupus erythematosus (SLE) patients, and its heterogeneous nature poses a significant challenge to the development of effective diagnostics and treatments. Single cell RNA sequencing (scRNA-seq) offers a potential solution to dissect the heterogeneity of the disease and enables the study of similar cell types distant from the site of renal injury to identify novel biomarkers. We applied scRNA-seq to human renal and skin biopsy tissues and demonstrated that scRNA-seq can be performed on samples obtained during routine care. Chronicity index, IgG deposition, and quantity of proteinuria correlated with a transcriptomic-based score composed of IFN-inducible genes in renal tubular cells. Furthermore, analysis of cumulative expression profiles of single cell keratinocytes dissociated from nonlesional, non–sun-exposed skin of patients with lupus nephritis also revealed upregulation of IFN-inducible genes compared with keratinocytes isolated from healthy controls. This indicates the possible use of scRNA-seq analysis of skin biopsies as a biomarker of renal disease. These data support the potential utility of scRNA-seq to provide new insights into the pathogenesis of lupus nephritis and pave the way for exploiting a readily accessible tissue to reflect injury in the kidney. PMID:28469080
Jacob B Spangler
Steady-state mRNA levels are tightly regulated through a combination of transcriptional and post-transcriptional control mechanisms. The discovery of cis-acting DNA elements that encode these control mechanisms is of high importance. We have investigated the influence of conserved non-coding sequences (CNSs), DNA patterns retained after an ancient whole genome duplication event, on the breadth of gene expression and the rates of mRNA decay in Arabidopsis thaliana. The absence of CNSs near α duplicate genes was associated with a decrease in breadth of gene expression and slower mRNA decay rates while the presence of CNSs near α duplicates was associated with an increase in breadth of gene expression and faster mRNA decay rates. The observed difference in mRNA decay rate was fastest in genes with CNSs in both nontranscribed and transcribed regions, albeit through an unknown mechanism. This study supports the notion that some Arabidopsis CNSs regulate the steady-state mRNA levels through post-transcriptional control mechanisms and that CNSs also play a role in controlling the breadth of gene expression.
Ángela L. Riffo-Campos
MicroRNAs (miRNAs) are defined as small non-coding RNAs ~22 nt in length. They regulate gene expression at a post-transcriptional level through complementary base pairing with the target mRNA, leading to mRNA degradation and therefore blocking translation. In the last decade, the dysfunction of miRNAs has been related to the development and progression of many diseases. Currently, researchers need a method to identify miRNA targets precisely, prior to applying experimental approaches that allow a better functional characterization of miRNAs in biological processes and can thus predict their effects. Computational prediction tools provide a rapid method to identify putative miRNA targets. However, since a large number of tools for the prediction of miRNA:mRNA interactions have been developed, all with different algorithms, the biological researcher sometimes does not know which is the best choice for their study and often does not understand the bioinformatic basis of these tools. This review describes the biological fundamentals of these prediction tools, characterizes the main sequence-based algorithms, and offers some insights into their uses by biologists.
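As a rough illustration of what a sequence-based prediction step can look like, the sketch below searches an mRNA for perfect matches to the reverse complement of the miRNA seed (taken here as nucleotides 2 to 8, a common convention). The function names are hypothetical, and real tools combine such matches with further criteria such as conservation, site context and binding energy.

```python
COMPLEMENT = str.maketrans("ACGU", "UGCA")

def reverse_complement(rna):
    """Reverse complement of an RNA string (A/C/G/U alphabet)."""
    return rna.upper().translate(COMPLEMENT)[::-1]

def seed_sites(mirna, mrna, seed_start=1, seed_end=8):
    """Return 0-based mRNA positions that pair perfectly with the miRNA seed.

    Assumption: the seed is miRNA nucleotides 2-8 (Python slice 1:8) and
    only perfect Watson-Crick matches are reported.
    """
    seed = mirna.upper()[seed_start:seed_end]
    site = reverse_complement(seed)
    mrna = mrna.upper()
    positions = []
    i = mrna.find(site)
    while i != -1:
        positions.append(i)
        i = mrna.find(site, i + 1)
    return positions
```

For example, the seed of let-7a (GAGGUAG) gives the target-site motif CUACCUC, which is the string searched for in the mRNA.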
Luehrsen, K. R.; Fox, G. E.
The primary sequence of the 5S ribosomal RNA isolated from the free-living bioluminescent marine bacterium Beneckea harveyi is reported and discussed in regard to indications of phylogenetic relationships with the bacteria Escherichia coli and Photobacterium phosphoreum. Sequences were determined for oligonucleotide products generated by digestion with ribonuclease T1, pancreatic ribonuclease and ribonuclease T2. The presence of heterogeneity is indicated for two sites. The B. harveyi sequence can be arranged into the same four-helix secondary structure as E. coli and other prokaryotic 5S rRNAs. Examination of the 5S rRNA sequences of the three bacteria indicates that B. harveyi and P. phosphoreum are specifically related and share a common ancestor which diverged from an ancestor of E. coli at a somewhat earlier time, consistent with previous studies.
Ahmad, Tauqeer; AbouHaidar, Mounir; Hefferon, Kathleen L
Improved knowledge of the molecular biology of viruses, including recent gains in virus sequence data analysis, has greatly contributed to recent innovations in medical diagnostics, therapeutics, drug development and other related areas. Virus sequences have been used for the development of vaccines and antiviral agents to block the spread of viral infections, as well as to target and battle chronic diseases such as cancer. Virus sequences are now routinely employed in a wide array of RNA silencing technologies. Viruses can also be engineered into expression vectors which in turn can be used as protein production platforms as well as delivery vehicles for gene therapies. This review article outlines a number of patents that have been recently issued with respect to virus sequence data and describes some of their biotechnological applications.
Dette, Holger; Nagel, Jan
In this paper we define distributions on moment spaces corresponding to measures on the real line with an unbounded support. We identify these distributions as limiting distributions of random moment vectors defined on compact moment spaces and as distributions corresponding to random spectral measures associated with the Jacobi, Laguerre and Hermite ensemble from random matrix theory. For random vectors on the unbounded moment spaces we prove a central limit theorem where the centering vecto...
Hagen, Charles; Frizzi, Alessandra; Kao, John; Jia, Lijie; Huang, Mingya; Zhang, Yuanji; Huang, Shihshieh
In a virus-infected plant, small interfering RNAs (siRNAs) corresponding to the viral genome form a large proportion of the small RNA population. It is possible to reassemble significant portions of the virus sequence from overlapping siRNA sequences and use these to identify the virus. We tested this technique with a resistance-breaking and a non-resistance-breaking strain of tomato spotted wilt virus (TSWV). We were able to assemble contigs covering 99% of the genomes of both viruses. The abundance of TSWV siRNAs allowed us to detect TSWV at early time points before the onset of symptoms, at levels too low for conventional detection. Combining traditional and bioinformatic detection methods, we also measured how replication of the resistance-breaking strain differed from the non-resistance-breaking strain in susceptible and resistant tomato varieties. We repeated this technique in identification of a squash-infecting geminivirus and also used it to identify an unspecified tospovirus.
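The idea of rebuilding a viral sequence from overlapping siRNAs can be sketched with a simple greedy overlap assembler. This is only an illustration under strong assumptions (error-free reads, a single strand, no repeats longer than the minimum overlap); the function names and the `min_overlap` default are hypothetical and the assembler actually used in the study is not stated in the abstract.

```python
def overlap(a, b, min_len):
    """Length of the longest suffix of a equal to a prefix of b, or 0."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0

def greedy_assemble(reads, min_overlap=8):
    """Greedily merge the pair of contigs with the largest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    # Drop reads that are fully contained in another read.
    contigs = [r for r in contigs if not any(r != o and r in o for o in contigs)]
    while True:
        best_len, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_overlap)
                    if k > best_len:
                        best_len, best_i, best_j = k, i, j
        if best_len == 0:
            return contigs  # nothing left to merge
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for idx, c in enumerate(contigs)
                   if idx not in (best_i, best_j)] + [merged]
```

With real small-RNA data, reads from both strands, sequencing errors and host-derived reads all have to be handled, so a dedicated short-read assembler would normally be used instead.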
MicroRNAs are endogenous non-coding small RNAs playing crucial regulatory roles in plants. Tea, a globally popular non-alcoholic drink, is rich in health-enhancing catechins. In this study, 69 conserved and 47 novel miRNAs targeting 644 genes were identified by high-throughput sequencing. Predicted target genes of miRNAs were mainly involved in plant growth, signal transduction, morphogenesis and defense. To further identify targets of tea miRNAs, degradome sequencing and RNA ligase-mediated rapid amplification of 5' cDNA ends (RLM-RACE) were applied. Using degradome sequencing, 26 genes mainly involved in transcription factor, resistance protein and signal transduction protein synthesis were identified as potential miRNA targets, with 5 genes subsequently verified. Quantitative real-time PCR (qRT-PCR) revealed that the expression patterns of novel-miR1, novel-miR2, csn-miR160a, csn-miR162a, csn-miR394 and csn-miR396a were negatively correlated with catechin content. The expression of six miRNAs (csn-miRNA167a, csn-miR2593e, csn-miR4380a, csn-miR3444b, csn-miR5251 and csn-miR7777-5p.1) and their target genes involved in catechin biosynthesis was also analyzed by qRT-PCR. Negative and positive correlations were found between these miRNAs and catechin contents, while positive correlations were found between their target genes and catechin content. This result suggests that these miRNAs may negatively regulate catechin biosynthesis by down-regulating their biosynthesis-related target genes. Taken together, our results indicate that miRNAs are crucial regulators in tea, with the results of 5'-RLM-RACE and expression analyses revealing the important role of miRNAs in catechin anabolism. Our findings should facilitate future research to elucidate the function of miRNAs in catechin biosynthesis.
Chattopadhyay, Saket; Ely, Abdullah; Bloom, Kristie; Weinberg, Marc S. [Antiviral Gene Therapy Research Unit, University of the Witwatersrand (South Africa); Arbuthnot, Patrick, E-mail: Patrick.Arbuthnot@wits.ac.za [Antiviral Gene Therapy Research Unit, University of the Witwatersrand (South Africa)
RNA interference (RNAi) may be harnessed to inhibit viral gene expression and this approach is being developed to counter chronic infection with hepatitis B virus (HBV). Compared to synthetic RNAi activators, DNA expression cassettes that generate silencing sequences have advantages of sustained efficacy and ease of propagation in plasmid DNA (pDNA). However, the large size of pDNAs and inclusion of sequences conferring antibiotic resistance and immunostimulation limit delivery efficiency and safety. To develop use of alternative DNA templates that may be applied for therapeutic gene silencing, we assessed the usefulness of PCR-generated linear expression cassettes that produce anti-HBV micro-RNA (miR) shuttles. We found that silencing of HBV markers of replication was efficient (>75%) in cell culture and in vivo. miR shuttles were processed to form anti-HBV guide strands and there was no evidence of induction of the interferon response. Modification of terminal sequences to include flanking human adenoviral type-5 inverted terminal repeats was easily achieved and did not compromise silencing efficacy. These linear DNA sequences should have utility in the development of gene silencing applications where modifications of terminal elements with elimination of potentially harmful and non-essential sequences are required.
Heller, David; Krestel, Ralf; Ohler, Uwe; Vingron, Martin; Marsico, Annalisa
RNA-binding proteins (RBPs) play an important role in RNA post-transcriptional regulation and recognize target RNAs via sequence-structure motifs. The extent to which RNA structure influences protein binding in the presence or absence of a sequence motif is still poorly understood. Existing RNA motif finders either take the structure of the RNA only partially into account, or employ models which are not directly interpretable as sequence-structure motifs. We developed ssHMM, an RNA motif finder based on a hidden Markov model (HMM) and Gibbs sampling which fully captures the relationship between RNA sequence and secondary structure preference of a given RBP. Compared to previous methods which output separate logos for sequence and structure, it directly produces a combined sequence-structure motif when trained on a large set of sequences. ssHMM's model is visualized intuitively as a graph and facilitates biological interpretation. ssHMM can be used to find novel bona fide sequence-structure motifs of uncharacterized RBPs, such as the one presented here for the YY1 protein. ssHMM reaches a high motif recovery rate on synthetic data, it recovers known RBP motifs from CLIP-Seq data, and scales linearly on the input size, being considerably faster than MEMERIS and RNAcontext on large datasets while being on par with GraphProt. It is freely available on Github and as a Docker image. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.
Mendonça-Hagler, L C; Hagler, A N; Kurtzman, C P
Phylogenetic relationships of species assigned to the genus Metschnikowia were estimated from the extents of divergence among partial sequences of rRNA. The data suggest that the aquatic species (Metschnikowia australis, Metschnikowia bicuspidata, Metschnikowia krissii, and Metschnikowia zobellii) and the terrestrial species (Metschnikowia hawaiiensis, Metschnikowia lunata, Metschnikowia pulcherrima, and Metschnikowia reukaufii) form two groups within the genus. M. lunata and M. hawaiiensis are well separated from other members of the genus, and M. hawaiiensis may be sufficiently divergent that it could be placed in a new genus. Species of the genus Metschnikowia are unique compared with other ascomycetous yeasts because they have a deletion in the large-subunit rRNA sequence that includes nucleotides 434 to 483.
In this paper, we propose a novel method for increasing the entropy of a sequence of independent, discrete random variables with arbitrary distributions. The method uses an auxiliary table and a novel theorem that concerns the entropy of a sequence in which the elements are a bitwise exclusive-or sum of independent discrete random variables.
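The general effect behind exclusive-or constructions can be shown with a small numeric sketch for single bits: XOR-ing two independent biased bits gives a bit whose entropy is at least that of either input. This illustrates only that well-known property, not the paper's theorem or its auxiliary-table method; the function names are hypothetical.

```python
from math import log2

def binary_entropy(p):
    """Shannon entropy (in bits) of a bit that equals 1 with probability p."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * log2(p) - (1 - p) * log2(1 - p)

def xor_bias(p, q):
    """P(X xor Y = 1) for independent bits with P(X=1)=p and P(Y=1)=q."""
    return p * (1 - q) + q * (1 - p)
```

For instance, bits with P(1) = 0.9 and P(1) = 0.8 combine to a bit with P(1) = 0.26, which is closer to 0.5 than either input and therefore has higher entropy.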
Pseudo-random number sequences are useful in many applications including Monte-Carlo simulation, spread spectrum ... a pseudo-random binary sequence from the two-dimensional chaotic Hénon map is explored. ... is the Hénon map, a two-dimensional discrete-time nonlinear dynamical system represented by the state ...
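A minimal sketch of how a binary sequence might be derived from the Hénon map is given below. It uses the classical form of the map, x' = 1 - a x^2 + y, y' = b x, with the commonly quoted parameters a = 1.4 and b = 0.3. The thresholding rule, the burn-in length and the function name are assumptions for illustration and are not taken from the truncated abstract above.

```python
def henon_bits(n, a=1.4, b=0.3, x=0.0, y=0.0, burn_in=100, threshold=0.0):
    """Generate n bits by thresholding the x-coordinate of a Henon orbit.

    Assumptions: classical parameters a=1.4, b=0.3, a start at (0, 0),
    and one bit per iteration (1 if x > threshold, else 0).
    """
    bits = []
    for i in range(burn_in + n):
        # Simultaneous update: the right-hand side uses the old x and y.
        x, y = 1 - a * x * x + y, b * x
        if i >= burn_in:
            bits.append(1 if x > threshold else 0)
    return bits
```

A sequence produced this way is deterministic for a given starting point and would need statistical testing before being used as a pseudo-random source.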
Kindgren, Peter; Yap, Aaron; Bond, Charles S
RNA editing factors of the pentatricopeptide repeat (PPR) family show a very high degree of sequence specificity in the recognition of their target sites. A molecular basis for target recognition by editing factors has been proposed based on statistical correlations but has not been tested...... experimentally. To achieve this, we systematically mutated the pentatricopeptide motifs in the Arabidopsis thaliana RNA editing factor CLB19 to investigate their individual contribution to RNA recognition. We find that the motifs contributing significantly to the specificity of binding follow the previously...... such that the final PPR motif aligns four nucleotides upstream of the edited cytidine. By altering S motifs in CLB19 and another editing factor, OTP82, and using the modified proteins to attempt to complement the respective mutants, we demonstrate that we can predictably alter the specificity of these factors in vivo....
Wiedenheft, Blake; van Duijn, Esther; Bultema, Jelle; Waghmare, Sakharam; Zhou, Kaihong; Barendregt, Arjan; Westphal, Wiebke; Heck, Albert; Boekema, Egbert; Dickman, Mark; Doudna, Jennifer A.
Prokaryotes have evolved multiple versions of an RNA-guided adaptive immune system that targets foreign nucleic acids. In each case, transcripts derived from clustered regularly interspaced short palindromic repeats (CRISPRs) are thought to selectively target invading phage and plasmids in a sequence-specific process involving a variable cassette of CRISPR-associated (cas) genes. The CRISPR locus in Pseudomonas aeruginosa (PA14) includes four cas genes that are unique to and conserved in micr...
Ozsolak, Fatih; Milos, Patrice M.
Methods for in-depth genome-wide characterization of transcriptomes and quantification of transcript levels using various microarray and next-generation sequencing technologies have emerged as valuable tools for understanding cellular physiology and human disease biology and have begun to be utilized in various clinical diagnostic applications. Current methods, however, typically require RNA to be converted to complementary DNA prior to measurements. This step has been shown to introduce many...
Le, H L; Perasso, R; Billard, R
"Fish" phylogeny has been studied using partial 28 S ribosomal RNA sequences of 14 species among which 12 are "fish" ranging from lamprey to perciforms. Our results are in good agreement with generally accepted cladograms based on anatomical and paleontological data. Two interesting conclusions emerged: a) Polypterus is the sister-group of all other actinopterygians; b) the divergences of the Clasdistia, Tetrapoda and Chondrichthyes seem to have occurred during a relatively short period of time.
Replication of individual DNA molecules under electronic control using a protein nanopore. Nat Nanotechnol 5:798-806. Potter J, Zheng...proofreading, and mismatch repair during DNA replication in Escherichia coli. J Biol Chem 268:23762-23765. Schmieder R, Edwards R. 2011. Quality...of population. The observed frequencies included: shotgun sequencing from plasmid DNA or in vitro transcribed RNA (as a basic “no
Tara G McDaneld
BACKGROUND: MicroRNA are a class of small RNAs that regulate gene expression by inhibiting translation of protein encoding transcripts through targeting of a microRNA-protein complex by base-pairing of the microRNA sequence to cognate recognition sequences in the 3' untranslated region (UTR) of the mRNA. Target identification for a given microRNA sequence is generally accomplished by informatics analysis of predicted mRNA sequences present in the genome or in databases of transcript sequence for the tissue of interest. However, gene models for porcine skeletal muscle transcripts in current databases, specifically complete sequence of the 3' UTR, are inadequate for this exercise. METHODOLOGY/PRINCIPAL FINDINGS: To provide data necessary to identify gene targets for microRNA in porcine skeletal muscle, normalized cDNA libraries were sequenced using Roche 454 GS-FLX pyrosequencing and de novo assembly of transcripts enriched in the 3' UTR was performed using the MIRA sequence assembly program. Over 725 million bases of sequence were generated, which assembled into 18,202 contigs. Sequence reads were mapped to a 3' UTR database containing porcine sequences. The 3' UTR that mapped to the database were examined to predict targets for previously identified microRNA that had been separately sequenced from the same porcine muscle sample used to generate the cDNA libraries. For genes with microRNA-targeted 3' UTR, KEGG pathways were computationally determined in order to identify potential functional effects of these microRNA-targeted transcripts. CONCLUSIONS: Through next-generation sequencing of transcripts expressed in skeletal muscle, mapping reads to a 3' UTR database, and prediction of microRNA target sites in the 3' UTR, our results identified genes expressed in porcine skeletal muscle and predicted the microRNA that target these genes. Additionally, identification of pathways regulated by these microRNA-targeted genes provides us with a set of
Le Meuth-Metzinger Valerie
Background: Specific cis-elements and the associated trans-acting factors have been implicated in the post-transcriptional regulation of gene expression. In the era of genome-wide analyses, identifying novel trans-acting factors and cis-regulatory elements is a step towards understanding coordinated gene expression. UV-crosslink analysis is a standard method used to identify RNA-binding proteins. Uridine is traditionally used to radiolabel substrate RNAs; however, proteins binding to cis-elements that are particularly uridine-poor will be weakly detected or not detected at all. We evaluate here the possibility of using UV-crosslinking with RNA substrates radiolabeled with each of the four ribonucleotides as an approach for screening for novel sequence-specific RNA-binding proteins. Results: The radiolabeled RNA substrates were derived from the 3'UTRs of the cloned Eg and c-mos Xenopus laevis maternal mRNAs. Specific, but not identical, UV-crosslinking signals were obtained, some of which corresponded to already identified proteins. A signal for a novel 90 kDa protein was observed with the c-mos 3'UTR radiolabeled with both CTP and GTP but not with UTP. The binding site of the 90 kDa RNA-binding protein was localised to a 59-nucleotide portion of the c-mos 3'UTR. Conclusion: That the 90 kDa signal was detected with RNAs radiolabeled with CTP or GTP but not UTP illustrates the advantage of radiolabeling all four nucleotides in a UV-crosslink based screen. This method can be used for both long and short RNAs and does not require knowledge of the cis-acting sequence. It should be amenable to high throughput screening for RNA binding proteins.
Leclercq, Mickael; Diallo, Abdoulaye Baniré; Blanchette, Mathieu
MicroRNAs (miRNA) are short single-stranded RNA molecules derived from hairpin-forming precursors that play a crucial role as post-transcriptional regulators in eukaryotes and viruses. In the past years, many microRNA target genes (MTGs) have been identified experimentally. However, because of the high costs of experimental approaches, target gene databases remain incomplete. Although several target prediction programs have been developed in recent years to identify MTGs in silico, their specificity and sensitivity remain low. Here, we propose a new approach called MirAncesTar, which uses ancestral genome reconstruction to boost the accuracy of existing MTG prediction tools for human miRNAs. For each miRNA and each putative human target UTR, our algorithm makes use of existing prediction tools to identify putative target sites in the human UTR, as well as in its mammalian orthologs and inferred ancestral sequences. It then evaluates evidence in support of selective pressure to maintain target site counts (rather than sequences), accounting for the possibility of target site turnover. It finally integrates this measure with several simpler ones using a logistic regression predictor. MirAncesTar improves the accuracy of existing MTG predictors by 26% to 157%. Source code and prediction results for human miRNAs, as well as supporting evolutionary data are available at http://cs.mcgill.ca/∼blanchem/mirancestar. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.
Wang, Bin; Guo, Guangwu; Wang, Chao; Lin, Ying; Wang, Xiaoning; Zhao, Mouming; Guo, Yong; He, Minghui; Zhang, Yong; Pan, Li
Aspergillus oryzae, an important filamentous fungus used in food fermentation and the enzyme industry, has been shown through genome sequencing and various other tools to have prominent features in its genomic composition. However, the functional complexity of the A. oryzae transcriptome has not yet been fully elucidated. Here, we applied direct high-throughput paired-end RNA-sequencing (RNA-Seq) to the transcriptome of A. oryzae under four different culture conditions. With the high resolution and sensitivity afforded by RNA-Seq, we were able to identify a substantial number of novel transcripts, new exons, untranslated regions, alternative upstream initiation codons and upstream open reading frames, which provide remarkable insight into the A. oryzae transcriptome. We were also able to assess the alternative mRNA isoforms in A. oryzae and found a large number of genes undergoing alternative splicing. Many genes and pathways that might be involved in higher levels of protein production in solid-state culture than in liquid culture were identified by comparing gene expression levels between different cultures. Our analysis indicated that the transcriptome of A. oryzae is much more complex than previously anticipated, and these results may provide a blueprint for further study of the A. oryzae transcriptome. PMID:20392818
Honda, Tomoyuki; Sofuku, Kozue; Kojima, Shohei; Yamamoto, Yusuke; Ohtaki, Naohiro; Tomonaga, Keizo
The 3'-untranslated region (UTR) of the non-segmented, negative-strand (NNS) RNA viral genome is called the leader sequence, and functions as the promoter for viral replication and transcription. NNS RNA viruses also use the sequence as a template to synthesize leader RNAs (leRNAs) with unknown functions. Borna disease virus (BDV) is unique because it establishes a persistent infection and replicates in the nucleus. No report has yet demonstrated the presence of leRNAs during BDV infection. Here, we report that BDV synthesizes leRNAs from the 3'-UTR of the genome. They started at position 5 in the 3'-UTR and ended by the transcription start signal of the nucleoprotein gene. The level of leRNA production is not correlated with the levels of viral replication and transcription. On the other hand, mutation of the 3'-UTR affects leRNA production. Our findings add a novel viral transcript to the BDV life cycle and shed light on BDV replication and/or transcription. Copyright © 2017 Elsevier Inc. All rights reserved.
A substantial number of “retrogenes” that are derived from the mRNA of various intron-containing genes have been reported. A class of mammalian retroposons, long interspersed element-1 (LINE1, L1), has been shown to be involved in the reverse transcription of retrogenes (or processed pseudogenes) and non-autonomous short interspersed elements (SINEs). The 3'-end sequences of various SINEs originated from a corresponding LINE. As the 3'-untranslated regions of several LINEs are essential for retroposition, these LINEs presumably require “stringent” recognition of the 3'-end sequence of the RNA template. However, the 3'-ends of mammalian L1s do not exhibit any similarity to SINEs, except for the presence of 3'-poly(A) repeats. Since the 3'-poly(A) repeats of L1 and Alu SINE are critical for their retroposition, L1 probably recognizes the poly(A) repeats, thereby mobilizing not only Alu SINE but also cytosolic mRNA. Many flowering plants only harbor L1-clade LINEs and a significant number of SINEs with poly(A) repeats, but no homology to the LINEs. Moreover, processed pseudogenes have also been found in flowering plants. I propose that the ancestral L1-clade LINE in the common ancestor of green plants may have recognized a specific RNA template, with stringent recognition then becoming relaxed during the course of plant evolution.
Wolfien, Markus; Rimmbach, Christian; Schmitz, Ulf; Jung, Julia Jeannine; Krebs, Stefan; Steinhoff, Gustav; David, Robert; Wolkenhauer, Olaf
Technical advances in Next Generation Sequencing (NGS) provide a means to acquire deeper insights into cellular functions. The lack of standardized and automated methodologies poses a challenge for the analysis and interpretation of RNA sequencing data. We critically compare and evaluate state-of-the-art bioinformatics approaches and present a workflow that integrates the best performing data analysis, data evaluation and annotation methods in a Transparent, Reproducible and Automated PipeLINE (TRAPLINE) for RNA sequencing data processing (suitable for Illumina, SOLiD and Solexa). Comparative transcriptomics analyses with TRAPLINE result in a set of differentially expressed genes, their corresponding protein-protein interactions, splice variants, promoter activity, predicted miRNA-target interactions and files for single nucleotide polymorphism (SNP) calling. The obtained results are combined into a single file for downstream analysis such as network construction. We demonstrate the value of the proposed pipeline by characterizing the transcriptome of our recently described stem cell derived antibiotic selected cardiac bodies ('aCaBs'). TRAPLINE supports NGS-based research by providing a workflow that requires no bioinformatics skills, decreases the processing time of the analysis and works in the cloud. The pipeline is implemented in the biomedical research platform Galaxy and is freely accessible via www.sbi.uni-rostock.de/RNAseqTRAPLINE or the specific Galaxy manual page (https://usegalaxy.org/u/mwolfien/p/trapline---manual).
Combination of reverse transcription (RT) and deep sequencing has emerged as a powerful instrument for the detection of RNA modifications, a field that has seen a recent surge in activity because of its importance in gene regulation. Recent studies yielded high-resolution RT signatures of modified ribonucleotides relying on both sequence-dependent mismatch patterns and reverse transcription arrests. Common alignment viewers lack specialized functionality, such as filtering, tailored visualization, image export and differential analysis. Consequently, the community will profit from a platform seamlessly connecting detailed visual inspection of RT signatures and automated screening for modification candidates. CoverageAnalyzer (CAn) was developed in response to the demand for a powerful inspection tool. It is freely available for all three main operating systems. With SAM file format as standard input, CAn is an intuitive and user-friendly tool that is generally applicable to the large community of biomedical users, starting from simple visualization of RNA sequencing (RNA-Seq) data, up to sophisticated modification analysis with significance-based modification candidate calling.
Sripada, Lakshmi; Tomar, Dhanendra; Prajapati, Paresh; Singh, Rochika; Singh, Arun Kumar; Singh, Rajesh
Mitochondria are central regulators of many cellular processes beyond their well-established role in energy metabolism. Inter-organellar crosstalk is critical for the optimal function of mitochondria. Many nuclear-encoded proteins and RNAs are imported into mitochondria. The translocation of small RNA (sRNA), including miRNA, to mitochondria and other sub-cellular organelles is still not clear. We characterized here sRNA, including miRNA, associated with human mitochondria by cellular fractionation and a deep sequencing approach. Mitochondria were purified from HEK293 and HeLa cells for RNA isolation. The sRNA library was generated and sequenced using the Illumina system. The analysis showed the presence of a unique population of sRNA associated with mitochondria, including miRNA. Putative novel miRNAs were characterized from unannotated sRNA sequences. The study showed the association of 428 known and 196 putative novel miRNAs with mitochondria of HEK293 cells, and of 327 known and 13 putative novel miRNAs with mitochondria of HeLa cells. The alignment of sRNA to the mitochondrial genome was also studied. The targets were analyzed using DAVID to classify them in unique networks using GO and KEGG tools. Analysis of identified targets showed that miRNAs associated with mitochondria regulate critical cellular processes like RNA turnover, apoptosis, cell cycle and nucleotide metabolism. The six miRNAs (counts >1000) associated with mitochondria of both HEK293 and HeLa were validated by RT-qPCR. To our knowledge, this is the first systematic study demonstrating the association of sRNA, including miRNA, with mitochondria that may regulate site-specific turnover of target mRNA important for mitochondria-related functions. PMID:22984580
Li, Ru; Tun, Hein Min; Jahan, Musarrat; Zhang, Zhengxiao; Kumar, Ayush; Fernando, Dilantha; Farenhorst, Annemieke; Khafipour, Ehsan
The limitation of 16S rRNA gene sequencing (DNA-based) for microbial community analyses in water is the inability to differentiate live (dormant cells as well as growing or non-growing metabolically active cells) and dead cells, which can lead to false positive results in the absence of live microbes. Propidium-monoazide (PMA) has been used to selectively remove DNA from dead cells during the downstream sequencing process. In comparison, 16S rRNA sequencing (RNA-based) can target live microbial cells in water as both dormant and metabolically active cells produce rRNA. The objective of this study was to compare the efficiency and sensitivity of DNA-based, PMA-based and RNA-based 16S rRNA Illumina sequencing methodologies for live bacteria detection in water samples experimentally spiked with different combinations of bacteria (2 gram-negative and 2 gram-positive/acid fast species either all live, all dead, or combinations of live and dead species) or obtained from different sources (First Nation community drinking water; city of Winnipeg tap water; water from Red River, Manitoba, Canada). The RNA-based method, while superior for detection of live bacterial cells, still identified a number of 16S rRNA targets in samples spiked with dead cells. In environmental water samples, the DNA- and PMA-based approaches perhaps overestimated the richness of the microbial community compared to the RNA-based method. Our results suggest that the RNA-based sequencing was superior to DNA- and PMA-based methods in detecting live bacterial cells in water.
Gow, J W; Behan, W M; Clements, G B; Woodall, C; Riding, M; Behan, P O
To determine the presence of enteroviral sequences in muscle of patients with the postviral fatigue syndrome. Detection of sequences with the polymerase chain reaction in a well defined group of patients with the syndrome and controls over the same period. Institute of Neurological Sciences, Glasgow. 60 consecutive patients admitted to the institute with the postviral fatigue syndrome who had undergone extensive investigation to exclude other conditions. 41 controls from the same catchment area without evidence of fatigue, all undergoing routine surgery. Routine investigations, serological screen for antibodies to a range of viruses, and presence of enteroviral RNA sequences in muscle biopsy specimens. 15 (25%) patients and 10 (24.4%) controls had important serological findings. 12 patients had neutralising antibody titres of greater than or equal to 256 to coxsackieviruses B1-5 (six positive for enteroviral RNA sequences, six negative); three were positive for Epstein-Barr virus specific IgM (two positive, one negative). Six controls had similar neutralising antibody titres to coxsackieviruses (all negative); one was positive for Epstein-Barr virus specific IgM (negative); and three had titres of complement fixing antibody greater than or equal to 256 to cytomegalovirus (all negative). Overall, significantly more patients than controls had enteroviral RNA sequences in muscle (32/60, 53% v 6/41, 15%; odds ratio 6.7, 95% confidence interval 2.4 to 18.2). This was not correlated with duration of disease, patient and age, or to raised titres of antibodies to coxsackieviruses B1-5. Persistent enteroviral infection of muscle may occur in some patients with postviral fatigue syndrome and may have an aetiological role.
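The reported odds ratio can be reproduced from the counts in the abstract (32 of 60 patients and 6 of 41 controls positive for enteroviral RNA). The sketch below assumes the interval was obtained with the standard logit (Woolf) method and a normal quantile of 1.96; the abstract does not state which method the authors used, and the function name `odds_ratio_ci` is introduced here only for illustration.

```python
import math


def odds_ratio_ci(pos_cases, neg_cases, pos_controls, neg_controls, z=1.96):
    """Odds ratio with a Woolf (logit) confidence interval from a 2x2 table."""
    odds_ratio = (pos_cases * neg_controls) / (neg_cases * pos_controls)
    # Standard error of log(OR) is the root of the summed reciprocal cell counts.
    se_log = math.sqrt(
        1 / pos_cases + 1 / neg_cases + 1 / pos_controls + 1 / neg_controls
    )
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper


# Counts from the abstract: 32/60 patients positive, 6/41 controls positive.
or_, lo, hi = odds_ratio_ci(32, 60 - 32, 6, 41 - 6)
print(f"OR = {or_:.1f}, 95% CI {lo:.1f} to {hi:.1f}")  # OR = 6.7, 95% CI 2.4 to 18.2
```

Under this assumption the computed values agree with the figures quoted in the abstract to one decimal place.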
Cuypers, Bart; Domagalska, Malgorzata A; Meysman, Pieter; Muylder, Géraldine de; Vanaerschot, Manu; Imamura, Hideo; Dumetz, Franck; Verdonckt, Thomas Wolf; Myler, Peter J; Ramasamy, Gowthaman; Laukens, Kris; Dujardin, Jean-Claude
High throughput sequencing techniques are poorly adapted for in vivo studies of parasites, which require prior in vitro culturing and purification. Trypanosomatids, a group of kinetoplastid protozoans, possess a distinctive feature in their transcriptional mechanism whereby a specific Spliced Leader (SL) sequence is added to the 5' end of each mRNA by trans-splicing. This allows Trypanosomatid RNA to be discriminated from mammalian RNA and forms the basis of our new multiplexed protocol for high-throughput, selective RNA-sequencing called SL-seq. We provided a proof-of-concept of SL-seq in Leishmania donovani, the main causative agent of visceral leishmaniasis in humans, and successfully applied the method to sequence Leishmania mRNA directly from infected macrophages and from highly diluted mixes with human RNA. mRNA profiles obtained with SL-seq corresponded largely to those obtained from conventional poly-A tail purification methods, indicating both enumerate the same mRNA pool. However, SL-seq offers additional advantages, including lower sequencing depth requirements, fast and simple library prep and high resolution splice site detection. SL-seq is therefore ideal for fast and massive parallel sequencing of parasite transcriptomes directly from host tissues. Since SLs are also present in Nematodes, Cnidaria and primitive chordates, this method could also have high potential for transcriptomics studies in other organisms.
Background: Inverted repeat genes encode precursor RNAs characterized by hairpin structures. These RNA hairpins are then metabolized by biosynthetic pathways to produce functional small RNAs. In eukaryotic genomes, short non-autonomous transposable elements can have similar size and hairpin structures as non-coding precursor RNAs. This resemblance leads to problems annotating small RNAs. Results: We mapped all microRNA precursors from miRBase to several genomes and studied the repetition and dispersion of the corresponding loci. We then searched for repetitive elements overlapping these loci. We developed an automatic method called ncRNAclassifier to classify pre-ncRNAs according to their relationship with transposable elements (TEs). We showed that there is a correlation between the number of scattered occurrences of ncRNA precursor candidates and the presence of TEs. We applied ncRNAclassifier on six chordate genomes and report our findings. Among the 1,426 human and 721 mouse pre-miRNAs of miRBase, we identified 235 and 68 mis-annotated pre-miRNAs respectively corresponding completely to TEs. Conclusions: We provide a tool enabling the identification of repetitive elements in precursor ncRNA sequences. ncRNAclassifier is available at http://EvryRNA.ibisc.univ-evry.fr.
Kong, Ka-Yiu Edwin; Tang, Hei-Man Vincent; Pan, Kewu; Huang, Zhe; Lee, Tsz-Hang Jimmy; Hinnebusch, Alan G.; Wong, Chi-Ming
Most unwanted RNA transcripts in the nucleus of eukaryotic cells, such as splicing-defective pre-mRNAs and spliced-out introns, are rapidly degraded by the nuclear exosome. In budding yeast, a number of these unwanted RNA transcripts, including spliced-out introns, are first recognized by the nuclear exosome cofactor Trf4/5p-Air1/2p-Mtr4p polyadenylation (TRAMP) complex before subsequent nuclear-exosome-mediated degradation. However, it remains unclear when spliced-out introns are recognized by TRAMP, and whether TRAMP may have any potential roles in pre-mRNA splicing. Here, we demonstrated that TRAMP is cotranscriptionally recruited to nascent RNA transcripts, with particular enrichment at intronic sequences. Deletion of TRAMP components led to further accumulation of unspliced pre-mRNAs even in a yeast strain defective in nuclear exosome activity, suggesting a novel stimulatory role of TRAMP in splicing. We also uncovered new genetic and physical interactions between TRAMP and several splicing factors, and further showed that TRAMP is required for optimal recruitment of the splicing factor Msl5p. Our study provided the first evidence that TRAMP facilitates pre-mRNA splicing, and we interpreted this as a fail-safe mechanism to ensure the cotranscriptional recruitment of TRAMP before or during splicing to prepare for the subsequent targeting of spliced-out introns to rapid degradation by the nuclear exosome. PMID:24097436
Xu, Chenxi; Sun, Xuepeng; Taylor, Angela; Jiao, Chen; Xu, Yimin; Cai, Xiaofeng; Wang, Xiaoli; Ge, Chenhui; Pan, Guanghui; Wang, Quanxi; Fei, Zhangjun; Wang, Quanhua
Tomato is a major vegetable crop that has tremendous popularity. However, viral disease is still a major factor limiting tomato production. Here, we report the tomato virome identified through sequencing small RNAs of 170 field-grown samples collected in China. A total of 22 viruses were identified, including both well-documented and newly detected viruses. The tomato viral community is dominated by a few species, and they exhibit polymorphisms and recombination in the genomes with cold spots and hot spots. Most samples were coinfected by multiple viruses, and the majority of identified viruses are positive-sense single-stranded RNA viruses. Evolutionary analysis of one of the most dominant tomato viruses, Tomato yellow leaf curl virus (TYLCV), predicts its origin and the time back to its most recent common ancestor. The broadly sampled data have enabled us to identify several unreported viruses in tomato, including a completely new virus, which has a genome of ∼13.4 kb and groups with aphid-transmitted viruses in the genus Cytorhabdovirus. Although both DNA and RNA viruses can trigger the biogenesis of virus-derived small interfering RNAs (vsiRNAs), we show that features such as length distribution, paired distance, and base selection bias of vsiRNA sequences reflect different plant Dicer-like proteins and Argonautes involved in vsiRNA biogenesis. Collectively, this study offers insights into host-virus interaction in tomato and provides valuable information to facilitate the management of viral diseases. IMPORTANCE Tomato is an important source of micronutrients in the human diet and is extensively consumed around the world. Virus is among the major constraints on tomato production. Categorizing virus species that are capable of infecting tomato and understanding their diversity and evolution are challenging due to difficulties in detecting such fast-evolving biological entities. Here, we report the landscape of the tomato virome in China, the leading country in
The RNA interference (RNAi) pathway, in which microprocessor and Dicer collaborate to process microRNAs (miRNA), was recently expanded by the description of alternative processing routes. In one of these noncanonical pathways, Dicer action is replaced by the Argonaute2 (Ago2) slicer function. It was recently shown that the stem length of precursor-miRNA or short hairpin RNA (shRNA) molecules is a major determinant for Dicer versus Ago2 processing. Here we present the results of a deep sequencing study on the processing of shRNAs with different stem lengths and a top G·U wobble base pair (bp). This analysis revealed some unexpected properties of these so-called AgoshRNA molecules that are processed by Ago2 instead of Dicer. First, we confirmed the gradual shift from Dicer to Ago2 processing upon shortening of the hairpin length. Second, hairpins with a stem longer than 19 base pairs are inefficiently cleaved by Ago2 and we noticed a shift in the cleavage site. Third, the introduction of a top G·U bp in a regular shRNA can promote Ago2-cleavage, which coincides with a loss of Ago2-loading of the Dicer-cleaved 3’ strand. Fourth, the Ago2-processed AgoshRNAs acquire a short 3’ tail of 1–3 A-nucleotides (nt) and we present evidence that this product is subsequently trimmed by the poly(A)-specific ribonuclease (PARN).
Lee, James W.; Thundat, Thomas G.
An apparatus and method for performing nucleic acid (DNA and/or RNA) sequencing on a single molecule. The genetic sequence information is obtained by probing through a DNA or RNA molecule base by base at nanometer scale as though looking through a strip of movie film. This DNA sequencing nanotechnology has the theoretical capability of performing DNA sequencing at a maximal rate of about 1,000,000 bases per second. This enhanced performance is made possible by a series of innovations including: novel applications of a fine-tuned nanometer gap for passage of a single DNA or RNA molecule; thin layer microfluidics for sample loading and delivery; and programmable electric fields for precise control of DNA or RNA movement. Detection methods include nanoelectrode-gated tunneling current measurements, dielectric molecular characterization, and atomic force microscopy/electrostatic force microscopy (AFM/EFM) probing for nanoscale reading of the nucleic acid sequences.
Santoni, Daniele; Felici, Giovanni; Vergni, Davide
Casual mutations and natural selection have driven the evolution of protein amino acid sequences that we observe at present in nature. The question about which is the dominant force of proteins evolution is still lacking of an unambiguous answer. Casual mutations tend to randomize protein sequences while, in order to have the correct functionality, one expects that selection mechanisms impose rigid constraints on amino acid sequences. Moreover, one also has to consider that the space of all possible amino acid sequences is so astonishingly large that it could be reasonable to have a well tuned amino acid sequence indistinguishable from a random one. In order to study the possibility to discriminate between random and natural amino acid sequences, we introduce different measures of association between pairs of amino acids in a sequence, and apply them to a dataset of 1047 natural protein sequences and 10,470 random sequences, carefully generated in order to preserve the relative length and amino acid distribution of the natural proteins. We analyze the multidimensional measures with machine learning techniques and show that, to a reasonable extent, natural protein sequences can be differentiated from random ones. Copyright © 2015 Elsevier Ltd. All rights reserved.
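One simple way to build random controls that keep the length and amino acid composition of a natural protein is to shuffle its residues. The sketch below is an assumption about how such controls could be generated, not the authors' documented procedure; the ratio of ten controls per natural sequence is inferred from the reported totals (10,470 random versus 1047 natural sequences), and the toy sequence is hypothetical.

```python
import random
from collections import Counter


def shuffled_controls(sequence, n_controls=10, seed=0):
    """Return shuffled copies of a sequence with identical length and composition."""
    rng = random.Random(seed)  # fixed seed so the controls are reproducible
    controls = []
    for _ in range(n_controls):
        residues = list(sequence)
        rng.shuffle(residues)  # in-place permutation of the residues
        controls.append("".join(residues))
    return controls


natural = "MKTAYIAKQRQISFVKSHFSRQ"  # hypothetical toy sequence, not from the dataset
controls = shuffled_controls(natural)

for control in controls[:3]:
    print(control, Counter(control) == Counter(natural))
```

Shuffling preserves the composition of each protein exactly; sampling residues from the observed frequencies instead would preserve it only on average, which is a different design choice.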
Le, Binh Huy; Joo, Han Na; Hwang, Do Won; Kim, Kyu Wan; Seo, Young Jun
We have developed an AuNP-CTG based probing system that is applicable to the detection of many units of CAG repeat sequences, which were synthesized by a rolling circle amplification (RCA) system, through changes in fluorescence. We also demonstrate that our AuNP-CTG based probing system could transfect without using transfection reagent and detect target CAG repeat sequences in HeLa cells with dramatic changes in fluorescence. This AuNP-CTG based probing system could also be used, in conjunction with the CAG repeat RCA system, to detect target DNA. This system was so sensitive to the target DNA that it could detect even picomolar amounts with amplification of the fluorescence signal. Furthermore, we have used our gold-based CAG probing system for the detection of RNA CAG repeat sequences. Copyright © 2017 Elsevier Ltd. All rights reserved.
You, Feng; Liu, Jing; Zhang, Peijun; Xiang, Jianhai
A 605 bp section of the mitochondrial 16S rRNA gene from Paralichthys olivaceus, Pseudorhombus cinnamomeus, Psetta maxima and Kareius bicoloratus, which represent 3 families of Order Pleuronectiformes, was amplified by PCR and sequenced to show the molecular systematics of Pleuronectiformes for comparison with related gene sequences of 6 other flatfish downloaded from GenBank. Phylogenetic analysis based on genetic distance from related gene sequences of 10 flatfish showed that this method was ideal to explore the relationship between species, genera and families. Phylogenetic trees constructed with neighbor-joining, maximum parsimony and maximum likelihood methods accord with the general rule of Pleuronectiformes evolution, but they also resulted in some confusion. Unlike data from morphological characters, P. olivaceus clustered with K. bicoloratus, but P. cinnamomeus did not cluster with P. olivaceus, which is worth further study.
Background: RNA sequencing (RNA-seq) is emerging as a critical approach in biological research. However, its high-throughput advantage is significantly limited by the capacity of bioinformatics tools. The research community urgently needs user-friendly tools to efficiently analyze the complicated data generated by high throughput sequencers. Results: We developed a standalone tool with graphic user interface (GUI)-based analytic modules, known as eRNA. The capacity of performing parallel processing and sample management facilitates large data analyses by maximizing hardware usage and freeing users from tediously handling sequencing data. The module “miRNA identification” includes GUIs for raw data reading, adapter removal, sequence alignment, and read counting. The module “mRNA identification” includes GUIs for reference sequences, genome mapping, transcript assembling, and differential expression. The module “Target screening” provides expression profiling analyses and graphic visualization. The module “Self-testing” offers the directory setups, sample management, and a check for third-party package dependency. Integration of other GUIs including Bowtie, miRDeep2, and miRspring extends the program’s functionality. Conclusions: eRNA focuses on the common tools required for the mapping and quantification analysis of miRNA-seq and mRNA-seq data. The software package provides an additional choice for scientists who require a user-friendly computing environment and high-throughput capacity for large data analysis. eRNA is available for free download at https://sourceforge.net/projects/erna/?source=directory. PMID:24593312
Peter E Larsen
BACKGROUND: Accurate structural annotation is important for prediction of function and required for in vitro approaches to characterize or validate the gene expression products. Despite significant efforts in the field, determination of the gene structure from genomic data alone is a challenging and inaccurate process. The ease of acquisition of transcriptomic sequence provides a direct route to identify expressed sequences and determine the correct gene structure. METHODOLOGY: We developed methods to utilize RNA-seq data to correct errors in the structural annotation and extend the boundaries of current gene models using assembly approaches. The methods were validated with a transcriptomic data set derived from the fungus Laccaria bicolor, which develops a mycorrhizal symbiotic association with the roots of many tree species. Our analysis focused on the subset of 1501 gene models that are differentially expressed in the free living vs. mycorrhizal transcriptome and are expected to be important elements related to carbon metabolism, membrane permeability and transport, and intracellular signaling. Of the set of 1501 gene models, 1439 (96%) successfully generated modified gene models in which all error flags were successfully resolved and the sequences aligned to the genomic sequence. The remaining 4% (62 gene models) either had deviations from transcriptomic data that could not be spanned or generated sequence that did not align to genomic sequence. The outcome of this process is a set of high confidence gene models that can be reliably used for experimental characterization of protein function. CONCLUSIONS: 69% of expressed mycorrhizal JGI "best" gene models deviated from the transcript sequence derived by this method. The transcriptomic sequence enabled correction of a majority of the structural inconsistencies and resulted in a set of validated models for 96% of the mycorrhizal genes. The method described here can be applied to improve gene
Edgren, Henrik; Murumagi, Astrid; Kangaspeska, Sara; Nicorici, Daniel; Hongisto, Vesa; Kleivi, Kristine; Rye, Inga H; Nyberg, Sandra; Wolf, Maija; Borresen-Dale, Anne-Lise; Kallioniemi, Olli
Until recently, chromosomal translocations and fusion genes have been an underappreciated class of mutations in solid tumors. Next-generation sequencing technologies provide an opportunity for systematic characterization of cancer cell transcriptomes, including the discovery of expressed fusion genes resulting from underlying genomic rearrangements. We applied paired-end RNA-seq to identify 24 novel and 3 previously known fusion genes in breast cancer cells. Supported by an improved bioinformatic approach, we had a 95% success rate of validating gene fusions initially detected by RNA-seq. Fusion partner genes were found to contribute promoters (5' UTR), coding sequences and 3' UTRs. Most fusion genes were associated with copy number transitions and were particularly common in high-level DNA amplifications. This suggests that fusion events may contribute to the selective advantage provided by DNA amplifications and deletions. Some of the fusion partner genes, such as GSDMB in the TATDN1-GSDMB fusion and IKZF3 in the VAPB-IKZF3 fusion, were only detected as a fusion transcript, indicating activation of a dormant gene by the fusion event. A number of fusion gene partners have either been previously observed in oncogenic gene fusions, mostly in leukemias, or otherwise reported to be oncogenic. RNA interference-mediated knock-down of the VAPB-IKZF3 fusion gene indicated that it may be necessary for cancer cell growth and survival. In summary, using RNA-sequencing and improved bioinformatic stratification, we have discovered a number of novel fusion genes in breast cancer, and identified VAPB-IKZF3 as a potential fusion gene with importance for the growth and survival of breast cancer cells.
José P. Oliveira-Filho
The hypoferremia that is observed during systemic inflammatory processes is mediated by hepcidin, a peptide that is mainly synthesized in the livers of several mammalian species. Hepcidin plays a key role in iron metabolism and in the innate immune system. Its up-regulation is particularly useful during acute inflammation, because it restricts the availability of the iron that is necessary for the growth of pathogenic microorganisms. In this study, the hepcidin mRNA of Equus asinus has been characterized, and the expression of donkey hepcidin in the liver has been determined. The donkey hepcidin sequence has an open reading frame (ORF) of 261 nucleotides, and the deduced corresponding protein sequence has 86 amino acids. The amino acid sequence of donkey hepcidin was most homologous to that of Equus caballus (98%). The mature donkey hepcidin sequence (25 amino acids) was 100% homologous to the equine mature hepcidin and has eight conserved cysteine residues that are found in all of the investigated hepcidin sequences. The expression level of donkey hepcidin in the liver was high and was similar to that of the reference gene. The donkey hepcidin sequence was deposited in GenBank (HQ902884) and may be useful for additional studies on iron metabolism and the inflammatory process in this species.
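The relation between the reported ORF length and protein length can be checked with simple arithmetic. The sketch below assumes that the 261-nucleotide ORF count includes the terminal stop codon, which is consistent with the reported 86 residues but is not stated explicitly in the abstract; the function name is hypothetical.

```python
def protein_length_from_orf(orf_nt: int, includes_stop: bool = True) -> int:
    """Number of amino acids encoded by an ORF of orf_nt nucleotides.

    Assumes the ORF is in frame (length divisible by 3). When
    includes_stop is True, the final codon is a stop codon and
    contributes no amino acid.
    """
    if orf_nt <= 0 or orf_nt % 3 != 0:
        raise ValueError("ORF length must be a positive multiple of 3")
    codons = orf_nt // 3
    return codons - 1 if includes_stop else codons


# 261 nt -> 87 codons -> 86 amino acids plus one stop codon.
donkey_hepcidin_aa = protein_length_from_orf(261)
```

Under that assumption, 261 / 3 = 87 codons, of which 86 encode amino acids.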
Ma, M Y; Jacob-Samuel, B; Dignam, J C; Pace, U; Goldberg, A R; George, S T
External guide sequences (EGSs) are short oligoribonucleotides, which are designed to bind to a given RNA target and form a precursor tRNA-like complex. This complex can be recognized by ribonuclease P (RNase P), resulting in specific cleavage of the RNA target. To explore the potential of this class of compounds as therapeutic agents and valuable tools for gene function analysis, various chemical modifications were introduced into an all-RNA EGS molecule to confer nuclease resistance. In particular, 2'-O-methyl substitutions were incorporated into the entire sequence (i.e., A-stem, D-stem, and T-stem) except the T-loop region without loss of cleavage-inducing activity. Replacement of rU (position 54) and rC (position 56) in the T-loop with their 2'-O-methyl counterparts caused pronounced decrease in activity. Moreover, phosphorothioate backbone modification of the T-loop did not provide sufficient protection against endonucleolytic attack at the ribopyrimidine residues. Systematic modification of the T-loop with a variety of modified nucleosides and the addition of a 3'-3' inverted T at the 3'-end have generated several lead EGS prototypes, which not only exhibit wild-type activity in inducing RNase P-mediated target cleavage as compared with the all-RNA control but also remain intact in human serum for more than 24 hours. These results should provide useful insights into the design and development of oligonucleotide-based EGSs as potential regulators of gene expression.
Kim, Jungeun; Park, June Hyun; Lim, Chan Ju; Lim, Jae Yun; Ryu, Jee-Youn; Lee, Bong-Woo; Choi, Jae-Pil; Kim, Woong Bom; Lee, Ha Yeon; Choi, Yourim; Kim, Donghyun; Hur, Cheol-Goo; Kim, Sukweon; Noh, Yoo-Sun; Shin, Chanseok; Kwon, Suk-Yoon
Roses (Rosa sp.), which belong to the family Rosaceae, are the most economically important ornamental plants, making up 30% of the floriculture market. However, despite the high demand for roses, rose breeding programs have limited access to the molecular resources that could greatly enhance and speed breeding efforts. A better understanding of the genes that contribute to floral development and desired phenotypes will lead to improved rose cultivars. For this study, we analyzed rose miRNAs and the rose flower transcriptome in order to generate a database that expands current knowledge of the regulation of important floral characteristics. A rose genetic database will enable comprehensive analysis of gene expression and regulation via miRNA among different Rosa cultivars. We produced more than 0.5 million reads from expressed sequences, totalling more than 110 million bp. From these, we generated 35,657, 31,434, 34,725, and 39,722 flower unigenes from Rosa hybrid: 'Vital', 'Maroussia', and 'Sympathy' and Rosa rugosa Thunb., respectively. The unigenes were assigned functional annotations, domains, metabolic pathways, Gene Ontology (GO) terms, Plant Ontology (PO) terms, and MIPS Functional Catalogue (FunCat) terms. Rose flower transcripts were compared with genes from whole genome sequences of Rosaceae members (apple, strawberry, and peach) and grape. We also produced approximately 40 million small RNA reads from flower tissue for Rosa, representing 267 unique miRNA tags. Among the identified miRNAs, 25 were novel and 242 were conserved. Statistical analyses of miRNA profiles revealed both shared and species-specific miRNAs, which presumably affect flower development and phenotypes. In this study, we constructed a Rose miRNA and transcriptome database, and we analyzed the miRNAs and transcriptome generated from the flower tissues of four Rosa cultivars. The database provides a comprehensive genetic resource which can be used to better understand
Sigurgeirsson, Benjamín; Emanuelsson, Olof; Lundeberg, Joakim
Strand-specific RNA sequencing is rapidly replacing conventional cDNA sequencing as an approach for assessing information about the transcriptome. Alongside improved laboratory protocols, the development of bioinformatics tools is steadily progressing. In the current procedure, the Illumina TruSeq library preparation kit is used, along with additional reagents, to make stranded libraries in an automated fashion, which are then sequenced on the Illumina HiSeq 2000. Using freely available bioinformatics tools we show, through quality metrics, that the protocol is robust and reproducible. We further highlight the practicality of strand-specific libraries by comparing expression from strand-specific libraries with that from non-stranded libraries, by looking at known antisense transcription of pseudogenes and by identifying novel transcription. Furthermore, two ribosomal depletion kits, RiboMinus and RiboZero, and two sequence aligners, Tophat2 and STAR, are compared. The non-stranded Illumina TruSeq kit can be adapted to generate strand-specific libraries and can be used to access detailed information on the transcriptome. The RiboZero kit is very effective in removing ribosomal RNA from total RNA, and the STAR aligner produces a high mapping yield in a short time. Strand-specific data give more detailed and correct results than non-stranded data, as we show when estimating expression values and assembling transcripts. Even well-annotated genomes need improvements and corrections, which can be achieved using strand-specific data. Researchers in the field should strive to use strand-specific data; it allows for more confidence in the data analysis and is less likely to lead to false conclusions. If faced with analysing non-stranded data, researchers should be well aware of the caveats of that approach.
Li, Rongzhong; Ge, Heming W; Cho, Samuel S
The folding of bacterial tRNAs with disparate sequences has been observed to proceed in distinct folding mechanisms despite their structural similarity. To explore the folding landscapes of tRNA, we performed ion concentration-dependent coarse-grained TIS model MD simulations of several E. coli tRNAs to compare their thermodynamic melting profiles to the classical absorbance spectra of Crothers and co-workers. To independently validate our findings, we also performed atomistic empirical force field MD simulations of tRNAs, and we compared the base-to-base distances from coarse-grained and atomistic MD simulations to empirical base-stacking free energies. We then projected the free energies to the secondary structural elements of tRNA, and we observe distinct, parallel folding mechanisms whose differences can be inferred on the basis of their sequence-dependent base-stacking stabilities. In some cases, a premature, nonproductive folding intermediate corresponding to the Ψ hairpin loop must backtrack to the unfolded state before proceeding to the folded state. This observation suggests a possible explanation for the fast and slow phases observed in tRNA folding kinetics.
Slaton, Kendall P; Huffer, Michael D; Wikle, Edward J; Zhang, Jie; Morrow, Casey D; Rhodes, S Craig; Eleazer, Paul D
The rapid antibiotic sensitivity test (RAST) is a novel in-office culture and sensitivity system for endodontic infections. The purpose of this research was to validate the RAST system as a viable, in-office alternative to antibiotic sensitivity testing using turbidity to determine antibiotic sensitivities of endodontic infections. Aspirates were taken from the root canals of 9 necrotic human teeth at the initiation of root canal therapy. These samples were cultured in the RAST medium, and antibiotic sensitivity to 6 antibiotics was tested. Further analysis was performed using 16S ribosomal RNA (rRNA) gene sequencing. Thirty-one bacterial phyla were identified as well as 2 phyla of the kingdom Archaea. Augmentin (Dr. Reddy's Laboratories Ltd, Hyderabad, India) and ampicillin performed identically at 24 hours, inhibiting turbidity in 100% of the samples. At 48 hours in anaerobic conditions, Augmentin outperformed ampicillin by 13%. Ciprofloxacin was the least efficacious antibiotic. At 48 hours, only 22% of anaerobic ciprofloxacin cultures effectively inhibited bacterial growth. The RAST medium is a viable in-office alternative to antibiotic susceptibility testing in an off-site laboratory. It is able to support the growth of a wide variety of microorganisms in both aerobic and anaerobic environments, and, in combination with 16S rRNA gene sequencing, it led to the identification of a new archaebacterial phylum, Crenarchaeota, as part of the endodontic infection microbiome. Copyright © 2017 American Association of Endodontists. Published by Elsevier Inc. All rights reserved.
El Gawhary, Somaia; El-Anany, Mervat; Hassan, Reem; Ali, Doaa; El Gameel, El Qassem
Different molecular assays for the detection of bacterial DNA in the peripheral blood have been used as diagnostic tools for neonatal sepsis. We aimed to evaluate the role of 16S rRNA gene sequencing in screening for bacteremia to confirm suspected neonatal sepsis (NS) and to compare it with risk factors and septic screen testing. Sixty-two neonates with suspected NS were enrolled. White blood cell count, I/T ratio, C-reactive protein, blood culture and 16S rRNA sequencing were performed. Blood culture was positive in 26% of cases, and PCR was positive in 26% of cases. Evaluation of PCR for the diagnosis of NS showed sensitivity 62.5%, specificity 86.9%, PPV 62.5%, NPV 86.9% and accuracy of 79.7%. 16S rRNA PCR increased the sensitivity of detecting bacterial DNA in newborns with signs of sepsis from 26 to 35.4%, and its use can be limited to cases with the most significant risk factors and positive septic screen. © The Author. Published by Oxford University Press. All rights reserved.
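For readers less familiar with the diagnostic measures quoted above, the sketch below shows how sensitivity, specificity, PPV, NPV and accuracy follow from a 2x2 table of test results against a reference standard. The counts in the example are hypothetical and chosen only for illustration; they are not the study's data, and the function name is an assumption.

```python
def diagnostic_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Standard diagnostic accuracy measures from a 2x2 confusion table.

    tp/fn: reference-positive cases the test called positive/negative.
    fp/tn: reference-negative cases the test called positive/negative.
    """
    if min(tp, fp, fn, tn) < 0:
        raise ValueError("counts must be non-negative")
    if tp + fn == 0 or tn + fp == 0 or tp + fp == 0 or tn + fn == 0:
        raise ValueError("each row and column of the table must be non-empty")
    total = tp + fp + fn + tn
    return {
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
        "ppv": tp / (tp + fp),          # positive predictive value
        "npv": tn / (tn + fn),          # negative predictive value
        "accuracy": (tp + tn) / total,
    }


# Hypothetical counts, not taken from the abstract.
example = diagnostic_metrics(tp=9, fp=1, fn=3, tn=7)
```

Note that PPV and NPV depend on how common the condition is in the tested group, so they do not transfer directly between populations with different prevalence.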
In this study, we evaluated how gene expression differs in mature Pseudomonas aeruginosa biofilms as opposed to planktonic cells by the use of RNA sequencing technology that gives rise to both quantitative and qualitative information on the transcriptome. Although a large proportion of genes were consistently regulated in both the stationary phase and biofilm cultures as opposed to the late exponential growth phase cultures, the global biofilm gene expression pattern was clearly distinct, indicating that biofilms are not just surface-attached cells in stationary phase. Many of the genes found to be biofilm specific were involved in adaptation to microaerophilic growth conditions, repression of type three secretion and production of extracellular matrix components. Additionally, we found many small RNAs to be differentially regulated, most of them similarly in stationary phase cultures and biofilms. A qualitative analysis of the RNA-seq data revealed more than 3000 putative transcriptional start sites (TSS). By the use of rapid amplification of cDNA ends (5'-RACE) we confirmed the presence of three different TSS associated with the pqsABCDE operon, two in the promoter of pqsA and one upstream of the second gene, pqsB. Taken together, this study reports the first transcriptome study on P. aeruginosa that employs RNA sequencing technology and provides insights into the quantitative and qualitative transcriptome including the expression of small RNAs in P. aeruginosa biofilms.
Background: Single cell RNA sequencing (scRNA-seq) has rapidly gained popularity for profiling transcriptomes of hundreds to thousands of single cells. This technology has led to the discovery of novel cell types and revealed insights into the development of complex tissues. However, many technical challenges need to be overcome during data generation. Due to minute amounts of starting material, samples undergo extensive amplification, increasing technical variability. A solution for mitigating amplification biases is to include unique molecular identifiers (UMIs), which tag individual molecules. Transcript abundances are then estimated from the number of unique UMIs aligning to a specific gene, with PCR duplicates resulting in copies of the UMI not included in expression estimates. Methods: Here we investigate the effect of gene length bias in scRNA-Seq across a variety of datasets that differ in terms of capture technology, library preparation, cell types and species. Results: We find that scRNA-seq datasets that have been sequenced using a full-length transcript protocol exhibit gene length bias akin to bulk RNA-seq data. Specifically, shorter genes tend to have lower counts and a higher rate of dropout. In contrast, protocols that include UMIs do not exhibit gene length bias, with a mostly uniform rate of dropout across genes of varying length. Across four different scRNA-Seq datasets profiling mouse embryonic stem cells (mESCs), we found the subset of genes that are only detected in the UMI datasets tended to be shorter, while the subset of genes detected only in the full-length datasets tended to be longer. Conclusions: We find that the choice of scRNA-seq protocol influences the detection rate of genes, and that full-length datasets exhibit gene-length bias. In addition, despite clear differences between UMI and full-length transcript data, we illustrate that full-length and UMI data can be combined to reveal the underlying biology
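The UMI counting idea described in this abstract (abundance estimated from the number of distinct UMIs per gene, with PCR duplicates collapsed) can be sketched in a few lines. This is a minimal illustration, not the pipeline used by the authors: the input format of (cell, gene, UMI) tuples and the function name are assumptions, and the sketch does not correct sequencing errors within UMIs as production tools typically do.

```python
from collections import defaultdict


def umi_counts(reads):
    """Count distinct UMIs per (cell, gene) pair.

    reads: iterable of (cell_barcode, gene, umi) tuples, one per aligned read.
    Reads sharing the same cell, gene and UMI are treated as PCR duplicates
    of one original molecule and are counted once.
    """
    seen = defaultdict(set)
    for cell, gene, umi in reads:
        seen[(cell, gene)].add(umi)
    return {key: len(umis) for key, umis in seen.items()}


# Toy input: four reads, two of which are duplicates of the same molecule.
toy_reads = [
    ("cell1", "GeneA", "AAC"),
    ("cell1", "GeneA", "AAC"),  # PCR duplicate, collapsed
    ("cell1", "GeneA", "GGT"),
    ("cell1", "GeneB", "AAC"),  # same UMI on another gene counts separately
]
counts = umi_counts(toy_reads)
```

Because each molecule is counted once regardless of how many reads it produced, the count does not grow with the number of fragments a long transcript yields, which is consistent with the absence of gene length bias reported for UMI protocols.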
The recent nucleic acid sequencing revolution driven by shotgun and high-throughput technologies has led to a rapid increase in the number of sequences for microbial communities. The availability of 16S ribosomal RNA (rRNA) gene sequences from a multitude of natural environments now offers a unique opportunity to study microbial diversity and community structure. The large volume of sequencing data, however, makes it time consuming to assign individual sequences to phylotypes by searching them against public databases. Since ribosomal sequences have diverged across prokaryotic species, they can be grouped into clusters that represent operational taxonomic units. However, available clustering programs suffer from overlap of sequence spaces in adjacent clusters. In natural environments, gene sequences are homogeneous within species but divergent between species. This evolutionary constraint results in an uneven distribution of genetic distances of genes in sequence space. To cluster 16S rRNA sequences more accurately, it is therefore essential to select core sequences that are located at the centers of the distributions represented by the genetic distance of sequences in taxonomic units. Based on this idea, we here describe a novel sequence clustering algorithm named CLUSTOM that minimizes the overlaps between adjacent clusters. The performance of this algorithm was evaluated in a comparative exercise with existing programs, using the reference sequences of the SILVA database as well as published pyrosequencing datasets. The test revealed that our algorithm achieves higher accuracy than ESPRIT-Tree and mothur, two of the best clustering algorithms. Results indicate that the concept of an uneven distribution of sequence distances can effectively and successfully cluster 16S rRNA gene sequences. The algorithm of CLUSTOM has been implemented both as a web and as a standalone command line application, which are available at http://clustom.kribb.re.kr.
Spies, Daniel; Renz, Peter F; Beyer, Tobias A; Ciaudo, Constance
RNA sequencing (RNA-seq) has become a standard procedure to investigate transcriptional changes between conditions and is routinely used in research and clinics. While standard differential expression (DE) analysis between two conditions has been extensively studied, and improved over the past decades, RNA-seq time course (TC) DE analysis algorithms are still in their early stages. In this study, we compare, for the first time, existing TC RNA-seq tools on an extensive simulation data set and validated the best performing tools on published data. Surprisingly, TC tools were outperformed by the classical pairwise comparison approach on short time series (tools improved this shortcoming, as the majority of false-positive, but not true-positive, candidates were unique for each method. On longer time series, pairwise approach was less efficient on the overall performance compared with splineTC and maSigPro, which did not identify any false-positive candidate. © The Author 2017. Published by Oxford University Press.
Sanchez-Sandoval, Eugenia; Diaz-Quezada, Corina; Velazquez, Gilberto; Arroyo-Navarro, Luis F; Almanza-Martinez, Norineli; Trasviña-Arenas, Carlos H; Brieba, Luis G
Three proteins phylogenetically grouped with proteins from the T7 replisome localize to yeast mitochondria: DNA polymerase γ (Mip1), mitochondrial RNA polymerase (Rpo41), and a single-stranded binding protein (Rim1). Human and T7 bacteriophage RNA polymerases synthesize primers for their corresponding DNA polymerases. In contrast, DNA replication in yeast mitochondria is explained by two models: a transcription-dependent model in which Rpo41 primes Mip1 and a model in which double-stranded breaks create free 3' OHs that are extended by Mip1. Herein we found that Rpo41 transcribes RNAs that can be extended by Mip1 on single- and double-stranded DNA. In contrast to human mitochondrial RNA polymerase, which primes DNA polymerase γ using transcripts from the light-strand and heavy-strand origins of replication, Rpo41 primes Mip1 at replication origins and promoter sequences in vitro. Our results suggest that in ori1, short transcripts serve as primers, whereas in ori5 an RNA transcript longer than 29 nucleotides is used as primer. Copyright © 2015 Elsevier B.V. and Mitochondria Research Society. Published by Elsevier B.V. All rights reserved.
Petkovic, Sonja; Badelt, Stefan; Block, Stephan; Flamm, Christoph; Delcea, Mihaela; Hofacker, Ivo; Müller, Sabine
Reversible chemistry allowing for assembly and disassembly of molecular entities is important for biological self-organization. Thus, ribozymes that support both cleavage and formation of phosphodiester bonds may have contributed to the emergence of functional diversity and increasing complexity of regulatory RNAs in early life. We have previously engineered a variant of the hairpin ribozyme that shows how ribozymes may have circularized or extended their own length by forming concatemers. Using the Vienna RNA package, we now optimized this hairpin ribozyme variant and selected four different RNA sequences that were expected to circularize more efficiently or form longer concatemers upon transcription. (Two-dimensional) PAGE analysis confirms that (i) all four selected ribozymes are catalytically active and (ii) high yields of cyclic species are obtained. AFM imaging in combination with RNA structure prediction enabled us to calculate the distributions of monomers and self-concatenated dimers and trimers. Our results show that computationally optimized molecules do form reasonable amounts of trimers, which has not been observed for the original system so far, and we demonstrate that the combination of theoretical prediction, biochemical and physical analysis is a promising approach toward accurate prediction of ribozyme behavior and design of ribozymes with predefined functions. © 2015 Petkovic et al.; Published by Cold Spring Harbor Laboratory Press for the RNA Society.
Duan, Cuifang; Argout, Xavier; Gébelin, Virginie; Summo, Marilyne; Dufayard, Jean-François; Leclercq, Julie; Kuswanhadi; Piyatrakul, Piyanuch; Pirrello, Julien; Rio, Maryannick; Champion, Antony; Montoro, Pascal
Rubber tree (Hevea brasiliensis) laticifers are the source of natural rubber. Rubber production depends on endogenous and exogenous ethylene (ethephon). AP2/ERF transcription factors, and especially Ethylene-Response Factors, play a crucial role in plant development and response to biotic and abiotic stresses. This study set out to sequence transcript expressed in various tissues using next-generation sequencing and to identify AP2/ERF superfamily in the rubber tree. The 454 sequencing technique was used to produce five tissue-type transcript libraries (leaf, bark, latex, embryogenic tissues and root). Reads from all libraries were pooled and reassembled to improve mRNA lengths and produce a global library. One hundred and seventy-three AP2/ERF contigs were identified by in silico analysis based on the amino acid sequence of the conserved AP2 domain from the global library. The 142 contigs with the full AP2 domain were classified into three main families (20 AP2 members, 115 ERF members divided into 11 groups, and 4 RAV members) and 3 soloist members. Fifty-nine AP2/ERF transcripts were found in latex. Alongside the microRNA172 already described in plants, eleven additional microRNAs were predicted to inhibit Hevea AP2/ERF transcripts. Hevea has a similar number of AP2/ERF genes to that of other dicot species. We adapted the alignment and classification methods to data from next-generation sequencing techniques to provide reliable information. We observed several specific features for the ERF family. Three HbSoloist members form a group in Hevea. Several AP2/ERF genes highly expressed in latex suggest they have a specific function in Hevea. The analysis of AP2/ERF transcripts in Hevea presented here provides the basis for studying the molecular regulation of latex production in response to abiotic stresses and latex cell differentiation.
Naveed, Muhammad; Mubeen, Samavia; Khan, SamiUllah; Ahmed, Iftikhar; Khalid, Nauman; Suleria, Hafiz Ansar Rasul; Bano, Asghari; Mumtaz, Abdul Samad
In the present study, samples of rhizosphere and root nodules were collected from different areas of Pakistan to isolate plant growth promoting rhizobacteria. Bacterial isolates were identified by 16S rRNA gene sequence analysis, with taxonomic confirmation on the EzTaxon Server. The identified bacterial strains belonged to 5 genera, i.e. Ensifer, Bacillus, Pseudomonas, Leclercia and Rhizobium. Phylogenetic analysis inferred from 16S rRNA gene sequences showed the evolutionary relationship of the bacterial strains with their respective genera. Based on phylogenetic analysis, some candidate novel species were also identified. The bacterial strains were also characterized by morphological, physiological and biochemical tests and screened for the glucose dehydrogenase (gdh) gene, which is involved in phosphate solubilization using the cofactor pyrroloquinoline quinone (PQQ). Seven rhizospheric and 3 root-nodulating strains were positive for the gdh gene. Furthermore, this study confirms a novel association between microbes and their hosts, such as field-grown crops and leguminous and non-leguminous plants. It was concluded that a diverse bacterial population exists in the rhizosphere and root nodules that might be useful in evaluating the mechanisms behind plant-microbe interactions, and that strains QAU-63 and QAU-68, with sequence similarities of 97 and 95%, might be declared novel after further taxonomic characterization.
Sidorenko, Lyudmila V; Lee, Tzuu-Fen; Woosley, Aaron; Moskal, William A; Bevan, Scott A; Merlo, P Ann Owens; Walsh, Terence A; Wang, Xiujuan; Weaver, Staci; Glancy, Todd P; Wang, PoHao; Yang, Xiaozeng; Sriram, Shreedharan; Meyers, Blake C
The molecular basis of transgene susceptibility to silencing is poorly characterized in plants; thus, we evaluated several transgene design parameters as means to reduce heritable transgene silencing. Analyses of Arabidopsis plants with transgenes encoding a microalgal polyunsaturated fatty acid (PUFA) synthase revealed that small RNA (sRNA)-mediated silencing, combined with the use of repetitive regulatory elements, led to aggressive transposon-like silencing of canola-biased PUFA synthase transgenes. Diversifying regulatory sequences and using native microalgal coding sequences (CDSs) with higher GC content improved transgene expression and resulted in a remarkable trans-generational stability via reduced accumulation of sRNAs and DNA methylation. Further experiments in maize with transgenes individually expressing three crystal (Cry) proteins from Bacillus thuringiensis (Bt) tested the impact of CDS recoding using different codon bias tables. Transgenes with higher GC content exhibited increased transcript and protein accumulation. These results demonstrate that the sequence composition of transgene CDSs can directly impact silencing, providing design strategies for increasing transgene expression levels and reducing risks of heritable loss of transgene expression.
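The GC content comparison central to this abstract is straightforward to compute. The sketch below is illustrative only: the function name is hypothetical, and the two six-nucleotide fragments are toy examples of synonymous recoding (both encode Leu-Ala under the standard genetic code), not sequences from the study.

```python
def gc_content(seq: str) -> float:
    """Fraction of G and C among the unambiguous bases (A, C, G, T) in seq."""
    seq = seq.upper()
    total = sum(seq.count(base) for base in "ACGT")
    if total == 0:
        raise ValueError("sequence contains no A, C, G or T")
    return (seq.count("G") + seq.count("C")) / total


# Two synonymous encodings of the dipeptide Leu-Ala (toy example).
low_gc = "TTAGCA"   # Leu (TTA), Ala (GCA)
high_gc = "CTGGCC"  # Leu (CTG), Ala (GCC)
low, high = gc_content(low_gc), gc_content(high_gc)
```

The example shows how the choice of codon bias table can shift the GC content of a coding sequence without changing the encoded protein.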
Scolnick, Jonathan A; Dimon, Michelle; Wang, I-Ching; Huelga, Stephanie C; Amorese, Douglas A
Fusion genes are known to be key drivers of tumor growth in several types of cancer. Traditionally, detecting fusion genes has been a difficult task based on fluorescent in situ hybridization to detect chromosomal abnormalities. More recently, RNA sequencing has enabled an increased pace of fusion gene identification. However, RNA-Seq is inefficient for the identification of fusion genes due to the high number of sequencing reads needed to detect the small number of fusion transcripts present in cells of interest. Here we describe a method, Single Primer Enrichment Technology (SPET), for targeted RNA sequencing that is customizable to any target genes, is simple to use, and efficiently detects gene fusions. Using SPET to target 5701 exons of 401 known cancer fusion genes for sequencing, we were able to identify known and previously unreported gene fusions from both fresh-frozen and formalin-fixed paraffin-embedded (FFPE) tissue RNA in both normal tissue and cancer cells.
Yassour, Moran; Grabherr, Manfred; Blood, Philip D.; Bowden, Joshua; Couger, Matthew Brian; Eccles, David; Li, Bo; Lieber, Matthias; MacManes, Matthew D.; Ott, Michael; Orvis, Joshua; Pochet, Nathalie; Strozzi, Francesco; Weeks, Nathan; Westerman, Rick; William, Thomas; Dewey, Colin N.; Henschel, Robert; LeDuc, Richard D.; Friedman, Nir; Regev, Aviv
De novo assembly of RNA-Seq data allows us to study transcriptomes without the need for a genome sequence, such as in non-model organisms of ecological and evolutionary importance, cancer samples, or the microbiome. In this protocol, we describe the use of the Trinity platform for de novo transcriptome assembly from RNA-Seq data in non-model organisms. We also present Trinity’s supported companion utilities for downstream applications, including RSEM for transcript abundance estimation, R/Bioconductor packages for identifying differentially expressed transcripts across samples, and approaches to identify protein coding genes. In an included tutorial we provide a workflow for genome-independent transcriptome analysis leveraging the Trinity platform. The software, documentation and demonstrations are freely available from http://trinityrnaseq.sf.net. PMID:23845962
Hitrik, Anna; Abboud-Jarrous, Ghada; Orlovetskie, Natalie; Serruya, Raphael; Jarrous, Nayef
Human WRN, a RecQ helicase encoded by the Werner syndrome gene, is implicated in genome maintenance, including replication, recombination, excision repair and DNA damage response. These genetic processes and expression of WRN are concomitantly upregulated in many types of cancers. Therefore, targeted destruction of this helicase could be useful for elimination of cancer cells. Here, we provide a proof of concept for applying the external guide sequence (EGS) approach in directing an RNase P RNA to efficiently cleave the WRN mRNA in cultured human cell lines, thus abolishing translation and activity of this distinctive 3'-5' DNA helicase-nuclease. Remarkably, EGS-directed knockdown of WRN leads to severe inhibition of cell viability. Hence, further assessment of this targeting system could be beneficial for selective cancer therapies, particularly in the light of the recent improvements introduced into EGSs. Copyright © 2016 Elsevier B.V. All rights reserved.
Beckers, Matthew; Mohorianu, Irina; Stocks, Matthew; Applegate, Christopher; Dalmay, Tamas; Moulton, Vincent
Recently, high-throughput sequencing (HTS) has revealed compelling details about the small RNA (sRNA) population in eukaryotes. These 20 to 25 nt noncoding RNAs can influence gene expression by acting as guides for the sequence-specific regulatory mechanism known as RNA silencing. The increase in sequencing depth and number of samples per project enables a better understanding of the role sRNAs play by facilitating the study of expression patterns. However, the intricacy of the biological hypotheses coupled with a lack of appropriate tools often leads to inadequate mining of the available data and thus, an incomplete description of the biological mechanisms involved. To enable a comprehensive study of differential expression in sRNA data sets, we present a new interactive pipeline that guides researchers through the various stages of data preprocessing and analysis. This includes various tools, some of which we specifically developed for sRNA analysis, for quality checking and normalization of sRNA samples as well as tools for the detection of differentially expressed sRNAs and identification of the resulting expression patterns. The pipeline is available within the UEA sRNA Workbench, a user-friendly software package for the processing of sRNA data sets. We demonstrate the use of the pipeline on an H. sapiens data set; additional examples on a B. terrestris data set and on an A. thaliana data set are described in the Supplemental Information. A comparison with existing approaches is also included, which exemplifies some of the issues that need to be addressed for sRNA analysis and how the new pipeline may be used to do this. © 2017 Beckers et al.; Published by Cold Spring Harbor Laboratory Press for the RNA Society.
Producing a functional eukaryotic messenger RNA (mRNA) requires the coordinated activity of several large protein complexes to initiate transcription, elongate nascent transcripts, splice together exons, and cleave and polyadenylate the 3’ end. Kinetic competition between these various processes has been proposed to regulate mRNA maturation, but this model could lead to multiple, randomly determined, or stochastic, pathways or outcomes. Regulatory checkpoints have been suggested as a means of ensuring quality control. However, current methods have been unable to tease apart the contributions of these processes at a single gene or on a time scale that could provide mechanistic insight. To begin to investigate the kinetic relationship between transcription and splicing, Daniel Larson, Ph.D., of CCR’s Laboratory of Receptor Biology and Gene Expression, and his colleagues employed a single-molecule RNA imaging approach to monitor production and processing of a human β-globin reporter gene in living cells.
Chen, Xi; Wang, Chen; Tang, Shanjiang; Yu, Ce; Zou, Quan
Multiple sequence alignment (MSA) is a classic and powerful technique for sequence analysis in bioinformatics. With the rapid growth of biological datasets, MSA parallelization becomes necessary to keep its running time at an acceptable level. Although there is a lot of work on MSA problems, existing approaches are either insufficient or contain implicit assumptions that limit their generality. First, the properties of users' sequences, including the sizes of datasets and the lengths of sequences, can take arbitrary values and are generally unknown before submission, which previous work has unfortunately ignored. Second, the center star strategy is suited for aligning similar sequences, but its first stage, center sequence selection, is highly time-consuming and requires further optimization. Moreover, given the heterogeneous CPU/GPU platform, prior studies consider MSA parallelization on GPU devices only, leaving the CPUs idle during the computation. Co-run computation, however, can maximize the utilization of the computing resources by running the workload on both CPU and GPU simultaneously. This paper presents CMSA, a robust and efficient MSA system for large-scale datasets on the heterogeneous CPU/GPU platform. It performs and optimizes multiple sequence alignment automatically for users' submitted sequences without any assumptions. CMSA adopts the co-run computation model so that both CPU and GPU devices are fully utilized. Moreover, CMSA proposes an improved center star strategy that reduces the time complexity of its center sequence selection process from O(mn^2) to O(mn). The experimental results show that CMSA achieves an up to 11× speedup and outperforms the state-of-the-art software. CMSA focuses on the multiple similar RNA/DNA sequence alignment and proposes a novel bitmap based algorithm to improve the center star strategy. We can conclude that harvesting the high performance of modern GPU is a promising approach to
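As an illustration of the center sequence selection step mentioned in the abstract above, the following is a minimal sketch of the textbook baseline only: every pair of sequences is compared and the sequence with the smallest summed distance to all others is chosen as the center. It is not CMSA's bitmap-based algorithm, which the abstract does not describe in detail. The function names `edit_distance` and `select_center`, and the use of Levenshtein distance as the pairwise score, are assumptions made for this example.

```python
from itertools import combinations


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with a two-row dynamic programme."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def select_center(seqs: list[str]) -> int:
    """Return the index of the sequence with the smallest summed distance to all others.

    Hypothetical baseline: it performs one pairwise comparison per pair of
    sequences, which is the quadratic cost an optimized selection step avoids.
    """
    totals = [0] * len(seqs)
    for i, j in combinations(range(len(seqs)), 2):
        d = edit_distance(seqs[i], seqs[j])
        totals[i] += d
        totals[j] += d
    return min(range(len(seqs)), key=totals.__getitem__)
```

In the center star approach, the remaining sequences would then each be aligned pairwise against the selected center and the pairwise alignments merged; that stage is omitted here.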
Gu, Ying-Hong; Tao, Xiang; Lai, Xian-Jun; Wang, Hai-Yan; Zhang, Yi-Zheng
Viral diseases are the second most significant biotic stress for sweet potato, with yield losses reaching 20% to 40%. Over 30 viruses have been reported to infect sweet potato around the world, and 11 of these have been detected in China. Most of these viruses were detected by traditional detection approaches that show disadvantages in detection throughput. Next-generation sequencing technology provides a novel, highly sensitive method for virus detection and diagnosis. We report the polyadenylated RNA virome of three sweet potato cultivars using a high throughput RNA sequencing approach. Transcripts of 15 different viruses were detected, 11 of which were detected in cultivar Xushu18, whilst 11 and 4 viruses were detected in Guangshu 87 and Jingshu 6, respectively. Four were detected in sweet potato for the first time, and 4 were found for the first time in China. The most prevalent virus was SPFMV, which constituted 88% of the total viral sequence reads. Virus transcripts with extremely low expression levels were also detected, such as transcripts of SPLCV, CMV and CymMV. Digital gene expression (DGE) and reverse transcription polymerase chain reaction (RT-PCR) analyses showed that the highest viral transcript expression levels were found in fibrous and tuberous roots, which suggests that these tissues should be optimum samples for virus detection. A total of 15 viruses were presumed to be present in three sweet potato cultivars growing in China. This is the first insight into the sweet potato polyadenylated RNA virome. These results can serve as a basis for further work to investigate whether some of the 'new' viruses infecting sweet potato are pathogenic.
Gonzalo H Villarino
Salinity and drought stress are the primary cause of crop losses worldwide. In sodic saline soils sodium chloride (NaCl) disrupts normal plant growth and development. The complex interactions of plant systems with abiotic stress have made RNA sequencing a more holistic and appealing approach to study transcriptome level responses in a single cell and/or tissue. In this work, we determined the Petunia transcriptome response to NaCl stress by sequencing leaf samples and assembling 196 million Illumina reads with Trinity software. Using our reference transcriptome we identified more than 7,000 genes that were differentially expressed within 24 h of acute NaCl stress. The proposed transcriptome can also be used as an excellent tool for biological and bioinformatic analyses in the absence of an available Petunia genome, and it is available at the SOL Genomics Network (SGN, http://solgenomics.net). Genes related to regulation of reactive oxygen species, transport, and signal transduction as well as novel and undescribed transcripts were among those differentially expressed in response to salt stress. The candidate genes identified in this study can be applied as markers for breeding or to genetically engineer plants to enhance salt tolerance. Gene Ontology analyses indicated that most of the NaCl damage happened at 24 h, inducing genotoxicity and affecting transport and organelles due to the high concentration of Na+ ions. Finally, we report a modification to the library preparation protocol whereby cDNA samples were bar-coded with non-HPLC purified primers, without affecting the quality and quantity of the RNA-seq data. The methodological improvement presented here could substantially reduce the cost of sample preparation for future high-throughput RNA sequencing experiments.
Qin, Li-Xuan; Zhou, Qin
MicroRNA arrays possess a number of unique data features that challenge the assumption key to many normalization methods. We assessed the performance of existing normalization methods using two microRNA array datasets derived from the same set of tumor samples: one dataset was generated using a blocked randomization design when assigning arrays to samples and hence was free of confounding array effects; the second dataset was generated without blocking or randomization and exhibited array effects. The randomized dataset was assessed for differential expression between two tumor groups and treated as the benchmark. The non-randomized dataset was assessed for differential expression after normalization and compared against the benchmark. Normalization improved the true positive rate significantly in the non-randomized data but still possessed a false discovery rate as high as 50%. Adding a batch adjustment step before normalization further reduced the number of false positive markers while maintaining a similar number of true positive markers, which resulted in a false discovery rate of 32% to 48%, depending on the specific normalization method. We concluded the paper with some insights on possible causes of false discoveries to shed light on how to improve normalization for microRNA arrays.
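The benchmark comparison described above can be made concrete with a short sketch. It assumes that markers called in the randomized dataset are treated as the truth set, so that the true positive rate is the share of benchmark markers recovered and the false discovery rate is the share of called markers absent from the benchmark. The function name `benchmark_rates` and the marker labels are hypothetical and do not come from the study.

```python
def benchmark_rates(called: set[str], benchmark: set[str]) -> tuple[float, float]:
    """Return (true positive rate, false discovery rate) against a benchmark set.

    called    -- markers declared differentially expressed after normalization
    benchmark -- markers declared differentially expressed in the benchmark data
    """
    true_pos = len(called & benchmark)    # called and confirmed by the benchmark
    false_pos = len(called - benchmark)   # called but not in the benchmark
    tpr = true_pos / len(benchmark) if benchmark else 0.0
    fdr = false_pos / len(called) if called else 0.0
    return tpr, fdr


# Hypothetical example: four markers called, two of them in the benchmark.
tpr, fdr = benchmark_rates({"miR-a", "miR-b", "miR-c", "miR-d"},
                           {"miR-a", "miR-b", "miR-e"})
```

With these made-up sets, two of the three benchmark markers are recovered and two of the four calls are false, so a false discovery rate of 50% means that half of the reported markers are not supported by the benchmark.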
In this study, we detected new sequence variations in the LAMA2 and SGCG genes in 5 ethnic populations, and analysed their effect on enhancer composition and mRNA structure. PCR amplification and DNA sequencing were performed, followed by bioinformatics analyses using ESEfinder as well as MFOLD software. We found 3 novel sequence variations in the LAMA2 (c.3174+22_23insAT and c.6085+12delA) and SGCG (c.*102A/C) genes. These variations were present in 210 tested healthy controls from Tunisian, Moroccan, Algerian, Lebanese and French populations, suggesting that they represent novel polymorphisms within the LAMA2 and SGCG gene sequences. ESEfinder showed that the c.*102A/C substitution created a new exon splicing enhancer in the 3'UTR of the SGCG gene, whereas the c.6085+12delA deletion was situated in the base pairing region between LAMA2 mRNA and the U1 snRNA spliceosomal component. The RNA structure analyses showed that both variations modulated RNA secondary structure. Our results are suggestive of correlations between mRNA folding and the recruitment of spliceosomal components mediating splicing, including SR proteins. The contribution of common sequence variations to mRNA structural and functional diversity will contribute to a better study of gene expression.
Hong, Yoonki; Kim, Woo Jin; Bang, Chi Young; Lee, Jae Cheol; Oh, Yeon-Mok
Lung cancer is the most common cause of cancer-related death. Alterations in gene sequence, structure, and expression have an important role in the pathogenesis of lung cancer. Fusion genes and alternative splicing of cancer-related genes have the potential to be oncogenic. In the current study, we performed RNA-sequencing (RNA-seq) to investigate potential fusion genes and alternative splicing in non-small cell lung cancer. RNA was isolated from lung tissues obtained from 86 subjects with lung cancer. The RNA samples from lung cancer and normal tissues were processed with RNA-seq using the HiSeq 2000 system. Fusion genes were evaluated using DeFuse and ChimeraScan. Candidate fusion transcripts were validated by Sanger sequencing. Alternative splicing was analyzed using multivariate analysis of transcript sequencing and validated using quantitative real-time polymerase chain reaction. RNA-seq data identified oncogenic fusion genes EML4-ALK and SLC34A2-ROS1 in three of 86 normal-cancer paired samples. Nine distinct fusion transcripts were selected using DeFuse and ChimeraScan, of which four were validated by Sanger sequencing. In 33 squamous cell carcinomas, 29 tumor-specific skipped exon events and six mutually exclusive exon events were identified. ITGB4 and PYCR1 were the top genes that showed significant tumor-specific splice variants. In conclusion, RNA-seq data identified novel potential fusion transcripts and splice variants. Further evaluation of their functional significance in the pathogenesis of lung cancer is required.
Locati, M.D.; Pagano, J.F.B.; Ensink, W.A.; van Olst, M.; van Leeuwen, S.; Nehrdich, U.; Zhu, K.; Spaink, H.P.; Girard, G.; Rauwerda, H.; Jonker, M.J.; Dekker, R.J.; Breit, T.M.
5S rRNA is a ribosomal core component, transcribed from many gene copies organized in genomic repeats. Some eukaryotic species have two 5S rRNA types defined by their predominant expression in oogenesis or adult tissue. Our next-generation sequencing study on zebrafish egg, embryo and adult tissue,
Abdul-Redha, Rawaa Jalil; Balslew, Ulla; Christensen, Jens Jørgen
Globicatella sanguinis is a gram-positive coccus, resembling non-haemolytic streptococci. The organism has been isolated infrequently from normally sterile sites of humans. Three isolates obtained by blood culture could not be identified by Rapid 32 ID Strep, but partial sequencing of the 16S rRNA gene revealed the identity of the isolated bacteria, and supplementary biochemical tests confirmed the species identification. The case histories illustrate the dilemma of finding relevant, newly recognized, opportunistic pathogens and the identification achievement(s) that can be obtained by using...
Nierychlo, Marta; Saunders, Aaron Marc; Albertsen, Mads
Activated sludge is the most commonly applied bioprocess throughout the world for wastewater treatment. Microorganisms are key to the process, yet our knowledge of their identity and function is still limited. High-throughput 16S rRNA amplicon sequencing can reliably characterize microbial communities, and in this study activated sludge sampled from 32 Wastewater Treatment Plants (WWTPs) around the world was described and compared. The top abundant bacteria in the global activated sludge ecosystem were found and the core population shared by multiple samples was investigated. The results...
Koyano, Hitoshi; Kishino, Hirohisa
We present a methodology for quantifying biodiversity at the sequence level by developing the probability theory on a set of strings. Further, we apply our methodology to the problem of quantifying the population diversity of microorganisms in several extreme environments and digestive organs and reveal the relation between microbial diversity and various environmental parameters.
Wang, Luan; Wang, Shuangchao; Yang, Xiufen; Zeng, Hongmei; Qiu, Dewen; Guo, Lihua
The complete nucleotide sequence of a double-stranded RNA (dsRNA) mycovirus, Fusarium graminearum dsRNA virus 5 (FgV5), was identified and characterized. The FgV5 genome comprises two dsRNA genome segments of 2030 bp and 1740 bp. FgV5 dsRNA1 contains a single open reading frame (ORF1), which is predicted to encode a protein of 613 amino acids (aa) with a molecular mass of 70.4 kDa and has a conserved RNA-dependent RNA polymerase (RdRp) motif. FgV5 dsRNA2 is predicted to contain two discontinuous ORFs (ORF2 and ORF3) that code for products of unknown function. Sequence comparisons showed that FgV5 has the highest aa sequence identities to Fusarium graminearum virus 4 (FgV4) (83.01% for ORF1, 78.70% for ORF2, and 76.27% for ORF3), suggesting that FgV5 and FgV4 should be regarded as members of different species. Phylogenetic analysis indicated that FgV5 belongs to a taxonomically unassigned dsRNA mycovirus group that is related to the families Amalgaviridae and Partitiviridae. Here, we propose that FgV5 and related viruses are members of a yet to be named and formally recognized new family.
Devisetty, Upendra Kumar; Covington, Michael F; Tat, An V; Lekkala, Saradadevi; Maloof, Julin N
The mapping and functional analysis of quantitative traits in Brassica rapa can be greatly improved with the availability of physically positioned, gene-based genetic markers and accurate genome annotation. In this study, deep transcriptome RNA sequencing (RNA-Seq) of Brassica rapa was undertaken with two objectives: SNP detection and improved transcriptome annotation. We performed SNP detection on two varieties that are parents of a mapping population to aid in development of a marker system for this population and subsequent development of high-resolution genetic map. An improved Brassica rapa transcriptome was constructed to detect novel transcripts and to improve the current genome annotation. This is useful for accurate mRNA abundance and detection of expression QTL (eQTLs) in mapping populations. Deep RNA-Seq of two Brassica rapa genotypes-R500 (var. trilocularis, Yellow Sarson) and IMB211 (a rapid cycling variety)-using eight different tissues (root, internode, leaf, petiole, apical meristem, floral meristem, silique, and seedling) grown across three different environments (growth chamber, greenhouse and field) and under two different treatments (simulated sun and simulated shade) generated 2.3 billion high-quality Illumina reads. A total of 330,995 SNPs were identified in transcribed regions between the two genotypes with an average frequency of one SNP in every 200 bases. The deep RNA-Seq reassembled Brassica rapa transcriptome identified 44,239 protein-coding genes. Compared with current gene models of B. rapa, we detected 3537 novel transcripts, 23,754 gene models had structural modifications, and 3655 annotated proteins changed. Gaps in the current genome assembly of B. rapa are highlighted by our identification of 780 unmapped transcripts. All the SNPs, annotations, and predicted transcripts can be viewed at http://phytonetworks.ucdavis.edu/. Copyright © 2014 Devisetty et al.
21 of 30 random amplified polymorphic DNA (RAPD) primers produced 220 reproducible bands with average of 10.47 bands per primer and 80.12% of polymorphism. OPR02 primer showed the highest number of effective allele (Ne), Shannon index (I) and genetic diversity (H). Some of the cultivars had specific bands, ...
Nome, Torfinn; Thomassen, Gard OS; Bruun, Jarle; Ahlquist, Terje; Bakken, Anne C; Hoff, Andreas M; Rognum, Torleiv; Nesbakken, Arild; Lorenz, Susanne; Sun, Jinchang; Barros-Silva, João Diogo; Lind, Guro E; Myklebost, Ola; Teixeira, Manuel R; Meza-Zepeda, Leonardo A; Lothe, Ragnhild A; Skotheim, Rolf I
Colorectal cancer (CRC) is the third most common cancer disease in the Western world, and about 40% of the patients die from this disease. The cancer cells are commonly genetically unstable, but only a few low-frequency recurrent fusion genes have so far been reported for this disease. In this study, we present a thorough search for novel fusion transcripts in CRC using high-throughput RNA sequencing. From altogether 220 million paired-end sequence reads from seven CRC cell lines, we identified 3391 candidate fused transcripts. By stringent requirements, we nominated 11 candidate fusion transcripts for further experimental validation, of which 10 were positive by reverse transcription-polymerase chain reaction and Sanger sequencing. Six were intrachromosomal fusion transcripts, and interestingly, three of these, AKAP13-PDE8A, COMMD10-AP3S1, and CTB-35F21.1-PSD2, were present in, respectively, 18, 18, and 20 of 21 analyzed cell lines and in, respectively, 18, 61, and 48 (17%-58%) of 106 primary cancer tissues. These three fusion transcripts were also detected in 2 to 4 of 14 normal colonic mucosa samples (14%–28%). Whole-genome sequencing identified a specific genomic breakpoint in COMMD10-AP3S1 and further indicates that both the COMMD10-AP3S1 and AKAP13-PDE8A fusion transcripts are due to genomic duplications in specific cell lines. In conclusion, we have identified AKAP13-PDE8A, COMMD10-AP3S1, and CTB-35F21.1-PSD2 as novel intrachromosomal fusion transcripts and the most highly recurring chimeric transcripts described for CRC to date. The functional and clinical relevance of these chimeric RNA molecules remains to be elucidated. PMID:24151535
Dutta, Tanmay; Deutscher, Murray P
RNase BN, the Escherichia coli homolog of RNase Z, was previously shown to act as both a distributive exoribonuclease and an endoribonuclease on model RNA substrates and to be inhibited by the presence of a 3'-terminal CCA sequence. Here, we examined the mode of action of RNase BN on bacteriophage and bacterial tRNA precursors, particularly in light of a recent report suggesting that RNase BN removes CCA sequences (Takaku, H., and Nashimoto, M. (2008) Genes Cells 13, 1087-1097). We show that purified RNase BN can process both CCA-less and CCA-containing tRNA precursors. On CCA-less precursors, RNase BN cleaved endonucleolytically after the discriminator nucleotide to allow subsequent CCA addition. On CCA-containing precursors, RNase BN acted as either an exoribonuclease or endoribonuclease depending on the nature of the added divalent cation. Addition of Co(2+) resulted in higher activity and predominantly exoribonucleolytic activity, whereas in the presence of Mg(2+), RNase BN was primarily an endoribonuclease. In no case was any evidence obtained for removal of the CCA sequence. Certain tRNA precursors were extremely poor substrates under any conditions tested. These findings provide important information on the ability of RNase BN to process tRNA precursors and help explain the known physiological properties of this enzyme. In addition, they call into question the removal of CCA sequences by RNase BN.
Snyman, Marius C; Solofoharivelo, Marie-Chrystine; Souza-Richards, Rose; Stephan, Dirk; Murray, Shane; Burger, Johan T
Phytoplasmas are cell wall-less plant pathogenic bacteria responsible for major crop losses throughout the world. In grapevine they cause grapevine yellows, a detrimental disease associated with a variety of symptoms. The high economic impact of this disease has sparked considerable interest among researchers to understand molecular mechanisms related to pathogenesis. Increasing evidence exists that a class of small non-coding endogenous RNAs, known as microRNAs (miRNAs), play an important role in post-transcriptional gene regulation during plant development and responses to biotic and abiotic stresses. Thus, we aimed to dissect complex high-throughput small RNA sequencing data for the genome-wide identification of known and novel differentially expressed miRNAs, using read libraries constructed from healthy and phytoplasma-infected Chardonnay leaf material. Furthermore, we utilised computational resources to predict putative miRNA targets to explore the involvement of possible pathogen response pathways. We identified multiple known miRNA sequence variants (isomiRs), likely generated through post-transcriptional modifications. Sequences of 13 known, canonical miRNAs were shown to be differentially expressed. A total of 175 novel miRNA precursor sequences, each derived from a unique genomic location, were predicted, of which 23 were differentially expressed. A homology search revealed that some of these novel miRNAs shared high sequence similarity with conserved miRNAs from other plant species, as well as known grapevine miRNAs. The relative expression of randomly selected known and novel miRNAs was determined with real-time RT-qPCR analysis, thereby validating the trend of expression seen in the normalised small RNA sequencing read count data. Among the putative miRNA targets, we identified genes involved in plant morphology, hormone signalling, nutrient homeostasis, as well as plant stress. Our results may assist in understanding the role that miRNA pathways play
Grasspea (Lathyrus sativus L., 2n = 14) has great agronomic potential because of its ability to survive under extreme conditions, such as drought and flood. However, this legume is less investigated because of its sparse genomic resources and very slow breeding process. In this study, 570 million quality-filtered and trimmed cDNA sequence reads with a total length of over 82 billion bp were obtained using the Illumina NextSeq™ 500 platform. Approximately two million contigs and 142,053 transcripts were assembled from our RNA-Seq data, which resulted in 27,431 unigenes with an average length of 1,250 bp and a maximum length of 48,515 bp. The unigenes were of high quality. For example, the stay-green (SGR) gene of grasspea was aligned with the SGR gene of pea with high similarity. Among these unigenes, 3,204 EST-SSR primers were designed, 284 of which were randomly chosen for validation. Of these, 87 (30.6%) EST-SSR primers produced polymorphic amplicons among 43 grasspea accessions selected from different geographical locations. Meanwhile, 146,406 SNPs were screened and 50 SNP loci were randomly chosen for kompetitive allele-specific PCR (KASP) validation. Over 80% (42) of the SNP loci were successfully converted to KASP markers. Comparison of the dendrograms according to the SSR and KASP markers showed that the different marker systems are partially consistent with the dendrogram constructed in our study.
Yu, Guoxian; Fu, Guangyuan; Lu, Chang; Ren, Yazhou; Wang, Jun
Increasing efforts have been made to uncover the associations between lncRNAs and complex diseases. Many computational models combine various lncRNA similarity networks and disease similarity networks with known lncRNA-disease associations to infer novel associations. However, most of them neglect the structural difference between the lncRNA network and the disease network, the hierarchical relationships between diseases, and the pattern of newly discovered associations. In this study, we developed a model that performs Bi-Random Walks to predict novel LncRNA-Disease Associations (BRWLDA for short). This model utilizes multiple heterogeneous data to construct the lncRNA functional similarity network, and Disease Ontology to construct a disease network. It then constructs a directed bi-relational network based on these two networks and the available lncRNA-disease associations. Next, it applies bi-random walks on the network to predict potential associations. BRWLDA achieves reliable and better performance than the competing methods, not only on experimentally verified associations but also in simulated experiments with masked associations. Case studies further demonstrate the feasibility of BRWLDA in identifying new lncRNA-disease associations. PMID:28947982
Tellam, Judy T; Lekieffre, Lea; Zhong, Jie; Lynn, David J; Khanna, Rajiv
Unique purine-rich mRNA sequences embedded in the coding sequences of a distinct group of gammaherpesvirus maintenance proteins underlie the ability of the latently infected cell to minimize immune recognition. The Epstein-Barr virus nuclear antigen, EBNA1, a well characterized lymphocryptovirus maintenance protein has been shown to inhibit in cis antigen presentation, due in part to a large internal repeat domain encoding glycine and alanine residues (GAr) encoded by a purine-rich mRNA sequence. Recent studies have suggested that it is the purine-rich mRNA sequence of this repeat region rather than the encoded GAr polypeptide that directly inhibits EBNA1 self-synthesis and contributes to immune evasion. To test this hypothesis, we generated a series of EBNA1 internal repeat frameshift constructs and assessed their effects on cis-translation and endogenous antigen presentation. Diverse peptide sequences resulting from alternative repeat reading frames did not alleviate the translational inhibition characteristic of EBNA1 self-synthesis or the ensuing reduced surface presentation of EBNA1-specific peptide-MHC class I complexes. Human cells expressing the EBNA1 frameshift variants were also poorly recognized by antigen-specific T-cells. Furthermore, a comparative analysis of the mRNA sequences of the corresponding repeat regions of different viral maintenance homologues highlights the high degree of identity between the nucleotide sequences despite very little homology in the encoded amino acid sequences. Based on these combined observations, we propose that the cis-translational inhibitory effect of the EBNA1 internal repeat sequence operates mechanistically at the nucleotide level, potentially through RNA secondary structural elements, and is unlikely to be mediated through the GAr polypeptide. The demonstration that the EBNA1 repeat mRNA sequence and not the encoded protein sequence underlies immune evasion in this class of virus suggests a novel approach to
Williams, Claire R; Baccarella, Alyssa; Parrish, Jay Z; Kim, Charles C
High-throughput RNA-Sequencing (RNA-Seq) has become the preferred technique for studying gene expression differences between biological samples and for discovering novel isoforms, though the techniques to analyze the resulting data are still immature. One pre-processing step that is widely but heterogeneously applied is trimming, in which low quality bases, identified by the probability that they are called incorrectly, are removed. However, the impact of trimming on subsequent alignment to a genome could influence downstream analyses including gene expression estimation; we hypothesized that this might occur in an inconsistent manner across different genes, resulting in differential bias. To assess the effects of trimming on gene expression, we generated RNA-Seq data sets from four samples of larval Drosophila melanogaster sensory neurons, and used three trimming algorithms--SolexaQA, Trimmomatic, and ConDeTri--to perform quality-based trimming across a wide range of stringencies. After aligning the reads to the D. melanogaster genome with TopHat2, we used Cuffdiff2 to compare the original, untrimmed gene expression estimates to those following trimming. With the most aggressive trimming parameters, over ten percent of genes had significant changes in their estimated expression levels. This trend was seen with two additional RNA-Seq data sets and with alternative differential expression analysis pipelines. We found that the majority of the expression changes could be mitigated by imposing a minimum length filter following trimming, suggesting that the differential gene expression was primarily being driven by spurious mapping of short reads. Slight differences with the untrimmed data set remained after length filtering, which were associated with genes with low exon numbers and high GC content. Finally, an analysis of paired RNA-seq/microarray data sets suggests that no or modest trimming results in the most biologically accurate gene expression estimates. We find
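The trimming-then-length-filter step described in this abstract can be sketched in a few lines. This is only an illustration under stated assumptions: it uses a simple 3'-end quality cutoff rather than the actual algorithms of SolexaQA, Trimmomatic, or ConDeTri, and the function names, the Phred threshold of 20, and the minimum length of 20 are hypothetical choices, not values from the study.

```python
def trim_3prime(seq, quals, min_q=20):
    """Drop bases from the 3' end until a base with quality >= min_q is reached."""
    end = len(seq)
    while end > 0 and quals[end - 1] < min_q:
        end -= 1
    return seq[:end], quals[:end]


def trim_and_filter(reads, min_q=20, min_len=20):
    """Trim every read, then keep only reads that are still at least min_len long."""
    kept = []
    for seq, quals in reads:
        trimmed_seq, trimmed_quals = trim_3prime(seq, quals, min_q)
        if len(trimmed_seq) >= min_len:  # minimum length filter after trimming
            kept.append((trimmed_seq, trimmed_quals))
    return kept


# Two made-up 40-base reads with hypothetical Phred scores.
reads = [
    ("ACGT" * 10, [35] * 30 + [10] * 10),  # loses 10 bases, stays above min_len
    ("ACGT" * 10, [35] * 8 + [5] * 32),    # shrinks to 8 bases, is discarded
]
kept = trim_and_filter(reads)
```

The length filter is what removes the very short reads that, according to the abstract, tend to map spuriously.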
Saraf, Shradha; Sanan-Mishra, Neeti; Gursanscky, Nial R; Carroll, Bernard J; Gupta, Dinesh; Mukherjee, Sunil Kumar
The processing of miRNA from its precursors is a precisely regulated process and, after biogenesis, the miRNAs are amenable to different kinds of modifications by the addition or deletion of nucleotides at the terminal ends. However, the mechanism and functions of such modifications are not well studied in plants. In this study, we have specifically analysed the terminal end non-templated miRNA modifications, using NGS data of rice, tomato and Arabidopsis small RNA transcriptomes from different tissues and physiological conditions. Our analysis reveals template-independent terminal end modifications in the mature as well as passenger strands of the miRNA duplex. Interestingly, it is also observed that miRNA sequences terminating with a cytosine (C) at the 3' end undergo a higher percentage of 5' end modifications. The terminal end modifications did not correlate with the miRNA abundances and are independent of tissue types, physiological conditions and plant species. Our analysis indicates that the addition of nucleotides at miRNA ends is not influenced by the absence of RNA-dependent RNA polymerase 6. Moreover, the terminal end modified miRNAs are also observed amongst AGO1-bound small RNAs and have the potential to alter targets, indicating an important functional role in repression of gene expression. Copyright © 2015 Elsevier Inc. All rights reserved.
Heera, Rajandas; Sivachandran, Parimannan; Chinni, Suresh V; Mason, Joanne; Croft, Larry; Ravichandran, Manickam; Yin, Lee Su
Next-generation transcriptome sequencing (RNA-Seq) has become the standard practice for studying gene splicing, mutations and changes in gene expression to obtain valuable, accurate biological conclusions. However, obtaining good sequencing coverage and depth to study these is impeded by the difficulties of obtaining high quality total RNA with minimal genomic DNA contamination. With this in mind, we evaluated the performance of Phenol-free total RNA purification kit (Amresco) in comparison with TRI Reagent (MRC) and RNeasy Mini (Qiagen) for the extraction of total RNA of Pseudomonas aeruginosa which was grown in glucose-supplemented (control) and polyethylene-supplemented (growth-limiting condition) minimal medium. All three extraction methods were coupled with an in-house DNase I treatment before the yield, integrity and size distribution of the purified RNA were assessed. RNA samples extracted with the best extraction kit were then sequenced using the Illumina HiSeq 2000 platform. TRI Reagent gave the lowest yield enriched with small RNAs (sRNAs), while RNeasy gave moderate yield of good quality RNA with trace amounts of sRNAs. The Phenol-free kit, on the other hand, gave the highest yield and the best quality RNA (RIN value of 9.85 ± 0.3) with good amounts of sRNAs. Subsequent bioinformatic analysis of the sequencing data revealed that 5435 coding genes, 452 sRNAs and 7 potential novel intergenic sRNAs were detected, indicating excellent sequencing coverage across RNA size ranges. In addition, detection of low abundance transcripts and consistency of their expression profiles across replicates from the same conditions demonstrated the reproducibility of the RNA extraction technique. Amresco's Phenol-free Total RNA purification kit coupled with DNase I treatment yielded the highest quality RNAs containing good ratios of high and low molecular weight transcripts with minimal genomic DNA. These RNA extracts gave excellent non-biased sequencing coverage useful
Holden, Todd; Gadura, N.; Dehipawala, S.; Cheung, E.; Tuffour, M.; Schneider, P.; Tremberger, G., Jr.; Lieberman, D.; Cheung, T.
Technologically important extremophiles including oil eating microbes, uranium and rocket fuel perchlorate reduction microbes, electron producing microbes and electrode electrons feeding microbes were compared in terms of their 16S rRNA sequences, a standard targeted sequence in comparative phylogeny studies. Microbes that were reported to have survived a prolonged dormant duration were also studied. Examples included the recently discovered microbe that survives after 34,000 years in a salty environment while feeding off organic compounds from other trapped dead microbes. Shannon entropy of the 16S rRNA nucleotide composition and fractal dimension of the nucleotide sequence in terms of its atomic number fluctuation analyses suggest a selected range for these extremophiles as compared to other microbes, consistent with the experience of relatively mild evolutionary pressure. However, most of the microbes that have been reported to survive in prolonged dormant duration carry sequences with fractal dimension between 1.995 and 2.005 (N = 10 out of 13). Similar results are observed for halophiles, red-shifted chlorophyll and radiation resistant microbes. The results suggest that prolonged dormant duration, analogous to a high-salt or high-radiation environment, would select high fractal 16S rRNA sequences. Path analysis in structural equation modeling supports a causal relation between entropy and fractal dimension for the studied 16S rRNA sequences (N = 7). Candidate choices for high fractal 16S rRNA microbes could offer protection for prolonged spaceflights. BioBrick gene network manipulation could include extremophile 16S rRNA sequences in synthetic biology and shed more light on exobiology and future colonization in shielded spaceflights. Whether the high fractal 16S rRNA sequences contain an asteroid-like extra-terrestrial source could be speculative but interesting.
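For readers unfamiliar with the composition measure mentioned here, the Shannon entropy of a sequence's nucleotide composition is H = -Σ p_i log2 p_i over the base frequencies p_i. The snippet below is a minimal sketch of that formula only; it is not the authors' code, it does not attempt the fractal dimension analysis, and the function name and example sequences are hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(seq):
    """Shannon entropy, in bits per base, of the base composition of seq."""
    counts = Counter(seq.upper())
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


# Made-up sequences: equal base frequencies versus a strongly skewed composition.
uniform = shannon_entropy("ACGU" * 25)
skewed = shannon_entropy("A" * 97 + "CGU")
```

With four equally frequent bases the entropy reaches its maximum of 2 bits per base; any compositional bias lowers it.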
Li, Long; Chen, Guojin; Jin, Tingdu
As the number of sequenced RNA-protein complexes grows, traditional methods for identifying RNA-protein interaction sites (RPIS) have become too time-consuming, and there is an urgent need for an intelligent approach that recognizes RPIS quickly and reliably. To address this, we developed a new method named iRPIS-PseNNC, in which each sample is a nineteen-nucleotide segment obtained with a sliding window: in positive samples the centre of the segment is an RPIS, and in negative samples the centre is a non-RPIS. Each RNA sample was formulated by incorporating the dipeptide position-specific propensity into a random forest approach, with random sampling used to balance the training dataset. Using a voting system, we combined eleven random forests into an ensemble classifier. Rigorous cross-validation showed that the new predictor “iRPIS-PseNNC” achieved higher accuracy than other existing algorithms in this field, indicating that iRPIS-PseNNC will be an effective tool for predicting RNA-protein interaction sites.
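Two mechanical steps in this abstract, cutting nineteen-nucleotide windows around each candidate site and combining eleven classifiers by vote, can be illustrated as follows. This is a sketch under stated assumptions rather than the iRPIS-PseNNC implementation: the feature encoding and the random forests themselves are omitted, a simple majority rule is assumed for the vote, and the sequence, the votes, and both function names are hypothetical.

```python
def sliding_windows(rna, width=19):
    """Yield (centre_index, segment) for every full-width window along rna."""
    half = width // 2
    for centre in range(half, len(rna) - half):
        yield centre, rna[centre - half:centre + half + 1]


def majority_vote(votes):
    """Return 1 if more than half of the 0/1 votes are 1, otherwise 0."""
    return 1 if sum(votes) * 2 > len(votes) else 0


rna = "AUGGCUACGUAGCUAGCUAGGCAUCG"  # made-up sequence
windows = list(sliding_windows(rna))

# Hypothetical outputs of eleven classifiers for one window (1 = predicted site).
votes = [1, 0, 1, 1, 0, 1, 1, 0, 1, 0, 1]
label = majority_vote(votes)
```

An odd number of voters, such as eleven, guarantees that a majority vote over binary labels never ties.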
Lavender, Christopher A; Lorenz, Ronny; Zhang, Ge; Tamayo, Rita; Hofacker, Ivo L; Weeks, Kevin M
Discovery and characterization of functional RNA structures remains challenging due to deficiencies in de novo secondary structure modeling. Here we describe a dynamic programming approach for model-free sequence comparison that incorporates high-throughput chemical probing data. Based on SHAPE probing data alone, ribosomal RNAs (rRNAs) from three diverse organisms--the eubacteria E. coli and C. difficile and the archeon H. volcanii--could be aligned with accuracies comparable to alignments based on actual sequence identity. When both base sequence identity and chemical probing reactivities were considered together, accuracies improved further. Derived sequence alignments and chemical probing data from protein-free RNAs were then used as pseudo-free energy constraints to model consensus secondary structures for the 16S and 23S rRNAs. There are critical differences between these experimentally-informed models and currently accepted models, including in the functionally important neck and decoding regions of the 16S rRNA. We infer that the 16S rRNA has evolved to undergo large-scale changes in base pairing as part of ribosome function. As high-quality RNA probing data become widely available, structurally-informed sequence alignment will become broadly useful for de novo motif and function discovery.
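To make the idea of aligning RNAs by probing data rather than by sequence concrete, the sketch below runs a standard global (Needleman-Wunsch style) dynamic programming alignment over two per-nucleotide reactivity profiles. It is an assumption-laden illustration, not the authors' method: the similarity score `1 - |a - b|`, the linear gap penalty of -0.5, the function name, and the toy profiles are all hypothetical.

```python
def align_profiles(a, b, gap=-0.5):
    """Globally align two reactivity profiles; return (score, aligned index pairs)."""
    n, m = len(a), len(b)
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sim = 1.0 - abs(a[i - 1] - b[j - 1])  # similar reactivities score high
            score[i][j] = max(score[i - 1][j - 1] + sim,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right cell to recover the aligned positions.
    pairs = []
    i, j = n, m
    while i > 0 and j > 0:
        sim = 1.0 - abs(a[i - 1] - b[j - 1])
        if score[i][j] == score[i - 1][j - 1] + sim:
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    pairs.reverse()
    return score[n][m], pairs


# Toy profiles: profile_b has one extra position (index 2) relative to profile_a.
profile_a = [0.1, 0.9, 0.2, 0.8]
profile_b = [0.1, 0.9, 0.5, 0.2, 0.8]
best_score, aligned = align_profiles(profile_a, profile_b)
```

A combined scheme like the one the abstract describes could add a sequence-identity term to `sim`, but how the two signals are weighted there is not stated in the text.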
Fouhy, Fiona; Clooney, Adam G; Stanton, Catherine; Claesson, Marcus J; Cotter, Paul D
Next-generation sequencing platforms have revolutionised our ability to investigate the microbiota composition of complex environments, frequently through 16S rRNA gene sequencing of the bacterial component of the community. Numerous factors, including DNA extraction method, primer sequences and sequencing platform employed, can affect the accuracy of the results achieved. The aim of this study was to determine the impact of these three factors on 16S rRNA gene sequencing results, using mock communities and mock community DNA. The use of different primer sequences (V4-V5, V1-V2 and V1-V2 degenerate primers) resulted in differences in the genera and species detected. The V4-V5 primers gave the most comparable results across platforms. The three Ion PGM primer sets detected more of the 20 mock community species than the equivalent MiSeq primer sets. Data generated from DNA extracted using the 2 extraction methods were very similar. Microbiota compositional data differed depending on the primers and sequencing platform that were used. The results demonstrate the risks in comparing data generated using different sequencing approaches and highlight the merits of choosing a standardised approach for sequencing in situations where a comparison across multiple sequencing runs is required.
Massively parallel high-throughput sequencing technologies allow us to interrogate the microbial composition of biological samples at unprecedented resolution. The typical approach is to perform high-throughput sequencing of 16S rRNA genes, which are then taxonomically classified based on similarity to known sequences in existing databases. Current technologies cause a predicament though, because although they enable deep coverage of samples, they are limited in the length of sequence they can produce. As a result, high-throughput studies of microbial communities often do not sequence the entire 16S rRNA gene. The challenge is to obtain reliable representation of bacterial communities through taxonomic classification of short 16S rRNA gene sequences. In this study we explored properties of different study designs and developed specific recommendations for effective use of short-read sequencing technologies for the purpose of interrogating bacterial communities, with a focus on classification using naïve Bayesian classifiers. To assess precision and coverage of each design, we used a collection of ∼8,500 manually curated 16S rRNA gene sequences from cultured bacteria and a set of over one million bacterial 16S rRNA gene sequences retrieved from environmental samples, respectively. We also tested different configurations of taxonomic classification approaches using short read sequencing data, and provide recommendations for optimal choice of the relevant parameters. We conclude that with a judicious selection of the sequenced region and the corresponding choice of a suitable training set for taxonomic classification, it is possible to explore bacterial communities at great depth using current technologies, with only a minimal loss of taxonomic resolution.
Vitiligo is an idiopathic disorder characterized by depigmented patches on the skin due to a loss of melanocytes. The cause of melanocyte destruction is not fully understood. The aim of this study was to detect the potential pathways involved in vitiligo pathogenesis to further understand the causes and entity of vitiligo. To that end, the transcriptome of peripheral blood mononuclear cells of 4 vitiligo patients and 4 control subjects was analyzed using the SOLiD System platform and whole transcriptome RNA sequencing application. Altogether 2,470 genes were expressed differently and GRID2IP showed the highest deviation in patients compared to controls. Using functional analysis, altogether 993 associations between the gene groups and diseases were found. The analysis revealed associations between vitiligo and diseases such as lichen planus, limb-girdle muscular dystrophy type 2B, and facioscapulohumeral muscular dystrophy. Additionally, the gene groups with an altered expression pattern participate in processes such as cell death, survival and signaling, inflammation, and oxidative stress. In conclusion, vitiligo is a systemic rather than a local skin disease; the findings from an enormous amount of RNA sequencing data support the previous findings about vitiligo and should be further analyzed. © 2014 S. Karger AG, Basel
Conway, Tyrrell; Creecy, James P; Maddox, Scott M; Grissom, Joe E; Conkle, Trevor L; Shadid, Tyler M; Teramoto, Jun; San Miguel, Phillip; Shimada, Tomohiro; Ishihama, Akira; Mori, Hirotada; Wanner, Barry L
We analyzed the transcriptome of Escherichia coli K-12 by strand-specific RNA sequencing at single-nucleotide resolution during steady-state (logarithmic-phase) growth and upon entry into stationary phase in glucose minimal medium. To generate high-resolution transcriptome maps, we developed an organizational schema which showed that in practice only three features are required to define operon architecture: the promoter, terminator, and deep RNA sequence read coverage. We precisely annotated 2,122 promoters and 1,774 terminators, defining 1,510 operons with an average of 1.98 genes per operon. Our analyses revealed an unprecedented view of E. coli operon architecture. A large proportion (36%) of operons are complex with internal promoters or terminators that generate multiple transcription units. For 43% of operons, we observed differential expression of polycistronic genes, despite being in the same operons, indicating that E. coli operon architecture allows fine-tuning of gene expression. We found that 276 of 370 convergent operons terminate inefficiently, generating complementary 3' transcript ends which overlap on average by 286 nucleotides, and 136 of 388 divergent operons have promoters arranged such that their 5' ends overlap on average by 168 nucleotides. We found 89 antisense transcripts of 397-nucleotide average length, 7 unannotated transcripts within intergenic regions, and 18 sense transcripts that completely overlap operons on the opposite strand. Of 519 overlapping transcripts, 75% correspond to sequences that are highly conserved in E. coli (>50 genomes). Our data extend recent studies showing unexpected transcriptome complexity in several bacteria and suggest that antisense RNA regulation is widespread. Importance: We precisely mapped the 5' and 3' ends of RNA transcripts across the E. coli K-12 genome by using a single-nucleotide analytical approach. Our resulting high-resolution transcriptome maps show that ca. one-third of E. coli operons are
BACKGROUND: The ciliated protozoan Tetrahymena thermophila is a well-studied single-celled eukaryote model organism for cellular and molecular biology. However, the lack of extensive T. thermophila cDNA libraries or a large expressed sequence tag (EST) database limited the quality of the original genome annotation. METHODOLOGY/PRINCIPAL FINDINGS: This RNA-seq study describes the first deep sequencing analysis of the T. thermophila transcriptome during the three major stages of the life cycle: growth, starvation and conjugation. Uniquely mapped reads covered more than 96% of the 24,725 predicted gene models in the somatic genome. More than 1,000 new transcribed regions were identified. The great dynamic range of RNA-seq allowed detection of a nearly six order-of-magnitude range of measurable gene expression orchestrated by this cell. RNA-seq also allowed the first prediction of transcript untranslated regions (UTRs) and an updated (larger) size estimate of the T. thermophila transcriptome: 57 Mb, or about 55% of the somatic genome. Our study identified nearly 1,500 alternative splicing (AS) events distributed over 5.2% of T. thermophila genes. This percentage represents a two order-of-magnitude increase over previous EST-based estimates in Tetrahymena. Evidence of stage-specific regulation of alternative splicing was also obtained. Finally, our study allowed us to completely confirm about 26.8% of the genes originally predicted by the gene finder, to correct coding sequence boundaries and intron-exon junctions for about a third, and to reassign microarray probes and correct earlier microarray data. CONCLUSIONS/SIGNIFICANCE: RNA-seq data significantly improve the genome annotation and provide a fully comprehensive view of the global transcriptome of T. thermophila. To our knowledge, 5.2% of T. thermophila genes with AS is the highest percentage of genes showing AS reported in a unicellular eukaryote. Tetrahymena thus becomes an excellent unicellular
Amplification and sequencing of the complete M- and S-RNA segments of Tomato spotted wilt virus and Impatiens necrotic spot virus as a single fragment is useful for whole genome sequencing of tospoviruses co-infecting a single host plant. It avoids issues associated with overlapping amplicon-based ...
Nielsen, J. L.; Schramm, A.; Engh, G. van den
A flow cytometry method was developed for rapid screening and recovery of cloned DNA containing common sequence motifs. This approach, termed fluorescence-activated cell sorting-assisted cloning, was used to recover sequences affiliated with a unique lineage within the Bacteroidetes not abundant i...... in a clone library of environmental 16S rRNA genes....
Chen, Bei Jun; Ueberham, Uwe; Mills, James D; Kirazov, Ludmil; Kirazov, Evgeni; Knobloch, Mara; Bochmann, Jana; Jendrek, Renate; Takenaka, Konii; Bliim, Nicola; Arendt, Thomas; Janitz, Michael
Normal aging is associated with impairments in cognitive functions. These alterations are caused by diminutive changes in the biology of synapses, and ineffective neurotransmission, rather than loss of neurons. Hitherto, only a few studies, exploring molecular mechanisms of healthy brain aging in higher vertebrates, utilized synaptosomal fractions to survey local changes in aging-related transcriptome dynamics. Here we present, for the first time, a comparative analysis of the synaptosomes transcriptome in the aging mouse brain using RNA sequencing. Our results show changes in the expression of genes contributing to biological pathways related to neurite guidance, synaptosomal physiology, and RNA splicing. More intriguingly, we also discovered alterations in the expression of thousands of novel, unannotated lincRNAs during aging. Further, detailed characterization of the cleavage and polyadenylation factor I subunit 1 (Clp1) mRNA and protein expression indicates its increased expression in neuronal processes of hippocampal stratum radiatum in aging mice. Together, our study uncovers a new layer of transcriptional regulation which is targeted by aging within the local environment of interconnecting neuronal cells. Copyright © 2017 Elsevier Inc. All rights reserved.
Petkovic, Sonja; Badelt, Stefan; Flamm, Christoph; Delcea, Mihaela
Reversible chemistry allowing for assembly and disassembly of molecular entities is important for biological self-organization. Thus, ribozymes that support both cleavage and formation of phosphodiester bonds may have contributed to the emergence of functional diversity and increasing complexity of regulatory RNAs in early life. We have previously engineered a variant of the hairpin ribozyme that shows how ribozymes may have circularized or extended their own length by forming concatemers. Using the Vienna RNA package, we now optimized this hairpin ribozyme variant and selected four different RNA sequences that were expected to circularize more efficiently or form longer concatemers upon transcription. (Two-dimensional) PAGE analysis confirms that (i) all four selected ribozymes are catalytically active and (ii) high yields of cyclic species are obtained. AFM imaging in combination with RNA structure prediction enabled us to calculate the distributions of monomers and self-concatenated dimers and trimers. Our results show that computationally optimized molecules do form reasonable amounts of trimers, which has not been observed for the original system so far, and we demonstrate that the combination of theoretical prediction, biochemical and physical analysis is a promising approach toward accurate prediction of ribozyme behavior and design of ribozymes with predefined functions. PMID:25999318
Shih, Chun-Liang; Luo, Ji-Dung; Chang, John Wen-Cheng; Chen, Tai-Long; Chien, Yu-Tzu; Yu, Chia-Jung; Chiou, Chiuan-Chian
Circulating mRNA is a less invasive and more easily accessed source of samples for biomedical research and clinical applications. However, it is of poor quality. We explored and compared two high-throughput platforms for the profiling of circulating mRNA with regard to their ability to retrieve useful information from this type of sample. Circulating mRNAs from three non-small cell lung cancer patients and three healthy controls were analyzed by the cDNA-mediated annealing, selection, extension, and ligation (DASL) assay and high-throughput RNA sequencing (RSEQ). Twelve genes were selected for further confirmation by reverse transcription-quantitative polymerase chain reaction (RT-qPCR). The overall expression profiles derived from the two platforms showed modest-to-moderate correlation. Genes with higher expression levels had higher cross-platform concordance than those with medium and low expression levels. In addition, the pathway signatures identified by gene set enrichment analysis from both platforms were in agreement. The RT-qPCR results for the selected genes correlated well with those of RSEQ. Genes with higher expression levels have cross-platform concordance and can be potential biomarkers. Furthermore, RSEQ is a better tool for profiling circulating mRNAs. Copyright© 2015, International Institute of Anticancer Research (Dr. John G. Delinasios), All rights reserved.
Stephen M. Lanno
The dietary specialist fruit fly Drosophila sechellia has evolved to specialize on the toxic fruit of its host plant Morinda citrifolia. Toxicity of Morinda fruit is primarily due to high levels of octanoic acid (OA). Using RNA interference (RNAi), prior work found that knockdown of Osiris family genes Osiris 6 (Osi6), Osi7, and Osi8 led to increased susceptibility to OA in adult D. melanogaster flies, likely representing genes underlying a Quantitative Trait Locus (QTL) for OA resistance in D. sechellia. While genes in this major effect locus are beginning to be revealed, prior work has shown at least five regions of the genome contribute to OA resistance. Here, we identify new candidate OA resistance genes by performing differential gene expression analysis using RNA-sequencing (RNA-seq) on control and OA-exposed D. sechellia flies. We found 104 significantly differentially expressed genes with annotated orthologs in D. melanogaster, including six Osiris gene family members, consistent with previous functional studies and gene expression analyses. Gene ontology (GO) term enrichment showed significant enrichment for cuticle development in upregulated genes and significant enrichment of immune and defense responses in downregulated genes, suggesting important aspects of the physiology of D. sechellia that may play a role in OA resistance. In addition, we identified five candidate OA resistance genes that potentially underlie QTL peaks outside of the major effect region, representing promising new candidate genes for future functional studies.
Cui, Xiaodong; Meng, Jia; Rao, Manjeet K; Chen, Yidong; Huang, Yufei
Methylated RNA Immunoprecipitation combined with RNA sequencing (MeRIP-seq) is revolutionizing the de novo study of RNA epigenomics at a higher resolution. However, this new technology poses unique bioinformatics problems that call for novel and sophisticated statistical computational solutions, aiming at identifying and characterizing the transcriptome-wide methyltranscriptome. We developed HEPeak, a Hidden Markov Model (HMM)-based Exome Peak-finding algorithm for predicting transcriptome methylation sites using MeRIP-seq data. In contrast to exomePeak, our previously developed MeRIP-seq peak calling algorithm, HEPeak models the correlation between continuous bins in an m6A peak region and it is a model-based approach, which admits rigorous statistical inference. HEPeak was evaluated on a simulated MeRIP-seq dataset and achieved higher sensitivity and specificity than exomePeak. HEPeak was also applied to real MeRIP-seq datasets from human HEK293T cell line and mouse midbrain cells and was shown to be able to recapitulate known m6A distribution in transcripts and identify novel m6A sites in long non-coding RNAs. In this paper, a novel HMM-based peak calling algorithm, HEPeak, was developed for peak calling for MeRIP-seq data. HEPeak is written in R and is publicly available.
Werner, M; Rosa, E; Nordstrom, J L; Goldberg, A R; George, S T
Human RNase P recognizes a small model substrate consisting of only the 5' leader sequence, aminoacyl acceptor stem, and T stem and loop of a tRNA precursor. It was demonstrated here that a bimolecular construct in which the T loop is opened between G57 and A58 (tRNA numbering system) is still processed by RNase P. The strand that is cleaved can be considered the target RNA, whereas the other strand serves as an external guide sequence (EGS). The nucleotides corresponding to nt 58-60 in the T loop could be deleted without affecting cleavage of the substrate. Thus, the complete T loop can be replaced by the single-stranded sequence UUCG or UUCA (nt 55-57 in the T loop). The four nucleotides UUCR possibly form a structure that resembles the uridine turn in the T loop of tRNA. Because recognition by RNase P is independent of the helical sequence, this motif can be used for targeting RNA molecules for EGS-directed cleavage by human RNase P. Chemically modified EGSs with 2'-O-methyl groups also showed activity in inducing RNase P cleavage. Several 13-mer EGSs targeted to the 2.1-kb surface antigen mRNA of hepatitis B virus (HBV) were designed and tested using a co-transcriptional cleavage assay with a 2.1-kb HBV transcript. Some of the new EGSs were capable of inducing cleavage of the HBV RNA by RNase P.
Lee, Yun-Gyoo; Kim, Inho; Oh, Somi; Shin, Dong-Yeop; Koh, Youngil; Lee, Keun-Wook
To evaluate and select microRNAs relevant to acute myeloid leukemia (AML) pathogenesis, we analyzed differential microRNA expression by quantitative small RNA next-generation sequencing using duplicate marrow samples from individual AML patients. For this study, we obtained paired marrow samples at two different time points (initial diagnosis and first complete remission status) in patients with AML. Bone marrow microRNAs were profiled by next-generation small RNA sequencing. Quantification of microRNA expression was performed by counting aligned reads to microRNA genes. Among 38 samples (32 paired samples from 16 AML patients and 6 normal marrow controls), 27 were eligible for sequencing. Small RNA sequencing showed that 12 microRNAs were selectively expressed at higher levels in AML patients than in normal controls. Among these 12 microRNAs, mir-181, mir-221, and mir-3154 were more highly expressed at initial AML diagnosis as compared to first complete remission. Significant correlations were found between higher expression levels of mir-221, mir-146, and mir-155 and higher marrow blast counts. Our results demonstrate that mir-221 and mir-181 are selectively enriched in AML marrow and reflect disease activity. mir-3154 is a novel microRNA that is relevant to AML but needs further validation.
Metazoan genomes encode hundreds of RNA-binding proteins (RBPs). These proteins regulate post-transcriptional gene expression and have critical roles in numerous cellular processes including mRNA splicing, export, stability and translation. Despite their ubiquity and importance, the binding preferences for most RBPs are not well characterized. In vitro and in vivo studies, using affinity selection-based approaches, have successfully identified RNA sequences associated with specific RBPs; however, it is difficult to infer RBP sequence and structural preferences without specifically designed motif finding methods. In this study, we introduce a new motif-finding method, RNAcontext, designed to elucidate RBP-specific sequence and structural preferences with greater accuracy than existing approaches. We evaluated RNAcontext on recently published in vitro and in vivo RNA affinity selected data and demonstrate that RNAcontext identifies known binding preferences for several control proteins including HuR, PTB, and Vts1p and predicts new RNA structure preferences for SF2/ASF, RBM4, FUSIP1 and SLM2. The predicted preferences for SF2/ASF are consistent with its recently reported in vivo binding sites. RNAcontext is an accurate and efficient motif finding method ideally suited for using large-scale RNA-binding affinity datasets to determine the relative binding preferences of RBPs for a wide range of RNA sequences and structures.
Background: Roses (Rosa sp.), which belong to the family Rosaceae, are the most economically important ornamental plants, making up 30% of the floriculture market. However, despite high demand for roses, rose breeding programs are limited in the molecular resources that could greatly enhance and speed breeding efforts. A better understanding of the genes that contribute to floral development and desired phenotypes will lead to improved rose cultivars. For this study, we analyzed rose miRNAs and the rose flower transcriptome in order to generate a database to expound upon current knowledge regarding regulation of important floral characteristics. A rose genetic database will enable comprehensive analysis of gene expression and regulation via miRNA among different Rosa cultivars. Results: We produced more than 0.5 million reads from expressed sequences, totalling more than 110 million bp. From these, we generated 35,657, 31,434, 34,725, and 39,722 flower unigenes from Rosa hybrid: ‘Vital’, ‘Maroussia’, and ‘Sympathy’ and Rosa rugosa Thunb., respectively. The unigenes were assigned functional annotations, domains, metabolic pathways, Gene Ontology (GO) terms, Plant Ontology (PO) terms, and MIPS Functional Catalogue (FunCat) terms. Rose flower transcripts were compared with genes from whole genome sequences of Rosaceae members (apple, strawberry, and peach) and grape. We also produced approximately 40 million small RNA reads from flower tissue for Rosa, representing 267 unique miRNA tags. Among identified miRNAs, 25 were novel and 242 were conserved miRNAs. Statistical analyses of miRNA profiles revealed both shared and species-specific miRNAs, which presumably affect flower development and phenotypes. Conclusions: In this study, we constructed a Rose miRNA and transcriptome database, and we analyzed the miRNAs and transcriptome generated from the flower tissues of four Rosa cultivars. The database provides a
Littlewood, D T; Johnston, D A
Partial 28S ribosomal RNA (rRNA) gene sequences, including the variable domains D1, D2 and D3, were determined for representative species from the 4 Schistosoma species groups. On an alignment of 1345 bp from S. mansoni, S. haematobium, S. spindale and S. japonicum (with Heterobilharzia americana chosen as an outgroup), both maximum likelihood and maximum parsimony analyses provide a robust molecular phylogeny for the genus; ((((S. haematobium, S. spindale), S. mansoni), S. japonicum), H. americana). When analysed separately, both domain D1 and domain D2 yielded similarly informative data whereas D3 failed to resolve the phylogeny. These results confirm a phylogeny previously suggested by 18S rRNA gene sequences, corroborating the status of S. spindale as a sister taxon to S. haematobium, and demonstrate the utility of 28S rRNA gene sequence data for resolving phylogenies within the Schistosomatidae.
Title: Random Coding Bounds for DNA Codes Based on Fibonacci Ensembles of DNA Sequences. Dates covered: 6 Jul 08 to 11 Jul 08. ...sequences which are generalizations of the Fibonacci sequences. Subject terms: DNA Codes, Fibonacci Ensembles, DNA Computing, Code Optimization.
Background: MicroRNAs (miRNAs) are small, non-coding RNAs that regulate gene expression and play a critical role in development, homeostasis, and disease. Despite their demonstrated roles in age-associated pathologies, little is known about the role of miRNAs in human aging and longevity. Results: We employed massively parallel sequencing technology to identify miRNAs expressed in B-cells from Ashkenazi Jewish centenarians (i.e., those living to a hundred, a human model of exceptional longevity) and from younger controls without a family history of longevity. With data from 26.7 million reads comprising 9.4 × 10^8 bp from 3 centenarian and 3 control individuals, we discovered a total of 276 known miRNAs and 8 unknown miRNAs ranging over several orders of magnitude in expression levels, a typical characteristic of saturated miRNA-sequencing. A total of 22 miRNAs were found to be significantly upregulated, with only 2 miRNAs downregulated, in centenarians as compared to controls. Gene Ontology analysis of the predicted and validated targets of the 24 differentially expressed miRNAs indicated enrichment of functional pathways involved in cell metabolism, cell cycle, cell signaling, and cell differentiation. A cross-sectional expression analysis of the differentially expressed miRNAs in B-cells from Ashkenazi Jewish individuals between the 50th and 100th years of age indicated that expression levels of miR-363* declined significantly with age. Centenarians, however, maintained the youthful expression level. This result suggests that miR-363* may be a candidate longevity-associated miRNA. Conclusion: Our comprehensive miRNA data provide a resource for further studies to identify genetic pathways associated with aging and longevity in humans.
Huang, Sai; Feng, Cong; Chen, Li; Huang, Zhi; Zhou, Xuan; Li, Bei; Wang, Li-Li; Chen, Wei; Lv, Fa-Qin; Li, Tan-Shi
BACKGROUND This study aimed to uncover the molecular mechanisms underlying mild and severe pneumonia by use of mRNA sequencing (RNA-seq). MATERIAL AND METHODS RNA was extracted from the peripheral blood of patients with mild pneumonia, severe pneumonia, and healthy controls. Sequencing was performed on the HiSeq4000 platform. After filtering, clean reads were mapped to the human reference genome hg19. Differentially expressed genes (DEGs) were identified between the control group and the mild or severe group. A transcription factor-gene network was constructed for each group. Biological process (BP) terms enriched by DEGs in the network were analyzed and these genes were also mapped to the Connectivity map to search for small-molecule drugs. RESULTS A total of 199 and 560 DEGs were identified from the mild group and severe group, respectively. A transcription factor-gene network consisting of 215 nodes and another network consisting of 451 nodes were constructed in the mild group and severe group, respectively, and 54 DEGs (e.g., S100A9 and S100A12) were found to be common, with consistent differential expression changes in the 2 groups. Genes in the transcription factor-gene network for the mild group were mainly enriched in 13 BP terms, especially defense and inflammatory response (e.g., S100A8) and spermatogenesis, while the top BP terms enriched by genes in the severe group include response to oxidative stress (CCL5), wound healing, and regulation of cell differentiation (CCL5), and of the cellular protein metabolic process. CONCLUSIONS S100A9 and S100A12 may have a role in the pathogenesis of pneumonia: S100A9 and CXCL1 may contribute solely in mild pneumonia, and CCL5 and CXCL11 may contribute in severe pneumonia.
Heather P McLaughlin
Coxiella burnetii is a human pathogen that causes the serious zoonotic disease Q fever. It is ubiquitous in the environment and due to its wide host range, long-range dispersal potential and classification as a bioterrorism agent, this microorganism is considered an HHS Select Agent. In the event of an outbreak or intentional release, laboratory strain typing methods can contribute to epidemiological investigations, law enforcement investigation and the public health response by providing critical information about the relatedness between C. burnetii isolates collected from different sources. Laboratory cultivation of C. burnetii is both time-consuming and challenging. Availability of strain collections is often limited and while several strain typing methods have been described over the years, a true gold-standard method is still elusive. Building upon epidemiological knowledge from limited, historical strain collections and typing data is essential to more accurately infer C. burnetii phylogeny. Harmonization of auspicious high-resolution laboratory typing techniques is critical to support epidemiological and law enforcement investigation. The single nucleotide polymorphism (SNP)-based genotyping approach offers simplicity, rapidity and robustness. Herein, we demonstrate that SNPs identified within 16S rRNA gene sequences can differentiate C. burnetii strains. Using this method, 55 isolates were assigned to six groups based on six polymorphisms. These 16S rRNA SNP-based genotyping results were largely congruent with those obtained by analyzing restriction-endonuclease (RE)-digested DNA separated by SDS-PAGE and by the high-resolution approach based on SNPs within multispacer sequence typing (MST) loci. The SNPs identified within the 16S rRNA gene can be used as targets for the development of additional SNP-based genotyping assays for C. burnetii.
In females, X chromosome inactivation (XCI) is an epigenetic, gene dosage compensatory mechanism that inactivates one copy of the X chromosome in each cell. Random XCI of one of the parental chromosomes results in an approximately equal proportion of cells expressing alleles from either the maternally or paternally inherited active X, and is defined by the XCI ratio. A skewed XCI ratio is suggestive of non-random inactivation, which can play an important role in X-linked genetic conditions. Current methods rely on an indirect, semi-quantitative DNA methylation-based assay to estimate the XCI ratio. Here we report a direct approach to estimate the XCI ratio by integrated, family-trio based whole-exome and mRNA sequencing using phase-by-transmission of alleles coupled with allele-specific expression analysis. We applied this method to in silico data and to a clinical patient with mild cognitive impairment but no clear diagnosis or understanding of the molecular mechanism underlying the phenotype. Simulation showed that phased and unphased heterozygous allele expression can be used to estimate the XCI ratio. Segregation analysis of the patient's exome uncovered a de novo, interstitial, 1.7 Mb deletion on Xp22.31 that originated on the paternally inherited X and has previously been associated with a heterogeneous neurological phenotype. Phased, allelic expression data suggested an 83:20 moderately skewed XCI that favored the expression of the maternally inherited, cytogenetically normal X and suggested that the deleterious effect of the de novo event on the paternal copy may be offset by skewed XCI that favors expression of the wild-type X. This study shows the utility of an integrated sequencing approach in XCI ratio estimation.
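As a rough illustration of the allele-specific expression step described above, the XCI ratio can be approximated by pooling read counts over phased heterozygous X-linked sites. The sketch below is not the authors' pipeline: the function name `xci_ratio`, the input layout, and the read counts are all hypothetical and chosen only to show the arithmetic.

```python
def xci_ratio(sites):
    """Estimate an XCI ratio from phased allele-specific read counts.

    `sites` is a list of (maternal_reads, paternal_reads) tuples, one per
    phased heterozygous X-linked site (a hypothetical input layout).
    Returns the percentage of reads from the maternal and paternal X.
    """
    maternal = sum(m for m, _ in sites)
    paternal = sum(p for _, p in sites)
    total = maternal + paternal
    if total == 0:
        raise ValueError("no informative reads")
    return 100.0 * maternal / total, 100.0 * paternal / total


# Made-up counts for three phased sites, for illustration only.
example_sites = [(40, 10), (30, 10), (10, 0)]
maternal_pct, paternal_pct = xci_ratio(example_sites)
print(f"{maternal_pct:.0f}:{paternal_pct:.0f}")
```

Pooling reads across sites weights highly expressed genes more heavily; a per-gene median would be one alternative if a few genes dominate the counts.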
Dias, M M; Cánovas, A; Mantilla-Rojas, C; Riley, D G; Luna-Nevarez, P; Coleman, S J; Speidel, S E; Enns, R M; Islas-Trejo, A; Medrano, J F; Moore, S S; Fortes, M R S; Nguyen, L T; Venus, B; Diaz, I S D P; Souza, F R P; Fonseca, L F S; Baldi, F; Albuquerque, L G; Thomas, M G; Oliveira, H N
Fertility traits, such as heifer pregnancy, are economically important in cattle production systems and are therefore used in genetic selection programs. The aim of this study was to identify single nucleotide polymorphisms (SNPs) using RNA-sequencing (RNA-Seq) data from ovary, uterus, endometrium, pituitary gland, hypothalamus, liver, longissimus dorsi muscle, and adipose tissue in 62 candidate genes associated with heifer puberty in cattle. RNA-Seq reads were assembled to the bovine reference genome (UMD 3.1.1) and analyzed in five cattle breeds: Brangus, Brahman, Nellore, Angus, and Holstein. Two approaches used the Brangus data for SNP discovery: 1) pooling all samples, and 2) within each individual sample. These approaches revealed 1157 SNPs. These were compared with those identified in the pooled samples of the other breeds. Overall, 172 SNPs within 13 genes (CPNE5, FAM19A4, FOXN4, KLF1, LOC777593, MGC157266, NEBL, NRXN3, PEPT-1, PPP3CA, SCG5, TSG101, and TSHR) were concordant in the five breeds. Using Ensembl's Variant Effect Predictor, we determined that 12% of SNPs were in exons (71% synonymous, 29% nonsynonymous), 1% were in untranslated regions (UTRs), 86% were in introns, and 1% were in intergenic regions. Since these SNPs were discovered in RNA, the variants were predicted to be within exons or UTRs. Overall, 160 novel transcripts in 42 candidate genes and five novel genes overlapping five candidate genes were observed. In conclusion, 1157 SNPs were identified in 62 candidate genes associated with puberty in Brangus cattle, of which 172 were concordant in the five cattle breeds. Novel transcripts and genes were also identified.
Gim, Jungsoo; Won, Sungho; Park, Taesung
RNA-Sequencing (RNA-Seq) provides valuable information for characterizing the molecular nature of the cells, in particular, identification of differentially expressed transcripts on a genome-wide scale. Unfortunately, cost and limited specimen availability often lead to studies with small sample sizes, and hypothesis testing on differential expression between classes with a small number of samples is generally limited. The problem is especially challenging when only one sample per each class exists. In this case, only a few methods among many that have been developed are applicable for identifying differentially expressed transcripts. Thus, the aim of this study was to develop a method able to accurately test differential expression with a limited number of samples, in particular non-replicated samples. We propose a local-pooled-error method for RNA-Seq data (LPEseq) to account for non-replicated samples in the analysis of differential expression. Our LPEseq method extends the existing LPE method, which was proposed for microarray data, to allow examination of non-replicated RNA-Seq experiments. We demonstrated the validity of the LPEseq method using both real and simulated datasets. By comparing the results obtained using the LPEseq method with those obtained from other methods, we found that the LPEseq method outperformed the others for non-replicated datasets, and showed a similar performance with replicated samples; LPEseq consistently showed high true discovery rate while not increasing the rate of false positives regardless of the number of samples. Our proposed LPEseq method can be effectively used to conduct differential expression analysis as a preliminary design step or for investigation of a rare specimen, for which a limited number of samples is available.
Ames, E G; Lawson, M J; Mackey, A J; Holmes, J W
Cardiac hypertrophy has been well-characterized at the level of transcription. During cardiac hypertrophy, genes normally expressed primarily during fetal heart development are re-expressed, and this fetal gene program is believed to be a critical component of the hypertrophic process. Recently, alternative splicing of mRNA transcripts has been shown to be temporally regulated during heart development, leading us to consider whether fetal patterns of splicing also reappear during hypertrophy. We hypothesized that patterns of alternative splicing occurring during heart development are recapitulated during cardiac hypertrophy. Here we present a study of isoform expression during pressure-overload cardiac hypertrophy induced by 10 days of transverse aortic constriction (TAC) in rats and in developing fetal rat hearts compared to sham-operated adult rat hearts, using high-throughput sequencing of poly(A) tail mRNA. We find a striking degree of overlap between the isoforms expressed differentially in fetal and pressure-overloaded hearts compared to control: forty-four percent of the isoforms with significantly altered expression in TAC hearts are also expressed at significantly different levels in fetal hearts compared to control (Phypertrophy and fetal heart development are significantly enriched for genes involved in cytoskeletal organization, RNA processing, developmental processes, and metabolic enzymes. Our data strongly support the concept that mRNA splicing patterns normally associated with heart development recur as part of the hypertrophic response to pressure overload. These findings suggest that cardiac hypertrophy shares post-transcriptional as well as transcriptional regulatory mechanisms with fetal heart development. Copyright © 2013 Elsevier Ltd. All rights reserved.
Kovalenko O. P.
Aim. Cloning and sequencing of the T. thermophilus leucyl-tRNA synthetase (LeuRSTT) gene, followed by the creation of a genetically engineered construct for protein expression in E. coli cells and its purification. Methods. Searching for the LeuRSTT gene was performed by Southern blot hybridization with chromosomal DNA, where digoxigenin-labeled PCR fragments of DNA were used as probes. Results. The gene of T. thermophilus HB27 leucyl-tRNA synthetase was cloned and sequenced. The open reading frame encodes a polypeptide chain of 878 amino acid residues in length (molecular mass 101 kDa). Comparison of the amino acid sequence of T. thermophilus LeuRS with that of the enzymes from other organisms showed that LeuRSTT was a part of the group of similar enzymes of prokaryotes, formed by the proteins of proteobacteria, rickettsia and mitochondria of eukaryotes. The resulting phylogenetic tree of LeuRSs reveals dichotomous branching into two lines: prokaryotic/eukaryotic mitochondrial and archaeal/eukaryotic cytosolic proteins. Differences between the prokaryotic and archaeal branches of the LeuRSs phylogenetic tree are primarily due to the structure of two domains of the enzyme: the editing and the C-terminal domains. T. thermophilus LeuRS was expressed in E. coli cells by cloning the corresponding gene into the pET29b vector. Conclusions. The cloned T. thermophilus leuS gene and expressed recombinant protein will be used for structural and functional studies on LeuRSTT, including X-ray analysis of the enzyme and its mutant forms in complex with different substrates.
Rafalski, J A; Wiewiorowski, M; Söll, D
Genomic blots of yellow lupin (Lupinus luteus) DNA digested with restriction nucleases and probed with 32P-labelled Lupinus 5S RNA reveal that 5S DNA is organized as tandemly repeated sequences of one size class, 342 bp. The DNA is extensively methylated. Two cloned BamHI ribosomal repeats were sequenced, revealing sequence divergence within both the coding and spacer regions. PMID:7155897
Manaka, Akihiro; Tokue, Yutaka; Murakami, Masami
Nosocomial infection is one of the most common complications within health care facilities. Certain studies have reported outbreaks resulting from contaminated hospital environments. Although the identification of bacteria in the environment can readily be achieved using culturing methods, these methods detect only live bacteria. Sequencing of the 16S ribosomal RNA (16S rRNA) gene is recognized to be effective for bacterial identification. In this study, we surveyed wards where drug-resistant bacteria had been isolated and compared conventional culture methods with 16S rRNA gene sequencing methods. Samples were collected using sterile swabs from two wards (northern and southern) at Gunma University Hospital contaminated by Acinetobacter sp. We extracted DNA directly from the swabs. Following extraction, the DNA was amplified using polymerase chain reaction (PCR). The PCR products were cloned using a plasmid vector. The plasmid DNA was sequenced, and identification was performed using a database. The 16S rRNA gene sequence analyses were compared with conventional culture methods. In the northern ward, Acinetobacter sp. was detected in only two of 14 samples using the culture method. In contrast, 16S rRNA gene sequencing analysis detected Acinetobacter sp. in seven of 14 samples. Drug-resistant Acinetobacter sp. was isolated from bathrooms of the southern ward and was detected in four of seven samples using the culture method, in comparison with six of seven samples by 16S rRNA gene sequencing analysis. Molecular biological analysis showed a higher sensitivity for detecting specific bacteria and detected a greater number of species than the culture method. Our results suggest that 16S rRNA gene sequencing analysis is useful for identifying the range of contamination that is not found by conventional culture methods. When a nosocomial outbreak cannot be adequately controlled, molecular biological analysis may serve as a useful tool for environmental surveys in hospitals.
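The detection counts reported above can be put side by side as rates. This is only a worked restatement of the figures in the abstract (2/14 and 7/14 in the northern ward, 4/7 and 6/7 in the southern ward); the dictionary layout and function name are illustrative choices.

```python
# Positive samples / total samples, as reported in the abstract above.
counts = {
    ("northern", "culture"): (2, 14),
    ("northern", "16S rRNA sequencing"): (7, 14),
    ("southern", "culture"): (4, 7),
    ("southern", "16S rRNA sequencing"): (6, 7),
}


def detection_rate(positive, total):
    """Fraction of swab samples in which Acinetobacter sp. was detected."""
    return positive / total


rates = {key: detection_rate(*value) for key, value in counts.items()}
for (ward, method), rate in rates.items():
    print(f"{ward:9s} {method:20s} {rate:.1%}")
```

With sample sizes this small the rates are descriptive only; no significance test is implied here.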
Gådin, Jesper R.; van't Hooft, Ferdinand M.; Eriksson, Per
Background: One aspect in which RNA sequencing is more valuable than microarray-based methods is the ability to examine the allelic imbalance of the expression of a gene. This process is often a complex task that entails quality control, alignment, and the counting of reads over heterozygous single-nucleotide polymorphisms. Allelic imbalance analysis is subject to technical biases, due to differences in the sequences of the measured alleles. Flexible bioinformatics tools are needed to ease the workflow while retaining as much RNA sequencing information as possible throughout the analysis to detect and address the possible biases. Results: We present AllelicImbalance, a software program that is designed to detect, manage, and visualize allelic imbalances comprehensively. The purpose of this software is to allow users to pose genetic questions in any RNA sequencing experiment quickly, enhancing the general utility...
Demarcating species of prokaryotes remains difficult. The 16S rRNA gene sequence provides a phylogenetic basis for classification, and it is widely accepted that more than 97% similarity in 16S rRNA gene sequence defines a prokaryotic species. However, this criterion does not correspond to real ecological units and thus cannot reveal functional diversity in nature. Interaction with the environment is defined at the level of functional genes, not the 16S rRNA gene. Protein-coding gene sequences can be expected to disclose many previously unknown ecological populations of prokaryotes, because these are the genes that determine the role of the species. Sequence similarity in multiple protein-coding genes is recommended as a primary criterion for demarcating taxa.
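To make the 97% criterion concrete, the following minimal sketch computes percent identity between two pre-aligned sequences and applies the threshold. It assumes the sequences are already aligned to equal length with `-` as the gap character, and it skips gapped columns, which is only one of several conventions for counting identity. The function names and the toy sequences are hypothetical.

```python
def percent_identity(seq_a, seq_b):
    """Percent identity over aligned columns, skipping columns with a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue  # gapped column: not counted either way
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / compared


def same_species_by_16s(seq_a, seq_b, threshold=97.0):
    """Apply the 'more than 97% similarity' rule of thumb."""
    return percent_identity(seq_a, seq_b) > threshold


# Toy aligned fragments (not real 16S sequences).
print(percent_identity("ACGTACGTAC", "ACGTACGTAA"))
print(same_species_by_16s("ACGTACGTAC", "ACGTACGTAA"))
```

The same function could be applied to several protein-coding genes in turn, which is closer to the multi-gene criterion the abstract recommends.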
The dinoflagellate Karenia mikimotoi forms blooms in the coastal waters of temperate regions and occasionally causes massive fish and invertebrate mortality. This study aimed to elucidate the toxic effect of K. mikimotoi on marine organisms by using the genomics approach; RNA-sequence libraries were constructed, and data were analyzed to identify toxin-related genes. Next-generation sequencing produced 153,406 transcript contigs from the axenic culture of K. mikimotoi. BLASTX analysis against all assembled contigs revealed that 208 contigs were polyketide synthase (PKS) sequences. Thus, K. mikimotoi was thought to have several genes encoding PKS metabolites and to likely produce toxin-like polyketide molecules. Of all the sequences, approximately 30 encoded eight PKS genes, which were remarkably similar to those of Karenia brevis. Our phylogenetic analyses showed that these genes belonged to a new group of PKS type-I genes. Phylogenetic and active domain analyses showed that the amino acid sequence of four among eight Karenia PKS genes was not similar to any of the reported PKS genes. These PKS genes might possibly be associated with the synthesis of polyketide toxins produced by Karenia species. Further, a homology search revealed 10 contigs that were similar to a toxin gene responsible for the synthesis of saxitoxin (sxtA) in the toxic dinoflagellate Alexandrium fundyense. These contigs encoded A1-A3 domains of sxtA genes. Thus, this study identified some transcripts in K. mikimotoi that might be associated with several putative toxin-related genes. The findings of this study might help understand the mechanism of toxicity of K. mikimotoi and other dinoflagellates.
Jenjaroenpun, Piroon; Kremenska, Yuliya; Nair, Vrundha M; Kremenskoy, Maksym; Joseph, Baby; Kurochkin, Igor V
Exosomes are nanosized (30-100 nm) membrane vesicles secreted by most cell types. Exosomes have been found to contain various RNA species including miRNA, mRNA and long non-protein coding RNAs. A number of cancer cells produce elevated levels of exosomes. Because exosomes have been isolated from most body fluids they may provide a source for non-invasive cancer diagnostics. Transcriptome profiling that uses deep-sequencing technologies (RNA-Seq) offers an enormous amount of data that can be used for biomarker discovery; however, in the case of exosomes this approach has been applied only to the analysis of small RNAs. In this study, we utilized RNA-Seq technology to analyze RNAs present in microvesicles secreted by human breast cancer cell lines. Exosomes were isolated from the media conditioned by two human breast cancer cell lines, MDA-MB-231 and MDA-MB-436. Exosomal RNA was profiled using the Ion Torrent semiconductor chip-based technology. Exosomes were found to contain various classes of RNA with the major class represented by fragmented ribosomal RNA (rRNA), in particular 28S and 18S rRNA subunits. Analysis of exosomal RNA content revealed that it reflects the RNA content of the donor cells. Although exosomes produced by the two cancer cell lines shared most of the RNA species, there were a number of non-coding transcripts unique to MDA-MB-231 and MDA-MB-436 cells. This suggests that RNA analysis might distinguish exosomes produced by a low metastatic breast cancer cell line (MDA-MB-436) from those produced by a highly metastatic breast cancer cell line (MDA-MB-231). The analysis of gene ontologies (GOs) associated with the most abundant transcripts present in exosomes revealed significant enrichment in genes encoding proteins involved in translation and rRNA and ncRNA processing. These GO terms indicate the most highly expressed genes for both cellular and exosomal RNA. For the first time, using RNA-seq, we examined the transcriptomes of exosomes secreted by human breast cancer cells. We
Jiang, Jinjian; Wang, Nian; Chen, Peng; Zheng, Chunhou; Wang, Bing
Hotspot residues are important in the determination of protein-protein interactions, and they always perform specific functions in biological processes. Hotspot residues are commonly determined by alanine scanning mutagenesis experiments, which are costly and time consuming. To address this issue, computational methods have been developed. Most of them are structure based, i.e., they use the information of solved protein structures. However, the number of solved protein structures is far smaller than that of sequences. Moreover, almost all of the predictors identify hotspots from the interfaces of protein complexes, seldom from whole protein sequences. Therefore, determining hotspots from whole protein sequences by sequence information alone is urgent. To address the issue of hotspot prediction from the whole sequences of proteins, we proposed an ensemble system with random projections using statistical physicochemical properties of amino acids. First, an encoding scheme involving sequence profiles of residues and physicochemical properties from the AAindex1 dataset was developed. Then, the random projection technique was adopted to project the encoding instances into a reduced space. Next, several better random projections were obtained by training an IBk classifier on the training dataset, and these were then applied to the test dataset. The ensemble of random projection classifiers was thus obtained. Experimental results showed that although the performance of our method is not good enough for real applications of hotspots, it is very promising in the determination of hotspot residues from whole sequences. PMID:28718782
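The sequence of steps in this abstract (encode, project at random, classify with a nearest-neighbour learner, keep the better projections, combine them) can be illustrated with a small sketch. It is not the authors' implementation: the Gaussian projection matrix, the 1-nearest-neighbour stand-in for IBk, the held-out validation set used to rank projections, the majority vote, and every function name are assumptions made for illustration.

```python
import random

def project(rows, matrix):
    """Multiply each feature row by a projection matrix (n_features x dim)."""
    dim = len(matrix[0])
    return [[sum(v * m[k] for v, m in zip(row, matrix)) for k in range(dim)]
            for row in rows]

def nn_predict(train_x, train_y, queries):
    """1-nearest-neighbour prediction by squared Euclidean distance."""
    preds = []
    for q in queries:
        dists = [sum((a - b) ** 2 for a, b in zip(q, t)) for t in train_x]
        preds.append(train_y[dists.index(min(dists))])
    return preds

def fit_rp_ensemble(x, y, val_x, val_y, n_proj=20, dim=5, keep=5, seed=0):
    """Draw n_proj random projections and keep the `keep` best ones,
    ranked by nearest-neighbour accuracy on the validation set."""
    rng = random.Random(seed)
    scored = []
    for _ in range(n_proj):
        matrix = [[rng.gauss(0.0, 1.0) for _ in range(dim)]
                  for _ in range(len(x[0]))]
        preds = nn_predict(project(x, matrix), y, project(val_x, matrix))
        acc = sum(p == t for p, t in zip(preds, val_y)) / len(val_y)
        scored.append((acc, matrix))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [matrix for _, matrix in scored[:keep]]

def predict_rp_ensemble(projections, x, y, queries):
    """Majority vote over the retained projections (labels must be 0 or 1)."""
    votes = [nn_predict(project(x, m), y, project(queries, m))
             for m in projections]
    return [int(2 * sum(col) >= len(col)) for col in zip(*votes)]
```

In this sketch each retained projection acts as one ensemble member; a real hotspot predictor would replace the toy feature rows with the profile and AAindex1-derived encoding described in the abstract.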
Zhang, Boyu; Yehdego, Daniel T; Johnson, Kyle L; Leung, Ming-Ying; Taufer, Michela
Ribonucleic acid (RNA) molecules play important roles in many biological processes including gene expression and regulation. Their secondary structures are crucial for the RNA functionality, and the prediction of the secondary structures is widely studied. Our previous research shows that cutting long sequences into shorter chunks, predicting secondary structures of the chunks independently using thermodynamic methods, and reconstructing the entire secondary structure from the predicted chunk structures can yield better accuracy than predicting the secondary structure using the RNA sequence as a whole. The chunking, prediction, and reconstruction processes can use different methods and parameters, some of which produce more accurate predictions than others. In this paper, we study the prediction accuracy and efficiency of three different chunking methods using seven popular secondary structure prediction programs that apply to two datasets of RNA with known secondary structures, which include both pseudoknotted and non-pseudoknotted sequences, as well as a family of viral genome RNAs whose structures have not been predicted before. Our modularized MapReduce framework based on Hadoop allows us to study the problem in a parallel and robust environment. On average, the maximum accuracy retention values are larger than one for our chunking methods and the seven prediction programs over 50 non-pseudoknotted sequences, meaning that the secondary structure predicted using chunking is more similar to the real structure than the secondary structure predicted by using the whole sequence. We observe similar results for the 23 pseudoknotted sequences, except for the NUPACK program using the centered chunking method. The performance analysis for 14 long RNA sequences from the Nodaviridae virus family outlines how the coarse-grained mapping of chunking and predictions in the MapReduce framework exhibits shorter turnaround times for short RNA sequences. However, as the lengths of
Zou, Quan; Hu, Qinghua; Guo, Maozu; Wang, Guohua
Multiple sequence alignment (MSA) is important work, but bottlenecks arise in the massive MSA of homologous DNA or genome sequences. Most of the available state-of-the-art software tools cannot address large-scale datasets, or they run rather slowly. The similarity of homologous DNA sequences is often ignored. Lack of parallelization is still a challenge for MSA research. We developed two software tools to address the DNA MSA problem. The first employed trie trees to accelerate the centre star MSA strategy. The expected time complexity was reduced from quadratic to linear. To address large-scale data, parallelism was applied using the Hadoop platform. Experiments demonstrated the performance of our proposed methods, including their running time, sum-of-pairs scores and scalability. Moreover, we supplied two massive DNA/RNA MSA datasets for further testing and research. © The Author 2015. Published by Oxford University Press.
de Mooij, Tristan; McCutcheon, Brandon A; Leontovich, Alexey A; Parney, Ian F
Glioblastoma (GBM) subverts the immune system toward immunosuppression through transfer of oncogenic components, including noncoding RNAs, via extracellular vesicles (EVs). GBM-derived EVs are detectable in blood and cerebrospinal fluid (CSF) and may one day serve as liquid biopsies. EVs of 5 adult GBM cell lines generated at our institution from primary GBMs were harvested after serum-free culture and purified via sequential centrifugation. RNA was collected using the miRNAeasy kit (Qiagen) and sequenced with Illumina HiSeq 2000. Data were analyzed using the OASIS-2.0 platform using HG38. MirTarBase and MirDB interrogated validated and predicted microRNA-gene interactions. MDS and hierarchical clustering demonstrated dBT165 as an outlier. Seven hundred twelve of 28 623 interrogated noncoding RNAs were found to be expressed, most of which were not yet associated with GBM EVs. Samples contained miRNAs, piRNAs, snoRNAs, snRNAs, and rRNAs. Hsa-miR-21-5p, hsa-let-7b-5p, hsa-miR-3182, hsa-miR-4448, hsa-let-7i-5p constituted highest overall expression. Hsa-miR-21-5p demonstrated a distinct pathophysiological role, even in hsa-miR-21-5p-low samples, through its unique target signature. Top genes targeted by all miRNAs per sample were highly conserved and specific for cell cycle, PI3K/Akt signaling, p53 and Glioma-curated KEGG pathways. MirDeep2 predicted 4 robustly expressed novel microRNAs with mature sequences aguggggccacgagcugaga, uagugguuaguacccugccuug, acagauugagagcucuuuu, and auuucuguccaacuucug. Predicted gene targets of novel miRNAs closely overlapped those of validated miRNAs. Hsa-miR-5585-3p (P = .04, fold-change: 2.8), hsa_piR_020365 (P = .016, fold-change: 7, 17th highest overall expression) and hsa_piR_008624 (P = .017, fold-change: 3.6) were predictive of <2 years survival. To our knowledge, this is the first report detailing small RNA sequencing of GBM EVs. We demonstrated the expression of a multitude of previously unassociated noncoding RNAs, including 4
We investigate a population of binary mistake sequences that result from learning with parametric models of different order. We obtain estimates of their error, algorithmic complexity and divergence from a purely random Bernoulli sequence. We study the relationship of these variables to the learner's information density parameter which is defined as the ratio between the lengths of the compressed to uncompressed files that contain the learner's decision rule. The results indicate that good learners have a low information density ρ while bad learners have a high ρ. Bad learners generate mistake sequences that are atypically complex or diverge stochastically from a purely random Bernoulli sequence. Good learners generate typically complex sequences with low divergence from Bernoulli sequences and they include mistake sequences generated by the Bayes optimal predictor. Based on the static algorithmic interference model of  the learner here acts as a static structure which "scatters" the bits of an input sequence (to be predicted) in proportion to its information density ρ thereby deforming its randomness characteristics.
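The information density ρ is defined above as the ratio of the compressed to the uncompressed length of the file holding the learner's decision rule. A minimal sketch of that ratio follows; the choice of zlib as the compressor and the function name are assumptions, since the abstract does not say which compressor was used.

```python
import os
import zlib

def information_density(data: bytes) -> float:
    """Compressed length divided by uncompressed length (an estimate of rho)."""
    if not data:
        raise ValueError("empty input has no defined density")
    return len(zlib.compress(data, 9)) / len(data)

# A highly regular rule file compresses well, giving a low ratio.
regular = b"01" * 5000
# Random bytes are essentially incompressible, giving a ratio near 1.
noisy = os.urandom(10000)

print(information_density(regular), information_density(noisy))
```

Under this reading, a low ratio corresponds to the "good learner" regime described in the abstract and a ratio near one to the "bad learner" regime, although the absolute values depend on the compressor chosen.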
Davide De Lucrezia
Are extant proteins the exquisite result of natural selection or are they random sequences slightly edited by evolution? This question has puzzled biochemists for a long time, and several groups have addressed the issue by comparing natural protein sequences to completely random ones, reaching contradictory conclusions. Previous works in the literature focused on the analysis of primary structure in an attempt to identify possible signatures of evolutionary editing. Conversely, in this work we compare a set of 762 natural proteins with an average length of 70 amino acids and an equal number of completely random ones of comparable length on the basis of their structural features. We use an ad hoc Evolutionary Neural Network Algorithm (ENNA) in order to assess whether and to what extent natural proteins are edited from random polypeptides, employing 11 different structure-related variables (i.e. net charge, volume, surface area, coil, alpha helix, beta sheet, percentage of coil, percentage of alpha helix, percentage of beta sheet, percentage of secondary structure and surface hydrophobicity). The ENNA algorithm is capable of correctly distinguishing natural proteins from random ones with an accuracy of 94.36%. Furthermore, we study the structural features of 32 random polypeptides misclassified as natural ones to unveil any structural similarity to natural proteins. Results show that random proteins misclassified by the ENNA algorithm exhibit a significant fold similarity to portions or subdomains of extant proteins at atomic resolution. Altogether, our results suggest that natural proteins are significantly edited from random polypeptides and that evolutionary editing can be readily detected by analyzing structural features. Furthermore, we also show that the ENNA, employing simple structural descriptors, can predict whether a protein chain is natural or random.
Locati, Mauro D; Pagano, Johanna F B; Ensink, Wim A; van Olst, Marina; van Leeuwen, Selina; Nehrdich, Ulrike; Zhu, Kongju; Spaink, Herman P; Girard, Geneviève; Rauwerda, Han; Jonker, Martijs J; Dekker, Rob J; Breit, Timo M
5S rRNA is a ribosomal core component, transcribed from many gene copies organized in genomic repeats. Some eukaryotic species have two 5S rRNA types defined by their predominant expression in oogenesis or adult tissue. Our next-generation sequencing study on zebrafish egg, embryo, and adult tissue identified maternal-type 5S rRNA that is exclusively accumulated during oogenesis, replaced throughout the embryogenesis by a somatic-type, and thus virtually absent in adult somatic tissue. The maternal-type 5S rDNA contains several thousands of gene copies on chromosome 4 in tandem repeats with small intergenic regions, whereas the somatic-type is present in only 12 gene copies on chromosome 18 with large intergenic regions. The nine-nucleotide variation between the two 5S rRNA types likely affects TFIII binding and riboprotein L5 binding, probably leading to storage of maternal-type rRNA. Remarkably, these sequence differences are located exactly at the sequence-specific target site for genome integration by the 5S rRNA-specific Mutsu retrotransposon family. Thus, we could define maternal- and somatic-type MutsuDr subfamilies. Furthermore, we identified four additional maternal-type and two new somatic-type MutsuDr subfamilies, each with their own target sequence. This target-site specificity, frequently intact maternal-type retrotransposon elements, plus specific presence of Mutsu retrotransposon RNA and piRNA in egg and adult tissue, suggest an involvement of retrotransposons in achieving the differential copy number of the two types of 5S rDNA loci. © 2017 Locati et al.; Published by Cold Spring Harbor Laboratory Press for the RNA Society.
Hydrocarbonoclastic bacteria (HCB) play a key role in the biodegradation of oil hydrocarbons in marine and other environments. A small number of taxa have been identified as obligate HCB, notably the Gammaproteobacterial genera Alcanivorax, Cycloclasticus, Marinobacter, Neptumonas, Oleiphilus, Oleispira, and Thalassolituus, as well as the Alphaproteobacterial genus Thalassospira. Detection of HCB in amplicon-based sequencing surveys relies on high coverage by PCR primers and accurate taxonomic classification. In this study, we performed a phylogenetic analysis to identify 16S rRNA gene sequence regions that represent the breadth of sequence diversity within these taxa. Using validated sequences, we evaluated 449 universal 16S rRNA gene-targeted bacterial PCR primer pairs for their coverage of these taxa. The results of this analysis provide a practical framework for selection of suitable primer sets for optimal detection of HCB in sequencing surveys.
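Primer-pair coverage of the kind evaluated here can be thought of as the fraction of reference sequences that both primers match. The sketch below shows one simple way to compute such a fraction, assuming exact matching with IUPAC degenerate bases and no mismatch tolerance; the function names and the toy primers in the test data are hypothetical, and the study's actual evaluation procedure is not described in the abstract.

```python
import re

# IUPAC nucleotide codes expanded to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")

def primer_regex(primer):
    """Compile a (possibly degenerate) primer into a regular expression."""
    return re.compile("".join(IUPAC[base] for base in primer.upper()))

def reverse_complement(seq):
    """Reverse complement that also handles degenerate bases."""
    return seq.upper().translate(COMPLEMENT)[::-1]

def pair_coverage(forward, reverse, sequences):
    """Fraction of sequences where the forward primer matches and the
    reverse primer's binding site is found downstream of that match."""
    fwd = primer_regex(forward)
    rev = primer_regex(reverse_complement(reverse))
    hits = 0
    for seq in sequences:
        seq = seq.upper()
        match = fwd.search(seq)  # only the first forward site is considered
        if match and rev.search(seq, match.end()):
            hits += 1
    return hits / len(sequences)
```

A realistic evaluation would usually also allow a small number of mismatches and weight them by position, which this sketch deliberately leaves out.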
De Smet, Wim; De Loof, Karel; De Vos, Paul; Dawyndt, Peter; De Baets, Bernard
StrainInfo has augmented its type strain and species/subspecies passports with a recommendation for a high-quality 16S rRNA gene sequence available from the public sequence databases. These recommendations are generated by an automated pipeline that collects all candidate 16S rRNA gene sequences for a prokaryotic type strain, filters out low-quality sequences and retains a high-quality sequence from the remaining pool. Due to thorough automation, recommendations can be renewed daily using the latest updates of the public sequence databases and the latest species descriptions. We discuss the quality criteria constructed to filter and rank available 16S rRNA gene sequences, and show how a partially ordered set (poset) ranking algorithm can be applied to solve the multi-criteria ranking problem of selecting the best candidate sequence. The proof of concept of the recommender system is validated by comparing the results of automated selection with an expert selection made in the All-Species Living Tree Project. Based on these validation results, the pipeline may reliably be applied for non-type strains and developed further for the automated selection of housekeeping genes. Copyright © 2013 Elsevier GmbH. All rights reserved.
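The core idea of ranking in a partially ordered set is that one candidate outranks another only when it is at least as good on every criterion and strictly better on at least one; candidates that trade off against each other remain incomparable. A minimal sketch of selecting the undominated candidates follows. The two criteria used in the usage note (sequence length and fraction of unambiguous bases) and all names are hypothetical, and the pipeline's actual quality criteria and ranking algorithm are not reproduced here.

```python
def dominates(a, b):
    """True if score tuple a is at least as good as b on every criterion
    and strictly better on at least one (higher is better)."""
    return (all(x >= y for x, y in zip(a, b))
            and any(x > y for x, y in zip(a, b)))

def maximal_elements(candidates):
    """Names of candidates that no other candidate dominates."""
    return {
        name
        for name, score in candidates.items()
        if not any(dominates(other, score)
                   for other_name, other in candidates.items()
                   if other_name != name)
    }
```

For example, with the hypothetical scores `{"seqA": (1500, 0.99), "seqB": (1450, 1.00), "seqC": (1400, 0.95)}`, `seqC` is dominated by `seqA`, while `seqA` and `seqB` are incomparable and both remain as top candidates; a further tie-breaking rule would be needed to retain a single sequence.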
Iddo Z Ben-Dov
Urine is a potential source of biomarkers for diseases of the kidneys and urinary tract. RNA, including microRNA, is present in the urine enclosed in detached cells or in extracellular vesicles (EVs) or bound and protected by extracellular proteins. Detection of cell- and disease-specific microRNA in urine may aid early diagnosis of organ-specific pathology. In this study, we applied barcoded deep sequencing to profile microRNAs in urine of healthy volunteers, and characterized the effects of sex, urine fraction (cells vs. EVs) and repeated voids by the same individuals. Compared to urine-cell-derived small RNA libraries, urine-EV-derived libraries were relatively enriched with miRNA, and accordingly had lesser content of other small RNA such as rRNA, tRNA and sn/snoRNA. Unsupervised clustering of specimens in relation to miRNA expression levels showed prominent bundling by specimen type (urine cells or EVs) and by sex, as well as a tendency of repeated (first and second void) samples to neighbor closely. Likewise, miRNA profile correlations between void repeats, as well as fraction counterparts (cells and EVs) from the same specimen, were distinctly higher than correlations between miRNA profiles overall. Differential miRNA expression by sex was similar in cells and EVs. miRNA profiling of both urine EVs and sediment cells can convey biologically important differences between individuals. However, to be useful as urine biomarkers, careful consideration is needed for biofluid fractionation and sex-specific analysis, while the time of voiding appears to be less important.
Fuchs, T; Beier, D; Beier, H
Tobacco tRNA(Tyr) genes are mainly organized as a dispersed multigene family as shown by hybridization with a tRNA(Tyr)-specific probe to Southern blots of Eco RI-digested DNA. A Nicotiana genomic library was prepared by Eco RI digestion of nuclear DNA, ligation of the fragments into the vector lambda gtWES.lambda B and in vitro packaging. The phage library was screened with a 5'-labelled synthetic oligonucleotide complementary to nucleotides 18 to 37 of cytoplasmic tobacco tRNA(Tyr). Eleven hybridizing Eco RI fragments ranging in size from 1.7 to 7.5 kb were isolated from recombinant lambda phage and subcloned into pUC19 plasmid. Four of the sequenced tRNA(Tyr) genes code for the known tobacco tRNA1(Tyr) (G psi A) and seven code for tRNA2(Tyr) (G psi A). The two tRNA species differ in one nucleotide pair at the basis of the T psi C stem. Only one tRNA(Tyr) gene (pNtY5) contains a point mutation (T54-->A54). Comparison of the intervening sequences reveals that they differ considerably in length and sequence. Maturation of intron-containing pre-tRNAs was studied in HeLa and wheat germ extracts. All pre-tRNAs(Tyr)--with one exception--are processed and spliced in both extracts. The tRNA(Tyr) gene encoded by pNtY5 is transcribed efficiently in HeLa extract but processing of the pre-tRNA is impaired.
Graziadei, Andrea; Masiewicz, Pawel; Lapinaite, Audrone; Carlomagno, Teresa
RNA modifications confer complexity to the 4-nucleotide polymer; nevertheless, their exact function is mostly unknown. rRNA 2'-O-ribose methylation concentrates to ribosome functional sites and is important for ribosome biogenesis. The methyl group is transferred to rRNA by the box C/D RNPs: The rRNA sequence to be methylated is recognized by a complementary sequence on the guide RNA, which is part of the enzyme. In contrast to their eukaryotic homologs, archaeal box C/D enzymes can be assembled in vitro and are used to study the mechanism of 2'-O-ribose methylation. In Archaea, each guide RNA directs methylation to two distinct rRNA sequences, posing the question whether this dual architecture of the enzyme has a regulatory role. Here we use methylation assays and low-resolution structural analysis with small-angle X-ray scattering to study the methylation reaction guided by the sR26 guide RNA from Pyrococcus furiosus. We find that the methylation efficacy at sites D and D' differs substantially, with substrate D' turning over more efficiently than substrate D. This observation correlates well with structural data: The scattering profile of the box C/D RNP half-loaded with substrate D' is similar to that of the holo complex, which has the highest activity. Unexpectedly, the guide RNA secondary structure is not responsible for the functional difference at the D and D' sites. Instead, this difference is recapitulated by the nature of the first base pair of the guide-substrate duplex. We suggest that substrate turnover may occur through a zip mechanism that initiates at the 5'-end of the product. © 2016 Graziadei et al.; Published by Cold Spring Harbor Laboratory Press for the RNA Society.
Lemus-Minor, Carlos G; Cañizares, M Carmen; García-Pedrajas, María D; Pérez-Artés, Encarnación
A novel double-stranded RNA (dsRNA) mycovirus, designated Fusarium oxysporum f. sp. dianthi mycovirus 1 (FodV1), was isolated from a strain of the phytopathogenic fungus F. oxysporum f. sp. dianthi. The FodV1 genome had four dsRNA segments, designated, from the largest to the smallest one, dsRNA 1, 2, 3, and 4. Each one of these segments contained a single open reading frame (ORF). dsRNA 1 (3555 bp) and dsRNA 3 (2794 bp) encoded a putative RNA-dependent RNA polymerase (RdRp) and a putative coat protein (CP), respectively. dsRNA 2 (2809 bp) and dsRNA 4 (2646 bp) contained ORFs encoding hypothetical proteins (named P2 and P4, respectively) with unknown functions. Analysis of its genomic structure, homology searches of the deduced amino acid sequences, and phylogenetic analysis all indicated that FodV1 is a new member of the family Chrysoviridae. This is the first report of the complete genomic characterization of a mycovirus identified in the plant pathogen Fusarium oxysporum.
Lake, Blue B; Ai, Rizi; Kaeser, Gwendolyn E; Salathia, Neeraj S; Yung, Yun C; Liu, Rui; Wildberg, Andre; Gao, Derek; Fung, Ho-Lim; Chen, Song; Vijayaraghavan, Raakhee; Wong, Julian; Chen, Allison; Sheng, Xiaoyan; Kaper, Fiona; Shen, Richard; Ronaghi, Mostafa; Fan, Jian-Bing; Wang, Wei; Chun, Jerold; Zhang, Kun
The human brain has enormously complex cellular diversity and connectivities fundamental to our neural functions, yet difficulties in interrogating individual neurons has impeded understanding of the underlying transcriptional landscape. We developed a scalable approach to sequence and quantify RNA molecules in isolated neuronal nuclei from a postmortem brain, generating 3227 sets of single-neuron data from six distinct regions of the cerebral cortex. Using an iterative clustering and classification approach, we identified 16 neuronal subtypes that were further annotated on the basis of known markers and cortical cytoarchitecture. These data demonstrate a robust and scalable method for identifying and categorizing single nuclear transcriptomes, revealing shared genes sufficient to distinguish previously unknown and orthologous neuronal subtypes as well as regional identity and transcriptomic heterogeneity within the human brain. Copyright © 2016, American Association for the Advancement of Science.
Polet, Stephane; Berney, Cédric; Fahrni, José; Pawlowski, Jan
In his grand monograph of Radiolaria, Ernst Haeckel originally included Phaeodarea together with Acantharea and Polycystinea, all three taxa characterized by the presence of a central capsule and the possession of axopodia. Cytological and ultrastructural studies, however, questioned the monophyly of Radiolaria, suggesting an independent evolutionary origin of the three taxa, and the first molecular data on Acantharea and Polycystinea brought controversial results. To test further the monophyly of Radiolaria, we sequenced the complete small subunit ribosomal RNA gene of three phaeodarians and three polycystines. Our analyses reveal that phaeodarians clearly branch among the recently described phylum Cercozoa, separately from Acantharea and Polycystinea. This result enhances the morphological variability within the phylum Cercozoa, which already contains very heterogeneous groups of protists. Our study also confirms the common origin of Acantharea and Polycystinea, which form a sister-group to the Cercozoa, and allows a phylogenetic reinterpretation of the morphological features of the three radiolarian groups.
Shore, Sabrina; Henderson, Jordana M; McCaffrey, Anton P
Next-generation small RNA sequencing is a valuable tool which is increasing our knowledge regarding small noncoding RNAs and their function in regulating genetic information. Library preparation protocols for small RNA have thus far been restricted due to higher RNA input requirements (>10 ng), long workflows, and tedious manual gel purifications. Small RNA library preparation methods focus largely on the prevention or depletion of a side product known as adapter dimer that tends to dominate the reaction. Adapter dimer is the ligation of two adapters to one another without an intervening library RNA insert or any useful sequencing information. The amplification of this side reaction is favored over the amplification of tagged library since it is shorter. The small size discrepancy between these two species makes separation and purification of the tagged library very difficult. Adapter dimer hinders the use of low input samples and the ability to automate the workflow so we introduce an improved library preparation protocol which uses chemically modified adapters (CleanTag) to significantly reduce the adapter dimer. CleanTag small RNA library preparation workflow decreases adapter dimer to allow for ultra-low input samples (down to approx. 10 pg total RNA), elimination of the gel purification step, and automation. We demonstrate how to carry out this streamlined protocol to improve NGS data quality and allow for the use of sample types with limited RNA material.
Thess, Andreas; Grund, Stefanie; Mui, Barbara L; Hope, Michael J; Baumhof, Patrick; Fotin-Mleczek, Mariola; Schlake, Thomas
Being a transient carrier of genetic information, mRNA could be a versatile, flexible, and safe means for protein therapies. While recent findings highlight the enormous therapeutic potential of mRNA, evidence that mRNA-based protein therapies are feasible beyond small animals such as mice is still lacking. Previous studies imply that mRNA therapeutics require chemical nucleoside modifications to obtain sufficient protein expression and avoid activation of the innate immune system. Here we show that chemically unmodified mRNA can achieve those goals as well by applying sequence-engineered molecules. Using erythropoietin (EPO) driven production of red blood cells as the biological model, engineered Epo mRNA elicited meaningful physiological responses from mice to nonhuman primates. Even in pigs of about 20 kg in weight, a single adequate dose of engineered mRNA encapsulated in lipid nanoparticles (LNPs) induced high systemic Epo levels and strong physiological effects. Our results demonstrate that sequence-engineered mRNA has the potential to revolutionize human protein therapies. PMID:26050989
Complexity is a hallmark of intelligent behavior consisting both of regular patterns and random variation. To quantitatively assess the complexity and randomness of human motion, we designed a motor task in which we translated subjects' motion trajectories into strings of symbol sequences. In the first part of the experiment participants were asked to perform self-paced movements to create repetitive patterns, copy pre-specified letter sequences, and generate random movements. To investigate whether the degree of randomness can be manipulated, in the second part of the experiment participants were asked to perform unpredictable movements in the context of a pursuit game, where they received feedback from an online Bayesian predictor guessing their next move. We analyzed symbol sequences representing subjects' motion trajectories with five common complexity measures: predictability, compressibility, approximate entropy, Lempel-Ziv complexity, as well as effective measure complexity. We found that subjects’ self-created patterns were the most complex, followed by drawing movements of letters and self-paced random motion. We also found that participants could change the randomness of their behavior depending on context and feedback. Our results suggest that humans can adjust both complexity and regularity in different movement types and contexts and that this can be assessed with information-theoretic measures of the symbolic sequences generated from movement trajectories.
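Of the five measures listed, Lempel-Ziv complexity is the easiest to illustrate on a symbol string. The sketch below counts phrases in the 1976 Lempel-Ziv parsing, where each new phrase is the shortest substring that has not yet appeared in the preceding text. It is a straightforward, unoptimized version written for illustration; whether the study used this exact variant or any normalization is not stated in the abstract, and the function name is an assumption.

```python
def lz76_complexity(symbols: str) -> int:
    """Number of phrases in the Lempel-Ziv (1976) parsing of `symbols`."""
    i, count, n = 0, 0, len(symbols)
    while i < n:
        length = 1
        # Grow the phrase while it can still be copied from earlier text
        # (the search window may overlap the phrase except its last symbol).
        while i + length <= n and symbols[i:i + length] in symbols[:i + length - 1]:
            length += 1
        count += 1
        i += length
    return count

# Classic worked example: parsed as 0 | 001 | 10 | 100 | 1000 | 101.
print(lz76_complexity("0001101001000101"))  # prints 6
```

A repetitive trajectory string yields few phrases, while an irregular one yields many, which is the sense in which the measure separates patterned from random movement sequences.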
In addition to evolutionarily-accrued sequence mutation or deletion, endogenous retroelements (EREs) in eukaryotic genomes are subject to epigenetic silencing, preventing or reducing their transcription, particularly in the germplasm. Nevertheless, transcriptional activation of EREs, including endogenous retroviruses (ERVs) and long interspersed nuclear elements (LINEs), is observed in somatic cells, variably upon cellular differentiation and frequently upon cellular transformation. ERE transcription is modulated during physiological and pathological immune cell activation, as well as in immune cell cancers. However, our understanding of the potential consequences of such modulation remains incomplete, partly due to the relative scarcity of information regarding genome-wide ERE transcriptional patterns in immune cells. Here, we describe a methodology that allows probing RNA-sequencing (RNA-seq) data for genome-wide expression of EREs in murine and human cells. Our analysis of B cells reveals that their transcriptional response during immune activation is dominated by induction of gene transcription, and that EREs respond to a much lesser extent. The transcriptional activity of the majority of EREs is either unaffected or reduced by B cell activation both in mice and humans, albeit LINEs appear considerably more responsive in the latter host. Nevertheless, a small number of highly distinct ERVs are strongly and consistently induced during B cell activation. Importantly, this pattern contrasts starkly with B cell transformation, which exhibits widespread induction of EREs, including ERVs that minimally overlap with those responsive to immune stimulation. The distinctive patterns of ERE induction suggest different underlying mechanisms and will help separate physiological from pathological expression.
While mutans streptococci have long been assumed to be the specific pathogen responsible for human dental caries, the concept of a complex dental caries-associated microbiota has received significant attention in recent years. Molecular analyses revealed the complexity of the microbiota with the predominance of Lactobacillus and Prevotella in carious dentine lesions. However, characterization of the dentin caries-associated microbiota has not been extensively explored in different ethnicities and races. In the present study, the bacterial communities in the carious dentin of Japanese subjects were analyzed comprehensively with molecular approaches using the 16S rRNA gene. Carious dentin lesion samples were collected from 32 subjects aged 4-76 years, and the 16S rRNA genes, amplified from the extracted DNA with universal primers, were sequenced with a pyrosequencer. The bacterial composition was classified into clusters I, II, and III according to the relative abundance (high, middle, low) of Lactobacillus. The bacterial composition in cluster II was composed of relatively high proportions of Olsenella and Propionibacterium or subdominated by heterogeneous genera. The bacterial communities in cluster III were characterized by the predominance of Atopobium, Prevotella, or Propionibacterium with Streptococcus or Actinomyces. Some samples in clusters II and III, mainly related to Atopobium and Propionibacterium, were novel combinations of microbiota in carious dentin lesions and may be characteristic of the Japanese population. Clone library analysis revealed that Atopobium sp. HOT-416 and P. acidifaciens were specific species associated with dentinal caries among these genera in a Japanese population. We summarized the bacterial composition of dentinal carious lesions in a Japanese population using next-generation sequencing and found typical Japanese types with Atopobium or Propionibacterium predominating.
Shangguan, Lingfei; Mu, Qian; Fang, Xiang; Zhang, Kekun; Jia, Haifeng; Li, Xiaoying; Bao, Yiqun; Fang, Jinggui
Grapevine berry development is a complex and genetically controlled process, with many morphological, biochemical and physiological changes occurring during maturation. Research on grapevine berry development has mainly concerned wine grape, with little attention to table grape. 'Fujiminori' is an important table grapevine cultivar, which is cultivated in most provinces of China. In order to uncover the dynamic networks involved in anthocyanin biosynthesis, cell wall development, lipid metabolism and starch-sugar metabolism in 'Fujiminori' fruit, we employed RNA-sequencing (RNA-seq) and analyzed the whole transcriptome of grape berry during development at the expanding period (40 days after full bloom, 40DAF), véraison period (65DAF), and mature period (90DAF). The sequencing depth in each sample was greater than 12×, and the expression levels of nearly half of the expressed genes were greater than 1. Moreover, more than 64% of the clean reads were aligned to the Vitis vinifera reference genome, and 5,620, 3,381, and 5,196 differentially expressed genes (DEGs) were identified between different fruit stages, respectively. Analysis of the DEGs showed that the most significant changes in various processes occurred from the expanding stage to the véraison stage. The expression patterns of F3'H and F3'5'H were crucial in determining red or blue color of the fruit skin. The dynamic networks of cell wall development, lipid metabolism and starch-sugar metabolism were also constructed. A total of 4,934 SSR loci were also identified from 4,337 grapevine genes, which may be helpful for the development of phylogenetic analysis in grapevine and other fruit trees. Our work provides the foundation for developmental research of grapevine fruit as well as other non-climacteric fruits.
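The SSR (simple sequence repeat) identification mentioned in the abstract above can be illustrated with a minimal sketch. The abstract does not state which tool or thresholds were used, so the function names `find_ssrs` and `is_primitive` and the minimum repeat counts (6 for dinucleotide, 5 for trinucleotide, 4 for tetranucleotide motifs) are assumptions chosen only for illustration.

```python
import re

def is_primitive(motif):
    """True if the motif is not itself a tandem repeat of a shorter unit."""
    n = len(motif)
    return not any(
        n % d == 0 and motif == motif[:d] * (n // d) for d in range(1, n)
    )

def find_ssrs(seq, min_repeats=None):
    """Return sorted (start, motif, repeat_count) tuples for tandem repeats.

    min_repeats maps motif length to the minimum number of tandem copies.
    The default thresholds are illustrative assumptions, not the study's.
    """
    if min_repeats is None:
        min_repeats = {2: 6, 3: 5, 4: 4}
    seq = seq.upper()
    hits = []
    for k, n in sorted(min_repeats.items()):
        # A k-mer followed by at least n - 1 further copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, n - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            if is_primitive(motif):  # skip e.g. "ATAT" reported as a 4-mer
                hits.append((m.start(), motif, len(m.group(0)) // k))
    return sorted(hits)

# Hypothetical sequence, not data from the study.
demo = "GGC" + "AT" * 7 + "CCGT" + "GAA" * 5 + "TTCA"
ssr_hits = find_ssrs(demo)
```

On the hypothetical sequence, the sketch reports one dinucleotide and one trinucleotide locus with 0-based start positions. Dedicated SSR tools handle compound and imperfect repeats, which this sketch ignores.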
Tan, Meng How; Au, Kin Fai; Yablonovitch, Arielle L; Wills, Andrea E; Chuang, Jason; Baker, Julie C; Wong, Wing Hung; Li, Jin Billy
The Xenopus embryo has provided key insights into fate specification, the cell cycle, and other fundamental developmental and cellular processes, yet a comprehensive understanding of its transcriptome is lacking. Here, we used paired end RNA sequencing (RNA-seq) to explore the transcriptome of Xenopus tropicalis in 23 distinct developmental stages. We determined expression levels of all genes annotated in RefSeq and Ensembl and showed for the first time on a genome-wide scale that, despite a general state of transcriptional silence in the earliest stages of development, approximately 150 genes are transcribed prior to the midblastula transition. In addition, our splicing analysis uncovered more than 10,000 novel splice junctions at each stage and revealed that many known genes have additional unannotated isoforms. Furthermore, we used Cufflinks to reconstruct transcripts from our RNA-seq data and found that ∼13.5% of the final contigs are derived from novel transcribed regions, both within introns and in intergenic regions. We then developed a filtering pipeline to separate protein-coding transcripts from noncoding RNAs and identified a confident set of 6686 noncoding transcripts in 3859 genomic loci. Since the current reference genome, XenTro3, consists of hundreds of scaffolds instead of full chromosomes, we also performed de novo reconstruction of the transcriptome using Trinity and uncovered hundreds of transcripts that are missing from the genome. Collectively, our data will not only aid in completing the assembly of the Xenopus tropicalis genome but will also serve as a valuable resource for gene discovery and for unraveling the fundamental mechanisms of vertebrate embryogenesis.
Persinoti, Gabriela Felix; de Aguiar Peres, Nalu Teixeira; Jacob, Tiago Rinaldi; Rossi, Antonio; Vêncio, Ricardo Zorzetto; Martinez-Rossi, Nilce Maria
The dermatophyte Trichophyton rubrum is an anthropophilic filamentous fungus that infects keratinized tissues and is the most common etiologic agent isolated in human dermatophytoses. The clinical treatment of these infections is challenging because only a few antifungal drugs are commercially available. To understand the mode of action of cytotoxic drugs against fungi, we evaluated the time-dependent effects of acriflavine on the T. rubrum transcriptome using high-throughput RNA-sequencing (RNA-seq) technology. RNA-seq analysis generated approximately 200 million short reads that were mapped to the Broad Institute's Dermatophyte Comparative Database before differential gene expression analysis was performed. By employing a stringent cut-off threshold of -1.5 and 1.5 log₂-fold changes in gene expression, a subset of 490 unique genes was found to be modulated in T. rubrum in response to acriflavine exposure. Among the selected genes, 69 were modulated at all exposure time points. Functional categorization indicated the putative involvement of these genes in various cellular processes such as oxidation-reduction reaction, transmembrane transport, and metal ion binding. Interestingly, genes putatively involved in the pathogenicity of dermatophytoses were down-regulated, suggesting that this drug interferes with the virulence of T. rubrum. Moreover, we identified 159 novel putative transcripts in intergenic regions and two transcripts in intron regions of the T. rubrum genome. The results provide insights into the molecular events underlying the stress responses of T. rubrum to acriflavine, revealing that this drug interfered with important molecular events involved in the establishment and maintenance of fungal infection in the host. In addition, the identification of novel transcripts will further enable the improvement of gene annotation and open reading frame prediction of T. rubrum and other dermatophyte genomes.
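The log₂-fold-change cut-off described in the abstract above can be sketched as follows. This is only an illustration of thresholding at -1.5 and 1.5. The pseudocount, the function names, and the expression values are assumptions, and the study's actual differential expression analysis is not specified in the abstract.

```python
import math

def log2_fold_change(treated, control, pseudocount=1.0):
    """log2 ratio of treated to control expression.

    The pseudocount (an assumption of this sketch) avoids division by zero.
    """
    return math.log2((treated + pseudocount) / (control + pseudocount))

def modulated_genes(expression, threshold=1.5):
    """Return {gene: log2FC} for genes with log2FC <= -threshold or >= threshold.

    expression maps gene name to a (control, treated) pair.
    """
    selected = {}
    for gene, (control, treated) in expression.items():
        lfc = log2_fold_change(treated, control)
        if abs(lfc) >= threshold:
            selected[gene] = lfc
    return selected

# Hypothetical expression values (control, treated); not data from the study.
example = {
    "geneA": (10.0, 43.0),  # (43 + 1) / (10 + 1) = 4, so log2FC = 2
    "geneB": (40.0, 9.25),  # (9.25 + 1) / (40 + 1) = 0.25, so log2FC = -2
    "geneC": (20.0, 25.0),  # small change, below the cut-off
}
selected = modulated_genes(example)
```

In practice, a fold-change cut-off like this is usually combined with a statistical significance filter. The abstract does not say which one was applied here.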
Zanchetta, G; Buscaglia, M; Bellini, T [Dipartimento di Chimica, Biochimica e Biotecnologie per la Medicina, Universita di Milano, Milano (Italy)]; Nakata, M; Clark, N A [Department of Physics and Liquid Crystal Materials Research Center, University of Colorado, Boulder, CO 80309-0390 (United States)]
We have recently shown that solutions of very short double-stranded B-DNA and A-RNA, down to six base pairs in length, can self-organize into chiral nematic and columnar liquid crystal (LC) phases. These observations were made on fully complementary sequences forming duplexes with blunt ends, where the LC ordering is due to base stacking forces promoting end-to-end aggregation of duplexes into living-polymer-type structures. Here we report LC formation in solutions of DNA and RNA 14mers forming double helices having single-stranded dangling ends that are 'sticky', i.e., mutually complementary with similar ends on other duplexes. This finding widens the conditions for spontaneous long range ordering in oligomeric nucleic acids, thus strengthening the notion that nucleic acids have remarkable self-assembly capability. Quantitative analysis of the phase diagram enables the extraction, within a nearest-neighbor interaction approximation, of the free energy associated with the pairing and stacking of nucleobases.
Krishna, Srikar; Nair, Aparna; Cheedipudi, Sirisha; Poduval, Deepak; Dhawan, Jyotsna; Palakodeti, Dasaradhi; Ghanekar, Yashoda
Small non-coding RNAs such as miRNAs, piRNAs and endo-siRNAs fine-tune gene expression through post-transcriptional regulation, modulating important processes in development, differentiation, homeostasis and regeneration. Using deep sequencing, we have profiled small non-coding RNAs in Hydra magnipapillata and investigated changes in small RNA expression pattern during head regeneration. Our results reveal a unique repertoire of small RNAs in hydra. We have identified 126 miRNA loci; 123 of these miRNAs are unique to hydra. Less than 50% are conserved across two different strains of Hydra vulgaris tested in this study, indicating a highly diverse nature of hydra miRNAs in contrast to bilaterian miRNAs. We also identified siRNAs derived from precursors with perfect stem-loop structure and that arise from inverted repeats. piRNAs were the most abundant small RNAs in hydra, mapping to transposable elements, the annotated transcriptome and unique non-coding regions on the genome. piRNAs that map to transposable elements and the annotated transcriptome display a ping-pong signature. Further, we have identified several miRNAs and piRNAs whose expression is regulated during hydra head regeneration. Our study defines different classes of small RNAs in this cnidarian model system, which may play a role in orchestrating gene expression essential for hydra regeneration.
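The ping-pong signature mentioned in the abstract above is commonly detected as an excess of piRNA pairs from opposite strands whose 5' ends overlap by 10 nt. The following minimal sketch counts such overlaps. The read representation (0-based, half-open coordinates on a single reference sequence), the function name, and the example reads are assumptions for illustration, not the authors' pipeline.

```python
from collections import Counter

def five_prime_overlaps(reads, max_overlap=20):
    """Histogram of 5'-to-5' overlap lengths between opposite-strand reads.

    reads: iterable of (start, end, strand) with 0-based half-open
    coordinates on one reference sequence. A plus-strand read has its
    5' end at `start`; a minus-strand read has its 5' end at `end - 1`.
    Assumes every read is at least `max_overlap` nt long.
    """
    plus_starts = Counter(s for s, e, strand in reads if strand == "+")
    minus_ends = Counter(e for s, e, strand in reads if strand == "-")
    histogram = Counter()
    for p_start, n_plus in plus_starts.items():
        for overlap in range(1, max_overlap + 1):
            # The minus read must end exactly `overlap` nt past the plus start.
            n_minus = minus_ends.get(p_start + overlap, 0)
            if n_minus:
                histogram[overlap] += n_plus * n_minus
    return histogram

# Hypothetical reads: two pairs overlap by 10 nt, one pair by 5 nt.
reads = [
    (100, 128, "+"), (82, 110, "-"),
    (200, 227, "+"), (183, 210, "-"),
    (300, 326, "+"), (280, 305, "-"),
]
overlap_hist = five_prime_overlaps(reads)
```

A pronounced peak at an overlap of 10 relative to the other lengths would be read as a ping-pong signature. Real analyses typically normalise these counts, for example as a z-score across overlap lengths.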
Grass carp hemorrhagic disease, caused by the grass carp reovirus (GCRV), is a major disease that hampers the development of grass carp aquaculture in China. The mechanism underlying GCRV infection is still largely unknown. Circular RNAs (circRNAs) are important regulators involved in various biological processes. In the present study, grass carp were infected with GCRV, and spleen samples were collected at 0 (control), 1, 3, 5, and 7 days post-infection (dpi). Samples were used to construct and sequence circRNA libraries, and a total of 5052 circRNAs were identified before and after GCRV infection, of which 41 exhibited differential expression compared with controls. Many parental genes of the differentially expressed circRNAs are involved in metal ion binding, protein ubiquitination, enzyme activity, and nucleotide binding. Moreover, 72 binding miRNAs were predicted from the differentially expressed circRNAs, of which eight targeted genes were predicted to be involved in immune responses, blood coagulation, hemostasis, and complement and coagulation cascades. Upregulation of these genes may lead to endothelial and blood cell damage and hemorrhagic symptoms. Our results indicate that an mRNA–miRNA–circRNA network may be present in grass carp infected with GCRV, providing new insight into the mechanism underlying grass carp reovirus infection.
Limayem, Alya; Micciche, Andrew; Nayak, Bina; Mohapatra, Shyam
Algae biomass-fed wastewaters are a promising source for lipid and bioenergy manufacture, with substantial end-product investment returns. However, wastewaters may contain lytic pathogens carrying drug resistance that are detrimental to algae yield and environmental safety. This study used high-throughput Illumina 16S ribosomal RNA (rRNA) gene sequencing to decipher simultaneously the cultivable and uncultivable bacterial community profile in single samples recovered directly from local wastewater systems. Samples were collected from two previously documented sources, anaerobically digested (AD) municipal wastewater and swine wastewater, both containing algae (Chlorella spp.), in addition to control samples of swine wastewater and municipal wastewater without algae. Results indicated a significant level of Bacteria in all samples, with an average of approximately 95.49%, followed by Archaea at 2.34%, in local wastewaters designed for algae cultivation. Taxonomic genus identification indicated Calothrix, Pseudomonas, and Clostridium as the most prevalent strains in both local municipal and swine wastewater samples containing algae, with averages of 17.37, 12.19, and 7.84%, respectively. Interestingly, swine wastewater without algae displayed the lowest level of Pseudomonas strains, which indicates potential coexistence between these strains and the algae microenvironment and warrants further investigation. This finding is particularly relevant given the previously documented adverse effects of some nosocomial Pseudomonas strains on algae growth and their multidrug resistance potential, which require the development of targeted bioremediation with regard to the beneficial flora.