WorldWideScience

Sample records for transcriptome sequencing performance

  1. Illumina-based de novo transcriptome sequencing and analysis of ...

    Indian Academy of Sciences (India)

    2017-12-18

    Dec 18, 2017 ... In the present study, we used Illumina HiSeq technology to perform de novo assembly of heart and musk gland transcriptomes from the Chinese forest musk deer. A total of 239,383 transcripts and 176,450 unigenes were obtained, of which 37,329 unigenes were matched to known sequences in the NCBI ...

  2. Illumina-based de novo transcriptome sequencing and analysis

    Indian Academy of Sciences (India)

    In the present study, we used Illumina HiSeq technology to perform de novo assembly of heart and musk gland transcriptomes from the Chinese forest musk deer. A total of 239,383 transcripts and 176,450 unigenes were obtained, of which 37,329 unigenes were matched to known sequences in the NCBI nonredundant ...

  3. Whole genome and transcriptome sequencing of a B3 thymoma.

    Directory of Open Access Journals (Sweden)

    Iacopo Petrini

    Full Text Available Molecular pathology of thymomas is poorly understood. Genomic aberrations are frequently identified in tumors but no extensive sequencing has been reported in thymomas. Here we present the first comprehensive view of a B3 thymoma at whole genome and transcriptome levels. A 55-year-old Caucasian female underwent complete resection of a stage IVA B3 thymoma. RNA and DNA were extracted from a snap frozen tumor sample with a fraction of cancer cells over 80%. We performed array comparative genomic hybridization using Agilent platform, transcriptome sequencing using HiSeq 2000 (Illumina and whole genome sequencing using Complete Genomics Inc platform. Whole genome sequencing determined, in tumor and normal, the sequence of both alleles in more than 95% of the reference genome (NCBI Build 37. Copy number (CN aberrations were comparable with those previously described for B3 thymomas, with CN gain of chromosome 1q, 5, 7 and X and CN loss of 3p, 6, 11q42.2-qter and q13. One translocation t(11;X was identified by whole genome sequencing and confirmed by PCR and Sanger sequencing. Ten single nucleotide variations (SNVs and 2 insertion/deletions (INDELs were identified; these mutations resulted in non-synonymous amino acid changes or affected splicing sites. The lack of common cancer-associated mutations in this patient suggests that thymomas may evolve through mechanisms distinctive from other tumor types, and supports the rationale for additional high-throughput sequencing screens to better understand the somatic genetic architecture of thymoma.

  4. Sequencing and analysis of the Mediterranean amphioxus (Branchiostoma lanceolatum transcriptome.

    Directory of Open Access Journals (Sweden)

    Silvan Oulion

    Full Text Available BACKGROUND: The basally divergent phylogenetic position of amphioxus (Cephalochordata, as well as its conserved morphology, development and genetics, make it the best proxy for the chordate ancestor. Particularly, studies using the amphioxus model help our understanding of vertebrate evolution and development. Thus, interest for the amphioxus model led to the characterization of both the transcriptome and complete genome sequence of the American species, Branchiostoma floridae. However, recent technical improvements allowing induction of spawning in the laboratory during the breeding season on a daily basis with the Mediterranean species Branchiostoma lanceolatum have encouraged European Evo-Devo researchers to adopt this species as a model even though no genomic or transcriptomic data have been available. To fill this need we used the pyrosequencing method to characterize the B. lanceolatum transcriptome and then compared our results with the published transcriptome of B. floridae. RESULTS: Starting with total RNA from nine different developmental stages of B. lanceolatum, a normalized cDNA library was constructed and sequenced on Roche GS FLX (Titanium mode. Around 1.4 million of reads were produced and assembled into 70,530 contigs (average length of 490 bp. Overall 37% of the assembled sequences were annotated by BlastX and their Gene Ontology terms were determined. These results were then compared to genomic and transcriptomic data of B. floridae to assess similarities and specificities of each species. CONCLUSION: We obtained a high-quality amphioxus (B. lanceolatum reference transcriptome using a high throughput sequencing approach. We found that 83% of the predicted genes in the B. floridae complete genome sequence are also found in the B. lanceolatum transcriptome, while only 41% were found in the B. floridae transcriptome obtained with traditional Sanger based sequencing. Therefore, given the high degree of sequence conservation

  5. Comparative sequence analyses of genome and transcriptome ...

    Indian Academy of Sciences (India)

    /fulltext/jbsc/040/05/0891-0907. Keywords. Asian elephant; comparative genomics; gene prediction; transcriptome. Abstract. The Asian elephant Elephas maximus and the African elephant Loxodonta africana that diverged 5-7 million years ...

  6. [Biomarker screening of rat pulmonary hypertension model by transcriptome sequencing].

    Science.gov (United States)

    Hu, F; Xie, L; Yu, L; Chen, J; Liu, H M

    2016-04-01

    To screen relative gene and pathway of rat severe pulmonary hypertension by transcriptome sequencing. Pulmonary hypertension animal model of SD rats was established by left lung resection and hypodermic injection of monocrotaline.Monocrotaline was injected subcutaneously one week after left lung resection.Eight rats at 1, 3, 5 weeks after the injection of monocrotaline respectively were named group M1, group M2 and group M3.Eight normal rats were assigned into control group (group C). The right lung tissue was used for transcriptome sequencing to screen the differentially expressed genes.KEGG pathway analysis was performed to screen the pathways with enriched differentially expressed genes. The animal model was established successfully.The pulmonary artery pressure was as follows: group C (28.6±3.0) mmHg(1 mmHg=0.133 kPa), group M1 (38.9±3.3) mmHg, group M2 (50.8±3.9) mmHg, group M3 (51.5±3.5) mmHg.The pressure elevated in group M1 compared with group C (P=0.007). The pressure in M2 and M3 elevated compared with M1(P=0.002 and Ppulmonary hypertension were epithelial specific receptor tyrosine kinase(Tie2) and thrombospondin-1(TSP-1). Tie2 was down-regulated (qpulmonary hypertension and up-regulated (qpulmonary hypertension.TSP-1 was up-regulated (qpulmonary hypertension and down-regulated (qpulmonary hypertension.In the stage of severe pulmonary hypertension, the differentially expressed genes were enriched mainly in the pathways of phosphatidylinostitol 3-kinase, focal adhesion kinase and extracellular matrix receptor interaction. The study provides transcriptome information of rat pulmonary hypertension model and normal rat.Possible mechanisms of pulmonary hypertension are found.These genes and pathways might be new precursor for the diagnosis and treatment of pulmonary hypertension.

  7. Transcriptome sequencing and characterization for the sea cucumber Apostichopus japonicus (Selenka, 1867.

    Directory of Open Access Journals (Sweden)

    Huixia Du

    Full Text Available BACKGROUND: Sea cucumbers are a special group of marine invertebrates. They occupy a taxonomic position that is believed to be important for understanding the origin and evolution of deuterostomes. Some of them such as Apostichopus japonicus represent commercially important aquaculture species in Asian countries. Many efforts have been devoted to increasing the number of expressed sequence tags (ESTs for A. japonicus, but a comprehensive characterization of its transcriptome remains lacking. Here, we performed the large-scale transcriptome profiling and characterization by pyrosequencing diverse cDNA libraries from A. japonicus. RESULTS: In total, 1,061,078 reads were obtained by 454 sequencing of eight cDNA libraries representing different developmental stages and adult tissues in A. japonicus. These reads were assembled into 29,666 isotigs, which were further clustered into 21,071 isogroups. Nearly 40% of the isogroups showed significant matches to known proteins based on sequence similarity. Gene ontology (GO and KEGG pathway analyses recovered diverse biological functions and processes. Candidate genes that were potentially involved in aestivation were identified. Transcriptome comparison with the sea urchin Strongylocentrotus purpuratus revealed similar patterns of GO term representation. In addition, 4,882 putative orthologous genes were identified, of which 202 were not present in the non-echinoderm organisms. More than 700 simple sequence repeats (SSRs and 54,000 single nucleotide polymorphisms (SNPs were detected in the A. japonicus transcriptome. CONCLUSION: Pyrosequencing was proven to be efficient in rapidly identifying a large set of genes for the sea cucumber A. japonicus. Through the large-scale transcriptome sequencing as well as public EST data integration, we performed a comprehensive characterization of the A. japonicus transcriptome and identified candidate aestivation-related genes. A large number of potential genetic

  8. SMRT sequencing of full-length transcriptome of flea beetle Agasicles hygrophila (Selman and Vogt).

    Science.gov (United States)

    Jia, Dong; Wang, Yuanxin; Liu, Yanhong; Hu, Jun; Guo, Yanqiong; Gao, Lingling; Ma, Ruiyan

    2018-02-02

    This study was aimed at generating the full-length transcriptome of flea beetle Agasicles hygrophila (Selman and Vogt) using single-molecule real-time (SMRT) sequencing. Four developmental stages of A. hygrophila, including eggs, larvae, pupae, and adults were harvested for isolating total RNA. The mixed samples were used for SMRT sequencing to generate the full-length transcriptome. Based on the obtained transcriptome data, alternative splicing event, simple sequence repeat (SSR) analysis, coding sequence prediction, transcript functional annotation, and lncRNA prediction were performed. Total 9.45 Gb of clean reads were generated, including 335,045 reads of insert (ROI) and 158,085 full-length non-chimeric (FLNC) reads. Transcript clustering analysis of FLNC reads identified 40,004 consensus isoforms, including 31,015 high-quality ones. After removing redundant reads, 28,982 transcripts were obtained. Total 145 alternative splicing events were predicted. Additionally, 12,753 SSRs and 16,205 coding sequences were identified based on SSR analysis. Furthermore, 24,031 transcripts were annotated in eight functional databases, and 4,198 lncRNAs were predicted. This is the first study to perform SMRT sequencing of the full-length transcriptome of A. hygrophila. The obtained transcriptome may facilitate further exploration of the genetic data of A. hygrophila and uncover the interactions between this insect and the ecosystem.

  9. Sequencing and characterization of the guppy (Poecilia reticulata transcriptome

    Directory of Open Access Journals (Sweden)

    Rodd F Helen

    2011-04-01

    Full Text Available Abstract Background Next-generation sequencing is providing researchers with a relatively fast and affordable option for developing genomic resources for organisms that are not among the traditional genetic models. Here we present a de novo assembly of the guppy (Poecilia reticulata transcriptome using 454 sequence reads, and we evaluate potential uses of this transcriptome, including detection of sex-specific transcripts and deployment as a reference for gene expression analysis in guppies and a related species. Guppies have been model organisms in ecology, evolutionary biology, and animal behaviour for over 100 years. An annotated transcriptome and other genomic tools will facilitate understanding the genetic and molecular bases of adaptation and variation in a vertebrate species with a uniquely well known natural history. Results We generated approximately 336 Mbp of mRNA sequence data from male brain, male body, female brain, and female body. The resulting 1,162,670 reads assembled into 54,921 contigs, creating a reference transcriptome for the guppy with an average read depth of 28×. We annotated nearly 40% of this reference transcriptome by searching protein and gene ontology databases. Using this annotated transcriptome database, we identified candidate genes of interest to the guppy research community, putative single nucleotide polymorphisms (SNPs, and male-specific expressed genes. We also showed that our reference transcriptome can be used for RNA-sequencing-based analysis of differential gene expression. We identified transcripts that, in juveniles, are regulated differently in the presence and absence of an important predator, Rivulus hartii, including two genes implicated in stress response. For each sample in the RNA-seq study, >50% of high-quality reads mapped to unique sequences in the reference database with high confidence. In addition, we evaluated the use of the guppy reference transcriptome for gene expression analyses in

  10. Sequencing and characterization of the guppy (Poecilia reticulata) transcriptome

    Science.gov (United States)

    2011-01-01

    Background Next-generation sequencing is providing researchers with a relatively fast and affordable option for developing genomic resources for organisms that are not among the traditional genetic models. Here we present a de novo assembly of the guppy (Poecilia reticulata) transcriptome using 454 sequence reads, and we evaluate potential uses of this transcriptome, including detection of sex-specific transcripts and deployment as a reference for gene expression analysis in guppies and a related species. Guppies have been model organisms in ecology, evolutionary biology, and animal behaviour for over 100 years. An annotated transcriptome and other genomic tools will facilitate understanding the genetic and molecular bases of adaptation and variation in a vertebrate species with a uniquely well known natural history. Results We generated approximately 336 Mbp of mRNA sequence data from male brain, male body, female brain, and female body. The resulting 1,162,670 reads assembled into 54,921 contigs, creating a reference transcriptome for the guppy with an average read depth of 28×. We annotated nearly 40% of this reference transcriptome by searching protein and gene ontology databases. Using this annotated transcriptome database, we identified candidate genes of interest to the guppy research community, putative single nucleotide polymorphisms (SNPs), and male-specific expressed genes. We also showed that our reference transcriptome can be used for RNA-sequencing-based analysis of differential gene expression. We identified transcripts that, in juveniles, are regulated differently in the presence and absence of an important predator, Rivulus hartii, including two genes implicated in stress response. For each sample in the RNA-seq study, >50% of high-quality reads mapped to unique sequences in the reference database with high confidence. In addition, we evaluated the use of the guppy reference transcriptome for gene expression analyses in a congeneric species, the

  11. Transcriptome sequencing of the Microarray Quality Control (MAQC RNA reference samples using next generation sequencing

    Directory of Open Access Journals (Sweden)

    Thierry-Mieg Danielle

    2009-06-01

    Full Text Available Abstract Background Transcriptome sequencing using next-generation sequencing platforms will soon be competing with DNA microarray technologies for global gene expression analysis. As a preliminary evaluation of these promising technologies, we performed deep sequencing of cDNA synthesized from the Microarray Quality Control (MAQC reference RNA samples using Roche's 454 Genome Sequencer FLX. Results We generated more that 3.6 million sequence reads of average length 250 bp for the MAQC A and B samples and introduced a data analysis pipeline for translating cDNA read counts into gene expression levels. Using BLAST, 90% of the reads mapped to the human genome and 64% of the reads mapped to the RefSeq database of well annotated genes with e-values ≤ 10-20. We measured gene expression levels in the A and B samples by counting the numbers of reads that mapped to individual RefSeq genes in multiple sequencing runs to evaluate the MAQC quality metrics for reproducibility, sensitivity, specificity, and accuracy and compared the results with DNA microarrays and Quantitative RT-PCR (QRTPCR from the MAQC studies. In addition, 88% of the reads were successfully aligned directly to the human genome using the AceView alignment programs with an average 90% sequence similarity to identify 137,899 unique exon junctions, including 22,193 new exon junctions not yet contained in the RefSeq database. Conclusion Using the MAQC metrics for evaluating the performance of gene expression platforms, the ExpressSeq results for gene expression levels showed excellent reproducibility, sensitivity, and specificity that improved systematically with increasing shotgun sequencing depth, and quantitative accuracy that was comparable to DNA microarrays and QRTPCR. In addition, a careful mapping of the reads to the genome using the AceView alignment programs shed new light on the complexity of the human transcriptome including the discovery of thousands of new splice variants.

  12. Chloroplast microsatellite markers for Artocarpus (Moraceae) developed from transcriptome sequences.

    Science.gov (United States)

    Gardner, Elliot M; Laricchia, Kristen M; Murphy, Matthew; Ragone, Diane; Scheffler, Brian E; Simpson, Sheron; Williams, Evelyn W; Zerega, Nyree J C

    2015-09-01

    Chloroplast microsatellite loci were characterized from transcriptomes of Artocarpus altilis (breadfruit) and A. camansi (breadnut). They were tested in A. odoratissimus (terap) and A. altilis and evaluated in silico for two congeners. Fifteen simple sequence repeats (SSRs) were identified in chloroplast sequences from four Artocarpus transcriptome assemblies. The markers were evaluated using capillary electrophoresis in A. odoratissimus (105 accessions) and A. altilis (73). They were also evaluated in silico in A. altilis (10), A. camansi (6), and A. altilis × A. mariannensis (7) transcriptomes. All loci were polymorphic in at least one species, with all 15 polymorphic in A. camansi. Per species, average alleles per locus ranged between 2.2 and 2.5. Three loci had evidence of fragment-length homoplasy. These markers will complement existing nuclear markers by enabling confident identification of maternal and clone lines, which are often important in vegetatively propagated crops such as breadfruit.

  13. A deep sequencing analysis of transcriptomes and the development ...

    Indian Academy of Sciences (India)

    Mungbean (Vigna radiata L. Wilczek) is one of the most important leguminous food crops in Asia. We employed Illumina paired-end sequencing to analyse transcriptomes of three different mungbean genotypes. A total of 38.3–39.8 million paired-end reads with 73 bp lengths were generated. The pooled reads from the ...

  14. Comparison of next generation sequencing technologies for transcriptome characterization

    Directory of Open Access Journals (Sweden)

    Soltis Douglas E

    2009-08-01

    Full Text Available Abstract Background We have developed a simulation approach to help determine the optimal mixture of sequencing methods for most complete and cost effective transcriptome sequencing. We compared simulation results for traditional capillary sequencing with "Next Generation" (NG ultra high-throughput technologies. The simulation model was parameterized using mappings of 130,000 cDNA sequence reads to the Arabidopsis genome (NCBI Accession SRA008180.19. We also generated 454-GS20 sequences and de novo assemblies for the basal eudicot California poppy (Eschscholzia californica and the magnoliid avocado (Persea americana using a variety of methods for cDNA synthesis. Results The Arabidopsis reads tagged more than 15,000 genes, including new splice variants and extended UTR regions. Of the total 134,791 reads (13.8 MB, 119,518 (88.7% mapped exactly to known exons, while 1,117 (0.8% mapped to introns, 11,524 (8.6% spanned annotated intron/exon boundaries, and 3,066 (2.3% extended beyond the end of annotated UTRs. Sequence-based inference of relative gene expression levels correlated significantly with microarray data. As expected, NG sequencing of normalized libraries tagged more genes than non-normalized libraries, although non-normalized libraries yielded more full-length cDNA sequences. The Arabidopsis data were used to simulate additional rounds of NG and traditional EST sequencing, and various combinations of each. Our simulations suggest a combination of FLX and Solexa sequencing for optimal transcriptome coverage at modest cost. We have also developed ESTcalc http://fgp.huck.psu.edu/NG_Sims/ngsim.pl, an online webtool, which allows users to explore the results of this study by specifying individualized costs and sequencing characteristics. Conclusion NG sequencing technologies are a highly flexible set of platforms that can be scaled to suit different project goals. In terms of sequence coverage alone, the NG sequencing is a dramatic advance

  15. Characterization of Liaoning cashmere goat transcriptome: sequencing, de novo assembly, functional annotation and comparative analysis.

    Science.gov (United States)

    Liu, Hongliang; Wang, Tingting; Wang, Jinke; Quan, Fusheng; Zhang, Yong

    2013-01-01

    Liaoning cashmere goat is a famous goat breed for cashmere wool. In order to increase the transcriptome data and accelerate genetic improvement for this breed, we performed de novo transcriptome sequencing to generate the first expressed sequence tag dataset for the Liaoning cashmere goat, using next-generation sequencing technology. Transcriptome sequencing of Liaoning cashmere goat on a Roche 454 platform yielded 804,601 high-quality reads. Clustering and assembly of these reads produced a non-redundant set of 117,854 unigenes, comprising 13,194 isotigs and 104,660 singletons. Based on similarity searches with known proteins, 17,356 unigenes were assigned to 6,700 GO categories, and the terms were summarized into three main GO categories and 59 sub-categories. 3,548 and 46,778 unigenes had significant similarity to existing sequences in the KEGG and COG databases, respectively. Comparative analysis revealed that 42,254 unigenes were aligned to 17,532 different sequences in NCBI non-redundant nucleotide databases. 97,236 (82.51%) unigenes were mapped to the 30 goat chromosomes. 35,551 (30.17%) unigenes were matched to 11,438 reported goat protein-coding genes. The remaining non-matched unigenes were further compared with cattle and human reference genes, 67 putative new goat genes were discovered. Additionally, 2,781 potential simple sequence repeats were initially identified from all unigenes. The transcriptome of Liaoning cashmere goat was deep sequenced, de novo assembled, and annotated, providing abundant data to better understand the Liaoning cashmere goat transcriptome. The potential simple sequence repeats provide a material basis for future genetic linkage and quantitative trait loci analyses.

  16. Characterization of Liaoning cashmere goat transcriptome: sequencing, de novo assembly, functional annotation and comparative analysis.

    Directory of Open Access Journals (Sweden)

    Hongliang Liu

    Full Text Available Liaoning cashmere goat is a famous goat breed for cashmere wool. In order to increase the transcriptome data and accelerate genetic improvement for this breed, we performed de novo transcriptome sequencing to generate the first expressed sequence tag dataset for the Liaoning cashmere goat, using next-generation sequencing technology.Transcriptome sequencing of Liaoning cashmere goat on a Roche 454 platform yielded 804,601 high-quality reads. Clustering and assembly of these reads produced a non-redundant set of 117,854 unigenes, comprising 13,194 isotigs and 104,660 singletons. Based on similarity searches with known proteins, 17,356 unigenes were assigned to 6,700 GO categories, and the terms were summarized into three main GO categories and 59 sub-categories. 3,548 and 46,778 unigenes had significant similarity to existing sequences in the KEGG and COG databases, respectively. Comparative analysis revealed that 42,254 unigenes were aligned to 17,532 different sequences in NCBI non-redundant nucleotide databases. 97,236 (82.51% unigenes were mapped to the 30 goat chromosomes. 35,551 (30.17% unigenes were matched to 11,438 reported goat protein-coding genes. The remaining non-matched unigenes were further compared with cattle and human reference genes, 67 putative new goat genes were discovered. Additionally, 2,781 potential simple sequence repeats were initially identified from all unigenes.The transcriptome of Liaoning cashmere goat was deep sequenced, de novo assembled, and annotated, providing abundant data to better understand the Liaoning cashmere goat transcriptome. The potential simple sequence repeats provide a material basis for future genetic linkage and quantitative trait loci analyses.

  17. Huntington's disease biomarker progression profile identified by transcriptome sequencing in peripheral blood

    NARCIS (Netherlands)

    Mastrokolias, A.; Ariyurek, Y.; Goeman, J.J.; Duijn, E. van; Roos, R.A.; Mast, R.C. van der; Ommen, G.B. van; Dunnen, J.T. den; Hoen, P.A.C. 't; Roon-Mom, W.M. van

    2015-01-01

    With several therapeutic approaches in development for Huntington's disease, there is a need for easily accessible biomarkers to monitor disease progression and therapy response. We performed next-generation sequencing-based transcriptome analysis of total RNA from peripheral blood of 91 mutation

  18. Transcriptome sequencing and comparative transcriptome analysis of the scleroglucan producer Sclerotium rolfsii

    Directory of Open Access Journals (Sweden)

    Stahl Ulf

    2010-05-01

    Full Text Available Abstract Background The plant pathogenic basidiomycete Sclerotium rolfsii produces the industrially exploited exopolysaccharide scleroglucan, a polymer that consists of (1 → 3-β-linked glucose with a (1 → 6-β-glycosyl branch on every third unit. Although the physicochemical properties of scleroglucan are well understood, almost nothing is known about the genetics of scleroglucan biosynthesis. Similarly, the biosynthetic pathway of oxalate, the main by-product during scleroglucan production, has not been elucidated yet. In order to provide a basis for genetic and metabolic engineering approaches, we studied scleroglucan and oxalate biosynthesis in S. rolfsii using different transcriptomic approaches. Results Two S. rolfsii transcriptomes obtained from scleroglucan-producing and scleroglucan-nonproducing conditions were pooled and sequenced using the 454 pyrosequencing technique yielding ~350,000 reads. These could be assembled into 21,937 contigs and 171,833 singletons, for which 6,951 had significant matches in public protein data bases. Sequence data were used to obtain first insights into the genomics of scleroglucan and oxalate production and to predict putative proteins involved in the synthesis of both metabolites. Using comparative transcriptomics, namely Agilent microarray hybridization and suppression subtractive hybridization, we identified ~800 unigenes which are differently expressed under scleroglucan-producing and non-producing conditions. From these, candidate genes were identified which could represent potential leads for targeted modification of the S. rolfsii metabolism for increased scleroglucan yields. Conclusions The results presented in this paper provide for the first time genomic and transcriptomic data about S. rolfsii and demonstrate the power and usefulness of combined transcriptome sequencing and comparative microarray analysis. The data obtained allowed us to predict the biosynthetic pathways of scleroglucan and

  19. Sequencing and De Novo Transcriptome Assembly of Brachypodium sylvaticum (Poaceae

    Directory of Open Access Journals (Sweden)

    Samuel E. Fox

    2013-03-01

    Full Text Available Premise of the study: We report the de novo assembly and characterization of the transcriptomes of Brachypodium sylvaticum (slender false-brome accessions from native populations of Spain and Greece, and an invasive population west of Corvallis, Oregon, USA. Methods and Results: More than 350 million sequence reads from the mRNA libraries prepared from three B. sylvaticum genotypes were assembled into 120,091 (Corvallis, 104,950 (Spain, and 177,682 (Greece transcript contigs. In comparison with the B. distachyon Bd21 reference genome and GenBank protein sequences, we estimate >90% exome coverage for B. sylvaticum. The transcripts were assigned Gene Ontology and InterPro annotations. Brachypodium sylvaticum sequence reads aligned against the Bd21 genome revealed 394,654 single-nucleotide polymorphisms (SNPs and >20,000 simple sequence repeat (SSR DNA sites. Conclusions: To our knowledge, this is the first report of transcriptome sequencing of invasive plant species with a closely related sequenced reference genome. The sequences and identified SNP variant and SSR sites will provide tools for developing novel genetic markers for use in genotyping and characterization of invasive behavior of B. sylvaticum.

  20. Nuclear RNA sequencing of the mouse erythroid cell transcriptome.

    Directory of Open Access Journals (Sweden)

    Jennifer A Mitchell

    Full Text Available In addition to protein coding genes a substantial proportion of mammalian genomes are transcribed. However, most transcriptome studies investigate steady-state mRNA levels, ignoring a considerable fraction of the transcribed genome. In addition, steady-state mRNA levels are influenced by both transcriptional and posttranscriptional mechanisms, and thus do not provide a clear picture of transcriptional output. Here, using deep sequencing of nuclear RNAs (nucRNA-Seq in parallel with chromatin immunoprecipitation sequencing (ChIP-Seq of active RNA polymerase II, we compared the nuclear transcriptome of mouse anemic spleen erythroid cells with polymerase occupancy on a genome-wide scale. We demonstrate that unspliced transcripts quantified by nucRNA-seq correlate with primary transcript frequencies measured by RNA FISH, but differ from steady-state mRNA levels measured by poly(A-enriched RNA-seq. Highly expressed protein coding genes showed good correlation between RNAPII occupancy and transcriptional output; however, genome-wide we observed a poor correlation between transcriptional output and RNAPII association. This poor correlation is due to intergenic regions associated with RNAPII which correspond with transcription factor bound regulatory regions and a group of stable, nuclear-retained long non-coding transcripts. In conclusion, sequencing the nuclear transcriptome provides an opportunity to investigate the transcriptional landscape in a given cell type through quantification of unspliced primary transcripts and the identification of nuclear-retained long non-coding RNAs.

  1. Nuclear RNA Sequencing of the Mouse Erythroid Cell Transcriptome

    Science.gov (United States)

    Umlauf, David; Chen, Chih-yu; Moir, Catherine A.; Eskiw, Christopher H.; Schoenfelder, Stefan; Chakalova, Lyubomira; Nagano, Takashi; Fraser, Peter

    2012-01-01

    In addition to protein coding genes a substantial proportion of mammalian genomes are transcribed. However, most transcriptome studies investigate steady-state mRNA levels, ignoring a considerable fraction of the transcribed genome. In addition, steady-state mRNA levels are influenced by both transcriptional and posttranscriptional mechanisms, and thus do not provide a clear picture of transcriptional output. Here, using deep sequencing of nuclear RNAs (nucRNA-Seq) in parallel with chromatin immunoprecipitation sequencing (ChIP-Seq) of active RNA polymerase II, we compared the nuclear transcriptome of mouse anemic spleen erythroid cells with polymerase occupancy on a genome-wide scale. We demonstrate that unspliced transcripts quantified by nucRNA-seq correlate with primary transcript frequencies measured by RNA FISH, but differ from steady-state mRNA levels measured by poly(A)-enriched RNA-seq. Highly expressed protein coding genes showed good correlation between RNAPII occupancy and transcriptional output; however, genome-wide we observed a poor correlation between transcriptional output and RNAPII association. This poor correlation is due to intergenic regions associated with RNAPII which correspond with transcription factor bound regulatory regions and a group of stable, nuclear-retained long non-coding transcripts. In conclusion, sequencing the nuclear transcriptome provides an opportunity to investigate the transcriptional landscape in a given cell type through quantification of unspliced primary transcripts and the identification of nuclear-retained long non-coding RNAs. PMID:23209567

  2. Strategies for Wheat Stripe Rust Pathogenicity Identified by Transcriptome Sequencing.

    Directory of Open Access Journals (Sweden)

    Diana P Garnica

    Full Text Available Stripe rust caused by the fungus Puccinia striiformis f.sp. tritici (Pst is a major constraint to wheat production worldwide. The molecular events that underlie Pst pathogenicity are largely unknown. Like all rusts, Pst creates a specialized cellular structure within host cells called the haustorium to obtain nutrients from wheat, and to secrete pathogenicity factors called effector proteins. We purified Pst haustoria and used next-generation sequencing platforms to assemble the haustorial transcriptome as well as the transcriptome of germinated spores. 12,282 transcripts were assembled from 454-pyrosequencing data and used as reference for digital gene expression analysis to compare the germinated uredinospores and haustoria transcriptomes based on Illumina RNAseq data. More than 400 genes encoding secreted proteins which constitute candidate effectors were identified from the haustorial transcriptome, with two thirds of these up-regulated in this tissue compared to germinated spores. RT-PCR analysis confirmed the expression patterns of 94 effector candidates. The analysis also revealed that spores rely mainly on stored energy reserves for growth and development, while haustoria take up host nutrients for massive energy production for biosynthetic pathways and the ultimate production of spores. Together, these studies substantially increase our knowledge of potential Pst effectors and provide new insights into the pathogenic strategies of this important organism.

  3. Transcriptome sequencing and positive selected genes analysis of Bombyx mandarina.

    Directory of Open Access Journals (Sweden)

    Tingcai Cheng

    Full Text Available The wild silkworm Bombyx mandarina is widely believed to be an ancestor of the domesticated silkworm, Bombyx mori. Silkworms are often used as a model for studying the mechanism of species domestication. Here, we performed transcriptome sequencing of the wild silkworm using an Illumina HiSeq2000 platform. We produced 100,004,078 high-quality reads and assembled them into 50,773 contigs with an N50 length of 1764 bp and a mean length of 941.62 bp. A total of 33,759 unigenes were identified, with 12,805 annotated in the Nr database, 8273 in the Pfam database, and 9093 in the Swiss-Prot database. Expression profile analysis found significant differential expression of 1308 unigenes between the middle silk gland (MSG and posterior silk gland (PSG. Three sericin genes (sericin 1, sericin 2, and sericin 3 were expressed specifically in the MSG and three fibroin genes (fibroin-H, fibroin-L, and fibroin/P25 were expressed specifically in the PSG. In addition, 32,297 Single-nucleotide polymorphisms (SNPs and 361 insertion-deletions (INDELs were detected. Comparison with the domesticated silkworm p50/Dazao identified 5,295 orthologous genes, among which 400 might have experienced or to be experiencing positive selection by Ka/Ks analysis. These data and analyses presented here provide insights into silkworm domestication and an invaluable resource for wild silkworm genomics research.

  4. Comparative sequence analyses of genome and transcriptome ...

    Indian Academy of Sciences (India)

    2015-12-04

    . We performed ... world, belonging to the family Elephantidae and order. Proboscidea ... Supplementary materials pertaining to this article are available on the Journal of Biosciences Website at http://www.ias.ac.in/jbiosci/.

  5. Massively parallel sequencing and analysis of the Necator americanus transcriptome.

    Directory of Open Access Journals (Sweden)

    Cinzia Cantacessi

    2010-05-01

    Full Text Available The blood-feeding hookworm Necator americanus infects hundreds of millions of people worldwide. In order to elucidate fundamental molecular biological aspects of this hookworm, the transcriptome of the adult stage of Necator americanus was explored using next-generation sequencing and bioinformatic analyses.A total of 19,997 contigs were assembled from the sequence data; 6,771 of these contigs had known orthologues in the free-living nematode Caenorhabditis elegans, and most of them encoded proteins with WD40 repeats (10.6%, proteinase inhibitors (7.8% or calcium-binding EF-hand proteins (6.7%. Bioinformatic analyses inferred that the C. elegans homologues are involved mainly in biological pathways linked to ribosome biogenesis (70%, oxidative phosphorylation (63% and/or proteases (60%; most of these molecules were predicted to be involved in more than one biological pathway. Comparative analyses of the transcriptomes of N. americanus and the canine hookworm, Ancylostoma caninum, revealed qualitative and quantitative differences. For instance, proteinase inhibitors were inferred to be highly represented in the former species, whereas SCP/Tpx-1/Ag5/PR-1/Sc7 proteins ( = SCP/TAPS or Ancylostoma-secreted proteins were predominant in the latter. In N. americanus, essential molecules were predicted using a combination of orthology mapping and functional data available for C. elegans. Further analyses allowed the prioritization of 18 predicted drug targets which did not have homologues in the human host. These candidate targets were inferred to be linked to mitochondrial (e.g., processing proteins or amino acid metabolism (e.g., asparagine t-RNA synthetase.This study has provided detailed insights into the transcriptome of the adult stage of N. americanus and examines similarities and differences between this species and A. caninum. Future efforts should focus on comparative transcriptomic and proteomic investigations of the other predominant human

  6. Characterization of common carp transcriptome: sequencing, de novo assembly, annotation and comparative genomics.

    Directory of Open Access Journals (Sweden)

    Peifeng Ji

    Full Text Available BACKGROUND: Common carp (Cyprinus carpio is one of the most important aquaculture species of Cyprinidae with an annual global production of 3.4 million tons, accounting for nearly 14% of the freshwater aquaculture production in the world. Due to the economical and ecological importance of common carp, genomic data are eagerly needed for genetic improvement purpose. However, there is still no sufficient transcriptome data available. The objective of the project is to sequence transcriptome deeply and provide well-assembled transcriptome sequences to common carp research community. RESULT: Transcriptome sequencing of common carp was performed using Roche 454 platform. A total of 1,418,591 clean ESTs were collected and assembled into 36,811 cDNA contigs, with average length of 888 bp and N50 length of 1,002 bp. Annotation was performed and a total of 19,165 unique proteins were identified from assembled contigs. Gene ontology and KEGG analysis were performed and classified all contigs into functional categories for understanding gene functions and regulation pathways. Open Reading Frames (ORFs were detected from 29,869 (81.1% contigs with an average ORF length of 763 bp. From these contigs, 9,625 full-length cDNAs were identified with sequence length from 201 bp to 9,956 bp. Comparative analysis revealed that 27,693(75.2% contigs have significant similarity to zebrafish Refseq proteins, and 24,371(66.2%, 24,501(66.5% and 25,025(70.0% to teraodon, medaka and three-spined stickleback refseq proteins. A total of 2,064 microsatellites were initially identified from 1,730 contigs, and 1,639 unique sequences had sufficient flanking sequences on both sides for primer design. CONCLUSION: The transcriptome of common carp had been deep sequenced, de novo assembled and characterized, providing the valuable resource for better understanding of common carp genome. The transcriptome data will facilitate future functional studies on common carp genome, and

  7. Using next generation transcriptome sequencing to predict an ectomycorrhizal metabolome

    Directory of Open Access Journals (Sweden)

    Cseke Leland J

    2011-05-01

    Full Text Available Abstract Background Mycorrhizae, symbiotic interactions between soil fungi and tree roots, are ubiquitous in terrestrial ecosystems. The fungi contribute phosphorous, nitrogen and mobilized nutrients from organic matter in the soil and in return the fungus receives photosynthetically-derived carbohydrates. This union of plant and fungal metabolisms is the mycorrhizal metabolome. Understanding this symbiotic relationship at a molecular level provides important contributions to the understanding of forest ecosystems and global carbon cycling. Results We generated next generation short-read transcriptomic sequencing data from fully-formed ectomycorrhizae between Laccaria bicolor and aspen (Populus tremuloides roots. The transcriptomic data was used to identify statistically significantly expressed gene models using a bootstrap-style approach, and these expressed genes were mapped to specific metabolic pathways. Integration of expressed genes that code for metabolic enzymes and the set of expressed membrane transporters generates a predictive model of the ectomycorrhizal metabolome. The generated model of mycorrhizal metabolome predicts that the specific compounds glycine, glutamate, and allantoin are synthesized by L. bicolor and that these compounds or their metabolites may be used for the benefit of aspen in exchange for the photosynthetically-derived sugars fructose and glucose. Conclusions The analysis illustrates an approach to generate testable biological hypotheses to investigate the complex molecular interactions that drive ectomycorrhizal symbiosis. These models are consistent with experimental environmental data and provide insight into the molecular exchange processes for organisms in this complex ecosystem. The method used here for predicting metabolomic models of mycorrhizal systems from deep RNA sequencing data can be generalized and is broadly applicable to transcriptomic data derived from complex systems.

  8. High-throughput sequencing of black pepper root transcriptome

    Science.gov (United States)

    2012-01-01

    Background Black pepper (Piper nigrum L.) is one of the most popular spices in the world. It is used in cooking and the preservation of food and even has medicinal properties. Losses in production from disease are a major limitation in the culture of this crop. The major diseases are root rot and foot rot, which are results of root infection by Fusarium solani and Phytophtora capsici, respectively. Understanding the molecular interaction between the pathogens and the host’s root region is important for obtaining resistant cultivars by biotechnological breeding. Genetic and molecular data for this species, though, are limited. In this paper, RNA-Seq technology has been employed, for the first time, to describe the root transcriptome of black pepper. Results The root transcriptome of black pepper was sequenced by the NGS SOLiD platform and assembled using the multiple-k method. Blast2Go and orthoMCL methods were used to annotate 10338 unigenes. The 4472 predicted proteins showed about 52% homology with the Arabidopsis proteome. Two root proteomes identified 615 proteins, which seem to define the plant’s root pattern. Simple-sequence repeats were identified that may be useful in studies of genetic diversity and may have applications in biotechnology and ecology. Conclusions This dataset of 10338 unigenes is crucially important for the biotechnological breeding of black pepper and the ecogenomics of the Magnoliids, a major group of basal angiosperms. PMID:22984782

  9. High-throughput sequencing of black pepper root transcriptome

    Directory of Open Access Journals (Sweden)

    Gordo Sheila MC

    2012-09-01

    Full Text Available Abstract Background Black pepper (Piper nigrum L. is one of the most popular spices in the world. It is used in cooking and the preservation of food and even has medicinal properties. Losses in production from disease are a major limitation in the culture of this crop. The major diseases are root rot and foot rot, which are results of root infection by Fusarium solani and Phytophtora capsici, respectively. Understanding the molecular interaction between the pathogens and the host’s root region is important for obtaining resistant cultivars by biotechnological breeding. Genetic and molecular data for this species, though, are limited. In this paper, RNA-Seq technology has been employed, for the first time, to describe the root transcriptome of black pepper. Results The root transcriptome of black pepper was sequenced by the NGS SOLiD platform and assembled using the multiple-k method. Blast2Go and orthoMCL methods were used to annotate 10338 unigenes. The 4472 predicted proteins showed about 52% homology with the Arabidopsis proteome. Two root proteomes identified 615 proteins, which seem to define the plant’s root pattern. Simple-sequence repeats were identified that may be useful in studies of genetic diversity and may have applications in biotechnology and ecology. Conclusions This dataset of 10338 unigenes is crucially important for the biotechnological breeding of black pepper and the ecogenomics of the Magnoliids, a major group of basal angiosperms.

  10. High-throughput sequencing of black pepper root transcriptome.

    Science.gov (United States)

    Gordo, Sheila M C; Pinheiro, Daniel G; Moreira, Edith C O; Rodrigues, Simone M; Poltronieri, Marli C; de Lemos, Oriel F; da Silva, Israel Tojal; Ramos, Rommel T J; Silva, Artur; Schneider, Horacio; Silva, Wilson A; Sampaio, Iracilda; Darnet, Sylvain

    2012-09-17

    Black pepper (Piper nigrum L.) is one of the most popular spices in the world. It is used in cooking and the preservation of food and even has medicinal properties. Losses in production from disease are a major limitation in the culture of this crop. The major diseases are root rot and foot rot, which are results of root infection by Fusarium solani and Phytophtora capsici, respectively. Understanding the molecular interaction between the pathogens and the host's root region is important for obtaining resistant cultivars by biotechnological breeding. Genetic and molecular data for this species, though, are limited. In this paper, RNA-Seq technology has been employed, for the first time, to describe the root transcriptome of black pepper. The root transcriptome of black pepper was sequenced by the NGS SOLiD platform and assembled using the multiple-k method. Blast2Go and orthoMCL methods were used to annotate 10338 unigenes. The 4472 predicted proteins showed about 52% homology with the Arabidopsis proteome. Two root proteomes identified 615 proteins, which seem to define the plant's root pattern. Simple-sequence repeats were identified that may be useful in studies of genetic diversity and may have applications in biotechnology and ecology. This dataset of 10338 unigenes is crucially important for the biotechnological breeding of black pepper and the ecogenomics of the Magnoliids, a major group of basal angiosperms.

  11. Improving genetic diagnosis in Mendelian disease with transcriptome sequencing.

    Science.gov (United States)

    Cummings, Beryl B; Marshall, Jamie L; Tukiainen, Taru; Lek, Monkol; Donkervoort, Sandra; Foley, A Reghan; Bolduc, Veronique; Waddell, Leigh B; Sandaradura, Sarah A; O'Grady, Gina L; Estrella, Elicia; Reddy, Hemakumar M; Zhao, Fengmei; Weisburd, Ben; Karczewski, Konrad J; O'Donnell-Luria, Anne H; Birnbaum, Daniel; Sarkozy, Anna; Hu, Ying; Gonorazky, Hernan; Claeys, Kristl; Joshi, Himanshu; Bournazos, Adam; Oates, Emily C; Ghaoui, Roula; Davis, Mark R; Laing, Nigel G; Topf, Ana; Kang, Peter B; Beggs, Alan H; North, Kathryn N; Straub, Volker; Dowling, James J; Muntoni, Francesco; Clarke, Nigel F; Cooper, Sandra T; Bönnemann, Carsten G; MacArthur, Daniel G

    2017-04-19

    Exome and whole-genome sequencing are becoming increasingly routine approaches in Mendelian disease diagnosis. Despite their success, the current diagnostic rate for genomic analyses across a variety of rare diseases is approximately 25 to 50%. We explore the utility of transcriptome sequencing [RNA sequencing (RNA-seq)] as a complementary diagnostic tool in a cohort of 50 patients with genetically undiagnosed rare muscle disorders. We describe an integrated approach to analyze patient muscle RNA-seq, leveraging an analysis framework focused on the detection of transcript-level changes that are unique to the patient compared to more than 180 control skeletal muscle samples. We demonstrate the power of RNA-seq to validate candidate splice-disrupting mutations and to identify splice-altering variants in both exonic and deep intronic regions, yielding an overall diagnosis rate of 35%. We also report the discovery of a highly recurrent de novo intronic mutation in COL6A1 that results in a dominantly acting splice-gain event, disrupting the critical glycine repeat motif of the triple helical domain. We identify this pathogenic variant in a total of 27 genetically unsolved patients in an external collagen VI-like dystrophy cohort, thus explaining approximately 25% of patients clinically suggestive of having collagen VI dystrophy in whom prior genetic analysis is negative. Overall, this study represents a large systematic application of transcriptome sequencing to rare disease diagnosis and highlights its utility for the detection and interpretation of variants missed by current standard diagnostic approaches. Copyright © 2017, American Association for the Advancement of Science.

  12. Integrated analysis of whole genome and transcriptome sequencing reveals diverse transcriptomic aberrations driven by somatic genomic changes in liver cancers.

    Directory of Open Access Journals (Sweden)

    Yuichi Shiraishi

    Full Text Available Recent studies applying high-throughput sequencing technologies have identified several recurrently mutated genes and pathways in multiple cancer genomes. However, transcriptional consequences from these genomic alterations in cancer genome remain unclear. In this study, we performed integrated and comparative analyses of whole genomes and transcriptomes of 22 hepatitis B virus (HBV-related hepatocellular carcinomas (HCCs and their matched controls. Comparison of whole genome sequence (WGS and RNA-Seq revealed much evidence that various types of genomic mutations triggered diverse transcriptional changes. Not only splice-site mutations, but also silent mutations in coding regions, deep intronic mutations and structural changes caused splicing aberrations. HBV integrations generated diverse patterns of virus-human fusion transcripts depending on affected gene, such as TERT, CDK15, FN1 and MLL4. Structural variations could drive over-expression of genes such as WNT ligands, with/without creating gene fusions. Furthermore, by taking account of genomic mutations causing transcriptional aberrations, we could improve the sensitivity of deleterious mutation detection in known cancer driver genes (TP53, AXIN1, ARID2, RPS6KA3, and identified recurrent disruptions in putative cancer driver genes such as HNF4A, CPS1, TSC1 and THRAP3 in HCCs. These findings indicate genomic alterations in cancer genome have diverse transcriptomic effects, and integrated analysis of WGS and RNA-Seq can facilitate the interpretation of a large number of genomic alterations detected in cancer genome.

  13. Researches on Transcriptome Sequencing in the Study of Traditional Chinese Medicine

    Science.gov (United States)

    Xin, Jie; Zhang, Rong-chao; Wang, Lei

    2017-01-01

    Due to its incomparable advantages, the application of transcriptome sequencing in the study of traditional Chinese medicine attracts more and more attention of researchers, which greatly promote the development of traditional Chinese medicine. In this paper, the applications of transcriptome sequencing in traditional Chinese medicine were summarized by reviewing recent related papers. PMID:28900463

  14. Transcriptome sequencing and annotation for the Jamaican fruit bat (Artibeus jamaicensis.

    Directory of Open Access Journals (Sweden)

    Timothy I Shaw

    Full Text Available The Jamaican fruit bat (Artibeus jamaicensis is one of the most common bats in the tropical Americas. It is thought to be a potential reservoir host of Tacaribe virus, an arenavirus closely related to the South American hemorrhagic fever viruses. We performed transcriptome sequencing and annotation from lung, kidney and spleen tissues using 454 and Illumina platforms to develop this species as an animal model. More than 100,000 contigs were assembled, with 25,000 genes that were functionally annotated. Of the remaining unannotated contigs, 80% were found within bat genomes or transcriptomes. Annotated genes are involved in a broad range of activities ranging from cellular metabolism to genome regulation through ncRNAs. Reciprocal BLAST best hits yielded 8,785 sequences that are orthologous to mouse, rat, cattle, horse and human. Species tree analysis of sequences from 2,378 loci was used to achieve 95% bootstrap support for the placement of bat as sister to the clade containing horse, dog, and cattle. Through substitution rate estimation between bat and human, 32 genes were identified with evidence for positive selection. We also identified 466 immune-related genes, which may be useful for studying Tacaribe virus infection of this species. The Jamaican fruit bat transcriptome dataset is a resource that should provide additional candidate markers for studying bat evolution and ecology, and tools for analysis of the host response and pathology of disease.

  15. Transcriptome profiling of testis during sexual maturation stages in Eriocheir sinensis using Illumina sequencing.

    Directory of Open Access Journals (Sweden)

    Lin He

    Full Text Available The testis is a highly specialized tissue that plays dual roles in ensuring fertility by producing spermatozoa and hormones. Spermatogenesis is a complex process, resulting in the production of mature sperm from primordial germ cells. Significant structural and biochemical changes take place in the seminiferous epithelium of the adult testis during spermatogenesis. The gene expression pattern of testis in Chinese mitten crab (Eriocheir sinensis has not been extensively studied, and limited genetic research has been performed on this species. The advent of high-throughput sequencing technologies enables the generation of genomic resources within a short period of time and at minimal cost. In the present study, we performed de novo transcriptome sequencing to produce a comprehensive transcript dataset for testis of E. sinensis. In two runs, we produced 25,698,778 sequencing reads corresponding with 2.31 Gb total nucleotides. These reads were assembled into 342,753 contigs or 141,861 scaffold sequences, which identified 96,311 unigenes. Based on similarity searches with known proteins, 39,995 unigenes were annotated based on having a Blast hit in the non-redundant database or ESTscan results with a cut-off E-value above 10(-5. This is the first report of a mitten crab transcriptome using high-throughput sequencing technology, and all these testes transcripts can help us understand the molecular mechanisms involved in spermatogenesis and testis maturation.

  16. De novo transcriptome sequencing and sequence analysis of the malaria vector Anopheles sinensis (Diptera: Culicidae)

    Science.gov (United States)

    2014-01-01

    Background Anopheles sinensis is the major malaria vector in China and Southeast Asia. Vector control is one of the most effective measures to prevent malaria transmission. However, there is little transcriptome information available for the malaria vector. To better understand the biological basis of malaria transmission and to develop novel and effective means of vector control, there is a need to build a transcriptome dataset for functional genomics analysis by large-scale RNA sequencing (RNA-seq). Methods To provide a more comprehensive and complete transcriptome of An. sinensis, eggs, larvae, pupae, male adults and female adults RNA were pooled together for cDNA preparation, sequenced using the Illumina paired-end sequencing technology and assembled into unigenes. These unigenes were then analyzed in their genome mapping, functional annotation, homology, codon usage bias and simple sequence repeats (SSRs). Results Approximately 51.6 million clean reads were obtained, trimmed, and assembled into 38,504 unigenes with an average length of 571 bp, an N50 of 711 bp, and an average GC content 51.26%. Among them, 98.4% of unigenes could be mapped onto the reference genome, and 69% of unigenes could be annotated with known biological functions. Homology analysis identified certain numbers of An. sinensis unigenes that showed homology or being putative 1:1 orthologues with genomes of other Dipteran species. Codon usage bias was analyzed and 1,904 SSRs were detected, which will provide effective molecular markers for the population genetics of this species. Conclusions Our data and analysis provide the most comprehensive transcriptomic resource and characteristics currently available for An. sinensis, and will facilitate genetic, genomic studies, and further vector control of An. sinensis. PMID:25000941

  17. Transcriptome sequencing in prostate cancer identifies inter-tumor heterogeneity

    Directory of Open Access Journals (Sweden)

    Janet Mendonca

    2015-06-01

    Full Text Available Given the dearth of gene mutations in prostate cancer, [1] ,[2] it is likely that genomic rearrangements play a significant role in the evolution of prostate cancer. However, in the search for recurrent genomic alterations, "private alterations" have received less attention. Such alterations may provide insights into the evolution, behavior, and clinical outcome of an individual tumor. In a recent report in "Genome Biology" Wyatt et al. [3] defines unique alterations in a cohort of high-risk prostate cancer patient with a lethal phenotype. Utilizing a transcriptome sequencing approach they observe high inter-tumor heterogeneity; however, the genes altered distill into three distinct cancer-relevant pathways. Their analysis reveals the presence of several non-ETS fusions, which may contribute to the phenotype of individual tumors, and have significance for disease progression.

  18. Transcriptome sequencing from diverse human populations reveals differentiated regulatory architecture.

    Directory of Open Access Journals (Sweden)

    Alicia R Martin

    2014-08-01

    Full Text Available Large-scale sequencing efforts have documented extensive genetic variation within the human genome. However, our understanding of the origins, global distribution, and functional consequences of this variation is far from complete. While regulatory variation influencing gene expression has been studied within a handful of populations, the breadth of transcriptome differences across diverse human populations has not been systematically analyzed. To better understand the spectrum of gene expression variation, alternative splicing, and the population genetics of regulatory variation in humans, we have sequenced the genomes, exomes, and transcriptomes of EBV transformed lymphoblastoid cell lines derived from 45 individuals in the Human Genome Diversity Panel (HGDP. The populations sampled span the geographic breadth of human migration history and include Namibian San, Mbuti Pygmies of the Democratic Republic of Congo, Algerian Mozabites, Pathan of Pakistan, Cambodians of East Asia, Yakut of Siberia, and Mayans of Mexico. We discover that approximately 25.0% of the variation in gene expression found amongst individuals can be attributed to population differences. However, we find few genes that are systematically differentially expressed among populations. Of this population-specific variation, 75.5% is due to expression rather than splicing variability, and we find few genes with strong evidence for differential splicing across populations. Allelic expression analyses indicate that previously mapped common regulatory variants identified in eight populations from the International Haplotype Map Phase 3 project have similar effects in our seven sampled HGDP populations, suggesting that the cellular effects of common variants are shared across diverse populations. Together, these results provide a resource for studies analyzing functional differences across populations by estimating the degree of shared gene expression, alternative splicing, and

  19. Sequencing the transcriptome of milk production: milk trumps mammary tissue

    Science.gov (United States)

    2013-01-01

    Background Studies of normal human mammary gland development and function have mostly relied on cell culture, limited surgical specimens, and rodent models. Although RNA extracted from human milk has been used to assay the mammary transcriptome non-invasively, this assay has not been adequately validated in primates. Thus, the objectives of the current study were to assess the suitability of lactating rhesus macaques as a model for lactating humans and to determine whether RNA extracted from milk fractions is representative of RNA extracted from mammary tissue for the purpose of studying the transcriptome of milk-producing cells. Results We confirmed that macaque milk contains cytoplasmic crescents and that ample high-quality RNA can be obtained for sequencing. Using RNA sequencing, RNA extracted from macaque milk fat and milk cell fractions more accurately represented RNA from mammary epithelial cells (cells that produce milk) than did RNA from whole mammary tissue. Mammary epithelium-specific transcripts were more abundant in macaque milk fat, whereas adipose or stroma-specific transcripts were more abundant in mammary tissue. Functional analyses confirmed the validity of milk as a source of RNA from milk-producing mammary epithelial cells. Conclusions RNA extracted from the milk fat during lactation accurately portrayed the RNA profile of milk-producing mammary epithelial cells in a non-human primate. However, this sample type clearly requires protocols that minimize RNA degradation. Overall, we validated the use of RNA extracted from human and macaque milk and provided evidence to support the use of lactating macaques as a model for human lactation. PMID:24330573

  20. Sequencing and Characterization of Divergent Marbling Levels in the Beef Cattle ( Muscle Transcriptome

    Directory of Open Access Journals (Sweden)

    Dong Chen

    2015-02-01

    Full Text Available Marbling is an important trait regarding the quality of beef. Analysis of beef cattle transcriptome and its expression profile data are essential to extend the genetic information resources and would support further studies on beef cattle. RNA sequencing was performed in beef cattle using the Illumina High-Seq2000 platform. Approximately 251.58 million clean reads were generated from a high marbling (H group and low marbling (L group. Approximately 80.12% of the 19,994 bovine genes (protein coding were detected in all samples, and 749 genes exhibited differential expression between the H and L groups based on fold change (>1.5-fold, p<0.05. Multiple gene ontology terms and biological pathways were found significantly enriched among the differentially expressed genes. The transcriptome data will facilitate future functional studies on marbling formation in beef cattle and may be applied to improve breeding programs for cattle and closely related mammals.

  1. Transcriptome sequencing for SNP discovery across Cucumis melo

    Directory of Open Access Journals (Sweden)

    Blanca José

    2012-06-01

    Full Text Available Abstract Background Melon (Cucumis melo L. is a highly diverse species that is cultivated worldwide. Recent advances in massively parallel sequencing have begun to allow the study of nucleotide diversity in this species. The Sanger method combined with medium-throughput 454 technology were used in a previous study to analyze the genetic diversity of germplasm representing 3 botanical varieties, yielding a collection of about 40,000 SNPs distributed in 14,000 unigenes. However, the usefulness of this resource is limited as the sequenced genotypes do not represent the whole diversity of the species, which is divided into two subspecies with many botanical varieties variable in plant, flowering, and fruit traits, as well as in stress response. As a first step to extensively document levels and patterns of nucleotide variability across the species, we used the high-throughput SOLiD™ system to resequence the transcriptomes of a set of 67 genotypes that had previously been selected from a core collection representing the extant variation of the entire species. Results The deep transcriptome resequencing of all of the genotypes, grouped into 8 pools (wild African agrestis, Asian agrestis and acidulus, exotic Far Eastern conomon, Indian momordica and Asian dudaim and flexuosus, commercial cantalupensis, subsp. melo Asian and European landraces, Spanish inodorus landraces, and Piel de Sapo breeding lines yielded about 300 M reads. Short reads were mapped to the recently generated draft genome assembly of the DHL line Piel de Sapo (inodorus x Songwhan Charmi (conomon and to a new version of melon transcriptome. Regions with at least 6X coverage were used in SNV calling, generating a melon collection with 303,883 variants. These SNVs were dispersed across the entire C. melo genome, and distributed in 15,064 annotated genes. The number and variability of in silico SNVs differed considerably between pools. Our finding of higher genomic diversity in wild

  2. Evaluating Methods for Isolating Total RNA and Predicting the Success of Sequencing Phylogenetically Diverse Plant Transcriptomes

    Science.gov (United States)

    Bruskiewich, Richard; Burris, Jason N.; Carrigan, Charlotte T.; Chase, Mark W.; Clarke, Neil D.; Covshoff, Sarah; dePamphilis, Claude W.; Edger, Patrick P.; Goh, Falicia; Graham, Sean; Greiner, Stephan; Hibberd, Julian M.; Jordon-Thaden, Ingrid; Kutchan, Toni M.; Leebens-Mack, James; Melkonian, Michael; Miles, Nicholas; Myburg, Henrietta; Patterson, Jordan; Pires, J. Chris; Ralph, Paula; Rolf, Megan; Sage, Rowan F.; Soltis, Douglas; Soltis, Pamela; Stevenson, Dennis; Stewart, C. Neal; Surek, Barbara; Thomsen, Christina J. M.; Villarreal, Juan Carlos; Wu, Xiaolei; Zhang, Yong; Deyholos, Michael K.; Wong, Gane Ka-Shu

    2012-01-01

    Next-generation sequencing plays a central role in the characterization and quantification of transcriptomes. Although numerous metrics are purported to quantify the quality of RNA, there have been no large-scale empirical evaluations of the major determinants of sequencing success. We used a combination of existing and newly developed methods to isolate total RNA from 1115 samples from 695 plant species in 324 families, which represents >900 million years of phylogenetic diversity from green algae through flowering plants, including many plants of economic importance. We then sequenced 629 of these samples on Illumina GAIIx and HiSeq platforms and performed a large comparative analysis to identify predictors of RNA quality and the diversity of putative genes (scaffolds) expressed within samples. Tissue types (e.g., leaf vs. flower) varied in RNA quality, sequencing depth and the number of scaffolds. Tissue age also influenced RNA quality but not the number of scaffolds ≥1000 bp. Overall, 36% of the variation in the number of scaffolds was explained by metrics of RNA integrity (RIN score), RNA purity (OD 260/230), sequencing platform (GAIIx vs HiSeq) and the amount of total RNA used for sequencing. However, our results show that the most commonly used measures of RNA quality (e.g., RIN) are weak predictors of the number of scaffolds because Illumina sequencing is robust to variation in RNA quality. These results provide novel insight into the methods that are most important in isolating high quality RNA for sequencing and assembling plant transcriptomes. The methods and recommendations provided here could increase the efficiency and decrease the cost of RNA sequencing for individual labs and genome centers. PMID:23185583

  3. Transcriptome sequencing reveals high isoform diversity in the ant Formica exsecta

    Directory of Open Access Journals (Sweden)

    Kishor Dhaygude

    2017-11-01

    Full Text Available Transcriptome resources for social insects have the potential to provide new insight into polyphenism, i.e., how divergent phenotypes arise from the same genome. Here we present a transcriptome based on paired-end RNA sequencing data for the ant Formica exsecta (Formicidae, Hymenoptera. The RNA sequencing libraries were constructed from samples of several life stages of both sexes and female castes of queens and workers, in order to maximize representation of expressed genes. We first compare the performance of common assembly and scaffolding software (Trinity, Velvet-Oases, and SOAPdenovo-trans, in producing de novo assemblies. Second, we annotate the resulting expressed contigs to the currently published genomes of ants, and other insects, including the honeybee, to filter genes that have annotation evidence of being true genes. Our pipeline resulted in a final assembly of altogether 39,262 mRNA transcripts, with an average coverage of >300X, belonging to 17,496 unique genes with annotation in the related ant species. From these genes, 536 genes were unique to one caste or sex only, highlighting the importance of comprehensive sampling. Our final assembly also showed expression of several splice variants in 6,975 genes, and we show that accounting for splice variants affects the outcome of downstream analyses such as gene ontologies. Our transcriptome provides an outstanding resource for future genetic studies on F. exsecta and other ant species, and the presented transcriptome assembly can be adapted to any non-model species that has genomic resources available from a related taxon.

  4. Deep RNA sequencing of the skeletal muscle transcriptome in swimming fish.

    Directory of Open Access Journals (Sweden)

    Arjan P Palstra

    Full Text Available Deep RNA sequencing (RNA-seq was performed to provide an in-depth view of the transcriptome of red and white skeletal muscle of exercised and non-exercised rainbow trout (Oncorhynchus mykiss with the specific objective to identify expressed genes and quantify the transcriptomic effects of swimming-induced exercise. Pubertal autumn-spawning seawater-raised female rainbow trout were rested (n = 10 or swum (n = 10 for 1176 km at 0.75 body-lengths per second in a 6,000-L swim-flume under reproductive conditions for 40 days. Red and white muscle RNA of exercised and non-exercised fish (4 lanes was sequenced and resulted in 15-17 million reads per lane that, after de novo assembly, yielded 149,159 red and 118,572 white muscle contigs. Most contigs were annotated using an iterative homology search strategy against salmonid ESTs, the zebrafish Danio rerio genome and general Metazoan genes. When selecting for large contigs (>500 nucleotides, a number of novel rainbow trout gene sequences were identified in this study: 1,085 and 1,228 novel gene sequences for red and white muscle, respectively, which included a number of important molecules for skeletal muscle function. Transcriptomic analysis revealed that sustained swimming increased transcriptional activity in skeletal muscle and specifically an up-regulation of genes involved in muscle growth and developmental processes in white muscle. The unique collection of transcripts will contribute to our understanding of red and white muscle physiology, specifically during the long-term reproductive migration of salmonids.

  5. De novo assembly, gene annotation and marker development using Illumina paired-end transcriptome sequences in celery (Apium graveolens L.).

    Science.gov (United States)

    Fu, Nan; Wang, Qian; Shen, Huo-Lin

    2013-01-01

    Celery is an increasing popular vegetable species, but limited transcriptome and genomic data hinder the research to it. In addition, a lack of celery molecular markers limits the process of molecular genetic breeding. High-throughput transcriptome sequencing is an efficient method to generate a large transcriptome sequence dataset for gene discovery, molecular marker development and marker-assisted selection breeding. Celery transcriptomes from four tissues were sequenced using Illumina paired-end sequencing technology. De novo assembling was performed to generate a collection of 42,280 unigenes (average length of 502.6 bp) that represent the first transcriptome of the species. 78.43% and 48.93% of the unigenes had significant similarity with proteins in the National Center for Biotechnology Information (NCBI) non-redundant protein database (Nr) and Swiss-Prot database respectively, and 10,473 (24.77%) unigenes were assigned to Clusters of Orthologous Groups (COG). 21,126 (49.97%) unigenes harboring Interpro domains were annotated, in which 15,409 (36.45%) were assigned to Gene Ontology(GO) categories. Additionally, 7,478 unigenes were mapped onto 228 pathways using the Kyoto Encyclopedia of Genes and Genomes Pathway database (KEGG). Large numbers of simple sequence repeats (SSRs) were indentified, and then the rate of successful amplication and polymorphism were investigated among 31 celery accessions. This study demonstrates the feasibility of generating a large scale of sequence information by Illumina paired-end sequencing and efficient assembling. Our results provide a valuable resource for celery research. The developed molecular markers are the foundation of further genetic linkage analysis and gene localization, and they will be essential to accelerate the process of breeding.

  6. De novo assembly, gene annotation and marker development using Illumina paired-end transcriptome sequences in celery (Apium graveolens L..

    Directory of Open Access Journals (Sweden)

    Nan Fu

    Full Text Available BACKGROUND: Celery is an increasing popular vegetable species, but limited transcriptome and genomic data hinder the research to it. In addition, a lack of celery molecular markers limits the process of molecular genetic breeding. High-throughput transcriptome sequencing is an efficient method to generate a large transcriptome sequence dataset for gene discovery, molecular marker development and marker-assisted selection breeding. PRINCIPAL FINDINGS: Celery transcriptomes from four tissues were sequenced using Illumina paired-end sequencing technology. De novo assembling was performed to generate a collection of 42,280 unigenes (average length of 502.6 bp that represent the first transcriptome of the species. 78.43% and 48.93% of the unigenes had significant similarity with proteins in the National Center for Biotechnology Information (NCBI non-redundant protein database (Nr and Swiss-Prot database respectively, and 10,473 (24.77% unigenes were assigned to Clusters of Orthologous Groups (COG. 21,126 (49.97% unigenes harboring Interpro domains were annotated, in which 15,409 (36.45% were assigned to Gene Ontology(GO categories. Additionally, 7,478 unigenes were mapped onto 228 pathways using the Kyoto Encyclopedia of Genes and Genomes Pathway database (KEGG. Large numbers of simple sequence repeats (SSRs were indentified, and then the rate of successful amplication and polymorphism were investigated among 31 celery accessions. CONCLUSIONS: This study demonstrates the feasibility of generating a large scale of sequence information by Illumina paired-end sequencing and efficient assembling. Our results provide a valuable resource for celery research. The developed molecular markers are the foundation of further genetic linkage analysis and gene localization, and they will be essential to accelerate the process of breeding.

  7. Expression divergence measured by transcriptome sequencing of four yeast species

    Directory of Open Access Journals (Sweden)

    Busby Michele A

    2011-12-01

    Full Text Available Abstract Background The evolution of gene expression is a challenging problem in evolutionary biology, for which accurate, well-calibrated measurements and methods are crucial. Results We quantified gene expression with whole-transcriptome sequencing in four diploid, prototrophic strains of Saccharomyces species grown under the same condition to investigate the evolution of gene expression. We found that variation in expression is gene-dependent with large variations in each gene's expression between replicates of the same species. This confounds the identification of genes differentially expressed across species. To address this, we developed a statistical approach to establish significance bounds for inter-species differential expression in RNA-Seq data based on the variance measured across biological replicates. This metric estimates the combined effects of technical and environmental variance, as well as Poisson sampling noise by isolating each component. Despite a paucity of large expression changes, we found a strong correlation between the variance of gene expression change and species divergence (R2 = 0.90. Conclusion We provide an improved methodology for measuring gene expression changes in evolutionary diverged species using RNA Seq, where experimental artifacts can mimic evolutionary effects. GEO Accession Number: GSE32679

  8. Transcriptome sequencing and whole genome expression profiling of chrysanthemum under dehydration stress.

    Science.gov (United States)

    Xu, Yanjie; Gao, Shan; Yang, Yingjie; Huang, Mingyun; Cheng, Lina; Wei, Qian; Fei, Zhangjun; Gao, Junping; Hong, Bo

    2013-09-28

    Chrysanthemum is one of the most important ornamental crops in the world and drought stress seriously limits its production and distribution. In order to generate a functional genomics resource and obtain a deeper understanding of the molecular mechanisms regarding chrysanthemum responses to dehydration stress, we performed large-scale transcriptome sequencing of chrysanthemum plants under dehydration stress using the Illumina sequencing technology. Two cDNA libraries constructed from mRNAs of control and dehydration-treated seedlings were sequenced by Illumina technology. A total of more than 100 million reads were generated and de novo assembled into 98,180 unique transcripts which were further extensively annotated by comparing their sequencing to different protein databases. Biochemical pathways were predicted from these transcript sequences. Furthermore, we performed gene expression profiling analysis upon dehydration treatment in chrysanthemum and identified 8,558 dehydration-responsive unique transcripts, including 307 transcription factors and 229 protein kinases and many well-known stress responsive genes. Gene ontology (GO) term enrichment and biochemical pathway analyses showed that dehydration stress caused changes in hormone response, secondary and amino acid metabolism, and light and photoperiod response. These findings suggest that drought tolerance of chrysanthemum plants may be related to the regulation of hormone biosynthesis and signaling, reduction of oxidative damage, stabilization of cell proteins and structures, and maintenance of energy and carbon supply. Our transcriptome sequences can provide a valuable resource for chrysanthemum breeding and research and novel insights into chrysanthemum responses to dehydration stress and offer candidate genes or markers that can be used to guide future studies attempting to breed drought tolerant chrysanthemum cultivars.

  9. Transcriptome sequencing and whole genome expression profiling of chrysanthemum under dehydration stress

    Science.gov (United States)

    2013-01-01

    Background Chrysanthemum is one of the most important ornamental crops in the world and drought stress seriously limits its production and distribution. In order to generate a functional genomics resource and obtain a deeper understanding of the molecular mechanisms regarding chrysanthemum responses to dehydration stress, we performed large-scale transcriptome sequencing of chrysanthemum plants under dehydration stress using the Illumina sequencing technology. Results Two cDNA libraries constructed from mRNAs of control and dehydration-treated seedlings were sequenced by Illumina technology. A total of more than 100 million reads were generated and de novo assembled into 98,180 unique transcripts which were further extensively annotated by comparing their sequencing to different protein databases. Biochemical pathways were predicted from these transcript sequences. Furthermore, we performed gene expression profiling analysis upon dehydration treatment in chrysanthemum and identified 8,558 dehydration-responsive unique transcripts, including 307 transcription factors and 229 protein kinases and many well-known stress responsive genes. Gene ontology (GO) term enrichment and biochemical pathway analyses showed that dehydration stress caused changes in hormone response, secondary and amino acid metabolism, and light and photoperiod response. These findings suggest that drought tolerance of chrysanthemum plants may be related to the regulation of hormone biosynthesis and signaling, reduction of oxidative damage, stabilization of cell proteins and structures, and maintenance of energy and carbon supply. Conclusions Our transcriptome sequences can provide a valuable resource for chrysanthemum breeding and research and novel insights into chrysanthemum responses to dehydration stress and offer candidate genes or markers that can be used to guide future studies attempting to breed drought tolerant chrysanthemum cultivars. PMID:24074255

  10. Transcriptome Sequencing and Analysis of the Fast Growing Shoots of Moso Bamboo (Phyllostachys edulis)

    Science.gov (United States)

    Zhang, Ying; Hu, Tao; Mu, Shaohua; Li, Xueping; Gao, Jian

    2013-01-01

    Background The moso bamboo, a large woody bamboo with the highest ecological, economic, and cultural value of all bamboos, has one of the highest growth speeds in the world. Genetic research into moso bamboo has been scarce, partly because of the lack of previous genomic resources. In the present study, for the first time, we performed de novo transcriptome sequencing and mapped to the moso bamboo genomic resources (reference genome and genes) to produce a comprehensive dataset for the fast growing shoots of moso bamboo. Results The fast growing shoots mixed with six different heights and culms after leaf expansion of moso bamboo transcriptome were sequenced using the Illumina HiSeq™ 2000 sequencing platform, respectively. More than 80 million reads including 65,045,670 and 68,431,884 clean reads were produced in the two libraries. More than 81% of the reads were matched to the reference genome, and nearly 50% of the reads were matched to the reference genes. The genes with log 2 ratio > 2 or plant hormones, cell cycle regulation, cell wall metabolism and cell morphogenesis genes were further analyzed and they may form a network that regulates the fast growth of moso bamboo shoots. Conclusion Firstly, our data provides the most comprehensive transcriptomic resource for moso bamboo to date. Candidate genes have been identified and they are potentially involved in the growth and development of moso bamboo. The results give a better insight into the mechanisms of moso bamboo shoots rapid growth and provide gene resources for improving plant growth. PMID:24244391

  11. Transcriptome sequencing and identification of cold tolerance genes in hardy Corylus species (C. heterophylla Fisch floral buds.

    Directory of Open Access Journals (Sweden)

    Xin Chen

    Full Text Available BACKGROUND: The genus Corylus is an important woody species in Northeast China. Its products, hazelnuts, constitute one of the most important raw materials for the pastry and chocolate industry. However, limited genetic research has focused on Corylus because of the lack of genomic resources. The advent of high-throughput sequencing technologies provides a turning point for Corylus research. In the present study, we performed de novo transcriptome sequencing for the first time to produce a comprehensive database for the Corylus heterophylla Fisch floral buds. RESULTS: The C. heterophylla Fisch floral buds transcriptome was sequenced using the Illumina paired-end sequencing technology. We produced 28,930,890 raw reads and assembled them into 82,684 contigs. A total of 40,941 unigenes were identified, among which 30,549 were annotated in the NCBI Non-redundant (Nr protein database and 18,581 were annotated in the Swiss-Prot database. Of these annotated unigenes, 25,311 and 10,514 unigenes were assigned to gene ontology (GO categories and clusters of orthologous groups (COG, respectively. We could map 17,207 unigenes onto 128 pathways using the Kyoto Encyclopedia of Genes and Genomes Pathway (KEGG database. Additionally, based on the transcriptome, we constructed a candidate cold tolerance gene set of C. heterophylla Fisch floral buds. The expression patterns of selected genes during four stages of cold acclimation suggested that these genes might be involved in different cold responsive stages in C. heterophylla Fisch floral buds. CONCLUSION: The transcriptome of C. heterophylla Fisch floral buds was deep sequenced, de novo assembled, and annotated, providing abundant data to better understand the C. heterophylla Fisch floral buds transcriptome. Candidate genes potentially involved in cold tolerance were identified, providing a material basis for future molecular mechanism analysis of C. heterophylla Fisch floral buds tolerant to cold stress.

  12. De novo sequencing and a comprehensive analysis of purple sweet potato (Impomoea batatas L.) transcriptome.

    Science.gov (United States)

    Xie, Fuliang; Burklew, Caitlin E; Yang, Yanfang; Liu, Min; Xiao, Peng; Zhang, Baohong; Qiu, Deyou

    2012-07-01

    High-throughput RNA sequencing was performed for comprehensively analyzing the transcriptome of the purple sweet potato. A total of 58,800 unigenes were obtained and ranged from 200 nt to 10,380 nt with an average length of 476 nt. The average expression of one unigene was 34 reads per kb per million reads (RPKM) with a maximum expression of 1,935 RPKM. At least 40,280 (68.5%) unigenes were identified to be protein-coding genes, in which 11,978 and 5,184 genes were homologous to Arabidopsis and rice proteins, respectively. Gene ontology (GO) and Kyoto encyclopedia of genes and genomes (KEGG) analysis showed that 19,707 (33.5%) unigenes were classified to 1,807 terms of GO including molecular functions, biological processes, and cellular components and 9,970 (17.0%) unigenes were enriched to 11,119 KEGG pathways. We found that at least 3,553 genes may be involved in the biosynthesis pathways of starch, alkaloids, anthocyanin pigments, and vitamins. Additionally, 851 potential simple sequence repeats (SSRs) were identified in all unigenes. Transcriptome sequencing on tuberous roots of the sweet potato yielded substantial transcriptional sequences and potentially useful SSR markers which provide an important data source for sweet potato research. Comparison of two RNA-sequence datasets from the purple and the yellow sweet potato showed that UDP-glucose-flavonoid 3-O-glucosyltransferase was one of the key enzymes in the pathway of anthocyanin biosynthesis and that anthocyanin-3-glucoside might be one of the major components for anthocyanin pigments in the purple sweet potato. This study contributes to the molecular mechanisms of sweet potato development and metabolism and therefore that increases the potential utilization of the sweet potato in food nutrition and pharmacy.

  13. Transcriptome analysis of Emiliania huxleyi cells grown under different conditions using high-throughput sequencing data

    Science.gov (United States)

    Andreson, R.; Anlauf, H.; Mackinder, L.; Iglesias-Rodriguez, D.; LaRoche, J.; Lenhard, B.

    2012-04-01

    Coccolithophores are ideal for studying genes responsible for biomineralization processes due to relatively small genome sizes, ability to grow in culture, and as a natural model system for measuring expression of calcification-related genes in two life stages. As the Emiliania huxleyi has several annotated calcification-related proteins, we have concentrated on analyzing its genes and promoter areas. Many recent studies have focused primarily on transcriptome analysis of E. huxleyi using nutrient-limited conditions to get more information about up-regulated genes involved in biomineralization and calcification processes. Although there are more than 100,000 EST sequences for E. huxleyi available from these projects in public databases, that data is often insufficient to identify the exact position of transcription start site (TSS) to perform precise analysis (nucleotide content, motif search) of core promoters and regulatory mechanisms in immediate flanking areas. ESTs are not ideal for these kinds of analyses because the standard technologies of producing 5' EST libraries do not guarantee that the exact 5' end of the transcript will be captured. To determine the extent and accurate positions of 5' ends of transcripts and therefore the positions of core promoters, Cap analysis of gene expression (CAGE) sequencing method was used for sequencing RNA of E. huxleyi in both stages, calcifying and non-calcifying. As an additional info, gene expression levels of RNA for 21 samples were retrieved with whole transcriptome shotgun sequencing (RNA-Seq). The collections of reads these methods produced were used to map and annotate genes on several samples and measure the RNA expression levels in different conditions. Although there are not much data available for close organisms, it is possible to compare these results with other species to find conserved regulatory mechanisms between genes related to calcification. Visualization tools allowing browsing of annotated genes

  14. Characterization of the floral transcriptome of Moso bamboo (Phyllostachys edulis at different flowering developmental stages by transcriptome sequencing and RNA-seq analysis.

    Directory of Open Access Journals (Sweden)

    Jian Gao

    Full Text Available BACKGROUND: As an arborescent and perennial plant, Moso bamboo (Phyllostachys edulis (Carrière J. Houzeau, synonym Phyllostachys heterocycla Carrière is characterized by its infrequent sexual reproduction with flowering intervals ranging from several to more than a hundred years. However, little bamboo genomic research has been conducted on this due to a variety of reasons. Here, for the first time, we investigated the transcriptome of developing flowers in Moso bamboo by using high-throughput Illumina GAII sequencing and mapping short reads to the Moso bamboo genome and reference genes. We performed RNA-seq analysis on four important stages of flower development, and obtained extensive gene and transcript abundance data for the floral transcriptome of this key bamboo species. RESULTS: We constructed a cDNA library using equal amounts of RNA from Moso bamboo leaf samples from non-flowering plants (CK and mixed flower samples (F of four flower development stages. We generated more than 67 million reads from each of the CK and F samples. About 70% of the reads could be uniquely mapped to the Moso bamboo genome and the reference genes. Genes detected at each stage were categorized to putative functional categories based on their expression patterns. The analysis of RNA-seq data of bamboo flowering tissues at different developmental stages reveals key gene expression properties during the flower development of bamboo. CONCLUSION: We showed that a combination of transcriptome sequencing and RNA-seq analysis was a powerful approach to identifying candidate genes related to floral transition and flower development in bamboo species. The results give a better insight into the mechanisms of Moso bamboo flowering and ageing. This transcriptomic data also provides an important gene resource for improving breeding for Moso bamboo.

  15. Comparative analysis of transcriptomes in aerial stems and roots of Ephedra sinica based on high-throughput mRNA sequencing

    Directory of Open Access Journals (Sweden)

    Taketo Okada

    2016-12-01

    Full Text Available Ephedra plants are taxonomically classified as gymnosperms, and are medicinally important as the botanical origin of crude drugs and as bioresources that contain pharmacologically active chemicals. Here we show a comparative analysis of the transcriptomes of aerial stems and roots of Ephedra sinica based on high-throughput mRNA sequencing by RNA-Seq. De novo assembly of short cDNA sequence reads generated 23,358, 13,373, and 28,579 contigs longer than 200 bases from aerial stems, roots, or both aerial stems and roots, respectively. The presumed functions encoded by these contig sequences were annotated by BLAST (blastx. Subsequently, these contigs were classified based on gene ontology slims, Enzyme Commission numbers, and the InterPro database. Furthermore, comparative gene expression analysis was performed between aerial stems and roots. These transcriptome analyses revealed differences and similarities between the transcriptomes of aerial stems and roots in E. sinica. Deep transcriptome sequencing of Ephedra should open the door to molecular biological studies based on the entire transcriptome, tissue- or organ-specific transcriptomes, or targeted genes of interest.

  16. Transcriptome sequencing reveals the roles of transcription factors in modulating genotype by nitrogen interaction in maize.

    Science.gov (United States)

    Chen, Qiuyue; Liu, Zhipeng; Wang, Baobao; Wang, Xufeng; Lai, Jinsheng; Tian, Feng

    2015-10-01

    Global transcriptome analysis in maize revealed differential nitrogen response between genotypes and implicate a crucial role of transcription factors in driving genotype by nitrogen interactions at gene expression level. Developing nitrogen-efficient cultivars are essential for sustainable and productive agriculture. Nitrogen use efficiency of plants is highly dependent on the interaction of environmental and genetic variation and results in adaptive phenotypes. This study used transcriptome sequencing to perform a comprehensive genotype by nitrogen (G × N) interaction analysis for two elite Chinese maize inbreds grown at normal and low nitrogen levels in field conditions. We demonstrated that the two maize inbreds showed contrasting agronomic and transcriptomic responses to changes in nitrogen availability. A total of 96 genes with a significant G × N interaction were detected. After characterizing the expression patterns of G × N interaction genes, we found that the G × N interaction genes tended to show condition-specific differential expression. The functional annotations of G × N interaction genes revealed that many different kinds of genes were involved in G × N interactions, but a significant enrichment for transcription factors was detected, particularly the AP2/EREBP and WRKY family, suggesting that transcription factors might play important roles in driving G × N interaction at gene expression level for nitrogen response in maize. Taken together, these results not only provide novel insights into the mechanism of nitrogen response in maize and set important basis for further characterization but also have important implications for other genotype by stress interaction.

  17. Transcriptome Sequencing of Gynostemma pentaphyllum to Identify Genes and Enzymes Involved in Triterpenoid Biosynthesis

    Directory of Open Access Journals (Sweden)

    Qicong Chen

    2016-01-01

    Full Text Available G. pentaphyllum (Gynostemma pentaphyllum, a creeping herbaceous perennial with many important medicinal properties, is widely distributed in Asia. Gypenosides (triterpenoid saponins, the main effective components of G. pentaphyllum, are well studied. FPS (farnesyl pyrophosphate synthase, SS (squalene synthase, and SE (squalene epoxidase are the main enzymes involved in the synthesis of triterpenoid saponins. Considering the important medicinal functions of G. pentaphyllum, it is necessary to investigate the transcriptomic information of G. pentaphyllum to facilitate future studies of transcriptional regulation. After sequencing G. pentaphyllum, we obtained 50,654,708 unigenes. Next, we used RPKM (reads per kilobases per million reads to calculate expression of the unigenes and we performed comparison of our data to that contained in five common databases to annotate different aspects of the unigenes. Finally, we noticed that FPS, SS, and SE showed differential expression of enzymes in DESeq. Leaves showed the highest expression of FPS, SS, and SE relative to the other two tissues. Our research provides transcriptomic information of G. pentaphyllum in its natural environment and we found consistency in unigene expression, enzymes expression (FPS, SS, and SE, and the distribution of gypenosides content in G. pentaphyllum. Our results will enable future related studies of G. pentaphyllum.

  18. Huntington's disease biomarker progression profile identified by transcriptome sequencing in peripheral blood.

    Science.gov (United States)

    Mastrokolias, Anastasios; Ariyurek, Yavuz; Goeman, Jelle J; van Duijn, Erik; Roos, Raymund A C; van der Mast, Roos C; van Ommen, GertJan B; den Dunnen, Johan T; 't Hoen, Peter A C; van Roon-Mom, Willeke M C

    2015-10-01

    With several therapeutic approaches in development for Huntington's disease, there is a need for easily accessible biomarkers to monitor disease progression and therapy response. We performed next-generation sequencing-based transcriptome analysis of total RNA from peripheral blood of 91 mutation carriers (27 presymptomatic and, 64 symptomatic) and 33 controls. Transcriptome analysis by DeepSAGE identified 167 genes significantly associated with clinical total motor score in Huntington's disease patients. Relative to previous studies, this yielded novel genes and confirmed previously identified genes, such as H2AFY, an overlap in results that has proven difficult in the past. Pathway analysis showed enrichment of genes of the immune system and target genes of miRNAs, which are downregulated in Huntington's disease models. Using a highly parallelized microfluidics array chip (Fluidigm), we validated 12 of the top 20 significant genes in our discovery cohort and 7 in a second independent cohort. The five genes (PROK2, ZNF238, AQP9, CYSTM1 and ANXA3) that were validated independently in both cohorts present a candidate biomarker panel for stage determination and therapeutic readout in Huntington's disease. Finally we suggest a first empiric formula predicting total motor score from the expression levels of our biomarker panel. Our data support the view that peripheral blood is a useful source to identify biomarkers for Huntington's disease and monitor disease progression in future clinical trials.

  19. Development of molecular resources for an intertidal clam, Sinonovacula constricta, using 454 transcriptome sequencing.

    Directory of Open Access Journals (Sweden)

    Donghong Niu

    Full Text Available BACKGROUND: The razor clam Sinonovacula constricta is a benthic intertidal bivalve species with important commercial value. Despite its economic importance, knowledge of its transcriptome is scarce. Next generation sequencing technologies offer rapid and efficient tools for generating large numbers of sequences, which can be used to characterize the transcriptome, to develop effective molecular markers and to identify genes associated with growth, a key breeding trait. RESULTS: Total RNA was isolated from the mantle, gill, liver, siphon, gonad and muscular foot tissues. High-throughput deep sequencing of S. constricta using 454 pyrosequencing technology yielded 859,313 high-quality reads with an average read length of 489 bp. Clustering and assembly of these reads produced 16,323 contigs and 131,346 singletons with average lengths of 1,376 bp and 458 bp, respectively. Based on transcriptome sequencing, 14,615 sequences had significant matches with known genes encoding 147,669 predicted proteins. Subsequently, previously unknown growth-related genes were identified. A total of 13,563 microsatellites (SSRs and 13,634 high-confidence single nucleotide polymorphism loci (SNPs were discovered, of which almost half were validated. CONCLUSION: De novo sequencing of the razor clam S. constricta transcriptome on the 454 GS FLX platform generated a large number of ESTs. Candidate growth factors and a large number of SSRs and SNPs were identified. These results will impact genetic studies of S. constricta.

  20. Combining Next-Generation Sequencing and Microarray Technology into a Transcriptomics Approach for the Non-Model Organism Chironomus riparius

    Science.gov (United States)

    Marinković, Marino; de Leeuw, Wim C.; de Jong, Mark; Kraak, Michiel H. S.; Admiraal, Wim; Breit, Timo M.; Jonker, Martijs J.

    2012-01-01

    Whole-transcriptome gene-expression analyses are commonly performed in species that have a sequenced genome and for which microarrays are commercially available. To do such analyses in species with no or limited genome data, i.e. non-model organisms, necessary transcriptomics resources, i.e. an annotated transcriptome and a validated gene-expression microarray, must first be developed. The aim of the present study was to establish an advanced approach for developing transcriptomics resources for non-model organisms by combining next-generation sequencing (NGS) and microarray technology. We applied our approach to the non-biting midge Chironomus riparius, an ecologically relevant species that is widely used in sediment ecotoxicity testing. We sampled extensively covering all C. riparius developmental stages as well as toxicant exposed larvae and obtained from a normalized cDNA library 1.5 M NGS reads totalling 501 Mbp. Using the NGS data we developed transcriptomics resources in several steps. First, we designed 844 k probes directly on the NGS reads, as well as 76 k probes targeting expressed sequence tags of related species. These probes were tested for their affinity to C. riparius DNA and mRNA, by performing two biological experiments with a 1 M probe-selection microarray that contained the entire probe-library. Subsequently, the 1.5 M NGS reads were assembled into 23,709 isotigs and 135,082 singletons, which were associated to ∼55 k, respectively, ∼61 k gene ontology terms and which corresponded together to 22,593 unique protein accessions. An algorithm was developed that took the assembly and the probe affinities to DNA and mRNA into account, what resulted in 59 k highly-reliable probes that targeted uniquely 95% of the isotigs and 18% of the singletons. Concluding, our approach allowed the development of high-quality transcriptomics resources for C. riparius, and is applicable to any non-model organism. It is expected, that these resources will advance

  1. Haematobia irritans dataset of raw sequence reads from Illumina-based transcriptome sequencing of specific tissues and life stages

    Science.gov (United States)

    Illumina HiSeq technology was used to sequence the transcriptome from various dissected tissues and life stages from the horn fly, Haematobia irritans. These samples include eggs (0, 2, 4, and 9 hours post-oviposition), adult fly gut, adult fly legs, adult fly malpighian tubule, adult fly ovary, adu...

  2. De novo transcriptome assembly of Zanthoxylum bungeanum using Illumina sequencing for evolutionary analysis and simple sequence repeat marker development

    OpenAIRE

    Feng, Shijing; Zhao, Lili; Liu, Zhenshan; Liu, Yulin; Yang, Tuxi; Wei, Anzhi

    2017-01-01

    Zanthoxylum, an ancient economic crop in Asia, has a satisfying aromatic taste and immense medicinal values. A lack of genomic information and genetic markers has limited the evolutionary analysis and genetic improvement of Zanthoxylum species and their close relatives. To better understand the evolution, domestication, and divergence of Zanthoxylum, we present a de novo transcriptome analysis of an elite cultivar of Z. bungeanum using Illumina sequencing; we then developed simple sequence re...

  3. Sequencing and De Novo Assembly of the Toxicodendron radicans (Poison Ivy) Transcriptome.

    Science.gov (United States)

    Weisberg, Alexandra J; Kim, Gunjune; Westwood, James H; Jelesko, John G

    2017-11-10

    Contact with poison ivy plants is widely dreaded because they produce a natural product called urushiol that is responsible for allergenic contact delayed-dermatitis symptoms lasting for weeks. For this reason, the catchphrase most associated with poison ivy is "leaves of three, let it be", which serves the purpose of both identification and an appeal for avoidance. Ironically, despite this notoriety, there is a dearth of specific knowledge about nearly all other aspects of poison ivy physiology and ecology. As a means of gaining a more molecular-oriented understanding of poison ivy physiology and ecology, Next Generation DNA sequencing technology was used to develop poison ivy root and leaf RNA-seq transcriptome resources. De novo assembled transcriptomes were analyzed to generate a core set of high quality expressed transcripts present in poison ivy tissue. The predicted protein sequences were evaluated for similarity to SwissProt homologs and InterProScan domains, as well as assigned both GO terms and KEGG annotations. Over 23,000 simple sequence repeats were identified in the transcriptome, and corresponding oligo nucleotide primer pairs were designed. A pan-transcriptome analysis of existing Anacardiaceae transcriptomes revealed conserved and unique transcripts among these species.

  4. Transcriptome

    Science.gov (United States)

    ... Links for Patient Care Education All About the Human Genome Project Fact Sheets Genetic Education Resources for Teachers Genomic ... for a newly found gene's function. The National Human Genome Research ... two projects that created transcriptome resources for researchers around the ...

  5. De novo sequence assembly and characterization of Lycoris aurea transcriptome using GS FLX titanium platform of 454 pyrosequencing.

    Directory of Open Access Journals (Sweden)

    Ren Wang

    Full Text Available BACKGROUND: Lycoris aurea, also called Golden Magic Lily, is an ornamentally and medicinally important species of the Amaryllidaceae family. To date, the sequencing of its whole genome is unavailable as a non-model organism. Transcriptomic information is also scarce for this species. In this study, we performed de novo transcriptome sequencing to produce the first comprehensive expressed sequence tag (EST dataset for L. aurea using high-throughput sequencing technology. METHODOLOGY AND PRINCIPAL FINDINGS: Total RNA was isolated from leaves with sodium nitroprusside (SNP, salicylic acid (SA, or methyl jasmonate (MeJA treatment, stems, and flowers at the bud, blooming, and wilting stages. Equal quantities of RNA from each tissue and stage were pooled to construct a cDNA library. Using 454 pyrosequencing technology, a total of 937,990 high quality reads (308.63 Mb with an average read length of 329 bp were generated. Clustering and assembly of these reads produced a non-redundant set of 141,111 unique sequences, comprising 24,604 contigs and 116,507 singletons. All of the unique sequences were involved in the biological process, cellular component and molecular function categories by GO analysis. Potential genes and their functions were predicted by KEGG pathway mapping and COG analysis. Based on our sequence analysis and published literatures, many putative genes involved in Amaryllidaceae alkaloids synthesis, including PAL, TYDC OMT, NMT, P450, and other potentially important candidate genes, were identified for the first time in this Lycoris. Furthermore, 6,386 SSRs and 18,107 high-confidence SNPs were identified in this EST dataset. CONCLUSIONS: The transcriptome provides an invaluable new data for a functional genomics resource and future biological research in L. aurea. The molecular markers identified in this study will provide a material basis for future genetic linkage and quantitative trait loci analyses, and will provide useful

  6. De Novo Sequencing and Analysis of Lemongrass Transcriptome Provide First Insights into the Essential Oil Biosynthesis of Aromatic Grasses.

    Science.gov (United States)

    Meena, Seema; Kumar, Sarma R; Venkata Rao, D K; Dwivedi, Varun; Shilpashree, H B; Rastogi, Shubhra; Shasany, Ajit K; Nagegowda, Dinesh A

    2016-01-01

    Aromatic grasses of the genus Cymbopogon (Poaceae family) represent unique group of plants that produce diverse composition of monoterpene rich essential oils, which have great value in flavor, fragrance, cosmetic, and aromatherapy industries. Despite the commercial importance of these natural aromatic oils, their biosynthesis at the molecular level remains unexplored. As the first step toward understanding the essential oil biosynthesis, we performed de novo transcriptome assembly and analysis of C. flexuosus (lemongrass) by employing Illumina sequencing. Mining of transcriptome data and subsequent phylogenetic analysis led to identification of terpene synthases, pyrophosphatases, alcohol dehydrogenases, aldo-keto reductases, carotenoid cleavage dioxygenases, alcohol acetyltransferases, and aldehyde dehydrogenases, which are potentially involved in essential oil biosynthesis. Comparative essential oil profiling and mRNA expression analysis in three Cymbopogon species (C. flexuosus, aldehyde type; C. martinii, alcohol type; and C. winterianus, intermediate type) with varying essential oil composition indicated the involvement of identified candidate genes in the formation of alcohols, aldehydes, and acetates. Molecular modeling and docking further supported the role of identified protein sequences in aroma formation in Cymbopogon. Also, simple sequence repeats were found in the transcriptome with many linked to terpene pathway genes including the genes potentially involved in aroma biosynthesis. This work provides the first insights into the essential oil biosynthesis of aromatic grasses, and the identified candidate genes and markers can be a great resource for biotechnological and molecular breeding approaches to modulate the essential oil composition.

  7. De-novo transcriptome sequencing of a normalized cDNA pool from influenza infected ferrets.

    Directory of Open Access Journals (Sweden)

    Jeremy V Camp

    Full Text Available The ferret is commonly used as a model for studies of infectious diseases. The genomic sequence of this animal model is not yet characterized, and only a limited number of fully annotated cDNAs are currently available in GenBank. The majority of genes involved in innate or adaptive immune response are still lacking, restricting molecular genetic analysis of host response in the ferret model. To enable de novo identification of transcriptionally active ferret genes in response to infection, we performed de-novo transcriptome sequencing of animals infected with H1N1 A/California/07/2009. We also included splenocytes induced with bacterial lipopolysaccharide to allow for identification of transcripts specifically induced by gram-negative bacteria. We pooled and normalized the cDNA library in order to delimit the risk of sequencing only highly expressed genes. While normalization of the cDNA library removes the possibility of assessing expression changes between individual animals, it has been shown to increase identification of low abundant transcripts. In this study, we identified more than 19,000 partial ferret transcripts, including more than 1000 gene orthologs known to be involved in the innate and the adaptive immune response.

  8. The first Illumina-based de novo transcriptome sequencing and analysis of safflower flowers.

    Directory of Open Access Journals (Sweden)

    Huang Lulin

    Full Text Available BACKGROUND: The safflower, Carthamus tinctorius L., is a worldwide oil crop, and its flowers, which have a high flavonoid content, are an important medicinal resource against cardiovascular disease in traditional medicine. Because the safflower has a large and complex genome, the development of its genomic resources has been delayed. Second-generation Illumina sequencing is now an efficient route for generating an enormous volume of sequences that can represent a large number of genes and their expression levels. METHODOLOGY/PRINCIPAL FINDINGS: To investigate the genes and pathways that might control flavonoids and other secondary metabolites in the safflower, we used Illumina sequencing to perform a de novo assembly of the safflower tubular flower tissue transcriptome. We obtained a total of 4.69 Gb in clean nucleotides comprising 52,119,104 clean sequencing reads, 195,320 contigs, and 120,778 unigenes. Based on similarity searches with known proteins, we annotated 70,342 of the unigenes (about 58% of the identified unigenes with cut-off E-values of 10(-5. In total, 21,943 of the safflower unigenes were found to have COG classifications, and BLAST2GO assigned 26,332 of the unigenes to 1,754 GO term annotations. In addition, we assigned 30,203 of the unigenes to 121 KEGG pathways. When we focused on genes identified as contributing to flavonoid biosynthesis and the biosynthesis of unsaturated fatty acids, which are important pathways that control flower and seed quality, respectively, we found that these genes were fairly well conserved in the safflower genome compared to those of other plants. CONCLUSIONS/SIGNIFICANCE: Our study provides abundant genomic data for Carthamus tinctorius L. and offers comprehensive sequence resources for studying the safflower. We believe that these transcriptome datasets will serve as an important public information platform to accelerate studies of the safflower genome, and may help us define the mechanisms of

  9. Comparative transcriptome analysis in Sclerotinia sclerotiorum and S. trifoliorum by 454 Titanium RNA sequencing

    Science.gov (United States)

    Sclerotinia sclerotiorum and S. trifoliorum are two closely related devastating plant pathogens. Extensive research has been conducted on S. sclerotiorum and its genome sequences are available. To take advantages of the genomic information of S. sclerotiorum, we compared the transcriptome of S. tr...

  10. De novo assembly and characterization of the global transcriptome for Rhyacionia leptotubula using Illumina paired-end sequencing.

    Directory of Open Access Journals (Sweden)

    Jia-Ying Zhu

    Full Text Available BACKGROUND: The pine tip moth, Rhyacionia leptotubula (Lepidoptera: Tortricidae is one of the most destructive forestry pests in Yunnan Province, China. Despite its importance, less is known regarding all aspects of this pest. Understanding the genetic information of it is essential for exploring the specific traits at the molecular level. Thus, we here sequenced the transcriptome of R. leptotubula with high-throughput Illumina sequencing. METHODOLOGY/PRINCIPAL FINDINGS: In a single run, more than 60 million sequencing reads were generated. De novo assembling was performed to generate a collection of 46,910 unigenes with mean length of 642 bp. Based on Blastx search with an E-value cut-off of 10(-5, 22,581 unigenes showed significant similarities to known proteins from National Center for Biotechnology Information (NCBI non-redundant (Nr protein database. Of these annotated unigenes, 10,360, 6,937 and 13,894 were assigned to Gene Ontology (GO, Clusters of Orthologous Group (COG, and Kyoto Encyclopedia of Genes and Genomes (KEGG databases, respectively. A total of 5,926 unigenes were annotated with domain similarity derived functional information, of which 55 and 39 unigenes respectively encoding the insecticide resistance related enzymes, cytochrome P450 and carboxylesterase. Using the transcriptome data, 47 unigenes belonging to the typical "stress" genes of heat shock protein (Hsp family were retrieved. Furthermore, 1,450 simple sequence repeats (SSRs were detected; 3.09% of the unigenes contained SSRs. Large numbers of SSR primer pairs were designed and out of randomly verified primer pairs 80% were successfully yielded amplicons. CONCLUSIONS/SIGNIFICANCE: A large of putative R. leptotubula transcript sequences has been obtained from the deep sequencing, which extensively increases the comprehensive and integrated genomic resources of this pest. This large-scale transcriptome dataset will be an important information platform for promoting our

  11. Whole-exome and transcriptome sequencing of refractory diffuse large B-cell lymphoma

    OpenAIRE

    Park, Ha Young; Lee, Seung-Bok; Yoo, Hae-Yong; Kim, Seok-Jin; Kim, Won-Seog; Kim, Jong-Il; Ko, Young-Hyeh

    2016-01-01

    Diffuse large B-cell lymphoma (DLBCL) is the most common type of non-Hodgkin lymphoma. Although rituximab therapy improves clinical outcome, some patients develop resistant DLBCL; however, the genetic alterations in these patients are not well documented. To identify the genetic background of refractory DLBCL, we conducted whole-exome sequencing and transcriptome sequencing for six patients with refractory and seven with responsive DLBCL. The average numbers of pathogenic somatic single nucle...

  12. Sequencing and de novo assembly of the western tarnished plant bug (Lygus hesperus transcriptome.

    Directory of Open Access Journals (Sweden)

    J Joe Hull

    Full Text Available Mirid plant bugs (Hemiptera: Miridae are economically important insect pests of many crops worldwide. The western tarnished plant bug Lygus hesperus Knight is a pest of cotton, alfalfa, fruit and vegetable crops, and potentially of several emerging biofuel and natural product feedstocks in the western US. However, little is known about the underlying molecular genetics, biochemistry, or physiology of L. hesperus, including their ability to survive extreme environmental conditions.We used 454 pyrosequencing of a normalized adult cDNA library and de novo assembly to obtain an adult L. hesperus transcriptome consisting of 1,429,818 transcriptomic reads representing 36,131 transcript isoforms (isotigs that correspond to 19,742 genes. A search of the transcriptome against deposited L. hesperus protein sequences revealed that 86 out of 87 were represented. Comparison with the non-redundant database indicated that 54% of the transcriptome exhibited similarity (e-value ≤ 1(-5 with known proteins. In addition, Gene Ontology (GO terms, Kyoto Encyclopedia of Genes and Genomes (KEGG annotations, and potential Pfam domains were assigned to each transcript isoform. To gain insight into the molecular basis of the L. hesperus thermal stress response we used transcriptomic sequences to identify 52 potential heat shock protein (Hsp homologs. A subset of these transcripts was sequence verified and their expression response to thermal stress monitored by semi-quantitative PCR. Potential homologs of Hsp70, Hsp40, and 2 small Hsps were found to be upregulated in the heat-challenged adults, suggesting a role in thermotolerance.The L. hesperus transcriptome advances the underlying molecular understanding of this arthropod pest by significantly increasing the number of known genes, and provides the basis for further exploration and understanding of the fundamental mechanisms of abiotic stress responses.

  13. Transcriptome analysis of Dastarcus helophoroides (Coleoptera: Bothrideridae using Illumina HiSeq sequencing.

    Directory of Open Access Journals (Sweden)

    Wei Zhang

    Full Text Available BACKGROUND: Dastarcus helophoroides is known as the most valuable natural enemy insect against many large-body longhorned beetles. The molecular mechanism of its long lifespan and reproduction makes it a unique resource for genomic research. However, molecular biological studies on this parasitic beetle are scarce, and genomic information for D. helophoroides is not currently available. Thus, transcriptome information for this species is an important resource that is required for a better understanding of the molecular mechanisms of D. helophoroides. In this study, we obtained transcriptome information of D. helophoroides using high-throughput RNA sequencing. RESULTS: Using Illumina HiSeq 2000 sequencing, 27,543,746 clean reads corresponding to a total of 2.48 Gb nucleotides were obtained from a single run. These reads were assembled into 42,810 unigenes with a mean length of 683 bp. Using a sequence similarity search against the five public databases (NR, Swiss-Prot, GO, COG, KEGG with a cut-off E-value of 10(-5 using Blastx, a total of 31,293 unigenes were annotated with gene description, gene ontology terms, or metabolic pathways. CONCLUSIONS: To the best of our knowledge, this is the first study on the transcriptome information of D. helophoroides. The transcriptome data presented in this study provide comprehensive information for future studies in D. helophoroides, particularly for functional genomic studies in this parasitic beetle.

  14. A transcriptome resource for the koala (Phascolarctos cinereus): insights into koala retrovirus transcription and sequence diversity.

    Science.gov (United States)

    Hobbs, Matthew; Pavasovic, Ana; King, Andrew G; Prentis, Peter J; Eldridge, Mark D B; Chen, Zhiliang; Colgan, Donald J; Polkinghorne, Adam; Wilkins, Marc R; Flanagan, Cheyne; Gillett, Amber; Hanger, Jon; Johnson, Rebecca N; Timms, Peter

    2014-09-11

    The koala, Phascolarctos cinereus, is a biologically unique and evolutionarily distinct Australian arboreal marsupial. The goal of this study was to sequence the transcriptome from several tissues of two geographically separate koalas, and to create the first comprehensive catalog of annotated transcripts for this species, enabling detailed analysis of the unique attributes of this threatened native marsupial, including infection by the koala retrovirus. RNA-Seq data was generated from a range of tissues from one male and one female koala and assembled de novo into transcripts using Velvet-Oases. Transcript abundance in each tissue was estimated. Transcripts were searched for likely protein-coding regions and a non-redundant set of 117,563 putative protein sequences was produced. In similarity searches there were 84,907 (72%) sequences that aligned to at least one sequence in the NCBI nr protein database. The best alignments were to sequences from other marsupials. After applying a reciprocal best hit requirement of koala sequences to those from tammar wallaby, Tasmanian devil and the gray short-tailed opossum, we estimate that our transcriptome dataset represents approximately 15,000 koala genes. The marsupial alignment information was used to look for potential gene duplications and we report evidence for copy number expansion of the alpha amylase gene, and of an aldehyde reductase gene.Koala retrovirus (KoRV) transcripts were detected in the transcriptomes. These were analysed in detail and the structure of the spliced envelope gene transcript was determined. There was appreciable sequence diversity within KoRV, with 233 sites in the KoRV genome showing small insertions/deletions or single nucleotide polymorphisms. Both koalas had sequences from the KoRV-A subtype, but the male koala transcriptome has, in addition, sequences more closely related to the KoRV-B subtype. This is the first report of a KoRV-B-like sequence in a wild population. This transcriptomic

  15. De novo Sequencing and Analysis of Lemongrass Transcriptome Provides First Insights into the Essential Oil Biosynthesis of Aromatic Grasses

    Directory of Open Access Journals (Sweden)

    Seema Meena

    2016-07-01

    Full Text Available Aromatic grasses of the genus Cymbopogon (Poaceae family represent unique group of plants that produce diverse composition of monoterpene rich essential oils, which have great value in flavour, fragrance, cosmetic and aromatherapy industries. Despite the commercial importance of these natural aromatic oils, their biosynthesis at the molecular level remains unexplored. As the first step towards understanding the essential oil biosynthesis, we performed de novo transcriptome assembly and analysis of C. flexuosus (lemongrass by employing Illumina sequencing. Mining of transcriptome data and subsequent phylogenetic analysis led to identification of terpene synthases (TPS, pyrophosphatases (PPase, alcohol dehydrogenases (ADH, aldo-keto reductases (AKR, carotenoid cleavage dioxygenases (CCD, alcohol acetyltransferases (AAT and aldehyde dehydrogenases (ALDH, which are potentially involved in essential oil biosynthesis. Comparative essential oil profiling and mRNA expression analysis in three Cymbopogon species (C. flexuosus, aldehyde type; C. martinii, alcohol type; and C. winterianus, intermediate type with varying essential oil composition indicated the involvement of identified candidate genes in the formation of alcohols, aldehydes and acetates. Molecular modeling and docking further supported the role of identified enzymes in aroma formation in Cymbopogon. Also, simple sequence repeats (SSRs were found in the transcriptome with many linked to terpene pathway genes including the genes potentially involved in aroma biosynthesis. This work provides the first insights into the essential oil biosynthesis of aromatic grasses, and the identified candidate genes and markers can be a great resource for biotechnological and molecular breeding approaches to modulate the essential oil composition.

  16. Transcriptome dynamics through alternative polyadenylation in developmental and environmental responses in plants revealed by deep sequencing.

    Science.gov (United States)

    Shen, Yingjia; Venu, R C; Nobuta, Kan; Wu, Xiaohui; Notibala, Varun; Demirci, Caghan; Meyers, Blake C; Wang, Guo-Liang; Ji, Guoli; Li, Qingshun Q

    2011-09-01

    Polyadenylation sites mark the ends of mRNA transcripts. Alternative polyadenylation (APA) may alter sequence elements and/or the coding capacity of transcripts, a mechanism that has been demonstrated to regulate gene expression and transcriptome diversity. To study the role of APA in transcriptome dynamics, we analyzed a large-scale data set of RNA "tags" that signify poly(A) sites and expression levels of mRNA. These tags were derived from a wide range of tissues and developmental stages that were mutated or exposed to environmental treatments, and generated using digital gene expression (DGE)-based protocols of the massively parallel signature sequencing (MPSS-DGE) and the Illumina sequencing-by-synthesis (SBS-DGE) sequencing platforms. The data offer a global view of APA and how it contributes to transcriptome dynamics. Upon analysis of these data, we found that ∼60% of Arabidopsis genes have multiple poly(A) sites. Likewise, ∼47% and 82% of rice genes use APA, supported by MPSS-DGE and SBS-DGE tags, respectively. In both species, ∼49%-66% of APA events were mapped upstream of annotated stop codons. Interestingly, 10% of the transcriptomes are made up of APA transcripts that are differentially distributed among developmental stages and in tissues responding to environmental stresses, providing an additional level of transcriptome dynamics. Examples of pollen-specific APA switching and salicylic acid treatment-specific APA clearly demonstrated such dynamics. The significance of these APAs is more evident in the 3034 genes that have conserved APA events between rice and Arabidopsis.

  17. Sequencing and de novo assembly of the transcriptome of the glassy-winged sharpshooter (Homalodisca vitripennis.

    Directory of Open Access Journals (Sweden)

    Raja Sekhar Nandety

    Full Text Available BACKGROUND: The glassy-winged sharpshooter Homalodisca vitripennis (Hemiptera: Cicadellidae, is a xylem-feeding leafhopper and important vector of the bacterium Xylella fastidiosa; the causal agent of Pierce's disease of grapevines. The functional complexity of the transcriptome of H. vitripennis has not been elucidated thus far. It is a necessary blueprint for an understanding of the development of H. vitripennis and for designing efficient biorational control strategies including those based on RNA interference. RESULTS: Here we elucidate and explore the transcriptome of adult H. vitripennis using high-throughput paired end deep sequencing and de novo assembly. A total of 32,803,656 paired-end reads were obtained with an average transcript length of 624 nucleotides. We assembled 32.9 Mb of the transcriptome of H. vitripennis that spanned across 47,265 loci and 52,708 transcripts. Comparison of our non-redundant database showed that 45% of the deduced proteins of H. vitripennis exhibit identity (e-value ≤1(-5 with known proteins. We assigned Gene Ontology (GO terms, Kyoto Encyclopedia of Genes and Genomes (KEGG annotations, and potential Pfam domains to each transcript isoform. In order to gain insight into the molecular basis of key regulatory genes of H. vitripennis, we characterized predicted proteins involved in the metabolism of juvenile hormone, and biogenesis of small RNAs (Dicer and Piwi sequences from the transcriptomic sequences. Analysis of transposable element sequences of H. vitripennis indicated that the genome is less expanded in comparison to many other insects with approximately 1% of the transcriptome carrying transposable elements. CONCLUSIONS: Our data significantly enhance the molecular resources available for future study and control of this economically important hemipteran. This transcriptional information not only provides a more nuanced understanding of the underlying biological and physiological mechanisms that

  18. Annotation and Re-Sequencing of Genes from De Novo Transcriptome Assembly of Abies alba (Pinaceae

    Directory of Open Access Journals (Sweden)

    Anna M. Roschanski

    2013-01-01

    Full Text Available Premise of the study: We present a protocol for the annotation of transcriptome sequence data and the identification of candidate genes therein using the example of the nonmodel conifer Abies alba. Methods and Results: A normalized cDNA library was built from an A. alba seedling. The sequencing on a 454 platform yielded more than 1.5 million reads that were de novo assembled into 25 149 contigs. Two complementary approaches were applied to annotate gene fragments that code for (1 well-known proteins and (2 proteins that are potentially adaptively relevant. Primer development and testing yielded 88 amplicons that could successfully be resequenced from genomic DNA. Conclusions: The annotation workflow offers an efficient way to identify potential adaptively relevant genes from the large quantity of transcriptome sequence data. The primer set presented should be prioritized for single-nucleotide polymorphism detection in adaptively relevant genes in A. alba.

  19. Annotation and re-sequencing of genes from de novo transcriptome assembly of Abies alba (Pinaceae).

    Science.gov (United States)

    Roschanski, Anna M; Fady, Bruno; Ziegenhagen, Birgit; Liepelt, Sascha

    2013-01-01

    We present a protocol for the annotation of transcriptome sequence data and the identification of candidate genes therein using the example of the nonmodel conifer Abies alba. • A normalized cDNA library was built from an A. alba seedling. The sequencing on a 454 platform yielded more than 1.5 million reads that were de novo assembled into 25149 contigs. Two complementary approaches were applied to annotate gene fragments that code for (1) well-known proteins and (2) proteins that are potentially adaptively relevant. Primer development and testing yielded 88 amplicons that could successfully be resequenced from genomic DNA. • The annotation workflow offers an efficient way to identify potential adaptively relevant genes from the large quantity of transcriptome sequence data. The primer set presented should be prioritized for single-nucleotide polymorphism detection in adaptively relevant genes in A. alba.

  20. Analysis of a native whitefly transcriptome and its sequence divergence with two invasive whitefly species

    Directory of Open Access Journals (Sweden)

    Wang Xiao-Wei

    2012-10-01

    Full Text Available Abstract Background Genomic divergence between invasive and native species may provide insight into the molecular basis underlying specific characteristics that drive the invasion and displacement of closely related species. In this study, we sequenced the transcriptome of an indigenous species, Asia II 3, of the Bemisia tabaci complex and compared its genetic divergence with the transcriptomes of two invasive whiteflies species, Middle East Asia Minor 1 (MEAM1 and Mediterranean (MED, respectively. Results More than 16 million reads of 74 base pairs in length were obtained for the Asia II 3 species using the Illumina sequencing platform. These reads were assembled into 52,535 distinct sequences (mean size: 466 bp and 16,596 sequences were annotated with an E-value above 10-5. Protein family comparisons revealed obvious diversification among the transcriptomes of these species suggesting species-specific adaptations during whitefly evolution. On the contrary, substantial conservation of the whitefly transcriptomes was also evident, despite their differences. The overall divergence of coding sequences between the orthologous gene pairs of Asia II 3 and MEAM1 is 1.73%, which is comparable to the average divergence of Asia II 3 and MED transcriptomes (1.84% and much higher than that of MEAM1 and MED (0.83%. This is consistent with the previous phylogenetic analyses and crossing experiments suggesting these are distinct species. We also identified hundreds of highly diverged genes and compiled sequence identify data into gene functional groups and found the most divergent gene classes are Cytochrome P450, Glutathione metabolism and Oxidative phosphorylation. These results strongly suggest that the divergence of genes related to metabolism might be the driving force of the MEAM1 and Asia II 3 differentiation. We also analyzed single nucleotide polymorphisms within the orthologous gene pairs of indigenous and invasive whiteflies which are helpful for

  1. Analysis of a native whitefly transcriptome and its sequence divergence with two invasive whitefly species.

    Science.gov (United States)

    Wang, Xiao-Wei; Zhao, Qiong-Yi; Luan, Jun-Bo; Wang, Yu-Jun; Yan, Gen-Hong; Liu, Shu-Sheng

    2012-10-04

    Genomic divergence between invasive and native species may provide insight into the molecular basis underlying specific characteristics that drive the invasion and displacement of closely related species. In this study, we sequenced the transcriptome of an indigenous species, Asia II 3, of the Bemisia tabaci complex and compared its genetic divergence with the transcriptomes of two invasive whiteflies species, Middle East Asia Minor 1 (MEAM1) and Mediterranean (MED), respectively. More than 16 million reads of 74 base pairs in length were obtained for the Asia II 3 species using the Illumina sequencing platform. These reads were assembled into 52,535 distinct sequences (mean size: 466 bp) and 16,596 sequences were annotated with an E-value above 10-5. Protein family comparisons revealed obvious diversification among the transcriptomes of these species suggesting species-specific adaptations during whitefly evolution. On the contrary, substantial conservation of the whitefly transcriptomes was also evident, despite their differences. The overall divergence of coding sequences between the orthologous gene pairs of Asia II 3 and MEAM1 is 1.73%, which is comparable to the average divergence of Asia II 3 and MED transcriptomes (1.84%) and much higher than that of MEAM1 and MED (0.83%). This is consistent with the previous phylogenetic analyses and crossing experiments suggesting these are distinct species. We also identified hundreds of highly diverged genes and compiled sequence identify data into gene functional groups and found the most divergent gene classes are Cytochrome P450, Glutathione metabolism and Oxidative phosphorylation. These results strongly suggest that the divergence of genes related to metabolism might be the driving force of the MEAM1 and Asia II 3 differentiation. We also analyzed single nucleotide polymorphisms within the orthologous gene pairs of indigenous and invasive whiteflies which are helpful for the investigation of association between

  2. Whole transcriptome sequencing enables discovery and analysis of viruses in archived primary central nervous system lymphomas.

    Directory of Open Access Journals (Sweden)

    Christopher DeBoever

    Full Text Available Primary central nervous system lymphomas (PCNSL have a dramatically increased prevalence among persons living with AIDS and are known to be associated with human Epstein Barr virus (EBV infection. Previous work suggests that in some cases, co-infection with other viruses may be important for PCNSL pathogenesis. Viral transcription in tumor samples can be measured using next generation transcriptome sequencing. We demonstrate the ability of transcriptome sequencing to identify viruses, characterize viral expression, and identify viral variants by sequencing four archived AIDS-related PCNSL tissue samples and analyzing raw sequencing reads. EBV was detected in all four PCNSL samples and cytomegalovirus (CMV, JC polyomavirus (JCV, and HIV were also discovered, consistent with clinical diagnoses. CMV was found to express three long non-coding RNAs recently reported as expressed during active infection. Single nucleotide variants were observed in each of the viruses observed and three indels were found in CMV. No viruses were found in several control tumor types including 32 diffuse large B-cell lymphoma samples. This study demonstrates the ability of next generation transcriptome sequencing to accurately identify viruses, including DNA viruses, in solid human cancer tissue samples.

  3. Genetic signatures of adaptation revealed from transcriptome sequencing of Arctic and red foxes.

    Science.gov (United States)

    Kumar, Vikas; Kutschera, Verena E; Nilsson, Maria A; Janke, Axel

    2015-08-07

    The genus Vulpes (true foxes) comprises numerous species that inhabit a wide range of habitats and climatic conditions, including one species, the Arctic fox (Vulpes lagopus) which is adapted to the arctic region. A close relative to the Arctic fox, the red fox (Vulpes vulpes), occurs in subarctic to subtropical habitats. To study the genetic basis of their adaptations to different environments, transcriptome sequences from two Arctic foxes and one red fox individual were generated and analyzed for signatures of positive selection. In addition, the data allowed for a phylogenetic analysis and divergence time estimate between the two fox species. The de novo assembly of reads resulted in more than 160,000 contigs/transcripts per individual. Approximately 17,000 homologous genes were identified using human and the non-redundant databases. Positive selection analyses revealed several genes involved in various metabolic and molecular processes such as energy metabolism, cardiac gene regulation, apoptosis and blood coagulation to be under positive selection in foxes. Branch site tests identified four genes to be under positive selection in the Arctic fox transcriptome, two of which are fat metabolism genes. In the red fox transcriptome eight genes are under positive selection, including molecular process genes, notably genes involved in ATP metabolism. Analysis of the three transcriptomes and five Sanger re-sequenced genes in additional individuals identified a lower genetic variability within Arctic foxes compared to red foxes, which is consistent with distribution range differences and demographic responses to past climatic fluctuations. A phylogenomic analysis estimated that the Arctic and red fox lineages diverged about three million years ago. Transcriptome data are an economic way to generate genomic resources for evolutionary studies. Despite not representing an entire genome, this transcriptome analysis identified numerous genes that are relevant to arctic

  4. Genome and transcriptome sequencing of lung cancers reveal diverse mutational and splicing events.

    Science.gov (United States)

    Liu, Jinfeng; Lee, William; Jiang, Zhaoshi; Chen, Zhongqiang; Jhunjhunwala, Suchit; Haverty, Peter M; Gnad, Florian; Guan, Yinghui; Gilbert, Houston N; Stinson, Jeremy; Klijn, Christiaan; Guillory, Joseph; Bhatt, Deepali; Vartanian, Steffan; Walter, Kimberly; Chan, Jocelyn; Holcomb, Thomas; Dijkgraaf, Peter; Johnson, Stephanie; Koeman, Julie; Minna, John D; Gazdar, Adi F; Stern, Howard M; Hoeflich, Klaus P; Wu, Thomas D; Settleman, Jeff; de Sauvage, Frederic J; Gentleman, Robert C; Neve, Richard M; Stokoe, David; Modrusan, Zora; Seshagiri, Somasekar; Shames, David S; Zhang, Zemin

    2012-12-01

    Lung cancer is a highly heterogeneous disease in terms of both underlying genetic lesions and response to therapeutic treatments. We performed deep whole-genome sequencing and transcriptome sequencing on 19 lung cancer cell lines and three lung tumor/normal pairs. Overall, our data show that cell line models exhibit similar mutation spectra to human tumor samples. Smoker and never-smoker cancer samples exhibit distinguishable patterns of mutations. A number of epigenetic regulators, including KDM6A, ASH1L, SMARCA4, and ATAD2, are frequently altered by mutations or copy number changes. A systematic survey of splice-site mutations identified 106 splice site mutations associated with cancer specific aberrant splicing, including mutations in several known cancer-related genes. RAC1b, an isoform of the RAC1 GTPase that includes one additional exon, was found to be preferentially up-regulated in lung cancer. We further show that its expression is significantly associated with sensitivity to a MAP2K (MEK) inhibitor PD-0325901. Taken together, these data present a comprehensive genomic landscape of a large number of lung cancer samples and further demonstrate that cancer-specific alternative splicing is a widespread phenomenon that has potential utility as therapeutic biomarkers. The detailed characterizations of the lung cancer cell lines also provide genomic context to the vast amount of experimental data gathered for these lines over the decades, and represent highly valuable resources for cancer biology.

  5. Transcriptome sequencing in pediatric acute lymphoblastic leukemia identifies fusion genes associated with distinct DNA methylation profiles.

    Science.gov (United States)

    Marincevic-Zuniga, Yanara; Dahlberg, Johan; Nilsson, Sara; Raine, Amanda; Nystedt, Sara; Lindqvist, Carl Mårten; Berglund, Eva C; Abrahamsson, Jonas; Cavelier, Lucia; Forestier, Erik; Heyman, Mats; Lönnerholm, Gudmar; Nordlund, Jessica; Syvänen, Ann-Christine

    2017-08-14

    Structural chromosomal rearrangements that lead to expressed fusion genes are a hallmark of acute lymphoblastic leukemia (ALL). In this study, we performed transcriptome sequencing of 134 primary ALL patient samples to comprehensively detect fusion transcripts. We combined fusion gene detection with genome-wide DNA methylation analysis, gene expression profiling, and targeted sequencing to determine molecular signatures of emerging ALL subtypes. We identified 64 unique fusion events distributed among 80 individual patients, of which over 50% have not previously been reported in ALL. Although the majority of the fusion genes were found only in a single patient, we identified several recurrent fusion gene families defined by promiscuous fusion gene partners, such as ETV6, RUNX1, PAX5, and ZNF384, or recurrent fusion genes, such as DUX4-IGH. Our data show that patients harboring these fusion genes displayed characteristic genome-wide DNA methylation and gene expression signatures in addition to distinct patterns in single nucleotide variants and recurrent copy number alterations. Our study delineates the fusion gene landscape in pediatric ALL, including both known and novel fusion genes, and highlights fusion gene families with shared molecular etiologies, which may provide additional information for prognosis and therapeutic options in the future.

  6. Transcriptome sequencing in pediatric acute lymphoblastic leukemia identifies fusion genes associated with distinct DNA methylation profiles

    Directory of Open Access Journals (Sweden)

    Yanara Marincevic-Zuniga

    2017-08-01

    Full Text Available Abstract Background Structural chromosomal rearrangements that lead to expressed fusion genes are a hallmark of acute lymphoblastic leukemia (ALL. In this study, we performed transcriptome sequencing of 134 primary ALL patient samples to comprehensively detect fusion transcripts. Methods We combined fusion gene detection with genome-wide DNA methylation analysis, gene expression profiling, and targeted sequencing to determine molecular signatures of emerging ALL subtypes. Results We identified 64 unique fusion events distributed among 80 individual patients, of which over 50% have not previously been reported in ALL. Although the majority of the fusion genes were found only in a single patient, we identified several recurrent fusion gene families defined by promiscuous fusion gene partners, such as ETV6, RUNX1, PAX5, and ZNF384, or recurrent fusion genes, such as DUX4-IGH. Our data show that patients harboring these fusion genes displayed characteristic genome-wide DNA methylation and gene expression signatures in addition to distinct patterns in single nucleotide variants and recurrent copy number alterations. Conclusion Our study delineates the fusion gene landscape in pediatric ALL, including both known and novel fusion genes, and highlights fusion gene families with shared molecular etiologies, which may provide additional information for prognosis and therapeutic options in the future.

  7. Effector identification in the lettuce downy mildew Bremia lactucae by massively parallel transcriptome sequencing.

    Science.gov (United States)

    Stassen, Joost H M; Seidl, Michael F; Vergeer, Pim W J; Nijman, Isaäc J; Snel, Berend; Cuppen, Edwin; Van den Ackerveken, Guido

    2012-09-01

    Lettuce downy mildew (Bremia lactucae) is a rapidly adapting oomycete pathogen affecting commercial lettuce cultivation. Oomycetes are known to use a diverse arsenal of secreted proteins (effectors) to manipulate their hosts. Two classes of effector are known to be translocated by the host: the RXLRs and Crinklers. To gain insight into the repertoire of effectors used by B. lactucae to manipulate its host, we performed massively parallel sequencing of cDNA derived from B. lactucae spores and infected lettuce (Lactuca sativa) seedlings. From over 2.3 million 454 GS FLX reads, 59 618 contigs were assembled representing both plant and pathogen transcripts. Of these, 19 663 contigs were determined to be of B. lactucae origin as they matched pathogen genome sequences (SOLiD) that were obtained from >270 million reads of spore-derived genomic DNA. After correction of cDNA sequencing errors with SOLiD data, translation into protein models and filtering, 16 372 protein models remained, 1023 of which were predicted to be secreted. This secretome included elicitins, necrosis and ethylene-inducing peptide 1-like proteins, glucanase inhibitors and lectins, and was enriched in cysteine-rich proteins. Candidate host-translocated effectors included 78 protein models with RXLR effector features. In addition, we found indications for an unknown number of Crinkler-like sequences. Similarity clustering of secreted proteins revealed additional effector candidates. We provide a first look at the transcriptome of B. lactucae and its encoded effector arsenal. © 2012 THE AUTHORS. MOLECULAR PLANT PATHOLOGY © 2012 BSPP AND BLACKWELL PUBLISHING LTD.

  8. Transcriptome sequencing and developmental regulation of gene expression in Anopheles aquasalis.

    Directory of Open Access Journals (Sweden)

    André L Costa-da-Silva

    2014-07-01

    Full Text Available Anopheles aquasalis is a major malaria vector in coastal areas of South and Central America where it breeds preferentially in brackish water. This species is very susceptible to Plasmodium vivax and it has been already incriminated as responsible vector in malaria outbreaks. There has been no high-throughput investigation into the sequencing of An. aquasalis genes, transcripts and proteins despite its epidemiological relevance. Here we describe the sequencing, assembly and annotation of the An. aquasalis transcriptome.A total of 419 thousand cDNA sequence reads, encompassing 164 million nucleotides, were assembled in 7544 contigs of ≥ 2 sequences, and 1999 singletons. The majority of the An. aquasalis transcripts encode proteins with their closest counterparts in another neotropical malaria vector, An. darlingi. Several analyses in different protein databases were used to annotate and predict the putative functions of the deduced An. aquasalis proteins. Larval and adult-specific transcripts were represented by 121 and 424 contig sequences, respectively. Fifty-one transcripts were only detected in blood-fed females. The data also reveal a list of transcripts up- or down-regulated in adult females after a blood meal. Transcripts associated with immunity, signaling networks and blood feeding and digestion are discussed.This study represents the first large-scale effort to sequence the transcriptome of An. aquasalis. It provides valuable information that will facilitate studies on the biology of this species and may lead to novel strategies to reduce malaria transmission on the South American continent. The An. aquasalis transcriptome is accessible at http://exon.niaid.nih.gov/transcriptome/An_aquasalis/Anaquexcel.xlsx.

  9. Transcriptome sequencing and developmental regulation of gene expression in Anopheles aquasalis.

    Science.gov (United States)

    Costa-da-Silva, André L; Marinotti, Osvaldo; Ribeiro, José M C; Silva, Maria C P; Lopes, Adriana R; Barros, Michele S; Sá-Nunes, Anderson; Kojin, Bianca B; Carvalho, Eneas; Suesdek, Lincoln; Silva-Neto, Mário Alberto C; James, Anthony A; Capurro, Margareth L

    2014-07-01

    Anopheles aquasalis is a major malaria vector in coastal areas of South and Central America where it breeds preferentially in brackish water. This species is very susceptible to Plasmodium vivax and it has been already incriminated as responsible vector in malaria outbreaks. There has been no high-throughput investigation into the sequencing of An. aquasalis genes, transcripts and proteins despite its epidemiological relevance. Here we describe the sequencing, assembly and annotation of the An. aquasalis transcriptome. A total of 419 thousand cDNA sequence reads, encompassing 164 million nucleotides, were assembled in 7544 contigs of ≥ 2 sequences, and 1999 singletons. The majority of the An. aquasalis transcripts encode proteins with their closest counterparts in another neotropical malaria vector, An. darlingi. Several analyses in different protein databases were used to annotate and predict the putative functions of the deduced An. aquasalis proteins. Larval and adult-specific transcripts were represented by 121 and 424 contig sequences, respectively. Fifty-one transcripts were only detected in blood-fed females. The data also reveal a list of transcripts up- or down-regulated in adult females after a blood meal. Transcripts associated with immunity, signaling networks and blood feeding and digestion are discussed. This study represents the first large-scale effort to sequence the transcriptome of An. aquasalis. It provides valuable information that will facilitate studies on the biology of this species and may lead to novel strategies to reduce malaria transmission on the South American continent. The An. aquasalis transcriptome is accessible at http://exon.niaid.nih.gov/transcriptome/An_aquasalis/Anaquexcel.xlsx.

  10. Transcriptome sequencing and microarray design for functional genomics in the extremophile Arabidopsis relative Thellungiella salsuginea (Eutrema salsugineum).

    Science.gov (United States)

    Lee, Yang Ping; Giorgi, Federico M; Lohse, Marc; Kvederaviciute, Kotryna; Klages, Sven; Usadel, Björn; Meskiene, Irute; Reinhardt, Richard; Hincha, Dirk K

    2013-11-14

    Most molecular studies of plant stress tolerance have been performed with Arabidopsis thaliana, although it is not particularly stress tolerant and may lack protective mechanisms required to survive extreme environmental conditions. Thellungiella salsuginea has attracted interest as an alternative plant model species with high tolerance of various abiotic stresses. While the T. salsuginea genome has recently been sequenced, its annotation is still incomplete and transcriptomic information is scarce. In addition, functional genomics investigations in this species are severely hampered by a lack of affordable tools for genome-wide gene expression studies. Here, we report the results of Thellungiella de novo transcriptome assembly and annotation based on 454 pyrosequencing and development and validation of a T. salsuginea microarray. ESTs were generated from a non-normalized and a normalized library synthesized from RNA pooled from samples covering different tissues and abiotic stress conditions. Both libraries yielded partially unique sequences, indicating their necessity to obtain comprehensive transcriptome coverage. More than 1 million sequence reads were assembled into 42,810 unigenes, approximately 50% of which could be functionally annotated. These unigenes were compared to all available Thellungiella genome sequence information. In addition, the groups of Late Embryogenesis Abundant (LEA) proteins, Mitogen Activated Protein (MAP) kinases and protein phosphatases were annotated in detail. We also predicted the target genes for 384 putative miRNAs. From the sequence information, we constructed a 44 k Agilent oligonucleotide microarray. Comparison of same-species and cross-species hybridization results showed superior performance of the newly designed array for T. salsuginea samples. The developed microarrays were used to investigate transcriptional responses of T. salsuginea and Arabidopsis during cold acclimation using the MapMan software. This study provides

  11. Exploring the switchgrass transcriptome using second-generation sequencing technology.

    Directory of Open Access Journals (Sweden)

    Yixing Wang

    Full Text Available BACKGROUND: Switchgrass (Panicum virgatum L. is a C4 perennial grass and widely popular as an important bioenergy crop. To accelerate the pace of developing high yielding switchgrass cultivars adapted to diverse environmental niches, the generation of genomic resources for this plant is necessary. The large genome size and polyploid nature of switchgrass makes whole genome sequencing a daunting task even with current technologies. Exploring the transcriptional landscape using next generation sequencing technologies provides a viable alternative to whole genome sequencing in switchgrass. PRINCIPAL FINDINGS: Switchgrass cDNA libraries from germinating seedlings, emerging tillers, flowers, and dormant seeds were sequenced using Roche 454 GS-FLX Titanium technology, generating 980,000 reads with an average read length of 367 bp. De novo assembly generated 243,600 contigs with an average length of 535 bp. Using the foxtail millet genome as a reference greatly improved the assembly and annotation of switchgrass ESTs. Comparative analysis of the 454-derived switchgrass EST reads with other sequenced monocots including Brachypodium, sorghum, rice and maize indicated a 70-80% overlap. RPKM analysis demonstrated unique transcriptional signatures of the four tissues analyzed in this study. More than 24,000 ESTs were identified in the dormant seed library. In silico analysis indicated that there are more than 2000 EST-SSRs in this collection. Expression of several orphan ESTs was confirmed by RT-PCR. SIGNIFICANCE: We estimate that about 90% of the switchgrass gene space has been covered in this analysis. This study nearly doubles the amount of EST information for switchgrass currently in the public domain. The celerity and economical nature of second-generation sequencing technologies provide an in-depth view of the gene space of complex genomes like switchgrass. Sequence analysis of closely related members of the NAD(+-malic enzyme type C4 grasses such as

  12. Transcriptome Sequencing of Chemically Induced Aquilaria sinensis to Identify Genes Related to Agarwood Formation.

    Science.gov (United States)

    Ye, Wei; Wu, Hongqing; He, Xin; Wang, Lei; Zhang, Weimin; Li, Haohua; Fan, Yunfei; Tan, Guohui; Liu, Taomei; Gao, Xiaoxia

    2016-01-01

    Agarwood is a traditional Chinese medicine used as a clinical sedative, carminative, and antiemetic drug. Agarwood is formed in Aquilaria sinensis when A. sinensis trees are threatened by external physical, chemical injury or endophytic fungal irritation. However, the mechanism of agarwood formation via chemical induction remains unclear. In this study, we characterized the transcriptome of different parts of a chemically induced A. sinensis trunk sample with agarwood. The Illumina sequencing platform was used to identify the genes involved in agarwood formation. A five-year-old Aquilaria sinensis treated by formic acid was selected. The white wood part (B1 sample), the transition part between agarwood and white wood (W2 sample), the agarwood part (J3 sample), and the rotten wood part (F5 sample) were collected for transcriptome sequencing. Accordingly, 54,685,634 clean reads, which were assembled into 83,467 unigenes, were obtained with a Q20 value of 97.5%. A total of 50,565 unigenes were annotated using the Nr, Nt, SWISS-PROT, KEGG, COG, and GO databases. In particular, 171,331,352 unigenes were annotated by various pathways, including the sesquiterpenoid (ko00909) and plant-pathogen interaction (ko03040) pathways. These pathways were related to sesquiterpenoid biosynthesis and defensive responses to chemical stimulation. The transcriptome data of the different parts of the chemically induced A. sinensis trunk provide a rich source of materials for discovering and identifying the genes involved in sesquiterpenoid production and in defensive responses to chemical stimulation. This study is the first to use de novo sequencing and transcriptome assembly for different parts of chemically induced A. sinensis. Results demonstrate that the sesquiterpenoid biosynthesis pathway and WRKY transcription factor play important roles in agarwood formation via chemical induction. The comparative analysis of the transcriptome data of agarwood and A. sinensis lays the foundation

  13. The contribution of 700,000 ORF sequence tags to the definition of the human transcriptome

    OpenAIRE

    Camargo, Anamaria A.; Samaia, Helena P. B.; Dias-Neto, Emmanuel; Simão, Daniel F.; Migotto, Italo A.; Briones, Marcelo R. S.; Costa, Fernando F.; Aparecida Nagai, Maria; Verjovski-Almeida, Sergio; Zago, Marco A.; Andrade, Luis Eduardo C.; Carrer, Helaine; El-Dorry, Hamza F. A.; Espreafico, Enilza M.; Habr-Gama, Angelita

    2001-01-01

    open reading frame expressed sequences tags (ORESTES) differ from conventional ESTs by providing sequence data from the central protein coding portion of transcripts. We generated a total of 696,745 ORESTES sequences from 24 human tissues and used a subset of the data that correspond to a set of 15,095 full-length mRNAs as a means of assessing the efficiency of the strategy and its potential contribution to the definition of the human transcriptome. We estimate that ORESTES sampled over 80% o...

  14. Transcriptome sequencing and De Novo analysis of Youngia japonica using the illumina platform.

    Directory of Open Access Journals (Sweden)

    Yulan Peng

    Full Text Available Youngia japonica, a weed species distributed worldwide, has been widely used in traditional Chinese medicine. It is an ideal plant for studying the evolution of Asteraceae plants because of its short life history and abundant source. However, little is known about its evolution and genetic diversity. In this study, de novo transcriptome sequencing was conducted for the first time for the comprehensive analysis of the genetic diversity of Y. japonica. The Y. japonica transcriptome was sequenced using Illumina paired-end sequencing technology. We produced 21,847,909 high-quality reads for Y. japonica and assembled them into contigs. A total of 51,850 unigenes were identified, among which 46,087 were annotated in the NCBI non-redundant protein database and 41,752 were annotated in the Swiss-Prot database. We mapped 9,125 unigenes onto 163 pathways using the Kyoto Encyclopedia of Genes and Genomes Pathway database. In addition, 3,648 simple sequence repeats (SSRs were detected. Our data provide the most comprehensive transcriptome resource currently available for Y. japonica. C4 photosynthesis unigenes were found in the biological process of Y. japonica. There were 5596 unigenes related to defense response and 1344 ungienes related to signal transduction mechanisms (10.95%. These data provide insights into the genetic diversity of Y. japonica. Numerous SSRs contributed to the development of novel markers. These data may serve as a new valuable resource for genomic studies on Youngia and, more generally, Cichoraceae.

  15. Transcriptome sequencing and analysis of leaf tissue of Avicennia marina using the Illumina platform.

    Directory of Open Access Journals (Sweden)

    Jianzi Huang

    Full Text Available Avicennia marina is a widely distributed mangrove species that thrives in high-salinity habitats. It plays a significant role in supporting coastal ecosystem and holds unique potential for studying molecular mechanisms underlying ecological adaptation. Despite and sometimes because of its numerous merits, this species is facing increasing pressure of exploitation and deforestation. Both study on adaptation mechanisms and conservation efforts necessitate more genomic resources for A. marina. In this study, we used Illumina sequencing of an A. marina foliar cDNA library to generate a transcriptome dataset for gene and marker discovery. We obtained 40 million high-quality reads and assembled them into 91,125 unigenes with a mean length of 463 bp. These unigenes covered most of the publicly available A. marina Sanger ESTs and greatly extended the repertoire of transcripts for this species. A total of 54,497 and 32,637 unigenes were annotated based on homology to sequences in the NCBI non-redundant and the Swiss-prot protein databases, respectively. Both Gene Ontology (GO analysis and Kyoto Encyclopedia of Genes and Genomes (KEGG pathway analysis revealed some transcriptomic signatures of stress adaptation for this halophytic species. We also detected an extraordinary amount of transcripts derived from fungal endophytes and demonstrated the utility of transcriptome sequencing in surveying endophyte diversity without isolating them out of plant tissues. Additionally, we identified 3,423 candidate simple sequence repeats (SSRs from 3,141 unigenes with a density of one SSR locus every 8.25 kb sequence. Our transcriptomic data will provide valuable resources for ecological, genetic and evolutionary studies in A. marina.

  16. Transcriptome sequencing and analysis of leaf tissue of Avicennia marina using the Illumina platform.

    Science.gov (United States)

    Huang, Jianzi; Lu, Xiang; Zhang, Wanke; Huang, Rongfeng; Chen, Shouyi; Zheng, Yizhi

    2014-01-01

    Avicennia marina is a widely distributed mangrove species that thrives in high-salinity habitats. It plays a significant role in supporting coastal ecosystem and holds unique potential for studying molecular mechanisms underlying ecological adaptation. Despite and sometimes because of its numerous merits, this species is facing increasing pressure of exploitation and deforestation. Both study on adaptation mechanisms and conservation efforts necessitate more genomic resources for A. marina. In this study, we used Illumina sequencing of an A. marina foliar cDNA library to generate a transcriptome dataset for gene and marker discovery. We obtained 40 million high-quality reads and assembled them into 91,125 unigenes with a mean length of 463 bp. These unigenes covered most of the publicly available A. marina Sanger ESTs and greatly extended the repertoire of transcripts for this species. A total of 54,497 and 32,637 unigenes were annotated based on homology to sequences in the NCBI non-redundant and the Swiss-prot protein databases, respectively. Both Gene Ontology (GO) analysis and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis revealed some transcriptomic signatures of stress adaptation for this halophytic species. We also detected an extraordinary amount of transcripts derived from fungal endophytes and demonstrated the utility of transcriptome sequencing in surveying endophyte diversity without isolating them out of plant tissues. Additionally, we identified 3,423 candidate simple sequence repeats (SSRs) from 3,141 unigenes with a density of one SSR locus every 8.25 kb sequence. Our transcriptomic data will provide valuable resources for ecological, genetic and evolutionary studies in A. marina.

  17. De Novo transcriptome sequencing reveals important molecular networks and metabolic pathways of the plant, Chlorophytum borivilianum.

    Directory of Open Access Journals (Sweden)

    Shikha Kalra

    Full Text Available Chlorophytum borivilianum, an endangered medicinal plant species is highly recognized for its aphrodisiac properties provided by saponins present in the plant. The transcriptome information of this species is limited and only few hundred expressed sequence tags (ESTs are available in the public databases. To gain molecular insight of this plant, high throughput transcriptome sequencing of leaf RNA was carried out using Illumina's HiSeq 2000 sequencing platform. A total of 22,161,444 single end reads were retrieved after quality filtering. Available (e.g., De-Bruijn/Eulerian graph and in-house developed bioinformatics tools were used for assembly and annotation of transcriptome. A total of 101,141 assembled transcripts were obtained, with coverage size of 22.42 Mb and average length of 221 bp. Guanine-cytosine (GC content was found to be 44%. Bioinformatics analysis, using non-redundant proteins, gene ontology (GO, enzyme commission (EC and kyoto encyclopedia of genes and genomes (KEGG databases, extracted all the known enzymes involved in saponin and flavonoid biosynthesis. Few genes of the alkaloid biosynthesis, along with anticancer and plant defense genes, were also discovered. Additionally, several cytochrome P450 (CYP450 and glycosyltransferase unique sequences were also found. We identified simple sequence repeat motifs in transcripts with an abundance of di-nucleotide simple sequence repeat (SSR; 43.1% markers. Large scale expression profiling through Reads per Kilobase per Million mapped reads (RPKM showed major genes involved in different metabolic pathways of the plant. Genes, expressed sequence tags (ESTs and unique sequences from this study provide an important resource for the scientific community, interested in the molecular genetics and functional genomics of C. borivilianum.

  18. De Novo transcriptome sequencing reveals important molecular networks and metabolic pathways of the plant, Chlorophytum borivilianum.

    Science.gov (United States)

    Kalra, Shikha; Puniya, Bhanwar Lal; Kulshreshtha, Deepika; Kumar, Sunil; Kaur, Jagdeep; Ramachandran, Srinivasan; Singh, Kashmir

    2013-01-01

    Chlorophytum borivilianum, an endangered medicinal plant species is highly recognized for its aphrodisiac properties provided by saponins present in the plant. The transcriptome information of this species is limited and only few hundred expressed sequence tags (ESTs) are available in the public databases. To gain molecular insight of this plant, high throughput transcriptome sequencing of leaf RNA was carried out using Illumina's HiSeq 2000 sequencing platform. A total of 22,161,444 single end reads were retrieved after quality filtering. Available (e.g., De-Bruijn/Eulerian graph) and in-house developed bioinformatics tools were used for assembly and annotation of transcriptome. A total of 101,141 assembled transcripts were obtained, with coverage size of 22.42 Mb and average length of 221 bp. Guanine-cytosine (GC) content was found to be 44%. Bioinformatics analysis, using non-redundant proteins, gene ontology (GO), enzyme commission (EC) and kyoto encyclopedia of genes and genomes (KEGG) databases, extracted all the known enzymes involved in saponin and flavonoid biosynthesis. Few genes of the alkaloid biosynthesis, along with anticancer and plant defense genes, were also discovered. Additionally, several cytochrome P450 (CYP450) and glycosyltransferase unique sequences were also found. We identified simple sequence repeat motifs in transcripts with an abundance of di-nucleotide simple sequence repeat (SSR; 43.1%) markers. Large scale expression profiling through Reads per Kilobase per Million mapped reads (RPKM) showed major genes involved in different metabolic pathways of the plant. Genes, expressed sequence tags (ESTs) and unique sequences from this study provide an important resource for the scientific community, interested in the molecular genetics and functional genomics of C. borivilianum.

  19. Next generation sequencing unravels the biosynthetic ability of spearmint (Mentha spicata) peltate glandular trichomes through comparative transcriptomics.

    Science.gov (United States)

    Jin, Jingjing; Panicker, Deepa; Wang, Qian; Kim, Mi Jung; Liu, Jun; Yin, Jun-Lin; Wong, Limsoon; Jang, In-Cheol; Chua, Nam-Hai; Sarojam, Rajani

    2014-11-01

    Plant glandular trichomes are chemical factories with specialized metabolic capabilities to produce diverse compounds. Aromatic mint plants produce valuable essential oil in specialised glandular trichomes known as peltate glandular trichomes (PGT). Here, we performed next generation transcriptome sequencing of different tissues of Mentha spicata (spearmint) to identify differentially expressed transcripts specific to PGT. Our results provide a comprehensive overview of PGT's dynamic metabolic activities which will help towards pathway engineering. Spearmint RNAs from 3 different tissues: PGT, leaf and leaf stripped of PGTs (leaf-PGT) were sequenced by Illumina paired end sequencing. The sequences were assembled de novo into 40,587 non-redundant unigenes; spanning a total of 101 Mb. Functions could be assigned to 27,025 (67%) unigenes and among these 3,919 unigenes were differentially expressed in PGT relative to leaf - PGT. Lack of photosynthetic transcripts in PGT transcriptome indicated the high levels of purity of isolated PGT, as mint PGT are non-photosynthetic. A significant number of these unigenes remained unannotated or encoded hypothetical proteins. We found 16 terpene synthases (TPS), 18 cytochrome P450s, 5 lipid transfer proteins and several transcription factors that were preferentially expressed in PGT. Among the 16 TPSs, two were characterized biochemically and found to be sesquiterpene synthases. The extensive transcriptome data set renders a complete description of genes differentially expressed in spearmint PGT. This will facilitate the metabolic engineering of mint terpene pathway to increase yield and also enable the development of strategies for sustainable production of novel or altered valuable compounds in mint.

  20. A gene expression microarray for Nicotiana benthamiana based on de novo transcriptome sequence assembly.

    Science.gov (United States)

    Goralski, Michal; Sobieszczanska, Paula; Obrepalska-Steplowska, Aleksandra; Swiercz, Aleksandra; Zmienko, Agnieszka; Figlerowicz, Marek

    2016-01-01

    Nicotiana benthamiana has been widely used in laboratories around the world for studying plant-pathogen interactions and posttranscriptional gene expression silencing. Yet the exploration of its transcriptome has lagged behind due to the lack of both adequate sequence information and genome-wide analysis tools, such as DNA microarrays. Despite the increasing use of high-throughput sequencing technologies, the DNA microarrays still remain a popular gene expression tool, because they are cheaper and less demanding regarding bioinformatics skills and computational effort. We designed a gene expression microarray with 103,747 60-mer probes, based on two recently published versions of N. benthamiana transcriptome (v.3 and v.5). Both versions were reconstructed from RNA-Seq data of non-strand-specific pooled-tissue libraries, so we defined the sense strand of the contigs prior to designing the probe. To accomplish this, we combined a homology search against Arabidopsis thaliana proteins and hybridization to a test 244k microarray containing pairs of probes, which represented individual contigs. We identified the sense strand in 106,684 transcriptome contigs and used this information to design an Nb-105k microarray on an Agilent eArray platform. Following hybridization of RNA samples from N. benthamiana roots and leaves we demonstrated that the new microarray had high specificity and sensitivity for detection of differentially expressed transcripts. We also showed that the data generated with the Nb-105k microarray may be used to identify incorrectly assembled contigs in the v.5 transcriptome, by detecting inconsistency in the gene expression profiles, which is indicated using multiple microarray probes that match the same v.5 primary transcripts. We provided a complete design of an oligonucleotide microarray that may be applied to the research of N. benthamiana transcriptome. This, in turn, will allow the N. benthamiana research community to take full advantage of

  1. De novo transcriptome assembly of Zanthoxylum bungeanum using Illumina sequencing for evolutionary analysis and simple sequence repeat marker development.

    Science.gov (United States)

    Feng, Shijing; Zhao, Lili; Liu, Zhenshan; Liu, Yulin; Yang, Tuxi; Wei, Anzhi

    2017-12-01

    Zanthoxylum, an ancient economic crop in Asia, has a satisfying aromatic taste and immense medicinal values. A lack of genomic information and genetic markers has limited the evolutionary analysis and genetic improvement of Zanthoxylum species and their close relatives. To better understand the evolution, domestication, and divergence of Zanthoxylum, we present a de novo transcriptome analysis of an elite cultivar of Z. bungeanum using Illumina sequencing; we then developed simple sequence repeat markers for identification of Zanthoxylum. In total, we predicted 45,057 unigenes and 22,212 protein coding sequences, approximately 90% of which showed significant similarities to known proteins in databases. Phylogenetic analysis indicated that Zanthoxylum is relatively recent and estimated to have diverged from Citrus ca. 36.5-37.7 million years ago. We also detected a whole-genome duplication event in Zanthoxylum that occurred 14 million years ago. We found no protein coding sequences that were significantly under positive selection by Ka/Ks. Simple sequence repeat analysis divided 31 Zanthoxylum cultivars and landraces into three major groups. This Zanthoxylum reference transcriptome provides crucial information for the evolutionary study of the Zanthoxylum genus and the Rutaceae family, and facilitates the establishment of more effective Zanthoxylum breeding programs.

  2. Sequencing and de novo assembly of a Dahlia hybrid cultivar transcriptome

    Directory of Open Access Journals (Sweden)

    Erik Michael Lehnert

    2014-07-01

    Full Text Available Dahlia variabilis, with an exceptionally high diversity of floral forms and colors, is a popular flower amongst both commercial growers and hobbyists. Recently, some genetic controls of pigment patterns have been elucidated. These studies have been limited, however, by the lack of comprehensive transcriptomic resources for this species. Here we report the sequencing, assembly, and annotation of the transcriptome of the developing leaves, stems, and floral buds of D. variabilis. This resulted in 35,638 contigs, most of which seem to contain the complete coding sequence, and of which 20,881 could be successfully annotated by similarity to UniProt. Furthermore, we conducted a preliminary investigation to identify contigs with expression patterns consistent with tissue-specificity. These results will accelerate research into the genetic controls of pigmentation and floral form of D. variabilis.

  3. Sequencing and de novo assembly of a Dahlia hybrid cultivar transcriptome.

    Science.gov (United States)

    Lehnert, Erik M; Walbot, Virginia

    2014-01-01

    Dahlia variabilis, with an exceptionally high diversity of floral forms and colors, is a popular flower amongst both commercial growers and hobbyists. Recently, some genetic controls of pigment patterns have been elucidated. These studies have been limited, however, by the lack of comprehensive transcriptomic resources for this species. Here we report the sequencing, assembly, and annotation of the transcriptome of the developing leaves, stems, and floral buds of D. variabilis. This resulted in 35,638 contigs, most of which seem to contain the complete coding sequence, and of which 20,881 could be successfully annotated by similarity to UniProt. Furthermore, we conducted a preliminary investigation to identify contigs with expression patterns consistent with tissue-specificity. These results will accelerate research into the genetic controls of pigmentation and floral form of D. variabilis.

  4. De novo transcriptome sequencing in Anopheles funestus using Illumina RNA-seq technology.

    Directory of Open Access Journals (Sweden)

    Jacob E Crawford

    Full Text Available BACKGROUND: Anopheles funestus is one of the primary vectors of human malaria, which causes a million deaths each year in sub-Saharan Africa. Few scientific resources are available to facilitate studies of this mosquito species and relatively little is known about its basic biology and evolution, making development and implementation of novel disease control efforts more difficult. The An. funestus genome has not been sequenced, so in order to facilitate genome-scale experimental biology, we have sequenced the adult female transcriptome of An. funestus from a newly founded colony in Burkina Faso, West Africa, using the Illumina GAIIx next generation sequencing platform. METHODOLOGY/PRINCIPAL FINDINGS: We assembled short Illumina reads de novo using a novel approach involving iterative de novo assemblies and "target-based" contig clustering. We then selected a conservative set of 15,527 contigs through comparisons to four Dipteran transcriptomes as well as multiple functional and conserved protein domain databases. Comparison to the Anopheles gambiae immune system identified 339 contigs as putative immune genes, thus identifying a large portion of the immune system that can form the basis for subsequent studies of this important malaria vector. We identified 5,434 1:1 orthologues between An. funestus and An. gambiae and found that among these 1:1 orthologues, the protein sequence of those with putative immune function were significantly more diverged than the transcriptome as a whole. Short read alignments to the contig set revealed almost 367,000 genetic polymorphisms segregating in the An. funestus colony and demonstrated the utility of the assembled transcriptome for use in RNA-seq based measurements of gene expression. CONCLUSIONS/SIGNIFICANCE: We developed a pipeline that makes de novo transcriptome sequencing possible in virtually any organism at a very reasonable cost ($6,300 in sequencing costs in our case. We anticipate that our approach

  5. Characterization of an extensive rainbow trout miRNA transcriptome by next generation sequencing.

    Science.gov (United States)

    Juanchich, Amelie; Bardou, Philippe; Rué, Olivier; Gabillard, Jean-Charles; Gaspin, Christine; Bobe, Julien; Guiguen, Yann

    2016-03-01

    MicroRNAs (miRNAs) have emerged as important post-transcriptional regulators of gene expression in a wide variety of physiological processes. They can control both temporal and spatial gene expression and are believed to regulate 30 to 70% of the genes. Data are however limited for fish species, with only 9 out of the 30,000 fish species present in miRBase. The aim of the current study was to discover and characterize rainbow trout (Oncorhynchus mykiss) miRNAs in a large number of tissues using next-generation sequencing in order to provide an extensive repertoire of rainbow trout miRNAs. A total of 38 different samples corresponding to 16 different tissues or organs were individually sequenced and analyzed independently in order to identify a large number of miRNAs with high confidence. This led to the identification of 2946 miRNA loci in the rainbow trout genome, including 445 already known miRNAs. Differential expression analysis was performed in order to identify miRNAs exhibiting specific or preferential expression among the 16 analyzed tissues. In most cases, miRNAs exhibit a specific pattern of expression in only a few tissues. The expression data from sRNA sequencing were confirmed by RT-qPCR. In addition, novel miRNAs are described in rainbow trout that had not been previously reported in other species. This study represents the first characterization of rainbow trout miRNA transcriptome from a wide variety of tissue and sets an extensive repertoire of rainbow trout miRNAs. It provides a starting point for future studies aimed at understanding the roles of miRNAs in major physiological process such as growth, reproduction or adaptation to stress. These rainbow trout miRNAs repertoire provide a novel resource to advance genomic research in salmonid species.

  6. Minimally invasive genomic and transcriptomic profiling of visceral cancers by next-generation sequencing of circulating exosomes.

    Science.gov (United States)

    San Lucas, F A; Allenson, K; Bernard, V; Castillo, J; Kim, D U; Ellis, K; Ehli, E A; Davies, G E; Petersen, J L; Li, D; Wolff, R; Katz, M; Varadhachary, G; Wistuba, I; Maitra, A; Alvarez, H

    2016-04-01

    The ability to perform comprehensive profiling of cancers at high resolution is essential for precision medicine. Liquid biopsies using shed exosomes provide high-quality nucleic acids to obtain molecular characterization, which may be especially useful for visceral cancers that are not amenable to routine biopsies. We isolated shed exosomes in biofluids from three patients with pancreaticobiliary cancers (two pancreatic, one ampullary). We performed comprehensive profiling of exoDNA and exoRNA by whole genome, exome and transcriptome sequencing using the Illumina HiSeq 2500 sequencer. We assessed the feasibility of calling copy number events, detecting mutational signatures and identifying potentially actionable mutations in exoDNA sequencing data, as well as expressed point mutations and gene fusions in exoRNA sequencing data. Whole-exome sequencing resulted in 95%-99% of the target regions covered at a mean depth of 133-490×. Genome-wide copy number profiles, and high estimates of tumor fractions (ranging from 56% to 82%), suggest robust representation of the tumor DNA within the shed exosomal compartment. Multiple actionable mutations, including alterations in NOTCH1 and BRCA2, were found in patient exoDNA samples. Further, RNA sequencing of shed exosomes identified the presence of expressed fusion genes, representing an avenue for elucidation of tumor neoantigens. We have demonstrated high-resolution profiling of the genomic and transcriptomic landscapes of visceral cancers. A wide range of cancer-derived biomarkers could be detected within the nucleic acid cargo of shed exosomes, including copy number profiles, point mutations, insertions, deletions, gene fusions and mutational signatures. Liquid biopsies using shed exosomes has the potential to be used as a clinical tool for cancer diagnosis, therapeutic stratification and treatment monitoring, precluding the need for direct tumor sampling. © The Author 2015. Published by Oxford University Press on behalf

  7. De novo transcriptome sequencing and assembly from apomictic and sexual Eragrostis curvula genotypes.

    Directory of Open Access Journals (Sweden)

    Ingrid Garbus

    Full Text Available A long-standing goal in plant breeding has been the ability to confer apomixis to agriculturally relevant species, which would require a deeper comprehension of the molecular basis of apomictic regulatory mechanisms. Eragrostis curvula (Schrad. Nees is a perennial grass that includes both sexual and apomictic cytotypes. The availability of a reference transcriptome for this species would constitute a very important tool toward the identification of genes controlling key steps of the apomictic pathway. Here, we used Roche/454 sequencing technologies to generate reads from inflorescences of E. curvula apomictic and sexual genotypes that were de novo assembled into a reference transcriptome. Near 90% of the 49568 assembled isotigs showed sequence similarity to sequences deposited in the public databases. A gene ontology analysis categorized 27448 isotigs into at least one of the three main GO categories. We identified 11475 SSRs, and several of them were assayed in E curvula germoplasm using SSR-based primers, providing a valuable set of molecular markers that could allow direct allele selection. The differential contribution to each library of the spliced forms of several transcripts revealed the existence of several isotigs produced via alternative splicing of single genes. The reference transcriptome presented and validated in this work will be useful for the identification of a wide range of gene(s related to agronomic traits of E. curvula, including those controlling key steps of the apomictic pathway in this species, allowing the extrapolation of the findings to other plant species.

  8. Sequencing and de novo transcriptome assembly of Anthopleura dowii Verrill (1869, from Mexico

    Directory of Open Access Journals (Sweden)

    Jorge-Tonatiuh Ayala-Sumuano

    2017-03-01

    Full Text Available Next-generation technologies for determination of genomics and transcriptomics composition have a wide range of applications. Moreover, the development of tools for big data set analysis has allowed the identification of molecules and networks involved in metabolism, evolution or behavior. By natural habitats aquatic organisms have implemented molecular strategies for survival, including the production and secretion of toxic compounds for their predators; therefore these organisms are possible sources of proteins or peptides with potential biotechnological application. In the last decade anthozoans, mainly octocorals but also sea anemones, have been proben to be a source of natural products. Members of the genus Anthopleura are one of the best known and most studied sea anemones because they are common constituents of rocky intertidal communities and show interesting ecological and biological phenomena (e.g. intraespecific competition, symbiosis, etc.; however, many aspects of these taxa remain in need to be analyzed. This work describes the transcriptome sequencing of Anthopleura dowii Verrill, 1869 (Cnidaria: Anthozoa: Actiniaria; this is the first report of this kind for these species. The data set used to construct the transcriptome has been deposited on NCBI's database. Illumina sequence reads are available under BioProject accession number PRJNA329297 and Sequence Read Archive under accession number SRP078992.

  9. Single-cell sequencing of the small-RNA transcriptome.

    Science.gov (United States)

    Faridani, Omid R; Abdullayev, Ilgar; Hagemann-Jensen, Michael; Schell, John P; Lanner, Fredrik; Sandberg, Rickard

    2016-12-01

    Little is known about the heterogeneity of small-RNA expression as small-RNA profiling has so far required large numbers of cells. Here we present a single-cell method for small-RNA sequencing and apply it to naive and primed human embryonic stem cells and cancer cells. Analysis of microRNAs and fragments of tRNAs and small nucleolar RNAs (snoRNAs) reveals the potential of microRNAs as markers for different cell types and states.

  10. Whole-genome duplication and molecular evolution in Cornus L. (Cornaceae - Insights from transcriptome sequences.

    Directory of Open Access Journals (Sweden)

    Yan Yu

    Full Text Available The pattern and rate of genome evolution have profound consequences in organismal evolution. Whole-genome duplication (WGD, or polyploidy, has been recognized as an important evolutionary mechanism of plant diversification. However, in non-model plants the molecular signals of genome duplications have remained largely unexplored. High-throughput transcriptome data from next-generation sequencing have set the stage for novel investigations of genome evolution using new bioinformatic and methodological tools in a phylogenetic framework. Here we compare ten de novo-assembled transcriptomes representing the major lineages of the angiosperm genus Cornus (dogwood and relevant outgroups using a customized pipeline for analyses. Using three distinct approaches, molecular dating of orthologous genes, analyses of the distribution of synonymous substitutions between paralogous genes, and examination of substitution rates through time, we detected a shared WGD event in the late Cretaceous across all taxa sampled. The inferred doubling event coincides temporally with the paleoclimatic changes associated with the initial divergence of the genus into three major lineages. Analyses also showed an acceleration of rates of molecular evolution after WGD. The highest rates of molecular evolution were observed in the transcriptome of the herbaceous lineage, C. canadensis, a species commonly found at higher latitudes, including the Arctic. Our study demonstrates the value of transcriptome data for understanding genome evolution in closely related species. The results suggest dramatic increase in sea surface temperature in the late Cretaceous may have contributed to the evolution and diversification of flowering plants.

  11. Transcriptome sequencing and expression analysis of terpenoid biosynthesis genes in Litsea cubeba.

    Directory of Open Access Journals (Sweden)

    Xiao-Jiao Han

    Full Text Available BACKGROUND: Aromatic essential oils extracted from fresh fruits of Litsea cubeba (Lour. Pers., have diverse medical and economic values. The dominant components in these essential oils are monoterpenes and sesquiterpenes. Understanding the molecular mechanisms of terpenoid biosynthesis is essential for improving the yield and quality of terpenes. However, the 40 available L. cubeba nucleotide sequences in the public databases are insufficient for studying the molecular mechanisms. Thus, high-throughput transcriptome sequencing of L. cubeba is necessary to generate large quantities of transcript sequences for the purpose of gene discovery, especially terpenoid biosynthesis related genes. RESULTS: Using Illumina paired-end sequencing, approximately 23.5 million high-quality reads were generated. De novo assembly yielded 68,648 unigenes with an average length of 834 bp. A total of 38,439 (56% unigenes were annotated for their functions, and 35,732 and 25,806 unigenes could be aligned to the GO and COG database, respectively. By searching against the Kyoto Encyclopedia of Genes and Genomes Pathway database (KEGG, 16,130 unigenes were assigned to 297 KEGG pathways, and 61 unigenes, which contained the mevalonate and 2-C-methyl-D-erythritol 4-phosphate pathways, could be related to terpenoid backbone biosynthesis. Of the 12,963 unigenes, 285 were annotated to the terpenoid pathways using the PlantCyc database. Additionally, 14 terpene synthase genes were identified from the transcriptome. The expression patterns of the 16 genes related to terpenoid biosynthesis were analyzed by RT-qPCR to explore their putative functions. CONCLUSION: RNA sequencing was effective in identifying a large quantity of sequence information. To our knowledge, this study is the first exploration of the L. cubeba transcriptome, and the substantial amount of transcripts obtained will accelerate the understanding of the molecular mechanisms of essential oils biosynthesis. The

  12. De Novo Sequencing and Assembly Analysis of Transcriptome in Pinus bungeana Zucc. ex Endl.

    Directory of Open Access Journals (Sweden)

    Qifei Cai

    2018-03-01

    Full Text Available To enrich the molecular data of Pinus bungeana Zucc. ex Endl. and study the regulating factors of different morphology controled by apical dominance. In this study, de novo assembly of transcriptome annotation was performed for two varieties of Pinus bungeana Zucc. ex Endl. that are obviously different in morphology. More than 147 million reads were produced, which were assembled into 88,092 unigenes. Based on a similarity search, 11,692 unigenes showed significant similarity to proteins from Picea sitchensis (Bong. Carr. From this collection of unigenes, a large number of molecular markers were identified, including 2829 simple sequence repeats (SSRs. A total of 158 unigenes expressed differently between two varieties, including 98 up-regulated and 60 down-regulated unigenes. Furthermore, among the differently expressed genes (DEGs, five genes which may impact the plant morphology were further validated by reverse transcription quantitative polymerase chain reaction (RT-qPCR. The five genes related to cytokinin oxidase/dehydrogenase (CKX, two-component response regulator ARR-A family (ARR-A, plant hormone signal transduction (AHP, and MADS-box transcription factors have a close relationship with apical dominance. This new dataset will be a useful resource for future genetic and genomic studies in Pinus bungeana Zucc. ex Endl.

  13. Transcriptomic analysis of Petunia hybrida in response to salt stress using high throughput RNA sequencing.

    Directory of Open Access Journals (Sweden)

    Gonzalo H Villarino

    Full Text Available Salinity and drought stress are the primary cause of crop losses worldwide. In sodic saline soils sodium chloride (NaCl disrupts normal plant growth and development. The complex interactions of plant systems with abiotic stress have made RNA sequencing a more holistic and appealing approach to study transcriptome level responses in a single cell and/or tissue. In this work, we determined the Petunia transcriptome response to NaCl stress by sequencing leaf samples and assembling 196 million Illumina reads with Trinity software. Using our reference transcriptome we identified more than 7,000 genes that were differentially expressed within 24 h of acute NaCl stress. The proposed transcriptome can also be used as an excellent tool for biological and bioinformatics in the absence of an available Petunia genome and it is available at the SOL Genomics Network (SGN http://solgenomics.net. Genes related to regulation of reactive oxygen species, transport, and signal transductions as well as novel and undescribed transcripts were among those differentially expressed in response to salt stress. The candidate genes identified in this study can be applied as markers for breeding or to genetically engineer plants to enhance salt tolerance. Gene Ontology analyses indicated that most of the NaCl damage happened at 24 h inducing genotoxicity, affecting transport and organelles due to the high concentration of Na+ ions. Finally, we report a modification to the library preparation protocol whereby cDNA samples were bar-coded with non-HPLC purified primers, without affecting the quality and quantity of the RNA-seq data. The methodological improvement presented here could substantially reduce the cost of sample preparation for future high-throughput RNA sequencing experiments.

  14. Transcriptome and target DNA enrichment sequence data provide new insights into the phylogeny of vespid wasps (Hymenoptera: Aculeata: Vespidae)

    DEFF Research Database (Denmark)

    Bank, Sarah; Sann, Manuela; Mayer, Christoph

    2017-01-01

    and exploited the transcript sequences for design of probes for enriching 913 single-copy protein-coding genes to complement the transcriptome data with nucleotide sequence data from additional 25 ethanol-preserved vespid species. Results from phylogenetic analyses of the combined sequence data revealed...

  15. SNP discovery in European anchovy (Engraulis encrasicolus, L by high-throughput transcriptome and genome sequencing.

    Directory of Open Access Journals (Sweden)

    Iratxe Montes

    Full Text Available Increased throughput in sequencing technologies has facilitated the acquisition of detailed genomic information in non-model species. The focus of this research was to discover and validate SNPs derived from the European anchovy (Engraulis encrasicolus transcriptome, a species with no available reference genome, using next generation sequencing technologies. A cDNA library was constructed from four tissues of ten fish individuals corresponding to three populations of E. encrasicolus, and Roche 454 GS FLX Titanium sequencing yielded 19,367 contigs. Additionally, the European anchovy genome was sequenced for the same ten individuals using an Illumina HiSeq2000. Using a computational pipeline for combining transcriptome and genome information, a total of 18,994 SNPs met the necessary minor allele frequency and depth filters. A series of further stringent filters were applied to identify those SNPs likely to succeed in genotyping assays, and for filtering of those in potential duplicated genome regions. A novel method for detecting potential intron-exon boundaries in areas of putative SNPs has also been applied in silico to improve genotyping success. In all, 2,317 filtered putative transcriptome SNPs suitable for genotyping primer design were identified. From those, a subset of 530 were selected, with the genotyping results showing the highest reported conversion and validation rates (91.3% and 83.2%, respectively reported to date for a non-model species. This study represents a promising strategy to discover genotypable SNPs in the exome of non-model organisms. The genomic resource generated for E. encrasicolus, both in terms of sequences and novel markers, will be informative for research into this species with applications including traceability studies, population genetic analyses and aquaculture.

  16. SNP discovery in European anchovy (Engraulis encrasicolus, L) by high-throughput transcriptome and genome sequencing.

    Science.gov (United States)

    Montes, Iratxe; Conklin, Darrell; Albaina, Aitor; Creer, Simon; Carvalho, Gary R; Santos, María; Estonba, Andone

    2013-01-01

    Increased throughput in sequencing technologies has facilitated the acquisition of detailed genomic information in non-model species. The focus of this research was to discover and validate SNPs derived from the European anchovy (Engraulis encrasicolus) transcriptome, a species with no available reference genome, using next generation sequencing technologies. A cDNA library was constructed from four tissues of ten fish individuals corresponding to three populations of E. encrasicolus, and Roche 454 GS FLX Titanium sequencing yielded 19,367 contigs. Additionally, the European anchovy genome was sequenced for the same ten individuals using an Illumina HiSeq2000. Using a computational pipeline for combining transcriptome and genome information, a total of 18,994 SNPs met the necessary minor allele frequency and depth filters. A series of further stringent filters were applied to identify those SNPs likely to succeed in genotyping assays, and for filtering of those in potential duplicated genome regions. A novel method for detecting potential intron-exon boundaries in areas of putative SNPs has also been applied in silico to improve genotyping success. In all, 2,317 filtered putative transcriptome SNPs suitable for genotyping primer design were identified. From those, a subset of 530 were selected, with the genotyping results showing the highest reported conversion and validation rates (91.3% and 83.2%, respectively) reported to date for a non-model species. This study represents a promising strategy to discover genotypable SNPs in the exome of non-model organisms. The genomic resource generated for E. encrasicolus, both in terms of sequences and novel markers, will be informative for research into this species with applications including traceability studies, population genetic analyses and aquaculture.

  17. Peripheral blood transcriptome sequencing reveals rejection-relevant genes in long-term heart transplantation.

    Science.gov (United States)

    Chen, Yan; Zhang, Haibo; Xiao, Xue; Jia, Yixin; Wu, Weili; Liu, Licheng; Jiang, Jun; Zhu, Baoli; Meng, Xu; Chen, Weijun

    2013-10-03

    Peripheral blood-based gene expression patterns have been investigated as biomarkers to monitor the immune system and rule out rejection after heart transplantation. Recent advances in the high-throughput deep sequencing (HTS) technologies provide new leads in transcriptome analysis. By performing Solexa/Illumina's digital gene expression (DGE) profiling, we analyzed gene expression profiles of PBMCs from 6 quiescent (grade 0) and 6 rejection (grade 2R&3R) heart transplant recipients at more than 6 months after transplantation. Subsequently, quantitative real-time polymerase chain reaction (qRT-PCR) was carried out in an independent validation cohort of 47 individuals from three rejection groups (ISHLT, grade 0,1R, 2R&3R). Through DGE sequencing and qPCR validation, 10 genes were identified as informative genes for detection of cardiac transplant rejection. A further clustering analysis showed that the 10 genes were not only effective for distinguishing patients with acute cardiac allograft rejection, but also informative for discriminating patients with renal allograft rejection based on both blood and biopsy samples. Moreover, PPI network analysis revealed that the 10 genes were connected to each other within a short interaction distance. We proposed a 10-gene signature for heart transplant patients at high-risk of developing severe rejection, which was found to be effective as well in other organ transplant. Moreover, we supposed that these genes function systematically as biomarkers in long-time allograft rejection. Further validation in broad transplant population would be required before the non-invasive biomarkers can be generally utilized to predict the risk of transplant rejection. Copyright © 2013 Elsevier Ireland Ltd. All rights reserved.

  18. The Pseudomonas aeruginosa transcriptome in planktonic cultures and static biofilms using RNA sequencing.

    Directory of Open Access Journals (Sweden)

    Andreas Dötsch

    Full Text Available In this study, we evaluated how gene expression differs in mature Pseudomonas aeruginosa biofilms as opposed to planktonic cells by the use of RNA sequencing technology that gives rise to both quantitative and qualitative information on the transcriptome. Although a large proportion of genes were consistently regulated in both the stationary phase and biofilm cultures as opposed to the late exponential growth phase cultures, the global biofilm gene expression pattern was clearly distinct indicating that biofilms are not just surface attached cells in stationary phase. A large amount of the genes found to be biofilm specific were involved in adaptation to microaerophilic growth conditions, repression of type three secretion and production of extracellular matrix components. Additionally, we found many small RNAs to be differentially regulated most of them similarly in stationary phase cultures and biofilms. A qualitative analysis of the RNA-seq data revealed more than 3000 putative transcriptional start sites (TSS. By the use of rapid amplification of cDNA ends (5'-RACE we confirmed the presence of three different TSS associated with the pqsABCDE operon, two in the promoter of pqsA and one upstream of the second gene, pqsB. Taken together, this study reports the first transcriptome study on P. aeruginosa that employs RNA sequencing technology and provides insights into the quantitative and qualitative transcriptome including the expression of small RNAs in P. aeruginosa biofilms.

  19. Analysis of upland cotton (Gossypium hirsutum) response to Verticillium dahliae inoculation by transcriptome sequencing.

    Science.gov (United States)

    Shao, B X; Zhao, Y L; Chen, W; Wang, H M; Guo, Z J; Gong, H Y; Sang, X H; Cui, Y L; Wang, C H

    2015-10-27

    Verticillium wilt is one of the main diseases in cotton (Gossypium hirsutum), severely reduces yield and fiber quality, and is difficult to be con-trolled effectively. At present, the molecular mechanism that confers resistance to this disease is unclear. Transcriptome sequencing is an important method to detect resistance genes, explore metabolic pathways, and study resistance mechanisms. In this study, the transcriptome of a disease-resistant inbred cot-ton line inoculated with Verticillium dahliae was sequenced. A total of 126,402 unigenes were obtained using de novo assembly and data analysis, 99,712 (78.88%) of which were annotated into the Nr, Nt, Swiss-Prot, KEGG, COG, and GO databases. The expression patterns of 16 candidate disease-resis-tance genes showed that some genes were upregulated soon after V. dahliae inoculation and others were upregulated later, which may indicate instanta-neous basal defense and lagged specific defense, respectively. We conducted a preliminary analysis of the transcriptome database, which will contribute to further research regarding the cloning of disease-resistance genes.

  20. Transcriptome sequencing of two phenotypic mosaic Eucalyptus trees reveals large scale transcriptome re-modelling.

    Directory of Open Access Journals (Sweden)

    Amanda Padovan

    Full Text Available Phenotypic mosaic trees offer an ideal system for studying differential gene expression. We have investigated two mosaic eucalypt trees from two closely related species (Eucalyptus melliodora and E. sideroxylon, which each support two types of leaves: one part of the canopy is resistant to insect herbivory and the remaining leaves are susceptible. Driving this ecological distinction are differences in plant secondary metabolites. We used these phenotypic mosaics to investigate genome wide patterns of foliar gene expression with the aim of identifying patterns of differential gene expression and the somatic mutation(s that lead to this phenotypic mosaicism. We sequenced the mRNA pool from leaves of the resistant and susceptible ecotypes from both mosaic eucalypts using the Illumina HiSeq 2000 platform. We found large differences in pathway regulation and gene expression between the ecotypes of each mosaic. The expression of the genes in the MVA and MEP pathways is reflected by variation in leaf chemistry, however this is not the case for the terpene synthases. Apart from the terpene biosynthetic pathway, there are several other metabolic pathways that are differentially regulated between the two ecotypes, suggesting there is much more phenotypic diversity than has been described. Despite the close relationship between the two species, they show large differences in the global patterns of gene and pathway regulation.

  1. Comparative transcriptome analysis within the Lolium/Festuca species complex reveals high sequence conservation

    DEFF Research Database (Denmark)

    Czaban, Adrian; Sharma, Sapna; Byrne, Stephen

    2015-01-01

    -Festuca complex show very diverse phenotypes, including for many agronomically important traits. Analysis of sequenced transcriptomes of these non-model species may shed light on the molecular mechanisms underlying this phenotypic diversity. Results We have generated de novo transcriptome assemblies for four......Background The Lolium-Festuca complex incorporates species from the Lolium genera and the broad leaf fescues, both belonging to the subfamily Pooideae. This subfamily also includes wheat, barley, oat and rye, making it extremely important to world agriculture. Species within the Lolium...... species from the Lolium-Festuca complex, ranging from 52,166 to 72,133 transcripts per assembly. We have also predicted a set of proteins and validated it with a high-confidence protein database from three closely related species (H. vulgare, B. distachyon and O. sativa). We have obtained gene family...

  2. Next is now: new technologies for sequencing of genomes, transcriptomes and beyond

    Science.gov (United States)

    Lister, Ryan; Gregory, Brian D.; Ecker, Joseph R.

    2009-01-01

    Summary The sudden availability of DNA sequencing technologies that rapidly produce vast amounts of sequence information has triggered a paradigm shift in genomics, enabling massively parallel surveying of complex nucleic acid populations. The diversity of applications to which these technologies have already been applied demonstrates the immense range of cellular processes and properties that can now be studied at the single-base resolution. These include genome resequencing and polymorphism discovery, mutation mapping, DNA methylation, histone modifications, transcriptome sequencing, gene discovery, alternative splicing identification, small RNA profiling, DNA-protein and possibly even protein-protein interactions. Thus, these deep sequencing technologies offer plant biologists unprecedented opportunities to increase the understanding of the functions and dynamics of plant cells and populations. PMID:19157957

  3. Characterization of transcriptome dynamics during watermelon fruit development: sequencing, assembly, annotation and gene expression profiles.

    Science.gov (United States)

    Guo, Shaogui; Liu, Jingan; Zheng, Yi; Huang, Mingyun; Zhang, Haiying; Gong, Guoyi; He, Hongju; Ren, Yi; Zhong, Silin; Fei, Zhangjun; Xu, Yong

    2011-09-21

    Cultivated watermelon [Citrullus lanatus (Thunb.) Matsum. & Nakai var. lanatus] is an important agriculture crop world-wide. The fruit of watermelon undergoes distinct stages of development with dramatic changes in its size, color, sweetness, texture and aroma. In order to better understand the genetic and molecular basis of these changes and significantly expand the watermelon transcript catalog, we have selected four critical stages of watermelon fruit development and used Roche/454 next-generation sequencing technology to generate a large expressed sequence tag (EST) dataset and a comprehensive transcriptome profile for watermelon fruit flesh tissues. We performed half Roche/454 GS-FLX run for each of the four watermelon fruit developmental stages (immature white, white-pink flesh, red flesh and over-ripe) and obtained 577,023 high quality ESTs with an average length of 302.8 bp. De novo assembly of these ESTs together with 11,786 watermelon ESTs collected from GenBank produced 75,068 unigenes with a total length of approximately 31.8 Mb. Overall 54.9% of the unigenes showed significant similarities to known sequences in GenBank non-redundant (nr) protein database and around two-thirds of them matched proteins of cucumber, the most closely-related species with a sequenced genome. The unigenes were further assigned with gene ontology (GO) terms and mapped to biochemical pathways. More than 5,000 SSRs were identified from the EST collection. Furthermore we carried out digital gene expression analysis of these ESTs and identified 3,023 genes that were differentially expressed during watermelon fruit development and ripening, which provided novel insights into watermelon fruit biology and a comprehensive resource of candidate genes for future functional analysis. We then generated profiles of several interesting metabolites that are important to fruit quality including pigmentation and sweetness. Integrative analysis of metabolite and digital gene expression

  4. Sequencing and De Novo Assembly of the Gonadal Transcriptome of the Endangered Chinese Sturgeon (Acipenser sinensis.

    Directory of Open Access Journals (Sweden)

    Huamei Yue

    Full Text Available The Chinese sturgeon (Acipenser sinensis is endangered through anthropogenic activities including over-fishing, damming, shipping, and pollution. Controlled reproduction has been adopted and successfully conducted for conservation. However, little information is available on the reproductive regulation of the species. In this study, we conducted de novo transcriptome assembly of the gonad tissue to create a comprehensive dataset for A. sinensis.The Illumina sequencing platform was adopted to obtain 47,333,701 and 47,229,705 high quality reads from testis and ovary cDNA libraries generated from three-year-old A. sinensis. We identified 86,027 unigenes of which 30,268 were annotated in the NCBI non-redundant protein database and 28,281 were annotated in the Swiss-prot database. Among the annotated unigenes, 26,152 and 7,734 unigenes, respectively, were assigned to gene ontology categories and clusters of orthologous groups. In addition, 12,557 unigenes were mapped to 231 pathways in the Kyoto Encyclopedia of Genes and Genomes Pathway database. A total of 1,896 unigenes, potentially differentially expressed between the two gonad types, were found, with 1,894 predicted to be up-regulated in ovary and only two in testis. Fifty-five potential gametogenesis-related genes were screened in the transcriptome and 34 genes with significant matches were found. Besides, more paralogs of 11 genes in three gene families (sox, apolipoprotein and cyclin were found in A. sinensis compared to their orthologs in the diploid Danio rerio. In addition, 12,151 putative simple sequence repeats (SSRs were detected.This study provides the first de novo transcriptome analysis currently available for A. sinensis. The transcriptomic data represents the fundamental resource for future research on the mechanism of early gametogenesis in sturgeons. The SSRs identified in this work will be valuable for assessment of genetic diversity of wild fish and genealogy management of

  5. Development of Transcriptomic Markers for Population Analysis Using Restriction Site Associated RNA Sequencing (RARseq.

    Directory of Open Access Journals (Sweden)

    Magdy S Alabady

    Full Text Available We describe restriction site associated RNA sequencing (RARseq, an RNAseq-based genotype by sequencing (GBS method. It includes the construction of RNAseq libraries from double stranded cDNA digested with selected restriction enzymes. To test this, we constructed six single- and six-dual-digested RARseq libraries from six F2 pitcher plant individuals and sequenced them on a half of a Miseq run. On average, the de novo approach of population genome analysis detected 544 and 570 RNA SNPs, whereas the reference transcriptome-based approach revealed an average of 1907 and 1876 RNA SNPs per individual, from single- and dual-digested RARseq data, respectively. The average numbers of RNA SNPs and alleles per loci are 1.89 and 2.17, respectively. Our results suggest that the RARseq protocol allows good depth of coverage per loci for detecting RNA SNPs and polymorphic loci for population genomics and mapping analyses. In non-model systems where complete genomes sequences are not always available, RARseq data can be analyzed in reference to the transcriptome. In addition to enriching for functional markers, this method may prove particularly useful in organisms where the genomes are not favorable for DNA GBS.

  6. Analysis of wheat microspore embryogenesis induction by transcriptome and small RNA sequencing using the highly responsive cultivar "Svilena".

    Science.gov (United States)

    Seifert, Felix; Bössow, Sandra; Kumlehn, Jochen; Gnad, Heike; Scholten, Stefan

    2016-04-21

    Microspore embryogenesis describes a stress-induced reprogramming of immature male plant gametophytes to develop into embryo-like structures, which can be regenerated into doubled haploid plants after whole genome reduplication. This mechanism is of high interest for both research as well as plant breeding. The objective of this study was to characterize transcriptional changes and regulatory relationships in early stages of cold stress-induced wheat microspore embryogenesis by transcriptome and small RNA sequencing using a highly responsive cultivar. Transcriptome and small RNA sequencing was performed in a staged time-course to analyze wheat microspore embryogenesis induction. The analyzed stages were freshly harvested, untreated uninucleate microspores and the two following stages from in vitro anther culture: directly after induction by cold-stress treatment and microspores undergoing the first nuclear divisions. A de novo transcriptome assembly resulted in 29,388 contigs distributing to 20,224 putative transcripts of which 9,305 are not covered by public wheat cDNAs. Differentially expressed transcripts and small RNAs were identified for the stage transitions highlighting various processes as well as specific genes to be involved in microspore embryogenesis induction. This study establishes a comprehensive functional genomics resource for wheat microspore embryogenesis induction and initial understanding of molecular mechanisms involved. A large set of putative transcripts presumably specific for microspore embryogenesis induction as well as contributing processes and specific genes were identified. The results allow for a first insight in regulatory roles of small RNAs in the reprogramming of microspores towards an embryogenic cell fate.

  7. De novo transcriptome sequencing and analysis of male, pseudo-male and female yellow perch, Perca flavescens.

    Directory of Open Access Journals (Sweden)

    Yan-He Li

    Full Text Available Transcriptome sequencing could facilitate discovery of sex-biased genes, biological pathways and molecular markers, which could help clarify the molecular mechanism of sex determination and sexual dimorphism, and assist with selective breeding in aquaculture. Yellow perch has unique gonad system and sexual dimorphism and is an alternative model to study mechanism of sex determination, sexual dimorphism and sexual selection. In this study, we performed the de novo assembly of yellow perch gonads and muscle transcriptomes by high throughput Illumina sequencing. A total of 212,180 contigs were obtained, ranging from 127 to 64,876 bp, and N50 of 1,066 bp. The assembly RNA-Seq contigs (≥200bp were then used for subsequent analyses, including annotation, pathway analysis, and microsatellites discovery. No female- and pseudo-male-biased genes were involved in any pathways while male-biased genes were involved in 29 pathways, and neuroactive ligand receptor interaction and enzyme of trypsin (enzyme code, EC: 3.4.21.4 was highly involved. Pyruvate kinase (enzyme code, EC: 2.7.1.40, which plays important roles in cell proliferation, was highly expressed in muscles. In addition, a total of 183,939 SNPs, 11,286 InDels and 41,479 microsatellites were identified. This study is the first report on transcriptome information in Percids, and provides rich resources for conducting further studies on understanding the molecular basis of sex determinations, sexual dimorphism, and sexual selection in fish, and for population studies and marker-assisted selection in Percids.

  8. Differential Expression Profiles of the Transcriptome in Breast Cancer Cell Lines Revealed by Next Generation Sequencing

    Directory of Open Access Journals (Sweden)

    Yu Shi

    2017-11-01

    Full Text Available Background/Aims: As MCF-7 and MDA-MB-231 cells are the typical cell lines of two clinical breast tumour subtypes, the aim of the present study was to elucidate the transcriptome differences between MCF-7 and MDA-MB-231 breast cancer cell lines. Methods: The mRNA, miRNA (MicroRNA and lncRNA (Long non-coding RNA expression profiles were examined using NGS (next generation sequencing instrument Illumina HiSeq-2500. GO (Gene Ontology and KEGG (Kyoto Encyclopedia of Genes and Genomes pathway analyses were performed to identify the biological functions of differentially expressed coding RNAs. Subsequently, we constructed an mRNA-ncRNA (non-coding RNA targeting regulatory network. Finally, we performed RT-qPCR (real-time quantitative PCR to confirm the NGS results. Results: There are sharp distinctions of the coding and non-coding RNA profiles between MCF-7 and MDA-MB-231 cell lines. Among the mRNAs and ncRNAs with the most differential expression, SLPI, SOD2, miR-7, miR-143 and miR-145 were highly expressed in MCF-7 cells, while CD55, KRT17, miR-21, miR-10b, miR-9, NEAT1 and PICSAR were over-expressed in MDA-MB-231 cells. Differentially expressed mRNAs are primarily involved in biological processes of locomotion, biological adhesion, ECM-receptor interaction pathway and focal adhesion. In the targeting regulatory network of differentially expressed RNAs, mRNAs and miRNAs are primarily associated with tumour metastasis, but the functions of lncRNAs remain uncharacterized. Conclusion: These results provide a basis for future studies of breast cancer metastasis and drug resistance.

  9. A comparison of single molecule and amplification based sequencing of cancer transcriptomes.

    Directory of Open Access Journals (Sweden)

    Lee T Sam

    2011-03-01

    Full Text Available The second wave of next generation sequencing technologies, referred to as single-molecule sequencing (SMS, carries the promise of profiling samples directly without employing polymerase chain reaction steps used by amplification-based sequencing (AS methods. To examine the merits of both technologies, we examine mRNA sequencing results from single-molecule and amplification-based sequencing in a set of human cancer cell lines and tissues. We observe a characteristic coverage bias towards high abundance transcripts in amplification-based sequencing. A larger fraction of AS reads cover highly expressed genes, such as those associated with translational processes and housekeeping genes, resulting in relatively lower coverage of genes at low and mid-level abundance. In contrast, the coverage of high abundance transcripts plateaus off using SMS. Consequently, SMS is able to sequence lower- abundance transcripts more thoroughly, including some that are undetected by AS methods; however, these include many more mapping artifacts. A better understanding of the technical and analytical factors introducing platform specific biases in high throughput transcriptome sequencing applications will be critical in cross platform meta-analytic studies.

  10. Deep Sequencing of Porphyromonas gingivalis and comparative transcriptome analysis of a LuxS mutant

    Directory of Open Access Journals (Sweden)

    Takanoi eHirano

    2012-06-01

    Full Text Available Porphyromonas gingivalis is a major etiological agent and chronic and aggressive forms of periodontal disease. The organism is an assacharolytic anaerobe and is a constituent of mixed species biofilms in a variety of microenvironments in the oral cavity. P. gingivalis expresses a range of virulence factors over which it exerts tight control. High-throughput sequencing technologies provide the opportunity to relate functional genomics to basic biology. In this study we report qualitative and quantitative RNA-Seq analysis of the transcriptome of P. gingivalis. We have also applied RNA-Seq to the transcriptome of a ΔluxS mutant of P. gingivalis deficient in AI-2-mediated bacterial communication. The transcriptome analysis confirmed the expression of all predicted ORFs for strain ATCC 33277, including 854 hypothetical proteins, and allowed the identification of hitherto unknown transcriptional units. Twelve noncoding RNAs were identified, including 11 small RNAs and one cobalamine riboswitch. Fifty seven genes were differentially regulated in the LuxS mutant. Addition of exogenous synthetic 4,5-dihydroxy-2,3-pentanedione (DPD, AI-2 precursor to the ΔluxS mutant culture complemented expression of a subset of genes, indicating that LuxS is involved in both AI-2 signaling and non-signaling dependent systems in P. gingivalis. This work provides an important dataset for future study of P. gingivalis pathophysiology and further defines the LuxS regulon in this oral pathogen.

  11. Sequencing and characterization of striped venus transcriptome expand resources for clam fishery genetics.

    Directory of Open Access Journals (Sweden)

    Alessandro Coppe

    Full Text Available BACKGROUND: The striped venus Chamelea gallina clam fishery is among the oldest and the largest in the Mediterranean Sea, particularly in the inshore waters of northern Adriatic Sea. The high fishing pressure has lead to a strong stock abundance decline, enhanced by several irregular mortality events. The nearly complete lack of molecular characterization limits the available genetic resources for C. gallina. We achieved the first transcriptome of this species with the aim of identifying an informative set of expressed genes, potential markers to assess genetic structure of natural populations and molecular resources for pathogenic contamination detection. METHODOLOGY/PRINCIPAL FINDINGS: The 454-pyrosequencing of a normalized cDNA library of a pool C. gallina adult individuals yielded 298,494 raw reads. Different steps of reads assembly and filtering produced 36,422 contigs of high quality, one half of which (18,196 were annotated by similarity. A total of 111 microsatellites and 20,377 putative SNPs were identified. A panel of 13 polymorphic transcript-linked microsatellites was developed and their variability assessed in 12 individuals. Remarkably, a scan to search for contamination sequences of infectious origin indicated the presence of several Vibrionales species reported to be among the most frequent clam pathogen's species. Results reported in this study were included in a dedicated database available at http://compgen.bio.unipd.it/chameleabase. CONCLUSIONS/SIGNIFICANCE: This study represents the first attempt to sequence and de novo annotate the transcriptome of the clam C. gallina. The availability of this transcriptome opens new perspectives in the study of biochemical and physiological role of gene products and their responses to large and small-scale environmental stress in C. gallina, with high throughput experiments such as custom microarray or targeted re-sequencing. Molecular markers, such as the already optimized EST

  12. Deep Sequencing in Microdissected Renal Tubules Identifies Nephron Segment-Specific Transcriptomes.

    Science.gov (United States)

    Lee, Jae Wook; Chou, Chung-Lin; Knepper, Mark A

    2015-11-01

    The function of each renal tubule segment depends on the genes expressed therein. High-throughput methods used for global profiling of gene expression in unique cell types have shown low sensitivity and high false positivity, thereby limiting the usefulness of these methods in transcriptomic research. However, deep sequencing of RNA species (RNA-seq) achieves highly sensitive and quantitative transcriptomic profiling by sequencing RNAs in a massive, parallel manner. Here, we used RNA-seq coupled with classic renal tubule microdissection to comprehensively profile gene expression in each of 14 renal tubule segments from the proximal tubule through the inner medullary collecting duct of rat kidneys. Polyadenylated mRNAs were captured by oligo-dT primers and processed into adapter-ligated cDNA libraries that were sequenced using an Illumina platform. Transcriptomes were identified to a median depth of 8261 genes in microdissected renal tubule samples (105 replicates in total) and glomeruli (5 replicates). Manual microdissection allowed a high degree of sample purity, which was evidenced by the observed distributions of well established cell-specific markers. The main product of this work is an extensive database of gene expression along the nephron provided as a publicly accessible webpage (https://helixweb.nih.gov/ESBL/Database/NephronRNAseq/index.html). The data also provide genome-wide maps of alternative exon usage and polyadenylation sites in the kidney. We illustrate the use of the data by profiling transcription factor expression along the renal tubule and mapping metabolic pathways. Copyright © 2015 by the American Society of Nephrology.

  13. Solexa-Sequencing Based Transcriptome Study of Plaice Skin Phenotype in Rex Rabbits (Oryctolagus cuniculus.

    Directory of Open Access Journals (Sweden)

    Lei Pan

    Full Text Available Fur is an important genetically-determined characteristic of domestic rabbits; rabbit furs are of great economic value. We used the Solexa sequencing technology to assess gene expression in skin tissues from full-sib Rex rabbits of different phenotypes in order to explore the molecular mechanisms associated with fur determination.Transcriptome analysis included de novo assembly, gene function identification, and gene function classification and enrichment. We obtained 74,032,912 and 71,126,891 short reads of 100 nt, which were assembled into 377,618 unique sequences by Trinity strategy (N50=680 nt. Based on BLAST results with known proteins, 50,228 sequences were identified at a cut-off E-value ≥ 10-5. Using Blast to Gene Ontology (GO, Clusters of Orthologous Groups (KOG and Kyoto Encyclopedia of Genes and Genomes (KEGG, we obtained several genes with important protein functions. A total of 308 differentially expressed genes were obtained by transcriptome analysis of plaice and un-plaice phenotype animals; 209 additional differentially expressed genes were not found in any database. These genes included 49 that were only expressed in plaice skin rabbits. The novel genes may play important roles during skin growth and development. In addition, 99 known differentially expressed genes were assigned to PI3K-Akt signaling, focal adhesion, and ECM-receptor interactin, among others. Growth factors play a role in skin growth and development by regulating these signaling pathways. We confirmed the altered expression levels of seven target genes by qRT-PCR. And chosen a key gene for SNP to found the differentially between plaice and un-plaice phenotypes rabbit.The rabbit transcriptome profiling data provide new insights in understanding the molecular mechanisms underlying rabbit skin growth and development.

  14. Comparative transcriptome analysis within the Lolium/Festuca species complex reveals high sequence conservation.

    Science.gov (United States)

    Czaban, Adrian; Sharma, Sapna; Byrne, Stephen L; Spannagl, Manuel; Mayer, Klaus F X; Asp, Torben

    2015-03-28

    The Lolium-Festuca complex incorporates species from the Lolium genera and the broad leaf fescues, both belonging to the subfamily Pooideae. This subfamily also includes wheat, barley, oat and rye, making it extremely important to world agriculture. Species within the Lolium-Festuca complex show very diverse phenotypes, and many of them are related to agronomically important traits. Analysis of sequenced transcriptomes of these non-model species may shed light on the molecular mechanisms underlying this phenotypic diversity. We have generated de novo transcriptome assemblies for four species from the Lolium-Festuca complex, ranging from 52,166 to 72,133 transcripts per assembly. We have also predicted a set of proteins and validated it with a high-confidence protein database from three closely related species (H. vulgare, B. distachyon and O. sativa). We have obtained gene family clusters for the four species using OrthoMCL and analyzed their inferred phylogenetic relationships. Our results indicate that VRN2 is a candidate gene for differentiating vernalization and non-vernalization types in the Lolium-Festuca complex. Grouping of the gene families based on their BLAST identity enabled us to divide ortholog groups into those that are very conserved and those that are more evolutionarily relaxed. The ratio of the non-synonumous to synonymous substitutions enabled us to pinpoint protein sequences evolving in response to positive selection. These proteins may explain some of the differences between the more stress tolerant Festuca, and the less stress tolerant Lolium species. Our data presents a comprehensive transcriptome sequence comparison between species from the Lolium-Festuca complex, with the identification of potential candidate genes underlying some important phenotypical differences within the complex (such as VRN2). The orthologous genes between the species have a very high %id (91,61%) and the majority of gene families were shared for all of them. It is

  15. Transcriptome analysis for Caenorhabditis elegans based on novel expressed sequence tags

    Directory of Open Access Journals (Sweden)

    Moerman Donald G

    2008-07-01

    Full Text Available Abstract Background We have applied a high-throughput pyrosequencing technology for transcriptome profiling of Caenorhabditis elegans in its first larval stage. Using this approach, we have generated a large amount of data for expressed sequence tags, which provides an opportunity for the discovery of putative novel transcripts and alternative splice variants that could be developmentally specific to the first larval stage. This work also demonstrates the successful and efficient application of a next generation sequencing methodology. Results We have generated over 30 million bases of novel expressed sequence tags from first larval stage worms utilizing high-throughput sequencing technology. We have shown that approximately 14% of the newly sequenced expressed sequence tags map completely or partially to genomic regions where there are no annotated genes or splice variants and therefore, imply that these are novel genetic structures. Expressed sequence tags, which map to intergenic (around 1000 and intronic regions (around 580, may represent novel transcribed regions, such as unannotated or unrecognized small protein-coding or non-protein-coding genes or splice variants. Expressed sequence tags, which map across intron-exon boundaries (around 300, indicate possible alternative splice sites, while expressed sequence tags, which map near the ends of known transcripts (around 600, suggest extension of the coding or untranslated regions. We have also discovered that intergenic and intronic expressed sequence tags, which are well conserved across different nematode species, are likely to represent non-coding RNAs. Lastly, we have incorporated available serial analysis of gene expression data generated from first larval stage worms, in order to predict novel transcripts that might be specifically or predominantly expressed in the first larval stage. Conclusion We have demonstrated the use of a high-throughput sequencing methodology to efficiently

  16. Sequencing of first-strand cDNA library reveals full-length transcriptomes.

    Science.gov (United States)

    Agarwal, Saurabh; Macfarlan, Todd S; Sartor, Maureen A; Iwase, Shigeki

    2015-01-21

    Massively parallel strand-specific sequencing of RNA (ssRNA-seq) has emerged as a powerful tool for profiling complex transcriptomes. However, many current methods for ssRNA-seq suffer from the underrepresentation of both the 5' and 3' ends of RNAs, which can be attributed to second-strand cDNA synthesis. The 5' and 3' ends of RNA harbour crucial information for gene regulation; namely, transcription start sites (TSSs) and polyadenylation sites. Here we report a novel ssRNA-seq method that does not involve second-strand cDNA synthesis, as we Directly Ligate sequencing Adaptors to the First-strand cDNA (DLAF). This novel method with fewer enzymatic reactions results in a higher quality of the libraries than the conventional method. Sequencing of DLAF libraries followed by a novel analysis pipeline enables the profiling of both 5' ends and polyadenylation sites at near-base resolution. Therefore, DLAF offers the first genomics tool to obtain the 'full-length' transcriptome with a single library.

  17. Microarrays and high-throughput transcriptomic analysis in species with incomplete availability of genomic sequences.

    Science.gov (United States)

    Pariset, Lorraine; Chillemi, Giovanni; Bongiorni, Silvia; Romano Spica, Vincenzo; Valentini, Alessio

    2009-06-01

    Microarrays produce a measurement of gene expression based on the relative measures of dye intensities that correspond to the amount of target RNA. This technology is fast developing and its application is expanding from Homo sapiens to a wide number of species, where enough information on sequences and annotations exist. Anyway, the number of species for which a dedicated platform exists is not high. The use of heterologous array hybridization, screening for gene expression in one species using an array developed for another one, is still quite frequent, even though cross-species microarray hybridization has raised many arguments. Some methods which are high throughput and do not rely on knowledge of the DNA/RNA sequence exist, namely serial analysis of gene expression (SAGE), Massively Parallel Signature Sequencing (MPSS) and deep sequencing of full transcriptome. Although very powerful, particularly the latter, they are still quite costly and cumbersome methods. In some species where genome sequences are largely unknown, several anonymous sequences are deposited in gene banks as a result of Expressed Sequence Tags (ESTs) sequencing projects. The ESTs databases represent a valuable knowledge that can be exploited with some bioinformatic effort to build species-specific microarrays. We present here a method of high-density in situ synthesized microarrays starting from available EST sequences in, Ovis aries. Our data indicate that the method is very efficient and can be easily extended to other species of which genetic sequences are present in public databases, but neglected so far with advanced devices like microarrays. As a perspective, the approach can be applied also to species of which no sequences are available to date, thanks to high-throughput deep sequencing methods.

  18. Comparative performance of transcriptome assembly methods for non-model organisms.

    Science.gov (United States)

    Huang, Xin; Chen, Xiao-Guang; Armbruster, Peter A

    2016-07-27

    The technological revolution in next-generation sequencing has brought unprecedented opportunities to study any organism of interest at the genomic or transcriptomic level. Transcriptome assembly is a crucial first step for studying the molecular basis of phenotypes of interest using RNA-Sequencing (RNA-Seq). However, the optimal strategy for assembling vast amounts of short RNA-Seq reads remains unresolved, especially for organisms without a sequenced genome. This study compared four transcriptome assembly methods, including a widely used de novo assembler (Trinity), two transcriptome re-assembly strategies utilizing proteomic and genomic resources from closely related species (reference-based re-assembly and TransPS) and a genome-guided assembler (Cufflinks). These four assembly strategies were compared using a comprehensive transcriptomic database of Aedes albopictus, for which a genome sequence has recently been completed. The quality of the various assemblies was assessed by the number of contigs generated, contig length distribution, percent paired-end read mapping, and gene model representation via BLASTX. Our results reveal that de novo assembly generates a similar number of gene models relative to genome-guided assembly with a fragmented reference, but produces the highest level of redundancy and requires the most computational power. Using a closely related reference genome to guide transcriptome assembly can generate biased contig sequences. Increasing the number of reads used in the transcriptome assembly tends to increase the redundancy within the assembly and decrease both median contig length and percent identity between contigs and reference protein sequences. This study provides general guidance for transcriptome assembly of RNA-Seq data from organisms with or without a sequenced genome. The optimal transcriptome assembly strategy will depend upon the subsequent downstream analyses. However, our results emphasize the efficacy of de novo assembly

  19. Prevalence of single nucleotide polymorphism among 27 diverse alfalfa genotypes as assessed by transcriptome sequencing

    Directory of Open Access Journals (Sweden)

    Li Xuehui

    2012-10-01

    Full Text Available Abstract Background Alfalfa, a perennial, outcrossing species, is a widely planted forage legume producing highly nutritious biomass. Currently, improvement of cultivated alfalfa mainly relies on recurrent phenotypic selection. Marker assisted breeding strategies can enhance alfalfa improvement efforts, particularly if many genome-wide markers are available. Transcriptome sequencing enables efficient high-throughput discovery of single nucleotide polymorphism (SNP markers for a complex polyploid species. Result The transcriptomes of 27 alfalfa genotypes, including elite breeding genotypes, parents of mapping populations, and unimproved wild genotypes, were sequenced using an Illumina Genome Analyzer IIx. De novo assembly of quality-filtered 72-bp reads generated 25,183 contigs with a total length of 26.8 Mbp and an average length of 1,065 bp, with an average read depth of 55.9-fold for each genotype. Overall, 21,954 (87.2% of the 25,183 contigs represented 14,878 unique protein accessions. Gene ontology (GO analysis suggested that a broad diversity of genes was represented in the resulting sequences. The realignment of individual reads to the contigs enabled the detection of 872,384 SNPs and 31,760 InDels. High resolution melting (HRM analysis was used to validate 91% of 192 putative SNPs identified by sequencing. Both allelic variants at about 95% of SNP sites identified among five wild, unimproved genotypes are still present in cultivated alfalfa, and all four US breeding programs also contain a high proportion of these SNPs. Thus, little evidence exists among this dataset for loss of significant DNA sequence diversity from either domestication or breeding of alfalfa. Structure analysis indicated that individuals from the subspecies falcata, the diploid subspecies caerulea, and the tetraploid subspecies sativa (cultivated tetraploid alfalfa were clearly separated. Conclusion We used transcriptome sequencing to discover large numbers of SNPs

  20. Development of microsatellite markers by transcriptome sequencing in two species of Amorphophallus (Araceae).

    Science.gov (United States)

    Zheng, Xingfei; Pan, Cheng; Diao, Ying; You, Yongning; Yang, Chaozhu; Hu, Zhongli

    2013-07-19

    Amorphophallus is a genus of perennial plants widely distributed in the tropics or subtropics of West Africa and South Asia. Its corms contain a high level of water-soluble glucomannan; therefore, it has long been used as a medicinal herb and food source. Genetic studies of Amorphophallus have been hindered by a lack of genetic markers. A large number of molecular markers are required for genetic diversity study and improving disease resistance in Amorphophallus. Here, we report large scale of transcriptome sequencing of two species: Amorphophallus konjac and Amorphophallus bulbifer using deep sequencing technology, and microsatellite (SSR) markers were identified based on these transcriptome sequences. cDNAs of A. konjac and A. bulbifer were sequenced using Illumina HiSeq™ 2000 sequencing technology. A total of 135,822 non-redundant unigenes were assembled from about 9.66 gigabases, and 19,596 SSRs were identified in 16,027 non-redundant unigenes. Di-nucleotide SSRs were the most abundant motif (61.6%), followed by tri- (30.3%), tetra- (5.6%), penta- (1.5%), and hexa-nucleotides (1%) repeats. The top di- and tri-nucleotide repeat motifs included AG/CT (45.2%) and AGG/CCT (7.1%), respectively. A total of 10,754 primer pairs were designed for marker development. Of these, 320 primers were synthesized and used for validation of amplification and assessment of polymorphisms in 25 individual plants. The total of 275 primer pairs yielded PCR amplification products, of which 205 were polymorphic. The number of alleles ranged from 2 to 14 and the polymorphism information content valued ranged from 0.10 to 0.90. Genetic diversity analysis was done using 177 highly polymorphic SSR markers. A phenogram based on Jaccard's similarity coefficients was constructed, which showed a distinct cluster of 25 Amorphophallus individuals. A total of 10,754 SSR markers have been identified in Amorphophallus using transcriptome sequencing. One hundred and seventy-seven polymorphic

  1. Efficient extraction of small and large RNAs in bacteria for excellent total RNA sequencing and comprehensive transcriptome analysis.

    Science.gov (United States)

    Heera, Rajandas; Sivachandran, Parimannan; Chinni, Suresh V; Mason, Joanne; Croft, Larry; Ravichandran, Manickam; Yin, Lee Su

    2015-12-08

    Next-generation transcriptome sequencing (RNA-Seq) has become the standard practice for studying gene splicing, mutations and changes in gene expression to obtain valuable, accurate biological conclusions. However, obtaining good sequencing coverage and depth to study these is impeded by the difficulties of obtaining high quality total RNA with minimal genomic DNA contamination. With this in mind, we evaluated the performance of Phenol-free total RNA purification kit (Amresco) in comparison with TRI Reagent (MRC) and RNeasy Mini (Qiagen) for the extraction of total RNA of Pseudomonas aeruginosa which was grown in glucose-supplemented (control) and polyethylene-supplemented (growth-limiting condition) minimal medium. All three extraction methods were coupled with an in-house DNase I treatment before the yield, integrity and size distribution of the purified RNA were assessed. RNA samples extracted with the best extraction kit were then sequenced using the Illumina HiSeq 2000 platform. TRI Reagent gave the lowest yield enriched with small RNAs (sRNAs), while RNeasy gave moderate yield of good quality RNA with trace amounts of sRNAs. The Phenol-free kit, on the other hand, gave the highest yield and the best quality RNA (RIN value of 9.85 ± 0.3) with good amounts of sRNAs. Subsequent bioinformatic analysis of the sequencing data revealed that 5435 coding genes, 452 sRNAs and 7 potential novel intergenic sRNAs were detected, indicating excellent sequencing coverage across RNA size ranges. In addition, detection of low abundance transcripts and consistency of their expression profiles across replicates from the same conditions demonstrated the reproducibility of the RNA extraction technique. Amresco's Phenol-free Total RNA purification kit coupled with DNase I treatment yielded the highest quality RNAs containing good ratios of high and low molecular weight transcripts with minimal genomic DNA. These RNA extracts gave excellent non-biased sequencing coverage useful

  2. High-resolution analysis of the 5'-end transcriptome using a next generation DNA sequencer.

    Directory of Open Access Journals (Sweden)

    Shin-ichi Hashimoto

    Full Text Available Massively parallel, tag-based sequencing systems, such as the SOLiD system, hold the promise of revolutionizing the study of whole genome gene expression due to the number of data points that can be generated in a simple and cost-effective manner. We describe the development of a 5'-end transcriptome workflow for the SOLiD system and demonstrate the advantages in sensitivity and dynamic range offered by this tag-based application over traditional approaches for the study of whole genome gene expression. 5'-end transcriptome analysis was used to study whole genome gene expression within a colon cancer cell line, HT-29, treated with the DNA methyltransferase inhibitor, 5-aza-2'-deoxycytidine (5Aza. More than 20 million 25-base 5'-end tags were obtained from untreated and 5Aza-treated cells and matched to sequences within the human genome. Seventy three percent of the mapped unique tags were associated with RefSeq cDNA sequences, corresponding to approximately 14,000 different protein-coding genes in this single cell type. The level of expression of these genes ranged from 0.02 to 4,704 transcripts per cell. The sensitivity of a single sequence run of the SOLiD platform was 100-1,000 fold greater than that observed from 5'end SAGE data generated from the analysis of 70,000 tags obtained by Sanger sequencing. The high-resolution 5'end gene expression profiling presented in this study will not only provide novel insight into the transcriptional machinery but should also serve as a basis for a better understanding of cell biology.

  3. Sequence protein identification by randomized sequence database and transcriptome mass spectrometry (SPIDER-TMS): from manual to automatic application of a 'de novo sequencing' approach.

    Science.gov (United States)

    Pascale, Raffaella; Grossi, Gerarda; Cruciani, Gabriele; Mecca, Giansalvatore; Santoro, Donatello; Sarli Calace, Renzo; Falabella, Patrizia; Bianco, Giuliana

    Sequence protein identification by a randomized sequence database and transcriptome mass spectrometry software package has been developed at the University of Basilicata in Potenza (Italy) and designed to facilitate the determination of the amino acid sequence of a peptide as well as an unequivocal identification of proteins in a high-throughput manner with enormous advantages of time, economical resource and expertise. The software package is a valid tool for the automation of a de novo sequencing approach, overcoming the main limits and a versatile platform useful in the proteomic field for an unequivocal identification of proteins, starting from tandem mass spectrometry data. The strength of this software is that it is a user-friendly and non-statistical approach, so protein identification can be considered unambiguous.

  4. Population-level transcriptome sequencing of nonmodel organisms Erynnis propertius and Papilio zelicaon

    Directory of Open Access Journals (Sweden)

    Emrich Scott J

    2010-05-01

    Full Text Available Abstract Background Several recent studies have demonstrated the use of Roche 454 sequencing technology for de novo transcriptome analysis. Low error rates and high coverage also allow for effective SNP discovery and genetic diversity estimates. However, genetically diverse datasets, such as those sourced from natural populations, pose challenges for assembly programs and subsequent analysis. Further, estimating the effectiveness of transcript discovery using Roche 454 transcriptome data is still a difficult task. Results Using the Roche 454 FLX Titanium platform, we sequenced and assembled larval transcriptomes for two butterfly species: the Propertius duskywing, Erynnis propertius (Lepidoptera: Hesperiidae and the Anise swallowtail, Papilio zelicaon (Lepidoptera: Papilionidae. The Expressed Sequence Tags (ESTs generated represent a diverse sample drawn from multiple populations, developmental stages, and stress treatments. Despite this diversity, > 95% of the ESTs assembled into long (> 714 bp on average and highly covered (> 9.6× on average contigs. To estimate the effectiveness of transcript discovery, we compared the number of bases in the hit region of unigenes (contigs and singletons to the length of the best match silkworm (Bombyx mori protein--this "ortholog hit ratio" gives a close estimate on the amount of the transcript discovered relative to a model lepidopteran genome. For each species, we tested two assembly programs and two parameter sets; although CAP3 is commonly used for such data, the assemblies produced by Celera Assembler with modified parameters were chosen over those produced by CAP3 based on contig and singleton counts as well as ortholog hit ratio analysis. In the final assemblies, 1,413 E. propertius and 1,940 P. zelicaon unigenes had a ratio > 0.8; 2,866 E. propertius and 4,015 P. zelicaon unigenes had a ratio > 0.5. Conclusions Ultimately, these assemblies and SNP data will be used to generate microarrays for

  5. Population-level transcriptome sequencing of nonmodel organisms Erynnis propertius and Papilio zelicaon

    Science.gov (United States)

    2010-01-01

    Background Several recent studies have demonstrated the use of Roche 454 sequencing technology for de novo transcriptome analysis. Low error rates and high coverage also allow for effective SNP discovery and genetic diversity estimates. However, genetically diverse datasets, such as those sourced from natural populations, pose challenges for assembly programs and subsequent analysis. Further, estimating the effectiveness of transcript discovery using Roche 454 transcriptome data is still a difficult task. Results Using the Roche 454 FLX Titanium platform, we sequenced and assembled larval transcriptomes for two butterfly species: the Propertius duskywing, Erynnis propertius (Lepidoptera: Hesperiidae) and the Anise swallowtail, Papilio zelicaon (Lepidoptera: Papilionidae). The Expressed Sequence Tags (ESTs) generated represent a diverse sample drawn from multiple populations, developmental stages, and stress treatments. Despite this diversity, > 95% of the ESTs assembled into long (> 714 bp on average) and highly covered (> 9.6× on average) contigs. To estimate the effectiveness of transcript discovery, we compared the number of bases in the hit region of unigenes (contigs and singletons) to the length of the best match silkworm (Bombyx mori) protein--this "ortholog hit ratio" gives a close estimate on the amount of the transcript discovered relative to a model lepidopteran genome. For each species, we tested two assembly programs and two parameter sets; although CAP3 is commonly used for such data, the assemblies produced by Celera Assembler with modified parameters were chosen over those produced by CAP3 based on contig and singleton counts as well as ortholog hit ratio analysis. In the final assemblies, 1,413 E. propertius and 1,940 P. zelicaon unigenes had a ratio > 0.8; 2,866 E. propertius and 4,015 P. zelicaon unigenes had a ratio > 0.5. Conclusions Ultimately, these assemblies and SNP data will be used to generate microarrays for ecoinformatics examining

  6. Analyzing AbrB-Knockout Effects through Genome and Transcriptome Sequencing of Bacillus licheniformis DW2

    Science.gov (United States)

    Shu, Cheng-Cheng; Wang, Dong; Guo, Jing; Song, Jia-Ming; Chen, Shou-Wen; Chen, Ling-Ling; Gao, Jun-Xiang

    2018-01-01

    As an industrial bacterium, Bacillus licheniformis DW2 produces bacitracin which is an important antibiotic for many pathogenic microorganisms. Our previous study showed AbrB-knockout could significantly increase the production of bacitracin. Accordingly, it was meaningful to understand its genome features, expression differences between wild and AbrB-knockout (ΔAbrB) strains, and the regulation of bacitracin biosynthesis. Here, we sequenced, de novo assembled and annotated its genome, and also sequenced the transcriptomes in three growth phases. The genome of DW2 contained a DNA molecule of 4,468,952 bp with 45.93% GC content and 4,717 protein coding genes. The transcriptome reads were mapped to the assembled genome, and obtained 4,102∼4,536 expressed genes from different samples. We investigated transcription changes in B. licheniformis DW2 and showed that ΔAbrB caused hundreds of genes up-regulation and down-regulation in different growth phases. We identified a complete bacitracin synthetase gene cluster, including the location and length of bacABC, bcrABC, and bacT, as well as their arrangement. The gene cluster bcrABC were significantly up-regulated in ΔAbrB strain, which supported the hypothesis in previous study of bcrABC transporting bacitracin out of the cell to avoid self-intoxication, and was consistent with the previous experimental result that ΔAbrB could yield more bacitracin. This study provided a high quality reference genome for B. licheniformis DW2, and the transcriptome data depicted global alterations across two strains and three phases offered an understanding of AbrB regulation and bacitracin biosynthesis through gene expression. PMID:29599755

  7. Small RNA and transcriptome deep sequencing proffers insight into floral gene regulation in Rosa cultivars.

    Science.gov (United States)

    Kim, Jungeun; Park, June Hyun; Lim, Chan Ju; Lim, Jae Yun; Ryu, Jee-Youn; Lee, Bong-Woo; Choi, Jae-Pil; Kim, Woong Bom; Lee, Ha Yeon; Choi, Yourim; Kim, Donghyun; Hur, Cheol-Goo; Kim, Sukweon; Noh, Yoo-Sun; Shin, Chanseok; Kwon, Suk-Yoon

    2012-11-21

    Roses (Rosa sp.), which belong to the family Rosaceae, are the most economically important ornamental plants--making up 30% of the floriculture market. However, given high demand for roses, rose breeding programs are limited in molecular resources which can greatly enhance and speed breeding efforts. A better understanding of important genes that contribute to important floral development and desired phenotypes will lead to improved rose cultivars. For this study, we analyzed rose miRNAs and the rose flower transcriptome in order to generate a database to expound upon current knowledge regarding regulation of important floral characteristics. A rose genetic database will enable comprehensive analysis of gene expression and regulation via miRNA among different Rosa cultivars. We produced more than 0.5 million reads from expressed sequences, totalling more than 110 million bp. From these, we generated 35,657, 31,434, 34,725, and 39,722 flower unigenes from Rosa hybrid: 'Vital', 'Maroussia', and 'Sympathy' and Rosa rugosa Thunb., respectively. The unigenes were assigned functional annotations, domains, metabolic pathways, Gene Ontology (GO) terms, Plant Ontology (PO) terms, and MIPS Functional Catalogue (FunCat) terms. Rose flower transcripts were compared with genes from whole genome sequences of Rosaceae members (apple, strawberry, and peach) and grape. We also produced approximately 40 million small RNA reads from flower tissue for Rosa, representing 267 unique miRNA tags. Among identified miRNAs, 25 of them were novel and 242 of them were conserved miRNAs. Statistical analyses of miRNA profiles revealed both shared and species-specific miRNAs, which presumably effect flower development and phenotypes. In this study, we constructed a Rose miRNA and transcriptome database, and we analyzed the miRNAs and transcriptome generated from the flower tissues of four Rosa cultivars. The database provides a comprehensive genetic resource which can be used to better understand

  8. The contribution of 700,000 ORF sequence tags to the definition of the human transcriptome

    Science.gov (United States)

    Camargo, Anamaria A.; Samaia, Helena P. B.; Dias-Neto, Emmanuel; Simão, Daniel F.; Migotto, Italo A.; Briones, Marcelo R. S.; Costa, Fernando F.; Aparecida Nagai, Maria; Verjovski-Almeida, Sergio; Zago, Marco A.; Andrade, Luis Eduardo C.; Carrer, Helaine; El-Dorry, Hamza F. A.; Espreafico, Enilza M.; Habr-Gama, Angelita; Giannella-Neto, Daniel; Goldman, Gustavo H.; Gruber, Arthur; Hackel, Christine; Kimura, Edna T.; Maciel, Rui M. B.; Marie, Suely K. N.; Martins, Elizabeth A. L.; Nóbrega, Marina P.; Paçó-Larson, Maria Luisa; Pardini, Maria Inês M. C.; Pereira, Gonçalo G.; Pesquero, João Bosco; Rodrigues, Vanderlei; Rogatto, Silvia R.; da Silva, Ismael D. C. G.; Sogayar, Mari C.; Sonati, Maria de Fátima; Tajara, Eloiza H.; Valentini, Sandro R.; Alberto, Fernando L.; Amaral, Maria Elisabete J.; Aneas, Ivy; Arnaldi, Liliane A. T.; de Assis, Angela M.; Bengtson, Mário Henrique; Bergamo, Nadia Aparecida; Bombonato, Vanessa; de Camargo, Maria E. R.; Canevari, Renata A.; Carraro, Dirce M.; Cerutti, Janete M.; Corrêa, Maria Lucia C.; Corrêa, Rosana F. R.; Costa, Maria Cristina R.; Curcio, Cyntia; Hokama, Paula O. M.; Ferreira, Ari J. S.; Furuzawa, Gilberto K.; Gushiken, Tsieko; Ho, Paulo L.; Kimura, Elza; Krieger, José E.; Leite, Luciana C. C.; Majumder, Paromita; Marins, Mozart; Marques, Everaldo R.; Melo, Analy S. A.; Melo, Monica; Mestriner, Carlos Alberto; Miracca, Elisabete C.; Miranda, Daniela C.; Nascimento, Ana Lucia T. O.; Nóbrega, Francisco G.; Ojopi, Élida P. B.; Pandolfi, José Rodrigo C.; Pessoa, Luciana G.; Prevedel, Aline C.; Rahal, Paula; Rainho, Claudia A.; Reis, Eduardo M. R.; Ribeiro, Marcelo L.; da Rós, Nancy; de Sá, Renata G.; Sales, Magaly M.; Sant'anna, Simone Cristina; dos Santos, Mariana L.; da Silva, Aline M.; da Silva, Neusa P.; Silva, Wilson A.; da Silveira, Rosana A.; Sousa, Josane F.; Stecconi, Daniella; Tsukumo, Fernando; Valente, Valéria; Soares, Fernando; Moreira, Eloisa S.; Nunes, Diana N.; Correa, Ricardo G.; Zalcberg, Heloisa; Carvalho, Alex F.; Reis, Luis F. L.; Brentani, Ricardo R.; Simpson, Andrew J. G.; de Souza, Sandro J.

    2001-01-01

    Open reading frame expressed sequences tags (ORESTES) differ from conventional ESTs by providing sequence data from the central protein coding portion of transcripts. We generated a total of 696,745 ORESTES sequences from 24 human tissues and used a subset of the data that correspond to a set of 15,095 full-length mRNAs as a means of assessing the efficiency of the strategy and its potential contribution to the definition of the human transcriptome. We estimate that ORESTES sampled over 80% of all highly and moderately expressed, and between 40% and 50% of rarely expressed, human genes. In our most thoroughly sequenced tissue, the breast, the 130,000 ORESTES generated are derived from transcripts from an estimated 70% of all genes expressed in that tissue, with an equally efficient representation of both highly and poorly expressed genes. In this respect, we find that the capacity of the ORESTES strategy both for gene discovery and shotgun transcript sequence generation significantly exceeds that of conventional ESTs. The distribution of ORESTES is such that many human transcripts are now represented by a scaffold of partial sequences distributed along the length of each gene product. The experimental joining of the scaffold components, by reverse transcription–PCR, represents a direct route to transcript finishing that may represent a useful alternative to full-length cDNA cloning. PMID:11593022

  9. A Joint Bayesian Model for Integrating Microarray and RNA Sequencing Transcriptomic Data.

    Science.gov (United States)

    Ma, Tianzhou; Liang, Faming; Oesterreich, Steffi; Tseng, George C

    2017-07-01

    As the sequencing cost continued to drop in the past decade, RNA sequencing (RNA-seq) has replaced microarray to become the standard high-throughput experimental tool to analyze transcriptomic profile. As more and more datasets are generated and accumulated in the public domain, meta-analysis to combine multiple transcriptomic studies to increase statistical power has received increasing popularity. In this article, we propose a Bayesian hierarchical model to jointly integrate microarray and RNA-seq studies. Since systematic fold change differences across RNA-seq and microarray for detecting differentially expressed genes have been previously reported, we replicated this finding in several real datasets and showed that incorporation of a normalization procedure to account for the bias improves the detection accuracy and power. We compared our method with the popular two-stage Fisher's method using simulations and two real applications in a histological subtype (invasive lobular carcinoma) of breast cancer comparing PR+ versus PR- and early-stage versus late-stage patients. The result showed improved detection power and more significant and interpretable pathways enriched in the detected biomarkers from the proposed Bayesian model.

  10. Sequencing and analysis of the gastrula transcriptome of the brittle star Ophiocoma wendtii

    Directory of Open Access Journals (Sweden)

    Vaughn Roy

    2012-09-01

    Full Text Available Abstract Background The gastrula stage represents the point in development at which the three primary germ layers diverge. At this point the gene regulatory networks that specify the germ layers are established and the genes that define the differentiated states of the tissues have begun to be activated. These networks have been well-characterized in sea urchins, but not in other echinoderms. Embryos of the brittle star Ophiocoma wendtii share a number of developmental features with sea urchin embryos, including the ingression of mesenchyme cells that give rise to an embryonic skeleton. Notable differences are that no micromeres are formed during cleavage divisions and no pigment cells are formed during development to the pluteus larval stage. More subtle changes in timing of developmental events also occur. To explore the molecular basis for the similarities and differences between these two echinoderms, we have sequenced and characterized the gastrula transcriptome of O. wendtii. Methods Development of Ophiocoma wendtii embryos was characterized and RNA was isolated from the gastrula stage. A transcriptome data base was generated from this RNA and was analyzed using a variety of methods to identify transcripts expressed and to compare those transcripts to those expressed at the gastrula stage in other organisms. Results Using existing databases, we identified brittle star transcripts that correspond to 3,385 genes, including 1,863 genes shared with the sea urchin Strongylocentrotus purpuratus gastrula transcriptome. We characterized the functional classes of genes present in the transcriptome and compared them to those found in this sea urchin. We then examined those members of the germ-layer specific gene regulatory networks (GRNs of S. purpuratus that are expressed in the O. wendtii gastrula. Our results indicate that there is a shared ‘genetic toolkit’ central to the echinoderm gastrula, a key stage in embryonic development, though

  11. Detection bias in microarray and sequencing transcriptomic analysis identified by housekeeping genes.

    Science.gov (United States)

    Zhang, Yijuan; Akintola, Oluwafemi S; Liu, Ken J A; Sun, Bingyun

    2016-03-01

    This work includes the original data used to discover the gene ontology bias in transcriptomic analysis conducted by microarray and high throughput sequencing (Zhang et al., 2015) [1]. In the analysis, housekeeping genes were used to examine the differential detection ability by microarray and sequencing because these genes are probably the most reliably detected. The genes included here were compiled from 15 human housekeeping gene studies. The provided tables here comprise of detailed chromosomal location, detection breadth, normalized expression level, exon count, total exon length, and total intron length of each concerned gene and their related transcripts. We hope this information can help researchers better understand the differences in gene ontology-bias we discussed (Zhang et al., 2015) [1] and can encourage further improvement on these two technology platforms.

  12. Using the transcriptome to annotate the genome revisited: application of massively parallel signature sequencing (MPSS).

    Science.gov (United States)

    Shah, Trushar; de Villiers, Etienne; Nene, Vishvanath; Hass, Brian; Taracha, Evans; Gardner, Malcolm J; Sansom, Clare; Pelle, Roger; Bishop, Richard

    2006-01-17

    Transcriptome analysis can provide useful data for refining genome sequence annotation. Application of massively parallel signature sequencing (MPSS) revealed reproducible transcription, in multiple MPSS cycles, from 73% of computationally predicted genes in the Theileria parva schizont lifecycle stage. Signatures spanning consecutive exons confirmed 142 predicted introns. MPSS identified 83 putative genes, >100 codons overlooked by annotation software, and 139 potentially incorrect gene models (with either truncated ORFs or overlooked exons) by interfacing signature locations with stop codon maps. Twenty representative models were confirmed as likely to be incorrect using reverse transcription PCR amplification from independent schizont cDNA preparations. More than 50% of the 60 putative single copy genes in T. parva that were absent from the genome of the closely related T. annulata had MPSS signatures. This study illustrates the utility of MPSS for improving annotation of small, gene-rich microbial eukaryotic genomes.

  13. Global Transcriptome Analysis of the Tentacle of the Jellyfish Cyanea capillata Using Deep Sequencing and Expressed Sequence Tags: Insight into the Toxin- and Degenerative Disease-Related Transcripts.

    Directory of Open Access Journals (Sweden)

    Guoyan Liu

    Full Text Available Jellyfish contain diverse toxins and other bioactive components. However, large-scale identification of novel toxins and bioactive components from jellyfish has been hampered by the low efficiency of traditional isolation and purification methods.We performed de novo transcriptome sequencing of the tentacle tissue of the jellyfish Cyanea capillata. A total of 51,304,108 reads were obtained and assembled into 50,536 unigenes. Of these, 21,357 unigenes had homologues in public databases, but the remaining unigenes had no significant matches due to the limited sequence information available and species-specific novel sequences. Functional annotation of the unigenes also revealed general gene expression profile characteristics in the tentacle of C. capillata. A primary goal of this study was to identify putative toxin transcripts. As expected, we screened many transcripts encoding proteins similar to several well-known toxin families including phospholipases, metalloproteases, serine proteases and serine protease inhibitors. In addition, some transcripts also resembled molecules with potential toxic activities, including cnidarian CfTX-like toxins with hemolytic activity, plancitoxin-1, venom toxin-like peptide-6, histamine-releasing factor, neprilysin, dipeptidyl peptidase 4, vascular endothelial growth factor A, angiotensin-converting enzyme-like and endothelin-converting enzyme 1-like proteins. Most of these molecules have not been previously reported in jellyfish. Interestingly, we also characterized a number of transcripts with similarities to proteins relevant to several degenerative diseases, including Huntington's, Alzheimer's and Parkinson's diseases. This is the first description of degenerative disease-associated genes in jellyfish.We obtained a well-categorized and annotated transcriptome of C. capillata tentacle that will be an important and valuable resource for further understanding of jellyfish at the molecular level and information

  14. Global Transcriptome Analysis of the Tentacle of the Jellyfish Cyanea capillata Using Deep Sequencing and Expressed Sequence Tags: Insight into the Toxin- and Degenerative Disease-Related Transcripts.

    Science.gov (United States)

    Liu, Guoyan; Zhou, Yonghong; Liu, Dan; Wang, Qianqian; Ruan, Zengliang; He, Qian; Zhang, Liming

    2015-01-01

    Jellyfish contain diverse toxins and other bioactive components. However, large-scale identification of novel toxins and bioactive components from jellyfish has been hampered by the low efficiency of traditional isolation and purification methods. We performed de novo transcriptome sequencing of the tentacle tissue of the jellyfish Cyanea capillata. A total of 51,304,108 reads were obtained and assembled into 50,536 unigenes. Of these, 21,357 unigenes had homologues in public databases, but the remaining unigenes had no significant matches due to the limited sequence information available and species-specific novel sequences. Functional annotation of the unigenes also revealed general gene expression profile characteristics in the tentacle of C. capillata. A primary goal of this study was to identify putative toxin transcripts. As expected, we screened many transcripts encoding proteins similar to several well-known toxin families including phospholipases, metalloproteases, serine proteases and serine protease inhibitors. In addition, some transcripts also resembled molecules with potential toxic activities, including cnidarian CfTX-like toxins with hemolytic activity, plancitoxin-1, venom toxin-like peptide-6, histamine-releasing factor, neprilysin, dipeptidyl peptidase 4, vascular endothelial growth factor A, angiotensin-converting enzyme-like and endothelin-converting enzyme 1-like proteins. Most of these molecules have not been previously reported in jellyfish. Interestingly, we also characterized a number of transcripts with similarities to proteins relevant to several degenerative diseases, including Huntington's, Alzheimer's and Parkinson's diseases. This is the first description of degenerative disease-associated genes in jellyfish. We obtained a well-categorized and annotated transcriptome of C. capillata tentacle that will be an important and valuable resource for further understanding of jellyfish at the molecular level and information on the underlying

  15. Gene discovery in the threatened elkhorn coral: 454 sequencing of the Acropora palmata transcriptome.

    Directory of Open Access Journals (Sweden)

    Nicholas R Polato

    Full Text Available BACKGROUND: Cnidarians, including corals and anemones, offer unique insights into metazoan evolution because they harbor genetic similarities with vertebrates beyond that found in model invertebrates and retain genes known only from non-metazoans. Cataloging genes expressed in Acropora palmata, a foundation-species of reefs in the Caribbean and western Atlantic, will advance our understanding of the genetic basis of ecologically important traits in corals and comes at a time when sequencing efforts in other cnidarians allow for multi-species comparisons. RESULTS: A cDNA library from a sample enriched for symbiont free larval tissue was sequenced on the 454 GS-FLX platform. Over 960,000 reads were obtained and assembled into 42,630 contigs. Annotation data was acquired for 57% of the assembled sequences. Analysis of the assembled sequences indicated that 83-100% of all A. palmata transcripts were tagged, and provided a rough estimate of the total number genes expressed in our samples (~18,000-20,000. The coral annotation data contained many of the same molecular components as in the Bilateria, particularly in pathways associated with oxidative stress and DNA damage repair, and provided evidence that homologs of p53, a key player in DNA repair pathways, has experienced selection along the branch separating Cnidaria and Bilateria. Transcriptome wide screens of paralog groups and transition/transversion ratios highlighted genes including: green fluorescent proteins, carbonic anhydrase, and oxidative stress proteins; and functional groups involved in protein and nucleic acid metabolism, and the formation of structural molecules. These results provide a starting point for study of adaptive evolution in corals. CONCLUSIONS: Currently available transcriptome data now make comparative studies of the mechanisms underlying coral's evolutionary success possible. Here we identified candidate genes that enable corals to maintain genomic integrity despite

  16. Gene discovery in the threatened elkhorn coral: 454 sequencing of the Acropora palmata transcriptome.

    Science.gov (United States)

    Polato, Nicholas R; Vera, J Cristobal; Baums, Iliana B

    2011-01-01

    Cnidarians, including corals and anemones, offer unique insights into metazoan evolution because they harbor genetic similarities with vertebrates beyond that found in model invertebrates and retain genes known only from non-metazoans. Cataloging genes expressed in Acropora palmata, a foundation-species of reefs in the Caribbean and western Atlantic, will advance our understanding of the genetic basis of ecologically important traits in corals and comes at a time when sequencing efforts in other cnidarians allow for multi-species comparisons. A cDNA library from a sample enriched for symbiont free larval tissue was sequenced on the 454 GS-FLX platform. Over 960,000 reads were obtained and assembled into 42,630 contigs. Annotation data was acquired for 57% of the assembled sequences. Analysis of the assembled sequences indicated that 83-100% of all A. palmata transcripts were tagged, and provided a rough estimate of the total number genes expressed in our samples (~18,000-20,000). The coral annotation data contained many of the same molecular components as in the Bilateria, particularly in pathways associated with oxidative stress and DNA damage repair, and provided evidence that homologs of p53, a key player in DNA repair pathways, has experienced selection along the branch separating Cnidaria and Bilateria. Transcriptome wide screens of paralog groups and transition/transversion ratios highlighted genes including: green fluorescent proteins, carbonic anhydrase, and oxidative stress proteins; and functional groups involved in protein and nucleic acid metabolism, and the formation of structural molecules. These results provide a starting point for study of adaptive evolution in corals. Currently available transcriptome data now make comparative studies of the mechanisms underlying coral's evolutionary success possible. Here we identified candidate genes that enable corals to maintain genomic integrity despite considerable exposure to genotoxic stress over long life

  17. The Porcelain Crab Transcriptome and PCAD, the Porcelain Crab Microarray and Sequence Database

    Energy Technology Data Exchange (ETDEWEB)

    Tagmount, Abderrahmane; Wang, Mei; Lindquist, Erika; Tanaka, Yoshihiro; Teranishi, Kristen S.; Sunagawa, Shinichi; Wong, Mike; Stillman, Jonathon H.

    2010-01-27

    Background: With the emergence of a completed genome sequence of the freshwater crustacean Daphnia pulex, construction of genomic-scale sequence databases for additional crustacean sequences are important for comparative genomics and annotation. Porcelain crabs, genus Petrolisthes, have been powerful crustacean models for environmental and evolutionary physiology with respect to thermal adaptation and understanding responses of marine organisms to climate change. Here, we present a large-scale EST sequencing and cDNA microarray database project for the porcelain crab Petrolisthes cinctipes. Methodology/Principal Findings: A set of ~;;30K unique sequences (UniSeqs) representing ~;;19K clusters were generated from ~;;98K high quality ESTs from a set of tissue specific non-normalized and mixed-tissue normalized cDNA libraries from the porcelain crab Petrolisthes cinctipes. Homology for each UniSeq was assessed using BLAST, InterProScan, GO and KEGG database searches. Approximately 66percent of the UniSeqs had homology in at least one of the databases. All EST and UniSeq sequences along with annotation results and coordinated cDNA microarray datasets have been made publicly accessible at the Porcelain Crab Array Database (PCAD), a feature-enriched version of the Stanford and Longhorn Array Databases.Conclusions/Significance: The EST project presented here represents the third largest sequencing effort for any crustacean, and the largest effort for any crab species. Our assembly and clustering results suggest that our porcelain crab EST data set is equally diverse to the much larger EST set generated in the Daphnia pulex genome sequencing project, and thus will be an important resource to the Daphnia research community. Our homology results support the pancrustacea hypothesis and suggest that Malacostraca may be ancestral to Branchiopoda and Hexapoda. Our results also suggest that our cDNA microarrays cover as much of the transcriptome as can reasonably be captured in

  18. Gene expression profiling of oxidative stress response of C. elegans aging defective AMPK mutants using massively parallel transcriptome sequencing

    Directory of Open Access Journals (Sweden)

    Baillie David L

    2011-02-01

    Full Text Available Abstract Background A strong association between stress resistance and longevity in multicellular organisms has been established as many mutations that extend lifespan also show increased resistance to stress. AAK-2, the C. elegans homolog of an alpha subunit of AMP-activated protein kinase (AMPK is an intracellular fuel sensor that regulates cellular energy homeostasis and functions in stress resistance and lifespan extension. Findings Here, we investigated global transcriptional responses of aak-2 mutants to oxidative stress and in turn identified potential downstream targets of AAK-2 involved in stress resistance in C. elegans. We employed massively parallel Illumina sequencing technology and performed comprehensive comparative transcriptome analysis. Specifically, we compared the transcriptomes of aak-2 and wild type animals under normal conditions and conditions of induced oxidative stress. This research has presented a snapshot of genome-wide transcriptional activities that take place in C. elegans in response to oxidative stress both in the presence and absence of AAK-2. Conclusions The analysis presented in this study has enabled us to identify potential genes involved in stress resistance that may be either directly or indirectly under the control of AAK-2. Furthermore, we have extended our current knowledge of general defense responses of C. elegans against oxidative stress supporting the function for AAK-2 in inhibition of biosynthetic processes, especially lipid synthesis, under oxidative stress and transcriptional regulation of genes involved in reproductive processes.

  19. Global Transcriptome Sequencing Reveals Molecular Profiles of Summer Diapause Induction Stage of Onion Maggot, Delia antiqua (Diptera: Anthomyiidae

    Directory of Open Access Journals (Sweden)

    Shuang Ren

    2018-01-01

    Full Text Available The onion maggot, Delia antiqua, is a worldwide subterranean pest and can enter diapause during the summer and winter seasons. The molecular regulation of the ontogenesis transition remains largely unknown. Here we used high-throughput RNA sequencing to identify candidate genes and processes linked to summer diapause (SD induction by comparing the transcriptome differences between the most sensitive larval developmental stage of SD and nondiapause (ND. Nine pairwise comparisons were performed, and significantly differentially regulated transcripts were identified. Several functional terms related to lipid, carbohydrate, and energy metabolism, environmental adaption, immune response, and aging were enriched during the most sensitive SD induction period. A subset of genes, including circadian clock genes, were expressed differentially under diapause induction conditions, and there was much more variation in the most sensitive period of ND- than SD-destined larvae. These expression variations probably resulted in a deep restructuring of metabolic pathways. Potential regulatory elements of SD induction including genes related to lipid, carbohydrate, energy metabolism, and environmental adaption. Collectively, our results suggest the circadian clock is one of the key drivers for integrating environmental signals into the SD induction. Our transcriptome analysis provides insight into the fundamental role of the circadian clock in SD induction in this important model insect species, and contributes to the in-depth elucidation of the molecular regulation mechanism of insect diapause induction.

  20. RNA Sequencing Analysis Reveals Transcriptomic Variations in Tobacco (Nicotiana tabacum Leaves Affected by Climate, Soil, and Tillage Factors

    Directory of Open Access Journals (Sweden)

    Bo Lei

    2014-04-01

    Full Text Available The growth and development of plants are sensitive to their surroundings. Although numerous studies have analyzed plant transcriptomic variation, few have quantified the effect of combinations of factors or identified factor-specific effects. In this study, we performed RNA sequencing (RNA-seq analysis on tobacco leaves derived from 10 treatment combinations of three groups of ecological factors, i.e., climate factors (CFs, soil factors (SFs, and tillage factors (TFs. We detected 4980, 2916, and 1605 differentially expressed genes (DEGs that were affected by CFs, SFs, and TFs, which included 2703, 768, and 507 specific and 703 common DEGs (simultaneously regulated by CFs, SFs, and TFs, respectively. GO and KEGG enrichment analyses showed that genes involved in abiotic stress responses and secondary metabolic pathways were overrepresented in the common and CF-specific DEGs. In addition, we noted enrichment in CF-specific DEGs related to the circadian rhythm, SF-specific DEGs involved in mineral nutrient absorption and transport, and SF- and TF-specific DEGs associated with photosynthesis. Based on these results, we propose a model that explains how plants adapt to various ecological factors at the transcriptomic level. Additionally, the identified DEGs lay the foundation for future investigations of stress resistance, circadian rhythm and photosynthesis in tobacco.

  1. Fast skeletal muscle transcriptome of the Gilthead sea bream (Sparus aurata determined by next generation sequencing

    Directory of Open Access Journals (Sweden)

    Garcia de la serrana Daniel

    2012-05-01

    Full Text Available Abstract Background The gilthead sea bream (Sparus aurata L. occurs around the Mediterranean and along Eastern Atlantic coasts from Great Britain to Senegal. It is tolerant of a wide range of temperatures and salinities and is often found in brackish coastal lagoons and estuarine areas, particularly early in its life cycle. Gilthead sea bream are extensively cultivated in the Mediterranean with an annual production of 125,000 metric tonnes. Here we present a de novo assembly of the fast skeletal muscle transcriptome of gilthead sea bream using 454 reads and identify gene paralogues, splice variants and microsatellite repeats. An annotated transcriptome of the skeletal muscle will facilitate understanding of the genetic and molecular basis of traits linked to production in this economically important species. Results Around 2.7 million reads of mRNA sequence data were generated from the fast myotomal of adult fish (~2 kg and juvenile fish (~0.09 kg that had been either fed to satiation, fasted for 3-5d or transferred to low (11°C or high (33°C temperatures for 3-5d. Newbler v2.5 assembly resulted in 43,461 isotigs >100 bp. The number of sequences annotated by searching protein and gene ontology databases was 10,465. The average coverage of the annotated isotigs was x40 containing 5655 unique gene IDs and 785 full-length cDNAs coding for proteins containing 58–1536 amino acids. The v2.5 assembly was found to be of good quality based on validation using 200 full-length cDNAs from GenBank. Annotated isotigs from the reference transcriptome were attributable to 344 KEGG pathway maps. We identified 26 gene paralogues (20 of them teleost-specific and 43 splice variants, of which 12 had functional domains missing that were likely to affect their biological function. Many key transcription factors, signaling molecules and structural proteins necessary for myogenesis and muscle growth have been identified. Physiological status affected the

  2. A complete sequence and transcriptomic analyses of date palm (Phoenix dactylifera L.) mitochondrial genome.

    Science.gov (United States)

    Fang, Yongjun; Wu, Hao; Zhang, Tongwu; Yang, Meng; Yin, Yuxin; Pan, Linlin; Yu, Xiaoguang; Zhang, Xiaowei; Hu, Songnian; Al-Mssallem, Ibrahim S; Yu, Jun

    2012-01-01

    Based on next-generation sequencing data, we assembled the mitochondrial (mt) genome of date palm (Phoenix dactylifera L.) into a circular molecule of 715,001 bp in length. The mt genome of P. dactylifera encodes 38 proteins, 30 tRNAs, and 3 ribosomal RNAs, which constitute a gene content of 6.5% (46,770 bp) over the full length. The rest, 93.5% of the genome sequence, is comprised of cp (chloroplast)-derived (10.3% with respect to the whole genome length) and non-coding sequences. In the non-coding regions, there are 0.33% tandem and 2.3% long repeats. Our transcriptomic data from eight tissues (root, seed, bud, fruit, green leaf, yellow leaf, female flower, and male flower) showed higher gene expression levels in male flower, root, bud, and female flower, as compared to four other tissues. We identified 120 potential SNPs among three date palm cultivars (Khalas, Fahal, and Sukry), and successfully found seven SNPs in the coding sequences. A phylogenetic analysis, based on 22 conserved genes of 15 representative plant mitochondria, showed that P. dactylifera positions at the root of all sequenced monocot mt genomes. In addition, consistent with previous discoveries, there are three co-transcribed gene clusters-18S-5S rRNA, rps3-rpl16 and nad3-rps12-in P. dactylifera, which are highly conserved among all known mitochondrial genomes of angiosperms.

  3. A complete sequence and transcriptomic analyses of date palm (Phoenix dactylifera L. mitochondrial genome.

    Directory of Open Access Journals (Sweden)

    Yongjun Fang

    Full Text Available Based on next-generation sequencing data, we assembled the mitochondrial (mt genome of date palm (Phoenix dactylifera L. into a circular molecule of 715,001 bp in length. The mt genome of P. dactylifera encodes 38 proteins, 30 tRNAs, and 3 ribosomal RNAs, which constitute a gene content of 6.5% (46,770 bp over the full length. The rest, 93.5% of the genome sequence, is comprised of cp (chloroplast-derived (10.3% with respect to the whole genome length and non-coding sequences. In the non-coding regions, there are 0.33% tandem and 2.3% long repeats. Our transcriptomic data from eight tissues (root, seed, bud, fruit, green leaf, yellow leaf, female flower, and male flower showed higher gene expression levels in male flower, root, bud, and female flower, as compared to four other tissues. We identified 120 potential SNPs among three date palm cultivars (Khalas, Fahal, and Sukry, and successfully found seven SNPs in the coding sequences. A phylogenetic analysis, based on 22 conserved genes of 15 representative plant mitochondria, showed that P. dactylifera positions at the root of all sequenced monocot mt genomes. In addition, consistent with previous discoveries, there are three co-transcribed gene clusters-18S-5S rRNA, rps3-rpl16 and nad3-rps12-in P. dactylifera, which are highly conserved among all known mitochondrial genomes of angiosperms.

  4. Comprehensive transcriptome assembly of Chickpea (Cicer arietinum L.) using sanger and next generation sequencing platforms: development and applications.

    Science.gov (United States)

    Kudapa, Himabindu; Azam, Sarwar; Sharpe, Andrew G; Taran, Bunyamin; Li, Rong; Deonovic, Benjamin; Cameron, Connor; Farmer, Andrew D; Cannon, Steven B; Varshney, Rajeev K

    2014-01-01

    A comprehensive transcriptome assembly of chickpea has been developed using 134.95 million Illumina single-end reads, 7.12 million single-end FLX/454 reads and 139,214 Sanger expressed sequence tags (ESTs) from >17 genotypes. This hybrid transcriptome assembly, referred to as Cicer arietinumTranscriptome Assembly version 2 (CaTA v2, available at http://data.comparative-legumes.org/transcriptomes/cicar/lista_cicar-201201), comprising 46,369 transcript assembly contigs (TACs) has an N50 length of 1,726 bp and a maximum contig size of 15,644 bp. Putative functions were determined for 32,869 (70.8%) of the TACs and gene ontology assignments were determined for 21,471 (46.3%). The new transcriptome assembly was compared with the previously available chickpea transcriptome assemblies as well as to the chickpea genome. Comparative analysis of CaTA v2 against transcriptomes of three legumes - Medicago, soybean and common bean, resulted in 27,771 TACs common to all three legumes indicating strong conservation of genes across legumes. CaTA v2 was also used for identification of simple sequence repeats (SSRs) and intron spanning regions (ISRs) for developing molecular markers. ISRs were identified by aligning TACs to the Medicago genome, and their putative mapping positions at chromosomal level were identified using transcript map of chickpea. Primer pairs were designed for 4,990 ISRs, each representing a single contig for which predicted positions are inferred and distributed across eight linkage groups. A subset of randomly selected ISRs representing all eight chickpea linkage groups were validated on five chickpea genotypes and showed 20% polymorphism with average polymorphic information content (PIC) of 0.27. In summary, the hybrid transcriptome assembly developed and novel markers identified can be used for a variety of applications such as gene discovery, marker-trait association, diversity analysis etc., to advance genetics research and breeding applications in

  5. An abundance of ubiquitously expressed genes revealed by tissue transcriptome sequence data.

    Directory of Open Access Journals (Sweden)

    Daniel Ramsköld

    2009-12-01

    Full Text Available The parts of the genome transcribed by a cell or tissue reflect the biological processes and functions it carries out. We characterized the features of mammalian tissue transcriptomes at the gene level through analysis of RNA deep sequencing (RNA-Seq data across human and mouse tissues and cell lines. We observed that roughly 8,000 protein-coding genes were ubiquitously expressed, contributing to around 75% of all mRNAs by message copy number in most tissues. These mRNAs encoded proteins that were often intracellular, and tended to be involved in metabolism, transcription, RNA processing or translation. In contrast, genes for secreted or plasma membrane proteins were generally expressed in only a subset of tissues. The distribution of expression levels was broad but fairly continuous: no support was found for the concept of distinct expression classes of genes. Expression estimates that included reads mapping to coding exons only correlated better with qRT-PCR data than estimates which also included 3' untranslated regions (UTRs. Muscle and liver had the least complex transcriptomes, in that they expressed predominantly ubiquitous genes and a large fraction of the transcripts came from a few highly expressed genes, whereas brain, kidney and testis expressed more complex transcriptomes with the vast majority of genes expressed and relatively small contributions from the most expressed genes. mRNAs expressed in brain had unusually long 3'UTRs, and mean 3'UTR length was higher for genes involved in development, morphogenesis and signal transduction, suggesting added complexity of UTR-based regulation for these genes. Our results support a model in which variable exterior components feed into a large, densely connected core composed of ubiquitously expressed intracellular proteins.

  6. Cardiac transcriptome profiling of diabetic Akita mice using microarray and next generation sequencing.

    Directory of Open Access Journals (Sweden)

    Varun Kesherwani

    Full Text Available Although diabetes mellitus (DM causes cardiomyopathy and exacerbates heart failure, the underlying molecular mechanisms for diabetic cardiomyopathy/heart failure are poorly understood. Insulin2 mutant (Ins2+/- Akita is a mouse model of T1DM, which manifests cardiac dysfunction. However, molecular changes at cardiac transcriptome level that lead to cardiomyopathy remain unclear. To understand the molecular changes in the heart of diabetic Akita mice, we profiled cardiac transcriptome of Ins2+/- Akita and Ins2+/+ control mice using next generation sequencing (NGS and microarray, and determined the implications of differentially expressed genes on various heart failure signaling pathways using Ingenuity pathway (IPA analysis. First, we validated hyperglycemia, increased cardiac fibrosis, and cardiac dysfunction in twelve-week male diabetic Akita. Then, we analyzed the transcriptome levels in the heart. NGS analyses on Akita heart revealed 137 differentially expressed transcripts, where Bone Morphogenic Protein-10 (BMP10 was the most upregulated and hairy and enhancer of split-related (HELT was the most downregulated gene. Moreover, twelve long non-coding RNAs (lncRNAs were upregulated. The microarray analyses on Akita heart showed 351 differentially expressed transcripts, where vomeronasal-1 receptor-180 (Vmn1r180 was the most upregulated and WD Repeat Domain 83 Opposite Strand (WDR83OS was the most downregulated gene. Further, miR-101c and H19 lncRNA were upregulated but Neat1 lncRNA was downregulated in Akita heart. Eleven common genes were upregulated in Akita heart in both NGS and microarray analyses. IPA analyses revealed the role of these differentially expressed genes in key signaling pathways involved in diabetic cardiomyopathy. Our results provide a platform to initiate focused future studies by targeting these genes and/or non-coding RNAs, which are differentially expressed in Akita hearts and are involved in diabetic cardiomyopathy.

  7. Deep sequencing-based analysis of the Cymbidium ensifolium floral transcriptome.

    Directory of Open Access Journals (Sweden)

    Xiaobai Li

    Full Text Available Cymbidium ensifolium is a Chinese Cymbidium with an elegant shape, beautiful appearance, and a fragrant aroma. C. ensifolium has a long history of cultivation in China and it has excellent commercial value as a potted plant and cut flower. The development of C. ensifolium genomic resources has been delayed because of its large genome size. Taking advantage of technical and cost improvement of RNA-Seq, we extracted total mRNA from flower buds and mature flowers and obtained a total of 9.52 Gb of filtered nucleotides comprising 98,819,349 filtered reads. The filtered reads were assembled into 101,423 isotigs, representing 51,696 genes. Of the 101,423 isotigs, 41,873 were putative homologs of annotated sequences in the public databases, of which 158 were associated with floral development and 119 were associated with flowering. The isotigs were categorized according to their putative functions. In total, 10,212 of the isotigs were assigned into 25 eukaryotic orthologous groups (KOGs, 41,690 into 58 gene ontology (GO terms, and 9,830 into 126 Arabidopsis Kyoto Encyclopedia of Genes and Genomes (KEGG pathways, and 9,539 isotigs into 123 rice pathways. Comparison of the isotigs with those of the two related orchid species P. equestris and C. sinense showed that 17,906 isotigs are unique to C. ensifolium. In addition, a total of 7,936 SSRs and 16,676 putative SNPs were identified. To our knowledge, this transcriptome database is the first major genomic resource for C. ensifolium and the most comprehensive transcriptomic resource for genus Cymbidium. These sequences provide valuable information for understanding the molecular mechanisms of floral development and flowering. Sequences predicted to be unique to C. ensifolium would provide more insights into C. ensifolium gene diversity. The numerous SNPs and SSRs identified in the present study will contribute to marker development for C. ensifolium.

  8. Transcriptome analysis of the model protozoan, Tetrahymena thermophila, using Deep RNA sequencing.

    Directory of Open Access Journals (Sweden)

    Jie Xiong

    Full Text Available BACKGROUND: The ciliated protozoan Tetrahymena thermophila is a well-studied single-celled eukaryote model organism for cellular and molecular biology. However, the lack of extensive T. thermophila cDNA libraries or a large expressed sequence tag (EST database limited the quality of the original genome annotation. METHODOLOGY/PRINCIPAL FINDINGS: This RNA-seq study describes the first deep sequencing analysis of the T. thermophila transcriptome during the three major stages of the life cycle: growth, starvation and conjugation. Uniquely mapped reads covered more than 96% of the 24,725 predicted gene models in the somatic genome. More than 1,000 new transcribed regions were identified. The great dynamic range of RNA-seq allowed detection of a nearly six order-of-magnitude range of measurable gene expression orchestrated by this cell. RNA-seq also allowed the first prediction of transcript untranslated regions (UTRs and an updated (larger size estimate of the T. thermophila transcriptome: 57 Mb, or about 55% of the somatic genome. Our study identified nearly 1,500 alternative splicing (AS events distributed over 5.2% of T. thermophila genes. This percentage represents a two order-of-magnitude increase over previous EST-based estimates in Tetrahymena. Evidence of stage-specific regulation of alternative splicing was also obtained. Finally, our study allowed us to completely confirm about 26.8% of the genes originally predicted by the gene finder, to correct coding sequence boundaries and intron-exon junctions for about a third, and to reassign microarray probes and correct earlier microarray data. CONCLUSIONS/SIGNIFICANCE: RNA-seq data significantly improve the genome annotation and provide a fully comprehensive view of the global transcriptome of T. thermophila. To our knowledge, 5.2% of T. thermophila genes with AS is the highest percentage of genes showing AS reported in a unicellular eukaryote. Tetrahymena thus becomes an excellent unicellular

  9. Complete Genome Sequence of Sporisorium scitamineum and Biotrophic Interaction Transcriptome with Sugarcane.

    Directory of Open Access Journals (Sweden)

    Lucas M Taniguti

    Full Text Available Sporisorium scitamineum is a biotrophic fungus responsible for the sugarcane smut, a worldwide spread disease. This study provides the complete sequence of individual chromosomes of S. scitamineum from telomere to telomere achieved by a combination of PacBio long reads and Illumina short reads sequence data, as well as a draft sequence of a second fungal strain. Comparative analysis to previous available sequences of another strain detected few polymorphisms among the three genomes. The novel complete sequence described herein allowed us to identify and annotate extended subtelomeric regions, repetitive elements and the mitochondrial DNA sequence. The genome comprises 19,979,571 bases, 6,677 genes encoding proteins, 111 tRNAs and 3 assembled copies of rDNA, out of our estimated number of copies as 130. Chromosomal reorganizations were detected when comparing to sequences of S. reilianum, the closest smut relative, potentially influenced by repeats of transposable elements. Repetitive elements may have also directed the linkage of the two mating-type loci. The fungal transcriptome profiling from in vitro and from interaction with sugarcane at two time points (early infection and whip emergence revealed that 13.5% of the genes were differentially expressed in planta and particular to each developmental stage. Among them are plant cell wall degrading enzymes, proteases, lipases, chitin modification and lignin degradation enzymes, sugar transporters and transcriptional factors. The fungus also modulates transcription of genes related to surviving against reactive oxygen species and other toxic metabolites produced by the plant. Previously described effectors in smut/plant interactions were detected but some new candidates are proposed. Ten genomic islands harboring some of the candidate genes unique to S. scitamineum were expressed only in planta. RNAseq data was also used to reassure gene predictions.

  10. Characterizing the Genetic Basis for Nicotine Induced Cancer Development: A Transcriptome Sequencing Study.

    Directory of Open Access Journals (Sweden)

    Jasmin H Bavarva

    Full Text Available Nicotine is a known risk factor for cancer development and has been shown to alter gene expression in cells and tissue upon exposure. We used Illumina® Next Generation Sequencing (NGS technology to gain unbiased biological insight into the transcriptome of normal epithelial cells (MCF-10A to nicotine exposure. We generated expression data from 54,699 transcripts using triplicates of control and nicotine stressed cells. As a result, we identified 138 differentially expressed transcripts, including 39 uncharacterized genes. Additionally, 173 transcripts that are primarily associated with DNA replication, recombination, and repair showed evidence for alternative splicing. We discovered the greatest nicotine stress response by HPCAL4 (up-regulated by 4.71 fold and NPAS3 (down-regulated by -2.73 fold; both are genes that have not been previously implicated in nicotine exposure but are linked to cancer. We also discovered significant down-regulation (-2.3 fold and alternative splicing of NEAT1 (lncRNA that may have an important, yet undiscovered regulatory role. Gene ontology analysis revealed nicotine exposure influenced genes involved in cellular and metabolic processes. This study reveals previously unknown consequences of nicotine stress on the transcriptome of normal breast epithelial cells and provides insight into the underlying biological influence of nicotine on normal cells, marking the foundation for future studies.

  11. RNA sequencing reveals pronounced changes in the noncoding transcriptome of aging synaptosomes.

    Science.gov (United States)

    Chen, Bei Jun; Ueberham, Uwe; Mills, James D; Kirazov, Ludmil; Kirazov, Evgeni; Knobloch, Mara; Bochmann, Jana; Jendrek, Renate; Takenaka, Konii; Bliim, Nicola; Arendt, Thomas; Janitz, Michael

    2017-08-01

    Normal aging is associated with impairments in cognitive functions. These alterations are caused by diminutive changes in the biology of synapses, and ineffective neurotransmission, rather than loss of neurons. Hitherto, only a few studies, exploring molecular mechanisms of healthy brain aging in higher vertebrates, utilized synaptosomal fractions to survey local changes in aging-related transcriptome dynamics. Here we present, for the first time, a comparative analysis of the synaptosomes transcriptome in the aging mouse brain using RNA sequencing. Our results show changes in the expression of genes contributing to biological pathways related to neurite guidance, synaptosomal physiology, and RNA splicing. More intriguingly, we also discovered alterations in the expression of thousands of novel, unannotated lincRNAs during aging. Further, detailed characterization of the cleavage and polyadenylation factor I subunit 1 (Clp1) mRNA and protein expression indicates its increased expression in neuronal processes of hippocampal stratum radiatum in aging mice. Together, our study uncovers a new layer of transcriptional regulation which is targeted by aging within the local environment of interconnecting neuronal cells. Copyright © 2017 Elsevier Inc. All rights reserved.

  12. Transcriptome analysis of the Chinese giant salamander (Andrias davidianus using RNA-sequencing

    Directory of Open Access Journals (Sweden)

    Yong Huang

    2017-12-01

    Full Text Available The Chinese giant salamander (Andrias davidianus is an economically important animal on academic value. However, the genomic information of this species has been less studied. In our study, the transcripts of A. davidianus were obtained by RNA-seq to conduct a transcriptomic analysis. In total 132,912 unigenes were generated with an average length of 690 bp and N50 of 1263 bp by de novo assembly using Trinity software. Using a sequence similarity search against the nine public databases (CDD, KOG, NR, NT, PFAM, Swiss-prot, TrEMBL, GO and KEGG databases, a total of 24,049, 18,406, 36,711, 15,858, 20,500, 27,515, 36,705, 28,879 and 10,958 unigenes were annotated in databases, respectively. Of these, 6323 unigenes were annotated in all database and 39,672 unigenes were annotated in at least one database. Blasted with KEGG pathway, 10,958 unigenes were annotated, and it was divided into 343 categories according to different pathways. In addition, we also identified 29,790 SSRs. This study provided a valuable resource for understanding transcriptomic information of A. davidianus and laid a foundation for further research on functional gene cloning, genomics, genetic diversity analysis and molecular marker exploitation in A. davidianus.

  13. Transcriptome analysis of stem development in the tumourous stem mustard Brassica juncea var. tumida Tsen et Lee by RNA sequencing

    Directory of Open Access Journals (Sweden)

    Sun Quan

    2012-04-01

    Full Text Available Abstract Background Tumourous stem mustard (Brassica juncea var. tumida Tsen et Lee is an economically and nutritionally important vegetable crop of the Cruciferae family that also provides the raw material for Fuling mustard. The genetics breeding, physiology, biochemistry and classification of mustards have been extensively studied, but little information is available on tumourous stem mustard at the molecular level. To gain greater insight into the molecular mechanisms underlying stem swelling in this vegetable and to provide additional information for molecular research and breeding, we sequenced the transcriptome of tumourous stem mustard at various stem developmental stages and compared it with that of a mutant variety lacking swollen stems. Results Using Illumina short-read technology with a tag-based digital gene expression (DGE system, we performed de novo transcriptome assembly and gene expression analysis. In our analysis, we assembled genetic information for tumourous stem mustard at various stem developmental stages. In addition, we constructed five DGE libraries, which covered the strains Yong'an and Dayejie at various development stages. Illumina sequencing identified 146,265 unigenes, including 11,245 clusters and 135,020 singletons. The unigenes were subjected to a BLAST search and annotated using the GO and KO databases. We also compared the gene expression profiles of three swollen stem samples with those of two non-swollen stem samples. A total of 1,042 genes with significantly different expression levels occurring simultaneously in the six comparison groups were screened out. Finally, the altered expression levels of a number of randomly selected genes were confirmed by quantitative real-time PCR. Conclusions Our data provide comprehensive gene expression information at the transcriptional level and the first insight into the understanding of the molecular mechanisms and regulatory pathways of stem swelling and development

  14. A combination of LongSAGE with Solexa sequencing is well suited to explore the depth and the complexity of transcriptome

    OpenAIRE

    Scoté-Blachon Céline; Wincker Patrick; Dossat Carole; Faure Claudine; Gay Nadine; Keime Céline; Hanriot Lucie; Peyron Christelle; Gandrillon Olivier

    2008-01-01

    Abstract Background "Open" transcriptome analysis methods allow to study gene expression without a priori knowledge of the transcript sequences. As of now, SAGE (Serial Analysis of Gene Expression), LongSAGE and MPSS (Massively Parallel Signature Sequencing) are the mostly used methods for "open" transcriptome analysis. Both LongSAGE and MPSS rely on the isolation of 21 pb tag sequences from each transcript. In contrast to LongSAGE, the high throughput sequencing method used in MPSS enables t...

  15. Efficient Detection of Novel Nuclear Markers for Brassicaceae by Transcriptome Sequencing.

    Directory of Open Access Journals (Sweden)

    Reinhold Stockenhuber

    Full Text Available The lack of DNA sequence information for most non-model organisms impairs the design of primers that are universally applicable for the study of molecular polymorphisms in nuclear markers. Next-generation sequencing (NGS techniques nowadays provide a powerful approach to overcome this limitation. We present a flexible and inexpensive method to identify large numbers of nuclear primer pairs that amplify in most Brassicaceae species. We first obtained and mapped NGS transcriptome sequencing reads from two of the distantly related Brassicaceae species, Cardamine hirsuta and Arabis alpina, onto the Arabidopsis thaliana reference genome, and then identified short conserved sequence motifs among the three species bioinformatically. From these, primer pairs to amplify coding regions (nuclear protein coding loci, NPCL and exon-primed intron-crossing sequences (EPIC were developed. We identified 2,334 universally applicable primer pairs, targeting 1,164 genes, which provide a large pool of markers as readily usable genomic resource that will help addressing novel questions in the Brassicaceae family. Testing a subset of the newly designed nuclear primer pairs revealed that a great majority yielded a single amplicon in all of the 30 investigated Brassicaceae taxa. Sequence analysis and phylogenetic reconstruction with a subset of these markers on different levels of phylogenetic divergence in the mustard family were compared with previous studies. The results corroborate the usefulness of the newly developed primer pairs, e.g., for phylogenetic analyses or population genetic studies. Thus, our method provides a cost-effective approach for designing nuclear loci across a broad range of taxa and is compatible with current NGS technologies.

  16. A combination of LongSAGE with Solexa sequencing is well suited to explore the depth and the complexity of transcriptome

    Directory of Open Access Journals (Sweden)

    Scoté-Blachon Céline

    2008-09-01

    Full Text Available Abstract Background "Open" transcriptome analysis methods allow to study gene expression without a priori knowledge of the transcript sequences. As of now, SAGE (Serial Analysis of Gene Expression, LongSAGE and MPSS (Massively Parallel Signature Sequencing are the mostly used methods for "open" transcriptome analysis. Both LongSAGE and MPSS rely on the isolation of 21 pb tag sequences from each transcript. In contrast to LongSAGE, the high throughput sequencing method used in MPSS enables the rapid sequencing of very large libraries containing several millions of tags, allowing deep transcriptome analysis. However, a bias in the complexity of the transcriptome representation obtained by MPSS was recently uncovered. Results In order to make a deep analysis of mouse hypothalamus transcriptome avoiding the limitation introduced by MPSS, we combined LongSAGE with the Solexa sequencing technology and obtained a library of more than 11 millions of tags. We then compared it to a LongSAGE library of mouse hypothalamus sequenced with the Sanger method. Conclusion We found that Solexa sequencing technology combined with LongSAGE is perfectly suited for deep transcriptome analysis. In contrast to MPSS, it gives a complex representation of transcriptome as reliable as a LongSAGE library sequenced by the Sanger method.

  17. A combination of LongSAGE with Solexa sequencing is well suited to explore the depth and the complexity of transcriptome.

    Science.gov (United States)

    Hanriot, Lucie; Keime, Céline; Gay, Nadine; Faure, Claudine; Dossat, Carole; Wincker, Patrick; Scoté-Blachon, Céline; Peyron, Christelle; Gandrillon, Olivier

    2008-09-16

    "Open" transcriptome analysis methods allow to study gene expression without a priori knowledge of the transcript sequences. As of now, SAGE (Serial Analysis of Gene Expression), LongSAGE and MPSS (Massively Parallel Signature Sequencing) are the mostly used methods for "open" transcriptome analysis. Both LongSAGE and MPSS rely on the isolation of 21 pb tag sequences from each transcript. In contrast to LongSAGE, the high throughput sequencing method used in MPSS enables the rapid sequencing of very large libraries containing several millions of tags, allowing deep transcriptome analysis. However, a bias in the complexity of the transcriptome representation obtained by MPSS was recently uncovered. In order to make a deep analysis of mouse hypothalamus transcriptome avoiding the limitation introduced by MPSS, we combined LongSAGE with the Solexa sequencing technology and obtained a library of more than 11 millions of tags. We then compared it to a LongSAGE library of mouse hypothalamus sequenced with the Sanger method. We found that Solexa sequencing technology combined with LongSAGE is perfectly suited for deep transcriptome analysis. In contrast to MPSS, it gives a complex representation of transcriptome as reliable as a LongSAGE library sequenced by the Sanger method.

  18. Whole-exome and transcriptome sequencing of refractory diffuse large B-cell lymphoma.

    Science.gov (United States)

    Park, Ha Young; Lee, Seung-Bok; Yoo, Hae-Yong; Kim, Seok-Jin; Kim, Won-Seog; Kim, Jong-Il; Ko, Young-Hyeh

    2016-12-27

    Diffuse large B-cell lymphoma (DLBCL) is the most common type of non-Hodgkin lymphoma. Although rituximab therapy improves clinical outcome, some patients develop resistant DLBCL; however, the genetic alterations in these patients are not well documented. To identify the genetic background of refractory DLBCL, we conducted whole-exome sequencing and transcriptome sequencing for six patients with refractory and seven with responsive DLBCL. The average numbers of pathogenic somatic single nucleotide variants and indels in coding regions were 71 in refractory patients (range 28-120) and 38 (range 19-66) in responsive patients. Missense mutations of TP53 were exclusive in 50% (3/6) of refractory patients and involved the DNA-binding domain of TP53. All missense mutations of TP53 were accompanied by copy number deletions. RAB11FIP5, PRKCB, PRDM15, FNBP4, AHR, CEP128, BRE, DHX16, MYO6, and NMT1 mutations were recurrent in refractory patients. MYD88, B2M, SORCS3, and WDFY3 mutations were more frequent in refractory patients than in responsive patients. REL-BCL11A fusion was found in two refractory patients; one had both fusion and copy number gain. Recurrent copy gains of POU2AF1, SLC1A4, REL11, FANCL, CACNA1D, TRRAP, and CUX1 with significantly increased average expression were found in refractory patients. The expression profile revealed enriched gene sets associated with treatment resistance, including oxidative phosphorylation and ATP-binding cassette transporters. In conclusion, this study integrated both genomic and transcriptomic alterations associated with refractory DLBCL and found several treatment-resistance alterations that may contribute to refractoriness.

  19. Transcriptome sequencing of the naked mole rat (Heterocephalus glaber and identification of hypoxia tolerance genes

    Directory of Open Access Journals (Sweden)

    Bang Xiao

    2017-12-01

    Full Text Available The naked mole rat (NMR; Heterocephalus glaber is a small rodent species found in regions of Kenya, Ethiopia and Somalia. It has a high tolerance for hypoxia and is thus considered one of the most important natural models for studying hypoxia tolerance mechanisms. The various mechanisms underlying the NMR's hypoxia tolerance are beginning to be understood at different levels of organization, and next-generation sequencing methods promise to expand this understanding to the level of gene expression. In this study, we examined the sequence and transcript abundance data of the muscle transcriptome of NMRs exposed to hypoxia using the Illumina HiSeq 2500 system to clarify the possible genomic adaptive responses to the hypoxic underground surroundings. The RNA-seq raw FastQ data were mapped against the NMR genome. We identified 2337 differentially expressed genes (DEGs by comparison of the hypoxic and control groups. Functional annotation of the DEGs by gene ontology (GO analysis revealed enrichment of hypoxia stress-related GO categories, including ‘biological regulation’, ‘cellular process’, ‘ion transport’ and ‘cell-cell signaling’. Enrichment of DEGs in signaling pathways was analyzed against the Kyoto Encyclopedia of Genes and Genomes (KEGG database to identify possible interactions between DEGs. The results revealed significant enrichment of DEGs in focal adhesion, the mitogen-activated protein kinase (MAPK signaling pathway and the glycine, serine and threonine metabolism pathway. Furthermore, inhibition of DEGs (STMN1, MAPK8IP1 and MAPK10 expression induced apoptosis and arrested cell growth in NMR fibroblasts following hypoxia. Thus, this global transcriptome analysis of NMRs can provide an important genetic resource for the study of hypoxia tolerance in mammals. Furthermore, the identified DEGs may provide important molecular targets for biomedical research into therapeutic strategies for stroke and cardiovascular diseases.

  20. Small RNA and transcriptome deep sequencing proffers insight into floral gene regulation in Rosa cultivars

    Directory of Open Access Journals (Sweden)

    Kim Jungeun

    2012-11-01

    Full Text Available Abstract Background Roses (Rosa sp., which belong to the family Rosaceae, are the most economically important ornamental plants—making up 30% of the floriculture market. However, given high demand for roses, rose breeding programs are limited in molecular resources which can greatly enhance and speed breeding efforts. A better understanding of important genes that contribute to important floral development and desired phenotypes will lead to improved rose cultivars. For this study, we analyzed rose miRNAs and the rose flower transcriptome in order to generate a database to expound upon current knowledge regarding regulation of important floral characteristics. A rose genetic database will enable comprehensive analysis of gene expression and regulation via miRNA among different Rosa cultivars. Results We produced more than 0.5 million reads from expressed sequences, totalling more than 110 million bp. From these, we generated 35,657, 31,434, 34,725, and 39,722 flower unigenes from Rosa hybrid: ‘Vital’, ‘Maroussia’, and ‘Sympathy’ and Rosa rugosa Thunb. , respectively. The unigenes were assigned functional annotations, domains, metabolic pathways, Gene Ontology (GO terms, Plant Ontology (PO terms, and MIPS Functional Catalogue (FunCat terms. Rose flower transcripts were compared with genes from whole genome sequences of Rosaceae members (apple, strawberry, and peach and grape. We also produced approximately 40 million small RNA reads from flower tissue for Rosa, representing 267 unique miRNA tags. Among identified miRNAs, 25 of them were novel and 242 of them were conserved miRNAs. Statistical analyses of miRNA profiles revealed both shared and species-specific miRNAs, which presumably effect flower development and phenotypes. Conclusions In this study, we constructed a Rose miRNA and transcriptome database, and we analyzed the miRNAs and transcriptome generated from the flower tissues of four Rosa cultivars. The database provides a

  1. Transcriptome Sequencing and Analysis for Culm Elongation of the World's Largest Bamboo (Dendrocalamus sinicus.

    Directory of Open Access Journals (Sweden)

    Kai Cui

    Full Text Available Dendrocalamus sinicus is the world's largest bamboo species with strong woody culms, and known for its fast-growing culms. As an economic bamboo species, it was popularized for multi-functional applications including furniture, construction, and industrial paper pulp. To comprehensively elucidate the molecular processes involved in its culm elongation, Illumina paired-end sequencing was conducted. About 65.08 million high-quality reads were produced, and assembled into 81,744 unigenes with an average length of 723 bp. A total of 64,338 (79% unigenes were annotated for their functions, of which, 56,587 were annotated in the NCBI non-redundant protein database and 35,262 were annotated in the Swiss-Prot database. Also, 42,508 and 21,009 annotated unigenes were allocated to gene ontology (GO categories and clusters of orthologous groups (COG, respectively. By searching against the Kyoto Encyclopedia of Genes and Genomes Pathway database (KEGG, 33,920 unigenes were assigned to 128 KEGG pathways. Meanwhile, 8,553 simple sequence repeats (SSRs and 81,534 single-nucleotide polymorphism (SNPs were identified, respectively. Additionally, 388 transcripts encoding lignin biosynthesis were detected, among which, 27 transcripts encoding Shikimate O-hydroxycinnamoyltransferase (HCT specifically expressed in D. sinicus when compared to other bamboo species and rice. The phylogenetic relationship between D. sinicus and other plants was analyzed, suggesting functional diversity of HCT unigenes in D. sinicus. We conjectured that HCT might lead to the high lignin content and giant culm. Given that the leaves are not yet formed and culm is covered with sheaths during culm elongation, the existence of photosynthesis of bamboo culm is usually neglected. Surprisedly, 109 transcripts encoding photosynthesis were identified, including photosystem I and II, cytochrome b6/f complex, photosynthetic electron transport and F-type ATPase, and 24 transcripts were characterized

  2. Next-generation sequencing of the Chrysanthemum nankingense (Asteraceae transcriptome permits large-scale unigene assembly and SSR marker discovery.

    Directory of Open Access Journals (Sweden)

    Haibin Wang

    Full Text Available BACKGROUND: Simple sequence repeats (SSRs are ubiquitous in eukaryotic genomes. Chrysanthemum is one of the largest genera in the Asteraceae family. Only few Chrysanthemum expressed sequence tag (EST sequences have been acquired to date, so the number of available EST-SSR markers is very low. METHODOLOGY/PRINCIPAL FINDINGS: Illumina paired-end sequencing technology produced over 53 million sequencing reads from C. nankingense mRNA. The subsequent de novo assembly yielded 70,895 unigenes, of which 45,789 (64.59% unigenes showed similarity to the sequences in NCBI database. Out of 45,789 sequences, 107 have hits to the Chrysanthemum Nr protein database; 679 and 277 sequences have hits to the database of Helianthus and Lactuca species, respectively. MISA software identified a large number of putative EST-SSRs, allowing 1,788 primer pairs to be designed from the de novo transcriptome sequence and a further 363 from archival EST sequence. Among 100 primer pairs randomly chosen, 81 markers have amplicons and 20 are polymorphic for genotypes analysis in Chrysanthemum. The results showed that most (but not all of the assays were transferable across species and that they exposed a significant amount of allelic diversity. CONCLUSIONS/SIGNIFICANCE: SSR markers acquired by transcriptome sequencing are potentially useful for marker-assisted breeding and genetic analysis in the genus Chrysanthemum and its related genera.

  3. Generation and analysis of blueberry transcriptome sequences from leaves, developing fruit, and flower buds from cold acclimation through deacclimation.

    Science.gov (United States)

    Rowland, Lisa J; Alkharouf, Nadim; Darwish, Omar; Ogden, Elizabeth L; Polashock, James J; Bassil, Nahla V; Main, Dorrie

    2012-04-02

    There has been increased consumption of blueberries in recent years fueled in part because of their many recognized health benefits. Blueberry fruit is very high in anthocyanins, which have been linked to improved night vision, prevention of macular degeneration, anti-cancer activity, and reduced risk of heart disease. Very few genomic resources have been available for blueberry, however. Further development of genomic resources like expressed sequence tags (ESTs), molecular markers, and genetic linkage maps could lead to more rapid genetic improvement. Marker-assisted selection could be used to combine traits for climatic adaptation with fruit and nutritional quality traits. Efforts to sequence the transcriptome of the commercial highbush blueberry (Vaccinium corymbosum) cultivar Bluecrop and use the sequences to identify genes associated with cold acclimation and fruit development and develop SSR markers for mapping studies are presented here. Transcriptome sequences were generated from blueberry fruit at different stages of development, flower buds at different stages of cold acclimation, and leaves by next-generation Roche 454 sequencing. Over 600,000 reads were assembled into approximately 15,000 contigs and 124,000 singletons. The assembled sequences were annotated and functionally mapped to Gene Ontology (GO) terms. Frequency of the most abundant sequences in each of the libraries was compared across all libraries to identify genes that are potentially differentially expressed during cold acclimation and fruit development. Real-time PCR was performed to confirm their differential expression patterns. Overall, 14 out of 17 of the genes examined had differential expression patterns similar to what was predicted from their reads alone. The assembled sequences were also mined for SSRs. From these sequences, 15,886 blueberry EST-SSR loci were identified. Primers were designed from 7,705 of the SSR-containing sequences with adequate flanking sequence. One hundred

  4. Generation and analysis of blueberry transcriptome sequences from leaves, developing fruit, and flower buds from cold acclimation through deacclimation

    Directory of Open Access Journals (Sweden)

    Rowland Lisa J

    2012-04-01

    Full Text Available Abstract Background There has been increased consumption of blueberries in recent years fueled in part because of their many recognized health benefits. Blueberry fruit is very high in anthocyanins, which have been linked to improved night vision, prevention of macular degeneration, anti-cancer activity, and reduced risk of heart disease. Very few genomic resources have been available for blueberry, however. Further development of genomic resources like expressed sequence tags (ESTs, molecular markers, and genetic linkage maps could lead to more rapid genetic improvement. Marker-assisted selection could be used to combine traits for climatic adaptation with fruit and nutritional quality traits. Results Efforts to sequence the transcriptome of the commercial highbush blueberry (Vaccinium corymbosum cultivar Bluecrop and use the sequences to identify genes associated with cold acclimation and fruit development and develop SSR markers for mapping studies are presented here. Transcriptome sequences were generated from blueberry fruit at different stages of development, flower buds at different stages of cold acclimation, and leaves by next-generation Roche 454 sequencing. Over 600,000 reads were assembled into approximately 15,000 contigs and 124,000 singletons. The assembled sequences were annotated and functionally mapped to Gene Ontology (GO terms. Frequency of the most abundant sequences in each of the libraries was compared across all libraries to identify genes that are potentially differentially expressed during cold acclimation and fruit development. Real-time PCR was performed to confirm their differential expression patterns. Overall, 14 out of 17 of the genes examined had differential expression patterns similar to what was predicted from their reads alone. The assembled sequences were also mined for SSRs. From these sequences, 15,886 blueberry EST-SSR loci were identified. Primers were designed from 7,705 of the SSR-containing sequences

  5. Breaking the 1000-gene barrier for Mimivirus using ultra-deep genome and transcriptome sequencing

    Directory of Open Access Journals (Sweden)

    Claverie Jean-Michel

    2011-03-01

    Full Text Available Abstract Background Mimivirus, a giant dsDNA virus infecting Acanthamoeba, is the prototype of the mimiviridae family, the latest addition to the family of the nucleocytoplasmic large DNA viruses (NCLDVs. Its 1.2 Mb-genome was initially predicted to encode 917 genes. A subsequent RNA-Seq analysis precisely mapped many transcript boundaries and identified 75 new genes. Findings We now report a much deeper analysis using the SOLiD™ technology combining RNA-Seq of the Mimivirus transcriptome during the infectious cycle (202.4 Million reads, and a complete genome re-sequencing (45.3 Million reads. This study corrected the genome sequence and identified several single nucleotide polymorphisms. Our results also provided clear evidence of previously overlooked transcription units, including an important RNA polymerase subunit distantly related to Euryarchea homologues. The total Mimivirus gene count is now 1018, 11% greater than the original annotation. Conclusions This study highlights the huge progress brought about by ultra-deep sequencing for the comprehensive annotation of virus genomes, opening the door to a complete one-nucleotide resolution level description of their transcriptional activity, and to the realistic modeling of the viral genome expression at the ultimate molecular level. This work also illustrates the need to go beyond bioinformatics-only approaches for the annotation of short protein and non-coding genes in viral genomes.

  6. Sequencing and de novo analysis of a coral larval transcriptome using 454 GSFlx

    OpenAIRE

    Colbourne John K; Abrego David; Buchanan-Carter Jade; Wang Shi; Aglyamova Galina V; Meyer Eli; Willis Bette L; Matz Mikhail V

    2009-01-01

    Abstract Background New methods are needed for genomic-scale analysis of emerging model organisms that exemplify important biological questions but lack fully sequenced genomes. For example, there is an urgent need to understand the potential for corals to adapt to climate change, but few molecular resources are available for studying these processes in reef-building corals. To facilitate genomics studies in corals and other non-model systems, we describe methods for transcriptome sequencing ...

  7. De novo Assembly and Characterization of the Global Transcriptome for Rhyacionia leptotubula Using Illumina Paired-End Sequencing

    OpenAIRE

    Zhu, Jia-Ying; Li, Yong-He; Yang, Song; Li, Qin-Wen

    2013-01-01

    BACKGROUND: The pine tip moth, Rhyacionia leptotubula (Lepidoptera: Tortricidae) is one of the most destructive forestry pests in Yunnan Province, China. Despite its importance, less is known regarding all aspects of this pest. Understanding the genetic information of it is essential for exploring the specific traits at the molecular level. Thus, we here sequenced the transcriptome of R. leptotubula with high-throughput Illumina sequencing. METHODOLOGY/PRINCIPAL FINDINGS: In a single run, mor...

  8. Identification of Reproduction-Related Gene Polymorphisms Using Whole Transcriptome Sequencing in the Large White Pig Population

    OpenAIRE

    Fischer, Daniel; Laiho, Asta; Gyenesei, Attila; Sironen, Anu

    2015-01-01

    Recent developments in high-throughput sequencing techniques have enabled large-scale analysis of genetic variations and gene expression in different tissues and species, but gene expression patterns and genetic variations in livestock are not well-characterized. In this study, we have used high-throughput transcriptomic sequencing of the Finnish Large White to identify gene expression patterns and coding polymorphisms within the breed in the testis and oviduct. The main objective of this stu...

  9. Genome and transcriptome sequencing of the halophilic fungus Wallemia ichthyophaga: haloadaptations present and absent.

    Science.gov (United States)

    Zajc, Janja; Liu, Yongfeng; Dai, Wenkui; Yang, Zhenyu; Hu, Jingzhi; Gostinčar, Cene; Gunde-Cimerman, Nina

    2013-09-13

    The basidomycete Wallemia ichthyophaga from the phylogenetically distinct class Wallemiomycetes is the most halophilic fungus known to date. It requires at least 10% NaCl and thrives in saturated salt solution. To investigate the genomic basis of this exceptional phenotype, we obtained a de-novo genome sequence of the species type-strain and analysed its transcriptomic response to conditions close to the limits of its lower and upper salinity range. The unusually compact genome is 9.6 Mb large and contains 1.67% repetitive sequences. Only 4884 predicted protein coding genes cover almost three quarters of the sequence. Of 639 differentially expressed genes, two thirds are more expressed at lower salinity. Phylogenomic analysis based on the largest dataset used to date (whole proteomes) positions Wallemiomycetes as a 250-million-year-old sister group of Agaricomycotina. Contrary to the closely related species Wallemia sebi, W. ichthyophaga appears to have lost the ability for sexual reproduction. Several protein families are significantly expanded or contracted in the genome. Among these, there are the P-type ATPase cation transporters, but not the sodium/ hydrogen exchanger family. Transcription of all but three cation transporters is not salt dependent. The analysis also reveals a significant enrichment in hydrophobins, which are cell-wall proteins with multiple cellular functions. Half of these are differentially expressed, and most contain an unusually large number of acidic amino acids. This discovery is of particular interest due to the numerous applications of hydrophobines from other fungi in industry, pharmaceutics and medicine. W. ichthyophaga is an extremophilic specialist that shows only low levels of adaptability and genetic recombination. This is reflected in the characteristics of its genome and its transcriptomic response to salt. No unusual traits were observed in common salt-tolerance mechanisms, such as transport of inorganic ions or synthesis of

  10. The porcelain crab transcriptome and PCAD, the porcelain crab microarray and sequence database.

    Directory of Open Access Journals (Sweden)

    Abderrahmane Tagmount

    transcriptome as can reasonably be captured in EST library sequencing approaches, and thus represent a rich resource for studies of environmental genomics.

  11. Discovery and Annotation of Plant Endogenous Target Mimicry Sequences from Public Transcriptome Libraries: A Case Study of Prunus persica.

    Science.gov (United States)

    Karakülah, Gökhan

    2017-06-28

    Novel transcript discovery through RNA sequencing has substantially improved our understanding of the transcriptome dynamics of biological systems. Endogenous target mimicry (eTM) transcripts, a novel class of regulatory molecules, bind to their target microRNAs (miRNAs) by base pairing and block their biological activity. The objective of this study was to provide a computational analysis framework for the prediction of putative eTM sequences in plants, and as an example, to discover previously un-annotated eTMs in Prunus persica (peach) transcriptome. Therefore, two public peach transcriptome libraries downloaded from Sequence Read Archive (SRA) and a previously published set of long non-coding RNAs (lncRNAs) were investigated with multi-step analysis pipeline, and 44 putative eTMs were found. Additionally, an eTM-miRNA-mRNA regulatory network module associated with peach fruit organ development was built via integration of the miRNA target information and predicted eTM-miRNA interactions. My findings suggest that one of the most widely expressed miRNA families among diverse plant species, miR156, might be potentially sponged by seven putative eTMs. Besides, the study indicates eTMs potentially play roles in the regulation of development processes in peach fruit via targeting specific miRNAs. In conclusion, by following the step-by step instructions provided in this study, novel eTMs can be identified and annotated effectively in public plant transcriptome libraries.

  12. Annotation and re-sequencing of genes from de novo transcriptome assembly of Abies alba (Pinaceae)1

    Science.gov (United States)

    Roschanski, Anna M.; Fady, Bruno; Ziegenhagen, Birgit; Liepelt, Sascha

    2013-01-01

    • Premise of the study: We present a protocol for the annotation of transcriptome sequence data and the identification of candidate genes therein using the example of the nonmodel conifer Abies alba. • Methods and Results: A normalized cDNA library was built from an A. alba seedling. The sequencing on a 454 platform yielded more than 1.5 million reads that were de novo assembled into 25149 contigs. Two complementary approaches were applied to annotate gene fragments that code for (1) well-known proteins and (2) proteins that are potentially adaptively relevant. Primer development and testing yielded 88 amplicons that could successfully be resequenced from genomic DNA. • Conclusions: The annotation workflow offers an efficient way to identify potential adaptively relevant genes from the large quantity of transcriptome sequence data. The primer set presented should be prioritized for single-nucleotide polymorphism detection in adaptively relevant genes in A. alba. PMID:25202477

  13. Prediction of Scylla olivacea (Crustacea; Brachyura) peptide hormones using publicly accessible transcriptome shotgun assembly (TSA) sequences.

    Science.gov (United States)

    Christie, Andrew E

    2016-05-01

    The aquaculture of crabs from the genus Scylla is of increasing economic importance for many Southeast Asian countries. Expansion of Scylla farming has led to increased efforts to understand the physiology and behavior of these crabs, and as such, there are growing molecular resources for them. Here, publicly accessible Scylla olivacea transcriptomic data were mined for putative peptide-encoding transcripts; the proteins deduced from the identified sequences were then used to predict the structures of mature peptide hormones. Forty-nine pre/preprohormone-encoding transcripts were identified, allowing for the prediction of 187 distinct mature peptides. The identified peptides included isoforms of adipokinetic hormone-corazonin-like peptide, allatostatin A, allatostatin B, allatostatin C, bursicon β, CCHamide, corazonin, crustacean cardioactive peptide, crustacean hyperglycemic hormone/molt-inhibiting hormone, diuretic hormone 31, eclosion hormone, FMRFamide-like peptide, HIGSLYRamide, insulin-like peptide, intocin, leucokinin, myosuppressin, neuroparsin, neuropeptide F, orcokinin, pigment dispersing hormone, pyrokinin, red pigment concentrating hormone, RYamide, short neuropeptide F, SIFamide and tachykinin-related peptide, all well-known neuropeptide families. Surprisingly, the tissue used to generate the transcriptome mined here is reported to be testis. Whether or not the testis samples had neural contamination is unknown. However, if the peptides are truly produced by this reproductive organ, it could have far reaching consequences for the study of crustacean endocrinology, particularly in the area of reproductive control. Regardless, this peptidome is the largest thus far predicted for any brachyuran (true crab) species, and will serve as a foundation for future studies of peptidergic control in members of the commercially important genus Scylla. Copyright © 2016 Elsevier Inc. All rights reserved.

  14. Transcriptome walking: a laboratory-oriented GUI-based approach to mRNA identification from deep-sequenced data.

    Science.gov (United States)

    French, Andrew S

    2012-12-05

    Deep sequencing technology provides efficient and economical production of large numbers of randomly positioned, relatively short, estimates of base identities in DNA molecules. Application of this technology to mRNA samples allows rapid examination of the molecular genetic environment in individual cells or tissues, the transcriptome. However, assembly of such short sequences into complete mRNA creates a challenge that limits the usefulness of the technology, particularly when no, or limited, genomic data is available. Several approaches to this problem have been developed, but there is still no general method to rapidly obtain an mRNA sequence from deep sequence data when a specific molecule, or family of molecules, are of interest. A frequent requirement is to identify specific mRNA molecules from tissues that are being investigated by methods such as electrophysiology, immunocytology and pharmacology. To be widely useful, any approach must be relatively simple to use in the laboratory by operators without extensive statistical or bioinformatics knowledge, and with readily available hardware. An approach was developed that allows de novo assembly of individual mRNA sequences in two linked stages: sequence discovery and sequence completion. Both stages rely on computer assisted, Graphical User Interface (GUI)-guided, user interaction with the data, but proceed relatively efficiently once discovery is complete. The method grows a discovered sequence by repeated passes through the complete raw data in a series of steps, and is hence termed 'transcriptome walking'. All of the operations required for transcriptome analysis are combined in one program that presents a relatively simple user interface and runs on a standard desktop, or laptop computer, but takes advantage of multi-core processors, when available. Complete mRNA sequence identifications usually require less than 24 hours. This approach has already identified previously unknown mRNA sequences in two animal

  15. Prediction of Toxin Genes from Chinese Yellow Catfish Based on Transcriptomic and Proteomic Sequencing

    Directory of Open Access Journals (Sweden)

    Bing Xie

    2016-04-01

    Full Text Available Fish venom remains a virtually untapped resource. There are so few fish toxin sequences for reference, which increases the difficulty to study toxins from venomous fish and to develop efficient and fast methods to dig out toxin genes or proteins. Here, we utilized Chinese yellow catfish (Pelteobagrus fulvidraco as our research object, since it is a representative species in Siluriformes with its venom glands embedded in the pectoral and dorsal fins. In this study, we set up an in-house toxin database and a novel toxin-discovering protocol to dig out precise toxin genes by combination of transcriptomic and proteomic sequencing. Finally, we obtained 15 putative toxin proteins distributed in five groups, namely Veficolin, Ink toxin, Adamalysin, Za2G and CRISP toxin. It seems that we have developed a novel bioinformatics method, through which we could identify toxin proteins with high confidence. Meanwhile, these toxins can also be useful for comparative studies in other fish and development of potential drugs.

  16. Prediction of Toxin Genes from Chinese Yellow Catfish Based on Transcriptomic and Proteomic Sequencing

    Science.gov (United States)

    Xie, Bing; Li, Xiaofeng; Lin, Zhilong; Ruan, Zhiqiang; Wang, Min; Liu, Jie; Tong, Ting; Li, Jia; Huang, Yu; Wen, Bo; Sun, Ying; Shi, Qiong

    2016-01-01

    Fish venom remains a virtually untapped resource. There are so few fish toxin sequences for reference, which increases the difficulty to study toxins from venomous fish and to develop efficient and fast methods to dig out toxin genes or proteins. Here, we utilized Chinese yellow catfish (Pelteobagrus fulvidraco) as our research object, since it is a representative species in Siluriformes with its venom glands embedded in the pectoral and dorsal fins. In this study, we set up an in-house toxin database and a novel toxin-discovering protocol to dig out precise toxin genes by combination of transcriptomic and proteomic sequencing. Finally, we obtained 15 putative toxin proteins distributed in five groups, namely Veficolin, Ink toxin, Adamalysin, Za2G and CRISP toxin. It seems that we have developed a novel bioinformatics method, through which we could identify toxin proteins with high confidence. Meanwhile, these toxins can also be useful for comparative studies in other fish and development of potential drugs. PMID:27089325

  17. Transcriptomic Analysis of C. elegans RNA Sequencing Data Through the Tuxedo Suite on the Galaxy Project.

    Science.gov (United States)

    Amrit, Francis R G; Ghazi, Arjumand

    2017-04-08

    Next generation sequencing (NGS) technologies have revolutionized the nature of biological investigation. Of these, RNA Sequencing (RNA-Seq) has emerged as a powerful tool for gene-expression analysis and transcriptome mapping. However, handling RNA-Seq datasets requires sophisticated computational expertise and poses inherent challenges for biology researchers. This bottleneck has been mitigated by the open access Galaxy project that allows users without bioinformatics skills to analyze RNA-Seq data, and the Database for Annotation, Visualization, and Integrated Discovery (DAVID), a Gene Ontology (GO) term analysis suite that helps derive biological meaning from large data sets. However, for first-time users and bioinformatics' amateurs, self-learning and familiarization with these platforms can be time-consuming and daunting. We describe a straightforward workflow that will help C. elegans researchers to isolate worm RNA, conduct an RNA-Seq experiment and analyze the data using Galaxy and DAVID platforms. This protocol provides stepwise instructions for using the various Galaxy modules for accessing raw NGS data, quality-control checks, alignment, and differential gene expression analysis, guiding the user with parameters at every step to generate a gene list that can be screened for enrichment of gene classes or biological processes using DAVID. Overall, we anticipate that this article will provide information to C. elegans researchers undertaking RNA-Seq experiments for the first time as well as frequent users running a small number of samples.

  18. De novo transcriptome sequencing and comparative analysis to discover genes involved in ovarian maturity in Strongylocentrotus nudus.

    Science.gov (United States)

    Jia, Zhiying; Wang, Qiai; Wu, Kaikai; Wei, Zhenlin; Zhou, Zunchun; Liu, Xiaolin

    2017-09-01

    Strongylocentrotus nudus is an edible sea urchin, mainly harvested in China. Correlation studies indicated that S. nudus with larger diameter have a prolonged marketing time and better palatability owing to their precocious gonads and extended maturation process. However, the molecular mechanism underlying this phenomenon is still unknown. Here, transcriptome sequencing was applied to study the ovaries of adult S. nudus with different shell diameters to explore the possible mechanism. In this study, four independent cDNA libraries were constructed, including two from the big size urchins and two from the small ones using a HiSeq™2500 platform. A total of 88,581 unigenes were acquired with a mean length of 1354bp, of which 66,331 (74.88%) unigenes could be annotated using six major publicly available databases. Comparative analysis revealed that 353 unigenes were differentially expressed (with log2(ratio)≥1, FDR≤0.001) between the two groups. Of these, 20 differentially expressed genes (DEGs) were selected to confirm the accuracy of RNA-seq data by quantitative real-time RT-PCR. Furthermore, gene ontology and KEGG pathway enrichment analyses were performed to find the putative genes and pathways related to ovarian maturity. Eight unigenes were identified as significant DEGs involved in reproduction related pathways; these included Mos, Cdc20, Rec8, YP30, cytochrome P450 2U1, ovoperoxidase, proteoliaisin, and rendezvin. Our research fills the gap in the studies on the S. nudus ovaries using transcriptome analysis. Copyright © 2017 Elsevier Inc. All rights reserved.

  19. The Arabidopsis Root Transcriptome by Serial Analysis of Gene Expression. Gene Identification Using the Genome Sequence1

    Science.gov (United States)

    Fizames, Cécile; Muños, Stéphane; Cazettes, Céline; Nacry, Philippe; Boucherez, Jossia; Gaymard, Frédéric; Piquemal, David; Delorme, Valérie; Commes, Thérèse; Doumas, Patrick; Cooke, Richard; Marti, Jacques; Sentenac, Hervé; Gojon, Alain

    2004-01-01

    Large-scale identification of genes expressed in roots of the model plant Arabidopsis was performed by serial analysis of gene expression (SAGE), on a total of 144,083 sequenced tags, representing at least 15,964 different mRNAs. For tag to gene assignment, we developed a computational approach based on 26,620 genes annotated from the complete sequence of the genome. The procedure selected warrants the identification of the genes corresponding to the majority of the tags found experimentally, with a high level of reliability, and provides a reference database for SAGE studies in Arabidopsis. This new resource allowed us to characterize the expression of more than 3,000 genes, for which there is no expressed sequence tag (EST) or cDNA in the databases. Moreover, 85% of the tags were specific for one gene. To illustrate this advantage of SAGE for functional genomics, we show that our data allow an unambiguous analysis of most of the individual genes belonging to 12 different ion transporter multigene families. These results indicate that, compared with EST-based tag to gene assignment, the use of the annotated genome sequence greatly improves gene identification in SAGE studies. However, more than 6,000 different tags remained with no gene match, suggesting that a significant proportion of transcripts present in the roots originate from yet unknown or wrongly annotated genes. The root transcriptome characterized in this study markedly differs from those obtained in other organs, and provides a unique resource for investigating the functional specificities of the root system. As an example of the use of SAGE for transcript profiling in Arabidopsis, we report here the identification of 270 genes differentially expressed between roots of plants grown either with NO3- or NH4NO3 as N source. PMID:14730065

  20. Integrated mRNA and microRNA transcriptome sequencing characterizes sequence variants and mRNA–microRNA regulatory network in nasopharyngeal carcinoma model systems

    Directory of Open Access Journals (Sweden)

    Carol Ying-Ying Szeto

    2014-01-01

    Full Text Available Nasopharyngeal carcinoma (NPC is a prevalent malignancy in Southeast Asia among the Chinese population. Aberrant regulation of transcripts has been implicated in many types of cancers including NPC. Herein, we characterized mRNA and miRNA transcriptomes by RNA sequencing (RNASeq of NPC model systems. Matched total mRNA and small RNA of undifferentiated Epstein–Barr virus (EBV-positive NPC xenograft X666 and its derived cell line C666, well-differentiated NPC cell line HK1, and the immortalized nasopharyngeal epithelial cell line NP460 were sequenced by Solexa technology. We found 2812 genes and 149 miRNAs (human and EBV to be differentially expressed in NP460, HK1, C666 and X666 with RNASeq; 533 miRNA–mRNA target pairs were inversely regulated in the three NPC cell lines compared to NP460. Integrated mRNA/miRNA expression profiling and pathway analysis show extracellular matrix organization, Beta-1 integrin cell surface interactions, and the PI3K/AKT, EGFR, ErbB, and Wnt pathways were potentially deregulated in NPC. Real-time quantitative PCR was performed on selected mRNA/miRNAs in order to validate their expression. Transcript sequence variants such as short insertions and deletions (INDEL, single nucleotide variant (SNV, and isomiRs were characterized in the NPC model systems. A novel TP53 transcript variant was identified in NP460, HK1, and C666. Detection of three previously reported novel EBV-encoded BART miRNAs and their isomiRs were also observed. Meta-analysis of a model system to a clinical system aids the choice of different cell lines in NPC studies. This comprehensive characterization of mRNA and miRNA transcriptomes in NPC cell lines and the xenograft provides insights on miRNA regulation of mRNA and valuable resources on transcript variation and regulation in NPC, which are potentially useful for mechanistic and preclinical studies.

  1. Transcriptome sequencing revealed significant alteration of cortical promoter usage and splicing in schizophrenia.

    Directory of Open Access Journals (Sweden)

    Jing Qin Wu

    Full Text Available While hybridization based analysis of the cortical transcriptome has provided important insight into the neuropathology of schizophrenia, it represents a restricted view of disease-associated gene activity based on predetermined probes. By contrast, sequencing technology can provide un-biased analysis of transcription at nucleotide resolution. Here we use this approach to investigate schizophrenia-associated cortical gene expression.The data was generated from 76 bp reads of RNA-Seq, aligned to the reference genome and assembled into transcripts for quantification of exons, splice variants and alternative promoters in postmortem superior temporal gyrus (STG/BA22 from 9 male subjects with schizophrenia and 9 matched non-psychiatric controls. Differentially expressed genes were then subjected to further sequence and functional group analysis. The output, amounting to more than 38 Gb of sequence, revealed significant alteration of gene expression including many previously shown to be associated with schizophrenia. Gene ontology enrichment analysis followed by functional map construction identified three functional clusters highly relevant to schizophrenia including neurotransmission related functions, synaptic vesicle trafficking, and neural development. Significantly, more than 2000 genes displayed schizophrenia-associated alternative promoter usage and more than 1000 genes showed differential splicing (FDR<0.05. Both types of transcriptional isoforms were exemplified by reads aligned to the neurodevelopmentally significant doublecortin-like kinase 1 (DCLK1 gene.This study provided the first deep and un-biased analysis of schizophrenia-associated transcriptional diversity within the STG, and revealed variants with important implications for the complex pathophysiology of schizophrenia.

  2. Comprehensive transcriptome assembly of Chickpea (Cicer arietinum L. using sanger and next generation sequencing platforms: development and applications.

    Directory of Open Access Journals (Sweden)

    Himabindu Kudapa

    Full Text Available A comprehensive transcriptome assembly of chickpea has been developed using 134.95 million Illumina single-end reads, 7.12 million single-end FLX/454 reads and 139,214 Sanger expressed sequence tags (ESTs from >17 genotypes. This hybrid transcriptome assembly, referred to as Cicer arietinumTranscriptome Assembly version 2 (CaTA v2, available at http://data.comparative-legumes.org/transcriptomes/cicar/lista_cicar-201201, comprising 46,369 transcript assembly contigs (TACs has an N50 length of 1,726 bp and a maximum contig size of 15,644 bp. Putative functions were determined for 32,869 (70.8% of the TACs and gene ontology assignments were determined for 21,471 (46.3%. The new transcriptome assembly was compared with the previously available chickpea transcriptome assemblies as well as to the chickpea genome. Comparative analysis of CaTA v2 against transcriptomes of three legumes - Medicago, soybean and common bean, resulted in 27,771 TACs common to all three legumes indicating strong conservation of genes across legumes. CaTA v2 was also used for identification of simple sequence repeats (SSRs and intron spanning regions (ISRs for developing molecular markers. ISRs were identified by aligning TACs to the Medicago genome, and their putative mapping positions at chromosomal level were identified using transcript map of chickpea. Primer pairs were designed for 4,990 ISRs, each representing a single contig for which predicted positions are inferred and distributed across eight linkage groups. A subset of randomly selected ISRs representing all eight chickpea linkage groups were validated on five chickpea genotypes and showed 20% polymorphism with average polymorphic information content (PIC of 0.27. In summary, the hybrid transcriptome assembly developed and novel markers identified can be used for a variety of applications such as gene discovery, marker-trait association, diversity analysis etc., to advance genetics research and breeding

  3. Melon Transcriptome Characterization: Simple Sequence Repeats and Single Nucleotide Polymorphisms Discovery for High Throughput Genotyping across the Species

    Directory of Open Access Journals (Sweden)

    José Miguel Blanca

    2011-07-01

    Full Text Available Melon ( L. ranks among the highest-valued fruit crops worldwide. Some genomic tools are available for this crop, including a Sanger transcriptome. We report the generation of 689,054 high-quality expressed sequence tags (ESTs from two 454 sequencing runs, using normalized and nonnormalized complementary DNA (cDNA libraries prepared from four genotypes belonging to the two subspecies and the main commercial types. 454 ESTs were combined with the Sanger available ESTs and de novo assembled into 53,252 unigenes. Over 63% of the unigenes were functionally annotated with Gene Ontology (GO terms and 21% had known orthologs of (L. Heynh. Annotation distribution followed similar tendencies than that reported for , suggesting that the dataset represents a fairly complete melon transcriptome. Furthermore, we identified a set of 3298 unigenes with microsatellite motifs and 14,417 sequences with single nucleotide variants of which 11,655 single nucleotide polymorphism met criteria for use with high-throughput genotyping platforms, and 453 could be detected as cleaved amplified polymorphic sequence (CAPS. A set of markers were validated, 90% of them being polymorphic in a number of variable accessions. This transcriptome provides an invaluable new tool for biological research, more so when it includes transcripts not described previously. It is being used for genome annotation and has provided a large collection of markers that will allow speeding up the process of breeding new melon varieties.

  4. Prediction of hybrid performance in maize with transcriptome data

    OpenAIRE

    Zenke-Philippi, Carola Anna Luise

    2017-01-01

    Most studies on genomic prediction of hybrids employ genetic markers as the main carrier of information. Very few use transcriptomic or metabolomic data despite the fact that the end product of gene expression, i.e., the protein, might carry more information than genetic markers. The main goal of the present study was therefore to investigate whether gene expression profiles can be employed successfully for hybrid prediction in maize. With RR-BLUP, similar accuracies were found for ALFP ma...

  5. De Novo Sequencing-Based Transcriptome and Digital Gene Expression Analysis Reveals Insecticide Resistance-Relevant Genes in Propylaea japonica (Thunberg) (Coleoptea: Coccinellidae)

    Science.gov (United States)

    Jin, Feng-Liang; Qiu, Bao-Li; Wu, Jian-Hui; Ren, Shun-Xiang

    2014-01-01

    The ladybird Propylaea japonica (Thunberg) is one of most important natural enemies of aphids in China. This species is threatened by the extensive use of insecticides but genomics-based information on the molecular mechanisms underlying insecticide resistance is limited. Hence, we analyzed the transcriptome and expression profile data of P. japonica in order to gain a deeper understanding of insecticide resistance in ladybirds. We performed de novo assembly of a transcriptome using Illumina's Solexa sequencing technology and short reads. A total of 27,243,552 reads were generated. These were assembled into 81,458 contigs and 33,647 unigenes (6,862 clusters and 26,785 singletons). Of the unigenes, 23,965 (71.22%) have putative homologues in the non-redundant (nr) protein database from NCBI, using BLASTX, with a cut-off E-value of 10−5. We examined COG, GO and KEGG annotations to better understand the functions of these unigenes. Digital gene expression (DGE) libraries showed differences in gene expression profiles between two insecticide resistant strains. When compared with an insecticide susceptible profile, a total of 4,692 genes were significantly up- or down- regulated in a moderately resistant strain. Among these genes, 125 putative insecticide resistance genes were identified. To confirm the DGE results, 16 selected genes were validated using quantitative real time PCR (qRT-PCR). This study is the first to report genetic information on P. japonica and has greatly enriched the sequence data for ladybirds. The large number of gene sequences produced from the transcriptome and DGE sequencing will greatly improve our understanding of this important insect, at the molecular level, and could contribute to the in-depth research into insecticide resistance mechanisms. PMID:24959827

  6. De novo sequencing-based transcriptome and digital gene expression analysis reveals insecticide resistance-relevant genes in Propylaea japonica (Thunberg (Coleoptea: Coccinellidae.

    Directory of Open Access Journals (Sweden)

    Liang-De Tang

    Full Text Available The ladybird Propylaea japonica (Thunberg is one of most important natural enemies of aphids in China. This species is threatened by the extensive use of insecticides but genomics-based information on the molecular mechanisms underlying insecticide resistance is limited. Hence, we analyzed the transcriptome and expression profile data of P. japonica in order to gain a deeper understanding of insecticide resistance in ladybirds. We performed de novo assembly of a transcriptome using Illumina's Solexa sequencing technology and short reads. A total of 27,243,552 reads were generated. These were assembled into 81,458 contigs and 33,647 unigenes (6,862 clusters and 26,785 singletons. Of the unigenes, 23,965 (71.22% have putative homologues in the non-redundant (nr protein database from NCBI, using BLASTX, with a cut-off E-value of 10(-5. We examined COG, GO and KEGG annotations to better understand the functions of these unigenes. Digital gene expression (DGE libraries showed differences in gene expression profiles between two insecticide resistant strains. When compared with an insecticide susceptible profile, a total of 4,692 genes were significantly up- or down- regulated in a moderately resistant strain. Among these genes, 125 putative insecticide resistance genes were identified. To confirm the DGE results, 16 selected genes were validated using quantitative real time PCR (qRT-PCR. This study is the first to report genetic information on P. japonica and has greatly enriched the sequence data for ladybirds. The large number of gene sequences produced from the transcriptome and DGE sequencing will greatly improve our understanding of this important insect, at the molecular level, and could contribute to the in-depth research into insecticide resistance mechanisms.

  7. Transcriptome Sequence and Plasmid Copy Number Analysis of the Brewery Isolate Pediococcus claussenii ATCC BAA-344T during Growth in Beer

    Science.gov (United States)

    Pittet, Vanessa; Phister, Trevor G.; Ziola, Barry

    2013-01-01

    Growth of specific lactic acid bacteria in beer leads to spoiled product and economic loss for the brewing industry. Microbial growth is typically inhibited by the combined stresses found in beer (e.g., ethanol, hops, low pH, minimal nutrients); however, certain bacteria have adapted to grow in this harsh environment. Considering little is known about the mechanisms used by bacteria to grow in and spoil beer, transcriptome sequencing was performed on a variant of the beer-spoilage organism Pediococcus claussenii ATCC BAA-344T (Pc344-358). Illumina sequencing was used to compare the transcript levels in Pc344-358 growing mid-exponentially in beer to those in nutrient-rich MRS broth. Various operons demonstrated high gene expression in beer, several of which are involved in nutrient acquisition and overcoming the inhibitory effects of hop compounds. As well, genes functioning in cell membrane modification and biosynthesis demonstrated significantly higher transcript levels in Pc344-358 growing in beer. Three plasmids had the majority of their genes showing increased transcript levels in beer, whereas the two cryptic plasmids showed slightly decreased gene expression. Follow-up analysis of plasmid copy number in both growth environments revealed similar trends, where more copies of the three non-cryptic plasmids were found in Pc344-358 growing in beer. Transcriptome sequencing also enabled the addition of several genes to the P . claussenii ATCC BAA-344T genome annotation, some of which are putatively transcribed as non-coding RNAs. The sequencing results not only provide the first transcriptome description of a beer-spoilage organism while growing in beer, but they also highlight several targets for future exploration, including genes that may have a role in the general stress response of lactic acid bacteria. PMID:24040005

  8. Deep mRNA sequencing of the Tritonia diomedea brain transcriptome provides access to gene homologues for neuronal excitability, synaptic transmission and peptidergic signalling.

    Directory of Open Access Journals (Sweden)

    Adriano Senatore

    Full Text Available The sea slug Tritonia diomedea (Mollusca, Gastropoda, Nudibranchia, has a simple and highly accessible nervous system, making it useful for studying neuronal and synaptic mechanisms underlying behavior. Although many important contributions have been made using Tritonia, until now, a lack of genetic information has impeded exploration at the molecular level.We performed Illumina sequencing of central nervous system mRNAs from Tritonia, generating 133.1 million 100 base pair, paired-end reads. De novo reconstruction of the RNA-Seq data yielded a total of 185,546 contigs, which partitioned into 123,154 non-redundant gene clusters (unigenes. BLAST comparison with RefSeq and Swiss-Prot protein databases, as well as mRNA data from other invertebrates (gastropod molluscs: Aplysia californica, Lymnaea stagnalis and Biomphalaria glabrata; cnidarian: Nematostella vectensis revealed that up to 76,292 unigenes in the Tritonia transcriptome have putative homologues in other databases, 18,246 of which are below a more stringent E-value cut-off of 1x10-6. In silico prediction of secreted proteins from the Tritonia transcriptome shotgun assembly (TSA produced a database of 579 unique sequences of secreted proteins, which also exhibited markedly higher expression levels compared to other genes in the TSA.Our efforts greatly expand the availability of gene sequences available for Tritonia diomedea. We were able to extract full length protein sequences for most queried genes, including those involved in electrical excitability, synaptic vesicle release and neurotransmission, thus confirming that the transcriptome will serve as a useful tool for probing the molecular correlates of behavior in this species. We also generated a neurosecretome database that will serve as a useful tool for probing peptidergic signalling systems in the Tritonia brain.

  9. Complete genome sequence and transcriptomic analysis of a novel marine strain Bacillus weihaiensis reveals the mechanism of brown algae degradation.

    Science.gov (United States)

    Zhu, Yueming; Chen, Peng; Bao, Yunjuan; Men, Yan; Zeng, Yan; Yang, Jiangang; Sun, Jibin; Sun, Yuanxia

    2016-11-30

    A novel marine strain representing efficient degradation ability toward brown algae was isolated, identified, and assigned to Bacillus weihaiensis Alg07. The alga-associated marine bacteria promote the nutrient cycle and perform important functions in the marine ecosystem. The de novo sequencing of the B. weihaiensis Alg07 genome was carried out. Results of gene annotation and carbohydrate-active enzyme analysis showed that the strain harbored enzymes that can completely degrade alginate and laminarin, which are the specific polysaccharides of brown algae. We also found genes for the utilization of mannitol, the major storage monosaccharide in the cell of brown algae. To understand the process of brown algae decomposition by B. weihaiensis Alg07, RNA-seq transcriptome analysis and qRT-PCR were performed. The genes involved in alginate metabolism were all up-regulated in the initial stage of kelp degradation, suggesting that the strain Alg07 first degrades alginate to destruct the cell wall so that the laminarin and mannitol are released and subsequently decomposed. The key genes involved in alginate and laminarin degradation were expressed in Escherichia coli and characterized. Overall, the model of brown algae degradation by the marine strain Alg07 was established, and novel alginate lyases and laminarinase were discovered.

  10. Efficient development of highly polymorphic microsatellite markers based on polymorphic repeats in transcriptome sequences of multiple individuals.

    Science.gov (United States)

    Vukosavljev, M; Esselink, G D; van 't Westende, W P C; Cox, P; Visser, R G F; Arens, P; Smulders, M J M

    2015-01-01

    The first hurdle in developing microsatellite markers, cloning, has been overcome by next-generation sequencing. The second hurdle is testing to differentiate polymorphic from nonpolymorphic loci. The third hurdle, somewhat hidden, is that only polymorphic markers with a large effective number of alleles are sufficiently informative to be deployed in multiple studies. Both steps are laborious and still performed manually. We have developed a strategy in which we first screen reads from multiple genotypes for repeats that show the most length variants, and only these are subsequently developed into markers. We validated our strategy in tetraploid garden rose using Illumina paired-end transcriptome sequences of 11 roses. Of 48 tested two markers failed to amplify, but all others were polymorphic. Ten loci amplified more than one locus, indicating duplicated genes or gene families. Completely avoiding duplicated loci will be difficult because the range of numbers of predicted alleles of highly polymorphic single- and multilocus markers largely overlapped. Of the remainder, half were replicate markers (i.e. multiple primer pairs for one locus), indicating the difficulty of correctly filtering short reads containing repeat sequences. We subsequently refined the approach to eliminate multiple primer sets to the same loci. The remaining 18 markers were all highly polymorphic, amplifying on average 11.7 alleles per marker (range = 6-20) in 11 tetraploid roses, exceeding the 8.2 alleles per marker of the 24 most polymorphic markers genotyped previously. This strategy therefore represents a major step forward in the development of highly polymorphic microsatellite markers. © 2014 John Wiley & Sons Ltd.

  11. Multiple platform assessment of the EGF dependent transcriptome by microarray and deep tag sequencing analysis

    Science.gov (United States)

    2011-01-01

    Background Epidermal Growth Factor (EGF) is a key regulatory growth factor activating many processes relevant to normal development and disease, affecting cell proliferation and survival. Here we use a combined approach to study the EGF dependent transcriptome of HeLa cells by using multiple long oligonucleotide based microarray platforms (from Agilent, Operon, and Illumina) in combination with digital gene expression profiling (DGE) with the Illumina Genome Analyzer. Results By applying a procedure for cross-platform data meta-analysis based on RankProd and GlobalAncova tests, we establish a well validated gene set with transcript levels altered after EGF treatment. We use this robust gene list to build higher order networks of gene interaction by interconnecting associated networks, supporting and extending the important role of the EGF signaling pathway in cancer. In addition, we find an entirely new set of genes previously unrelated to the currently accepted EGF associated cellular functions. Conclusions We propose that the use of global genomic cross-validation derived from high content technologies (microarrays or deep sequencing) can be used to generate more reliable datasets. This approach should help to improve the confidence of downstream in silico functional inference analyses based on high content data. PMID:21699700

  12. Transcriptome Sequencing in a Tibetan Barley Landrace with High Resistance to Powdery Mildew

    Directory of Open Access Journals (Sweden)

    Xing-Quan Zeng

    2014-01-01

    Full Text Available Hulless barley is an important cereal crop worldwide, especially in Tibet of China. However, this crop is usually susceptible to powdery mildew caused by Blumeria graminis f. sp. hordei. In this study, we aimed to understand the functions and pathways of genes involved in the disease resistance by transcriptome sequencing of a Tibetan barley landrace with high resistance to powdery mildew. A total of 831 significant differentially expressed genes were found in the infected seedlings, covering 19 functions. Either “cell,” “cell part,” and “extracellular region” in the cellular component category or “binding” and “catalytic” in the category of molecular function as well as “metabolic process” and “cellular process” in the biological process category together demonstrated that these functions may be involved in the resistance to powdery mildew of the hulless barley. In addition, 330 KEGG pathways were found using BLASTx with an E-value cut-off of <10−5. Among them, three pathways, namely, “photosynthesis,” “plant-pathogen interaction,” and “photosynthesis-antenna proteins” had significant matches in the database. Significant expressions of the three pathways were detected at 24 h, 48 h, and 96 h after infection, respectively. These results indicated a complex process of barley response to powdery mildew infection.

  13. Whole Blood Transcriptome Sequencing Reveals Gene Expression Differences between Dapulian and Landrace Piglets

    Directory of Open Access Journals (Sweden)

    Jiaqing Hu

    2016-01-01

    Full Text Available There is little genomic information regarding gene expression differences at the whole blood transcriptome level of different pig breeds at the neonatal stage. To solve this, we characterized differentially expressed genes (DEGs in the whole blood of Dapulian (DPL and Landrace piglets using RNA-seq (RNA-sequencing technology. In this study, 83 DEGs were identified between the two breeds. Gene Ontology (GO and Kyoto Encyclopedia of Genes and Genomes (KEGG pathway analyses identified immune response and metabolism as the most commonly enriched terms and pathways in the DEGs. Genes related to immunity and lipid metabolism were more highly expressed in the DPL piglets, while genes related to body growth were more highly expressed in the Landrace piglets. Additionally, the DPL piglets had twofold more single nucleotide polymorphisms (SNPs and alternative splicing (AS than the Landrace piglets. These results expand our knowledge of the genes transcribed in the piglet whole blood of two breeds and provide a basis for future research of the molecular mechanisms underlying the piglet differences.

  14. Transcriptome sequencing and analysis of wild Amur Ide (Leuciscus waleckii inhabiting an extreme alkaline-saline lake reveals insights into stress adaptation.

    Directory of Open Access Journals (Sweden)

    Jian Xu

    Full Text Available BACKGROUND: Amur ide (Leuciscus waleckii is an economically and ecologically important species in Northern Asia. The Dali Nor population inhabiting Dali Nor Lake, a typical saline-alkaline lake in Inner Mongolia, is well-known for its adaptation to extremely high alkalinity. Genome information is needed for conservation and aquaculture purposes, as well as to gain further understanding into the genetics of stress tolerance. The objective of the study is to sequence the transcriptome and obtain a well-assembled transcriptome of Amur ide. RESULTS: The transcriptome of Amur ide was sequenced using the Illumina platform and assembled into 53,632 cDNA contigs, with an average length of 647 bp and a N50 length of 1,094 bp. A total of 19,338 unique proteins were identified, and gene ontology and KEGG (Kyoto Encyclopedia of Genes and Genomes analyses classified all contigs into functional categories. Open Reading Frames (ORFs were detected from 34,888 (65.1% of contigs with an average length of 577 bp, while 9,638 full-length cDNAs were identified. Comparative analyses revealed that 31,790 (59.3% contigs have a significant similarity to zebrafish proteins, and 27,096 (50.5%, 27,524 (51.3% and 27,996 (52.2% to teraodon, medaka and three-spined stickleback proteins, respectively. A total of 10,395 microsatellites and 34,299 SNPs were identified and classified. A dN/dS analysis on unigenes was performed, which identified that 61 of the genes were under strong positive selection. Most of the genes are associated with stress adaptation and immunity, suggesting that the extreme alkaline-saline environment resulted in fast evolution of certain genes. CONCLUSIONS: The transcriptome of Amur ide had been deeply sequenced, assembled and characterized, providing a valuable resource for a better understanding of the Amur ide genome. The transcriptome data will facilitate future functional studies on the Amur ide genome, as well as provide insight into potential

  15. Transcriptional Slippage and RNA Editing Increase the Diversity of Transcripts in Chloroplasts: Insight from Deep Sequencing of Vigna radiata Genome and Transcriptome.

    Directory of Open Access Journals (Sweden)

    Ching-Ping Lin

    Full Text Available We performed deep sequencing of the nuclear and organellar genomes of three mungbean genotypes: Vigna radiata ssp. sublobata TC1966, V. radiata var. radiata NM92 and the recombinant inbred line RIL59 derived from a cross between TC1966 and NM92. Moreover, we performed deep sequencing of the RIL59 transcriptome to investigate transcript variability. The mungbean chloroplast genome has a quadripartite structure including a pair of inverted repeats separated by two single copy regions. A total of 213 simple sequence repeats were identified in the chloroplast genomes of NM92 and RIL59; 78 single nucleotide variants and nine indels were discovered in comparing the chloroplast genomes of TC1966 and NM92. Analysis of the mungbean chloroplast transcriptome revealed mRNAs that were affected by transcriptional slippage and RNA editing. Transcriptional slippage frequency was positively correlated with the length of simple sequence repeats of the mungbean chloroplast genome (R2=0.9911. In total, 41 C-to-U editing sites were found in 23 chloroplast genes and in one intergenic spacer. No editing site that swapped U to C was found. A combination of bioinformatics and experimental methods revealed that the plastid-encoded RNA polymerase-transcribed genes psbF and ndhA are affected by transcriptional slippage in mungbean and in main lineages of land plants, including three dicots (Glycine max, Brassica rapa, and Nicotiana tabacum, two monocots (Oryza sativa and Zea mays, two gymnosperms (Pinus taeda and Ginkgo biloba and one moss (Physcomitrella patens. Transcript analysis of the rps2 gene showed that transcriptional slippage could affect transcripts at single sequence repeat regions with poly-A runs. It showed that transcriptional slippage together with incomplete RNA editing may cause sequence diversity of transcripts in chloroplasts of land plants.

  16. Discovery of Putative Herbicide Resistance Genes and Its Regulatory Network in Chickpea Using Transcriptome Sequencing

    Directory of Open Access Journals (Sweden)

    Mir A. Iquebal

    2017-06-01

    Full Text Available Background: Chickpea (Cicer arietinum L. contributes 75% of total pulse production. Being cheaper than animal protein, makes it important in dietary requirement of developing countries. Weed not only competes with chickpea resulting into drastic yield reduction but also creates problem of harboring fungi, bacterial diseases and insect pests. Chemical approach having new herbicide discovery has constraint of limited lead molecule options, statutory regulations and environmental clearance. Through genetic approach, transgenic herbicide tolerant crop has given successful result but led to serious concern over ecological safety thus non-transgenic approach like marker assisted selection is desirable. Since large variability in tolerance limit of herbicide already exists in chickpea varieties, thus the genes offering herbicide tolerance can be introgressed in variety improvement programme. Transcriptome studies can discover such associated key genes with herbicide tolerance in chickpea.Results: This is first transcriptomic studies of chickpea or even any legume crop using two herbicide susceptible and tolerant genotypes exposed to imidazoline (Imazethapyr. Approximately 90 million paired-end reads generated from four samples were processed and assembled into 30,803 contigs using reference based assembly. We report 6,310 differentially expressed genes (DEGs, of which 3,037 were regulated by 980 miRNAs, 1,528 transcription factors associated with 897 DEGs, 47 Hub proteins, 3,540 putative Simple Sequence Repeat-Functional Domain Marker (SSR-FDM, 13,778 genic Single Nucleotide Polymorphism (SNP putative markers and 1,174 Indels. Randomly selected 20 DEGs were validated using qPCR. Pathway analysis suggested that xenobiotic degradation related gene, glutathione S-transferase (GST were only up-regulated in presence of herbicide. Down-regulation of DNA replication genes and up-regulation of abscisic acid pathway genes were observed. Study further reveals

  17. Sequence comparison of prefrontal cortical brain transcriptome from a tame and an aggressive silver fox (Vulpes vulpes).

    Science.gov (United States)

    Kukekova, Anna V; Johnson, Jennifer L; Teiling, Clotilde; Li, Lewyn; Oskina, Irina N; Kharlamova, Anastasiya V; Gulevich, Rimma G; Padte, Ravee; Dubreuil, Michael M; Vladimirova, Anastasiya V; Shepeleva, Darya V; Shikhevich, Svetlana G; Sun, Qi; Ponnala, Lalit; Temnykh, Svetlana V; Trut, Lyudmila N; Acland, Gregory M

    2011-10-03

    Two strains of the silver fox (Vulpes vulpes), with markedly different behavioral phenotypes, have been developed by long-term selection for behavior. Foxes from the tame strain exhibit friendly behavior towards humans, paralleling the sociability of canine puppies, whereas foxes from the aggressive strain are defensive and exhibit aggression to humans. To understand the genetic differences underlying these behavioral phenotypes fox-specific genomic resources are needed. cDNA from mRNA from pre-frontal cortex of a tame and an aggressive fox was sequenced using the Roche 454 FLX Titanium platform (> 2.5 million reads & 0.9 Gbase of tame fox sequence; >3.3 million reads & 1.2 Gbase of aggressive fox sequence). Over 80% of the fox reads were assembled into contigs. Mapping fox reads against the fox transcriptome assembly and the dog genome identified over 30,000 high confidence fox-specific SNPs. Fox transcripts for approximately 14,000 genes were identified using SwissProt and the dog RefSeq databases. An at least 2-fold expression difference between the two samples (p < 0.05) was observed for 335 genes, fewer than 3% of the total number of genes identified in the fox transcriptome. Transcriptome sequencing significantly expanded genomic resources available for the fox, a species without a sequenced genome. In a very cost efficient manner this yielded a large number of fox-specific SNP markers for genetic studies and provided significant insights into the gene expression profile of the fox pre-frontal cortex; expression differences between the two fox samples; and a catalogue of potentially important gene-specific sequence variants. This result demonstrates the utility of this approach for developing genomic resources in species with limited genomic information.

  18. De novo sequencing, assembly, and analysis of Iris lactea var. chinensis roots' transcriptome in response to salt stress.

    Science.gov (United States)

    Gu, Chunsun; Xu, Sheng; Wang, Zhiquan; Liu, Liangqin; Zhang, Yongxia; Deng, Yanming; Huang, Suzhen

    2018-04-01

    As a halophyte, Iris lactea var. chinensis (I. lactea var. chinensis) is widely distributed and has good drought and heavy metal resistance. Moreover, it is an excellent ornamental plant. I. lactea var. chinensis has extensive application prospects owing to the global impacts of salinization. To better understand its molecular mechanism involved in salt resistance, the de novo sequencing, assembly, and analysis of I. lactea var. chinensis roots' transcriptome in response to salt-stress conditions was performed. On average, 74.17% of the clean reads were mapped to unigenes. A total of 121,093 unigenes were constructed and 56,398 (46.57%) were annotated. Among these, 13,522 differentially expressed genes (DEGs) were identified between salt-treated and control samples Compared to the transcriptional level of control, 7037 DEGs were up-regulated and 6539 down-regulated. In addition, 129 up-regulated and 1609 down-regulated genes were simultaneously detected in all three pairwise comparisons between control and salt-stressed libraries. At least 247 and 250 DEGs encoding transcription factors and transporter proteins were identified. Meanwhile, 130 DEGs regarding reactive oxygen species (ROS) scavenging system were also summarized. Based on real-time quantitative RT-PCR, we verified the changes in the expression patterns of 10 unigenes. Our study identified potential salt-responsive candidate genes and increased the understanding of halophyte responses to salinity stress. Copyright © 2018 Elsevier Masson SAS. All rights reserved.

  19. Deep sequencing of the Camellia sinensis transcriptome revealed candidate genes for major metabolic pathways of tea-specific compounds

    Energy Technology Data Exchange (ETDEWEB)

    Shi, CY; Yang, H; Wei, CL; Yu, O; Zhang, ZZ; Sun, J; Wan, XC

    2011-01-01

    Tea is one of the most popular non-alcoholic beverages worldwide. However, the tea plant, Camellia sinensis, is difficult to culture in vitro, to transform, and has a large genome, rendering little genomic information available. Recent advances in large-scale RNA sequencing (RNA-seq) provide a fast, cost-effective, and reliable approach to generate large expression datasets for functional genomic analysis, which is especially suitable for non-model species with un-sequenced genomes. Using high-throughput Illumina RNA-seq, the transcriptome from poly (A){sup +} RNA of C. sinensis was analyzed at an unprecedented depth (2.59 gigabase pairs). Approximate 34.5 million reads were obtained, trimmed, and assembled into 127,094 unigenes, with an average length of 355 bp and an N50 of 506 bp, which consisted of 788 contig clusters and 126,306 singletons. This number of unigenes was 10-fold higher than existing C. sinensis sequences deposited in GenBank (as of August 2010). Sequence similarity analyses against six public databases (Uniprot, NR and COGs at NCBI, Pfam, InterPro and KEGG) found 55,088 unigenes that could be annotated with gene descriptions, conserved protein domains, or gene ontology terms. Some of the unigenes were assigned to putative metabolic pathways. Targeted searches using these annotations identified the majority of genes associated with several primary metabolic pathways and natural product pathways that are important to tea quality, such as flavonoid, theanine and caffeine biosynthesis pathways. Novel candidate genes of these secondary pathways were discovered. Comparisons with four previously prepared cDNA libraries revealed that this transcriptome dataset has both a high degree of consistency with previous EST data and an approximate 20 times increase in coverage. Thirteen unigenes related to theanine and flavonoid synthesis were validated. Their expression patterns in different organs of the tea plant were analyzed by RT-PCR and quantitative real

  20. Deep sequencing of the Camellia sinensis transcriptome revealed candidate genes for major metabolic pathways of tea-specific compounds

    Directory of Open Access Journals (Sweden)

    Chen Qi

    2011-02-01

    Full Text Available Abstract Background Tea is one of the most popular non-alcoholic beverages worldwide. However, the tea plant, Camellia sinensis, is difficult to culture in vitro, to transform, and has a large genome, rendering little genomic information available. Recent advances in large-scale RNA sequencing (RNA-seq provide a fast, cost-effective, and reliable approach to generate large expression datasets for functional genomic analysis, which is especially suitable for non-model species with un-sequenced genomes. Results Using high-throughput Illumina RNA-seq, the transcriptome from poly (A+ RNA of C. sinensis was analyzed at an unprecedented depth (2.59 gigabase pairs. Approximate 34.5 million reads were obtained, trimmed, and assembled into 127,094 unigenes, with an average length of 355 bp and an N50 of 506 bp, which consisted of 788 contig clusters and 126,306 singletons. This number of unigenes was 10-fold higher than existing C. sinensis sequences deposited in GenBank (as of August 2010. Sequence similarity analyses against six public databases (Uniprot, NR and COGs at NCBI, Pfam, InterPro and KEGG found 55,088 unigenes that could be annotated with gene descriptions, conserved protein domains, or gene ontology terms. Some of the unigenes were assigned to putative metabolic pathways. Targeted searches using these annotations identified the majority of genes associated with several primary metabolic pathways and natural product pathways that are important to tea quality, such as flavonoid, theanine and caffeine biosynthesis pathways. Novel candidate genes of these secondary pathways were discovered. Comparisons with four previously prepared cDNA libraries revealed that this transcriptome dataset has both a high degree of consistency with previous EST data and an approximate 20 times increase in coverage. Thirteen unigenes related to theanine and flavonoid synthesis were validated. Their expression patterns in different organs of the tea plant were

  1. De Novo Transcriptome Sequencing of Desert Herbaceous Achnatherum splendens (Achnatherum Seedlings and Identification of Salt Tolerance Genes

    Directory of Open Access Journals (Sweden)

    Jiangtao Liu

    2016-03-01

    Full Text Available Achnatherum splendens is an important forage herb in Northwestern China. It has a high tolerance to salinity and is, thus, considered one of the most important constructive plants in saline and alkaline areas of land in Northwest China. However, the mechanisms of salt stress tolerance in A. splendens remain unknown. Next-generation sequencing (NGS technologies can be used for global gene expression profiling. In this study, we examined sequence and transcript abundance data for the root/leaf transcriptome of A. splendens obtained using an Illumina HiSeq 2500. Over 35 million clean reads were obtained from the leaf and root libraries. All of the RNA sequencing (RNA-seq reads were assembled de novo into a total of 126,235 unigenes and 36,511 coding DNA sequences (CDS. We further identified 1663 differentially-expressed genes (DEGs between the salt stress treatment and control. Functional annotation of the DEGs by gene ontology (GO, using Arabidopsis and rice as references, revealed enrichment of salt stress-related GO categories, including “oxidation reduction”, “transcription factor activity”, and “ion channel transporter”. Thus, this global transcriptome analysis of A. splendens has provided an important genetic resource for the study of salt tolerance in this halophyte. The identified sequences and their putative functional data will facilitate future investigations of the tolerance of Achnatherum species to various types of abiotic stress.

  2. De novo Transcriptome Sequencing Reveals a Considerable Bias in the Incidence of Simple Sequence Repeats towards the Downstream of ‘Pre-miRNAs’ of Black Pepper

    Science.gov (United States)

    Joy, Nisha; Asha, Srinivasan; Mallika, Vijayan; Soniya, Eppurathu Vasudevan

    2013-01-01

    Next generation sequencing has an advantageon transformational development of species with limited available sequence data as it helps to decode the genome and transcriptome. We carried out the de novo sequencing using illuminaHiSeq™ 2000 to generate the first leaf transcriptome of black pepper (Piper nigrum L.), an important spice variety native to South India and also grown in other tropical regions. Despite the economic and biochemical importance of pepper, a scientifically rigorous study at the molecular level is far from complete due to lack of sufficient sequence information and cytological complexity of its genome. The 55 million raw reads obtained, when assembled using Trinity program generated 2,23,386 contigs and 1,28,157 unigenes. Reports suggest that the repeat-rich genomic regions give rise to small non-coding functional RNAs. MicroRNAs (miRNAs) are the most abundant type of non-coding regulatory RNAs. In spite of the widespread research on miRNAs, little is known about the hair-pin precursors of miRNAs bearing Simple Sequence Repeats (SSRs). We used the array of transcripts generated, for the in silico prediction and detection of ‘43 pre-miRNA candidates bearing different types of SSR motifs’. The analysis identified 3913 different types of SSR motifs with an average of one SSR per 3.04 MB of thetranscriptome. About 0.033% of the transcriptome constituted ‘pre-miRNA candidates bearing SSRs’. The abundance, type and distribution of SSR motifs studied across the hair-pin miRNA precursors, showed a significant bias in the position of SSRs towards the downstream of predicted ‘pre-miRNA candidates’. The catalogue of transcripts identified, together with the demonstration of reliable existence of SSRs in the miRNA precursors, permits future opportunities for understanding the genetic mechanism of black pepper and likely functions of ‘tandem repeats’ in miRNAs. PMID:23469176

  3. De novo transcriptome sequencing reveals a considerable bias in the incidence of simple sequence repeats towards the downstream of 'Pre-miRNAs' of black pepper.

    Directory of Open Access Journals (Sweden)

    Nisha Joy

    Full Text Available Next generation sequencing has an advantageon transformational development of species with limited available sequence data as it helps to decode the genome and transcriptome. We carried out the de novo sequencing using illuminaHiSeq™ 2000 to generate the first leaf transcriptome of black pepper (Piper nigrum L., an important spice variety native to South India and also grown in other tropical regions. Despite the economic and biochemical importance of pepper, a scientifically rigorous study at the molecular level is far from complete due to lack of sufficient sequence information and cytological complexity of its genome. The 55 million raw reads obtained, when assembled using Trinity program generated 2,23,386 contigs and 1,28,157 unigenes. Reports suggest that the repeat-rich genomic regions give rise to small non-coding functional RNAs. MicroRNAs (miRNAs are the most abundant type of non-coding regulatory RNAs. In spite of the widespread research on miRNAs, little is known about the hair-pin precursors of miRNAs bearing Simple Sequence Repeats (SSRs. We used the array of transcripts generated, for the in silico prediction and detection of '43 pre-miRNA candidates bearing different types of SSR motifs'. The analysis identified 3913 different types of SSR motifs with an average of one SSR per 3.04 MB of thetranscriptome. About 0.033% of the transcriptome constituted 'pre-miRNA candidates bearing SSRs'. The abundance, type and distribution of SSR motifs studied across the hair-pin miRNA precursors, showed a significant bias in the position of SSRs towards the downstream of predicted 'pre-miRNA candidates'. The catalogue of transcripts identified, together with the demonstration of reliable existence of SSRs in the miRNA precursors, permits future opportunities for understanding the genetic mechanism of black pepper and likely functions of 'tandem repeats' in miRNAs.

  4. Identification and Characterization of miRNA Transcriptome in Asiatic Cotton (Gossypium arboreum) Using High Throughput Sequencing.

    Science.gov (United States)

    Farooq, Muhammad; Mansoor, Shahid; Guo, Hui; Amin, Imran; Chee, Peng W; Azim, M Kamran; Paterson, Andrew H

    2017-01-01

    MicroRNAs (miRNAs) are small 20-24nt molecules that have been well studied over the past decade due to their important regulatory roles in different cellular processes. The mature sequences are more conserved across vast phylogenetic scales than their precursors and some are conserved within entire kingdoms, hence, their loci and function can be predicted by homology searches. Different studies have been performed to elucidate miRNAs using de novo prediction methods but due to complex regulatory mechanisms or false positive in silico predictions, not all of them express in reality and sometimes computationally predicted mature transcripts differ from the actual expressed ones. With the availability of a complete genome sequence of Gossypium arboreum , it is important to annotate the genome for both coding and non-coding regions using high confidence transcript evidence, for this cotton species that is highly resistant to various biotic and abiotic stresses. Here we have analyzed the small RNA transcriptome of G. arboreum leaves and provided genome annotation of miRNAs with evidence from miRNA/miRNA ∗ transcripts. A total of 446 miRNAs clustered into 224 miRNA families were found, among which 48 families are conserved in other plants and 176 are novel. Four short RNA libraries were used to shortlist best predictions based on high reads per million. The size, origin, copy numbers and transcript depth of all miRNAs along with their isoforms and targets has been reported. The highest gene copy number was observed for gar-miR7504 followed by gar-miR166, gar-miR8771, gar-miR156, and gar-miR7484. Altogether, 1274 target genes were found in G. arboreum that are enriched for 216 KEGG pathways. The resultant genomic annotations are provided in UCSC, BED format.

  5. De novo characterization of the Dialeurodes citri transcriptome: mining genes involved in stress resistance and simple sequence repeats (SSRs) discovery.

    Science.gov (United States)

    Chen, E-H; Wei, D-D; Shen, G-M; Yuan, G-R; Bai, P-P; Wang, J-J

    2014-02-01

    The citrus whitefly, Dialeurodes citri (Ashmead), is one of the three economically important whitefly species that infest citrus plants around the world; however, limited genetic research has been focused on D. citri, partly because of lack of genomic resources. In this study, we performed de novo assembly of a transcriptome using Illumina paired-end sequencing technology (Illumina Inc., San Diego, CA, USA). In total, 36,766 unigenes with a mean length of 497 bp were identified. Of these unigenes, we identified 17,788 matched known proteins in the National Center for Biotechnology Information database, as determined by Blast search, with 5731, 4850 and 14,441 unigenes assigned to clusters of orthologous groups (COG), gene ontology (GO), and SwissProt, respectively. In total, 7507 unigenes were assigned to 308 known pathways. In-depth analysis of the data showed that 117 unigenes were identified as potentially involved in the detoxification of xenobiotics and 67 heat shock protein (Hsp) genes were associated with environmental stress. In addition, these enzymes were searched against the GO and COG database, and the results showed that the three major detoxification enzymes and Hsps were classified into 18 and 3, 6, and 8 annotations, respectively. In addition, 149 simple sequence repeats were detected. The results facilitate the investigation of molecular resistance mechanisms to insecticides and environmental stress, and contribute to molecular marker development. The findings greatly improve our genetic understanding of D. citri, and lay the foundation for future functional genomics studies on this species. © 2013 The Royal Entomological Society.

  6. Transcriptome profiling analysis of Mactra veneriformis by deep sequencing after exposure to 2,2',4,4'-tetrabromodiphenyl ether

    Science.gov (United States)

    Shi, Pengju; Dong, Shihang; Zhang, Huanjun; Wang, Peiliang; Niu, Zhuang; Fang, Yan

    2017-06-01

    Polybrominated diphenyl ethers (PBDEs) are ubiquitous global pollutants, which are known to have immune, development, reproduction, and endocrine toxicity in aquatic organisms, including bivalves. 2,2',4,4'-Tetrabromodiphenyl ether (BDE-47) is the predominant PBDE congener detected in environmental samples and the tissues of organisms. However, the mechanism of its toxicity remains unclear. In this study, high-throughput sequencing was performed using the clam Mactra veneriformis, a good model for toxicological research, to clarify the transcriptomic response to BDE-47 and the mechanism responsible for the toxicity of BDE-47. The clams were exposed to 5 μg/L BDE-47 for 3 days and the digestive glands were sampled for high-throughput sequencing analysis. We obtained 127648, 154225, and 124985 unigenes by de novo assembly of the control group reads (CG), BDE-47 group reads (BDEG), and control and BDE-47 reads (CG & BDEG), respectively. We annotated 32176 unigenes from the CG & BDEG reads using the NR database. We categorized 24401 unigenes into 25 functional COG clusters and 21749 unigenes were assigned to 259 KEGG pathways. Moreover, 17625 differentially expressed genes (DEGs) were detected, with 10028 upregulated DEGs and 7597 downregulated DEGs. Functional enrichment analysis showed that the DEGs were involved with detoxification, antioxidant defense, immune response, apoptosis, and other functions. The mRNA expression levels of 26 DEGs were verified by quantitative real-time PCR, which demonstrated the high agreement between the two methods. These results provide a good basis for future research using the M. veneriformis model into the mechanism of PBDEs toxicity and molecular biomarkers for BDE-47 pollution. The regulation and interaction of the DEGs would be studied in the future for clarifying the mechanism of PBDEs toxicity.

  7. De novo Transcriptome Analysis of Chinese Citrus Fly, Bactrocera minax (Diptera: Tephritidae), by High-Throughput Illumina Sequencing.

    Science.gov (United States)

    Wang, Jia; Xiong, Ke-Cai; Liu, Ying-Hong

    2016-01-01

    The Chinese citrus fly, Bactrocera minax (Enderlein), is one of the most devastating pests of citrus in the temperate areas of Asia. So far, studies involving molecular biology and physiology of B. minax are still scarce, partly because of the lack of genomic information and inability to rear this insect in laboratory. In this study, de novo assembly of a transcriptome was performed using Illumina sequencing technology. A total of 20,928,907 clean reads were obtained and assembled into 33,324 unigenes, with an average length of 908.44 bp. Unigenes were annotated by alignment against NCBI non-redundant protein (Nr), Swiss-Prot, Clusters of Orthologous Groups (COG), Gene Ontology (GO), and Kyoto Encyclopedia of Genes and Genomes Pathway (KEGG) database. Genes potentially involved in stress tolerance, including 20 heat shock protein (Hsps) genes, 26 glutathione S-transferases (GSTs) genes, and 2 ferritin subunit genes, were identified. These genes may play roles in stress tolerance in B. minax diapause stage. It has previously been found that 20E application on B. minax pupae could avert diapause, but the underlying mechanisms remain unknown. Thus, genes encoding enzymes in 20E biosynthesis pathway, including Neverland, Spook, Phantom, Disembodied, Shadow, Shade, and Cyp18a1, and genes encoding 20E receptor proteins, ecdysone receptor (EcR) and ultraspiracle (USP), were identified. The expression patterns of 20E-related genes among developmental stages and between 20E-treated and untreated pupae demonstrated their roles in diapause program. In addition, 1,909 simple sequence repeats (SSRs) were detected, which will contribute to molecular marker development. The findings in this study greatly improve our genetic understanding of B. minax, and lay the foundation for future studies on this species.

  8. De novo Transcriptome Analysis of Chinese Citrus Fly, Bactrocera minax (Diptera: Tephritidae, by High-Throughput Illumina Sequencing.

    Directory of Open Access Journals (Sweden)

    Jia Wang

    Full Text Available The Chinese citrus fly, Bactrocera minax (Enderlein, is one of the most devastating pests of citrus in the temperate areas of Asia. So far, studies involving molecular biology and physiology of B. minax are still scarce, partly because of the lack of genomic information and inability to rear this insect in laboratory. In this study, de novo assembly of a transcriptome was performed using Illumina sequencing technology. A total of 20,928,907 clean reads were obtained and assembled into 33,324 unigenes, with an average length of 908.44 bp. Unigenes were annotated by alignment against NCBI non-redundant protein (Nr, Swiss-Prot, Clusters of Orthologous Groups (COG, Gene Ontology (GO, and Kyoto Encyclopedia of Genes and Genomes Pathway (KEGG database. Genes potentially involved in stress tolerance, including 20 heat shock protein (Hsps genes, 26 glutathione S-transferases (GSTs genes, and 2 ferritin subunit genes, were identified. These genes may play roles in stress tolerance in B. minax diapause stage. It has previously been found that 20E application on B. minax pupae could avert diapause, but the underlying mechanisms remain unknown. Thus, genes encoding enzymes in 20E biosynthesis pathway, including Neverland, Spook, Phantom, Disembodied, Shadow, Shade, and Cyp18a1, and genes encoding 20E receptor proteins, ecdysone receptor (EcR and ultraspiracle (USP, were identified. The expression patterns of 20E-related genes among developmental stages and between 20E-treated and untreated pupae demonstrated their roles in diapause program. In addition, 1,909 simple sequence repeats (SSRs were detected, which will contribute to molecular marker development. The findings in this study greatly improve our genetic understanding of B. minax, and lay the foundation for future studies on this species.

  9. Identification and Characterization of miRNA Transcriptome in Asiatic Cotton (Gossypium arboreum Using High Throughput Sequencing

    Directory of Open Access Journals (Sweden)

    Muhammad Farooq

    2017-06-01

    Full Text Available MicroRNAs (miRNAs are small 20–24nt molecules that have been well studied over the past decade due to their important regulatory roles in different cellular processes. The mature sequences are more conserved across vast phylogenetic scales than their precursors and some are conserved within entire kingdoms, hence, their loci and function can be predicted by homology searches. Different studies have been performed to elucidate miRNAs using de novo prediction methods but due to complex regulatory mechanisms or false positive in silico predictions, not all of them express in reality and sometimes computationally predicted mature transcripts differ from the actual expressed ones. With the availability of a complete genome sequence of Gossypium arboreum, it is important to annotate the genome for both coding and non-coding regions using high confidence transcript evidence, for this cotton species that is highly resistant to various biotic and abiotic stresses. Here we have analyzed the small RNA transcriptome of G. arboreum leaves and provided genome annotation of miRNAs with evidence from miRNA/miRNA∗ transcripts. A total of 446 miRNAs clustered into 224 miRNA families were found, among which 48 families are conserved in other plants and 176 are novel. Four short RNA libraries were used to shortlist best predictions based on high reads per million. The size, origin, copy numbers and transcript depth of all miRNAs along with their isoforms and targets has been reported. The highest gene copy number was observed for gar-miR7504 followed by gar-miR166, gar-miR8771, gar-miR156, and gar-miR7484. Altogether, 1274 target genes were found in G. arboreum that are enriched for 216 KEGG pathways. The resultant genomic annotations are provided in UCSC, BED format.

  10. Comparison of grouper infection with two different iridoviruses using transcriptome sequencing and multiple reference species selection.

    Science.gov (United States)

    Liu, Chun-Cheng; Ho, Li-Ping; Yang, Cin-Hang; Kao, Tsung-Yu; Chou, Hsin-Yiu; Pai, Tun-Wen

    2017-12-01

    Due to high-density aquafarming in Taiwan, groupers are commonly infected with two different iridoviruses: Megalocytivirus (grouper iridovirus of Taiwan, TGIV) and Ranavirus (grouper iridovirus, GIV). Iridoviral diseases cause mass mortality, and surviving fish retain these pathogens, which can then be horizontally transferred. These viruses have therefore become a major challenge for grouper aquaculture. In this study, comparisons of the biological responses of groupers to infection with these two different iridoviruses were performed. A novel approach for transcriptomic analysis was proposed to enhance the discovery of differentially expressed genes and associated biological pathways. In this method, suitable and available reference species are selected from the NCBI taxonomy tree and the Ensembl and KEGG databases instead of either choosing only one model species or adopting the NCBI non-redundant dataset as references. Our results show that selection of multiple appropriate model species as references increases the efficiency and performance of analyses compared to those of traditional approaches. Using this method, 17 shared pathways and 5 specific pathways were found to be significantly differentially expressed following infection with the two iridoviruses, among which 11 pathways were additionally identified based on the proposed method of multiple reference species selection. Among the pathways responsive to infection with a specific iridovirus, the spliceosomal pathway (ko03040; p-value = 0.0011) was exclusively associated with TGIV infection, while the glycolysis/gluconeogenesis pathway (ko00010; p-value = 0.0032) was associated with GIV infection. These findings and designed corresponding biological experiments may facilitate a deeper understanding of the mechanisms by which both TGIV and GIV cause fatal infections, as well as the ways in which they induce different pathologies and symptoms. We believe that the proposed novel mechanism for de novo

  11. Whole RNA-Sequencing and Transcriptome Assembly of Candida albicans and Candida africana under Chlamydospore-Inducing Conditions.

    Science.gov (United States)

    Giosa, Domenico; Felice, Maria Rosa; Lawrence, Travis J; Gulati, Megha; Scordino, Fabio; Giuffrè, Letterio; Lo Passo, Carla; D'Alessandro, Enrico; Criseo, Giuseppe; Ardell, David H; Hernday, Aaron D; Nobile, Clarissa J; Romeo, Orazio

    2017-07-01

    Candida albicans is the most common cause of life-threatening fungal infections in humans, especially in immunocompromised individuals. Crucial to its success as an opportunistic pathogen is the considerable dynamism of its genome, which readily undergoes genetic changes generating new phenotypes and shaping the evolution of new strains. Candida africana is an intriguing C. albicans biovariant strain that exhibits remarkable genetic and phenotypic differences when compared with standard C. albicans isolates. Candida africana is well-known for its low degree of virulence compared with C. albicans and for its inability to produce chlamydospores that C. albicans, characteristically, produces under certain environmental conditions. Chlamydospores are large, spherical structures, whose biological function is still unknown. For this reason, we have sequenced, assembled, and annotated the whole transcriptomes obtained from an efficient C. albicans chlamydospore-producing clinical strain (GE1), compared with the natural chlamydospore-negative C. africana clinical strain (CBS 11016). The transcriptomes of both C. albicans (GE1) and C. africana (CBS 11016) clinical strains, grown under chlamydospore-inducing conditions, were sequenced and assembled into 7,442 (GE1 strain) and 8,370 (CBS 11016 strain) high quality transcripts, respectively. The release of the first assembly of the C. africana transcriptome will allow future comparative studies to better understand the biology and evolution of this important human fungal pathogen. © The Author 2017. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.

  12. Inference of Interactions in Cyanobacterial-Heterotrophic Co-Cultures via Transcriptome Sequencing

    Energy Technology Data Exchange (ETDEWEB)

    Beliaev, Alex S.; Romine, Margaret F.; Serres, Margaret; Bernstein, Hans C.; Linggi, Bryan E.; Markillie, Lye Meng; Isern, Nancy G.; Chrisler, William B.; Kucek, Leo A.; Hill, Eric A.; Pinchuk, Grigoriy; Bryant, Donald A.; Wiley, H. S.; Fredrickson, Jim K.; Konopka, Allan

    2014-04-29

    We employed deep sequencing technology to identify transcriptional adaptation of the euryhaline unicellular cyanobacterium Synechococcus sp. PCC 7002 and the marine facultative aerobe Shewanella putrefaciens W3-18-1 to growth in a co-culture and infer the effect of carbon flux distributions on photoautotroph-heterotroph interactions. The overall transcriptome response of both organisms to co-cultivation was shaped by their respective physiologies and growth constraints. Carbon limitation resulted in the expansion of metabolic capacities which was manifested through the transcriptional upregulation of transport and catabolic pathways. While growth coupling occurred via lactate oxidation or secretion of photosynthetically fixed carbon, there was evidence of specific metabolic interactions between the two organisms. On one hand, the production and excretion of specific amino acids (methionine and alanine) by the cyanobacterium correlated with the putative downregulation of the corresponding biosynthetic machinery of Shewanella W3-18-1. On the other hand, the broad and consistent decrease of mRNA levels for many Fe-regulated Synechococcus 7002 genes during co-cultivation suggested increased Fe availability as well as more facile and energy-efficient mechanisms for Fe acquisition by the cyanobacterium. Furthermore, evidence pointed at potentially novel interactions between oxygenic photoautotrophs and heterotrophs related to the oxidative stress response as transcriptional patterns suggested that Synechococcus 7002 rather than Shewanella W3-18-1 provided scavenging functions for reactive oxygen species under co-culture conditions. This study provides an initial insight into the complexity of photoautotrophic-heterotrophic interactions and brings new perspectives of their role in the robustness and stability of the association.

  13. Deep Sequencing Transcriptome Analysis of Murine Wound Healing: Effects of a Multicomponent, Multitarget Natural Product Therapy-Tr14.

    Science.gov (United States)

    St Laurent, Georges; Seilheimer, Bernd; Tackett, Michael; Zhou, Jianhua; Shtokalo, Dmitry; Vyatkin, Yuri; Ri, Maxim; Toma, Ian; Jones, Dan; McCaffrey, Timothy A

    2017-01-01

    Wound healing involves an orchestrated response that engages multiple processes, such as hemostasis, cellular migration, extracellular matrix synthesis, and in particular, inflammation. Using a murine model of cutaneous wound repair, the transcriptome was mapped from 12 h to 8 days post-injury, and in response to a multicomponent, multi-target natural product, Tr14. Using single-molecule RNA sequencing (RNA-seq), there were clear temporal changes in known transcripts related to wound healing pathways, and additional novel transcripts of both coding and non-coding genes. Tr14 treatment modulated >100 transcripts related to key wound repair pathways, such as response to wounding, wound contraction, and cytokine response. The results provide the most precise and comprehensive characterization to date of the transcriptome's response to skin damage, repair, and multicomponent natural product therapy. By understanding the wound repair process, and the effects of natural products, it should be possible to intervene more effectively in diseases involving aberrant repair.

  14. Identification of gene fusion transcripts by transcriptome sequencing in BRCA1-mutated breast cancers and cell lines.

    Science.gov (United States)

    Ha, Kevin C H; Lalonde, Emilie; Li, Lili; Cavallone, Luca; Natrajan, Rachael; Lambros, Maryou B; Mitsopoulos, Costas; Hakas, Jarle; Kozarewa, Iwanka; Fenwick, Kerry; Lord, Chris J; Ashworth, Alan; Vincent-Salomon, Anne; Basik, Mark; Reis-Filho, Jorge S; Majewski, Jacek; Foulkes, William D

    2011-10-27

    Gene fusions arising from chromosomal translocations have been implicated in cancer. However, the role of gene fusions in BRCA1-related breast cancers is not well understood. Mutations in BRCA1 are associated with an increased risk for breast cancer (up to 80% lifetime risk) and ovarian cancer (up to 50%). We sought to identify putative gene fusions in the transcriptomes of these cancers using high-throughput RNA sequencing (RNA-Seq). We used Illumina sequencing technology to sequence the transcriptomes of five BRCA1-mutated breast cancer cell lines, three BRCA1-mutated primary tumors, two secretory breast cancer primary tumors and one non-tumorigenic breast epithelial cell line. Using a bioinformatics approach, our initial attempt at discovering putative gene fusions relied on analyzing single-end reads and identifying reads that aligned across exons of two different genes. Subsequently, latter samples were sequenced with paired-end reads and at longer cycles (producing longer reads). We then refined our approach by identifying misaligned paired reads, which may flank a putative gene fusion junction. As a proof of concept, we were able to identify two previously characterized gene fusions in our samples using both single-end and paired-end approaches. In addition, we identified three novel in-frame fusions, but none were recurrent. Two of the candidates, WWC1-ADRBK2 in HCC3153 cell line and ADNP-C20orf132 in a primary tumor, were confirmed by Sanger sequencing and RT-PCR. RNA-Seq expression profiling of these two fusions showed a distinct overexpression of the 3' partner genes, suggesting that its expression may be under the control of the 5' partner gene's regulatory elements. In this study, we used both single-end and paired-end sequencing strategies to discover gene fusions in breast cancer transcriptomes with BRCA1 mutations. We found that the use of paired-end reads is an effective tool for transcriptome profiling of gene fusions. Our findings suggest that

  15. Identification of gene fusion transcripts by transcriptome sequencing in BRCA1-mutated breast cancers and cell lines

    Directory of Open Access Journals (Sweden)

    Ha Kevin CH

    2011-10-01

    Full Text Available Abstract Background Gene fusions arising from chromosomal translocations have been implicated in cancer. However, the role of gene fusions in BRCA1-related breast cancers is not well understood. Mutations in BRCA1 are associated with an increased risk for breast cancer (up to 80% lifetime risk and ovarian cancer (up to 50%. We sought to identify putative gene fusions in the transcriptomes of these cancers using high-throughput RNA sequencing (RNA-Seq. Methods We used Illumina sequencing technology to sequence the transcriptomes of five BRCA1-mutated breast cancer cell lines, three BRCA1-mutated primary tumors, two secretory breast cancer primary tumors and one non-tumorigenic breast epithelial cell line. Using a bioinformatics approach, our initial attempt at discovering putative gene fusions relied on analyzing single-end reads and identifying reads that aligned across exons of two different genes. Subsequently, latter samples were sequenced with paired-end reads and at longer cycles (producing longer reads. We then refined our approach by identifying misaligned paired reads, which may flank a putative gene fusion junction. Results As a proof of concept, we were able to identify two previously characterized gene fusions in our samples using both single-end and paired-end approaches. In addition, we identified three novel in-frame fusions, but none were recurrent. Two of the candidates, WWC1-ADRBK2 in HCC3153 cell line and ADNP-C20orf132 in a primary tumor, were confirmed by Sanger sequencing and RT-PCR. RNA-Seq expression profiling of these two fusions showed a distinct overexpression of the 3' partner genes, suggesting that its expression may be under the control of the 5' partner gene's regulatory elements. Conclusions In this study, we used both single-end and paired-end sequencing strategies to discover gene fusions in breast cancer transcriptomes with BRCA1 mutations. We found that the use of paired-end reads is an effective tool for

  16. Application of the whole-transcriptome shotgun sequencing approach to the study of Philadelphia-positive acute lymphoblastic leukemia

    International Nuclear Information System (INIS)

    Iacobucci, I; Ferrarini, A; Sazzini, M; Giacomelli, E; Lonetti, A; Xumerle, L; Ferrari, A; Papayannidis, C; Malerba, G; Luiselli, D; Boattini, A; Garagnani, P; Vitale, A; Soverini, S; Pane, F; Baccarani, M; Delledonne, M; Martinelli, G

    2012-01-01

    Although the pathogenesis of BCR–ABL1-positive acute lymphoblastic leukemia (ALL) is mainly related to the expression of the BCR–ABL1 fusion transcript, additional cooperating genetic lesions are supposed to be involved in its development and progression. Therefore, in an attempt to investigate the complex landscape of mutations, changes in expression profiles and alternative splicing (AS) events that can be observed in such disease, the leukemia transcriptome of a BCR–ABL1-positive ALL patient at diagnosis and at relapse was sequenced using a whole-transcriptome shotgun sequencing (RNA-Seq) approach. A total of 13.9 and 15.8 million sequence reads was generated from de novo and relapsed samples, respectively, and aligned to the human genome reference sequence. This led to the identification of five validated missense mutations in genes involved in metabolic processes (DPEP1, TMEM46), transport (MVP), cell cycle regulation (ABL1) and catalytic activity (CTSZ), two of which resulted in acquired relapse variants. In all, 6390 and 4671 putative AS events were also detected, as well as expression levels for 18 315 and 18 795 genes, 28% of which were differentially expressed in the two disease phases. These data demonstrate that RNA-Seq is a suitable approach for identifying a wide spectrum of genetic alterations potentially involved in ALL

  17. De novo assembly, gene annotation, and marker discovery in stored-product pest Liposcelis entomophila (Enderlein using transcriptome sequences.

    Directory of Open Access Journals (Sweden)

    Dan-Dan Wei

    Full Text Available BACKGROUND: As a major stored-product pest insect, Liposcelis entomophila has developed high levels of resistance to various insecticides in grain storage systems. However, the molecular mechanisms underlying resistance and environmental stress have not been characterized. To date, there is a lack of genomic information for this species. Therefore, studies aimed at profiling the L. entomophila transcriptome would provide a better understanding of the biological functions at the molecular levels. METHODOLOGY/PRINCIPAL FINDINGS: We applied Illumina sequencing technology to sequence the transcriptome of L. entomophila. A total of 54,406,328 clean reads were obtained and that de novo assembled into 54,220 unigenes, with an average length of 571 bp. Through a similarity search, 33,404 (61.61% unigenes were matched to known proteins in the NCBI non-redundant (Nr protein database. These unigenes were further functionally annotated with gene ontology (GO, cluster of orthologous groups of proteins (COG, and Kyoto Encyclopedia of Genes and Genomes (KEGG databases. A large number of genes potentially involved in insecticide resistance were manually curated, including 68 putative cytochrome P450 genes, 37 putative glutathione S-transferase (GST genes, 19 putative carboxyl/cholinesterase (CCE genes, and other 126 transcripts to contain target site sequences or encoding detoxification genes representing eight types of resistance enzymes. Furthermore, to gain insight into the molecular basis of the L. entomophila toward thermal stresses, 25 heat shock protein (Hsp genes were identified. In addition, 1,100 SSRs and 57,757 SNPs were detected and 231 pairs of SSR primes were designed for investigating the genetic diversity in future. CONCLUSIONS/SIGNIFICANCE: We developed a comprehensive transcriptomic database for L. entomophila. These sequences and putative molecular markers would further promote our understanding of the molecular mechanisms underlying

  18. The venom gland transcriptome of Latrodectus tredecimguttatus revealed by deep sequencing and cDNA library analysis.

    Directory of Open Access Journals (Sweden)

    Quanze He

    Full Text Available Latrodectus tredecimguttatus, commonly known as black widow spider, is well known for its dangerous bite. Although its venom has been characterized extensively, some fundamental questions about its molecular composition remain unanswered. The limited transcriptome and genome data available prevent further understanding of spider venom at the molecular level. In the present study, we combined next-generation sequencing and conventional DNA sequencing to construct a venom gland transcriptome of the spider L. tredecimguttatus, which resulted in the identification of 9,666 and 480 high-confidence proteins among 34,334 de novo sequences and 1,024 cDNA sequences, respectively, by assembly, translation, filtering, quantification and annotation. Extensive functional analyses of these proteins indicated that mRNAs involved in RNA transport and spliceosome, protein translation, processing and transport were highly enriched in the venom gland, which is consistent with the specific function of venom glands, namely the production of toxins. Furthermore, we identified 146 toxin-like proteins forming 12 families, including 6 new families in this spider in which α-LTX-Lt1a family2 is firstly identified as a subfamily of α-LTX-Lt1a family. The toxins were classified according to their bioactivities into five categories that functioned in a coordinate way. Few ion channels were expressed in venom gland cells, suggesting a possible mechanism of protection from the attack of their own toxins. The present study provides a gland transcriptome profile and extends our understanding of the toxinome of spiders and coordination mechanism for toxin production in protein expression quantity.

  19. Sequencing and de novo assembly of the Asian clam (Corbicula fluminea) transcriptome using the Illumina GAIIx method.

    Science.gov (United States)

    Chen, Huihui; Zha, Jinmiao; Liang, Xuefang; Bu, Jihong; Wang, Miao; Wang, Zijian

    2013-01-01

    The Asian clam (Corbicula fluminea) is currently one of the most economically important aquatic species in China and has been used as a test organism in many environmental studies. However, the lack of genomic resources, such as sequenced genome, expressed sequence tags (ESTs) and transcriptome sequences has hindered the research on C. fluminea. Recent advances in large-scale RNA-Seq enable generation of genomic resources in a short time, and provide large expression datasets for functional genomic analysis. We used a next-generation high-throughput DNA sequencing technique with an Illumina GAIIx method to analyze the transcriptome from the whole bodies of C. fluminea. More than 62,250,336 high-quality reads were generated based on the raw data, and 134,684 unigenes with a mean length of 791 bp were assembled using the Velvet and Oases software. All of the assembly unigenes were annotated by running BLASTx and BLASTn similarity searches on the Nt, Nr, Swiss-Prot, COG and KEGG databases. In addition, the Clusters of Orthologous Groups (COGs), Gene Ontology (GO) terms and Kyoto Encyclopedia of Gene and Genome (KEGG) annotations were also assigned to each unigene transcript. To provide a preliminary verification of the assembly and annotation results, and search for potential environmental pollution biomarkers, 15 functional genes (five antioxidase genes, two cytochrome P450 genes, three GABA receptor-related genes and five heat shock protein genes) were cloned and identified. Expressions of the 15 selected genes following fluoxetine exposure confirmed that the genes are indeed linked to environmental stress. The C. fluminea transcriptome advances the underlying molecular understanding of this freshwater clam, provides a basis for further exploration of C. fluminea as an environmental test organism and promotes further studies on other bivalve organisms.

  20. Sequencing and de novo assembly of the Asian clam (Corbicula fluminea transcriptome using the Illumina GAIIx method.

    Directory of Open Access Journals (Sweden)

    Huihui Chen

    Full Text Available BACKGROUND: The Asian clam (Corbicula fluminea is currently one of the most economically important aquatic species in China and has been used as a test organism in many environmental studies. However, the lack of genomic resources, such as sequenced genome, expressed sequence tags (ESTs and transcriptome sequences has hindered the research on C. fluminea. Recent advances in large-scale RNA-Seq enable generation of genomic resources in a short time, and provide large expression datasets for functional genomic analysis. METHODOLOGY/PRINCIPAL FINDINGS: We used a next-generation high-throughput DNA sequencing technique with an Illumina GAIIx method to analyze the transcriptome from the whole bodies of C. fluminea. More than 62,250,336 high-quality reads were generated based on the raw data, and 134,684 unigenes with a mean length of 791 bp were assembled using the Velvet and Oases software. All of the assembly unigenes were annotated by running BLASTx and BLASTn similarity searches on the Nt, Nr, Swiss-Prot, COG and KEGG databases. In addition, the Clusters of Orthologous Groups (COGs, Gene Ontology (GO terms and Kyoto Encyclopedia of Gene and Genome (KEGG annotations were also assigned to each unigene transcript. To provide a preliminary verification of the assembly and annotation results, and search for potential environmental pollution biomarkers, 15 functional genes (five antioxidase genes, two cytochrome P450 genes, three GABA receptor-related genes and five heat shock protein genes were cloned and identified. Expressions of the 15 selected genes following fluoxetine exposure confirmed that the genes are indeed linked to environmental stress. CONCLUSIONS/SIGNIFICANCE: The C. fluminea transcriptome advances the underlying molecular understanding of this freshwater clam, provides a basis for further exploration of C. fluminea as an environmental test organism and promotes further studies on other bivalve organisms.

  1. Identification of expressed resistance gene-like sequences by data mining in 454-derived transcriptomic sequences of common bean (Phaseolus vulgaris L.)

    Science.gov (United States)

    2012-01-01

    Background Common bean (Phaseolus vulgaris L.) is one of the most important legumes in the world. Several diseases severely reduce bean production and quality; therefore, it is very important to better understand disease resistance in common bean in order to prevent these losses. More than 70 resistance (R) genes which confer resistance against various pathogens have been cloned from diverse plant species. Most R genes share highly conserved domains which facilitates the identification of new candidate R genes from the same species or other species. The goals of this study were to isolate expressed R gene-like sequences (RGLs) from 454-derived transcriptomic sequences and expressed sequence tags (ESTs) of common bean, and to develop RGL-tagged molecular markers. Results A data-mining approach was used to identify tentative P. vulgaris R gene-like sequences from approximately 1.69 million 454-derived sequences and 116,716 ESTs deposited in GenBank. A total of 365 non-redundant sequences were identified and named as common bean (P. vulgaris = Pv) resistance gene-like sequences (PvRGLs). Among the identified PvRGLs, about 60% (218 PvRGLs) were from 454-derived sequences. Reverse transcriptase-polymerase chain reaction (RT-PCR) analysis confirmed that PvRGLs were actually expressed in the leaves of common bean. Upon comparison to P. vulgaris genomic sequences, 105 (28.77%) of the 365 tentative PvRGLs could be integrated into the existing common bean physical map. Based on the syntenic blocks between common bean and soybean, 237 (64.93%) PvRGLs were anchored on the P. vulgaris genetic map and will need to be mapped to determine order. In addition, 11 sequence-tagged-site (STS) and 19 cleaved amplified polymorphic sequence (CAPS) molecular markers were developed for 25 unique PvRGLs. Conclusions In total, 365 PvRGLs were successfully identified from 454-derived transcriptomic sequences and ESTs available in GenBank and about 65% of PvRGLs were integrated into the common

  2. Identification of expressed resistance gene-like sequences by data mining in 454-derived transcriptomic sequences of common bean (Phaseolus vulgaris L.

    Directory of Open Access Journals (Sweden)

    Liu Zhanji

    2012-03-01

    Full Text Available Abstract Background Common bean (Phaseolus vulgaris L. is one of the most important legumes in the world. Several diseases severely reduce bean production and quality; therefore, it is very important to better understand disease resistance in common bean in order to prevent these losses. More than 70 resistance (R genes which confer resistance against various pathogens have been cloned from diverse plant species. Most R genes share highly conserved domains which facilitates the identification of new candidate R genes from the same species or other species. The goals of this study were to isolate expressed R gene-like sequences (RGLs from 454-derived transcriptomic sequences and expressed sequence tags (ESTs of common bean, and to develop RGL-tagged molecular markers. Results A data-mining approach was used to identify tentative P. vulgaris R gene-like sequences from approximately 1.69 million 454-derived sequences and 116,716 ESTs deposited in GenBank. A total of 365 non-redundant sequences were identified and named as common bean (P. vulgaris = Pv resistance gene-like sequences (PvRGLs. Among the identified PvRGLs, about 60% (218 PvRGLs were from 454-derived sequences. Reverse transcriptase-polymerase chain reaction (RT-PCR analysis confirmed that PvRGLs were actually expressed in the leaves of common bean. Upon comparison to P. vulgaris genomic sequences, 105 (28.77% of the 365 tentative PvRGLs could be integrated into the existing common bean physical map. Based on the syntenic blocks between common bean and soybean, 237 (64.93% PvRGLs were anchored on the P. vulgaris genetic map and will need to be mapped to determine order. In addition, 11 sequence-tagged-site (STS and 19 cleaved amplified polymorphic sequence (CAPS molecular markers were developed for 25 unique PvRGLs. Conclusions In total, 365 PvRGLs were successfully identified from 454-derived transcriptomic sequences and ESTs available in GenBank and about 65% of PvRGLs were integrated

  3. Short read Illumina data for the de novo assembly of a non-model snail species transcriptome (Radix balthica, Basommatophora, Pulmonata), and a comparison of assembler performance.

    Science.gov (United States)

    Feldmeyer, Barbara; Wheat, Christopher W; Krezdorn, Nicolas; Rotter, Björn; Pfenninger, Markus

    2011-06-16

    Until recently, read lengths on the Solexa/Illumina system were too short to reliably assemble transcriptomes without a reference sequence, especially for non-model organisms. However, with read lengths up to 100 nucleotides available in the current version, an assembly without reference genome should be possible. For this study we created an EST data set for the common pond snail Radix balthica by Illumina sequencing of a normalized transcriptome. Performance of three different short read assemblers was compared with respect to: the number of contigs, their length, depth of coverage, their quality in various BLAST searches and the alignment to mitochondrial genes. A single sequencing run of a normalized RNA pool resulted in 16,923,850 paired end reads with median read length of 61 bases. The assemblies generated by VELVET, OASES, and SeqMan NGEN differed in the total number of contigs, contig length, the number and quality of gene hits obtained by BLAST searches against various databases, and contig performance in the mt genome comparison. While VELVET produced the highest overall number of contigs, a large fraction of these were of small size (performance resulted from the NGEN assembly. It produced the second largest number of contigs, which on average were comparable to the OASES contigs but gave the highest number of gene hits in two out of four BLAST searches against different reference databases. A subsequent meta-assembly of the four contig sets resulted in larger contigs, less redundancy and a higher number of BLAST hits. Our results document the first de novo transcriptome assembly of a non-model species using Illumina sequencing data. We show that de novo transcriptome assembly using this approach yields results useful for downstream applications, in particular if a meta-assembly of contig sets is used to increase contig quality. These results highlight the ongoing need for improvements in assembly methodology.

  4. Short read Illumina data for the de novo assembly of a non-model snail species transcriptome (Radix balthica, Basommatophora, Pulmonata, and a comparison of assembler performance

    Directory of Open Access Journals (Sweden)

    Rotter Björn

    2011-06-01

    Full Text Available Abstract Background Until recently, read lengths on the Solexa/Illumina system were too short to reliably assemble transcriptomes without a reference sequence, especially for non-model organisms. However, with read lengths up to 100 nucleotides available in the current version, an assembly without reference genome should be possible. For this study we created an EST data set for the common pond snail Radix balthica by Illumina sequencing of a normalized transcriptome. Performance of three different short read assemblers was compared with respect to: the number of contigs, their length, depth of coverage, their quality in various BLAST searches and the alignment to mitochondrial genes. Results A single sequencing run of a normalized RNA pool resulted in 16,923,850 paired end reads with median read length of 61 bases. The assemblies generated by VELVET, OASES, and SeqMan NGEN differed in the total number of contigs, contig length, the number and quality of gene hits obtained by BLAST searches against various databases, and contig performance in the mt genome comparison. While VELVET produced the highest overall number of contigs, a large fraction of these were of small size (BLAST searches and the mt genome alignment. The best overall contig performance resulted from the NGEN assembly. It produced the second largest number of contigs, which on average were comparable to the OASES contigs but gave the highest number of gene hits in two out of four BLAST searches against different reference databases. A subsequent meta-assembly of the four contig sets resulted in larger contigs, less redundancy and a higher number of BLAST hits. Conclusion Our results document the first de novo transcriptome assembly of a non-model species using Illumina sequencing data. We show that de novo transcriptome assembly using this approach yields results useful for downstream applications, in particular if a meta-assembly of contig sets is used to increase contig quality

  5. Soybean (Glycine max) SWEET gene family: insights through comparative genomics, transcriptome profiling and whole genome re-sequence analysis.

    Science.gov (United States)

    Patil, Gunvant; Valliyodan, Babu; Deshmukh, Rupesh; Prince, Silvas; Nicander, Bjorn; Zhao, Mingzhe; Sonah, Humira; Song, Li; Lin, Li; Chaudhary, Juhi; Liu, Yang; Joshi, Trupti; Xu, Dong; Nguyen, Henry T

    2015-07-11

    SWEET (MtN3_saliva) domain proteins, a recently identified group of efflux transporters, play an indispensable role in sugar efflux, phloem loading, plant-pathogen interaction and reproductive tissue development. The SWEET gene family is predominantly studied in Arabidopsis and members of the family are being investigated in rice. To date, no transcriptome or genomics analysis of soybean SWEET genes has been reported. In the present investigation, we explored the evolutionary aspect of the SWEET gene family in diverse plant species including primitive single cell algae to angiosperms with a major emphasis on Glycine max. Evolutionary features showed expansion and duplication of the SWEET gene family in land plants. Homology searches with BLAST tools and Hidden Markov Model-directed sequence alignments identified 52 SWEET genes that were mapped to 15 chromosomes in the soybean genome as tandem duplication events. Soybean SWEET (GmSWEET) genes showed a wide range of expression profiles in different tissues and developmental stages. Analysis of public transcriptome data and expression profiling using quantitative real time PCR (qRT-PCR) showed that a majority of the GmSWEET genes were confined to reproductive tissue development. Several natural genetic variants (non-synonymous SNPs, premature stop codons and haplotype) were identified in the GmSWEET genes using whole genome re-sequencing data analysis of 106 soybean genotypes. A significant association was observed between SNP-haplogroup and seed sucrose content in three gene clusters on chromosome 6. Present investigation utilized comparative genomics, transcriptome profiling and whole genome re-sequencing approaches and provided a systematic description of soybean SWEET genes and identified putative candidates with probable roles in the reproductive tissue development. Gene expression profiling at different developmental stages and genomic variation data will aid as an important resource for the soybean research

  6. Transcriptome Characterization for Non-Model Endangered Lycaenids, Protantigius superans and Spindasis takanosis, Using Illumina HiSeq 2500 Sequencing

    Directory of Open Access Journals (Sweden)

    Bharat Bhusan Patnaik

    2015-12-01

    Full Text Available The Lycaenidae butterflies, Protantigius superans and Spindasis takanosis, are endangered insects in Korea known for their symbiotic association with ants. However, necessary genomic and transcriptomics data are lacking in these species, limiting conservation efforts. In this study, the P. superans and S. takanosis transcriptomes were deciphered using Illumina HiSeq 2500 sequencing. The P. superans and S. takanosis transcriptome data included a total of 254,340,693 and 245,110,582 clean reads assembled into 159,074 and 170,449 contigs and 107,950 and 121,140 unigenes, respectively. BLASTX hits (E-value of 1.0 × 10−5 against the known protein databases annotated a total of 46,754 and 51,908 transcripts for P. superans and S. takanosis. Approximately 41.25% and 38.68% of the unigenes for P. superans and S. takanosis found homologous sequences in Protostome DB (PANM-DB. BLAST2GO analysis confirmed 18,611 unigenes representing Gene Ontology (GO terms and a total of 5259 unigenes assigned to 116 pathways for P. superans. For S. takanosis, a total of 6697 unigenes were assigned to 119 pathways using the Kyoto Encyclopedia of Genes and Genomes (KEGG pathway database. Additionally, 382,164 and 390,516 Simple Sequence Repeats (SSRs were compiled from the unigenes of P. superans and S. takanosis, respectively. This is the first report to record new genes and their utilization for conservation of lycaenid species population and as a reference information for closely related species.

  7. Transcriptome sequencing and differential gene expression analysis in Viola yedoensis Makino (Fam. Violaceae) responsive to cadmium (Cd) pollution

    Energy Technology Data Exchange (ETDEWEB)

    Gao, Jian [Key Laboratory of Biology and Genetic Improvement of Maize in Southwest Region, Ministry of Agriculture, Maize Research Institute of Sichuan Agricultural University, Wenjiang, Sichuan (China); Luo, Mao [Drug Discovery Research Center of Luzhou Medical College, Luzhou, Sichuan (China); Zhu, Ye; He, Ying; Wang, Qin [Department of Pharmacy of Luzhou Medical College, Luzhou, Sichuan (China); Zhang, Chun, E-mail: zc83good@126.com [Department of Pharmacy of Luzhou Medical College, Luzhou, Sichuan (China)

    2015-03-27

    Viola yedoensis Makino is an important Chinese traditional medicine plant adapted to cadmium (Cd) pollution regions. Illumina sequencing technology was used to sequence the transcriptome of V. yedoensis Makino. We sequenced Cd-treated (VIYCd) and untreated (VIYCK) samples of V. yedoensis, and obtained 100,410,834 and 83,587,676 high quality reads, respectively. After de novo assembly and quantitative assessment, 109,800 unigenes were finally generated with an average length of 661 bp. We then obtained functional annotations by aligning unigenes with public protein databases including NR, NT, SwissProt, KEGG and COG. In addition, 892 differentially expressed genes (DEGs) were investigated between the two libraries of untreated (VIYCK) and Cd-treated (VIYCd) plants. Moreover, 15 randomly selected DEGs were further validated with qRT-PCR and the results were highly accordant with the Solexa analysis. This study firstly generated a successful global analysis of the V. yedoensis transcriptome and it will provide for further studies on gene expression, genomics, and functional genomics in Violaceae. - Highlights: • A de novo assembly generated 109,800 unigenes and 5,4479 of them were annotated. • 31,285 could be classified into 26 COG categories. • 263 biosynthesis pathways were predicted and classified into five categories. • 892 DEGs were detected and 15 of them were validated by qRT-PCR.

  8. Transcriptome sequencing and differential gene expression analysis in Viola yedoensis Makino (Fam. Violaceae) responsive to cadmium (Cd) pollution

    International Nuclear Information System (INIS)

    Gao, Jian; Luo, Mao; Zhu, Ye; He, Ying; Wang, Qin; Zhang, Chun

    2015-01-01

    Viola yedoensis Makino is an important Chinese traditional medicine plant adapted to cadmium (Cd) pollution regions. Illumina sequencing technology was used to sequence the transcriptome of V. yedoensis Makino. We sequenced Cd-treated (VIYCd) and untreated (VIYCK) samples of V. yedoensis, and obtained 100,410,834 and 83,587,676 high quality reads, respectively. After de novo assembly and quantitative assessment, 109,800 unigenes were finally generated with an average length of 661 bp. We then obtained functional annotations by aligning unigenes with public protein databases including NR, NT, SwissProt, KEGG and COG. In addition, 892 differentially expressed genes (DEGs) were investigated between the two libraries of untreated (VIYCK) and Cd-treated (VIYCd) plants. Moreover, 15 randomly selected DEGs were further validated with qRT-PCR and the results were highly accordant with the Solexa analysis. This study firstly generated a successful global analysis of the V. yedoensis transcriptome and it will provide for further studies on gene expression, genomics, and functional genomics in Violaceae. - Highlights: • A de novo assembly generated 109,800 unigenes and 5,4479 of them were annotated. • 31,285 could be classified into 26 COG categories. • 263 biosynthesis pathways were predicted and classified into five categories. • 892 DEGs were detected and 15 of them were validated by qRT-PCR

  9. Identification of lignin genes and regulatory sequences involved in secondary cell wall formation in Acacia auriculiformis and Acacia mangium via de novo transcriptome sequencing

    Directory of Open Access Journals (Sweden)

    Cannon Charles H

    2011-07-01

    Full Text Available Abstract Background Acacia auriculiformis × Acacia mangium hybrids are commercially important trees for the timber and pulp industry in Southeast Asia. Increasing pulp yield while reducing pulping costs are major objectives of tree breeding programs. The general monolignol biosynthesis and secondary cell wall formation pathways are well-characterized but genes in these pathways are poorly characterized in Acacia hybrids. RNA-seq on short-read platforms is a rapid approach for obtaining comprehensive transcriptomic data and to discover informative sequence variants. Results We sequenced transcriptomes of A. auriculiformis and A. mangium from non-normalized cDNA libraries synthesized from pooled young stem and inner bark tissues using paired-end libraries and a single lane of an Illumina GAII machine. De novo assembly produced a total of 42,217 and 35,759 contigs with an average length of 496 bp and 498 bp for A. auriculiformis and A. mangium respectively. The assemblies of A. auriculiformis and A. mangium had a total length of 21,022,649 bp and 17,838,260 bp, respectively, with the largest contig 15,262 bp long. We detected all ten monolignol biosynthetic genes using Blastx and further analysis revealed 18 lignin isoforms for each species. We also identified five contigs homologous to R2R3-MYB proteins in other plant species that are involved in transcriptional regulation of secondary cell wall formation and lignin deposition. We searched the contigs against public microRNA database and predicted the stem-loop structures of six highly conserved microRNA families (miR319, miR396, miR160, miR172, miR162 and miR168 and one legume-specific family (miR2086. Three microRNA target genes were predicted to be involved in wood formation and flavonoid biosynthesis. By using the assemblies as a reference, we discovered 16,648 and 9,335 high quality putative Single Nucleotide Polymorphisms (SNPs in the transcriptomes of A. auriculiformis and A. mangium

  10. Sequence comparison of prefrontal cortical brain transcriptome from a tame and an aggressive silver fox (Vulpes vulpes

    Directory of Open Access Journals (Sweden)

    Sun Qi

    2011-10-01

    Full Text Available Abstract Background Two strains of the silver fox (Vulpes vulpes, with markedly different behavioral phenotypes, have been developed by long-term selection for behavior. Foxes from the tame strain exhibit friendly behavior towards humans, paralleling the sociability of canine puppies, whereas foxes from the aggressive strain are defensive and exhibit aggression to humans. To understand the genetic differences underlying these behavioral phenotypes fox-specific genomic resources are needed. Results cDNA from mRNA from pre-frontal cortex of a tame and an aggressive fox was sequenced using the Roche 454 FLX Titanium platform (> 2.5 million reads & 0.9 Gbase of tame fox sequence; >3.3 million reads & 1.2 Gbase of aggressive fox sequence. Over 80% of the fox reads were assembled into contigs. Mapping fox reads against the fox transcriptome assembly and the dog genome identified over 30,000 high confidence fox-specific SNPs. Fox transcripts for approximately 14,000 genes were identified using SwissProt and the dog RefSeq databases. An at least 2-fold expression difference between the two samples (p Conclusions Transcriptome sequencing significantly expanded genomic resources available for the fox, a species without a sequenced genome. In a very cost efficient manner this yielded a large number of fox-specific SNP markers for genetic studies and provided significant insights into the gene expression profile of the fox pre-frontal cortex; expression differences between the two fox samples; and a catalogue of potentially important gene-specific sequence variants. This result demonstrates the utility of this approach for developing genomic resources in species with limited genomic information.

  11. De novo sequencing and analysis of the cranberry fruit transcriptome to identify putative genes involved in flavonoid biosynthesis, transport and regulation.

    Science.gov (United States)

    Sun, Haiyue; Liu, Yushan; Gai, Yuzhuo; Geng, Jinman; Chen, Li; Liu, Hongdi; Kang, Limin; Tian, Youwen; Li, Yadong

    2015-09-02

    Cranberries (Vaccinium macrocarpon Ait.), renowned for their excellent health benefits, are an important berry crop. Here, we performed transcriptome sequencing of one cranberry cultivar, from fruits at two different developmental stages, on the Illumina HiSeq 2000 platform. Our main goals were to identify putative genes for major metabolic pathways of bioactive compounds and compare the expression patterns between white fruit (W) and red fruit (R) in cranberry. In this study, two cDNA libraries of W and R were constructed. Approximately 119 million raw sequencing reads were generated and assembled de novo, yielding 57,331 high quality unigenes with an average length of 739 bp. Using BLASTx, 38,460 unigenes were identified as putative homologs of annotated sequences in public protein databases, including NCBI NR, NT, Swiss-Prot, KEGG, COG and GO. Of these, 21,898 unigenes mapped to 128 KEGG pathways, with the metabolic pathways, secondary metabolites, glycerophospholipid metabolism, ether lipid metabolism, starch and sucrose metabolism, purine metabolism, and pyrimidine metabolism being well represented. Among them, many candidate genes were involved in flavonoid biosynthesis, transport and regulation. Furthermore, digital gene expression (DEG) analysis identified 3,257 unigenes that were differentially expressed between the two fruit developmental stages. In addition, 14,473 simple sequence repeats (SSRs) were detected. Our results present comprehensive gene expression information about the cranberry fruit transcriptome that could facilitate our understanding of the molecular mechanisms of fruit development in cranberries. Although it will be necessary to validate the functions carried out by these genes, these results could be used to improve the quality of breeding programs for the cranberry and related species.

  12. High-throughput sequencing of microRNA transcriptome and expression assay in the sturgeon, Acipenser schrenckii.

    Directory of Open Access Journals (Sweden)

    Lihong Yuan

    Full Text Available Sturgeons are considered as living fossils and have very high evolutionary, economical and conservation values. The multiploidy of sturgeon that has been caused by chromosome duplication may lead to the emergence of new microRNAs (miRNAs involved in the ploidy and physiological processes. In the present study, we performed the first sturgeon miRNAs analysis by RNA-seq high-throughput sequencing combined with expression assay of microarray and real-time PCR, and aimed to discover the sturgeon-specific miRNAs, confirm the expressed pattern of miRNAs and illustrate the potential role of miRNAs-targets on sturgeon biological processes. A total of 103 miRNAs were identified, including 58 miRNAs with strongly detected signals (signal >500 and P≤0.01, which were detected by microarray. Real-time PCR assay supported the expression pattern obtained by microarray. Moreover, co-expression of 21 miRNAs in all five tissues and tissue-specific expression of 16 miRNAs implied the crucial and particular function of them in sturgeon physiological processes. Target gene prediction, especially the enriched functional gene groups (369 GO terms and pathways (37 KEGG regulated by 58 miRNAs (P<0.05, illustrated the interaction of miRNAs and putative mRNAs, and also the potential mechanism involved in these biological processes. Our new findings of sturgeon miRNAs expand the public database of transcriptome information for this species, contribute to our understanding of sturgeon biology, and also provide invaluable data that may be applied in sturgeon breeding.

  13. High-throughput sequencing of microRNA transcriptome and expression assay in the sturgeon, Acipenser schrenckii.

    Science.gov (United States)

    Yuan, Lihong; Zhang, Xiujuan; Li, Linmiao; Jiang, Haiying; Chen, Jinping

    2014-01-01

    Sturgeons are considered as living fossils and have very high evolutionary, economical and conservation values. The multiploidy of sturgeon that has been caused by chromosome duplication may lead to the emergence of new microRNAs (miRNAs) involved in the ploidy and physiological processes. In the present study, we performed the first sturgeon miRNAs analysis by RNA-seq high-throughput sequencing combined with expression assay of microarray and real-time PCR, and aimed to discover the sturgeon-specific miRNAs, confirm the expressed pattern of miRNAs and illustrate the potential role of miRNAs-targets on sturgeon biological processes. A total of 103 miRNAs were identified, including 58 miRNAs with strongly detected signals (signal >500 and P≤0.01), which were detected by microarray. Real-time PCR assay supported the expression pattern obtained by microarray. Moreover, co-expression of 21 miRNAs in all five tissues and tissue-specific expression of 16 miRNAs implied the crucial and particular function of them in sturgeon physiological processes. Target gene prediction, especially the enriched functional gene groups (369 GO terms) and pathways (37 KEGG) regulated by 58 miRNAs (P<0.05), illustrated the interaction of miRNAs and putative mRNAs, and also the potential mechanism involved in these biological processes. Our new findings of sturgeon miRNAs expand the public database of transcriptome information for this species, contribute to our understanding of sturgeon biology, and also provide invaluable data that may be applied in sturgeon breeding.

  14. RNA sequencing (RNA-Seq of lymph node, spleen, and thymus transcriptome from wild Peninsular Malaysian cynomolgus macaque (Macaca fascicularis

    Directory of Open Access Journals (Sweden)

    Joey Ee Uli

    2017-08-01

    Full Text Available The cynomolgus macaque (Macaca fascicularis is an extensively utilised nonhuman primate model for biomedical research due to its biological, behavioural, and genetic similarities to humans. Genomic information of cynomolgus macaque is vital for research in various fields; however, there is presently a shortage of genomic information on the Malaysian cynomolgus macaque. This study aimed to sequence, assemble, annotate, and profile the Peninsular Malaysian cynomolgus macaque transcriptome derived from three tissues (lymph node, spleen, and thymus using RNA sequencing (RNA-Seq technology. A total of 174,208,078 paired end 70 base pair sequencing reads were obtained from the Illumina Hi-Seq 2500 sequencer. The overall mapping percentage of the sequencing reads to the M. fascicularis reference genome ranged from 53–63%. Categorisation of expressed genes to Gene Ontology (GO and KEGG pathway categories revealed that GO terms with the highest number of associated expressed genes include Cellular process, Catalytic activity, and Cell part, while for pathway categorisation, the majority of expressed genes in lymph node, spleen, and thymus fall under the Global overview and maps pathway category, while 266, 221, and 138 genes from lymph node, spleen, and thymus were respectively enriched in the Immune system category. Enriched Immune system pathways include Platelet activation pathway, Antigen processing and presentation, B cell receptor signalling pathway, and Intestinal immune network for IgA production. Differential gene expression analysis among the three tissues revealed 574 differentially expressed genes (DEG between lymph and spleen, 5402 DEGs between lymph and thymus, and 7008 DEGs between spleen and thymus. Venn diagram analysis of expressed genes revealed a total of 2,630, 253, and 279 tissue-specific genes respectively for lymph node, spleen, and thymus tissues. This is the first time the lymph node, spleen, and thymus transcriptome of the

  15. Combining proteomics and transcriptome sequencing to identify active plant-cell-wall-degrading enzymes in a leaf beetle

    Directory of Open Access Journals (Sweden)

    Kirsch Roy

    2012-11-01

    Full Text Available Abstract Background The primary plant cell wall is a complex mixture of polysaccharides and proteins encasing living plant cells. Among these polysaccharides, cellulose is the most abundant and useful biopolymer present on earth. These polysaccharides also represent a rich source of energy for organisms which have evolved the ability to degrade them. A growing body of evidence suggests that phytophagous beetles, mainly species from the superfamilies Chrysomeloidea and Curculionoidea, possess endogenous genes encoding complex and diverse families of so-called plant cell wall degrading enzymes (PCWDEs. The presence of these genes in phytophagous beetles may have been a key element in their success as herbivores. Here, we combined a proteomics approach and transcriptome sequencing to identify PCWDEs present in larval gut contents of the mustard leaf beetle, Phaedon cochleariae. Results Using a two-dimensional proteomics approach, we recovered 11 protein bands, isolated using activity assays targeting cellulose-, pectin- and xylan-degrading enzymes. After mass spectrometry analyses, a total of 13 proteins putatively responsible for degrading plant cell wall polysaccharides were identified; these proteins belong to three glycoside hydrolase (GH families: GH11 (xylanases, GH28 (polygalacturonases or pectinases, and GH45 (β-1,4-glucanases or cellulases. Additionally, highly stable and proteolysis-resistant host plant-derived proteins from various pathogenesis-related protein (PRs families as well as polygalacturonase-inhibiting proteins (PGIPs were also identified from the gut contents proteome. In parallel, transcriptome sequencing revealed the presence of at least 19 putative PCWDE transcripts encoded by the P. cochleariae genome. All of these were specifically expressed in the insect gut rather than the rest of the body, and in adults as well as larvae. The discrepancy observed in the number of putative PCWDEs between transcriptome and proteome

  16. Transcriptome characterization and sequencing-based identification of salt-responsive genes in Millettia pinnata, a semi-mangrove plant.

    Science.gov (United States)

    Huang, Jianzi; Lu, Xiang; Yan, Hao; Chen, Shouyi; Zhang, Wanke; Huang, Rongfeng; Zheng, Yizhi

    2012-04-01

    Semi-mangroves form a group of transitional species between glycophytes and halophytes, and hold unique potential for learning molecular mechanisms underlying plant salt tolerance. Millettia pinnata is a semi-mangrove plant that can survive a wide range of saline conditions in the absence of specialized morphological and physiological traits. By employing the Illumina sequencing platform, we generated ~192 million short reads from four cDNA libraries of M. pinnata and processed them into 108,598 unisequences with a high depth of coverage. The mean length and total length of these unisequences were 606 bp and 65.8 Mb, respectively. A total of 54,596 (50.3%) unisequences were assigned Nr annotations. Functional classification revealed the involvement of unisequences in various biological processes related to metabolism and environmental adaptation. We identified 23,815 candidate salt-responsive genes with significantly differential expression under seawater and freshwater treatments. Based on the reverse transcription-polymerase chain reaction (RT-PCR) and real-time PCR analyses, we verified the changes in expression levels for a number of candidate genes. The functional enrichment analyses for the candidate genes showed tissue-specific patterns of transcriptome remodelling upon salt stress in the roots and the leaves. The transcriptome of M. pinnata will provide valuable gene resources for future application in crop improvement. In addition, this study sets a good example for large-scale identification of salt-responsive genes in non-model organisms using the sequencing-based approach.

  17. Deep transcriptome sequencing reveals differences in global gene expression between normal and pale, soft, and exudative turkey meat.

    Science.gov (United States)

    Malila, Y; Carr, K M; Ernst, C W; Velleman, S G; Reed, K M; Strasburg, G M

    2014-03-01

    Previous studies from our laboratory suggested that differential expression of genes between normal and pale, soft, and exudative (PSE) turkey is associated with development of the PSE syndrome. However, a detailed understanding of molecular mechanisms responsible for the development of this meat defect remains unclear. The objective of this study was to extend and complement our previous work by using deep transcriptome RNA sequence analysis to compare the respective transcriptome profiles and identify molecular mechanisms responsible for the etiology of PSE turkey meat. Turkey breasts (n = 43) were previously classified as normal or PSE using marinade uptake as an indicator of quality (high = normal; low = PSE). Total RNA from breast muscle samples with the highest (n = 4) and lowest (n = 4) marinade uptake were isolated and sequenced using the Illumina GA(IIX) platform. The results indicated differential expression of 494 loci (false discovery rate turkey was suggested by both dramatic downregulation of pyruvate dehydrogenase kinase, isozyme 4 (PDK4) mRNA, the most downregulated gene, and a decrease in the protein product (P = 0.0007) as determined by immunoblot analysis. These results support the hypothesis that differential expression of several genes and their protein products contribute to development of PSE turkey.

  18. De Novo Transcriptome Sequencing of Olea europaea L. to Identify Genes Involved in the Development of the Pollen Tube

    Directory of Open Access Journals (Sweden)

    Domenico Iaria

    2016-01-01

    Full Text Available In olive (Olea europaea L., the processes controlling self-incompatibility are still unclear and the molecular basis underlying this process are still not fully characterized. In order to determine compatibility relationships, using next-generation sequencing techniques and a de novo transcriptome assembly strategy, we show that pollen tubes from different olive plants, grown in vitro in a medium containing its own pistil and in combination pollen/pistil from self-sterile and self-fertile cultivars, have a distinct gene expression profile and many of the differentially expressed sequences between the samples fall within gene families involved in the development of the pollen tube, such as lipase, carboxylesterase, pectinesterase, pectin methylesterase, and callose synthase. Moreover, different genes involved in signal transduction, transcription, and growth are overrepresented. The analysis also allowed us to identify members in actin and actin depolymerization factor and fibrin gene family and member of the Ca2+ binding gene family related to the development and polarization of pollen apical tip. The whole transcriptomic analysis, through the identification of the differentially expressed transcripts set and an extended functional annotation analysis, will lead to a better understanding of the mechanisms of pollen germination and pollen tube growth in the olive.

  19. De novo assembly and characterization of the transcriptome of the pancreatic fluke Eurytrema pancreaticum (trematoda: Dicrocoeliidae) using Illumina paired-end sequencing.

    Science.gov (United States)

    Liu, Guo-Hua; Xu, Min-Jun; Song, Hui-Qun; Wang, Chun-Ren; Zhu, Xing-Quan

    2016-01-15

    Eurytrema pancreaticum is one of the most common trematodes living in the pancreatic and bile ducts of ruminants and also occasionally infects humans, causing eurytremiasis. In spite of its economic and medical importance, very little is known about the genomic resources of this parasite. Herein, we performed de novo sequencing, assembly and characterization of the transcriptome of adult E. pancreaticum. Approximately 36.4 million high-quality clean reads were obtained, and the length of the transcript contigs ranged from 66 to 19,968 nt with mean length of 479 nt and N50 length of 1094 nt, and then 23,573 unigenes were assembled. Of these unigenes, 15,353 (65.1%) were annotated by blast searches against the NCBI non-redundant protein database. Among these, 15,267 (64.8%), 2732 (11.6%) and 10,354 (43.9%) of the unigenes had significant similarity with proteins in the NR, NT and Swiss-Prot databases, respectively. 5510 (23.4%) and 4567 (19.4%) unigenes were assigned to GO and COG, respectively. 8886 (37.7%) unigenes were identified and mapped onto 254 pathways in the KEGG Pathway database. Furthermore, we found that 105 (1.18%) unigenes were related to pancreatic secretion and 61 (0.7%) to pancreatic cancer. The present study represents the first transcriptome of any members of the family Dicrocoeliidae, which has little genomic information available in the public databases. The novel transcriptome of E. pancreaticum should provide a useful resource for designing new strategies against pancreatic flukes and other trematodes of human and animal health significance. Copyright © 2015 Elsevier B.V. All rights reserved.

  20. Transcriptome analysis of Houttuynia cordata Thunb. by Illumina paired-end RNA sequencing and SSR marker discovery.

    Directory of Open Access Journals (Sweden)

    Lin Wei

    Full Text Available BACKGROUND: Houttuynia cordata Thunb. is an important traditional medical herb in China and other Asian countries, with high medicinal and economic value. However, a lack of available genomic information has become a limitation for research on this species. Thus, we carried out high-throughput transcriptomic sequencing of H. cordata to generate an enormous transcriptome sequence dataset for gene discovery and molecular marker development. PRINCIPAL FINDINGS: Illumina paired-end sequencing technology produced over 56 million sequencing reads from H. cordata mRNA. Subsequent de novo assembly yielded 63,954 unigenes, 39,982 (62.52% and 26,122 (40.84% of which had significant similarity to proteins in the NCBI nonredundant protein and Swiss-Prot databases (E-value <10(-5, respectively. Of these annotated unigenes, 30,131 and 15,363 unigenes were assigned to gene ontology categories and clusters of orthologous groups, respectively. In addition, 24,434 (38.21% unigenes were mapped onto 128 pathways using the KEGG pathway database and 17,964 (44.93% unigenes showed homology to Vitis vinifera (Vitaceae genes in BLASTx analysis. Furthermore, 4,800 cDNA SSRs were identified as potential molecular markers. Fifty primer pairs were randomly selected to detect polymorphism among 30 samples of H. cordata; 43 (86% produced fragments of expected size, suggesting that the unigenes were suitable for specific primer design and of high quality, and the SSR marker could be widely used in marker-assisted selection and molecular breeding of H. cordata in the future. CONCLUSIONS: This is the first application of Illumina paired-end sequencing technology to investigate the whole transcriptome of H. cordata and to assemble RNA-seq reads without a reference genome. These data should help researchers investigating the evolution and biological processes of this species. The SSR markers developed can be used for construction of high-resolution genetic linkage maps and for

  1. Dual RNA sequencing reveals the expression of unique transcriptomic signatures in lipopolysaccharide-induced BV-2 microglial cells.

    Directory of Open Access Journals (Sweden)

    Amitabh Das

    Full Text Available Microglial cells become rapidly activated through interactions with pathogens, and the persistent activation of these cells is associated with various neurodegenerative diseases. Previous studies have investigated the transcriptomic signatures in microglia or macrophages using microarray technologies. However, this method has numerous restrictions, such as spatial biases, uneven probe properties, low sensitivity, and dependency on the probes spotted. To overcome this limitation and identify novel transcribed genes in response to LPS, we used RNA Sequencing (RNA-Seq to determine the novel transcriptomic signatures in BV-2 microglial cells. Sequencing assessment and quality evaluation showed that approximately 263 and 319 genes (≥ 1.5 log2-fold, such as cytokines and chemokines, were strongly induced after 2 and 4 h, respectively, and the induction of several genes with unknown immunological functions was also observed. Importantly, we observed that previously unidentified transcription factors (TFs (irf1, irf7, and irf9, histone demethylases (kdm4a and DNA methyltransferases (dnmt3l were significantly and selectively expressed in BV-2 microglial cells. The gene expression levels, transcription start sites (TSS, isoforms, and differential promoter usage revealed a complex pattern of transcriptional and post-transcriptional gene regulation upon infection with LPS. In addition, gene ontology, molecular networks and pathway analyses identified the top significantly regulated functional classification, canonical pathways and network functions at each activation status. Moreover, we further analyzed differentially expressed genes to identify transcription factor (TF motifs (-950 to +50 bp of the 5' upstream promoters and epigenetic mechanisms. Furthermore, we confirmed that the expressions of key inflammatory genes as well as pro-inflammatory mediators in the supernatants were significantly induced in LPS treated primary microglial cells. This

  2. Identification of SSRs and differentially expressed genes in two cultivars of celery (Apium graveolens L.) by deep transcriptome sequencing.

    Science.gov (United States)

    Li, Meng-Yao; Wang, Feng; Jiang, Qian; Ma, Jing; Xiong, Ai-Sheng

    2014-01-01

    Celery (Apium graveolens L.) is one of the most important and widely grown vegetables in the Apiaceae family. Due to the lack of comprehensive genomic resources, research on celery has mainly utilized physiological and biochemical approaches, rather than molecular biology, to study this crop. Transcriptome sequencing has become an efficient and economic technology for obtaining information on gene expression that can greatly facilitate molecular and genomic studies of species for which a sequenced genome is not available. In the present study, 15 893 516 and 19 818 161 high-quality sequences were obtained by RNA-seq from two celery varieties 'Ventura' and 'Jinnan Shiqin', respectively. The obtained reads were assembled into 39 584 and 41 740 unigenes with mean lengths of 683 bp and 690 bp, respectively. A total of 1939 simple sequence repeat (SSR) markers were identified in 'Ventura' and 2004 SSRs in 'Jinnan Shiqin'. Di-nucleotide repeats were the most common repeat motif, accounting for 55.49% and 54.84% in 'Ventura' and 'Jinnan Shiqin', respectively. A comparison of expressed genes between the two libraries, identified 338 differentially expressed genes (DEGs). Three hundred and three of the DEGs were annotated based on a sequence similarity search utilizing eight public databases. Additionally, the expression profile of eight annotated DEGs was characterized in response to abiotic stresses. The collective data generated in the present research represent a valuable resource for further genetic and molecular studies in celery.

  3. Deep transcriptome-sequencing and proteome analysis of the hydrothermal vent annelid Alvinella pompejana identifies the CvP-bias as a robust measure of eukaryotic thermostability

    Directory of Open Access Journals (Sweden)

    Holder Thomas

    2013-01-01

    Full Text Available Abstract Background Alvinella pompejana is an annelid worm that inhabits deep-sea hydrothermal vent sites in the Pacific Ocean. Living at a depth of approximately 2500 meters, these worms experience extreme environmental conditions, including high temperature and pressure as well as high levels of sulfide and heavy metals. A. pompejana is one of the most thermotolerant metazoans, making this animal a subject of great interest for studies of eukaryotic thermoadaptation. Results In order to complement existing EST resources we performed deep sequencing of the A. pompejana transcriptome. We identified several thousand novel protein-coding transcripts, nearly doubling the sequence data for this annelid. We then performed an extensive survey of previously established prokaryotic thermoadaptation measures to search for global signals of thermoadaptation in A. pompejana in comparison with mesophilic eukaryotes. In an orthologous set of 457 proteins, we found that the best indicator of thermoadaptation was the difference in frequency of charged versus polar residues (CvP-bias, which was highest in A. pompejana. CvP-bias robustly distinguished prokaryotic thermophiles from prokaryotic mesophiles, as well as the thermophilic fungus Chaetomium thermophilum from mesophilic eukaryotes. Experimental values for thermophilic proteins supported higher CvP-bias as a measure of thermal stability when compared to their mesophilic orthologs. Proteome-wide mean CvP-bias also correlated with the body temperatures of homeothermic birds and mammals. Conclusions Our work extends the transcriptome resources for A. pompejana and identifies the CvP-bias as a robust and widely applicable measure of eukaryotic thermoadaptation. Reviewer This article was reviewed by Sándor Pongor, L. Aravind and Anthony M. Poole.

  4. High-throughput sequencing and pathway analysis reveal alteration of the pituitary transcriptome by 17α-ethynylestradiol (EE2) in female coho salmon, Oncorhynchus kisutch

    Energy Technology Data Exchange (ETDEWEB)

    Harding, Louisa B. [School of Aquatic and Fishery Sciences, University of Washington, Seattle, WA 98195 (United States); Schultz, Irvin R. [Battelle, Marine Sciences Laboratory – Pacific Northwest National Laboratory, 1529 West Sequim Bay Road, Sequim, WA 98382 (United States); Goetz, Giles W. [School of Aquatic and Fishery Sciences, University of Washington, Seattle, WA 98195 (United States); Luckenbach, J. Adam [Northwest Fisheries Science Center, National Marine Fisheries Service, National Oceanic and Atmospheric Administration, 2725 Montlake Blvd E, Seattle, WA 98112 (United States); Center for Reproductive Biology, Washington State University, Pullman, WA 98164 (United States); Young, Graham [School of Aquatic and Fishery Sciences, University of Washington, Seattle, WA 98195 (United States); Center for Reproductive Biology, Washington State University, Pullman, WA 98164 (United States); Goetz, Frederick W. [Northwest Fisheries Science Center, National Marine Fisheries Service, National Oceanic and Atmospheric Administration, Manchester Research Station, P.O. Box 130, Manchester, WA 98353 (United States); Swanson, Penny, E-mail: penny.swanson@noaa.gov [Northwest Fisheries Science Center, National Marine Fisheries Service, National Oceanic and Atmospheric Administration, 2725 Montlake Blvd E, Seattle, WA 98112 (United States); Center for Reproductive Biology, Washington State University, Pullman, WA 98164 (United States)

    2013-10-15

    Highlights: •Studied impacts of ethynylestradiol (EE2) exposure on salmon pituitary transcriptome. •High-throughput sequencing, RNAseq, and pathway analysis were performed. •EE2 altered mRNAs for genes in circadian rhythm, GnRH, and TGFβ signaling pathways. •LH and FSH beta subunit mRNAs were most highly up- and down-regulated by EE2, respectively. •Estrogens may alter processes associated with reproductive timing in salmon. -- Abstract: Considerable research has been done on the effects of endocrine disrupting chemicals (EDCs) on reproduction and gene expression in the brain, liver and gonads of teleost fish, but information on impacts to the pituitary gland are still limited despite its central role in regulating reproduction. The aim of this study was to further our understanding of the potential effects of natural and synthetic estrogens on the brain–pituitary–gonad axis in fish by determining the effects of 17α-ethynylestradiol (EE2) on the pituitary transcriptome. We exposed sub-adult coho salmon (Oncorhynchus kisutch) to 0 or 12 ng EE2/L for up to 6 weeks and effects on the pituitary transcriptome of females were assessed using high-throughput Illumina{sup ®} sequencing, RNA-Seq and pathway analysis. After 1 or 6 weeks, 218 and 670 contiguous sequences (contigs) respectively, were differentially expressed in pituitaries of EE2-exposed fish relative to control. Two of the most highly up- and down-regulated contigs were luteinizing hormone β subunit (241-fold and 395-fold at 1 and 6 weeks, respectively) and follicle-stimulating hormone β subunit (−3.4-fold at 6 weeks). Additional contigs related to gonadotropin synthesis and release were differentially expressed in EE2-exposed fish relative to controls. These included contigs involved in gonadotropin releasing hormone (GNRH) and transforming growth factor-β signaling. There was an over-representation of significantly affected contigs in 33 and 18 canonical pathways at 1 and 6 weeks

  5. A deep sequencing approach to comparatively analyze the transcriptome of lifecycle stages of the filarial worm, Brugia malayi.

    Directory of Open Access Journals (Sweden)

    Young-Jun Choi

    2011-12-01

    Full Text Available Developing intervention strategies for the control of parasitic nematodes continues to be a significant challenge. Genomic and post-genomic approaches play an increasingly important role for providing fundamental molecular information about these parasites, thus enhancing basic as well as translational research. Here we report a comprehensive genome-wide survey of the developmental transcriptome of the human filarial parasite Brugia malayi.Using deep sequencing, we profiled the transcriptome of eggs and embryos, immature (≤3 days of age and mature microfilariae (MF, third- and fourth-stage larvae (L3 and L4, and adult male and female worms. Comparative analysis across these stages provided a detailed overview of the molecular repertoires that define and differentiate distinct lifecycle stages of the parasite. Genome-wide assessment of the overall transcriptional variability indicated that the cuticle collagen family and those implicated in molting exhibit noticeably dynamic stage-dependent patterns. Of particular interest was the identification of genes displaying sex-biased or germline-enriched profiles due to their potential involvement in reproductive processes. The study also revealed discrete transcriptional changes during larval development, namely those accompanying the maturation of MF and the L3 to L4 transition that are vital in establishing successful infection in mosquito vectors and vertebrate hosts, respectively.Characterization of the transcriptional program of the parasite's lifecycle is an important step toward understanding the developmental processes required for the infectious cycle. We find that the transcriptional program has a number of stage-specific pathways activated during worm development. In addition to advancing our understanding of transcriptome dynamics, these data will aid in the study of genome structure and organization by facilitating the identification of novel transcribed elements and splice variants.

  6. Characterization of the small RNA transcriptomes of androgen dependent and independent prostate cancer cell line by deep sequencing.

    Directory of Open Access Journals (Sweden)

    Gang Xu

    2010-11-01

    Full Text Available Given the important roles of miRNA in post-transcriptional regulation and its implications for cancer, characterization of miRNA facilitates us to uncover molecular mechanisms underlying the progression of androgen-independent prostate cancer (PCa. The emergence of next-generation sequencing technologies has dramatically changed the speed of all aspects of sequencing in a rapid and cost-effective fashion, which can permit an unbiased, quantitive and in-depth investigation of small RNA transcriptome. In this study, we used high-throughput Illumina sequencing to comprehensively represent the full complement of individual small RNA and to characterize miRNA expression profiles in both the androgen dependent and independent Pca cell line. At least 83 miRNAs are significantly differentially expressed, of which 41 are up-regulated and 42 are down-regulated, indicating these miRNAs may be involved in the transition of LNCaP to an androgen-independent phenotype. In addition, we have identified 43 novel miRNAs from the androgen dependent and independent PCa library and 3 of them are specific to the androgen-independent PCa. Function annotation of target genes indicated that most of these differentially expressed miRNAs tend to target genes involved in signal transduction and cell communication, epically the MAPK signaling pathway. The small RNA transcriptomes obtained in this study provide considerable insights into a better understanding of the expression and function of small RNAs in the development of androgen-independent prostate cancer.

  7. De novo transcriptome sequencing of Isaria cateniannulata and comparative analysis of gene expression in response to heat and cold stresses.

    Directory of Open Access Journals (Sweden)

    Dingfeng Wang

    Full Text Available Isaria cateniannulata is a very important and virulent entomopathogenic fungus that infects many insect pest species. Although I. cateniannulata is commonly exposed to extreme environmental temperature conditions, little is known about its molecular response mechanism to temperature stress. Here, we sequenced and de novo assembled the transcriptome of I. cateniannulata in response to high and low temperature stresses using Illumina RNA-Seq technology. Our assembly encompassed 17,514 unigenes (mean length = 1,197 bp, in which 11,445 unigenes (65.34% showed significant similarities to known sequences in NCBI non-redundant protein sequences (Nr database. Using digital gene expression analysis, 4,483 differentially expressed genes (DEGs were identified after heat treatment, including 2,905 up-regulated genes and 1,578 down-regulated genes. Under cold stress, 1,927 DEGs were identified, including 1,245 up-regulated genes and 682 down-regulated genes. The expression patterns of 18 randomly selected candidate DEGs resulting from quantitative real-time PCR (qRT-PCR were consistent with their transcriptome analysis results. Although DEGs were involved in many pathways, we focused on the genes that were involved in endocytosis: In heat stress, the pathway of clathrin-dependent endocytosis (CDE was active; however at low temperature stresses, the pathway of clathrin-independent endocytosis (CIE was active. Besides, four categories of DEGs acting as temperature sensors were observed, including cell-wall-major-components-metabolism-related (CWMCMR genes, heat shock protein (Hsp genes, intracellular-compatible-solutes-metabolism-related (ICSMR genes and glutathione S-transferase (GST. These results enhance our understanding of the molecular mechanisms of I. cateniannulata in response to temperature stresses and provide a valuable resource for the future investigations.

  8. Genes involved in sex pheromone biosynthesis of Ephestia cautella, an important food storage pest, are determined by transcriptome sequencing

    KAUST Repository

    Antony, Binu

    2015-07-18

    Background Insects use pheromones, chemical signals that underlie all animal behaviors, for communication and for attracting mates. Synthetic pheromones are widely used in pest control strategies because they are environmentally safe. The production of insect pheromones in transgenic plants, which could be more economical and effective in producing isomerically pure compounds, has recently been successfully demonstrated. This research requires information regarding the pheromone biosynthetic pathways and the characterization of pheromone biosynthetic enzymes (PBEs). We used Illumina sequencing to characterize the pheromone gland (PG) transcriptome of the Pyralid moth, Ephestia cautella, a destructive storage pest, to reveal putative candidate genes involved in pheromone biosynthesis, release, transport and degradation. Results We isolated the E. cautella pheromone compound as (Z,E)-9,12-tetradecadienyl acetate, and the major pheromone precursors 16:acyl, 14:acyl, E14-16:acyl, E12-14:acyl and Z9,E12-14:acyl. Based on the abundance of precursors, two possible pheromone biosynthetic pathways are proposed. Both pathways initiate from C16:acyl-CoA, with one involving ∆14 and ∆9 desaturation to generate Z9,E12-14:acyl, and the other involving the chain shortening of C16:acyl-CoA to C14:acyl-CoA, followed by ∆12 and ∆9 desaturation to generate Z9,E12-14:acyl-CoA. Then, a final reduction and acetylation generates Z9,E12-14:OAc. Illumina sequencing yielded 83,792 transcripts, and we obtained a PG transcriptome of ~49.5 Mb. A total of 191 PBE transcripts, which included pheromone biosynthesis activating neuropeptides, fatty acid transport proteins, acetyl-CoA carboxylases, fatty acid synthases, desaturases, β-oxidation enzymes, fatty acyl-CoA reductases (FARs) and fatty acetyltransferases (FATs), were selected from the dataset. A comparison of the E. cautella transcriptome data with three other Lepidoptera PG datasets revealed that 45 % of the sequences were shared

  9. Transcriptomic Crosstalk between Fungal Invasive Pathogens and Their Host Cells: Opportunities and Challenges for Next-Generation Sequencing Methods

    Directory of Open Access Journals (Sweden)

    Francisco J. Enguita

    2016-01-01

    Full Text Available Fungal invasive infections are an increasing health problem. The intrinsic complexity of pathogenic fungi and the unmet clinical need for new and more effective treatments requires a detailed knowledge of the infection process. During infection, fungal pathogens are able to trigger a specific transcriptional program in their host cells. The detailed knowledge of this transcriptional program will allow for a better understanding of the infection process and consequently will help in the future design of more efficient therapeutic strategies. Simultaneous transcriptomic studies of pathogen and host by high-throughput sequencing (dual RNA-seq is an unbiased protocol to understand the intricate regulatory networks underlying the infectious process. This protocol is starting to be applied to the study of the interactions between fungal pathogens and their hosts. To date, our knowledge of the molecular basis of infection for fungal pathogens is still very limited, and the putative role of regulatory players such as non-coding RNAs or epigenetic factors remains elusive. The wider application of high-throughput transcriptomics in the near future will help to understand the fungal mechanisms for colonization and survival, as well as to characterize the molecular responses of the host cell against a fungal infection.

  10. Characterization of the transcriptome, nucleotide sequence polymorphism, and natural selection in the desert adapted mouse Peromyscus eremicus

    Directory of Open Access Journals (Sweden)

    Matthew D. MacManes

    2014-10-01

    Full Text Available As a direct result of intense heat and aridity, deserts are thought to be among the most harsh of environments, particularly for their mammalian inhabitants. Given that osmoregulation can be challenging for these animals, with failure resulting in death, strong selection should be observed on genes related to the maintenance of water and solute balance. One such animal, Peromyscus eremicus, is native to the desert regions of the southwest United States and may live its entire life without oral fluid intake. As a first step toward understanding the genetics that underlie this phenotype, we present a characterization of the P. eremicus transcriptome. We assay four tissues (kidney, liver, brain, testes from a single individual and supplement this with population level renal transcriptome sequencing from 15 additional animals. We identified a set of transcripts undergoing both purifying and balancing selection based on estimates of Tajima’s D. In addition, we used the branch-site test to identify a transcript—Slc2a9, likely related to desert osmoregulation—undergoing enhanced selection in P. eremicus relative to a set of related non-desert rodents.

  11. Transcriptome sequencing and development of an expression microarray platform for liver infection in adenovirus type 5-infected Syrian golden hamsters.

    Science.gov (United States)

    Ying, Baoling; Toth, Karoly; Spencer, Jacqueline F; Aurora, Rajeev; Wold, William S M

    2015-11-01

    The Syrian golden hamster is an attractive animal for research on infectious diseases and other diseases. We report here the sequencing, assembly, and annotation of the Syrian hamster transcriptome. We include transcripts from ten pooled tissues from a naïve hamster and one stimulated with lipopolysaccharide. Our data set identified 42,707 non-redundant transcripts, representing 34,191 unique genes. Based on the transcriptome data, we generated a custom microarray and used this new platform to investigate the transcriptional response in the Syrian hamster liver following intravenous adenovirus type 5 (Ad5) infection. We found that Ad5 infection caused a massive change in regulation of liver transcripts, with robust up-regulation of genes involved in the antiviral response, indicating that the innate immune response functions in the host defense against Ad5 infection of the liver. The data and novel platforms developed in this study will facilitate further development of this important animal model. Copyright © 2015 Elsevier Inc. All rights reserved.

  12. Characterization and Development of EST-SSRs by Deep Transcriptome Sequencing in Chinese Cabbage (Brassica rapa L. ssp. pekinensis

    Directory of Open Access Journals (Sweden)

    Qian Ding

    2015-01-01

    Full Text Available Simple sequence repeats (SSRs are among the most important markers for population analysis and have been widely used in plant genetic mapping and molecular breeding. Expressed sequence tag-SSR (EST-SSR markers, located in the coding regions, are potentially more efficient for QTL mapping, gene targeting, and marker-assisted breeding. In this study, we investigated 51,694 nonredundant unigenes, assembled from clean reads from deep transcriptome sequencing with a Solexa/Illumina platform, for identification and development of EST-SSRs in Chinese cabbage. In total, 10,420 EST-SSRs with over 12 bp were identified and characterized, among which 2744 EST-SSRs are new and 2317 are known ones showing polymorphism with previously reported SSRs. A total of 7877 PCR primer pairs for 1561 EST-SSR loci were designed, and primer pairs for twenty-four EST-SSRs were selected for primer evaluation. In nineteen EST-SSR loci (79.2%, amplicons were successfully generated with high quality. Seventeen (89.5% showed polymorphism in twenty-four cultivars of Chinese cabbage. The polymorphic alleles of each polymorphic locus were sequenced, and the results showed that most polymorphisms were due to variations of SSR repeat motifs. The EST-SSRs identified and characterized in this study have important implications for developing new tools for genetics and molecular breeding in Chinese cabbage.

  13. TCW: transcriptome computational workbench.

    Science.gov (United States)

    Soderlund, Carol; Nelson, William; Willer, Mark; Gang, David R

    2013-01-01

    The analysis of transcriptome data involves many steps and various programs, along with organization of large amounts of data and results. Without a methodical approach for storage, analysis and query, the resulting ad hoc analysis can lead to human error, loss of data and results, inefficient use of time, and lack of verifiability, repeatability, and extensibility. The Transcriptome Computational Workbench (TCW) provides Java graphical interfaces for methodical analysis for both single and comparative transcriptome data without the use of a reference genome (e.g. for non-model organisms). The singleTCW interface steps the user through importing transcript sequences (e.g. Illumina) or assembling long sequences (e.g. Sanger, 454, transcripts), annotating the sequences, and performing differential expression analysis using published statistical programs in R. The data, metadata, and results are stored in a MySQL database. The multiTCW interface builds a comparison database by importing sequence and annotation from one or more single TCW databases, executes the ESTscan program to translate the sequences into proteins, and then incorporates one or more clusterings, where the clustering options are to execute the orthoMCL program, compute transitive closure, or import clusters. Both singleTCW and multiTCW allow extensive query and display of the results, where singleTCW displays the alignment of annotation hits to transcript sequences, and multiTCW displays multiple transcript alignments with MUSCLE or pairwise alignments. The query programs can be executed on the desktop for fastest analysis, or from the web for sharing the results. It is now affordable to buy a multi-processor machine, and easy to install Java and MySQL. By simply downloading the TCW, the user can interactively analyze, query and view their data. The TCW allows in-depth data mining of the results, which can lead to a better understanding of the transcriptome. TCW is freely available from www.agcol.arizona.edu/software/tcw.

  14. Differential transcriptome profiling of chilling stress response between shoots and rhizomes of Oryza longistaminata using RNA sequencing.

    Directory of Open Access Journals (Sweden)

    Ting Zhang

    Full Text Available Rice (Oryza sativa is very sensitive to chilling stress at seedling and reproductive stages, whereas wild rice, O. longistaminata, tolerates non-freezing cold temperatures and has overwintering ability. Elucidating the molecular mechanisms of chilling tolerance (CT in O. longistaminata should thus provide a basis for rice CT improvement through molecular breeding. In this study, high-throughput RNA sequencing was performed to profile global transcriptome alterations and crucial genes involved in response to long-term low temperature in O. longistaminata shoots and rhizomes subjected to 7 days of chilling stress. A total of 605 and 403 genes were respectively identified as up- and down-regulated in O. longistaminata under 7 days of chilling stress, with 354 and 371 differentially expressed genes (DEGs found exclusively in shoots and rhizomes, respectively. GO enrichment and KEGG pathway analyses revealed that multiple transcriptional regulatory pathways were enriched in commonly induced genes in both tissues; in contrast, only the photosynthesis pathway was prevalent in genes uniquely induced in shoots, whereas several key metabolic pathways and the programmed cell death process were enriched in genes induced only in rhizomes. Further analysis of these tissue-specific DEGs showed that the CBF/DREB1 regulon and other transcription factors (TFs, including AP2/EREBPs, MYBs, and WRKYs, were synergistically involved in transcriptional regulation of chilling stress response in shoots. Different sets of TFs, such as OsERF922, OsNAC9, OsWRKY25, and WRKY74, and eight genes encoding antioxidant enzymes were exclusively activated in rhizomes under long-term low-temperature treatment. Furthermore, several cis-regulatory elements, including the ICE1-binding site, the GATA element for phytochrome regulation, and the W-box for WRKY binding, were highly abundant in both tissues, confirming the involvement of multiple regulatory genes and complex networks in the

  15. The contribution of 700,000 ORF sequence tags to the definition of the human transcriptome

    DEFF Research Database (Denmark)

    Camargo, A A; Samaia, H P; Dias-Neto, E

    2001-01-01

    Open reading frame expressed sequences tags (ORESTES) differ from conventional ESTs by providing sequence data from the central protein coding portion of transcripts. We generated a total of 696,745 ORESTES sequences from 24 human tissues and used a subset of the data that correspond to a set of ...

  16. Population genetics, phylogenomics and hybrid speciation of Juglans in China determined from whole chloroplast genomes, transcriptomes, and genotyping-by-sequencing (GBS)

    Science.gov (United States)

    Peng Zhao; Hui-Juan Zhou; Daniel Potter; Yi-Heng Hu; Xiao-Jia Feng; Meng Dang; Li Feng; Saman Zulfiqar; Wen-Zhe Liu; Gui-Fang Zhao; Keith Woeste

    2018-01-01

    Genomic data are a powerful tool for elucidating the processes involved in the evolution and divergence of species. The speciation and phylogenetic relationships among Chinese Juglans remain unclear. Here, we used results from phylogenomic and population genetic analyses, transcriptomics, Genotyping-By-Sequencing (GBS), and whole chloroplast...

  17. RNA-Seq analysis of Cocos nucifera: transcriptome sequencing and de novo assembly for subsequent functional genomics approaches.

    Science.gov (United States)

    Fan, Haikuo; Xiao, Yong; Yang, Yaodong; Xia, Wei; Mason, Annaliese S; Xia, Zhihui; Qiao, Fei; Zhao, Songlin; Tang, Haoru

    2013-01-01

    Cocos nucifera (coconut), a member of the Arecaceae family, is an economically important woody palm grown in tropical regions. Despite its agronomic importance, previous germplasm assessment studies have relied solely on morphological and agronomical traits. Molecular biology techniques have been scarcely used in assessment of genetic resources and for improvement of important agronomic and quality traits in Cocos nucifera, mostly due to the absence of available sequence information. To provide basic information for molecular breeding and further molecular biological analysis in Cocos nucifera, we applied RNA-seq technology and de novo assembly to gain a global overview of the Cocos nucifera transcriptome from mixed tissue samples. Using Illumina sequencing, we obtained 54.9 million short reads and conducted de novo assembly to obtain 57,304 unigenes with an average length of 752 base pairs. Sequence comparison between assembled unigenes and released cDNA sequences of Cocos nucifera and Elaeis guineensis indicated that the assembled sequences were of high quality. Approximately 99.9% of unigenes were novel compared to the released coconut EST sequences. Using BLASTX, 68.2% of unigenes were successfully annotated based on the Genbank non-redundant (Nr) protein database. The annotated unigenes were then further classified using the Gene Ontology (GO), Clusters of Orthologous Groups (COG) and Kyoto Encyclopedia of Genes and Genomes (KEGG) databases. Our study provides a large quantity of novel genetic information for Cocos nucifera. This information will act as a valuable resource for further molecular genetic studies and breeding in coconut, as well as for isolation and characterization of functional genes involved in different biochemical pathways in this important tropical crop species.

  18. Design of a 9K illumina BeadChip for polar bears (Ursus maritimus) from RAD and transcriptome sequencing.

    Science.gov (United States)

    Malenfant, René M; Coltman, David W; Davis, Corey S

    2015-05-01

    Single-nucleotide polymorphisms (SNPs) offer numerous advantages over anonymous markers such as microsatellites, including improved estimation of population parameters, finer-scale resolution of population structure and more precise genomic dissection of quantitative traits. However, many SNPs are needed to equal the resolution of a single microsatellite, and reliable large-scale genotyping of SNPs remains a challenge in nonmodel species. Here, we document the creation of a 9K Illumina Infinium BeadChip for polar bears (Ursus maritimus), which will be used to investigate: (i) the fine-scale population structure among Canadian polar bears and (ii) the genomic architecture of phenotypic traits in the Western Hudson Bay subpopulation. To this end, we used restriction-site associated DNA (RAD) sequencing from 38 bears across their circumpolar range, as well as blood/fat transcriptome sequencing of 10 individuals from Western Hudson Bay. Six-thousand RAD SNPs and 3000 transcriptomic SNPs were selected for the chip, based primarily on genomic spacing and gene function respectively. Of the 9000 SNPs ordered from Illumina, 8042 were successfully printed, and - after genotyping 1450 polar bears - 5441 of these SNPs were found to be well clustered and polymorphic. Using this array, we show rapid linkage disequilibrium decay among polar bears, we demonstrate that in a subsample of 78 individuals, our SNPs detect known genetic structure more clearly than 24 microsatellites genotyped for the same individuals and that these results are not driven by the SNP ascertainment scheme. Here, we present one of the first large-scale genotyping resources designed for a threatened species. © 2014 John Wiley & Sons Ltd.

  19. Developing nuclear DNA phylogenetic markers in the angiosperm genus Leucadendron (Proteaceae): a next-generation sequencing transcriptomic approach.

    Science.gov (United States)

    Tonnabel, Jeanne; Olivieri, Isabelle; Mignot, Agnès; Rebelo, Anthony; Justy, Fabienne; Santoni, Sylvain; Caroli, Stéfanie; Sauné, Laure; Bouchez, Olivier; Douzery, Emmanuel J P

    2014-01-01

    Despite the recent advances in generating molecular data, reconstructing species-level phylogenies for non-models groups remains a challenge. The use of a number of independent genes is required to resolve phylogenetic relationships, especially for groups displaying low polymorphism. In such cases, low-copy nuclear exons and non-coding regions, such as 3' untranslated regions (3'-UTRs) or introns, constitute a potentially interesting source of nuclear DNA variation. Here, we present a methodology meant to identify new nuclear orthologous markers using both public-nucleotide databases and transcriptomic data generated for the group of interest by using next generation sequencing technology. To identify PCR primers for a non-model group, the genus Leucadendron (Proteaceae), we adopted a framework aimed at minimizing the probability of paralogy and maximizing polymorphism. We anchored when possible the right-hand primer into the 3'-UTR and the left-hand primer into the coding region. Seven new nuclear markers emerged from this search strategy, three of those included 3'-UTRs. We further compared the phylogenetic potential between our new markers and the ribosomal internal transcribed spacer region (ITS). The sequenced 3'-UTRs yielded higher polymorphism rates than the ITS region did. We did not find strong incongruences with the phylogenetic signal contained in the ITS region and the seven new designed markers but they strongly improved the phylogeny of the genus Leucadendron. Overall, this methodology is efficient in isolating orthologous loci and is valid for any non-model group given the availability of transcriptomic data. Copyright © 2013 Elsevier Inc. All rights reserved.

  20. Mining genes involved in the stratification of Paris Polyphylla seeds using high-throughput embryo Transcriptome sequencing

    Science.gov (United States)

    2013-01-01

    Background Paris polyphylla var. yunnanensis is an important medicinal plant. Seed dormancy is one of the main factors restricting artificial cultivation. The molecular mechanisms of seed dormancy remain unclear, and little genomic or transcriptome data are available for this plant. Results In this study, massive parallel pyrosequencing on the Roche 454-GS FLX Titanium platform was used to generate a substantial sequence dataset for the P. polyphylla embryo. 369,496 high quality reads were obtained, ranging from 50 to 1146 bp, with a mean of 219 bp. These reads were assembled into 47,768 unigenes, which included 16,069 contigs and 31,699 singletons. Using BLASTX searches of public databases, 15,757 (32.3%) unique transcripts were identified. Gene Ontology and Cluster of Orthologous Groups of proteins annotations revealed that these transcripts were broadly representative of the P. polyphylla embryo transcriptome. The Kyoto Encyclopedia of Genes and Genomes assigned 5961 of the unique sequences to specific metabolic pathways. Relative expression levels analysis showed that eleven phytohormone-related genes and five other genes have different expression patterns in the embryo and endosperm in the seed stratification process. Conclusions Gene annotation and quantitative RT-PCR expression analysis identified 464 transcripts that may be involved in phytohormone catabolism and biosynthesis, hormone signal, seed dormancy, seed maturation, cell wall growth and circadian rhythms. In particular, the relative expression analysis of sixteen genes (CYP707A, NCED, GA20ox2, GA20ox3, ABI2, PP2C, ARP3, ARP7, IAAH, IAAS, BRRK, DRM, ELF1, ELF2, SFR6, and SUS) in embryo and endosperm and at two temperatures indicated that these related genes may be candidates for clarifying the molecular basis of seed dormancy in P. polyphlla var. yunnanensis. PMID:23718911

  1. Transcriptome sequencing of the blind subterranean mole rat, Spalax galili: utility and potential for the discovery of novel evolutionary patterns.

    Directory of Open Access Journals (Sweden)

    Assaf Malik

    Full Text Available The blind subterranean mole rat (Spalax ehrenbergi superspecies is a model animal for survival under extreme environments due to its ability to live in underground habitats under severe hypoxic stress and darkness. Here we report the transcriptome sequencing of Spalax galili, a chromosomal type of S. ehrenbergi. cDNA pools from muscle and brain tissues isolated from animals exposed to hypoxic and normoxic conditions were sequenced using Sanger, GS FLX, and GS FLX Titanium technologies. Assembly of the sequences yielded over 51,000 isotigs with homology to ∼12,000 mouse, rat or human genes. Based on these results, it was possible to detect large numbers of splice variants, SNPs, and novel transcribed regions. In addition, multiple differential expression patterns were detected between tissues and treatments. The results presented here will serve as a valuable resource for future studies aimed at identifying genes and gene regions evolved during the adaptive radiation associated with underground life of the blind mole rat.

  2. All 37 Mitochondrial Genes of Aphid Aphis craccivora Obtained from Transcriptome Sequencing: Implications for the Evolution of Aphids.

    Science.gov (United States)

    Song, Nan; Zhang, Hao; Li, Hu; Cai, Wanzhi

    2016-01-01

    The availability of mitochondrial genome data for Aphididae, one of the economically important insect pest families, in public databases is limited. The advent of next generation sequencing technology provides the potential to generate mitochondrial genome data for many species timely and cost-effectively. In this report, we used transcriptome sequencing technology to determine all the 37 mitochondrial genes of the cowpea aphid, Aphis craccivora. This method avoids the necessity of finding suitable primers for long PCRs or primer-walking amplicons, and is proved to be effective in obtaining the whole set of mitochondrial gene data for insects with difficulty in sequencing mitochondrial genome by PCR-based strategies. Phylogenetic analyses of aphid mitochondrial genome data show clustering based on tribe level, and strongly support the monophyly of the family Aphididae. Within the monophyletic Aphidini, three samples from Aphis grouped together. In another major clade of Aphididae, Pterocomma pilosum was recovered as a potential sister-group of Cavariella salicicola, as part of Macrosiphini.

  3. All 37 Mitochondrial Genes of Aphid Aphis craccivora Obtained from Transcriptome Sequencing: Implications for the Evolution of Aphids.

    Directory of Open Access Journals (Sweden)

    Nan Song

    Full Text Available The availability of mitochondrial genome data for Aphididae, one of the economically important insect pest families, in public databases is limited. The advent of next generation sequencing technology provides the potential to generate mitochondrial genome data for many species timely and cost-effectively. In this report, we used transcriptome sequencing technology to determine all the 37 mitochondrial genes of the cowpea aphid, Aphis craccivora. This method avoids the necessity of finding suitable primers for long PCRs or primer-walking amplicons, and is proved to be effective in obtaining the whole set of mitochondrial gene data for insects with difficulty in sequencing mitochondrial genome by PCR-based strategies. Phylogenetic analyses of aphid mitochondrial genome data show clustering based on tribe level, and strongly support the monophyly of the family Aphididae. Within the monophyletic Aphidini, three samples from Aphis grouped together. In another major clade of Aphididae, Pterocomma pilosum was recovered as a potential sister-group of Cavariella salicicola, as part of Macrosiphini.

  4. Genome Sequence and Transcriptome Analyses of Chrysochromulina tobin: Metabolic Tools for Enhanced Algal Fitness in the Prominent Order Prymnesiales (Haptophyceae.

    Directory of Open Access Journals (Sweden)

    Blake T Hovde

    Full Text Available Haptophytes are recognized as seminal players in aquatic ecosystem function. These algae are important in global carbon sequestration, form destructive harmful blooms, and given their rich fatty acid content, serve as a highly nutritive food source to a broad range of eco-cohorts. Haptophyte dominance in both fresh and marine waters is supported by the mixotrophic nature of many taxa. Despite their importance the nuclear genome sequence of only one haptophyte, Emiliania huxleyi (Isochrysidales, is available. Here we report the draft genome sequence of Chrysochromulina tobin (Prymnesiales, and transcriptome data collected at seven time points over a 24-hour light/dark cycle. The nuclear genome of C. tobin is small (59 Mb, compact (∼ 40% of the genome is protein coding and encodes approximately 16,777 genes. Genes important to fatty acid synthesis, modification, and catabolism show distinct patterns of expression when monitored over the circadian photoperiod. The C. tobin genome harbors the first hybrid polyketide synthase/non-ribosomal peptide synthase gene complex reported for an algal species, and encodes potential anti-microbial peptides and proteins involved in multidrug and toxic compound extrusion. A new haptophyte xanthorhodopsin was also identified, together with two "red" RuBisCO activases that are shared across many algal lineages. The Chrysochromulina tobin genome sequence provides new information on the evolutionary history, ecology and economic importance of haptophytes.

  5. Bifunctional cis-Abienol Synthase from Abies balsamea Discovered by Transcriptome Sequencing and Its Implications for Diterpenoid Fragrance Production*

    Science.gov (United States)

    Zerbe, Philipp; Chiang, Angela; Yuen, Macaire; Hamberger, Björn; Hamberger, Britta; Draper, Jason A.; Britton, Robert; Bohlmann, Jörg

    2012-01-01

    The labdanoid diterpene alcohol cis-abienol is a major component of the aromatic oleoresin of balsam fir (Abies balsamea) and serves as a valuable bioproduct material for the fragrance industry. Using high-throughput 454 transcriptome sequencing and metabolite profiling of balsam fir bark tissue, we identified candidate diterpene synthase sequences for full-length cDNA cloning and functional characterization. We discovered a bifunctional class I/II cis-abienol synthase (AbCAS), along with the paralogous levopimaradiene/abietadiene synthase and isopimaradiene synthase, all of which are members of the gymnosperm-specific TPS-d subfamily. The AbCAS-catalyzed formation of cis-abienol proceeds via cyclization and hydroxylation at carbon C-8 of a postulated carbocation intermediate in the class II active site, followed by cleavage of the diphosphate group and termination of the reaction sequence without further cyclization in the class I active site. This reaction mechanism is distinct from that of synthases of the isopimaradiene- or levopimaradiene/abietadiene synthase type, which employ deprotonation reactions in the class II active site and secondary cyclizations in the class I active site, leading to tricyclic diterpenes. Comparative homology modeling suggested the active site residues Asp-348, Leu-617, Phe-696, and Gly-723 as potentially important for the specificity of AbCAS. As a class I/II bifunctional enzyme, AbCAS is a promising target for metabolic engineering of cis-abienol production. PMID:22337889

  6. Transcriptome sequencing, and rapid development and application of SNP markers for the legume pod borer Maruca vitrata (Lepidoptera: Crambidae.

    Directory of Open Access Journals (Sweden)

    Venu M Margam

    Full Text Available The legume pod borer, Maruca vitrata (Lepidoptera: Crambidae, is an insect pest species of crops grown by subsistence farmers in tropical regions of Africa. We present the de novo assembly of 3729 contigs from 454- and Sanger-derived sequencing reads for midgut, salivary, and whole adult tissues of this non-model species. Functional annotation predicted that 1320 M. vitrata protein coding genes are present, of which 631 have orthologs within the Bombyx mori gene model. A homology-based analysis assigned M. vitrata genes into a group of paralogs, but these were subsequently partitioned into putative orthologs following phylogenetic analyses. Following sequence quality filtering, a total of 1542 putative single nucleotide polymorphisms (SNPs were predicted within M. vitrata contig assemblies. Seventy one of 1078 designed molecular genetic markers were used to screen M. vitrata samples from five collection sites in West Africa. Population substructure may be present with significant implications in the insect resistance management recommendations pertaining to the release of biological control agents or transgenic cowpea that express Bacillus thuringiensis crystal toxins. Mutation data derived from transcriptome sequencing is an expeditious and economical source for genetic markers that allow evaluation of ecological differentiation.

  7. Transcriptome sequencing of the blind subterranean mole rat, Spalax galili: Utility and potential for the discovery of novel evolutionary patterns

    KAUST Repository

    Malik, Assaf

    2011-08-12

    The blind subterranean mole rat (Spalax ehrenbergi superspecies) is a model animal for survival under extreme environments due to its ability to live in underground habitats under severe hypoxic stress and darkness. Here we report the transcriptome sequencing of Spalax galili, a chromosomal type of S. ehrenbergi. cDNA pools from muscle and brain tissues isolated from animals exposed to hypoxic and normoxic conditions were sequenced using Sanger, GS FLX, and GS FLX Titanium technologies. Assembly of the sequences yielded over 51,000 isotigs with homology to ~12,000 mouse, rat or human genes. Based on these results, it was possible to detect large numbers of splice variants, SNPs, and novel transcribed regions. In addition, multiple differential expression patterns were detected between tissues and treatments. The results presented here will serve as a valuable resource for future studies aimed at identifying genes and gene regions evolved during the adaptive radiation associated with underground life of the blind mole rat. 2011 Malik et al.

  8. De novo transcriptome sequencing and comparative analysis of differentially expressed genes in dryoperis fragrans under temperature stress

    International Nuclear Information System (INIS)

    Wang, W.Z.; Tong, W.S.; Gao, R.

    2016-01-01

    Dryopteris fragrans is a species of fern and contains flavonoids compounds with medicinal value. This study explain the temperature stress impact flavonoids synthesis in D. fragrans tissue culture seedlings under the low temperature at 4 degree C, high temperature at 35 degree C and moderate temperature at 25 degree C. By using Illumina HiSeq 2000 sequencing, 80.9 million raw sequence reads were de novo assembled into 66,716 non-redundant unigenes. 38,486 unigenes (57.7%) were annotated for their function. 13,973 unigenes and 29,598 unigenes were allocated to gene ontology (GO) and clusters of orthologous group (COG), respectively. 18,989 sequences mapped to 118 Kyoto Encyclopedia of Genes and Genomes Pathway database (KEGG), 204 genes were involved in flavonoid biosynthesis, regulation and transport. 25,292 and 16,817 unigenes exhibited marked differential expression in response to temperature shifts of 25 degree C to 4 degree C and 25 degree C to 35 degree C, respectively. 4CL and CHS genes involved in flavonoid biosynthesis were tested and suggested that they were responsible for biosynthesis of flavonoids. This study provides the first published data to describe the D. fragrans transcriptome and should accelerate understanding of flavonoids biosynthesis, regulation and transport mechanisms. Since most unigenes described here were successfully annotated, these results should facilitate future functional genomic understanding and research of D. fragrans. (author)

  9. Transcriptome sequencing of Mycosphaerella fijiensis during association with Musa acuminata reveals candidate pathogenicity genes.

    Science.gov (United States)

    Noar, Roslyn D; Daub, Margaret E

    2016-08-30

    Mycosphaerella fijiensis, causative agent of the black Sigatoka disease of banana, is considered the most economically damaging banana disease. Despite its importance, the genetics of pathogenicity are poorly understood. Previous studies have characterized polyketide pathways with possible roles in pathogenicity. To identify additional candidate pathogenicity genes, we compared the transcriptome of this fungus during the necrotrophic phase of infection with that during saprophytic growth in medium. Transcriptome analysis was conducted, and the functions of differentially expressed genes were predicted by identifying conserved domains, Gene Ontology (GO) annotation and GO enrichment analysis, Carbohydrate-Active EnZymes (CAZy) annotation, and identification of genes encoding effector-like proteins. The analysis showed that genes commonly involved in secondary metabolism have higher expression in infected leaf tissue, including genes encoding cytochrome P450s, short-chain dehydrogenases, and oxidoreductases in the 2-oxoglutarate and Fe(II)-dependent oxygenase superfamily. Other pathogenicity-related genes with higher expression in infected leaf tissue include genes encoding salicylate hydroxylase-like proteins, hydrophobic surface binding proteins, CFEM domain-containing proteins, and genes encoding secreted cysteine-rich proteins characteristic of effectors. More genes encoding amino acid transporters, oligopeptide transporters, peptidases, proteases, proteinases, sugar transporters, and proteins containing Domain of Unknown Function (DUF) 3328 had higher expression in infected leaf tissue, while more genes encoding inhibitors of peptidases and proteinases had higher expression in medium. Sixteen gene clusters with higher expression in leaf tissue were identified including clusters for the synthesis of a non-ribosomal peptide. A cluster encoding a novel fusicoccane was also identified. Two putative dispensable scaffolds were identified with a large proportion of

  10. Transcriptome sequence analysis of an ornamental plant, Ananas comosus var. bracteatus, revealed the potential unigenes involved in terpenoid and phenylpropanoid biosynthesis.

    Directory of Open Access Journals (Sweden)

    Jun Ma

    Full Text Available Ananas comosus var. bracteatus (Red Pineapple is an important ornamental plant for its colorful leaves and decorative red fruits. Because of its complex genome, it is difficult to understand the molecular mechanisms involved in the growth and development. Thus high-throughput transcriptome sequencing of Ananas comosus var. bracteatus is necessary to generate large quantities of transcript sequences for the purpose of gene discovery and functional genomic studies.The Ananas comosus var. bracteatus transcriptome was sequenced by the Illumina paired-end sequencing technology. We obtained a total of 23.5 million high quality sequencing reads, 1,555,808 contigs and 41,052 unigenes. In total 41,052 unigenes of Ananas comosus var. bracteatus, 23,275 unigenes were annotated in the NCBI non-redundant protein database and 23,134 unigenes were annotated in the Swiss-Port database. Out of these, 17,748 and 8,505 unigenes were assigned to gene ontology categories and clusters of orthologous groups, respectively. Functional annotation against Kyoto Encyclopedia of Genes and Genomes Pathway database identified 5,825 unigenes which were mapped to 117 pathways. The assembly predicted many unigenes that were previously unknown. The annotated unigenes were compared against pineapple, rice, maize, Arabidopsis, and sorghum. Unigenes that did not match any of those five sequence datasets are considered to be Ananas comosus var. bracteatus unique. We predicted unigenes encoding enzymes involved in terpenoid and phenylpropanoid biosynthesis.The sequence data provide the most comprehensive transcriptomic resource currently available for Ananas comosus var. bracteatus. To our knowledge; this is the first report on the de novo transcriptome sequencing of the Ananas comosus var. bracteatus. Unigenes obtained in this study, may help improve future gene expression, genetic and genomics studies in Ananas comosus var. bracteatus.

  11. Transcriptome sequence analysis of an ornamental plant, Ananas comosus var. bracteatus, revealed the potential unigenes involved in terpenoid and phenylpropanoid biosynthesis.

    Science.gov (United States)

    Ma, Jun; Kanakala, S; He, Yehua; Zhang, Junli; Zhong, Xiaolan

    2015-01-01

    Ananas comosus var. bracteatus (Red Pineapple) is an important ornamental plant for its colorful leaves and decorative red fruits. Because of its complex genome, it is difficult to understand the molecular mechanisms involved in the growth and development. Thus high-throughput transcriptome sequencing of Ananas comosus var. bracteatus is necessary to generate large quantities of transcript sequences for the purpose of gene discovery and functional genomic studies. The Ananas comosus var. bracteatus transcriptome was sequenced by the Illumina paired-end sequencing technology. We obtained a total of 23.5 million high quality sequencing reads, 1,555,808 contigs and 41,052 unigenes. In total 41,052 unigenes of Ananas comosus var. bracteatus, 23,275 unigenes were annotated in the NCBI non-redundant protein database and 23,134 unigenes were annotated in the Swiss-Port database. Out of these, 17,748 and 8,505 unigenes were assigned to gene ontology categories and clusters of orthologous groups, respectively. Functional annotation against Kyoto Encyclopedia of Genes and Genomes Pathway database identified 5,825 unigenes which were mapped to 117 pathways. The assembly predicted many unigenes that were previously unknown. The annotated unigenes were compared against pineapple, rice, maize, Arabidopsis, and sorghum. Unigenes that did not match any of those five sequence datasets are considered to be Ananas comosus var. bracteatus unique. We predicted unigenes encoding enzymes involved in terpenoid and phenylpropanoid biosynthesis. The sequence data provide the most comprehensive transcriptomic resource currently available for Ananas comosus var. bracteatus. To our knowledge; this is the first report on the de novo transcriptome sequencing of the Ananas comosus var. bracteatus. Unigenes obtained in this study, may help improve future gene expression, genetic and genomics studies in Ananas comosus var. bracteatus.

  12. Transcriptome sequences spanning key developmental states as a resource for the study of the cestode Schistocephalus solidus, a threespine stickleback parasite.

    Science.gov (United States)

    Hébert, François Olivier; Grambauer, Stephan; Barber, Iain; Landry, Christian R; Aubin-Horth, Nadia

    2016-06-02

    Schistocephalus solidus is a well-established model organism for studying the complex life cycle of cestodes and the mechanisms underlying host-parasite interactions. However, very few large-scale genetic resources for this species are available. We have sequenced and de novo-assembled the transcriptome of S. solidus using tissues from whole worms at three key developmental states - non-infective plerocercoid, infective plerocercoid and adult plerocercoid - to provide a resource for studying the evolution of complex life cycles and, more specifically, how parasites modulate their interactions with their hosts during development. The de novo transcriptome assembly reconstructed the coding sequence of 10,285 high-confidence unigenes from which 24,765 non-redundant transcripts were derived. 7,920 (77 %) of these unigenes were annotated with a protein name and 7,323 (71 %) were assigned at least one Gene Ontology term. Our raw transcriptome assembly (unfiltered transcripts) covers 92 % of the predicted transcriptome derived from the S. solidus draft genome assembly currently available on WormBase. It also provides new ecological information and orthology relationships to further annotate the current WormBase transcriptome and genome. This large-scale transcriptomic dataset provides a foundation for studies on how parasitic species with complex life cycles modulate their response to changes in biotic and abiotic conditions experienced inside their various hosts, which is a fundamental objective of parasitology. Furthermore, this resource will help in the validation of the S solidus gene features that have been predicted based on genomic sequence.

  13. De novo sequencing, assembly, and analysis of the root transcriptome of Persea americana (Mill.) in response to Phytophthora cinnamomi and flooding.

    Science.gov (United States)

    Reeksting, Bianca J; Coetzer, Nanette; Mahomed, Waheed; Engelbrecht, Juanita; van den Berg, Noëlani

    2014-01-01

    Avocado is a diploid angiosperm containing 24 chromosomes with a genome estimated to be around 920 Mb. It is an important fruit crop worldwide but is susceptible to a root rot caused by the ubiquitous oomycete Phytophthora cinnamomi. Phytophthora root rot (PRR) causes damage to the feeder roots of trees, causing necrosis. This leads to branch-dieback and eventual tree death, resulting in severe losses in production. Control strategies are limited and at present an integrated approach involving the use of phosphite, tolerant rootstocks, and proper nursery management has shown the best results. Disease progression of PRR is accelerated under high soil moisture or flooding conditions. In addition, avocado is highly susceptible to flooding, with even short periods of flooding causing significant losses. Despite the commercial importance of avocado, limited genomic resources are available. Next generation sequencing has provided the means to generate sequence data at a relatively low cost, making this an attractive option for non-model organisms such as avocado. The aims of this study were to generate sequence data for the avocado root transcriptome and identify stress-related genes. Tissue was isolated from avocado infected with P. cinnamomi, avocado exposed to flooding and avocado exposed to a combination of these two stresses. Three separate sequencing runs were performed on the Roche 454 platform and produced approximately 124 Mb of data. This was assembled into 7685 contigs, with 106 448 sequences remaining as singletons. Genes involved in defence pathways such as the salicylic acid and jasmonic acid pathways as well as genes associated with the response to low oxygen caused by flooding, were identified. This is the most comprehensive study of transcripts derived from root tissue of avocado to date and will provide a useful resource for future studies.

  14. De novo sequencing, assembly, and analysis of the root transcriptome of Persea americana (Mill. in response to Phytophthora cinnamomi and flooding.

    Directory of Open Access Journals (Sweden)

    Bianca J Reeksting

    Full Text Available Avocado is a diploid angiosperm containing 24 chromosomes with a genome estimated to be around 920 Mb. It is an important fruit crop worldwide but is susceptible to a root rot caused by the ubiquitous oomycete Phytophthora cinnamomi. Phytophthora root rot (PRR causes damage to the feeder roots of trees, causing necrosis. This leads to branch-dieback and eventual tree death, resulting in severe losses in production. Control strategies are limited and at present an integrated approach involving the use of phosphite, tolerant rootstocks, and proper nursery management has shown the best results. Disease progression of PRR is accelerated under high soil moisture or flooding conditions. In addition, avocado is highly susceptible to flooding, with even short periods of flooding causing significant losses. Despite the commercial importance of avocado, limited genomic resources are available. Next generation sequencing has provided the means to generate sequence data at a relatively low cost, making this an attractive option for non-model organisms such as avocado. The aims of this study were to generate sequence data for the avocado root transcriptome and identify stress-related genes. Tissue was isolated from avocado infected with P. cinnamomi, avocado exposed to flooding and avocado exposed to a combination of these two stresses. Three separate sequencing runs were performed on the Roche 454 platform and produced approximately 124 Mb of data. This was assembled into 7685 contigs, with 106 448 sequences remaining as singletons. Genes involved in defence pathways such as the salicylic acid and jasmonic acid pathways as well as genes associated with the response to low oxygen caused by flooding, were identified. This is the most comprehensive study of transcripts derived from root tissue of avocado to date and will provide a useful resource for future studies.

  15. De Novo Sequencing, Assembly, and Analysis of the Root Transcriptome of Persea americana (Mill.) in Response to Phytophthora cinnamomi and Flooding

    Science.gov (United States)

    Reeksting, Bianca J.; Coetzer, Nanette; Mahomed, Waheed; Engelbrecht, Juanita; van den Berg, Noëlani

    2014-01-01

    Avocado is a diploid angiosperm containing 24 chromosomes with a genome estimated to be around 920 Mb. It is an important fruit crop worldwide but is susceptible to a root rot caused by the ubiquitous oomycete Phytophthora cinnamomi. Phytophthora root rot (PRR) causes damage to the feeder roots of trees, causing necrosis. This leads to branch-dieback and eventual tree death, resulting in severe losses in production. Control strategies are limited and at present an integrated approach involving the use of phosphite, tolerant rootstocks, and proper nursery management has shown the best results. Disease progression of PRR is accelerated under high soil moisture or flooding conditions. In addition, avocado is highly susceptible to flooding, with even short periods of flooding causing significant losses. Despite the commercial importance of avocado, limited genomic resources are available. Next generation sequencing has provided the means to generate sequence data at a relatively low cost, making this an attractive option for non-model organisms such as avocado. The aims of this study were to generate sequence data for the avocado root transcriptome and identify stress-related genes. Tissue was isolated from avocado infected with P. cinnamomi, avocado exposed to flooding and avocado exposed to a combination of these two stresses. Three separate sequencing runs were performed on the Roche 454 platform and produced approximately 124 Mb of data. This was assembled into 7685 contigs, with 106 448 sequences remaining as singletons. Genes involved in defence pathways such as the salicylic acid and jasmonic acid pathways as well as genes associated with the response to low oxygen caused by flooding, were identified. This is the most comprehensive study of transcripts derived from root tissue of avocado to date and will provide a useful resource for future studies. PMID:24563685

  16. Sample Size Estimation for Detection of Splicing Events in Transcriptome Sequencing Data.

    Science.gov (United States)

    Kaisers, Wolfgang; Schwender, Holger; Schaal, Heiner

    2017-09-05

    Merging data from multiple samples is required to detect low expressed transcripts or splicing events that might be present only in a subset of samples. However, the exact number of required replicates enabling the detection of such rare events often remains a mystery but can be approached through probability theory. Here, we describe a probabilistic model, relating the number of observed events in a batch of samples with observation probabilities. Therein, samples appear as a heterogeneous collection of events, which are observed with some probability. The model is evaluated in a batch of 54 transcriptomes of human dermal fibroblast samples. The majority of putative splice-sites (alignment gap-sites) are detected in (almost) all samples or only sporadically, resulting in an U-shaped pattern for observation probabilities. The probabilistic model systematically underestimates event numbers due to a bias resulting from finite sampling. However, using an additional assumption, the probabilistic model can predict observed event numbers within a events (mean 7122 in alignments from TopHat alignments and 86,215 in alignments from STAR). We conclude that the probabilistic model provides an adequate description for observation of gap-sites in transcriptome data. Thus, the calculation of required sample sizes can be done by application of a simple binomial model to sporadically observed random events. Due to the large number of uniquely observed putative splice-sites and the known stochastic noise in the splicing machinery, it appears advisable to include observation of rare splicing events into analysis objectives. Therefore, it is beneficial to take scores for the validation of gap-sites into account.

  17. The impact of RNA sequence library construction protocols on transcriptomic profiling of leukemia.

    Science.gov (United States)

    Kumar, Ashwini; Kankainen, Matti; Parsons, Alun; Kallioniemi, Olli; Mattila, Pirkko; Heckman, Caroline A

    2017-08-17

    RNA sequencing (RNA-seq) has become an indispensable tool to identify disease associated transcriptional profiles and determine the molecular underpinnings of diseases. However, the broad adaptation of the methodology into the clinic is still hampered by inconsistent results from different RNA-seq protocols and involves further evaluation of its analytical reliability using patient samples. Here, we applied two commonly used RNA-seq library preparation protocols to samples from acute leukemia patients to understand how poly-A-tailed mRNA selection (PA) and ribo-depletion (RD) based RNA-seq library preparation protocols affect gene fusion detection, variant calling, and gene expression profiling. Overall, the protocols produced similar results with consistent outcomes. Nevertheless, the PA protocol was more efficient in quantifying expression of leukemia marker genes and showed better performance in the expression-based classification of leukemia. Independent qRT-PCR experiments verified that the PA protocol better represented total RNA compared to the RD protocol. In contrast, the RD protocol detected a higher number of non-coding RNA features and had better alignment efficiency. The RD protocol also recovered more known fusion-gene events, although variability was seen in fusion gene predictions. The overall findings provide a framework for the use of RNA-seq in a precision medicine setting with limited number of samples and suggest that selection of the library preparation protocol should be based on the objectives of the analysis.

  18. Transcriptome sequencing in an ecologically important tree species: assembly, annotation, and marker discovery

    Directory of Open Access Journals (Sweden)

    Benkman Craig W

    2010-03-01

    Full Text Available Abstract Background Massively parallel sequencing of cDNA is now an efficient route for generating enormous sequence collections that represent expressed genes. This approach provides a valuable starting point for characterizing functional genetic variation in non-model organisms, especially where whole genome sequencing efforts are currently cost and time prohibitive. The large and complex genomes of pines (Pinus spp. have hindered the development of genomic resources, despite the ecological and economical importance of the group. While most genomic studies have focused on a single species (P. taeda, genomic level resources for other pines are insufficiently developed to facilitate ecological genomic research. Lodgepole pine (P. contorta is an ecologically important foundation species of montane forest ecosystems and exhibits substantial adaptive variation across its range in western North America. Here we describe a sequencing study of expressed genes from P. contorta, including their assembly and annotation, and their potential for molecular marker development to support population and association genetic studies. Results We obtained 586,732 sequencing reads from a 454 GS XLR70 Titanium pyrosequencer (mean length: 306 base pairs. A combination of reference-based and de novo assemblies yielded 63,657 contigs, with 239,793 reads remaining as singletons. Based on sequence similarity with known proteins, these sequences represent approximately 17,000 unique genes, many of which are well covered by contig sequences. This sequence collection also included a surprisingly large number of retrotransposon sequences, suggesting that they are highly transcriptionally active in the tissues we sampled. We located and characterized thousands of simple sequence repeats and single nucleotide polymorphisms as potential molecular markers in our assembled and annotated sequences. High quality PCR primers were designed for a substantial number of the SSR loci

  19. RNA sequencing of the human milk fat layer transcriptome reveals distinct gene expression profiles at three stages of lactation.

    Directory of Open Access Journals (Sweden)

    Danielle G Lemay

    Full Text Available Aware of the important benefits of human milk, most U.S. women initiate breastfeeding but difficulties with milk supply lead some to quit earlier than intended. Yet, the contribution of maternal physiology to lactation difficulties remains poorly understood. Human milk fat globules, by enveloping cell contents during their secretion into milk, are a rich source of mammary cell RNA. Here, we pair this non-invasive mRNA source with RNA-sequencing to probe the milk fat layer transcriptome during three stages of lactation: colostral, transitional, and mature milk production. The resulting transcriptomes paint an exquisite portrait of human lactation. The resulting transcriptional profiles cluster not by postpartum day, but by milk Na:K ratio, indicating that women sampled during similar postpartum time frames could be at markedly different stages of gene expression. Each stage of lactation is characterized by a dynamic range (10(5-fold in transcript abundances not previously observed with microarray technology. We discovered that transcripts for isoferritins and cathepsins are strikingly abundant during colostrum production, highlighting the potential importance of these proteins for neonatal health. Two transcripts, encoding β-casein (CSN2 and α-lactalbumin (LALBA, make up 45% of the total pool of mRNA in mature lactation. Genes significantly expressed across all stages of lactation are associated with making, modifying, transporting, and packaging milk proteins. Stage-specific transcripts are associated with immune defense during the colostral stage, up-regulation of the machinery needed for milk protein synthesis during the transitional stage, and the production of lipids during mature lactation. We observed strong modulation of key genes involved in lactose synthesis and insulin signaling. In particular, protein tyrosine phosphatase, receptor type, F (PTPRF may serve as a biomarker linking insulin resistance with insufficient milk supply. This

  20. Innovative approach for transcriptomic analysis of obligate intracellular pathogen: selective capture of transcribed sequences of Ehrlichia ruminantium.

    Science.gov (United States)

    Emboulé, Loïc; Daigle, France; Meyer, Damien F; Mari, Bernard; Pinarello, Valérie; Sheikboudou, Christian; Magnone, Virginie; Frutos, Roger; Viari, Alain; Barbry, Pascal; Martinez, Dominique; Lefrançois, Thierry; Vachiéry, Nathalie

    2009-12-24

    Whole genome transcriptomic analysis is a powerful approach to elucidate the molecular mechanisms controlling the pathogenesis of obligate intracellular bacteria. However, the major hurdle resides in the low quantity of prokaryotic mRNAs extracted from host cells. Our model Ehrlichia ruminantium (ER), the causative agent of heartwater, is transmitted by tick Amblyomma variegatum. This bacterium affects wild and domestic ruminants and is present in Sub-Saharan Africa and the Caribbean islands. Because of its strictly intracellular location, which constitutes a limitation for its extensive study, the molecular mechanisms involved in its pathogenicity are still poorly understood. We successfully adapted the SCOTS method (Selective Capture of Transcribed Sequences) on the model Rickettsiales ER to capture mRNAs. Southern Blots and RT-PCR revealed an enrichment of ER's cDNAs and a diminution of ribosomal contaminants after three rounds of capture. qRT-PCR and whole-genome ER microarrays hybridizations demonstrated that SCOTS method introduced only a limited bias on gene expression. Indeed, we confirmed the differential gene expression between poorly and highly expressed genes before and after SCOTS captures. The comparative gene expression obtained from ER microarrays data, on samples before and after SCOTS at 96 hpi was significantly correlated (R2 = 0.7). Moreover, SCOTS method is crucial for microarrays analysis of ER, especially for early time points post-infection. There was low detection of transcripts for untreated samples whereas 24% and 70.7% were revealed for SCOTS samples at 24 and 96 hpi respectively. We conclude that this SCOTS method has a key importance for the transcriptomic analysis of ER and can be potentially used for other Rickettsiales. This study constitutes the first step for further gene expression analyses that will lead to a better understanding of both ER pathogenicity and the adaptation of obligate intracellular bacteria to their environment.

  1. Innovative approach for transcriptomic analysis of obligate intracellular pathogen: selective capture of transcribed sequences of Ehrlichia ruminantium

    Directory of Open Access Journals (Sweden)

    Lefrançois Thierry

    2009-12-01

    Full Text Available Abstract Background Whole genome transcriptomic analysis is a powerful approach to elucidate the molecular mechanisms controlling the pathogenesis of obligate intracellular bacteria. However, the major hurdle resides in the low quantity of prokaryotic mRNAs extracted from host cells. Our model Ehrlichia ruminantium (ER, the causative agent of heartwater, is transmitted by tick Amblyomma variegatum. This bacterium affects wild and domestic ruminants and is present in Sub-Saharan Africa and the Caribbean islands. Because of its strictly intracellular location, which constitutes a limitation for its extensive study, the molecular mechanisms involved in its pathogenicity are still poorly understood. Results We successfully adapted the SCOTS method (Selective Capture of Transcribed Sequences on the model Rickettsiales ER to capture mRNAs. Southern Blots and RT-PCR revealed an enrichment of ER's cDNAs and a diminution of ribosomal contaminants after three rounds of capture. qRT-PCR and whole-genome ER microarrays hybridizations demonstrated that SCOTS method introduced only a limited bias on gene expression. Indeed, we confirmed the differential gene expression between poorly and highly expressed genes before and after SCOTS captures. The comparative gene expression obtained from ER microarrays data, on samples before and after SCOTS at 96 hpi was significantly correlated (R2 = 0.7. Moreover, SCOTS method is crucial for microarrays analysis of ER, especially for early time points post-infection. There was low detection of transcripts for untreated samples whereas 24% and 70.7% were revealed for SCOTS samples at 24 and 96 hpi respectively. Conclusions We conclude that this SCOTS method has a key importance for the transcriptomic analysis of ER and can be potentially used for other Rickettsiales. This study constitutes the first step for further gene expression analyses that will lead to a better understanding of both ER pathogenicity and the adaptation

  2. De novo transcriptome sequencing and comprehensive analysis of the heat stress response genes in the basidiomycetes fungus Ganoderma lucidum.

    Science.gov (United States)

    Tan, Xiaoyan; Sun, Junshe; Ning, Huijuan; Qin, Zifang; Miao, Yuxin; Sun, Tian; Zhang, Xiuqing

    2018-03-29

    Ganoderma lucidum is a valuable basidiomycete with numerous pharmacological compounds, which is widely consumed throughout China. We previously found that the polysaccharide content of Ganoderma lucidum fruiting bodies could be significantly improved by 45.63% with treatment of 42 °C heat stress (HS) for 2 h. To further investigate genes involved in HS response and explore the mechanisms of HS regulating the carbohydrate metabolism in Ganoderma lucidum, high-throughput RNA-Seq was conducted to analyse the difference between control and heat-treated mycelia at transcriptome level. We sequenced six cDNA libraries with three from control group (mycelia cultivated at 28 °C) and three from heat-treated group (mycelia subjected to 42 °C for 2 h). A total of 99,899 transcripts were generated using Trinity method and 59,136 unigenes were annotated by seven public databases. Among them, 2790 genes were identified to be differential expressed genes (DEGs) under HS condition, which included 1991 up-regulated and 799 down-regulated. 176 DEGs were then manually classified into five main responsive-related categories according to their putative functions and possible metabolic pathways. These groups include stress resistance-related factors; protein assembly, transportation and degradation; signal transduction; carbohydrate metabolism and energy provision-related process; other related functions, suggesting that a series of metabolic pathways in Ganoderma lucidum are activated by HS and the response mechanism involves a complex molecular network which needs further study. Remarkably, 48 DEGs were found to regulate carbohydrate metabolism, both in carbohydrate hydrolysis for energy provision and polysaccharide synthesis. In summary, this comprehensive transcriptome analysis will provide enlarged resource for further investigation into the molecular mechanisms of basidiomycete under HS condition. Copyright © 2017. Published by Elsevier B.V.

  3. Development of SSR Markers Based on Transcriptome Sequencing and Association Analysis with Drought Tolerance in Perennial Grass Miscanthus from China

    Directory of Open Access Journals (Sweden)

    Gang Nie

    2017-05-01

    Full Text Available Drought has become a critical environmental stress affecting on plant in temperate area. As one of the promising bio-energy crops to sustainable biomass production, the genus Miscanthus has been widely studied around the world. However, the most widely used hybrid cultivar among this genus, Miscanthus × giganteus is proved poor drought tolerance compared to some parental species. Here we mainly focused on Miscanthus sinensis, which is one of the progenitors of M. × giganteus providing a comparable yield and well abiotic stress tolerance in some places. The main objectives were to characterize the physiological and photosynthetic respond to drought stress and to develop simple sequence repeats (SSRs markers associated with drought tolerance by transcriptome sequencing within an originally collection of 44 Miscanthus genotypes from southwest China. Significant phenotypic differences were observed among genotypes, and the average of leaf relative water content (RWC were severely affected by drought stress decreasing from 88.27 to 43.21%, which could well contribute to separating the drought resistant and drought sensitive genotype of Miscanthus. Furthermore, a total of 16,566 gene-associated SSRs markers were identified based on Illumina RNA sequencing under drought conditions, and 93 of them were randomly selected to validate. In total, 70 (75.3% SSRs were successfully amplified and the generated loci from 30 polymorphic SSRs were used to estimate the genetic differentiation and population structure. Finally, two optimum subgroups of the population were determined by structure analysis and based on association analysis, seven significant associations were identified including two markers with leaf RWC and five markers with photosynthetic traits. With the rich sequencing resources annotation, such associations would serve an efficient tool for Miscanthus drought response mechanism study and facilitate genetic improvement of drought resistant for

  4. De novo transcriptome sequencing analysis and comparison of differentially expressed genes (DEGs) in Macrobrachium rosenbergii in China.

    Science.gov (United States)

    Nguyen Thanh, Hai; Zhao, Liangjie; Liu, Qigen

    2014-01-01

    Giant freshwater prawn (GFP; Macrobrachium rosenbergii) is an exotic species that was introduced into China in 1976 and thereafter it became a major species in freshwater aquaculture. However the gene discovery in this species has been limited to small-scale data collection in China. We used the next generation sequencing technology for the experiment; the transcriptome was sequenced of samples of hepatopancreas organ in individuals from 4 GFP groups (A1, A2, B1 and B2). De novo transcriptome sequencing generated 66,953 isogenes. Using BLASTX to search the Non-redundant (NR), Search Tool for the Retrieval of Interacting Genes (STRING), and Kyoto Encyclopedia of Genes and Genome (KEGG) databases; 21,224 unigenes were annotated, 9,552 matched unigenes with the Gene Ontology (GO) classification; 5,782 matched unigenes in 25 categories of Clusters of Orthologous Groups of proteins (COG) and 20,859 unigenes were consequently assigned to 312 KEGG pathways. Between the A and B groups 147 differentially expressed genes (DEGs) were identified; between the A1 and A2 groups 6,860 DEGs were identified and between the B1 and B2 groups 5,229 DEGs were identified. After enrichment, the A and B groups identified 38 DEGs, but none of them were significantly enriched. The A1 and A2 groups identified 21,856 DEGs in three main categories based on functional groups: biological process, cellular_component and molecular function and the KEGG pathway defined 2,459 genes had a KEGG Ortholog-ID (KO-ID) and could be categorized into 251 pathways, of those, 9 pathways were significantly enriched. The B1 and B2 groups identified 5,940 DEGs in three main categories based on functional groups: biological process, cellular_component and molecular function, and the KEGG pathway defined 1,543 genes had a KO-ID and could be categorized into 240 pathways, of those, 2 pathways were significantly enriched. We investigated 99 queries (GO) which related to growth of GFP in 4 groups. After enrichment we

  5. De novo transcriptome sequencing analysis and comparison of differentially expressed genes (DEGs in Macrobrachium rosenbergii in China.

    Directory of Open Access Journals (Sweden)

    Hai Nguyen Thanh

    Full Text Available Giant freshwater prawn (GFP; Macrobrachium rosenbergii is an exotic species that was introduced into China in 1976 and thereafter it became a major species in freshwater aquaculture. However the gene discovery in this species has been limited to small-scale data collection in China. We used the next generation sequencing technology for the experiment; the transcriptome was sequenced of samples of hepatopancreas organ in individuals from 4 GFP groups (A1, A2, B1 and B2. De novo transcriptome sequencing generated 66,953 isogenes. Using BLASTX to search the Non-redundant (NR, Search Tool for the Retrieval of Interacting Genes (STRING, and Kyoto Encyclopedia of Genes and Genome (KEGG databases; 21,224 unigenes were annotated, 9,552 matched unigenes with the Gene Ontology (GO classification; 5,782 matched unigenes in 25 categories of Clusters of Orthologous Groups of proteins (COG and 20,859 unigenes were consequently assigned to 312 KEGG pathways. Between the A and B groups 147 differentially expressed genes (DEGs were identified; between the A1 and A2 groups 6,860 DEGs were identified and between the B1 and B2 groups 5,229 DEGs were identified. After enrichment, the A and B groups identified 38 DEGs, but none of them were significantly enriched. The A1 and A2 groups identified 21,856 DEGs in three main categories based on functional groups: biological process, cellular_component and molecular function and the KEGG pathway defined 2,459 genes had a KEGG Ortholog-ID (KO-ID and could be categorized into 251 pathways, of those, 9 pathways were significantly enriched. The B1 and B2 groups identified 5,940 DEGs in three main categories based on functional groups: biological process, cellular_component and molecular function, and the KEGG pathway defined 1,543 genes had a KO-ID and could be categorized into 240 pathways, of those, 2 pathways were significantly enriched. We investigated 99 queries (GO which related to growth of GFP in 4 groups. After

  6. Genome-Wide Gene Expression Profiling of Nucleus Accumbens Neurons Projecting to Ventral Pallidum Using both Microarray and Transcriptome Sequencing.

    Science.gov (United States)

    Chen, Hao; Liu, Zhimin; Gong, Suzhen; Wu, Xingjun; Taylor, William L; Williams, Robert W; Matta, Shannon G; Sharp, Burt M

    2011-01-01

    The cellular heterogeneity of brain poses a particularly thorny issue in genome-wide gene expression studies. Because laser capture microdissection (LCM) enables the precise extraction of a small area of tissue, we combined LCM with neuronal track tracing to collect nucleus accumbens shell neurons that project to ventral pallidum, which are of particular interest in the study of reward and addiction. Four independent biological samples of accumbens projection neurons were obtained. Approximately 500 pg of total RNA from each sample was then amplified linearly and subjected to Affymetrix microarray and Applied Biosystems sequencing by oligonucleotide ligation and detection (SOLiD) transcriptome sequencing (RNA-seq). A total of 375 million 50-bp reads were obtained from RNA-seq. Approximately 57% of these reads were mapped to the rat reference genome (Baylor 3.4/rn4). Approximately 11,000 unique RefSeq genes and 100,000 unique exons were identified from each sample. Of the unmapped reads, the quality scores were 4.74 ± 0.42 lower than the mapped reads. When RNA-seq and microarray data from the same samples were compared, Pearson correlations were between 0.764 and 0.798. The variances in data obtained for the four samples by microarray and RNA-seq were similar for medium to high abundance genes, but less among low abundance genes detected by microarray. Analysis of 34 genes by real-time polymerase chain reaction showed higher correlation with RNA-seq (0.66) than with microarray (0.46). Further analysis showed 20-30 million 50-bp reads are sufficient to provide estimates of gene expression levels comparable to those produced by microarray. In summary, this study showed that picogram quantities of total RNA obtained by LCM of ∼700 individual neurons is sufficient to take advantage of the benefits provided by the transcriptome sequencing technology, such as low background noise, high dynamic range, and high precision.

  7. De novo transcriptome sequencing and analysis of freshwater snail (Radix balthica) to discover genes and pathways affected by exposure to oxazepam.

    Science.gov (United States)

    Mazzitelli, Jean-Yves; Bonnafe, Elsa; Klopp, Christophe; Escudier, Frédéric; Geret, Florence

    2017-01-01

    Pharmaceuticals are increasingly found in aquatic ecosystems due to the non-efficiency of waste water treatment plants. Therefore, aquatic organisms are frequently exposed to a broad diversity of pharmaceuticals. Freshwater snail Radix balthica has been chosen as model to study the effects of oxazepam (psychotropic drug) on developmental stages ranging from trochophore to hatching. In order to provide a global insight of these effects, a transcriptome deep sequencing has been performed on exposed embryos. Eighteen libraries were sequenced, six libraries for three conditions: control, exposed to the lowest oxazepam concentration with a phenotypic effect (delayed hatching) (TA) and exposed to oxazepam concentration found in freshwater (TB). A total of 39,759,772 filtered raw reads were assembled into 56,435 contigs having a mean length of 1579.68 bp and mean depth of 378.96 reads. 44.91% of the contigs have at least one annotation. The differential expression analysis between the control condition and the two exposure conditions revealed 146 contigs differentially expressed of which 144 for TA and two for TB. 34.0% were annotated with biological function. There were four mainly impacted processes: two cellular signalling systems (Notch and JNK) and two biosynthesis pathways (Polyamine and Catecholamine pathways). This work reports a large-scale analysis of differentially transcribed genes of R. balthica exposed to oxazepam during egg development until hatching. In addition, these results enriched the de novo database of potential ecotoxicological models.

  8. Transcriptomic changes during tuber dormancy release process revealed by RNA sequencing in potato.

    Science.gov (United States)

    Liu, Bailin; Zhang, Ning; Wen, Yikai; Jin, Xin; Yang, Jiangwei; Si, Huaijun; Wang, Di

    2015-03-20

    Potato tuber dormancy release is a critical development process that allows potato to produce new plant. The first Illumina RNA sequencing to generate the expressed mRNAs at dormancy tuber (DT), dormancy release tuber (DRT) and sprouting tuber (ST) was performed. We identified 26,639 genes including 5,912 (3,450 up-regulated while 2,462 down-regulated) and 3,885 (2,141 up-regulated while 1,744 down-regulated) genes were differentially expressed from DT vs DRT and DRT vs ST. The RNA-Seq results were further verified using qRT-PCR. We found reserve mobilization events were activated before the bud emergence (DT vs DRT) and highlighted after dormancy release (DRT vs ST). Overexpressed genes related to metabolism of auxin, gibberellic acid, cytokinin and barssinosteriod were dominated in DT vs DRT, whereas overexpressed genes involved in metabolism of ethylene, jasmonate and salicylate were prominent in DRT vs ST. Various histone and cyclin isoforms associated genes involved in cell division/cycle were mainly up-regulated in DT vs DRT. Dormancy release process was also companied by stress response and redox regulation, those genes related to biotic stress, cell wall and second metabolism was preferentially overexpressed in DRT vs ST, which might accelerate dormancy breaking and sprout outgrowth. The metabolic processes activated during tuber dormancy release were also supported by plant seed models. These results represented the first comprehensive picture of a large number of genes involved in tuber dormancy release process. Copyright © 2015 Elsevier B.V. All rights reserved.

  9. Transcriptome profiling of bovine milk oligosaccharide metabolism genes using RNA-sequencing.

    Directory of Open Access Journals (Sweden)

    Saumya Wickramasinghe

    2011-04-01

    Full Text Available This study examines the genes coding for enzymes involved in bovine milk oligosaccharide metabolism by comparing the oligosaccharide profiles with the expressions of glycosylation-related genes. Fresh milk samples (n = 32 were collected from four Holstein and Jersey cows at days 1, 15, 90 and 250 of lactation and free milk oligosaccharide profiles were analyzed. RNA was extracted from milk somatic cells at days 15 and 250 of lactation (n = 12 and gene expression analysis was conducted by RNA-Sequencing. A list was created of 121 glycosylation-related genes involved in oligosaccharide metabolism pathways in bovine by analyzing the oligosaccharide profiles and performing an extensive literature search. No significant differences were observed in either oligosaccharide profiles or expressions of glycosylation-related genes between Holstein and Jersey cows. The highest concentrations of free oligosaccharides were observed in the colostrum samples and a sharp decrease was observed in the concentration of free oligosaccharides on day 15, followed by progressive decrease on days 90 and 250. Ninety-two glycosylation-related genes were expressed in milk somatic cells. Most of these genes exhibited higher expression in day 250 samples indicating increases in net glycosylation-related metabolism in spite of decreases in free milk oligosaccharides in late lactation milk. Even though fucosylated free oligosaccharides were not identified, gene expression indicated the likely presence of fucosylated oligosaccharides in bovine milk. Fucosidase genes were expressed in milk and a possible explanation for not detecting fucosylated free oligosaccharides is the degradation of large fucosylated free oligosaccharides by the fucosidases. Detailed characterization of enzymes encoded by the 92 glycosylation-related genes identified in this study will provide the basic knowledge for metabolic network analysis of oligosaccharides in mammalian milk. These candidate

  10. Discovery of J Chain in African Lungfish (Protopterus dolloi, Sarcopterygii) Using High Throughput Transcriptome Sequencing: Implications in Mucosal Immunity

    Science.gov (United States)

    Tacchi, Luca; Larragoite, Erin; Salinas, Irene

    2013-01-01

    J chain is a small polypeptide responsible for immunoglobulin (Ig) polymerization and transport of Igs across mucosal surfaces in higher vertebrates. We identified a J chain in dipnoid fish, the African lungfish (Protopterus dolloi) by high throughput sequencing of the transcriptome. P. dolloi J chain is 161 aa long and contains six of the eight Cys residues present in mammalian J chain. Phylogenetic studies place the lungfish J chain closer to tetrapod J chain than to the coelacanth or nurse shark sequences. J chain expression occurs in all P. dolloi immune tissues examined and it increases in the gut and kidney in response to an experimental bacterial infection. Double fluorescent in-situ hybridization shows that 88.5% of IgM+ cells in the gut co-express J chain, a significantly higher percentage than in the pre-pyloric spleen. Importantly, J chain expression is not restricted to the B-cell compartment since gut epithelial cells also express J chain. These results improve our current view of J chain from a phylogenetic perspective. PMID:23967082

  11. Neuropeptide signaling sequences identified by pyrosequencing of the American dog tick synganglion transcriptome during blood feeding and reproduction.

    Science.gov (United States)

    Donohue, Kevin V; Khalil, Sayed M S; Ross, E; Grozinger, Christina M; Sonenshine, Daniel E; Michael Roe, R

    2010-01-01

    Ticks are important vectors of numerous pathogens that impact human and animal health. The tick central nervous system represents an understudied area in tick biology and no tick synganglion-specific transcriptome has been described to date. Here we characterize whole or partial cDNA sequences of fourteen putative neuropeptides (allatostatin, insulin-like peptide, ion-transport peptide, sulfakinin, bursicon alpha/beta, eclosion hormone, glycoprotein hormone alpha/beta, corazonin, four orcokinins) and five neuropeptide receptors (gonadotropin receptor, leucokinin-like receptor, sulfakinin receptor, calcitonin receptor, pyrokinin receptor) translated from cDNA synthesized from the synganglion of unfed, partially fed and replete female American dog ticks, Dermacentor variabilis. Their homology to the same neuropeptides in other taxa is discussed. Many of these neuropeptides such as an allatostatin, insulin-like peptide, eclosion hormone, bursicon alpha and beta and glycoprotein hormone alpha and beta have not been previously described in the Chelicerata. An insulin-receptor substrate protein was also found indicating that an insulin signaling network is present in ticks. A putative type-2 proprotein processing convertase was also sequenced that may be involved in cleavage at monobasic and dibasic endoproteolytic cleavage sites in prohormones. The possible physiological role of the proteins discovered in adult tick blood feeding and reproduction will be discussed. Copyright (c) 2009 Elsevier Ltd. All rights reserved.

  12. Transcriptome sequencing reveals potential mechanism of cryptic 3' splice site selection in SF3B1-mutated cancers.

    Directory of Open Access Journals (Sweden)

    Christopher DeBoever

    2015-03-01

    Full Text Available Mutations in the splicing factor SF3B1 are found in several cancer types and have been associated with various splicing defects. Using transcriptome sequencing data from chronic lymphocytic leukemia, breast cancer and uveal melanoma tumor samples, we show that hundreds of cryptic 3' splice sites (3'SSs are used in cancers with SF3B1 mutations. We define the necessary sequence context for the observed cryptic 3' SSs and propose that cryptic 3'SS selection is a result of SF3B1 mutations causing a shift in the sterically protected region downstream of the branch point. While most cryptic 3'SSs are present at low frequency (<10% relative to nearby canonical 3'SSs, we identified ten genes that preferred out-of-frame cryptic 3'SSs. We show that cancers with mutations in the SF3B1 HEAT 5-9 repeats use cryptic 3'SSs downstream of the branch point and provide both a mechanistic model consistent with published experimental data and affected targets that will guide further research into the oncogenic effects of SF3B1 mutation.

  13. Single-cell transcriptome sequencing reveals that cell division cycle 5-like protein is essential for porcine oocyte maturation.

    Science.gov (United States)

    Liu, Xiao-Man; Wang, Yan-Kui; Liu, Yun-Hua; Yu, Xiao-Xia; Wang, Pei-Chao; Li, Xuan; Du, Zhi-Qiang; Yang, Cai-Xia

    2018-02-02

    The brilliant cresyl blue (BCB) test is used in both basic biological research and assisted reproduction to identify oocytes likely to be developmentally competent. However, the underlying molecular mechanism targeted by the BCB test is still unclear. To explore this question, we first confirmed that BCB-positive porcine oocytes had higher rates of meiotic maturation, better rates of cleavage and development into blastocysts, and lower death rates. Subsequent single-cell transcriptome sequencing on porcine germinal vesicle (GV)-stage oocytes identified 155 genes that were significantly differentially expressed between BCB-negative and BCB-positive oocytes. These included genes such as cdc5l , ldha , spata22 , rgs2 , paip1 , wee1b , and hsp27 , which are enriched in functionally important signaling pathways including cell cycle regulation, oocyte meiosis, spliceosome formation, and nucleotide excision repair. In BCB-positive GV oocytes that additionally had a lower frequency of DNA double-strand breaks, the CDC5L protein was significantly more abundant. cdc5l /CDC5L inhibition by short interference (si)RNA or antibody microinjection significantly impaired porcine oocyte meiotic maturation and subsequent parthenote development. Taken together, our single-oocyte sequencing data point to a potential new role for CDC5L in porcine oocyte meiosis and early embryo development, and supports further analysis of this protein in the context of the BCB test. © 2018 by The American Society for Biochemistry and Molecular Biology, Inc.

  14. De novo transcriptome sequencing in Frankliniella occidentalis to identify genes involved in plant virus transmission and insecticide resistance.

    Science.gov (United States)

    Zhang, Zhijun; Zhang, Pengjun; Li, Weidi; Zhang, Jinming; Huang, Fang; Yang, Jian; Bei, Yawei; Lu, Yaobin

    2013-05-01

    The western flower thrips (WFT), Frankliniella occidentalis, a world-wide invasive insect, causes agricultural damage by directly feeding and by indirectly vectoring Tospoviruses, such as Tomato spotted wilt virus (TSWV). We characterized the transcriptome of WFT and analyzed global gene expression of WFT response to TSWV infection using Illumina sequencing platform. We compiled 59,932 unigenes, and identified 36,339 unigenes by similarity analysis against public databases, most of which were annotated using gene ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) analysis. Within these annotated transcripts, we collected 278 sequences related to insecticide resistance. GO and KEGG analysis of different expression genes between TSWV-infected and non-infected WFT population revealed that TSWV can regulate cellular process and immune response, which might lead to low virus titers in thrips cells and no detrimental effects on F. occidentalis. This data-set not only enriches genomic resource for WFT, but also benefits research into its molecular genetics and functional genomics. Copyright © 2013 Elsevier Inc. All rights reserved.

  15. OrchidBase: a collection of sequences of the transcriptome derived from orchids.

    Science.gov (United States)

    Fu, Chih-Hsiung; Chen, Yun-Wen; Hsiao, Yu-Yun; Pan, Zhao-Jun; Liu, Zhong-Jian; Huang, Yueh-Min; Tsai, Wen-Chieh; Chen, Hong-Hwa

    2011-02-01

    Orchids are one of the most ecological and evolutionarily significant plants, and the Orchidaceae is one of the most abundant families of the angiosperms. Genetic databases will be useful not only for gene discovery but also for future genomic annotation. For this purpose, OrchidBase was established from 37,979,342 sequence reads collected from 11 in-house Phalaenopsis orchid cDNA libraries. Among them, 41,310 expressed sequence tags (ESTs) were obtained by using Sanger sequencing, whereas 37,908,032 reads were obtained by using next-generation sequencing (NGS) including both Roche 454 and Solexa Illumina sequencers. These reads were assembled into 8,501 contigs and 76,116 singletons, resulting in 84,617 non-redundant transcribed sequences with an average length of 459 bp. The analysis pipeline of the database is an automated system written in Perl and C#, and consists of the following components: automatic pre-processing of EST reads, assembly of raw sequences, annotation of the assembled sequences and storage of the analyzed information in SQL databases. A web application was implemented with HTML and a Microsoft .NET Framework C# program for browsing and querying the database, creating dynamic web pages on the client side, analyzing gene ontology (GO) and mapping annotated enzymes to KEGG pathways. The online resources for putative annotation can be searched either by text or by using BLAST, and the results can be explored on the website and downloaded. Consequently, the establishment of OrchidBase will provide researchers with a high-quality genetic resource for data mining and facilitate efficient experimental studies on orchid biology and biotechnology. The OrchidBase database is freely available at http://lab.fhes.tn.edu.tw/est.

  16. Transcriptome and Degradome Sequencing Reveals Dormancy Mechanisms of Cunninghamia lanceolata Seeds1

    Science.gov (United States)

    Xu, Huimin; Liu, Yongxiu; Soppe, Wim J.J.; Lin, Jinxing

    2016-01-01

    Seeds with physiological dormancy usually experience primary and secondary dormancy in the nature; however, little is known about the differential regulation of primary and secondary dormancy. We combined multiple approaches to investigate cytological changes, hormonal levels, and gene expression dynamics in Cunninghamia lanceolata seeds during primary dormancy release and secondary dormancy induction. Light microscopy and transmission electron microscopy revealed that protein bodies in the embryo cells coalesced during primary dormancy release and then separated during secondary dormancy induction. Transcriptomic profiling demonstrated that expression of genes negatively regulating gibberellic acid (GA) sensitivity reduced specifically during primary dormancy release, whereas the expression of genes positively regulating abscisic acid (ABA) biosynthesis increased during secondary dormancy induction. Parallel analysis of RNA ends revealed uncapped transcripts for ∼55% of all unigenes. A negative correlation between fold changes in expression levels of uncapped versus capped mRNAs was observed during primary dormancy release. However, this correlation was loose during secondary dormancy induction. Our analyses suggest that the reversible changes in cytology and gene expression during dormancy release and induction are related to ABA/GA balance. Moreover, mRNA degradation functions as a critical posttranscriptional regulator during primary dormancy release. These findings provide a mechanistic framework for understanding physiological dormancy in seeds. PMID:27760880

  17. Transcriptome and Degradome Sequencing Reveals Dormancy Mechanisms of Cunninghamia lanceolata Seeds.

    Science.gov (United States)

    Cao, Dechang; Xu, Huimin; Zhao, Yuanyuan; Deng, Xin; Liu, Yongxiu; Soppe, Wim J J; Lin, Jinxing

    2016-12-01

    Seeds with physiological dormancy usually experience primary and secondary dormancy in the nature; however, little is known about the differential regulation of primary and secondary dormancy. We combined multiple approaches to investigate cytological changes, hormonal levels, and gene expression dynamics in Cunninghamia lanceolata seeds during primary dormancy release and secondary dormancy induction. Light microscopy and transmission electron microscopy revealed that protein bodies in the embryo cells coalesced during primary dormancy release and then separated during secondary dormancy induction. Transcriptomic profiling demonstrated that expression of genes negatively regulating gibberellic acid (GA) sensitivity reduced specifically during primary dormancy release, whereas the expression of genes positively regulating abscisic acid (ABA) biosynthesis increased during secondary dormancy induction. Parallel analysis of RNA ends revealed uncapped transcripts for ∼55% of all unigenes. A negative correlation between fold changes in expression levels of uncapped versus capped mRNAs was observed during primary dormancy release. However, this correlation was loose during secondary dormancy induction. Our analyses suggest that the reversible changes in cytology and gene expression during dormancy release and induction are related to ABA/GA balance. Moreover, mRNA degradation functions as a critical posttranscriptional regulator during primary dormancy release. These findings provide a mechanistic framework for understanding physiological dormancy in seeds. © 2016 American Society of Plant Biologists. All Rights Reserved.

  18. Novel Kunitz-like Peptides Discovered in the Zoanthid Palythoa caribaeorum through Transcriptome Sequencing.

    Science.gov (United States)

    Liao, Qiwen; Li, Shengnan; Siu, Shirley Weng In; Yang, Binrui; Huang, Chen; Chan, Judy Yuet-Wa; Morlighem, Jean-Étienne R L; Wong, Clarence Tsun Ting; Rádis-Baptista, Gandhi; Lee, Simon Ming-Yuen

    2018-02-02

    Palythoa caribaeorum (class Anthozoa) is a zoanthid that together jellyfishes, hydra, and sea anemones, which are venomous and predatory, belongs to the Phyllum Cnidaria. The distinguished feature in these marine animals is the cnidocytes in the body tissues, responsible for toxin production and injection that are used majorly for prey capture and defense. With exception for other anthozoans, the toxin cocktails of zoanthids have been scarcely studied and are poorly known. Here, on the basis of the analysis of P. caribaeorum transcriptome, numerous predicted venom-featured polypeptides were identified including allergens, neurotoxins, membrane-active, and Kunitz-like peptides (PcKuz). The three predicted PcKuz isotoxins (1-3) were selected for functional studies. Through computational processing comprising structural phylogenetic analysis, molecular docking, and dynamics simulation, PcKuz3 was shown to be a potential voltage gated potassium-channel inhibitor. PcKuz3 fitted well as new functional Kunitz-type toxins with strong antilocomotor activity as in vivo assessed in zebrafish larvae, with weak inhibitory effect toward proteases, as evaluated in vitro. Notably, PcKuz3 can suppress, at low concentration, the 6-OHDA-induced neurotoxicity on the locomotive behavior of zebrafish, which indicated PcKuz3 may have a neuroprotective effect. Taken together, PcKuz3 figures as a novel neurotoxin structure, which differs from known homologous peptides expressed in sea anemone. Moreover, the novel PcKuz3 provides an insightful hint for biodrug development for prospective neurodegenerative disease treatment.

  19. Next-generation transcriptome assembly

    Energy Technology Data Exchange (ETDEWEB)

    Martin, Jeffrey A.; Wang, Zhong

    2011-09-01

    Transcriptomics studies often rely on partial reference transcriptomes that fail to capture the full catalog of transcripts and their variations. Recent advances in sequencing technologies and assembly algorithms have facilitated the reconstruction of the entire transcriptome by deep RNA sequencing (RNA-seq), even without a reference genome. However, transcriptome assembly from billions of RNA-seq reads, which are often very short, poses a significant informatics challenge. This Review summarizes the recent developments in transcriptome assembly approaches - reference-based, de novo and combined strategies-along with some perspectives on transcriptome assembly in the near future.

  20. Next generation sequencing and de novo transcriptome analysis of Costus pictus D. Don, a non-model plant with potent anti-diabetic properties

    Directory of Open Access Journals (Sweden)

    Annadurai Ramasamy S

    2012-11-01

    Full Text Available Abstract Background Phyto-remedies for diabetic control are popular among patients with Type II Diabetes mellitus (DM, in addition to other diabetic control measures. A number of plant species are known to possess diabetic control properties. Costus pictus D. Don is popularly known as “Insulin Plant” in Southern India whose leaves have been reported to increase insulin pools in blood plasma. Next Generation Sequencing is employed as a powerful tool for identifying molecular signatures in the transcriptome related to physiological functions of plant tissues. We sequenced the leaf transcriptome of C. pictus using Illumina reversible dye terminator sequencing technology and used combination of bioinformatics tools for identifying transcripts related to anti-diabetic properties of C. pictus. Results A total of 55,006 transcripts were identified, of which 69.15% transcripts could be annotated. We identified transcripts related to pathways of bixin biosynthesis and geraniol and geranial biosynthesis as major transcripts from the class of isoprenoid secondary metabolites and validated the presence of putative norbixin methyltransferase, a precursor of Bixin. The transcripts encoding these terpenoids are known to be Peroxisome Proliferator-Activated Receptor (PPAR agonists and anti-glycation agents. Sequential extraction and High Performance Liquid Chromatography (HPLC confirmed the presence of bixin in C. pictus methanolic extracts. Another significant transcript identified in relation to anti-diabetic, anti-obesity and immuno-modulation is of Abscisic Acid biosynthetic pathway. We also report many other transcripts for the biosynthesis of antitumor, anti-oxidant and antimicrobial metabolites of C. pictus leaves. Conclusion Solid molecular signatures (transcripts related to bixin, abscisic acid, and geranial and geraniol biosynthesis for the anti-diabetic properties of C. pictus leaves and vital clues related to the other phytochemical functions

  1. Sequencing, de novo assembly and characterization of the spotted scat Scatophagus argus (Linnaeus 1766) transcriptome for discovery of reproduction related genes and SSRs

    Science.gov (United States)

    Yang, Wei; Chen, Huapu; Cui, Xuefan; Zhang, Kewei; Jiang, Dongneng; Deng, Siping; Zhu, Chunhua; Li, Guangli

    2017-09-01

    Spotted scat (Scatophagus argus) is an economically important farmed fish, particularly in East and Southeast Asia. Because there has been little research on reproductive development and regulation in this species, the lack of a mature artificial reproduction technology remains a barrier for the sustainable development of the aquaculture industry. More genetic and genomic background knowledge is urgently needed for an in-depth understanding of the molecular mechanism of reproductive process and identification of functional genes related to sexual differentiation, gonad maturation and gametogenesis. For these reasons, we performed transcriptomic analysis on spotted scat using a multiple tissue sample mixing strategy. The Illumina RNA sequencing generated 118 510 486 raw reads. After trimming, de novo assembly was performed and yielded 99 888 unigenes with an average length of 905.75 bp. A total of 45 015 unigenes were successfully annotated to the Nr, Swiss-Prot, KOG and KEGG databases. Additionally, 23 783 and 27 183 annotated unigenes were assigned to 56 Gene Ontology (GO) functional groups and 228 KEGG pathways, respectively. Subsequently, 2 474 transcripts associated with reproduction were selected using GO term and KEGG pathway assignments, and a number of reproduction-related genes involved in sex differentiation, gonad development and gametogenesis were identified. Furthermore, 22 279 simple sequence repeat (SSR) loci were discovered and characterized. The comprehensive transcript dataset described here greatly increases the genetic information available for spotted scat and contributes valuable sequence resources for functional gene mining and analysis. Candidate transcripts involved in reproduction would make good starting points for future studies on reproductive mechanisms, and the putative sex differentiation-related genes will be helpful for sex-determining gene identification and sex-specific marker isolation. Lastly, the SSRs can serve as marker

  2. De novo assembly and characterization of the leaf, bud, and fruit transcriptome from the vulnerable tree Juglans mandshurica for the development of 20 new microsatellite markers using Illumina sequencing

    Science.gov (United States)

    Zhuang Hu; Tian Zhang; Xiao-Xiao Gao; Yang Wang; Qiang Zhang; Hui-Juan Zhou; Gui-Fang Zhao; Ma-Li Wang; Keith E. Woeste; Peng Zhao

    2016-01-01

    Manchurian walnut (Juglans mandshurica Maxim.) is a vulnerable, temperate deciduous tree valued for its wood and nut, but transcriptomic and genomic data for the species are very limited. Next generation sequencing (NGS) has made it possible to develop molecular markers for this species rapidly and efficiently. Our goal is to use transcriptome...

  3. Transcriptome analysis of IL-10-stimulated (M2c) macrophages by next-generation sequencing.

    Science.gov (United States)

    Lurier, Emily B; Dalton, Donald; Dampier, Will; Raman, Pichai; Nassiri, Sina; Ferraro, Nicole M; Rajagopalan, Ramakrishan; Sarmady, Mahdi; Spiller, Kara L

    2017-07-01

    Alternatively activated "M2" macrophages are believed to function during late stages of wound healing, behaving in an anti-inflammatory manner to mediate the resolution of the pro-inflammatory response caused by "M1" macrophages. However, the differences between two main subtypes of M2 macrophages, namely interleukin-4 (IL-4)-stimulated "M2a" macrophages and IL-10-stimulated "M2c" macrophages, are not well understood. M2a macrophages are characterized by their ability to inhibit inflammation and contribute to the stabilization of angiogenesis. However, the role and temporal profile of M2c macrophages in wound healing are not known. Therefore, we performed next generation sequencing (RNA-seq) to identify biological functions and gene expression signatures of macrophages polarized in vitro with IL-10 to the M2c phenotype in comparison to M1 and M2a macrophages and an unactivated control (M0). We then explored the expression of these gene signatures in a publicly available data set of human wound healing. RNA-seq analysis showed that hundreds of genes were upregulated in M2c macrophages compared to the M0 control, with thousands of alternative splicing events. Following validation by Nanostring, 39 genes were found to be upregulated by M2c macrophages compared to the M0 control, and 17 genes were significantly upregulated relative to the M0, M1, and M2a phenotypes (using an adjusted p-value cutoff of 0.05 and fold change cutoff of 1.5). Many of the identified M2c-specific genes are associated with angiogenesis, matrix remodeling, and phagocytosis, including CD163, MMP8, TIMP1, VCAN, SERPINA1, MARCO, PLOD2, PCOCLE2 and F5. Analysis of the macrophage-conditioned media for secretion of matrix-remodeling proteins showed that M2c macrophages secreted higher levels of MMP7, MMP8, and TIMP1 compared to the other phenotypes. Interestingly, temporal gene expression analysis of a publicly available microarray data set of human wound healing showed that M2c-related genes were

  4. De novo assembly of transcriptome sequencing in Caragana korshinskii Kom. and characterization of EST-SSR markers.

    Directory of Open Access Journals (Sweden)

    Yan Long

    Full Text Available Caragana korshinskii Kom. is widely distributed in various habitats, including gravel desert, clay desert, fixed and semi-fixed sand, and saline land in the Asian and African deserts. To date, no previous genomic information or EST-SSR marker has been reported in Caragana Fabr. genus. In this study, more than two billion bases of high-quality sequence of C. korshinskii were generated by using illumina sequencing technology and demonstrated the de novo assembly and annotation of genes without prior genome information. These reads were assembled into 86,265 unigenes (mean length = 709 bp. The similarity search indicated that 33,955 and 21,978 unigenes showed significant similarities to known proteins from NCBI non-redundant and Swissprot protein databases, respectively. Among these annotated unigenes, 26,232 a unigenes were separately assigned to Gene Ontology (GO database. When 22,756 unigenes searched against the Kyoto Encyclopedia of Genes and Genomes Pathway (KEGG database, 5,598 unigenes were assigned to 5 main categories including 32 KEGG pathways. Among the main KEGG categories, metabolism was the biggest category (2,862, 43.7%, suggesting the active metabolic processes in the desert tree. In addition, a total of 19,150 EST-SSRs were identified from 15,484 unigenes, and the characterizations of EST-SSRs were further compared with other four species in Fabraceae. 126 potential marker sites were randomly selected to validate the assembly quality and develop EST-SSR markers. Among the 9 germplasms in Caranaga Fabr. genus, PCR success rate were 93.7% and the phylogenic tree was constructed based on the genotypic data. This research generated a substantial fraction of transcriptome sequences, which were very useful resources for gene annotation and discovery, molecular markers development, genome assembly and annotation. The EST-SSR markers identified and developed in this study will facilitate marker-assisted selection breeding.

  5. Identification of a RAI1-associated disease network through integration of exome sequencing, transcriptomics, and 3D genomics

    Directory of Open Access Journals (Sweden)

    Maria Nicla Loviglio

    2016-11-01

    Full Text Available Abstract Background Smith-Magenis syndrome (SMS is a developmental disability/multiple congenital anomaly disorder resulting from haploinsufficiency of RAI1. It is characterized by distinctive facial features, brachydactyly, sleep disturbances, and stereotypic behaviors. Methods We investigated a cohort of 15 individuals with a clinical suspicion of SMS who showed neither deletion in the SMS critical region nor damaging variants in RAI1 using whole exome sequencing. A combination of network analysis (co-expression and biomedical text mining, transcriptomics, and circularized chromatin conformation capture (4C-seq was applied to verify whether modified genes are part of the same disease network as known SMS-causing genes. Results Potentially deleterious variants were identified in nine of these individuals using whole-exome sequencing. Eight of these changes affect KMT2D, ZEB2, MAP2K2, GLDC, CASK, MECP2, KDM5C, and POGZ, known to be associated with Kabuki syndrome 1, Mowat-Wilson syndrome, cardiofaciocutaneous syndrome, glycine encephalopathy, mental retardation and microcephaly with pontine and cerebellar hypoplasia, X-linked mental retardation 13, X-linked mental retardation Claes-Jensen type, and White-Sutton syndrome, respectively. The ninth individual carries a de novo variant in JAKMIP1, a regulator of neuronal translation that was recently found deleted in a patient with autism spectrum disorder. Analyses of co-expression and biomedical text mining suggest that these pathologies and SMS are part of the same disease network. Further support for this hypothesis was obtained from transcriptome profiling that showed that the expression levels of both Zeb2 and Map2k2 are perturbed in Rai1 –/– mice. As an orthogonal approach to potentially contributory disease gene variants, we used chromatin conformation capture to reveal chromatin contacts between RAI1 and the loci flanking ZEB2 and GLDC, as well as between RAI1 and human orthologs of the

  6. Sequencing, de novo annotation and analysis of the first Anguilla anguilla transcriptome: EeelBase opens new perspectives for the study of the critically endangered european eel

    Directory of Open Access Journals (Sweden)

    Bernatchez Louis

    2010-11-01

    Full Text Available Abstract Background Once highly abundant, the European eel (Anguilla anguilla L.; Anguillidae; Teleostei is considered to be critically endangered and on the verge of extinction, as the stock has declined by 90-99% since the 1980s. Yet, the species is poorly characterized at molecular level with little sequence information available in public databases. Results The first European eel transcriptome was obtained by 454 FLX Titanium sequencing of a normalized cDNA library, produced from a pool of 18 glass eels (juveniles from the French Atlantic coast and two sites in the Mediterranean coast. Over 310,000 reads were assembled in a total of 19,631 transcribed contigs, with an average length of 531 nucleotides. Overall 36% of the contigs were annotated to known protein/nucleotide sequences and 35 putative miRNA identified. Conclusions This study represents the first transcriptome analysis for a critically endangered species. EeelBase, a dedicated database of annotated transcriptome sequences of the European eel is freely available at http://compgen.bio.unipd.it/eeelbase. Considering the multiple factors potentially involved in the decline of the European eel, including anthropogenic factors such as pollution and human-introduced diseases, our results will provide a rich source of data to discover and identify new genes, characterize gene expression, as well as for identification of genetic markers scattered across the genome to be used in various applications.

  7. Transcriptomic sequencing reveals a set of unique genes activated by butyrate-induced histone modification

    Science.gov (United States)

    Butyrate is a nutritional element with strong epigenetic regulatory activity as an inhibitor of histone deacetylases (HDACs). Based on the analysis of differentially expressed genes induced by butyrate in the bovine epithelial cell using deep RNA-sequencing technology (RNA-seq), a set of unique gen...

  8. Transcriptome sequencing and development of an expression microarray platform for the domestic ferret

    Directory of Open Access Journals (Sweden)

    McBrayer Alexis

    2010-04-01

    Full Text Available Abstract Background The ferret (Mustela putorius furo represents an attractive animal model for the study of respiratory diseases, including influenza. Despite its importance for biomedical research, the number of reagents for molecular and immunological analysis is restricted. We present here a parallel sequencing effort to produce an extensive EST (expressed sequence tags dataset derived from a normalized ferret cDNA library made from mRNA from ferret blood, liver, lung, spleen and brain. Results We produced more than 500000 sequence reads that were assembled into 16000 partial ferret genes. These genes were combined with the available ferret sequences in the GenBank to develop a ferret specific microarray platform. Using this array, we detected tissue specific expression patterns which were confirmed by quantitative real time PCR assays. We also present a set of 41 ferret genes with even transcription profiles across the tested tissues, indicating their usefulness as housekeeping genes. Conclusion The tools developed in this study allow for functional genomic analysis and make further development of reagents for the ferret model possible.

  9. Transcriptome analysis of Loxosceles laeta (Araneae, Sicariidae) spider venomous gland using expressed sequence tags.

    Science.gov (United States)

    Fernandes-Pedrosa, Matheus de F; Junqueira-de-Azevedo, Inácio de L M; Gonçalves-de-Andrade, Rute M; Kobashi, Leonardo S; Almeida, Diego D; Ho, Paulo L; Tambourgi, Denise V

    2008-06-12

    The bite of spiders belonging to the genus Loxosceles can induce a variety of clinical symptoms, including dermonecrosis, thrombosis, vascular leakage, haemolysis, and persistent inflammation. In order to examine the transcripts expressed in venom gland of Loxosceles laeta spider and to unveil the potential of its products on cellular structure and functional aspects, we generated 3,008 expressed sequence tags (ESTs) from a cDNA library. All ESTs were clustered into 1,357 clusters, of which 16.4% of the total ESTs belong to recognized toxin-coding sequences, being the Sphingomyelinases D the most abundant transcript; 14.5% include "possible toxins", whose transcripts correspond to metalloproteinases, serinoproteinases, hyaluronidases, lipases, C-lectins, cystein peptidases and inhibitors. Thirty three percent of the ESTs are similar to cellular transcripts, being the major part represented by molecules involved in gene and protein expression, reflecting the specialization of this tissue for protein synthesis. In addition, a considerable number of sequences, 25%, has no significant similarity to any known sequence. This study provides a first global view of the gene expression scenario of the venom gland of L. laeta described so far, indicating the molecular bases of its venom composition.

  10. Transcriptome analysis of Loxosceles laeta (Araneae, Sicariidae spider venomous gland using expressed sequence tags

    Directory of Open Access Journals (Sweden)

    Almeida Diego D

    2008-06-01

    Full Text Available Abstract Background The bite of spiders belonging to the genus Loxosceles can induce a variety of clinical symptoms, including dermonecrosis, thrombosis, vascular leakage, haemolysis, and persistent inflammation. In order to examine the transcripts expressed in venom gland of Loxosceles laeta spider and to unveil the potential of its products on cellular structure and functional aspects, we generated 3,008 expressed sequence tags (ESTs from a cDNA library. Results All ESTs were clustered into 1,357 clusters, of which 16.4% of the total ESTs belong to recognized toxin-coding sequences, being the Sphingomyelinases D the most abundant transcript; 14.5% include "possible toxins", whose transcripts correspond to metalloproteinases, serinoproteinases, hyaluronidases, lipases, C-lectins, cystein peptidases and inhibitors. Thirty three percent of the ESTs are similar to cellular transcripts, being the major part represented by molecules involved in gene and protein expression, reflecting the specialization of this tissue for protein synthesis. In addition, a considerable number of sequences, 25%, has no significant similarity to any known sequence. Conclusion This study provides a first global view of the gene expression scenario of the venom gland of L. laeta described so far, indicating the molecular bases of its venom composition.

  11. Transcriptome sequencing of field pea and faba bean for discovery and validation of SSR genetic markers

    Directory of Open Access Journals (Sweden)

    Kaur Sukhjiwan

    2012-03-01

    Full Text Available Abstract Background Field pea (Pisum sativum L. and faba bean (Vicia faba L. are cool-season grain legume species that provide rich sources of food for humans and fodder for livestock. To date, both species have been relative 'genomic orphans' due to limited availability of genetic and genomic information. A significant enrichment of genomic resources is consequently required in order to understand the genetic architecture of important agronomic traits, and to support germplasm enhancement, genetic diversity, population structure and demographic studies. Results cDNA samples obtained from various tissue types of specific field pea and faba bean genotypes were sequenced using 454 Roche GS FLX Titanium technology. A total of 720,324 and 304,680 reads for field pea and faba bean, respectively, were de novo assembled to generate sets of 70,682 and 60,440 unigenes. Consensus sequences were compared against the genome of the model legume species Medicago truncatula Gaertn., as well as that of the more distantly related, but better-characterised genome of Arabidopsis thaliana L.. In comparison to M. truncatula coding sequences, 11,737 and 10,179 unique hits were obtained from field pea and faba bean. Totals of 22,057 field pea and 18,052 faba bean unigenes were subsequently annotated from GenBank. Comparison to the genome of soybean (Glycine max L. resulted in 19,451 unique hits for field pea and 16,497 unique hits for faba bean, corresponding to c. 35% and 30% of the known gene space, respectively. Simple sequence repeat (SSR-containing expressed sequence tags (ESTs were identified from consensus sequences, and totals of 2,397 and 802 primer pairs were designed for field pea and faba bean. Subsets of 96 EST-SSR markers were screened for validation across modest panels of field pea and faba bean cultivars, as well as related non-domesticated species. For field pea, 86 primer pairs successfully obtained amplification products from one or more template

  12. De novo sequencing and comprehensive analysis of the mutant transcriptome from purple sweet potato (Ipomoea batatas L.).

    Science.gov (United States)

    Ma, Peiyong; Bian, Xiaofeng; Jia, Zhaodong; Guo, Xiaoding; Xie, Yizhi

    2016-01-10

    Purple sweet potatoes, rich in anthocyanin, have been widely favored in light of increasing awareness of health and food safety. In this study, a mutant of purple sweet potato (white peel and flesh) was used to study anthocyanin metabolism by high-throughput RNA sequencing and comparative analysis of the mutant and wild type transcriptomes. A total of 88,509 unigenes ranging from 200nt to 14,986nt with an average length of 849nt were obtained. Unigenes were assigned to Gene Ontology (GO), Clusters of Orthologous Group (COG) and Kyoto Encyclopedia of Genes and Genomes (KEGG). Functional enrichment using GO and KEGG annotations showed that 3828 of the differently expressed genes probably influenced many important biological and metabolic pathways, including anthocyanin biosynthesis. Most importantly, the structural and transcription factor genes that contribute to anthocyanin biosynthesis were downregulated in the mutant. The unigene dataset that was used to discover the anthocyanin candidate genes can serve as a comprehensive resource for molecular research in sweet potato. Copyright © 2015 Elsevier B.V. All rights reserved.

  13. Identification of functional genes involved in Cd(2+) response of Chinese surf clam (Mactra chinensis) through transcriptome sequencing.

    Science.gov (United States)

    Zhang, Jingjing; Li, Hongjun; Qin, Yanjie; Ye, Sheng; Liu, Min

    2016-01-01

    The Chinese surf clam Mactra chinensis is an economically important bivalve species in the coastal waters of Liaoning and Shandong Province, China. In this study, we carried out transcriptome sequencing to develop molecular resources for M. chinensis and conducted an acute test of Cd(2+) stimulation through quantitative real-time PCR (qRT-PCR) to analyze the relative expression of six functional genes. A total of 100,839 transcripts and 56,712 unigenes were obtained from 39.9 million filtered reads and 21,305 unigenes were annotated by hitting against NCBI database. According to the results of qRT-PCR, heat shock protein 22 (Hsp22) and cytochrome P450 (CYP450(2C31)) were inhibited in the low concentration, and induced in the high concentration of Cd(2+); thioredoxin peroxidase (TPx-A) was at normal level in low concentration, but induced in high concentration of Cd(2+); glutathione peroxidase A (GPA), glutathione peroxidase 1 (GPA1) and Mn superoxide dismutase gene (MnSOD) were down-regulated when exposed to any treatment groups. Expression levels of the six functional genes following Cd(2+) exposure indicated that these genes were linked to environmental stress. Moreover, the present work enriched the molecule genetic data of M. chinensis. Copyright © 2015 Elsevier B.V. All rights reserved.

  14. Next-generation sequencing-based transcriptome analysis of Helicoverpa armigera Larvae immune-primed with Photorhabdus luminescens TT01.

    Directory of Open Access Journals (Sweden)

    Zengyang Zhao

    Full Text Available Although invertebrates are incapable of adaptive immunity, immunal reactions which are functionally similar to the adaptive immunity of vertebrates have been described in many studies of invertebrates including insects. The phenomenon was termed immune priming. In order to understand the molecular mechanism of immune priming, we employed Illumina/Solexa platform to investigate the transcriptional changes of the hemocytes and fat body of Helicoverpa armigera larvae immune-primed with the pathogenic bacteria Photorhabdus luminescens TT01. A total of 43.6 and 65.1 million clean reads with 4.4 and 6.5 gigabase sequence data were obtained from the TT01 (the immune-primed and PBS (non-primed cDNA libraries and assembled into 35,707 all-unigenes (non-redundant transcripts, which has a length varied from 201 to 16,947 bp and a N50 length of 1,997 bp. For 35,707 all-unigenes, 20,438 were functionally annotated and 2,494 were differentially expressed after immune priming. The differentially expressed genes (DEGs are mainly related to immunity, detoxification, development and metabolism of the host insect. Analysis on the annotated immune related DEGs supported a hypothesis that we proposed previously: the immune priming phenomenon observed in H. armigera larvae was achieved by regulation of key innate immune elements. The transcriptome profiling data sets (especially the sequences of 1,022 unannotated DEGs and the clues (such as those on immune-related signal and regulatory pathways obtained from this study will facilitate immune-related novel gene discovery and provide valuable information for further exploring the molecular mechanism of immune priming of invertebrates. All these will increase our understanding of invertebrate immunity which may provide new approaches to control insect pests or prevent epidemic of infectious diseases in economic invertebrates in the future.

  15. Gene expression profiling of MYC-driven tumor signatures in porcine liver stem cells by transcriptome sequencing.

    Science.gov (United States)

    Aravalli, Rajagopal N; Talbot, Neil C; Steer, Clifford J

    2015-02-21

    To identify the genes induced and regulated by the MYC protein in generating tumors from liver stem cells. In this study, we have used an immortal porcine liver stem cell line, PICM-19, to study the role of c-MYC in hepatocarcinogenesis. PICM-19 cells were converted into cancer cells (PICM-19-CSCs) by overexpressing human MYC. To identify MYC-driven differential gene expression, transcriptome sequencing was carried out by RNA sequencing, and genes identified by this method were validated using real-time PCR. In vivo tumorigenicity studies were then conducted by injecting PICM-19-CSCs into the flanks of immunodeficient mice. Our results showed that MYC-overexpressing PICM-19 stem cells formed tumors in immunodeficient mice demonstrating that a single oncogene was sufficient to convert them into cancer cells (PICM-19-CSCs). By using comparative bioinformatics analyses, we have determined that > 1000 genes were differentially expressed between PICM-19 and PICM-19-CSCs. Gene ontology analysis further showed that the MYC-induced, altered gene expression was primarily associated with various cellular processes, such as metabolism, cell adhesion, growth and proliferation, cell cycle, inflammation and tumorigenesis. Interestingly, six genes expressed by PICM-19 cells (CDO1, C22orf39, DKK2, ENPEP, GPX6, SRPX2) were completely silenced after MYC-induction in PICM-19-CSCs, suggesting that the absence of these genes may be critical for inducing tumorigenesis. MYC-driven genes may serve as promising candidates for the development of hepatocellular carcinoma therapeutics that would not have deleterious effects on other cell types in the liver.

  16. Deep sequencing reveals transcriptome re-programming of Polygonum multiflorum thunb. roots to the elicitation with methyl jasmonate.

    Science.gov (United States)

    Liu, Hongchang; Wu, Wei; Hou, Kai; Chen, Junwen; Zhao, Zhi

    2016-02-01

    The phytohormone methyl jasmonate (MeJA) has been successfully used as an effective elicitor to enhance production of stilbenoid which is induced in plants as a secondary metabolite possibly in defense against herbivores and pathogens. However, the mechanism of MeJA-mediated stilbenoid biosynthesis remains unclear. Genomic information for Polygonum multiflorum Thunb. (P. multiflorum) is currently unavailable. To obtain insight into the global regulation mechanism of MeJA in the steady state of stilbene glucoside production (26 h after MeJA elicitation), especially on stilbene glucoside biosynthesis, we sequenced the transcriptomes of MeJA-treated and untreated P. multiflorum roots and obtained more than 51 million clean reads, from which 79,565 unigenes were obtained by de novo assembly. 56,972 unigenes were annotated against databases including Nr, Nt, Swiss-Prot, KEGG and COG. 18,677 genes expressed differentially between untreated and treated roots. Expression level analysis indicated that a large number of genes were associated with plant-pathogen interaction, plant hormone signal transduction, stilbenoid backbone biosynthesis, and phenylpropanoid biosynthesis. 15 known genes involved in the biosynthesis of stilbenoid backbone were found with 7 genes showing increased transcript abundance following elicitation of MeJA. The significantly up (down)-regulated changes of 70 genes in stilbenoid biosynthesis were validated by qRT-PCR assays and PCR product sequencing. According to the expression changes and the previously proposed enzyme functions, multiple candidates for the unknown steps in stilbene glucoside biosynthesis were identified. We also found some genes putatively involved in the transcription factors. This comprehensive description of gene expression information could greatly facilitate our understanding of the molecular mechanisms of MeJA-mediated stilbenoid biosynthesis in P. multiflorum roots. Our results shed new light on the global regulation

  17. Deep Sequencing-Based Transcriptome Analysis Reveals the Regulatory Mechanism of Bemisia tabaci (Hemiptera: Aleyrodidae Nymph Parasitized by Encarsia sophia (Hymenoptera: Aphelinidae.

    Directory of Open Access Journals (Sweden)

    Yingying Wang

    Full Text Available The whitefly Bemisia tabaci is a genetically diverse complex with multiple cryptic species, and some are the most destructive invasive pests of many ornamentals and crops worldwide. Encarsia sophia is an autoparasitoid wasp that demonstrated high efficiency as bio-control agent of whiteflies. However, the immune mechanism of B. tabaci parasitization by E. sophia is unknown. In order to investigate immune response of B. tabaci to E. Sophia parasitization, the transcriptome of E. sophia parasitized B. tabaci nymph was sequenced by Illumina sequencing. De novo assembly generated 393,063 unigenes with average length of 616 bp, in which 46,406 unigenes (15.8% of all unigenes were successfully mapped. Parasitization by E. sophia had significant effects on the transcriptome profile of B. tabaci nymph. A total of 1482 genes were significantly differentially expressed, of which 852 genes were up-regulated and 630 genes were down-regulated. These genes were mainly involved in immune response, development, metabolism and host signaling pathways. At least 52 genes were found to be involved in the host immune response, 33 genes were involved in the development process, and 29 genes were involved in host metabolism. Taken together, the assembled and annotated transcriptome sequences provided a valuable genomic resource for further understanding the molecular mechanism of immune response of B. tabaci parasitization by E. sophia.

  18. Transcriptome sequencing and de novo analysis of a cytoplasmic male sterile line and its near-isogenic restorer line in chili pepper (Capsicum annuum L..

    Directory of Open Access Journals (Sweden)

    Chen Liu

    Full Text Available BACKGROUND: The use of cytoplasmic male sterility (CMS in F1 hybrid seed production of chili pepper is increasingly popular. However, the molecular mechanisms of cytoplasmic male sterility and fertility restoration remain poorly understood due to limited transcriptomic and genomic data. Therefore, we analyzed the difference between a CMS line 121A and its near-isogenic restorer line 121C in transcriptome level using next generation sequencing technology (NGS, aiming to find out critical genes and pathways associated with the male sterility. RESULTS: We generated approximately 53 million sequencing reads and assembled de novo, yielding 85,144 high quality unigenes with an average length of 643 bp. Among these unigenes, 27,191 were identified as putative homologs of annotated sequences in the public protein databases, 4,326 and 7,061 unigenes were found to be highly abundant in lines 121A and 121C, respectively. Many of the differentially expressed unigenes represent a set of potential candidate genes associated with the formation or abortion of pollen. CONCLUSIONS: Our study profiled anther transcriptomes of a chili pepper CMS line and its restorer line. The results shed the lights on the occurrence and recovery of the disturbances in nuclear-mitochondrial interaction and provide clues for further investigations.

  19. A comprehensive assessment of the transcriptome of cork oak (Quercus suber) through EST sequencing.

    Science.gov (United States)

    Pereira-Leal, José B; Abreu, Isabel A; Alabaça, Cláudia S; Almeida, Maria Helena; Almeida, Paulo; Almeida, Tânia; Amorim, Maria Isabel; Araújo, Susana; Azevedo, Herlânder; Badia, Aleix; Batista, Dora; Bohn, Andreas; Capote, Tiago; Carrasquinho, Isabel; Chaves, Inês; Coelho, Ana Cristina; Costa, Maria Manuela Ribeiro; Costa, Rita; Cravador, Alfredo; Egas, Conceição; Faro, Carlos; Fortes, Ana M; Fortunato, Ana S; Gaspar, Maria João; Gonçalves, Sónia; Graça, José; Horta, Marília; Inácio, Vera; Leitão, José M; Lino-Neto, Teresa; Marum, Liliana; Matos, José; Mendonça, Diogo; Miguel, Andreia; Miguel, Célia M; Morais-Cecílio, Leonor; Neves, Isabel; Nóbrega, Filomena; Oliveira, Maria Margarida; Oliveira, Rute; Pais, Maria Salomé; Paiva, Jorge A; Paulo, Octávio S; Pinheiro, Miguel; Raimundo, João A P; Ramalho, José C; Ribeiro, Ana I; Ribeiro, Teresa; Rocheta, Margarida; Rodrigues, Ana Isabel; Rodrigues, José C; Saibo, Nelson J M; Santo, Tatiana E; Santos, Ana Margarida; Sá-Pereira, Paula; Sebastiana, Mónica; Simões, Fernanda; Sobral, Rómulo S; Tavares, Rui; Teixeira, Rita; Varela, Carolina; Veloso, Maria Manuela; Ricardo, Cândido P P

    2014-05-15

    Cork oak (Quercus suber) is one of the rare trees with the ability to produce cork, a material widely used to make wine bottle stoppers, flooring and insulation materials, among many other uses. The molecular mechanisms of cork formation are still poorly understood, in great part due to the difficulty in studying a species with a long life-cycle and for which there is scarce molecular/genomic information. Cork oak forests are of great ecological importance and represent a major economic and social resource in Southern Europe and Northern Africa. However, global warming is threatening the cork oak forests by imposing thermal, hydric and many types of novel biotic stresses. Despite the economic and social value of the Q. suber species, few genomic resources have been developed, useful for biotechnological applications and improved forest management. We generated in excess of 7 million sequence reads, by pyrosequencing 21 normalized cDNA libraries derived from multiple Q. suber tissues and organs, developmental stages and physiological conditions. We deployed a stringent sequence processing and assembly pipeline that resulted in the identification of ~159,000 unigenes. These were annotated according to their similarity to known plant genes, to known Interpro domains, GO classes and E.C. numbers. The phylogenetic extent of this ESTs set was investigated, and we found that cork oak revealed a significant new gene space that is not covered by other model species or EST sequencing projects. The raw data, as well as the full annotated assembly, are now available to the community in a dedicated web portal at http://www.corkoakdb.org. This genomic resource represents the first trancriptome study in a cork producing species. It can be explored to develop new tools and approaches to understand stress responses and developmental processes in forest trees, as well as the molecular cascades underlying cork differentiation and disease response.

  20. Comparative transcriptome analysis of the biocontrol strain Bacillus amyloliquefaciens FZB42 as response to biofilm formation analyzed by RNA sequencing.

    Science.gov (United States)

    Kröber, Magdalena; Verwaaijen, Bart; Wibberg, Daniel; Winkler, Anika; Pühler, Alfred; Schlüter, Andreas

    2016-08-10

    The strain Bacillus amyloliquefaciens FZB42 is a plant growth promoting rhizobacterium (PGPR) and biocontrol agent known to keep infections of lettuce (Lactuca sativa) by the phytopathogen Rhizoctonia solani down. Several mechanisms, including the production of secondary metabolites possessing antimicrobial properties and induction of the host plant's systemic resistance (ISR), were proposed to explain the biocontrol effect of the strain. B. amyloliquefaciens FZB42 is able to form plaques (biofilm-like structures) on plant roots and this feature was discussed to be associated with its biocontrol properties. For this reason, formation of B. amyloliquefaciens biofilms was studied at the transcriptional level using high-throughput sequencing of whole transcriptome cDNA libraries from cells grown under biofilm-forming conditions vs. planktonic growth. Comparison of the transcriptional profiles of B. amyloliquefaciens FZB42 under these growth conditions revealed a common set of highly transcribed genes mostly associated with basic cellular functions. The lci gene, encoding an antimicrobial peptide (AMP), was among the most highly transcribed genes of cells under both growth conditions suggesting that AMP production may contribute to biocontrol. In contrast, gene clusters coding for synthesis of secondary metabolites with antimicrobial properties were only moderately transcribed and not induced in biofilm-forming cells. Differential gene expression revealed that 331 genes were significantly up-regulated and 230 genes were down-regulated in the transcriptome of B. amyloliquefaciens FZB42 under biofilm-forming conditions in comparison to planktonic cells. Among the most highly up-regulated genes, the yvqHI operon, coding for products involved in nisin (class I bacteriocin) resistance, was identified. In addition, an operon whose products play a role in fructosamine metabolism was enhanced in its transcription. Moreover, genes involved in the production of the extracellular

  1. Transcriptome sequencing of a large human family identifies the impact of rare noncoding variants.

    Science.gov (United States)

    Li, Xin; Battle, Alexis; Karczewski, Konrad J; Zappala, Zach; Knowles, David A; Smith, Kevin S; Kukurba, Kim R; Wu, Eric; Simon, Noah; Montgomery, Stephen B

    2014-09-04

    Recent and rapid human population growth has led to an excess of rare genetic variants that are expected to contribute to an individual's genetic burden of disease risk. To date, much of the focus has been on rare protein-coding variants, for which potential impact can be estimated from the genetic code, but determining the impact of rare noncoding variants has been more challenging. To improve our understanding of such variants, we combined high-quality genome sequencing and RNA sequencing data from a 17-individual, three-generation family to contrast expression quantitative trait loci (eQTLs) and splicing quantitative trait loci (sQTLs) within this family to eQTLs and sQTLs within a population sample. Using this design, we found that eQTLs and sQTLs with large effects in the family were enriched with rare regulatory and splicing variants (minor allele frequency impact of rare noncoding variants. We found that distance to the transcription start site, evolutionary constraint, and epigenetic annotation were considerably more informative for predicting the impact of rare variants than for predicting the impact of common variants. These results highlight that rare noncoding variants are important contributors to individual gene-expression profiles and further demonstrate a significant capability for genomic annotation to predict the impact of rare noncoding variants. Copyright © 2014 The Authors. Published by Elsevier Inc. All rights reserved.

  2. The impact of sequence length and number of sequences on promoter prediction performance.

    Science.gov (United States)

    Carvalho, Sávio G; Guerra-Sá, Renata; de C Merschmann, Luiz H

    2015-01-01

    The advent of rapid evolution on sequencing capacity of new genomes has evidenced the need for data analysis automation aiming at speeding up the genomic annotation process and reducing its cost. Given that one important step for functional genomic annotation is the promoter identification, several studies have been taken in order to propose computational approaches to predict promoters. Different classifiers and characteristics of the promoter sequences have been used to deal with this prediction problem. However, several works in literature have addressed the promoter prediction problem using datasets containing sequences of 250 nucleotides or more. As the sequence length defines the amount of dataset attributes, even considering a limited number of properties to characterize the sequences, datasets with a high number of attributes are generated for training classifiers. Once high-dimensional datasets can degrade the classifiers predictive performance or even require an infeasible processing time, predicting promoters by training classifiers from datasets with a reduced number of attributes, it is essential to obtain good predictive performance with low computational cost. To the best of our knowledge, there is no work in literature that verified in a systematic way the relation between the sequences length and the predictive performance of classifiers. Thus, in this work, we have evaluated the impact of sequence length variation and training dataset size (number of sequences) on the predictive performance of classifiers. We have built sixteen datasets composed of different sized sequences (ranging in length from 12 to 301 nucleotides) and evaluated them using the SVM, Random Forest and k-NN classifiers. The best predictive performances reached by SVM and Random Forest remained relatively stable for datasets composed of sequences varying in length from 301 to 41 nucleotides, while k-NN achieved its best performance for the dataset composed of 101 nucleotides. We

  3. Characterization of the Pathogenicity of Streptococcus intermedius TYG1620 Isolated from a Human Brain Abscess Based on the Complete Genome Sequence with Transcriptome Analysis and Transposon Mutagenesis in a Murine Subcutaneous Abscess Model.

    Science.gov (United States)

    Hasegawa, Noriko; Sekizuka, Tsuyoshi; Sugi, Yutaka; Kawakami, Nobuhiro; Ogasawara, Yumiko; Kato, Kengo; Yamashita, Akifumi; Takeuchi, Fumihiko; Kuroda, Makoto

    2017-02-01

    Streptococcus intermedius is known to cause periodontitis and pyogenic infections in the brain and liver. Here we report the complete genome sequence of strain TYG1620 (genome size, 2,006,877 bp; GC content, 37.6%; 2,020 predicted open reading frames [ORFs]) isolated from a brain abscess in an infant. Comparative analysis of S. intermedius genome sequences suggested that TYG1620 carries a notable type VII secretion system (T7SS), two long repeat regions, and 19 ORFs for cell wall-anchored proteins (CWAPs). To elucidate the genes responsible for the pathogenicity of TYG1620, transcriptome analysis was performed in a murine subcutaneous abscess model. The results suggest that the levels of expression of small hypothetical proteins similar to phenol-soluble modulin β1 (PSMβ1), a staphylococcal virulence factor, significantly increased in the abscess model. In addition, an experiment in a murine subcutaneous abscess model with random transposon (Tn) mutant attenuation suggested that Tn mutants with mutations in 212 ORFs in the Tn mutant library were attenuated in the murine abscess model (629 ORFs were disrupted in total); the 212 ORFs are putatively essential for abscess formation. Transcriptome analysis identified 37 ORFs, including paralogs of the T7SS and a putative glucan-binding CWAP in long repeat regions, to be upregulated and attenuated in vivo This study provides a comprehensive characterization of S. intermedius pathogenicity based on the complete genome sequence and a murine subcutaneous abscess model with transcriptome and Tn mutagenesis, leading to the identification of pivotal targets for vaccines or antimicrobial agents for the control of S. intermedius infections. Copyright © 2017 American Society for Microbiology.

  4. Microsatellites from Fosterella christophii (Bromeliaceae) by de novo transcriptome sequencing on the Pacific Biosciences RS platform.

    Science.gov (United States)

    Wöhrmann, Tina; Huettel, Bruno; Wagner, Natascha; Weising, Kurt

    2016-01-01

    Microsatellite markers were developed in Fosterella christophii (Bromeliaceae) to investigate the genetic diversity and population structure within the F. micrantha group, comprising F. christophii, F. micrantha, and F. villosula. Full-length cDNAs were isolated from F. christophii and sequenced on a Pacific Biosciences RS platform. A total of 1590 high-quality consensus isoforms were assembled into 971 unigenes containing 421 perfect microsatellites. Thirty primer sets were designed, of which 13 revealed a high level of polymorphism in three populations of F. christophii, with four to nine alleles per locus. Each of these 13 loci cross-amplified in the closely related species F. micrantha and F. villosula, with one to six and one to 11 alleles per locus, respectively. The new markers are promising tools to study the population genetics of F. christophii and to discover species boundaries within the F. micrantha group.

  5. The effect of music performance on the transcriptome of professional musicians.

    Science.gov (United States)

    Kanduri, Chakravarthi; Kuusi, Tuire; Ahvenainen, Minna; Philips, Anju K; Lähdesmäki, Harri; Järvelä, Irma

    2015-03-25

    Music performance by professional musicians involves a wide-spectrum of cognitive and multi-sensory motor skills, whose biological basis is unknown. Several neuroscientific studies have demonstrated that the brains of professional musicians and non-musicians differ structurally and functionally and that musical training enhances cognition. However, the molecules and molecular mechanisms involved in music performance remain largely unexplored. Here, we investigated the effect of music performance on the genome-wide peripheral blood transcriptome of professional musicians by analyzing the transcriptional responses after a 2-hr concert performance and after a 'music-free' control session. The up-regulated genes were found to affect dopaminergic neurotransmission, motor behavior, neuronal plasticity, and neurocognitive functions including learning and memory. Particularly, candidate genes such as SNCA, FOS and DUSP1 that are involved in song perception and production in songbirds, were identified, suggesting an evolutionary conservation in biological processes related to sound perception/production. Additionally, modulation of genes related to calcium ion homeostasis, iron ion homeostasis, glutathione metabolism, and several neuropsychiatric and neurodegenerative diseases implied that music performance may affect the biological pathways that are otherwise essential for the proper maintenance of neuronal function and survival. For the first time, this study provides evidence for the candidate genes and molecular mechanisms underlying music performance.

  6. Global transcriptome sequencing identifies chlamydospore specific markers in Candida albicans and Candida dubliniensis.

    Directory of Open Access Journals (Sweden)

    Katja Palige

    Full Text Available Candida albicans and Candida dubliniensis are pathogenic fungi that are highly related but differ in virulence and in some phenotypic traits. During in vitro growth on certain nutrient-poor media, C. albicans and C. dubliniensis are the only yeast species which are able to produce chlamydospores, large thick-walled cells of unknown function. Interestingly, only C. dubliniensis forms pseudohyphae with abundant chlamydospores when grown on Staib medium, while C. albicans grows exclusively as a budding yeast. In order to further our understanding of chlamydospore development and assembly, we compared the global transcriptional profile of both species during growth in liquid Staib medium by RNA sequencing. We also included a C. albicans mutant in our study which lacks the morphogenetic transcriptional repressor Nrg1. This strain, which is characterized by its constitutive pseudohyphal growth, specifically produces masses of chlamydospores in Staib medium, similar to C. dubliniensis. This comparative approach identified a set of putatively chlamydospore-related genes. Two of the homologous C. albicans and C. dubliniensis genes (CSP1 and CSP2 which were most strongly upregulated during chlamydospore development were analysed in more detail. By use of the green fluorescent protein as a reporter, the encoded putative cell wall related proteins were found to exclusively localize to C. albicans and C. dubliniensis chlamydospores. Our findings uncover the first chlamydospore specific markers in Candida species and provide novel insights in the complex morphogenetic development of these important fungal pathogens.

  7. Global Transcriptome Sequencing Identifies Chlamydospore Specific Markers in Candida albicans and Candida dubliniensis

    LENUS (Irish Health Repository)

    Palige, Katja

    2013-04-15

    Candida albicans and Candida dubliniensis are pathogenic fungi that are highly related but differ in virulence and in some phenotypic traits. During in vitro growth on certain nutrient-poor media, C. albicans and C. dubliniensis are the only yeast species which are able to produce chlamydospores, large thick-walled cells of unknown function. Interestingly, only C. dubliniensis forms pseudohyphae with abundant chlamydospores when grown on Staib medium, while C. albicans grows exclusively as a budding yeast. In order to further our understanding of chlamydospore development and assembly, we compared the global transcriptional profile of both species during growth in liquid Staib medium by RNA sequencing. We also included a C. albicans mutant in our study which lacks the morphogenetic transcriptional repressor Nrg1. This strain, which is characterized by its constitutive pseudohyphal growth, specifically produces masses of chlamydospores in Staib medium, similar to C. dubliniensis. This comparative approach identified a set of putatively chlamydospore-related genes. Two of the homologous C. albicans and C. dubliniensis genes (CSP1 and CSP2) which were most strongly upregulated during chlamydospore development were analysed in more detail. By use of the green fluorescent protein as a reporter, the encoded putative cell wall related proteins were found to exclusively localize to C. albicans and C. dubliniensis chlamydospores. Our findings uncover the first chlamydospore specific markers in Candida species and provide novel insights in the complex morphogenetic development of these important fungal pathogens.

  8. Development of an expressed gene catalogue and molecular markers from the de novo assembly of short sequence reads of the lentil (Lens culinaris Medik.) transcriptome.

    Science.gov (United States)

    Verma, Priyanka; Shah, Niraj; Bhatia, Sabhyata

    2013-09-01

    Genomic resources such as ESTs, molecular markers and linkage maps are essential for crop improvement. However, these resources are still limited in important legumes such as lentil (Lens culinaris Medik.), which is valued world wide as a rich source of dietary protein. In this study, the de novo transcriptome assembly of 119,855,798 short reads, generated by Illumina paired-end sequencing, was performed using various assembly programs. This resulted in 42,196 nonredundant high-quality transcripts of average length 810 bases, N50 value of 1,432 and an average expression per transcript of 26.21 rpkm reads per kilobase per million(RPKM). Similarity search with the unigenes and protein sequences of other plants resulted in maximum similarity with soybean. A total of 20,009 nonredundant transcripts showed similarity with the UniProtKB database and of these, 18,064 transcripts were grouped into three main GO categories, that is, biological process (15,126), molecular function (15,505) and cellular component (9,434). Annotated transcripts were mapped to 289 predicted Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways and 8,893 transcripts were classified into 24 functional categories based on Cluster of Orthologous Groups (COG) of proteins. Mining the data set for the presence of SSRs resulted in 8,722 SSRs with a frequency occurrence of one SSR per 3.92 kb. From these, 5,673 SSR primer pairs were designed, and a subset of these were utilized for diversity analysis. This study, which provides a large data set of annotated transcripts and gene-based SSR markers, would serve as a foundation for various applications in lentil breeding and genetics. © 2013 Society for Experimental Biology, Association of Applied Biologists and John Wiley & Sons Ltd.

  9. De Novo Sequencing and Analysis of the Safflower Transcriptome to Discover Putative Genes Associated with Safflor Yellow in Carthamus tinctorius L.

    Directory of Open Access Journals (Sweden)

    Xiuming Liu

    2015-10-01

    Full Text Available Safflower (Carthamus tinctorius L., an important traditional Chinese medicine, is cultured widely for its pharmacological effects, but little is known regarding the genes related to the metabolic regulation of the safflower’s yellow pigment. To investigate genes related to safflor yellow biosynthesis, 454 pyrosequencing of flower RNA at different developmental stages was performed, generating large databases.In this study, we analyzed 454 sequencing data from different flowering stages in safflower. In total, 1,151,324 raw reads and 1,140,594 clean reads were produced, which were assembled into 51,591 unigenes with an average length of 679 bp and a maximum length of 5109 bp. Among the unigenes, 40,139 were in the early group, 39,768 were obtained from the full group and 28,316 were detected in both samples. With the threshold of “log2 ratio ≥ 1”, there were 34,464 differentially expressed genes, of which 18,043 were up-regulated and 16,421 were down-regulated in the early flower library. Based on the annotations of the unigenes, 281 pathways were predicted. We selected 12 putative genes and analyzed their expression levels using quantitative real time-PCR. The results were consistent with the 454 sequencing results. In addition, the expression of chalcone synthase, chalcone isomerase and anthocyanidin synthase, which are involved in safflor yellow biosynthesis and safflower yellow pigment (SYP content, were analyzed in different flowering periods, indicating that their expression levels were related to SYP synthesis. Moreover, to further confirm the results of the 454 pyrosequencing, full-length cDNA of chalcone isomerase (CHI and anthocyanidin synthase (ANS were cloned from safflower petal by RACE (Rapid-amplification of cDNA ends method according to fragment of the transcriptome.

  10. Sequencing of the needle transcriptome from Norway spruce (Picea abies Karst L. reveals lower substitution rates, but similar selective constraints in gymnosperms and angiosperms

    Directory of Open Access Journals (Sweden)

    Chen Jun

    2012-11-01

    Full Text Available Abstract Background A detailed knowledge about spatial and temporal gene expression is important for understanding both the function of genes and their evolution. For the vast majority of species, transcriptomes are still largely uncharacterized and even in those where substantial information is available it is often in the form of partially sequenced transcriptomes. With the development of next generation sequencing, a single experiment can now simultaneously identify the transcribed part of a species genome and estimate levels of gene expression. Results mRNA from actively growing needles of Norway spruce (Picea abies was sequenced using next generation sequencing technology. In total, close to 70 million fragments with a length of 76 bp were sequenced resulting in 5 Gbp of raw data. A de novo assembly of these reads, together with publicly available expressed sequence tag (EST data from Norway spruce, was used to create a reference transcriptome. Of the 38,419 PUTs (putative unique transcripts longer than 150 bp in this reference assembly, 83.5% show similarity to ESTs from other spruce species and of the remaining PUTs, 3,704 show similarity to protein sequences from other plant species, leaving 4,167 PUTs with limited similarity to currently available plant proteins. By predicting coding frames and comparing not only the Norway spruce PUTs, but also PUTs from the close relatives Picea glauca and Picea sitchensis to both Pinus taeda and Taxus mairei, we obtained estimates of synonymous and non-synonymous divergence among conifer species. In addition, we detected close to 15,000 SNPs of high quality and estimated gene expression differences between samples collected under dark and light conditions. Conclusions Our study yielded a large number of single nucleotide polymorphisms as well as estimates of gene expression on transcriptome scale. In agreement with a recent study we find that the synonymous substitution rate per year (0.6 × 10

  11. dsPIG: a tool to predict imprinted genes from the deep sequencing of whole transcriptomes

    Directory of Open Access Journals (Sweden)

    Li Hua

    2012-10-01

    Full Text Available Abstract Background Dysregulation of imprinted genes, which are expressed in a parent-of-origin-specific manner, plays an important role in various human diseases, such as cancer and behavioral disorder. To date, however, fewer than 100 imprinted genes have been identified in the human genome. The recent availability of high-throughput technology makes it possible to have large-scale prediction of imprinted genes. Here we propose a Bayesian model (dsPIG to predict imprinted genes on the basis of allelic expression observed in mRNA-Seq data of independent human tissues. Results Our model (dsPIG was capable of identifying imprinted genes with high sensitivity and specificity and a low false discovery rate when the number of sequenced tissue samples was fairly large, according to simulations. By applying dsPIG to the mRNA-Seq data, we predicted 94 imprinted genes in 20 cerebellum samples and 57 imprinted genes in 9 diverse tissue samples with expected low false discovery rates. We also assessed dsPIG using previously validated imprinted and non-imprinted genes. With simulations, we further analyzed how imbalanced allelic expression of non-imprinted genes or different minor allele frequencies affected the predictions of dsPIG. Interestingly, we found that, among biallelically expressed genes, at least 18 genes expressed significantly more transcripts from one allele than the other among different individuals and tissues. Conclusion With the prevalence of the mRNA-Seq technology, dsPIG has become a useful tool for analysis of allelic expression and large-scale prediction of imprinted genes. For ease of use, we have set up a web service and also provided an R package for dsPIG at http://www.shoudanliang.com/dsPIG/.

  12. Deep sequencing whole transcriptome exploration of the σE regulon in Neisseria meningitidis.

    Directory of Open Access Journals (Sweden)

    Robert Antonius Gerhardus Huis in 't Veld

    Full Text Available Bacteria live in an ever-changing environment and must alter protein expression promptly to adapt to these changes and survive. Specific response genes that are regulated by a subset of alternative σ(70-like transcription factors have evolved in order to respond to this changing environment. Recently, we have described the existence of a σ(E regulon including the anti-σ-factor MseR in the obligate human bacterial pathogen Neisseria meningitidis. To unravel the complete σ(E regulon in N. meningitidis, we sequenced total RNA transcriptional content of wild type meningococci and compared it with that of mseR mutant cells (ΔmseR in which σ(E is highly expressed. Eleven coding genes and one non-coding gene were found to be differentially expressed between H44/76 wildtype and H44/76ΔmseR cells. Five of the 6 genes of the σ(E operon, msrA/msrB, and the gene encoding a pepSY-associated TM helix family protein showed enhanced transcription, whilst aniA encoding a nitrite reductase and nspA encoding the vaccine candidate Neisserial surface protein A showed decreased transcription. Analysis of differential expression in IGRs showed enhanced transcription of a non-coding RNA molecule, identifying a σ(E dependent small non-coding RNA. Together this constitutes the first complete exploration of an alternative σ-factor regulon in N. meningitidis. The results direct to a relatively small regulon indicative for a strictly defined response consistent with a relatively stable niche, the human throat, where N. meningitidis resides.

  13. Genome Sequencing and Comparative Transcriptomics of the Model Entomopathogenic Fungi Metarhizium anisopliae and M. acridum

    Science.gov (United States)

    Shang, Yanfang; Duan, Zhibing; Hu, Xiao; Xie, Xue-Qin; Zhou, Gang; Peng, Guoxiong; Luo, Zhibing; Huang, Wei; Wang, Bing; Fang, Weiguo; Wang, Sibao; Zhong, Yi; Ma, Li-Jun; St. Leger, Raymond J.; Zhao, Guo-Ping; Pei, Yan; Feng, Ming-Guang; Xia, Yuxian; Wang, Chengshu

    2011-01-01

    Metarhizium spp. are being used as environmentally friendly alternatives to chemical insecticides, as model systems for studying insect-fungus interactions, and as a resource of genes for biotechnology. We present a comparative analysis of the genome sequences of the broad-spectrum insect pathogen Metarhizium anisopliae and the acridid-specific M. acridum. Whole-genome analyses indicate that the genome structures of these two species are highly syntenic and suggest that the genus Metarhizium evolved from plant endophytes or pathogens. Both M. anisopliae and M. acridum have a strikingly larger proportion of genes encoding secreted proteins than other fungi, while ∼30% of these have no functionally characterized homologs, suggesting hitherto unsuspected interactions between fungal pathogens and insects. The analysis of transposase genes provided evidence of repeat-induced point mutations occurring in M. acridum but not in M. anisopliae. With the help of pathogen-host interaction gene database, ∼16% of Metarhizium genes were identified that are similar to experimentally verified genes involved in pathogenicity in other fungi, particularly plant pathogens. However, relative to M. acridum, M. anisopliae has evolved with many expanded gene families of proteases, chitinases, cytochrome P450s, polyketide synthases, and nonribosomal peptide synthetases for cuticle-degradation, detoxification, and toxin biosynthesis that may facilitate its ability to adapt to heterogenous environments. Transcriptional analysis of both fungi during early infection processes provided further insights into the genes and pathways involved in infectivity and specificity. Of particular note, M. acridum transcribed distinct G-protein coupled receptors on cuticles from locusts (the natural hosts) and cockroaches, whereas M. anisopliae transcribed the same receptor on both hosts. This study will facilitate the identification of virulence genes and the development of improved biocontrol strains

  14. De novo transcriptome sequencing in Bixa orellana to identify genes involved in methylerythritol phosphate, carotenoid and bixin biosynthesis.

    Science.gov (United States)

    Cárdenas-Conejo, Yair; Carballo-Uicab, Víctor; Lieberman, Meric; Aguilar-Espinosa, Margarita; Comai, Luca; Rivera-Madrid, Renata

    2015-10-28

    Bixin or annatto is a commercially important natural orange-red pigment derived from lycopene that is produced and stored in seeds of Bixa orellana L. An enzymatic pathway for bixin biosynthesis was inferred from homology of putative proteins encoded by differentially expressed seed cDNAs. Some activities were later validated in a heterologous system. Nevertheless, much of the pathway remains to be clarified. For example, it is essential to identify the methylerythritol phosphate (MEP) and carotenoid pathways genes. In order to investigate the MEP, carotenoid, and bixin pathways genes, total RNA from young leaves and two different developmental stages of seeds from B. orellana were used for the construction of indexed mRNA libraries, sequenced on the Illumina HiSeq 2500 platform and assembled de novo using Velvet, CLC Genomics Workbench and CAP3 software. A total of 52,549 contigs were obtained with average length of 1,924 bp. Two phylogenetic analyses of inferred proteins, in one case encoded by thirteen general, single-copy cDNAs, in the other from carotenoid and MEP cDNAs, indicated that B. orellana is closely related to sister Malvales species cacao and cotton. Using homology, we identified 7 and 14 core gene products from the MEP and carotenoid pathways, respectively. Surprisingly, previously defined bixin pathway cDNAs were not present in our transcriptome. Here we propose a new set of gene products involved in bixin pathway. The identification and qRT-PCR quantification of cDNAs involved in annatto production suggest a hypothetical model for bixin biosynthesis that involve coordinated activation of some MEP, carotenoid and bixin pathway genes. These findings provide a better understanding of the mechanisms regulating these pathways and will facilitate the genetic improvement of B. orellana.

  15. SNP detection from de novo transcriptome sequencing in the bivalve Macoma balthica: marker development for evolutionary studies.

    Directory of Open Access Journals (Sweden)

    Eric Pante

    Full Text Available Hybrid zones are noteworthy systems for the study of environmental adaptation to fast-changing environments, as they constitute reservoirs of polymorphism and are key to the maintenance of biodiversity. They can move in relation to climate fluctuations, as temperature can affect both selection and migration, or remain trapped by environmental and physical barriers. There is therefore a very strong incentive to study the dynamics of hybrid zones subjected to climate variations. The infaunal bivalve Macoma balthica emerges as a noteworthy model species, as divergent lineages hybridize, and its native NE Atlantic range is currently contracting to the North. To investigate the dynamics and functioning of hybrid zones in M. balthica, we developed new molecular markers by sequencing the collective transcriptome of 30 individuals. Ten individuals were pooled for each of the three populations sampled at the margins of two hybrid zones. A single 454 run generated 277 Mb from which 17K SNPs were detected. SNP density averaged 1 polymorphic site every 14 to 19 bases, for mitochondrial and nuclear loci, respectively. An [Formula: see text] scan detected high genetic divergence among several hundred SNPs, some of them involved in energetic metabolism, cellular respiration and physiological stress. The high population differentiation, recorded for nuclear-encoded ATP synthase and NADH dehydrogenase as well as most mitochondrial loci, suggests cytonuclear genetic incompatibilities. Results from this study will help pave the way to a high-resolution study of hybrid zone dynamics in M. balthica, and the relative importance of endogenous and exogenous barriers to gene flow in this system.

  16. Understanding PRRSV infection in porcine lung based on genome-wide transcriptome response identified by deep sequencing.

    Directory of Open Access Journals (Sweden)

    Shuqi Xiao

    Full Text Available Porcine reproductive and respiratory syndrome (PRRS has been one of the most economically important diseases affecting swine industry worldwide and causes great economic losses each year. PRRS virus (PRRSV replicates mainly in porcine alveolar macrophages (PAMs and dendritic cells (DCs and develops persistent infections, antibody-dependent enhancement (ADE, interstitial pneumonia and immunosuppression. But the molecular mechanisms of PRRSV infection still are poorly understood. Here we report on the first genome-wide host transcriptional responses to classical North American type PRRSV (N-PRRSV strain CH 1a infection using Solexa/Illumina's digital gene expression (DGE system, a tag-based high-throughput transcriptome sequencing method, and analyse systematically the relationship between pulmonary gene expression profiles after N-PRRSV infection and infection pathology. Our results suggest that N-PRRSV appeared to utilize multiple strategies for its replication and spread in infected pigs, including subverting host innate immune response, inducing an anti-apoptotic and anti-inflammatory state as well as developing ADE. Upregulation expression of virus-induced pro-inflammatory cytokines, chemokines, adhesion molecules and inflammatory enzymes and inflammatory cells, antibodies, complement activation were likely to result in the development of inflammatory responses during N-PRRSV infection processes. N-PRRSV-induced immunosuppression might be mediated by apoptosis of infected cells, which caused depletion of immune cells and induced an anti-inflammatory cytokine response in which they were unable to eradicate the primary infection. Our systems analysis will benefit for better understanding the molecular pathogenesis of N-PRRSV infection, developing novel antiviral therapies and identifying genetic components for swine resistance/susceptibility to PRRS.

  17. TCW: transcriptome computational workbench.

    Directory of Open Access Journals (Sweden)

    Carol Soderlund

    Full Text Available BACKGROUND: The analysis of transcriptome data involves many steps and various programs, along with organization of large amounts of data and results. Without a methodical approach for storage, analysis and query, the resulting ad hoc analysis can lead to human error, loss of data and results, inefficient use of time, and lack of verifiability, repeatability, and extensibility. METHODOLOGY: The Transcriptome Computational Workbench (TCW provides Java graphical interfaces for methodical analysis for both single and comparative transcriptome data without the use of a reference genome (e.g. for non-model organisms. The singleTCW interface steps the user through importing transcript sequences (e.g. Illumina or assembling long sequences (e.g. Sanger, 454, transcripts, annotating the sequences, and performing differential expression analysis using published statistical programs in R. The data, metadata, and results are stored in a MySQL database. The multiTCW interface builds a comparison database by importing sequence and annotation from one or more single TCW databases, executes the ESTscan program to translate the sequences into proteins, and then incorporates one or more clusterings, where the clustering options are to execute the orthoMCL program, compute transitive closure, or import clusters. Both singleTCW and multiTCW allow extensive query and display of the results, where singleTCW displays the alignment of annotation hits to transcript sequences, and multiTCW displays multiple transcript alignments with MUSCLE or pairwise alignments. The query programs can be executed on the desktop for fastest analysis, or from the web for sharing the results. CONCLUSION: It is now affordable to buy a multi-processor machine, and easy to install Java and MySQL. By simply downloading the TCW, the user can interactively analyze, query and view their data. The TCW allows in-depth data mining of the results, which can lead to a better understanding of the

  18. Transcriptome Profiling Using Single-Molecule Direct RNA Sequencing Approach for In-depth Understanding of Genes in Secondary Metabolism Pathways of Camellia sinensis

    Directory of Open Access Journals (Sweden)

    Qingshan Xu

    2017-07-01

    Full Text Available Characteristic secondary metabolites, including flavonoids, theanine and caffeine, are important components of Camellia sinensis, and their biosynthesis has attracted widespread interest. Previous studies on the biosynthesis of these major secondary metabolites using next-generation sequencing technologies limited the accurately prediction of full-length (FL splice isoforms. Herein, we applied single-molecule sequencing to pooled tea plant tissues, to provide a more complete transcriptome of C. sinensis. Moreover, we identified 94 FL transcripts and four alternative splicing events for enzyme-coding genes involved in the biosynthesis of flavonoids, theanine and caffeine. According to the comparison between long-read isoforms and assemble transcripts, we improved the quality and accuracy of genes sequenced by short-read next-generation sequencing technology. The resulting FL transcripts, together with the improved assembled transcripts and identified alternative splicing events, enhance our understanding of genes involved in the biosynthesis of characteristic secondary metabolites in C. sinensis.

  19. Transcriptome Analysis of Ceriops tagal in Saline Environments Using RNA-Sequencing.

    Directory of Open Access Journals (Sweden)

    Xiaorong Xiao

    Full Text Available Identification of genes involved in mangrove species' adaptation to salt stress can provide valuable information for developing salt-tolerant crops and understanding the molecular evolution of salt tolerance in halophiles. Ceriops tagal is a salt-tolerant mangrove tree growing in mudflats and marshes in tropical and subtropical areas, without any prior genome information. In this study, we assessed the biochemical and transcriptional responses of C. tagal to high salt treatment (500 mmol/L NaCl by hydroponic experiments and RNA-seq. In C. tagal root tissues under salt stress, proline accumulated strongly from 3 to 12 h of treatment; meanwhile, malondialdehyde content progressively increased from 0 to 9 h, then dropped to lower than control levels by 24 h. These implied that C. tagal plants could survive salt stress through biochemical modification. Using the Illumina sequencing platform, approximately 27.39 million RNA-seq reads were obtained from three salt-treated and control (untreated root samples. These reads were assembled into 47,111 transcripts with an average length of 514 bp and an N50 of 632 bp. Approximately 78% of the transcripts were annotated, and a total of 437 genes were putative transcription factors. Digital gene expression analysis was conducted by comparing transcripts from the untreated control to the three salt treated samples, and 7,330 differentially expressed transcripts were identified. Using k-means clustering, these transcripts were divided into six clusters that differed in their expression patterns across four treatment time points. The genes identified as being up- or downregulated are involved in salt stress responses, signal transduction, and DNA repair. Our study shows the main adaptive pathway of C. tagal in saline environments, under short-term and long-term treatments of salt stress. This provides vital clues as to which genes may be candidates for breeding salt-tolerant crops and clarifying molecular

  20. De novo transcriptome sequencing of the Octopus vulgaris hemocytes using Illumina RNA-Seq technology: response to the infection by the gastrointestinal parasite Aggregata octopiana.

    Science.gov (United States)

    Castellanos-Martínez, Sheila; Arteta, David; Catarino, Susana; Gestal, Camino

    2014-01-01

    Octopus vulgaris is a highly valuable species of great commercial interest and excellent candidate for aquaculture diversification; however, the octopus' well-being is impaired by pathogens, of which the gastrointestinal coccidian parasite Aggregata octopiana is one of the most important. The knowledge of the molecular mechanisms of the immune response in cephalopods, especially in octopus is scarce. The transcriptome of the hemocytes of O. vulgaris was de novo sequenced using the high-throughput paired-end Illumina technology to identify genes involved in immune defense and to understand the molecular basis of octopus tolerance/resistance to coccidiosis. A bi-directional mRNA library was constructed from hemocytes of two groups of octopus according to the infection by A. octopiana, sick octopus, suffering coccidiosis, and healthy octopus, and reads were de novo assembled together. The differential expression of transcripts was analysed using the general assembly as a reference for mapping the reads from each condition. After sequencing, a total of 75,571,280 high quality reads were obtained from the sick octopus group and 74,731,646 from the healthy group. The general transcriptome of the O. vulgaris hemocytes was assembled in 254,506 contigs. A total of 48,225 contigs were successfully identified, and 538 transcripts exhibited differential expression between groups of infection. The general transcriptome revealed genes involved in pathways like NF-kB, TLR and Complement. Differential expression of TLR-2, PGRP, C1q and PRDX genes due to infection was validated using RT-qPCR. In sick octopuses, only TLR-2 was up-regulated in hemocytes, but all of them were up-regulated in caecum and gills. The transcriptome reported here de novo establishes the first molecular clues to understand how the octopus immune system works and interacts with a highly pathogenic coccidian. The data provided here will contribute to identification of biomarkers for octopus resistance against

  1. Characterization of the sesame (Sesamum indicum L.) global transcriptome using Illumina paired-end sequencing and development of EST-SSR markers.

    Science.gov (United States)

    Wei, Wenliang; Qi, Xiaoqiong; Wang, Linhai; Zhang, Yanxin; Hua, Wei; Li, Donghua; Lv, Haixia; Zhang, Xiurong

    2011-09-19

    Sesame is an important oil crop, but limited transcriptomic and genomic data are currently available. This information is essential to clarify the fatty acid and lignan biosynthesis molecular mechanism. In addition, a shortage of sesame molecular markers limits the efficiency and accuracy of genetic breeding. High-throughput transcriptomic sequencing is essential to generate a large transcriptome sequence dataset for gene discovery and molecular marker development. Sesame transcriptomes from five tissues were sequenced using Illumina paired-end sequencing technology. The cleaned raw reads were assembled into a total of 86,222 unigenes with an average length of 629 bp. Of the unigenes, 46,584 (54.03%) had significant similarity with proteins in the NCBI nonredundant protein database and Swiss-Prot database (E-value < 10-5). Of these annotated unigenes, 10,805 and 27,588 unigenes were assigned to gene ontology categories and clusters of orthologous groups, respectively. In total, 22,003 (25.52%) unigenes were mapped onto 119 pathways using the Kyoto Encyclopedia of Genes and Genomes Pathway database (KEGG). Furthermore, 44,750 unigenes showed homology to 15,460 Arabidopsis genes based on BLASTx analysis against The Arabidopsis Information Resource (TAIR, Version 10) and revealed relatively high gene coverage. In total, 7,702 unigenes were converted into SSR markers (EST-SSR). Dinucleotide SSRs were the dominant repeat motif (67.07%, 5,166), followed by trinucleotide (24.89%, 1,917), tetranucleotide (4.31%, 332), hexanucleotide (2.62%, 202), and pentanucleotide (1.10%, 85) SSRs. AG/CT (46.29%) was the dominant repeat motif, followed by AC/GT (16.07%), AT/AT (10.53%), AAG/CTT (6.23%), and AGG/CCT (3.39%). Fifty EST-SSRs were randomly selected to validate amplification and to determine the degree of polymorphism in the genomic DNA pools. Forty primer pairs successfully amplified DNA fragments and detected significant amounts of polymorphism among 24 sesame accessions

  2. Transcriptome Sequencing and Differential Gene Expression Analysis of Delayed Gland Morphogenesis in Gossypium australe during Seed Germination

    Science.gov (United States)

    Tao, Tao; Zhao, Liang; Lv, Yuanda; Chen, Jiedan; Hu, Yan; Zhang, Tianzhen; Zhou, Baoliang

    2013-01-01

    The genus Gossypium is a globally important crop that is used to produce textiles, oil and protein. However, gossypol, which is found in cultivated cottonseed, is toxic to humans and non-ruminant animals. Efforts have been made to breed improved cultivated cotton with lower gossypol content. The delayed gland morphogenesis trait possessed by some Australian wild cotton species may enable the widespread, direct usage of cottonseed. However, the mechanisms about the delayed gland morphogenesis are still unknown. Here, we sequenced the first Australian wild cotton species ( Gossypium australe ) and a diploid cotton species ( Gossypium arboreum ) using the Illumina Hiseq 2000 RNA-seq platform to help elucidate the mechanisms underlying gossypol synthesis and gland development. Paired-end Illumina short reads were de novo assembled into 226,184, 213,257 and 275,434 transcripts, clustering into 61,048, 47,908 and 72,985 individual clusters with N50 lengths of 1,710 bp, 1544 BP and 1,743 bp, respectively. The clustered Unigenes were searched against three public protein databases (TrEMBL, SwissProt and RefSeq) and the nucleotide and protein sequences of Gossypium raimondii using BLASTx and BLASTn. A total of 21,987, 17,209 and 25,325 Unigenes were annotated. Of these, 18,766 (85.4%), 14,552 (84.6%) and 21,374 (84.4%) Unigenes could be assigned to GO-term classifications. We identified and analyzed 13,884 differentially expressed Unigenes by clustering and functional enrichment. Terpenoid-related biosynthesis pathways showed differentially regulated expression patterns between the two cotton species. Phylogenetic analysis of the terpene synthases family was also carried out to clarify the classifications of TPSs. RNA-seq data from two distinct cotton species provide comprehensive transcriptome annotation resources and global gene expression profiles during seed germination and gland and gossypol formation. These data may be used to further elucidate various mechanisms and

  3. Transcriptome Sequencing Analysis and Functional Identification of Sex Differentiation Genes from the Mosquito Parasitic Nematode, Romanomermis wuchangensis.

    Directory of Open Access Journals (Sweden)

    Mingyue Duan

    Full Text Available Mosquito-transmitted diseases like malaria and dengue fever are global problem and an estimated 50-100 million of dengue or dengue hemorrhagic fever cases are reported worldwide every year. The mermithid nematode Romanomermis wuchangensis has been successfully used as an ecosystem-friendly biocontrol agent for mosquito prevention in laboratory studies. However, this nematode can not undergo sex differentiation in vitro culture, which has seriously affected their application of biocontrol in the field. In this study, based on transcriptome sequencing analysis of R. wuchangensis, Rwucmab-3, Rwuclaf-1 and Rwuctra-2 were cloned and used to investigate molecular regulatory function of sex differentiation. qRT-PCR results demonstrated that the expression level of Rwucmab-3 between male and female displayed obvious difference on the 3rd day of parasitic stage, which was earlier than Rwuclaf-1 and Rwuctra-2, highlighting sex differentiation process may start on the 3rd day of parasitic stage. Besides, FITC was used as a marker to test dsRNA uptake efficiency of R. wuchangensis, which fluorescence intensity increased with FITC concentration after 16 h incubation, indicating this nematode can successfully ingest soaking solution via its cuticle. RNAi results revealed the sex ratio of R. wuchangensis from RNAi treated groups soaked in dsRNA of Rwucmab-3 was significantly higher than gfp dsRNA treated groups and control groups, highlighting RNAi of Rwumab-3 may hinder the development of male nematodes. These results suggest that Rwucmab-3 mainly involves in the initiation of sex differentiation and the development of male sexual dimorphism. Rwuclaf-1 and Rwuctra-2 may play vital role in nematode reproductive and developmental system. In conclusion, transcript sequences presented in this study could provide more bioinformatics resources for future studies on gene cloning and other molecular regulatory mechanism in R. wuchangensis. Moreover, identification

  4. High-throughput sequencing of small RNA transcriptome reveals salt stress regulated microRNAs in sugarcane.

    Directory of Open Access Journals (Sweden)

    Mariana Carnavale Bottino

    Full Text Available Salt stress is a primary cause of crop losses worldwide, and it has been the subject of intense investigation to unravel the complex mechanisms responsible for salinity tolerance. MicroRNA is implicated in many developmental processes and in responses to various abiotic stresses, playing pivotal roles in plant adaptation. Deep sequencing technology was chosen to determine the small RNA transcriptome of Saccharum sp cultivars grown on saline conditions. We constructed four small RNAs libraries prepared from plants grown on hydroponic culture submitted to 170 mM NaCl and harvested after 1 h, 6 hs and 24 hs. Each library was sequenced individually and together generated more than 50 million short reads. Ninety-eight conserved miRNAs and 33 miRNAs* were identified by bioinformatics. Several of the microRNA showed considerable differences of expression in the four libraries. To confirm the results of the bioinformatics-based analysis, we studied the expression of the 10 most abundant miRNAs and 1 miRNA* in plants treated with 170 mM NaCl and in plants with a severe treatment of 340 mM NaCl. The results showed that 11 selected miRNAs had higher expression in samples treated with severe salt treatment compared to the mild one. We also investigated the regulation of the same miRNAs in shoots of four cultivars grown on soil treated with 170 mM NaCl. Cultivars could be grouped according to miRNAs expression in response to salt stress. Furthermore, the majority of the predicted target genes had an inverse regulation with their correspondent microRNAs. The targets encode a wide range of proteins, including transcription factors, metabolic enzymes and genes involved in hormone signaling, probably assisting the plants to develop tolerance to salinity. Our work provides insights into the regulatory functions of miRNAs, thereby expanding our knowledge on potential salt-stressed regulated genes.

  5. Transcriptome sequencing of three Pseudo-nitzschia species reveals comparable gene sets and the presence of Nitric Oxide Synthase genes in diatoms.

    Science.gov (United States)

    Di Dato, Valeria; Musacchia, Francesco; Petrosino, Giuseppe; Patil, Shrikant; Montresor, Marina; Sanges, Remo; Ferrante, Maria Immacolata

    2015-07-20

    Diatoms are among the most diverse eukaryotic microorganisms on Earth, they are responsible for a large fraction of primary production in the oceans and can be found in different habitats. Pseudo-nitzschia are marine planktonic diatoms responsible for blooms in coastal and oceanic waters. We analyzed the transcriptome of three species, Pseudo-nitzschia arenysensis, Pseudo-nitzschia delicatissima and Pseudo-nitzschia multistriata, with different levels of genetic relatedness. These species have a worldwide distribution and the last one produces the neurotoxin domoic acid. We were able to annotate about 80% of the sequences in each transcriptome and the analysis of the relative functional annotations allowed comparison of the main metabolic pathways, pathways involved in the biosynthesis of isoprenoids (MAV and MEP pathways), and pathways putatively involved in domoic acid synthesis. The search for homologous transcripts among the target species and other congeneric species resulted in the discovery of a sequence annotated as Nitric Oxide Synthase (NOS), found uniquely in Pseudo-nitzschia multistriata. The predicted protein product contained all the domains of the canonical metazoan sequence. Putative NOS sequences were found in other available diatom datasets, supporting a role for nitric oxide as signaling molecule in this group of microalgae.

  6. SNP discovery by illumina-based transcriptome sequencing of the olive and the genetic characterization of Turkish olive genotypes revealed by AFLP, SSR and SNP markers.

    Directory of Open Access Journals (Sweden)

    Hilal Betul Kaya

    Full Text Available BACKGROUND: The olive tree (Olea europaea L. is a diploid (2n = 2x = 46 outcrossing species mainly grown in the Mediterranean area, where it is the most important oil-producing crop. Because of its economic, cultural and ecological importance, various DNA markers have been used in the olive to characterize and elucidate homonyms, synonyms and unknown accessions. However, a comprehensive characterization and a full sequence of its transcriptome are unavailable, leading to the importance of an efficient large-scale single nucleotide polymorphism (SNP discovery in olive. The objectives of this study were (1 to discover olive SNPs using next-generation sequencing and to identify SNP primers for cultivar identification and (2 to characterize 96 olive genotypes originating from different regions of Turkey. METHODOLOGY/PRINCIPAL FINDINGS: Next-generation sequencing technology was used with five distinct olive genotypes and generated cDNA, producing 126,542,413 reads using an Illumina Genome Analyzer IIx. Following quality and size trimming, the high-quality reads were assembled into 22,052 contigs with an average length of 1,321 bases and 45 singletons. The SNPs were filtered and 2,987 high-quality putative SNP primers were identified. The assembled sequences and singletons were subjected to BLAST similarity searches and annotated with a Gene Ontology identifier. To identify the 96 olive genotypes, these SNP primers were applied to the genotypes in combination with amplified fragment length polymorphism (AFLP and simple sequence repeats (SSR markers. CONCLUSIONS/SIGNIFICANCE: This study marks the highest number of SNP markers discovered to date from olive genotypes using transcriptome sequencing. The developed SNP markers will provide a useful source for molecular genetic studies, such as genetic diversity and characterization, high density quantitative trait locus (QTL analysis, association mapping and map-based gene cloning in the olive. High levels

  7. Sequencing, De Novo Assembly, and Annotation of the Transcriptome of the Endangered Freshwater Pearl Bivalve, Cristaria plicata, Provides Novel Insights into Functional Genes and Marker Discovery.

    Directory of Open Access Journals (Sweden)

    Bharat Bhusan Patnaik

    Full Text Available The freshwater mussel Cristaria plicata (Bivalvia: Eulamellibranchia: Unionidae, is an economically important species in molluscan aquaculture due to its use in pearl farming. The species have been listed as endangered in South Korea due to the loss of natural habitats caused by anthropogenic activities. The decreasing population and a lack of genomic information on the species is concerning for environmentalists and conservationists. In this study, we conducted a de novo transcriptome sequencing and annotation analysis of C. plicata using Illumina HiSeq 2500 next-generation sequencing (NGS technology, the Trinity assembler, and bioinformatics databases to prepare a sustainable resource for the identification of candidate genes involved in immunity, defense, and reproduction.The C. plicata transcriptome analysis included a total of 286,152,584 raw reads and 281,322,837 clean reads. The de novo assembly identified a total of 453,931 contigs and 374,794 non-redundant unigenes with average lengths of 731.2 and 737.1 bp, respectively. Furthermore, 100% coverage of C. plicata mitochondrial genes within two unigenes supported the quality of the assembler. In total, 84,274 unigenes showed homology to entries in at least one database, and 23,246 unigenes were allocated to one or more Gene Ontology (GO terms. The most prominent GO biological process, cellular component, and molecular function categories (level 2 were cellular process, membrane, and binding, respectively. A total of 4,776 unigenes were mapped to 123 biological pathways in the KEGG database. Based on the GO terms and KEGG annotation, the unigenes were suggested to be involved in immunity, stress responses, sex-determination, and reproduction. A total of 17,251 cDNA simple sequence repeats (cSSRs were identified from 61,141 unigenes (size of >1 kb with the most abundant being dinucleotide repeats.This dataset represents the first transcriptome analysis of the endangered mollusc, C. plicata

  8. Evaluation of sequencing batch reactor performance for petrochemical wastewater treatment

    OpenAIRE

    Mina Salari; Seyed Ahmad Ataei; Fereshteh Bakhtiyari

    2017-01-01

    Sequencing batch reactor (SBR) technology has found many applications in industrial wastewater treatment in recent years. The aim of this study was to determine the optimal time for a cycle of the sequencing batch reactor (SBR) and evaluate the performance of a SBR for petrochemical wastewater treatment in that cycle time. The reactor was operated with a suspended biomass configuration under aerobic conditions. Carbon removal and operating parameters such as pH, temperature and dissolved oxyg...

  9. Hot Transcriptomics

    OpenAIRE

    Walther, Jasper; Sierocinski, Pawel; van der Oost, John

    2011-01-01

    DNA microarray technology allows for a quick and easy comparison of complete transcriptomes, resulting in improved molecular insight in fluctuations of gene expression. After emergence of the microarray technology about a decade ago, the technique has now matured and has become routine in many molecular biology laboratories. Numerous studies have been performed that have provided global transcription patterns of many organisms under a wide range of conditions. Initially, implementation of thi...

  10. Transcriptome sequencing reveals population differentiation in gene expression linked to functional traits and environmental gradients in the South African shrub Protea repens.

    Science.gov (United States)

    Akman, Melis; Carlson, Jane E; Holsinger, Kent E; Latimer, Andrew M

    2016-04-01

    Understanding the environmental and genetic mechanisms underlying locally adaptive trait variation across the ranges of species is a major focus of evolutionary biology. Combining transcriptome sequencing with common garden experiments on populations spanning geographical and environmental gradients holds promise for identifying such mechanisms. The South African shrub Protea repens displays diverse phenotypes in the wild along drought and temperature gradients. We grew plants from seeds collected at 19 populations spanning this species' range, and sequenced the transcriptomes of these plants to reveal gene pathways associated with adaptive trait variation. We related expression in co-expressed gene networks to trait phenotypes measured in the common garden and to source population climate. We found that expression in gene networks correlated with source-population environment and with plant traits. In particular, the activity of gene networks enriched for growth related pathways correlated strongly with source site minimum winter temperature and with leaf size, stem diameter and height in the garden. Other gene networks with enrichments for photosynthesis related genes showed associations with precipitation. Our results strongly suggest that this species displays population-level differences in gene expression that have been shaped by source population site climate, and that are reflected in trait variation along environmental gradients. © 2015 The Authors. New Phytologist © 2015 New Phytologist Trust.

  11. De novo sequencing and assembly of Centella asiatica leaf transcriptome for mapping of structural, functional and regulatory genes with special reference to secondary metabolism.

    Science.gov (United States)

    Sangwan, Rajender S; Tripathi, Sandhya; Singh, Jyoti; Narnoliya, Lokesh K; Sangwan, Neelam S

    2013-08-01

    Centella asiatica (L.) Urban is an important medicinal plant and has been used since ancient times in traditional systems of medicine. C. asiatica mainly contains ursane skeleton based triterpenoid sapogenins and saponins predominantly in its leaves. This investigation employed Illumina next generation sequencing (NGS) strategy on a pool of three cDNAs from expanding leaf of C. asiatica and developed an assembled transcriptome sequence resource of the plant. The short transcript reads (STRs) generated and assembled into contigs and singletons, representing majority of the genes expressed in C. asiatica, were termed as 'tentative unique transcripts' (TUTs). The TUT dataset was analyzed with the objectives of (i) development of a transcriptome assembly of C. asiatica, and (ii) classification/characterization of the genes into categories like structural, functional, regulatory etc. based on their function. Overall, 68.49% of the 46,171,131 reads generated in the NGS process could be assembled into a total of 79,041 contigs. Gene ontology and functional annotation of sequences resulted into the identification of genes related to different sets of cellular functions including identification of genes related to primary and secondary metabolism. The wet lab validation of seventeen assembled gene sequences identified to be involved in secondary metabolic pathways and control of reactive oxygen species (ROS) was established by semi-quantitative and real time PCR (qRT-PCR). The validation also included sequencing/size matching of a set of semi-quantitative PCR amplicons with their in silico assembled contig/gene. This confirmed the appropriateness of assembling the reads and contigs. Thus, the present study constitutes the largest report to date on C. asiatica transcriptome based gene resource that may contribute substantially to the understanding of the basal biological functions and biochemical pathways of secondary metabolites as well as the transcriptional regulatory

  12. Optimization of de novo transcriptome assembly from high-throughput short read sequencing data improves functional annotation for non-model organisms

    Directory of Open Access Journals (Sweden)

    Haznedaroglu Berat Z

    2012-07-01

    Full Text Available Abstract Background The k-mer hash length is a key factor affecting the output of de novo transcriptome assembly packages using de Bruijn graph algorithms. Assemblies constructed with varying single k-mer choices might result in the loss of unique contiguous sequences (contigs and relevant biological information. A common solution to this problem is the clustering of single k-mer assemblies. Even though annotation is one of the primary goals of a transcriptome assembly, the success of assembly strategies does not consider the impact of k-mer selection on the annotation output. This study provides an in-depth k-mer selection analysis that is focused on the degree of functional annotation achieved for a non-model organism where no reference genome information is available. Individual k-mers and clustered assemblies (CA were considered using three representative software packages. Pair-wise comparison analyses (between individual k-mers and CAs were produced to reveal missing Kyoto Encyclopedia of Genes and Genomes (KEGG ortholog identifiers (KOIs, and to determine a strategy that maximizes the recovery of biological information in a de novo transcriptome assembly. Results Analyses of single k-mer assemblies resulted in the generation of various quantities of contigs and functional annotations within the selection window of k-mers (k-19 to k-63. For each k-mer in this window, generated assemblies contained certain unique contigs and KOIs that were not present in the other k-mer assemblies. Producing a non-redundant CA of k-mers 19 to 63 resulted in a more complete functional annotation than any single k-mer assembly. However, a fraction of unique annotations remained (~0.19 to 0.27% of total KOIs in the assemblies of individual k-mers (k-19 to k-63 that were not present in the non-redundant CA. A workflow to recover these unique annotations is presented. Conclusions This study demonstrated that different k-mer choices result in various quantities

  13. Transcriptomic SNP discovery for custom genotyping arrays: impacts of sequence data, SNP calling method and genotyping technology on the probability of validation success.

    Science.gov (United States)

    Humble, Emily; Thorne, Michael A S; Forcada, Jaume; Hoffman, Joseph I

    2016-08-26

    Single nucleotide polymorphism (SNP) discovery is an important goal of many studies. However, the number of 'putative' SNPs discovered from a sequence resource may not provide a reliable indication of the number that will successfully validate with a given genotyping technology. For this it may be necessary to account for factors such as the method used for SNP discovery and the type of sequence data from which it originates, suitability of the SNP flanking sequences for probe design, and genomic context. To explore the relative importance of these and other factors, we used Illumina sequencing to augment an existing Roche 454 transcriptome assembly for the Antarctic fur seal (Arctocephalus gazella). We then mapped the raw Illumina reads to the new hybrid transcriptome using BWA and BOWTIE2 before calling SNPs with GATK. The resulting markers were pooled with two existing sets of SNPs called from the original 454 assembly using NEWBLER and SWAP454. Finally, we explored the extent to which SNPs discovered using these four methods overlapped and predicted the corresponding validation outcomes for both Illumina Infinium iSelect HD and Affymetrix Axiom arrays. Collating markers across all discovery methods resulted in a global list of 34,718 SNPs. However, concordance between the methods was surprisingly poor, with only 51.0 % of SNPs being discovered by more than one method and 13.5 % being called from both the 454 and Illumina datasets. Using a predictive modeling approach, we could also show that SNPs called from the Illumina data were on average more likely to successfully validate, as were SNPs called by more than one method. Above and beyond this pattern, predicted validation outcomes were also consistently better for Affymetrix Axiom arrays. Our results suggest that focusing on SNPs called by more than one method could potentially improve validation outcomes. They also highlight possible differences between alternative genotyping technologies that could be

  14. A NGS approach to the encrusting Mediterranean sponge Crella elegans (Porifera, Demospongiae, Poecilosclerida): transcriptome sequencing, characterization and overview of the gene expression along three life cycle stages.

    Science.gov (United States)

    Pérez-Porro, A R; Navarro-Gómez, D; Uriz, M J; Giribet, G

    2013-05-01

    Sponges can be dominant organisms in many marine and freshwater habitats where they play essential ecological roles. They also represent a key group to address important questions in early metazoan evolution. Recent approaches for improving knowledge on sponge biological and ecological functions as well as on animal evolution have focused on the genetic toolkits involved in ecological responses to environmental changes (biotic and abiotic), development and reproduction. These approaches are possible thanks to newly available, massive sequencing technologies-such as the Illumina platform, which facilitate genome and transcriptome sequencing in a cost-effective manner. Here we present the first NGS (next-generation sequencing) approach to understanding the life cycle of an encrusting marine sponge. For this we sequenced libraries of three different life cycle stages of the Mediterranean sponge Crella elegans and generated de novo transcriptome assemblies. Three assemblies were based on sponge tissue of a particular life cycle stage, including non-reproductive tissue, tissue with sperm cysts and tissue with larvae. The fourth assembly pooled the data from all three stages. By aggregating data from all the different life cycle stages we obtained a higher total number of contigs, contigs with blast hit and annotated contigs than from one stage-based assemblies. In that multi-stage assembly we obtained a larger number of the developmental regulatory genes known for metazoans than in any other assembly. We also advance the differential expression of selected genes in the three life cycle stages to explore the potential of RNA-seq for improving knowledge on functional processes along the sponge life cycle. © 2013 Blackwell Publishing Ltd.

  15. Global characterization of the root transcriptome of a wild species of rice, Oryza longistaminata, by deep sequencing

    OpenAIRE

    Yang, Haiyuan; Hu, Liwei; Hurek, Thomas; Reinhold-Hurek, Barbara

    2010-01-01

    Abstract Background Oryza longistaminata, an AA genome type (2 n = 24), originates from Africa and is closely related to Asian cultivated rice (O. sativa L.). It contains various valuable traits with respect to tolerance to biotic and abiotic stress, QTLs with agronomically important traits and high ability to use nitrogen efficiently (NUE). However, only limited genomic or transcriptomic data of O. longistaminata are currently available. Results In this study we present the first comprehensi...

  16. Web services for transcriptomics

    NARCIS (Netherlands)

    Neerincx, P.

    2009-01-01

    Transcriptomics is part of a family of disciplines focussing on high throughput molecular biology experiments. In the case of transcriptomics, scientists study the expression of genes resulting in transcripts. These transcripts can either perform a biological function themselves or function as

  17. Procedural Memory Consolidation in the Performance of Brief Keyboard Sequences

    Science.gov (United States)

    Duke, Robert A.; Davis, Carla M.

    2006-01-01

    Using two sequential key press sequences, we tested the extent to which subjects' performance on a digital piano keyboard changed between the end of training and retest on subsequent days. We found consistent, significant improvements attributable to sleep-based consolidation effects, indicating that learning continued after the cessation of…

  18. Deep sequencing-based transcriptome profiling reveals comprehensive insights into the responses of Nicotiana benthamiana to beet necrotic yellow vein virus infections containing or lacking RNA4.

    Directory of Open Access Journals (Sweden)

    Huiyan Fan

    Full Text Available BACKGROUND: Beet necrotic yellow vein virus (BNYVV, encodes either four or five plus-sense single stranded RNAs and is the causal agent of sugar beet rhizomania disease, which is widely distributed in most regions of the world. BNYVV can also infect Nicotiana benthamiana systemically, and causes severe curling and stunting symptoms in the presence of RNA4 or mild symptoms in the absence of RNA4. RESULTS: Confocal laser scanning microscopy (CLSM analyses showed that the RNA4-encoded p31 protein fused to the red fluorescent protein (RFP accumulated mainly in the nuclei of N. benthamiana epidermal cells. This suggested that severe RNA4-induced symptoms might result from p31-dependent modifications of the transcriptome. Therefore, we used next-generation sequencing technologies to analyze the transcriptome profile of N. benthamiana in response to infection with different isolates of BNYVV. Comparisons of the transcriptomes of mock, BN3 (RNAs 1+2+3, and BN34 (RNAs 1+2+3+4 infected plants identified 3,016 differentially expressed transcripts, which provided a list of candidate genes that potentially are elicited in response to virus infection. Our data indicate that modifications in the expression of genes involved in RNA silencing, ubiquitin-proteasome pathway, cellulose synthesis, and metabolism of the plant hormone gibberellin may contribute to the severe symptoms induced by RNA4 from BNYVV. CONCLUSIONS: These results expand our understanding of the genetic architecture of N. benthamiana as well as provide valuable clues to identify genes potentially involved in resistance to BNYVV infection. Our global survey of gene expression changes in infected plants reveals new insights into the complicated molecular mechanisms underlying symptom development, and aids research into new strategies to protect crops against viruses.

  19. Transcriptome analysis of the Capra hircus ovary.

    Directory of Open Access Journals (Sweden)

    Zhong Quan Zhao

    Full Text Available Capra hircus is an important economic livestock animal, and therefore, it is necessary to discover transcriptome information about their reproductive performance. In this study, we performed de novo transcriptome sequencing to produce the first transcriptome dataset for the goat ovary using high-throughput sequencing technologies. The result will contribute to research on goat reproductive performance.RNA-seq analysis generated more than 38.8 million clean paired end (PE reads, which were assembled into 80,069 unigenes (mean size = 619 bp. Based on sequence similarity searches, 64,824 (60.6% genes were identified, among which 29,444 and 11,271 unigenes were assigned to Gene Ontology (GO categories and Clusters of Orthologous Groups (COG, respectively. Searches in the Kyoto Encyclopedia of Genes and Genomes pathway database (KEGG showed that 27,766 (63.4% unigenes were mapped to 258 KEGG pathways. Furthermore, we investigated the transcriptome differences of goat ovaries at two different ages using a tag-based digital gene expression system. We obtained a sequencing depth of over 5.6 million and 5.8 million tags for the two ages and identified a large number of genes associated with reproductive hormones, ovulatory cycle and follicle. Moreover, many antisense transcripts and novel transcripts were found; clusters with similar differential expression patterns, enriched GO terms and metabolic pathways were revealed for the first time with regard to the differentially expressed genes.The transcriptome provides invaluable new data for a functional genomic resource and future biological research in Capra hircus, and it is essential for the in-depth study of candidate genes in breeding programs.

  20. De Novo Transcriptome Sequence Assembly from Coconut Leaves and Seeds with a Focus on Factors Involved in RNA-Directed DNA Methylation

    Science.gov (United States)

    Huang, Ya-Yi; Lee, Chueh-Pai; Fu, Jason L.; Chang, Bill Chia-Han; Matzke, Antonius J. M.; Matzke, Marjori

    2014-01-01

    Coconut palm (Cocos nucifera) is a symbol of the tropics and a source of numerous edible and nonedible products of economic value. Despite its nutritional and industrial significance, coconut remains under-represented in public repositories for genomic and transcriptomic data. We report de novo transcript assembly from RNA-seq data and analysis of gene expression in seed tissues (embryo and endosperm) and leaves of a dwarf coconut variety. Assembly of 10 GB sequencing data for each tissue resulted in 58,211 total unigenes in embryo, 61,152 in endosperm, and 33,446 in leaf. Within each unigene pool, 24,857 could be annotated in embryo, 29,731 could be annotated in endosperm, and 26,064 could be annotated in leaf. A KEGG analysis identified 138, 138, and 139 pathways, respectively, in transcriptomes of embryo, endosperm, and leaf tissues. Given the extraordinarily large size of coconut seeds and the importance of small RNA-mediated epigenetic regulation during seed development in model plants, we used homology searches to identify putative homologs of factors required for RNA-directed DNA methylation in coconut. The findings suggest that RNA-directed DNA methylation is important during coconut seed development, particularly in maturing endosperm. This dataset will expand the genomics resources available for coconut and provide a foundation for more detailed analyses that may assist molecular breeding strategies aimed at improving this major tropical crop. PMID:25193496

  1. De novo transcriptome sequence assembly from coconut leaves and seeds with a focus on factors involved in RNA-directed DNA methylation.

    Science.gov (United States)

    Huang, Ya-Yi; Lee, Chueh-Pai; Fu, Jason L; Chang, Bill Chia-Han; Matzke, Antonius J M; Matzke, Marjori

    2014-09-04

    Coconut palm (Cocos nucifera) is a symbol of the tropics and a source of numerous edible and nonedible products of economic value. Despite its nutritional and industrial significance, coconut remains under-represented in public repositories for genomic and transcriptomic data. We report de novo transcript assembly from RNA-seq data and analysis of gene expression in seed tissues (embryo and endosperm) and leaves of a dwarf coconut variety. Assembly of 10 GB sequencing data for each tissue resulted in 58,211 total unigenes in embryo, 61,152 in endosperm, and 33,446 in leaf. Within each unigene pool, 24,857 could be annotated in embryo, 29,731 could be annotated in endosperm, and 26,064 could be annotated in leaf. A KEGG analysis identified 138, 138, and 139 pathways, respectively, in transcriptomes of embryo, endosperm, and leaf tissues. Given the extraordinarily large size of coconut seeds and the importance of small RNA-mediated epigenetic regulation during seed development in model plants, we used homology searches to identify putative homologs of factors required for RNA-directed DNA methylation in coconut. The findings suggest that RNA-directed DNA methylation is important during coconut seed development, particularly in maturing endosperm. This dataset will expand the genomics resources available for coconut and provide a foundation for more detailed analyses that may assist molecular breeding strategies aimed at improving this major tropical crop. Copyright © 2014 Huang et al.

  2. SNP Discovery by Illumina-Based Transcriptome Sequencing of the Olive and the Genetic Characterization of Turkish Olive Genotypes Revealed by AFLP, SSR and SNP Markers

    Science.gov (United States)

    Kaya, Hilal Betul; Cetin, Oznur; Kaya, Hulya; Sahin, Mustafa; Sefer, Filiz; Kahraman, Abdullah; Tanyolac, Bahattin

    2013-01-01

    Background The olive tree (Olea europaea L.) is a diploid (2n = 2x = 46) outcrossing species mainly grown in the Mediterranean area, where it is the most important oil-producing crop. Because of its economic, cultural and ecological importance, various DNA markers have been used in the olive to characterize and elucidate homonyms, synonyms and unknown accessions. However, a comprehensive characterization and a full sequence of its transcriptome are unavailable, leading to the importance of an efficient large-scale single nucleotide polymorphism (SNP) discovery in olive. The objectives of this study were (1) to discover olive SNPs using next-generation sequencing and to identify SNP primers for cultivar identification and (2) to characterize 96 olive genotypes originating from different regions of Turkey. Methodology/Principal Findings Next-generation sequencing technology was used with five distinct olive genotypes and generated cDNA, producing 126,542,413 reads using an Illumina Genome Analyzer IIx. Following quality and size trimming, the high-quality reads were assembled into 22,052 contigs with an average length of 1,321 bases and 45 singletons. The SNPs were filtered and 2,987 high-quality putative SNP primers were identified. The assembled sequences and singletons were subjected to BLAST similarity searches and annotated with a Gene Ontology identifier. To identify the 96 olive genotypes, these SNP primers were applied to the genotypes in combination with amplified fragment length polymorphism (AFLP) and simple sequence repeats (SSR) markers. Conclusions/Significance This study marks the highest number of SNP markers discovered to date from olive genotypes using transcriptome sequencing. The developed SNP markers will provide a useful source for molecular genetic studies, such as genetic diversity and characterization, high density quantitative trait locus (QTL) analysis, association mapping and map-based gene cloning in the olive. High levels of

  3. Changes in the transcriptome of the human endometrial Ishikawa cancer cell line induced by estrogen, progesterone, tamoxifen, and mifepristone (RU486 as detected by RNA-sequencing.

    Directory of Open Access Journals (Sweden)

    Karin Tamm-Rosenstein

    Full Text Available BACKGROUND: Estrogen (E2 and progesterone (P4 are key players in the maturation of the human endometrium. The corresponding steroid hormone modulators, tamoxifen (TAM and mifepristone (RU486 are widely used in breast cancer therapy and for contraception purposes, respectively. METHODOLOGY/PRINCIPAL FINDINGS: Gene expression profiling of the human endometrial Ishikawa cancer cell line treated with E2 and P4 for 3 h and 12 h, and TAM and RU486 for 12 h, was performed using RNA-sequencing. High levels of mRNA were detected for genes, including PSAP, ATP5G2, ATP5H, and GNB2L1 following E2 or P4 treatment. A total of 82 biomarkers for endometrial biology were identified among E2 induced genes, and 93 among P4 responsive genes. Identified biomarkers included: EZH2, MDK, MUC1, SLIT2, and IL6ST, which are genes previously associated with endometrial receptivity. Moreover, 98.8% and 98.6% of E2 and P4 responsive genes in Ishikawa cells, respectively, were also detected in two human mid-secretory endometrial biopsy samples. TAM treatment exhibited both antagonistic and agonistic effects of E2, and also regulated a subset of genes independently. The cell cycle regulator cyclin D1 (CCND1 showed significant up-regulation following treatment with TAM. RU486 did not appear to act as a pure antagonist of P4 and a functional analysis of RU486 response identified genes related to adhesion and apoptosis, including down-regulated genes associated with cell-cell contacts and adhesion as CTNND1, JUP, CDH2, IQGAP1, and COL2A1. CONCLUSIONS: Significant changes in gene expression by the Ishikawa cell line were detected after treatments with E2, P4, TAM, and RU486. These transcriptome data provide valuable insight into potential biomarkers related to endometrial receptivity, and also facilitate an understanding of the molecular changes that take place in the endometrium in the early stages of breast cancer treatment and contraception usage.

  4. Transcriptome sequencing reveals differences between primary and secondary hair follicle-derived dermal papilla cells of the Cashmere goat (Capra hircus.

    Directory of Open Access Journals (Sweden)

    Bing Zhu

    Full Text Available The dermal papilla is thought to establish the character and control the size of hair follicles. Inner Mongolia Cashmere goats (Capra hircus have a double coat comprising the primary and secondary hair follicles, which have dramatically different sizes and textures. The Cashmere goat is rapidly becoming a potent model for hair follicle morphogenesis research. In this study, we established two dermal papilla cell lines during the anagen phase of the hair growth cycle from the primary and secondary hair follicles and clarified the similarities and differences in their morphology and growth characteristics. High-throughput transcriptome sequencing was used to identify gene expression differences between the two dermal papilla cell lines. Many of the differentially expressed genes are involved in vascularization, ECM-receptor interaction and Wnt/β-catenin/Lef1 signaling pathways, which intimately associated with hair follicle morphogenesis. These findings provide valuable information for research on postnatal morphogenesis of hair follicles.

  5. Transcriptome sequencing reveals differences between primary and secondary hair follicle-derived dermal papilla cells of the Cashmere goat (Capra hircus).

    Science.gov (United States)

    Zhu, Bing; Xu, Teng; Yuan, Jianlong; Guo, Xudong; Liu, Dongjun

    2013-01-01

    The dermal papilla is thought to establish the character and control the size of hair follicles. Inner Mongolia Cashmere goats (Capra hircus) have a double coat comprising the primary and secondary hair follicles, which have dramatically different sizes and textures. The Cashmere goat is rapidly becoming a potent model for hair follicle morphogenesis research. In this study, we established two dermal papilla cell lines during the anagen phase of the hair growth cycle from the primary and secondary hair follicles and clarified the similarities and differences in their morphology and growth characteristics. High-throughput transcriptome sequencing was used to identify gene expression differences between the two dermal papilla cell lines. Many of the differentially expressed genes are involved in vascularization, ECM-receptor interaction and Wnt/β-catenin/Lef1 signaling pathways, which intimately associated with hair follicle morphogenesis. These findings provide valuable information for research on postnatal morphogenesis of hair follicles.

  6. Transcriptome-Wide Analysis of Botrytis elliptica Responsive microRNAs and Their Targets in Lilium Regale Wilson by High-Throughput Sequencing and Degradome Analysis

    Directory of Open Access Journals (Sweden)

    Xue Gao

    2017-05-01

    Full Text Available MicroRNAs, as master regulators of gene expression, have been widely identified and play crucial roles in plant-pathogen interactions. A fatal pathogen, Botrytis elliptica, causes the serious folia disease of lily, which reduces production because of the high susceptibility of most cultivated species. However, the miRNAs related to Botrytis infection of lily, and the miRNA-mediated gene regulatory networks providing resistance to B. elliptica in lily remain largely unexplored. To systematically dissect B. elliptica-responsive miRNAs and their target genes, three small RNA libraries were constructed from the leaves of Lilium regale, a promising Chinese wild Lilium species, which had been subjected to mock B. elliptica treatment or B. elliptica infection for 6 and 24 h. By high-throughput sequencing, 71 known miRNAs belonging to 47 conserved families and 24 novel miRNA were identified, of which 18 miRNAs were downreguleted and 13 were upregulated in response to B. elliptica. Moreover, based on the lily mRNA transcriptome, 22 targets for 9 known and 1 novel miRNAs were identified by the degradome sequencing approach. Most target genes for elliptica-responsive miRNAs were involved in metabolic processes, few encoding different transcription factors, including ELONGATION FACTOR 1 ALPHA (EF1a and TEOSINTE BRANCHED1/CYCLOIDEA/PROLIFERATING CELL FACTOR 2 (TCP2. Furthermore, the expression patterns of a set of elliptica-responsive miRNAs and their targets were validated by quantitative real-time PCR. This study represents the first transcriptome-based analysis of miRNAs responsive to B. elliptica and their targets in lily. The results reveal the possible regulatory roles of miRNAs and their targets in B. elliptica interaction, which will extend our understanding of the mechanisms of this disease in lily.

  7. SNP design from 454 sequencing of Podosphaera plantaginis transcriptome reveals a genetically diverse pathogen metapopulation with high levels of mixed-genotype infection.

    Directory of Open Access Journals (Sweden)

    Charlotte Tollenaere

    Full Text Available Molecular tools may greatly improve our understanding of pathogen evolution and epidemiology but technical constraints have hindered the development of genetic resources for parasites compared to free-living organisms. This study aims at developing molecular tools for Podosphaera plantaginis, an obligate fungal pathogen of Plantago lanceolata. This interaction has been intensively studied in the Åland archipelago of Finland with epidemiological data collected from over 4,000 host populations annually since year 2001.A cDNA library of a pooled sample of fungal conidia was sequenced on the 454 GS-FLX platform. Over 549,411 reads were obtained and annotated into 45,245 contigs. Annotation data was acquired for 65.2% of the assembled sequences. The transcriptome assembly was screened for SNP loci, as well as for functionally important genes (mating-type genes and potential effector proteins. A genotyping assay of 27 SNP loci was designed and tested on 380 infected leaf samples from 80 populations within the Åland archipelago. With this panel we identified 85 multilocus genotypes (MLG with uneven frequencies across the pathogen metapopulation. Approximately half of the sampled populations contain polymorphism. Our genotyping protocol revealed mixed-genotype infection within a single host leaf to be common. Mixed infection has been proposed as one of the main drivers of pathogen evolution, and hence may be an important process in this pathosystem.The developed SNP panel offers exciting research perspectives for future studies in this well-characterized pathosystem. Also, the transcriptome provides an invaluable novel genomic resource for powdery mildews, which cause significant yield losses on commercially important crops annually. Furthermore, the features that render genetic studies in this system a challenge are shared with the majority of obligate parasitic species, and hence our results provide methodological insights from SNP calling to field

  8. De Novo transcriptome analysis of Oncomelania hupensis after molluscicide treatment by next-generation sequencing: implications for biology and future snail interventions.

    Directory of Open Access Journals (Sweden)

    Qin Ping Zhao

    Full Text Available The freshwater snail Oncomelania hupensis is the only intermediate host of Schistosoma japonicum, which causes schistosomiasis. This disease is endemic in the Far East, especially in mainland China. Because niclosamide is the only molluscicide recommended by the World Health Organization, 50% wettable powder of niclosamide ethanolamine salt (WPN, the only chemical molluscicide available in China, has been widely used as the main snail control method for over two decades. Recently, a novel molluscicide derived from niclosamide, the salt of quinoid-2',5-dichloro-4'-nitro-salicylanilide (Liu Dai Shui Yang An, LDS, has been developed and proven to have the same molluscicidal effect as WPN, with lower cost and significantly lower toxicity to fish than WPN. The mechanism by which these molluscicides cause snail death is not known. Here, we report the next-generation transcriptome sequencing of O. hupensis; 145,008,667 clean reads were generated and assembled into 254,286 unigenes. Using GO and KEGG databases, 14,860 unigenes were assigned GO annotations and 4,686 unigenes were mapped to 250 KEGG pathways. Many sequences involved in key processes associated with biological regulation and innate immunity have been identified. After the snails were exposed to LDS and WPN, 254 unigenes showed significant differential expression. These genes were shown to be involved in cell structure defects and the inhibition of neurohumoral transmission and energy metabolism, which may cause snail death. Gene expression patterns differed after exposure to LDS and WPN, and these differences must be elucidated by the identification and annotation of these unknown unigenes. We believe that this first large-scale transcriptome dataset for O. hupensis will provide an opportunity for the in-depth analysis of this biomedically important freshwater snail at the molecular level and accelerate studies of the O. hupensis genome. The data elucidating the molluscicidal mechanism

  9. De Novo transcriptome analysis of Oncomelania hupensis after molluscicide treatment by next-generation sequencing: implications for biology and future snail interventions.

    Science.gov (United States)

    Zhao, Qin Ping; Xiong, Tao; Xu, Xing Jian; Jiang, Ming Sen; Dong, Hui Fen

    2015-01-01

    The freshwater snail Oncomelania hupensis is the only intermediate host of Schistosoma japonicum, which causes schistosomiasis. This disease is endemic in the Far East, especially in mainland China. Because niclosamide is the only molluscicide recommended by the World Health Organization, 50% wettable powder of niclosamide ethanolamine salt (WPN), the only chemical molluscicide available in China, has been widely used as the main snail control method for over two decades. Recently, a novel molluscicide derived from niclosamide, the salt of quinoid-2',5-dichloro-4'-nitro-salicylanilide (Liu Dai Shui Yang An, LDS), has been developed and proven to have the same molluscicidal effect as WPN, with lower cost and significantly lower toxicity to fish than WPN. The mechanism by which these molluscicides cause snail death is not known. Here, we report the next-generation transcriptome sequencing of O. hupensis; 145,008,667 clean reads were generated and assembled into 254,286 unigenes. Using GO and KEGG databases, 14,860 unigenes were assigned GO annotations and 4,686 unigenes were mapped to 250 KEGG pathways. Many sequences involved in key processes associated with biological regulation and innate immunity have been identified. After the snails were exposed to LDS and WPN, 254 unigenes showed significant differential expression. These genes were shown to be involved in cell structure defects and the inhibition of neurohumoral transmission and energy metabolism, which may cause snail death. Gene expression patterns differed after exposure to LDS and WPN, and these differences must be elucidated by the identification and annotation of these unknown unigenes. We believe that this first large-scale transcriptome dataset for O. hupensis will provide an opportunity for the in-depth analysis of this biomedically important freshwater snail at the molecular level and accelerate studies of the O. hupensis genome. The data elucidating the molluscicidal mechanism will be of great

  10. Whole transcriptome analysis of Acinetobacter baumannii assessed by RNA-sequencing reveals different mRNA expression profiles in biofilm compared to planktonic cells.

    Directory of Open Access Journals (Sweden)

    Soraya Rumbo-Feal

    Full Text Available Acinetobacterbaumannii has emerged as a dangerous opportunistic pathogen, with many strains able to form biofilms and thus cause persistent infections. The aim of the present study was to use high-throughput sequencing techniques to establish complete transcriptome profiles of planktonic (free-living and sessile (biofilm forms of A. baumannii ATCC 17978 and thereby identify differences in their gene expression patterns. Collections of mRNA from planktonic (both exponential and stationary phase cultures and sessile (biofilm cells were sequenced. Six mRNA libraries were prepared following the mRNA-Seq protocols from Illumina. Reads were obtained in a HiScanSQ platform and mapped against the complete genome to describe the complete mRNA transcriptomes of planktonic and sessile cells. The results showed that the gene expression pattern of A. baumannii biofilm cells was distinct from that of planktonic cells, including 1621 genes over-expressed in biofilms relative to stationary phase cells and 55 genes expressed only in biofilms. These differences suggested important changes in amino acid and fatty acid metabolism, motility, active transport, DNA-methylation, iron acquisition, transcriptional regulation, and quorum sensing, among other processes. Disruption or deletion of five of these genes caused a significant decrease in biofilm formation ability in the corresponding mutant strains. Among the genes over-expressed in biofilm cells were those in an operon involved in quorum sensing. One of them, encoding an acyl carrier protein, was shown to be involved in biofilm formation as demonstrated by the significant decrease in biofilm formation by the corresponding knockout strain. The present work serves as a basis for future studies examining the complex network systems that regulate bacterial biofilm formation and maintenance.

  11. Structural Correlates of Skilled Performance on a Motor Sequence Task

    Directory of Open Access Journals (Sweden)

    Christopher J Steele

    2012-10-01

    Full Text Available The brain regions functionally engaged in motor sequence performance are well established, but the structural characteristics of these regions and the fibre pathways involved have been less well studied. In addition, relatively few studies have combined multiple magnetic resonance imaging (MRI and behavioural performance measures in the same sample. Therefore, the current study used diffusion tensor imaging, probabilistic tractography, and voxel-based morphometry to determine the structural correlates of skilled motor performance. Further, we compared these findings with fMRI results in the same sample. We correlated final performance and rate of improvement measures on a temporal motor sequence task with skeletonised fractional anisotropy (FA and whole brain grey matter (GM volume. Final synchronisation performance was negatively correlated with FA in white matter underlying bilateral sensorimotor cortex – an effect that was mediated by a positive correlation with radial diffusivity. Multi-fibre tractography indicated that this region contained crossing fibres from the corticospinal tract and superior longitudinal fasciculus (SLF. The identified SLF pathway linked parietal and auditory cortical regions that have been shown to be functionally engaged in this task. Thus, we hypothesise that enhanced synchronisation performance on this task may be related to greater fibre integrity of the SLF. Rate of improvement on synchronisation was positively correlated with GM volume in cerebellar lobules HVI and V – regions that showed training-related decreases in activity in the same sample. Taken together, our results link individual differences in brain structure and function to motor sequence performance on the same task. Further, our study illustrates the utility of using multiple MR measures and analysis techniques to specify the interpretation of structural findings.

  12. Assembly and annotation of a non-model gastropod (Nerita melanotragus) transcriptome: a comparison of de novo assemblers.

    Science.gov (United States)

    Amin, Shorash; Prentis, Peter J; Gilding, Edward K; Pavasovic, Ana

    2014-08-01

    The sequencing, de novo assembly and annotation of transcriptome datasets generated with next generation sequencing (NGS) has enabled biologists to answer genomic questions in non-model species with unprecedented ease. Reliable and accurate de novo assembly and annotation of transcriptomes, however, is a critically important step for transcriptome assemblies generated from short read sequences. Typical benchmarks for assembly and annotation reliability have been performed with model species. To address the reliability and accuracy of de novo transcriptome assembly in non-model species, we generated an RNAseq dataset for an intertidal gastropod mollusc species, Nerita melanotragus, and compared the assembly produced by four different de novo transcriptome assemblers; Velvet, Oases, Geneious and Trinity, for a number of quality metrics and redundancy. Transcriptome sequencing on the Ion Torrent PGM™ produced 1,883,624 raw reads with a mean length of 133 base pairs (bp). Both the Trinity and Oases de novo assemblers produced the best assemblies based on all quality metrics including fewer contigs, increased N50 and average contig length and contigs of greater length. Overall the BLAST and annotation success of our assemblies was not high with only 15-19% of contigs assigned a putative function. We believe that any improvement in annotation success of gastropod species will require more gastropod genome sequences, but in particular an increase in mollusc protein sequences in public databases. Overall, this paper demonstrates that reliable and accurate de novo transcriptome assemblies can be generated from short read sequencers with the right assembly algorithms.

  13. Bioinformatic prediction of G protein-coupled receptor encoding sequences from the transcriptome of the foreleg, including the Haller's organ, of the cattle tick, Rhipicephalus australis.

    Directory of Open Access Journals (Sweden)

    Sergio Munoz

    Full Text Available The cattle tick of Australia, Rhipicephalus australis, is a vector for microbial parasites that cause serious bovine diseases. The Haller's organ, located in the tick's forelegs, is crucial for host detection and mating. To facilitate the development of new technologies for better control of this agricultural pest, we aimed to sequence and annotate the transcriptome of the R. australis forelegs and associated tissues, including the Haller's organ. As G protein-coupled receptors (GPCRs are an important family of eukaryotic proteins studied as pharmaceutical targets in humans, we prioritized the identification and classification of the GPCRs expressed in the foreleg tissues. The two forelegs from adult R. australis were excised, RNA extracted, and pyrosequenced with 454 technology. Reads were assembled into unigenes and annotated by sequence similarity. Python scripts were written to find open reading frames (ORFs from each unigene. These ORFs were analyzed by different GPCR prediction approaches based on sequence alignments, support vector machines, hidden Markov models, and principal component analysis. GPCRs consistently predicted by multiple methods were further studied by phylogenetic analysis and 3D homology modeling. From 4,782 assembled unigenes, 40,907 possible ORFs were predicted. Using Blastp, Pfam, GPCRpred, TMHMM, and PCA-GPCR, a basic set of 46 GPCR candidates were compiled and a phylogenetic tree was constructed. With further screening of tertiary structures predicted by RaptorX, 6 likely GPCRs emerged and the strongest candidate was classified by PCA-GPCR to be a GABAB receptor.

  14. [Comparison and analysis of SNV results: detected by transcriptome sequencing technology on initial diagnosis and remission stage of a patient with AML-M2].

    Science.gov (United States)

    Gao, Pan-Ke; Cao, Xiang-Shan; Liu, Yan

    2014-06-01

    This study was aimed to further clarify the pathogenesis of acute myeloid leukemia(AML), and to forecast new somatic mutations related with leukemia. The peripheral blood samples on initial diagnosis and remission stage of a patient with partial differentiated AML(AML-M2) were sequenced by high throughput transcriptome sequencing technology. The single nucleotide variation (SNV) which possibly related with pathogenesis of leukemia was screened through comparison of the expressed genes on initial diagnosis and after remission. The results showed that the Reads distributed uniformly in genome and covered completely, detecting most expression genes. by screening the SNV, a total of 29881 mutations were discovered, including 28113 germline mutations and 752 individual mutations. Among them, 11 acquired mutations (P < 0.05) in coding regions were got, including ZRSR1, MLXIP, TLN1, LAP3, HK3. It is concluded that the high throughput sequencing as an unbiased new method can find new tumor-related mutations. MLXIP may be a new molecular marker of AML-M2.

  15. Characterization of Fusobacterium varium Fv113-g1 isolated from a patient with ulcerative colitis based on complete genome sequence and transcriptome analysis.

    Directory of Open Access Journals (Sweden)

    Tsuyoshi Sekizuka

    Full Text Available Fusobacterium spp. present in the oral and gut flora is carcinogenic and is associated with the risk of pancreatic and colorectal cancers. Fusobacterium spp. is also implicated in a broad spectrum of human pathologies, including Crohn's disease and ulcerative colitis (UC. Here we report the complete genome sequence of Fusobacterium varium Fv113-g1 (genome size, 3.96 Mb isolated from a patient with UC. Comparative genome analyses totally suggested that Fv113-g1 is basically assigned as F. varium, in particular, it could be reclassified as notable F. varium subsp. similar to F. ulcerans because of partial shared orthologs. Compared with the genome sequences of F. varium ATCC 27725 (genome size, 3.30 Mb and other strains of Fusobacterium spp., Fv113-g1 possesses many accessary pan-genome sequences with noteworthy multiple virulence factors, including 44 autotransporters (type V secretion system, T5SS and 13 Fusobacterium adhesion (FadA paralogs involved in potential mucosal inflammation. Indeed, transcriptome analysis demonstrated that Fv113-g1-specific accessary genes, such as multiple T5SS and fadA paralogs, showed notably increased expression with D-MEM cultivation than with brain heart infusion broth. This implied that growth condition may enhance the expression of such potential virulence factors, leading to remarkable survival against other gut microorganisms and to the pathogenicity to human intestinal epithelium.

  16. High-throughput sequencing and analysis of the gill tissue transcriptome from the deep-sea hydrothermal vent mussel Bathymodiolus azoricus

    Directory of Open Access Journals (Sweden)

    Gomes Paula

    2010-10-01

    Full Text Available Abstract Background Bathymodiolus azoricus is a deep-sea hydrothermal vent mussel found in association with large faunal communities living in chemosynthetic environments at the bottom of the sea floor near the Azores Islands. Investigation of the exceptional physiological reactions that vent mussels have adopted in their habitat, including responses to environmental microbes, remains a difficult challenge for deep-sea biologists. In an attempt to reveal genes potentially involved in the deep-sea mussel innate immunity we carried out a high-throughput sequence analysis of freshly collected B. azoricus transcriptome using gills tissues as the primary source of immune transcripts given its strategic role in filtering the surrounding waterborne potentially infectious microorganisms. Additionally, a substantial EST data set was produced and from which a comprehensive collection of genes coding for putative proteins was organized in a dedicated database, "DeepSeaVent" the first deep-sea vent animal transcriptome database based on the 454 pyrosequencing technology. Results A normalized cDNA library from gills tissue was sequenced in a full 454 GS-FLX run, producing 778,996 sequencing reads. Assembly of the high quality reads resulted in 75,407 contigs of which 3,071 were singletons. A total of 39,425 transcripts were conceptually translated into amino-sequences of which 22,023 matched known proteins in the NCBI non-redundant protein database, 15,839 revealed conserved protein domains through InterPro functional classification and 9,584 were assigned with Gene Ontology terms. Queries conducted within the database enabled the identification of genes putatively involved in immune and inflammatory reactions which had not been previously evidenced in the vent mussel. Their physical counterpart was confirmed by semi-quantitative quantitative Reverse-Transcription-Polymerase Chain Reactions (RT-PCR and their RNA transcription level by quantitative PCR (q

  17. Massive sequencing of Ulmus minor's transcriptome provides new molecular tools for a genus under the constant threat of Dutch elm disease.

    Science.gov (United States)

    Perdiguero, Pedro; Venturas, Martin; Cervera, María Teresa; Gil, Luis; Collada, Carmen

    2015-01-01

    Elms, especially Ulmus minor and U. americana, are carrying out a hard battle against Dutch elm disease (DED). This vascular wilt disease, caused by Ophiostoma ulmi and O. novo-ulmi, appeared in the twentieth century and killed millions of elms across North America and Europe. Elm breeding and conservation programmes have identified a reduced number of DED tolerant genotypes. In this study, three U. minor genotypes with contrasted levels of tolerance to DED were exposed to several biotic and abiotic stresses in order to (i) obtain a de novo assembled transcriptome of U. minor using 454 pyrosequencing, (ii) perform a functional annotation of the assembled transcriptome, (iii) identify genes potentially involved in the molecular response to environmental stress, and (iv) develop gene-based markers to support breeding programmes. A total of 58,429 putative unigenes were identified after assembly and filtering of the transcriptome. 32,152 of these unigenes showed homology with proteins identified in the genome from the most common plant model species. Well-known family proteins and transcription factors involved in abiotic, biotic or both stresses were identified after functional annotation. A total of 30,693 polymorphisms were identified in 7,125 isotigs, a large number of them corresponding to single nucleotide polymorphisms (SNPs; 27,359). In a subset randomly selected for validation, 87% of the SNPs were confirmed. The material generated may be valuable for future Ulmus gene expression, population genomics and association genetics studies, especially taking into account the scarce molecular information available for this genus and the great impact that DED has on elm populations.

  18. Massive sequencing of Ulmus minor’s transcriptome provides new molecular tools for a genus under the constant threat of Dutch elm disease

    Science.gov (United States)

    Perdiguero, Pedro; Venturas, Martin; Cervera, María Teresa; Gil, Luis; Collada, Carmen

    2015-01-01

    Elms, especially Ulmus minor and U. americana, are carrying out a hard battle against Dutch elm disease (DED). This vascular wilt disease, caused by Ophiostoma ulmi and O. novo-ulmi, appeared in the twentieth century and killed millions of elms across North America and Europe. Elm breeding and conservation programmes have identified a reduced number of DED tolerant genotypes. In this study, three U. minor genotypes with contrasted levels of tolerance to DED were exposed to several biotic and abiotic stresses in order to (i) obtain a de novo assembled transcriptome of U. minor using 454 pyrosequencing, (ii) perform a functional annotation of the assembled transcriptome, (iii) identify genes potentially involved in the molecular response to environmental stress, and (iv) develop gene-based markers to support breeding programmes. A total of 58,429 putative unigenes were identified after assembly and filtering of the transcriptome. 32,152 of these unigenes showed homology with proteins identified in the genome from the most common plant model species. Well-known family proteins and transcription factors involved in abiotic, biotic or both stresses were identified after functional annotation. A total of 30,693 polymorphisms were identified in 7,125 isotigs, a large number of them corresponding to single nucleotide polymorphisms (SNPs; 27,359). In a subset randomly selected for validation, 87% of the SNPs were confirmed. The material generated may be valuable for future Ulmus gene expression, population genomics and association genetics studies, especially taking into account the scarce molecular information available for this genus and the great impact that DED has on elm populations. PMID:26257751

  19. Massive sequencing of Ulmus minor's transcriptome provides new molecular tools for a genus under the constant threat of Dutch elm disease

    Directory of Open Access Journals (Sweden)

    Pedro ePerdiguero

    2015-07-01

    Full Text Available Elms, especially Ulmus minor and Ulmus americana, are carrying out a hard battle against Dutch elm disease (DED. This vascular wilt disease, caused by Ophiostoma ulmi and O. novo-ulmi, appeared in the twentieth century and killed millions of elms across North America and Europe. Elm breeding and conservation programmes have identified a reduced number of DED tolerant genotypes. In this study, three U. minor genotypes with contrasted levels of tolerance to DED were exposed to several biotic and abiotic stresses in order to (i obtain a de novo assembled transcriptome of U. minor using 454 pyrosequencing, (ii perform a functional annotation of the assembled transcriptome, (iii identify genes potentially involved in the molecular response to environmental stress, and (iv develop gene-based markers to support breeding programmes. A total of 58,429 putative unigenes were identified after assembly and filtering of the transcriptome. 32,152 of these unigenes showed homology with proteins identified in the genome from the most common plant model species. Well-known family proteins and transcription factors involved in abiotic, biotic or both stresses were identified after functional annotation. A total of 30,693 polymorphisms were identified in 7,125 isotigs, a large number of them corresponding to SNPs (27,359. In a subset randomly selected for validation, 87 % of the SNPs were confirmed. The material generated may be valuable for future Ulmus gene expression, population genomics and association genetics studies, especially taking into account the scarce molecular information available for this genus and the great impact that DED has on elm populations.

  20. Concordance between RNA-sequencing data and DNA microarray data in transcriptome analysis of proliferative and quiescent fibroblasts

    Science.gov (United States)

    Trost, Brett; Moir, Catherine A.; Gillespie, Zoe E.; Kusalik, Anthony; Mitchell, Jennifer A.; Eskiw, Christopher H.

    2015-01-01

    DNA microarrays and RNA sequencing (RNA-seq) are major technologies for performing high-throughput analysis of transcript abundance. Recently, concerns have been raised regarding the concordance of data derived from the two techniques. Using cDNA libraries derived from normal human foreskin fibroblasts, we measured changes in transcript abundance as cells transitioned from proliferative growth to quiescence using both DNA microarrays and RNA-seq. The internal reproducibility of the RNA-seq data was greater than that of the microarray data. Correlations between the RNA-seq data and the individual microarrays were low, but correlations between the RNA-seq values and the geometric mean of the microarray values were moderate. The two technologies had good agreement when considering probes with the largest (both positive and negative) fold change (FC) values. An independent technique, quantitative reverse-transcription PCR (qRT-PCR), was used to measure the FC of 76 genes between proliferative and quiescent samples, and a higher correlation was observed between the qRT-PCR data and the RNA-seq data than between the qRT-PCR data and the microarray data. PMID:26473061

  1. The generation and utilization of a cancer-oriented representation of the human transcriptome by using expressed sequence tags

    DEFF Research Database (Denmark)

    Brentani, Helena; Caballero, Otávia L; Camargo, Anamaria A

    2003-01-01

    Whereas genome sequencing defines the genetic potential of an organism, transcript sequencing defines the utilization of this potential and links the genome with most areas of biology. To exploit the information within the human genome in the fight against cancer, we have deposited some two milli...

  2. Population genetics, phylogenomics and hybrid speciation of Juglans in China determined from whole chloroplast genomes, transcriptomes, and genotyping-by-sequencing (GBS).

    Science.gov (United States)

    Zhao, Peng; Zhou, Hui-Juan; Potter, Daniel; Hu, Yi-Heng; Feng, Xiao-Jia; Dang, Meng; Feng, Li; Zulfiqar, Saman; Liu, Wen-Zhe; Zhao, Gui-Fang; Woeste, Keith

    2018-04-18

    Genomic data are a powerful tool for elucidating the processes involved in the evolution and divergence of species. The speciation and phylogenetic relationships among Chinese Juglans remain unclear. Here, we used results from phylogenomic and population genetic analyses, transcriptomics, Genotyping-By-Sequencing (GBS), and whole chloroplast genomes (Cp genome) data to infer processes of lineage formation among the five native Chinese species of the walnut genus (Juglans, Juglandaceae), a widespread, economically important group. We found that the processes of isolation generated diversity during glaciations, but that the recent range expansion of J. regia, probably from multiple refugia, led to hybrid formation both within and between sections of the genus. In southern China, human dispersal of J. regia brought it into contact with J. sigillata, which we determined to be an ecotype of J. regia that is now maintained as a landrace. In northern China, walnut hybridized with a distinct lineage of J. mandshurica to form J. hopeiensis, a controversial taxon (considered threatened) that our data indicate is a horticultural variety. Comparisons among whole chloroplast genomes and nuclear transcriptome analyses provided conflicting evidence for the timing of the divergence of Chinese Juglans taxa. J. cathayensis and J. mandshurica are poorly differentiated based our genomic data. Reconstruction of Juglans evolutionary history indicate that episodes of climatic variation over the past 4.5 to 33.80 million years, associated with glacial advances and retreats and population isolation, have shaped Chinese walnut demography and evolution, even in the presence of gene flow and introgression. Copyright © 2018 Elsevier Inc. All rights reserved.

  3. Evaluation of sequencing batch reactor performance for petrochemical wastewater treatment

    Directory of Open Access Journals (Sweden)

    Mina Salari

    2017-09-01

    Full Text Available Sequencing batch reactor (SBR technology has found many applications in industrial wastewater treatment in recent years. The aim of this study was to determine the optimal time for a cycle of the sequencing batch reactor (SBR and evaluate the performance of a SBR for petrochemical wastewater treatment in that cycle time. The reactor was operated with a suspended biomass configuration under aerobic conditions. Carbon removal and operating parameters such as pH, temperature and dissolved oxygen (DO were monitored during the wastewater treatment. The SBR was run at different cycle times and amongst the cycle times tested, the best performance was obtained with a 7 h cycle time composed of a fill time of 15min, reaction of 6 h, settling of 30 min, and withdrawal of 15 min. The SBR with the determined cycle time was used to study the treatment of wastewater with various organic loading rates (12.88 gr COD/L.d, 18.02 gr COD/L.d and 31.39 gr COD/L.d. The SBR performance was evaluated by chemical oxygen demand (COD, total solids (TS total suspended solids (TSS removal efficiencies. During the shock loading tests, the maximum COD, TS and TSS removal efficiencies were 84%, 67% and 92%, respectively.

  4. De novo sequencing of Hypericum perforatum transcriptome to identify potential genes involved in the biosynthesis of active metabolites.

    Directory of Open Access Journals (Sweden)

    Miao He

    Full Text Available BACKGROUND: Hypericum perforatum L. (St. John's wort is a medicinal plant with pharmacological properties that are antidepressant, anti-inflammatory, antiviral, anti-cancer, and antibacterial. Its major active metabolites are hypericins, hyperforins, and melatonin. However, little genetic information is available for this species, especially that concerning the biosynthetic pathways for active ingredients. METHODOLOGY/PRINCIPAL FINDINGS: Using de novo transcriptome analysis, we obtained 59,184 unigenes covering the entire life cycle of these plants. In all, 40,813 unigenes (68.86% were annotated and 2,359 were assigned to secondary metabolic pathways. Among them, 260 unigenes are involved in the production of hypericin, hyperforin, and melatonin. Another 2,291 unigenes are classified as potential Type III polyketide synthase. Our BlastX search against the AGRIS database reveals 1,772 unigenes that are homologous to 47 known Arabidopsis transcription factor families. Further analysis shows that 10.61% (6,277 of these unigenes contain 7,643 SSRs. CONCLUSION: We have identified a set of putative genes involved in several secondary metabolism pathways, especially those related to the synthesis of its active ingredients. Our results will serve as an important platform for public information about gene expression, genomics, and functional genomics in H. perforatum.