Loren A Honaas
Full Text Available Whereas de novo assemblies of RNA-Seq data are being published for a growing number of species across the tree of life, there are currently no broadly accepted methods for evaluating such assemblies. Here we present a detailed comparison of 99 transcriptome assemblies, generated with 6 de novo assemblers including CLC, Trinity, SOAP, Oases, ABySS and NextGENe. Controlled analyses of de novo assemblies for Arabidopsis thaliana and Oryza sativa transcriptomes provide new insights into the strengths and limitations of transcriptome assembly strategies. We find that the leading assemblers generate reassuringly accurate assemblies for the majority of transcripts. At the same time, we find a propensity for assemblers to fail to fully assemble highly expressed genes. Surprisingly, the instance of true chimeric assemblies is very low for all assemblers. Normalized libraries are reduced in highly abundant transcripts, but they also lack 1000s of low abundance transcripts. We conclude that the quality of de novo transcriptome assemblies is best assessed through consideration of a combination of metrics: 1 proportion of reads mapping to an assembly 2 recovery of conserved, widely expressed genes, 3 N50 length statistics, and 4 the total number of unigenes. We provide benchmark Illumina transcriptome data and introduce SCERNA, a broadly applicable modular protocol for de novo assembly improvement. Finally, our de novo assembly of the Arabidopsis leaf transcriptome revealed ~20 putative Arabidopsis genes lacking in the current annotation.
Full Text Available Abstract Background Rapid advances in next-generation sequencing methods have provided new opportunities for transcriptome sequencing (RNA-Seq. The unprecedented sequencing depth provided by RNA-Seq makes it a powerful and cost-efficient method for transcriptome study, and it has been widely used in model organisms and non-model organisms to identify and quantify RNA. For non-model organisms lacking well-defined genomes, de novo assembly is typically required for downstream RNA-Seq analyses, including SNP discovery and identification of genes differentially expressed by phenotypes. Although RNA-Seq has been successfully used to sequence many non-model organisms, the results of de novo assembly from short reads can still be improved by using recent bioinformatic developments. Results In this study, we used 212.6 million pair-end reads, which accounted for 16.2 Gb, to assemble the hexaploid wheat transcriptome. Two state-of-the-art assemblers, Trinity and Trans-ABySS, which use the single and multiple k-mer methods, respectively, were used, and the whole de novo assembly process was divided into the following four steps: pre-assembly, merging different samples, removal of redundancy and scaffolding. We documented every detail of these steps and how these steps influenced assembly performance to gain insight into transcriptome assembly from short reads. After optimization, the assembled transcripts were comparable to Sanger-derived ESTs in terms of both continuity and accuracy. We also provided considerable new wheat transcript data to the community. Conclusions It is feasible to assemble the hexaploid wheat transcriptome from short reads. Special attention should be paid to dealing with multiple samples to balance the spectrum of expression levels and redundancy. To obtain an accurate overview of RNA profiling, removal of redundancy may be crucial in de novo assembly.
Full Text Available The shrimp Palaemon serratus is a coastal decapod crustacean with a high commercial value. It is harvested for human consumption. In this study, we used Illumina sequencing technology (HiSeq 2000 to sequence, assemble and annotate the transcriptome of P. serratus. RNA was isolated from muscle of adults individuals and, from a pool of larvae. A total number of 4 cDNA libraries were constructed, using the TruSeq RNA Sample Preparation Kit v2. The raw data in this study was deposited in NCBI SRA database with study accession number of SRP090769. The obtained data were subjected to de novo transcriptome assembly using Trinity software, and coding regions were predicted by TransDecoder. We used Blastp and Sma3s to annotate the identified proteins. The transcriptome data could provide some insight into the understanding of genes involved in the larval development and metamorphosis.
Full Text Available Peach (Prunus persica is one of the most popular stone fruits worldwide. Next generation sequencing (NGS has facilitated genome and transcriptome analyses of several stone fruit trees. In this study, we conducted de novo transcriptome analyses of two peach cultivars grown in Korea. Leaves of two cultivars, referred to as Jangtaek and Mibaek, were harvested and used for library preparation. The two prepared libraries were paired-end sequenced by the HiSeq2000 system. We obtained 8.14 GB and 9.62 GB sequence data from Jangtaek and Mibaek (NCBI accession numbers: SRS1056585 and SRS1056587, respectively. The Trinity program was used to assemble two transcriptomes de novo, resulting in 110,477 (Jangtaek and 136,196 (Mibaek transcripts. TransDecoder identified possible coding regions in assembled transcripts. The identified proteins were subjected to BLASTP search against NCBI's non-redundant database for functional annotation. This study provides transcriptome data for two peach cultivars, which might be useful for genetic marker development and comparative transcriptome analyses.
Sze, Sing-Hoi; Parrott, Jonathan J; Tarone, Aaron M
While the continued development of high-throughput sequencing has facilitated studies of entire transcriptomes in non-model organisms, the incorporation of an increasing amount of RNA-Seq libraries has made de novo transcriptome assembly difficult. Although algorithms that can assemble a large amount of RNA-Seq data are available, they are generally very memory-intensive and can only be used to construct small assemblies. We develop a divide-and-conquer strategy that allows these algorithms to be utilized, by subdividing a large RNA-Seq data set into small libraries. Each individual library is assembled independently by an existing algorithm, and a merging algorithm is developed to combine these assemblies by picking a subset of high quality transcripts to form a large transcriptome. When compared to existing algorithms that return a single assembly directly, this strategy achieves comparable or increased accuracy as memory-efficient algorithms that can be used to process a large amount of RNA-Seq data, and comparable or decreased accuracy as memory-intensive algorithms that can only be used to construct small assemblies. Our divide-and-conquer strategy allows memory-intensive de novo transcriptome assembly algorithms to be utilized to construct large assemblies.
Full Text Available Foxtail millet (Setaria italica belonging to the family Poaceae is an important millet that is widely cultivated in East Asia. Of the cultivated millets, the foxtail millet has the longest history and is one of the main food crops in South India and China. Moreover, foxtail millet is a model plant system for biofuel generation utilizing the C4 photosynthetic pathway. In this study, we carried out de novo transcriptome assembly for the foxtail millet variety Taejin collected from Korea using next-generation sequencing. We obtained a total of 8.676 GB raw data by paired-end sequencing. The raw data in this study can be available in NCBI SRA database with accession number of SRR3406552. The Trinity program was used to de novo assemble 145,332 transcripts. Using the TransDecoder program, we predicted 82,925 putative proteins. BLASTP was performed against the Swiss-Prot protein sequence database to annotate the functions of identified proteins, resulting in 20,555 potentially novel proteins. Taken together, this study provides transcriptome data for the foxtail millet variety Taejin by RNA-Seq.
Jacqueline D Farrell
Full Text Available Perennial ryegrass is a highly heterozygous outbreeding grass species used for turf and forage production. Heterozygosity can affect de-Bruijn graph assembly making de novo transcriptome assembly of species such as perennial ryegrass challenging. Creating a reference transcriptome from a homozygous perennial ryegrass genotype can circumvent the challenge of heterozygosity. The goals of this study were to perform RNA-sequencing on multiple tissues from a highly inbred genotype to develop a reference transcriptome. This was complemented with RNA-sequencing of a highly heterozygous genotype for SNP calling.De novo transcriptome assembly of the inbred genotype created 185,833 transcripts with an average length of 830 base pairs. Within the inbred reference transcriptome 78,560 predicted open reading frames were found of which 24,434 were predicted as complete. Functional annotation found 50,890 transcripts with a BLASTp hit from the Swiss-Prot non-redundant database, 58,941 transcripts with a Pfam protein domain and 1,151 transcripts encoding putative secreted peptides. To evaluate the reference transcriptome we targeted the high-affinity K+ transporter gene family and found multiple orthologs. Using the longest unique open reading frames as the reference sequence, 64,242 single nucleotide polymorphisms were found. One thousand sixty one open reading frames from the inbred genotype contained heterozygous sites, confirming the high degree of homozygosity.Our study has developed an annotated, comprehensive transcriptome reference for perennial ryegrass that can aid in determining genetic variation, expression analysis, genome annotation, and gene mapping.
Hara, Yuichiro; Tatsumi, Kaori; Yoshida, Michio; Kajikawa, Eriko; Kiyonari, Hiroshi; Kuraku, Shigehiro
RNA-seq enables gene expression profiling in selected spatiotemporal windows and yields massive sequence information with relatively low cost and time investment, even for non-model species. However, there remains a large room for optimizing its workflow, in order to take full advantage of continuously developing sequencing capacity. Transcriptome sequencing for three embryonic stages of Madagascar ground gecko (Paroedura picta) was performed with the Illumina platform. The output reads were assembled de novo for reconstructing transcript sequences. In order to evaluate the completeness of transcriptome assemblies, we prepared a reference gene set consisting of vertebrate one-to-one orthologs. To take advantage of increased read length of >150 nt, we demonstrated shortened RNA fragmentation time, which resulted in a dramatic shift of insert size distribution. To evaluate products of multiple de novo assembly runs incorporating reads with different RNA sources, read lengths, and insert sizes, we introduce a new reference gene set, core vertebrate genes (CVG), consisting of 233 genes that are shared as one-to-one orthologs by all vertebrate genomes examined (29 species)., The completeness assessment performed by the computational pipelines CEGMA and BUSCO referring to CVG, demonstrated higher accuracy and resolution than with the gene set previously established for this purpose. As a result of the assessment with CVG, we have derived the most comprehensive transcript sequence set of the Madagascar ground gecko by means of assembling individual libraries followed by clustering the assembled sequences based on their overall similarities. Our results provide several insights into optimizing de novo RNA-seq workflow, including the coordination between library insert size and read length, which manifested in improved connectivity of assemblies. The approach and assembly assessment with CVG demonstrated here would be applicable to transcriptome analysis of other species as
Martin, Jeffrey A.; Wang, Zhong
Transcriptomics studies often rely on partial reference transcriptomes that fail to capture the full catalog of transcripts and their variations. Recent advances in sequencing technologies and assembly algorithms have facilitated the reconstruction of the entire transcriptome by deep RNA sequencing (RNA-seq), even without a reference genome. However, transcriptome assembly from billions of RNA-seq reads, which are often very short, poses a significant informatics challenge. This Review summarizes the recent developments in transcriptome assembly approaches - reference-based, de novo and combined strategies-along with some perspectives on transcriptome assembly in the near future.
Cerveau, Nicolas; Jackson, Daniel J
Next-generation sequencing (NGS) technologies are arguably the most revolutionary technical development to join the list of tools available to molecular biologists since PCR. For researchers working with nonconventional model organisms one major problem with the currently dominant NGS platform (Illumina) stems from the obligatory fragmentation of nucleic acid material that occurs prior to sequencing during library preparation. This step creates a significant bioinformatic challenge for accurate de novo assembly of novel transcriptome data. This challenge becomes apparent when a variety of modern assembly tools (of which there is no shortage) are applied to the same raw NGS dataset. With the same assembly parameters these tools can generate markedly different assembly outputs. In this study we present an approach that generates an optimized consensus de novo assembly of eukaryotic coding transcriptomes. This approach does not represent a new assembler, rather it combines the outputs of a variety of established assembly packages, and removes redundancy via a series of clustering steps. We test and validate our approach using Illumina datasets from six phylogenetically diverse eukaryotes (three metazoans, two plants and a yeast) and two simulated datasets derived from metazoan reference genome annotations. All of these datasets were assembled using three currently popular assembly packages (CLC, Trinity and IDBA-tran). In addition, we experimentally demonstrate that transcripts unique to one particular assembly package are likely to be bioinformatic artefacts. For all eight datasets our pipeline generates more concise transcriptomes that in fact possess more unique annotatable protein domains than any of the three individual assemblers we employed. Another measure of assembly completeness (using the purpose built BUSCO databases) also confirmed that our approach yields more information. Our approach yields coding transcriptome assemblies that are more likely to be
Vladimir A. Zhukov
Full Text Available The large size and complexity of the garden pea (Pisum sativum L. genome hamper its sequencing and the discovery of pea gene resources. Although transcriptome sequencing provides extensive information about expressed genes, some tissue-specific transcripts can only be identified from particular organs under appropriate conditions. In this study, we performed RNA sequencing of polyadenylated transcripts from young pea nodules and root tips on an Illumina GAIIx system, followed by de novo transcriptome assembly using the Trinity program. We obtained more than 58,000 and 37,000 contigs from “Nodules” and “Root Tips” assemblies, respectively. The quality of the assemblies was assessed by comparison with pea expressed sequence tags and transcriptome sequencing project data available from NCBI website. The “Nodules” assembly was compared with the “Root Tips” assembly and with pea transcriptome sequencing data from projects indicating tissue specificity. As a result, approximately 13,000 nodule-specific contigs were found and annotated by alignment to known plant protein-coding sequences and by Gene Ontology searching. Of these, 581 sequences were found to possess full CDSs and could thus be considered as novel nodule-specific transcripts of pea. The information about pea nodule-specific gene sequences can be applied for gene-based markers creation, polymorphism studies, and real-time PCR.
Full Text Available Sorghum (Sorghum bicolor, also known as great millet, is one of the most popular cultivated grass species in the world. Sorghum is frequently consumed as food for humans and animals as well as used for ethanol production. In this study, we conducted de novo transcriptome assembly for sorghum variety Taejin by next-generation sequencing, obtaining 8.748 GB of raw data. The raw data in this study can be available in NCBI SRA database with accession number of SRX1715644. Using the Trinity program, we identified 222,161 transcripts from sorghum variety Taejin. We further predicted coding regions within the assembled transcripts by the TransDecoder program, resulting in a total of 148,531 proteins. We carried out BLASTP against the Swiss-Prot protein sequence database to annotate the functions of the identified proteins. To our knowledge, this is the first transcriptome data for a sorghum variety derived from Korea, and it can be usefully applied to the generation of genetic markers.
Full Text Available Background: The grasshopper Shirakiacris shirakii is an important agricultural pest and feeds mainly on gramineous plants, thereby causing economic damage to a wide range of crops. However, genomic information on this species is extremely limited thus far, and transcriptome data relevant to insecticide resistance and pest control are also not available. Methods: The transcriptome of S. shirakii was sequenced using the Illumina HiSeq platform, and we de novo assembled the transcriptome. Results: Its sequencing produced a total of 105,408,878 clean reads, and the de novo assembly revealed 74,657 unigenes with an average length of 680 bp and N50 of 1057 bp. A total of 28,173 unigenes were annotated for the NCBI non-redundant protein sequences (Nr, NCBI non-redundant nucleotide sequences (Nt, a manually-annotated and reviewed protein sequence database (Swiss-Prot, Gene Ontology (GO and Kyoto Encyclopedia of Genes and Genomes (KEGG databases. Based on the Nr annotation results, we manually identified 79 unigenes encoding cytochrome P450 monooxygenases (P450s, 36 unigenes encoding carboxylesterases (CarEs and 36 unigenes encoding glutathione S-transferases (GSTs in S. shirakii. Core RNAi components relevant to miroRNA, siRNA and piRNA pathways, including Pasha, Loquacious, Argonaute-1, Argonaute-2, Argonaute-3, Zucchini, Aubergine, enhanced RNAi-1 and Piwi, were expressed in S. shirakii. We also identified five unigenes that were homologous to the Sid-1 gene. In addition, the analysis of differential gene expressions revealed that a total of 19,764 unigenes were up-regulated and 4185 unigenes were down-regulated in larvae. In total, we predicted 7504 simple sequence repeats (SSRs from 74,657 unigenes. Conclusions: The comprehensive de novo transcriptomic data of S. shirakii will offer a series of valuable molecular resources for better studying insecticide resistance, RNAi and molecular marker discovery in the transcriptome.
Full Text Available Due to rapid advances in sequencing technology, increasing amounts of genomic and transcriptomic data are available for plant species, presenting enormous challenges for biocomputing analysis. A crucial first step for a successful transcriptomics-based study is the building of a high-quality assembly. Here, we utilized three different de novo assemblers (Trinity, Velvet, and CLC and the EvidentialGene pipeline tr2aacds to assemble two optimized transcript sets for the notorious weed species, . Two RNA sequencing (RNA-seq datasets from leaf and aboveground seedlings were processed using three assemblers, which resulted in 20 assemblies for each dataset. The contig numbers and N50 values of each assembly were compared to study the effect of read number, k-mer size, and in silico normalization on assembly output. The 20 assemblies were then processed through the tr2aacds pipeline to remove redundant transcripts and to select the transcript set with the best coding potential. Each assembly contributed a considerable proportion to the final transcript combination with the exception of the CLC-k14. Thus each assembler and parameter set did assemble better contigs for certain transcripts. The redundancy, total contig number, N50, fully assembled contig number, and transcripts related to target-site herbicide resistance were evaluated for the EvidentialGene and Trinity assemblies. Comparing the EvidentialGene set with the Trinity assembly revealed improved quality and reduced redundancy in both leaf and seedling EvidentialGene sets. The optimized transcriptome references will be useful for studying herbicide resistance in and the evolutionary process in the three allotetraploid offspring.
Gross, Stephen M; Martin, Jeffrey A; Simpson, June; Abraham-Juarez, María Jazmín; Wang, Zhong; Visel, Axel
Agaves are succulent monocotyledonous plants native to xeric environments of North America. Because of their adaptations to their environment, including crassulacean acid metabolism (CAM, a water-efficient form of photosynthesis), and existing technologies for ethanol production, agaves have gained attention both as potential lignocellulosic bioenergy feedstocks and models for exploring plant responses to abiotic stress. However, the lack of comprehensive Agave sequence datasets limits the scope of investigations into the molecular-genetic basis of Agave traits. Here, we present comprehensive, high quality de novo transcriptome assemblies of two Agave species, A. tequilana and A. deserti, built from short-read RNA-seq data. Our analyses support completeness and accuracy of the de novo transcriptome assemblies, with each species having a minimum of approximately 35,000 protein-coding genes. Comparison of agave proteomes to those of additional plant species identifies biological functions of gene families displaying sequence divergence in agave species. Additionally, a focus on the transcriptomics of the A. deserti juvenile leaf confirms evolutionary conservation of monocotyledonous leaf physiology and development along the proximal-distal axis. Our work presents a comprehensive transcriptome resource for two Agave species and provides insight into their biology and physiology. These resources are a foundation for further investigation of agave biology and their improvement for bioenergy development.
Full Text Available For vertebrate organisms where a reference genome is not available, de novo transcriptome assembly enables a cost effective insight into the identification of tissue specific or differentially expressed genes and variation of the coding part of the genome. However, since there are a number of different tools and parameters that can be used to reconstruct transcripts, it is difficult to determine an optimal method. Here we suggest a pipeline based on (1 assessing the performance of three different assembly tools (2 using both single and multiple k-mer approaches (3 examining the influence of the number of reads used in the assembly (4 merging assemblies from different tools. We use an example dataset from the vertebrate Anas platyrhynchos domestica (Pekin duck. We find that taking a subset of data enables a robust assembly to be produced by multiple methods without the need for very high memory capacity. The use of reads mapped back to transcripts (RMBT and CEGMA (Core Eukaryotic Genes Mapping Approach provides useful metrics to determine the completeness of assembly obtained. For this dataset the use of multiple k-mers in the assembly generated a more complete assembly as measured by greater number of RMBT and CEGMA score. Merged single k-mer assemblies are generally smaller but consist of longer transcripts, suggesting an assembly consisting of fewer fragmented transcripts. We suggest that the use of a subset of reads during assembly allows the relatively rapid investigation of assembly characteristics and can guide the user to the most appropriate transcriptome for particular downstream use. Transcriptomes generated by the compared assembly methods and the final merged assembly are freely available for download at http://dx.doi.org/10.6084/m9.figshare.1032613.
Keerthi S. Guruge
Full Text Available The present study employed a next-generation sequencing method to assemble a de novo transcriptome database designed to distinguish gene expression changes exhibited by the fumonisin-producing fungus Fusarium fujikuroi when grown under ‘fumonisin-producing’ compared to ‘non-fumonisin-producing’ conditions. The raw data of this study have been deposited at DNA Data Bank of Japan (DDBJ under the accession ID DRA006146. Keywords: Fusarium fujikuroi, Fumonisin, Next-generation sequencing, Transcriptome, Gene-expression
Kim, Hui-Su; Lee, Bo-Young; Han, Jeonghoon; Lee, Young Hwan; Min, Gi-Sik; Kim, Sanghee; Lee, Jae-Seong
The whole transcriptome of the Antarctic copepod (Tigriopus kingsejongensis) was sequenced using Illumina RNA-seq. De novo assembly was performed with 64,785,098 raw reads using Trinity, which assembled into 81,653 contigs. TransDecoder found 38,250 candidate coding contigs which showed homology to other species by BLAST analysis. Functional gene annotation was performed by Gene Ontology (GO), InterProScan, and KEGG pathway analyses. Finally, we identified a number of expressed gene catalog for T. kingsejongensis that is a useful model animal for gene information-based polar research to uncover molecular mechanisms of environmental adaptation on harsh environments. In particular, we observed highly developing lipid metabolism in T. kingsejongensis directly compared to those of the Far East Pacific coast copepod Tigriopus japonicus at the transcriptome level. Copyright © 2016 Elsevier B.V. All rights reserved.
De novo transcriptome assembly was performed using CLC Genome Workbench software and a total of 54,584 unique contigs were generated, with an N50 of 1343 base pair (bp and a mean length of 829 bp. This work contributed with a specific Japanese plum skin transcriptome, providing two libraries of contrasting fruit skin color phenotype (yellow and red and increasing substantially the GB of raw data available until now for this specie.
Martin, Jeffrey; Gross, Stephen; Choi, Cindy; Zhang, Tao; Lindquist, Erika; Wei, Chia-Lin; Wang, Zhong
De novo assembly of the transcriptome is crucial for functional genomics studies in bioenergy research, since many of the organisms lack high quality reference genomes. In a previous study we successfully de novo assembled simple eukaryote transcriptomes exclusively from short Illumina RNA-Seq reads . However, extensive alternative splicing, present in most of the higher eukaryotes, poses a significant challenge for current short read assembly processes. Furthermore, the size of next-generation datasets, often large for plant genomes, presents an informatics challenge. To tackle these challenges we present a combined experimental and informatics strategy for de novo assembly in higher eukaryotes. Using maize as a test case, preliminary results suggest our approach can resolve transcript variants and improve gene annotations.
Martin, Jeffrey; Gross, Stephen; Choi, Cindy; Zhang, Tao; Lindquist, Erika; Wei, Chia-Lin; Wang, Zhong
De novo assembly of the transcriptome is crucial for functional genomics studies in bioenergy research, since many of the organisms lack high quality reference genomes. In a previous study we successfully de novo assembled simple eukaryote transcriptomes exclusively from short Illumina RNA-Seq reads . However, extensive alternative splicing, present in most of the higher eukaryotes, poses a significant challenge for current short read assembly processes. Furthermore, the size of next-generation datasets, often large for plant genomes, presents an informatics challenge. To tackle these challenges we present a combined experimental and informatics strategy for de novo assembly in higher eukaryotes. Using maize as a test case, preliminary results suggest our approach can resolve transcript variants and improve gene annotations.
Kono, Nobuaki; Nakamura, Hiroyuki; Ito, Yusuke; Tomita, Masaru; Arakawa, Kazuharu
With advances in high-throughput sequencing technologies, de novo transcriptome sequencing and assembly has become a cost-effective method to obtain comprehensive genetic information of a species of interest, especially in nonmodel species with large genomes such as spiders. However, high-quality RNA is essential for successful sequencing, and sample preservation conditions require careful consideration for the effective storage of field-collected samples. To this end, we report a streamlined feasibility study of various storage conditions and their effects on de novo transcriptome assembly results. The storage parameters considered include temperatures ranging from room temperature to -80°C; preservatives, including ethanol, RNAlater, TRIzol and RNAlater-ICE; and sample submersion states. As a result, intact RNA was extracted and assembly was successful when samples were preserved at low temperatures regardless of the type of preservative used. The assemblies as well as the gene expression profiles were shown to be robust to RNA degradation, when 30 million 150-bp paired-end reads are obtained. The parameters for sample storage, RNA extraction, library preparation, sequencing and in silico assembly considered in this work provide a guideline for the study of field-collected samples of spiders. © 2015 John Wiley & Sons Ltd.
Samuel E. Fox
Full Text Available Premise of the study: We report the de novo assembly and characterization of the transcriptomes of Brachypodium sylvaticum (slender false-brome accessions from native populations of Spain and Greece, and an invasive population west of Corvallis, Oregon, USA. Methods and Results: More than 350 million sequence reads from the mRNA libraries prepared from three B. sylvaticum genotypes were assembled into 120,091 (Corvallis, 104,950 (Spain, and 177,682 (Greece transcript contigs. In comparison with the B. distachyon Bd21 reference genome and GenBank protein sequences, we estimate >90% exome coverage for B. sylvaticum. The transcripts were assigned Gene Ontology and InterPro annotations. Brachypodium sylvaticum sequence reads aligned against the Bd21 genome revealed 394,654 single-nucleotide polymorphisms (SNPs and >20,000 simple sequence repeat (SSR DNA sites. Conclusions: To our knowledge, this is the first report of transcriptome sequencing of invasive plant species with a closely related sequenced reference genome. The sequences and identified SNP variant and SSR sites will provide tools for developing novel genetic markers for use in genotyping and characterization of invasive behavior of B. sylvaticum.
Roncalli, Vittoria; Cieslak, Matthew C; Sommer, Stephanie A; Hopcroft, Russell R; Lenz, Petra H
Copepods, small planktonic crustaceans, are key links between primary producers and upper trophic levels, including many economically important fishes. In the subarctic North Pacific, the life cycle of copepods like Neocalanus flemingeri includes an ontogenetic migration to depth followed by a period of diapause (a type of dormancy) characterized by arrested development and low metabolic activity. The end of diapause is marked by the production of the first brood of eggs. Recent temperature anomalies in the North Pacific have raised concerns about potential negative effects on N. flemingeri. Since diapause is a developmental program, its progress can be tracked using through global gene expression. Thus, a reference transcriptome was developed as a first step towards physiological profiling of diapausing females using high-throughput Illumina sequencing. The de novo transcriptome, the first for this species was designed to investigate the diapause period. RNA-Seq reads were obtained for dormant to reproductive N. flemingeri females. A high quality de novo transcriptome was obtained by first assembling reads from each individual using Trinity software followed by clustering with CAP3 Assembly Program. This assembly consisted of 140,841transcripts (contigs). Bench-marking universal single-copy orthologs analysis identified 85% of core eukaryotic genes, with 79% predicted to be complete. Comparison with other calanoid transcriptomes confirmed its quality and degree of completeness. Trinity assembly of reads originating from multiple individuals led to fragmentation. Thus, the workflow applied here differed from the one recommended by Trinity, but was required to obtain a good assembly. Copyright © 2017 The Authors. Published by Elsevier B.V. All rights reserved.
Canales, Javier; Bautista, Rocio; Label, Philippe; Gómez-Maldonado, Josefa; Lesur, Isabelle; Fernández-Pozo, Noe; Rueda-López, Marina; Guerrero-Fernández, Dario; Castro-Rodríguez, Vanessa; Benzekri, Hicham; Cañas, Rafael A; Guevara, María-Angeles; Rodrigues, Andreia; Seoane, Pedro; Teyssier, Caroline; Morel, Alexandre; Ehrenmann, François; Le Provost, Grégoire; Lalanne, Céline; Noirot, Céline; Klopp, Christophe; Reymond, Isabelle; García-Gutiérrez, Angel; Trontin, Jean-François; Lelu-Walter, Marie-Anne; Miguel, Celia; Cervera, María Teresa; Cantón, Francisco R; Plomion, Christophe; Harvengt, Luc; Avila, Concepción; Gonzalo Claros, M; Cánovas, Francisco M
Maritime pine (Pinus pinasterAit.) is a widely distributed conifer species in Southwestern Europe and one of the most advanced models for conifer research. In the current work, comprehensive characterization of the maritime pine transcriptome was performed using a combination of two different next-generation sequencing platforms, 454 and Illumina. De novo assembly of the transcriptome provided a catalogue of 26 020 unique transcripts in maritime pine trees and a collection of 9641 full-length cDNAs. Quality of the transcriptome assembly was validated by RT-PCR amplification of selected transcripts for structural and regulatory genes. Transcription factors and enzyme-encoding transcripts were annotated. Furthermore, the available sequencing data permitted the identification of polymorphisms and the establishment of robust single nucleotide polymorphism (SNP) and simple-sequence repeat (SSR) databases for genotyping applications and integration of translational genomics in maritime pine breeding programmes. All our data are freely available at SustainpineDB, the P. pinaster expressional database. Results reported here on the maritime pine transcriptome represent a valuable resource for future basic and applied studies on this ecological and economically important pine species. © 2013 Society for Experimental Biology, Association of Applied Biologists and John Wiley & Sons Ltd.
Full Text Available Andrographis paniculata is an important medicinal plant containing various bioactive terpenoids and flavonoids. Despite its importance in herbal medicine, no ready-to-use transcript sequence information of this plant is made available in the public data base, this study mainly deals with the sequencing of RNA from A. paniculata leaf using Illumina HiSeqTM 2000 platform followed by the de novo transcriptome assembly. A total of 189.22 million high quality paired reads were generated and 1,70,724 transcripts were predicted in the primary assembly. Secondary assembly generated a transcriptome size of ~88 Mb with 83,800 clustered transcripts. Based on the similarity searches against plant nonredundant protein database, gene ontology and eukaryotic orthologous groups, 49,363 transcripts were annotated constituting upto 58.91% of the identified unigenes. Annotation of transcripts − using kyoto encyclopedia of genes and genomes database − revealed 5,606 transcripts plausibly involved in 140 pathways including biosynthesis of terpenoids and other secondary metabolites. Transcription factor analysis showed 6,767 unique transcripts belonging to 97 different transcription factor families. A total number of 124 CYP450 transcripts belonging to seven divergent clans have been identified. Transcriptome revealed 146 different transcripts coding for enzymes involved in the biosynthesis of terpenoids of which 35 contained terpene synthase motifs. This study also revealed 32,341 simple sequence repeats (SSRs in 23,168 transcripts. Assembled sequences of transcriptome of A.paniculata generated in this study are made available, for the first time, in the TSA database, which provides useful information for functional and comparative genomic analyses besides identification of key enzymes involved in the various pathways of secondary metabolism.
Cherukupalli, Neeraja; Divate, Mayur; Mittapelli, Suresh R.; Khareedu, Venkateswara R.; Vudem, Dashavantha R.
Andrographis paniculata is an important medicinal plant containing various bioactive terpenoids and flavonoids. Despite its importance in herbal medicine, no ready-to-use transcript sequence information of this plant is made available in the public data base, this study mainly deals with the sequencing of RNA from A. paniculata leaf using Illumina HiSeq™ 2000 platform followed by the de novo transcriptome assembly. A total of 189.22 million high quality paired reads were generated and 1,70,724 transcripts were predicted in the primary assembly. Secondary assembly generated a transcriptome size of ~88 Mb with 83,800 clustered transcripts. Based on the similarity searches against plant non-redundant protein database, gene ontology, and eukaryotic orthologous groups, 49,363 transcripts were annotated constituting upto 58.91% of the identified unigenes. Annotation of transcripts—using kyoto encyclopedia of genes and genomes database—revealed 5606 transcripts plausibly involved in 140 pathways including biosynthesis of terpenoids and other secondary metabolites. Transcription factor analysis showed 6767 unique transcripts belonging to 97 different transcription factor families. A total number of 124 CYP450 transcripts belonging to seven divergent clans have been identified. Transcriptome revealed 146 different transcripts coding for enzymes involved in the biosynthesis of terpenoids of which 35 contained terpene synthase motifs. This study also revealed 32,341 simple sequence repeats (SSRs) in 23,168 transcripts. Assembled sequences of transcriptome of A. paniculata generated in this study are made available, for the first time, in the TSA database, which provides useful information for functional and comparative genomic analysis besides identification of key enzymes involved in the various pathways of secondary metabolism. PMID:27582746
Alexey V. Beletsky
Full Text Available Monotropa hypopitys (pinesap is a non-photosynthetic obligately mycoheterotrophic plant of the family Ericaceae. It obtains the carbon and other nutrients from the roots of surrounding autotrophic trees through the associated mycorrhizal fungi. In order to understand the evolutionary changes in the plant genome associated with transition to a heterotrophic lifestyle, we performed de novo transcriptomic analysis of M. hypopitys using next-generation sequencing. We obtained the RNA-Seq data from flowers, flower bracts and roots with haustoria using Illumina HiSeq2500 platform. The raw data obtained in this study can be available in NCBI SRA database with accession number of SRP069226. A total of 10.3 GB raw sequence data were obtained, corresponding to 103,357,809 raw reads. A total of 103,025,683 reads were filtered after removing low-quality reads and trimming the adapter sequences. The Trinity program was used to de novo assemble 98,349 unigens with an N50 of 1342 bp. Using the TransDecoder program, we predicted 43,505 putative proteins. 38,416 unigenes were annotated in the Swiss-Prot protein sequence database using BLASTX. The obtained transcriptomic data will be useful for further studies of the evolution of plant genomes upon transition to a non-photosynthetic lifestyle and the loss of photosynthesis-related functions.
Full Text Available Sour cherry (Prunus cerasus in the genus Prunus in the family Rosaceae is one of the most popular stone fruit trees worldwide. Of known sour cherry cultivars, the Schattenmorelle is a famous old sour cherry with a high amount of fruit production. The Schattenmorelle was selected before 1650 and described in the 1800s. This cultivar was named after gardens of the Chateau de Moreille in which the cultivar was initially found. In order to identify new genes and to develop genetic markers for sour cherry, we performed a transcriptome analysis of a sour cherry. We selected the cultivar Schattenmorelle, which is among commercially important cultivars in Europe and North America. We obtained 2.05 GB raw data from the Schattenmorelle (NCBI accession number: SRX1187170. De novo transcriptome assembly using Trinity identified 61,053 transcripts in which N50 was 611 bp. Next, we identified 25,585 protein coding sequences using TransDecoder. The identified proteins were blasted against NCBI's non-redundant database for annotation. Based on blast search, we taxonomically classified the obtained sequences. As a result, we provide the transcriptome of sour cherry cultivar Schattenmorelle using next generation sequencing.
Kumar, Sujai; Blaxter, Mark L
Roche 454 pyrosequencing has become a method of choice for generating transcriptome data from non-model organisms. Once the tens to hundreds of thousands of short (250-450 base) reads have been produced, it is important to correctly assemble these to estimate the sequence of all the transcripts. Most transcriptome assembly projects use only one program for assembling 454 pyrosequencing reads, but there is no evidence that the programs used to date are optimal. We have carried out a systematic comparison of five assemblers (CAP3, MIRA, Newbler, SeqMan and CLC) to establish best practices for transcriptome assemblies, using a new dataset from the parasitic nematode Litomosoides sigmodontis. Although no single assembler performed best on all our criteria, Newbler 2.5 gave longer contigs, better alignments to some reference sequences, and was fast and easy to use. SeqMan assemblies performed best on the criterion of recapitulating known transcripts, and had more novel sequence than the other assemblers, but generated an excess of small, redundant contigs. The remaining assemblers all performed almost as well, with the exception of Newbler 2.3 (the version currently used by most assembly projects), which generated assemblies that had significantly lower total length. As different assemblers use different underlying algorithms to generate contigs, we also explored merging of assemblies and found that the merged datasets not only aligned better to reference sequences than individual assemblies, but were also more consistent in the number and size of contigs. Transcriptome assemblies are smaller than genome assemblies and thus should be more computationally tractable, but are often harder because individual contigs can have highly variable read coverage. Comparing single assemblers, Newbler 2.5 performed best on our trial data set, but other assemblers were closely comparable. Combining differently optimal assemblies from different programs however gave a more credible
Moskalev, Alexey А; Kudryavtseva, Anna V; Graphodatsky, Alexander S; Beklemisheva, Violetta R; Serdyukova, Natalya A; Krutovsky, Konstantin V; Sharov, Vadim V; Kulakovskiy, Ivan V; Lando, Andrey S; Kasianov, Artem S; Kuzmin, Dmitry A; Putintseva, Yuliya A; Feranchuk, Sergey I; Shaposhnikov, Mikhail V; Fraifeld, Vadim E; Toren, Dmitri; Snezhkina, Anastasia V; Sitnik, Vasily V
Gray whale, Eschrichtius robustus (E. robustus), is a single member of the family Eschrichtiidae, which is considered to be the most primitive in the class Cetacea. Gray whale is often described as a "living fossil". It is adapted to extreme marine conditions and has a high life expectancy (77 years). The assembly of a gray whale genome and transcriptome will allow to carry out further studies of whale evolution, longevity, and resistance to extreme environment. In this work, we report the first de novo assembly and primary analysis of the E. robustus genome and transcriptome based on kidney and liver samples. The presented draft genome assembly is complete by 55% in terms of a total genome length, but only by 24% in terms of the BUSCO complete gene groups, although 10,895 genes were identified. Transcriptome annotation and comparison with other whale species revealed robust expression of DNA repair and hypoxia-response genes, which is expected for whales. This preliminary study of the gray whale genome and transcriptome provides new data to better understand the whale evolution and the mechanisms of their adaptation to the hypoxic conditions.
Weisberg, Alexandra J; Kim, Gunjune; Westwood, James H; Jelesko, John G
Contact with poison ivy plants is widely dreaded because they produce a natural product called urushiol that is responsible for allergenic contact delayed-dermatitis symptoms lasting for weeks. For this reason, the catchphrase most associated with poison ivy is "leaves of three, let it be", which serves the purpose of both identification and an appeal for avoidance. Ironically, despite this notoriety, there is a dearth of specific knowledge about nearly all other aspects of poison ivy physiology and ecology. As a means of gaining a more molecular-oriented understanding of poison ivy physiology and ecology, Next Generation DNA sequencing technology was used to develop poison ivy root and leaf RNA-seq transcriptome resources. De novo assembled transcriptomes were analyzed to generate a core set of high quality expressed transcripts present in poison ivy tissue. The predicted protein sequences were evaluated for similarity to SwissProt homologs and InterProScan domains, as well as assigned both GO terms and KEGG annotations. Over 23,000 simple sequence repeats were identified in the transcriptome, and corresponding oligo nucleotide primer pairs were designed. A pan-transcriptome analysis of existing Anacardiaceae transcriptomes revealed conserved and unique transcripts among these species.
Full Text Available In this work we studied the liver transcriptomes of two frog species, the American bullfrog (Rana (Lithobates catesbeiana and the African clawed frog (Xenopus laevis. We used high throughput RNA sequencing (RNA-seq data to assemble and annotate these transcriptomes, and compared how their baseline expression profiles change when tadpoles of the two species are exposed to thyroid hormone. We generated more than 1.5 billion RNA-seq reads in total for the two species under two conditions as treatment/control pairs. We de novo assembled these reads using Trans-ABySS to reconstruct reference transcriptomes, obtaining over 350,000 and 130,000 putative transcripts for R. catesbeiana and X. laevis, respectively. Using available genomics resources for X. laevis, we annotated over 97% of our X. laevis transcriptome contigs, demonstrating the utility and efficacy of our methodology. Leveraging this validated analysis pipeline, we also annotated the assembled R. catesbeiana transcriptome. We used the expression profiles of the annotated genes of the two species to examine the similarities and differences between the tadpole liver transcriptomes. We also compared the gene ontology terms of expressed genes to measure how the animals react to a challenge by thyroid hormone. Our study reports three main conclusions. First, de novo assembly of RNA-seq data is a powerful method for annotating and establishing transcriptomes of non-model organisms. Second, the liver transcriptomes of the two frog species, R. catesbeiana and X. laevis, show many common features, and the distribution of their gene ontology profiles are statistically indistinguishable. Third, although they broadly respond the same way to the presence of thyroid hormone in their environment, their receptor/signal transduction pathways display marked differences.
Full Text Available Soybean is the most important legume crop in the world. Several diseases in soybean lead to serious yield losses in major soybean-producing countries. Moreover, soybean can be infected by diverse viruses. Recently, we carried out a large-scale screening to identify viruses infecting soybean using available soybean transcriptome data. Of the screened transcriptomes, a soybean transcriptome for soybean seed development analysis contains several virus-associated sequences. In this study, we identified five viruses, including soybean mosaic virus (SMV, infecting soybean by de novo transcriptome assembly followed by blast search. We assembled a nearly complete consensus genome sequence of SMV China using transcriptome data. Based on phylogenetic analysis, the consensus genome sequence of SMV China was closely related to SMV isolates from South Korea. We examined single nucleotide variations (SNVs for SMVs in the soybean seed transcriptome revealing 780 SNVs, which were evenly distributed on the SMV genome. Four SNVs, C-U, U-C, A-G, and G-A, were frequently identified. This result demonstrated the quasispecies variation of the SMV genome. Taken together, this study carried out bioinformatics analyses to identify viruses using soybean transcriptome data. In addition, we demonstrated the application of soybean transcriptome data for virus genome assembly and SNV analysis.
Haznedaroglu Berat Z
Full Text Available Abstract Background The k-mer hash length is a key factor affecting the output of de novo transcriptome assembly packages using de Bruijn graph algorithms. Assemblies constructed with varying single k-mer choices might result in the loss of unique contiguous sequences (contigs and relevant biological information. A common solution to this problem is the clustering of single k-mer assemblies. Even though annotation is one of the primary goals of a transcriptome assembly, the success of assembly strategies does not consider the impact of k-mer selection on the annotation output. This study provides an in-depth k-mer selection analysis that is focused on the degree of functional annotation achieved for a non-model organism where no reference genome information is available. Individual k-mers and clustered assemblies (CA were considered using three representative software packages. Pair-wise comparison analyses (between individual k-mers and CAs were produced to reveal missing Kyoto Encyclopedia of Genes and Genomes (KEGG ortholog identifiers (KOIs, and to determine a strategy that maximizes the recovery of biological information in a de novo transcriptome assembly. Results Analyses of single k-mer assemblies resulted in the generation of various quantities of contigs and functional annotations within the selection window of k-mers (k-19 to k-63. For each k-mer in this window, generated assemblies contained certain unique contigs and KOIs that were not present in the other k-mer assemblies. Producing a non-redundant CA of k-mers 19 to 63 resulted in a more complete functional annotation than any single k-mer assembly. However, a fraction of unique annotations remained (~0.19 to 0.27% of total KOIs in the assemblies of individual k-mers (k-19 to k-63 that were not present in the non-redundant CA. A workflow to recover these unique annotations is presented. Conclusions This study demonstrated that different k-mer choices result in various quantities
Blaxter Mark L
Full Text Available Abstract Background Roche 454 pyrosequencing has become a method of choice for generating transcriptome data from non-model organisms. Once the tens to hundreds of thousands of short (250-450 base reads have been produced, it is important to correctly assemble these to estimate the sequence of all the transcripts. Most transcriptome assembly projects use only one program for assembling 454 pyrosequencing reads, but there is no evidence that the programs used to date are optimal. We have carried out a systematic comparison of five assemblers (CAP3, MIRA, Newbler, SeqMan and CLC to establish best practices for transcriptome assemblies, using a new dataset from the parasitic nematode Litomosoides sigmodontis. Results Although no single assembler performed best on all our criteria, Newbler 2.5 gave longer contigs, better alignments to some reference sequences, and was fast and easy to use. SeqMan assemblies performed best on the criterion of recapitulating known transcripts, and had more novel sequence than the other assemblers, but generated an excess of small, redundant contigs. The remaining assemblers all performed almost as well, with the exception of Newbler 2.3 (the version currently used by most assembly projects, which generated assemblies that had significantly lower total length. As different assemblers use different underlying algorithms to generate contigs, we also explored merging of assemblies and found that the merged datasets not only aligned better to reference sequences than individual assemblies, but were also more consistent in the number and size of contigs. Conclusions Transcriptome assemblies are smaller than genome assemblies and thus should be more computationally tractable, but are often harder because individual contigs can have highly variable read coverage. Comparing single assemblers, Newbler 2.5 performed best on our trial data set, but other assemblers were closely comparable. Combining differently optimal assemblies
Gaur, Mahendra; Das, Aradhana; Sahoo, Rajesh Kumar; Kar, Basudeba; Nayak, Sanghamitra; Subudhi, Enketeswara
Zingiber officinale Rosc., known as ginger, is an Asian crop, popularly used in every household kitchen and commercially used in bakery, beverage, food and pharmaceutical industries. The present study deals with de novo transcriptome assembly of an elite ginger cultivar Suruchi by next generation sequencing methodology. From the analysis 10.9 GB raw data was obtained which can be available in NCBI accession number SAMN03761185. We identified 41,969 transcripts using Trinity RNA-Seq from ginger rhizome of Suruchi variety from Odisha. The transcript length varied from 300 bp to 8404 bp with a total length of 3,96,40,526 bp and N50 of 1251 bp. To the best of our knowledge, this is the first transcriptome data of an elite ginger cultivar Suruchi released for Odisha state of India which will help molecular biologists to develop genetic markers for identification of cultivars.
Full Text Available Liaoning cashmere goat is a famous goat breed for cashmere wool. In order to increase the transcriptome data and accelerate genetic improvement for this breed, we performed de novo transcriptome sequencing to generate the first expressed sequence tag dataset for the Liaoning cashmere goat, using next-generation sequencing technology.Transcriptome sequencing of Liaoning cashmere goat on a Roche 454 platform yielded 804,601 high-quality reads. Clustering and assembly of these reads produced a non-redundant set of 117,854 unigenes, comprising 13,194 isotigs and 104,660 singletons. Based on similarity searches with known proteins, 17,356 unigenes were assigned to 6,700 GO categories, and the terms were summarized into three main GO categories and 59 sub-categories. 3,548 and 46,778 unigenes had significant similarity to existing sequences in the KEGG and COG databases, respectively. Comparative analysis revealed that 42,254 unigenes were aligned to 17,532 different sequences in NCBI non-redundant nucleotide databases. 97,236 (82.51% unigenes were mapped to the 30 goat chromosomes. 35,551 (30.17% unigenes were matched to 11,438 reported goat protein-coding genes. The remaining non-matched unigenes were further compared with cattle and human reference genes, 67 putative new goat genes were discovered. Additionally, 2,781 potential simple sequence repeats were initially identified from all unigenes.The transcriptome of Liaoning cashmere goat was deep sequenced, de novo assembled, and annotated, providing abundant data to better understand the Liaoning cashmere goat transcriptome. The potential simple sequence repeats provide a material basis for future genetic linkage and quantitative trait loci analyses.
Martin, Jeffrey; Bruno, Vincent M.; Fang, Zhide; Meng, Xiandong; Blow, Matthew; Zhang, Tao; Sherlock, Gavin; Snyder, Michael; Wang, Zhong
Background: Comprehensive annotation and quantification of transcriptomes are outstanding problems in functional genomics. While high throughput mRNA sequencing (RNA-Seq) has emerged as a powerful tool for addressing these problems, its success is dependent upon the availability and quality of reference genome sequences, thus limiting the organisms to which it can be applied. Results: Here, we describe Rnnotator, an automated software pipeline that generates transcript models by de novo assembly of RNA-Seq data without the need for a reference genome. We have applied the Rnnotator assembly pipeline to two yeast transcriptomes and compared the results to the reference gene catalogs of these organisms. The contigs produced by Rnnotator are highly accurate (95percent) and reconstruct full-length genes for the majority of the existing gene models (54.3percent). Furthermore, our analyses revealed many novel transcribed regions that are absent from well annotated genomes, suggesting Rnnotator serves as a complementary approach to analysis based on a reference genome for comprehensive transcriptomics. Conclusions: These results demonstrate that the Rnnotator pipeline is able to reconstruct full-length transcripts in the absence of a complete reference genome.
Lima, Leandro; Sinaimeri, Blerina; Sacomoto, Gustavo; Lopez-Maestre, Helene; Marchet, Camille; Miele, Vincent; Sagot, Marie-France; Lacroix, Vincent
The main challenge in de novo genome assembly of DNA-seq data is certainly to deal with repeats that are longer than the reads. In de novo transcriptome assembly of RNA-seq reads, on the other hand, this problem has been underestimated so far. Even though we have fewer and shorter repeated sequences in transcriptomics, they do create ambiguities and confuse assemblers if not addressed properly. Most transcriptome assemblers of short reads are based on de Bruijn graphs (DBG) and have no clear and explicit model for repeats in RNA-seq data, relying instead on heuristics to deal with them. The results of this work are threefold. First, we introduce a formal model for representing high copy-number and low-divergence repeats in RNA-seq data and exploit its properties to infer a combinatorial characteristic of repeat-associated subgraphs. We show that the problem of identifying such subgraphs in a DBG is NP-complete. Second, we show that in the specific case of local assembly of alternative splicing (AS) events, we can implicitly avoid such subgraphs, and we present an efficient algorithm to enumerate AS events that are not included in repeats. Using simulated data, we show that this strategy is significantly more sensitive and precise than the previous version of KisSplice (Sacomoto et al. in WABI, pp 99-111, 1), Trinity (Grabherr et al. in Nat Biotechnol 29(7):644-652, 2), and Oases (Schulz et al. in Bioinformatics 28(8):1086-1092, 3), for the specific task of calling AS events. Third, we turn our focus to full-length transcriptome assembly, and we show that exploring the topology of DBGs can improve de novo transcriptome evaluation methods. Based on the observation that repeats create complicated regions in a DBG, and when assemblers try to traverse these regions, they can infer erroneous transcripts, we propose a measure to flag transcripts traversing such troublesome regions, thereby giving a confidence level for each transcript. The originality of our work when
Stanley Kimbung Mbandi
Full Text Available Downstream analyses of short-reads from next-generation sequencing platforms are often preceded by a pre-processing step that removes uncalled and wrongly called bases. Standard approaches rely on their associated base quality scores to retain the read or a portion of it when the score is above a predefined threshold. It is difficult to differentiate sequencing error from biological variation without a reference using quality scores. The effects of quality score based trimming have not been systematically studied in de novo transcriptome assembly. Using RNA-Seq data produced from Illumina, we teased out the effects of quality score base filtering or trimming on de novo transcriptome reconstruction. We showed that assemblies produced from reads subjected to different quality score thresholds contain truncated and missing transfrags when compared to those from untrimmed reads. Our data supports the fact that de novo assembling of untrimmed data is challenging for de Bruijn graph assemblers. However, our results indicates that comparing the assemblies from untrimmed and trimmed read subsets can suggest appropriate filtering parameters and enable selection of the optimum de novo transcriptome assembly in non-model organisms.
Peng, Yu; Leung, Henry C M; Yiu, Siu-Ming; Lv, Ming-Ju; Zhu, Xin-Guang; Chin, Francis Y L
RNA sequencing based on next-generation sequencing technology is effective for analyzing transcriptomes. Like de novo genome assembly, de novo transcriptome assembly does not rely on any reference genome or additional annotation information, but is more difficult. In particular, isoforms can have very uneven expression levels (e.g. 1:100), which make it very difficult to identify low-expressed isoforms. One challenge is to remove erroneous vertices/edges with high multiplicity (produced by high-expressed isoforms) in the de Bruijn graph without removing correct ones with not-so-high multiplicity from low-expressed isoforms. Failing to do so will result in the loss of low-expressed isoforms or having complicated subgraphs with transcripts of different genes mixed together due to erroneous vertices/edges. Contributions: Unlike existing tools, which remove erroneous vertices/edges with multiplicities lower than a global threshold, we use a probabilistic progressive approach to iteratively remove them with local thresholds. This enables us to decompose the graph into disconnected components, each containing a few genes, if not a single gene, while retaining many correct vertices/edges of low-expressed isoforms. Combined with existing techniques, IDBA-Tran is able to assemble both high-expressed and low-expressed transcripts and outperform existing assemblers in terms of sensitivity and specificity for both simulated and real data. http://www.cs.hku.hk/~alse/idba_tran. Supplementary data are available at Bioinformatics online.
Full Text Available Next generation sequencing platforms have recently been used to rapidly characterize transcriptome sequences from a number of non-model organisms. The present study compares two of the most frequently used platforms, the Roche 454-pyrosequencing and the Illumina sequencing-by-synthesis (SBS, on the same RNA sample obtained from an intertidal gastropod mollusc species, Haliotis midae. All the sequencing reads were deposited in the Short Read Archive (SRA database are retrievable under the accession number [SRR071314 (Illumina Genome Analyzer II] and [SRR1737738, SRR1737737, SRR1737735, SRR1737734 (454 GS FLX] in the SRA database of NCBI. Three transcriptomes, composed of either pure 454 or Illumina reads or a mixture of read types (Hybrid, were assembled using CLC Genomics Workbench software. Illumina assemblies performed the best de novo transcriptome characterization in terms of contig length, whereas the 454 assemblies tended to improve the complete assembly of gene transcripts. Both the Hybrid and Illumina assemblies produced longer contigs covering more of the transcriptome than 454 assemblies. However, the addition of 454 significantly increased the number of genes annotated.
Melicher, Dacotah; Torson, Alex S; Dworkin, Ian; Bowsher, Julia H
The Sepsidae family of flies is a model for investigating how sexual selection shapes courtship and sexual dimorphism in a comparative framework. However, like many non-model systems, there are few molecular resources available. Large-scale sequencing and assembly have not been performed in any sepsid, and the lack of a closely related genome makes investigation of gene expression challenging. Our goal was to develop an automated pipeline for de novo transcriptome assembly, and to use that pipeline to assemble and analyze the transcriptome of the sepsid Themira biloba. Our bioinformatics pipeline uses cloud computing services to assemble and analyze the transcriptome with off-site data management, processing, and backup. It uses a multiple k-mer length approach combined with a second meta-assembly to extend transcripts and recover more bases of transcript sequences than standard single k-mer assembly. We used 454 sequencing to generate 1.48 million reads from cDNA generated from embryo, larva, and pupae of T. biloba and assembled a transcriptome consisting of 24,495 contigs. Annotation identified 16,705 transcripts, including those involved in embryogenesis and limb patterning. We assembled transcriptomes from an additional three non-model organisms to demonstrate that our pipeline assembled a higher-quality transcriptome than single k-mer approaches across multiple species. The pipeline we have developed for assembly and analysis increases contig length, recovers unique transcripts, and assembles more base pairs than other methods through the use of a meta-assembly. The T. biloba transcriptome is a critical resource for performing large-scale RNA-Seq investigations of gene expression patterns, and is the first transcriptome sequenced in this Dipteran family.
Jung, Won Yong; Lee, Sang Sook; Kim, Chul Wook; Kim, Hyun-Soon; Min, Sung Ran; Moon, Jae Sun; Kwon, Suk-Yoon; Jeon, Jae-Heung; Cho, Hye Sun
Jerusalem artichoke (Helianthus tuberosus L.) has long been cultivated as a vegetable and as a source of fructans (inulin) for pharmaceutical applications in diabetes and obesity prevention. However, transcriptomic and genomic data for Jerusalem artichoke remain scarce. In this study, Illumina RNA sequencing (RNA-Seq) was performed on samples from Jerusalem artichoke leaves, roots, stems and two different tuber tissues (early and late tuber development). Data were used for de novo assembly and characterization of the transcriptome. In total 206,215,632 paired-end reads were generated. These were assembled into 66,322 loci with 272,548 transcripts. Loci were annotated by querying against the NCBI non-redundant, Phytozome and UniProt databases, and 40,215 loci were homologous to existing database sequences. Gene Ontology terms were assigned to 19,848 loci, 15,434 loci were matched to 25 Clusters of Eukaryotic Orthologous Groups classifications, and 11,844 loci were classified into 142 Kyoto Encyclopedia of Genes and Genomes pathways. The assembled loci also contained 10,778 potential simple sequence repeats. The newly assembled transcriptome was used to identify loci with tissue-specific differential expression patterns. In total, 670 loci exhibited tissue-specific expression, and a subset of these were confirmed using RT-PCR and qRT-PCR. Gene expression related to inulin biosynthesis in tuber tissue was also investigated. Exsiting genetic and genomic data for H. tuberosus are scarce. The sequence resources developed in this study will enable the analysis of thousands of transcripts and will thus accelerate marker-assisted breeding studies and studies of inulin biosynthesis in Jerusalem artichoke.
Ong, Wen Dee; Voo, Lok-Yung Christopher; Kumar, Vijay Subbiah
Pineapple (Ananas comosus var. comosus), is an important tropical non-climacteric fruit with high commercial potential. Understanding the mechanism and processes underlying fruit ripening would enable scientists to enhance the improvement of quality traits such as, flavor, texture, appearance and fruit sweetness. Although, the pineapple is an important fruit, there is insufficient transcriptomic or genomic information that is available in public databases. Application of high throughput transcriptome sequencing to profile the pineapple fruit transcripts is therefore needed. To facilitate this, we have performed transcriptome sequencing of ripe yellow pineapple fruit flesh using Illumina technology. About 4.7 millions Illumina paired-end reads were generated and assembled using the Velvet de novo assembler. The assembly produced 28,728 unique transcripts with a mean length of approximately 200 bp. Sequence similarity search against non-redundant NCBI database identified a total of 16,932 unique transcripts (58.93%) with significant hits. Out of these, 15,507 unique transcripts were assigned to gene ontology terms. Functional annotation against Kyoto Encyclopedia of Genes and Genomes pathway database identified 13,598 unique transcripts (47.33%) which were mapped to 126 pathways. The assembly revealed many transcripts that were previously unknown. The unique transcripts derived from this work have rapidly increased of the number of the pineapple fruit mRNA transcripts as it is now available in public databases. This information can be further utilized in gene expression, genomics and other functional genomics studies in pineapple.
Full Text Available A long-standing goal in plant breeding has been the ability to confer apomixis to agriculturally relevant species, which would require a deeper comprehension of the molecular basis of apomictic regulatory mechanisms. Eragrostis curvula (Schrad. Nees is a perennial grass that includes both sexual and apomictic cytotypes. The availability of a reference transcriptome for this species would constitute a very important tool toward the identification of genes controlling key steps of the apomictic pathway. Here, we used Roche/454 sequencing technologies to generate reads from inflorescences of E. curvula apomictic and sexual genotypes that were de novo assembled into a reference transcriptome. Near 90% of the 49568 assembled isotigs showed sequence similarity to sequences deposited in the public databases. A gene ontology analysis categorized 27448 isotigs into at least one of the three main GO categories. We identified 11475 SSRs, and several of them were assayed in E curvula germoplasm using SSR-based primers, providing a valuable set of molecular markers that could allow direct allele selection. The differential contribution to each library of the spliced forms of several transcripts revealed the existence of several isotigs produced via alternative splicing of single genes. The reference transcriptome presented and validated in this work will be useful for the identification of a wide range of gene(s related to agronomic traits of E. curvula, including those controlling key steps of the apomictic pathway in this species, allowing the extrapolation of the findings to other plant species.
Full Text Available The adzuki bean (Vigna angularis, a member of the family Fabaceae, is widely grown in Asia, from East Asia to the Himalayas. The adzuki bean is known as an ingredient that adds sweetness to diverse desserts made in Eastern Asian countries. Libraries prepared from two V. angularis varieties referred to as Taejin Black and Taejin Red were paired-end sequenced using the Illumina HiSeq 2000 system. The raw data in this study can be available in NCBI SRA database with accession numbers of SRR3406660 and SRR3406553. After de novo transcriptome assembly using Trinity, we obtained 324,219 and 280,056 transcripts from Taejin Black and Taejin Red, respectively. We predicted a total of 238,321 proteins and 179,519 proteins for Taejin Black and Taejin Red, respectively, by the TransDecoder program. We carried out BLASTP on the predicted proteins against the Swiss-Prot protein sequence database to predict the putative functions of identified proteins. Taken together, we provide transcriptomes of two adzuki bean varieties by RNA-Seq, which might be usefully applied to generate molecular markers.
Full Text Available Next-generation technologies for determination of genomics and transcriptomics composition have a wide range of applications. Andrias davidianus, has become an endangered amphibian species of salamander endemic in China. However, there is a lack of the molecular information. In this study, we obtained the RNA-Seq data from a pool of A. davidianus tissue including spleen, liver, muscle, kidney, skin, testis, gut and heart using Illumina HiSeq 2500 platform. A total of 15,398,997,600 bp were obtained, corresponding to 102,659,984 raw reads. A total of 102,659,984 reads were filtered after removing low-quality reads and trimming the adapter sequences. The Trinity program was used to de novo assemble 132,912 unigenes with an average length of 690 bp and N50 of 1263 bp. Unigenes were annotated through number of databases. These transcriptomic data of A. davidianus should open the door to molecular evolution studies based on the entire transcriptome or targeted genes of interest to sequence. The raw data in this study can be available in NCBI SRA database with accession number of SRP099564.
In the present study, we used Illumina HiSeq technology to perform de novo assembly of heart and musk gland transcriptomes from the Chinese forest musk deer. A total of 239,383 transcripts and 176,450 unigenes were obtained, of which 37,329 unigenes were matched to known sequences in the NCBI nonredundant ...
Full Text Available Quercus pubescens Willd., a species distributed from Spain to southwest Asia, ranks high for drought tolerance among European oaks. Q. pubescens performs a role of outstanding significance in most Mediterranean forest ecosystems, but few mechanistic studies have been conducted to explore its response to environmental constrains, due to the lack of genomic resources. In our study, we performed a deep transcriptomic sequencing in Q. pubescens leaves, including de novo assembly, functional annotation and the identification of new molecular markers. Our results are a pre-requisite for undertaking molecular functional studies, and may give support in population and association genetic studies. 254,265,700 clean reads were generated by the Illumina HiSeq 2000 platform, with an average length of 98 bp. De novo assembly, using CLC Genomics, produced 96,006 contigs, having a mean length of 618 bp. Sequence similarity analyses against seven public databases (Uniprot, NR, RefSeq and KOGs at NCBI, Pfam, InterPro and KEGG resulted in 83,065 transcripts annotated with gene descriptions, conserved protein domains, or gene ontology terms. These annotations and local BLAST allowed identify genes specifically associated with mechanisms of drought avoidance. Finally, 14,202 microsatellite markers and 18,425 single nucleotide polymorphisms (SNPs were, in silico, discovered in assembled and annotated sequences. We completed a successful global analysis of the Q. pubescens leaf transcriptome using RNA-seq. The assembled and annotated sequences together with newly discovered molecular markers provide genomic information for functional genomic studies in Q. pubescens, with special emphasis to response mechanisms to severe constrain of the Mediterranean climate. Our tools enable comparative genomics studies on other Quercus species taking advantage of large intra-specific ecophysiological differences.
Farrell, Jacqueline Danielle; Byrne, Stephen; Paina, Cristiana
a homozygous perennial ryegrass genotype can circumvent the challenge of heterozygosity. The goals of this study were to perform RNA-sequencing on multiple tissues from a highly inbred genotype to develop a reference transcriptome. This was complemented with RNA-sequencing of a highly heterozygous genotype...... for SNP calling. Result De novo transcriptome assembly of the inbred genotype created 185,833 transcripts with an average length of 830 base pairs. Within the inbred reference transcriptome 78,560 predicted open reading frames were found of which 24,434 were predicted as complete. Functional annotation...... multiple orthologs. Using the longest unique open reading frames as the reference sequence, 64,242 single nucleotide polymorphisms were found. One thousand sixty one open reading frames from the inbred genotype contained heterozygous sites, confirming the high degree of homozygosity. Conclusion Our study...
The hawksbill sea turtle, Eretmochelys imbricata, is an endangered species of the Caribbean Colombian coast due to anthropic and natural factors that have decreased their population levels. Little is known about the genes that are involved in their immune system, sex determination, aging and others important functions. The data generated represents RNA sequencing and the first de-novo assembly of transcripts expressed in the blood of the hawksbill sea turtle. The raw FASTQ files were deposited in the NCBI SRA database with accession number SRX2653641. A total of 5.7 Gb raw sequence data were obtained, corresponding to 47,555,108 raw reads. Trinity was used to perform a first de-novo assembly, and we were able to identify 47,586 transcripts of the female hawksbill turtle transcriptome with an N50 of 1100 bp. The obtained transcriptome data will be useful for further studies of the physiology, biochemistry and evolution in this species.
Full Text Available BACKGROUND: Sweet potato (Ipomoea batatas L. [Lam.] ranks among the top six most important food crops in the world. It is widely grown throughout the world with high and stable yield, strong adaptability, rich nutrient content, and multiple uses. However, little is known about the molecular biology of this important non-model organism due to lack of genomic resources. Hence, studies based on high-throughput sequencing technologies are needed to get a comprehensive and integrated genomic resource and better understanding of gene expression patterns in different tissues and at various developmental stages. METHODOLOGY/PRINCIPAL FINDINGS: Illumina paired-end (PE RNA-Sequencing was performed, and generated 48.7 million of 75 bp PE reads. These reads were de novo assembled into 128,052 transcripts (≥ 100 bp, which correspond to 41.1 million base pairs, by using a combined assembly strategy. Transcripts were annotated by Blast2GO and 51,763 transcripts got BLASTX hits, in which 39,677 transcripts have GO terms and 14,117 have ECs that are associated with 147 KEGG pathways. Furthermore, transcriptome differences of seven tissues were analyzed by using Illumina digital gene expression (DGE tag profiling and numerous differentially and specifically expressed transcripts were identified. Moreover, the expression characteristics of genes involved in viral genomes, starch metabolism and potential stress tolerance and insect resistance were also identified. CONCLUSIONS/SIGNIFICANCE: The combined de novo transcriptome assembly strategy can be applied to other organisms whose reference genomes are not available. The data provided here represent the most comprehensive and integrated genomic resources for cloning and identifying genes of interest in sweet potato. Characterization of sweet potato transcriptome provides an effective tool for better understanding the molecular mechanisms of cellular processes including development of leaves and storage roots
Full Text Available Next generation sequencing (NGS technologies have greatly changed the landscape of transcriptomic studies of non-model organisms. Since there is no reference genome available, de novo assembly methods play key roles in the analysis of these data sets. Because of the huge amount of data generated by NGS technologies for each run, many assemblers, e.g., ABySS, Velvet and Trinity, are developed based on a de Bruijn graph due to its time- and space-efficiency. However, most of these assemblers were developed initially for the Illumina/Solexa platform. The performance of these assemblers on 454 transcriptomic data is unknown. In this study, we evaluated and compared the relative performance of these de Bruijn graph based assemblers on both simulated and real 454 transcriptomic data. The results suggest that Trinity, the Illumina/Solexa-specialized transcriptomic assembler, performs the best among the multiple de Bruijn graph assemblers, comparable to or even outperforming the standard 454 assembler Newbler which is based on the overlap-layout-consensus algorithm. Our evaluation is expected to provide helpful guidance for researchers to choose assemblers when analyzing 454 transcriptomic data.
Full Text Available Abstract Background Earthworms are sensitive to toxic chemicals present in the soil and so are useful indicator organisms for soil health. Eisenia fetida are commonly used in ecotoxicological studies; therefore the assembly of a baseline transcriptome is important for subsequent analyses exploring the impact of toxin exposure on genome wide gene expression. Results This paper reports on the de novo transcriptome assembly of E. fetida using Trinity, a freely available software tool. Trinotate was used to carry out functional annotation of the Trinity generated transcriptome file and the transdecoder generated peptide sequence file along with BLASTX, BLASTP and HMMER searches and were loaded into a Sqlite3 database. To identify differentially expressed transcripts; each of the original sequence files were aligned to the de novo assembled transcriptome using Bowtie and then RSEM was used to estimate expression values based on the alignment. EdgeR was used to calculate differential expression between the two conditions, with an FDR corrected P value cut off of 0.001, this returned six significantly differentially expressed genes. Initial BLASTX hits of these putative genes included hits with annelid ferritin and lysozyme proteins, as well as fungal NADH cytochrome b5 reductase and senescence associated proteins. At a cut off of P = 0.01 there were a further 26 differentially expressed genes. Conclusion These data have been made publicly available, and to our knowledge represent the most comprehensive available transcriptome for E. fetida assembled from RNA sequencing data. This provides important groundwork for subsequent ecotoxicogenomic studies exploring the impact of the environment on global gene expression in E. fetida and other earthworm species.
Xie, Feng-Yun; Feng, Yu-Long; Wang, Hong-Hui; Ma, Yun-Feng; Yang, Yang; Wang, Yin-Chao; Shen, Wei; Pan, Qing-Jie; Yin, Shen; Sun, Yu-Jiang; Ma, Jun-Yu
Prior to the mechanization of agriculture and labor-intensive tasks, humans used donkeys (Equus africanus asinus) for farm work and packing. However, as mechanization increased, donkeys have been increasingly raised for meat, milk, and fur in China. To maintain the development of the donkey industry, breeding programs should focus on traits related to these new uses. Compared to conventional marker-assisted breeding plans, genome- and transcriptome-based selection methods are more efficient and effective. To analyze the coding genes of the donkey genome, we assembled the transcriptome of donkey white blood cells de novo. Using transcriptomic deep-sequencing data, we identified 264,714 distinct donkey unigenes and predicted 38,949 protein fragments. We annotated the donkey unigenes by BLAST searches against the non-redundant (NR) protein database. We also compared the donkey protein sequences with those of the horse (E. caballus) and wild horse (E. przewalskii), and linked the donkey protein fragments with mammalian phenotypes. As the outer ear size of donkeys and horses are obviously different, we compared the outer ear size-associated proteins in donkeys and horses. We identified three ear size-associated proteins, HIC1, PRKRA, and KMT2A, with sequence differences among the donkey, horse, and wild horse loci. Since the donkey genome sequence has not been released, the de novo assembled donkey transcriptome is helpful for preliminary investigations of donkey cultivars and for genetic improvement.
Raja Sekhar Nandety
Full Text Available BACKGROUND: The glassy-winged sharpshooter Homalodisca vitripennis (Hemiptera: Cicadellidae, is a xylem-feeding leafhopper and important vector of the bacterium Xylella fastidiosa; the causal agent of Pierce's disease of grapevines. The functional complexity of the transcriptome of H. vitripennis has not been elucidated thus far. It is a necessary blueprint for an understanding of the development of H. vitripennis and for designing efficient biorational control strategies including those based on RNA interference. RESULTS: Here we elucidate and explore the transcriptome of adult H. vitripennis using high-throughput paired end deep sequencing and de novo assembly. A total of 32,803,656 paired-end reads were obtained with an average transcript length of 624 nucleotides. We assembled 32.9 Mb of the transcriptome of H. vitripennis that spanned across 47,265 loci and 52,708 transcripts. Comparison of our non-redundant database showed that 45% of the deduced proteins of H. vitripennis exhibit identity (e-value ≤1(-5 with known proteins. We assigned Gene Ontology (GO terms, Kyoto Encyclopedia of Genes and Genomes (KEGG annotations, and potential Pfam domains to each transcript isoform. In order to gain insight into the molecular basis of key regulatory genes of H. vitripennis, we characterized predicted proteins involved in the metabolism of juvenile hormone, and biogenesis of small RNAs (Dicer and Piwi sequences from the transcriptomic sequences. Analysis of transposable element sequences of H. vitripennis indicated that the genome is less expanded in comparison to many other insects with approximately 1% of the transcriptome carrying transposable elements. CONCLUSIONS: Our data significantly enhance the molecular resources available for future study and control of this economically important hemipteran. This transcriptional information not only provides a more nuanced understanding of the underlying biological and physiological mechanisms that
Full Text Available Sophora japonica Linn (Chinese Scholar Tree is a shrub species belonging to the subfamily Faboideae of the pea family Fabaceae. In this study, RNA sequencing of S. japonica transcriptome was performed to produce large expression datasets for functional genomic analysis. Approximate 86.1 million high-quality clean reads were generated and assembled de novo into 143010 unique transcripts and 57614 unigenes. The average length of unigenes was 901 bps with an N50 of 545 bps. Four public databases, including the NCBI nonredundant protein (NR, Swiss-Prot, Kyoto Encyclopedia of Genes and Genomes (KEGG, and the Cluster of Orthologous Groups (COG, were used to annotate unigenes through NCBI BLAST procedure. A total of 27541 of 57614 unigenes (47.8% were annotated for gene descriptions, conserved protein domains, or gene ontology. Moreover, an interaction network of unigenes in S. japonica was predicted based on known protein-protein interactions of putative orthologs of well-studied plant genomes. The transcriptome data of S. japonica reported here represents first genome-scale investigation of gene expressions in Faboideae plants. We expect that our study will provide a useful resource for further studies on gene expression, genomics, functional genomics, and protein-protein interaction in S. japonica.
Full Text Available The hawksbill sea turtle, Eretmochelys imbricata, is an endangered species of the Caribbean Colombian coast due to anthropic and natural factors that have decreased their population levels. Little is known about the genes that are involved in their immune system, sex determination, aging and others important functions. The data generated represents RNA sequencing and the first de-novo assembly of transcripts expressed in the blood of the hawksbill sea turtle. The raw FASTQ files were deposited in the NCBI SRA database with accession number SRX2653641. A total of 5.7 Gb raw sequence data were obtained, corresponding to 47,555,108 raw reads. Trinity was used to perform a first de-novo assembly, and we were able to identify 47,586 transcripts of the female hawksbill turtle transcriptome with an N50 of 1100 bp. The obtained transcriptome data will be useful for further studies of the physiology, biochemistry and evolution in this species. Keywords: Hawksbill turtle, Trinity, RNAseq, illumina, N50
Full Text Available Prior to the mechanization of agriculture and labor-intensive tasks, humans used donkeys (Equus africanus asinus for farm work and packing. However, as mechanization increased, donkeys have been increasingly raised for meat, milk, and fur in China. To maintain the development of the donkey industry, breeding programs should focus on traits related to these new uses. Compared to conventional marker-assisted breeding plans, genome- and transcriptome-based selection methods are more efficient and effective. To analyze the coding genes of the donkey genome, we assembled the transcriptome of donkey white blood cells de novo. Using transcriptomic deep-sequencing data, we identified 264,714 distinct donkey unigenes and predicted 38,949 protein fragments. We annotated the donkey unigenes by BLAST searches against the non-redundant (NR protein database. We also compared the donkey protein sequences with those of the horse (E. caballus and wild horse (E. przewalskii, and linked the donkey protein fragments with mammalian phenotypes. As the outer ear size of donkeys and horses are obviously different, we compared the outer ear size-associated proteins in donkeys and horses. We identified three ear size-associated proteins, HIC1, PRKRA, and KMT2A, with sequence differences among the donkey, horse, and wild horse loci. Since the donkey genome sequence has not been released, the de novo assembled donkey transcriptome is helpful for preliminary investigations of donkey cultivars and for genetic improvement.
Full Text Available The common Chinese cuttlefish (Sepiella japonica has been considered one of the most economically important marine Cephalopod species in East Asia and seed breeding technology has been established for massive aquaculture and stock enhancement. In the present study, we used Illumina HiSeq2000 to sequence, assemble and annotate the transcriptome of the ovary tissues of S. japonica for the first time. A total of 53,116,650 and 53,446,640 reads were obtained from the immature and matured ovaries, respectively (NCBI SRA database SRX1409472 and SRX1409473, and 70,039 contigs (N50 = 1443 bp were obtained after de novo assembling with Trinity software. Digital gene expression analysis reveals 47,288 contigs show differential expression profile and 793 contigs are highly expressed in the immature ovary, while 38 contigs are highly expressed in the mature ovary with FPKM >100. We hope that the ovarian transcriptome and those stage-enriched transcripts of S. japonica can provide some insight into the understanding of genome-wide transcriptome profile of cuttlefish gonad tissue and give useful information in cuttlefish gonad development. Keywords: Cuttlefish, Gonad development, Transcriptome
Deden Derajat Matra
Full Text Available Garcinia mangostana L. (Mangosteen, of the family Clusiaceae, is one of the economically important tropical fruits in Indonesia. In the present study, we performed de novo transcriptomic analysis of Garcinia mangostana L. through RNA-Seq technology. We obtained the raw data from 12 libraries through Ion Proton System. Clean reads of 191,735,809 were obtained from 307,634,890 raw reads. The raw data obtained in this study can be accessible in DDBJ database with accession number of DRA005014 with bioproject accession number of PRJDB5091. We obtained 268,851 transcripts as well as 155,850 unigenes, having N50 value of 555 and 433 bp, respectively. Transcript/unigene length ranged from 201 to 5916 bp. The unigenes were annotated with two main databases from NCBI and UniProtKB, respectively having annotated-sequences of 73,287 and 73,107, respectively. These transcriptomic data will be beneficial for studying transcriptome of Garcinia mangostana L.
Full Text Available Whole genome sequencing (WGS is a very valuable resource to understand the evolutionary history of poorly known species. However, in organisms with large genomes, as most amphibians, WGS is still excessively challenging and transcriptome sequencing (RNA-seq represents a cost-effective tool to explore genome-wide variability. Non-model organisms do not usually have a reference genome and the transcriptome must be assembled de-novo. We used RNA-seq to obtain the transcriptomic profile for Oreobates cruralis, a poorly known South American direct-developing frog. In total, 550,871 transcripts were assembled, corresponding to 422,999 putative genes. Of those, we identified 23,500, 37,349, 38,120 and 45,885 genes present in the Pfam, EggNOG, KEGG and GO databases, respectively. Interestingly, our results suggested that genes related to immune system and defense mechanisms are abundant in the transcriptome of O. cruralis. We also present a pipeline to assist with pre-processing, assembling, evaluating and functionally annotating a de-novo transcriptome from RNA-seq data of non-model organisms. Our pipeline guides the inexperienced user in an intuitive way through all the necessary steps to build de-novo transcriptome assemblies using readily available software and is freely available at: https://github.com/biomendi/TRANSCRIPTOME-ASSEMBLY-PIPELINE/wiki.
Xu, Zhifeng; Zhu, Wenyi; Liu, Yanchao; Liu, Xing; Chen, Qiushuang; Peng, Miao; Wang, Xiangzun; Shen, Guangmao; He, Lin
The carmine spider mite (CSM), Tetranychus cinnabarinus, is an important pest mite in agriculture, because it can develop insecticide resistance easily. To gain valuable gene information and molecular basis for the future insecticide resistance study of CSM, the first transcriptome analysis of CSM was conducted. A total of 45,016 contigs and 25,519 unigenes were generated from the de novo transcriptome assembly, and 15,167 unigenes were annotated via BLAST querying against current databases, including nr, SwissProt, the Clusters of Orthologous Groups (COGs), Kyoto Encyclopedia of Genes and Genomes (KEGG) and Gene Ontology (GO). Aligning the transcript to Tetranychus urticae genome, the 19255 (75.45%) of the transcripts had significant (e-value insecticide resistance in arthropod were generated from CSM transcriptome, including 53 P450-, 22 GSTs-, 23 CarEs-, 1 AChE-, 7 GluCls-, 9 nAChRs-, 8 GABA receptor-, 1 sodium channel-, 6 ATPase- and 12 Cyt b genes. We developed significant molecular resources for T. cinnabarinus putatively involved in insecticide resistance. The transcriptome assembly analysis will significantly facilitate our study on the mechanism of adapting environmental stress (including insecticide) in CSM at the molecular level, and will be very important for developing new control strategies against this pest mite.
Kiruba Shankari eArun Chinnappa
Full Text Available Vicia faba (L. is an important cool-season grain legume species used widely in agriculture but also in plant physiology research, particularly as an experimental model to study transfer cell (TC development. Adaxial epidermal cells of isolated cotyledons can be induced to form functional TCs, thus providing a valuable experimental system to investigate genetic regulation of TC development. The genome of V. faba is exceedingly large (ca. 13 Gb, however, and limited genomic information is available for this species. To provide a resource for transcript profiling of epidermal TC development, we have undertaken de novo assembly of a cotyledon-enriched transcriptome map for V. faba. Illumina paired-end sequencing of total RNA pooled from different tissues and different stages, including isolated cotyledons induced to form TCs, generated 69.5M reads, of which 65.8M were used for assembly following trimming and quality control. Assembly using a De-Bruijn graph-based approach within CLC Genomics Workbench v6.1 generated 21,297 contigs, of which 80.6% were successfully annotated against GO terms. The assembly was validated against known V. faba cDNAs held in GenBank, including transcripts previously identified as being specifically expressed in epidermal cells across TC trans-differentiation. This cotyledon-enriched transcriptome map therefore provides a valuable tool for future transcript profiling of epidermal TC development, and also enriches the genetic resources available for this important legume crop species.
Full Text Available Most genomic resources available for insects represent the Holometabola, which are insects that undergo complete metamorphosis like beetles and flies. In contrast, the Hemimetabola (direct developing insects, representing the basal branches of the insect tree, have very few genomic resources. We have therefore created a large and publicly available transcriptome for the hemimetabolous insect Gryllus bimaculatus (cricket, a well-developed laboratory model organism whose potential for functional genetic experiments is currently limited by the absence of genomic resources. cDNA was prepared using mRNA obtained from adult ovaries containing all stages of oogenesis, and from embryo samples on each day of embryogenesis. Using 454 Titanium pyrosequencing, we sequenced over four million raw reads, and assembled them into 21,512 isotigs (predicted transcripts and 120,805 singletons with an average coverage per base pair of 51.3. We annotated the transcriptome manually for over 400 conserved genes involved in embryonic patterning, gametogenesis, and signaling pathways. BLAST comparison of the transcriptome against the NCBI non-redundant protein database (nr identified significant similarity to nr sequences for 55.5% of transcriptome sequences, and suggested that the transcriptome may contain 19,874 unique transcripts. For predicted transcripts without significant similarity to known sequences, we assessed their similarity to other orthopteran sequences, and determined that these transcripts contain recognizable protein domains, largely of unknown function. We created a searchable, web-based database to allow public access to all raw, assembled and annotated data. This database is to our knowledge the largest de novo assembled and annotated transcriptome resource available for any hemimetabolous insect. We therefore anticipate that these data will contribute significantly to more effective and higher-throughput deployment of molecular analysis tools in
Tian, Xin-Jie; Long, Yan; Wang, Jiao; Zhang, Jing-Wen; Wang, Yan-Yan; Li, Wei-Min; Peng, Yu-Fa; Yuan, Qian-Hua; Pei, Xin-Wu
The perennial O. rufipogon (common wild rice), which is considered to be the ancestor of Asian cultivated rice species, contains many useful genetic resources, including drought resistance genes. However, few studies have identified the drought resistance and tissue-specific genes in common wild rice. In this study, transcriptome sequencing libraries were constructed, including drought-treated roots (DR) and control leaves (CL) and roots (CR). Using Illumina sequencing technology, we generated 16.75 million bases of high-quality sequence data for common wild rice and conducted de novo assembly and annotation of genes without prior genome information. These reads were assembled into 119,332 unigenes with an average length of 715 bp. A total of 88,813 distinct sequences (74.42% of unigenes) significantly matched known genes in the NCBI NT database. Differentially expressed gene (DEG) analysis showed that 3617 genes were up-regulated and 4171 genes were down-regulated in the CR library compared with the CL library. Among the DEGs, 535 genes were expressed in roots but not in shoots. A similar comparison between the DR and CR libraries showed that 1393 genes were up-regulated and 315 genes were down-regulated in the DR library compared with the CR library. Finally, 37 genes that were specifically expressed in roots were screened after comparing the DEGs identified in the above-described analyses. This study provides a transcriptome sequence resource for common wild rice plants and establishes a digital gene expression profile of wild rice plants under drought conditions using the assembled transcriptome data as a reference. Several tissue-specific and drought-stress-related candidate genes were identified, representing a fully characterized transcriptome and providing a valuable resource for genetic and genomic studies in plants.
Marchant, A; Mougel, F; Almeida, C; Jacquin-Joly, E; Costa, J; Harry, M
High throughput sequencing (HTS) provides new research opportunities for work on non-model organisms, such as differential expression studies between populations exposed to different environmental conditions. However, such transcriptomic studies first require the production of a reference assembly. The choice of sampling procedure, sequencing strategy and assembly workflow is crucial. To develop a reliable reference transcriptome for Triatoma brasiliensis, the major Chagas disease vector in Northeastern Brazil, different de novo assembly protocols were generated using various datasets and software. Both 454 and Illumina sequencing technologies were applied on RNA extracted from antennae and mouthparts from single or pooled individuals. The 454 library yielded 278 Mb. Fifteen Illumina libraries were constructed and yielded nearly 360 million RNA-seq single reads and 46 million RNA-seq paired-end reads for nearly 45 Gb. For the 454 reads, we used three assemblers, Newbler, CAP3 and/or MIRA and for the Illumina reads, the Trinity assembler. Ten assembly workflows were compared using these programs separately or in combination. To compare the assemblies obtained, quantitative and qualitative criteria were used, including contig length, N50, contig number and the percentage of chimeric contigs. Completeness of the assemblies was estimated using the CEGMA pipeline. The best assembly (57,657 contigs, completeness of 80 %, <1 % chimeric contigs) was a hybrid assembly leading to recommend the use of (1) a single individual with large representation of biological tissues, (2) merging both long reads and short paired-end Illumina reads, (3) several assemblers in order to combine the specific advantages of each.
Full Text Available Anomala corpulenta is an important insect pest and can cause enormous economic losses in agriculture, horticulture and forestry. It is widely distributed in China, and both larvae and adults can cause serious damage. It is difficult to control this pest because the larvae live underground. Any new control strategy should exploit alternatives to heavily and frequently used chemical insecticides. However, little genetic research has been carried out on A. corpulenta due to the lack of genomic resources. Genomic resources could be produced by next generation sequencing technologies with low cost and in a short time. In this study, we performed de novo sequencing, assembly and characterization of the antennal transcriptome of A. corpulenta.Illumina sequencing technology was used to sequence the antennal transcriptome of A. corpulenta. Approximately 76.7 million total raw reads and about 68.9 million total clean reads were obtained, and then 35,656 unigenes were assembled. Of these unigenes, 21,463 of them could be annotated in the NCBI nr database, and, among the annotated unigenes, 11,154 and 6,625 unigenes could be assigned to GO and COG, respectively. Additionally, 16,350 unigenes could be annotated in the Swiss-Prot database, and 14,499 unigenes could map onto 258 pathways in the KEGG Pathway database. We also found 24 unigenes related to OBPs, 6 to CSPs, and in total 167 unigenes related to chemodetection. We analyzed 4 OBPs and 3CSPs sequences and their RT-qPCR results agreed well with their FPKM values.We produced the first large-scale antennal transcriptome of A. corpulenta, which is a species that has little genomic information in public databases. The identified chemodetection unigenes can promote the molecular mechanistic study of behavior in A. corpulenta. These findings provide a general sequence resource for molecular genetics research on A. corpulenta.
Schulze, Thomas T.; Ali, Jonathan M.; Bartlett, Maggie L.; McFarland, Madalyn M.; Clement, Emalie J.; Won, Harim I.; Sanford, Austin G.; Monzingo, Elyssa B.; Martens, Matthew C.; Hemsley, Ryan M.; Kumar, Sidharta; Gouin, Nicolas; Kolok, Alan S.; Davis, Paul H.
Trichomycterus areolatus is an endemic species of pencil catfish that inhabits the riffles and rapids of many freshwater ecosystems of Chile. Despite its unique adaptation to Chile's high gradient watersheds and therefore potential application in the investigation of ecosystem integrity and environmental contamination, relatively little is known regarding the molecular biology of this environmental sentinel. Here, we detail the assembly of the Trichomycterus areolatus transcriptome, a molecular resource for the study of this organism and its molecular response to the environment. RNA-Seq reads were obtained by next-generation sequencing with an Illumina® platform and processed using PRINSEQ. The transcriptome assembly was performed using TRINITY assembler. Transcriptome validation was performed by functional characterization with KOG, KEGG, and GO analyses. Additionally, differential expression analysis highlights sex-specific expression patterns, and a list of endocrine and oxidative stress related transcripts are included. PMID:27672404
Full Text Available BACKGROUND: Celery is an increasing popular vegetable species, but limited transcriptome and genomic data hinder the research to it. In addition, a lack of celery molecular markers limits the process of molecular genetic breeding. High-throughput transcriptome sequencing is an efficient method to generate a large transcriptome sequence dataset for gene discovery, molecular marker development and marker-assisted selection breeding. PRINCIPAL FINDINGS: Celery transcriptomes from four tissues were sequenced using Illumina paired-end sequencing technology. De novo assembling was performed to generate a collection of 42,280 unigenes (average length of 502.6 bp that represent the first transcriptome of the species. 78.43% and 48.93% of the unigenes had significant similarity with proteins in the National Center for Biotechnology Information (NCBI non-redundant protein database (Nr and Swiss-Prot database respectively, and 10,473 (24.77% unigenes were assigned to Clusters of Orthologous Groups (COG. 21,126 (49.97% unigenes harboring Interpro domains were annotated, in which 15,409 (36.45% were assigned to Gene Ontology(GO categories. Additionally, 7,478 unigenes were mapped onto 228 pathways using the Kyoto Encyclopedia of Genes and Genomes Pathway database (KEGG. Large numbers of simple sequence repeats (SSRs were indentified, and then the rate of successful amplication and polymorphism were investigated among 31 celery accessions. CONCLUSIONS: This study demonstrates the feasibility of generating a large scale of sequence information by Illumina paired-end sequencing and efficient assembling. Our results provide a valuable resource for celery research. The developed molecular markers are the foundation of further genetic linkage analysis and gene localization, and they will be essential to accelerate the process of breeding.
Full Text Available Pomegranate (Punica granatum L. belongs to Punicaceae, and is valued for its social, ecological, economic, and aesthetic values, as well as more recently for its health benefits. The 'Tunisia' variety has softer seeds and big arils that are easily swallowed. It is a widely popular fruit; however, the molecular mechanisms of the formation of hard and soft seeds is not yet clear. We conducted a de novo assembly of the seed transcriptome in P. granatum L. and revealed differential gene expression between the soft-seed and hard-seed pomegranate varieties. A total of 35.1 Gb of data were acquired in this study, including 280,881,106 raw reads. Additionally, de novo transcriptome assembly generated 132,287 transcripts and 105,743 representative unigenes; approximately 13,805 unigenes (37.7% were longer than 1,000 bp. Using bioinformatics annotation libraries, a total of 76,806 unigenes were annotated and, among the high-quality reads, 72.63% had at least one significant match to an existing gene model. Gene expression and differentially expressed genes were analyzed. The seed formation of the two pomegranate cultivars involves lignin biosynthesis and metabolism, including some genes encoding laccase and peroxidase, WRKY, MYB, and NAC transcription factors. In the hard-seed pomegranate, lignin-related genes and cellulose synthesis-related genes were highly expressed; in soft-seed pomegranates, expression of genes related to flavonoids and programmed cell death was slightly higher. We validated selection of the identified genes using qRT-PCR. This is the first transcriptome analysis of P. granatum L. This transcription sequencing greatly enriched the pomegranate molecular database, and the high-quality SSRs generated in this study will aid the gene cloning from pomegranate in the future. It provides important insights into the molecular mechanisms underlying the formation of soft seeds in pomegranate.
Xue, Hui; Cao, Shangyin; Li, Haoxian; Zhang, Jie; Niu, Juan; Chen, Lina; Zhang, Fuhong; Zhao, Diguang
Pomegranate (Punica granatum L.) belongs to Punicaceae, and is valued for its social, ecological, economic, and aesthetic values, as well as more recently for its health benefits. The 'Tunisia' variety has softer seeds and big arils that are easily swallowed. It is a widely popular fruit; however, the molecular mechanisms of the formation of hard and soft seeds is not yet clear. We conducted a de novo assembly of the seed transcriptome in P. granatum L. and revealed differential gene expression between the soft-seed and hard-seed pomegranate varieties. A total of 35.1 Gb of data were acquired in this study, including 280,881,106 raw reads. Additionally, de novo transcriptome assembly generated 132,287 transcripts and 105,743 representative unigenes; approximately 13,805 unigenes (37.7%) were longer than 1,000 bp. Using bioinformatics annotation libraries, a total of 76,806 unigenes were annotated and, among the high-quality reads, 72.63% had at least one significant match to an existing gene model. Gene expression and differentially expressed genes were analyzed. The seed formation of the two pomegranate cultivars involves lignin biosynthesis and metabolism, including some genes encoding laccase and peroxidase, WRKY, MYB, and NAC transcription factors. In the hard-seed pomegranate, lignin-related genes and cellulose synthesis-related genes were highly expressed; in soft-seed pomegranates, expression of genes related to flavonoids and programmed cell death was slightly higher. We validated selection of the identified genes using qRT-PCR. This is the first transcriptome analysis of P. granatum L. This transcription sequencing greatly enriched the pomegranate molecular database, and the high-quality SSRs generated in this study will aid the gene cloning from pomegranate in the future. It provides important insights into the molecular mechanisms underlying the formation of soft seeds in pomegranate.
Sathyanarayana, N; Pittala, Ranjith Kumar; Tripathi, Pankaj Kumar; Chopra, Ratan; Singh, Heikham Russiachand; Belamkar, Vikas; Bhardwaj, Pardeep Kumar; Doyle, Jeff J; Egan, Ashley N
The medicinal legume Mucuna pruriens (L.) DC. has attracted attention worldwide as a source of the anti-Parkinson's drug L-Dopa. It is also a popular green manure cover crop that offers many agronomic benefits including high protein content, nitrogen fixation and soil nutrients. The plant currently lacks genomic resources and there is limited knowledge on gene expression, metabolic pathways, and genetics of secondary metabolite production. Here, we present transcriptomic resources for M. pruriens, including a de novo transcriptome assembly and annotation, as well as differential transcript expression analyses between root, leaf, and pod tissues. We also develop microsatellite markers and analyze genetic diversity and population structure within a set of Indian germplasm accessions. One-hundred ninety-one million two hundred thirty-three thousand two hundred forty-two bp cleaned reads were assembled into 67,561 transcripts with mean length of 626 bp and N50 of 987 bp. Assembled sequences were annotated using BLASTX against public databases with over 80% of transcripts annotated. We identified 7,493 simple sequence repeat (SSR) motifs, including 787 polymorphic repeats between the parents of a mapping population. 134 SSRs from expressed sequenced tags (ESTs) were screened against 23 M. pruriens accessions from India, with 52 EST-SSRs retained after quality control. Population structure analysis using a Bayesian framework implemented in fastSTRUCTURE showed nearly similar groupings as with distance-based (neighbor-joining) and principal component analyses, with most of the accessions clustering per geographical origins. Pair-wise comparison of transcript expression in leaves, roots and pods identified 4,387 differentially expressed transcripts with the highest number occurring between roots and leaves. Differentially expressed transcripts were enriched with transcription factors and transcripts annotated as belonging to secondary metabolite pathways. The M
Full Text Available Areca catechu L. belongs to the Arecaceae family which comprises many economically important palms. The palm is a source of alkaloids and carotenoids. The lack of ample genetic information in public databases has been a constraint for the genetic improvement of arecanut. To gain molecular insight into the palm, high throughput RNA sequencing and de novo assembly of arecanut leaf transcriptome was undertaken in the present study. A total 56,321,907 paired end reads of 101 bp length consisting of 11.343 Gb nucleotides were generated. De novo assembly resulted in 48,783 good quality transcripts, of which 67% of transcripts could be annotated against NCBI non – redundant database. The Gene Ontology (GO analysis with UniProt database identified 9222 biological process, 11268 molecular function and 7574 cellular components GO terms. Large scale expression profiling through Fragments per Kilobase per Million mapped reads (FPKM showed major genes involved in different metabolic pathways of the plant. Metabolic pathway analysis of the assembled transcripts identified 124 plant related pathways. The transcripts related to carotenoid and alkaloid biosynthetic pathways had more number of reads and FPKM values suggesting higher expression of these genes. The arecanut transcript sequences generated in the study showed high similarity with coconut, oil palm and date palm sequences retrieved from public domains. We also identified 6853 genic SSR regions in the arecanut. The possible primers were designed for SSR detection and this would simplify the future efforts in genetic characterization of arecanut.
Migalska, M; Sebastian, A; Konczal, M; Kotlík, P; Radwan, J
The major histocompatibility complex (MHC) plays a central role in the adaptive immune response and is the most polymorphic gene family in vertebrates. Although high-throughput sequencing has increasingly been used for genotyping families of co-amplifying MHC genes, its potential to facilitate early steps in the characterisation of MHC variation in nonmodel organism has not been fully explored. In this study we evaluated the usefulness of de novo transcriptome assembly in characterisation of MHC sequence diversity. We found that although de novo transcriptome assembly of MHC I genes does not reconstruct sequences of individual alleles, it does allow the identification of conserved regions for PCR primer design. Using the newly designed primers, we characterised MHC I sequences in the bank vole. Phylogenetic analysis of the partial MHC I coding sequence (2-4 exons) of the bank vole revealed a lack of orthology to MHC I of other Cricetidae, consistent with the high gene turnover of this region. The diversity of expressed alleles was characterised using ultra-deep sequencing of the third exon that codes for the peptide-binding region of the MHC molecule. High allelic diversity was demonstrated, with 72 alleles found in 29 individuals. Interindividual variation in the number of expressed loci was found, with the number of alleles per individual ranging from 5 to 14. Strong signatures of positive selection were found for 8 amino acid sites, most of which are inferred to bind antigens in human MHC, indicating conservation of structure despite rapid sequence evolution.
Full Text Available To enrich the molecular data of Pinus bungeana Zucc. ex Endl. and study the regulating factors of different morphology controled by apical dominance. In this study, de novo assembly of transcriptome annotation was performed for two varieties of Pinus bungeana Zucc. ex Endl. that are obviously different in morphology. More than 147 million reads were produced, which were assembled into 88,092 unigenes. Based on a similarity search, 11,692 unigenes showed significant similarity to proteins from Picea sitchensis (Bong. Carr. From this collection of unigenes, a large number of molecular markers were identified, including 2829 simple sequence repeats (SSRs. A total of 158 unigenes expressed differently between two varieties, including 98 up-regulated and 60 down-regulated unigenes. Furthermore, among the differently expressed genes (DEGs, five genes which may impact the plant morphology were further validated by reverse transcription quantitative polymerase chain reaction (RT-qPCR. The five genes related to cytokinin oxidase/dehydrogenase (CKX, two-component response regulator ARR-A family (ARR-A, plant hormone signal transduction (AHP, and MADS-box transcription factors have a close relationship with apical dominance. This new dataset will be a useful resource for future genetic and genomic studies in Pinus bungeana Zucc. ex Endl.
Full Text Available The carmine spider mite (CSM, Tetranychus cinnabarinus, is an important pest mite in agriculture, because it can develop insecticide resistance easily. To gain valuable gene information and molecular basis for the future insecticide resistance study of CSM, the first transcriptome analysis of CSM was conducted. A total of 45,016 contigs and 25,519 unigenes were generated from the de novo transcriptome assembly, and 15,167 unigenes were annotated via BLAST querying against current databases, including nr, SwissProt, the Clusters of Orthologous Groups (COGs, Kyoto Encyclopedia of Genes and Genomes (KEGG and Gene Ontology (GO. Aligning the transcript to Tetranychus urticae genome, the 19255 (75.45% of the transcripts had significant (e-value <10-5 matches to T. urticae DNA genome, 19111 sequences matched to T. urticae proteome with an average protein length coverage of 42.55%. Core Eukaryotic Genes Mapping Approach (CEGMA analysis identified 435 core eukaryotic genes (CEGs in the CSM dataset corresponding to 95% coverage. Ten gene categories that relate to insecticide resistance in arthropod were generated from CSM transcriptome, including 53 P450-, 22 GSTs-, 23 CarEs-, 1 AChE-, 7 GluCls-, 9 nAChRs-, 8 GABA receptor-, 1 sodium channel-, 6 ATPase- and 12 Cyt b genes. We developed significant molecular resources for T. cinnabarinus putatively involved in insecticide resistance. The transcriptome assembly analysis will significantly facilitate our study on the mechanism of adapting environmental stress (including insecticide in CSM at the molecular level, and will be very important for developing new control strategies against this pest mite.
Hsu, Ju-Chun; Chien, Ting-Ying; Hu, Chia-Cheng; Chen, Mei-Ju May; Wu, Wen-Jer; Feng, Hai-Tung; Haymer, David S; Chen, Chien-Yu
Insecticide resistance has recently become a critical concern for control of many insect pest species. Genome sequencing and global quantization of gene expression through analysis of the transcriptome can provide useful information relevant to this challenging problem. The oriental fruit fly, Bactrocera dorsalis, is one of the world's most destructive agricultural pests, and recently it has been used as a target for studies of genetic mechanisms related to insecticide resistance. However, prior to this study, the molecular data available for this species was largely limited to genes identified through homology. To provide a broader pool of gene sequences of potential interest with regard to insecticide resistance, this study uses whole transcriptome analysis developed through de novo assembly of short reads generated by next-generation sequencing (NGS). The transcriptome of B. dorsalis was initially constructed using Illumina's Solexa sequencing technology. Qualified reads were assembled into contigs and potential splicing variants (isotigs). A total of 29,067 isotigs have putative homologues in the non-redundant (nr) protein database from NCBI, and 11,073 of these correspond to distinct D. melanogaster proteins in the RefSeq database. Approximately 5,546 isotigs contain coding sequences that are at least 80% complete and appear to represent B. dorsalis genes. We observed a strong correlation between the completeness of the assembled sequences and the expression intensity of the transcripts. The assembled sequences were also used to identify large numbers of genes potentially belonging to families related to insecticide resistance. A total of 90 P450-, 42 GST-and 37 COE-related genes, representing three major enzyme families involved in insecticide metabolism and resistance, were identified. In addition, 36 isotigs were discovered to contain target site sequences related to four classes of resistance genes. Identified sequence motifs were also analyzed to
Full Text Available Locusts exhibit remarkable density-dependent phenotype (phase changes from the solitary to the gregarious, making them one of the most destructive agricultural pests. This phenotype polyphenism arises from a single genome and diverse transcriptomes in different conditions. Here we report a de novo transcriptome for the migratory locust and a comprehensive, representative core gene set. We carried out assembly of 21.5 Gb Illumina reads, generated 72,977 transcripts with N50 2,275 bp and identified 11,490 locust protein-coding genes. Comparative genomics analysis with eight other sequenced insects was carried out to identify the genomic divergence between hemimetabolous and holometabolous insects for the first time and 18 genes relevant to development was found. We further utilized the quantitative feature of RNA-seq to measure and compare gene expression among libraries. We first discovered how divergence in gene expression between two phases progresses as locusts develop and identified 242 transcripts as candidates for phase marker genes. Together with the detailed analysis of deep sequencing data of the 4(th instar, we discovered a phase-dependent divergence of biological investment in the molecular level. Solitary locusts have higher activity in biosynthetic pathways while gregarious locusts show higher activity in environmental interaction, in which genes and pathways associated with regulation of neurotransmitter activities, such as neurotransmitter receptors, synthetase, transporters, and GPCR signaling pathways, are strongly involved. Our study, as the largest de novo transcriptome to date, with optimization of sequencing and assembly strategy, can further facilitate the application of de novo transcriptome. The locust transcriptome enriches genetic resources for hemimetabolous insects and our understanding of the origin of insect metamorphosis. Most importantly, we identified genes and pathways that might be involved in locust development
Qiu, Zhongying; Liu, Fei; Lu, Huimeng; Huang, Yuan
The pygmy grasshopper Tetrix japonica is a common insect distributed throughout the world, and it has the potential for use in studies of body colour polymorphism, genomics and the biology of Tetrigoidea (Insecta: Orthoptera). However, limited biological information is available for this insect. Here, we conducted a de novo transcriptome study of adult and larval T. japonica to provide a better understanding of its gene expression and develop genomic resources for future work. We sequenced and explored the characteristics of the de novo transcriptome of T. japonica using Illumina HiSeq 2000 platform. A total of 107 608 206 paired-end clean reads were assembled into 61 141 unigenes using the trinity software; the mean unigene size was 771 bp, and the N50 length was 1238 bp. A total of 29 225 unigenes were functionally annotated to the NCBI nonredundant protein sequences (Nr), NCBI nonredundant nucleotide sequences (Nt), a manually annotated and reviewed protein sequence database (Swiss-Prot), Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) databases. A large number of putative genes that are potentially involved in pigment pathways, juvenile hormone (JH) metabolism and signalling pathways were identified in the T. japonica transcriptome. Additionally, 165 769 and 156 796 putative single nucleotide polymorphisms occurred in the adult and larvae transcriptomes, respectively, and a total of 3162 simple sequence repeats were detected in this assembly. This comprehensive transcriptomic data for T. japonica will provide a usable resource for gene predictions, signalling pathway investigations and molecular marker development for this species and other pygmy grasshoppers. © 2016 John Wiley & Sons Ltd.
Full Text Available Cucurbita pepo (squash, pumpkin, gourd, a worldwide-cultivated vegetable of American origin, is extremely variable in fruit characteristics. However, the information associated with genes and genetic markers for pumpkin is very limited. In order to identify new genes and to develop genetic markers, we performed a transcriptome analysis (RNA-Seq of two contrasting pumpkin cultivars. Leaves and female flowers of cultivars, ‘Big Moose’ with large round fruits and ‘Munchkin’ with small round fruits, were harvested for total RNA extraction. We obtained a total of 6 GB (Big Moose; http://www.ncbi.nlm.nih.gov/Traces/sra/?run=SRR3056882 and 5 GB (Munchkin; http://www.ncbi.nlm.nih.gov/Traces/sra/?run=SRR3056883 sequence data (NCBI SRA database SRX1502732 and SRX1502735, respectively, which correspond to 18,055,786 and 14,824,292 150-base reads. After quality assessment, the clean sequences where 17,995,932 and 14,774,486 respectively. The numbers of total transcripts for ‘Big Moose’ and ‘Munchkin’ were 84,727 and 68,051, respectively. TransDecoder identified possible coding regions in assembled transcripts. This study provides transcriptome data for two contrasting pumpkin cultivars, which might be useful for genetic marker development and comparative transcriptome analyses. Keywords: RNA-Seq, Pumpkin, Contrasting cultivars, Cucurbita pepo
Jayakumar, Vasanthan; Damodaran, Anand C.; Rao, Sudha Narayana; Katta, Mohan A. V. S. K.; Gopinathan, Sreeja; Sarma, Santosh Prasad; Senthilkumar, Vanitha; Niranjan, Vidya; Gopinath, Ashok; Mugasimangalam, Raja C.
Herbal remedies are increasingly being recognised in recent years as alternative medicine for a number of diseases including cancer. Curcuma longa L., commonly known as turmeric is used as a culinary spice in India and in many Asian countries has been attributed to lower incidences of gastrointestinal cancers. Curcumin, a secondary metabolite isolated from the rhizomes of this plant has been shown to have significant anticancer properties, in addition to antimalarial and antioxidant effects. We sequenced the transcriptome of the rhizome of the 3 varieties of Curcuma longa L. using Illumina reversible dye terminator sequencing followed by de novo transcriptome assembly. Multiple databases were used to obtain a comprehensive annotation and the transcripts were functionally classified using GO, KOG and PlantCyc. Special emphasis was given for annotating the secondary metabolite pathways and terpenoid biosynthesis pathways. We report for the first time, the presence of transcripts related to biosynthetic pathways of several anti-cancer compounds like taxol, curcumin, and vinblastine in addition to anti-malarial compounds like artemisinin and acridone alkaloids, emphasizing turmeric's importance as a highly potent phytochemical. Our data not only provides molecular signatures for several terpenoids but also a comprehensive molecular resource for facilitating deeper insights into the transcriptome of C. longa. PMID:23468859
Ramasamy S Annadurai
Full Text Available Herbal remedies are increasingly being recognised in recent years as alternative medicine for a number of diseases including cancer. Curcuma longa L., commonly known as turmeric is used as a culinary spice in India and in many Asian countries has been attributed to lower incidences of gastrointestinal cancers. Curcumin, a secondary metabolite isolated from the rhizomes of this plant has been shown to have significant anticancer properties, in addition to antimalarial and antioxidant effects. We sequenced the transcriptome of the rhizome of the 3 varieties of Curcuma longa L. using Illumina reversible dye terminator sequencing followed by de novo transcriptome assembly. Multiple databases were used to obtain a comprehensive annotation and the transcripts were functionally classified using GO, KOG and PlantCyc. Special emphasis was given for annotating the secondary metabolite pathways and terpenoid biosynthesis pathways. We report for the first time, the presence of transcripts related to biosynthetic pathways of several anti-cancer compounds like taxol, curcumin, and vinblastine in addition to anti-malarial compounds like artemisinin and acridone alkaloids, emphasizing turmeric's importance as a highly potent phytochemical. Our data not only provides molecular signatures for several terpenoids but also a comprehensive molecular resource for facilitating deeper insights into the transcriptome of C. longa.
Annadurai, Ramasamy S; Neethiraj, Ramprasad; Jayakumar, Vasanthan; Damodaran, Anand C; Rao, Sudha Narayana; Katta, Mohan A V S K; Gopinathan, Sreeja; Sarma, Santosh Prasad; Senthilkumar, Vanitha; Niranjan, Vidya; Gopinath, Ashok; Mugasimangalam, Raja C
Herbal remedies are increasingly being recognised in recent years as alternative medicine for a number of diseases including cancer. Curcuma longa L., commonly known as turmeric is used as a culinary spice in India and in many Asian countries has been attributed to lower incidences of gastrointestinal cancers. Curcumin, a secondary metabolite isolated from the rhizomes of this plant has been shown to have significant anticancer properties, in addition to antimalarial and antioxidant effects. We sequenced the transcriptome of the rhizome of the 3 varieties of Curcuma longa L. using Illumina reversible dye terminator sequencing followed by de novo transcriptome assembly. Multiple databases were used to obtain a comprehensive annotation and the transcripts were functionally classified using GO, KOG and PlantCyc. Special emphasis was given for annotating the secondary metabolite pathways and terpenoid biosynthesis pathways. We report for the first time, the presence of transcripts related to biosynthetic pathways of several anti-cancer compounds like taxol, curcumin, and vinblastine in addition to anti-malarial compounds like artemisinin and acridone alkaloids, emphasizing turmeric's importance as a highly potent phytochemical. Our data not only provides molecular signatures for several terpenoids but also a comprehensive molecular resource for facilitating deeper insights into the transcriptome of C. longa.
Full Text Available BACKGROUND: The Indo-Pacific humpback dolphin (Sousa chinensis, a marine mammal species inhabited in the waters of Southeast Asia, South Africa and Australia, has attracted much attention because of the dramatic decline in population size in the past decades, which raises the concern of extinction. So far, this species is poorly characterized at molecular level due to little sequence information available in public databases. Recent advances in large-scale RNA sequencing provide an efficient approach to generate abundant sequences for functional genomic analyses in the species with un-sequenced genomes. PRINCIPAL FINDINGS: We performed a de novo assembly of the Indo-Pacific humpback dolphin leucocyte transcriptome by Illumina sequencing. 108,751 high quality sequences from 47,840,388 paired-end reads were generated, and 48,868 and 46,587 unigenes were functionally annotated by BLAST search against the NCBI non-redundant and Swiss-Prot protein databases (E-value<10(-5, respectively. In total, 16,467 unigenes were clustered into 25 functional categories by searching against the COG database, and BLAST2GO search assigned 37,976 unigenes to 61 GO terms. In addition, 36,345 unigenes were grouped into 258 KEGG pathways. We also identified 9,906 simple sequence repeats and 3,681 putative single nucleotide polymorphisms as potential molecular markers in our assembled sequences. A large number of unigenes were predicted to be involved in immune response, and many genes were predicted to be relevant to adaptive evolution and cetacean-specific traits. CONCLUSION: This study represented the first transcriptome analysis of the Indo-Pacific humpback dolphin, an endangered species. The de novo transcriptome analysis of the unique transcripts will provide valuable sequence information for discovery of new genes, characterization of gene expression, investigation of various pathways and adaptive evolution, as well as identification of genetic markers.
Gui, Duan; Jia, Kuntong; Xia, Jia; Yang, Lili; Chen, Jialin; Wu, Yuping; Yi, Meisheng
The Indo-Pacific humpback dolphin (Sousa chinensis), a marine mammal species inhabited in the waters of Southeast Asia, South Africa and Australia, has attracted much attention because of the dramatic decline in population size in the past decades, which raises the concern of extinction. So far, this species is poorly characterized at molecular level due to little sequence information available in public databases. Recent advances in large-scale RNA sequencing provide an efficient approach to generate abundant sequences for functional genomic analyses in the species with un-sequenced genomes. We performed a de novo assembly of the Indo-Pacific humpback dolphin leucocyte transcriptome by Illumina sequencing. 108,751 high quality sequences from 47,840,388 paired-end reads were generated, and 48,868 and 46,587 unigenes were functionally annotated by BLAST search against the NCBI non-redundant and Swiss-Prot protein databases (E-valueIndo-Pacific humpback dolphin, an endangered species. The de novo transcriptome analysis of the unique transcripts will provide valuable sequence information for discovery of new genes, characterization of gene expression, investigation of various pathways and adaptive evolution, as well as identification of genetic markers.
Full Text Available The molecular mechanisms that drive the development of the endangered fossil fish species Acipenser baeri are difficult to study due to the lack of genomic data. Recent advances in sequencing technologies and the reducing cost of sequencing offer exclusive opportunities for exploring important molecular mechanisms underlying specific biological processes. This manuscript describes the large scale sequencing and analyses of mRNA from Acipenser baeri collected at five development time points using the Illumina Hiseq2000 platform. The sequencing reads were de novo assembled and clustered into 278167 unigenes, of which 57346 (20.62% had 45837 known homologues proteins in Uniprot protein databases while 11509 proteins matched with at least one sequence of assembled unigenes. The remaining 79.38% of unigenes could stand for non-coding unigenes or unigenes specific to A. baeri. A number of 43062 unigenes were annotated into functional categories via Gene Ontology (GO annotation whereas 29526 unigenes were associated with 329 pathways by mapping to KEGG database. Subsequently, 3479 differentially expressed genes were scanned within developmental stages and clustered into 50 gene expression profiles. Genes preferentially expressed at each stage were also identified. Through GO and KEGG pathway enrichment analysis, relevant physiological variations during the early development of A. baeri could be better cognized. Accordingly, the present study gives insights into the transcriptome profile of the early development of A. baeri, and the information contained in this large scale transcriptome will provide substantial references for A. baeri developmental biology and promote its aquaculture research.
Full Text Available Despite the ecological and economic importance of European beech (Fagus sylvatica L. genomic resources of this species are still limited. This hampers an understanding of the molecular basis of adaptation to stress. Since beech will most likely be threatened by the consequences of climate change, an understanding of adaptive processes to climate change-related drought stress is of major importance. Here, we used RNA-seq to provide the first drought stress-related transcriptome of beech. In a drought stress trial with beech saplings, 50 samples were taken for RNA extraction at five points in time during a soil desiccation experiment. De novo transcriptome assembly and analysis of differential gene expression revealed 44,335 contigs, and 662 differentially expressed genes between the stress and normally watered control group. Gene expression was specific to the different time points, and only five genes were significantly differentially expressed between the stress and control group on all five sampling days. GO term enrichment showed that mostly genes involved in lipid- and homeostasis-related processes were upregulated, whereas genes involved in oxidative stress response were downregulated in the stressed seedlings. This study gives first insights into the genomic drought stress response of European beech, and provides new genetic resources for adaptation research in this species.
Full Text Available Guinea grass (Panicum maximum Jacq. is a tropical African grass often used to feed beef cattle, which is an important economic activity in Brazil. Brazil is the leader in global meat exportation because of its exclusively pasture-raised bovine herds. Guinea grass also has potential uses in bioenergy production due to its elevated biomass generation through the C4 photosynthesis pathway. We generated approximately 13 Gb of data from Illumina sequencing of P. maximum leaves. Four different genotypes were sequenced, and the combined reads were assembled de novo into 38,192 unigenes and annotated; approximately 63% of the unigenes had homology to other proteins in the NCBI non-redundant protein database. Functional classification through COG (Clusters of Orthologous Groups, GO (Gene Ontology and KEGG (Kyoto Encyclopedia of Genes and Genomes analyses showed that the unigenes from Guinea grass leaves are involved in a wide range of biological processes and metabolic pathways, including C4 photosynthesis and lignocellulose generation, which are important for cattle grazing and bioenergy production. The most abundant transcripts were involved in carbon fixation, photosynthesis, RNA translation and heavy metal cellular homeostasis. Finally, we identified a number of potential molecular markers, including 5,035 microsatellites (SSRs and 346,456 single nucleotide polymorphisms (SNPs. To the best of our knowledge, this is the first study to characterize the complete leaf transcriptome of P. maximum using high-throughput sequencing. The biological information provided here will aid in gene expression studies and marker-assisted selection-based breeding research in tropical grasses.
Chu, Zongli; Chen, Junying; Sun, Junyan; Dong, Zhongdong; Yang, Xia; Wang, Ying; Xu, Haixia; Zhang, Xiaoke; Chen, Feng; Cui, Dangqun
During asexual reproduction the embryogenic callus can differentiate into a new plantlet, offering great potential for fostering in vitro culture efficiency in plants. The immature embryos (IMEs) of wheat (Triticum aestivum L.) are more easily able to generate embryogenic callus than mature embryos (MEs). To understand the molecular process of embryogenic callus formation in wheat, de novo transcriptome sequencing was used to generate transcriptome sequences from calli derived from IMEs and MEs after 3d, 6d, or 15d of culture (DC). In total, 155 million high quality paired-end reads were obtained from the 6 cDNA libraries. Our de novo assembly generated 142,221 unigenes, of which 59,976 (42.17%) were annotated with a significant Blastx against nr, Pfam, Swissprot, KOG, KEGG, GO and COG/KOG databases. Comparative transcriptome analysis indicated that a total of 5194 differentially expressed genes (DEGs) were identified in the comparisons of IME vs. ME at the three stages, including 3181, 2085 and 1468 DEGs at 3, 6 and 15 DC, respectively. Of them, 283 overlapped in all the three comparisons. Furthermore, 4731 DEGs were identified in the comparisons between stages in IMEs and MEs. Functional analysis revealed that 271transcription factor (TF) genes (10 overlapped in all 3 comparisons of IME vs. ME) and 346 somatic embryogenesis related genes (SSEGs; 35 overlapped in all 3 comparisons of IME vs. ME) were differentially expressed in at least one comparison of IME vs. ME. In addition, of the 283 overlapped DEGs in the 3 comparisons of IME vs. ME, excluding the SSEGs and TFs, 39 possessed a higher rate of involvement in biological processes relating to response to stimuli, in multi-organism processes, reproductive processes and reproduction. Furthermore, 7 were simultaneously differentially expressed in the 2 comparisons between the stages in IMEs, but not MEs, suggesting that they may be related to embryogenic callus formation. The expression levels of genes, which
Optimizing Hybrid de Novo Transcriptome Assembly and Extending Genomic Resources for Giant Freshwater Prawns (Macrobrachium rosenbergii: The Identification of Genes and Markers Associated with Reproduction
Full Text Available The giant freshwater prawn, Macrobrachium rosenbergii, a sexually dimorphic decapod crustacean is currently the world’s most economically important cultured freshwater crustacean species. Despite its economic importance, there is currently a lack of genomic resources available for this species, and this has limited exploration of the molecular mechanisms that control the M. rosenbergii sex-differentiation system more widely in freshwater prawns. Here, we present the first hybrid transcriptome from M. rosenbergii applying RNA-Seq technologies directed at identifying genes that have potential functional roles in reproductive-related traits. A total of 13,733,210 combined raw reads (1720 Mbp were obtained from Ion-Torrent PGM and 454 FLX. Bioinformatic analyses based on three state-of-the-art assemblers, the CLC Genomic Workbench, Trans-ABySS, and Trinity, that use single and multiple k-mer methods respectively, were used to analyse the data. The influence of multiple k-mers on assembly performance was assessed to gain insight into transcriptome assembly from short reads. After optimisation, de novo assembly resulted in 44,407 contigs with a mean length of 437 bp, and the assembled transcripts were further functionally annotated to detect single nucleotide polymorphisms and simple sequence repeat motifs. Gene expression analysis was also used to compare expression patterns from ovary and testis tissue libraries to identify genes with potential roles in reproduction and sex differentiation. The large transcript set assembled here represents the most comprehensive set of transcriptomic resources ever developed for reproduction traits in M. rosenbergii, and the large number of genetic markers predicted should constitute an invaluable resource for future genetic research studies on M. rosenbergii and can be applied more widely on other freshwater prawn species in the genus Macrobrachium.
Bharat Bhusan Patnaik
Full Text Available The freshwater mussel Cristaria plicata (Bivalvia: Eulamellibranchia: Unionidae, is an economically important species in molluscan aquaculture due to its use in pearl farming. The species have been listed as endangered in South Korea due to the loss of natural habitats caused by anthropogenic activities. The decreasing population and a lack of genomic information on the species is concerning for environmentalists and conservationists. In this study, we conducted a de novo transcriptome sequencing and annotation analysis of C. plicata using Illumina HiSeq 2500 next-generation sequencing (NGS technology, the Trinity assembler, and bioinformatics databases to prepare a sustainable resource for the identification of candidate genes involved in immunity, defense, and reproduction.The C. plicata transcriptome analysis included a total of 286,152,584 raw reads and 281,322,837 clean reads. The de novo assembly identified a total of 453,931 contigs and 374,794 non-redundant unigenes with average lengths of 731.2 and 737.1 bp, respectively. Furthermore, 100% coverage of C. plicata mitochondrial genes within two unigenes supported the quality of the assembler. In total, 84,274 unigenes showed homology to entries in at least one database, and 23,246 unigenes were allocated to one or more Gene Ontology (GO terms. The most prominent GO biological process, cellular component, and molecular function categories (level 2 were cellular process, membrane, and binding, respectively. A total of 4,776 unigenes were mapped to 123 biological pathways in the KEGG database. Based on the GO terms and KEGG annotation, the unigenes were suggested to be involved in immunity, stress responses, sex-determination, and reproduction. A total of 17,251 cDNA simple sequence repeats (cSSRs were identified from 61,141 unigenes (size of >1 kb with the most abundant being dinucleotide repeats.This dataset represents the first transcriptome analysis of the endangered mollusc, C. plicata
Donald M. Bryant
Full Text Available Mammals have extremely limited regenerative capabilities; however, axolotls are profoundly regenerative and can replace entire limbs. The mechanisms underlying limb regeneration remain poorly understood, partly because the enormous and incompletely sequenced genomes of axolotls have hindered the study of genes facilitating regeneration. We assembled and annotated a de novo transcriptome using RNA-sequencing profiles for a broad spectrum of tissues that is estimated to have near-complete sequence information for 88% of axolotl genes. We devised expression analyses that identified the axolotl orthologs of cirbp and kazald1 as highly expressed and enriched in blastemas. Using morpholino anti-sense oligonucleotides, we find evidence that cirbp plays a cytoprotective role during limb regeneration whereas manipulation of kazald1 expression disrupts regeneration. Our transcriptome and annotation resources greatly complement previous transcriptomic studies and will be a valuable resource for future research in regenerative biology.
Huth, Troy J; Place, Sean P
The notothenioids comprise a diverse group of fishes that rapidly radiated after isolation by the Antarctic Circumpolar Current approximately 14-25 million years ago. Given that evolutionary adaptation has led to finely tuned traits with narrow physiological limits in these organisms, this system provides a unique opportunity to examine physiological trade-offs and limits of adaptive responses to environmental perturbation. As such, notothenioids have a rich history with respect to studies attempting to understand the vulnerability of polar ecosystems to the negative impacts associated with global climate change. Unfortunately, despite being a model system for understanding physiological adaptations to extreme environments, we still lack fundamental molecular tools for much of the Nototheniidae family. Specimens of the emerald notothen, Trematomus bernacchii, were acclimated for 28 days in flow-through seawater tanks maintained near ambient seawater temperatures (-1.5°C) or at +4°C. Following acclimation, tissue specific cDNA libraries for liver, gill and brain were created by pooling RNA from n = 5 individuals per temperature treatment. The tissue specific libraries were bar-coded and used for 454 pyrosequencing, which yielded over 700 thousand sequencing reads. A de novo assembly and annotation of these reads produced a functional transcriptome library of T. bernacchii containing 30,107 unigenes, 13,003 of which possessed significant homology to a known protein product. Digital gene expression analysis of these extremely cold adapted fish reinforced the loss of an inducible heat shock response and allowed the preliminary exploration into other elements of the cellular stress response. Preliminary exploration of the transcriptome of T. bernacchii under elevated temperatures enabled a semi-quantitative comparison to prior studies aimed at characterizing the thermal response of this endemic fish whose size, abundance and distribution has established it as a
Lu, Qi-Huan; Wang, Ya-Qi; Song, Jin-Nan; Yang, Hong-Bing
Common buckwheat (F. esculentum), annually herbaceous crop, is prevalent in people's daily life with the increasing development of economics. Compared with wheat, it is highly praised with high content of rutin and flavonoid. Common buckwheat is recognized as healthy food with good taste, and the product price of which such as noodles, flour, bread and so on are higher than wheat, and the seeds of which are bigger than that of tartary buckwheat, so if common buckwheat are planted more widely, people will spend less money on this healthy and delicious food. However, soil salinity has been a giant problem for agriculture production. The cultivation of salt tolerant crop varieties is an effective way to make full use of saline alkali land, and the highest salinity that the common buckwheat can sow is at 6.0%, so we chose 100 mM as the concentration of NaCl for treatment. Then we conducted transcriptome comparison between control and treatment groups. Potential regulatory genes related salt stress in common buckwheat were identified. A total of 29.36 million clean reads were produced via an illumina sequencing approach. We de novo assembled these reads into a transcriptome dataset containing 43,772 unigenes with N50 length of 1778 bp. A total of 26,672 unigenes could be found matches in public databases. GO, KEGG and Swiss-Prot classification suggested the enrichment of these unigenes in 47 sub-categories, 25 KOG and 129 pathways, respectively. We got 385 differentially expressed genes (DEGs) after comparing the transcriptome data between salt treatment and control groups. There are some genes encoded for responsing to stimulus, cell killing, metabolic process, signaling, multi-organism process, growth and cellular process might be relevant to salt stress in common buckwheat, which will provide a valuable references for the study on mechanism of salt tolerance and will be used as a genetic information for cultivating strong salt tolerant common buckwheat varieties in
Ranjan, Aashish; Ichihashi, Yasunori; Farhi, Moran; Zumstein, Kristina; Townsley, Brad; David-Schwartz, Rakefet; Sinha, Neelima R
Parasitic flowering plants are one of the most destructive agricultural pests and have major impact on crop yields throughout the world. Being dependent on finding a host plant for growth, parasitic plants penetrate their host using specialized organs called haustoria. Haustoria establish vascular connections with the host, which enable the parasite to steal nutrients and water. The underlying molecular and developmental basis of parasitism by plants is largely unknown. In order to investigate the process of parasitism, RNAs from different stages (i.e. seed, seedling, vegetative strand, prehaustoria, haustoria, and flower) were used to de novo assemble and annotate the transcriptome of the obligate plant stem parasite dodder (Cuscuta pentagona). The assembled transcriptome was used to dissect transcriptional dynamics during dodder development and parasitism and identified key gene categories involved in the process of plant parasitism. Host plant infection is accompanied by increased expression of parasite genes underlying transport and transporter categories, response to stress and stimuli, as well as genes encoding enzymes involved in cell wall modifications. By contrast, expression of photosynthetic genes is decreased in the dodder infective stages compared with normal stem. In addition, genes relating to biosynthesis, transport, and response of phytohormones, such as auxin, gibberellins, and strigolactone, were differentially expressed in the dodder infective stages compared with stems and seedlings. This analysis sheds light on the transcriptional changes that accompany plant parasitism and will aid in identifying potential gene targets for use in controlling the infestation of crops by parasitic weeds. © 2014 American Society of Plant Biologists. All Rights Reserved.
Full Text Available De novo assembly of the genome of a species is essential in the absence of a reference genome sequence. Many scalable assembly algorithms use the de Bruijn graph (DBG paradigm to reconstruct genomes, where a table of subsequences of a certain length is derived from the reads, and their overlaps are analyzed to assemble sequences. Despite longer subsequences unlocking longer genomic features for assembly, associated increase in compute resources limits the practicability of DBG over other assembly archetypes already designed for longer reads. Here, we revisit the DBG paradigm to adapt it to the changing sequencing technology landscape and introduce three data structure designs for spaced seeds in the form of paired subsequences. These data structures address memory and run time constraints imposed by longer reads. We observe that when a fixed distance separates seed pairs, it provides increased sequence specificity with increased gap length. Further, we note that Bloom filters would be suitable to implicitly store spaced seeds and be tolerant to sequencing errors. Building on this concept, we describe a data structure for tracking the frequencies of observed spaced seeds. These data structure designs will have applications in genome, transcriptome and metagenome assemblies, and read error correction.
Rhee, Sun-Ju; Kwon, Taehyung; Seo, Minseok; Jang, Yoon Jeong; Sim, Tae Yong; Cho, Seoae; Han, Sang-Wook; Lee, Gung Pyo
The whole-genome sequence of watermelon (Citrullus lanatus (Thunb.) Matsum. & Nakai), a valuable horticultural crop worldwide, was released in 2013. Here, we compared a de novo-based approach (DBA) to a reference-based approach (RBA) using RNA-seq data, to aid in efforts to improve the annotation of the watermelon reference genome and to obtain biological insight into male-sterility in watermelon. We applied these techniques to available data from two watermelon lines: the male-sterile line DAH3615-MS and the male-fertile line DAH3615. Using DBA, we newly annotated 855 watermelon transcripts, and found gene functional clusters predicted to be related to stimulus responses, nucleic acid binding, transmembrane transport, homeostasis, and Golgi/vesicles. Among the DBA-annotated transcripts, 138 de novo-exclusive differentially-expressed genes (DEDEGs) related to male sterility were detected. Out of 33 randomly selected newly annotated transcripts and DEDEGs, 32 were validated by RT-qPCR. This study demonstrates the usefulness and reliability of the de novo transcriptome assembly in watermelon, and provides new insights for researchers exploring transcriptional blueprints with regard to the male sterility.
Hornett Emily A
Full Text Available Abstract Background How well does RNA-Seq data perform for quantitative whole gene expression analysis in the absence of a genome? This is one unanswered question facing the rapidly growing number of researchers studying non-model species. Using Homo sapiens data and resources, we compared the direct mapping of sequencing reads to predicted genes from the genome with mapping to de novo transcriptomes assembled from RNA-Seq data. Gene coverage and expression analysis was further investigated in the non-model context by using increasingly divergent genomic reference species to group assembled contigs by unique genes. Results Eight transcriptome sets, composed of varying amounts of Illumina and 454 data, were assembled and assessed. Hybrid 454/Illumina assemblies had the highest transcriptome and individual gene coverage. Quantitative whole gene expression levels were highly similar between using a de novo hybrid assembly and the predicted genes as a scaffold, although mapping to the de novo transcriptome assembly provided data on fewer genes. Using non-target species as reference scaffolds does result in some loss of sequence and expression data, and bias and error increase with evolutionary distance. However, within a 100 million year window these effect sizes are relatively small. Conclusions Predicted gene sets from sequenced genomes of related species can provide a powerful method for grouping RNA-Seq reads and annotating contigs. Gene expression results can be produced that are similar to results obtained using gene models derived from a high quality genome, though biased towards conserved genes. Our results demonstrate the power and limitations of conducting RNA-Seq in non-model species.
Juan A. Galarza
Full Text Available In this paper we report the public availability of transcriptome resources for the aposematic wood tiger moth (Parasemia plantaginis. A comprehensive assembly methods, quality statistics, and annotation are provided. This reference transcriptome may serve as a useful resource for investigating functional gene activity in aposematic Lepidopteran species. All data is freely available at the European Nucleotide Archive (http://www.ebi.ac.uk/ena under study accession number: PRJEB14172.
Sun, Xiudong; Zhou, Shumei; Meng, Fanlu; Liu, Shiqi
Garlic is widely used as a spice throughout the world for the culinary value of its flavor and aroma, which are created by the chemical transformation of a series of organic sulfur compounds. To analyze the transcriptome of Allium sativum and discover the genes involved in sulfur metabolism, cDNAs derived from the total RNA of Allium sativum buds were analyzed by Illumina sequencing. Approximately 26.67 million 90 bp paired-end clean reads were achieved in two libraries. A total of 127,933 unigenes were generated by de novo assembly and were compared with the sequences in public databases. Of these, 45,286 unigenes had significant hits to the sequences in the Nr database, 29,514 showed significant similarity to known proteins in the Swiss-Prot database and, 20,706 and 21,952 unigenes had significant similarity to existing sequences in the KEGG and COG databases, respectively. Moreover, genes involved in organic sulfur biosynthesis were identified. These unigenes data will provide the foundation for research on gene expression, genomics and functional genomics in Allium sativum. Key message The obtained unigenes will provide the foundation for research on functional genomics in Allium sativum and its closely related species, and fill the gap of the existing plant EST database.
Huang, Ya-Yi; Lee, Chueh-Pai; Fu, Jason L; Chang, Bill Chia-Han; Matzke, Antonius J M; Matzke, Marjori
Coconut palm (Cocos nucifera) is a symbol of the tropics and a source of numerous edible and nonedible products of economic value. Despite its nutritional and industrial significance, coconut remains under-represented in public repositories for genomic and transcriptomic data. We report de novo transcript assembly from RNA-seq data and analysis of gene expression in seed tissues (embryo and endosperm) and leaves of a dwarf coconut variety. Assembly of 10 GB sequencing data for each tissue resulted in 58,211 total unigenes in embryo, 61,152 in endosperm, and 33,446 in leaf. Within each unigene pool, 24,857 could be annotated in embryo, 29,731 could be annotated in endosperm, and 26,064 could be annotated in leaf. A KEGG analysis identified 138, 138, and 139 pathways, respectively, in transcriptomes of embryo, endosperm, and leaf tissues. Given the extraordinarily large size of coconut seeds and the importance of small RNA-mediated epigenetic regulation during seed development in model plants, we used homology searches to identify putative homologs of factors required for RNA-directed DNA methylation in coconut. The findings suggest that RNA-directed DNA methylation is important during coconut seed development, particularly in maturing endosperm. This dataset will expand the genomics resources available for coconut and provide a foundation for more detailed analyses that may assist molecular breeding strategies aimed at improving this major tropical crop. Copyright © 2014 Huang et al.
Background Salamanders are unique among vertebrates in their ability to completely regenerate amputated limbs through the mediation of blastema cells located at the stump ends. This regeneration is nerve-dependent because blastema formation and regeneration does not occur after limb denervation. To obtain the genomic information of blastema tissues, de novo transcriptomes from both blastema tissues and denervated stump ends of Ambystoma mexicanum (axolotls) 14 days post-amputation were sequenced and compared using Solexa DNA sequencing. Results The sequencing done for this study produced 40,688,892 reads that were assembled into 307,345 transcribed sequences. The N50 of transcribed sequence length was 562 bases. A similarity search with known proteins identified 39,200 different genes to be expressed during limb regeneration with a cut-off E-value exceeding 10-5. We annotated assembled sequences by using gene descriptions, gene ontology, and clusters of orthologous group terms. Targeted searches using these annotations showed that the majority of the genes were in the categories of essential metabolic pathways, transcription factors and conserved signaling pathways, and novel candidate genes for regenerative processes. We discovered and confirmed numerous sequences of the candidate genes by using quantitative polymerase chain reaction and in situ hybridization. Conclusion The results of this study demonstrate that de novo transcriptome sequencing allows gene expression analysis in a species lacking genome information and provides the most comprehensive mRNA sequence resources for axolotls. The characterization of the axolotl transcriptome can help elucidate the molecular mechanisms underlying blastema formation during limb regeneration. PMID:23815514
Full Text Available Pigeonpea [Cajanus cajan (L. Millsp.] is a heat and drought resilient legume crop grown mostly in Asia and Africa. Pigeonpea is affected by various biotic (diseases and insect pests and abiotic stresses (salinity and water logging which limit the yield potential of this crop. However, resistance to all these constraints is not readily available in the cultivated genotypes and some of the wild relatives have been found to withstand these resistances. Thus, the utilization of crop wild relatives (CWR in pigeonpea breeding has been effective in conferring resistance, quality and breeding efficiency traits to this crop. Bud and leaf tissue of Cajanus scarabaeoides, a wild relative of pigeon pea were used for transcriptome profiling. Approximately 30 million clean reads filtered from raw reads by removal of adaptors, ambiguous reads and low-quality reads (3.02 gigabase pairs were generated by Illumina paired-end RNA-seq technology. All of these clean reads were pooled and assembled de novo into 1,17,007 transcripts using the Trinity. Finally, a total of 98,664 unigenes were derived with mean length of 396 bp and N50 values of 1393. The assembly produced significant mapping results (73.68% in BLASTN searches of the Glycine max CDS sequence database (Ensembl. Further, uniprot database of Viridiplantae was used for unigene annotation; 81,799 of 98,664 (82.90% unigenes were finally annotated with gene descriptions or conserved protein domains. Further, a total of 23,475 SSRs were identified in 27,321 unigenes. This data will provide useful information for mining of functionally important genes and SSR markers for pigeonpea improvement.
Full Text Available The Canadian beaver (Castor canadensis is the largest indigenous rodent in North America. We report a draft annotated assembly of the beaver genome, the first for a large rodent and the first mammalian genome assembled directly from uncorrected and moderate coverage (< 30 × long reads generated by single-molecule sequencing. The genome size is 2.7 Gb estimated by k-mer analysis. We assembled the beaver genome using the new Canu assembler optimized for noisy reads. The resulting assembly was refined using Pilon supported by short reads (80 × and checked for accuracy by congruency against an independent short read assembly. We scaffolded the assembly using the exon–gene models derived from 9805 full-length open reading frames (FL-ORFs constructed from the beaver leukocyte and muscle transcriptomes. The final assembly comprised 22,515 contigs with an N50 of 278,680 bp and an N50-scaffold of 317,558 bp. Maximum contig and scaffold lengths were 3.3 and 4.2 Mb, respectively, with a combined scaffold length representing 92% of the estimated genome size. The completeness and accuracy of the scaffold assembly was demonstrated by the precise exon placement for 91.1% of the 9805 assembled FL-ORFs and 83.1% of the BUSCO (Benchmarking Universal Single-Copy Orthologs gene set used to assess the quality of genome assemblies. Well-represented were genes involved in dentition and enamel deposition, defining characteristics of rodents with which the beaver is well-endowed. The study provides insights for genome assembly and an important genomics resource for Castoridae and rodent evolutionary biology.
Safikhani, Zhaleh; Sadeghi, Mehdi; Pezeshk, Hamid; Eslahchi, Changiz
Recent advances in the sequencing technologies have provided a handful of RNA-seq datasets for transcriptome analysis. However, reconstruction of full-length isoforms and estimation of the expression level of transcripts with a low cost are challenging tasks. We propose a novel de novo method named SSP that incorporates interval integer linear programming to resolve alternatively spliced isoforms and reconstruct the whole transcriptome from short reads. Experimental results show that SSP is fast and precise in determining different alternatively spliced isoforms along with the estimation of reconstructed transcript abundances. The SSP software package is available at http://www.bioinf.cs.ipm.ir/software/ssp. © 2013.
Bansal, Ankush; Salaria, Mehul; Sharma, Tashil; Stobdan, Tsering; Kant, Anil
Sea buckthorn is a dioecious medicinal plant found at high altitude. The plant has both male and female reproductive organs in separate individuals. In this article, whole transcriptome de novo assemblies of male and female flower bud samples were carried out using Illumina NextSeq 500 platform to determine the role of the genes involved in sex determination. Moreover, genes with differential expression in male and female transcriptomes were identified to understand the underlying sex determination mechanism. The current study showed 63,904 and 62,272 coding sequences (CDS) in female and male transcriptome data sets, respectively. 16,831 common CDS were screened out from both transcriptomes, out of which 625 were upregulated and 491 were found to be downregulated. To understand the potential regulatory roles of differentially expressed genes in metabolic networks and biosynthetic pathways: KEGG mapping, gene ontology, and co-expression network analysis were performed. Comparison with Flowering Interactive Database (FLOR-ID) resulted in eight differentially expressed genes viz. CHD3-type chromatin-remodeling factor PICKLE ( PKL ), phytochrome-associated serine/threonine-protein phosphatase ( FYPP ), protein TOPLESS ( TPL ), sensitive to freezing 6 ( SFR6 ), lysine-specific histone demethylase 1 homolog 1 ( LDL1 ), pre-mRNA-processing-splicing factor 8A ( PRP8A ), sucrose synthase 4 ( SUS4 ), ubiquitin carboxyl-terminal hydrolase 12 ( UBP12 ), known to be broadly involved in flowering, photoperiodism, embryo development, and cold response pathways. Male and female flower bud transcriptome data of Sea buckthorn may provide comprehensive information at genomic level for the identification of genetic regulation involved in sex determination.
Baldé, Aladje; Neves, Dina; García-Breijo, Francisco J; Pais, Maria Salomé; Cravador, Alfredo
Phlomis plants are a source of biological active substances with potential applications in the control of phytopathogens. Phlomis purpurea (Lamiaceae) is autochthonous of southern Iberian Peninsula and Morocco and was found to be resistant to Phytophthora cinnamomi. Phlomis purpurea has revealed antagonistic effect in the rhizosphere of Quercus suber and Q. ilex against P. cinnamomi. Phlomis purpurea roots produce bioactive compounds exhibiting antitumor and anti-Phytophthora activities with potential to protect susceptible plants. Although these important capacities of P. purpurea have been demonstrated, there is no transcriptomic or genomic information available in public databases that could bring insights on the genes underlying this anti-oomycete activity. Using Illumina technology we obtained a de novo assembly of P. purpurea transcriptome and differential transcript abundance to identify putative defence related genes in challenged versus non-challenged plants. A total of 1,272,600,000 reads from 18 cDNA libraries were merged and assembled into 215,739 transcript contigs. BLASTX alignment to Nr NCBI database identified 124,386 unique annotated transcripts (57.7%) with significant hits. Functional annotation identified 83,550 out of 124,386 unique transcripts, which were mapped to 141 pathways. 39% of unigenes were assigned GO terms. Their functions cover biological processes, cellular component and molecular functions. Genes associated with response to stimuli, cellular and primary metabolic processes, catalytic and transporter functions were among those identified. Differential transcript abundance analysis using DESeq revealed significant differences among libraries depending on post-challenge times. Comparative cyto-histological studies of P. purpurea roots challenged with P. cinnamomi zoospores and controls revealed specific morphological features (exodermal strips and epi-cuticular layer), that may provide a constitutive efficient barrier against
J. W. Clouse
Full Text Available Amaranth ( L. is an emerging pseudocereal native to the New World that has garnered increased attention in recent years because of its nutritional quality, in particular its seed protein and more specifically its high levels of the essential amino acid lysine. It belongs to the Amaranthaceae family, is an ancient paleopolyploid that shows disomic inheritance (2 = 32, and has an estimated genome size of 466 Mb. Here we present a high-quality draft genome sequence of the grain amaranth. The genome assembly consisted of 377 Mb in 3518 scaffolds with an N of 371 kb. Repetitive element analysis predicted that 48% of the genome is comprised of repeat sequences, of which -like elements were the most commonly classified retrotransposon. A de novo transcriptome consisting of 66,370 contigs was assembled from eight different amaranth tissue and abiotic stress libraries. Annotation of the genome identified 23,059 protein-coding genes. Seven grain amaranths (, , and and their putative progenitor ( were resequenced. A single nucleotide polymorphism (SNP phylogeny supported the classification of as the progenitor species of the grain amaranths. Lastly, we generated a de novo physical map for using the BioNano Genomics’ Genome Mapping platform. The physical map spanned 340 Mb and a hybrid assembly using the BioNano physical maps nearly doubled the N of the assembly to 697 kb. Moreover, we analyzed synteny between amaranth and sugar beet ( L. and estimated, using analysis, the age of the most recent polyploidization event in amaranth.
Matthew D. MacManes
Full Text Available The study of functional genomics, particularly in non-model organisms, has been dramatically improved over the last few years by the use of transcriptomes and RNAseq. While these studies are potentially extremely powerful, a computationally intensive procedure, the de novo construction of a reference transcriptome must be completed as a prerequisite to further analyses. The accurate reference is critically important as all downstream steps, including estimating transcript abundance are critically dependent on the construction of an accurate reference. Though a substantial amount of research has been done on assembly, only recently have the pre-assembly procedures been studied in detail. Specifically, several stand-alone error correction modules have been reported on and, while they have shown to be effective in reducing errors at the level of sequencing reads, how error correction impacts assembly accuracy is largely unknown. Here, we show via use of a simulated and empiric dataset, that applying error correction to sequencing reads has significant positive effects on assembly accuracy, and should be applied to all datasets. A complete collection of commands which will allow for the production of Reptile corrected reads is available at https://github.com/macmanes/error_correction/tree/master/scripts and as File S1.
Full Text Available Eustigmatos cf. polyphem is a yellow-green unicellular soil microalga belonging to the eustimatophyte with high biomass and considerable production of triacylglycerols (TAGs for biofuels, which is thus referred to as an oleaginous microalga. The paucity of microalgae genome sequences, however, limits development of gene-based biofuel feedstock optimization studies. Here we describe the sequencing and de novo transcriptome assembly for a non-model microalgae species, E. cf. polyphem, and identify pathways and genes of importance related to biofuel production.We performed the de novo assembly of E. cf. polyphem transcriptome using Illumina paired-end sequencing technology. In a single run, we produced 29,199,432 sequencing reads corresponding to 2.33 Gb total nucleotides. These reads were assembled into 75,632 unigenes with a mean size of 503 bp and an N50 of 663 bp, ranging from 100 bp to >3,000 bp. Assembled unigenes were subjected to BLAST similarity searches and annotated with Gene Ontology (GO and Kyoto Encyclopedia of Genes and Genomes (KEGG orthology identifiers. These analyses identified the majority of carbohydrate, fatty acids, TAG and carotenoids biosynthesis and catabolism pathways in E. cf. polyphem.Our data provides the construction of metabolic pathways involved in the biosynthesis and catabolism of carbohydrate, fatty acids, TAG and carotenoids in E. cf. polyphem and provides a foundation for the molecular genetics and functional genomics required to direct metabolic engineering efforts that seek to enhance the quantity and character of microalgae-based biofuel feedstock.
Seo, Jeong-Sun; Rhie, Arang; Kim, Junsoo; Lee, Sangjin; Sohn, Min-Hwan; Kim, Chang-Uk; Hastie, Alex; Cao, Han; Yun, Ji-Young; Kim, Jihye; Kuk, Junho; Park, Gun Hwa; Kim, Juhyeok; Ryu, Hanna; Kim, Jongbum; Roh, Mira; Baek, Jeonghun; Hunkapiller, Michael W; Korlach, Jonas; Shin, Jong-Yeon; Kim, Changhoon
Advances in genome assembly and phasing provide an opportunity to investigate the diploid architecture of the human genome and reveal the full range of structural variation across population groups. Here we report the de novo assembly and haplotype phasing of the Korean individual AK1 (ref. 1) using single-molecule real-time sequencing, next-generation mapping, microfluidics-based linked reads, and bacterial artificial chromosome (BAC) sequencing approaches. Single-molecule sequencing coupled with next-generation mapping generated a highly contiguous assembly, with a contig N50 size of 17.9 Mb and a scaffold N50 size of 44.8 Mb, resolving 8 chromosomal arms into single scaffolds. The de novo assembly, along with local assemblies and spanning long reads, closes 105 and extends into 72 out of 190 euchromatic gaps in the reference genome, adding 1.03 Mb of previously intractable sequence. High concordance between the assembly and paired-end sequences from 62,758 BAC clones provides strong support for the robustness of the assembly. We identify 18,210 structural variants by direct comparison of the assembly with the human reference, identifying thousands of breakpoints that, to our knowledge, have not been reported before. Many of the insertions are reflected in the transcriptome and are shared across the Asian population. We performed haplotype phasing of the assembly with short reads, long reads and linked reads from whole-genome sequencing and with short reads from 31,719 BAC clones, thereby achieving phased blocks with an N50 size of 11.6 Mb. Haplotigs assembled from single-molecule real-time reads assigned to haplotypes on phased blocks covered 89% of genes. The haplotigs accurately characterized the hypervariable major histocompatability complex region as well as demonstrating allele configuration in clinically relevant genes such as CYP2D6. This work presents the most contiguous diploid human genome assembly so far, with extensive investigation of
Petra H Lenz
Full Text Available Assessing the impact of global warming on the food web of the North Atlantic will require difficult-to-obtain physiological data on a key copepod crustacean, Calanus finmarchicus. The de novo transcriptome presented here represents a new resource for acquiring such data. It was produced from multiplexed gene libraries using RNA collected from six developmental stages: embryo, early nauplius (NI-II, late nauplius (NV-VI, early copepodite (CI-II, late copepodite (CV and adult (CVI female. Over 400,000,000 paired-end reads (100 base-pairs long were sequenced on an Illumina instrument, and assembled into 206,041 contigs using Trinity software. Coverage was estimated to be at least 65%. A reference transcriptome comprising 96,090 unique components ("comps" was annotated using Blast2GO. 40% of the comps had significant blast hits. 11% of the comps were successfully annotated with gene ontology (GO terms. Expression of many comps was found to be near zero in one or more developmental stages suggesting that 35 to 48% of the transcriptome is "silent" at any given life stage. Transcripts involved in lipid biosynthesis pathways, critical for the C. finmarchicus life cycle, were identified and their expression pattern during development was examined. Relative expression of three transcripts suggests wax ester biosynthesis in late copepodites, but triacylglyceride biosynthesis in adult females. Two of these transcripts may be involved in the preparatory phase of diapause. A key environmental challenge for C. finmarchicus is the seasonal exposure to the dinoflagellate Alexandrium fundyense with high concentrations of saxitoxins, neurotoxins that block voltage-gated sodium channels. Multiple contigs encoding putative voltage-gated sodium channels were identified. They appeared to be the result of both alternate splicing and gene duplication. This is the first report of multiple NaV1 genes in a protostome. These data provide new insights into the transcriptome
Georganas, Evangelos [Intel Corporation, Santa Clara, CA (United States); Hofmeyr, Steven [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States). Joint Genome Inst.; Egan, Rob [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States). Computational Research Division; Buluc, Aydin [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States). Joint Genome Inst.; Oliker, Leonid [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States). Joint Genome Inst.; Rokhsar, Daniel [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States). Computational Research Division; Yelick, Katherine [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States). Joint Genome Inst.
De novo whole genome assembly reconstructs genomic sequence from short, overlapping, and potentially erroneous DNA segments and is one of the most important computations in modern genomics. This work presents HipMER, a high-quality end-to-end de novo assembler designed for extreme scale analysis, via efficient parallelization of the Meraculous code. Genome assembly software has many components, each of which stresses different components of a computer system. This chapter explains the computational challenges involved in each step of the HipMer pipeline, the key distributed data structures, and communication costs in detail. We present performance results of assembling the human genome and the large hexaploid wheat genome on large supercomputers up to tens of thousands of cores.
Jones, Beryl M.; Wcislo, William T.; Robinson, Gene E.
Transcriptomes provide excellent foundational resources for mechanistic and evolutionary analyses of complex traits. We present a developmental transcriptome for the facultatively eusocial bee Megalopta genalis, which represents a potential transition point in the evolution of eusociality. A de novo transcriptome assembly of Megalopta genalis was generated using paired-end Illumina sequencing and the Trinity assembler. Males and females of all life stages were aligned to this transcriptome fo...
Michael T Guarnieri
Full Text Available Biofuels derived from algal lipids represent an opportunity to dramatically impact the global energy demand for transportation fuels. Systems biology analyses of oleaginous algae could greatly accelerate the commercialization of algal-derived biofuels by elucidating the key components involved in lipid productivity and leading to the initiation of hypothesis-driven strain-improvement strategies. However, higher-level systems biology analyses, such as transcriptomics and proteomics, are highly dependent upon available genomic sequence data, and the lack of these data has hindered the pursuit of such analyses for many oleaginous microalgae. In order to examine the triacylglycerol biosynthetic pathway in the unsequenced oleaginous microalga, Chlorella vulgaris, we have established a strategy with which to bypass the necessity for genomic sequence information by using the transcriptome as a guide. Our results indicate an upregulation of both fatty acid and triacylglycerol biosynthetic machinery under oil-accumulating conditions, and demonstrate the utility of a de novo assembled transcriptome as a search model for proteomic analysis of an unsequenced microalga.
Background Molecular breeding of pepper (Capsicum spp.) can be accelerated by developing DNA markers associated with transcriptomes in breeding germplasm. Before the advent of next generation sequencing (NGS) technologies, the majority of sequencing data were generated by the Sanger sequencing method. By leveraging Sanger EST data, we have generated a wealth of genetic information for pepper including thousands of SNPs and Single Position Polymorphic (SPP) markers. To complement and enhance these resources, we applied NGS to three pepper genotypes: Maor, Early Jalapeño and Criollo de Morelos-334 (CM334) to identify SNPs and SSRs in the assembly of these three genotypes. Results Two pepper transcriptome assemblies were developed with different purposes. The first reference sequence, assembled by CAP3 software, comprises 31,196 contigs from >125,000 Sanger-EST sequences that were mainly derived from a Korean F1-hybrid line, Bukang. Overlapping probes were designed for 30,815 unigenes to construct a pepper Affymetrix GeneChip® microarray for whole genome analyses. In addition, custom Python scripts were used to identify 4,236 SNPs in contigs of the assembly. A total of 2,489 simple sequence repeats (SSRs) were identified from the assembly, and primers were designed for the SSRs. Annotation of contigs using Blast2GO software resulted in information for 60% of the unigenes in the assembly. The second transcriptome assembly was constructed from more than 200 million Illumina Genome Analyzer II reads (80–120 nt) using a combination of Velvet, CLC workbench and CAP3 software packages. BWA, SAMtools and in-house Perl scripts were used to identify SNPs among three pepper genotypes. The SNPs were filtered to be at least 50 bp from any intron-exon junctions as well as flanking SNPs. More than 22,000 high-quality putative SNPs were identified. Using the MISA software, 10,398 SSR markers were also identified within the Illumina transcriptome assembly and primers were
Wan, LingLin; Han, Juan; Sang, Min; Li, AiFen; Wu, Hong; Yin, ShunJi; Zhang, ChengWu
Background Eustigmatos cf. polyphem is a yellow-green unicellular soil microalga belonging to the eustimatophyte with high biomass and considerable production of triacylglycerols (TAGs) for biofuels, which is thus referred to as an oleaginous microalga. The paucity of microalgae genome sequences, however, limits development of gene-based biofuel feedstock optimization studies. Here we describe the sequencing and de novo transcriptome assembly for a non-model microalgae species, E. cf. polyphem, and identify pathways and genes of importance related to biofuel production. Results We performed the de novo assembly of E. cf. polyphem transcriptome using Illumina paired-end sequencing technology. In a single run, we produced 29,199,432 sequencing reads corresponding to 2.33 Gb total nucleotides. These reads were assembled into 75,632 unigenes with a mean size of 503 bp and an N50 of 663 bp, ranging from 100 bp to >3,000 bp. Assembled unigenes were subjected to BLAST similarity searches and annotated with Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) orthology identifiers. These analyses identified the majority of carbohydrate, fatty acids, TAG and carotenoids biosynthesis and catabolism pathways in E. cf. polyphem. Conclusions Our data provides the construction of metabolic pathways involved in the biosynthesis and catabolism of carbohydrate, fatty acids, TAG and carotenoids in E. cf. polyphem and provides a foundation for the molecular genetics and functional genomics required to direct metabolic engineering efforts that seek to enhance the quantity and character of microalgae-based biofuel feedstock. PMID:22536352
Full Text Available BACKGROUND: As a major stored-product pest insect, Liposcelis entomophila has developed high levels of resistance to various insecticides in grain storage systems. However, the molecular mechanisms underlying resistance and environmental stress have not been characterized. To date, there is a lack of genomic information for this species. Therefore, studies aimed at profiling the L. entomophila transcriptome would provide a better understanding of the biological functions at the molecular levels. METHODOLOGY/PRINCIPAL FINDINGS: We applied Illumina sequencing technology to sequence the transcriptome of L. entomophila. A total of 54,406,328 clean reads were obtained and that de novo assembled into 54,220 unigenes, with an average length of 571 bp. Through a similarity search, 33,404 (61.61% unigenes were matched to known proteins in the NCBI non-redundant (Nr protein database. These unigenes were further functionally annotated with gene ontology (GO, cluster of orthologous groups of proteins (COG, and Kyoto Encyclopedia of Genes and Genomes (KEGG databases. A large number of genes potentially involved in insecticide resistance were manually curated, including 68 putative cytochrome P450 genes, 37 putative glutathione S-transferase (GST genes, 19 putative carboxyl/cholinesterase (CCE genes, and other 126 transcripts to contain target site sequences or encoding detoxification genes representing eight types of resistance enzymes. Furthermore, to gain insight into the molecular basis of the L. entomophila toward thermal stresses, 25 heat shock protein (Hsp genes were identified. In addition, 1,100 SSRs and 57,757 SNPs were detected and 231 pairs of SSR primes were designed for investigating the genetic diversity in future. CONCLUSIONS/SIGNIFICANCE: We developed a comprehensive transcriptomic database for L. entomophila. These sequences and putative molecular markers would further promote our understanding of the molecular mechanisms underlying
Full Text Available BACKGROUND: Wax gourd is a widely used vegetable of Cucuribtaceae, and also has important medicinal and health values. However, the genomic resources of wax gourd were scarcity, and only a few nucleotide sequences could be obtained in public databases. METHODOLOGY/PRINCIPAL FINDINGS: In this study, we examined transcriptome in wax gourd. More than 44 million of high quality reads were generated from five different tissues of wax gourd using Illumina paired-end sequencing technology. Approximately 4 Gbp data were generated, and de novo assembled into 65,059 unigenes, with an N50 of 1,132 bp. Based on sequence similarity search with known protein database, 36,070 (55.4% showed significant similarity to known proteins in Nr database, and 24,969 (38.4% had BLAST hits in Swiss-Prot database. Among the annotated unigenes, 14,994 of wax gourd unigenes were assigned to GO term annotation, and 23,977 were found to have COG classifications. In addition, a total of 18,713 unigenes were assigned to 281 KEGG pathways. Furthermore, 6,242 microsatellites (simple sequence repeats were detected as potential molecular markers in wax gourd. Two hundred primer pairs for SSRs were designed for validation of the amplification and polymorphism. The result showed that 170 of the 200 primer pairs were successfully amplified and 49 (28.8% of them exhibited polymorphisms. CONCLUSION/SIGNIFICANCE: Our study enriches the genomic resources of wax gourd and provides powerful information for future studies. The availability of this ample amount of information about the transcriptome and SSRs in wax gourd could serve as valuable basis for studies on the physiology, biochemistry, molecular genetics and molecular breeding of this important vegetable crop.
Castellanos-Martínez, Sheila; Arteta, David; Catarino, Susana; Gestal, Camino
Octopus vulgaris is a highly valuable species of great commercial interest and excellent candidate for aquaculture diversification; however, the octopus' well-being is impaired by pathogens, of which the gastrointestinal coccidian parasite Aggregata octopiana is one of the most important. The knowledge of the molecular mechanisms of the immune response in cephalopods, especially in octopus is scarce. The transcriptome of the hemocytes of O. vulgaris was de novo sequenced using the high-throughput paired-end Illumina technology to identify genes involved in immune defense and to understand the molecular basis of octopus tolerance/resistance to coccidiosis. A bi-directional mRNA library was constructed from hemocytes of two groups of octopus according to the infection by A. octopiana, sick octopus, suffering coccidiosis, and healthy octopus, and reads were de novo assembled together. The differential expression of transcripts was analysed using the general assembly as a reference for mapping the reads from each condition. After sequencing, a total of 75,571,280 high quality reads were obtained from the sick octopus group and 74,731,646 from the healthy group. The general transcriptome of the O. vulgaris hemocytes was assembled in 254,506 contigs. A total of 48,225 contigs were successfully identified, and 538 transcripts exhibited differential expression between groups of infection. The general transcriptome revealed genes involved in pathways like NF-kB, TLR and Complement. Differential expression of TLR-2, PGRP, C1q and PRDX genes due to infection was validated using RT-qPCR. In sick octopuses, only TLR-2 was up-regulated in hemocytes, but all of them were up-regulated in caecum and gills. The transcriptome reported here de novo establishes the first molecular clues to understand how the octopus immune system works and interacts with a highly pathogenic coccidian. The data provided here will contribute to identification of biomarkers for octopus resistance against
Full Text Available Soil salinization has been a tremendous obstacle for agriculture production. The regulatory networks underlying salinity adaption in model plants have been extensively explored. However, limited understanding of the salt response mechanisms has hindered the planting and production in Fagopyrum tataricum, an economic and health-beneficial plant mainly distributing in southwest China. In this study, we performed physiological analysis and found that salt stress of 200 mM NaCl solution significantly affected the relative water content (RWC, electrolyte leakage (EL, malondialdehyde (MDA content, peroxidase (POD and superoxide dismutase (SOD activities in tartary buckwheat seedlings. Further, we conducted transcriptome comparison between control and salt treatment to identify potential regulatory components involved in F. tataricum salt responses. A total of 53.15 million clean reads from control and salt-treated libraries were produced via an Illumina sequencing approach. Then we de novo assembled these reads into a transcriptome dataset containing 57,921 unigenes with N50 length of 1400 bp and total length of 44.5 Mb. A total of 36,688 unigenes could find matches in public databases. GO, KEGG and KOG classification suggested the enrichment of these unigenes in 56 sub-categories, 25 KOG, and 273 pathways, respectively. Comparison of the transcriptome expression patterns between control and salt treatment unveiled 455 differentially expressed genes (DEGs. Further, we found the genes encoding for protein kinases, phosphatases, heat shock proteins (HSPs, ATP-binding cassette (ABC transporters, glutathione S-transferases (GSTs, abiotic-related transcription factors and circadian clock might be relevant to the salinity adaption of this species. Thus, this study offers an insight into salt tolerance mechanisms, and will serve as useful genetic information for tolerant elite breeding programs in future.
Fritz-Laylin, Lillian K; Levy, Yaron Y; Levitan, Edward; Chen, Sean; Cande, W Zacheus; Lai, Elaine Y; Fulton, Chandler
Centrioles are eukaryotic organelles whose number and position are critical for cilia formation and mitosis. Many cell types assemble new centrioles next to existing ones ("templated" or mentored assembly). Under certain conditions, centrioles also form without pre-existing centrioles (de novo). The synchronous differentiation of Naegleria amoebae to flagellates represents a unique opportunity to study centriole assembly, as nearly 100% of the population transitions from having no centrioles to having two within minutes. Here, we find that Naegleria forms its first centriole de novo, immediately followed by mentored assembly of the second. We also find both de novo and mentored assembly distributed among all major eukaryote lineages. We therefore propose that both modes are ancestral and have been conserved because they serve complementary roles, with de novo assembly as the default when no pre-existing centriole is available, and mentored assembly allowing precise regulation of number, timing, and location of centriole assembly. © 2016 Wiley Periodicals, Inc.
Ranjan, Aashish; Ichihashi, Yasunori; Farhi, Moran; Zumstein, Kristina; Townsley, Brad; David-Schwartz, Rakefet; Sinha, Neelima R.
Parasitic flowering plants are one of the most destructive agricultural pests and have major impact on crop yields throughout the world. Being dependent on finding a host plant for growth, parasitic plants penetrate their host using specialized organs called haustoria. Haustoria establish vascular connections with the host, which enable the parasite to steal nutrients and water. The underlying molecular and developmental basis of parasitism by plants is largely unknown. In order to investigate the process of parasitism, RNAs from different stages (i.e. seed, seedling, vegetative strand, prehaustoria, haustoria, and flower) were used to de novo assemble and annotate the transcriptome of the obligate plant stem parasite dodder (Cuscuta pentagona). The assembled transcriptome was used to dissect transcriptional dynamics during dodder development and parasitism and identified key gene categories involved in the process of plant parasitism. Host plant infection is accompanied by increased expression of parasite genes underlying transport and transporter categories, response to stress and stimuli, as well as genes encoding enzymes involved in cell wall modifications. By contrast, expression of photosynthetic genes is decreased in the dodder infective stages compared with normal stem. In addition, genes relating to biosynthesis, transport, and response of phytohormones, such as auxin, gibberellins, and strigolactone, were differentially expressed in the dodder infective stages compared with stems and seedlings. This analysis sheds light on the transcriptional changes that accompany plant parasitism and will aid in identifying potential gene targets for use in controlling the infestation of crops by parasitic weeds. PMID:24399359
Castro, Juan C; Maddox, J Dylan; Cobos, Marianela; Requena, David; Zimic, Mirko; Bombarely, Aureliano; Imán, Sixto A; Cerdeira, Luis A; Medina, Andersson E
Myrciaria dubia is an Amazonian fruit shrub that produces numerous bioactive phytochemicals, but is best known by its high L-ascorbic acid (AsA) content in fruits. Pronounced variation in AsA content has been observed both within and among individuals, but the genetic factors responsible for this variation are largely unknown. The goals of this research, therefore, were to assemble, characterize, and annotate the fruit transcriptome of M. dubia in order to reconstruct metabolic pathways and determine if multiple pathways contribute to AsA biosynthesis. In total 24,551,882 high-quality sequence reads were de novo assembled into 70,048 unigenes (mean length = 1150 bp, N50 = 1775 bp). Assembled sequences were annotated using BLASTX against public databases such as TAIR, GR-protein, FB, MGI, RGD, ZFIN, SGN, WB, TIGR_CMR, and JCVI-CMR with 75.2 % of unigenes having annotations. Of the three core GO annotation categories, biological processes comprised 53.6 % of the total assigned annotations, whereas cellular components and molecular functions comprised 23.3 and 23.1 %, respectively. Based on the KEGG pathway assignment of the functionally annotated transcripts, five metabolic pathways for AsA biosynthesis were identified: animal-like pathway, myo-inositol pathway, L-gulose pathway, D-mannose/L-galactose pathway, and uronic acid pathway. All transcripts coding enzymes involved in the ascorbate-glutathione cycle were also identified. Finally, we used the assembly to identified 6314 genic microsatellites and 23,481 high quality SNPs. This study describes the first next-generation sequencing effort and transcriptome annotation of a non-model Amazonian plant that is relevant for AsA production and other bioactive phytochemicals. Genes encoding key enzymes were successfully identified and metabolic pathways involved in biosynthesis of AsA, anthocyanins, and other metabolic pathways have been reconstructed. The identification of these genes and pathways is in agreement with
Patricia M. Aguilera
Full Text Available This contribution contains data associated to the research article entitled “Exploring the genes of yerba mate (Ilex paraguariensis A. St.-Hil. by NGS and de novo transcriptome assembly” (Debat et al., 2014 . By means of a bioinformatic approach involving extensive NGS data analyses, we provide a resource encompassing the full transcriptome assembly of yerba mate, the first available reference for the Ilex L. genus. This dataset (Supplementary files 1 and 2 consolidates the transcriptome-wide assembled sequences of I. paraguariensis with further comprehensive annotation of the protein coding genes of yerba mate via the integration of Arabidopsis thaliana databases. The generated data is pivotal for the characterization of agronomical relevant genes in the tree crop yerba mate -a non-model species- and related taxa in Ilex. The raw sequencing data dissected here is available at DDBJ/ENA/GenBank (NCBI Resource Coordinators, 2016  Sequence Read Archive (SRA under the accession SRP043293 and the assembled sequences have been deposited at the Transcriptome Shotgun Assembly Sequence Database (TSA under the accession GFHV00000000.
Harrison, Peter W; Mank, Judith E; Wedell, Nina
Males and females experience differences in gene dose for loci in the nonrecombining region of heteromorphic sex chromosomes. If not compensated, this leads to expression imbalances, with the homogametic sex on average exhibiting greater expression due to the doubled gene dose. Many organisms with heteromorphic sex chromosomes display global dosage compensation mechanisms, which equalize gene expression levels between the sexes. However, birds and Schistosoma have been previously shown to lack chromosome-wide dosage compensation mechanisms, and the status in other female heterogametic taxa including Lepidoptera remains unresolved. To further our understanding of dosage compensation in female heterogametic taxa and to resolve its status in the lepidopterans, we assessed the Indian meal moth, Plodia interpunctella. As P. interpunctella lacks a complete reference genome, we conducted de novo transcriptome assembly combined with orthologous genomic location prediction from the related silkworm genome, Bombyx mori, to compare Z-linked and autosomal gene expression levels for each sex. We demonstrate that P. interpunctella lacks complete Z chromosome dosage compensation, female Z-linked genes having just over half the expression level of males and autosomal genes. This finding suggests that the Lepidoptera and possibly all female heterogametic taxa lack global dosage compensation, although more species will need to be sampled to confirm this assertion.
Hu, Lisong; Hao, Chaoyun; Fan, Rui; Wu, Baoduo; Tan, Lehe; Wu, Huasong
Black pepper is one of the most popular and oldest spices in the world and valued for its pungent constituent alkaloids. Pinerine is the main bioactive compound in pepper alkaloids, which perform unique physiological functions. However, the mechanisms of piperine synthesis are poorly understood. This study is the first to describe the fruit transcriptome of black pepper by sequencing on Illumina HiSeq 2000 platform. A total of 56,281,710 raw reads were obtained and assembled. From these raw reads, 44,061 unigenes with an average length of 1,345 nt were generated. During functional annotation, 40,537 unigenes were annotated in Gene Ontology categories, Kyoto Encyclopedia of Genes and Genomes pathways, Swiss-Prot database, and Nucleotide Collection (NR/NT) database. In addition, 8,196 simple sequence repeats (SSRs) were detected. In a detailed analysis of the transcriptome, housekeeping genes for quantitative polymerase chain reaction internal control, polymorphic SSRs, and lysine/ornithine metabolism-related genes were identified. These results validated the availability of our database. Our study could provide useful data for further research on piperine synthesis in black pepper.
Full Text Available The palms are a family of tropical origin and one of the main constituents of the ecosystems of these regions around the world. The two main species of palm represent different challenges: coconut (Cocos nucifera L. is a source of multiple goods and services in tropical communities, while oil palm (Elaeis guineensis Jacq is the main protagonist of the oil market. In this study, we present a workflow that exploits the comparative genomics between a target species (coconut and a reference species (oil palm to improve the transcriptomic data, providing a proteome useful to answer functional or evolutionary questions. This workflow reduces redundancy and fragmentation, two inherent problems of transcriptomic data, while preserving the functional representation of the target species. Our approach was validated in Arabidopsis thaliana using Arabidopsis lyrata and Capsella rubella as references species. This analysis showed the high sensitivity and specificity of our strategy, relatively independent of the reference proteome. The workflow increased the length of proteins products in A. thaliana by 13%, allowing, often, to recover 100% of the protein sequence length. In addition redundancy was reduced by a factor greater than 3. In coconut, the approach generated 29,366 proteins, 1,246 of these proteins deriving from new contigs obtained with the BRANCH software. The coconut proteome presented a functional profile similar to that observed in rice and an important number of metabolic pathways related to secondary metabolism. The new sequences found with BRANCH software were enriched in functions related to biotic stress. Our strategy can be used as a complementary step to de novo transcriptome assembly to get a representative proteome of a target species. The results of the current analysis are available on the website PalmComparomics (http://palm-comparomics.southgreen.fr/.
Armero, Alix; Baudouin, Luc; Bocs, Stéphanie; This, Dominique
The palms are a family of tropical origin and one of the main constituents of the ecosystems of these regions around the world. The two main species of palm represent different challenges: coconut (Cocos nucifera L.) is a source of multiple goods and services in tropical communities, while oil palm (Elaeis guineensis Jacq) is the main protagonist of the oil market. In this study, we present a workflow that exploits the comparative genomics between a target species (coconut) and a reference species (oil palm) to improve the transcriptomic data, providing a proteome useful to answer functional or evolutionary questions. This workflow reduces redundancy and fragmentation, two inherent problems of transcriptomic data, while preserving the functional representation of the target species. Our approach was validated in Arabidopsis thaliana using Arabidopsis lyrata and Capsella rubella as references species. This analysis showed the high sensitivity and specificity of our strategy, relatively independent of the reference proteome. The workflow increased the length of proteins products in A. thaliana by 13%, allowing, often, to recover 100% of the protein sequence length. In addition redundancy was reduced by a factor greater than 3. In coconut, the approach generated 29,366 proteins, 1,246 of these proteins deriving from new contigs obtained with the BRANCH software. The coconut proteome presented a functional profile similar to that observed in rice and an important number of metabolic pathways related to secondary metabolism. The new sequences found with BRANCH software were enriched in functions related to biotic stress. Our strategy can be used as a complementary step to de novo transcriptome assembly to get a representative proteome of a target species. The results of the current analysis are available on the website PalmComparomics (http://palm-comparomics.southgreen.fr/).
Gu, Chunsun; Xu, Sheng; Wang, Zhiquan; Liu, Liangqin; Zhang, Yongxia; Deng, Yanming; Huang, Suzhen
As a halophyte, Iris lactea var. chinensis (I. lactea var. chinensis) is widely distributed and has good drought and heavy metal resistance. Moreover, it is an excellent ornamental plant. I. lactea var. chinensis has extensive application prospects owing to the global impacts of salinization. To better understand its molecular mechanism involved in salt resistance, the de novo sequencing, assembly, and analysis of I. lactea var. chinensis roots' transcriptome in response to salt-stress conditions was performed. On average, 74.17% of the clean reads were mapped to unigenes. A total of 121,093 unigenes were constructed and 56,398 (46.57%) were annotated. Among these, 13,522 differentially expressed genes (DEGs) were identified between salt-treated and control samples Compared to the transcriptional level of control, 7037 DEGs were up-regulated and 6539 down-regulated. In addition, 129 up-regulated and 1609 down-regulated genes were simultaneously detected in all three pairwise comparisons between control and salt-stressed libraries. At least 247 and 250 DEGs encoding transcription factors and transporter proteins were identified. Meanwhile, 130 DEGs regarding reactive oxygen species (ROS) scavenging system were also summarized. Based on real-time quantitative RT-PCR, we verified the changes in the expression patterns of 10 unigenes. Our study identified potential salt-responsive candidate genes and increased the understanding of halophyte responses to salinity stress. Copyright © 2018 Elsevier Masson SAS. All rights reserved.
Torales Susana L
Full Text Available Abstract Background Nothofagus nervosa is one of the most emblematic native tree species of Patagonian temperate forests. Here, the shotgun RNA-sequencing (RNA-Seq of the transcriptome of N. nervosa, including de novo assembly, functional annotation, and in silico discovery of potential molecular markers to support population and associations genetic studies, are described. Results Pyrosequencing of a young leaf cDNA library generated a total of 111,814 high quality reads, with an average length of 447 bp. De novo assembly using Newbler resulted into 3,005 tentative isotigs (including alternative transcripts. The non-assembled sequences (singletons were clustered with CD-HIT-454 to identify natural and artificial duplicates from pyrosequencing reads, leading to 21,881 unique singletons. 15,497 out of 24,886 non-redundant sequences or unigenes, were successfully annotated against a plant protein database. A substantial number of simple sequence repeat markers (SSRs were discovered in the assembled and annotated sequences. More than 40% of the SSR sequences were inside ORF sequences. To confirm the validity of these predicted markers, a subset of 73 SSRs selected through functional annotation evidences were successfully amplified from six seedlings DNA samples, being 14 polymorphic. Conclusions This paper is the first report that shows a highly precise representation of the mRNAs diversity present in young leaves of a native South American tree, N. nervosa, as well as its in silico deduced putative functionality. The reported Nothofagus transcriptome sequences represent a unique resource for genetic studies and provide a tool to discover genes of interest and genetic markers that will greatly aid questions involving evolution, ecology, and conservation using genetic and genomic approaches in the genus.
Full Text Available Antarctic krill (Euphausia superba is a key species in the Southern Ocean with an estimated biomass between 100 and 500 million tonnes. Changes in krill population viability would have catastrophic effect on the Antarctic ecosystem. One looming threat due to elevated levels of anthropogenic atmospheric carbon dioxide (CO2 is ocean acidification (lowering of sea water pH by CO2 dissolving into the oceans. The genetics of Antarctic krill has long been of scientific interest for both for the analysis of population structure and analysis of functional genetics. However, the genetic resources available for the species are relatively modest. We have developed the most advanced genetic database on Euphausia superba, KrillDB, which includes comprehensive data sets of former and present transcriptome projects. In particular, we have built a de novo transcriptome assembly using more than 360 million Illumina sequence reads generated from larval krill including individuals subjected to different CO2 levels. The database gives access to: 1 the full list of assembled genes and transcripts; 2 their level of similarity to transcripts and proteins from other species; 3 the predicted protein domains contained within each transcript; 4 their predicted GO terms; 5 the level of expression of each transcript in the different larval stages and CO2 treatments. All references to external entities (sequences, domains, GO terms are equipped with a link to the appropriate source database. Moreover, the software implements a full-text search engine that makes it possible to submit free-form queries. KrillDB represents the first large-scale attempt at classifying and annotating the full krill transcriptome. For this reason, we believe it will constitute a cornerstone of future approaches devoted to physiological and molecular study of this key species in the Southern Ocean food web.
Rajesh, M K; Fayas, T P; Naganeeswaran, S; Rachana, K E; Bhavyashree, U; Sajini, K K; Karun, Anitha
Production and supply of quality planting material is significant to coconut cultivation but is one of the major constraints in coconut productivity. Rapid multiplication of coconut through in vitro techniques, therefore, is of paramount importance. Although somatic embryogenesis in coconut is a promising technique that will allow for the mass production of high quality palms, coconut is highly recalcitrant to in vitro culture. In order to overcome the bottlenecks in coconut somatic embryogenesis and to develop a repeatable protocol, it is imperative to understand, identify, and characterize molecular events involved in coconut somatic embryogenesis pathway. Transcriptome analysis (RNA-Seq) of coconut embryogenic calli, derived from plumular explants of West Coast Tall cultivar, was undertaken on an Illumina HiSeq 2000 platform. After de novo transcriptome assembly and functional annotation, we have obtained 40,367 transcripts which showed significant BLASTx matches with similarity greater than 40 % and E value of ≤10(-5). Fourteen genes known to be involved in somatic embryogenesis were identified. Quantitative real-time PCR (qRT-PCR) analyses of these 14 genes were carried in six developmental stages. The result showed that CLV was upregulated in the initial stage of callogenesis. Transcripts GLP, GST, PKL, WUS, and WRKY were expressed more in somatic embryo stage. The expression of SERK, MAPK, AP2, SAUR, ECP, AGP, LEA, and ANT were higher in the embryogenic callus stage compared to initial culture and somatic embryo stages. This study provides the first insights into the gene expression patterns during somatic embryogenesis in coconut.
Hafidh, Said; Breznenová, Katarína; Honys, David
Roč. 7, č. 8 (2012), s. 918-921 ISSN 1559-2316 R&D Projects: GA ČR GPP501/11/P321; GA ČR GA522/09/0858 Institutional research plan: CEZ:AV0Z50380511 Keywords : de novo pollen tube transcriptome * male gametophyte development * pollen tube growth Subject RIV: ED - Physiology
Full Text Available BACKGROUND: The Asian clam (Corbicula fluminea is currently one of the most economically important aquatic species in China and has been used as a test organism in many environmental studies. However, the lack of genomic resources, such as sequenced genome, expressed sequence tags (ESTs and transcriptome sequences has hindered the research on C. fluminea. Recent advances in large-scale RNA-Seq enable generation of genomic resources in a short time, and provide large expression datasets for functional genomic analysis. METHODOLOGY/PRINCIPAL FINDINGS: We used a next-generation high-throughput DNA sequencing technique with an Illumina GAIIx method to analyze the transcriptome from the whole bodies of C. fluminea. More than 62,250,336 high-quality reads were generated based on the raw data, and 134,684 unigenes with a mean length of 791 bp were assembled using the Velvet and Oases software. All of the assembly unigenes were annotated by running BLASTx and BLASTn similarity searches on the Nt, Nr, Swiss-Prot, COG and KEGG databases. In addition, the Clusters of Orthologous Groups (COGs, Gene Ontology (GO terms and Kyoto Encyclopedia of Gene and Genome (KEGG annotations were also assigned to each unigene transcript. To provide a preliminary verification of the assembly and annotation results, and search for potential environmental pollution biomarkers, 15 functional genes (five antioxidase genes, two cytochrome P450 genes, three GABA receptor-related genes and five heat shock protein genes were cloned and identified. Expressions of the 15 selected genes following fluoxetine exposure confirmed that the genes are indeed linked to environmental stress. CONCLUSIONS/SIGNIFICANCE: The C. fluminea transcriptome advances the underlying molecular understanding of this freshwater clam, provides a basis for further exploration of C. fluminea as an environmental test organism and promotes further studies on other bivalve organisms.
Full Text Available Forsythia spp. are perennial woody plants which are one of the most extensively used medicinal sources of Chinese medicines and functional diets owing to their lignan contents. Lignans have received widespread attention as leading compounds in the development of antitumor drugs and healthy diets for reducing the risks of lifestyle-related diseases. However, the molecular basis of Forsythia has yet to be established. In this study, we have verified de novo deep transcriptome of Forsythia koreana leaf and callus using the Illumina HiSeq 1500 platform. A total of 89 million reads were assembled into 116,824 contigs using Trinity, and 1,576 of the contigs displayed the sequence similarity to the enzymes responsible for plant specialized metabolism including lignan biosynthesis. Notably, gene ontology (GO analysis indicated the remarkable enrichment of lignan-biosynthetic enzyme genes in the callus transcriptome. Nevertheless, precise annotation and molecular phylogenetic analyses were hindered by partial sequences of open reading frames (ORFs of the Trinity-based contigs. To obtain more numerous contigs harboring a full-length ORF, we developed a novel overlapping layout consensus-based procedure, virtual primer-based sequence reassembly (VP-seq. VP-seq elucidated 709 full-length ORFs, whereas only 146 full-length ORFs were assembled by Trinity. The comparison of expression profiles of leaf and callus using VP-seq-based full-length ORFs revealed 50-fold upregulation of secoisolariciresinol dehydrogenase (SIRD in callus. Expression and phylogenetic cluster analyses predicted candidates for matairesinol-glucosylating enzymes. We also performed VP-seq analysis of lignan-biosynthetic enzyme genes in the transcriptome data of other lignan-rich plants, Linum flavum, Linum usitatissimum and Podophyllum hexandrum. The comparative analysis indicated both common gene clusters involved in biosynthesis upstream of matairesinol such as SIRD and plant lineage
Full Text Available Youngia japonica, a weed species distributed worldwide, has been widely used in traditional Chinese medicine. It is an ideal plant for studying the evolution of Asteraceae plants because of its short life history and abundant source. However, little is known about its evolution and genetic diversity. In this study, de novo transcriptome sequencing was conducted for the first time for the comprehensive analysis of the genetic diversity of Y. japonica. The Y. japonica transcriptome was sequenced using Illumina paired-end sequencing technology. We produced 21,847,909 high-quality reads for Y. japonica and assembled them into contigs. A total of 51,850 unigenes were identified, among which 46,087 were annotated in the NCBI non-redundant protein database and 41,752 were annotated in the Swiss-Prot database. We mapped 9,125 unigenes onto 163 pathways using the Kyoto Encyclopedia of Genes and Genomes Pathway database. In addition, 3,648 simple sequence repeats (SSRs were detected. Our data provide the most comprehensive transcriptome resource currently available for Y. japonica. C4 photosynthesis unigenes were found in the biological process of Y. japonica. There were 5596 unigenes related to defense response and 1344 ungienes related to signal transduction mechanisms (10.95%. These data provide insights into the genetic diversity of Y. japonica. Numerous SSRs contributed to the development of novel markers. These data may serve as a new valuable resource for genomic studies on Youngia and, more generally, Cichoraceae.
Full Text Available Abstract Background Extensive genetic diversity in viral populations within infected hosts and the divergence of variants from existing reference genomes impede the analysis of deep viral sequencing data. A de novo population consensus assembly is valuable both as a single linear representation of the population and as a backbone on which intra-host variants can be accurately mapped. The availability of consensus assemblies and robustly mapped variants are crucial to the genetic study of viral disease progression, transmission dynamics, and viral evolution. Existing de novo assembly techniques fail to robustly assemble ultra-deep sequence data from genetically heterogeneous populations such as viruses into full-length genomes due to the presence of extensive genetic variability, contaminants, and variable sequence coverage. Results We present VICUNA, a de novo assembly algorithm suitable for generating consensus assemblies from genetically heterogeneous populations. We demonstrate its effectiveness on Dengue, Human Immunodeficiency and West Nile viral populations, representing a range of intra-host diversity. Compared to state-of-the-art assemblers designed for haploid or diploid systems, VICUNA recovers full-length consensus and captures insertion/deletion polymorphisms in diverse samples. Final assemblies maintain a high base calling accuracy. VICUNA program is publicly available at: http://www.broadinstitute.org/scientific-community/science/projects/viral-genomics/ viral-genomics-analysis-software. Conclusions We developed VICUNA, a publicly available software tool, that enables consensus assembly of ultra-deep sequence derived from diverse viral populations. While VICUNA was developed for the analysis of viral populations, its application to other heterogeneous sequence data sets such as metagenomic or tumor cell population samples may prove beneficial in these fields of research.
Richardson, Mark F; Sequeira, Fernando; Selechnik, Daniel; Carneiro, Miguel; Vallinoto, Marcelo; Reid, Jack G; West, Andrea J; Crossland, Michael R; Shine, Richard; Rollins, Lee A
Cane toads (Rhinella marina) are an iconic invasive species introduced to 4 continents and well utilized for studies of rapid evolution in introduced environments. Despite the long introduction history of this species, its profound ecological impacts, and its utility for demonstrating evolutionary principles, genetic information is sparse. Here we produce a de novo transcriptome spanning multiple tissues and life stages to enable investigation of the genetic basis of previously identified rapid phenotypic change over the introduced range. Using approximately 1.9 billion reads from developing tadpoles and 6 adult tissue-specific cDNA libraries, as well as a transcriptome assembly pipeline encompassing 100 separate de novo assemblies, we constructed 62 202 transcripts, of which we functionally annotated ∼50%. Our transcriptome assembly exhibits 90% full-length completeness of the Benchmarking Universal Single-Copy Orthologs data set. Robust assembly metrics and comparisons with several available anuran transcriptomes and genomes indicate that our cane toad assembly is one of the most complete anuran genomic resources available. This comprehensive anuran transcriptome will provide a valuable resource for investigation of genes under selection during invasion in cane toads, but will also greatly expand our general knowledge of anuran genomes, which are underrepresented in the literature. The data set is publically available in NCBI and GigaDB to serve as a resource for other researchers. © The Authors 2017. Published by Oxford University Press.
Full Text Available Almond (Prunus dulcis Mill., one of the most important nut crops, requires chilling during winter to develop fruiting buds. However, early spring chilling and late spring frost may damage the reproductive tissues leading to reduction in the rate of productivity. Despite the importance of transcriptional changes and regulation, little is known about the almond's transcriptome under the cold stress conditions. In the current research, we used RNA-seq technique to study the response of the reproductive tissues of almond (anther and ovary to frost stress. RNA sequencing resulted in more than 20 million reads from anther and ovary tissues of almond, individually. About 40,000 contigs were assembled and annotated de novo in each tissue. Profile of gene expression in ovary showed significant alterations in 5,112 genes, whereas in anther 6,926 genes were affected by freezing stress. Around two thousands of these genes were common altered genes in both ovary and anther libraries. Gene ontology indicated the involvement of differentially expressed (DE genes, responding to freezing stress, in metabolic and cellular processes. qRT-PCR analysis verified the expression pattern of eight genes randomly selected from the DE genes. In conclusion, the almond gene index assembled in this study and the reported DE genes can provide great insights on responses of almond and other Prunus species to abiotic stresses. The obtained results from current research would add to the limited available information on almond and Rosaceae. Besides, the findings would be very useful for comparative studies as the number of DE genes reported here is much higher than that of any previous reports in this plant.
Mousavi, Sadegh; Alisoltani, Arghavan; Shiran, Behrouz; Fallahi, Hossein; Ebrahimie, Esameil; Imani, Ali; Houshmand, Saadollah
Almond (Prunus dulcis Mill.), one of the most important nut crops, requires chilling during winter to develop fruiting buds. However, early spring chilling and late spring frost may damage the reproductive tissues leading to reduction in the rate of productivity. Despite the importance of transcriptional changes and regulation, little is known about the almond's transcriptome under the cold stress conditions. In the current research, we used RNA-seq technique to study the response of the reproductive tissues of almond (anther and ovary) to frost stress. RNA sequencing resulted in more than 20 million reads from anther and ovary tissues of almond, individually. About 40,000 contigs were assembled and annotated de novo in each tissue. Profile of gene expression in ovary showed significant alterations in 5,112 genes, whereas in anther 6,926 genes were affected by freezing stress. Around two thousands of these genes were common altered genes in both ovary and anther libraries. Gene ontology indicated the involvement of differentially expressed (DE) genes, responding to freezing stress, in metabolic and cellular processes. qRT-PCR analysis verified the expression pattern of eight genes randomly selected from the DE genes. In conclusion, the almond gene index assembled in this study and the reported DE genes can provide great insights on responses of almond and other Prunus species to abiotic stresses. The obtained results from current research would add to the limited available information on almond and Rosaceae. Besides, the findings would be very useful for comparative studies as the number of DE genes reported here is much higher than that of any previous reports in this plant.
Full Text Available BACKGROUND: Despite their species abundance and primary economic importance, genomic information about copepods is still limited. In particular, genomic resources are lacking for the copepod Calanus sinicus, which is a dominant species in the coastal waters of East Asia. In this study, we performed de novo transcriptome sequencing to produce a large number of expressed sequence tags for the copepod C. sinicus. RESULTS: Copepodid larvae and adults were used as the basic material for transcriptome sequencing. Using 454 pyrosequencing, a total of 1,470,799 reads were obtained, which were assembled into 56,809 high quality expressed sequence tags. Based on their sequence similarity to known proteins, about 14,000 different genes were identified, including members of all major conserved signaling pathways. Transcripts that were putatively involved with growth, lipid metabolism, molting, and diapause were also identified among these genes. Differentially expressed genes related to several processes were found in C. sinicus copepodid larvae and adults. We detected 284,154 single nucleotide polymorphisms (SNPs that provide a resource for gene function studies. CONCLUSION: Our data provide the most comprehensive transcriptome resource available for C. sinicus. This resource allowed us to identify genes associated with primary physiological processes and SNPs in coding regions, which facilitated the quantitative analysis of differential gene expression. These data should provide foundation for future genetic and genomic studies of this and related species.
Homoeologous copies of transcripts are abundant in many self-pollinating species including tetraploid peanut, and can impose a challenge to build a transcriptome reference without the merging of homoeologs. De novo transcriptome assembly of tetraploid OLin with single kmer and multiple kmer approach...
Full Text Available Isodon rubescens is an important medicinal plant in China that has been shown to reduce tumour growth due to the presence of the compound oridonin. In an effort to facilitate molecular research on oridonin biosynthesis, we reported the use of next generation massively parallel sequencing technologies and de novo transcriptome assembly to gain a comprehensive overview of I. rubescens transcriptome. In our study, a total of 50,934,276 clean reads, 101,640 transcripts and 44,626 unigenes were generated through de novo transcriptome assembly. A number of unigenes – 23,987, 10,263, 7359, 18,245, 17,683, 19,485, 9361 – were annotated in the National Center for Biotechnology Information (NCBI non-redundant protein (Nr, NCBI nucleotide sequences (Nt, Kyoto Encyclopedia of Genes and Genomes (KEGG Orthology (KO, Swiss-Prot, protein family (Pfam, gene ontology (GO, eukaryotic ortholog groups (KOG databases, respectively. Furthermore, the annotated unigenes were functionally classified according to the GO, KOG and KEGG. Based on these results, candidate genes encoding enzymes involved in terpenoids backbone biosynthesis were detected. Our data provided the most comprehensive sequence resource available for the study on I. rubescens, as well as demonstrated the effective use of Illumina sequencing and de novo transcriptome assembly on a species lacking genomic information.
Kamphuis, Lars G; Hane, James K; Nelson, Matthew N; Gao, Lingling; Atkins, Craig A; Singh, Karam B
Narrow-leafed lupin (NLL; Lupinus angustifolius L.) is an important grain legume crop that is valuable for sustainable farming and is becoming recognized as a human health food. NLL breeding is directed at improving grain production, disease resistance, drought tolerance and health benefits. However, genetic and genomic studies have been hindered by a lack of extensive genomic resources for the species. Here, the generation, de novo assembly and annotation of transcriptome datasets derived from five different NLL tissue types of the reference accession cv. Tanjil are described. The Tanjil transcriptome was compared to transcriptomes of an early domesticated cv. Unicrop, a wild accession P27255, as well as accession 83A:476, together being the founding parents of two recombinant inbred line (RIL) populations. In silico predictions for transcriptome-derived gene-based length and SNP polymorphic markers were conducted and corroborated using a survey assembly sequence for NLL cv. Tanjil. This yielded extensive indel and SNP polymorphic markers for the two RIL populations. A total of 335 transcriptome-derived markers and 66 BAC-end sequence-derived markers were evaluated, and 275 polymorphic markers were selected to genotype the reference NLL 83A:476 × P27255 RIL population. This significantly improved the completeness, marker density and quality of the reference NLL genetic map. PMID:25060816
Aug 5, 2016 ... Most of these assemblies are done using some de novo short read assemblers and other related approaches. .... benchmarking projects like Assemblathon 1, Assemblathon ... from a large insert library (at least 1000 bases).
Maretty, Lasse; Jensen, Jacob Malte; Petersen, Bent
or by performing local assembly. However, these approaches are biased against discovery of structural variants and variation in the more complex parts of the genome. Hence, large-scale de novo assembly is needed. Here we show that it is possible to construct excellent de novo assemblies from high......-coverage sequencing with mate-pair libraries extending up to 20 kilobases. We report de novo assemblies of 150 individuals (50 trios) from the GenomeDenmark project. The quality of these assemblies is similar to those obtained using the more expensive long-read technology. We use the assemblies to identify a rich set...
Han, Jeongsukhyeon; Thamilarasan, Senthil Kumar; Natarajan, Sathishkumar; Park, Jong-In; Chung, Mi-Young; Nou, Ill-Sup
Bulb onion (Allium cepa) is the second most widely cultivated and consumed vegetable crop in the world. During winter, cold injury can limit the production of bulb onion. Genomic resources available for bulb onion are still very limited. To date, no studies on heritably durable cold and freezing tolerance have been carried out in bulb onion genotypes. We applied high-throughput sequencing technology to cold (2°C), freezing (-5 and -15°C), and control (25°C)-treated samples of cold tolerant (CT) and cold susceptible (CS) genotypes of A. cepa lines. A total of 452 million paired-end reads were de novo assembled into 54,047 genes with an average length of 1,331 bp. Based on similarity searches, these genes were aligned with entries in the public non-redundant (nr) database, as well as KEGG and COG database. Differentially expressed genes (DEGs) were identified using log10 values with the FPKM method. Among 5,167DEGs, 491 genes were differentially expressed at freezing temperature compared to the control temperature in both CT and CS libraries. The DEG results were validated with qRT-PCR. We performed GO and KEGG pathway enrichment analyses of all DEGs and iPath interactive analysis found 31 pathways including those related to metabolism of carbohydrate, nucleotide, energy, cofactors and vitamins, other amino acids and xenobiotics biodegradation. Furthermore, a large number of molecular markers were identified from the assembled genes, including simple sequence repeats (SSRs) 4,437 and SNP substitutions of transition and transversion types of CT and CS. Our study is the first to provide a transcriptome sequence resource for Allium spp. with regard to cold and freezing stress. We identified a large set of genes and determined their DEG profiles under cold and freezing conditions using two different genotypes. These data represent a valuable resource for genetic and genomic studies of Allium spp.
Full Text Available Bulb onion (Allium cepa is the second most widely cultivated and consumed vegetable crop in the world. During winter, cold injury can limit the production of bulb onion. Genomic resources available for bulb onion are still very limited. To date, no studies on heritably durable cold and freezing tolerance have been carried out in bulb onion genotypes. We applied high-throughput sequencing technology to cold (2°C, freezing (-5 and -15°C, and control (25°C-treated samples of cold tolerant (CT and cold susceptible (CS genotypes of A. cepa lines. A total of 452 million paired-end reads were de novo assembled into 54,047 genes with an average length of 1,331 bp. Based on similarity searches, these genes were aligned with entries in the public non-redundant (nr database, as well as KEGG and COG database. Differentially expressed genes (DEGs were identified using log10 values with the FPKM method. Among 5,167DEGs, 491 genes were differentially expressed at freezing temperature compared to the control temperature in both CT and CS libraries. The DEG results were validated with qRT-PCR. We performed GO and KEGG pathway enrichment analyses of all DEGs and iPath interactive analysis found 31 pathways including those related to metabolism of carbohydrate, nucleotide, energy, cofactors and vitamins, other amino acids and xenobiotics biodegradation. Furthermore, a large number of molecular markers were identified from the assembled genes, including simple sequence repeats (SSRs 4,437 and SNP substitutions of transition and transversion types of CT and CS. Our study is the first to provide a transcriptome sequence resource for Allium spp. with regard to cold and freezing stress. We identified a large set of genes and determined their DEG profiles under cold and freezing conditions using two different genotypes. These data represent a valuable resource for genetic and genomic studies of Allium spp.
Yu, Dongliang; Meng, Yijun; Zuo, Ziwei; Xue, Jie; Wang, Huizhong
Nat-siRNAs (small interfering RNAs originated from natural antisense transcripts) are a class of functional small RNA (sRNA) species discovered in both plants and animals. These siRNAs are highly enriched within the annealed regions of the NAT (natural antisense transcript) pairs. To date, great research efforts have been taken for systematical identification of the NATs in various organisms. However, developing a freely available and easy-to-use program for NAT prediction is strongly demanded by researchers. Here, we proposed an integrative pipeline named NATpipe for systematical discovery of NATs from de novo assembled transcriptomes. By utilizing sRNA sequencing data, the pipeline also allowed users to search for phase-distributed nat-siRNAs within the perfectly annealed regions of the NAT pairs. Additionally, more reliable nat-siRNA loci could be identified based on degradome sequencing data. A case study on the non-model plant Dendrobium officinale was performed to illustrate the utility of NATpipe. Finally, we hope that NATpipe would be a useful tool for NAT prediction, nat-siRNA discovery, and related functional studies. NATpipe is available at www.bioinfolab.cn/NATpipe/NATpipe.zip. PMID:26858106
Zhang, Jin; Ruhlman, Tracey A; Mower, Jeffrey P; Jansen, Robert K
Organelle genomes of Geraniaceae exhibit several unusual evolutionary phenomena compared to other angiosperm families including accelerated nucleotide substitution rates, widespread gene loss, reduced RNA editing, and extensive genomic rearrangements. Since most organelle-encoded proteins function in multi-subunit complexes that also contain nuclear-encoded proteins, it is likely that the atypical organellar phenomena affect the evolution of nuclear genes encoding organellar proteins. To begin to unravel the complex co-evolutionary interplay between organellar and nuclear genomes in this family, we sequenced nuclear transcriptomes of two species, Geranium maderense and Pelargonium x hortorum. Normalized cDNA libraries of G. maderense and P. x hortorum were used for transcriptome sequencing. Five assemblers (MIRA, Newbler, SOAPdenovo, SOAPdenovo-trans [SOAPtrans], Trinity) and two next-generation technologies (454 and Illumina) were compared to determine the optimal transcriptome sequencing approach. Trinity provided the highest quality assembly of Illumina data with the deepest transcriptome coverage. An analysis to determine the amount of sequencing needed for de novo assembly revealed diminishing returns of coverage and quality with data sets larger than sixty million Illumina paired end reads for both species. The G. maderense and P. x hortorum transcriptomes contained fewer transcripts encoding the PLS subclass of PPR proteins relative to other angiosperms, consistent with reduced mitochondrial RNA editing activity in Geraniaceae. In addition, transcripts for all six plastid targeted sigma factors were identified in both transcriptomes, suggesting that one of the highly divergent rpoA-like ORFs in the P. x hortorum plastid genome is functional. The findings support the use of the Illumina platform and assemblers optimized for transcriptome assembly, such as Trinity or SOAPtrans, to generate high-quality de novo transcriptomes with broad coverage. In addition
Full Text Available Salix matsudana Koidz. is a deciduous, rapidly growing, and drought resistant tree and is one of the most widely distributed and commonly cultivated willow species in China. Currently little transcriptomic and small RNAomic data are available to reveal the genes involve in the stress resistant in S. matsudana. Here, we report the RNA-seq analysis results of both transcriptome and small RNAome data using Illumina deep sequencing of shoot tips from two willow variants(Salix. matsudana and Salix matsudana Koidz. cultivar 'Tortuosa'. De novo gene assembly was used to generate the consensus transcriptome and small RNAome, which contained 106,403 unique transcripts with an average length of 944 bp and a total length of 100.45 MB, and 166 known miRNAs representing 35 miRNA families. Comparison of transcriptomes and small RNAomes combined with quantitative real-time PCR from the two Salix libraries revealed a total of 292 different expressed genes(DEGs and 36 different expressed miRNAs (DEMs. Among the DEGs and DEMs, 196 genes and 24 miRNAs were up regulated, 96 genes and 12 miRNA were down regulated in S. matsudana. Functional analysis of DEGs and miRNA targets showed that many genes were involved in stress resistance in S. matsudana. Our global gene expression profiling presents a comprehensive view of the transcriptome and small RNAome which provide valuable information and sequence resources for uncovering the stress response genes in S. matsudana. Moreover the transcriptome and small RNAome data provide a basis for future study of genetic resistance in Salix.
Cannon Charles H
Full Text Available Abstract Background Acacia auriculiformis × Acacia mangium hybrids are commercially important trees for the timber and pulp industry in Southeast Asia. Increasing pulp yield while reducing pulping costs are major objectives of tree breeding programs. The general monolignol biosynthesis and secondary cell wall formation pathways are well-characterized but genes in these pathways are poorly characterized in Acacia hybrids. RNA-seq on short-read platforms is a rapid approach for obtaining comprehensive transcriptomic data and to discover informative sequence variants. Results We sequenced transcriptomes of A. auriculiformis and A. mangium from non-normalized cDNA libraries synthesized from pooled young stem and inner bark tissues using paired-end libraries and a single lane of an Illumina GAII machine. De novo assembly produced a total of 42,217 and 35,759 contigs with an average length of 496 bp and 498 bp for A. auriculiformis and A. mangium respectively. The assemblies of A. auriculiformis and A. mangium had a total length of 21,022,649 bp and 17,838,260 bp, respectively, with the largest contig 15,262 bp long. We detected all ten monolignol biosynthetic genes using Blastx and further analysis revealed 18 lignin isoforms for each species. We also identified five contigs homologous to R2R3-MYB proteins in other plant species that are involved in transcriptional regulation of secondary cell wall formation and lignin deposition. We searched the contigs against public microRNA database and predicted the stem-loop structures of six highly conserved microRNA families (miR319, miR396, miR160, miR172, miR162 and miR168 and one legume-specific family (miR2086. Three microRNA target genes were predicted to be involved in wood formation and flavonoid biosynthesis. By using the assemblies as a reference, we discovered 16,648 and 9,335 high quality putative Single Nucleotide Polymorphisms (SNPs in the transcriptomes of A. auriculiformis and A. mangium
Zhang, Senhao; Shi, Yinghua; Cheng, Ningning; Du, Hongqi; Fan, Wenna; Wang, Chengzhang
Alfalfa (Medicago sativa L.) is one of the most widely cultivated perennial forage legumes worldwide. Fall dormancy is an adaptive character related to the biomass production and winter survival in alfalfa. The physiological, biochemical and molecular mechanisms causing fall dormancy and the related genes have not been well studied. In this study, we sequenced two standard varieties of alfalfa (dormant and non-dormant) at two time points and generated approximately 160 million high quality paired-end sequence reads using sequencing by synthesis (SBS) technology. The de novo transcriptome assembly generated a set of 192,875 transcripts with an average length of 856 bp representing about 165.1 Mb of the alfalfa leaf transcriptome. After assembly, 111,062 (57.6%) transcripts were annotated against the NCBI non-redundant database. A total of 30,165 (15.6%) transcripts were mapped to 323 Kyoto Encyclopedia of Genes and Genomes pathways. We also identified 41,973 simple sequence repeats, which can be used to generate markers for alfalfa, and 1,541 transcription factors were identified across 1,350 transcripts. Gene expression between dormant and non-dormant alfalfa at different time points were performed, and we identified several differentially expressed genes potentially related to fall dormancy. The Gene Ontology and pathways information were also identified. We sequenced and assembled the leaf transcriptome of alfalfa related to fall dormancy, and also identified some genes of interest involved in the fall dormancy mechanism. Thus, our research focused on studying fall dormancy in alfalfa through transcriptome sequencing. The sequencing and gene expression data generated in this study may be used further to elucidate the complete mechanisms governing fall dormancy in alfalfa.
Čegan, R.; Hudzieczek, V.; Hobza, Roman
Roč. 11, MAR (2017), s. 118-119 ISSN 2213-5960 Institutional support: RVO:61389030 Keywords : genome * Silene dioica * RNA-Seq * Transcriptome * Heavy metal tolerance * Sex chromosomes Subject RIV: EB - Genetics ; Molecular Biology OBOR OECD: Plant sciences, botany
Fang, Lu; Yang, Yuchen; Guo, Wuxia; Li, Jianfang; Zhong, Cairong; Huang, Yelin; Zhou, Renchao; Shi, Suhua
Aegiceras corniculatum (L.) Blanco is one of the most salt tolerant mangrove species and can thrive in 3% salinity at the seaward edge of mangrove forests. Here we sequenced the transcriptome of A. corniculatum used Illumina GA platform to develop its genomic resources for ecological and evolutionary studies. We obtained about 50 million high-quality paired-end reads with 75bp in length. Using the short read assembler Velvet, we yielded 49,437 contigs with the average length of 625bp. A total of 32,744 (66.23%) contigs showed significant similarity to the GenBank non-redundant (NR) protein database. 30,911 and 18,004 of these sequences were assigned to Gene Ontology and eukaryotic orthologous groups of proteins (KOG). A total of 4942 transcripts from our assemblies had significant similarity with KEGG Orthologs and were involved in 144 KEGG pathways, while 9899 unigenes had enzyme commission (EC) numbers. In addition, 9792 transcriptome-derived SSRs were identified from 7342 sequences. With our strict criteria, 4165 candidate SNPs were also identified from 2058 contigs. Some of these SNPs were further validated by Sanger sequencing. Genomic resources generated in this study should be valuable in ecological, evolutionary, and functional genomics studies for this mangrove species. Copyright © 2016 Elsevier B.V. All rights reserved.
De novo genome assemblers designed for short k-mer length or using short raw reads are unlikely to recover complex features of the underlying genome, such as repeats hundreds of bases long. We implement a stochastic machine-learning method which obtains accurate assemblies with repeats and
Liu, Miaomiao; Zhu, Jinhang; Wu, Shengbing; Wang, Chenkai; Guo, Xingyi; Wu, Jiawen; Zhou, Meiqi
Artemisia argyi Lev. et Vant. (A. argyi) is widely utilized for moxibustion in Chinese medicine, and the mechanism underlying terpenoid biosynthesis in its leaves is suggested to play an important role in its medicinal use. However, the A. argyi transcriptome has not been sequenced. Herein, we performed RNA sequencing for A. argyi leaf, root and stem tissues to identify as many as possible of the transcribed genes. In total, 99,807 unigenes were assembled by analysing the expression profiles generated from the three tissue types, and 67,446 of those unigenes were annotated in public databases. We further performed differential gene expression analysis to compare leaf tissue with the other two tissue types and identified numerous genes that were specifically expressed or up-regulated in leaf tissue. Specifically, we identified multiple genes encoding significant enzymes or transcription factors related to terpenoid synthesis. This study serves as a valuable resource for transcriptome information, as many transcribed genes related to terpenoid biosynthesis were identified in the A. argyi transcriptome, providing a functional genomic basis for additional studies on molecular mechanisms underlying the medicinal use of A. argyi.
Xu, Elvis Genbo; Mager, Edward M.; Grosell, Martin; Hazard, E. Starr; Hardiman, Gary; Schlenk, Daniel
The impacts of Deepwater Horizon (DWH) oil on morphology and function during embryonic development have been documented for a number of fish species, including the economically and ecologically important pelagic species, mahi-mahi (Coryphaena hippurus). However, further investigations on molecular events and pathways responsible for developmental toxicity have been largely restricted due to the limited molecular data available for this species. We sought to establish the de novo transcriptomic database from the embryos and larvae of mahi-mahi exposed to water accommodated fractions (HEWAFs) of two DWH oil types (weathered and source oil), in an effort to advance our understanding of the molecular aspects involved during specific toxicity responses. By high throughput sequencing (HTS), we obtained the first de novo transcriptome of mahi-mahi, with 60,842 assembled transcripts and 30,518 BLAST hits. Among them, 2,345 genes were significantly regulated in 96hpf larvae after exposure to weathered oil. With comparative analysis to a reference-transcriptome-guided approach on gene ontology and tox-pathways, we confirmed the novel approach effective for exploring tox-pathways in non-model species, and also identified a list of co-expressed genes as potential biomarkers which will provide information for the construction of an Adverse Outcome Pathway which could be useful in Ecological Risk Assessments.
Duan, Jun; Ladd, Tim; Doucet, Daniel; Cusson, Michel; vanFrankenhuyzen, Kees; Mittapalli, Omprakash; Krell, Peter J; Quan, Guoxing
The Emerald ash borer (EAB), Agrilus planipennis, is an invasive phloem-feeding insect pest of ash trees. Since its initial discovery near the Detroit, US- Windsor, Canada area in 2002, the spread of EAB has had strong negative economic, social and environmental impacts in both countries. Several transcriptomes from specific tissues including midgut, fat body and antenna have recently been generated. However, the relatively low sequence depth, gene coverage and completeness limited the usefulness of these EAB databases. High-throughput deep RNA-Sequencing (RNA-Seq) was used to obtain 473.9 million pairs of 100 bp length paired-end reads from various life stages and tissues. These reads were assembled into 88,907 contigs using the Trinity strategy and integrated into 38,160 unigenes after redundant sequences were removed. We annotated 11,229 unigenes by searching against the public nr, Swiss-Prot and COG. The EAB transcriptome assembly was compared with 13 other sequenced insect species, resulting in the prediction of 536 unigenes that are Coleoptera-specific. Differential gene expression revealed that 290 unigenes are expressed during larval molting and 3,911 unigenes during metamorphosis from larvae to pupae, respectively (FDR2). In addition, 1,167 differentially expressed unigenes were identified from larval and adult midguts, 435 unigenes were up-regulated in larval midgut and 732 unigenes were up-regulated in adult midgut. Most of the genes involved in RNA interference (RNAi) pathways were identified, which implies the existence of a system RNAi in EAB. This study provides one of the most fundamental and comprehensive transcriptome resources available for EAB to date. Identification of the tissue- stage- or species- specific unigenes will benefit the further study of gene functions during growth and metamorphosis processes in EAB and other pest insects.
Maretty, Lasse; Jensen, Jacob Malte; Petersen, Bent
Hundreds of thousands of human genomes are now being sequenced to characterize genetic variation and use this information to augment association mapping studies of complex disorders and other phenotypic traits. Genetic variation is identified mainly by mapping short reads to the reference genome......-coverage sequencing with mate-pair libraries extending up to 20 kilobases. We report de novo assemblies of 150 individuals (50 trios) from the GenomeDenmark project. The quality of these assemblies is similar to those obtained using the more expensive long-read technology. We use the assemblies to identify a rich set...... or by performing local assembly. However, these approaches are biased against discovery of structural variants and variation in the more complex parts of the genome. Hence, large-scale de novo assembly is needed. Here we show that it is possible to construct excellent de novo assemblies from high...
Maretty, Lasse; Jensen, Jacob Malte; Petersen, Bent
Hundreds of thousands of human genomes are now being sequenced to characterize genetic variation and use this information to augment association mapping studies of complex disorders and other phenotypic traits. Genetic variation is identified mainly by mapping short reads to the reference genome...... or by performing local assembly. However, these approaches are biased against discovery of structural variants and variation in the more complex parts of the genome. Hence, large-scale de novo assembly is needed. Here we show that it is possible to construct excellent de novo assemblies from high......-coverage sequencing with mate-pair libraries extending up to 20 kilobases. We report de novo assemblies of 150 individuals (50 trios) from the GenomeDenmark project. The quality of these assemblies is similar to those obtained using the more expensive long-read technology. We use the assemblies to identify a rich set...
Chen, Honglin; Wang, Lixia; Liu, Xiaoyan; Hu, Liangliang; Wang, Suhua; Cheng, Xuzhen
Cowpea [Vigna unguiculata (L.) Walp.] is one of the most important legumes in tropical and semi-arid regions. However, there is relatively little genomic information available for genetic research on and breeding of cowpea. The objectives of this study were to analyse the cowpea transcriptome and develop genic molecular markers for future genetic studies of this genus. Approximately 54 million high-quality cDNA sequence reads were obtained from cowpea based on Illumina paired-end sequencing technology and were de novo assembled to generate 47,899 unigenes with an N50 length of 1534 bp. Sequence similarity analysis revealed 36,289 unigenes (75.8%) with significant similarity to known proteins in the non-redundant (Nr) protein database, 23,471 unigenes (49.0%) with BLAST hits in the Swiss-Prot database, and 20,654 unigenes (43.1%) with high similarity in the Kyoto Encyclopedia of Genes and Genomes (KEGG) database. Further analysis identified 5560 simple sequence repeats (SSRs) as potential genic molecular markers. Validating a random set of 500 SSR markers yielded 54 polymorphic markers among 32 cowpea accessions. This transcriptomic analysis of cowpea provided a valuable set of genomic data for characterizing genes with important agronomic traits in Vigna unguiculata and a new set of genic SSR markers for further genetic studies and breeding in cowpea and related Vigna species.
Lan, Yi; Sun, Jin; Xu, Ting; Chen, Chong; Tian, Renmao; Qiu, Jian-Wen; Qian, Pei-Yuan
High hydrostatic pressure and low temperatures make the deep sea a harsh environment for life forms. Actin organization and microtubules assembly, which are essential for intracellular transport and cell motility, can be disrupted by high hydrostatic pressure. High hydrostatic pressure can also damage DNA. Nucleic acids exposed to low temperatures can form secondary structures that hinder genetic information processing. To study how deep-sea creatures adapt to such a hostile environment, one of the most straightforward ways is to sequence and compare their genes with those of their shallow-water relatives. We captured an individual of the fish species Aldrovandia affinis, which is a typical deep-sea inhabitant, from the Okinawa Trough at a depth of 1550 m using a remotely operated vehicle (ROV). We sequenced its transcriptome and analyzed its molecular adaptation. We obtained 27,633 protein coding sequences using an Illumina platform and compared them with those of several shallow-water fish species. Analysis of 4918 single-copy orthologs identified 138 positively selected genes in A. affinis, including genes involved in microtubule regulation. Particularly, functional domains related to cold shock as well as DNA repair are exposed to positive selection pressure in both deep-sea fish and hadal amphipod. Overall, we have identified a set of positively selected genes related to cytoskeleton structures, DNA repair and genetic information processing, which shed light on molecular adaptation to the deep sea. These results suggest that amino acid substitutions of these positively selected genes may contribute crucially to the adaptation of deep-sea animals. Additionally, we provide a high-quality transcriptome of a deep-sea fish for future deep-sea studies.
Sun, Xiujun; Li, Dongming; Liu, Zhihong; Zhou, Liqing; Wu, Biao; Yang, Aiguo
The pen shell ( Atrina pectinata) is a large wedge-shaped bivalve, which belongs to family Pinnidae. Due to its large and nutritious adductor muscle, it is the popular seafood with high commercial value in Asia-Pacific countries. However, limiting genomic and transcriptomic data have hampered its genetic investigations. In this study, the transcriptome of A. pectinata was deeply sequenced using Illumina pair-end sequencing technology. After assembling, a total of 127263 unigenes were obtained. Functional annotation indicated that the highest percentage of unigenes (18.60%) was annotated on GO database, followed by 18.44% on PFAM database and 17.04% on NR database. There were 270 biological pathways matched with those in KEGG database. Furthermore, a total of 23452 potential simple sequence repeats (SSRs) were identified, of them the most abundant type was mono-nucleotide repeats (12902, 55.01%), which was followed by di-nucleotide (8132, 34.68%), tri-nucleotide (2010, 8.57%), tetra-nucleotide (401, 1.71%), and penta-nucleotide (7, 0.03%) repeats. Sixty SSRs were selected for validating and developing genic SSR markers, of them 23 showed polymorphism in a cultured population with the average observed and expected heterozygosities of 0.412 and 0.579, respectively. In this study, we established the first comprehensive transcript dataset of A. pectinata genes. Our results demonstrated that RNA-Seq is a fast and cost-effective method for genic SSR development in non-model species.
Full Text Available The plant Dioscorea composita has important applications in the medical and energy industries, and can be used for the extraction of steroidal sapogenins (important raw materials for the synthesis of steroidal drugs and bioethanol production. However, little is known at the genetic level about how sapogenins are biosynthesized in this plant. Using Illumina deep sequencing, 62,341 unigenes were obtained by assembling its transcriptome, and 27,720 unigenes were annotated. Of these, 8,022 unigenes were mapped to 243 specific pathways, and 531 unigenes were identified to be involved in 24 secondary metabolic pathways. 35 enzymes, which were encoded by 79 unigenes, were related to the biosynthesis of steroidal sapogenins in this transcriptome database, covering almost all the nodes in the steroidal pathway. The results of real-time PCR experiments on ten related transcripts (HMGR, MK, SQLE, FPPS, DXS, CAS, HMED, CYP51, DHCR7, and DHCR24 indicated that sapogenins were mainly biosynthesized by the mevalonate pathway. The expression of these ten transcripts in the tuber and leaves was found to be much higher than in the stem. Also, expression in the shoots was low. The nucleotide and protein sequences and conserved domains of four related genes (HMGR, CAS, SQS, and SMT1 were highly conserved between D. composita and D. zingiberensis; but expression of these four genes is greater in D. composita. However, there is no expression of these key enzymes in potato and no steroidal sapogenins are synthesized.
Fu, Yuanshuai; Jia, Liang; Shi, Zhiyi; Zhang, Junling; Li, Wenjuan
The Japanese flounder (Paralichthys olivaceus) is one of the most important commercial and biological marine fishes. However, the molecular biology involved during embryogenesis and early development of the Japanese flounder remains largely unknown due to a lack of genomic resources. A comprehensive and integrated transcriptome is necessary to study the molecular mechanisms of early development and to allow for the detailed characterization of gene expression patterns during embryogenesis; this approach is critical to understanding the processes that occur prior to mesectoderm formation during early embryonic development. In this study, more than 117.8 million 100bp PE reads were generated from pooled RNA extracted from unfertilized eggs to 41dph (days post-hatching) embryos and were sequenced using Illumina pair-end sequencing technology. In total, 121,513 transcripts (≥200bp) were obtained using de novo assembly. A sequence similarity search indicated that 52,338 transcripts show significant similarity to 22,462 known proteins from the NCBI non-redundant database and the Swiss-Prot protein database and were annotated using Blast2GO. GO terms were assigned to 44,627 transcripts with 12,006 functional terms, and 10,024 transcripts were assigned to 133 KEGG pathways. Furthermore, gene expression differences between the unfertilized egg and the gastrula embryo were analysed using Illumina RNA-Seq with single-read sequencing technology, and 24,837 differentially and specifically expressed transcripts were identified and included 5,286 annotated transcripts and 19,569 non-annotated transcripts. All of the expressed transcripts in the unfertilized egg and gastrula embryo were further classified as maternal, zygotic, or maternal-zygotic transcripts, which may help us to understand the roles of these transcripts during the embryonic development of the Japanese flounder. Thus, the results will contribute to an improved understanding of the gene expression patterns and
Full Text Available Elettaria cardamomum (L. Maton, known as ‘queen of spices, is a perennial herbaceous monocot of the family Zingiberaceae, native to southern India. Cardamom is an economically valuable spice crop and used widely in culinary and medicinal purposes. In the present study, using Ion Proton RNA sequencing technology, we performed transcriptome sequencing and de novo transcriptome assembly of a wild and five cultivar genotypes of cardamom. RNA-seq generated a total of 22,811,983 (92 base and 24,889,197 (75 base raw reads accounting for approximately 8.21GB and 7.65GB of sequence data for wild and cultivar genotypes of cardamom respectively. The raw data were submitted to SRA database of NCBI under the accession numbers SRX1141272 (wild and SRX1141276 (cultivars. The raw reads were quality filtered and assembled using MIRA assembler resulted with 112,208 and 264,161contigs having N50 value 616 and 664 for wild and cultivar cardamom respectively. The assembled unigenes were functionally annotated using several databases including PlantCyc for pathway annotation. This work represents the first report on cardamom transcriptome sequencing. In order to generate a comprehensive reference transcriptome, we further assembled the raw reads of wild and cultivar genotypes which might enrich the plant transcriptome database and trigger advanced research in cardamom genomics.
Nowrousian, Minou; Stajich, Jason E; Chu, Meiling; Engh, Ines; Espagne, Eric; Halliday, Karen; Kamerewerd, Jens; Kempken, Frank; Knab, Birgit; Kuo, Hsiao-Che; Osiewacz, Heinz D; Pöggeler, Stefanie; Read, Nick D; Seiler, Stephan; Smith, Kristina M; Zickler, Denise; Kück, Ulrich; Freitag, Michael
Filamentous fungi are of great importance in ecology, agriculture, medicine, and biotechnology. Thus, it is not surprising that genomes for more than 100 filamentous fungi have been sequenced, most of them by Sanger sequencing. While next-generation sequencing techniques have revolutionized genome resequencing, e.g. for strain comparisons, genetic mapping, or transcriptome and ChIP analyses, de novo assembly of eukaryotic genomes still presents significant hurdles, because of their large size and stretches of repetitive sequences. Filamentous fungi contain few repetitive regions in their 30-90 Mb genomes and thus are suitable candidates to test de novo genome assembly from short sequence reads. Here, we present a high-quality draft sequence of the Sordaria macrospora genome that was obtained by a combination of Illumina/Solexa and Roche/454 sequencing. Paired-end Solexa sequencing of genomic DNA to 85-fold coverage and an additional 10-fold coverage by single-end 454 sequencing resulted in approximately 4 Gb of DNA sequence. Reads were assembled to a 40 Mb draft version (N50 of 117 kb) with the Velvet assembler. Comparative analysis with Neurospora genomes increased the N50 to 498 kb. The S. macrospora genome contains even fewer repeat regions than its closest sequenced relative, Neurospora crassa. Comparison with genomes of other fungi showed that S. macrospora, a model organism for morphogenesis and meiosis, harbors duplications of several genes involved in self/nonself-recognition. Furthermore, S. macrospora contains more polyketide biosynthesis genes than N. crassa. Phylogenetic analyses suggest that some of these genes may have been acquired by horizontal gene transfer from a distantly related ascomycete group. Our study shows that, for typical filamentous fungi, de novo assembly of genomes from short sequence reads alone is feasible, that a mixture of Solexa and 454 sequencing substantially improves the assembly, and that the resulting data can be used for
Full Text Available Filamentous fungi are of great importance in ecology, agriculture, medicine, and biotechnology. Thus, it is not surprising that genomes for more than 100 filamentous fungi have been sequenced, most of them by Sanger sequencing. While next-generation sequencing techniques have revolutionized genome resequencing, e.g. for strain comparisons, genetic mapping, or transcriptome and ChIP analyses, de novo assembly of eukaryotic genomes still presents significant hurdles, because of their large size and stretches of repetitive sequences. Filamentous fungi contain few repetitive regions in their 30-90 Mb genomes and thus are suitable candidates to test de novo genome assembly from short sequence reads. Here, we present a high-quality draft sequence of the Sordaria macrospora genome that was obtained by a combination of Illumina/Solexa and Roche/454 sequencing. Paired-end Solexa sequencing of genomic DNA to 85-fold coverage and an additional 10-fold coverage by single-end 454 sequencing resulted in approximately 4 Gb of DNA sequence. Reads were assembled to a 40 Mb draft version (N50 of 117 kb with the Velvet assembler. Comparative analysis with Neurospora genomes increased the N50 to 498 kb. The S. macrospora genome contains even fewer repeat regions than its closest sequenced relative, Neurospora crassa. Comparison with genomes of other fungi showed that S. macrospora, a model organism for morphogenesis and meiosis, harbors duplications of several genes involved in self/nonself-recognition. Furthermore, S. macrospora contains more polyketide biosynthesis genes than N. crassa. Phylogenetic analyses suggest that some of these genes may have been acquired by horizontal gene transfer from a distantly related ascomycete group. Our study shows that, for typical filamentous fungi, de novo assembly of genomes from short sequence reads alone is feasible, that a mixture of Solexa and 454 sequencing substantially improves the assembly, and that the resulting data
Eshbach, Megan L; Sethi, Rahil; Avula, Raghunandan; Lamb, Janette; Hollingshead, Deborah J; Finegold, David N; Locker, Joseph D; Chandran, Uma R; Weisz, Ora A
The OK cell line derived from the kidney of a female opossum Didelphis virginiana has proven to be a useful model in which to investigate the unique regulation of ion transport and membrane trafficking mechanisms in the proximal tubule (PT). Sequence data and comparison of the transcriptome of this cell line to eutherian mammal PTs would further broaden the utility of this culture model. However, the genomic sequence for D. virginiana is not available and although a draft genome sequence for the opossum Monodelphis domestica (sequenced in 2012 by the Broad Institute) exists, transcripts sequenced from both species show significant divergence. The M. domestica sequence is not highly annotated, and the majority of transcripts are predicted rather than experimentally validated. Using deep RNA sequencing of the D. virginiana OK cell line, we characterized its transcriptome via de novo transcriptome assembly and alignment to the M. domestica genome. The quality of the de novo assembled transcriptome was assessed by the extent of homology to sequences in nucleotide and protein databases. Gene expression levels in the OK cell line, from both the de novo transcriptome and genes aligned to the M. domestica genome, were compared with publicly available rat kidney nephron segment expression data. Our studies demonstrate the expression in OK cells of numerous PT-specific ion transporters and other key proteins relevant for rodent and human PT function. Additionally, the sequence and expression data reported here provide an important resource for genetic manipulation and other studies on PT cell function using these cells. Copyright © 2017 the American Physiological Society.
Iaria, Domenico; Chiappetta, Adriana; Muzzalupo, Innocenzo
In olive (Olea europaea L.), the processes controlling self-incompatibility are still unclear and the molecular basis underlying this process are still not fully characterized. In order to determine compatibility relationships, using next-generation sequencing techniques and a de novo transcriptome assembly strategy, we show that pollen tubes from different olive plants, grown in vitro in a medium containing its own pistil and in combination pollen/pistil from self-sterile and self-fertile cultivars, have a distinct gene expression profile and many of the differentially expressed sequences between the samples fall within gene families involved in the development of the pollen tube, such as lipase, carboxylesterase, pectinesterase, pectin methylesterase, and callose synthase. Moreover, different genes involved in signal transduction, transcription, and growth are overrepresented. The analysis also allowed us to identify members in actin and actin depolymerization factor and fibrin gene family and member of the Ca(2+) binding gene family related to the development and polarization of pollen apical tip. The whole transcriptomic analysis, through the identification of the differentially expressed transcripts set and an extended functional annotation analysis, will lead to a better understanding of the mechanisms of pollen germination and pollen tube growth in the olive.
Wang, Haibin; Jiang, Jiafu; Chen, Sumei; Qi, Xiangyu; Peng, Hui; Li, Pirui; Song, Aiping; Guan, Zhiyong; Fang, Weimin; Liao, Yuan; Chen, Fadi
Background Simple sequence repeats (SSRs) are ubiquitous in eukaryotic genomes. Chrysanthemum is one of the largest genera in the Asteraceae family. Only few Chrysanthemum expressed sequence tag (EST) sequences have been acquired to date, so the number of available EST-SSR markers is very low. Methodology/Principal Findings Illumina paired-end sequencing technology produced over 53 million sequencing reads from C. nankingense mRNA. The subsequent de novo assembly yielded 70,895 unigenes, of which 45,789 (64.59%) unigenes showed similarity to the sequences in NCBI database. Out of 45,789 sequences, 107 have hits to the Chrysanthemum Nr protein database; 679 and 277 sequences have hits to the database of Helianthus and Lactuca species, respectively. MISA software identified a large number of putative EST-SSRs, allowing 1,788 primer pairs to be designed from the de novo transcriptome sequence and a further 363 from archival EST sequence. Among 100 primer pairs randomly chosen, 81 markers have amplicons and 20 are polymorphic for genotypes analysis in Chrysanthemum. The results showed that most (but not all) of the assays were transferable across species and that they exposed a significant amount of allelic diversity. Conclusions/Significance SSR markers acquired by transcriptome sequencing are potentially useful for marker-assisted breeding and genetic analysis in the genus Chrysanthemum and its related genera. PMID:23626799
Rebecca R. Murphy
Full Text Available Scaffolding errors and incorrect repeat disambiguation during de novo assembly can result in large scale misassemblies in draft genomes. Nextera mate pair sequencing data provide additional information to resolve assembly ambiguities during scaffolding. Here, we introduce NxRepair, an open source toolkit for error correction in de novo assemblies that uses Nextera mate pair libraries to identify and correct large-scale errors. We show that NxRepair can identify and correct large scaffolding errors, without use of a reference sequence, resulting in quantitative improvements in the assembly quality. NxRepair can be downloaded from GitHub or PyPI, the Python Package Index; a tutorial and user documentation are also available.
Jäger, Marten; Ott, Claus-Eric; Grünhagen, Johannes; Hecht, Jochen; Schell, Hanna; Mundlos, Stefan; Duda, Georg N; Robinson, Peter N; Lienau, Jasmin
The sheep is an important model organism for many types of medically relevant research, but molecular genetic experiments in the sheep have been limited by the lack of knowledge about ovine gene sequences. Prior to our study, mRNA sequences for only 1,556 partial or complete ovine genes were publicly available. Therefore, we developed a composite de novo transcriptome assembly method for next-generation sequence data to combine known ovine mRNA and EST sequences, mRNA sequences from mouse and cow, and sequences assembled de novo from short read RNA-Seq data into a composite reference transcriptome, and identified transcripts from over 12 thousand previously undescribed ovine genes. Gene expression analysis based on these data revealed substantially different expression profiles in standard versus delayed bone healing in an ovine tibial osteotomy model. Hundreds of transcripts were differentially expressed between standard and delayed healing and between the time points of the standard and delayed healing groups. We used the sheep sequences to design quantitative RT-PCR assays with which we validated the differential expression of 26 genes that had been identified by RNA-seq analysis. A number of clusters of characteristic expression profiles could be identified, some of which showed striking differences between the standard and delayed healing groups. Gene Ontology (GO) analysis showed that the differentially expressed genes were enriched in terms including extracellular matrix, cartilage development, contractile fiber, and chemokine activity. Our results provide a first atlas of gene expression profiles and differentially expressed genes in standard and delayed bone healing in a large-animal model and provide a number of clues as to the shifts in gene expression that underlie delayed bone healing. In the course of our study, we identified transcripts of 13,987 ovine genes, including 12,431 genes for which no sequence information was previously available. This
Jones, Beryl M; Wcislo, William T; Robinson, Gene E
Transcriptomes provide excellent foundational resources for mechanistic and evolutionary analyses of complex traits. We present a developmental transcriptome for the facultatively eusocial bee Megalopta genalis, which represents a potential transition point in the evolution of eusociality. A de novo transcriptome assembly of Megalopta genalis was generated using paired-end Illumina sequencing and the Trinity assembler. Males and females of all life stages were aligned to this transcriptome for analysis of gene expression profiles throughout development. Gene Ontology analysis indicates that stage-specific genes are involved in ion transport, cell-cell signaling, and metabolism. A number of distinct biological processes are upregulated in each life stage, and transitions between life stages involve shifts in dominant functional processes, including shifts from transcriptional regulation in embryos to metabolism in larvae, and increased lipid metabolism in adults. We expect that this transcriptome will provide a useful resource for future analyses to better understand the molecular basis of the evolution of eusociality and, more generally, phenotypic plasticity. Copyright © 2015 Jones et al.
Xu, Jiajia; Li, Yuanyuan; Ma, Xiuling; Ding, Jianfeng; Wang, Kai; Wang, Sisi; Tian, Ye; Zhang, Hui; Zhu, Xin-Guang
Setaria viridis is an emerging model species for genetic studies of C4 photosynthesis. Many basic molecular resources need to be developed to support for this species. In this paper, we performed a comprehensive transcriptome analysis from multiple developmental stages and tissues of S. viridis using next-generation sequencing technologies. Sequencing of the transcriptome from multiple tissues across three developmental stages (seed germination, vegetative growth, and reproduction) yielded a total of 71 million single end 100 bp long reads. Reference-based assembly using Setaria italica genome as a reference generated 42,754 transcripts. De novo assembly generated 60,751 transcripts. In addition, 9,576 and 7,056 potential simple sequence repeats (SSRs) covering S. viridis genome were identified when using the reference based assembled transcripts and the de novo assembled transcripts, respectively. This identified transcripts and SSR provided by this study can be used for both reverse and forward genetic studies based on S. viridis.
Full Text Available Sinapis alba is an important condiment crop and can also be used as a phytoremediation plant. Though it has important economic and agronomic values, sequence data and the genetic tools are still rare in this plant. In the present study, a de novo transcriptome based on the transcriptions of leaves, stems and roots was assembled for S. alba for the first time. The transcriptome contains 47,972 unigenes with a mean length of 1,185 nt and an N50 of 1,672 nt. Among these unigenes, 46,535 (97% unigenes were annotated by at least one of the following databases: NCBI non-redundant (Nr, Swiss-Prot, Kyoto Encyclopedia of Genes and Genomes (KEGG pathway, Gene Ontology (GO, and Clusters of Orthologous Groups of proteins (COGs. The tissue expression pattern profiles revealed that 3,489, 1,361 and 8,482 unigenes were predominantly expressed in the leaves, stems and roots of S. alba, respectively. Genes predominantly expressed in the leaf were enriched in photosynthesis- and carbon fixation-related pathways. Genes predominantly expressed in the stem were enriched in not only pathways related to sugar, ether lipid and amino acid metabolisms but also plant hormone signal transduction and circadian rhythm pathways, while the root-dominant genes were enriched in pathways related to lignin and cellulose syntheses, involved in plant-pathogen interactions, and potentially responsible for heavy metal chelating and detoxification. Based on this transcriptome, 14,727 simple sequence repeats (SSRs were identified, and 12,830 pairs of primers were developed for 2,522 SSR-containing unigenes. Additionally, the glucosinolate (GSL and phytochelatin metabolic pathways, which give the characteristic flavor and the heavy metal tolerance of this plant, were intensively analyzed. The genes of aliphatic GSLs pathway were predominantly expressed in roots. The absence of aliphatic GSLs in leaf tissues was due to the shutdown of BCAT4, MAM1 and CYP79F1 expressions. Glutathione was
Full Text Available Aromatic grasses of the genus Cymbopogon (Poaceae family represent unique group of plants that produce diverse composition of monoterpene rich essential oils, which have great value in flavour, fragrance, cosmetic and aromatherapy industries. Despite the commercial importance of these natural aromatic oils, their biosynthesis at the molecular level remains unexplored. As the first step towards understanding the essential oil biosynthesis, we performed de novo transcriptome assembly and analysis of C. flexuosus (lemongrass by employing Illumina sequencing. Mining of transcriptome data and subsequent phylogenetic analysis led to identification of terpene synthases (TPS, pyrophosphatases (PPase, alcohol dehydrogenases (ADH, aldo-keto reductases (AKR, carotenoid cleavage dioxygenases (CCD, alcohol acetyltransferases (AAT and aldehyde dehydrogenases (ALDH, which are potentially involved in essential oil biosynthesis. Comparative essential oil profiling and mRNA expression analysis in three Cymbopogon species (C. flexuosus, aldehyde type; C. martinii, alcohol type; and C. winterianus, intermediate type with varying essential oil composition indicated the involvement of identified candidate genes in the formation of alcohols, aldehydes and acetates. Molecular modeling and docking further supported the role of identified enzymes in aroma formation in Cymbopogon. Also, simple sequence repeats (SSRs were found in the transcriptome with many linked to terpene pathway genes including the genes potentially involved in aroma biosynthesis. This work provides the first insights into the essential oil biosynthesis of aromatic grasses, and the identified candidate genes and markers can be a great resource for biotechnological and molecular breeding approaches to modulate the essential oil composition.
Meena, Seema; Kumar, Sarma R.; Venkata Rao, D. K.; Dwivedi, Varun; Shilpashree, H. B.; Rastogi, Shubhra; Shasany, Ajit K.; Nagegowda, Dinesh A.
Aromatic grasses of the genus Cymbopogon (Poaceae family) represent unique group of plants that produce diverse composition of monoterpene rich essential oils, which have great value in flavor, fragrance, cosmetic, and aromatherapy industries. Despite the commercial importance of these natural aromatic oils, their biosynthesis at the molecular level remains unexplored. As the first step toward understanding the essential oil biosynthesis, we performed de novo transcriptome assembly and analysis of C. flexuosus (lemongrass) by employing Illumina sequencing. Mining of transcriptome data and subsequent phylogenetic analysis led to identification of terpene synthases, pyrophosphatases, alcohol dehydrogenases, aldo-keto reductases, carotenoid cleavage dioxygenases, alcohol acetyltransferases, and aldehyde dehydrogenases, which are potentially involved in essential oil biosynthesis. Comparative essential oil profiling and mRNA expression analysis in three Cymbopogon species (C. flexuosus, aldehyde type; C. martinii, alcohol type; and C. winterianus, intermediate type) with varying essential oil composition indicated the involvement of identified candidate genes in the formation of alcohols, aldehydes, and acetates. Molecular modeling and docking further supported the role of identified protein sequences in aroma formation in Cymbopogon. Also, simple sequence repeats were found in the transcriptome with many linked to terpene pathway genes including the genes potentially involved in aroma biosynthesis. This work provides the first insights into the essential oil biosynthesis of aromatic grasses, and the identified candidate genes and markers can be a great resource for biotechnological and molecular breeding approaches to modulate the essential oil composition. PMID:27516768
Meena, Seema; Kumar, Sarma R; Venkata Rao, D K; Dwivedi, Varun; Shilpashree, H B; Rastogi, Shubhra; Shasany, Ajit K; Nagegowda, Dinesh A
Aromatic grasses of the genus Cymbopogon (Poaceae family) represent unique group of plants that produce diverse composition of monoterpene rich essential oils, which have great value in flavor, fragrance, cosmetic, and aromatherapy industries. Despite the commercial importance of these natural aromatic oils, their biosynthesis at the molecular level remains unexplored. As the first step toward understanding the essential oil biosynthesis, we performed de novo transcriptome assembly and analysis of C. flexuosus (lemongrass) by employing Illumina sequencing. Mining of transcriptome data and subsequent phylogenetic analysis led to identification of terpene synthases, pyrophosphatases, alcohol dehydrogenases, aldo-keto reductases, carotenoid cleavage dioxygenases, alcohol acetyltransferases, and aldehyde dehydrogenases, which are potentially involved in essential oil biosynthesis. Comparative essential oil profiling and mRNA expression analysis in three Cymbopogon species (C. flexuosus, aldehyde type; C. martinii, alcohol type; and C. winterianus, intermediate type) with varying essential oil composition indicated the involvement of identified candidate genes in the formation of alcohols, aldehydes, and acetates. Molecular modeling and docking further supported the role of identified protein sequences in aroma formation in Cymbopogon. Also, simple sequence repeats were found in the transcriptome with many linked to terpene pathway genes including the genes potentially involved in aroma biosynthesis. This work provides the first insights into the essential oil biosynthesis of aromatic grasses, and the identified candidate genes and markers can be a great resource for biotechnological and molecular breeding approaches to modulate the essential oil composition.
Clark, Melody S; Thorne, Michael A S
454 RNA-Seq transcriptome data were generated from foot tissue of the Antarctic brooding gastropod mollusc Margarella antarctica. A total of 6195 contigs were assembled de novo, providing a useful resource for researchers with an interest in Antarctic marine species, phylogenetics and mollusc biology, especially shell production. Copyright © 2015 Elsevier B.V. All rights reserved.
Full Text Available Achnatherum splendens is an important forage herb in Northwestern China. It has a high tolerance to salinity and is, thus, considered one of the most important constructive plants in saline and alkaline areas of land in Northwest China. However, the mechanisms of salt stress tolerance in A. splendens remain unknown. Next-generation sequencing (NGS technologies can be used for global gene expression profiling. In this study, we examined sequence and transcript abundance data for the root/leaf transcriptome of A. splendens obtained using an Illumina HiSeq 2500. Over 35 million clean reads were obtained from the leaf and root libraries. All of the RNA sequencing (RNA-seq reads were assembled de novo into a total of 126,235 unigenes and 36,511 coding DNA sequences (CDS. We further identified 1663 differentially-expressed genes (DEGs between the salt stress treatment and control. Functional annotation of the DEGs by gene ontology (GO, using Arabidopsis and rice as references, revealed enrichment of salt stress-related GO categories, including “oxidation reduction”, “transcription factor activity”, and “ion channel transporter”. Thus, this global transcriptome analysis of A. splendens has provided an important genetic resource for the study of salt tolerance in this halophyte. The identified sequences and their putative functional data will facilitate future investigations of the tolerance of Achnatherum species to various types of abiotic stress.
Full Text Available Low temperature is a major abiotic stress that impedes plant growth and development. Brassica juncea is an economically important oil seed crop and is sensitive to freezing stress during pod filling subsequently leading to abortion of seeds. To understand the cold stress mediated global perturbations in gene expression, whole transcriptome of B. juncea siliques that were exposed to sub-optimal temperature was sequenced. Manually self-pollinated siliques at different stages of development were subjected to either short (6 h or long (12 h durations of chilling stress followed by construction of RNA-seq libraries and deep sequencing using Illumina’s NGS platform. De-novo assembly of B. juncea transcriptome resulted in 133641 transcripts, whose combined length was 117 Mb and N50 value was 1428 bp. We identified 13342 differentially regulated transcripts by pair-wise comparison of 18 transcriptome libraries. Hierarchical clustering of these differentially expressed transcripts along with Spearman correlation analysis identified two major clusters representing early (5-15 DAP and late stages (20-30 DAP of silique development. Detailed analysis led to the discovery of two gene expression clusters whose transcripts were inducible at both durations of the cold stress irrespective of the developmental stages. We further explored the expression patterns of gene families encoding for transcription factors (TFs, transcription regulators (TRs and kinases, and found that cold stress induced protein kinases specifically during early silique development. We validated the digital gene expression profiles of selected transcripts by qPCR and found a high degree of concordance between the two analyses. To our knowledge this is the first report of transcriptome sequencing of cold-stressed B. juncea siliques. The data generated in this study would be a valuable resource for not only understanding the cold stress signaling pathway but also for introducing cold
Kisand, Veljo; Lettieri, Teresa
De novo genome sequencing of previously uncharacterized microorganisms has the potential to open up new frontiers in microbial genomics by providing insight into both functional capabilities and biodiversity. Until recently, Roche 454 pyrosequencing was the NGS method of choice for de novo assembly because it generates hundreds of thousands of long reads (tools for processing NGS data are increasingly free and open source and are often adopted for both their high quality and role in promoting academic freedom. The error rate of pyrosequencing the Alcanivorax borkumensis genome was such that thousands of insertions and deletions were artificially introduced into the finished genome. Despite a high coverage (~30 fold), it did not allow the reference genome to be fully mapped. Reads from regions with errors had low quality, low coverage, or were missing. The main defect of the reference mapping was the introduction of artificial indels into contigs through lower than 100% consensus and distracting gene calling due to artificial stop codons. No assembler was able to perform de novo assembly comparable to reference mapping. Automated annotation tools performed similarly on reference mapped and de novo draft genomes, and annotated most CDSs in the de novo assembled draft genomes. Free and open source software (FOSS) tools for assembly and annotation of NGS data are being developed rapidly to provide accurate results with less computational effort. Usability is not high priority and these tools currently do not allow the data to be processed without manual intervention. Despite this, genome assemblers now readily assemble medium short reads into long contigs (>97-98% genome coverage). A notable gap in pyrosequencing technology is the quality of base pair calling and conflicting base pairs between single reads at the same nucleotide position. Regardless, using draft whole genomes that are not finished and remain fragmented into tens of contigs allows one to characterize
Zhu, Fengjiao; Yang, Zongying; Zhang, Yiliu; Hu, Kun; Fang, Wenhong
Enrofloxacin is the most commonly used antibiotic to control diseases in aquatic animals caused by A. hydrophila. This study conducted de novo transcriptome sequencing and compared the global transcriptomes of enrofloxacin-resistant and enrofloxacin-susceptible strains. We got a total of 4,714 unigenes were assembled. Of these, 4,122 were annotated. A total of 3,280 unigenes were assigned to GO, 3,388 unigenes were classified into Cluster of Orthologous Groups of proteins (COG) using BLAST an...
Background Planarian flatworms can regenerate their head, including a functional brain, within less than a week. Despite the enormous potential of these animals for medical research and regenerative medicine, the mechanisms of regeneration and the molecules involved remain largely unknown. Results To identify genes that are differentially expressed during early stages of planarian head regeneration, we generated a de novo transcriptome assembly from more than 300 million paired-end reads from planarian fragments regenerating the head at 16 different time points. The assembly yielded 26,018 putative transcripts, including very long transcripts spanning multiple genomic supercontigs, and thousands of isoforms. Using short-read data from two platforms, we analyzed dynamic gene regulation during the first three days of head regeneration. We identified at least five different temporal synexpression classes, including genes specifically induced within a few hours after injury. Furthermore, we characterized the role of a conserved Runx transcription factor, smed-runt-like1. RNA interference (RNAi) knockdown and immunofluorescence analysis of the regenerating visual system indicated that smed-runt-like1 encodes a transcriptional regulator of eye morphology and photoreceptor patterning. Conclusions Transcriptome sequencing of short reads allowed for the simultaneous de novo assembly and differential expression analysis of transcripts, demonstrating highly dynamic regulation during head regeneration in planarians. PMID:21846378
Al-Nakeeb, Kosai; Petersen, Thomas Nordahl; Sicheritz-Pontén, Thomas
Whole-genome sequencing (WGS) projects provide short read nucleotide sequences from nuclear and possibly organelle DNA depending on the source of origin. Mitochondrial DNA is present in animals and fungi, while plants contain DNA from both mitochondria and chloroplasts. Current techniques for separating organelle reads from nuclear reads in WGS data require full reference or partial seed sequences for assembling. Norgal (de Novo ORGAneLle extractor) avoids this requirement by identifying a high frequency subset of k-mers that are predominantly of mitochondrial origin and performing a de novo assembly on a subset of reads that contains these k-mers. The method was applied to WGS data from a panda, brown algae seaweed, butterfly and filamentous fungus. We were able to extract full circular mitochondrial genomes and obtained sequence identities to the reference sequences in the range from 98.5 to 99.5%. We also assembled the chloroplasts of grape vines and cucumbers using Norgal together with seed-based de novo assemblers. Norgal is a pipeline that can extract and assemble full or partial mitochondrial and chloroplast genomes from WGS short reads without prior knowledge. The program is available at: https://bitbucket.org/kosaidtu/norgal .
Full Text Available Abstract Background Transcriptome analysis is increasingly being used to study the evolutionary origins and ecology of non-model plants. One issue for both transcriptome assembly and differential gene expression analyses is the common occurrence in plants of hybridisation and whole genome duplication (WGD and hybridization resulting in allopolyploidy. The divergence of duplicated genes following WGD creates near identical homeologues that can be problematic for de novo assembly and also reference based assembly protocols that use short reads (35 - 100 bp. Results Here we report a successful strategy for the assembly of two transcriptomes made using 75 bp Illumina reads from Pachycladon fastigiatum and Pachycladon cheesemanii. Both are allopolyploid plant species (2n = 20 that originated in the New Zealand Alps about 0.8 million years ago. In a systematic analysis of 19 different coverage cutoffs and 20 different k-mer sizes we showed that i none of the genes could be assembled across all of the parameter space ii assembly of each gene required an optimal set of parameter values and iii these parameter values could be explained in part by different gene expression levels and different degrees of similarity between genes. Conclusions To obtain optimal transcriptome assemblies for allopolyploid plants, k-mer size and k-mer coverage need to be considered simultaneously across a broad parameter space. This is important for assembling a maximum number of full length ESTs and for avoiding chimeric assemblies of homeologous and paralogous gene copies.
Zhang, Xia; Song, Zhenqiao; Liu, Tian; Guo, Linlin; Li, Xingfeng
Toona sinensis Roem is a popular leafy vegetable in Chinese cuisine and is also used as a traditional Chinese medicine. In this study, leaf samples were collected from the same plant on two development stages and then used for high-throughput Illumina RNA-sequencing (RNA-Seq). 125,884 transcripts and 54,628 unigenes were obtained through de novo assembly. A total of 25,570 could be annotated with known biological functions, which indicated that the T. sinensis leaves and shoots were undergoing multiple developmental processes especially for active metabolic processes. Analysis of differentially expressed unigenes between the two libraries showed that the lysine biosynthesis was an enriched KEGG pathway, and candidate genes involved in the lysine biosynthesis pathway in T. sinensis leaves and shoots were identified. Our results provide a primary analysis of the gene expression files of T. sinensis leaf and shoot on different development stages and afford a valuable resource for genetic and genomic research on plant lysine biosynthesis.
Full Text Available Wheat (Triticum aestivum L., one of the world's most important food crops, is a strictly autogamous (self-pollinating species with exclusively perfect flowers. Male sterility induced by chemical hybridizing agents has increasingly attracted attention as a tool for hybrid seed production in wheat; however, the molecular mechanisms of male sterility induced by the agent SQ-1 remain poorly understood due to limited whole transcriptome data. Therefore, a comparative analysis of wheat anther transcriptomes for male fertile wheat and SQ-1-induced male sterile wheat was carried out using next-generation sequencing technology. In all, 42,634,123 sequence reads were generated and were assembled into 82,356 high-quality unigenes with an average length of 724 bp. Of these, 1,088 unigenes were significantly differentially expressed in the fertile and sterile wheat anthers, including 643 up-regulated unigenes and 445 down-regulated unigenes. The differentially expressed unigenes with functional annotations were mapped onto 60 pathways using the Kyoto Encyclopedia of Genes and Genomes database. They were mainly involved in coding for the components of ribosomes, photosynthesis, respiration, purine and pyrimidine metabolism, amino acid metabolism, glutathione metabolism, RNA transport and signal transduction, reactive oxygen species metabolism, mRNA surveillance pathways, protein processing in the endoplasmic reticulum, protein export, and ubiquitin-mediated proteolysis. This study is the first to provide a systematic overview comparing wheat anther transcriptomes of male fertile wheat with those of SQ-1-induced male sterile wheat and is a valuable source of data for future research in SQ-1-induced wheat male sterility.
Full Text Available BACKGROUND: The use of cytoplasmic male sterility (CMS in F1 hybrid seed production of chili pepper is increasingly popular. However, the molecular mechanisms of cytoplasmic male sterility and fertility restoration remain poorly understood due to limited transcriptomic and genomic data. Therefore, we analyzed the difference between a CMS line 121A and its near-isogenic restorer line 121C in transcriptome level using next generation sequencing technology (NGS, aiming to find out critical genes and pathways associated with the male sterility. RESULTS: We generated approximately 53 million sequencing reads and assembled de novo, yielding 85,144 high quality unigenes with an average length of 643 bp. Among these unigenes, 27,191 were identified as putative homologs of annotated sequences in the public protein databases, 4,326 and 7,061 unigenes were found to be highly abundant in lines 121A and 121C, respectively. Many of the differentially expressed unigenes represent a set of potential candidate genes associated with the formation or abortion of pollen. CONCLUSIONS: Our study profiled anther transcriptomes of a chili pepper CMS line and its restorer line. The results shed the lights on the occurrence and recovery of the disturbances in nuclear-mitochondrial interaction and provide clues for further investigations.
Tamvacakis, Arianna N.; Senatore, Adriano; Katz, Paul S.
The sea slug "Hermissenda crassicornis" (Mollusca, Gastropoda, Nudibranchia) has been studied extensively in associative learning paradigms. However, lack of genetic information previously hindered molecular-level investigations. Here, the "Hermissenda" brain transcriptome was sequenced and assembled de novo, producing 165,743…
Howe, Glenn T; Yu, Jianbin; Knaus, Brian; Cronn, Richard; Kolpak, Scott; Dolan, Peter; Lorenz, W Walter; Dean, Jeffrey F D
Douglas-fir (Pseudotsuga menziesii), one of the most economically and ecologically important tree species in the world, also has one of the largest tree breeding programs. Although the coastal and interior varieties of Douglas-fir (vars. menziesii and glauca) are native to North America, the coastal variety is also widely planted for timber production in Europe, New Zealand, Australia, and Chile. Our main goal was to develop a SNP resource large enough to facilitate genomic selection in Douglas-fir breeding programs. To accomplish this, we developed a 454-based reference transcriptome for coastal Douglas-fir, annotated and evaluated the quality of the reference, identified putative SNPs, and then validated a sample of those SNPs using the Illumina Infinium genotyping platform. We assembled a reference transcriptome consisting of 25,002 isogroups (unique gene models) and 102,623 singletons from 2.76 million 454 and Sanger cDNA sequences from coastal Douglas-fir. We identified 278,979 unique SNPs by mapping the 454 and Sanger sequences to the reference, and by mapping four datasets of Illumina cDNA sequences from multiple seed sources, genotypes, and tissues. The Illumina datasets represented coastal Douglas-fir (64.00 and 13.41 million reads), interior Douglas-fir (80.45 million reads), and a Yakima population similar to interior Douglas-fir (8.99 million reads). We assayed 8067 SNPs on 260 trees using an Illumina Infinium SNP genotyping array. Of these SNPs, 5847 (72.5%) were called successfully and were polymorphic. Based on our validation efficiency, our SNP database may contain as many as ~200,000 true SNPs, and as many as ~69,000 SNPs that could be genotyped at ~20,000 gene loci using an Infinium II array-more SNPs than are needed to use genomic selection in tree breeding programs. Ultimately, these genomic resources will enhance Douglas-fir breeding and allow us to better understand landscape-scale patterns of genetic variation and potential responses to
Feldmesser, Ester; Rosenwasser, Shilo; Vardi, Assaf; Ben-Dor, Shifra
Background The advent of Next Generation Sequencing technologies and corresponding bioinformatics tools allows the definition of transcriptomes in non-model organisms. Non-model organisms are of great ecological and biotechnological significance, and consequently the understanding of their unique metabolic pathways is essential. Several methods that integrate de novo assembly with genome-based assembly have been proposed. Yet, there are many open challenges in defining genes, particularly whe...
Card, Daren C; Schield, Drew R; Reyes-Velasco, Jacobo; Fujita, Matthew K; Andrew, Audra L; Oyler-McCance, Sara J; Fike, Jennifer A; Tomback, Diana F; Ruggiero, Robert P; Castoe, Todd A
As a greater number and diversity of high-quality vertebrate reference genomes become available, it is increasingly feasible to use these references to guide new draft assemblies for related species. Reference-guided assembly approaches may substantially increase the contiguity and completeness of a new genome using only low levels of genome coverage that might otherwise be insufficient for de novo genome assembly. We used low-coverage (∼3.5-5.5x) Illumina paired-end sequencing to assemble draft genomes of two bird species (the Gunnison Sage-Grouse, Centrocercus minimus, and the Clark's Nutcracker, Nucifraga columbiana). We used these data to estimate de novo genome assemblies and reference-guided assemblies, and compared the information content and completeness of these assemblies by comparing CEGMA gene set representation, repeat element content, simple sequence repeat content, and GC isochore structure among assemblies. Our results demonstrate that even lower-coverage genome sequencing projects are capable of producing informative and useful genomic resources, particularly through the use of reference-guided assemblies.
Card, Daren C.; Schield, Drew R.; Reyes-Velasco, Jacobo; Fujita, Matthre K.; Andrew, Audra L.; Oyler-McCance, Sara J.; Fike, Jennifer A.; Tomback, Diana F.; Ruggiero, Robert P.; Castoe, Todd A.
As a greater number and diversity of high-quality vertebrate reference genomes become available, it is increasingly feasible to use these references to guide new draft assemblies for related species. Reference-guided assembly approaches may substantially increase the contiguity and completeness of a new genome using only low levels of genome coverage that might otherwise be insufficient for de novo genome assembly. We used low-coverage (~3.5–5.5x) Illumina paired-end sequencing to assemble draft genomes of two bird species (the Gunnison Sage-Grouse, Centrocercus minimus, and the Clark's Nutcracker, Nucifraga columbiana). We used these data to estimate de novo genome assemblies and reference-guided assemblies, and compared the information content and completeness of these assemblies by comparing CEGMA gene set representation, repeat element content, simple sequence repeat content, and GC isochore structure among assemblies. Our results demonstrate that even lower-coverage genome sequencing projects are capable of producing informative and useful genomic resources, particularly through the use of reference-guided assemblies.
Full Text Available Isaria cateniannulata is a very important and virulent entomopathogenic fungus that infects many insect pest species. Although I. cateniannulata is commonly exposed to extreme environmental temperature conditions, little is known about its molecular response mechanism to temperature stress. Here, we sequenced and de novo assembled the transcriptome of I. cateniannulata in response to high and low temperature stresses using Illumina RNA-Seq technology. Our assembly encompassed 17,514 unigenes (mean length = 1,197 bp, in which 11,445 unigenes (65.34% showed significant similarities to known sequences in NCBI non-redundant protein sequences (Nr database. Using digital gene expression analysis, 4,483 differentially expressed genes (DEGs were identified after heat treatment, including 2,905 up-regulated genes and 1,578 down-regulated genes. Under cold stress, 1,927 DEGs were identified, including 1,245 up-regulated genes and 682 down-regulated genes. The expression patterns of 18 randomly selected candidate DEGs resulting from quantitative real-time PCR (qRT-PCR were consistent with their transcriptome analysis results. Although DEGs were involved in many pathways, we focused on the genes that were involved in endocytosis: In heat stress, the pathway of clathrin-dependent endocytosis (CDE was active; however at low temperature stresses, the pathway of clathrin-independent endocytosis (CIE was active. Besides, four categories of DEGs acting as temperature sensors were observed, including cell-wall-major-components-metabolism-related (CWMCMR genes, heat shock protein (Hsp genes, intracellular-compatible-solutes-metabolism-related (ICSMR genes and glutathione S-transferase (GST. These results enhance our understanding of the molecular mechanisms of I. cateniannulata in response to temperature stresses and provide a valuable resource for the future investigations.
Niu, Donghong; Wang, Fei; Xie, Shumei; Sun, Fanyue; Wang, Ze; Peng, Maoxiao; Li, Jiale
The razor clam Sinonovacula constricta is an important commercial species. The deficiency of developmental transcriptomic data is becoming the bottleneck of further researches on the mechanisms underlying settlement and metamorphosis in early development. In this study, de novo transcriptome sequencing was performed for S. constricta at different early developmental stages by using Illumina HiSeq 2000 paired-end (PE) sequencing technology. A total of 112,209,077 PE clean reads were generated. De novo assembly generated 249,795 contigs with an average length of 585 bp. Gene annotation resulted in the identification of 22,870 unigene hits against the NCBI database. Eight unique sequences related to metamorphosis were identified and analyzed using real-time PCR. The razor clam reference transcriptome would provide useful information on early developmental and metamorphosis mechanisms and could be used in the genetic breeding of shellfish.
Gazara, Rajesh K; Cardoso, Christiane; Bellieny-Rabelo, Daniel; Ferreira, Clélia; Terra, Walter R; Venancio, Thiago M
Despite the great morphological diversity of insects, there is a regularity in their digestive functions, which is apparently related to their physiology. In the present work we report the de novo midgut transcriptomes of four non-model insects from four distinct orders: Spodoptera frugiperda (Lepidoptera), Musca domestica (Diptera), Tenebrio molitor (Coleoptera) and Dysdercus peruvianus (Hemiptera). We employed a computational strategy to merge assemblies obtained with two different algorithms, which substantially increased the quality of the final transcriptomes. Unigenes were annotated and analyzed using the eggNOG database, which allowed us to assign some level of functional and evolutionary information to 79.7% to 93.1% of the transcriptomes. We found interesting transcriptional patterns, such as: i) the intense use of lysozymes in digestive functions of M. domestica larvae, which are streamlined and adapted to feed on bacteria; ii) the up-regulation of orthologous UDP-glycosyl transferase and cytochrome P450 genes in the whole midguts different species, supporting the existence of an ancient defense frontline to counter xenobiotics; iii) evidence supporting roles for juvenile hormone binding proteins in the midgut physiology, probably as a way to activate genes that help fight anti-nutritional substances (e.g. protease inhibitors). The results presented here shed light on the digestive and structural properties of the digestive systems of these distantly related species. Furthermore, the produced datasets will also be useful for scientists studying these insects. Copyright © 2017. Published by Elsevier B.V.
Colbourne John K
Full Text Available Abstract Background New methods are needed for genomic-scale analysis of emerging model organisms that exemplify important biological questions but lack fully sequenced genomes. For example, there is an urgent need to understand the potential for corals to adapt to climate change, but few molecular resources are available for studying these processes in reef-building corals. To facilitate genomics studies in corals and other non-model systems, we describe methods for transcriptome sequencing using 454, as well as strategies for assembling a useful catalog of genes from the output. We have applied these methods to sequence the transcriptome of planulae larvae from the coral Acropora millepora. Results More than 600,000 reads produced in a single 454 sequencing run were assembled into ~40,000 contigs with five-fold average sequencing coverage. Based on sequence similarity with known proteins, these analyses identified ~11,000 different genes expressed in a range of conditions including thermal stress and settlement induction. Assembled sequences were annotated with gene names, conserved domains, and Gene Ontology terms. Targeted searches using these annotations identified the majority of genes associated with essential metabolic pathways and conserved signaling pathways, as well as novel candidate genes for stress-related processes. Comparisons with the genome of the anemone Nematostella vectensis revealed ~8,500 pairs of orthologs and ~100 candidate coral-specific genes. More than 30,000 SNPs were detected in the coral sequences, and a subset of these validated by re-sequencing. Conclusion The methods described here for deep sequencing of the transcriptome should be widely applicable to generate catalogs of genes and genetic markers in emerging model organisms. Our data provide the most comprehensive sequence resource currently available for reef-building corals, and include an extensive collection of potential genetic markers for association and
Li, Guoxi; Zhao, Yinli; Liu, Zhonghu; Gao, Chunsheng; Yan, Fengbin; Liu, Bianzhi; Feng, Jianxin
Common carp (Cyprinus carpio) is one of the most important aquacultured species of the family Cyprinidae, and breeding this species for disease resistance is becoming more and more important. However, at the genome or transcriptome levels, study of the immunogenetics of disease resistance in the common carp is lacking. In this study, 60,316,906 and 75,200,328 paired-end clean reads were obtained from two cDNA libraries of the common carp spleen by Illumina paired-end sequencing technology. Totally, 130,293 unique transcript fragments (unigenes) were assembled, with an average length of 1400.57 bp. Approximately 105,612 (81.06%) unigenes could be annotated according to their homology with matches in the Nr, Nt, Swiss-Prot, COG, GO, or KEGG databases, and they were found to represent 46,747 non-redundant genes. Comparative analysis showed that 59.82% of the unigenes have significant similarity to zebrafish Refseq proteins. Gene expression comparison revealed that 10,432 and 6889 annotated unigenes were, respectively, up- and down-regulated with at least twofold changes between two developmental stages of the common carp spleen. Gene ontology and KEGG analysis were performed to classify all unigenes into functional categories for understanding gene functions and regulation pathways. In addition, 46,847 simple sequence repeats (SSRs) were detected from 35,618 unigenes, and a large number of single nucleotide polymorphism (SNP) and insertion/deletion (INDEL) sites were identified in the spleen transcriptome of common carp. This study has characterized the spleen transcriptome of the common carp for the first time, providing a valuable resource for a better understanding of the common carp immune system and defense mechanisms. This knowledge will also facilitate future functional studies on common carp immunogenetics that may eventually be applied in breeding programs. Copyright © 2015 Elsevier Ltd. All rights reserved.
Cao, Hongzhi; Wu, Honglong; Luo, Ruibang; Huang, Shujia; Sun, Yuhui; Tong, Xin; Xie, Yinlong; Liu, Binghang; Yang, Hailong; Zheng, Hancheng; Li, Jian; Li, Bo; Wang, Yu; Yang, Fang; Sun, Peng; Liu, Siyang; Gao, Peng; Huang, Haodong; Sun, Jing; Chen, Dan; He, Guangzhu; Huang, Weihua; Huang, Zheng; Li, Yue; Tellier, Laurent C A M; Liu, Xiao; Feng, Qiang; Xu, Xun; Zhang, Xiuqing; Bolund, Lars; Krogh, Anders; Kristiansen, Karsten; Drmanac, Radoje; Drmanac, Snezana; Nielsen, Rasmus; Li, Songgang; Wang, Jian; Yang, Huanming; Li, Yingrui; Wong, Gane Ka-Shu; Wang, Jun
The human genome is diploid, and knowledge of the variants on each chromosome is important for the interpretation of genomic information. Here we report the assembly of a haplotype-resolved diploid genome without using a reference genome. Our pipeline relies on fosmid pooling together with whole-genome shotgun strategies, based solely on next-generation sequencing and hierarchical assembly methods. We applied our sequencing method to the genome of an Asian individual and generated a 5.15-Gb assembled genome with a haplotype N50 of 484 kb. Our analysis identified previously undetected indels and 7.49 Mb of novel coding sequences that could not be aligned to the human reference genome, which include at least six predicted genes. This haplotype-resolved genome represents the most complete de novo human genome assembly to date. Application of our approach to identify individual haplotype differences should aid in translating genotypes to phenotypes for the development of personalized medicine.
Full Text Available Upland cotton ( L. has a narrow germplasm base, which constrains marker development and hampers intraspecific breeding. A pressing need exists for high-throughput single nucleotide polymorphism (SNP markers that can be readily applied to germplasm in breeding and breeding-related research programs. Despite progress made in developing new sequencing technologies during the past decade, the cost of sequencing remains substantial when one is dealing with numerous samples and large genomes. Several strategies have been proposed to lower the cost of sequencing for multiple genotypes of large-genome species like cotton, such as transcriptome sequencing and reduced-representation DNA sequencing. This paper reports the development of a transcriptome assembly of the inbred line Texas Marker-1 (TM-1, a genetic standard for cotton, its usefulness as a reference for RNA sequencing (RNA-seq-based SNP identification, and the availability of transcriptome sequences of four other cotton cultivars. An assembly of TM-1 was made using Roche 454 transcriptome reads combined with an assembly of all available public expressed sequence tag (EST sequences of TM-1. The TM-1 assembly consists of 72,450 contigs with a total of 70 million bp. Functional predictions of the transcripts were estimated by alignment to selected protein databases. Transcriptome sequences of the five lines, including TM-1, were obtained using an Illumina Genome Analyzer-II, and the short reads were mapped to the TM-1 assembly to discover SNPs among the five lines. We identified >14,000 unfiltered allelic SNPs, of which ∼3,700 SNPs were retained for assay development after applying several rigorous filters. This paper reports availability of the reference transcriptome assembly and shows its utility in developing intraspecific SNP markers in upland cotton.
Short-read sequencing has enabled the de novo assembly of several individual human genomes, but with inherent limitations in characterizing repeat elements. Here we sequence a Chinese individual HX1 by single-molecule real-time (SMRT) long-read sequencing, construct a physical map by NanoChannel arr...
De novo transcriptome sequencing and digital gene expression analysis predict biosynthetic pathway of rhynchophylline and isorhynchophylline from Uncaria rhynchophylla, a non-model plant with potent anti-alzheimer's properties.
Guo, Qianqian; Ma, Xiaojun; Wei, Shugen; Qiu, Deyou; Wilson, Iain W; Wu, Peng; Tang, Qi; Liu, Lijun; Dong, Shoukun; Zu, Wei
The major medicinal alkaloids isolated from Uncaria rhynchophylla (gouteng in chinese) capsules are rhynchophylline (RIN) and isorhynchophylline (IRN). Extracts containing these terpene indole alkaloids (TIAs) can inhibit the formation and destabilize preformed fibrils of amyloid β protein (a pathological marker of Alzheimer's disease), and have been shown to improve the cognitive function of mice with Alzheimer-like symptoms. The biosynthetic pathways of RIN and IRN are largely unknown. In this study, RNA-sequencing of pooled Uncaria capsules RNA samples taken at three developmental stages that accumulate different amount of RIN and IRN was performed. More than 50 million high-quality reads from a cDNA library were generated and de novo assembled. Sequences for all of the known enzymes involved in TIAs synthesis were identified. Additionally, 193 cytochrome P450 (CYP450), 280 methyltransferase and 144 isomerase genes were identified, that are potential candidates for enzymes involved in RIN and IRN synthesis. Digital gene expression profile (DGE) analysis was performed on the three capsule developmental stages, and based on genes possessing expression profiles consistent with RIN and IRN levels; four CYP450s, three methyltransferases and three isomerases were identified as the candidates most likely to be involved in the later steps of RIN and IRN biosynthesis. A combination of de novo transcriptome assembly and DGE analysis was shown to be a powerful method for identifying genes encoding enzymes potentially involved in the biosynthesis of important secondary metabolites in a non-model plant. The transcriptome data from this study provides an important resource for understanding the formation of major bioactive constituents in the capsule extract from Uncaria, and provides information that may aid in metabolic engineering to increase yields of these important alkaloids.
Garzón-Martínez Gina A
Full Text Available Abstract Background Physalis peruviana commonly known as Cape gooseberry is a member of the Solanaceae family that has an increasing popularity due to its nutritional and medicinal values. A broad range of genomic tools is available for other Solanaceae, including tomato and potato. However, limited genomic resources are currently available for Cape gooseberry. Results We report the generation of a total of 652,614 P. peruviana Expressed Sequence Tags (ESTs, using 454 GS FLX Titanium technology. ESTs, with an average length of 371 bp, were obtained from a normalized leaf cDNA library prepared using a Colombian commercial variety. De novo assembling was performed to generate a collection of 24,014 isotigs and 110,921 singletons, with an average length of 1,638 bp and 354 bp, respectively. Functional annotation was performed using NCBI’s BLAST tools and Blast2GO, which identified putative functions for 21,191 assembled sequences, including gene families involved in all the major biological processes and molecular functions as well as defense response and amino acid metabolism pathways. Gene model predictions in P. peruviana were obtained by using the genomes of Solanum lycopersicum (tomato and Solanum tuberosum (potato. We predict 9,436 P. peruviana sequences with multiple-exon models and conserved intron positions with respect to the potato and tomato genomes. Additionally, to study species diversity we developed 5,971 SSR markers from assembled ESTs. Conclusions We present the first comprehensive analysis of the Physalis peruviana leaf transcriptome, which will provide valuable resources for development of genetic tools in the species. Assembled transcripts with gene models could serve as potential candidates for marker discovery with a variety of applications including: functional diversity, conservation and improvement to increase productivity and fruit quality. P. peruviana was estimated to be phylogenetically branched out before the
Garzón-Martínez, Gina A; Zhu, Z Iris; Landsman, David; Barrero, Luz S; Mariño-Ramírez, Leonardo
Physalis peruviana commonly known as Cape gooseberry is a member of the Solanaceae family that has an increasing popularity due to its nutritional and medicinal values. A broad range of genomic tools is available for other Solanaceae, including tomato and potato. However, limited genomic resources are currently available for Cape gooseberry. We report the generation of a total of 652,614 P. peruviana Expressed Sequence Tags (ESTs), using 454 GS FLX Titanium technology. ESTs, with an average length of 371 bp, were obtained from a normalized leaf cDNA library prepared using a Colombian commercial variety. De novo assembling was performed to generate a collection of 24,014 isotigs and 110,921 singletons, with an average length of 1,638 bp and 354 bp, respectively. Functional annotation was performed using NCBI's BLAST tools and Blast2GO, which identified putative functions for 21,191 assembled sequences, including gene families involved in all the major biological processes and molecular functions as well as defense response and amino acid metabolism pathways. Gene model predictions in P. peruviana were obtained by using the genomes of Solanum lycopersicum (tomato) and Solanum tuberosum (potato). We predict 9,436 P. peruviana sequences with multiple-exon models and conserved intron positions with respect to the potato and tomato genomes. Additionally, to study species diversity we developed 5,971 SSR markers from assembled ESTs. We present the first comprehensive analysis of the Physalis peruviana leaf transcriptome, which will provide valuable resources for development of genetic tools in the species. Assembled transcripts with gene models could serve as potential candidates for marker discovery with a variety of applications including: functional diversity, conservation and improvement to increase productivity and fruit quality. P. peruviana was estimated to be phylogenetically branched out before the divergence of five other Solanaceae family members, S
Kenkel, Carly D; Bay, Line K
Transcriptomic resources for coral species can provide insight into coral evolutionary history and stress-response physiology. Goniopora columna, Galaxea astreata, and Galaxea acrhelia are scleractinian corals of the Indo-Pacific, representing a diversity of morphologies and life-history traits. G. columna and G. astreata are common and cosmopolitan, while G. acrhelia is largely restricted to the coral triangle and Great Barrier Reef. Reference transcriptomes for these species were assembled from replicate colony fragments exposed to elevated (31°C) and ambient (27°C) temperatures. Trinity was used to create de novo assemblies for each species from 92-102 million raw Illumina Hiseq 2 × 150 bp reads. Host-specific assemblies contained 65 460-72 405 contigs, representing 26 693-37 894 isogroups (∼genes) with an average N50 of 2254. Gene name and/or gene ontology annotations were possible for 58% of isogroups on average. Transcriptomes contained 93.1-94.3% of EuKaryotic Orthologous Groups comprising the core eukaryotic gene set, and 89.98-91.92% of the single-copy metazoan core gene set orthologs were complete, indicating fairly comprehensive assemblies. This work expands the complement of transcriptomic resources available for scleractinian coral species, including the first reference for a representative of Goniopora spp. as well as species with novel morphology. © The Authors 2017. Published by Oxford University Press.
Full Text Available A comprehensive transcriptome assembly of chickpea has been developed using 134.95 million Illumina single-end reads, 7.12 million single-end FLX/454 reads and 139,214 Sanger expressed sequence tags (ESTs from >17 genotypes. This hybrid transcriptome assembly, referred to as Cicer arietinumTranscriptome Assembly version 2 (CaTA v2, available at http://data.comparative-legumes.org/transcriptomes/cicar/lista_cicar-201201, comprising 46,369 transcript assembly contigs (TACs has an N50 length of 1,726 bp and a maximum contig size of 15,644 bp. Putative functions were determined for 32,869 (70.8% of the TACs and gene ontology assignments were determined for 21,471 (46.3%. The new transcriptome assembly was compared with the previously available chickpea transcriptome assemblies as well as to the chickpea genome. Comparative analysis of CaTA v2 against transcriptomes of three legumes - Medicago, soybean and common bean, resulted in 27,771 TACs common to all three legumes indicating strong conservation of genes across legumes. CaTA v2 was also used for identification of simple sequence repeats (SSRs and intron spanning regions (ISRs for developing molecular markers. ISRs were identified by aligning TACs to the Medicago genome, and their putative mapping positions at chromosomal level were identified using transcript map of chickpea. Primer pairs were designed for 4,990 ISRs, each representing a single contig for which predicted positions are inferred and distributed across eight linkage groups. A subset of randomly selected ISRs representing all eight chickpea linkage groups were validated on five chickpea genotypes and showed 20% polymorphism with average polymorphic information content (PIC of 0.27. In summary, the hybrid transcriptome assembly developed and novel markers identified can be used for a variety of applications such as gene discovery, marker-trait association, diversity analysis etc., to advance genetics research and breeding
Li, Ruiqiang; Zhu, Hongmei; Ruan, Jue
genomes from short read sequences. We successfully assembled both the Asian and African human genome sequences, achieving an N50 contig size of 7.4 and 5.9 kilobases (kb) and scaffold of 446.3 and 61.9 kb, respectively. The development of this de novo short read assembly method creates new opportunities...... for building reference sequences and carrying out accurate analyses of unexplored genomes in a cost-effective way....
Mikheenko, Alla; Valin, Gleb; Prjibelski, Andrey; Saveliev, Vladislav; Gurevich, Alexey
: Data visualization plays an increasingly important role in NGS data analysis. With advances in both sequencing and computational technologies, it has become a new bottleneck in genomics studies. Indeed, evaluation of de novo genome assemblies is one of the areas that can benefit from the visualization. However, even though multiple quality assessment methods are now available, existing visualization tools are hardly suitable for this purpose. Here, we present Icarus-a novel genome visualizer for accurate assessment and analysis of genomic draft assemblies, which is based on the tool QUAST. Icarus can be used in studies where a related reference genome is available, as well as for non-model organisms. The tool is available online and as a standalone application. http://cab.spbu.ru/software/icarus CONTACT: email@example.comSupplementary information: Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: firstname.lastname@example.org.
Gross, Stephen M.; Martin, Jeffrey A.; Simpson, June; Wang, Zhong; Visel, Axel
Agaves are succulent monocotyledonous plants native to hot and arid environments of North America. Because of their adaptations to their environment, including crassulacean acid metabolism (CAM, a water-efficient form of photosynthesis) and existing technologies for ethanol production, agaves have gained attention both as potential lignocellulosic bioenergy feedstocks and models for exploring plant responses to abiotic stress. However, the lack of comprehensive Agave sequence datasets limits the scope of investigations into the molecular-genetic basis of Agave traits. Here, we present comprehensive, high quality de novo transcriptome assemblies of two Agave species, A. tequilana and A. deserti, from short-read RNA-seq data. Our analyses support completeness and accuracy of the de novo transcriptome assemblies, with each species having approximately 35,000 protein-coding genes. Comparison of agave proteomes to those of additional plant species identifies biological functions of gene families displaying sequence divergence in agave species. Additionally, we use RNA-seq data to gain insights into biological functions along the A. deserti juvenile leaf proximal-distal axis. Our work presents a foundation for further investigation of agave biology and their improvement for bioenergy development.
Yang, Wei; Chen, Huapu; Cui, Xuefan; Zhang, Kewei; Jiang, Dongneng; Deng, Siping; Zhu, Chunhua; Li, Guangli
Spotted scat (Scatophagus argus) is an economically important farmed fish, particularly in East and Southeast Asia. Because there has been little research on reproductive development and regulation in this species, the lack of a mature artificial reproduction technology remains a barrier for the sustainable development of the aquaculture industry. More genetic and genomic background knowledge is urgently needed for an in-depth understanding of the molecular mechanism of reproductive process and identification of functional genes related to sexual differentiation, gonad maturation and gametogenesis. For these reasons, we performed transcriptomic analysis on spotted scat using a multiple tissue sample mixing strategy. The Illumina RNA sequencing generated 118 510 486 raw reads. After trimming, de novo assembly was performed and yielded 99 888 unigenes with an average length of 905.75 bp. A total of 45 015 unigenes were successfully annotated to the Nr, Swiss-Prot, KOG and KEGG databases. Additionally, 23 783 and 27 183 annotated unigenes were assigned to 56 Gene Ontology (GO) functional groups and 228 KEGG pathways, respectively. Subsequently, 2 474 transcripts associated with reproduction were selected using GO term and KEGG pathway assignments, and a number of reproduction-related genes involved in sex differentiation, gonad development and gametogenesis were identified. Furthermore, 22 279 simple sequence repeat (SSR) loci were discovered and characterized. The comprehensive transcript dataset described here greatly increases the genetic information available for spotted scat and contributes valuable sequence resources for functional gene mining and analysis. Candidate transcripts involved in reproduction would make good starting points for future studies on reproductive mechanisms, and the putative sex differentiation-related genes will be helpful for sex-determining gene identification and sex-specific marker isolation. Lastly, the SSRs can serve as marker
Desai, Aarti; Marwah, Veer Singh; Yadav, Akshay; Jha, Vineet; Dhaygude, Kishor; Bangar, Ujwala; Kulkarni, Vivek; Jere, Abhay
Next Generation Sequencing (NGS) is a disruptive technology that has found widespread acceptance in the life sciences research community. The high throughput and low cost of sequencing has encouraged researchers to undertake ambitious genomic projects, especially in de novo genome sequencing. Currently, NGS systems generate sequence data as short reads and de novo genome assembly using these short reads is computationally very intensive. Due to lower cost of sequencing and higher throughput, NGS systems now provide the ability to sequence genomes at high depth. However, currently no report is available highlighting the impact of high sequence depth on genome assembly using real data sets and multiple assembly algorithms. Recently, some studies have evaluated the impact of sequence coverage, error rate and average read length on genome assembly using multiple assembly algorithms, however, these evaluations were performed using simulated datasets. One limitation of using simulated datasets is that variables such as error rates, read length and coverage which are known to impact genome assembly are carefully controlled. Hence, this study was undertaken to identify the minimum depth of sequencing required for de novo assembly for different sized genomes using graph based assembly algorithms and real datasets. Illumina reads for E.coli (4.6 MB) S.kudriavzevii (11.18 MB) and C.elegans (100 MB) were assembled using SOAPdenovo, Velvet, ABySS, Meraculous and IDBA-UD. Our analysis shows that 50X is the optimum read depth for assembling these genomes using all assemblers except Meraculous which requires 100X read depth. Moreover, our analysis shows that de novo assembly from 50X read data requires only 6-40 GB RAM depending on the genome size and assembly algorithm used. We believe that this information can be extremely valuable for researchers in designing experiments and multiplexing which will enable optimum utilization of sequencing as well as analysis resources.
Full Text Available The Chinese green mussel, Perna viridis, is a marine bivalve with important economic values as well as biomonitoring roles for aquatic pollution. Byssus, secreted by the foot gland, has been proved to bind heavy metals effectively. In this study, using the RNA sequencing technology, we performed comparative transcriptomic analysis on the mussel feet with or without inducing by cadmium (Cd. Our current work is aiming at providing insights into the molecular mechanisms of byssus binding to heavy metal ions. The transcriptome sequencing generated a total of 26.13-Gb raw data. After a careful assembly of clean data, we obtained a primary set of 105,127 unigenes, in which 32,268 unigenes were annotated. Based on the expression profiles, we identified 9,048 differentially expressed genes (DEGs between Cd treatment (50 or 100 μg/L at 48 h and the control, suggesting an extensive transcriptome response of the mussels during the Cd stimulation. Moreover, we observed that the expression levels of 54 byssus protein coding genes increased significantly after the 48-h Cd stimulation. In addition, 16 critical byssus protein coding genes were picked for profiling by quantitative real-time PCR (qRT-PCR. Finally, we reached a primary conclusion that high content of tyrosine (Tyr, cysteine (Cys, histidine (His residues or the special motif plays an important role in the accumulation of heavy metals in byssus. We also proposed an interesting model for the confirmed byssal Cd accumulation, in which biosynthesis of byssus proteins may play simultaneously critical roles since their transcription levels were significantly elevated.
Long, Yong; Li, Qing; Zhou, Bolan; Song, Guili; Li, Tao; Cui, Zongbin
Fish skin serves as the first line of defense against a wide variety of chemical, physical and biological stressors. Secretion of mucus is among the most prominent characteristics of fish skin and numerous innate immune factors have been identified in the epidermal mucus. However, molecular mechanisms underlying the mucus secretion and immune activities of fish skin remain largely unclear due to the lack of genomic and transcriptomic data for most economically important fish species. In this study, we characterized the skin transcriptome of mud loach using Illumia paired-end sequencing. A total of 40364 unigenes were assembled from 86.6 million (3.07 gigabases) filtered reads. The mean length, N50 size and maximum length of assembled transcripts were 387, 611 and 8670 bp, respectively. A total of 17336 (43.76%) unigenes were annotated by blast searches against the NCBI non-redundant protein database. Gene ontology mapping assigned a total of 108513 GO terms to 15369 (38.08%) unigenes. KEGG orthology mapping annotated 9337 (23.23%) unigenes. Among the identified KO categories, immune system is the largest category that contains various components of multiple immune pathways such as chemokine signaling, leukocyte transendothelial migration and T cell receptor signaling, suggesting the complexity of immune mechanisms in fish skin. As for mucin biosynthesis, 37 unigenes were mapped to 7 enzymes of the mucin type O-glycan biosynthesis pathway and 8 members of the polypeptide N-acetylgalactosaminyltransferase family were identified. Additionally, 38 unigenes were mapped to 23 factors of the SNARE interactions in vesicular transport pathway, indicating that the activity of this pathway is required for the processes of epidermal mucus storage and release. Moreover, 1754 simple sequence repeats (SSRs) were detected in 1564 unigenes and dinucleotide repeats represented the most abundant type. These findings have laid the foundation for further understanding the secretary
Wang, Ying; Ding, Jia-tong; Yang, Hai-ming; Yan, Zheng-jie; Cao, Wei; Li, Yang-bai
Monochromatic light is widely applied to promote poultry reproductive performance, yet little is currently known regarding the mechanism by which light wavelengths affect pigeon reproduction. Recently, high-throughput sequencing technologies have been used to provide genomic information for solving this problem. In this study, we employed Illumina Hiseq 2000 to identify differentially expressed genes in ovary tissue from pigeons under blue and white light conditions and de novo transcriptome assembly to construct a comprehensive sequence database containing information on the mechanisms of follicle development. A total of 157,774 unigenes (mean length: 790 bp) were obtained by the Trinity program, and 35.83% of these unigenes were matched to genes in a non-redundant protein database. Gene description, gene ontology, and the clustering of orthologous group terms were performed to annotate the transcriptome assembly. Differentially expressed genes between blue and white light conditions included those related to oocyte maturation, hormone biosynthesis, and circadian rhythm. Furthermore, 17,574 SSRs and 533,887 potential SNPs were identified in this transcriptome assembly. This work is the first transcriptome analysis of the Columba ovary using Illumina technology, and the resulting transcriptome and differentially expressed gene data can facilitate further investigations into the molecular mechanism of the effect of blue light on follicle development and reproduction in pigeons and other bird species. PMID:26599806
Full Text Available Monochromatic light is widely applied to promote poultry reproductive performance, yet little is currently known regarding the mechanism by which light wavelengths affect pigeon reproduction. Recently, high-throughput sequencing technologies have been used to provide genomic information for solving this problem. In this study, we employed Illumina Hiseq 2000 to identify differentially expressed genes in ovary tissue from pigeons under blue and white light conditions and de novo transcriptome assembly to construct a comprehensive sequence database containing information on the mechanisms of follicle development. A total of 157,774 unigenes (mean length: 790 bp were obtained by the Trinity program, and 35.83% of these unigenes were matched to genes in a non-redundant protein database. Gene description, gene ontology, and the clustering of orthologous group terms were performed to annotate the transcriptome assembly. Differentially expressed genes between blue and white light conditions included those related to oocyte maturation, hormone biosynthesis, and circadian rhythm. Furthermore, 17,574 SSRs and 533,887 potential SNPs were identified in this transcriptome assembly. This work is the first transcriptome analysis of the Columba ovary using Illumina technology, and the resulting transcriptome and differentially expressed gene data can facilitate further investigations into the molecular mechanism of the effect of blue light on follicle development and reproduction in pigeons and other bird species.
Duan, Jinjie; Sanggaard, Kristian Wejse; Schauser, Leif; Lauridsen, Sanne Enok; Enghild, Jan J.; Schierup, Mikkel Heide; Wang, Tobias
Abstract Exceptional and extreme feeding behaviour makes the Burmese python (Python bivittatus) an interesting model to study physiological remodelling and metabolic adaptation in response to refeeding after prolonged starvation. In this study, we used transcriptome sequencing of 5 visceral organs during fasting as well as 24 hours and 48 hours after ingestion of a large meal to unravel the postprandial changes in Burmese pythons. We first used the pooled data to perform a de novo assembly of...
Cribbin, Kayla M; Quackenbush, Corey R; Taylor, Kyle; Arias-Rodriguez, Lenin; Kelley, Joanna L
The tropical gar (Atractosteus tropicus) is the southernmost species of the seven extant species of gar fishes in the world. In Mexico and Central America, the species is an important food source due to its nutritional quality and low price. Despite its regional importance and increasing concerns about overexploitation and habitat degradation, basic genetic information on the tropical gar is lacking. Determining genetic information on the tropical gar is important for the sustainable management of wild populations, implementation of best practices in aquaculture settings, evolutionary studies of ancient lineages, and an understanding of sex-specific gene expression. In this study, the transcriptome of the tropical gar was sequenced and assembled de novo using tissues from three males and three females using Illumina sequencing technology. Sex-specific and highly differentially expressed transcripts in brain and muscle tissues between adult males and females were subsequently identified. The transcriptome was assembled de novo resulting in 80,611 transcripts with a contig N50 of 3,355 base pairs and over 168 kilobases in total length. Male muscle, brain, and gonad as well as female muscle and brain were included in the assembly. The assembled transcriptome was annotated to identify the putative function of expressed transcripts using Trinotate and SwissProt, a database of well-annotated proteins. The brain and muscle datasets were then aligned to the assembled transcriptome to identify transcripts that were differentially expressed between males and females. The contrast between male and female brain identified 109 transcripts from 106 genes that were significantly differentially expressed. In the muscle comparison, 82 transcripts from 80 genes were identified with evidence for significant differential expression. Almost all genes identified as differentially expressed were sex-specific. The differentially expressed transcripts were enriched for genes involved in
Gao, Bei; Zhang, Daoyuan; Li, Xiaoshuang; Yang, Honglan; Zhang, Yuanming; Wood, Andrew J
The desiccation-tolerant moss Bryum argenteum is an important component of the Biological Soil Crusts (BSCs) found in the Gurbantunggut desert. Desiccation tolerance is defined as the ability to revive from the air dried state. To elucidate the molecular mechanisms related to desiccation tolerance, we employed RNA-Seq and digital gene expression (DGE) technologies to study the genome-wide expression profiles of the dehydration and rehydration processes in this important desert plant. We applied a two-step approach to investigate the gene expression profile upon rehydration in the moss Bryum argenteum using Illumina HiSeq2000 sequencing platform. First, a total of 57,247 transcript assembly contigs (TACs) were obtained from 54.79 million reads by de novo assembly, with an average length of 863 bp and N50 of 1,372 bp. Among the reconstructed TACs, 36,916 (64.5%) revealed similarity with existing protein sequences in the public databases. 23,509 and 21,607 TACs were assigned GO and KEGG annotation information, respectively. Second, samples were taken from 3 hydration stages: desiccated (Dry), rehydrated 2 h (R2) and rehydrated 24 h (R24), and DEG libraries were constructed for Differentially Expressed Genes (DEGs) discovery. 4,081 and 6,709 DEGs were identified in R2 and R24, compared with Dry, respectively. Compared to the desiccated sample, up-regulated genes after two hours of hydration are primarily related to stress responses. GO function enrichment network, EKGG metabolic pathway and MapMan analysis supports the idea of the rapid recovery of photosynthesis after 24 h of rehydration. We identified 770 transcription factors (TFs) which were classified into 50 TF families. 142 TF transcripts were up-regulated upon rehydration including 23 members of the ERF family. In this study, we constructed a pioneering, high-quality reference transcriptome in B. argenteum and generated three DGE libraries to elucidate the changes of gene expression upon rehydration. Expression
Full Text Available Japanese flounder (Paralichthys olivaceus is an economically important marine fish in Asia and has suffered from disease outbreaks caused by various pathogens, which requires more information for immune relevant genes on genome background. However, genomic and transcriptomic data for Japanese flounder remain scarce, which limits studies on the immune system of this species. In this study, we characterized the Japanese flounder spleen transcriptome using an Illumina paired-end sequencing platform to identify putative genes involved in immunity.A cDNA library from the spleen of P. olivaceus was constructed and randomly sequenced using an Illumina technique. The removal of low quality reads generated 12,196,968 trimmed reads, which assembled into 96,627 unigenes. A total of 21,391 unigenes (22.14% were annotated in the NCBI Nr database, and only 1.1% of the BLASTx top-hits matched P. olivaceus protein sequences. Approximately 12,503 (58.45% unigenes were categorized into three Gene Ontology groups, 19,547 (91.38% were classified into 26 Cluster of Orthologous Groups, and 10,649 (49.78% were assigned to six Kyoto Encyclopedia of Genes and Genomes pathways. Furthermore, 40,928 putative simple sequence repeats and 47, 362 putative single nucleotide polymorphisms were identified. Importantly, we identified 1,563 putative immune-associated unigenes that mapped to 15 immune signaling pathways.The P. olivaceus transciptome data provides a rich source to discover and identify new genes, and the immune-relevant sequences identified here will facilitate our understanding of the mechanisms involved in the immune response. Furthermore, the plentiful potential SSRs and SNPs found in this study are important resources with respect to future development of a linkage map or marker assisted breeding programs for the flounder.
Full Text Available Abstract Background Human Artificial Chromosomes (HACs are potentially useful vectors for gene transfer studies and for functional annotation of the genome because of their suitability for cloning, manipulating and transferring large segments of the genome. However, development of HACs for the transfer of large genomic loci into mammalian cells has been limited by difficulties in manipulating high-molecular weight DNA, as well as by the low overall frequencies of de novo HAC formation. Indeed, to date, only a small number of large (>100 kb genomic loci have been reported to be successfully packaged into de novo HACs. Results We have developed novel methodologies to enable efficient assembly of HAC vectors containing any genomic locus of interest. We report here the creation of a novel, bimolecular system based on bacterial artificial chromosomes (BACs for the construction of HACs incorporating any defined genomic region. We have utilized this vector system to rapidly design, construct and validate multiple de novo HACs containing large (100–200 kb genomic loci including therapeutically significant genes for human growth hormone (HGH, polycystic kidney disease (PKD1 and ß-globin. We report significant differences in the ability of different genomic loci to support de novo HAC formation, suggesting possible effects of cis-acting genomic elements. Finally, as a proof of principle, we have observed sustained ß-globin gene expression from HACs incorporating the entire 200 kb ß-globin genomic locus for over 90 days in the absence of selection. Conclusion Taken together, these results are significant for the development of HAC vector technology, as they enable high-throughput assembly and functional validation of HACs containing any large genomic locus. We have evaluated the impact of different genomic loci on the frequency of HAC formation and identified segments of genomic DNA that appear to facilitate de novo HAC formation. These genomic loci
Full Text Available Background: Locoweeds (toxic Oxytropis and Astraglus species, containing the toxic agent swainsonine, pose serious threats to animal husbandry on grasslands in both China and the US. Some locoweeds have evolved adaptations in order to resist various stress conditions such as drought, salt and cold. As a result they replace other plants in their communities and become an ecological problem. Currently very limited genetic information of locoweeds is available and this hinders our understanding in the molecular basis of their environmental plasticity, and the interaction between locoweeds and their symbiotic swainsonine producing endophytes. Next-generation sequencing provides a means of obtaining transcriptomic sequences in a timely manner, which is particularly useful for non-model plants. In this study, we performed transcriptome sequencing of Oxytropis ochrocephala plants followed by a de nove assembly. Our primary aim was to provide an enriched pool of genetic sequences of an Oxytropis sp. for further locoweed research. Results: Transcriptomes of four different O. ochrocephala samples, from control (CK plants, and those that had experienced either drought (20% PEG, salt (150 mM NaCl or cold (4 °C stress were sequenced using an Illumina Hiseq 2000 platform. From 232,209,506 clean reads 23,220,950,600 (~23 G nucleotides, 182,430 transcripts and 88,942 unigenes were retrieved, with an N50 value of 1,237. Differential expression analysis revealed putative genes encoding heat shock proteins (HSPs and late embryogenesis abundant (LEA proteins, enzymes in secondary metabolite and plant hormone biosyntheses, and transcription factors which are involved in stress tolerance in O. ochrocephala. In order to validate our sequencing results, we further analyzed the expression profiles of nine genes by quantitative real-time PCR. Finally, we discuss the possible mechanism of O. ochrocephala’s adaptations to stress environment. Conclusion: Our
Full Text Available Abstract Background Jatropha curcas L. is an important non-edible oilseed crop with promising future in biodiesel production. However, factors like oil yield, oil composition, toxic compounds in oil cake, pests and diseases limit its commercial potential. Well established genetic engineering methods using cloned genes could be used to address these limitations. Earlier, 10,983 unigenes from Sanger sequencing of ESTs, and 3,484 unique assembled transcripts from 454 pyrosequencing of uncloned cDNAs were reported. In order to expedite the process of gene discovery, we have undertaken 454 pyrosequencing of normalized cDNAs prepared from roots, mature leaves, flowers, developing seeds, and embryos of J. curcas. Results From 383,918 raw reads, we obtained 381,957 quality-filtered and trimmed reads that are suitable for the assembly of transcript sequences. De novo contig assembly of these reads generated 17,457 assembled transcripts (contigs and 54,002 singletons. Average length of the assembled transcripts was 916 bp. About 30% of the transcripts were longer than 1000 bases, and the size of the longest transcript was 7,173 bases. BLASTX analysis revealed that 2,589 of these transcripts are full-length. The assembled transcripts were validated by RT-PCR analysis of 28 transcripts. The results showed that the transcripts were correctly assembled and represent actively expressed genes. KEGG pathway mapping showed that 2,320 transcripts are related to major biochemical pathways including the oil biosynthesis pathway. Overall, the current study reports 14,327 new assembled transcripts which included 2589 full-length transcripts and 27 transcripts that are directly involved in oil biosynthesis. Conclusion The large number of transcripts reported in the current study together with existing ESTs and transcript sequences will serve as an invaluable genetic resource for crop improvement in jatropha. Sequence information of those genes that are involved in oil
Natarajan, Purushothaman; Parani, Madasamy
Jatropha curcas L. is an important non-edible oilseed crop with promising future in biodiesel production. However, factors like oil yield, oil composition, toxic compounds in oil cake, pests and diseases limit its commercial potential. Well established genetic engineering methods using cloned genes could be used to address these limitations. Earlier, 10,983 unigenes from Sanger sequencing of ESTs, and 3,484 unique assembled transcripts from 454 pyrosequencing of uncloned cDNAs were reported. In order to expedite the process of gene discovery, we have undertaken 454 pyrosequencing of normalized cDNAs prepared from roots, mature leaves, flowers, developing seeds, and embryos of J. curcas. From 383,918 raw reads, we obtained 381,957 quality-filtered and trimmed reads that are suitable for the assembly of transcript sequences. De novo contig assembly of these reads generated 17,457 assembled transcripts (contigs) and 54,002 singletons. Average length of the assembled transcripts was 916 bp. About 30% of the transcripts were longer than 1000 bases, and the size of the longest transcript was 7,173 bases. BLASTX analysis revealed that 2,589 of these transcripts are full-length. The assembled transcripts were validated by RT-PCR analysis of 28 transcripts. The results showed that the transcripts were correctly assembled and represent actively expressed genes. KEGG pathway mapping showed that 2,320 transcripts are related to major biochemical pathways including the oil biosynthesis pathway. Overall, the current study reports 14,327 new assembled transcripts which included 2589 full-length transcripts and 27 transcripts that are directly involved in oil biosynthesis. The large number of transcripts reported in the current study together with existing ESTs and transcript sequences will serve as an invaluable genetic resource for crop improvement in jatropha. Sequence information of those genes that are involved in oil biosynthesis could be used for metabolic engineering of
Cerdeira, Louise Teixeira; Carneiro, Adriana Ribeiro; Ramos, Rommel Thiago Jucá; de Almeida, Sintia Silva; D'Afonseca, Vivian; Schneider, Maria Paula Cruz; Baumbach, Jan; Tauch, Andreas; McCulloch, John Anthony; Azevedo, Vasco Ariston Carvalho; Silva, Artur
Due to the advent of the so-called Next-Generation Sequencing (NGS) technologies the amount of monetary and temporal resources for whole-genome sequencing has been reduced by several orders of magnitude. Sequence reads can be assembled either by anchoring them directly onto an available reference genome (classical reference assembly), or can be concatenated by overlap (de novo assembly). The latter strategy is preferable because it tends to maintain the architecture of the genome sequence the however, depending on the NGS platform used, the shortness of read lengths cause tremendous problems the in the subsequent genome assembly phase, impeding closing of the entire genome sequence. To address the problem, we developed a multi-pronged hybrid de novo strategy combining De Bruijn graph and Overlap-Layout-Consensus methods, which was used to assemble from short reads the entire genome of Corynebacterium pseudotuberculosis strain I19, a bacterium with immense importance in veterinary medicine that causes Caseous Lymphadenitis in ruminants, principally ovines and caprines. Briefly, contigs were assembled de novo from the short reads and were only oriented using a reference genome by anchoring. Remaining gaps were closed using iterative anchoring of short reads by craning to gap flanks. Finally, we compare the genome sequence assembled using our hybrid strategy to a classical reference assembly using the same data as input and show that with the availability of a reference genome, it pays off to use the hybrid de novo strategy, rather than a classical reference assembly, because more genome sequences are preserved using the former. Copyright © 2011 Elsevier B.V. All rights reserved.
Full Text Available C4 photosynthesis is a carbon-concentrating mechanism that evolved independently more than 60 times in a wide range of angiosperm lineages. Among other alterations, the evolution of C4 from ancestral C3 photosynthesis requires changes in the expression of a vast number of genes. Differential gene expression analyses between closely related C3 and C4 species have significantly increased our understanding of C4 functioning and evolution. In Chenopodiaceae, a family that is rich in C4 origins and photosynthetic types, the anatomy, physiology and phylogeny of C4, C2, and C3 species of Salsoleae has been studied in great detail, which facilitated the choice of six samples of five representative species with different photosynthetic types for transcriptome comparisons. mRNA from assimilating organs of each species was sequenced in triplicates, and sequence reads were de novo assembled. These novel genetic resources were then analyzed to provide a better understanding of differential gene expression between C3, C2 and C4 species. All three analyzed C4 species belong to the NADP-ME type as most genes encoding core enzymes of this C4 cycle are highly expressed. The abundance of photorespiratory transcripts is decreased compared to the C3 and C2 species. Like in other C4 lineages of Caryophyllales, our results suggest that PEPC1 is the C4-specific isoform in Salsoleae. Two recently identified transporters from the PHT4 protein family may not only be related to the C4 syndrome, but also active in C2 photosynthesis in Salsoleae. In the two populations of the C2 species S. divaricata transcript abundance of several C4 genes are slightly increased, however, a C4 cycle is not detectable in the carbon isotope values. Most of the core enzymes of photorespiration are highly increased in the C2 species compared to both C3 and C4 species, confirming a successful establishment of the C2 photosynthetic pathway. Furthermore, a function of PEP-CK in C2 photosynthesis
Wang, Won-Jing; Acehan, Devrim; Kao, Chien-Han; Jane, Wann-Neng; Uryu, Kunihiro; Tsou, Meng-Fu Bryan
Vertebrate centrioles normally propagate through duplication, but in the absence of preexisting centrioles, de novo synthesis can occur. Consistently, centriole formation is thought to strictly rely on self-assembly, involving self-oligomerization of the centriolar protein SAS-6. Here, through reconstitution of de novo synthesis in human cells, we surprisingly found that normal looking centrioles capable of duplication and ciliation can arise in the absence of SAS-6 self-oligomerization. Moreover, whereas canonically duplicated centrioles always form correctly, de novo centrioles are prone to structural errors, even in the presence of SAS-6 self-oligomerization. These results indicate that centriole biogenesis does not strictly depend on SAS-6 self-assembly, and may require preexisting centrioles to ensure structural accuracy, fundamentally deviating from the current paradigm.
Hou, Siyu; Sun, Zhaoxia; Li, Yaoshen; Wang, Yijie; Ling, Hubin; Xing, Guofang; Han, Yuanhuai; Li, Hongying
Proso millet ( Panicum miliaceum ; Poaceae) is a minor crop with good nutritional qualities and strong tolerance to drought stress and soil infertility. However, studies on genetic diversity have been limited due to a lack of efficient genetic markers. Illumina sequencing technology was used to generate short read sequences of proso millet, and de novo transcriptome assemblies were used to develop a de novo assembly of proso millet. Genic simple sequence repeat (SSR) markers were identified and used to detect polymorphism among 56 accessions. Population structure and genetic similarity coefficient were estimated. In total, 25,341 unique gene sequences and 4724 SSR loci were obtained from the transcriptome, of which 229 pairs of SSR primers were validated, which resulted in 14 polymorphic genic SSR primers exhibiting 43 total alleles. According to the ratio of polymorphic markers (6.1%, 14/229), there are potentially 288 polymorphic genic SSR markers available for genetic assay development in the future. Bayesian population analyses showed that the 56 accessions comprised two distinct groups. A genetic structure and cluster assay indicated that the accessions from the Loess Plateau of China shared a high genetic similarity coefficient with those from other regions and that there was no correlation between genetic diversity and geographic origin. The transcriptome sequencing data and millet-specific SSR markers developed in this study establish an excellent resource for gene discovery and may improve the development of breeding programs in proso millet in the future.
Li, Pei; Ji, Guoli; Dong, Min; Schmidt, Emily; Lenox, Douglas; Chen, Liangliang; Liu, Qi; Liu, Lin; Zhang, Jie; Liang, Chun
To address the impending need for exploring rapidly increased transcriptomics data generated for non-model organisms, we developed CBrowse, an AJAX-based web browser for visualizing and analyzing transcriptome assemblies and contigs. Designed in a standard three-tier architecture with a data pre-processing pipeline, CBrowse is essentially a Rich Internet Application that offers many seamlessly integrated web interfaces and allows users to navigate, sort, filter, search and visualize data smoothly. The pre-processing pipeline takes the contig sequence file in FASTA format and its relevant SAM/BAM file as the input; detects putative polymorphisms, simple sequence repeats and sequencing errors in contigs and generates image, JSON and database-compatible CSV text files that are directly utilized by different web interfaces. CBowse is a generic visualization and analysis tool that facilitates close examination of assembly quality, genetic polymorphisms, sequence repeats and/or sequencing errors in transcriptome sequencing projects. CBrowse is distributed under the GNU General Public License, available at http://bioinfolab.muohio.edu/CBrowse/ email@example.com or firstname.lastname@example.org; email@example.com Supplementary data are available at Bioinformatics online.
Background Anopheles sinensis is the major malaria vector in China and Southeast Asia. Vector control is one of the most effective measures to prevent malaria transmission. However, there is little transcriptome information available for the malaria vector. To better understand the biological basis of malaria transmission and to develop novel and effective means of vector control, there is a need to build a transcriptome dataset for functional genomics analysis by large-scale RNA sequencing (RNA-seq). Methods To provide a more comprehensive and complete transcriptome of An. sinensis, eggs, larvae, pupae, male adults and female adults RNA were pooled together for cDNA preparation, sequenced using the Illumina paired-end sequencing technology and assembled into unigenes. These unigenes were then analyzed in their genome mapping, functional annotation, homology, codon usage bias and simple sequence repeats (SSRs). Results Approximately 51.6 million clean reads were obtained, trimmed, and assembled into 38,504 unigenes with an average length of 571 bp, an N50 of 711 bp, and an average GC content 51.26%. Among them, 98.4% of unigenes could be mapped onto the reference genome, and 69% of unigenes could be annotated with known biological functions. Homology analysis identified certain numbers of An. sinensis unigenes that showed homology or being putative 1:1 orthologues with genomes of other Dipteran species. Codon usage bias was analyzed and 1,904 SSRs were detected, which will provide effective molecular markers for the population genetics of this species. Conclusions Our data and analysis provide the most comprehensive transcriptomic resource and characteristics currently available for An. sinensis, and will facilitate genetic, genomic studies, and further vector control of An. sinensis. PMID:25000941
Full Text Available Fish skin serves as the first line of defense against a wide variety of chemical, physical and biological stressors. Secretion of mucus is among the most prominent characteristics of fish skin and numerous innate immune factors have been identified in the epidermal mucus. However, molecular mechanisms underlying the mucus secretion and immune activities of fish skin remain largely unclear due to the lack of genomic and transcriptomic data for most economically important fish species. In this study, we characterized the skin transcriptome of mud loach using Illumia paired-end sequencing. A total of 40364 unigenes were assembled from 86.6 million (3.07 gigabases filtered reads. The mean length, N50 size and maximum length of assembled transcripts were 387, 611 and 8670 bp, respectively. A total of 17336 (43.76% unigenes were annotated by blast searches against the NCBI non-redundant protein database. Gene ontology mapping assigned a total of 108513 GO terms to 15369 (38.08% unigenes. KEGG orthology mapping annotated 9337 (23.23% unigenes. Among the identified KO categories, immune system is the largest category that contains various components of multiple immune pathways such as chemokine signaling, leukocyte transendothelial migration and T cell receptor signaling, suggesting the complexity of immune mechanisms in fish skin. As for mucin biosynthesis, 37 unigenes were mapped to 7 enzymes of the mucin type O-glycan biosynthesis pathway and 8 members of the polypeptide N-acetylgalactosaminyltransferase family were identified. Additionally, 38 unigenes were mapped to 23 factors of the SNARE interactions in vesicular transport pathway, indicating that the activity of this pathway is required for the processes of epidermal mucus storage and release. Moreover, 1754 simple sequence repeats (SSRs were detected in 1564 unigenes and dinucleotide repeats represented the most abundant type. These findings have laid the foundation for further understanding
Feldmesser, Ester; Rosenwasser, Shilo; Vardi, Assaf; Ben-Dor, Shifra
The advent of Next Generation Sequencing technologies and corresponding bioinformatics tools allows the definition of transcriptomes in non-model organisms. Non-model organisms are of great ecological and biotechnological significance, and consequently the understanding of their unique metabolic pathways is essential. Several methods that integrate de novo assembly with genome-based assembly have been proposed. Yet, there are many open challenges in defining genes, particularly where genomes are not available or incomplete. Despite the large numbers of transcriptome assemblies that have been performed, quality control of the transcript building process, particularly on the protein level, is rarely performed if ever. To test and improve the quality of the automated transcriptome reconstruction, we used manually defined and curated genes, several of them experimentally validated. Several approaches to transcript construction were utilized, based on the available data: a draft genome, high quality RNAseq reads, and ESTs. In order to maximize the contribution of the various data, we integrated methods including de novo and genome based assembly, as well as EST clustering. After each step a set of manually curated genes was used for quality assessment of the transcripts. The interplay between the automated pipeline and the quality control indicated which additional processes were required to improve the transcriptome reconstruction. We discovered that E. huxleyi has a very high percentage of non-canonical splice junctions, and relatively high rates of intron retention, which caused unique issues with the currently available tools. While individual tools missed genes and artificially joined overlapping transcripts, combining the results of several tools improved the completeness and quality considerably. The final collection, created from the integration of several quality control and improvement rounds, was compared to the manually defined set both on the DNA and
Keio University's Yasumbumi Sakakibara on "MetaVelvet: An Extension of Velvet Assembler to de novo Metagenome Assembly from Short Sequence Reads" at the Metagenomics Informatics Challenges Workshop held at the DOE JGI on October 12-13, 2011.
Multi-tissue RNA-seq and transcriptome characterisation of the spiny dogfish shark (Squalus acanthias provides a molecular tool for biological research and reveals new genes involved in osmoregulation.
Full Text Available The spiny dogfish shark (Squalus acanthias is one of the most commonly used cartilaginous fishes in biological research, especially in the fields of nitrogen metabolism, ion transporters and osmoregulation. Nonetheless, transcriptomic data for this organism is scarce. In the present study, a multi-tissue RNA-seq experiment and de novo transcriptome assembly was performed in four different spiny dogfish tissues (brain, liver, kidney and ovary, providing an annotated sequence resource. The characterization of the transcriptome greatly increases the scarce sequence information for shark species. Reads were assembled with the Trinity de novo assembler both within each tissue and across all tissues combined resulting in 362,690 transcripts in the combined assembly which represent 289,515 Trinity genes. BUSCO analysis determined a level of 87% completeness for the combined transcriptome. In total, 123,110 proteins were predicted of which 78,679 and 83,164 had significant hits against the SwissProt and Uniref90 protein databases, respectively. Additionally, 61,215 proteins aligned to known protein domains, 7,208 carried a signal peptide and 15,971 possessed at least one transmembrane region. Based on the annotation, 81,582 transcripts were assigned to gene ontology terms and 42,078 belong to known clusters of orthologous groups (eggNOG. To demonstrate the value of our molecular resource, we show that the improved transcriptome data enhances the current possibilities of osmoregulation research in spiny dogfish by utilizing the novel gene and protein annotations to investigate a set of genes involved in urea synthesis and urea, ammonia and water transport, all of them crucial in osmoregulation. We describe the presence of different gene copies and isoforms of key enzymes involved in this process, including arginases and transporters of urea and ammonia, for which sequence information is currently absent in the databases for this model species. The
Multi-tissue RNA-seq and transcriptome characterisation of the spiny dogfish shark (Squalus acanthias) provides a molecular tool for biological research and reveals new genes involved in osmoregulation.
Chana-Munoz, Andres; Jendroszek, Agnieszka; Sønnichsen, Malene; Kristiansen, Rune; Jensen, Jan K; Andreasen, Peter A; Bendixen, Christian; Panitz, Frank
The spiny dogfish shark (Squalus acanthias) is one of the most commonly used cartilaginous fishes in biological research, especially in the fields of nitrogen metabolism, ion transporters and osmoregulation. Nonetheless, transcriptomic data for this organism is scarce. In the present study, a multi-tissue RNA-seq experiment and de novo transcriptome assembly was performed in four different spiny dogfish tissues (brain, liver, kidney and ovary), providing an annotated sequence resource. The characterization of the transcriptome greatly increases the scarce sequence information for shark species. Reads were assembled with the Trinity de novo assembler both within each tissue and across all tissues combined resulting in 362,690 transcripts in the combined assembly which represent 289,515 Trinity genes. BUSCO analysis determined a level of 87% completeness for the combined transcriptome. In total, 123,110 proteins were predicted of which 78,679 and 83,164 had significant hits against the SwissProt and Uniref90 protein databases, respectively. Additionally, 61,215 proteins aligned to known protein domains, 7,208 carried a signal peptide and 15,971 possessed at least one transmembrane region. Based on the annotation, 81,582 transcripts were assigned to gene ontology terms and 42,078 belong to known clusters of orthologous groups (eggNOG). To demonstrate the value of our molecular resource, we show that the improved transcriptome data enhances the current possibilities of osmoregulation research in spiny dogfish by utilizing the novel gene and protein annotations to investigate a set of genes involved in urea synthesis and urea, ammonia and water transport, all of them crucial in osmoregulation. We describe the presence of different gene copies and isoforms of key enzymes involved in this process, including arginases and transporters of urea and ammonia, for which sequence information is currently absent in the databases for this model species. The transcriptome
Full Text Available Black gram [V. mungo (L. Hepper] is an important legume crop extensively grown in south and south-east Asia, where it is a major source of dietary protein for its predominantly vegetarian population. However, lack of genomic information and markers has become a limitation for genetic improvement of this crop. Here, we report the transcriptome sequencing of the immature seeds of black gram cv. TU94-2, by Illumina paired end sequencing technology to generate transcriptome sequences for gene discovery and genic-SSR marker development. A total of 17.2 million paired-end reads were generated and 48,291 transcript contigs (TCS were assembled with an average length of 443 bp. Based on sequence similarity search, 33,766 TCS showed significant similarity to known proteins. Among these, only 29,564 TCS were annotated with gene ontology (GO functional categories. A total number of 138 unique KEGG (Kyoto Encyclopedia of Genes and Genomes pathways were identified, of which majority of TCS are grouped into purine metabolism (678 followed by pyrimidine metabolism (263. A total of 48,291 TCS were searched for SSRs and 1,840 SSRs were identified in 1,572 TCS with an average frequency of one SSR per 11.9 kb. The tri-nucleotide repeats were most abundant (35% followed by di-nucleotide repeats (32%. PCR primer pairs were successfully designed for 933 SSR loci. Sequences analyses indicate that about 64.4% and 35.6% of the SSR motifs were present in the coding sequences (CDS and untranslated regions (UTRs respectively. Tri-nucleotide repeats (57.3% were preferentially present in the CDS. The rate of successful amplification and polymorphism were investigated using selected primers among 18 black gram accessions. Genic-SSR markers developed from the Illumina paired end sequencing of black gram immature seed transcriptome will provide a valuable resource for genetic diversity, evolution, linkage mapping, comparative genomics and marker-assisted selection in black gram.
Souframanien, J.; Reddy, Kandali Sreenivasulu
Black gram [V. mungo (L.) Hepper] is an important legume crop extensively grown in south and south-east Asia, where it is a major source of dietary protein for its predominantly vegetarian population. However, lack of genomic information and markers has become a limitation for genetic improvement of this crop. Here, we report the transcriptome sequencing of the immature seeds of black gram cv. TU94-2, by Illumina paired end sequencing technology to generate transcriptome sequences for gene discovery and genic-SSR marker development. A total of 17.2 million paired-end reads were generated and 48,291 transcript contigs (TCS) were assembled with an average length of 443 bp. Based on sequence similarity search, 33,766 TCS showed significant similarity to known proteins. Among these, only 29,564 TCS were annotated with gene ontology (GO) functional categories. A total number of 138 unique KEGG (Kyoto Encyclopedia of Genes and Genomes) pathways were identified, of which majority of TCS are grouped into purine metabolism (678) followed by pyrimidine metabolism (263). A total of 48,291 TCS were searched for SSRs and 1,840 SSRs were identified in 1,572 TCS with an average frequency of one SSR per 11.9 kb. The tri-nucleotide repeats were most abundant (35%) followed by di-nucleotide repeats (32%). PCR primer pairs were successfully designed for 933 SSR loci. Sequences analyses indicate that about 64.4% and 35.6% of the SSR motifs were present in the coding sequences (CDS) and untranslated regions (UTRs) respectively. Tri-nucleotide repeats (57.3%) were preferentially present in the CDS. The rate of successful amplification and polymorphism were investigated using selected primers among 18 black gram accessions. Genic-SSR markers developed from the Illumina paired end sequencing of black gram immature seed transcriptome will provide a valuable resource for genetic diversity, evolution, linkage mapping, comparative genomics and marker-assisted selection in black gram. PMID
Li, Ruiqiang; Fan, Wei; Tian, Geng; Zhu, Hongmei; He, Lin; Cai, Jing; Huang, Quanfei; Cai, Qingle; Li, Bo; Bai, Yinqi; Zhang, Zhihe; Zhang, Yaping; Wang, Wen; Li, Jun; Wei, Fuwen; Li, Heng; Jian, Min; Li, Jianwen; Zhang, Zhaolei; Nielsen, Rasmus; Li, Dawei; Gu, Wanjun; Yang, Zhentao; Xuan, Zhaoling; Ryder, Oliver A.; Leung, Frederick Chi-Ching; Zhou, Yan; Cao, Jianjun; Sun, Xiao; Fu, Yonggui; Fang, Xiaodong; Guo, Xiaosen; Wang, Bo; Hou, Rong; Shen, Fujun; Mu, Bo; Ni, Peixiang; Lin, Runmao; Qian, Wubin; Wang, Guodong; Yu, Chang; Nie, Wenhui; Wang, Jinhuan; Wu, Zhigang; Liang, Huiqing; Min, Jiumeng; Wu, Qi; Cheng, Shifeng; Ruan, Jue; Wang, Mingwei; Shi, Zhongbin; Wen, Ming; Liu, Binghang; Ren, Xiaoli; Zheng, Huisong; Dong, Dong; Cook, Kathleen; Shan, Gao; Zhang, Hao; Kosiol, Carolin; Xie, Xueying; Lu, Zuhong; Zheng, Hancheng; Li, Yingrui; Steiner, Cynthia C.; Lam, Tommy Tsan-Yuk; Lin, Siyuan; Zhang, Qinghui; Li, Guoqing; Tian, Jing; Gong, Timing; Liu, Hongde; Zhang, Dejin; Fang, Lin; Ye, Chen; Zhang, Juanbin; Hu, Wenbo; Xu, Anlong; Ren, Yuanyuan; Zhang, Guojie; Bruford, Michael W.; Li, Qibin; Ma, Lijia; Guo, Yiran; An, Na; Hu, Yujie; Zheng, Yang; Shi, Yongyong; Li, Zhiqiang; Liu, Qing; Chen, Yanling; Zhao, Jing; Qu, Ning; Zhao, Shancen; Tian, Feng; Wang, Xiaoling; Wang, Haiyin; Xu, Lizhi; Liu, Xiao; Vinar, Tomas; Wang, Yajun; Lam, Tak-Wah; Yiu, Siu-Ming; Liu, Shiping; Zhang, Hemin; Li, Desheng; Huang, Yan; Wang, Xia; Yang, Guohua; Jiang, Zhi; Wang, Junyi; Qin, Nan; Li, Li; Li, Jingxiang; Bolund, Lars; Kristiansen, Karsten; Wong, Gane Ka-Shu; Olson, Maynard; Zhang, Xiuqing; Li, Songgang; Yang, Huanming; Wang, Jian; Wang, Jun
Using next-generation sequencing technology alone, we have successfully generated and assembled a draft sequence of the giant panda genome. The assembled contigs (2.25 gigabases (Gb)) cover approximately 94% of the whole genome, and the remaining gaps (0.05 Gb) seem to contain carnivore-specific repeats and tandem repeats. Comparisons with the dog and human showed that the panda genome has a lower divergence rate. The assessment of panda genes potentially underlying some of its unique traits indicated that its bamboo diet might be more dependent on its gut microbiome than its own genetic composition. We also identified more than 2.7 million heterozygous single nucleotide polymorphisms in the diploid genome. Our data and analyses provide a foundation for promoting mammalian genetic research, and demonstrate the feasibility for using next-generation sequencing technologies for accurate, cost-effective and rapid de novo assembly of large eukaryotic genomes. PMID:20010809
Full Text Available Annual bluegrass ( L. is one of the most widespread weed species in this world. As a young allotetraploid, has occupied diverse environments from Antarctic area to subtropical regions. To unveil the evolutionary mystery behind ’s wide distribution, extensive adaptability and phenotypic plasticity needs collaboration from multiple research scopes from ecology and plant physiology to population genetics and molecular biology. However, the lack of omic data and reference has greatly hampered the study. This is the first comprehensive transcriptome study on species. Total RNA was extracted from and its two proposed diploid parents, Schrad and Kunth, and sequenced in Illumina Hiseq2000. Optimized, nonredundant transcriptome references were generated for each species using four de novo assemblers (Trinity, Velvet, SOAPdenovo, and CLC Genomics Workbench and a redundancy-reducing pipeline (CD-HIT-EST and EvidentialGene tr2aacds. Using the constructed transcriptomes together with sequencing reads, we found high similarity in nucleotide sequences and homeologous polymorphisms between and the two proposed parents. Comparison of chloroplast and mitochondrion genes further confirmed as the maternal parent. Less nucleotide percentage differences were observed between and homeologs than between and homeologs, indicating a higher nucleotide substitution rates in homeologs than in homeologs. Gene ontology (GO enrichment analysis suggested the more compatible cytoplasmic environment and cellular apparatus for homeologs as the major cause for this phenomenon.
Wang, W.Z.; Tong, W.S.; Gao, R.
Dryopteris fragrans is a species of fern and contains flavonoids compounds with medicinal value. This study explain the temperature stress impact flavonoids synthesis in D. fragrans tissue culture seedlings under the low temperature at 4 degree C, high temperature at 35 degree C and moderate temperature at 25 degree C. By using Illumina HiSeq 2000 sequencing, 80.9 million raw sequence reads were de novo assembled into 66,716 non-redundant unigenes. 38,486 unigenes (57.7%) were annotated for their function. 13,973 unigenes and 29,598 unigenes were allocated to gene ontology (GO) and clusters of orthologous group (COG), respectively. 18,989 sequences mapped to 118 Kyoto Encyclopedia of Genes and Genomes Pathway database (KEGG), 204 genes were involved in flavonoid biosynthesis, regulation and transport. 25,292 and 16,817 unigenes exhibited marked differential expression in response to temperature shifts of 25 degree C to 4 degree C and 25 degree C to 35 degree C, respectively. 4CL and CHS genes involved in flavonoid biosynthesis were tested and suggested that they were responsible for biosynthesis of flavonoids. This study provides the first published data to describe the D. fragrans transcriptome and should accelerate understanding of flavonoids biosynthesis, regulation and transport mechanisms. Since most unigenes described here were successfully annotated, these results should facilitate future functional genomic understanding and research of D. fragrans. (author)
Full Text Available Abstract Background The lack of sequenced genomes for oleaginous microalgae limits our understanding of the mechanisms these organisms utilize to become enriched in triglycerides. Here we report the de novo transcriptome assembly and quantitative gene expression analysis of the oleaginous microalga Neochloris oleoabundans, with a focus on the complex interaction of pathways associated with the production of the triacylglycerol (TAG biofuel precursor. Results After growth under nitrogen replete and nitrogen limiting conditions, we quantified the cellular content of major biomolecules including total lipids, triacylglycerides, starch, protein, and chlorophyll. Transcribed genes were sequenced, the transcriptome was assembled de novo, and the expression of major functional categories, relevant pathways, and important genes was quantified through the mapping of reads to the transcriptome. Over 87 million, 77 base pair high quality reads were produced on the Illumina HiSeq sequencing platform. Metabolite measurements supported by genes and pathway expression results indicated that under the nitrogen-limiting condition, carbon is partitioned toward triglyceride production, which increased fivefold over the nitrogen-replete control. In addition to the observed overexpression of the fatty acid synthesis pathway, TAG production during nitrogen limitation was bolstered by repression of the β-oxidation pathway, up-regulation of genes encoding for the pyruvate dehydrogenase complex which funnels acetyl-CoA to lipid biosynthesis, activation of the pentose phosphate pathway to supply reducing equivalents to inorganic nitrogen assimilation and fatty acid biosynthesis, and the up-regulation of lipases—presumably to reconstruct cell membranes in order to supply additional fatty acids for TAG biosynthesis. Conclusions Our quantitative transcriptome study reveals a broad overview of how nitrogen stress results in excess TAG production in N. oleoabundans, and
Full Text Available The hard-shelled mussel (Mytilus coruscus has considerably one of the most economically important marine shellfish worldwide and considered as a good invertebrate model for ecotoxicity study for a long time. In the present study, we used Illumina sequencing technology (HiSeq2000 to sequence, assemble and annotate the transcriptome of the hard-shelled mussel which challenged with copper pollution. A total of 21,723,913 paired-end clean reads (NCBI SRA database SRX1411195 were generated from HiSeq2000 sequencer and 96,403 contigs (with N50 = 1118 bp were obtained after de novo assembling with Trinity software. Digital gene expression analysis reveals 1156 unigenes are upregulated and 1681 unigenes are downregulated when challenged with copper. By KEGG pathway enrichment analysis, we found that unigenes in four KEGG pathways (aminoacyl-tRNA biosynthesis, apoptosis, DNA replication and mismatch repair show significant differential expressed between control and copper treated groups. We hope that the gill transcriptome in copper treated hard-shelled mussel can give useful information to understand how mussel handles with heavy metal stress at molecular level. Keywords: Hard-shelled mussel, Heavy metal, Transcriptome, Ecotoxicity
Koning-Boucoiran, Carole F S; Esselink, G Danny; Vukosavljev, Mirjana; van 't Westende, Wendy P C; Gitonga, Virginia W; Krens, Frans A; Voorrips, Roeland E; van de Weg, W Eric; Schulz, Dietmar; Debener, Thomas; Maliepaard, Chris; Arens, Paul; Smulders, Marinus J M
In order to develop a versatile and large SNP array for rose, we set out to mine ESTs from diverse sets of rose germplasm. For this RNA-Seq libraries containing about 700 million reads were generated from tetraploid cut and garden roses using Illumina paired-end sequencing, and from diploid Rosa multiflora using 454 sequencing. Separate de novo assemblies were performed in order to identify single nucleotide polymorphisms (SNPs) within and between rose varieties. SNPs among tetraploid roses were selected for constructing a genotyping array that can be employed for genetic mapping and marker-trait association discovery in breeding programs based on tetraploid germplasm, both from cut roses and from garden roses. In total 68,893 SNPs were included on the WagRhSNP Axiom array. Next, an orthology-guided assembly was performed for the construction of a non-redundant rose transcriptome database. A total of 21,740 transcripts had significant hits with orthologous genes in the strawberry (Fragaria vesca L.) genome. Of these 13,390 appeared to contain the full-length coding regions. This newly established transcriptome resource adds considerably to the currently available sequence resources for the Rosaceae family in general and the genus Rosa in particular.
Uriel Alonso Hurtado Páez
Full Text Available Natural rubber (Hevea brasiliensis is a tropical tree used commercially for the production of latex, from which 40,000 products are generated. The fungus Microcyclus ulei infects this tree, causing South American leaf blight (SALB disease. This disease causes developmental delays and significant crop losses, thereby decreasing the production of latex. Currently several groups are working on obtaining clones of rubber tree with durable resistance to SALB through the use of extensive molecular biology techniques. In this study, we used a secondary clone that was resistant to M. ulei isolate GCL012. This clone, FX 3864 was obtained by crossing between clones PB 86 and B 38 (H. brasiliensis x H. brasiliensis. RNA-Seq high-throughput sequencing technology was used to analyze the differential expression of the FX 3864 clone transcriptome at 0 and 48 h post infection (hpi with the M. ulei isolate GCL012. A total of 158,134,220 reads were assembled using the de novo assembly strategy to generate 90,775 contigs with an N50 of 1672. Using a reference-based assembly, 76,278 contigs were generated with an N50 of 1324. We identified 86 differentially expressed genes associated with the defense response of FX 3864 to GCL012. Seven putative genes members of the AP2/ERF ethylene (ET-dependent superfamily were found to be down-regulated. An increase in salicylic acid (SA was associated with the up-regulation of three genes involved in cell wall synthesis and remodeling, as well as in the down-regulation of the putative gene CPR5. The defense response of FX 3864 against the GCL012 isolate was associated with the antagonistic SA, ET and jasmonic acid (JA pathways. These responses are characteristic of plant resistance to biotrophic pathogens.
Al-Nakeeb, Kosai Ali Ahmed; Petersen, Thomas Nordahl; Sicheritz-Pontén, Thomas
and performing a de novo assembly on a subset of reads that contains these k-mers. The method was applied to WGS data from a panda, brown algae seaweed, butterfly and filamentous fungus. We were able to extract full circular mitochondrial genomes and obtained sequence identities to the reference sequences...
Hou, Siyu; Sun, Zhaoxia; Li, Yaoshen; Wang, Yijie; Ling, Hubin; Xing, Guofang; Han, Yuanhuai; Li, Hongying
Premise of the study: Proso millet (Panicum miliaceum; Poaceae) is a minor crop with good nutritional qualities and strong tolerance to drought stress and soil infertility. However, studies on genetic diversity have been limited due to a lack of efficient genetic markers. Methods: Illumina sequencing technology was used to generate short read sequences of proso millet, and de novo transcriptome assemblies were used to develop a de novo assembly of proso millet. Genic simple sequence repeat (SSR) markers were identified and used to detect polymorphism among 56 accessions. Population structure and genetic similarity coefficient were estimated. Results: In total, 25,341 unique gene sequences and 4724 SSR loci were obtained from the transcriptome, of which 229 pairs of SSR primers were validated, which resulted in 14 polymorphic genic SSR primers exhibiting 43 total alleles. According to the ratio of polymorphic markers (6.1%, 14/229), there are potentially 288 polymorphic genic SSR markers available for genetic assay development in the future. Bayesian population analyses showed that the 56 accessions comprised two distinct groups. Discussion: A genetic structure and cluster assay indicated that the accessions from the Loess Plateau of China shared a high genetic similarity coefficient with those from other regions and that there was no correlation between genetic diversity and geographic origin. The transcriptome sequencing data and millet-specific SSR markers developed in this study establish an excellent resource for gene discovery and may improve the development of breeding programs in proso millet in the future. PMID:28791202
Full Text Available BACKGROUND: The seagrass Zostera marina is a monocotyledonous angiosperm belonging to a polyphyletic group of plants that can live submerged in marine habitats. Zostera marina L. is one of the most common seagrasses and is considered a cornerstone of marine plant molecular ecology research and comparative studies. However, the mechanisms underlying its adaptation to the marine environment still remain poorly understood due to limited transcriptomic and genomic data. PRINCIPAL FINDINGS: Here we explored the transcriptome of Z. marina leaves under different environmental conditions using Illumina paired-end sequencing. Approximately 55 million sequencing reads were obtained, representing 58,457 transcripts that correspond to 24,216 unigenes. A total of 14,389 (59.41% unigenes were annotated by blast searches against the NCBI non-redundant protein database. 45.18% and 46.91% of the unigenes had significant similarity with proteins in the Swiss-Prot database and Pfam database, respectively. Among these, 13,897 unigenes were assigned to 57 Gene Ontology (GO terms and 4,745 unigenes were identified and mapped to 233 pathways via functional annotation against the Kyoto Encyclopedia of Genes and Genomes pathway database (KEGG. We compared the orthologous gene family of the Z. marina transcriptome to Oryza sativa and Pyropia yezoensis and 11,667 orthologous gene families are specific to Z. marina. Furthermore, we identified the photoreceptors sensing red/far-red light and blue light. Also, we identified a large number of genes that are involved in ion transporters and channels including Na+ efflux, K+ uptake, Cl- channels, and H+ pumping. CONCLUSIONS: Our study contains an extensive sequencing and gene-annotation analysis of Z. marina. This information represents a genetic resource for the discovery of genes related to light sensing and salt tolerance in this species. Our transcriptome can be further utilized in future studies on molecular adaptation to
Full Text Available Elms, especially Ulmus minor and Ulmus americana, are carrying out a hard battle against Dutch elm disease (DED. This vascular wilt disease, caused by Ophiostoma ulmi and O. novo-ulmi, appeared in the twentieth century and killed millions of elms across North America and Europe. Elm breeding and conservation programmes have identified a reduced number of DED tolerant genotypes. In this study, three U. minor genotypes with contrasted levels of tolerance to DED were exposed to several biotic and abiotic stresses in order to (i obtain a de novo assembled transcriptome of U. minor using 454 pyrosequencing, (ii perform a functional annotation of the assembled transcriptome, (iii identify genes potentially involved in the molecular response to environmental stress, and (iv develop gene-based markers to support breeding programmes. A total of 58,429 putative unigenes were identified after assembly and filtering of the transcriptome. 32,152 of these unigenes showed homology with proteins identified in the genome from the most common plant model species. Well-known family proteins and transcription factors involved in abiotic, biotic or both stresses were identified after functional annotation. A total of 30,693 polymorphisms were identified in 7,125 isotigs, a large number of them corresponding to SNPs (27,359. In a subset randomly selected for validation, 87 % of the SNPs were confirmed. The material generated may be valuable for future Ulmus gene expression, population genomics and association genetics studies, especially taking into account the scarce molecular information available for this genus and the great impact that DED has on elm populations.
Multi-tissue RNA-seq and transcriptome characterisation of the spiny dogfish shark (Squalus acanthias) provides a molecular tool for biological research and reveals new genes involved in osmoregulation
Chana Munoz, Andres; Jendroszek, Agnieszka; Sønnichsen, Malene
The spiny dogfish shark (Squalus acanthias) is one of the most commonly used cartilaginous fishes in biological research, especially in the fields of nitrogen metabolism, ion transporters and osmoregulation. Nonetheless, transcriptomic data for this organism is scarce. In the present study, a multi......-tissue RNA-seq experiment and de novo transcriptome assembly was performed in four different spiny dogfish tissues (brain, liver, kidney and ovary), providing an annotated sequence resource. The characterization of the transcriptome greatly increases the scarce sequence information for shark species. Reads...... and provides a new molecular tool to assist biological research in cartilaginous fishes....
Rodd F Helen
Full Text Available Abstract Background Next-generation sequencing is providing researchers with a relatively fast and affordable option for developing genomic resources for organisms that are not among the traditional genetic models. Here we present a de novo assembly of the guppy (Poecilia reticulata transcriptome using 454 sequence reads, and we evaluate potential uses of this transcriptome, including detection of sex-specific transcripts and deployment as a reference for gene expression analysis in guppies and a related species. Guppies have been model organisms in ecology, evolutionary biology, and animal behaviour for over 100 years. An annotated transcriptome and other genomic tools will facilitate understanding the genetic and molecular bases of adaptation and variation in a vertebrate species with a uniquely well known natural history. Results We generated approximately 336 Mbp of mRNA sequence data from male brain, male body, female brain, and female body. The resulting 1,162,670 reads assembled into 54,921 contigs, creating a reference transcriptome for the guppy with an average read depth of 28×. We annotated nearly 40% of this reference transcriptome by searching protein and gene ontology databases. Using this annotated transcriptome database, we identified candidate genes of interest to the guppy research community, putative single nucleotide polymorphisms (SNPs, and male-specific expressed genes. We also showed that our reference transcriptome can be used for RNA-sequencing-based analysis of differential gene expression. We identified transcripts that, in juveniles, are regulated differently in the presence and absence of an important predator, Rivulus hartii, including two genes implicated in stress response. For each sample in the RNA-seq study, >50% of high-quality reads mapped to unique sequences in the reference database with high confidence. In addition, we evaluated the use of the guppy reference transcriptome for gene expression analyses in
Ethan A. G. Baker
Full Text Available Conifers are the dominant plant species throughout the high latitude boreal forests as well as some lower latitude temperate forests of North America, Europe, and Asia. As such, they play an integral economic and ecological role across much of the world. This study focused on the characterization of needle transcriptomes from four ecologically important and understudied North American white pines within the Pinus subgenus Strobus. The populations of many Strobus species are challenged by native and introduced pathogens, native insects, and abiotic factors. RNA from the needles of western white pine (Pinus monticola, limber pine (Pinus flexilis, whitebark pine (Pinus albicaulis, and sugar pine (Pinus lambertiana was sampled, Illumina short read sequenced, and de novo assembled. The assembled transcripts and their subsequent structural and functional annotations were processed through custom pipelines to contend with the challenges of non-model organism transcriptome validation. Orthologous gene family analysis of over 58,000 translated transcripts, implemented through Tribe-MCL, estimated the shared and unique gene space among the four species. This revealed 2025 conserved gene families, of which 408 were aligned to estimate levels of divergence and reveal patterns of selection. Specific candidate genes previously associated with drought tolerance and white pine blister rust resistance in conifers were investigated.
Kareem, Abdul; Radhakrishnan, Dhanya; Sondhi, Yash; Aiyaz, Mohammed; Roy, Merin V; Sugimoto, Kaoru; Prasad, Kalika
While in the movie Deadpool it is possible for a human to recreate an arm from scratch, in reality plants can even surpass that. Not only can they regenerate lost parts, but also the whole plant body can be reborn from a few existing cells. Despite the decades old realization that plant cells possess the ability to regenerate a complete shoot and root system, it is only now that the underlying mechanisms are being unraveled. De novo plant regeneration involves the initiation of regenerative mass, acquisition of the pluripotent state, reconstitution of stem cells and assembly of regulatory interactions. Recent studies have furthered our understanding on the making of a complete plant system in the absence of embryonic positional cues. We review the recent studies probing the molecular mechanisms of de novo plant regeneration in response to external inductive cues and our current knowledge of direct reprogramming of root to shoot and vice versa. We further discuss how de novo regeneration can be exploited to meet the demands of green culture industries and to serve as a general model to address the fundamental questions of regeneration across the plant kingdom.
Guzman, Frank; Kulcheski, Franceli Rodrigues; Turchetto-Zolet, Andreia Carina; Margis, Rogerio
Pitanga (Eugenia uniflora L.) is a member of the Myrtaceae family and is of particular interest due to its medicinal properties that are attributed to specialized metabolites with known biological activities. Among these molecules, terpenoids are the most abundant in essential oils that are found in the leaves and represent compounds with potential pharmacological benefits. The terpene diversity observed in Myrtaceae is determined by the activity of different members of the terpene synthase and oxidosqualene cyclase families. Therefore, the aim of this study was to perform a de novo assembly of transcripts from E. uniflora leaves and to annotation to identify the genes potentially involved in the terpenoid biosynthesis pathway and terpene diversity. In total, 72,742 unigenes with a mean length of 1048bp were identified. Of these, 43,631 and 36,289 were annotated with the NCBI non-redundant protein and Swiss-Prot databases, respectively. The gene ontology categorized the sequences into 53 functional groups. A metabolic pathway analysis with KEGG revealed 8,625 unigenes assigned to 141 metabolic pathways and 40 unigenes predicted to be associated with the biosynthesis of terpenoids. Furthermore, we identified four putative full-length terpene synthase genes involved in sesquiterpenes and monoterpenes biosynthesis, and three putative full-length oxidosqualene cyclase genes involved in the triterpenes biosynthesis. The expression of these genes was validated in different E. uniflora tissues. Copyright © 2014 Elsevier Ireland Ltd. All rights reserved.
Liu, Siyang; Huang, Shujia; Rao, Junhua
present a novel approach implemented in a single software package, AsmVar, to discover, genotype and characterize different forms of structural variation and novel sequence from population-scale de novo genome assemblies up to nucleotide resolution. Application of AsmVar to several human de novo genome......) as well as large deletions. However, these approaches consistently display a substantial bias against the recovery of complex structural variants and novel sequence in individual genomes and do not provide interpretation information such as the annotation of ancestral state and formation mechanism. We...... assemblies captures a wide spectrum of structural variants and novel sequences present in the human population in high sensitivity and specificity. Our method provides a direct solution for investigating structural variants and novel sequences from de novo genome assemblies, facilitating the construction...
Dudchenko, Olga; Batra, Sanjit S; Omer, Arina D; Nyquist, Sarah K; Hoeger, Marie; Durand, Neva C; Shamim, Muhammad S; Machol, Ido; Lander, Eric S; Aiden, Aviva Presser; Aiden, Erez Lieberman
The Zika outbreak, spread by the Aedes aegypti mosquito, highlights the need to create high-quality assemblies of large genomes in a rapid and cost-effective way. Here we combine Hi-C data with existing draft assemblies to generate chromosome-length scaffolds. We validate this method by assembling a human genome, de novo, from short reads alone (67× coverage). We then combine our method with draft sequences to create genome assemblies of the mosquito disease vectors Ae aegypti and Culex quinquefasciatus , each consisting of three scaffolds corresponding to the three chromosomes in each species. These assemblies indicate that almost all genomic rearrangements among these species occur within, rather than between, chromosome arms. The genome assembly procedure we describe is fast, inexpensive, and accurate, and can be applied to many species. Copyright © 2017, American Association for the Advancement of Science.
Wu, Shuanghua; Lei, Jianjun; Chen, Guoju; Chen, Hancai; Cao, Bihao; Chen, Changming
Chinese kale, a vegetable of the cruciferous family, is a popular crop in southern China and Southeast Asia due to its high glucosinolate content and nutritional qualities. However, there is little research on the molecular genetics and genes involved in glucosinolate metabolism and its regulation in Chinese kale. In this study, we sequenced and characterized the transcriptomes and expression profiles of genes expressed in 11 tissues of Chinese kale. A total of 216 million 150-bp clean reads were generated using RNA-sequencing technology. From the sequences, 98,180 unigenes were assembled for the whole plant, and 49,582~98,423 unigenes were assembled for each tissue. Blast analysis indicated that a total of 80,688 (82.18%) unigenes exhibited similarity to known proteins. The functional annotation and classification tools used in this study suggested that genes principally expressed in Chinese kale, were mostly involved in fundamental processes, such as cellular and molecular functions, the signal transduction, and biosynthesis of secondary metabolites. The expression levels of all unigenes were analyzed in various tissues of Chinese kale. A large number of candidate genes involved in glucosinolate metabolism and its regulation were identified, and the expression patterns of these genes were analyzed. We found that most of the genes involved in glucosinolate biosynthesis were highly expressed in the root, petiole, and in senescent leaves. The expression patterns of ten glucosinolate biosynthetic genes from RNA-seq were validated by quantitative RT-PCR in different tissues. These results provided an initial and global overview of Chinese kale gene functions and expression activities in different tissues. PMID:28228764
Gao, Bei; Li, Xiaoshuang; Zhang, Daoyuan; Liang, Yuqing; Yang, Honglan; Chen, Moxian; Zhang, Yuanming; Zhang, Jianhua; Wood, Andrew J
The desiccation tolerant bryophyte Bryum argenteum is an important component of desert biological soil crusts (BSCs) and is emerging as a model system for studying vegetative desiccation tolerance. Here we present and analyze the hydration-dehydration-rehydration transcriptomes in B. argenteum to establish a desiccation-tolerance transcriptomic atlas. B. argenteum gametophores representing five different hydration stages (hydrated (H0), dehydrated for 2 h (D2), 24 h (D24), then rehydrated for 2 h (R2) and 48 h (R48)), were sampled for transcriptome analyses. Illumina high throughput RNA-Seq technology was employed and generated more than 488.46 million reads. An in-house de novo transcriptome assembly optimization pipeline based on Trinity assembler was developed to obtain a reference Hydration-Dehydration-Rehydration (H-D-R) transcriptome comprising of 76,206 transcripts, with an N50 of 2,016 bp and average length of 1,222 bp. Comprehensive transcription factor (TF) annotation discovered 978 TFs in 62 families, among which 404 TFs within 40 families were differentially expressed upon dehydration-rehydration. Pfam term enrichment analysis revealed 172 protein families/domains were significantly associated with the H-D-R cycle and confirmed early rehydration (i.e. the R2 stage) as exhibiting the maximum stress-induced changes in gene expression.
Lonardi, Stefano; Mirebrahim, Hamid; Wanamaker, Steve; Alpert, Matthew; Ciardo, Gianfranco; Duma, Denisa; Close, Timothy J
As the invention of DNA sequencing in the 70s, computational biologists have had to deal with the problem of de novo genome assembly with limited (or insufficient) depth of sequencing. In this work, we investigate the opposite problem, that is, the challenge of dealing with excessive depth of sequencing. We explore the effect of ultra-deep sequencing data in two domains: (i) the problem of decoding reads to bacterial artificial chromosome (BAC) clones (in the context of the combinatorial pooling design we have recently proposed), and (ii) the problem of de novo assembly of BAC clones. Using real ultra-deep sequencing data, we show that when the depth of sequencing increases over a certain threshold, sequencing errors make these two problems harder and harder (instead of easier, as one would expect with error-free data), and as a consequence the quality of the solution degrades with more and more data. For the first problem, we propose an effective solution based on 'divide and conquer': we 'slice' a large dataset into smaller samples of optimal size, decode each slice independently, and then merge the results. Experimental results on over 15 000 barley BACs and over 4000 cowpea BACs demonstrate a significant improvement in the quality of the decoding and the final assembly. For the second problem, we show for the first time that modern de novo assemblers cannot take advantage of ultra-deep sequencing data. Python scripts to process slices and resolve decoding conflicts are available from http://goo.gl/YXgdHT; software Hashfilter can be downloaded from http://goo.gl/MIyZHs firstname.lastname@example.org or email@example.com Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: firstname.lastname@example.org.
Chen, Meili; Hu, Yibo; Liu, Jingxing; Wu, Qi; Zhang, Chenglin; Yu, Jun; Xiao, Jingfa; Wei, Fuwen; Wu, Jiayan
High-quality and complete gene models are the basis of whole genome analyses. The giant panda (Ailuropoda melanoleuca) genome was the first genome sequenced on the basis of solely short reads, but the genome annotation had lacked the support of transcriptomic evidence. In this study, we applied RNA-seq to globally improve the genome assembly completeness and to detect novel expressed transcripts in 12 tissues from giant pandas, by using a transcriptome reconstruction strategy that combined reference-based and de novo methods. Several aspects of genome assembly completeness in the transcribed regions were effectively improved by the de novo assembled transcripts, including genome scaffolding, the detection of small-size assembly errors, the extension of scaffold/contig boundaries, and gap closure. Through expression and homology validation, we detected three groups of novel full-length protein-coding genes. A total of 12.62% of the novel protein-coding genes were validated by proteomic data. GO annotation analysis showed that some of the novel protein-coding genes were involved in pigmentation, anatomical structure formation and reproduction, which might be related to the development and evolution of the black-white pelage, pseudo-thumb and delayed embryonic implantation of giant pandas. The updated genome annotation will help further giant panda studies from both structural and functional perspectives.
Hodgins, Kathryn A; Lai, Zhao; Oliveira, Luiz O; Still, David W; Scascitelli, Moira; Barker, Michael S; Kane, Nolan C; Dempewolf, Hannes; Kozik, Alex; Kesseli, Richard V; Burke, John M; Michelmore, Richard W; Rieseberg, Loren H
Although the Compositae harbours only two major food crops, sunflower and lettuce, many other species in this family are utilized by humans and have experienced various levels of domestication. Here, we have used next-generation sequencing technology to develop 15 reference transcriptome assemblies for Compositae crops or their wild relatives. These data allow us to gain insight into the evolutionary and genomic consequences of plant domestication. Specifically, we performed Illumina sequencing of Cichorium endivia, Cichorium intybus, Echinacea angustifolia, Iva annua, Helianthus tuberosus, Dahlia hybrida, Leontodon taraxacoides and Glebionis segetum, as well 454 sequencing of Guizotia scabra, Stevia rebaudiana, Parthenium argentatum and Smallanthus sonchifolius. Illumina reads were assembled using Trinity, and 454 reads were assembled using MIRA and CAP3. We evaluated the coverage of the transcriptomes using BLASTX analysis of a set of ultra-conserved orthologs (UCOs) and recovered most of these genes (88-98%). We found a correlation between contig length and read length for the 454 assemblies, and greater contig lengths for the 454 compared with the Illumina assemblies. This suggests that longer reads can aid in the assembly of more complete transcripts. Finally, we compared the divergence of orthologs at synonymous sites (Ks) between Compositae crops and their wild relatives and found greater divergence when the progenitors were self-incompatible. We also found greater divergence between pairs of taxa that had some evidence of postzygotic isolation. For several more distantly related congeners, such as chicory and endive, we identified a signature of introgression in the distribution of Ks values. © 2013 John Wiley & Sons Ltd.
Wang, Zhiwen; Hobson, Neil; Galindo, Leonardo; Zhu, Shilin; Shi, Daihu; McDill, Joshua; Yang, Linfeng; Hawkins, Simon; Neutelings, Godfrey; Datla, Raju; Lambert, Georgina; Galbraith, David W; Grassa, Christopher J; Geraldes, Armando; Cronk, Quentin C; Cullis, Christopher; Dash, Prasanta K; Kumar, Polumetla A; Cloutier, Sylvie; Sharpe, Andrew G; Wong, Gane K-S; Wang, Jun; Deyholos, Michael K
Flax (Linum usitatissimum) is an ancient crop that is widely cultivated as a source of fiber, oil and medicinally relevant compounds. To accelerate crop improvement, we performed whole-genome shotgun sequencing of the nuclear genome of flax. Seven paired-end libraries ranging in size from 300 bp to 10 kb were sequenced using an Illumina genome analyzer. A de novo assembly, comprised exclusively of deep-coverage (approximately 94× raw, approximately 69× filtered) short-sequence reads (44-100 bp), produced a set of scaffolds with N(50) =694 kb, including contigs with N(50)=20.1 kb. The contig assembly contained 302 Mb of non-redundant sequence representing an estimated 81% genome coverage. Up to 96% of published flax ESTs aligned to the whole-genome shotgun scaffolds. However, comparisons with independently sequenced BACs and fosmids showed some mis-assembly of regions at the genome scale. A total of 43384 protein-coding genes were predicted in the whole-genome shotgun assembly, and up to 93% of published flax ESTs, and 86% of A. thaliana genes aligned to these predicted genes, indicating excellent coverage and accuracy at the gene level. Analysis of the synonymous substitution rates (K(s) ) observed within duplicate gene pairs was consistent with a recent (5-9 MYA) whole-genome duplication in flax. Within the predicted proteome, we observed enrichment of many conserved domains (Pfam-A) that may contribute to the unique properties of this crop, including agglutinin proteins. Together these results show that de novo assembly, based solely on whole-genome shotgun short-sequence reads, is an efficient means of obtaining nearly complete genome sequence information for some plant species. © 2012 The Authors. The Plant Journal © 2012 Blackwell Publishing Ltd.
Full Text Available The application of next generation sequencing technology has greatly facilitated high throughput single nucleotide polymorphism (SNP discovery and genotyping in genetic research. In the present study, SNPs were discovered based on two transcriptomes of Litopenaeus vannamei (L. vannamei generated from Illumina sequencing platform HiSeq 2000. One transcriptome of L. vannamei was obtained through sequencing on the RNA from larvae at mysis stage and its reference sequence was de novo assembled. The data from another transcriptome were downloaded from NCBI and the reads of the two transcriptomes were mapped separately to the assembled reference by BWA. SNP calling was performed using SAMtools. A total of 58,717 and 36,277 SNPs with high quality were predicted from the two transcriptomes, respectively. SNP calling was also performed using the reads of two transcriptomes together, and a total of 96,040 SNPs with high quality were predicted. Among these 96,040 SNPs, 5,242 and 29,129 were predicted as non-synonymous and synonymous SNPs respectively. Characterization analysis of the predicted SNPs in L. vannamei showed that the estimated SNP frequency was 0.21% (one SNP per 476 bp and the estimated ratio for transition to transversion was 2.0. Fifty SNPs were randomly selected for validation by Sanger sequencing after PCR amplification and 76% of SNPs were confirmed, which indicated that the SNPs predicted in this study were reliable. These SNPs will be very useful for genetic study in L. vannamei, especially for the high density linkage map construction and genome-wide association studies.
Carole F S Koning-Boucoiran
Full Text Available In order to develop a versatile and large SNP array for rose, we set out to mine ESTs from diverse sets of rose germplasm. For this RNA-Seq libraries containing about 700 million reads were generated from tetraploid cut and garden roses using Illumina paired-end sequencing, and from diploid Rosa multiflora using 454 sequencing. Separate de novo assemblies were performed in order to identify single nucleotide polymorphisms (SNPs within and between rose varieties. SNPs among tetraploid roses were selected for constructing a genotyping array that can be employed for genetic mapping and marker-trait association discovery in breeding programs based on tetraploid germplasm, both from cut roses and from garden roses. In total 68,893 SNPs were included on the WagRhSNP Axiom array.Next, an orthology-guided assembly was performed for the construction of a non-redundant rose transcriptome database. A total of 21,740 transcripts had significant hits with orthologous genes in the strawberry (Fragaria vesca L. genome. Of these 13,390 appeared to contain the full-length coding regions. This newly established transcriptome resource adds considerably to the currently available sequence resources for the Rosaceae family in general and the genus Rosa in particular.
Full Text Available Ephedra plants are taxonomically classified as gymnosperms, and are medicinally important as the botanical origin of crude drugs and as bioresources that contain pharmacologically active chemicals. Here we show a comparative analysis of the transcriptomes of aerial stems and roots of Ephedra sinica based on high-throughput mRNA sequencing by RNA-Seq. De novo assembly of short cDNA sequence reads generated 23,358, 13,373, and 28,579 contigs longer than 200 bases from aerial stems, roots, or both aerial stems and roots, respectively. The presumed functions encoded by these contig sequences were annotated by BLAST (blastx. Subsequently, these contigs were classified based on gene ontology slims, Enzyme Commission numbers, and the InterPro database. Furthermore, comparative gene expression analysis was performed between aerial stems and roots. These transcriptome analyses revealed differences and similarities between the transcriptomes of aerial stems and roots in E. sinica. Deep transcriptome sequencing of Ephedra should open the door to molecular biological studies based on the entire transcriptome, tissue- or organ-specific transcriptomes, or targeted genes of interest.
Full Text Available Paeonia lactiflora is a herbaceous flower in the family Paeoniaceae with both hypocotyl and epicotyl dormant seeds. We used high-throughput transcriptome sequencing on two different developmental stages of P. lactiflora seeds to identify seed dormancy and germination-related genes. We performed de novo assembly and annotated a total of 123,577 unigenes, which encoded 24,688 putative proteins with 47 GO categories. A total of 10,714 unigenes were annotated in the KEGG database, and 258 pathways were involved in the annotations. A total of 1795 genes were differentially expressed in the functional enrichment analysis. The key genes for seed germination and dormancy, such as GAI1 and ARF, were confirmed by quantitative reverse transcription-polymerase chain reaction analysis. This is the first report of sequencing the P. lactiflora seed transcriptome. Our results provide fundamental frame work and technical support for further selective breeding and cultivation of Paeonia. Our transcriptomic data also serves as the basis for future genetics and genomics research on Paeonia and its closely related species.
Full Text Available The advent of next-generation sequencing technologies is accompanied with the development of many whole-genome sequence assembly methods and software, especially for de novo fragment assembly. Due to the poor knowledge about the applicability and performance of these software tools, choosing a befitting assembler becomes a tough task. Here, we provide the information of adaptivity for each program, then above all, compare the performance of eight distinct tools against eight groups of simulated datasets from Solexa sequencing platform. Considering the computational time, maximum random access memory (RAM occupancy, assembly accuracy and integrity, our study indicate that string-based assemblers, overlap-layout-consensus (OLC assemblers are well-suited for very short reads and longer reads of small genomes respectively. For large datasets of more than hundred millions of short reads, De Bruijn graph-based assemblers would be more appropriate. In terms of software implementation, string-based assemblers are superior to graph-based ones, of which SOAPdenovo is complex for the creation of configuration file. Our comparison study will assist researchers in selecting a well-suited assembler and offer essential information for the improvement of existing assemblers or the developing of novel assemblers.
Full Text Available Coral reef ecosystems of many sub-tropical and tropical marine coastal environments have suffered significant degradation from anthropogenic sources. Research to inform management strategies that mitigate stressors and promote a healthy ecosystem has focused on the ecology and physiology of coral reefs and associated organisms. Few studies focus on the surrounding pelagic communities, which are equally important to ecosystem function. Zooplankton, often dominated by small crustaceans such as copepods, is an important food source for invertebrates and fishes, especially larval fishes. The reef-associated zooplankton includes a sub-neustonic copepod family that could serve as an indicator species for the community. Here, we describe the generation of a de novo transcriptome for one such copepod, Labidocera madurae, a pontellid from an intensively-studied coral reef ecosystem, Kāne'ohe Bay, Oahu, Hawai'i. The transcriptome was assembled using high-throughput sequence data obtained from whole organisms. It comprised 211,002 unique transcripts, including 72,391 with coding regions. It was assessed for quality and completeness using multiple workflows. Bench-marking-universal-single-copy-orthologs (BUSCO analysis identified transcripts for 88% of expected eukaryotic core proteins. Targeted gene-discovery analyses included searches for transcripts coding full-length "giant" proteins (>4,000 amino acids, proteins and splice variants of voltage-gated sodium channels, and proteins involved in the circadian signaling pathway. Four different reference transcriptomes were generated and compared for the detection of differential gene expression between copepodites and adult females; 6,229 genes were consistently identified as differentially expressed between the two regardless of reference. Automated bioinformatics analyses and targeted manual gene curation suggest that the de novo assembled L. madurae transcriptome is of high quality and completeness. This
Christie, Andrew E.; Sommer, Stephanie A.; Cieslak, Matthew C.; Hartline, Daniel K.; Lenz, Petra H.
Coral reef ecosystems of many sub-tropical and tropical marine coastal environments have suffered significant degradation from anthropogenic sources. Research to inform management strategies that mitigate stressors and promote a healthy ecosystem has focused on the ecology and physiology of coral reefs and associated organisms. Few studies focus on the surrounding pelagic communities, which are equally important to ecosystem function. Zooplankton, often dominated by small crustaceans such as copepods, is an important food source for invertebrates and fishes, especially larval fishes. The reef-associated zooplankton includes a sub-neustonic copepod family that could serve as an indicator species for the community. Here, we describe the generation of a de novo transcriptome for one such copepod, Labidocera madurae, a pontellid from an intensively-studied coral reef ecosystem, Kāne‘ohe Bay, Oahu, Hawai‘i. The transcriptome was assembled using high-throughput sequence data obtained from whole organisms. It comprised 211,002 unique transcripts, including 72,391 with coding regions. It was assessed for quality and completeness using multiple workflows. Bench-marking-universal-single-copy-orthologs (BUSCO) analysis identified transcripts for 88% of expected eukaryotic core proteins. Targeted gene-discovery analyses included searches for transcripts coding full-length “giant” proteins (>4,000 amino acids), proteins and splice variants of voltage-gated sodium channels, and proteins involved in the circadian signaling pathway. Four different reference transcriptomes were generated and compared for the detection of differential gene expression between copepodites and adult females; 6,229 genes were consistently identified as differentially expressed between the two regardless of reference. Automated bioinformatics analyses and targeted manual gene curation suggest that the de novo assembled L. madurae transcriptome is of high quality and completeness. This
Zhang, Shuwei; Ding, Feng; He, Xinhua; Luo, Cong; Huang, Guixiang; Hu, Ying
Seedlessness is a desirable character in lemons and other citrus species. Seedless fruit can be induced in many ways, including through self-incompatibility (SI). SI is widely used as an intraspecific reproductive barrier that prevents self-fertilization in flowering plants. Although there have been many studies on SI, its mechanism remains unclear. The 'Xiangshui' lemon is an important seedless cultivar whose seedlessness has been caused by SI. It is essential to identify genes involved in SI in 'Xiangshui' lemon to clarify its molecular mechanism. In this study, candidate genes associated with SI were identified using high-throughput Illumina RNA sequencing (RNA-seq). A total of 61,224 unigenes were obtained (average, 948 bp; N50 of 1,457 bp), among which 47,260 unigenes were annotated by comparison to six public databases (Nr, Nt, Swiss-Prot, KEGG, COG, and GO). Differentially expressed genes were identified by comparing the transcriptomes of no-, self-, and cross-pollinated stigmas with styles of the 'Xiangshui' lemon. Several differentially expressed genes that might be associated with SI were identified, such as those involved in pollen tube growth, programmed cell death, signal transduction, and transcription. NADPH oxidase genes associated with apoptosis were highly upregulated in the self-pollinated transcriptome. The expression pattern of 12 genes was analyzed by quantitative real-time polymerase chain reaction. A putative S-RNase gene was identified that had not been previously associated with self-pollen rejection in lemon or citrus. This study provided a transcriptome dataset for further studies of SI and seedless lemon breeding.
Irene Liza Isaac
Full Text Available Ganoderma boninense is known to be the causal agent for basal stem rot (BSR affecting the oil palm industry worldwide thus cumulating to high economic losses every year. Several reports have shown that a compatible monokaryon pair needs to mate; producing dikaryotic mycelia to initiate the infection towards the oil palm. However, the molecular events occurs during mating process are not well understood. We performed transcriptome sequencing using Illumina RNA-seq technology and de novo assembly of the transcripts from monokaryon, mating junction and dikaryon mycelia of G. boninense. Raw reads from these three libraries were deposited in the NCBI database with accession number SRR1745787, SRR1745773 and SRR1745777, respectively.
Joy, Nisha; Asha, Srinivasan; Mallika, Vijayan; Soniya, Eppurathu Vasudevan
Next generation sequencing has an advantageon transformational development of species with limited available sequence data as it helps to decode the genome and transcriptome. We carried out the de novo sequencing using illuminaHiSeq™ 2000 to generate the first leaf transcriptome of black pepper (Piper nigrum L.), an important spice variety native to South India and also grown in other tropical regions. Despite the economic and biochemical importance of pepper, a scientifically rigorous study at the molecular level is far from complete due to lack of sufficient sequence information and cytological complexity of its genome. The 55 million raw reads obtained, when assembled using Trinity program generated 2,23,386 contigs and 1,28,157 unigenes. Reports suggest that the repeat-rich genomic regions give rise to small non-coding functional RNAs. MicroRNAs (miRNAs) are the most abundant type of non-coding regulatory RNAs. In spite of the widespread research on miRNAs, little is known about the hair-pin precursors of miRNAs bearing Simple Sequence Repeats (SSRs). We used the array of transcripts generated, for the in silico prediction and detection of ‘43 pre-miRNA candidates bearing different types of SSR motifs’. The analysis identified 3913 different types of SSR motifs with an average of one SSR per 3.04 MB of thetranscriptome. About 0.033% of the transcriptome constituted ‘pre-miRNA candidates bearing SSRs’. The abundance, type and distribution of SSR motifs studied across the hair-pin miRNA precursors, showed a significant bias in the position of SSRs towards the downstream of predicted ‘pre-miRNA candidates’. The catalogue of transcripts identified, together with the demonstration of reliable existence of SSRs in the miRNA precursors, permits future opportunities for understanding the genetic mechanism of black pepper and likely functions of ‘tandem repeats’ in miRNAs. PMID:23469176
Han, Zhaofang; Xiao, Shijun; Liu, Xiande; Liu, Yang; Li, Jiakai; Xie, Yangjie; Wang, Zhiyong
The large yellow croaker, Larimichthys crocea is an important marine fish in China with a high economic value. In the last decade, the stock conservation and aquaculture industry of this species have been facing severe challenges because of wild population collapse and degeneration of important economic traits. However, genes contributing to growth and immunity in L. crocea have not been thoroughly analyzed, and available molecular markers are still not sufficient for genetic resource management and molecular selection. In this work, we sequenced the transcriptome in L. crocea liver tissue with a Roche 454 sequencing platform and assembled the transcriptome into 93 801 transcripts. Of them, 38 856 transcripts were successfully annotated in nt, nr, Swiss-Prot, InterPro, COG, GO and KEGG databases. Based on the annotation information, 3 165 unigenes related to growth and immunity were identified. Additionally, a total of 6 391 simple sequence repeats (SSRs) were identified from the transcriptome, among which 4 498 SSRs had enough flanking regions to design primers for polymerase chain reactions (PCR). To access the polymorphism of these markers, 30 primer pairs were randomly selected for PCR amplification and validation in 30 individuals, and 12 primer pairs (40.0%) exhibited obvious length polymorphisms. This work applied RNA-Seq to assemble and analyze a live transcriptome in L. crocea. With gene annotation and sequence information, genes related to growth and immunity were identified and massive SSR markers were developed, providing valuable genetic resources for future gene functional analysis and selective breeding of L. crocea.
Oh, Dong-Ha; Barkla, Bronwyn J; Vera-Estrella, Rosario; Pantoja, Omar; Lee, Sang-Yeol; Bohnert, Hans J; Dassanayake, Maheshi
Mesembryanthemum crystallinum (ice plant) exhibits extreme tolerance to salt. Epidermal bladder cells (EBCs), developing on the surface of aerial tissues and specialized in sodium sequestration and other protective functions, are critical for the plant's stress adaptation. We present the first transcriptome analysis of EBCs isolated from intact plants, to investigate cell type-specific responses during plant salt adaptation. We developed a de novo assembled, nonredundant EBC reference transcriptome. Using RNAseq, we compared the expression patterns of the EBC-specific transcriptome between control and salt-treated plants. The EBC reference transcriptome consists of 37 341 transcript-contigs, of which 7% showed significantly different expression between salt-treated and control samples. We identified significant changes in ion transport, metabolism related to energy generation and osmolyte accumulation, stress signalling, and organelle functions, as well as a number of lineage-specific genes of unknown function, in response to salt treatment. The salinity-induced EBC transcriptome includes active transcript clusters, refuting the view of EBCs as passive storage compartments in the whole-plant stress response. EBC transcriptomes, differing from those of whole plants or leaf tissue, exemplify the importance of cell type-specific resolution in understanding stress adaptive mechanisms. No claim to original US government works. New Phytologist © 2015 New Phytologist Trust.
Full Text Available Abstract Background The teleost Zoarces viviparus (eelpout lives along the coasts of Northern Europe and has long been an established model organism for marine ecology and environmental monitoring. The scarce information about this species genome has however restrained the use of efficient molecular-level assays, such as gene expression microarrays. Results In the present study we present the first comprehensive characterization of the Zoarces viviparus liver transcriptome. From 400,000 reads generated by massively parallel pyrosequencing, more than 50,000 pieces of putative transcripts were assembled, annotated and functionally classified. The data was estimated to cover roughly 40% of the total transcriptome and homologues for about half of the genes of Gasterosteus aculeatus (stickleback were identified. The sequence data was consequently used to design an oligonucleotide microarray for large-scale gene expression analysis. Conclusion Our results show that one run using a Genome Sequencer FLX from 454 Life Science/Roche generates enough genomic information for adequate de novo assembly of a large number of genes in a higher vertebrate. The generated sequence data, including the validated microarray probes, are publicly available to promote genome-wide research in Zoarces viviparus.
Li, Yingrui; Zheng, Hancheng; Luo, Ruibang
Here we use whole-genome de novo assembly of second-generation sequencing reads to map structural variation (SV) in an Asian genome and an African genome. Our approach identifies small- and intermediate-size homozygous variants (1-50 kb) including insertions, deletions, inversions and their precise...
Full Text Available The beltfish (Trichiurus lepturus is considered as one of the most economically important marine fish in East Asia. It is a top predator with a robust swimming ability that is a good model to study muscle physiology in fish. In the present study, we used Illumina sequencing technology (NextSeq500 to sequence, assemble and annotate the muscle transcriptome of juvenile beltfish. A total of 57,509,280 clean reads (deposited in NCBI SRA database with accession number of SRX1674471 were obtained from RNA sequencing and 26,811 unigenes (with N50 of 1033 bp were obtained after de novo assembling with Trinity software. BLASTX against NR, GO, KEGG and eggNOG databases show 100%, 49%, 31% and 96% annotation rate, respectively. By mining beltfish muscle transcriptome, several key genes which play essential role on regulating myogenesis, including pax3, pax7, myf5, myoD, mrf4/myf6, myogenin and myostatin were identified with a low expression level. The muscle transcriptome of beltfish can provide some insight into the understanding of genome-wide transcriptome profile of teleost muscle tissue and give useful information to study myogenesis in juvenile/adult fish.
Gao, Jian [Key Laboratory of Biology and Genetic Improvement of Maize in Southwest Region, Ministry of Agriculture, Maize Research Institute of Sichuan Agricultural University, Wenjiang, Sichuan (China); Luo, Mao [Drug Discovery Research Center of Luzhou Medical College, Luzhou, Sichuan (China); Zhu, Ye; He, Ying; Wang, Qin [Department of Pharmacy of Luzhou Medical College, Luzhou, Sichuan (China); Zhang, Chun, E-mail: email@example.com [Department of Pharmacy of Luzhou Medical College, Luzhou, Sichuan (China)
Viola yedoensis Makino is an important Chinese traditional medicine plant adapted to cadmium (Cd) pollution regions. Illumina sequencing technology was used to sequence the transcriptome of V. yedoensis Makino. We sequenced Cd-treated (VIYCd) and untreated (VIYCK) samples of V. yedoensis, and obtained 100,410,834 and 83,587,676 high quality reads, respectively. After de novo assembly and quantitative assessment, 109,800 unigenes were finally generated with an average length of 661 bp. We then obtained functional annotations by aligning unigenes with public protein databases including NR, NT, SwissProt, KEGG and COG. In addition, 892 differentially expressed genes (DEGs) were investigated between the two libraries of untreated (VIYCK) and Cd-treated (VIYCd) plants. Moreover, 15 randomly selected DEGs were further validated with qRT-PCR and the results were highly accordant with the Solexa analysis. This study firstly generated a successful global analysis of the V. yedoensis transcriptome and it will provide for further studies on gene expression, genomics, and functional genomics in Violaceae. - Highlights: • A de novo assembly generated 109,800 unigenes and 5,4479 of them were annotated. • 31,285 could be classified into 26 COG categories. • 263 biosynthesis pathways were predicted and classified into five categories. • 892 DEGs were detected and 15 of them were validated by qRT-PCR.
Gao, Jian; Luo, Mao; Zhu, Ye; He, Ying; Wang, Qin; Zhang, Chun
Viola yedoensis Makino is an important Chinese traditional medicine plant adapted to cadmium (Cd) pollution regions. Illumina sequencing technology was used to sequence the transcriptome of V. yedoensis Makino. We sequenced Cd-treated (VIYCd) and untreated (VIYCK) samples of V. yedoensis, and obtained 100,410,834 and 83,587,676 high quality reads, respectively. After de novo assembly and quantitative assessment, 109,800 unigenes were finally generated with an average length of 661 bp. We then obtained functional annotations by aligning unigenes with public protein databases including NR, NT, SwissProt, KEGG and COG. In addition, 892 differentially expressed genes (DEGs) were investigated between the two libraries of untreated (VIYCK) and Cd-treated (VIYCd) plants. Moreover, 15 randomly selected DEGs were further validated with qRT-PCR and the results were highly accordant with the Solexa analysis. This study firstly generated a successful global analysis of the V. yedoensis transcriptome and it will provide for further studies on gene expression, genomics, and functional genomics in Violaceae. - Highlights: • A de novo assembly generated 109,800 unigenes and 5,4479 of them were annotated. • 31,285 could be classified into 26 COG categories. • 263 biosynthesis pathways were predicted and classified into five categories. • 892 DEGs were detected and 15 of them were validated by qRT-PCR
Schneider, Valerie A; Graves-Lindsay, Tina; Howe, Kerstin; Bouk, Nathan; Chen, Hsiu-Chuan; Kitts, Paul A; Murphy, Terence D; Pruitt, Kim D; Thibaud-Nissen, Françoise; Albracht, Derek; Fulton, Robert S; Kremitzki, Milinn; Magrini, Vincent; Markovic, Chris; McGrath, Sean; Steinberg, Karyn Meltz; Auger, Kate; Chow, William; Collins, Joanna; Harden, Glenn; Hubbard, Timothy; Pelan, Sarah; Simpson, Jared T; Threadgold, Glen; Torrance, James; Wood, Jonathan M; Clarke, Laura; Koren, Sergey; Boitano, Matthew; Peluso, Paul; Li, Heng; Chin, Chen-Shan; Phillippy, Adam M; Durbin, Richard; Wilson, Richard K; Flicek, Paul; Eichler, Evan E; Church, Deanna M
The human reference genome assembly plays a central role in nearly all aspects of today's basic and clinical research. GRCh38 is the first coordinate-changing assembly update since 2009; it reflects the resolution of roughly 1000 issues and encompasses modifications ranging from thousands of single base changes to megabase-scale path reorganizations, gap closures, and localization of previously orphaned sequences. We developed a new approach to sequence generation for targeted base updates and used data from new genome mapping technologies and single haplotype resources to identify and resolve larger assembly issues. For the first time, the reference assembly contains sequence-based representations for the centromeres. We also expanded the number of alternate loci to create a reference that provides a more robust representation of human population variation. We demonstrate that the updates render the reference an improved annotation substrate, alter read alignments in unchanged regions, and impact variant interpretation at clinically relevant loci. We additionally evaluated a collection of new de novo long-read haploid assemblies and conclude that although the new assemblies compare favorably to the reference with respect to continuity, error rate, and gene completeness, the reference still provides the best representation for complex genomic regions and coding sequences. We assert that the collected updates in GRCh38 make the newer assembly a more robust substrate for comprehensive analyses that will promote our understanding of human biology and advance our efforts to improve health. © 2017 Schneider et al.; Published by Cold Spring Harbor Laboratory Press.
Full Text Available The Chinese giant salamander (Andrias davidianus occupies a seat at the phylogenetic and species evolution process, which makes it an invaluable model for genetics; however, the genetic information and gene sequences about the Chinese giant salamander in public databases are scanty. Hence, we aimed to perform transcriptome analysis with the help of high-throughput sequencing. In this data, 61,317,940 raw reads were acquired from Chinese giant salamander mRNA using Illumina paired-end sequencing platform. After de novo assembly, a total of 72,072 unigenes were gained, in which 33,834 (46.95% and 29,479 (40.91% transcripts exhibited homology to sequences in the Nr database and Swiss-Prot database, (E-value <10−5, respectively. In the obtained unigenes, 18,019 (25% transcripts were assigned with at least one Gene Ontology term, of which 1218 (6.8% transcripts were assigned to immune system processes. In addition, a total of 17,572 assembled sequences were assigned into 241 predicted KEGG metabolic pathways. Among these, 2552 (14.5% transcripts were assigned to the immune system relevant pathway and 5 transcripts were identified as potential antimicrobial peptides (AMPs. Keywords: Andrias davidianus, Transcriptome
Candice A Stafford-Banks
Full Text Available Saliva is known to play a crucial role in insect feeding behavior and virus transmission. Currently, little is known about the salivary glands and saliva of thrips, despite the fact that Frankliniella occidentalis (Pergande (the western flower thrips is a serious pest due to its destructive feeding, wide host range, and transmission of tospoviruses. As a first step towards characterizing thrips salivary gland functions, we sequenced the transcriptome of the primary salivary glands of F. occidentalis using short read sequencing (Illumina technology. A de novo-assembled transcriptome revealed 31,392 high quality contigs with an average size of 605 bp. A total of 12,166 contigs had significant BLASTx or tBLASTx hits (E≤1.0E-6 to known proteins, whereas a high percentage (61.24% of contigs had no apparent protein or nucleotide hits. Comparison of the F. occidentalis salivary gland transcriptome (sialotranscriptome against a published F. occidentalis full body transcriptome assembled from Roche-454 reads revealed several contigs with putative annotations associated with salivary gland functions. KEGG pathway analysis of the sialotranscriptome revealed that the majority (18 out of the top 20 predicted KEGG pathways of the salivary gland contig sequences match proteins involved in metabolism. We identified several genes likely to be involved in detoxification and inhibition of plant defense responses including aldehyde dehydrogenase, metalloprotease, glucose oxidase, glucose dehydrogenase, and regucalcin. We also identified several genes that may play a role in the extra-oral digestion of plant structural tissues including β-glucosidase and pectin lyase; and the extra-oral digestion of sugars, including α-amylase, maltase, sucrase, and α-glucosidase. This is the first analysis of a sialotranscriptome for any Thysanopteran species and it provides a foundational tool to further our understanding of how thrips interact with their plant hosts and the
Cristian O Rohr
Full Text Available Fungi of the genus Pycnoporus are white-rot basidiomycetes widely studied because of their ability to synthesize high added-value compounds and enzymes of industrial interest. Here we report the sequencing, assembly and analysis of the transcriptome of Pycnoporus sanguineus BAFC 2126 grown at stationary phase, in media supplemented with copper sulfate. Using the 454 pyrosequencing platform we obtained a total of 226,336 reads (88,779,843 bases that were filtered and de novo assembled to generate a reference transcriptome of 7,303 transcripts. Putative functions were assigned for 4,732 transcripts by searching similarities of six-frame translated sequences against a customized protein database and by the presence of conserved protein domains. Through the analysis of translated sequences we identified transcripts encoding 178 putative carbohydrate active enzymes, including representatives of 15 families with roles in lignocellulose degradation. Furthermore, we found many transcripts encoding enzymes related to lignin hydrolysis and modification, including laccases and peroxidases, as well as GMC oxidoreductases, copper radical oxidases and other enzymes involved in the generation of extracellular hydrogen peroxide and iron homeostasis. Finally, we identified the transcripts encoding all of the enzymes involved in terpenoid backbone biosynthesis pathway, various terpene synthases related to the biosynthesis of sesquiterpenoids and triterpenoids precursors, and also cytochrome P450 monooxygenases, glutathione S-transferases and epoxide hydrolases with potential functions in the biodegradation of xenobiotics and the enantioselective biosynthesis of biologically active drugs. To our knowledge this is the first report of a transcriptome of genus Pycnoporus and a resource for future molecular studies in P. sanguineus.
Feng, Bing; Yi, Soojin V; Zhang, Manman; Zhou, Xiaoyun
The co-existence of several ploidy types in natural populations makes the cyprinid loach Misgurnus anguillicaudatus an exciting model system to study the genetic and phenotypic consequences of ploidy variations. A first step in such effort is to identify the specific ploidy of an individual. Currently popular methods of karyotyping via cytological preparation or flow cytometry require a large amount of tissue (such as blood) samples, which can be damaging or fatal to the fishes. Here, we developed novel microsatellite markers (SSR markers) from M. anguillicaudatus and show that they can effectively discriminate ploidy using samples collected in a minimally invasive way. Specifically, we generated whole genome transcriptomes from multiple M. anguillicaudatus using the Illumina paired-end sequencing. Approximately 150 million raw reads were assembled into 76,544 non-redundant unigenes. A total of 8,194 potential SSR markers were identified. We selected 98 pairs with more than five tandem repeats for further assays. Out of 45 putative EST-SSR markers that successfully amplified and harbored polymorphism in diploids, 11 markers displayed high variability in tetraploids. We further demonstrate that a set of five EST-SSR markers selected from these are sufficient to distinguish ploidy levels, by first validating them on 69 reference specimens with known ploidy levels and then subsequently using fresh-collected 96 ploidy-unknown specimens. The results from EST-SSR markers are highly concordant with those from independent flow cytometry analysis. The novel EST-SSR markers developed here should facilitate genetic studies of polyploidy in the emerging model system M. anguillicaudatus.
Full Text Available The rock bream (Oplegnathus fasciatus is considerably one of the most economically important marine fish in East Asia and has a unique neo-Y chromosome system that is a good model to study the sex determination and differentiation in fish. In the present study, we used Illumina sequencing technology (HiSeq2000 to sequence, assemble and annotate the transcriptome of the testis and ovary tissues of rock bream. A total of 40,004,378 (NCBI SRA database SRX1406649 and 53,108,992 (NCBI SRA database SRX1406648 high quality reads were obtained from testis and ovary RNA sequencing, respectively, and 60,421 contigs (with average length of 1301 bp were obtained after de novo assembling with Trinity software. Digital gene expression analysis reveals 14,036 contigs that show gender-enriched expressional profile with either testis-enriched (237 contigs or ovary-enriched (581 contigs with RPKM >100. There are 237 male- and 582 female-abundant expressed genes that show sex dimorphic expression. We hope that the gonad transcriptome and those gender-enriched transcripts of rock bream can provide some insight into the understanding of genome-wide transcriptome profile of teleost gonad tissue and give useful information in fish gonad development. Keywords: Gonad transcriptome, Testis, Ovary, Rock bream
Full Text Available Parsley is an important biennial Apiaceae species that is widely cultivated as herb, spice, and vegetable. Previous studies on parsley principally focused on its physiological and biochemical properties, including phenolic compound and volatile oil contents. However, little is known about the molecular and genetic properties of parsley. In this study, 23,686,707 high-quality reads were obtained and assembled into 81,852 transcripts and 50,161 unigenes for the first time. Functional annotation showed that 30,516 unigenes had sequence similarity to known genes. In addition, 3,244 putative simple sequence repeats were detected in curly parsley. Finally, 1,569 of the identified unigenes belonged to 58 transcription factor families. Various abiotic stresses have a strong detrimental effect on the yield and quality of parsley. AP2/ERF transcription factors have important functions in plant development, hormonal regulation, and abiotic response. A total of 88 putative AP2/ERF factors were identified from the transcriptome sequence of parsley. Seven AP2/ERF transcription factors were selected in this study to analyze the expression profiles of parsley under different abiotic stresses. Our data provide a potentially valuable resource that can be used for intensive parsley research.
Li, Meng-Yao; Tan, Hua-Wei; Wang, Feng; Jiang, Qian; Xu, Zhi-Sheng; Tian, Chang; Xiong, Ai-Sheng
Parsley is an important biennial Apiaceae species that is widely cultivated as herb, spice, and vegetable. Previous studies on parsley principally focused on its physiological and biochemical properties, including phenolic compound and volatile oil contents. However, little is known about the molecular and genetic properties of parsley. In this study, 23,686,707 high-quality reads were obtained and assembled into 81,852 transcripts and 50,161 unigenes for the first time. Functional annotation showed that 30,516 unigenes had sequence similarity to known genes. In addition, 3,244 putative simple sequence repeats were detected in curly parsley. Finally, 1,569 of the identified unigenes belonged to 58 transcription factor families. Various abiotic stresses have a strong detrimental effect on the yield and quality of parsley. AP2/ERF transcription factors have important functions in plant development, hormonal regulation, and abiotic response. A total of 88 putative AP2/ERF factors were identified from the transcriptome sequence of parsley. Seven AP2/ERF transcription factors were selected in this study to analyze the expression profiles of parsley under different abiotic stresses. Our data provide a potentially valuable resource that can be used for intensive parsley research.
Full Text Available The information from ancient DNA (aDNA provides an unparalleled opportunity to infer phylogenetic relationships and population history of extinct species and to investigate genetic evolution directly. However, the degraded and fragmented nature of aDNA has posed technical challenges for studies based on conventional PCR amplification. In this study, we present an approach based on next generation sequencing to efficiently sequence the complete mitochondrial genome (mitogenome of two extinct passenger pigeons (Ectopistes migratorius using de novo assembly of massive short (90 bp, paired-end or single-end reads. Although varying levels of human contamination and low levels of postmortem nucleotide lesion were observed, they did not impact sequencing accuracy. Our results demonstrated that the de novo assembly of shotgun sequence reads could be a potent approach to sequence mitogenomes, and offered an efficient way to infer evolutionary history of extinct species.
Hung, Chih-Ming; Lin, Rong-Chien; Chu, Jui-Hua; Yeh, Chia-Fen; Yao, Chiou-Ju; Li, Shou-Hsien
The information from ancient DNA (aDNA) provides an unparalleled opportunity to infer phylogenetic relationships and population history of extinct species and to investigate genetic evolution directly. However, the degraded and fragmented nature of aDNA has posed technical challenges for studies based on conventional PCR amplification. In this study, we present an approach based on next generation sequencing to efficiently sequence the complete mitochondrial genome (mitogenome) of two extinct passenger pigeons (Ectopistes migratorius) using de novo assembly of massive short (90 bp), paired-end or single-end reads. Although varying levels of human contamination and low levels of postmortem nucleotide lesion were observed, they did not impact sequencing accuracy. Our results demonstrated that the de novo assembly of shotgun sequence reads could be a potent approach to sequence mitogenomes, and offered an efficient way to infer evolutionary history of extinct species. PMID:23437111
Horta, Maria Augusta Crivelente; Vicentini, Renato; Delabona, Priscila da Silva; Laborda, Prianda; Crucello, Aline; Freitas, Sindélia; Kuroshu, Reginaldo Massanobu; Polikarpov, Igor; Pradella, José Geraldo da Cruz; Souza, Anete Pereira
Profiling the transcriptome that underlies biomass degradation by the fungus Trichoderma harzianum allows the identification of gene sequences with potential application in enzymatic hydrolysis processing. In the present study, the transcriptome of T. harzianum IOC-3844 was analyzed using RNA-seq technology. The sequencing generated 14.7 Gbp for downstream analyses. De novo assembly resulted in 32,396 contigs, which were submitted for identification and classified according to their identities. This analysis allowed us to define a principal set of T. harzianum genes that are involved in the degradation of cellulose and hemicellulose and the accessory genes that are involved in the depolymerization of biomass. An additional analysis of expression levels identified a set of carbohydrate-active enzymes that are upregulated under different conditions. The present study provides valuable information for future studies on biomass degradation and contributes to a better understanding of the role of the genes that are involved in this process.
Fruit ripening is a physiological and biochemical process genetically programmed to regulate fruit quality parameters like firmness, flavor, odor and color, as well as production of ethylene in climacteric fruit. In this study, a transcriptomic analysis of mango (Mangifera indica L.) mesocarp cv. "K...
Sarkar, Soumyadev; Chakravorty, Somnath; Mukherjee, Avishek; Bhattacharya, Debanjana; Bhattacharya, Semantee; Gachhui, Ratan
Nitrogen is a key nutrient for all cell forms. Most organisms respond to nitrogen scarcity by slowing down their growth rate. On the contrary, our previous studies have shown that Papiliotrema laurentii strain RY1 has a robust growth under nitrogen starvation. To understand the global regulation that leads to such an extraordinary response, we undertook a de novo approach for transcriptome analysis of the yeast. Close to 33 million sequence reads of high quality for nitrogen limited and enriched condition were generated using Illumina NextSeq500. Trinity analysis and clustered transcripts annotation of the reads produced 17,611 unigenes, out of which 14,157 could be annotated. Gene Ontology term analysis generated 44.92% cellular component terms, 39.81% molecular function terms and 15.24% biological process terms. The most over represented pathways in general were translation, carbohydrate metabolism, amino acid metabolism, general metabolism, folding, sorting, degradation followed by transport and catabolism, nucleotide metabolism, replication and repair, transcription and lipid metabolism. A total of 4256 Single Sequence Repeats were identified. Differential gene expression analysis detected 996 P-significant transcripts to reveal transmembrane transport, lipid homeostasis, fatty acid catabolism and translation as the enriched terms which could be essential for Papiliotrema laurentii strain RY1 to adapt during nitrogen deprivation. Transcriptome data was validated by quantitative real-time PCR analysis of twelve transcripts. To the best of our knowledge, this is the first report of Papiliotrema laurentii strain RY1 transcriptome which would play a pivotal role in understanding the biochemistry of the yeast under acute nitrogen stress and this study would be encouraging to initiate extensive investigations into this Papiliotrema system. Copyright © 2017 Elsevier B.V. All rights reserved.
Jiang, Peng; Nelson, Jeffrey D.; Leng, Ning; Collins, Michael; Swanson, Scott; Dewey, Colin N.; Thomson, James A.; Stewart, Ron
The axolotl (Ambystoma mexicanum) has long been the subject of biological research, primarily owing to its outstanding regenerative capabilities. However, the gene expression programs governing its embryonic development are particularly underexplored, especially when compared to other amphibian model species. Therefore, we performed whole transcriptome polyA+ RNA sequencing experiments on 17 stages of embryonic development. As the axolotl genome is unsequenced and its gene annotation is incomplete, we built de novo transcriptome assemblies for each stage and garnered functional annotation by comparing expressed contigs with known genes in other organisms. In evaluating the number of differentially expressed genes over time, we identify three waves of substantial transcriptome upheaval each followed by a period of relative transcriptome stability. The first wave of upheaval is between the one and two cell stage. We show that the number of differentially expressed genes per unit time is higher between the one and two cell stage than it is across the mid-blastula transition (MBT), the period of zygotic genome activation. We use total RNA sequencing to demonstrate that the vast majority of genes with increasing polyA+ signal between the one and two cell stage result from polyadenylation rather than de novo transcription. The first stable phase begins after the two cell stage and continues until the mid-blastula transition, corresponding with the pre-MBT phase of transcriptional quiescence in amphibian development. Following this is a peak of differential gene expression corresponding with the activation of the zygotic genome and a phase of transcriptomic stability from stages 9 to 11. We observe a third wave of transcriptomic change between stages 11 and 14, followed by a final stable period. The last two stable phases have not been documented in amphibians previously and correspond to times of major morphogenic change in the axolotl embryo: gastrulation and
Rodriguez-Alonso, Gustavo; Matvienko, Marta; López-Valle, Mayra L; Lázaro-Mixteco, Pedro E; Napsucialy-Mendivil, Selene; Dubrovsky, Joseph G; Shishkova, Svetlana
Many Cactaceae species exhibit determinate growth of the primary root as a consequence of root apical meristem (RAM) exhaustion. The genetic regulation of this growth pattern is unknown. Here, we de novo assembled and annotated the root apex transcriptome of the Pachycereus pringlei primary root at three developmental stages, with active or exhausted RAM. The assembled transcriptome is robust and comprehensive, and was used to infer a transcriptional regulatory network of the primary root apex. Putative orthologues of Arabidopsis regulators of RAM maintenance, as well as putative lineage-specific transcripts were identified. The transcriptome revealed putative orthologues of most proteins involved in housekeeping processes, hormone signalling, and metabolic pathways. Our results suggest that specific transcriptional programs operate in the root apex at specific developmental time points. Moreover, the transcriptional state of the P. pringlei root apex as the RAM becomes exhausted is comparable to the transcriptional state of cells from the meristematic, elongation, and differentiation zones of Arabidopsis roots along the root axis. We suggest that the transcriptional program underlying the drought stress response is induced during Cactaceae root development, and that lineage-specific transcripts could contribute to RAM exhaustion in Cactaceae.
Pradhan, Seema; Bandhiwal, Nitesh; Shah, Niraj; Kant, Chandra; Gaur, Rashmi; Bhatia, Sabhyata
Understanding developmental processes, especially in non-model crop plants, is extremely important in order to unravel unique mechanisms regulating development. Chickpea (C. arietinum L.) seeds are especially valued for their high carbohydrate and protein content. Therefore, in order to elucidate the mechanisms underlying seed development in chickpea, deep sequencing of transcriptomes from four developmental stages was undertaken. In this study, next generation sequencing platform was utilized to sequence the transcriptome of four distinct stages of seed development in chickpea. About 1.3 million reads were generated which were assembled into 51,099 unigenes by merging the de novo and reference assemblies. Functional annotation of the unigenes was carried out using the Uniprot, COG and KEGG databases. RPKM based digital expression analysis revealed specific gene activities at different stages of development which was validated using Real time PCR analysis. More than 90% of the unigenes were found to be expressed in at least one of the four seed tissues. DEGseq was used to determine differentially expressing genes which revealed that only 6.75% of the unigenes were differentially expressed at various stages. Homology based comparison revealed 17.5% of the unigenes to be putatively seed specific. Transcription factors were predicted based on HMM profiles built using TF sequences from five legume plants and analyzed for their differential expression during progression of seed development. Expression analysis of genes involved in biosynthesis of important secondary metabolites suggested that chickpea seeds can serve as a good source of antioxidants. Since transcriptomes are a valuable source of molecular markers like simple sequence repeats (SSRs), about 12,000 SSRs were mined in chickpea seed transcriptome and few of them were validated. In conclusion, this study will serve as a valuable resource for improved chickpea breeding.
Full Text Available Understanding developmental processes, especially in non-model crop plants, is extremely important in order to unravel unique mechanisms regulating development. Chickpea (C. arietinum L. seeds are especially valued for their high carbohydrate and protein content. Therefore, in order to elucidate the mechanisms underlying seed development in chickpea, deep sequencing of transcriptomes from four developmental stages was undertaken. In this study, next generation sequencing platform was utilised to sequence the transcriptome of four distinct stages of seed development in chickpea. About 1.3 million reads were generated which were assembled into 51,099 unigenes by merging the de novo and reference assemblies. Functional annotation of the unigenes was carried out using the Uniprot, COG and KEGG databases. RPKM based digital expression analysis revealed specific gene activities at different stages of development which was validated using Real time PCR analysis. More than 90% of the unigenes were found to be expressed in at least one of the four seed tissues. DEGseq was used to determine differentially expressing genes which revealed that only 6.75% of the unigenes were differentially expressed at various stages. Homology based comparison revealed 17.5% of the unigenes to be putatively seed specific. Transcription factors were predicted based on HMM profiles built using TF sequences from five legume plants and analysed for their differential expression during progression of seed development. Expression analysis of genes involved in biosynthesis of important secondary metabolites suggested that chickpea seeds can serve as a good source of antioxidants. Since transcriptomes are a valuable source of molecular markers like simple sequence repeats (SSRs, about 12,000 SSRs were mined in chickpea seed transcriptome and few of them were validated. In conclusion, this study will serve as a valuable resource for improved chickpea breeding.
Chen, Shi-Yi; Deng, Feilong; Jia, Xianbo; Li, Cao; Lai, Song-Jia
It is widely acknowledged that transcriptional diversity largely contributes to biological regulation in eukaryotes. Since the advent of second-generation sequencing technologies, a large number of RNA sequencing studies have considerably improved our understanding of transcriptome complexity. However, it still remains a huge challenge for obtaining full-length transcripts because of difficulties in the short read-based assembly. In the present study we employ PacBio single-molecule long-read sequencing technology for whole-transcriptome profiling in rabbit (Oryctolagus cuniculus). We totally obtain 36,186 high-confidence transcripts from 14,474 genic loci, among which more than 23% of genic loci and 66% of isoforms have not been annotated yet within the current reference genome. Furthermore, about 17% of transcripts are computationally revealed to be non-coding RNAs. Up to 24,797 alternative splicing (AS) and 11,184 alternative polyadenylation (APA) events are detected within this de novo constructed transcriptome, respectively. The results provide a comprehensive set of reference transcripts and hence contribute to the improved annotation of rabbit genome.
Zhang, Yu-Juan; Hao, Youjin; Si, Fengling; Ren, Shuang; Hu, Ganyu; Shen, Li; Chen, Bin
The onion maggot Delia antiqua is a major insect pest of cultivated vegetables, especially the onion, and a good model to investigate the molecular mechanisms of diapause. To better understand the biology and diapause mechanism of the insect pest species, D. antiqua, the transcriptome was sequenced using Illumina paired-end sequencing technology. Approximately 54 million reads were obtained, trimmed, and assembled into 29,659 unigenes, with an average length of 607 bp and an N50 of 818 bp. Among these unigenes, 21,605 (72.8%) were annotated in the public databases. All unigenes were then compared against Drosophila melanogaster and Anopheles gambiae. Codon usage bias was analyzed and 332 simple sequence repeats (SSRs) were detected in this organism. These data represent the most comprehensive transcriptomic resource currently available for D. antiqua and will facilitate the study of genetics, genomics, diapause, and further pest control of D. antiqua. Copyright © 2014 Zhang et al.
Onda, Yoshihiko; Mochida, Keiichi; Yoshida, Takuhiro; Sakurai, Tetsuya; Seymour, Roger S; Umekawa, Yui; Pirintsos, Stergios Arg; Shinozaki, Kazuo; Ito, Kikukatsu
Several plant species can generate enough heat to increase their internal floral temperature above ambient temperature. Among thermogenic plants, Arum concinnatum shows the highest respiration activity during thermogenesis. However, an overall understanding of the genes related to plant thermogenesis has not yet been achieved. In this study, we performed de novo transcriptome analysis of flower organs in A. concinnatum. The de novo transcriptome assembly represented, in total, 158,490 non-redundant transcripts, and 53,315 of those showed significant homology with known genes. To explore genes associated with thermogenesis, we filtered 1266 transcripts that showed a significant correlation between expression pattern and the temperature trend of each sample. We confirmed five putative alternative oxidase transcripts were included in filtered transcripts as expected. An enrichment analysis of the Gene Ontology terms for the filtered transcripts suggested over-representation of genes involved in 1-deoxy-D-xylulose-5-phosphate synthase (DXS) activity. The expression profiles of DXS transcripts in the methyl-D-erythritol 4-phosphate (MEP) pathway were significantly correlated with thermogenic levels. Our results suggest that the MEP pathway is the main biosynthesis route for producing scent monoterpenes. To our knowledge, this is the first report describing the candidate pathway and the key enzyme for floral scent production in thermogenic plants.
Hamidou Soumana, Illiassou; Klopp, Christophe; Ravel, Sophie; Nabihoudine, Ibouniyamine; Tchicaya, Bernadette; Parrinello, Hugues; Abate, Luc; Rialle, Stéphanie; Geiger, Anne
Trypanosoma brucei gambiense (Tbg), causing the sleeping sickness chronic form, completes its developmental cycle within the tsetse fly vector Glossina palpalis gambiensis (Gpg) before its transmission to humans. Within the framework of an anti-vector disease control strategy, a global gene expression profiling of trypanosome infected (susceptible), non-infected, and self-cured (refractory) tsetse flies was performed, on their midguts, to determine differential genes expression resulting from in vivo trypanosomes, tsetse flies (and their microbiome) interactions. An RNAseq de novo assembly was achieved. The assembled transcripts were mapped to reference sequences for functional annotation. Twenty-four percent of the 16,936 contigs could not be annotated, possibly representing untranslated mRNA regions, or Gpg- or Tbg-specific ORFs. The remaining contigs were classified into 65 functional groups. Only a few transposable elements were present in the Gpg midgut transcriptome, which may represent active transpositions and play regulatory roles. One thousand three hundred and seventy three genes differentially expressed (DEGs) between stimulated and non-stimulated flies were identified at day-3 post-feeding; 52 and 1025 between infected and self-cured flies at 10 and 20 days post-feeding, respectively. The possible roles of several DEGs regarding fly susceptibility and refractoriness are discussed. The results provide new means to decipher fly infection mechanisms, crucial to develop anti-vector control strategies.
Zhang, Xiao-Xuan; Cong, Wei; Elsheikha, Hany M; Liu, Guo-Hua; Ma, Jian-Gang; Huang, Wei-Yi; Zhao, Quan; Zhu, Xing-Quan
Fasciola gigantica is regarded as the major liver fluke causing fasciolosis in livestock in tropical countries. Despite the significant economic and public health impacts of F. gigantica there are few studies on the pathogenesis of this parasite and our understanding is further limited by the lack of genome and transcriptome information. In this study, de novo Illumina RNA sequencing (RNA-seq) was performed to obtain a comprehensive transcriptome profile of the juvenile (42days post infection) and adult stages of F. gigantica. A total of 49,720 unigenes were produced from juvenile and adult stages of F. gigantica, with an average length of 1286 nucleotides (nt) and N50 of 2076nt. A total of 27,862 (56.03%) unigenes were annotated by BLAST similarity searches against the NCBI non-redundant protein database. Because F. gigantica needs to feed and/or digest host tissues, some proteases (including cysteine proteases and aspartic proteases), which play a role in the degradation of host tissues (protein), have been paid more attention in the present study. A total of 6511 distinct genes were found differentially expressed between juveniles and adults, of which 3993 genes were up-regulated and 2518 genes were down-regulated in adults versus juveniles, respectively. Moreover, stage-specific differentially expressed genes were identified in juvenile (17,009) and adult (6517) F. gigantica. The significantly divergent pathways of differentially expressed genes included cAMP signaling pathway (226; 4.12%), proteoglycans in cancer (256; 4.67%) and focal adhesion (199; 3.63%). The transcription pattern also revealed two egg-laying-associated pathways: cGMP-PKG signaling pathway and TGF-β signaling pathway. This study provides the first comparative transcriptomic data concerning juvenile and adult stages of F. gigantica that will be of great value for future research efforts into understanding parasite pathogenesis and developing vaccines against this important parasite
Li, Lingli; Zhang, Hehua; Liu, Zhongshuai; Cui, Xiaoyue; Zhang, Tong; Li, Yanfang; Zhang, Lingyun
Blueberry is an economically important fruit crop in Ericaceae family. The substantial quantities of flavonoids in blueberry have been implicated in a broad range of health benefits. However, the information regarding fruit development and flavonoid metabolites based on the transcriptome level is still limited. In the present study, the transcriptome and gene expression profiling over berry development, especially during color development were initiated. A total of approximately 13.67 Gbp of data were obtained and assembled into 186,962 transcripts and 80,836 unigenes from three stages of blueberry fruit and color development. A large number of simple sequence repeats (SSRs) and candidate genes, which are potentially involved in plant development, metabolic and hormone pathways, were identified. A total of 6429 sequences containing 8796 SSRs were characterized from 15,457 unigenes and 1763 unigenes contained more than one SSR. The expression profiles of key genes involved in anthocyanin biosynthesis were also studied. In addition, a comparison between our dataset and other published results was carried out. Our high quality reads produced in this study are an important advancement and provide a new resource for the interpretation of high-throughput data for blueberry species whether regarding sequencing data depth or species extension. The use of this transcriptome data will serve as a valuable public information database for the studies of blueberry genome and would greatly boost the research of fruit and color development, flavonoid metabolisms and regulation and breeding of more healthful blueberries.
Full Text Available Abstract Background Sacha Inchi (Plukenetia volubilis L., Euphorbiaceae is a potential oilseed crop because the seeds of this plant are rich in unsaturated fatty acids (FAs. In particular, the fatty acid composition of its seed oil differs markedly in containing large quantities of α-linolenic acid (18C:3, a kind of ω-3 FAs. However, little is known about the molecular mechanisms responsible for biosynthesis of unsaturated fatty acids in the developing seeds of this species. Transcriptome data are needed to better understand these mechanisms. Results In this study, de novo transcriptome assembly and gene expression analysis were performed using Illumina sequencing technology. A total of 52.6 million 90-bp paired-end reads were generated from two libraries constructed at the initial stage and fast oil accumulation stage of seed development. These reads were assembled into 70,392 unigenes; 22,179 unigenes showed a 2-fold or greater expression difference between the two libraries. Using this data we identified unigenes that may be involved in de novo FA and triacylglycerol biosynthesis. In particular, a number of unigenes encoding desaturase for formation of unsaturated fatty acids with high expression levels in the fast oil accumulation stage compared with the initial stage of seed development were identified. Conclusions This study provides the first comprehensive dataset characterizing Sacha Inchi gene expression at the transcriptional level. These data provide the foundation for further studies on molecular mechanisms underlying oil accumulation and PUFA biosynthesis in Sacha Inchi seeds. Our analyses facilitate understanding of the molecular mechanisms responsible for the high unsaturated fatty acids (especially α-linolenic acid accumulation in Sacha Inchi seeds.
Full Text Available A hybrid de novo assembly pipeline was constructed to utilize both MiSeq and SOLiD short read data in combination in the assembly. The short read data were converted to a standard format of the pipeline, and were supplied to the pipeline components such as ABySS and SOAPdenovo. The assembly pipeline proceeded through several stages, and either MiSeq paired-end data, SOLiD mate-paired data, or both of them could be specified as input data at each stage separately. The pipeline was examined on the filamentous fungus Aspergillus oryzae RIB40, by aligning the assembly results against the reference sequences. Using both the MiSeq and the SOLiD data in the hybrid assembly, the alignment length was improved by a factor of 3 to 8, compared with the assemblies using either one of the data types. The number of the reproduced gene cluster regions encoding secondary metabolite biosyntheses (SMB was also improved by the hybrid assemblies. These results imply that the MiSeq data with long read length are essential to construct accurate nucleotide sequences, while the SOLiD mate-paired reads with long insertion length enhance long-range arrangements of the sequences. The pipeline was also tested on the actinomycete Streptomyces avermitilis MA-4680, whose gene is known to have high-GC content. Although the quality of the SOLiD reads was too low to perform any meaningful assemblies by themselves, the alignment length to the reference was improved by a factor of 2, compared with the assembly using only the MiSeq data.
Full Text Available Whole Genome Shotgun (WGS sequences of plant species often contain an abundance of reads that are derived from the chloroplast genome. Up to now these reads have generally been identified and assembled into chloroplast genomes based on homology to chloroplasts from related species. This re-sequencing approach may select against structural differences between the genomes especially in non-model species for which no close relatives have been sequenced before. The alternative approach is to de novo assemble the chloroplast genome from total genomic DNA sequences. In this study, we used k-mer frequency tables to identify and extract the chloroplast reads from the WGS reads and assemble these using a highly integrated and automated custom pipeline. Our strategy includes steps aimed at optimizing assemblies and filling gaps which are left due to coverage variation in the WGS dataset. We have successfully de novo assembled three complete chloroplast genomes from plant species with a range of nuclear genome sizes to demonstrate the universality of our approach: Solanum lycopersicum (0.9 Gb, Aegilops tauschii (4 Gb and Paphiopedilum henryanum (25 Gb. We also highlight the need to optimize the choice of k and the amount of data used. This new and cost-effective method for de novo short read assembly will facilitate the study of complete chloroplast genomes with more accurate analyses and inferences, especially in non-model plant genomes.
Passos, Marco A N; de Cruz, Viviane Oliveira; Emediato, Flavia L; de Teixeira, Cristiane Camargo; Azevedo, Vânia C Rennó; Brasileiro, Ana C M; Amorim, Edson P; Ferreira, Claudia F; Martins, Natalia F; Togawa, Roberto C; Júnior, Georgios J Pappas; da Silva, Orzenil Bonfim; Miller, Robert N G
Although banana (Musa sp.) is an important edible crop, contributing towards poverty alleviation and food security, limited transcriptome datasets are available for use in accelerated molecular-based breeding in this genus. 454 GS-FLX Titanium technology was employed to determine the sequence of gene transcripts in genotypes of Musa acuminata ssp. burmannicoides Calcutta 4 and M. acuminata subgroup Cavendish cv. Grande Naine, contrasting in resistance to the fungal pathogen Mycosphaerella musicola, causal organism of Sigatoka leaf spot disease. To enrich for transcripts under biotic stress responses, full length-enriched cDNA libraries were prepared from whole plant leaf materials, both uninfected and artificially challenged with pathogen conidiospores. The study generated 846,762 high quality sequence reads, with an average length of 334 bp and totalling 283 Mbp. De novo assembly generated 36,384 and 35,269 unigene sequences for M. acuminata Calcutta 4 and Cavendish Grande Naine, respectively. A total of 64.4% of the unigenes were annotated through Basic Local Alignment Search Tool (BLAST) similarity analyses against public databases.Assembled sequences were functionally mapped to Gene Ontology (GO) terms, with unigene functions covering a diverse range of molecular functions, biological processes and cellular components. Genes from a number of defense-related pathways were observed in transcripts from each cDNA library. Over 99% of contig unigenes mapped to exon regions in the reference M. acuminata DH Pahang whole genome sequence. A total of 4068 genic-SSR loci were identified in Calcutta 4 and 4095 in Cavendish Grande Naine. A subset of 95 potential defense-related gene-derived simple sequence repeat (SSR) loci were validated for specific amplification and polymorphism across M. acuminata accessions. Fourteen loci were polymorphic, with alleles per polymorphic locus ranging from 3 to 8 and polymorphism information content ranging from 0.34 to 0.82. A large set
Hornshøj, Henrik; Thomsen, Bo; Hedegaard, Jakob
The release of Sus scrofa genome assembly 10 supports improvement of the pig genome annotation and in depth transcriptome analyses using next-generation sequencing technologies. In this study we analyze RNA-seq reads from a tissue collection, including 10 separate tissues from Duroc boars and 10...... short read alignment software we mapped the reads to the genome assembly 10. We extracted contig sequences of gene transcripts using the Cufflinks software. Based on this information we identified expressed genes that are present in the genome assembly. The portion of these genes being previously known...... was roughly estimated by sequence comparison to known genes. Similarly, we searched for genes that are expressed in the tissues but not present in the genome assembly by aligning the non-genome-mapped reads to known gene transcripts. For the genes predicted to have alternative transcript variants by Cufflinks...
Xie, Wen; Guo, Litao; Jiao, Xiaoguo; Yang, Nina; Yang, Xin; Wu, Qingjun; Wang, Shaoli; Zhou, Xuguo; Zhang, Youjun
Sex difference involving chromosomes and gene expression has been extensively documented. In this study, the gender difference in the sweetpotato whitefly Bemisia tabaci was investigated using Illumina-based transcriptomic analysis. Gender-based RNAseq data produced 27 Gb reads, and subsequent de novo assembly generated 93,948 transcripts with a N50 of 1,853 bp. A total of 1,351 differentially expressed genes were identified between male and female B. tabaci, and majority of them were female-biased. Pathway and GO enrichment experiments exhibited a gender-specific expression, including enriched translation in females, and enhanced structural constituent of cuticle in male whiteflies. In addition, a putative transformer2 gene (tra2) was cloned, and the structural feature and expression profile of tra2 were investigated. Sexually dimorphic transcriptome is an uncharted territory for the agricultural insect pests. Molecular understanding of sex determination in B. tabaci, an emerging invasive insect pest worldwide, will provide potential molecular target(s) for genetic pest control alternatives.
Li, Zi-Wen; Chen, Xi; Wu, Qiong; Hagmann, Jörg; Han, Ting-Shen; Zou, Yu-Pan; Ge, Song; Guo, Ya-Long
De novo genes, which originate from ancestral nongenic sequences, are one of the most important sources of protein-coding genes. This origination process is crucial for the adaptation of organisms. However, how de novo genes arise and become fixed in a population or species remains largely unknown. Here, we identified 782 de novo genes from the model plant Arabidopsis thaliana and divided them into three types based on the availability of translational evidence, transcriptional evidence, and neither transcriptional nor translational evidence for their origin. Importantly, by integrating multiple types of omics data, including data from genomes, epigenomes, transcriptomes, and translatomes, we found that epigenetic modifications (DNA methylation and histone modification) play an important role in the origination process of de novo genes. Intriguingly, using the transcriptomes and methylomes from the same population of 84 accessions, we found that de novo genes that are transcribed in approximately half of the total accessions within the population are highly methylated, with lower levels of transcription than those transcribed at other frequencies within the population. We hypothesized that, during the origin of de novo gene alleles, those neutralized to low expression states via DNA methylation have relatively high probabilities of spreading and becoming fixed in a population. Our results highlight the process underlying the origin of de novo genes at the population level, as well as the importance of DNA methylation in this process. © The Author 2016. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
The unarmored dinoflagellate Karenia brevis is among the most prominent harmful, bloom-forming phytoplankton species in the Gulf of Mexico. During blooms, the polyketides PbTx-1 and PbTx-2 (brevetoxins) are produced by K. brevis. Brevetoxins negatively impact human health and the Gulf shellfish harvest. However, the genes underlying brevetoxin synthesis are currently unknown. Because the K. brevis genome is extremely large ( 1 × 1011 base pairs long), and with a high proportion of repetitive, non-coding DNA, it has not been sequenced. In fact, large, repetitive genomes are common among the dinoflagellate group. High-throughput RNA sequencing technology enabled us to assemble Karenia transcriptomes de novo and investigate potential genes in the brevetoxin pathway through comparative transcriptomics. The brevetoxin profile varies among K. brevis clonal cultures. For example, well-documented Wilson-CCFWC268 typically produces 8-10 pg PbTx per cell, whereas SP1 produces differences in gene expression. Of the 85,000 transcripts in the K. brevis transcriptome, 4,600 transcripts, including novel unannotated orthologs and putative polyketide synthases (PKSs), were only expressed by brevetoxin-producing K. brevis and K. papilionacea, not K. mikimotoi. Examination of gene expression between the typical- and low-toxin Wilson clones identified about 3,500 genes with significantly different expression levels, including 2 putative PKSs. One of the 2 PKSs was only found in the brevetoxin-producing Karenia species. These transcriptomes could not have been characterized without high-throughput RNA sequencing.
Background: Bactrocera cucurbitae is an important agricultural pest. Basic genomic information is lacking for this species and this would be useful to inform methods of control, damage mitigation, and eradication efforts. Here, we have sequenced, assembled, and annotated a comprehensive transcriptom...
Zhang, Chenghao; Dong, Wenqi; Gen, Wei; Xu, Baoyu; Shen, Chenjia
Abelmoschus esculentus (okra or lady’s fingers) is a vegetable with high nutritional value, as well as having certain medicinal effects. It is widely used as food, in the food industry, and in herbal medicinal products, but also as an ornamental, in animal feed, and in other commercial sectors. Okra is rich in bioactive compounds, such as flavonoids, polysaccharides, polyphenols, caffeine, and pectin. In the present study, the concentrations of total flavonoids and polysaccharides in five organs of okra were determined and compared. Transcriptome sequencing was used to explore the biosynthesis pathways associated with the active constituents in okra. Transcriptome sequencing of five organs (roots, stem, leaves, flowers, and fruits) of okra enabled us to obtain 293,971 unigenes, of which 232,490 were annotated. Unigenes related to the enzymes involved in the flavonoid biosynthetic pathway or in fructose and mannose metabolism were identified, based on Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis. All of the transcriptional datasets were uploaded to Sequence Read Archive (SRA). In summary, our comprehensive analysis provides important information at the molecular level about the flavonoid and polysaccharide biosynthesis pathways in okra. PMID:29495525
Gonzalez, Emmanuel; Brereton, Nicholas J B; Marleau, Julie; Guidi Nissim, Werther; Labrecque, Michel; Pitre, Frederic E; Joly, Simon
High concentrations of petroleum hydrocarbon (PHC) pollution can be hazardous to human health and leave soils incapable of supporting agricultural crops. A cheap solution, which can help restore biodiversity and bring land back to productivity, is cultivation of high biomass yielding willow trees. However, the genetic mechanisms which allow these fast-growing trees to tolerate PHCs are as yet unclear. Salix purpurea 'Fish Creek' trees were pot-grown in soil from a former petroleum refinery, either lacking or enriched with C10-C50 PHCs. De novo assembled transcriptomes were compared between tree organs and impartially annotated without a priori constraint to any organism. Over 45% of differentially expressed genes originated from foreign organisms, the majority from the two-spotted spidermite, Tetranychus urticae. Over 99% of T. urticae transcripts were differentially expressed with greater abundance in non-contaminated trees. Plant transcripts involved in the polypropanoid pathway, including phenylalanine ammonia-lyase (PAL), had greater expression in contaminated trees whereas most resistance genes showed higher expression in non-contaminated trees. The impartial approach to annotation of the de novo transcriptomes, allowing for the possibility for multiple species identification, was essential for interpretation of the crop's response treatment. The meta-transcriptomic pattern of expression suggests a cross-tolerance mechanism whereby abiotic stress resistance systems provide improved biotic resistance. These findings highlight a valuable but complex biotic and abiotic stress response to real-world, multidimensional contamination which could, in part, help explain why crops such as willow can produce uniquely high biomass yields on challenging marginal land.
Tzika, Athanasia C; Ullate-Agote, Asier; Grbic, Djordje; Milinkovitch, Michel C
Despite the availability of deep-sequencing techniques, genomic and transcriptomic data remain unevenly distributed across phylogenetic groups. For example, reptiles are poorly represented in sequence databases, hindering functional evolutionary and developmental studies in these lineages substantially more diverse than mammals. In addition, different studies use different assembly and annotation protocols, inhibiting meaningful comparisons. Here, we present the "Reptilian Transcriptomes Database 2.0," which provides extensive annotation of transcriptomes and genomes from species covering the major reptilian lineages. To this end, we sequenced normalized complementary DNA libraries of multiple adult tissues and various embryonic stages of the leopard gecko and the corn snake and gathered published reptilian sequence data sets from representatives of the four extant orders of reptiles: Squamata (snakes and lizards), the tuatara, crocodiles, and turtles. The LANE runner 2.0 software was implemented to annotate all assemblies within a single integrated pipeline. We show that this approach increases the annotation completeness of the assembled transcriptomes/genomes. We then built large concatenated protein alignments of single-copy genes and inferred phylogenetic trees that support the positions of turtles and the tuatara as sister groups of Archosauria and Squamata, respectively. The Reptilian Transcriptomes Database 2.0 resource will be updated to include selected new data sets as they become available, thus making it a reference for differential expression studies, comparative genomics and transcriptomics, linkage mapping, molecular ecology, and phylogenomic analyses involving reptiles. The database is available at www.reptilian-transcriptomes.org and can be enquired using a wwwblast server installed at the University of Geneva. © The Author(s) 2015. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
Full Text Available Parasteatoda tepidariorum is an increasingly popular model for the study of spider development and the evolution of development more broadly. However, fully understanding the regulation and evolution of P. tepidariorum development in comparison to other animals requires a genomic perspective. Although research on P. tepidariorum has provided major new insights, gene analysis to date has been limited to candidate gene approaches. Furthermore, the few available EST collections are based on embryonic transcripts, which have not been systematically annotated and are unlikely to contain transcripts specific to post-embryonic stages of development. We therefore generated cDNA from pooled embryos representing all described embryonic stages, as well as post-embryonic stages including nymphs, larvae and adults, and using Illumina HiSeq technology obtained a total of 625,076,514 100-bp paired end reads. We combined these data with 24,360 ESTs available in GenBank, and 1,040,006 reads newly generated from 454 pyrosequencing of a mixed-stage embryo cDNA library. The combined sequence data were assembled using a custom de novo assembly strategy designed to optimize assembly product length, number of predicted transcripts, and proportion of raw reads incorporated into the assembly. The de novo assembly generated 446,427 contigs with an N50 of 1,875 bp. These sequences obtained 62,799 unique BLAST hits against the NCBI non-redundant protein data base, including putative orthologs to 8,917 Drosophila melanogaster genes based on best reciprocal BLAST hit identity compared with the D. melanogaster proteome. Finally, we explored the utility of the transcriptome for RNA-Seq studies, and showed that this resource can be used as a mapping scaffold to detect differential gene expression in different cDNA libraries. This resource will therefore provide a platform for future genomic, gene expression and functional approaches using P. tepidariorum.
Wang, Zhiwen; Hobson, Neil; Galindo, Leonardo
Flax (Linum usitatissimum) is an ancient crop that is widely cultivated as a source of fiber, oil and medicinally relevant compounds. To accelerate crop improvement, we performed whole-genome shotgun sequencing of the nuclear genome of flax. Seven paired-end libraries ranging in size from 300 bp...... these results show that de novo assembly, based solely on whole-genome shotgun short-sequence reads, is an efficient means of obtaining nearly complete genome sequence information for some plant species....
Gopinath, G R; Cinar, H N; Murphy, H R; Durigan, M; Almeria, M; Tall, B D; DaSilva, A J
Cyclospora cayetanensis is a coccidian parasite associated with large and complex foodborne outbreaks worldwide. Linking samples from cyclosporiasis patients during foodborne outbreaks with suspected contaminated food sources, using conventional epidemiological methods, has been a persistent challenge. To address this issue, development of new methods based on potential genomically-derived markers for strain-level identification has been a priority for the food safety research community. The absence of reference genomes to identify nucleotide and structural variants with a high degree of confidence has limited the application of using sequencing data for source tracking during outbreak investigations. In this work, we determined the quality of a high resolution, curated, public mitochondrial genome assembly to be used as a reference genome by applying bioinformatic analyses. Using this reference genome, three new mitochondrial genome assemblies were built starting with metagenomic reads generated by sequencing DNA extracted from oocysts present in stool samples from cyclosporiasis patients. Nucleotide variants were identified in the new and other publicly available genomes in comparison with the mitochondrial reference genome. A consolidated workflow, presented here, to generate new mitochondrion genomes using our reference-guided de novo assembly approach could be useful in facilitating the generation of other mitochondrion sequences, and in their application for subtyping C. cayetanensis strains during foodborne outbreak investigations.
Humberto J Debat
Full Text Available Yerba mate (Ilex paraguariensis A. St.-Hil. is an important subtropical tree crop cultivated on 326,000 ha in Argentina, Brazil and Paraguay, with a total yield production of more than 1,000,000 t. Yerba mate presents a strong limitation regarding sequence information. The NCBI GenBank lacks an EST database of yerba mate and depicts only 80 DNA sequences, mostly uncharacterized. In this scenario, in order to elucidate the yerba mate gene landscape by means of NGS, we explored and discovered a vast collection of I. paraguariensis transcripts. Total RNA from I. paraguariensis was sequenced by Illumina HiSeq-2000 obtaining 72,031,388 pair-end 100 bp sequences. High quality reads were de novo assembled into 44,907 transcripts encompassing 40 million bases with an estimated coverage of 180X. Multiple sequence analysis allowed us to predict that yerba mate contains ∼ 32,355 genes and 12,551 gene variants or isoforms. We identified and categorized members of more than 100 metabolic pathways. Overall, we have identified ∼ 1,000 putative transcription factors, genes involved in heat and oxidative stress, pathogen response, as well as disease resistance and hormone response. We have also identified, based in sequence homology searches, novel transcripts related to osmotic, drought, salinity and cold stress, senescence and early flowering. We have also pinpointed several members of the gene silencing pathway, and characterized the silencing effector Argonaute1. We predicted a diverse supply of putative microRNA precursors involved in developmental processes. We present here the first draft of the transcribed genomes of the yerba mate chloroplast and mitochondrion. The putative sequence and predicted structure of the caffeine synthase of yerba mate is presented. Moreover, we provide a collection of over 10,800 SSR accessible to the scientific community interested in yerba mate genetic improvement. This contribution broadly expands the limited knowledge
Aguilera, Patricia M.; Bubillo, Rosana E.; Otegui, Mónica B.; Ducasse, Daniel A.; Zapata, Pedro D.; Marti, Dardo A.
Yerba mate (Ilex paraguariensis A. St.-Hil.) is an important subtropical tree crop cultivated on 326,000 ha in Argentina, Brazil and Paraguay, with a total yield production of more than 1,000,000 t. Yerba mate presents a strong limitation regarding sequence information. The NCBI GenBank lacks an EST database of yerba mate and depicts only 80 DNA sequences, mostly uncharacterized. In this scenario, in order to elucidate the yerba mate gene landscape by means of NGS, we explored and discovered a vast collection of I. paraguariensis transcripts. Total RNA from I. paraguariensis was sequenced by Illumina HiSeq-2000 obtaining 72,031,388 pair-end 100 bp sequences. High quality reads were de novo assembled into 44,907 transcripts encompassing 40 million bases with an estimated coverage of 180X. Multiple sequence analysis allowed us to predict that yerba mate contains ∼32,355 genes and 12,551 gene variants or isoforms. We identified and categorized members of more than 100 metabolic pathways. Overall, we have identified ∼1,000 putative transcription factors, genes involved in heat and oxidative stress, pathogen response, as well as disease resistance and hormone response. We have also identified, based in sequence homology searches, novel transcripts related to osmotic, drought, salinity and cold stress, senescence and early flowering. We have also pinpointed several members of the gene silencing pathway, and characterized the silencing effector Argonaute1. We predicted a diverse supply of putative microRNA precursors involved in developmental processes. We present here the first draft of the transcribed genomes of the yerba mate chloroplast and mitochondrion. The putative sequence and predicted structure of the caffeine synthase of yerba mate is presented. Moreover, we provide a collection of over 10,800 SSR accessible to the scientific community interested in yerba mate genetic improvement. This contribution broadly expands the limited knowledge of yerba mate genes
Full Text Available Abstract Background Biodiesel or ethanol derived from lipids or starch produced by microalgae may overcome many of the sustainability challenges previously ascribed to petroleum-based fuels and first generation plant-based biofuels. The paucity of microalgae genome sequences, however, limits gene-based biofuel feedstock optimization studies. Here we describe the sequencing and de novo transcriptome assembly for the non-model microalgae species, Dunaliella tertiolecta, and identify pathways and genes of importance related to biofuel production. Results Next generation DNA pyrosequencing technology applied to D. tertiolecta transcripts produced 1,363,336 high quality reads with an average length of 400 bases. Following quality and size trimming, ~ 45% of the high quality reads were assembled into 33,307 isotigs with a 31-fold coverage and 376,482 singletons. Assembled sequences and singletons were subjected to BLAST similarity searches and annotated with Gene Ontology (GO and Kyoto Encyclopedia of Genes and Genomes (KEGG orthology (KO identifiers. These analyses identified the majority of lipid and starch biosynthesis and catabolism pathways in D. tertiolecta. Conclusions The construction of metabolic pathways involved in the biosynthesis and catabolism of fatty acids, triacylglycrols, and starch in D. tertiolecta as well as the assembled transcriptome provide a foundation for the molecular genetics and functional genomics required to direct metabolic engineering efforts that seek to enhance the quantity and character of microalgae-based biofuel feedstock.
Mao, Yunrui; Zhang, Yonghua; Xu, Chuan; Qiu, Yingxiong
Dysosma species (Berberidaceae, Podophylloideae) are of great medicinal pharmacogenetic importance and used as model systems to study the drivers and mechanisms of species diversification of temperate plants in East Asia. Recently, we have sequenced the transcriptome of the low-elevation D. versipellis. In this study, we sequenced the transcriptome of the high-elevation D. aurantiocaulis and used comparative genomic approaches to investigate the transcriptome evolution of the two species. We retrieved 53,929 unigenes from D. aurantiocaulis by de novo transcriptome assemblies using the Illumina HiSeq 2000 platform. Comparing the transcriptomes of both species, we identified 4593 orthologs. Estimation of Ka/Ks ratios for 3126 orthologs revealed that none had a Ka/Ks significantly greater than 1, whereas 1273 (Ka/Ks < 0.5, P < 0.05) were inferred to be under purifying selection. A total of 51 primer pairs were successfully designed from 461 EST-SSRs contained in 4593 orthologs. Marker validation assay revealed that 26 (51%) and 41 (80.4%) produced clear fragments with the expected sizes in all Podophylloideae species. Specifically, 19 different sequences of CYP719A were identified from PCR-amplified genomic DNA of all 12 species of Podophylloideae using primers designed from the assembled transcripts. The data further indicated that CYP719A was likely subject to strong selective constraints maintaining only one copy per genome. In Dysosma, there was relaxed purifying selection or more positive selection for high-elevation species. Overall, this study has generated a wealth of molecular resources potentially useful for pharmacogenetic and evolutionary studies in Dysosma and allied taxa. © 2015 John Wiley & Sons Ltd.
Full Text Available Abstract Background The green algal genus Ulva Linnaeus (Ulvaceae, Ulvales, Chlorophyta is well known for its wide distribution in marine, freshwater, and brackish environments throughout the world. The Ulva species are also highly tolerant of variations in salinity, temperature, and irradiance and are the main cause of green tides, which can have deleterious ecological effects. However, limited genomic information is currently available in this non-model and ecologically important species. Ulva linza is a species that inhabits bedrock in the mid to low intertidal zone, and it is a major contributor to biofouling. Here, we presented the global characterization of the U. linza transcriptome using the Roche GS FLX Titanium platform, with the aim of uncovering the genomic mechanisms underlying rapid and successful colonization of the coastal ecosystems. Results De novo assembly of 382,884 reads generated 13,426 contigs with an average length of 1,000 bases. Contiguous sequences were further assembled into 10,784 isotigs with an average length of 1,515 bases. A total of 304,101 reads were nominally identified by BLAST; 4,368 isotigs were functionally annotated with 13,550 GO terms, and 2,404 isotigs having enzyme commission (EC numbers were assigned to 262 KEGG pathways. When compared with four other full sequenced green algae, 3,457 unique isotigs were found in U. linza and 18 conserved in land plants. In addition, a specific photoprotective mechanism based on both LhcSR and PsbS proteins and a C4-like carbon-concentrating mechanism were found, which may help U. linza survive stress conditions. At least 19 transporters for essential inorganic nutrients (i.e., nitrogen, phosphorus, and sulphur were responsible for its ability to take up inorganic nutrients, and at least 25 eukaryotic cytochrome P450s, which is a higher number than that found in other algae, may be related to their strong allelopathy. Multi-origination of the stress related proteins
Kim, Hui-Su; Han, Jeonghoon; Kim, Hee-Jin; Hagiwara, Atsushi; Lee, Jae-Seong
Whole transcriptomes of the rotifer Brachionus plicatilis were analyzed using an Illumina sequencer. De novo assembly was performed with 49,122,780 raw reads using Trinity software. Among the assembled 42,820 contigs, 27,437 putative open reading frame contigs were identified (average length 1235bp; N50=1707bp). Functional gene annotation with Gene Ontology and InterProScan, in addition to Kyoto Encyclopedia of Genes and Genomes pathway analysis, highlighted the metabolism of xenobiotics by cytochrome P450 (CYP). In addition, 28 CYP genes were identified, and their transcriptional responses to benzo[α]pyrene (B[α]P) were investigated. Most of the CYPs were significantly upregulated or downregulated (Pplicatilis in response to exposure to various chemicals. Copyright © 2017 Elsevier Inc. All rights reserved.
Oct 25, 2017 ... (Shanghai, China) following manufacturer's protocols (Illumina, San .... suggests that pathways involved in musk production are expressed at a ..... Strickler S. R., Aureliano B. and Mueller L. A. 2012 Designing a transcriptome.
Full Text Available BACKGROUND: Despite the recent sequencing of seven ant genomes, no genomic data are available for the genus Formica, an important group for the study of eusocial traits. We sequenced the transcriptome of the ant Formica exsecta with the 454 FLX Titanium technology from a pooled sample of workers from 70 Finnish colonies. RESULTS: About 1,000,000 reads were obtained from a normalised cDNA library. We compared the assemblers MIRA3.0 and Newbler2.6 and showed that the latter performed better on this dataset due to a new option which is dedicated to improve contig formation in low depth portions of the assemblies. The 29,579 contigs represent 27 Mb. 50% showed similarity with known proteins and 25% could be assigned a category of gene ontology. We found more than 13,000 high-quality single nucleotide polymorphisms. The Δ9 desaturase gene family is an important multigene family involved in chemical communication in insects. We found six Δ9 desaturases in this Formica exsecta transcriptome dataset that were used to reconstruct a maximum-likelihood phylogeny of insect desaturases and to test for signatures of positive selection in this multigene family in ant lineages. We found differences with previous phylogenies of this gene family in ants, and found two clades potentially under positive selection. CONCLUSION: This first transcriptome reference sequence of Formica exsecta provided sequence and polymorphism data that will allow researchers working on Formica ants to develop studies to tackle the genetic basis of eusocial phenotypes. In addition, this study provided some general guidelines for de novo transcriptome assembly that should be useful for future transcriptome sequencing projects. Finally, we found potential signatures of positive selection in some clades of the Δ9 desaturase gene family in ants, which suggest the potential role of sequence divergence and adaptive evolution in shaping the large diversity of chemical cues in social insects.
Sun, Luchao; Rai, Amit; Rai, Megha; Nakamura, Michimi; Kawano, Noriaki; Yoshimatsu, Kayo; Suzuki, Hideyuki; Kawahara, Nobuo; Saito, Kazuki; Yamazaki, Mami
The three Forsythia species, F. suspensa, F. viridissima and F. koreana, have been used as herbal medicines in China, Japan and Korea for centuries and they are known to be rich sources of numerous pharmaceutical metabolites, forsythin, forsythoside A, arctigenin, rutin and other phenolic compounds. In this study, de novo transcriptome sequencing and assembly was performed on these species. Using leaf and flower tissues of F. suspensa, F. viridissima and F. koreana, 1.28-2.45-Gbp sequences of Illumina based pair-end reads were obtained and assembled into 81,913, 88,491 and 69,458 unigenes, respectively. Classification of the annotated unigenes in gene ontology terms and KEGG pathways was used to compare the transcriptome of three Forsythia species. The expression analysis of orthologous genes across all three species showed the expression in leaf tissues being highly correlated. The candidate genes presumably involved in the biosynthetic pathway of lignans and phenylethanoid glycosides were screened as co-expressed genes. They express highly in the leaves of F. viridissima and F. koreana. Furthermore, the three unigenes annotated as acyltransferase were predicted to be associated with the biosynthesis of acteoside and forsythoside A from the expression pattern and phylogenetic analysis. This study is the first report on comparative transcriptome analyses of medicinally important Forsythia genus and will serve as an important resource to facilitate further studies on biosynthesis and regulation of therapeutic compounds in Forsythia species.
Azim, M Kamran; Khan, Ishtaiq A; Zhang, Yong
We characterized mango leaf transcriptome and chloroplast genome using next generation DNA sequencing. The RNA-seq output of mango transcriptome generated >12 million reads (total nucleotides sequenced >1 Gb). De novo transcriptome assembly generated 30,509 unigenes with lengths in the range of 300 to ≥3,000 nt and 67× depth of coverage. Blast searching against nonredundant nucleotide databases and several Viridiplantae genomic datasets annotated 24,593 mango unigenes (80% of total) and identified Citrus sinensis as closest neighbor of mango with 9,141 (37%) matched sequences. The annotation with gene ontology and Clusters of Orthologous Group terms categorized unigene sequences into 57 and 25 classes, respectively. More than 13,500 unigenes were assigned to 293 KEGG pathways. Besides major plant biology related pathways, KEGG based gene annotation pointed out active presence of an array of biochemical pathways involved in (a) biosynthesis of bioactive flavonoids, flavones and flavonols, (b) biosynthesis of terpenoids and lignins and (c) plant hormone signal transduction. The mango transcriptome sequences revealed 235 proteases belonging to five catalytic classes of proteolytic enzymes. The draft genome of mango chloroplast (cp) was obtained by a combination of Sanger and next generation sequencing. The draft mango cp genome size is 151,173 bp with a pair of inverted repeats of 27,093 bp separated by small and large single copy regions, respectively. Out of 139 genes in mango cp genome, 91 found to be protein coding. Sequence analysis revealed cp genome of C. sinensis as closest neighbor of mango. We found 51 short repeats in mango cp genome supposed to be associated with extensive rearrangements. This is the first report of transcriptome and chloroplast genome analysis of any Anacardiaceae family member.
Ni, Pan; Bhuiyan, Ali Akbar; Chen, Jian-Hai; Li, Jingjin; Zhang, Cheng; Zhao, Shuhong; Du, Xiaoyong; Li, Hua; Yu, Hui; Liu, Xiangdong; Li, Kui
Up to date, the scarcity of publicly available complete mitochondrial sequences for European wild pigs hampers deeper understanding about the genetic changes following domestication. Here, we have assembled 26 de novo mtDNA sequences of European wild boars from next generation sequencing (NGS) data and downloaded 174 complete mtDNA sequences to assess the genetic relationship, nucleotide diversity, and selection. The Bayesian consensus tree reveals the clear divergence between the European and Asian clade and a very small portion (10 out of 200 samples) of maternal introgression. The overall nucleotides diversities of the mtDNA sequences have been reduced following domestication. Interestingly, the selection efficiencies in both European and Asian domestic pigs are reduced, probably caused by changes in both selection constraints and maternal population size following domestication. This study suggests that de novo assembled mitogenomes can be a great boon to uncover the genetic turnover following domestication. Further investigation is warranted to include more samples from the ever-increasing amounts of NGS data to help us to better understand the process of domestication.
Ye, Wei; Wu, Hongqing; He, Xin; Wang, Lei; Zhang, Weimin; Li, Haohua; Fan, Yunfei; Tan, Guohui; Liu, Taomei; Gao, Xiaoxia
Agarwood is a traditional Chinese medicine used as a clinical sedative, carminative, and antiemetic drug. Agarwood is formed in Aquilaria sinensis when A. sinensis trees are threatened by external physical, chemical injury or endophytic fungal irritation. However, the mechanism of agarwood formation via chemical induction remains unclear. In this study, we characterized the transcriptome of different parts of a chemically induced A. sinensis trunk sample with agarwood. The Illumina sequencing platform was used to identify the genes involved in agarwood formation. A five-year-old Aquilaria sinensis treated by formic acid was selected. The white wood part (B1 sample), the transition part between agarwood and white wood (W2 sample), the agarwood part (J3 sample), and the rotten wood part (F5 sample) were collected for transcriptome sequencing. Accordingly, 54,685,634 clean reads, which were assembled into 83,467 unigenes, were obtained with a Q20 value of 97.5%. A total of 50,565 unigenes were annotated using the Nr, Nt, SWISS-PROT, KEGG, COG, and GO databases. In particular, 171,331,352 unigenes were annotated by various pathways, including the sesquiterpenoid (ko00909) and plant-pathogen interaction (ko03040) pathways. These pathways were related to sesquiterpenoid biosynthesis and defensive responses to chemical stimulation. The transcriptome data of the different parts of the chemically induced A. sinensis trunk provide a rich source of materials for discovering and identifying the genes involved in sesquiterpenoid production and in defensive responses to chemical stimulation. This study is the first to use de novo sequencing and transcriptome assembly for different parts of chemically induced A. sinensis. Results demonstrate that the sesquiterpenoid biosynthesis pathway and WRKY transcription factor play important roles in agarwood formation via chemical induction. The comparative analysis of the transcriptome data of agarwood and A. sinensis lays the foundation
Full Text Available Athetis lepigone Möschler (Lepidoptera: Noctuidae has recently become an important insect pest of maize (Zea mays crops in China. In order to understand the characteristics of the different developmental stages of this pest, we used Illumina short-read sequences to perform de novo transcriptome assembly and gene expression analysis for egg, larva, pupa and adult developmental stages. We obtained 10.08 Gb of raw data from Illumina sequencing and recovered 81,356 unigenes longer than 100 bp through a de novo assembly. The total sequence length reached 49.75 Mb with 858 bp of N50 and an average unigene length of 612 bp. Annotation analysis of predicted proteins indicate that 33,736 unigenes (41.47% of total unigenes are matches to genes in the Genbank Nr database. The unigene sequences were subjected to GO, COG and KEGG functional classification. A large number of differentially expressed genes were recovered by pairwise comparison of the four developmental stages. The most dramatic differences in gene expression were found in the transitions from one stage to another stage. Some of these differentially expressed genes are related to cuticle and wing formation as well as the growth and development. We identified more than 2,500 microsatellite markers that may be used for population studies of A. lepigone. This study lays the foundation for further research on population genetics and gene function analysis in A. lepigone.
Full Text Available Abstract Background The analysis of next-generation sequencing data from large genomes is a timely research topic. Sequencers are producing billions of short sequence fragments from newly sequenced organisms. Computational methods for reconstructing whole genomes/transcriptomes (de novo assemblers are typically employed to process such data. However, these methods require large memory resources and computation time. Many basic biological questions could be answered targeting specific information in the reads, thus avoiding complete assembly. Results We present Mapsembler, an iterative micro and targeted assembler which processes large datasets of reads on commodity hardware. Mapsembler checks for the presence of given regions of interest that can be constructed from reads and builds a short assembly around it, either as a plain sequence or as a graph, showing contextual structure. We introduce new algorithms to retrieve approximate occurrences of a sequence from reads and construct an extension graph. Among other results presented in this paper, Mapsembler enabled to retrieve previously described human breast cancer candidate fusion genes, and to detect new ones not previously known. Conclusions Mapsembler is the first software that enables de novo discovery around a region of interest of repeats, SNPs, exon skipping, gene fusion, as well as other structural events, directly from raw sequencing reads. As indexing is localized, the memory footprint of Mapsembler is negligible. Mapsembler is released under the CeCILL license and can be freely downloaded from http://alcovna.genouest.org/mapsembler/.
Full Text Available BACKGROUND: Bemisia tabaci (Gennadius is a phloem-feeding insect poised to become one of the major insect pests in open field and greenhouse production systems throughout the world. The high level of resistance to insecticides is a main factor that hinders continued use of insecticides for suppression of B. tabaci. Despite its prevalence, little is known about B. tabaci at the genome level. To fill this gap, an invasive B. tabaci B biotype was subjected to pyrosequencing-based transcriptome analysis to identify genes and gene networks putatively involved in various physiological and toxicological processes. METHODOLOGY AND PRINCIPAL FINDINGS: Using Roche 454 pyrosequencing, 857,205 reads containing approximately 340 megabases were obtained from the B. tabaci transcriptome. De novo assembly generated 178,669 unigenes including 30,980 from insects, 17,881 from bacteria, and 129,808 from the nohit. A total of 50,835 (28.45% unigenes showed similarity to the non-redundant database in GenBank with a cut-off E-value of 10-5. Among them, 40,611 unigenes were assigned to one or more GO terms and 6,917 unigenes were assigned to 288 known pathways. De novo metatranscriptome analysis revealed highly diverse bacterial symbionts in B. tabaci, and demonstrated the host-symbiont cooperation in amino acid production. In-depth transcriptome analysis indentified putative molecular markers, and genes potentially involved in insecticide resistance and nutrient digestion. The utility of this transcriptome was validated by a thiamethoxam resistance study, in which annotated cytochrome P450 genes were significantly overexpressed in the resistant B. tabaci in comparison to its susceptible counterparts. CONCLUSIONS: This transcriptome/metatranscriptome analysis sheds light on the molecular understanding of symbiosis and insecticide resistance in an agriculturally important phloem-feeding insect pest, and lays the foundation for future functional genomics research of the
Full Text Available Cymbidium ensifolium belongs to the genus Cymbidium of the orchid family. Owing to its spectacular flower morphology, C. ensifolium has considerable ecological and cultural value. However, limited genetic data is available for this non-model plant, and the molecular mechanism underlying floral organ identity is still poorly understood. In this study, we characterize the floral transcriptome of C. ensifolium and present, for the first time, extensive sequence and transcript abundance data of individual floral organs. After sequencing, over 10 Gb clean sequence data were generated and assembled into 111,892 unigenes with an average length of 932.03 base pairs, including 1,227 clusters and 110,665 singletons. Assembled sequences were annotated with gene descriptions, gene ontology, clusters of orthologous group terms, the Kyoto Encyclopedia of Genes and Genomes, and the plant transcription factor database. From these annotations, 131 flowering-associated unigenes, 61 CONSTANS-LIKE (COL unigenes and 90 floral homeotic genes were identified. In addition, four digital gene expression libraries were constructed for the sepal, petal, labellum and gynostemium, and 1,058 genes corresponding to individual floral organ development were identified. Among them, eight MADS-box genes were further investigated by full-length cDNA sequence analysis and expression validation, which revealed two APETALA1/AGL9-like MADS-box genes preferentially expressed in the sepal and petal, two AGAMOUS-like genes particularly restricted to the gynostemium, and four DEF-like genes distinctively expressed in different floral organs. The spatial expression of these genes varied distinctly in different floral mutant corresponding to different floral morphogenesis, which validated the specialized roles of them in floral patterning and further supported the effectiveness of our in silico analysis. This dataset generated in our study provides new insights into the molecular mechanisms
Guo, Yufang; Wiegert-Rininger, Krystle E; Vallejo, Veronica A; Barry, Cornelius S; Warner, Ryan M
Petunia (Petunia × hybrida), derived from a hybrid between P. axillaris and P. integrifolia, is one of the most economically important bedding plant crops and Petunia spp. serve as model systems for investigating the mechanisms underlying diverse mating systems and pollination syndromes. In addition, we have previously described genetic variation and quantitative trait loci (QTL) related to petunia development rate and morphology, which represent important breeding targets for the floriculture industry to improve crop production and performance. Despite the importance of petunia as a crop, the floriculture industry has been slow to adopt marker assisted selection to facilitate breeding strategies and there remains a limited availability of sequences and molecular markers from the genus compared to other economically important members of the Solanaceae family such as tomato, potato and pepper. Here we report the de novo assembly, annotation and characterization of transcriptomes from P. axillaris, P. exserta and P. integrifolia. Each transcriptome assembly was derived from five tissue libraries (callus, 3-week old seedlings, shoot apices, flowers of mixed developmental stages, and trichomes). A total of 74,573, 54,913, and 104,739 assembled transcripts were recovered from P. axillaris, P. exserta and P. integrifolia, respectively and following removal of multiple isoforms, 32,994 P. axillaris, 30,225 P. exserta, and 33,540 P. integrifolia high quality representative transcripts were extracted for annotation and expression analysis. The transcriptome data was mined for single nucleotide polymorphisms (SNP) and simple sequence repeat (SSR) markers, yielding 89,007 high quality SNPs and 2949 SSRs, respectively. 15,701 SNPs were computationally converted into user-friendly cleaved amplified polymorphic sequence (CAPS) markers and a subset of SNP and CAPS markers were experimentally verified. CAPS markers developed from plastochron-related homologous transcripts
Song, Mi Hye; Miliaras, Nicholas B.; Peel, Nina; O'Connell, Kevin F.
Centrioles play an important role in organizing microtubules and are precisely duplicated once per cell cycle. New (daughter) centrioles typically arise in association with existing (mother) centrioles (canonical assembly), suggesting that mother centrioles direct the formation of daughter centrioles. However, under certain circumstances, centrioles can also self-assemble free of an existing centriole (de novo assembly). Recent work indicates that the canonical and de novo pathways utilize a ...
Duan, Jinjie; Sanggaard, Kristian Wejse; Schauser, Leif; Lauridsen, Sanne Enok; Enghild, Jan J; Schierup, Mikkel Heide; Wang, Tobias
Exceptional and extreme feeding behaviour makes the Burmese python (Python bivittatus) an interesting model to study physiological remodelling and metabolic adaptation in response to refeeding after prolonged starvation. In this study, we used transcriptome sequencing of 5 visceral organs during fasting as well as 24 hours and 48 hours after ingestion of a large meal to unravel the postprandial changes in Burmese pythons. We first used the pooled data to perform a de novo assembly of the transcriptome and supplemented this with a proteomic survey of enzymes in the plasma and gastric fluid. We constructed a high-quality transcriptome with 34 423 transcripts, of which 19 713 (57%) were annotated. Among highly expressed genes (fragments per kilo base per million sequenced reads > 100 in 1 tissue), we found that the transition from fasting to digestion was associated with differential expression of 43 genes in the heart, 206 genes in the liver, 114 genes in the stomach, 89 genes in the pancreas, and 158 genes in the intestine. We interrogated the function of these genes to test previous hypotheses on the response to feeding. We also used the transcriptome to identify 314 secreted proteins in the gastric fluid of the python. Digestion was associated with an upregulation of genes related to metabolic processes, and translational changes therefore appear to support the postprandial rise in metabolism. We identify stomach-related proteins from a digesting individual and demonstrate that the sensitivity of modern liquid chromatography/tandem mass spectrometry equipment allows the identification of gastric juice proteins that are present during digestion. © The Authors 2017. Published by Oxford University Press.
Full Text Available Abstract Background Bud dormancy is a critical developmental process that allows perennial plants to survive unfavorable environmental conditions. Pear is one of the most important deciduous fruit trees in the world, but the mechanisms regulating bud dormancy in this species are unknown. Because genomic information for pear is currently unavailable, transcriptome and digital gene expression data for this species would be valuable resources to better understand the molecular and biological mechanisms regulating its bud dormancy. Results We performed de novo transcriptome assembly and digital gene expression (DGE profiling analyses of ‘Suli’ pear (Pyrus pyrifolia white pear group using the Illumina RNA-seq system. RNA-Seq generated approximately 100 M high-quality reads that were assembled into 69,393 unigenes (mean length = 853 bp, including 14,531 clusters and 34,194 singletons. A total of 51,448 (74.1% unigenes were annotated using public protein databases with a cut-off E-value above 10-5. We mainly compared gene expression levels at four time-points during bud dormancy. Between Nov. 15 and Dec. 15, Dec. 15 and Jan. 15, and Jan. 15 and Feb. 15, 1,978, 1,024, and 3,468 genes were differentially expressed, respectively. Hierarchical clustering analysis arranged 190 significantly differentially-expressed genes into seven groups. Seven genes were randomly selected to confirm their expression levels using quantitative real-time PCR. Conclusions The new transcriptomes offer comprehensive sequence and DGE profiling data for a dynamic view of transcriptomic variation during bud dormancy in pear. These data provided a basis for future studies of metabolism during bud dormancy in non-model but economically-important perennial species.
Ruiz-Orera, Jorge; Hernandez-Rodriguez, Jessica; Chiva, Cristina; Sabidó, Eduard; Kondova, Ivanela; Bontrop, Ronald; Marqués-Bonet, Tomàs; Albà, M Mar
The birth of new genes is an important motor of evolutionary innovation. Whereas many new genes arise by gene duplication, others originate at genomic regions that did not contain any genes or gene copies. Some of these newly expressed genes may acquire coding or non-coding functions and be preserved by natural selection. However, it is yet unclear which is the prevalence and underlying mechanisms of de novo gene emergence. In order to obtain a comprehensive view of this process, we have performed in-depth sequencing of the transcriptomes of four mammalian species--human, chimpanzee, macaque, and mouse--and subsequently compared the assembled transcripts and the corresponding syntenic genomic regions. This has resulted in the identification of over five thousand new multiexonic transcriptional events in human and/or chimpanzee that are not observed in the rest of species. Using comparative genomics, we show that the expression of these transcripts is associated with the gain of regulatory motifs upstream of the transcription start site (TSS) and of U1 snRNP sites downstream of the TSS. In general, these transcripts show little evidence of purifying selection, suggesting that many of them are not functional. However, we find signatures of selection in a subset of de novo genes which have evidence of protein translation. Taken together, the data support a model in which frequently-occurring new transcriptional events in the genome provide the raw material for the evolution of new proteins.
Näätsaari, Laura; Krainer, Florian W; Schubert, Michael; Glieder, Anton; Thallinger, Gerhard G
Horseradish peroxidases (HRPs) from Armoracia rusticana have long been utilized as reporters in various diagnostic assays and histochemical stainings. Regardless of their increasing importance in the field of life sciences and suggested uses in medical applications, chemical synthesis and other industrial applications, the HRP isoenzymes, their substrate specificities and enzymatic properties are poorly characterized. Due to lacking sequence information of natural isoenzymes and the low levels of HRP expression in heterologous hosts, commercially available HRP is still extracted as a mixture of isoenzymes from the roots of A. rusticana. In this study, a normalized, size-selected A. rusticana transcriptome library was sequenced using 454 Titanium technology. The resulting reads were assembled into 14871 isotigs with an average length of 1133 bp. Sequence databases, ORF finding and ORF characterization were utilized to identify peroxidase genes from the 14871 isotigs generated by de novo assembly. The sequences were manually reviewed and verified with Sanger sequencing of PCR amplified genomic fragments, resulting in the discovery of 28 secretory peroxidases, 23 of them previously unknown. A total of 22 isoenzymes including allelic variants were successfully expressed in Pichia pastoris and showed peroxidase activity with at least one of the substrates tested, thus enabling their development into commercial pure isoenzymes. This study demonstrates that transcriptome sequencing combined with sequence motif search is a powerful concept for the discovery and quick supply of new enzymes and isoenzymes from any plant or other eukaryotic organisms. Identification and manual verification of the sequences of 28 HRP isoenzymes do not only contribute a set of peroxidases for industrial, biological and biomedical applications, but also provide valuable information on the reliability of the approach in identifying and characterizing a large group of isoenzymes.
Kamenetsky, Rina; Faigenboim, Adi; Shemesh Mayer, Einat; Ben Michael, Tomer; Gershberg, Chen; Kimhi, Sagie; Esquira, Itzhak; Rohkin Shalom, Sarit; Eshel, Dani; Rabinowitch, Haim D; Sherman, Amir
Garlic is cultivated and consumed worldwide as a popular condiment and green vegetable with medicinal and neutraceutical properties. Garlic cultivars do not produce seeds, and therefore, this plant has not been the subject of either classical breeding or genetic studies. However, recent achievements in fertility restoration in a number of genotypes have led to flowering and seed production, thus enabling genetic studies and breeding in garlic. A transcriptome catalogue of fertile garlic was produced from multiplexed gene libraries, using RNA collected from various plant organs, including inflorescences and flowers. Over 32 million 250-bp paired-end reads were assembled into an extensive transcriptome of 240,000 contigs. An abundant transcriptome assembled separately from 102,000 highly expressed contigs was annotated and analyzed for gene ontology and metabolic pathways. Organ-specific analysis showed significant variation of gene expression between plant organs, with the highest number of specific reads in inflorescences and flowers. Analysis of the enriched biological processes and molecular functions revealed characteristic patterns for stress response, flower development and photosynthetic activity. Orthologues of key flowering genes were differentially expressed, not only in reproductive tissues, but also in leaves and bulbs, suggesting their role in flower-signal transduction and the bulbing process. More than 100 variants and isoforms of enzymes involved in organosulfur metabolism were differentially expressed and had organ-specific patterns. In addition to plant genes, viral RNA of at least four garlic viruses was detected, mostly in the roots and cloves, whereas only 1-4% of the reads were found in the foliage leaves. The de novo transcriptome of fertile garlic represents a new resource for research and breeding of this important crop, as well as for the development of effective molecular markers for useful traits, including fertility and seed production
Liu, Yang; Wu, Haoyang; Xie, Qiang; Bu, Wenjun
Erthesina fullo (Thunberg, 1783) is an economically important heteropteran species in China. Since only three nucleotide sequences of this species (COI, 16S rRNA, and 18S rRNA) appear in the GenBank database so far, no analysis of the molecular mechanisms underlying E. fullo's resistance to insecticide and environmental stress has been accomplished. We reported a de novo assembled and annotated transcriptome for adult E. fullo using the Illumina sequence system. A total of 53,359,458 clean reads of 4.8 billion nucleotides (nt) were assembled into 27,488 unigenes with an average length of 750 bp, of which 17,743 (64.55%) were annotated. In the present study, we identified 88 putative cytochrome P450 sequences and analyzed the evolution of cytochrome P450 superfamilies, genes of the CYP3 clan related to metabolizing xenobiotics and plant natural compounds, in E. fullo, increasing the candidate genes for the molecular mechanisms of insecticide resistance in P450. The sequenced transcriptome greatly expands the available genomic information and could allow a better understanding of the mechanisms of insecticide resistance at the systems biology level.
Yu, Guo-Jun; Wang, Man; Huang, Jie; Yin, Ya-Lin; Chen, Yi-Jie; Jiang, Shuai; Jin, Yan-Xia; Lan, Xian-Qing; Wong, Barry Hon Cheung; Liang, Yi; Sun, Hui
Background Ganoderma lucidum is a basidiomycete white rot fungus and is of medicinal importance in China, Japan and other countries in the Asiatic region. To date, much research has been performed in identifying the medicinal ingredients in Ganoderma lucidum. Despite its important therapeutic effects in disease, little is known about Ganoderma lucidum at the genomic level. In order to gain a molecular understanding of this fungus, we utilized Illumina high-throughput technology to sequence and analyze the transcriptome of Ganoderma lucidum. Methodology/Principal Findings We obtained 6,439,690 and 6,416,670 high-quality reads from the mycelium and fruiting body of Ganoderma lucidum, and these were assembled to form 18,892 and 27,408 unigenes, respectively. A similarity search was performed against the NCBI non-redundant nucleotide database and a customized database composed of five fungal genomes. 11,098 and 8, 775 unigenes were matched to the NCBI non-redundant nucleotide database and our customized database, respectively. All unigenes were subjected to annotation by Gene Ontology, Eukaryotic Orthologous Group terms and Kyoto Encyclopedia of Genes and Genomes. Differentially expressed genes from the Ganoderma lucidum mycelium and fruiting body stage were analyzed, resulting in the identification of 13 unigenes which are involved in the terpenoid backbone biosynthesis pathway. Quantitative real-time PCR was used to confirm the expression levels of these unigenes. Ganoderma lucidum was also studied for wood degrading activity and a total of 22 putative FOLymes (fungal oxidative lignin enzymes) and 120 CAZymes (carbohydrate-active enzymes) were predicted from our Ganoderma lucidum transcriptome. Conclusions Our study provides comprehensive gene expression information on Ganoderma lucidum at the transcriptional level, which will form the foundation for functional genomics studies in this fungus. The use of Illumina sequencing technology has made de novo transcriptome
Zhu, Fengjiao; Yang, Zongying; Zhang, Yiliu; Hu, Kun; Fang, Wenhong
Enrofloxacin is the most commonly used antibiotic to control diseases in aquatic animals caused by A. hydrophila. This study conducted de novo transcriptome sequencing and compared the global transcriptomes of enrofloxacin-resistant and enrofloxacin-susceptible strains. We got a total of 4,714 unigenes were assembled. Of these, 4,122 were annotated. A total of 3,280 unigenes were assigned to GO, 3,388 unigenes were classified into Cluster of Orthologous Groups of proteins (COG) using BLAST and BLAST2GO software, and 2,568 were mapped onto pathways using the Kyoto Encyclopedia of Gene and Genomes Pathway database. Furthermore, 218 unigenes were deemed to be DEGs. After enrofloxacin treatment, 135 genes were upregulated and 83 genes were downregulated. The GO terms biological process (126 genes) and metabolic process (136 genes) were the most enriched, and the terms for protein folding, response to stress, and SOS response were also significantly enriched. This study identified enrofloxacin treatment affects multiple biological functions of A. hydrophila. Enrofloxacin resistance in A. hydrophila is closely related to the reduction of intracellular drug accumulation caused by ABC transporters and increased expression of topoisomerase IV.
Wu, Xuhang; Fu, Yan; Yang, Deying; Zhang, Runhui; Zheng, Wanpeng; Nie, Huaming; Xie, Yue; Yan, Ning; Hao, Guiying; Gu, Xiaobin; Wang, Shuxian; Peng, Xuerong; Yang, Guangyou
The larval stage of Taenia multiceps, a global cestode, encysts in the central nervous system (CNS) of sheep and other livestock. This frequently leads to their death and huge socioeconomic losses, especially in developing countries. This parasite can also cause zoonotic infections in humans, but has been largely neglected due to a lack of diagnostic techniques and studies. Recent developments in next-generation sequencing provide an opportunity to explore the transcriptome of T. multiceps. We obtained a total of 31,282 unigenes (mean length 920 bp) using Illumina paired-end sequencing technology and a new Trinity de novo assembler without a referenced genome. Individual transcription molecules were determined by sequence-based annotations and/or domain-based annotations against public databases (Nr, UniprotKB/Swiss-Prot, COG, KEGG, UniProtKB/TrEMBL, InterPro and Pfam). We identified 26,110 (83.47%) unigenes and inferred 20,896 (66.8%) coding sequences (CDS). Further comparative transcripts analysis with other cestodes (Taenia pisiformis, Taenia solium, Echincoccus granulosus and Echincoccus multilocularis) and intestinal parasites (Trichinella spiralis, Ancylostoma caninum and Ascaris suum) showed that 5,100 common genes were shared among three Taenia tapeworms, 261 conserved genes were detected among five Taeniidae cestodes, and 109 common genes were found in four zoonotic intestinal parasites. Some of the common genes were genes required for parasite survival, involved in parasite-host interactions. In addition, we amplified two full-length CDS of unigenes from the common genes using RT-PCR. This study provides an extensive transcriptome of the adult stage of T. multiceps, and demonstrates that comparative transcriptomic investigations deserve to be further studied. This transcriptome dataset forms a substantial public information platform to achieve a fundamental understanding of the biology of T. multiceps, and helps in the identification of drug targets and
Full Text Available BACKGROUND: The larval stage of Taenia multiceps, a global cestode, encysts in the central nervous system (CNS of sheep and other livestock. This frequently leads to their death and huge socioeconomic losses, especially in developing countries. This parasite can also cause zoonotic infections in humans, but has been largely neglected due to a lack of diagnostic techniques and studies. Recent developments in next-generation sequencing provide an opportunity to explore the transcriptome of T. multiceps. METHODOLOGY/PRINCIPAL FINDINGS: We obtained a total of 31,282 unigenes (mean length 920 bp using Illumina paired-end sequencing technology and a new Trinity de novo assembler without a referenced genome. Individual transcription molecules were determined by sequence-based annotations and/or domain-based annotations against public databases (Nr, UniprotKB/Swiss-Prot, COG, KEGG, UniProtKB/TrEMBL, InterPro and Pfam. We identified 26,110 (83.47% unigenes and inferred 20,896 (66.8% coding sequences (CDS. Further comparative transcripts analysis with other cestodes (Taenia pisiformis, Taenia solium, Echincoccus granulosus and Echincoccus multilocularis and intestinal parasites (Trichinella spiralis, Ancylostoma caninum and Ascaris suum showed that 5,100 common genes were shared among three Taenia tapeworms, 261 conserved genes were detected among five Taeniidae cestodes, and 109 common genes were found in four zoonotic intestinal parasites. Some of the common genes were genes required for parasite survival, involved in parasite-host interactions. In addition, we amplified two full-length CDS of unigenes from the common genes using RT-PCR. CONCLUSIONS/SIGNIFICANCE: This study provides an extensive transcriptome of the adult stage of T. multiceps, and demonstrates that comparative transcriptomic investigations deserve to be further studied. This transcriptome dataset forms a substantial public information platform to achieve a fundamental understanding of
Shen, Di; Wang, Haiping; Wu, Qingjun; Lu, Peng; Qiu, Yang; Song, Jiangping; Zhang, Youjun; Li, Xixiang
Background The diamondback moth (DBM, Plutella xylostella) is a crucifer-specific pest that causes significant crop losses worldwide. Barbarea vulgaris (Brassicaceae) can resist DBM and other herbivorous insects by producing feeding-deterrent triterpenoid saponins. Plant breeders have long aimed to transfer this insect resistance to other crops. However, a lack of knowledge on the biosynthetic pathways and regulatory networks of these insecticidal saponins has hindered their practical application. A pyrosequencing-based transcriptome analysis of B. vulgaris during DBM larval feeding was performed to identify genes and gene networks responsible for saponin biosynthesis and its regulation at the genome level. Principal Findings Approximately 1.22, 1.19, 1.16, 1.23, 1.16, 1.20, and 2.39 giga base pairs of clean nucleotides were generated from B. vulgaris transcriptomes sampled 1, 4, 8, 12, 24, and 48 h after onset of P. xylostella feeding and from non-inoculated controls, respectively. De novo assembly using all data of the seven transcriptomes generated 39,531 unigenes. A total of 37,780 (95.57%) unigenes were annotated, 14,399 of which were assigned to one or more gene ontology terms and 19,620 of which were assigned to 126 known pathways. Expression profiles revealed 2,016–4,685 up-regulated and 557–5188 down-regulated transcripts. Secondary metabolic pathways, such as those of terpenoids, glucosinolates, and phenylpropanoids, and its related regulators were elevated. Candidate genes for the triterpene saponin pathway were found in the transcriptome. Orthological analysis of the transcriptome with four other crucifer transcriptomes identified 592 B. vulgaris-specific gene families with a P-value cutoff of 1e−5. Conclusion This study presents the first comprehensive transcriptome analysis of B. vulgaris subjected to a series of DBM feedings. The biosynthetic and regulatory pathways of triterpenoid saponins and other DBM deterrent metabolites in this plant were
Duan, Xinle; Wang, Kang; Su, Sha; Tian, Ruizheng; Li, Yuting; Chen, Maohua
The bird cherry-oat aphid, Rhopalosiphum padi (L.), is one of the most abundant aphid pests of cereals and has a global distribution. Next-generation sequencing (NGS) is a rapid and efficient method for developing molecular markers. However, transcriptomic and genomic resources of R. padi have not been investigated. In this study, we used transcriptome information obtained by RNA-Seq to develop polymorphic microsatellites for investigating population genetics in this species. The transcriptome of R. padi was sequenced on an Illumina HiSeq 2000 platform. A total of 114.4 million raw reads with a GC content of 40.03% was generated. The raw reads were cleaned and assembled into 29,467 unigenes with an N50 length of 1,580 bp. Using several public databases, 82.47% of these unigenes were annotated. Of the annotated unigenes, 8,022 were assigned to COG pathways, 9,895 were assigned to GO pathways, and 14,586 were mapped to 257 KEGG pathways. A total of 7,936 potential microsatellites were identified in 5,564 unigenes, 60 of which were selected randomly and amplified using specific primer pairs. Fourteen loci were found to be polymorphic in the four R. padi populations. The transcriptomic data presented herein will facilitate gene discovery, gene analyses, and development of molecular markers for future studies of R. padi and other closely related aphid species.
Lee, Jungeun; Noh, Eun Kyeung; Choi, Hyung-Seok; Shin, Seung Chul; Park, Hyun; Lee, Hyoungseok
Antarctic hairgrass (Deschampsia antarctica Desv.) is the only natural grass species in the maritime Antarctic. It has been studied as an extremophile that has successfully adapted to marginal land with the harshest environment for terrestrial plants. However, limited genetic research has focused on this species due to the lack of genomic resources. Here, we present the first de novo assembly of its transcriptome by massive parallel sequencing and its expression profile using D. antarctica grown under various stress conditions. Total sequence reads generated by pyrosequencing were assembled into 60,765 unigenes (28,177 contigs and 32,588 singletons). A total of 29,173 unique protein-coding genes were identified based on sequence similarities to known proteins. The combined results from all three stress conditions indicated differential expression of 3,110 genes. Quantitative reverse transcription polymerase chain reaction showed that several well-known stress-responsive genes encoding late embryogenesis abundant protein, dehydrin 1, and ice recrystallization inhibition protein were induced dramatically and that genes encoding U-box-domain-containing protein, electron transfer flavoprotein-ubiquinone, and F-box-containing protein were induced by abiotic stressors in a manner conserved with other plant species. We identified more than 2,000 simple sequence repeats that can be developed as functional molecular markers. This dataset is the most comprehensive transcriptome resource currently available for D. antarctica and is therefore expected to be an important foundation for future genetic studies of grasses and extremophiles.
Rozov, Roye; Goldshlager, Gil; Halperin, Eran; Shamir, Ron
We present Faucet, a two-pass streaming algorithm for assembly graph construction. Faucet builds an assembly graph incrementally as each read is processed. Thus, reads need not be stored locally, as they can be processed while downloading data and then discarded. We demonstrate this functionality by performing streaming graph assembly of publicly available data, and observe that the ratio of disk use to raw data size decreases as coverage is increased. Faucet pairs the de Bruijn graph obtained from the reads with additional meta-data derived from them. We show these metadata-coverage counts collected at junction k-mers and connections bridging between junction pairs-contain most salient information needed for assembly, and demonstrate they enable cleaning of metagenome assembly graphs, greatly improving contiguity while maintaining accuracy. We compared Fauceted resource use and assembly quality to state of the art metagenome assemblers, as well as leading resource-efficient genome assemblers. Faucet used orders of magnitude less time and disk space than the specialized metagenome assemblers MetaSPAdes and Megahit, while also improving on their memory use; this broadly matched performance of other assemblers optimizing resource efficiency-namely, Minia and LightAssembler. However, on metagenomes tested, Faucet,o outputs had 14-110% higher mean NGA50 lengths compared with Minia, and 2- to 11-fold higher mean NGA50 lengths compared with LightAssembler, the only other streaming assembler available. Faucet is available at https://github.com/Shamir-Lab/Faucet. firstname.lastname@example.org or email@example.com. Supplementary data are available at Bioinformatics online. © The Author 2017. Published by Oxford University Press.
Full Text Available BACKGROUND: Yellow horn (Xanthoceras sorbifolia Bunge is an oil-rich seed shrub that grows well in cold, barren environments and has great potential for biodiesel production in China. However, the limited genetic data means that little information about the key genes involved in oil biosynthesis is available, which limits further improvement of this species. In this study, we describe sequencing and de novo transcriptome assembly to produce the first comprehensive and integrated genomic resource for yellow horn and identify the pathways and key genes related to oil accumulation. In addition, potential molecular markers were identified and compiled. METHODOLOGY/PRINCIPAL FINDINGS: Total RNA was isolated from 30 plants from two regions, including buds, leaves, flowers and seeds. Equal quantities of RNA from these tissues were pooled to construct a cDNA library for 454 pyrosequencing. A total of 1,147,624 high-quality reads with total and average lengths of 530.6 Mb and 462 bp, respectively, were generated. These reads were assembled into 51,867 unigenes, corresponding to a total of 36.1 Mb with a mean length, N50 and median of 696, 928 and 570 bp, respectively. Of the unigenes, 17,541 (33.82% were unmatched in any public protein databases. We identified 281 unigenes that may be involved in de novo fatty acid (FA and triacylglycerol (TAG biosynthesis and metabolism. Furthermore, 6,707 SSRs, 16,925 SNPs and 6,201 InDels with high-confidence were also identified in this study. CONCLUSIONS: This transcriptome represents a new functional genomics resource and a foundation for further studies on the metabolic engineering of yellow horn to increase oil content and modify oil composition. The potential molecular markers identified in this study provide a basis for polymorphism analysis of Xanthoceras, and even Sapindaceae; they will also accelerate the process of breeding new varieties with better agronomic characteristics.
Han, R; Rai, A; Nakamura, M; Suzuki, H; Takahashi, H; Yamazaki, M; Saito, K
Study on transcriptome, the entire pool of transcripts in an organism or single cells at certain physiological or pathological stage, is indispensable in unraveling the connection and regulation between DNA and protein. Before the advent of deep sequencing, microarray was the main approach to handle transcripts. Despite obvious shortcomings, including limited dynamic range and difficulties to compare the results from distinct experiments, microarray was widely applied. During the past decade, next-generation sequencing (NGS) has revolutionized our understanding of genomics in a fast, high-throughput, cost-effective, and tractable manner. By adopting NGS, efficiency and fruitful outcomes concerning the efforts to elucidate genes responsible for producing active compounds in medicinal plants were profoundly enhanced. The whole process involves steps, from the plant material sampling, to cDNA library preparation, to deep sequencing, and then bioinformatics takes over to assemble enormous-yet fragmentary-data from which to comb and extract information. The unprecedentedly rapid development of such technologies provides so many choices to facilitate the task, which can cause confusion when choosing the suitable methodology for specific purposes. Here, we review the general approaches for deep transcriptome analysis and then focus on their application in discovering biosynthetic pathways of medicinal plants that produce important secondary metabolites. © 2016 Elsevier Inc. All rights reserved.
Song, Mi Hye; Miliaras, Nicholas B; Peel, Nina; O'Connell, Kevin F
Centrioles play an important role in organizing microtubules and are precisely duplicated once per cell cycle. New (daughter) centrioles typically arise in association with existing (mother) centrioles (canonical assembly), suggesting that mother centrioles direct the formation of daughter centrioles. However, under certain circumstances, centrioles can also selfassemble free of an existing centriole (de novo assembly). Recent work indicates that the canonical and de novo pathways utilize a common mechanism and that a mother centriole spatially constrains the self-assembly process to occur within its immediate vicinity. Other recently identified mechanisms further regulate canonical assembly so that during each cell cycle, one and only one daughter centriole is assembled per mother centriole.
Full Text Available BACKGROUND: Goosegrass (Eleusine indica L., a serious annual weed in the world, has evolved resistance to several herbicides including paraquat, a non-selective herbicide. The mechanism of paraquat resistance in weeds is only partially understood. To further study the molecular mechanism underlying paraquat resistance in goosegrass, we performed transcriptome analysis of susceptible and resistant biotypes of goosegrass with or without paraquat treatment. RESULTS: The RNA-seq libraries generated 194,716,560 valid reads with an average length of 91.29 bp. De novo assembly analysis produced 158,461 transcripts with an average length of 1153.74 bp and 100,742 unigenes with an average length of 712.79 bp. Among these, 25,926 unigenes were assigned to 65 GO terms that contained three main categories. A total of 13,809 unigenes with 1,208 enzyme commission numbers were assigned to 314 predicted KEGG metabolic pathways, and 12,719 unigenes were categorized into 25 KOG classifications. Furthermore, our results revealed that 53 genes related to reactive oxygen species scavenging, 10 genes related to polyamines and 18 genes related to transport were differentially expressed in paraquat treatment experiments. The genes related to polyamines and transport are likely potential candidate genes that could be further investigated to confirm their roles in paraquat resistance of goosegrass. CONCLUSION: This is the first large-scale transcriptome sequencing of E. indica using the Illumina platform. Potential genes involved in paraquat resistance were identified from the assembled sequences. The transcriptome data may serve as a reference for further analysis of gene expression and functional genomics studies, and will facilitate the study of paraquat resistance at the molecular level in goosegrass.
An, Jing; Shen, Xuefeng; Ma, Qibin; Yang, Cunyi; Liu, Simin; Chen, Yong
Goosegrass (Eleusine indica L.), a serious annual weed in the world, has evolved resistance to several herbicides including paraquat, a non-selective herbicide. The mechanism of paraquat resistance in weeds is only partially understood. To further study the molecular mechanism underlying paraquat resistance in goosegrass, we performed transcriptome analysis of susceptible and resistant biotypes of goosegrass with or without paraquat treatment. The RNA-seq libraries generated 194,716,560 valid reads with an average length of 91.29 bp. De novo assembly analysis produced 158,461 transcripts with an average length of 1153.74 bp and 100,742 unigenes with an average length of 712.79 bp. Among these, 25,926 unigenes were assigned to 65 GO terms that contained three main categories. A total of 13,809 unigenes with 1,208 enzyme commission numbers were assigned to 314 predicted KEGG metabolic pathways, and 12,719 unigenes were categorized into 25 KOG classifications. Furthermore, our results revealed that 53 genes related to reactive oxygen species scavenging, 10 genes related to polyamines and 18 genes related to transport were differentially expressed in paraquat treatment experiments. The genes related to polyamines and transport are likely potential candidate genes that could be further investigated to confirm their roles in paraquat resistance of goosegrass. This is the first large-scale transcriptome sequencing of E. indica using the Illumina platform. Potential genes involved in paraquat resistance were identified from the assembled sequences. The transcriptome data may serve as a reference for further analysis of gene expression and functional genomics studies, and will facilitate the study of paraquat resistance at the molecular level in goosegrass.
Shu, Cheng-Cheng; Wang, Dong; Guo, Jing; Song, Jia-Ming; Chen, Shou-Wen; Chen, Ling-Ling; Gao, Jun-Xiang
As an industrial bacterium, Bacillus licheniformis DW2 produces bacitracin which is an important antibiotic for many pathogenic microorganisms. Our previous study showed AbrB-knockout could significantly increase the production of bacitracin. Accordingly, it was meaningful to understand its genome features, expression differences between wild and AbrB-knockout (ΔAbrB) strains, and the regulation of bacitracin biosynthesis. Here, we sequenced, de novo assembled and annotated its genome, and also sequenced the transcriptomes in three growth phases. The genome of DW2 contained a DNA molecule of 4,468,952 bp with 45.93% GC content and 4,717 protein coding genes. The transcriptome reads were mapped to the assembled genome, and obtained 4,102∼4,536 expressed genes from different samples. We investigated transcription changes in B. licheniformis DW2 and showed that ΔAbrB caused hundreds of genes up-regulation and down-regulation in different growth phases. We identified a complete bacitracin synthetase gene cluster, including the location and length of bacABC, bcrABC, and bacT, as well as their arrangement. The gene cluster bcrABC were significantly up-regulated in ΔAbrB strain, which supported the hypothesis in previous study of bcrABC transporting bacitracin out of the cell to avoid self-intoxication, and was consistent with the previous experimental result that ΔAbrB could yield more bacitracin. This study provided a high quality reference genome for B. licheniformis DW2, and the transcriptome data depicted global alterations across two strains and three phases offered an understanding of AbrB regulation and bacitracin biosynthesis through gene expression. PMID:29599755
Ruttink, Tom; Sterck, Lieven; Rohde, Antje
to outbreeding crop species hamper De Bruijn Graph-based de novo assembly algorithms, causing transcript fragmentation and the redundant assembly of allelic contigs. If multiple genotypes are sequenced to study genetic diversity, primary de novo assembly is best performed per genotype to limit the level......Despite current advances in next-generation sequencing data analysis procedures, de novo assembly of a reference sequence required for SNP discovery and expression analysis is still a major challenge in genetically uncharacterized, highly heterozygous species. High levels of polymorphism inherent...... of polymorphism and avoid transcript fragmentation. Here, we propose an Orthology Guided Assembly procedure that first uses sequence similarity (tBLASTn) to proteins of a model species to select allelic and fragmented contigs from all genotypes and then performs CAP3 clustering on a gene-by-gene basis. Thus, we...
Zhang, Y Z; Xu, S Z; Cheng, Y W; Ya, H Y; Han, J M
This study aimed to analyze the transcriptome profile of red lettuce and identify the genes involved in anthocyanin accumulation. Red leaf lettuce is a popular vegetable and popular due to its high anthocyanin content. However, there is limited information available about the genes involved in anthocyanin biosynthesis in this species. In this study, transcriptomes of 15-day-old seedlings and 40-day-old red lettuce leaves were analyzed using an Illuminia HiseqTM 2500 platform. A total of 10.6 GB clean data were obtained and de novo assembled into 83,333 unigenes with an N50 of 1067. After annotation against public databases, 51,850 unigene sequences were identified, among which 46,087 were annotated in the NCBI non-redundant protein database, and 41,752 were annotated in the Swiss-Prot database. A total of 9125 unigenes were mapped into 163 pathways using the Kyoto Encyclopedia of Genes and Genomes database. Thirty-four structural genes were found to cover the main steps of the anthocyanin pathway, including chalcone synthase, chalcone isomerase, flavanone 3-hydroxylase, flavonoid 3'-hydroxylase, flavonoid 3',5'-hydroxylase, dihydroflavonol 4-reductase, and anthocyanidin synthase. Seven MYB, three bHLH, and two WD40 genes, considered anthocyanin regulatory genes, were also identified. In addition, 3607 simple sequence repeat (SSR) markers were identified from 2916 unigenes. This research uncovered the transcriptomic characteristics of red leaf lettuce seedlings and mature plants. The identified candidate genes related to anthocyanin biosynthesis and the detected SSRs provide useful tools for future molecular breeding studies.
Full Text Available Faba bean is an important food crop worldwide. However, progress in faba bean genomics lags far behind that of model systems due to limited availability of genetic and genomic information. Using the Illumina platform the faba bean transcriptome from leaves of two lines (29H and Vf136 subjected to Ascochyta fabae infection have been characterized. De novo transcriptome assembly provided a total of 39,185 different transcripts that were functionally annotated, and among these, 13,266 were assigned to gene ontology against Arabidopsis. Quality of the assembly was validated by RT-qPCR amplification of selected transcripts differentially expressed. Comparison of faba bean transcripts with those of better-characterized plant genomes such as Arabidopsis thaliana, Medicago truncatula and Cicer arietinum revealed a sequence similarity of 68.3%, 72.8% and 81.27%, respectively. Moreover, 39,060 single nucleotide polymorphism (SNP and 3,669 InDels were identified for genotyping applications. Mapping of the sequence reads generated onto the assembled transcripts showed that 393 and 457 transcripts were overexpressed in the resistant (29H and susceptible genotype (Vf136, respectively. Transcripts involved in plant-pathogen interactions such as leucine rich proteins (LRR or plant growth regulators involved in plant adaptation to abiotic and biotic stresses were found to be differently expressed in the resistant line. The results reported here represent the most comprehensive transcript database developed so far in faba bean, providing valuable information that could be used to gain insight into the pathways involved in the resistance mechanism against A. fabae and to identify potential resistance genes to be further used in marker assisted selection.
Soltis Douglas E
Full Text Available Abstract Background We have developed a simulation approach to help determine the optimal mixture of sequencing methods for most complete and cost effective transcriptome sequencing. We compared simulation results for traditional capillary sequencing with "Next Generation" (NG ultra high-throughput technologies. The simulation model was parameterized using mappings of 130,000 cDNA sequence reads to the Arabidopsis genome (NCBI Accession SRA008180.19. We also generated 454-GS20 sequences and de novo assemblies for the basal eudicot California poppy (Eschscholzia californica and the magnoliid avocado (Persea americana using a variety of methods for cDNA synthesis. Results The Arabidopsis reads tagged more than 15,000 genes, including new splice variants and extended UTR regions. Of the total 134,791 reads (13.8 MB, 119,518 (88.7% mapped exactly to known exons, while 1,117 (0.8% mapped to introns, 11,524 (8.6% spanned annotated intron/exon boundaries, and 3,066 (2.3% extended beyond the end of annotated UTRs. Sequence-based inference of relative gene expression levels correlated significantly with microarray data. As expected, NG sequencing of normalized libraries tagged more genes than non-normalized libraries, although non-normalized libraries yielded more full-length cDNA sequences. The Arabidopsis data were used to simulate additional rounds of NG and traditional EST sequencing, and various combinations of each. Our simulations suggest a combination of FLX and Solexa sequencing for optimal transcriptome coverage at modest cost. We have also developed ESTcalc http://fgp.huck.psu.edu/NG_Sims/ngsim.pl, an online webtool, which allows users to explore the results of this study by specifying individualized costs and sequencing characteristics. Conclusion NG sequencing technologies are a highly flexible set of platforms that can be scaled to suit different project goals. In terms of sequence coverage alone, the NG sequencing is a dramatic advance
Forouzan, Esmaeil; Maleki, Masoumeh Sadat Mousavi; Karkhane, Ali Asghar; Yakhchali, Bagher
Next generation sequencing (NGS) technologies are revolutionizing biology, with Illumina being the most popular NGS platform. Short read assembly is a critical part of most genome studies using NGS. Hence, in this study, the performance of nine well-known assemblers was evaluated in the assembly of seven different microbial genomes. Effect of different read coverage and k-mer parameters on the quality of the assembly were also evaluated on both simulated and actual read datasets. Our results show that the performance of assemblers on real and simulated datasets could be significantly different, mainly because of coverage bias. According to outputs on actual read datasets, for all studied read coverages (of 7×, 25× and 100×), SPAdes and IDBA-UD clearly outperformed other assemblers based on NGA50 and accuracy metrics. Velvet is the most conservative assembler with the lowest NGA50 and error rate. Copyright © 2017. Published by Elsevier B.V.
Full Text Available Syphacia muris is a ubiquitous nematode parasite and common contaminant of laboratory rats. A lthough S. muris infection is considered symptomless, it has some effects on the host’s immunity and therefore can interfere with experimental settings and interrupt final results. However, the molecular mechanisms involved in the alteration within the host’s immunity remain unclear because of the absence of information about mRNA expressed in this parasite. In this study we performed the transcriptome profiling of S. muris by next-generation sequencing. After de novo assembly and annotation, 14,821 contigs were found to have a sequence homology with any nematode sequence. Gene ontology analysis showed that the majority of the expressed genes are involved in cellular process, binding, and catalytic activity. Although the rate of expressed genes involved in the immune system was low, we found candidate genes that might be involved in the alteration within the host’s immunity by regulating the host’s innate immune response.
Arjan P Palstra
Full Text Available Deep RNA sequencing (RNA-seq was performed to provide an in-depth view of the transcriptome of red and white skeletal muscle of exercised and non-exercised rainbow trout (Oncorhynchus mykiss with the specific objective to identify expressed genes and quantify the transcriptomic effects of swimming-induced exercise. Pubertal autumn-spawning seawater-raised female rainbow trout were rested (n = 10 or swum (n = 10 for 1176 km at 0.75 body-lengths per second in a 6,000-L swim-flume under reproductive conditions for 40 days. Red and white muscle RNA of exercised and non-exercised fish (4 lanes was sequenced and resulted in 15-17 million reads per lane that, after de novo assembly, yielded 149,159 red and 118,572 white muscle contigs. Most contigs were annotated using an iterative homology search strategy against salmonid ESTs, the zebrafish Danio rerio genome and general Metazoan genes. When selecting for large contigs (>500 nucleotides, a number of novel rainbow trout gene sequences were identified in this study: 1,085 and 1,228 novel gene sequences for red and white muscle, respectively, which included a number of important molecules for skeletal muscle function. Transcriptomic analysis revealed that sustained swimming increased transcriptional activity in skeletal muscle and specifically an up-regulation of genes involved in muscle growth and developmental processes in white muscle. The unique collection of transcripts will contribute to our understanding of red and white muscle physiology, specifically during the long-term reproductive migration of salmonids.
Shao, En-Si; Lin, Gui-Fang; Liu, Sijun; Ma, Xiao-Li; Chen, Ming-Feng; Lin, Li; Wu, Song-Qing; Sha, Li; Liu, Zhao-Xia; Hu, Xiao-Hua; Guan, Xiong; Zhang, Ling-Ling
Tea production has been significantly impacted by the false-eye leafhopper, Empoasca vitis (Göthe), around Asia. To identify the key genes which are responsible for nutrition absorption, xenobiotic metabolism and immune response, the transcriptome of either alimentary tracts or bodies minus alimentary tract of E. vitis was sequenced and analyzed. Over 31 million reads were obtained from Illumina sequencing. De novo sequence assembly resulted in 52,182 unigenes with a mean size of 848nt. The assembled unigenes were then annotated using various databases. Transcripts of at least 566 digestion-, 224 detoxification-, and 288 immune-related putative genes in E. vitis were identified. In addition, relative expression of highly abundant transcripts was verified through quantitative real-time PCR. Results from this investigation provide genomic information about E. vitis, which will be helpful in further study of E. vitis biology and in the development of novel strategies to control this devastating pest. Copyright © 2016 Elsevier Inc. All rights reserved.
Full Text Available The Chinese giant salamander (Andrias davidianus is an economically important animal on academic value. However, the genomic information of this species has been less studied. In our study, the transcripts of A. davidianus were obtained by RNA-seq to conduct a transcriptomic analysis. In total 132,912 unigenes were generated with an average length of 690 bp and N50 of 1263 bp by de novo assembly using Trinity software. Using a sequence similarity search against the nine public databases (CDD, KOG, NR, NT, PFAM, Swiss-prot, TrEMBL, GO and KEGG databases, a total of 24,049, 18,406, 36,711, 15,858, 20,500, 27,515, 36,705, 28,879 and 10,958 unigenes were annotated in databases, respectively. Of these, 6323 unigenes were annotated in all database and 39,672 unigenes were annotated in at least one database. Blasted with KEGG pathway, 10,958 unigenes were annotated, and it was divided into 343 categories according to different pathways. In addition, we also identified 29,790 SSRs. This study provided a valuable resource for understanding transcriptomic information of A. davidianus and laid a foundation for further research on functional gene cloning, genomics, genetic diversity analysis and molecular marker exploitation in A. davidianus.
Seim, Inge; Ma, Siming; Zhou, Xuming; Gerashchenko, Maxim V.; Lee, Sang-Goo; Suydam, Robert; George, John C.; Bickham, John W.; Gladyshev, Vadim N.
Mammals vary dramatically in lifespan, by at least two-orders of magnitude, but the molecular basis for this difference remains largely unknown. The bowhead whale Balaena mysticetus is the longest-lived mammal known, with an estimated maximal lifespan in excess of two hundred years. It is also one of the two largest animals and the most cold-adapted baleen whale species. Here, we report the first genome-wide gene expression analyses of the bowhead whale, based on the de novo assembly of its transcriptome. Bowhead whale or cetacean-specific changes in gene expression were identified in the liver, kidney and heart, and complemented with analyses of positively selected genes. Changes associated with altered insulin signaling and other gene expression patterns could help explain the remarkable longevity of bowhead whales as well as their adaptation to a lipid-rich diet. The data also reveal parallels in candidate longevity adaptations of the bowhead whale, naked mole rat and Brandt's bat. The bowhead whale transcriptome is a valuable resource for the study of this remarkable animal, including the evolution of longevity and its important correlates such as resistance to cancer and other diseases. PMID:25411232
Kumar, Vikas; Kutschera, Verena E; Nilsson, Maria A; Janke, Axel
The genus Vulpes (true foxes) comprises numerous species that inhabit a wide range of habitats and climatic conditions, including one species, the Arctic fox (Vulpes lagopus) which is adapted to the arctic region. A close relative to the Arctic fox, the red fox (Vulpes vulpes), occurs in subarctic to subtropical habitats. To study the genetic basis of their adaptations to different environments, transcriptome sequences from two Arctic foxes and one red fox individual were generated and analyzed for signatures of positive selection. In addition, the data allowed for a phylogenetic analysis and divergence time estimate between the two fox species. The de novo assembly of reads resulted in more than 160,000 contigs/transcripts per individual. Approximately 17,000 homologous genes were identified using human and the non-redundant databases. Positive selection analyses revealed several genes involved in various metabolic and molecular processes such as energy metabolism, cardiac gene regulation, apoptosis and blood coagulation to be under positive selection in foxes. Branch site tests identified four genes to be under positive selection in the Arctic fox transcriptome, two of which are fat metabolism genes. In the red fox transcriptome eight genes are under positive selection, including molecular process genes, notably genes involved in ATP metabolism. Analysis of the three transcriptomes and five Sanger re-sequenced genes in additional individuals identified a lower genetic variability within Arctic foxes compared to red foxes, which is consistent with distribution range differences and demographic responses to past climatic fluctuations. A phylogenomic analysis estimated that the Arctic and red fox lineages diverged about three million years ago. Transcriptome data are an economic way to generate genomic resources for evolutionary studies. Despite not representing an entire genome, this transcriptome analysis identified numerous genes that are relevant to arctic
Mun, Seyoung; Kim, Yun-Ji; Markkandan, Kesavan; Shin, Wonseok; Oh, Sumin; Woo, Jiyoung; Yoo, Jongsu; An, Hyesuck; Han, Kyudong
The manila clam, Ruditapes philippinarum, is an important bivalve species in worldwide aquaculture including Korea. The aquaculture production of R. philippinarum is under threat from diverse environmental factors including viruses, microorganisms, parasites, and water conditions with subsequently declining production. In spite of its importance as a marine resource, the reference genome of R. philippinarum for comprehensive genetic studies is largely unexplored. Here, we report the de novo whole-genome and transcriptome assembly of R. philippinarum across three different tissues (foot, gill, and adductor muscle), and provide the basic data for advanced studies in selective breeding and disease control in order to obtain successful aquaculture systems. An approximately 2.56 Gb high quality whole-genome was assembled with various library construction methods. A total of 108,034 protein coding gene models were predicted and repetitive elements including simple sequence repeats and noncoding RNAs were identified to further understanding of the genetic background of R. philippinarum for genomics-assisted breeding. Comparative analysis with the bivalve marine invertebrates uncover that the gene family related to complement C1q was enriched. Furthermore, we performed transcriptome analysis with three different tissues in order to support genome annotation and then identified 41,275 transcripts which were annotated. The R. philippinarum genome resource will markedly advance a wide range of potential genetic studies, a reference genome for comparative analysis of bivalve species and unraveling mechanisms of biological processes in molluscs. We believe that the R. philippinarum genome will serve as an initial platform for breeding better-quality clams using a genomic approach. © The Author 2017. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
Pascale, Raffaella; Grossi, Gerarda; Cruciani, Gabriele; Mecca, Giansalvatore; Santoro, Donatello; Sarli Calace, Renzo; Falabella, Patrizia; Bianco, Giuliana
Sequence protein identification by a randomized sequence database and transcriptome mass spectrometry software package has been developed at the University of Basilicata in Potenza (Italy) and designed to facilitate the determination of the amino acid sequence of a peptide as well as an unequivocal identification of proteins in a high-throughput manner with enormous advantages of time, economical resource and expertise. The software package is a valid tool for the automation of a de novo sequencing approach, overcoming the main limits and a versatile platform useful in the proteomic field for an unequivocal identification of proteins, starting from tandem mass spectrometry data. The strength of this software is that it is a user-friendly and non-statistical approach, so protein identification can be considered unambiguous.
In a collaboration with National Center for Genome Resources researchers, we sequenced and assembled the testes transcriptome derived from the Pacora, Panama, production plant strain of the New World Screwworm, Cochliomyia hominivorax. This transcriptome contains 4,149 unigenes and the Transcriptome...
Valenzuela-Muñoz, Valentina; Sturm, Armin; Gallardo-Escárate, Cristian
ATP-binding cassette (ABC) protein family encode for membrane proteins involved in the transport of various biomolecules through the cellular membrane. These proteins have been identified in all taxa and present important physiological functions, including the process of insecticide detoxification in arthropods. For that reason the ectoparasite Caligus rogercresseyi represents a model species for understanding the molecular underpinnings involved in insecticide drug resistance. llumina sequencing was performed using sea lice exposed to 2 and 3 ppb of deltamethrin and azamethiphos. Contigs obtained from de novo assembly were annotated by Blastx. RNA-Seq analysis was performed and validated by qPCR analysis. From the transcriptome database of C. rogercresseyi, 57 putative members of ABC protein sequences were identified and phylogenetically classified into the eight subfamilies described for ABC transporters in arthropods. Transcriptomic profiles for ABC proteins subfamilies were evaluated throughout C. rogercresseyi development. Moreover, RNA-Seq analysis was performed for adult male and female salmon lice exposed to the delousing drugs azamethiphos and deltamethrin. High transcript levels of the ABCB and ABCC subfamilies were evidenced. Furthermore, SNPs mining was carried out for the ABC proteins sequences, revealing pivotal genomic information. The present study gives a comprehensive transcriptome analysis of ABC proteins from C. rogercresseyi, providing relevant information about transporter roles during ontogeny and in relation to delousing drug responses in salmon lice. This genomic information represents a valuable tool for pest management in the Chilean salmon aquaculture industry.
Full Text Available BACKGROUND: Host and parasitoid interaction is one of the most fascinating relationships of insects, which is currently receiving an increasing interest. Understanding the mechanisms evolved by the parasitoids to evade or suppress the host immune system is important for dissecting this interaction, while it was still poorly known. In order to gain insight into the immune response of Tenebrio molitor to parasitization by Scleroderma guani, the transcriptome of T. molitor pupae was sequenced with focus on immune-related gene, and the non-parasitized and parasitized T. molitor pupae were analyzed by digital gene expression (DGE analysis with special emphasis on parasitoid-induced immune-related genes using Illumina sequencing. METHODOLOGY/PRINCIPAL FINDINGS: In a single run, 264,698 raw reads were obtained. De novo assembly generated 71,514 unigenes with mean length of 424 bp. Of those unigenes, 37,373 (52.26% showed similarity to the known proteins in the NCBI nr database. Via analysis of the transcriptome data in depth, 430 unigenes related to immunity were identified. DGE analysis revealed that parasitization by S. guani had considerable impacts on the transcriptome profile of T. molitor pupae, as indicated by the significant up- or down-regulation of 3,431 parasitism-responsive transcripts. The expression of a total of 74 unigenes involved in immune response of T. molitor was significantly altered after parasitization. CONCLUSIONS/SIGNIFICANCE: obtained T. molitor transcriptome, in addition to establishing a fundamental resource for further research on functional genomics, has allowed the discovery of a large group of immune genes that might provide a meaningful framework to better understand the immune response in this species and other beetles. The DGE profiling data provides comprehensive T. molitor immune gene expression information at the transcriptional level following parasitization, and sheds valuable light on the molecular
Zhu, Jia-Ying; Yang, Pu; Zhang, Zhong; Wu, Guo-Xing; Yang, Bin
Host and parasitoid interaction is one of the most fascinating relationships of insects, which is currently receiving an increasing interest. Understanding the mechanisms evolved by the parasitoids to evade or suppress the host immune system is important for dissecting this interaction, while it was still poorly known. In order to gain insight into the immune response of Tenebrio molitor to parasitization by Scleroderma guani, the transcriptome of T. molitor pupae was sequenced with focus on immune-related gene, and the non-parasitized and parasitized T. molitor pupae were analyzed by digital gene expression (DGE) analysis with special emphasis on parasitoid-induced immune-related genes using Illumina sequencing. In a single run, 264,698 raw reads were obtained. De novo assembly generated 71,514 unigenes with mean length of 424 bp. Of those unigenes, 37,373 (52.26%) showed similarity to the known proteins in the NCBI nr database. Via analysis of the transcriptome data in depth, 430 unigenes related to immunity were identified. DGE analysis revealed that parasitization by S. guani had considerable impacts on the transcriptome profile of T. molitor pupae, as indicated by the significant up- or down-regulation of 3,431 parasitism-responsive transcripts. The expression of a total of 74 unigenes involved in immune response of T. molitor was significantly altered after parasitization. obtained T. molitor transcriptome, in addition to establishing a fundamental resource for further research on functional genomics, has allowed the discovery of a large group of immune genes that might provide a meaningful framework to better understand the immune response in this species and other beetles. The DGE profiling data provides comprehensive T. molitor immune gene expression information at the transcriptional level following parasitization, and sheds valuable light on the molecular understanding of the host-parasitoid interaction.
Zhu, Jia-Ying; Yang, Pu; Zhang, Zhong; Wu, Guo-Xing; Yang, Bin
Background Host and parasitoid interaction is one of the most fascinating relationships of insects, which is currently receiving an increasing interest. Understanding the mechanisms evolved by the parasitoids to evade or suppress the host immune system is important for dissecting this interaction, while it was still poorly known. In order to gain insight into the immune response of Tenebrio molitor to parasitization by Scleroderma guani, the transcriptome of T. molitor pupae was sequenced with focus on immune-related gene, and the non-parasitized and parasitized T. molitor pupae were analyzed by digital gene expression (DGE) analysis with special emphasis on parasitoid-induced immune-related genes using Illumina sequencing. Methodology/Principal Findings In a single run, 264,698 raw reads were obtained. De novo assembly generated 71,514 unigenes with mean length of 424 bp. Of those unigenes, 37,373 (52.26%) showed similarity to the known proteins in the NCBI nr database. Via analysis of the transcriptome data in depth, 430 unigenes related to immunity were identified. DGE analysis revealed that parasitization by S. guani had considerable impacts on the transcriptome profile of T. molitor pupae, as indicated by the significant up- or down-regulation of 3,431 parasitism-responsive transcripts. The expression of a total of 74 unigenes involved in immune response of T. molitor was significantly altered after parasitization. Conclusions/Significance obtained T. molitor transcriptome, in addition to establishing a fundamental resource for further research on functional genomics, has allowed the discovery of a large group of immune genes that might provide a meaningful framework to better understand the immune response in this species and other beetles. The DGE profiling data provides comprehensive T. molitor immune gene expression information at the transcriptional level following parasitization, and sheds valuable light on the molecular understanding of the host
Christopher N. Balakrishnan
Full Text Available Emberizid sparrows (emberizidae have played a prominent role in the study of avian vocal communication and social behavior. We present here brain transcriptomes for three emberizid model systems, song sparrow Melospiza melodia, white-throated sparrow Zonotrichia albicollis, and Gambel’s white-crowned sparrow Zonotrichia leucophrys gambelii. Each of the assemblies covered fully or in part, over 89% of the previously annotated protein coding genes in the zebra finch Taeniopygia guttata, with 16,846, 15,805, and 16,646 unique BLAST hits in song, white-throated and white-crowned sparrows, respectively. As in previous studies, we find tissue of origin (auditory forebrain versus hypothalamus and whole brain as an important determinant of overall expression profile. We also demonstrate the successful isolation of RNA and RNA-sequencing from post-mortem samples from building strikes and suggest that such an approach could be useful when traditional sampling opportunities are limited. These transcriptomes will be an important resource for the study of social behavior in birds and for data driven annotation of forthcoming whole genome sequences for these and other bird species.
Kuroshu, Reginaldo M; Watanabe, Junichi; Sugano, Sumio; Morishita, Shinichi; Suzuki, Yutaka; Kasahara, Masahiro
Sequencing full-length cDNA clones is important to determine gene structures including alternative splice forms, and provides valuable resources for experimental analyses to reveal the biological functions of coded proteins. However, previous approaches for sequencing cDNA clones were expensive or time-consuming, and therefore, a fast and efficient sequencing approach was demanded. We developed a program, MuSICA 2, that assembles millions of short (36-nucleotide) reads collected from a single flow cell lane of Illumina Genome Analyzer to shotgun-sequence approximately 800 human full-length cDNA clones. MuSICA 2 performs a hybrid assembly in which an external de novo assembler is run first and the result is then improved by reference alignment of shotgun reads. We compared the MuSICA 2 assembly with 200 pooled full-length cDNA clones finished independently by the conventional primer-walking using Sanger sequencers. The exon-intron structure of the coding sequence was correct for more than 95% of the clones with coding sequence annotation when we excluded cDNA clones insufficiently represented in the shotgun library due to PCR failure (42 out of 200 clones excluded), and the nucleotide-level accuracy of coding sequences of those correct clones was over 99.99%. We also applied MuSICA 2 to full-length cDNA clones from Toxoplasma gondii, to confirm that its ability was competent even for non-human species. The entire sequencing and shotgun assembly takes less than 1 week and the consumables cost only approximately US$3 per clone, demonstrating a significant advantage over previous approaches.
Full Text Available Non-target-site resistance (NTSR to herbicides is a worldwide concern for weed control. However, as the dominant NTSR mechanism in weeds, metabolic resistance is not yet well-characterized at the genetic level. For this study, we have identified a shortawn foxtail (Alopecurus aequalis Sobol. population displaying both TSR and NTSR to mesosulfuron-methyl and fenoxaprop-P-ethyl, yet the molecular basis for this NTSR remains unclear. To investigate the mechanisms of metabolic resistance, an RNA-Seq transcriptome analysis was used to find candidate genes that may confer metabolic resistance to the herbicide mesosulfuron-methyl in this plant population. The RNA-Seq libraries generated 831,846,736 clean reads. The de novo transcriptome assembly yielded 95,479 unigenes (averaging 944 bp in length that were assigned putative annotations. Among these, a total of 29,889 unigenes were assigned to 67 GO terms that contained three main categories, and 14,246 unigenes assigned to 32 predicted KEGG metabolic pathways. Global gene expression was measured using the reads generated from the untreated control (CK, water-only control (WCK, and mesosulfuron-methyl treatment (T of R and susceptible (S. Contigs that showed expression differences between mesosulfuron-methyl-treated R and S biotypes, and between mesosulfuron-methyl-treated, water-treated and untreated R plants were selected for further quantitative real-time PCR (qRT-PCR validation analyses. Seventeen contigs were consistently highly expressed in the resistant A. aequalis plants, including four cytochrome P450 monooxygenase (CytP450 genes, two glutathione S-transferase (GST genes, two glucosyltransferase (GT genes, two ATP-binding cassette (ABC transporter genes, and seven additional contigs with functional annotations related to oxidation, hydrolysis, and plant stress physiology. These 17 contigs could serve as major candidate genes for contributing to metabolic mesosulfuron-methyl resistance; hence
Yu, Jinwen; Liu, Zhiyu; Jiang, Wen; Wang, Guansong; Mao, Chengde
Rational, de novo design of RNA nanostructures can potentially integrate a wide array of structural and functional diversities. Such nanostructures have great promises in biomedical applications. Despite impressive progress in this field, all RNA building blocks (or tiles) reported so far are not geometrically well defined. They are generally flexible and can only assemble into a mixture of complexes with different sizes. To achieve defined structures, multiple tiles with different sequences are needed. In this study, we design an RNA tile that can homo-oligomerize into a uniform RNA nanostructure. The designed RNA nanostructure is characterized by gel electrophoresis, atomic force microscopy and cryogenic electron microscopy imaging. We believe that development along this line would help RNA nanotechnology to reach the structural control that is currently associated with DNA nanotechnology.
Full Text Available Grouper is one of the favorite sea food resources in Southeast Asia. However, the outbreaks of the viral nervous necrosis (VNN disease due to nervous necrosis virus (NNV infection have caused mass mortality of grouper larvae. Many aqua-farms have suffered substantial financial loss due to the occurrence of VNN. To better understand the infection mechanism of NNV, we performed the transcriptome analysis of sevenband grouper brain tissue, the main target of NNV infection. After artificial NNV challenge, transcriptome of brain tissues of sevenband grouper was subjected to next generation sequencing (NGS using an Illumina Hi-seq 2500 system. Both mRNAs from pooled samples of mock and NNV-infected sevenband grouper brains were sequenced. Clean reads of mock and NNV-infected samples were de novo assembled and obtained 104,348 unigenes. In addition, 628 differentially expressed genes (DEGs in response to NNV infection were identified. This result could provide critical information not only for the identification of genes involved in NNV infection, but for the understanding of the response of sevenband groupers to NNV infection.
Full Text Available The Chinese salamander (Hynobius chinensis, an endangered amphibian species of salamander endemic to China, has attracted much attention because of its value of studying paleontology evolutionary history and decreasing population size. Despite increasing interest in the Hynobius chinensis genome, genomic resources for the species are still very limited. A comprehensive transcriptome of Hynobius chinensis, which will provide a resource for genome annotation, candidate genes identification and molecular marker development should be generated to supplement it.We performed a de novo assembly of Hynobius chinensis transcriptome by Illumina sequencing. A total of 148,510 nonredundant unigenes with an average length of approximately 580 bp were obtained. In all, 60,388 (40.66% unigenes showed homologous matches in at least one database and 33,537 (22.58% unigenes were annotated by all four databases. In total, 41,553 unigenes were categorized into 62 sub-categories by BLAST2GO search, and 19,468 transcripts were assigned to 140 KEGG pathways. A large number of unigenes involved in immune system, local adaptation, reproduction and sex determination were identified, as well as 31,982 simple sequence repeats (SSRs and 460,923 putative single nucleotide polymorphisms (SNPs.This dataset represents the first transcriptome analysis of the Chinese salamander (Hynobius chinensis, an endangered species, to be also the first time of hynobiidae. The transcriptome will provide valuable resource for further research in discovery of new genes, protection of population, adaptive evolution and survey of various pathways, as well as development of molecule markers in Chinese salamander; and reference information for closely related species.
A NGS approach to the encrusting Mediterranean sponge Crella elegans (Porifera, Demospongiae, Poecilosclerida): transcriptome sequencing, characterization and overview of the gene expression along three life cycle stages.
Pérez-Porro, A R; Navarro-Gómez, D; Uriz, M J; Giribet, G
Sponges can be dominant organisms in many marine and freshwater habitats where they play essential ecological roles. They also represent a key group to address important questions in early metazoan evolution. Recent approaches for improving knowledge on sponge biological and ecological functions as well as on animal evolution have focused on the genetic toolkits involved in ecological responses to environmental changes (biotic and abiotic), development and reproduction. These approaches are possible thanks to newly available, massive sequencing technologies-such as the Illumina platform, which facilitate genome and transcriptome sequencing in a cost-effective manner. Here we present the first NGS (next-generation sequencing) approach to understanding the life cycle of an encrusting marine sponge. For this we sequenced libraries of three different life cycle stages of the Mediterranean sponge Crella elegans and generated de novo transcriptome assemblies. Three assemblies were based on sponge tissue of a particular life cycle stage, including non-reproductive tissue, tissue with sperm cysts and tissue with larvae. The fourth assembly pooled the data from all three stages. By aggregating data from all the different life cycle stages we obtained a higher total number of contigs, contigs with blast hit and annotated contigs than from one stage-based assemblies. In that multi-stage assembly we obtained a larger number of the developmental regulatory genes known for metazoans than in any other assembly. We also advance the differential expression of selected genes in the three life cycle stages to explore the potential of RNA-seq for improving knowledge on functional processes along the sponge life cycle. © 2013 Blackwell Publishing Ltd.
Full Text Available Angiosperms are renown for their diversity of flower colors. Often considered adaptations to pollinators, the most common underlying pigments, anthocyanins, are also involved in plants' stress response. Although the anthocyanin biosynthetic pathway is well characterized across many angiosperms and is composed of a few candidate genes, the consequences of blocking this pathway and producing white flowers has not been investigated at the transcriptome scale. We take a transcriptome-wide approach to compare expression differences between purple and white petal buds in the arctic mustard, Parrya nudicaulis, to determine which genes' expression are consistently correlated with flower color. Using mRNA-Seq and de novo transcriptome assembly, we assembled an average of 722 bp per gene (49.81% coding sequence based on the A. thaliana homolog for 12,795 genes from the petal buds of a pair of purple and white samples. Our results correlate strongly with qRT-PCR analysis of nine candidate genes in the anthocyanin biosynthetic pathway where chalcone synthase has the greatest difference in expression between color morphs (P/W = ∼7×. Among the most consistently differentially expressed genes between purple and white samples, we found 3× more genes with higher expression in white petals than in purple petals. These include four unknown genes, two drought-response genes (CDSP32, ERD5, a cold-response gene (GR-RBP2, and a pathogen defense gene (DND1. Gene ontology analysis of the top 2% of genes with greater expression in white relative to purple petals revealed enrichment in genes associated with stress responses including cold, drought and pathogen defense. Unlike the uniform downregulation of chalcone synthase that may be directly involved in the loss of petal anthocyanins, the variable expression of several genes with greater expression in white petals suggest that the physiological and ecological consequences of having white petals may be
Hwang, Hwan-Su; Lee, Hyoshin; Choi, Yong Eui
Eleutherococcus senticosus, Siberian ginseng, is a highly valued woody medicinal plant belonging to the family Araliaceae. E. senticosus produces a rich variety of saponins such as oleanane-type, noroleanane-type, 29-hydroxyoleanan-type, and lupane-type saponins. Genomic or transcriptomic approaches have not been used to investigate the saponin biosynthetic pathway in this plant. In this study, de novo sequencing was performed to select candidate genes involved in the saponin biosynthetic pathway. A half-plate 454 pyrosequencing run produced 627,923 high-quality reads with an average sequence length of 422 bases. De novo assembly generated 72,811 unique sequences, including 15,217 contigs and 57,594 singletons. Approximately 48,300 (66.3%) unique sequences were annotated using BLAST similarity searches. All of the mevalonate pathway genes for saponin biosynthesis starting from acetyl-CoA were isolated. Moreover, 206 reads of cytochrome P450 (CYP) and 145 reads of uridine diphosphate glycosyltransferase (UGT) sequences were isolated. Based on methyl jasmonate (MeJA) treatment and real-time PCR (qPCR) analysis, 3 CYPs and 3 UGTs were finally selected as candidate genes involved in the saponin biosynthetic pathway. The identified sequences associated with saponin biosynthesis will facilitate the study of the functional genomics of saponin biosynthesis and genetic engineering of E. senticosus.
Byadgi, Omkar; Chen, Yao-Chung; Barnes, Andrew C; Tsai, Ming-An; Wang, Pei-Chyi; Chen, Shih-Chu
Grey mullet (Mugil cephalus) is an economically important fish species in Taiwan mariculture industry. Moreover, grey mullet are common hosts of a bacterial infection by Lactococcus garvieae. However, until now the information related to the immune system of grey mullet is unclear. Therefore, to understand the molecular basis underlying the host immune response to L. garvieae infection, Illumina HiSeq™ 2000 was used to analyse the head kidney and spleen transcriptome of infected grey mullet. De novo assembly of paired-end reads yielded 55,203 unigenes. Comparative analysis of the expression profiles between bacterial challenge fish and control fish identified a total of 7192 from head kidney and 7280 in spleen differentially expressed genes (P grey mullet to Lactococcus garvieae, carrying out detailed functional analysis of these genes and developing strategies for efficient immune protection against infections in grey mullet. Crown Copyright © 2016. Published by Elsevier Ltd. All rights reserved.
Ruttink, Tom; Roldán-Ruiz, Isabel; Asp, Torben
To advance the application of molecular breeding in Lolium perenne, we have generated a sequence resource to facilitate gene discovery and SNP marker development. Illumina GAII transcriptome sequencing was performed on meristem-enriched samples of 14 Lolium genotypes. De novo assemblies for indiv......To advance the application of molecular breeding in Lolium perenne, we have generated a sequence resource to facilitate gene discovery and SNP marker development. Illumina GAII transcriptome sequencing was performed on meristem-enriched samples of 14 Lolium genotypes. De novo assemblies...... of SNP markers in selected candidate genes. In parallel, a germplasm collection of 602 Lolium genotypes was established and is being phenotyped for plant architecture, reproductive characteristics, flowering time, and forage quality traits. We will test through association genetics whether phenotypic...
De novo assembly and characterization of the leaf, bud, and fruit transcriptome from the vulnerable tree Juglans mandshurica for the development of 20 new microsatellite markers using Illumina sequencing
Zhuang Hu; Tian Zhang; Xiao-Xiao Gao; Yang Wang; Qiang Zhang; Hui-Juan Zhou; Gui-Fang Zhao; Ma-Li Wang; Keith E. Woeste; Peng Zhao
Manchurian walnut (Juglans mandshurica Maxim.) is a vulnerable, temperate deciduous tree valued for its wood and nut, but transcriptomic and genomic data for the species are very limited. Next generation sequencing (NGS) has made it possible to develop molecular markers for this species rapidly and efficiently. Our goal is to use transcriptome...
Garcia de la serrana Daniel
Full Text Available Abstract Background The gilthead sea bream (Sparus aurata L. occurs around the Mediterranean and along Eastern Atlantic coasts from Great Britain to Senegal. It is tolerant of a wide range of temperatures and salinities and is often found in brackish coastal lagoons and estuarine areas, particularly early in its life cycle. Gilthead sea bream are extensively cultivated in the Mediterranean with an annual production of 125,000 metric tonnes. Here we present a de novo assembly of the fast skeletal muscle transcriptome of gilthead sea bream using 454 reads and identify gene paralogues, splice variants and microsatellite repeats. An annotated transcriptome of the skeletal muscle will facilitate understanding of the genetic and molecular basis of traits linked to production in this economically important species. Results Around 2.7 million reads of mRNA sequence data were generated from the fast myotomal of adult fish (~2 kg and juvenile fish (~0.09 kg that had been either fed to satiation, fasted for 3-5d or transferred to low (11°C or high (33°C temperatures for 3-5d. Newbler v2.5 assembly resulted in 43,461 isotigs >100 bp. The number of sequences annotated by searching protein and gene ontology databases was 10,465. The average coverage of the annotated isotigs was x40 containing 5655 unique gene IDs and 785 full-length cDNAs coding for proteins containing 58–1536 amino acids. The v2.5 assembly was found to be of good quality based on validation using 200 full-length cDNAs from GenBank. Annotated isotigs from the reference transcriptome were attributable to 344 KEGG pathway maps. We identified 26 gene paralogues (20 of them teleost-specific and 43 splice variants, of which 12 had functional domains missing that were likely to affect their biological function. Many key transcription factors, signaling molecules and structural proteins necessary for myogenesis and muscle growth have been identified. Physiological status affected the
PacBio long-read sequencing technology is increasingly popular in genome sequence assembly and transcriptome cataloguing. Recently, a new-generation pig reference genome was assembled based on long reads from this technology. To finely annotate this genome assembly, transcriptomes of nine tissues fr...
Sveinsson, Saemundur; McDill, Joshua; Wong, Gane K S; Li, Juanjuan; Li, Xia; Deyholos, Michael K; Cronk, Quentin C B
Cultivated flax (Linum usitatissimum) is known to have undergone a whole-genome duplication around 5-9 million years ago. The aim of this study was to investigate whether other whole-genome duplication events have occurred in the evolutionary history of cultivated flax. Knowledge of such whole-genome duplications will be important in understanding the biology and genomics of cultivated flax. Transcriptomes of 11 Linum species were sequenced using the Illumina platform. The short reads were assembled de novo and the DupPipe pipeline was used to look for signatures of polyploidy events from the age distribution of paralogues. In addition, phylogenies of all paralogues were assembled within an estimated age window of interest. These phylogenies were assessed for evidence of a paleopolyploidy event within the genus Linum. A previously unknown paleopolyploidy event that occurred 20-40 million years ago was discovered and shown to be specific to a clade within Linum containing cultivated flax (L. usitatissimum) and other mainly blue-flowered species. The finding was supported by two lines of evidence. First, a significant change of slope (peak) was shown in the age distribution of paralogues that was phylogenetically restricted to, and ubiquitous in, this clade. Second, a large number of paralogue phylogenies were retrieved that are consistent with a polyploidy event occurring within that clade. The results show the utility of multi-species transcriptomics for detecting whole-genome duplication events and demonstrate that that multiple rounds of polyploidy have been important in shaping the evolutionary history of flax. Understanding and characterizing these whole-genome duplication events will be important for future Linum research.
Meng, Jintao; Seo, Sangmin; Balaji, Pavan; Wei, Yanjie; Wang, Bingqiang; Feng, Shengzhong
In this paper, we analyze and optimize the most time-consuming steps of the SWAP-Assembler, a parallel genome assembler, so that it can scale to a large number of cores for huge genomes with the size of sequencing data ranging from terabyes to petabytes. According to the performance analysis results, the most time-consuming steps are input parallelization, k-mer graph construction, and graph simplification (edge merging). For the input parallelization, the input data is divided into virtual fragments with nearly equal size, and the start position and end position of each fragment are automatically separated at the beginning of the reads. In k-mer graph construction, in order to improve the communication efficiency, the message size is kept constant between any two processes by proportionally increasing the number of nucleotides to the number of processes in the input parallelization step for each round. The memory usage is also decreased because only a small part of the input data is processed in each round. With graph simplification, the communication protocol reduces the number of communication loops from four to two loops and decreases the idle communication time. The optimized assembler is denoted as SWAP-Assembler 2 (SWAP2). In our experiments using a 1000 Genomes project dataset of 4 terabytes (the largest dataset ever used for assembling) on the supercomputer Mira, the results show that SWAP2 scales to 131,072 cores with an efficiency of 40%. We also compared our work with both the HipMER assembler and the SWAP-Assembler. On the Yanhuang dataset of 300 gigabytes, SWAP2 shows a 3X speedup and 4X better scalability compared with the HipMer assembler and is 45 times faster than the SWAP-Assembler. The SWAP2 software is available at https://sourceforge.net/projects/swapassembler.
Yang, Minglei; Wu, Ying; Jin, Shan; Hou, Jinyan; Mao, Yingji; Liu, Wenbo; Shen, Yangcheng; Wu, Lifang
Sapium sebiferum (Linn.) Roxb. (Chinese Tallow Tree) is a perennial woody tree and its seeds are rich in oil which hold great potential for biodiesel production. Despite a traditional woody oil plant, our understanding on S. sebiferum genetics and molecular biology remains scant. In this study, the first comprehensive transcriptome of S. sebiferum flower has been generated by sequencing and de novo assembly. A total of 149,342 unigenes were generated from raw reads, of which 24,289 unigenes were successfully matched to public database. A total of 61 MADS box genes and putative pathways involved in S. sebiferum flower development have been identified. Abiotic stress response network was also constructed in this work, where 2,686 unigenes are involved in the pathway. As for lipid biosynthesis, 161 unigenes have been identified in fatty acid (FA) and triacylglycerol (TAG) biosynthesis. Besides, the G-Quadruplexes in RNA of S. sebiferum also have been predicted. An interesting finding is that the stress-induced flowering was observed in S. sebiferum for the first time. According to the results of semi-quantitative PCR, expression tendencies of flowering-related genes, GA1, AP2 and CRY2, accorded with stress-related genes, such as GRX50435 and PRXⅡ39562. This transcriptome provides functional genomic information for further research of S. sebiferum, especially for the genetic engineering to shorten the juvenile period and improve yield by regulating flower development. It also offers a useful database for the research of other Euphorbiaceae family plants.
Nam, Bo-Hye; Jung, Myunghee; Subramaniyam, Sathiyamoorthy; Yoo, Seung-il; Markkandan, Kesavan; Moon, Ji-Young; Kim, Young-Ok; Kim, Dong-Gyun; An, Cheul Min; Shin, Younhee; Jung, Ho-jin; Park, Jun-hyung
Abalone (Haliotis discus hannai) is one of the most valuable marine aquatic species in Korea, Japan and China. Tremendous exposure to bacterial infection is common in aquaculture environment, especially by Vibrio sp. infections. It's therefore necessary and urgent to understand the mechanism of H. discus hannai host defense against Vibrio parahemolyticus infection. However studies on its immune system are hindered by the lack of genomic resources. In the present study, we sequenced the transcriptome of control and bacterial challenged H. discus hannai tissues. Totally, 138 MB of reference transcriptome were obtained from de novo assembly of 34 GB clean bases from ten different libraries and annotated with the biological terms (GO and KEGG). A total of 10,575 transcripts exhibiting the differentially expression at least one pair of comparison and the functional annotations highlight genes related to immune response, cell adhesion, immune regulators, redox molecules and mitochondrial coding genes. Mostly, these groups of genes were dominated in hemocytes compared to other tissues. This work is a prerequisite for the identification of those physiological traits controlling H. discus hannai ability to survive against Vibrio infection.
Nam, Bo-Hye; Jung, Myunghee; Subramaniyam, Sathiyamoorthy; Yoo, Seung-il; Markkandan, Kesavan; Moon, Ji-Young; Kim, Young-Ok; Kim, Dong-Gyun; An, Cheul Min; Shin, Younhee; Jung, Ho-jin; Park, Jun-hyung
Abalone (Haliotis discus hannai) is one of the most valuable marine aquatic species in Korea, Japan and China. Tremendous exposure to bacterial infection is common in aquaculture environment, especially by Vibrio sp. infections. It’s therefore necessary and urgent to understand the mechanism of H. discus hannai host defense against Vibrio parahemolyticus infection. However studies on its immune system are hindered by the lack of genomic resources. In the present study, we sequenced the transcriptome of control and bacterial challenged H. discus hannai tissues. Totally, 138 MB of reference transcriptome were obtained from de novo assembly of 34 GB clean bases from ten different libraries and annotated with the biological terms (GO and KEGG). A total of 10,575 transcripts exhibiting the differentially expression at least one pair of comparison and the functional annotations highlight genes related to immune response, cell adhesion, immune regulators, redox molecules and mitochondrial coding genes. Mostly, these groups of genes were dominated in hemocytes compared to other tissues. This work is a prerequisite for the identification of those physiological traits controlling H. discus hannai ability to survive against Vibrio infection. PMID:27088873
Full Text Available Abalone (Haliotis discus hannai is one of the most valuable marine aquatic species in Korea, Japan and China. Tremendous exposure to bacterial infection is common in aquaculture environment, especially by Vibrio sp. infections. It's therefore necessary and urgent to understand the mechanism of H. discus hannai host defense against Vibrio parahemolyticus infection. However studies on its immune system are hindered by the lack of genomic resources. In the present study, we sequenced the transcriptome of control and bacterial challenged H. discus hannai tissues. Totally, 138 MB of reference transcriptome were obtained from de novo assembly of 34 GB clean bases from ten different libraries and annotated with the biological terms (GO and KEGG. A total of 10,575 transcripts exhibiting the differentially expression at least one pair of comparison and the functional annotations highlight genes related to immune response, cell adhesion, immune regulators, redox molecules and mitochondrial coding genes. Mostly, these groups of genes were dominated in hemocytes compared to other tissues. This work is a prerequisite for the identification of those physiological traits controlling H. discus hannai ability to survive against Vibrio infection.
Puente-Marin, Sara; Nombela, Iván; Ciordia, Sergio; Mena, María Carmen; Chico, Verónica; Coll, Julio; Ortega-Villaizan, María Del Mar
Nucleated red blood cells (RBCs) of fish have, in the last decade, been implicated in several immune-related functions, such as antiviral response, phagocytosis or cytokine-mediated signaling. RNA-sequencing (RNA-seq) and label-free shotgun proteomic analyses were carried out for in silico functional pathway profiling of rainbow trout RBCs. For RNA-seq, a de novo assembly was conducted, in order to create a transcriptome database for RBCs. For proteome profiling, we developed a proteomic method that combined: (a) fractionation into cytosolic and membrane fractions, (b) hemoglobin removal of the cytosolic fraction, (c) protein digestion, and (d) a novel step with pH reversed-phase peptide fractionation and final Liquid Chromatography Electrospray Ionization Tandem Mass Spectrometric (LC ESI-MS/MS) analysis of each fraction. Combined transcriptome- and proteome- sequencing data identified, in silico, novel and striking immune functional networks for rainbow trout nucleated RBCs, which are mainly linked to innate and adaptive immunity. Functional pathways related to regulation of hematopoietic cell differentiation, antigen presentation via major histocompatibility complex class II (MHCII), leukocyte differentiation and regulation of leukocyte activation were identified. These preliminary findings further implicate nucleated RBCs in immune function, such as antigen presentation and leukocyte activation.
Full Text Available BACKGROUND: Agrocybe aegerita, the black poplar mushroom, has been highly valued as a functional food for its medicinal and nutritional benefits. Several bioactive extracts from A. aegerita have been found to exhibit antitumor and antioxidant activities. However, limited genetic resources for A. aegerita have hindered exploration of this species. METHODOLOGY/PRINCIPAL FINDINGS: To facilitate the research on A. aegerita, we established a deep survey of the transcriptome and proteome of this mushroom. We applied high-throughput sequencing technology (Illumina to sequence A. aegerita transcriptomes from mycelium and fruiting body. The raw clean reads were de novo assembled into a total of 36,134 expressed sequences tags (ESTs with an average length of 663 bp. These ESTs were annotated and classified according to Gene Ontology (GO, Clusters of Orthologous Groups (COG, and Kyoto Encyclopedia of Genes and Genomes (KEGG metabolic pathways. Gene expression profile analysis showed that 18,474 ESTs were differentially expressed, with 10,131 up-regulated in mycelium and 8,343 up-regulated in fruiting body. Putative genes involved in polysaccharide and steroid biosynthesis were identified from A. aegerita transcriptome, and these genes were differentially expressed at the two stages of A. aegerita. Based on one-dimensional gel electrophoresis (1-DGE coupled with electrospray ionization liquid chromatography tandem MS (LC-ESI-MS/MS, we identified a total of 309 non-redundant proteins. And many metabolic enzymes involved in glycolysis were identified in the protein database. CONCLUSIONS/SIGNIFICANCE: This is the first study on transcriptome and proteome analyses of A. aegerita. The data in this study serve as a resource of A. aegerita transcripts and proteins, and offer clues to the applications of this mushroom in nutrition, pharmacy and industry.
Fonseca, Fernando Campos de Assis; Firmino, Alexandre Augusto Pereira; de Macedo, Leonardo Lima Pepino; Coelho, Roberta Ramos; de Souza Júnior, José Dijair Antonino; de Sousa Júnior, José Dijair Antonino; Silva-Junior, Orzenil Bonfim; Togawa, Roberto Coiti; Pappas, Georgios Joannis; de Góis, Luiz Avelar Brandão; da Silva, Maria Cristina Mattar; Grossi-de-Sá, Maria Fátima
Sugarcane is a widely cultivated plant that serves primarily as a source of sugar and ethanol. Its annual yield can be significantly reduced by the action of several insect pests including the sugarcane giant borer (Telchin licus licus), a lepidopteran that presents a long life cycle and which efforts to control it using pesticides have been inefficient. Although its economical relevance, only a few DNA sequences are available for this species in the GenBank. Pyrosequencing technology was used to investigate the transcriptome of several developmental stages of the insect. To maximize transcript diversity, a pool of total RNA was extracted from whole body insects and used to construct a normalized cDNA database. Sequencing produced over 650,000 reads, which were de novo assembled to generate a reference library of 23,824 contigs. After quality score and annotation, 43% of the contigs had at least one BLAST hit against the NCBI non-redundant database, and 40% showed similarities with the lepidopteran Bombyx mori. In a further analysis, we conducted a comparison with Manduca sexta midgut sequences to identify transcripts of genes involved in digestion. Of these transcripts, many presented an expansion or depletion in gene number, compared to B. mori genome. From the sugarcane giant borer (SGB) transcriptome, a number of aminopeptidase N (APN) cDNAs were characterized based on homology to those reported as Cry toxin receptors. This is the first report that provides a large-scale EST database for the species. Transcriptome analysis will certainly be useful to identify novel developmental genes, to better understand the insect's biology and to guide the development of new strategies for insect-pest control.
Fernando Campos de Assis Fonseca
Full Text Available Sugarcane is a widely cultivated plant that serves primarily as a source of sugar and ethanol. Its annual yield can be significantly reduced by the action of several insect pests including the sugarcane giant borer (Telchin licus licus, a lepidopteran that presents a long life cycle and which efforts to control it using pesticides have been inefficient. Although its economical relevance, only a few DNA sequences are available for this species in the GenBank. Pyrosequencing technology was used to investigate the transcriptome of several developmental stages of the insect. To maximize transcript diversity, a pool of total RNA was extracted from whole body insects and used to construct a normalized cDNA database. Sequencing produced over 650,000 reads, which were de novo assembled to generate a reference library of 23,824 contigs. After quality score and annotation, 43% of the contigs had at least one BLAST hit against the NCBI non-redundant database, and 40% showed similarities with the lepidopteran Bombyx mori. In a further analysis, we conducted a comparison with Manduca sexta midgut sequences to identify transcripts of genes involved in digestion. Of these transcripts, many presented an expansion or depletion in gene number, compared to B. mori genome. From the sugarcane giant borer (SGB transcriptome, a number of aminopeptidase N (APN cDNAs were characterized based on homology to those reported as Cry toxin receptors. This is the first report that provides a large-scale EST database for the species. Transcriptome analysis will certainly be useful to identify novel developmental genes, to better understand the insect's biology and to guide the development of new strategies for insect-pest control.
Gregor, Anne; Oti, Martin; Kouwenhoven, Evelyn N
An increasing number of genes involved in chromatin structure and epigenetic regulation has been implicated in a variety of developmental disorders, often including intellectual disability. By trio exome sequencing and subsequent mutational screening we now identified two de novo frameshift...... mutations and one de novo missense mutation in CTCF in individuals with intellectual disability, microcephaly, and growth retardation. Furthermore, an individual with a larger deletion including CTCF was identified. CTCF (CCCTC-binding factor) is one of the most important chromatin organizers in vertebrates...... and is involved in various chromatin regulation processes such as higher order of chromatin organization, enhancer function, and maintenance of three-dimensional chromatin structure. Transcriptome analyses in all three individuals with point mutations revealed deregulation of genes involved in signal transduction...
Jo, Jihoon; Park, Jongsun; Lee, Hyun-Gwan; Kern, Elizabeth M A; Cheon, Seongmin; Jin, Soyeong; Park, Joong-Ki; Cho, Sung-Jin; Park, Chungoo
The sea cucumber Apostichopus japonicus Selenka 1867 represents an important resource in biomedical research, traditional medicine, and the seafood industry. Much of the commercial value of A. japonicus is determined by dorsal/ventral color variation (red, green, and black), yet the taxonomic relationships between these color variants are not clearly understood. We performed the first comparative analysis of de novo assembled transcriptome data from three color variants of A. japonicus. Using the Illumina platform, we sequenced nearly 177,596,774 clean reads representing a total of 18.2Gbp of sea cucumber transcriptome. A comparison of over 0.3 million transcript scaffolds against the Uniprot/Swiss-Prot database yielded 8513, 8602, and 8588 positive matches for green, red, and black body color transcriptomes, respectively. Using the Panther gene classification system, we assessed an extensive and diverse set of expressed genes in three color variants and found that (1) among the three color variants of A. japonicus, genes associated with RNA binding protein, oxidoreductase, nucleic acid binding, transferase, and KRAB box transcription factor were most commonly expressed; and (2) the main protein functional classes are differently regulated in all three color variants (extracellular matrix protein and phosphatase for green color, transporter and potassium channel for red color, and G-protein modulator and enzyme modulator for black color). This work will assist in the discovery and annotation of novel genes that play significant morphological and physiological roles in color variants of A. japonicus, and these sequence data will provide a useful set of resources for the rapidly growing sea cucumber aquaculture industry. Copyright © 2016 Elsevier B.V. All rights reserved.
Full Text Available BACKGROUND: The insect predator, Arma chinensis, is capable of effectively controlling many pests, such as Colorado potato beetle, cotton bollworm, and mirid bugs. Our previous study demonstrated several life history parameters were diminished for A. chinensis reared on an artificial diet compared to a natural food source like the Chinese oak silk moth pupae. The molecular mechanisms underlying the nutritive impact of the artificial diet on A. chinensis health are unclear. So we utilized transcriptome information to better understand the impact of the artificial diet on A. chinensis at the molecular level. METHODOLOGY/PRINCIPAL FINDINGS: Illumina HiSeq2000 was used to sequence 4.79 and 4.70 Gb of the transcriptome from pupae-fed and artificial diet-fed A. chinensis libraries, respectively, and a de novo transcriptome assembly was performed (Trinity short read assembler. This resulted in 112,029 and 98,724 contigs, clustered into 54,083 and 54,169 unigenes for pupae-fed and diet-fed A. chinensis, respectively. Unigenes from each sample's assembly underwent sequence splicing and redundancy removal to acquire non-redundant unigenes. We obtained 55,189 unigenes of A. chinensis, including 12,046 distinct clusters and 43,143 distinct singletons. Unigene sequences were aligned by BLASTx to nr, Swiss-Prot, KEGG and COG (E-value <10(-5, and further aligned by BLASTn to nt (E-value <10(-5, retrieving proteins of highest sequence similarity with the given unigenes along with their protein functional annotations. Totally, 22,964, 7,898, 18,069, 15,416, 8,066 and 5,341 unigenes were annotated in nr, nt, Swiss-Prot, KEGG, COG and GO, respectively. We compared gene expression variations and found thousands of genes were differentially expressed between pupae-fed and diet-fed A. chinensis. CONCLUSIONS/SIGNIFICANCE: Our study provides abundant genomic data and offers comprehensive sequence information for studying A. chinensis. Additionally, the physiological
Full Text Available The passion fruit (Passiflora edulis Sims, also known as the purple granadilla, is widely cultivated as the new darling of the fruit market throughout southern China. This exotic and perennial climber is adapted to warm and humid climates, and thus is generally intolerant of cold. There is limited information about gene regulation and signaling pathways related to the cold stress response in this species. In this study, two transcriptome libraries (KEDU_AP vs. GX_AP were constructed from the aerial parts of cold-tolerant and cold-susceptible varieties of P. edulis, respectively. Overall, 126,284,018 clean reads were obtained, and 86,880 unigenes with a mean size of 1449 bp were assembled. Of these, there were 64,067 (73.74% unigenes with significant similarity to publicly available plant protein sequences. Expression profiles were generated, and 3045 genes were found to be significantly differentially expressed between the KEDU_AP and GX_AP libraries, including 1075 (35.3% up-regulated and 1970 (64.7% down-regulated. These included 36 genes in enriched pathways of plant hormone signal transduction, and 56 genes encoding putative transcription factors. Six genes involved in the ICE1–CBF–COR pathway were induced in the cold-tolerant variety, and their expression levels were further verified using quantitative real-time PCR. This report is the first to identify genes and signaling pathways involved in cold tolerance using high-throughput transcriptome sequencing in P. edulis. These findings may provide useful insights into the molecular mechanisms regulating cold tolerance and genetic breeding in Passiflora spp.
Yao, Zhaoqun; Tian, Fang; Cao, Xiaolei; Xu, Ying; Chen, Meixiu; Xiang, Benchun; Zhao, Sifeng
Phelipanche aegyptiaca is one of the most destructive root parasitic plants of Orobanchaceae. This plant has significant impacts on crop yields worldwide. Conditioned and host root stimulants, in particular, strigolactones, are needed for unique seed germination. However, no extensive study on this phenomenon has been conducted because of insufficient genomic information. Deep RNA sequencing, including de novo assembly and functional annotation was performed on P. aegyptiaca germinating seeds. The assembled transcriptome was used to analyze transcriptional dynamics during seed germination. Key gene categories involved were identified. A total of 274,964 transcripts were determined, and 53,921 unigenes were annotated according to the NR, GO, COG, KOG, and KEGG databases. Overall, 5324 differentially expressed genes among dormant, conditioned, and GR24-treated seeds were identified. GO and KEGG enrichment analyses demonstrated numerous DEGs related to DNA, RNA, and protein repair and biosynthesis, as well as carbohydrate and energy metabolism. Moreover, ABA and ethylene were found to play important roles in this process. GR24 application resulted in dramatic changes in ABA and ethylene-associated genes. Fluridone, a carotenoid biosynthesis inhibitor, alone could induce P. aegyptiaca seed germination. In addition, conditioning was probably not the indispensable stage for P. aegyptiaca, because the transcript level variation of MAX2 and KAI2 genes (relate to strigolactone signaling) was not up-regulated by conditioning treatment.
Full Text Available Phelipanche aegyptiaca is one of the most destructive root parasitic plants of Orobanchaceae. This plant has significant impacts on crop yields worldwide. Conditioned and host root stimulants, in particular, strigolactones, are needed for unique seed germination. However, no extensive study on this phenomenon has been conducted because of insufficient genomic information. Deep RNA sequencing, including de novo assembly and functional annotation was performed on P. aegyptiaca germinating seeds. The assembled transcriptome was used to analyze transcriptional dynamics during seed germination. Key gene categories involved were identified. A total of 274,964 transcripts were determined, and 53,921 unigenes were annotated according to the NR, GO, COG, KOG, and KEGG databases. Overall, 5324 differentially expressed genes among dormant, conditioned, and GR24-treated seeds were identified. GO and KEGG enrichment analyses demonstrated numerous DEGs related to DNA, RNA, and protein repair and biosynthesis, as well as carbohydrate and energy metabolism. Moreover, ABA and ethylene were found to play important roles in this process. GR24 application resulted in dramatic changes in ABA and ethylene-associated genes. Fluridone, a carotenoid biosynthesis inhibitor, alone could induce P. aegyptiaca seed germination. In addition, conditioning was probably not the indispensable stage for P. aegyptiaca, because the transcript level variation of MAX2 and KAI2 genes (relate to strigolactone signaling was not up-regulated by conditioning treatment.
Taghavi, Zeinab; Movahedi, Narjes S; Draghici, Sorin; Chitsaz, Hamidreza
Identification of every single genome present in a microbial sample is an important and challenging task with crucial applications. It is challenging because there are typically millions of cells in a microbial sample, the vast majority of which elude cultivation. The most accurate method to date is exhaustive single-cell sequencing using multiple displacement amplification, which is simply intractable for a large number of cells. However, there is hope for breaking this barrier, as the number of different cell types with distinct genome sequences is usually much smaller than the number of cells. Here, we present a novel divide and conquer method to sequence and de novo assemble all distinct genomes present in a microbial sample with a sequencing cost and computational complexity proportional to the number of genome types, rather than the number of cells. The method is implemented in a tool called Squeezambler. We evaluated Squeezambler on simulated data. The proposed divide and conquer method successfully reduces the cost of sequencing in comparison with the naïve exhaustive approach. Squeezambler and datasets are available at http://compbio.cs.wayne.edu/software/squeezambler/.
Full Text Available Abstract Background Worldwide, the genus Haliotis is represented by 56 extant species and several of these are commercially cultured. Among the six abalone species found in South Africa, Haliotis midae is the only aquacultured species. Despite its economic importance, genomic sequence resources for H. midae, and for abalone in general, are still scarce. Next generation sequencing technologies provide a fast and efficient tool to generate large sequence collections that can be used to characterize the transcriptome and identify expressed genes associated with economically important traits like growth and disease resistance. Results More than 25 million short reads generated by the Illumina Genome Analyzer were de novo assembled in 22,761 contigs with an average size of 260 bp. With a stringent E-value threshold of 10-10, 3,841 contigs (16.8% had a BLAST homologous match against the Genbank non-redundant (NR protein database. Most of these sequences were annotated using the gene ontology (GO and eukaryotic orthologous groups of proteins (KOG databases and assigned to various functional categories. According to annotation results, many gene families involved in immune response were identified. Thousands of simple sequence repeats (SSR and single nucleotide polymorphisms (SNP were detected. Setting stringent parameters to ensure a high probability of amplification, 420 primer pairs in 181 contigs containing SSR loci were designed. Conclusion This data represents the most comprehensive genomic resource for the South African abalone H. midae to date. The amount of assembled sequences demonstrated the utility of the Illumina sequencing technology in the transcriptome characterization of a non-model species. It allowed the development of several markers and the identification of promising candidate genes for future studies on population and functional genomics in H. midae and in other abalone species.
Jouraku, Akiya; Yamamoto, Kimiko; Kuwazaki, Seigo; Urio, Masahiro; Suetsugu, Yoshitaka; Narukawa, Junko; Miyamoto, Kazuhisa; Kurita, Kanako; Kanamori, Hiroyuki; Katayose, Yuichi; Matsumoto, Takashi; Noda, Hiroaki
The diamondback moth (DBM), Plutella xylostella, is one of the most harmful insect pests for crucifer crops worldwide. DBM has rapidly evolved high resistance to most conventional insecticides such as pyrethroids, organophosphates, fipronil, spinosad, Bacillus thuringiensis, and diamides. Therefore, it is important to develop genomic and transcriptomic DBM resources for analysis of genes related to insecticide resistance, both to clarify the mechanism of resistance of DBM and to facilitate the development of insecticides with a novel mode of action for more effective and environmentally less harmful insecticide rotation. To contribute to this goal, we developed KONAGAbase, a genomic and transcriptomic database for DBM (KONAGA is the Japanese word for DBM). KONAGAbase provides (1) transcriptomic sequences of 37,340 ESTs/mRNAs and 147,370 RNA-seq contigs which were clustered and assembled into 84,570 unigenes (30,695 contigs, 50,548 pseudo singletons, and 3,327 singletons); and (2) genomic sequences of 88,530 WGS contigs with 246,244 degenerate contigs and 106,455 singletons from which 6,310 de novo identified repeat sequences and 34,890 predicted gene-coding sequences were extracted. The unigenes and predicted gene-coding sequences were clustered and 32,800 representative sequences were extracted as a comprehensive putative gene set. These sequences were annotated with BLAST descriptions, Gene Ontology (GO) terms, and Pfam descriptions, respectively. KONAGAbase contains rich graphical user interface (GUI)-based web interfaces for easy and efficient searching, browsing, and downloading sequences and annotation data. Five useful search interfaces consisting of BLAST search, keyword search, BLAST result-based search, GO tree-based search, and genome browser are provided. KONAGAbase is publicly available from our website (http://dbm.dna.affrc.go.jp/px/) through standard web browsers. KONAGAbase provides DBM comprehensive transcriptomic and draft genomic sequences with
Full Text Available The testis is a highly specialized tissue that plays dual roles in ensuring fertility by producing spermatozoa and hormones. Spermatogenesis is a complex process, resulting in the production of mature sperm from primordial germ cells. Significant structural and biochemical changes take place in the seminiferous epithelium of the adult testis during spermatogenesis. The gene expression pattern of testis in Chinese mitten crab (Eriocheir sinensis has not been extensively studied, and limited genetic research has been performed on this species. The advent of high-throughput sequencing technologies enables the generation of genomic resources within a short period of time and at minimal cost. In the present study, we performed de novo transcriptome sequencing to produce a comprehensive transcript dataset for testis of E. sinensis. In two runs, we produced 25,698,778 sequencing reads corresponding with 2.31 Gb total nucleotides. These reads were assembled into 342,753 contigs or 141,861 scaffold sequences, which identified 96,311 unigenes. Based on similarity searches with known proteins, 39,995 unigenes were annotated based on having a Blast hit in the non-redundant database or ESTscan results with a cut-off E-value above 10(-5. This is the first report of a mitten crab transcriptome using high-throughput sequencing technology, and all these testes transcripts can help us understand the molecular mechanisms involved in spermatogenesis and testis maturation.
Wang, Jianghong; Zhou, Xiang; Guo, Kai; Zhang, Xinqi; Lin, Haiping; Montalva, Cristian
Conidiobolus obscurus is a widespread fungal entomopathogen with aphid biocontrol potential. This study focused on a de novo transcriptomic analysis of C. obscurus. A number of pathogenicity-associated factors were annotated for the first time from the assembled 17 231 fungal unigenes, including those encoding subtilisin-like proteolytic enzymes (Pr1s), trypsin-like proteases, metalloproteases, carboxypeptidases and endochitinases. Many of these genes were transcriptionally up-regulated by at least twofold in mycotized cadavers compared with the in vitro fungal cultures. The resultant transcriptomic database was validated by the transcript levels of three selected pathogenicity-related genes quantified from different in vivo and in vitro material in real-time quantitative polymerase chain reaction (PCR). The involvement of multiple Pr1 proteases in the first stage of fungal infection was also suggested. Interestingly, a unique cytolytic (Cyt)-like δ-endotoxin gene was highly expressed in both mycotized cadavers and fungal cultures, and was more or less distinct from its homologues in bacteria and other fungi. Our findings provide the first global insight into various pathogenicity-related genes in this obligate aphid pathogen and may help to develop novel biocontrol strategy against aphid pests. © 2018 Society of Chemical Industry. © 2018 Society of Chemical Industry.
Zhao, Ying-Jun; Zeng, Yan; Chen, Lei; Dong, Yang; Wang, Wen
As an ancient arthropod with a history of 390 million years, spiders evolved numerous morphological forms resulting from adaptation to different environments. The venom and silk of spiders, which have promising commercial applications in agriculture, medicine and engineering fields, are of special interests to researchers. However, little is known about their genomic components, which hinders not only understanding spider biology but also utilizing their valuable genes. Here we report on deep sequenced and de novo assembled transcriptomes of three orb-web spider species, Gasteracantha arcuata, Nasoonaria sinensis and Gasteracantha hasselti which are distributed in tropical forests of south China. With Illumina paired-end RNA-seq technology, 54 871, 101 855 and 75 455 unigenes for the three spider species were obtained, respectively, among which 9 300, 10 001 and 10 494 unique genes are annotated, respectively. From these annotated unigenes, we comprehensively analyzed silk and toxin gene components and structures for the three spider species. Our study provides valuable transcriptome data for three spider species which previously lacked any genetic/genomic data. The results have laid the first fundamental genomic basis for exploiting gene resources from these spiders. © 2013 Institute of Zoology, Chinese Academy of Sciences.
Kira C.M. Neller
Full Text Available The American pokeweed plant, Phytolacca americana, is recognized for synthesizing pokeweed antiviral protein (PAP, a ribosome inactivating protein (RIP that inhibits the replication of several plant and animal viruses. The plant is also a heavy metal accumulator with applications in soil remediation. However, little is known about pokeweed stress responses, as large-scale sequencing projects have not been performed for this species. Here, we sequenced the mRNA transcriptome of pokeweed in the presence and absence of jasmonic acid (JA, a hormone mediating plant defense. Trinity-based de novo assembly of mRNA from leaf tissue and BLASTx homology searches against public sequence databases resulted in the annotation of 59 096 transcripts. Differential expression analysis identified JA-responsive genes that may be involved in defense against pathogen infection and herbivory. We confirmed the existence of several PAP isoforms and cloned a potentially novel isoform of PAP. Expression analysis indicated that PAP isoforms are differentially responsive to JA, perhaps indicating specialized roles within the plant. Finally, we identified 52 305 natural antisense transcript pairs, four of which comprised PAP isoforms, suggesting a novel form of RIP gene regulation. This transcriptome-wide study of a Phytolaccaceae family member provides a source of new genes that may be involved in stress tolerance in this plant. The sequences generated in our study have been deposited in the SRA database under project # SRP069141.