WorldWideScience

Sample records for novo transcriptome assembly

  1. De novo transcriptome assembly of shrimp Palaemon serratus

    Directory of Open Access Journals (Sweden)

    Alejandra Perina

    2017-03-01

    Full Text Available The shrimp Palaemon serratus is a coastal decapod crustacean with a high commercial value. It is harvested for human consumption. In this study, we used Illumina sequencing technology (HiSeq 2000) to sequence, assemble and annotate the transcriptome of P. serratus. RNA was isolated from muscle of adult individuals and from a pool of larvae. A total of 4 cDNA libraries were constructed using the TruSeq RNA Sample Preparation Kit v2. The raw data from this study were deposited in the NCBI SRA database under study accession number SRP090769. The data were subjected to de novo transcriptome assembly using Trinity software, and coding regions were predicted by TransDecoder. We used Blastp and Sma3s to annotate the identified proteins. The transcriptome data could provide some insight into the genes involved in larval development and metamorphosis.

  2. De novo transcriptome assembly of Setaria italica variety Taejin

    Directory of Open Access Journals (Sweden)

    Yeonhwa Jo

    2016-06-01

    Full Text Available Foxtail millet (Setaria italica), belonging to the family Poaceae, is an important millet that is widely cultivated in East Asia. Of the cultivated millets, foxtail millet has the longest history and is one of the main food crops in South India and China. Moreover, foxtail millet is a model plant system for biofuel generation utilizing the C4 photosynthetic pathway. In this study, we carried out de novo transcriptome assembly for the foxtail millet variety Taejin collected from Korea using next-generation sequencing. We obtained a total of 8.676 GB of raw data by paired-end sequencing. The raw data from this study are available in the NCBI SRA database under accession number SRR3406552. The Trinity program was used to de novo assemble 145,332 transcripts. Using the TransDecoder program, we predicted 82,925 putative proteins. BLASTP was performed against the Swiss-Prot protein sequence database to annotate the functions of identified proteins, resulting in 20,555 potentially novel proteins. Taken together, this study provides transcriptome data for the foxtail millet variety Taejin by RNA-Seq.

  3. Selecting Superior De Novo Transcriptome Assemblies: Lessons Learned by Leveraging the Best Plant Genome.

    Directory of Open Access Journals (Sweden)

    Loren A Honaas

    Full Text Available Whereas de novo assemblies of RNA-Seq data are being published for a growing number of species across the tree of life, there are currently no broadly accepted methods for evaluating such assemblies. Here we present a detailed comparison of 99 transcriptome assemblies, generated with 6 de novo assemblers including CLC, Trinity, SOAP, Oases, ABySS and NextGENe. Controlled analyses of de novo assemblies for Arabidopsis thaliana and Oryza sativa transcriptomes provide new insights into the strengths and limitations of transcriptome assembly strategies. We find that the leading assemblers generate reassuringly accurate assemblies for the majority of transcripts. At the same time, we find a propensity for assemblers to fail to fully assemble highly expressed genes. Surprisingly, the incidence of true chimeric assemblies is very low for all assemblers. Normalized libraries are reduced in highly abundant transcripts, but they also lack thousands of low-abundance transcripts. We conclude that the quality of de novo transcriptome assemblies is best assessed through consideration of a combination of metrics: (1) proportion of reads mapping to an assembly, (2) recovery of conserved, widely expressed genes, (3) N50 length statistics, and (4) the total number of unigenes. We provide benchmark Illumina transcriptome data and introduce SCERNA, a broadly applicable modular protocol for de novo assembly improvement. Finally, our de novo assembly of the Arabidopsis leaf transcriptome revealed ~20 putative Arabidopsis genes lacking in the current annotation.
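    Two of the metrics listed in this abstract, the read-mapping proportion and the N50 length statistic, are simple enough to compute by hand. The following is a minimal Python sketch under the usual definition of N50 (the contig length at which the cumulative length of contigs, sorted longest first, reaches half of the total assembly length). The function names `n50` and `mapping_rate` are illustrative assumptions and do not come from the paper or from any assembler.

```python
def n50(lengths):
    """Return the N50 of a list of contig lengths.

    Contigs are sorted longest first and their lengths are accumulated;
    N50 is the length of the contig at which the running sum first
    reaches at least half of the total assembled length.
    """
    if not lengths:
        raise ValueError("need at least one contig length")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length


def mapping_rate(mapped_reads, total_reads):
    """Fraction of reads that map back to the assembly (0.0 to 1.0)."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return mapped_reads / total_reads


if __name__ == "__main__":
    # Hypothetical toy contig lengths, not data from any study listed here.
    toy_lengths = [2, 2, 2, 3, 3, 4, 8, 8]
    print("N50:", n50(toy_lengths))
    print("mapping rate:", mapping_rate(90, 100))
```

    Because N50 rises when short contigs are simply discarded, it is usually read alongside the other metrics rather than on its own, which is consistent with the combined assessment the abstract recommends.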

  4. De novo transcriptome assembly of the mycoheterotrophic plant Monotropa hypopitys

    Directory of Open Access Journals (Sweden)

    Alexey V. Beletsky

    2017-03-01

    Full Text Available Monotropa hypopitys (pinesap) is a non-photosynthetic, obligately mycoheterotrophic plant of the family Ericaceae. It obtains carbon and other nutrients from the roots of surrounding autotrophic trees through the associated mycorrhizal fungi. In order to understand the evolutionary changes in the plant genome associated with the transition to a heterotrophic lifestyle, we performed de novo transcriptomic analysis of M. hypopitys using next-generation sequencing. We obtained RNA-Seq data from flowers, flower bracts and roots with haustoria using the Illumina HiSeq2500 platform. The raw data obtained in this study are available in the NCBI SRA database under accession number SRP069226. A total of 10.3 GB of raw sequence data were obtained, corresponding to 103,357,809 raw reads. A total of 103,025,683 reads were retained after removing low-quality reads and trimming the adapter sequences. The Trinity program was used to de novo assemble 98,349 unigenes with an N50 of 1342 bp. Using the TransDecoder program, we predicted 43,505 putative proteins. 38,416 unigenes were annotated in the Swiss-Prot protein sequence database using BLASTX. The obtained transcriptomic data will be useful for further studies of the evolution of plant genomes upon transition to a non-photosynthetic lifestyle and the loss of photosynthesis-related functions.

  5. De novo transcriptome assembly of a sour cherry cultivar, Schattenmorelle

    Directory of Open Access Journals (Sweden)

    Yeonhwa Jo

    2015-12-01

    Full Text Available Sour cherry (Prunus cerasus), in the genus Prunus of the family Rosaceae, is one of the most popular stone fruit trees worldwide. Of known sour cherry cultivars, the Schattenmorelle is a famous old sour cherry with a high amount of fruit production. The Schattenmorelle was selected before 1650 and described in the 1800s. This cultivar was named after the gardens of the Chateau de Moreille, in which the cultivar was initially found. In order to identify new genes and to develop genetic markers for sour cherry, we performed a transcriptome analysis of a sour cherry. We selected the cultivar Schattenmorelle, which is among the commercially important cultivars in Europe and North America. We obtained 2.05 GB of raw data from the Schattenmorelle (NCBI accession number: SRX1187170). De novo transcriptome assembly using Trinity identified 61,053 transcripts with an N50 of 611 bp. Next, we identified 25,585 protein coding sequences using TransDecoder. The identified proteins were searched with BLAST against NCBI's non-redundant database for annotation. Based on the BLAST search, we taxonomically classified the obtained sequences. As a result, we provide the transcriptome of the sour cherry cultivar Schattenmorelle using next generation sequencing.

  6. Sequencing and De Novo Transcriptome Assembly of Brachypodium sylvaticum (Poaceae)

    Directory of Open Access Journals (Sweden)

    Samuel E. Fox

    2013-03-01

    Full Text Available Premise of the study: We report the de novo assembly and characterization of the transcriptomes of Brachypodium sylvaticum (slender false-brome) accessions from native populations of Spain and Greece, and an invasive population west of Corvallis, Oregon, USA. Methods and Results: More than 350 million sequence reads from the mRNA libraries prepared from three B. sylvaticum genotypes were assembled into 120,091 (Corvallis), 104,950 (Spain), and 177,682 (Greece) transcript contigs. In comparison with the B. distachyon Bd21 reference genome and GenBank protein sequences, we estimate >90% exome coverage for B. sylvaticum. The transcripts were assigned Gene Ontology and InterPro annotations. Brachypodium sylvaticum sequence reads aligned against the Bd21 genome revealed 394,654 single-nucleotide polymorphisms (SNPs) and >20,000 simple sequence repeat (SSR) DNA sites. Conclusions: To our knowledge, this is the first report of transcriptome sequencing of invasive plant species with a closely related sequenced reference genome. The sequences and identified SNP variant and SSR sites will provide tools for developing novel genetic markers for use in genotyping and characterization of invasive behavior of B. sylvaticum.

  7. De novo transcriptome assembly of Sorghum bicolor variety Taejin

    Directory of Open Access Journals (Sweden)

    Yeonhwa Jo

    2016-06-01

    Full Text Available Sorghum (Sorghum bicolor), also known as great millet, is one of the most popular cultivated grass species in the world. Sorghum is frequently consumed as food for humans and animals as well as used for ethanol production. In this study, we conducted de novo transcriptome assembly for sorghum variety Taejin by next-generation sequencing, obtaining 8.748 GB of raw data. The raw data from this study are available in the NCBI SRA database under accession number SRX1715644. Using the Trinity program, we identified 222,161 transcripts from sorghum variety Taejin. We further predicted coding regions within the assembled transcripts with the TransDecoder program, resulting in a total of 148,531 proteins. We carried out BLASTP against the Swiss-Prot protein sequence database to annotate the functions of the identified proteins. To our knowledge, this is the first transcriptome data for a sorghum variety derived from Korea, and it can be usefully applied to the generation of genetic markers.

  8. Comparing de novo assemblers for 454 transcriptome data.

    Science.gov (United States)

    Kumar, Sujai; Blaxter, Mark L

    2010-10-16

    Roche 454 pyrosequencing has become a method of choice for generating transcriptome data from non-model organisms. Once the tens to hundreds of thousands of short (250-450 base) reads have been produced, it is important to correctly assemble these to estimate the sequence of all the transcripts. Most transcriptome assembly projects use only one program for assembling 454 pyrosequencing reads, but there is no evidence that the programs used to date are optimal. We have carried out a systematic comparison of five assemblers (CAP3, MIRA, Newbler, SeqMan and CLC) to establish best practices for transcriptome assemblies, using a new dataset from the parasitic nematode Litomosoides sigmodontis. Although no single assembler performed best on all our criteria, Newbler 2.5 gave longer contigs, better alignments to some reference sequences, and was fast and easy to use. SeqMan assemblies performed best on the criterion of recapitulating known transcripts, and had more novel sequence than the other assemblers, but generated an excess of small, redundant contigs. The remaining assemblers all performed almost as well, with the exception of Newbler 2.3 (the version currently used by most assembly projects), which generated assemblies that had significantly lower total length. As different assemblers use different underlying algorithms to generate contigs, we also explored merging of assemblies and found that the merged datasets not only aligned better to reference sequences than individual assemblies, but were also more consistent in the number and size of contigs. Transcriptome assemblies are smaller than genome assemblies and thus should be more computationally tractable, but are often harder because individual contigs can have highly variable read coverage. Comparing single assemblers, Newbler 2.5 performed best on our trial data set, but other assemblers were closely comparable. Combining differently optimal assemblies from different programs however gave a more credible

  9. Comparing de novo assemblers for 454 transcriptome data

    Directory of Open Access Journals (Sweden)

    Blaxter Mark L

    2010-10-01

    Full Text Available Background: Roche 454 pyrosequencing has become a method of choice for generating transcriptome data from non-model organisms. Once the tens to hundreds of thousands of short (250-450 base) reads have been produced, it is important to correctly assemble these to estimate the sequence of all the transcripts. Most transcriptome assembly projects use only one program for assembling 454 pyrosequencing reads, but there is no evidence that the programs used to date are optimal. We have carried out a systematic comparison of five assemblers (CAP3, MIRA, Newbler, SeqMan and CLC) to establish best practices for transcriptome assemblies, using a new dataset from the parasitic nematode Litomosoides sigmodontis. Results: Although no single assembler performed best on all our criteria, Newbler 2.5 gave longer contigs, better alignments to some reference sequences, and was fast and easy to use. SeqMan assemblies performed best on the criterion of recapitulating known transcripts, and had more novel sequence than the other assemblers, but generated an excess of small, redundant contigs. The remaining assemblers all performed almost as well, with the exception of Newbler 2.3 (the version currently used by most assembly projects), which generated assemblies that had significantly lower total length. As different assemblers use different underlying algorithms to generate contigs, we also explored merging of assemblies and found that the merged datasets not only aligned better to reference sequences than individual assemblies, but were also more consistent in the number and size of contigs. Conclusions: Transcriptome assemblies are smaller than genome assemblies and thus should be more computationally tractable, but are often harder because individual contigs can have highly variable read coverage. Comparing single assemblers, Newbler 2.5 performed best on our trial data set, but other assemblers were closely comparable. Combining differently optimal assemblies

  10. De novo transcriptome assembly of mangosteen (Garcinia mangostana L.) fruit

    Directory of Open Access Journals (Sweden)

    Deden Derajat Matra

    2016-12-01

    Full Text Available Garcinia mangostana L. (mangosteen), of the family Clusiaceae, is one of the economically important tropical fruits in Indonesia. In the present study, we performed de novo transcriptomic analysis of Garcinia mangostana L. through RNA-Seq technology. We obtained the raw data from 12 libraries using the Ion Proton System. A total of 191,735,809 clean reads were obtained from 307,634,890 raw reads. The raw data obtained in this study are accessible in the DDBJ database under accession number DRA005014, with BioProject accession number PRJDB5091. We obtained 268,851 transcripts and 155,850 unigenes, with N50 values of 555 and 433 bp, respectively. Transcript/unigene length ranged from 201 to 5916 bp. The unigenes were annotated against two main databases, NCBI and UniProtKB, yielding 73,287 and 73,107 annotated sequences, respectively. These transcriptomic data will be beneficial for studying the transcriptome of Garcinia mangostana L.

  11. Optimizing de novo common wheat transcriptome assembly using short-read RNA-Seq data

    Directory of Open Access Journals (Sweden)

    Duan Jialei

    2012-08-01

    Full Text Available Background: Rapid advances in next-generation sequencing methods have provided new opportunities for transcriptome sequencing (RNA-Seq). The unprecedented sequencing depth provided by RNA-Seq makes it a powerful and cost-efficient method for transcriptome study, and it has been widely used in model organisms and non-model organisms to identify and quantify RNA. For non-model organisms lacking well-defined genomes, de novo assembly is typically required for downstream RNA-Seq analyses, including SNP discovery and identification of genes differentially expressed by phenotypes. Although RNA-Seq has been successfully used to sequence many non-model organisms, the results of de novo assembly from short reads can still be improved by using recent bioinformatic developments. Results: In this study, we used 212.6 million paired-end reads, which accounted for 16.2 Gb, to assemble the hexaploid wheat transcriptome. Two state-of-the-art assemblers, Trinity and Trans-ABySS, which use the single and multiple k-mer methods, respectively, were used, and the whole de novo assembly process was divided into the following four steps: pre-assembly, merging different samples, removal of redundancy and scaffolding. We documented every detail of these steps and how these steps influenced assembly performance to gain insight into transcriptome assembly from short reads. After optimization, the assembled transcripts were comparable to Sanger-derived ESTs in terms of both continuity and accuracy. We also provided considerable new wheat transcript data to the community. Conclusions: It is feasible to assemble the hexaploid wheat transcriptome from short reads. Special attention should be paid to dealing with multiple samples to balance the spectrum of expression levels and redundancy. To obtain an accurate overview of RNA profiling, removal of redundancy may be crucial in de novo assembly.
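    The abstract names "removal of redundancy" as one step but does not say how it was done. As a rough illustration of the idea only, and not the authors' procedure, the Python sketch below drops any contig that is wholly contained in a longer kept contig on either strand. The function names `reverse_complement` and `remove_contained` are hypothetical, and production pipelines typically rely on similarity clustering rather than exact containment.

```python
# Translation table for complementing DNA bases (upper and lower case).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def remove_contained(contigs):
    """Keep only contigs that are not exact substrings of a longer kept contig.

    Contigs are visited longest first; a contig is discarded if it, or its
    reverse complement, occurs verbatim inside any contig already kept.
    This is quadratic in the number of contigs, so it is suitable for
    illustration only.
    """
    kept = []
    for seq in sorted(contigs, key=len, reverse=True):
        rc = reverse_complement(seq)
        if any(seq in longer or rc in longer for longer in kept):
            continue
        kept.append(seq)
    return kept


if __name__ == "__main__":
    # Hypothetical toy contigs, not data from the study.
    toy = ["ACGTTGCA", "CGTT", "AACG", "GGGG"]
    print(remove_contained(toy))
```

    Exact containment misses near-duplicates that differ by a single mismatch or a short indel, which is why this should be read as a sketch of the concept rather than a usable redundancy filter.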

  12. De novo transcriptome assembly of two contrasting pumpkin cultivars

    Directory of Open Access Journals (Sweden)

    Aliki Xanthopoulou

    2016-03-01

    Full Text Available Cucurbita pepo (squash, pumpkin, gourd), a worldwide-cultivated vegetable of American origin, is extremely variable in fruit characteristics. However, the information associated with genes and genetic markers for pumpkin is very limited. In order to identify new genes and to develop genetic markers, we performed a transcriptome analysis (RNA-Seq) of two contrasting pumpkin cultivars. Leaves and female flowers of the cultivars ‘Big Moose’, with large round fruits, and ‘Munchkin’, with small round fruits, were harvested for total RNA extraction. We obtained a total of 6 GB (Big Moose; http://www.ncbi.nlm.nih.gov/Traces/sra/?run=SRR3056882) and 5 GB (Munchkin; http://www.ncbi.nlm.nih.gov/Traces/sra/?run=SRR3056883) of sequence data (NCBI SRA database SRX1502732 and SRX1502735, respectively), which correspond to 18,055,786 and 14,824,292 150-base reads. After quality assessment, the clean sequences were 17,995,932 and 14,774,486, respectively. The numbers of total transcripts for ‘Big Moose’ and ‘Munchkin’ were 84,727 and 68,051, respectively. TransDecoder identified possible coding regions in assembled transcripts. This study provides transcriptome data for two contrasting pumpkin cultivars, which might be useful for genetic marker development and comparative transcriptome analyses. Keywords: RNA-Seq, Pumpkin, Contrasting cultivars, Cucurbita pepo

  13. Optimizing Transcriptome Assemblies for Eleusine indica Leaf and Seedling by Combining Multiple Assemblies from Three De Novo Assemblers

    Directory of Open Access Journals (Sweden)

    Shu Chen

    2015-03-01

    Full Text Available Due to rapid advances in sequencing technology, increasing amounts of genomic and transcriptomic data are available for plant species, presenting enormous challenges for biocomputing analysis. A crucial first step for a successful transcriptomics-based study is the building of a high-quality assembly. Here, we utilized three different de novo assemblers (Trinity, Velvet, and CLC) and the EvidentialGene pipeline tr2aacds to assemble two optimized transcript sets for the notorious weed species Eleusine indica. Two RNA sequencing (RNA-seq) datasets from leaf and aboveground seedlings were processed using three assemblers, which resulted in 20 assemblies for each dataset. The contig numbers and N50 values of each assembly were compared to study the effect of read number, k-mer size, and in silico normalization on assembly output. The 20 assemblies were then processed through the tr2aacds pipeline to remove redundant transcripts and to select the transcript set with the best coding potential. Each assembly contributed a considerable proportion to the final transcript combination with the exception of the CLC-k14. Thus each assembler and parameter set did assemble better contigs for certain transcripts. The redundancy, total contig number, N50, fully assembled contig number, and transcripts related to target-site herbicide resistance were evaluated for the EvidentialGene and Trinity assemblies. Comparing the EvidentialGene set with the Trinity assembly revealed improved quality and reduced redundancy in both leaf and seedling EvidentialGene sets. The optimized transcriptome references will be useful for studying herbicide resistance in E. indica and the evolutionary process in the three allotetraploid offspring.

  14. De novo transcriptome assembly associated with fumonisin production by the rice pathogen Fusarium fujikuroi

    Directory of Open Access Journals (Sweden)

    Keerthi S. Guruge

    2018-06-01

    Full Text Available The present study employed a next-generation sequencing method to assemble a de novo transcriptome database designed to distinguish gene expression changes exhibited by the fumonisin-producing fungus Fusarium fujikuroi when grown under ‘fumonisin-producing’ compared to ‘non-fumonisin-producing’ conditions. The raw data of this study have been deposited at the DNA Data Bank of Japan (DDBJ) under the accession ID DRA006146. Keywords: Fusarium fujikuroi, Fumonisin, Next-generation sequencing, Transcriptome, Gene-expression

  15. De novo transcriptome assembly of two different peach cultivars grown in Korea

    Directory of Open Access Journals (Sweden)

    Yeonhwa Jo

    2015-12-01

    Full Text Available Peach (Prunus persica) is one of the most popular stone fruits worldwide. Next generation sequencing (NGS) has facilitated genome and transcriptome analyses of several stone fruit trees. In this study, we conducted de novo transcriptome analyses of two peach cultivars grown in Korea. Leaves of two cultivars, referred to as Jangtaek and Mibaek, were harvested and used for library preparation. The two prepared libraries were paired-end sequenced on the HiSeq2000 system. We obtained 8.14 GB and 9.62 GB of sequence data from Jangtaek and Mibaek (NCBI accession numbers: SRS1056585 and SRS1056587), respectively. The Trinity program was used to assemble two transcriptomes de novo, resulting in 110,477 (Jangtaek) and 136,196 (Mibaek) transcripts. TransDecoder identified possible coding regions in assembled transcripts. The identified proteins were subjected to BLASTP search against NCBI's non-redundant database for functional annotation. This study provides transcriptome data for two peach cultivars, which might be useful for genetic marker development and comparative transcriptome analyses.

  16. De novo transcriptome assembly of heavy metal tolerant Silene dioica

    Czech Academy of Sciences Publication Activity Database

    Čegan, R.; Hudzieczek, V.; Hobza, Roman

    2017-01-01

    Vol. 11, March (2017), pp. 118-119 ISSN 2213-5960 Institutional support: RVO:61389030 Keywords: genome * Silene dioica * RNA-Seq * Transcriptome * Heavy metal tolerance * Sex chromosomes Subject RIV: EB - Genetics; Molecular Biology OECD field: Plant sciences, botany

  17. De novo assembly of the perennial ryegrass transcriptome using an RNA-Seq strategy.

    Directory of Open Access Journals (Sweden)

    Jacqueline D Farrell

    Full Text Available Perennial ryegrass is a highly heterozygous outbreeding grass species used for turf and forage production. Heterozygosity can affect de Bruijn graph assembly, making de novo transcriptome assembly of species such as perennial ryegrass challenging. Creating a reference transcriptome from a homozygous perennial ryegrass genotype can circumvent the challenge of heterozygosity. The goals of this study were to perform RNA-sequencing on multiple tissues from a highly inbred genotype to develop a reference transcriptome. This was complemented with RNA-sequencing of a highly heterozygous genotype for SNP calling. De novo transcriptome assembly of the inbred genotype created 185,833 transcripts with an average length of 830 base pairs. Within the inbred reference transcriptome 78,560 predicted open reading frames were found, of which 24,434 were predicted as complete. Functional annotation found 50,890 transcripts with a BLASTp hit from the Swiss-Prot non-redundant database, 58,941 transcripts with a Pfam protein domain and 1,151 transcripts encoding putative secreted peptides. To evaluate the reference transcriptome we targeted the high-affinity K+ transporter gene family and found multiple orthologs. Using the longest unique open reading frames as the reference sequence, 64,242 single nucleotide polymorphisms were found. One thousand sixty one open reading frames from the inbred genotype contained heterozygous sites, confirming the high degree of homozygosity. Our study has developed an annotated, comprehensive transcriptome reference for perennial ryegrass that can aid in determining genetic variation, expression analysis, genome annotation, and gene mapping.
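    The abstract distinguishes predicted open reading frames from those predicted as "complete". One common reading of "complete", assumed here, is an ORF that begins with a start codon and ends with an in-frame stop codon. The Python sketch below scans the three forward frames of a transcript for such ORFs. It is a simplified illustration with a hypothetical function name (`complete_orfs`) and a hypothetical `min_codons` threshold, and it is not the prediction method used in the study.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def complete_orfs(seq, min_codons=3):
    """Find complete ORFs (ATG ... in-frame stop) on the forward strand.

    Each of the three forward reading frames is scanned codon by codon.
    An ORF opens at the first ATG seen and closes at the next in-frame
    stop codon; ORFs shorter than min_codons (start and stop included)
    are skipped. The reverse strand is deliberately not examined.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append(seq[start:i + 3])
                start = None
    return orfs


if __name__ == "__main__":
    # Hypothetical toy transcript, not data from the study.
    print(complete_orfs("CCATGAAATAGGG"))
```

    A transcript that contains a start codon but runs off the end before any stop codon yields no result here, which corresponds to the partial ORFs that the abstract counts separately from the complete ones.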

  18. Combining independent de novo assemblies optimizes the coding transcriptome for nonconventional model eukaryotic organisms.

    Science.gov (United States)

    Cerveau, Nicolas; Jackson, Daniel J

    2016-12-09

    Next-generation sequencing (NGS) technologies are arguably the most revolutionary technical development to join the list of tools available to molecular biologists since PCR. For researchers working with nonconventional model organisms one major problem with the currently dominant NGS platform (Illumina) stems from the obligatory fragmentation of nucleic acid material that occurs prior to sequencing during library preparation. This step creates a significant bioinformatic challenge for accurate de novo assembly of novel transcriptome data. This challenge becomes apparent when a variety of modern assembly tools (of which there is no shortage) are applied to the same raw NGS dataset. With the same assembly parameters these tools can generate markedly different assembly outputs. In this study we present an approach that generates an optimized consensus de novo assembly of eukaryotic coding transcriptomes. This approach does not represent a new assembler, rather it combines the outputs of a variety of established assembly packages, and removes redundancy via a series of clustering steps. We test and validate our approach using Illumina datasets from six phylogenetically diverse eukaryotes (three metazoans, two plants and a yeast) and two simulated datasets derived from metazoan reference genome annotations. All of these datasets were assembled using three currently popular assembly packages (CLC, Trinity and IDBA-tran). In addition, we experimentally demonstrate that transcripts unique to one particular assembly package are likely to be bioinformatic artefacts. For all eight datasets our pipeline generates more concise transcriptomes that in fact possess more unique annotatable protein domains than any of the three individual assemblers we employed. Another measure of assembly completeness (using the purpose built BUSCO databases) also confirmed that our approach yields more information. Our approach yields coding transcriptome assemblies that are more likely to be

  19. Optimizing and benchmarking de novo transcriptome sequencing: from library preparation to assembly evaluation.

    Science.gov (United States)

    Hara, Yuichiro; Tatsumi, Kaori; Yoshida, Michio; Kajikawa, Eriko; Kiyonari, Hiroshi; Kuraku, Shigehiro

    2015-11-18

    RNA-seq enables gene expression profiling in selected spatiotemporal windows and yields massive sequence information with relatively low cost and time investment, even for non-model species. However, there remains considerable room for optimizing its workflow, in order to take full advantage of continuously developing sequencing capacity. Transcriptome sequencing for three embryonic stages of Madagascar ground gecko (Paroedura picta) was performed with the Illumina platform. The output reads were assembled de novo for reconstructing transcript sequences. In order to evaluate the completeness of transcriptome assemblies, we prepared a reference gene set consisting of vertebrate one-to-one orthologs. To take advantage of increased read length of >150 nt, we demonstrated shortened RNA fragmentation time, which resulted in a dramatic shift of insert size distribution. To evaluate products of multiple de novo assembly runs incorporating reads with different RNA sources, read lengths, and insert sizes, we introduce a new reference gene set, core vertebrate genes (CVG), consisting of 233 genes that are shared as one-to-one orthologs by all vertebrate genomes examined (29 species). The completeness assessment performed by the computational pipelines CEGMA and BUSCO referring to CVG demonstrated higher accuracy and resolution than with the gene set previously established for this purpose. As a result of the assessment with CVG, we have derived the most comprehensive transcript sequence set of the Madagascar ground gecko by means of assembling individual libraries followed by clustering the assembled sequences based on their overall similarities. Our results provide several insights into optimizing de novo RNA-seq workflow, including the coordination between library insert size and read length, which manifested in improved connectivity of assemblies. The approach and assembly assessment with CVG demonstrated here would be applicable to transcriptome analysis of other species as

  20. De Novo Assembly of the Pea (Pisum sativum L.) Nodule Transcriptome

    Directory of Open Access Journals (Sweden)

    Vladimir A. Zhukov

    2015-01-01

    Full Text Available The large size and complexity of the garden pea (Pisum sativum L.) genome hamper its sequencing and the discovery of pea gene resources. Although transcriptome sequencing provides extensive information about expressed genes, some tissue-specific transcripts can only be identified from particular organs under appropriate conditions. In this study, we performed RNA sequencing of polyadenylated transcripts from young pea nodules and root tips on an Illumina GAIIx system, followed by de novo transcriptome assembly using the Trinity program. We obtained more than 58,000 and 37,000 contigs from “Nodules” and “Root Tips” assemblies, respectively. The quality of the assemblies was assessed by comparison with pea expressed sequence tags and transcriptome sequencing project data available from the NCBI website. The “Nodules” assembly was compared with the “Root Tips” assembly and with pea transcriptome sequencing data from projects indicating tissue specificity. As a result, approximately 13,000 nodule-specific contigs were found and annotated by alignment to known plant protein-coding sequences and by Gene Ontology searching. Of these, 581 sequences were found to possess full CDSs and could thus be considered as novel nodule-specific transcripts of pea. The information about pea nodule-specific gene sequences can be applied to the creation of gene-based markers, polymorphism studies, and real-time PCR.

  1. De novo transcriptome assembly of drought tolerant CAM plants, Agave deserti and Agave tequilana.

    Science.gov (United States)

    Gross, Stephen M; Martin, Jeffrey A; Simpson, June; Abraham-Juarez, María Jazmín; Wang, Zhong; Visel, Axel

    2013-08-19

    Agaves are succulent monocotyledonous plants native to xeric environments of North America. Because of their adaptations to their environment, including crassulacean acid metabolism (CAM, a water-efficient form of photosynthesis), and existing technologies for ethanol production, agaves have gained attention both as potential lignocellulosic bioenergy feedstocks and models for exploring plant responses to abiotic stress. However, the lack of comprehensive Agave sequence datasets limits the scope of investigations into the molecular-genetic basis of Agave traits. Here, we present comprehensive, high quality de novo transcriptome assemblies of two Agave species, A. tequilana and A. deserti, built from short-read RNA-seq data. Our analyses support completeness and accuracy of the de novo transcriptome assemblies, with each species having a minimum of approximately 35,000 protein-coding genes. Comparison of agave proteomes to those of additional plant species identifies biological functions of gene families displaying sequence divergence in agave species. Additionally, a focus on the transcriptomics of the A. deserti juvenile leaf confirms evolutionary conservation of monocotyledonous leaf physiology and development along the proximal-distal axis. Our work presents a comprehensive transcriptome resource for two Agave species and provides insight into their biology and physiology. These resources are a foundation for further investigation of agave biology and their improvement for bioenergy development.

  2. De Novo Assembly and Characterization of the Transcriptome of Grasshopper Shirakiacris shirakii

    Directory of Open Access Journals (Sweden)

    Zhongying Qiu

    2016-07-01

    Full Text Available Background: The grasshopper Shirakiacris shirakii is an important agricultural pest and feeds mainly on gramineous plants, thereby causing economic damage to a wide range of crops. However, genomic information on this species is extremely limited thus far, and transcriptome data relevant to insecticide resistance and pest control are also not available. Methods: The transcriptome of S. shirakii was sequenced using the Illumina HiSeq platform and assembled de novo. Results: Sequencing produced a total of 105,408,878 clean reads, and the de novo assembly revealed 74,657 unigenes with an average length of 680 bp and N50 of 1057 bp. A total of 28,173 unigenes were annotated against the NCBI non-redundant protein sequences (Nr), NCBI non-redundant nucleotide sequences (Nt), a manually annotated and reviewed protein sequence database (Swiss-Prot), Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) databases. Based on the Nr annotation results, we manually identified 79 unigenes encoding cytochrome P450 monooxygenases (P450s), 36 unigenes encoding carboxylesterases (CarEs) and 36 unigenes encoding glutathione S-transferases (GSTs) in S. shirakii. Core RNAi components relevant to microRNA, siRNA and piRNA pathways, including Pasha, Loquacious, Argonaute-1, Argonaute-2, Argonaute-3, Zucchini, Aubergine, enhanced RNAi-1 and Piwi, were expressed in S. shirakii. We also identified five unigenes that were homologous to the Sid-1 gene. In addition, the analysis of differential gene expression revealed that a total of 19,764 unigenes were up-regulated and 4185 unigenes were down-regulated in larvae. In total, we predicted 7504 simple sequence repeats (SSRs) from 74,657 unigenes. Conclusions: The comprehensive de novo transcriptomic data of S. shirakii will offer a series of valuable molecular resources for better studying insecticide resistance, RNAi and molecular marker discovery in the transcriptome.
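
    The N50 quoted in the abstract above is a length-weighted summary of an assembly. As a minimal sketch, assuming the common definition (the length L such that sequences of length L or longer together hold at least half of all assembled bases), it could be computed as follows. The function name and the toy lengths are illustrative assumptions and are not taken from the study.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    Assumed definition: sort lengths from longest to shortest and
    return the length at which the running total first reaches at
    least half of the summed length of all sequences.
    """
    if not lengths:
        raise ValueError("need at least one sequence length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Toy unigene lengths (hypothetical, for illustration only).
toy_lengths = [2000, 1500, 1000, 500, 300, 200]
print(n50(toy_lengths))
```

    Because long sequences dominate the running total, N50 is usually larger than the mean length, as with the 1057 bp N50 and 680 bp mean reported here.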

  3. Evaluation of the impact of RNA preservation methods of spiders for de novo transcriptome assembly.

    Science.gov (United States)

    Kono, Nobuaki; Nakamura, Hiroyuki; Ito, Yusuke; Tomita, Masaru; Arakawa, Kazuharu

    2016-05-01

    With advances in high-throughput sequencing technologies, de novo transcriptome sequencing and assembly has become a cost-effective method to obtain comprehensive genetic information of a species of interest, especially in nonmodel species with large genomes such as spiders. However, high-quality RNA is essential for successful sequencing, and sample preservation conditions require careful consideration for the effective storage of field-collected samples. To this end, we report a streamlined feasibility study of various storage conditions and their effects on de novo transcriptome assembly results. The storage parameters considered include temperatures ranging from room temperature to -80°C; preservatives, including ethanol, RNAlater, TRIzol and RNAlater-ICE; and sample submersion states. As a result, intact RNA was extracted and assembly was successful when samples were preserved at low temperatures regardless of the type of preservative used. The assemblies as well as the gene expression profiles were shown to be robust to RNA degradation, when 30 million 150-bp paired-end reads are obtained. The parameters for sample storage, RNA extraction, library preparation, sequencing and in silico assembly considered in this work provide a guideline for the study of field-collected samples of spiders. © 2015 John Wiley & Sons Ltd.

  4. De novo assembly and annotation of the Antarctic copepod (Tigriopus kingsejongensis) transcriptome.

    Science.gov (United States)

    Kim, Hui-Su; Lee, Bo-Young; Han, Jeonghoon; Lee, Young Hwan; Min, Gi-Sik; Kim, Sanghee; Lee, Jae-Seong

    2016-08-01

    The whole transcriptome of the Antarctic copepod (Tigriopus kingsejongensis) was sequenced using Illumina RNA-seq. De novo assembly was performed with 64,785,098 raw reads using Trinity, which assembled into 81,653 contigs. TransDecoder found 38,250 candidate coding contigs which showed homology to other species by BLAST analysis. Functional gene annotation was performed by Gene Ontology (GO), InterProScan, and KEGG pathway analyses. Finally, we established an expressed gene catalog for T. kingsejongensis, a useful model animal for gene information-based polar research to uncover molecular mechanisms of environmental adaptation to harsh environments. In particular, we observed highly developed lipid metabolism in T. kingsejongensis when compared directly with that of the Far East Pacific coast copepod Tigriopus japonicus at the transcriptome level. Copyright © 2016 Elsevier B.V. All rights reserved.

  5. De novo assembling and primary analysis of genome and transcriptome of gray whale Eschrichtius robustus.

    Science.gov (United States)

    Moskalev, Alexey А; Kudryavtseva, Anna V; Graphodatsky, Alexander S; Beklemisheva, Violetta R; Serdyukova, Natalya A; Krutovsky, Konstantin V; Sharov, Vadim V; Kulakovskiy, Ivan V; Lando, Andrey S; Kasianov, Artem S; Kuzmin, Dmitry A; Putintseva, Yuliya A; Feranchuk, Sergey I; Shaposhnikov, Mikhail V; Fraifeld, Vadim E; Toren, Dmitri; Snezhkina, Anastasia V; Sitnik, Vasily V

    2017-12-28

    The gray whale, Eschrichtius robustus (E. robustus), is the sole member of the family Eschrichtiidae, which is considered to be the most primitive in the class Cetacea. The gray whale is often described as a "living fossil". It is adapted to extreme marine conditions and has a high life expectancy (77 years). The assembly of a gray whale genome and transcriptome will enable further studies of whale evolution, longevity, and resistance to extreme environments. In this work, we report the first de novo assembly and primary analysis of the E. robustus genome and transcriptome based on kidney and liver samples. The presented draft genome assembly is 55% complete in terms of total genome length, but only 24% complete in terms of the BUSCO complete gene groups, although 10,895 genes were identified. Transcriptome annotation and comparison with other whale species revealed robust expression of DNA repair and hypoxia-response genes, which is expected for whales. This preliminary study of the gray whale genome and transcriptome provides new data to better understand whale evolution and the mechanisms of adaptation to hypoxic conditions.

  6. Characterization of Liaoning cashmere goat transcriptome: sequencing, de novo assembly, functional annotation and comparative analysis.

    Directory of Open Access Journals (Sweden)

    Hongliang Liu

    Full Text Available Liaoning cashmere goat is a famous goat breed for cashmere wool. In order to increase the transcriptome data and accelerate genetic improvement for this breed, we performed de novo transcriptome sequencing to generate the first expressed sequence tag dataset for the Liaoning cashmere goat, using next-generation sequencing technology. Transcriptome sequencing of Liaoning cashmere goat on a Roche 454 platform yielded 804,601 high-quality reads. Clustering and assembly of these reads produced a non-redundant set of 117,854 unigenes, comprising 13,194 isotigs and 104,660 singletons. Based on similarity searches with known proteins, 17,356 unigenes were assigned to 6,700 GO categories, and the terms were summarized into three main GO categories and 59 sub-categories. 3,548 and 46,778 unigenes had significant similarity to existing sequences in the KEGG and COG databases, respectively. Comparative analysis revealed that 42,254 unigenes were aligned to 17,532 different sequences in NCBI non-redundant nucleotide databases. 97,236 (82.51%) unigenes were mapped to the 30 goat chromosomes. 35,551 (30.17%) unigenes were matched to 11,438 reported goat protein-coding genes. The remaining non-matched unigenes were further compared with cattle and human reference genes, and 67 putative new goat genes were discovered. Additionally, 2,781 potential simple sequence repeats were initially identified from all unigenes. The transcriptome of Liaoning cashmere goat was deeply sequenced, de novo assembled, and annotated, providing abundant data to better understand the Liaoning cashmere goat transcriptome. The potential simple sequence repeats provide a material basis for future genetic linkage and quantitative trait loci analyses.
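
    The counts in the abstract above can be cross-checked with simple arithmetic. The sketch below is only an illustrative consistency check, assuming that both percentages are taken over the 117,854 unigenes; the helper name `percent` is hypothetical.

```python
def percent(part, whole):
    """Return part/whole as a percentage rounded to two decimals."""
    return round(100 * part / whole, 2)


TOTAL_UNIGENES = 117_854  # reported non-redundant unigene set

# Isotigs plus singletons should add up to the unigene total.
print(13_194 + 104_660 == TOTAL_UNIGENES)

# Unigenes mapped to goat chromosomes and to reported goat genes.
print(percent(97_236, TOTAL_UNIGENES))
print(percent(35_551, TOTAL_UNIGENES))
```

    Under that assumption the computed values agree with the reported 82.51% and 30.17%.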

  7. De novo assembly, characterization and functional annotation of pineapple fruit transcriptome through massively parallel sequencing.

    Science.gov (United States)

    Ong, Wen Dee; Voo, Lok-Yung Christopher; Kumar, Vijay Subbiah

    2012-01-01

    Pineapple (Ananas comosus var. comosus) is an important tropical non-climacteric fruit with high commercial potential. Understanding the mechanisms and processes underlying fruit ripening would enable scientists to improve quality traits such as flavor, texture, appearance and fruit sweetness. Although the pineapple is an important fruit, insufficient transcriptomic or genomic information is available in public databases. Application of high-throughput transcriptome sequencing to profile the pineapple fruit transcripts is therefore needed. To facilitate this, we have performed transcriptome sequencing of ripe yellow pineapple fruit flesh using Illumina technology. About 4.7 million Illumina paired-end reads were generated and assembled using the Velvet de novo assembler. The assembly produced 28,728 unique transcripts with a mean length of approximately 200 bp. Sequence similarity search against the non-redundant NCBI database identified a total of 16,932 unique transcripts (58.93%) with significant hits. Out of these, 15,507 unique transcripts were assigned to gene ontology terms. Functional annotation against the Kyoto Encyclopedia of Genes and Genomes pathway database identified 13,598 unique transcripts (47.33%) which were mapped to 126 pathways. The assembly revealed many transcripts that were previously unknown. The unique transcripts derived from this work have rapidly increased the number of pineapple fruit mRNA transcripts available in public databases. This information can be further utilized in gene expression, genomics and other functional genomics studies in pineapple.

  8. Sequencing and De Novo Assembly of the Toxicodendron radicans (Poison Ivy) Transcriptome.

    Science.gov (United States)

    Weisberg, Alexandra J; Kim, Gunjune; Westwood, James H; Jelesko, John G

    2017-11-10

    Contact with poison ivy plants is widely dreaded because they produce a natural product called urushiol that is responsible for allergenic contact delayed-dermatitis symptoms lasting for weeks. For this reason, the catchphrase most associated with poison ivy is "leaves of three, let it be", which serves the purpose of both identification and an appeal for avoidance. Ironically, despite this notoriety, there is a dearth of specific knowledge about nearly all other aspects of poison ivy physiology and ecology. As a means of gaining a more molecular-oriented understanding of poison ivy physiology and ecology, next-generation DNA sequencing technology was used to develop poison ivy root and leaf RNA-seq transcriptome resources. De novo assembled transcriptomes were analyzed to generate a core set of high-quality expressed transcripts present in poison ivy tissue. The predicted protein sequences were evaluated for similarity to SwissProt homologs and InterProScan domains, and were assigned both GO terms and KEGG annotations. Over 23,000 simple sequence repeats were identified in the transcriptome, and corresponding oligonucleotide primer pairs were designed. A pan-transcriptome analysis of existing Anacardiaceae transcriptomes revealed conserved and unique transcripts among these species.

  9. De novo assembly of maritime pine transcriptome: implications for forest breeding and biotechnology.

    Science.gov (United States)

    Canales, Javier; Bautista, Rocio; Label, Philippe; Gómez-Maldonado, Josefa; Lesur, Isabelle; Fernández-Pozo, Noe; Rueda-López, Marina; Guerrero-Fernández, Dario; Castro-Rodríguez, Vanessa; Benzekri, Hicham; Cañas, Rafael A; Guevara, María-Angeles; Rodrigues, Andreia; Seoane, Pedro; Teyssier, Caroline; Morel, Alexandre; Ehrenmann, François; Le Provost, Grégoire; Lalanne, Céline; Noirot, Céline; Klopp, Christophe; Reymond, Isabelle; García-Gutiérrez, Angel; Trontin, Jean-François; Lelu-Walter, Marie-Anne; Miguel, Celia; Cervera, María Teresa; Cantón, Francisco R; Plomion, Christophe; Harvengt, Luc; Avila, Concepción; Gonzalo Claros, M; Cánovas, Francisco M

    2014-04-01

    Maritime pine (Pinus pinaster Ait.) is a widely distributed conifer species in Southwestern Europe and one of the most advanced models for conifer research. In the current work, comprehensive characterization of the maritime pine transcriptome was performed using a combination of two different next-generation sequencing platforms, 454 and Illumina. De novo assembly of the transcriptome provided a catalogue of 26 020 unique transcripts in maritime pine trees and a collection of 9641 full-length cDNAs. Quality of the transcriptome assembly was validated by RT-PCR amplification of selected transcripts for structural and regulatory genes. Transcription factors and enzyme-encoding transcripts were annotated. Furthermore, the available sequencing data permitted the identification of polymorphisms and the establishment of robust single nucleotide polymorphism (SNP) and simple-sequence repeat (SSR) databases for genotyping applications and integration of translational genomics in maritime pine breeding programmes. All our data are freely available at SustainpineDB, the P. pinaster expressional database. Results reported here on the maritime pine transcriptome represent a valuable resource for future basic and applied studies on this ecologically and economically important pine species. © 2013 Society for Experimental Biology, Association of Applied Biologists and John Wiley & Sons Ltd.

  10. De Novo transcriptome assembly of Zingiber officinale cv. Suruchi of Odisha.

    Science.gov (United States)

    Gaur, Mahendra; Das, Aradhana; Sahoo, Rajesh Kumar; Kar, Basudeba; Nayak, Sanghamitra; Subudhi, Enketeswara

    2016-09-01

    Zingiber officinale Rosc., known as ginger, is an Asian crop, popularly used in every household kitchen and commercially used in the bakery, beverage, food and pharmaceutical industries. The present study deals with de novo transcriptome assembly of the elite ginger cultivar Suruchi by next-generation sequencing methodology. The analysis yielded 10.9 GB of raw data, available from NCBI under accession number SAMN03761185. We identified 41,969 transcripts using Trinity RNA-Seq from ginger rhizome of the Suruchi variety from Odisha. The transcript length varied from 300 bp to 8404 bp, with a total length of 39,640,526 bp and N50 of 1251 bp. To the best of our knowledge, this is the first transcriptome data of the elite ginger cultivar Suruchi released for Odisha state of India, which will help molecular biologists to develop genetic markers for identification of cultivars.

  11. De novo assembly of leaf transcriptome in the medicinal plant Andrographis paniculata

    Directory of Open Access Journals (Sweden)

    Neeraja Cherukupalli

    2016-08-01

    Full Text Available Andrographis paniculata is an important medicinal plant containing various bioactive terpenoids and flavonoids. Despite its importance in herbal medicine, no ready-to-use transcript sequence information for this plant is available in public databases. This study mainly deals with the sequencing of RNA from A. paniculata leaf using the Illumina HiSeq™ 2000 platform, followed by de novo transcriptome assembly. A total of 189.22 million high-quality paired reads were generated and 170,724 transcripts were predicted in the primary assembly. Secondary assembly generated a transcriptome size of ~88 Mb with 83,800 clustered transcripts. Based on similarity searches against the plant non-redundant protein database, gene ontology and eukaryotic orthologous groups, 49,363 transcripts were annotated, constituting up to 58.91% of the identified unigenes. Annotation of transcripts using the Kyoto Encyclopedia of Genes and Genomes database revealed 5,606 transcripts plausibly involved in 140 pathways, including biosynthesis of terpenoids and other secondary metabolites. Transcription factor analysis showed 6,767 unique transcripts belonging to 97 different transcription factor families. A total of 124 CYP450 transcripts belonging to seven divergent clans have been identified. The transcriptome revealed 146 different transcripts coding for enzymes involved in the biosynthesis of terpenoids, of which 35 contained terpene synthase motifs. This study also revealed 32,341 simple sequence repeats (SSRs) in 23,168 transcripts. Assembled transcriptome sequences of A. paniculata generated in this study are made available, for the first time, in the TSA database, which provides useful information for functional and comparative genomic analyses besides identification of key enzymes involved in the various pathways of secondary metabolism.

  12. De novo Assembly of Leaf Transcriptome in the Medicinal Plant Andrographis paniculata

    Science.gov (United States)

    Cherukupalli, Neeraja; Divate, Mayur; Mittapelli, Suresh R.; Khareedu, Venkateswara R.; Vudem, Dashavantha R.

    2016-01-01

    Andrographis paniculata is an important medicinal plant containing various bioactive terpenoids and flavonoids. Despite its importance in herbal medicine, no ready-to-use transcript sequence information for this plant is available in public databases. This study mainly deals with the sequencing of RNA from A. paniculata leaf using the Illumina HiSeq™ 2000 platform, followed by de novo transcriptome assembly. A total of 189.22 million high-quality paired reads were generated and 170,724 transcripts were predicted in the primary assembly. Secondary assembly generated a transcriptome size of ~88 Mb with 83,800 clustered transcripts. Based on similarity searches against the plant non-redundant protein database, gene ontology, and eukaryotic orthologous groups, 49,363 transcripts were annotated, constituting up to 58.91% of the identified unigenes. Annotation of transcripts using the Kyoto Encyclopedia of Genes and Genomes database revealed 5606 transcripts plausibly involved in 140 pathways, including biosynthesis of terpenoids and other secondary metabolites. Transcription factor analysis showed 6767 unique transcripts belonging to 97 different transcription factor families. A total of 124 CYP450 transcripts belonging to seven divergent clans have been identified. The transcriptome revealed 146 different transcripts coding for enzymes involved in the biosynthesis of terpenoids, of which 35 contained terpene synthase motifs. This study also revealed 32,341 simple sequence repeats (SSRs) in 23,168 transcripts. Assembled transcriptome sequences of A. paniculata generated in this study are made available, for the first time, in the TSA database, which provides useful information for functional and comparative genomic analysis besides identification of key enzymes involved in the various pathways of secondary metabolism. PMID:27582746

  13. De novo transcriptome sequencing and assembly from apomictic and sexual Eragrostis curvula genotypes.

    Directory of Open Access Journals (Sweden)

    Ingrid Garbus

    Full Text Available A long-standing goal in plant breeding has been the ability to confer apomixis to agriculturally relevant species, which would require a deeper comprehension of the molecular basis of apomictic regulatory mechanisms. Eragrostis curvula (Schrad.) Nees is a perennial grass that includes both sexual and apomictic cytotypes. The availability of a reference transcriptome for this species would constitute a very important tool toward the identification of genes controlling key steps of the apomictic pathway. Here, we used Roche/454 sequencing technologies to generate reads from inflorescences of E. curvula apomictic and sexual genotypes that were de novo assembled into a reference transcriptome. Nearly 90% of the 49568 assembled isotigs showed sequence similarity to sequences deposited in the public databases. A gene ontology analysis categorized 27448 isotigs into at least one of the three main GO categories. We identified 11475 SSRs, and several of them were assayed in E. curvula germplasm using SSR-based primers, providing a valuable set of molecular markers that could allow direct allele selection. The differential contribution to each library of the spliced forms of several transcripts revealed the existence of several isotigs produced via alternative splicing of single genes. The reference transcriptome presented and validated in this work will be useful for the identification of a wide range of gene(s) related to agronomic traits of E. curvula, including those controlling key steps of the apomictic pathway in this species, allowing the extrapolation of the findings to other plant species.

  14. Rnnotator: an automated de novo transcriptome assembly pipeline from stranded RNA-Seq reads

    Energy Technology Data Exchange (ETDEWEB)

    Martin, Jeffrey; Bruno, Vincent M.; Fang, Zhide; Meng, Xiandong; Blow, Matthew; Zhang, Tao; Sherlock, Gavin; Snyder, Michael; Wang, Zhong

    2010-11-19

    Background: Comprehensive annotation and quantification of transcriptomes are outstanding problems in functional genomics. While high throughput mRNA sequencing (RNA-Seq) has emerged as a powerful tool for addressing these problems, its success is dependent upon the availability and quality of reference genome sequences, thus limiting the organisms to which it can be applied. Results: Here, we describe Rnnotator, an automated software pipeline that generates transcript models by de novo assembly of RNA-Seq data without the need for a reference genome. We have applied the Rnnotator assembly pipeline to two yeast transcriptomes and compared the results to the reference gene catalogs of these organisms. The contigs produced by Rnnotator are highly accurate (95%) and reconstruct full-length genes for the majority of the existing gene models (54.3%). Furthermore, our analyses revealed many novel transcribed regions that are absent from well annotated genomes, suggesting Rnnotator serves as a complementary approach to analysis based on a reference genome for comprehensive transcriptomics. Conclusions: These results demonstrate that the Rnnotator pipeline is able to reconstruct full-length transcripts in the absence of a complete reference genome.

  15. Sequencing and de novo assembly of the transcriptome of the glassy-winged sharpshooter (Homalodisca vitripennis).

    Directory of Open Access Journals (Sweden)

    Raja Sekhar Nandety

    Full Text Available BACKGROUND: The glassy-winged sharpshooter Homalodisca vitripennis (Hemiptera: Cicadellidae) is a xylem-feeding leafhopper and an important vector of the bacterium Xylella fastidiosa, the causal agent of Pierce's disease of grapevines. The functional complexity of the transcriptome of H. vitripennis has not been elucidated thus far. It is a necessary blueprint for an understanding of the development of H. vitripennis and for designing efficient biorational control strategies including those based on RNA interference. RESULTS: Here we elucidate and explore the transcriptome of adult H. vitripennis using high-throughput paired-end deep sequencing and de novo assembly. A total of 32,803,656 paired-end reads were obtained with an average transcript length of 624 nucleotides. We assembled 32.9 Mb of the transcriptome of H. vitripennis that spanned across 47,265 loci and 52,708 transcripts. Comparison of our non-redundant database showed that 45% of the deduced proteins of H. vitripennis exhibit identity (e-value ≤ 1e-5) with known proteins. We assigned Gene Ontology (GO) terms, Kyoto Encyclopedia of Genes and Genomes (KEGG) annotations, and potential Pfam domains to each transcript isoform. In order to gain insight into the molecular basis of key regulatory genes of H. vitripennis, we characterized predicted proteins involved in the metabolism of juvenile hormone and the biogenesis of small RNAs (Dicer and Piwi sequences) from the transcriptomic sequences. Analysis of transposable element sequences of H. vitripennis indicated that the genome is less expanded in comparison to many other insects, with approximately 1% of the transcriptome carrying transposable elements. CONCLUSIONS: Our data significantly enhance the molecular resources available for future study and control of this economically important hemipteran. This transcriptional information not only provides a more nuanced understanding of the underlying biological and physiological mechanisms that

  16. RNA-seq analysis and de novo transcriptome assembly of Jerusalem artichoke (Helianthus tuberosus Linne).

    Science.gov (United States)

    Jung, Won Yong; Lee, Sang Sook; Kim, Chul Wook; Kim, Hyun-Soon; Min, Sung Ran; Moon, Jae Sun; Kwon, Suk-Yoon; Jeon, Jae-Heung; Cho, Hye Sun

    2014-01-01

    Jerusalem artichoke (Helianthus tuberosus L.) has long been cultivated as a vegetable and as a source of fructans (inulin) for pharmaceutical applications in diabetes and obesity prevention. However, transcriptomic and genomic data for Jerusalem artichoke remain scarce. In this study, Illumina RNA sequencing (RNA-Seq) was performed on samples from Jerusalem artichoke leaves, roots, stems and two different tuber tissues (early and late tuber development). Data were used for de novo assembly and characterization of the transcriptome. In total 206,215,632 paired-end reads were generated. These were assembled into 66,322 loci with 272,548 transcripts. Loci were annotated by querying against the NCBI non-redundant, Phytozome and UniProt databases, and 40,215 loci were homologous to existing database sequences. Gene Ontology terms were assigned to 19,848 loci, 15,434 loci were matched to 25 Clusters of Eukaryotic Orthologous Groups classifications, and 11,844 loci were classified into 142 Kyoto Encyclopedia of Genes and Genomes pathways. The assembled loci also contained 10,778 potential simple sequence repeats. The newly assembled transcriptome was used to identify loci with tissue-specific differential expression patterns. In total, 670 loci exhibited tissue-specific expression, and a subset of these were confirmed using RT-PCR and qRT-PCR. Gene expression related to inulin biosynthesis in tuber tissue was also investigated. Existing genetic and genomic data for H. tuberosus are scarce. The sequence resources developed in this study will enable the analysis of thousands of transcripts and will thus accelerate marker-assisted breeding studies and studies of inulin biosynthesis in Jerusalem artichoke.

  17. De novo transcriptome assembly of ‘Angeleno’ and ‘Lamoon’ Japanese plum cultivars (Prunus salicina)

    Directory of Open Access Journals (Sweden)

    Máximo González

    2016-09-01

    De novo transcriptome assembly was performed using CLC Genome Workbench software and a total of 54,584 unique contigs were generated, with an N50 of 1343 base pairs (bp) and a mean length of 829 bp. This work contributes a specific Japanese plum skin transcriptome, providing two libraries of contrasting fruit skin color phenotypes (yellow and red) and substantially increasing the amount (in GB) of raw data available for this species.

  18. Sequencing and de novo transcriptome assembly of the Chinese giant salamander (Andrias davidianus)

    Directory of Open Access Journals (Sweden)

    Yong Huang

    2017-06-01

    Full Text Available Next-generation technologies for determining genomic and transcriptomic composition have a wide range of applications. Andrias davidianus, a salamander endemic to China, has become an endangered amphibian species. However, molecular information on this species is lacking. In this study, we obtained RNA-Seq data from a pool of A. davidianus tissues including spleen, liver, muscle, kidney, skin, testis, gut and heart using the Illumina HiSeq 2500 platform. A total of 15,398,997,600 bp were obtained, corresponding to 102,659,984 raw reads. A total of 102,659,984 reads were retained after removing low-quality reads and trimming the adapter sequences. The Trinity program was used to de novo assemble 132,912 unigenes with an average length of 690 bp and N50 of 1263 bp. Unigenes were annotated against a number of databases. These transcriptomic data of A. davidianus should open the door to molecular evolution studies based on the entire transcriptome or on targeted genes of interest. The raw data from this study are available in the NCBI SRA database under accession number SRP099564.
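
    The base and read totals in the abstract above imply a read length, which the abstract itself does not state. As a small illustrative calculation, assuming every raw read has the same length, integer division recovers it; the variable names are hypothetical.

```python
# Totals as reported in the abstract.
total_bases = 15_398_997_600
raw_reads = 102_659_984

# If all reads share one length, the division is exact (remainder 0).
bases_per_read, remainder = divmod(total_bases, raw_reads)
print(bases_per_read, remainder)
```

    A zero remainder is consistent with uniform-length reads of 150 bases, although this is an inference from the arithmetic and not a figure given by the authors.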

  19. A consensus approach to vertebrate de novo transcriptome assembly from RNA-seq data: Assembly of the duck (Anas platyrhynchos) transcriptome.

    Directory of Open Access Journals (Sweden)

    Joanna Moreton

    2014-06-01

    Full Text Available For vertebrate organisms where a reference genome is not available, de novo transcriptome assembly enables cost-effective insight into the identification of tissue-specific or differentially expressed genes and variation in the coding part of the genome. However, since there are a number of different tools and parameters that can be used to reconstruct transcripts, it is difficult to determine an optimal method. Here we suggest a pipeline based on (1) assessing the performance of three different assembly tools, (2) using both single and multiple k-mer approaches, (3) examining the influence of the number of reads used in the assembly, and (4) merging assemblies from different tools. We use an example dataset from the vertebrate Anas platyrhynchos domestica (Pekin duck). We find that taking a subset of data enables a robust assembly to be produced by multiple methods without the need for very high memory capacity. The use of reads mapped back to transcripts (RMBT) and CEGMA (Core Eukaryotic Genes Mapping Approach) provides useful metrics to determine the completeness of the assembly obtained. For this dataset the use of multiple k-mers in the assembly generated a more complete assembly, as measured by a greater number of RMBT and a higher CEGMA score. Merged single k-mer assemblies are generally smaller but consist of longer transcripts, suggesting an assembly consisting of fewer fragmented transcripts. We suggest that the use of a subset of reads during assembly allows the relatively rapid investigation of assembly characteristics and can guide the user to the most appropriate transcriptome for particular downstream use. Transcriptomes generated by the compared assembly methods and the final merged assembly are freely available for download at http://dx.doi.org/10.6084/m9.figshare.1032613.

  20. De novo sequencing, assembly and characterization of antennal transcriptome of Anomala corpulenta Motschulsky (Coleoptera: Rutelidae).

    Directory of Open Access Journals (Sweden)

    Haoliang Chen

    Full Text Available Anomala corpulenta is an important insect pest and can cause enormous economic losses in agriculture, horticulture and forestry. It is widely distributed in China, and both larvae and adults can cause serious damage. It is difficult to control this pest because the larvae live underground. Any new control strategy should exploit alternatives to heavily and frequently used chemical insecticides. However, little genetic research has been carried out on A. corpulenta due to the lack of genomic resources. Genomic resources could be produced by next-generation sequencing technologies at low cost and in a short time. In this study, we performed de novo sequencing, assembly and characterization of the antennal transcriptome of A. corpulenta. Illumina sequencing technology was used to sequence the antennal transcriptome of A. corpulenta. Approximately 76.7 million total raw reads and about 68.9 million total clean reads were obtained, and then 35,656 unigenes were assembled. Of these unigenes, 21,463 could be annotated in the NCBI nr database, and, among the annotated unigenes, 11,154 and 6,625 unigenes could be assigned to GO and COG, respectively. Additionally, 16,350 unigenes could be annotated in the Swiss-Prot database, and 14,499 unigenes could be mapped onto 258 pathways in the KEGG Pathway database. We also found 24 unigenes related to OBPs, 6 to CSPs, and in total 167 unigenes related to chemodetection. We analyzed 4 OBP and 3 CSP sequences, and their RT-qPCR results agreed well with their FPKM values. We produced the first large-scale antennal transcriptome of A. corpulenta, which is a species that has little genomic information in public databases. The identified chemodetection unigenes can promote the molecular mechanistic study of behavior in A. corpulenta. These findings provide a general sequence resource for molecular genetics research on A. corpulenta.

  1. De Novo Assembly and Characterization of Sophora japonica Transcriptome Using RNA-seq

    Directory of Open Access Journals (Sweden)

    Liucun Zhu

    2014-01-01

    Full Text Available Sophora japonica Linn (Chinese Scholar Tree) is a shrub species belonging to the subfamily Faboideae of the pea family Fabaceae. In this study, RNA sequencing of the S. japonica transcriptome was performed to produce large expression datasets for functional genomic analysis. Approximately 86.1 million high-quality clean reads were generated and assembled de novo into 143,010 unique transcripts and 57,614 unigenes. The average length of unigenes was 901 bp with an N50 of 545 bp. Four public databases, including the NCBI nonredundant protein (NR), Swiss-Prot, Kyoto Encyclopedia of Genes and Genomes (KEGG), and the Cluster of Orthologous Groups (COG), were used to annotate unigenes through the NCBI BLAST procedure. A total of 27,541 of 57,614 unigenes (47.8%) were annotated for gene descriptions, conserved protein domains, or gene ontology. Moreover, an interaction network of unigenes in S. japonica was predicted based on known protein-protein interactions of putative orthologs of well-studied plant genomes. The transcriptome data of S. japonica reported here represent the first genome-scale investigation of gene expression in Faboideae plants. We expect that our study will provide a useful resource for further studies on gene expression, genomics, functional genomics, and protein-protein interaction in S. japonica.
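
    The N50 statistic quoted in the abstract above is conventionally defined as the length L such that sequences of length L or longer contain at least half of all assembled bases. The following is a minimal sketch of that definition; the function name and the toy input are illustrative and are not taken from the study.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length L such that sequences of length >= L
    together contain at least half of the total bases.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down, accumulating bases.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Toy example (made-up lengths, not data from the study).
example_lengths = [2, 3, 4, 5, 6]
example_n50 = n50(example_lengths)
```

    Because N50 weights long sequences more heavily than a simple mean, the two numbers can differ considerably for the same assembly.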

  2. De novo transcriptome assembly of two Vigna angularis varieties collected from Korea

    Directory of Open Access Journals (Sweden)

    Yeonhwa Jo

    2016-06-01

    Full Text Available The adzuki bean (Vigna angularis), a member of the family Fabaceae, is widely grown in Asia, from East Asia to the Himalayas. The adzuki bean is known as an ingredient that adds sweetness to diverse desserts made in Eastern Asian countries. Libraries prepared from two V. angularis varieties, referred to as Taejin Black and Taejin Red, were paired-end sequenced using the Illumina HiSeq 2000 system. The raw data in this study are available in the NCBI SRA database under accession numbers SRR3406660 and SRR3406553. After de novo transcriptome assembly using Trinity, we obtained 324,219 and 280,056 transcripts from Taejin Black and Taejin Red, respectively. Using the TransDecoder program, we predicted a total of 238,321 and 179,519 proteins for Taejin Black and Taejin Red, respectively. We carried out BLASTP on the predicted proteins against the Swiss-Prot protein sequence database to predict the putative functions of the identified proteins. Taken together, we provide transcriptomes of two adzuki bean varieties by RNA-Seq, which might be usefully applied to generate molecular markers.

  3. De novo Genome Assembly and Single Nucleotide Variations for Soybean Mosaic Virus Using Soybean Seed Transcriptome Data

    Directory of Open Access Journals (Sweden)

    Yeonhwa Jo

    2017-10-01

    Full Text Available Soybean is the most important legume crop in the world. Several diseases in soybean lead to serious yield losses in major soybean-producing countries. Moreover, soybean can be infected by diverse viruses. Recently, we carried out a large-scale screening to identify viruses infecting soybean using available soybean transcriptome data. Of the screened transcriptomes, a soybean transcriptome for soybean seed development analysis contains several virus-associated sequences. In this study, we identified five viruses, including soybean mosaic virus (SMV), infecting soybean by de novo transcriptome assembly followed by BLAST search. We assembled a nearly complete consensus genome sequence of SMV China using transcriptome data. Based on phylogenetic analysis, the consensus genome sequence of SMV China was closely related to SMV isolates from South Korea. We examined single nucleotide variations (SNVs) for SMVs in the soybean seed transcriptome, revealing 780 SNVs, which were evenly distributed on the SMV genome. Four SNVs, C-U, U-C, A-G, and G-A, were frequently identified. This result demonstrated the quasispecies variation of the SMV genome. Taken together, this study carried out bioinformatics analyses to identify viruses using soybean transcriptome data. In addition, we demonstrated the application of soybean transcriptome data for virus genome assembly and SNV analysis.

  4. De novo Transcriptome Assemblies of Rana (Lithobates) catesbeiana and Xenopus laevis Tadpole Livers for Comparative Genomics without Reference Genomes.

    Directory of Open Access Journals (Sweden)

    Inanc Birol

    Full Text Available In this work we studied the liver transcriptomes of two frog species, the American bullfrog (Rana (Lithobates) catesbeiana) and the African clawed frog (Xenopus laevis). We used high throughput RNA sequencing (RNA-seq) data to assemble and annotate these transcriptomes, and compared how their baseline expression profiles change when tadpoles of the two species are exposed to thyroid hormone. We generated more than 1.5 billion RNA-seq reads in total for the two species under two conditions as treatment/control pairs. We de novo assembled these reads using Trans-ABySS to reconstruct reference transcriptomes, obtaining over 350,000 and 130,000 putative transcripts for R. catesbeiana and X. laevis, respectively. Using available genomics resources for X. laevis, we annotated over 97% of our X. laevis transcriptome contigs, demonstrating the utility and efficacy of our methodology. Leveraging this validated analysis pipeline, we also annotated the assembled R. catesbeiana transcriptome. We used the expression profiles of the annotated genes of the two species to examine the similarities and differences between the tadpole liver transcriptomes. We also compared the gene ontology terms of expressed genes to measure how the animals react to a challenge by thyroid hormone. Our study reports three main conclusions. First, de novo assembly of RNA-seq data is a powerful method for annotating and establishing transcriptomes of non-model organisms. Second, the liver transcriptomes of the two frog species, R. catesbeiana and X. laevis, show many common features, and the distributions of their gene ontology profiles are statistically indistinguishable. Third, although they broadly respond the same way to the presence of thyroid hormone in their environment, their receptor/signal transduction pathways display marked differences.

  5. De Novo Sequencing and Assembly Analysis of Transcriptome in Pinus bungeana Zucc. ex Endl.

    Directory of Open Access Journals (Sweden)

    Qifei Cai

    2018-03-01

    Full Text Available To enrich the molecular data of Pinus bungeana Zucc. ex Endl. and to study the regulatory factors behind the different morphologies controlled by apical dominance, de novo transcriptome assembly and annotation were performed for two varieties of Pinus bungeana Zucc. ex Endl. that differ markedly in morphology. More than 147 million reads were produced, which were assembled into 88,092 unigenes. Based on a similarity search, 11,692 unigenes showed significant similarity to proteins from Picea sitchensis (Bong.) Carr. From this collection of unigenes, a large number of molecular markers were identified, including 2829 simple sequence repeats (SSRs). A total of 158 unigenes were differentially expressed between the two varieties, including 98 up-regulated and 60 down-regulated unigenes. Furthermore, among the differentially expressed genes (DEGs), five genes which may impact plant morphology were further validated by reverse transcription quantitative polymerase chain reaction (RT-qPCR). The five genes, related to cytokinin oxidase/dehydrogenase (CKX), the two-component response regulator ARR-A family (ARR-A), plant hormone signal transduction (AHP), and MADS-box transcription factors, have a close relationship with apical dominance. This new dataset will be a useful resource for future genetic and genomic studies in Pinus bungeana Zucc. ex Endl.

  6. De novo assembly and characterization of the garlic (Allium sativum) bud transcriptome by Illumina sequencing.

    Science.gov (United States)

    Sun, Xiudong; Zhou, Shumei; Meng, Fanlu; Liu, Shiqi

    2012-10-01

    Garlic is widely used as a spice throughout the world for the culinary value of its flavor and aroma, which are created by the chemical transformation of a series of organic sulfur compounds. To analyze the transcriptome of Allium sativum and discover the genes involved in sulfur metabolism, cDNAs derived from the total RNA of Allium sativum buds were analyzed by Illumina sequencing. Approximately 26.67 million 90 bp paired-end clean reads were obtained in two libraries. A total of 127,933 unigenes were generated by de novo assembly and were compared with the sequences in public databases. Of these, 45,286 unigenes had significant hits to the sequences in the Nr database, 29,514 showed significant similarity to known proteins in the Swiss-Prot database, and 20,706 and 21,952 unigenes had significant similarity to existing sequences in the KEGG and COG databases, respectively. Moreover, genes involved in organic sulfur biosynthesis were identified. These unigene data will provide the foundation for research on gene expression, genomics and functional genomics in Allium sativum. Key message: The obtained unigenes will provide the foundation for research on functional genomics in Allium sativum and its closely related species, and fill the gap in the existing plant EST database.

  7. A divide-and-conquer algorithm for large-scale de novo transcriptome assembly through combining small assemblies from existing algorithms.

    Science.gov (United States)

    Sze, Sing-Hoi; Parrott, Jonathan J; Tarone, Aaron M

    2017-12-06

    While the continued development of high-throughput sequencing has facilitated studies of entire transcriptomes in non-model organisms, the incorporation of an increasing amount of RNA-Seq libraries has made de novo transcriptome assembly difficult. Although algorithms that can assemble a large amount of RNA-Seq data are available, they are generally very memory-intensive and can only be used to construct small assemblies. We develop a divide-and-conquer strategy that allows these algorithms to be utilized, by subdividing a large RNA-Seq data set into small libraries. Each individual library is assembled independently by an existing algorithm, and a merging algorithm is developed to combine these assemblies by picking a subset of high quality transcripts to form a large transcriptome. When compared to existing algorithms that return a single assembly directly, this strategy achieves comparable or increased accuracy as memory-efficient algorithms that can be used to process a large amount of RNA-Seq data, and comparable or decreased accuracy as memory-intensive algorithms that can only be used to construct small assemblies. Our divide-and-conquer strategy allows memory-intensive de novo transcriptome assembly algorithms to be utilized to construct large assemblies.

  8. Transcriptomic identification of salt-related genes and de novo assembly in common buckwheat (F. esculentum).

    Science.gov (United States)

    Lu, Qi-Huan; Wang, Ya-Qi; Song, Jin-Nan; Yang, Hong-Bing

    2018-06-01

    Common buckwheat (F. esculentum), an annual herbaceous crop, has become increasingly common in daily diets as economies develop. Compared with wheat, it is prized for its high rutin and flavonoid content. Common buckwheat is recognized as a healthy food with good taste; its products, such as noodles, flour and bread, are more expensive than their wheat counterparts, and its seeds are bigger than those of tartary buckwheat, so if common buckwheat were planted more widely, people would spend less money on this healthy and delicious food. However, soil salinity has been a major problem for agricultural production. The cultivation of salt-tolerant crop varieties is an effective way to make full use of saline-alkali land, and the highest salinity at which common buckwheat can be sown is 6.0%, so we chose 100 mM as the concentration of NaCl for treatment. We then compared the transcriptomes of the control and treatment groups. Potential regulatory genes related to salt stress in common buckwheat were identified. A total of 29.36 million clean reads were produced via an Illumina sequencing approach. We de novo assembled these reads into a transcriptome dataset containing 43,772 unigenes with an N50 length of 1778 bp. A total of 26,672 unigenes had matches in public databases. GO, KEGG and Swiss-Prot classification suggested the enrichment of these unigenes in 47 sub-categories, 25 KOG and 129 pathways, respectively. We obtained 385 differentially expressed genes (DEGs) after comparing the transcriptome data between salt treatment and control groups. Some genes involved in response to stimulus, cell killing, metabolic process, signaling, multi-organism process, growth and cellular process might be relevant to salt stress in common buckwheat; these will provide a valuable reference for the study of the mechanism of salt tolerance and will be used as genetic information for cultivating strongly salt-tolerant common buckwheat varieties in

  9. De novo assembly and characterization of tissue specific transcriptomes in the emerald notothen, Trematomus bernacchii.

    Science.gov (United States)

    Huth, Troy J; Place, Sean P

    2013-11-20

    The notothenioids comprise a diverse group of fishes that rapidly radiated after isolation by the Antarctic Circumpolar Current approximately 14-25 million years ago. Given that evolutionary adaptation has led to finely tuned traits with narrow physiological limits in these organisms, this system provides a unique opportunity to examine physiological trade-offs and limits of adaptive responses to environmental perturbation. As such, notothenioids have a rich history with respect to studies attempting to understand the vulnerability of polar ecosystems to the negative impacts associated with global climate change. Unfortunately, despite being a model system for understanding physiological adaptations to extreme environments, we still lack fundamental molecular tools for much of the Nototheniidae family. Specimens of the emerald notothen, Trematomus bernacchii, were acclimated for 28 days in flow-through seawater tanks maintained near ambient seawater temperatures (-1.5°C) or at +4°C. Following acclimation, tissue specific cDNA libraries for liver, gill and brain were created by pooling RNA from n = 5 individuals per temperature treatment. The tissue specific libraries were bar-coded and used for 454 pyrosequencing, which yielded over 700 thousand sequencing reads. A de novo assembly and annotation of these reads produced a functional transcriptome library of T. bernacchii containing 30,107 unigenes, 13,003 of which possessed significant homology to a known protein product. Digital gene expression analysis of these extremely cold adapted fish reinforced the loss of an inducible heat shock response and allowed the preliminary exploration into other elements of the cellular stress response. Preliminary exploration of the transcriptome of T. bernacchii under elevated temperatures enabled a semi-quantitative comparison to prior studies aimed at characterizing the thermal response of this endemic fish whose size, abundance and distribution has established it as a

  10. Playing hide and seek with repeats in local and global de novo transcriptome assembly of short RNA-seq reads.

    Science.gov (United States)

    Lima, Leandro; Sinaimeri, Blerina; Sacomoto, Gustavo; Lopez-Maestre, Helene; Marchet, Camille; Miele, Vincent; Sagot, Marie-France; Lacroix, Vincent

    2017-01-01

    The main challenge in de novo genome assembly of DNA-seq data is certainly to deal with repeats that are longer than the reads. In de novo transcriptome assembly of RNA-seq reads, on the other hand, this problem has been underestimated so far. Even though we have fewer and shorter repeated sequences in transcriptomics, they do create ambiguities and confuse assemblers if not addressed properly. Most transcriptome assemblers of short reads are based on de Bruijn graphs (DBG) and have no clear and explicit model for repeats in RNA-seq data, relying instead on heuristics to deal with them. The results of this work are threefold. First, we introduce a formal model for representing high copy-number and low-divergence repeats in RNA-seq data and exploit its properties to infer a combinatorial characteristic of repeat-associated subgraphs. We show that the problem of identifying such subgraphs in a DBG is NP-complete. Second, we show that in the specific case of local assembly of alternative splicing (AS) events, we can implicitly avoid such subgraphs, and we present an efficient algorithm to enumerate AS events that are not included in repeats. Using simulated data, we show that this strategy is significantly more sensitive and precise than the previous version of KisSplice (Sacomoto et al. in WABI, pp 99-111, 1), Trinity (Grabherr et al. in Nat Biotechnol 29(7):644-652, 2), and Oases (Schulz et al. in Bioinformatics 28(8):1086-1092, 3), for the specific task of calling AS events. Third, we turn our focus to full-length transcriptome assembly, and we show that exploring the topology of DBGs can improve de novo transcriptome evaluation methods. Based on the observation that repeats create complicated regions in a DBG, and when assemblers try to traverse these regions, they can infer erroneous transcripts, we propose a measure to flag transcripts traversing such troublesome regions, thereby giving a confidence level for each transcript. The originality of our work when

  11. De novo Assembly and Characterization of Cajanus scarabaeoides (L.) Thouars Transcriptome by Paired-End Sequencing

    Directory of Open Access Journals (Sweden)

    Deepti Nigam

    2017-07-01

    Full Text Available Pigeonpea [Cajanus cajan (L.) Millsp.] is a heat and drought resilient legume crop grown mostly in Asia and Africa. Pigeonpea is affected by various biotic (diseases and insect pests) and abiotic stresses (salinity and water logging) which limit the yield potential of this crop. However, resistance to all these constraints is not readily available in the cultivated genotypes, and some of the wild relatives have been found to withstand these constraints. Thus, the utilization of crop wild relatives (CWR) in pigeonpea breeding has been effective in conferring resistance, quality and breeding efficiency traits to this crop. Bud and leaf tissue of Cajanus scarabaeoides, a wild relative of pigeonpea, were used for transcriptome profiling. Approximately 30 million clean reads filtered from raw reads by removal of adaptors, ambiguous reads and low-quality reads (3.02 gigabase pairs) were generated by Illumina paired-end RNA-seq technology. All of these clean reads were pooled and assembled de novo into 117,007 transcripts using Trinity. Finally, a total of 98,664 unigenes were derived with a mean length of 396 bp and an N50 value of 1393. The assembly produced significant mapping results (73.68%) in BLASTN searches of the Glycine max CDS sequence database (Ensembl). Further, the UniProt database of Viridiplantae was used for unigene annotation; 81,799 of 98,664 (82.90%) unigenes were finally annotated with gene descriptions or conserved protein domains. Further, a total of 23,475 SSRs were identified in 27,321 unigenes. These data will provide useful information for mining of functionally important genes and SSR markers for pigeonpea improvement.

  12. De novo transcriptome assembly and its annotation for the aposematic wood tiger moth (Parasemia plantaginis)

    Directory of Open Access Journals (Sweden)

    Juan A. Galarza

    2017-06-01

    Full Text Available In this paper we report the public availability of transcriptome resources for the aposematic wood tiger moth (Parasemia plantaginis). Comprehensive assembly methods, quality statistics, and annotation are provided. This reference transcriptome may serve as a useful resource for investigating functional gene activity in aposematic Lepidopteran species. All data are freely available at the European Nucleotide Archive (http://www.ebi.ac.uk/ena) under study accession number PRJEB14172.

  13. Next-generation transcriptome assembly

    Energy Technology Data Exchange (ETDEWEB)

    Martin, Jeffrey A.; Wang, Zhong

    2011-09-01

    Transcriptomics studies often rely on partial reference transcriptomes that fail to capture the full catalog of transcripts and their variations. Recent advances in sequencing technologies and assembly algorithms have facilitated the reconstruction of the entire transcriptome by deep RNA sequencing (RNA-seq), even without a reference genome. However, transcriptome assembly from billions of RNA-seq reads, which are often very short, poses a significant informatics challenge. This Review summarizes the recent developments in transcriptome assembly approaches (reference-based, de novo and combined strategies), along with some perspectives on transcriptome assembly in the near future.

  14. De novo Assembly and Analysis of the Chilean Pencil Catfish Trichomycterus areolatus Transcriptome

    Science.gov (United States)

    Schulze, Thomas T.; Ali, Jonathan M.; Bartlett, Maggie L.; McFarland, Madalyn M.; Clement, Emalie J.; Won, Harim I.; Sanford, Austin G.; Monzingo, Elyssa B.; Martens, Matthew C.; Hemsley, Ryan M.; Kumar, Sidharta; Gouin, Nicolas; Kolok, Alan S.; Davis, Paul H.

    2016-01-01

    Trichomycterus areolatus is an endemic species of pencil catfish that inhabits the riffles and rapids of many freshwater ecosystems of Chile. Despite its unique adaptation to Chile's high gradient watersheds and therefore potential application in the investigation of ecosystem integrity and environmental contamination, relatively little is known regarding the molecular biology of this environmental sentinel. Here, we detail the assembly of the Trichomycterus areolatus transcriptome, a molecular resource for the study of this organism and its molecular response to the environment. RNA-Seq reads were obtained by next-generation sequencing with an Illumina® platform and processed using PRINSEQ. The transcriptome assembly was performed using the TRINITY assembler. Transcriptome validation was performed by functional characterization with KOG, KEGG, and GO analyses. Additionally, differential expression analysis highlights sex-specific expression patterns, and a list of endocrine and oxidative stress related transcripts is included. PMID:27672404

  15. De novo transcriptome assembly for the tropical grass Panicum maximum Jacq.

    Directory of Open Access Journals (Sweden)

    Guilherme Toledo-Silva

    Full Text Available Guinea grass (Panicum maximum Jacq.) is a tropical African grass often used to feed beef cattle, which is an important economic activity in Brazil. Brazil is the leader in global meat exportation because of its exclusively pasture-raised bovine herds. Guinea grass also has potential uses in bioenergy production due to its elevated biomass generation through the C4 photosynthesis pathway. We generated approximately 13 Gb of data from Illumina sequencing of P. maximum leaves. Four different genotypes were sequenced, and the combined reads were assembled de novo into 38,192 unigenes and annotated; approximately 63% of the unigenes had homology to other proteins in the NCBI non-redundant protein database. Functional classification through COG (Clusters of Orthologous Groups), GO (Gene Ontology) and KEGG (Kyoto Encyclopedia of Genes and Genomes) analyses showed that the unigenes from Guinea grass leaves are involved in a wide range of biological processes and metabolic pathways, including C4 photosynthesis and lignocellulose generation, which are important for cattle grazing and bioenergy production. The most abundant transcripts were involved in carbon fixation, photosynthesis, RNA translation and heavy metal cellular homeostasis. Finally, we identified a number of potential molecular markers, including 5,035 microsatellites (SSRs) and 346,456 single nucleotide polymorphisms (SNPs). To the best of our knowledge, this is the first study to characterize the complete leaf transcriptome of P. maximum using high-throughput sequencing. The biological information provided here will aid in gene expression studies and marker-assisted selection-based breeding research in tropical grasses.
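
    Microsatellites (SSRs) such as those counted in the abstract above are short motifs repeated in tandem. The sketch below shows one simple way to scan a sequence for them with a regular expression. The motif size range (2 to 6 bases) and the minimum of five tandem copies are assumptions chosen for illustration; the abstract does not state which tool or thresholds the authors used.

```python
import re

# Assumed criteria: a motif of 2-6 bases followed by at least 4 more
# identical copies (5 tandem copies in total).
SSR_PATTERN = re.compile(r"([ACGT]{2,6}?)\1{4,}")


def find_ssrs(sequence):
    """Return a list of (motif, repeat_count, start_index) tuples."""
    hits = []
    for match in SSR_PATTERN.finditer(sequence.upper()):
        motif = match.group(1)
        # Whole-match length divided by motif length gives the copy number.
        repeats = len(match.group(0)) // len(motif)
        hits.append((motif, repeats, match.start()))
    return hits


# Toy sequence (not real data): five copies of "AC" flanked by other bases.
toy_hits = find_ssrs("TT" + "AC" * 5 + "GG")
```

    Dedicated SSR finders typically apply different minimum repeat counts per motif length, so counts from this sketch would not be comparable with published figures.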

  16. De novo transcriptome assembly of the calanoid copepod Neocalanus flemingeri: A new resource for emergence from diapause.

    Science.gov (United States)

    Roncalli, Vittoria; Cieslak, Matthew C; Sommer, Stephanie A; Hopcroft, Russell R; Lenz, Petra H

    2018-02-01

    Copepods, small planktonic crustaceans, are key links between primary producers and upper trophic levels, including many economically important fishes. In the subarctic North Pacific, the life cycle of copepods like Neocalanus flemingeri includes an ontogenetic migration to depth followed by a period of diapause (a type of dormancy) characterized by arrested development and low metabolic activity. The end of diapause is marked by the production of the first brood of eggs. Recent temperature anomalies in the North Pacific have raised concerns about potential negative effects on N. flemingeri. Since diapause is a developmental program, its progress can be tracked through global gene expression. Thus, a reference transcriptome was developed as a first step towards physiological profiling of diapausing females using high-throughput Illumina sequencing. The de novo transcriptome, the first for this species, was designed to investigate the diapause period. RNA-Seq reads were obtained for dormant to reproductive N. flemingeri females. A high quality de novo transcriptome was obtained by first assembling reads from each individual using Trinity software, followed by clustering with the CAP3 Assembly Program. This assembly consisted of 140,841 transcripts (contigs). Benchmarking Universal Single-Copy Orthologs (BUSCO) analysis identified 85% of core eukaryotic genes, with 79% predicted to be complete. Comparison with other calanoid transcriptomes confirmed its quality and degree of completeness. Trinity assembly of reads originating from multiple individuals led to fragmentation. Thus, the workflow applied here differed from the one recommended by Trinity, but was required to obtain a good assembly.

  17. IDBA-tran: a more robust de novo de Bruijn graph assembler for transcriptomes with uneven expression levels.

    Science.gov (United States)

    Peng, Yu; Leung, Henry C M; Yiu, Siu-Ming; Lv, Ming-Ju; Zhu, Xin-Guang; Chin, Francis Y L

    2013-07-01

    RNA sequencing based on next-generation sequencing technology is effective for analyzing transcriptomes. Like de novo genome assembly, de novo transcriptome assembly does not rely on any reference genome or additional annotation information, but is more difficult. In particular, isoforms can have very uneven expression levels (e.g. 1:100), which make it very difficult to identify low-expressed isoforms. One challenge is to remove erroneous vertices/edges with high multiplicity (produced by high-expressed isoforms) in the de Bruijn graph without removing correct ones with not-so-high multiplicity from low-expressed isoforms. Failing to do so will result in the loss of low-expressed isoforms or having complicated subgraphs with transcripts of different genes mixed together due to erroneous vertices/edges. Contributions: Unlike existing tools, which remove erroneous vertices/edges with multiplicities lower than a global threshold, we use a probabilistic progressive approach to iteratively remove them with local thresholds. This enables us to decompose the graph into disconnected components, each containing a few genes, if not a single gene, while retaining many correct vertices/edges of low-expressed isoforms. Combined with existing techniques, IDBA-Tran is able to assemble both high-expressed and low-expressed transcripts and outperform existing assemblers in terms of sensitivity and specificity for both simulated and real data. http://www.cs.hku.hk/~alse/idba_tran. Supplementary data are available at Bioinformatics online.
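
    The contrast drawn in the abstract above, between a single global multiplicity cutoff and locally chosen cutoffs, can be illustrated with a toy k-mer filter. This is only a simplified sketch: it compares each k-mer's count with the counts of its de Bruijn graph neighbours using a fixed ratio, which is an assumption made for illustration and is not the probabilistic progressive procedure implemented in IDBA-tran. All names and numbers below are hypothetical.

```python
def filter_global(counts, threshold):
    """Keep k-mers whose multiplicity reaches one global threshold."""
    return {kmer for kmer, c in counts.items() if c >= threshold}


def filter_local(counts, ratio):
    """Keep k-mers whose multiplicity is at least `ratio` times the
    highest multiplicity among themselves and their graph neighbours
    (k-mers overlapping by k-1 bases on either side)."""
    kept = set()
    for kmer, c in counts.items():
        successors = [counts.get(kmer[1:] + b, 0) for b in "ACGT"]
        predecessors = [counts.get(b + kmer[:-1], 0) for b in "ACGT"]
        local_max = max(successors + predecessors + [c])
        if c >= ratio * local_max:
            kept.add(kmer)
    return kept


# Hypothetical 4-mer multiplicities:
#   ACGT -> CGTA -> GTAC : highly expressed isoform (count 500)
#   CGTT                 : sequencing error branching off ACGT (count 5)
#   GGCC -> GCCA         : correct, low-expressed isoform (count 3)
demo_counts = {
    "ACGT": 500, "CGTA": 500, "GTAC": 500,
    "CGTT": 5,
    "GGCC": 3, "GCCA": 3,
}

global_kept = filter_global(demo_counts, threshold=4)
local_kept = filter_local(demo_counts, ratio=0.1)
```

    In this toy data the global cutoff keeps the erroneous k-mer and discards the low-expressed isoform, while the neighbour-relative cutoff does the opposite, which is the failure mode the abstract describes for global thresholds.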

  18. Developmental gene discovery in a hemimetabolous insect: de novo assembly and annotation of a transcriptome for the cricket Gryllus bimaculatus.

    Directory of Open Access Journals (Sweden)

    Victor Zeng

    Full Text Available Most genomic resources available for insects represent the Holometabola, which are insects that undergo complete metamorphosis like beetles and flies. In contrast, the Hemimetabola (direct developing insects), representing the basal branches of the insect tree, have very few genomic resources. We have therefore created a large and publicly available transcriptome for the hemimetabolous insect Gryllus bimaculatus (cricket), a well-developed laboratory model organism whose potential for functional genetic experiments is currently limited by the absence of genomic resources. cDNA was prepared using mRNA obtained from adult ovaries containing all stages of oogenesis, and from embryo samples on each day of embryogenesis. Using 454 Titanium pyrosequencing, we sequenced over four million raw reads, and assembled them into 21,512 isotigs (predicted transcripts) and 120,805 singletons with an average coverage per base pair of 51.3. We annotated the transcriptome manually for over 400 conserved genes involved in embryonic patterning, gametogenesis, and signaling pathways. BLAST comparison of the transcriptome against the NCBI non-redundant protein database (nr) identified significant similarity to nr sequences for 55.5% of transcriptome sequences, and suggested that the transcriptome may contain 19,874 unique transcripts. For predicted transcripts without significant similarity to known sequences, we assessed their similarity to other orthopteran sequences, and determined that these transcripts contain recognizable protein domains, largely of unknown function. We created a searchable, web-based database to allow public access to all raw, assembled and annotated data. This database is to our knowledge the largest de novo assembled and annotated transcriptome resource available for any hemimetabolous insect. We therefore anticipate that these data will contribute significantly to more effective and higher-throughput deployment of molecular analysis tools in

  19. De novo assembly and comparison of the ovarian transcriptomes of the common Chinese cuttlefish (Sepiella japonica) with different gonadal development

    Directory of Open Access Journals (Sweden)

    Zhenming Lü

    2016-03-01

    Full Text Available The common Chinese cuttlefish (Sepiella japonica) has been considered one of the most economically important marine cephalopod species in East Asia, and seed breeding technology has been established for massive aquaculture and stock enhancement. In the present study, we used Illumina HiSeq2000 to sequence, assemble and annotate the transcriptome of the ovary tissues of S. japonica for the first time. A total of 53,116,650 and 53,446,640 reads were obtained from the immature and mature ovaries, respectively (NCBI SRA database SRX1409472 and SRX1409473), and 70,039 contigs (N50 = 1443 bp) were obtained after de novo assembly with Trinity software. Digital gene expression analysis reveals that 47,288 contigs show a differential expression profile; 793 contigs are highly expressed in the immature ovary, while 38 contigs are highly expressed in the mature ovary with FPKM >100. We hope that the ovarian transcriptome and those stage-enriched transcripts of S. japonica can provide some insight into the genome-wide transcriptome profile of cuttlefish gonad tissue and give useful information on cuttlefish gonad development. Keywords: Cuttlefish, Gonad development, Transcriptome
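
    The FPKM cutoff mentioned in the abstract above refers to fragments per kilobase of transcript per million mapped fragments, a length- and depth-normalized expression measure. A minimal sketch of the standard formula follows; the function name, argument names and example numbers are illustrative and do not come from the study's pipeline.

```python
def fpkm(fragments, transcript_length_bp, total_mapped_fragments):
    """Fragments per kilobase of transcript per million mapped fragments.

    FPKM = fragments * 1e9 / (transcript_length_bp * total_mapped_fragments)
    """
    if transcript_length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("length and library size must be positive")
    return fragments * 1e9 / (transcript_length_bp * total_mapped_fragments)


# Made-up example: 100 fragments on a 1,000 bp contig in a library
# of one million mapped fragments.
example_fpkm = fpkm(100, 1_000, 1_000_000)
```

    Dividing by transcript length and library size is what lets a fixed cutoff such as FPKM > 100 be applied across contigs of different lengths and libraries of different depths.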

  20. Digital gene expression analysis based on integrated de novo transcriptome assembly of sweet potato [Ipomoea batatas (L.) Lam.].

    Directory of Open Access Journals (Sweden)

    Xiang Tao

    Full Text Available BACKGROUND: Sweet potato (Ipomoea batatas L. [Lam.]) ranks among the top six most important food crops in the world. It is widely grown throughout the world with high and stable yield, strong adaptability, rich nutrient content, and multiple uses. However, little is known about the molecular biology of this important non-model organism due to lack of genomic resources. Hence, studies based on high-throughput sequencing technologies are needed to get a comprehensive and integrated genomic resource and better understanding of gene expression patterns in different tissues and at various developmental stages. METHODOLOGY/PRINCIPAL FINDINGS: Illumina paired-end (PE) RNA-Sequencing was performed, and generated 48.7 million 75 bp PE reads. These reads were de novo assembled into 128,052 transcripts (≥ 100 bp), which correspond to 41.1 million base pairs, by using a combined assembly strategy. Transcripts were annotated by Blast2GO and 51,763 transcripts got BLASTX hits, of which 39,677 transcripts have GO terms and 14,117 have ECs that are associated with 147 KEGG pathways. Furthermore, transcriptome differences of seven tissues were analyzed by using Illumina digital gene expression (DGE) tag profiling and numerous differentially and specifically expressed transcripts were identified. Moreover, the expression characteristics of genes involved in viral genomes, starch metabolism and potential stress tolerance and insect resistance were also identified. CONCLUSIONS/SIGNIFICANCE: The combined de novo transcriptome assembly strategy can be applied to other organisms whose reference genomes are not available. The data provided here represent the most comprehensive and integrated genomic resources for cloning and identifying genes of interest in sweet potato. Characterization of sweet potato transcriptome provides an effective tool for better understanding the molecular mechanisms of cellular processes including development of leaves and storage roots

  1. De Novo Genome and Transcriptome Assembly of the Canadian Beaver (Castor canadensis)

    Directory of Open Access Journals (Sweden)

    Si Lok

    2017-02-01

    Full Text Available The Canadian beaver (Castor canadensis) is the largest indigenous rodent in North America. We report a draft annotated assembly of the beaver genome, the first for a large rodent and the first mammalian genome assembled directly from uncorrected and moderate coverage (< 30 ×) long reads generated by single-molecule sequencing. The genome size is 2.7 Gb estimated by k-mer analysis. We assembled the beaver genome using the new Canu assembler optimized for noisy reads. The resulting assembly was refined using Pilon supported by short reads (80 ×) and checked for accuracy by congruency against an independent short read assembly. We scaffolded the assembly using the exon–gene models derived from 9805 full-length open reading frames (FL-ORFs) constructed from the beaver leukocyte and muscle transcriptomes. The final assembly comprised 22,515 contigs with an N50 of 278,680 bp and an N50-scaffold of 317,558 bp. Maximum contig and scaffold lengths were 3.3 and 4.2 Mb, respectively, with a combined scaffold length representing 92% of the estimated genome size. The completeness and accuracy of the scaffold assembly was demonstrated by the precise exon placement for 91.1% of the 9805 assembled FL-ORFs and 83.1% of the BUSCO (Benchmarking Universal Single-Copy Orthologs) gene set used to assess the quality of genome assemblies. Well-represented were genes involved in dentition and enamel deposition, defining characteristics of rodents with which the beaver is well-endowed. The study provides insights for genome assembly and an important genomics resource for Castoridae and rodent evolutionary biology.
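
    The N50 values quoted in the record above are summary statistics of contig or scaffold lengths. As a rough illustration only, the sketch below computes N50 under its usual definition (the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the total assembled length). The function name and the toy lengths are hypothetical and are not taken from the beaver assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of contig (or scaffold) lengths."""
    if not lengths:
        raise ValueError("need at least one sequence length")
    ordered = sorted(lengths, reverse=True)   # longest first
    half_total = sum(ordered) / 2             # half of the assembled bases
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:             # crossed the 50% mark
            return length


# Toy example with made-up lengths (not real data):
toy_contigs = [100, 200, 300, 400, 500]
print(n50(toy_contigs))  # 400: 500 + 400 = 900 >= 750
```

    A larger N50 generally indicates a more contiguous assembly, but it says nothing about correctness, which is why the record also reports independent checks such as BUSCO.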

  2. De novo assembly of a cotyledon-enriched transcriptome map of Vicia faba (L.) for transfer cell research

    Directory of Open Access Journals (Sweden)

    Kiruba Shankari Arun Chinnappa

    2015-04-01

    Full Text Available Vicia faba (L.) is an important cool-season grain legume species used widely in agriculture but also in plant physiology research, particularly as an experimental model to study transfer cell (TC) development. Adaxial epidermal cells of isolated cotyledons can be induced to form functional TCs, thus providing a valuable experimental system to investigate genetic regulation of TC development. The genome of V. faba is exceedingly large (ca. 13 Gb), however, and limited genomic information is available for this species. To provide a resource for transcript profiling of epidermal TC development, we have undertaken de novo assembly of a cotyledon-enriched transcriptome map for V. faba. Illumina paired-end sequencing of total RNA pooled from different tissues and different stages, including isolated cotyledons induced to form TCs, generated 69.5M reads, of which 65.8M were used for assembly following trimming and quality control. Assembly using a De-Bruijn graph-based approach within CLC Genomics Workbench v6.1 generated 21,297 contigs, of which 80.6% were successfully annotated against GO terms. The assembly was validated against known V. faba cDNAs held in GenBank, including transcripts previously identified as being specifically expressed in epidermal cells across TC trans-differentiation. This cotyledon-enriched transcriptome map therefore provides a valuable tool for future transcript profiling of epidermal TC development, and also enriches the genetic resources available for this important legume crop species.

  3. De novo assembly, gene annotation and marker development using Illumina paired-end transcriptome sequences in celery (Apium graveolens L.).

    Directory of Open Access Journals (Sweden)

    Nan Fu

    Full Text Available BACKGROUND: Celery is an increasingly popular vegetable species, but limited transcriptome and genomic data hinder research on it. In addition, a lack of celery molecular markers limits the process of molecular genetic breeding. High-throughput transcriptome sequencing is an efficient method to generate a large transcriptome sequence dataset for gene discovery, molecular marker development and marker-assisted selection breeding. PRINCIPAL FINDINGS: Celery transcriptomes from four tissues were sequenced using Illumina paired-end sequencing technology. De novo assembly was performed to generate a collection of 42,280 unigenes (average length of 502.6 bp) that represent the first transcriptome of the species. 78.43% and 48.93% of the unigenes had significant similarity with proteins in the National Center for Biotechnology Information (NCBI) non-redundant protein database (Nr) and Swiss-Prot database, respectively, and 10,473 (24.77%) unigenes were assigned to Clusters of Orthologous Groups (COG). 21,126 (49.97%) unigenes harboring Interpro domains were annotated, of which 15,409 (36.45%) were assigned to Gene Ontology (GO) categories. Additionally, 7,478 unigenes were mapped onto 228 pathways using the Kyoto Encyclopedia of Genes and Genomes Pathway database (KEGG). Large numbers of simple sequence repeats (SSRs) were identified, and then the rates of successful amplification and polymorphism were investigated among 31 celery accessions. CONCLUSIONS: This study demonstrates the feasibility of generating sequence information on a large scale by Illumina paired-end sequencing and efficient assembly. Our results provide a valuable resource for celery research. The developed molecular markers are the foundation of further genetic linkage analysis and gene localization, and they will be essential to accelerate the process of breeding.

  4. RNA-seq analysis of Quercus pubescens Leaves: de novo transcriptome assembly, annotation and functional markers development.

    Directory of Open Access Journals (Sweden)

    Sara Torre

    Full Text Available Quercus pubescens Willd., a species distributed from Spain to southwest Asia, ranks high for drought tolerance among European oaks. Q. pubescens performs a role of outstanding significance in most Mediterranean forest ecosystems, but few mechanistic studies have been conducted to explore its response to environmental constraints, due to the lack of genomic resources. In our study, we performed deep transcriptomic sequencing in Q. pubescens leaves, including de novo assembly, functional annotation and the identification of new molecular markers. Our results are a pre-requisite for undertaking molecular functional studies, and may give support in population and association genetic studies. 254,265,700 clean reads were generated by the Illumina HiSeq 2000 platform, with an average length of 98 bp. De novo assembly, using CLC Genomics, produced 96,006 contigs, having a mean length of 618 bp. Sequence similarity analyses against seven public databases (Uniprot, NR, RefSeq and KOGs at NCBI, Pfam, InterPro and KEGG) resulted in 83,065 transcripts annotated with gene descriptions, conserved protein domains, or gene ontology terms. These annotations and local BLAST allowed us to identify genes specifically associated with mechanisms of drought avoidance. Finally, 14,202 microsatellite markers and 18,425 single nucleotide polymorphisms (SNPs) were discovered in silico in assembled and annotated sequences. We completed a successful global analysis of the Q. pubescens leaf transcriptome using RNA-seq. The assembled and annotated sequences together with newly discovered molecular markers provide genomic information for functional genomic studies in Q. pubescens, with special emphasis on response mechanisms to the severe constraints of the Mediterranean climate. Our tools enable comparative genomics studies on other Quercus species taking advantage of large intra-specific ecophysiological differences.

  5. Transcriptome sequencing and de novo assembly in arecanut, Areca catechu L elucidates the secondary metabolite pathway genes

    Directory of Open Access Journals (Sweden)

    Ramaswamy Manimekalai

    2018-03-01

    Full Text Available Areca catechu L. belongs to the Arecaceae family which comprises many economically important palms. The palm is a source of alkaloids and carotenoids. The lack of ample genetic information in public databases has been a constraint for the genetic improvement of arecanut. To gain molecular insight into the palm, high throughput RNA sequencing and de novo assembly of arecanut leaf transcriptome was undertaken in the present study. A total of 56,321,907 paired end reads of 101 bp length consisting of 11.343 Gb nucleotides were generated. De novo assembly resulted in 48,783 good quality transcripts, of which 67% could be annotated against the NCBI non-redundant database. The Gene Ontology (GO) analysis with UniProt database identified 9222 biological process, 11268 molecular function and 7574 cellular component GO terms. Large scale expression profiling through Fragments per Kilobase per Million mapped reads (FPKM) showed major genes involved in different metabolic pathways of the plant. Metabolic pathway analysis of the assembled transcripts identified 124 plant related pathways. The transcripts related to carotenoid and alkaloid biosynthetic pathways had a greater number of reads and higher FPKM values, suggesting higher expression of these genes. The arecanut transcript sequences generated in the study showed high similarity with coconut, oil palm and date palm sequences retrieved from public domains. We also identified 6853 genic SSR regions in the arecanut. The possible primers were designed for SSR detection and this would simplify the future efforts in genetic characterization of arecanut.
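
    The FPKM measure named in the record above normalizes a fragment count by transcript length (in kilobases) and by sequencing depth (in millions of mapped fragments). The sketch below shows that arithmetic only; the function name and the example numbers are hypothetical, and it is not the pipeline used in the arecanut study.

```python
def fpkm(fragments, transcript_length_bp, total_mapped_fragments):
    """Fragments Per Kilobase of transcript per Million mapped fragments."""
    if transcript_length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("length and library size must be positive")
    # Equivalent to fragments / (length in kb) / (library size in millions).
    return fragments * 1e9 / (transcript_length_bp * total_mapped_fragments)


# Made-up example: 500 fragments on a 2,000 bp transcript
# in a library of 10 million mapped fragments.
print(fpkm(500, 2_000, 10_000_000))  # 25.0
```

    Because both length and depth are divided out, FPKM values can be compared between transcripts of different lengths within a library, which is the sense in which the record reads higher FPKM as higher expression.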

  6. Discovery of genes related to insecticide resistance in Bactrocera dorsalis by functional genomic analysis of a de novo assembled transcriptome.

    Science.gov (United States)

    Hsu, Ju-Chun; Chien, Ting-Ying; Hu, Chia-Cheng; Chen, Mei-Ju May; Wu, Wen-Jer; Feng, Hai-Tung; Haymer, David S; Chen, Chien-Yu

    2012-01-01

    Insecticide resistance has recently become a critical concern for control of many insect pest species. Genome sequencing and global quantization of gene expression through analysis of the transcriptome can provide useful information relevant to this challenging problem. The oriental fruit fly, Bactrocera dorsalis, is one of the world's most destructive agricultural pests, and recently it has been used as a target for studies of genetic mechanisms related to insecticide resistance. However, prior to this study, the molecular data available for this species was largely limited to genes identified through homology. To provide a broader pool of gene sequences of potential interest with regard to insecticide resistance, this study uses whole transcriptome analysis developed through de novo assembly of short reads generated by next-generation sequencing (NGS). The transcriptome of B. dorsalis was initially constructed using Illumina's Solexa sequencing technology. Qualified reads were assembled into contigs and potential splicing variants (isotigs). A total of 29,067 isotigs have putative homologues in the non-redundant (nr) protein database from NCBI, and 11,073 of these correspond to distinct D. melanogaster proteins in the RefSeq database. Approximately 5,546 isotigs contain coding sequences that are at least 80% complete and appear to represent B. dorsalis genes. We observed a strong correlation between the completeness of the assembled sequences and the expression intensity of the transcripts. The assembled sequences were also used to identify large numbers of genes potentially belonging to families related to insecticide resistance. A total of 90 P450-, 42 GST- and 37 COE-related genes, representing three major enzyme families involved in insecticide metabolism and resistance, were identified. In addition, 36 isotigs were discovered to contain target site sequences related to four classes of resistance genes. Identified sequence motifs were also analyzed to

  7. De Novo Transcriptome Assembly (NGS) of Curcuma longa L. Rhizome Reveals Novel Transcripts Related to Anticancer and Antimalarial Terpenoids

    Science.gov (United States)

    Jayakumar, Vasanthan; Damodaran, Anand C.; Rao, Sudha Narayana; Katta, Mohan A. V. S. K.; Gopinathan, Sreeja; Sarma, Santosh Prasad; Senthilkumar, Vanitha; Niranjan, Vidya; Gopinath, Ashok; Mugasimangalam, Raja C.

    2013-01-01

    Herbal remedies are increasingly being recognised in recent years as alternative medicine for a number of diseases including cancer. Curcuma longa L., commonly known as turmeric is used as a culinary spice in India and in many Asian countries has been attributed to lower incidences of gastrointestinal cancers. Curcumin, a secondary metabolite isolated from the rhizomes of this plant has been shown to have significant anticancer properties, in addition to antimalarial and antioxidant effects. We sequenced the transcriptome of the rhizome of the 3 varieties of Curcuma longa L. using Illumina reversible dye terminator sequencing followed by de novo transcriptome assembly. Multiple databases were used to obtain a comprehensive annotation and the transcripts were functionally classified using GO, KOG and PlantCyc. Special emphasis was given for annotating the secondary metabolite pathways and terpenoid biosynthesis pathways. We report for the first time, the presence of transcripts related to biosynthetic pathways of several anti-cancer compounds like taxol, curcumin, and vinblastine in addition to anti-malarial compounds like artemisinin and acridone alkaloids, emphasizing turmeric's importance as a highly potent phytochemical. Our data not only provides molecular signatures for several terpenoids but also a comprehensive molecular resource for facilitating deeper insights into the transcriptome of C. longa. PMID:23468859

  8. De Novo transcriptome assembly (NGS) of Curcuma longa L. rhizome reveals novel transcripts related to anticancer and antimalarial terpenoids.

    Directory of Open Access Journals (Sweden)

    Ramasamy S Annadurai

    Full Text Available Herbal remedies are increasingly being recognised in recent years as alternative medicine for a number of diseases including cancer. Curcuma longa L., commonly known as turmeric is used as a culinary spice in India and in many Asian countries has been attributed to lower incidences of gastrointestinal cancers. Curcumin, a secondary metabolite isolated from the rhizomes of this plant has been shown to have significant anticancer properties, in addition to antimalarial and antioxidant effects. We sequenced the transcriptome of the rhizome of the 3 varieties of Curcuma longa L. using Illumina reversible dye terminator sequencing followed by de novo transcriptome assembly. Multiple databases were used to obtain a comprehensive annotation and the transcripts were functionally classified using GO, KOG and PlantCyc. Special emphasis was given for annotating the secondary metabolite pathways and terpenoid biosynthesis pathways. We report for the first time, the presence of transcripts related to biosynthetic pathways of several anti-cancer compounds like taxol, curcumin, and vinblastine in addition to anti-malarial compounds like artemisinin and acridone alkaloids, emphasizing turmeric's importance as a highly potent phytochemical. Our data not only provides molecular signatures for several terpenoids but also a comprehensive molecular resource for facilitating deeper insights into the transcriptome of C. longa.

  9. De Novo transcriptome assembly (NGS) of Curcuma longa L. rhizome reveals novel transcripts related to anticancer and antimalarial terpenoids.

    Science.gov (United States)

    Annadurai, Ramasamy S; Neethiraj, Ramprasad; Jayakumar, Vasanthan; Damodaran, Anand C; Rao, Sudha Narayana; Katta, Mohan A V S K; Gopinathan, Sreeja; Sarma, Santosh Prasad; Senthilkumar, Vanitha; Niranjan, Vidya; Gopinath, Ashok; Mugasimangalam, Raja C

    2013-01-01

    Herbal remedies are increasingly being recognised in recent years as alternative medicine for a number of diseases including cancer. Curcuma longa L., commonly known as turmeric is used as a culinary spice in India and in many Asian countries has been attributed to lower incidences of gastrointestinal cancers. Curcumin, a secondary metabolite isolated from the rhizomes of this plant has been shown to have significant anticancer properties, in addition to antimalarial and antioxidant effects. We sequenced the transcriptome of the rhizome of the 3 varieties of Curcuma longa L. using Illumina reversible dye terminator sequencing followed by de novo transcriptome assembly. Multiple databases were used to obtain a comprehensive annotation and the transcripts were functionally classified using GO, KOG and PlantCyc. Special emphasis was given for annotating the secondary metabolite pathways and terpenoid biosynthesis pathways. We report for the first time, the presence of transcripts related to biosynthetic pathways of several anti-cancer compounds like taxol, curcumin, and vinblastine in addition to anti-malarial compounds like artemisinin and acridone alkaloids, emphasizing turmeric's importance as a highly potent phytochemical. Our data not only provides molecular signatures for several terpenoids but also a comprehensive molecular resource for facilitating deeper insights into the transcriptome of C. longa.

  10. Transcriptome Sequencing, De Novo Assembly and Differential Gene Expression Analysis of the Early Development of Acipenser baeri.

    Directory of Open Access Journals (Sweden)

    Wei Song

    Full Text Available The molecular mechanisms that drive the development of the endangered fossil fish species Acipenser baeri are difficult to study due to the lack of genomic data. Recent advances in sequencing technologies and the falling cost of sequencing offer exclusive opportunities for exploring important molecular mechanisms underlying specific biological processes. This manuscript describes the large scale sequencing and analyses of mRNA from Acipenser baeri collected at five development time points using the Illumina Hiseq2000 platform. The sequencing reads were de novo assembled and clustered into 278167 unigenes, of which 57346 (20.62%) had 45837 known homologous proteins in Uniprot protein databases while 11509 proteins matched with at least one sequence of assembled unigenes. The remaining 79.38% of unigenes could stand for non-coding unigenes or unigenes specific to A. baeri. A total of 43062 unigenes were annotated into functional categories via Gene Ontology (GO) annotation whereas 29526 unigenes were associated with 329 pathways by mapping to KEGG database. Subsequently, 3479 differentially expressed genes were scanned within developmental stages and clustered into 50 gene expression profiles. Genes preferentially expressed at each stage were also identified. Through GO and KEGG pathway enrichment analysis, relevant physiological variations during the early development of A. baeri could be better understood. Accordingly, the present study gives insights into the transcriptome profile of the early development of A. baeri, and the information contained in this large scale transcriptome will provide substantial references for A. baeri developmental biology and promote its aquaculture research.

  11. De Novo Assembly and Characterization of Fruit Transcriptome in Black Pepper (Piper nigrum).

    Science.gov (United States)

    Hu, Lisong; Hao, Chaoyun; Fan, Rui; Wu, Baoduo; Tan, Lehe; Wu, Huasong

    2015-01-01

    Black pepper is one of the most popular and oldest spices in the world and valued for its pungent constituent alkaloids. Piperine is the main bioactive compound in pepper alkaloids, which perform unique physiological functions. However, the mechanisms of piperine synthesis are poorly understood. This study is the first to describe the fruit transcriptome of black pepper by sequencing on Illumina HiSeq 2000 platform. A total of 56,281,710 raw reads were obtained and assembled. From these raw reads, 44,061 unigenes with an average length of 1,345 nt were generated. During functional annotation, 40,537 unigenes were annotated in Gene Ontology categories, Kyoto Encyclopedia of Genes and Genomes pathways, Swiss-Prot database, and Nucleotide Collection (NR/NT) database. In addition, 8,196 simple sequence repeats (SSRs) were detected. In a detailed analysis of the transcriptome, housekeeping genes for quantitative polymerase chain reaction internal control, polymorphic SSRs, and lysine/ornithine metabolism-related genes were identified. These results validated the availability of our database. Our study could provide useful data for further research on piperine synthesis in black pepper.

  12. De novo assembly of pen shell (Atrina pectinata) transcriptome and screening of its genic microsatellites

    Science.gov (United States)

    Sun, Xiujun; Li, Dongming; Liu, Zhihong; Zhou, Liqing; Wu, Biao; Yang, Aiguo

    2017-10-01

    The pen shell (Atrina pectinata) is a large wedge-shaped bivalve, which belongs to family Pinnidae. Due to its large and nutritious adductor muscle, it is a popular seafood with high commercial value in Asia-Pacific countries. However, limited genomic and transcriptomic data have hampered its genetic investigations. In this study, the transcriptome of A. pectinata was deeply sequenced using Illumina pair-end sequencing technology. After assembly, a total of 127263 unigenes were obtained. Functional annotation indicated that the highest percentage of unigenes (18.60%) was annotated on GO database, followed by 18.44% on PFAM database and 17.04% on NR database. There were 270 biological pathways matched with those in KEGG database. Furthermore, a total of 23452 potential simple sequence repeats (SSRs) were identified, of which the most abundant type was mono-nucleotide repeats (12902, 55.01%), followed by di-nucleotide (8132, 34.68%), tri-nucleotide (2010, 8.57%), tetra-nucleotide (401, 1.71%), and penta-nucleotide (7, 0.03%) repeats. Sixty SSRs were selected for validating and developing genic SSR markers, of which 23 showed polymorphism in a cultured population with the average observed and expected heterozygosities of 0.412 and 0.579, respectively. In this study, we established the first comprehensive transcript dataset of A. pectinata genes. Our results demonstrated that RNA-Seq is a fast and cost-effective method for genic SSR development in non-model species.
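
    The SSR breakdown in the record above is simple arithmetic: each repeat class is reported as a count and as a share of the 23452 SSRs in total. The sketch below recomputes those shares from the counts quoted in the abstract as a consistency check; the variable and function names are hypothetical.

```python
# SSR counts per repeat class, as quoted in the abstract above.
ssr_counts = {
    "mono-nucleotide": 12902,
    "di-nucleotide": 8132,
    "tri-nucleotide": 2010,
    "tetra-nucleotide": 401,
    "penta-nucleotide": 7,
}


def class_shares(counts):
    """Return each class's share of the total, in percent, rounded to 2 places."""
    total = sum(counts.values())
    return {name: round(100 * n / total, 2) for name, n in counts.items()}


total_ssrs = sum(ssr_counts.values())  # should match the reported 23452
for name, share in class_shares(ssr_counts).items():
    print(f"{name}: {ssr_counts[name]} ({share}%)")
```

    The class counts add up to the reported total, so the percentages in the record are shares of all identified SSRs rather than of all unigenes.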

  13. De novo transcriptome assembly and the putative biosynthetic pathway of steroidal sapogenins of Dioscorea composita.

    Directory of Open Access Journals (Sweden)

    Xia Wang

    Full Text Available The plant Dioscorea composita has important applications in the medical and energy industries, and can be used for the extraction of steroidal sapogenins (important raw materials for the synthesis of steroidal drugs) and bioethanol production. However, little is known at the genetic level about how sapogenins are biosynthesized in this plant. Using Illumina deep sequencing, 62,341 unigenes were obtained by assembling its transcriptome, and 27,720 unigenes were annotated. Of these, 8,022 unigenes were mapped to 243 specific pathways, and 531 unigenes were identified to be involved in 24 secondary metabolic pathways. 35 enzymes, which were encoded by 79 unigenes, were related to the biosynthesis of steroidal sapogenins in this transcriptome database, covering almost all the nodes in the steroidal pathway. The results of real-time PCR experiments on ten related transcripts (HMGR, MK, SQLE, FPPS, DXS, CAS, HMED, CYP51, DHCR7, and DHCR24) indicated that sapogenins were mainly biosynthesized by the mevalonate pathway. The expression of these ten transcripts in the tuber and leaves was found to be much higher than in the stem. Also, expression in the shoots was low. The nucleotide and protein sequences and conserved domains of four related genes (HMGR, CAS, SQS, and SMT1) were highly conserved between D. composita and D. zingiberensis; but expression of these four genes is greater in D. composita. However, there is no expression of these key enzymes in potato and no steroidal sapogenins are synthesized.

  14. A pipeline for the de novo assembly of the Themira biloba (Sepsidae: Diptera) transcriptome using a multiple k-mer length approach.

    Science.gov (United States)

    Melicher, Dacotah; Torson, Alex S; Dworkin, Ian; Bowsher, Julia H

    2014-03-12

    The Sepsidae family of flies is a model for investigating how sexual selection shapes courtship and sexual dimorphism in a comparative framework. However, like many non-model systems, there are few molecular resources available. Large-scale sequencing and assembly have not been performed in any sepsid, and the lack of a closely related genome makes investigation of gene expression challenging. Our goal was to develop an automated pipeline for de novo transcriptome assembly, and to use that pipeline to assemble and analyze the transcriptome of the sepsid Themira biloba. Our bioinformatics pipeline uses cloud computing services to assemble and analyze the transcriptome with off-site data management, processing, and backup. It uses a multiple k-mer length approach combined with a second meta-assembly to extend transcripts and recover more bases of transcript sequences than standard single k-mer assembly. We used 454 sequencing to generate 1.48 million reads from cDNA generated from embryo, larva, and pupae of T. biloba and assembled a transcriptome consisting of 24,495 contigs. Annotation identified 16,705 transcripts, including those involved in embryogenesis and limb patterning. We assembled transcriptomes from an additional three non-model organisms to demonstrate that our pipeline assembled a higher-quality transcriptome than single k-mer approaches across multiple species. The pipeline we have developed for assembly and analysis increases contig length, recovers unique transcripts, and assembles more base pairs than other methods through the use of a meta-assembly. The T. biloba transcriptome is a critical resource for performing large-scale RNA-Seq investigations of gene expression patterns, and is the first transcriptome sequenced in this Dipteran family.

  15. De novo Transcriptome Assembly of Common Wild Rice (Oryza rufipogon Griff.) and Discovery of Drought-Response Genes in Root Tissue Based on Transcriptomic Data.

    Science.gov (United States)

    Tian, Xin-Jie; Long, Yan; Wang, Jiao; Zhang, Jing-Wen; Wang, Yan-Yan; Li, Wei-Min; Peng, Yu-Fa; Yuan, Qian-Hua; Pei, Xin-Wu

    2015-01-01

    The perennial O. rufipogon (common wild rice), which is considered to be the ancestor of Asian cultivated rice species, contains many useful genetic resources, including drought resistance genes. However, few studies have identified the drought resistance and tissue-specific genes in common wild rice. In this study, transcriptome sequencing libraries were constructed, including drought-treated roots (DR) and control leaves (CL) and roots (CR). Using Illumina sequencing technology, we generated 16.75 million bases of high-quality sequence data for common wild rice and conducted de novo assembly and annotation of genes without prior genome information. These reads were assembled into 119,332 unigenes with an average length of 715 bp. A total of 88,813 distinct sequences (74.42% of unigenes) significantly matched known genes in the NCBI NT database. Differentially expressed gene (DEG) analysis showed that 3617 genes were up-regulated and 4171 genes were down-regulated in the CR library compared with the CL library. Among the DEGs, 535 genes were expressed in roots but not in shoots. A similar comparison between the DR and CR libraries showed that 1393 genes were up-regulated and 315 genes were down-regulated in the DR library compared with the CR library. Finally, 37 genes that were specifically expressed in roots were screened after comparing the DEGs identified in the above-described analyses. This study provides a transcriptome sequence resource for common wild rice plants and establishes a digital gene expression profile of wild rice plants under drought conditions using the assembled transcriptome data as a reference. Several tissue-specific and drought-stress-related candidate genes were identified, representing a fully characterized transcriptome and providing a valuable resource for genetic and genomic studies in plants.

  16. De novo assembly of a transcriptome for Calanus finmarchicus (Crustacea, Copepoda)--the dominant zooplankter of the North Atlantic Ocean.

    Directory of Open Access Journals (Sweden)

    Petra H Lenz

    Full Text Available Assessing the impact of global warming on the food web of the North Atlantic will require difficult-to-obtain physiological data on a key copepod crustacean, Calanus finmarchicus. The de novo transcriptome presented here represents a new resource for acquiring such data. It was produced from multiplexed gene libraries using RNA collected from six developmental stages: embryo, early nauplius (NI-II), late nauplius (NV-VI), early copepodite (CI-II), late copepodite (CV) and adult (CVI) female. Over 400,000,000 paired-end reads (100 base-pairs long) were sequenced on an Illumina instrument, and assembled into 206,041 contigs using Trinity software. Coverage was estimated to be at least 65%. A reference transcriptome comprising 96,090 unique components ("comps") was annotated using Blast2GO. 40% of the comps had significant blast hits. 11% of the comps were successfully annotated with gene ontology (GO) terms. Expression of many comps was found to be near zero in one or more developmental stages suggesting that 35 to 48% of the transcriptome is "silent" at any given life stage. Transcripts involved in lipid biosynthesis pathways, critical for the C. finmarchicus life cycle, were identified and their expression pattern during development was examined. Relative expression of three transcripts suggests wax ester biosynthesis in late copepodites, but triacylglyceride biosynthesis in adult females. Two of these transcripts may be involved in the preparatory phase of diapause. A key environmental challenge for C. finmarchicus is the seasonal exposure to the dinoflagellate Alexandrium fundyense with high concentrations of saxitoxins, neurotoxins that block voltage-gated sodium channels. Multiple contigs encoding putative voltage-gated sodium channels were identified. They appeared to be the result of both alternate splicing and gene duplication. This is the first report of multiple NaV1 genes in a protostome. These data provide new insights into the transcriptome

  17. De novo transcriptome assembly and positive selection analysis of an individual deep-sea fish.

    Science.gov (United States)

    Lan, Yi; Sun, Jin; Xu, Ting; Chen, Chong; Tian, Renmao; Qiu, Jian-Wen; Qian, Pei-Yuan

    2018-05-24

    High hydrostatic pressure and low temperatures make the deep sea a harsh environment for life forms. Actin organization and microtubule assembly, which are essential for intracellular transport and cell motility, can be disrupted by high hydrostatic pressure. High hydrostatic pressure can also damage DNA. Nucleic acids exposed to low temperatures can form secondary structures that hinder genetic information processing. To study how deep-sea creatures adapt to such a hostile environment, one of the most straightforward ways is to sequence and compare their genes with those of their shallow-water relatives. We captured an individual of the fish species Aldrovandia affinis, which is a typical deep-sea inhabitant, from the Okinawa Trough at a depth of 1550 m using a remotely operated vehicle (ROV). We sequenced its transcriptome and analyzed its molecular adaptation. We obtained 27,633 protein coding sequences using an Illumina platform and compared them with those of several shallow-water fish species. Analysis of 4918 single-copy orthologs identified 138 positively selected genes in A. affinis, including genes involved in microtubule regulation. Particularly, functional domains related to cold shock as well as DNA repair are exposed to positive selection pressure in both deep-sea fish and hadal amphipod. Overall, we have identified a set of positively selected genes related to cytoskeleton structures, DNA repair and genetic information processing, which shed light on molecular adaptation to the deep sea. These results suggest that amino acid substitutions of these positively selected genes may contribute crucially to the adaptation of deep-sea animals. Additionally, we provide a high-quality transcriptome of a deep-sea fish for future deep-sea studies.

  18. De novo assembly and comparative analysis of the transcriptome of embryogenic callus formation in bread wheat (Triticum aestivum L.).

    Science.gov (United States)

    Chu, Zongli; Chen, Junying; Sun, Junyan; Dong, Zhongdong; Yang, Xia; Wang, Ying; Xu, Haixia; Zhang, Xiaoke; Chen, Feng; Cui, Dangqun

    2017-12-19

    During asexual reproduction the embryogenic callus can differentiate into a new plantlet, offering great potential for fostering in vitro culture efficiency in plants. The immature embryos (IMEs) of wheat (Triticum aestivum L.) are more easily able to generate embryogenic callus than mature embryos (MEs). To understand the molecular process of embryogenic callus formation in wheat, de novo transcriptome sequencing was used to generate transcriptome sequences from calli derived from IMEs and MEs after 3d, 6d, or 15d of culture (DC). In total, 155 million high quality paired-end reads were obtained from the 6 cDNA libraries. Our de novo assembly generated 142,221 unigenes, of which 59,976 (42.17%) were annotated with a significant Blastx against nr, Pfam, Swissprot, KOG, KEGG, GO and COG/KOG databases. Comparative transcriptome analysis indicated that a total of 5194 differentially expressed genes (DEGs) were identified in the comparisons of IME vs. ME at the three stages, including 3181, 2085 and 1468 DEGs at 3, 6 and 15 DC, respectively. Of them, 283 overlapped in all the three comparisons. Furthermore, 4731 DEGs were identified in the comparisons between stages in IMEs and MEs. Functional analysis revealed that 271 transcription factor (TF) genes (10 overlapped in all 3 comparisons of IME vs. ME) and 346 somatic embryogenesis related genes (SSEGs; 35 overlapped in all 3 comparisons of IME vs. ME) were differentially expressed in at least one comparison of IME vs. ME. In addition, of the 283 overlapped DEGs in the 3 comparisons of IME vs. ME, excluding the SSEGs and TFs, 39 possessed a higher rate of involvement in biological processes relating to response to stimuli, in multi-organism processes, reproductive processes and reproduction. Furthermore, 7 were simultaneously differentially expressed in the 2 comparisons between the stages in IMEs, but not MEs, suggesting that they may be related to embryogenic callus formation. The expression levels of genes, which

  19. De novo transcriptome assembly and analysis of differential gene expression in response to drought in European beech.

    Directory of Open Access Journals (Sweden)

    Markus Müller

    Full Text Available Despite the ecological and economic importance of European beech (Fagus sylvatica L.), genomic resources of this species are still limited. This hampers an understanding of the molecular basis of adaptation to stress. Since beech will most likely be threatened by the consequences of climate change, an understanding of adaptive processes to climate change-related drought stress is of major importance. Here, we used RNA-seq to provide the first drought stress-related transcriptome of beech. In a drought stress trial with beech saplings, 50 samples were taken for RNA extraction at five points in time during a soil desiccation experiment. De novo transcriptome assembly and analysis of differential gene expression revealed 44,335 contigs, and 662 differentially expressed genes between the stress and normally watered control group. Gene expression was specific to the different time points, and only five genes were significantly differentially expressed between the stress and control group on all five sampling days. GO term enrichment showed that mostly genes involved in lipid- and homeostasis-related processes were upregulated, whereas genes involved in oxidative stress response were downregulated in the stressed seedlings. This study gives first insights into the genomic drought stress response of European beech, and provides new genetic resources for adaptation research in this species.

  20. De Novo Assembly and Comparative Transcriptome Analysis Provide Insight into Lysine Biosynthesis in Toona sinensis Roem.

    Science.gov (United States)

    Zhang, Xia; Song, Zhenqiao; Liu, Tian; Guo, Linlin; Li, Xingfeng

    2016-01-01

    Toona sinensis Roem is a popular leafy vegetable in Chinese cuisine and is also used as a traditional Chinese medicine. In this study, leaf samples were collected from the same plant at two developmental stages and then used for high-throughput Illumina RNA-sequencing (RNA-Seq). 125,884 transcripts and 54,628 unigenes were obtained through de novo assembly. A total of 25,570 could be annotated with known biological functions, which indicated that the T. sinensis leaves and shoots were undergoing multiple developmental processes, especially active metabolic processes. Analysis of differentially expressed unigenes between the two libraries showed that lysine biosynthesis was an enriched KEGG pathway, and candidate genes involved in the lysine biosynthesis pathway in T. sinensis leaves and shoots were identified. Our results provide a primary analysis of the gene expression files of T. sinensis leaf and shoot at different developmental stages and afford a valuable resource for genetic and genomic research on plant lysine biosynthesis.

  1. De novo assembly, gene annotation, and marker discovery in stored-product pest Liposcelis entomophila (Enderlein) using transcriptome sequences.

    Directory of Open Access Journals (Sweden)

    Dan-Dan Wei

    Full Text Available BACKGROUND: As a major stored-product pest insect, Liposcelis entomophila has developed high levels of resistance to various insecticides in grain storage systems. However, the molecular mechanisms underlying resistance and environmental stress have not been characterized. To date, there is a lack of genomic information for this species. Therefore, studies aimed at profiling the L. entomophila transcriptome would provide a better understanding of the biological functions at the molecular levels. METHODOLOGY/PRINCIPAL FINDINGS: We applied Illumina sequencing technology to sequence the transcriptome of L. entomophila. A total of 54,406,328 clean reads were obtained and de novo assembled into 54,220 unigenes, with an average length of 571 bp. Through a similarity search, 33,404 (61.61%) unigenes were matched to known proteins in the NCBI non-redundant (Nr) protein database. These unigenes were further functionally annotated with gene ontology (GO), cluster of orthologous groups of proteins (COG), and Kyoto Encyclopedia of Genes and Genomes (KEGG) databases. A large number of genes potentially involved in insecticide resistance were manually curated, including 68 putative cytochrome P450 genes, 37 putative glutathione S-transferase (GST) genes, 19 putative carboxyl/cholinesterase (CCE) genes, and 126 other transcripts containing target site sequences or encoding detoxification genes representing eight types of resistance enzymes. Furthermore, to gain insight into the molecular basis of the response of L. entomophila to thermal stresses, 25 heat shock protein (Hsp) genes were identified. In addition, 1,100 SSRs and 57,757 SNPs were detected and 231 pairs of SSR primers were designed for investigating the genetic diversity in future. CONCLUSIONS/SIGNIFICANCE: We developed a comprehensive transcriptomic database for L. entomophila. These sequences and putative molecular markers would further promote our understanding of the molecular mechanisms underlying

  2. A SNP resource for Douglas-fir: de novo transcriptome assembly and SNP detection and validation.

    Science.gov (United States)

    Howe, Glenn T; Yu, Jianbin; Knaus, Brian; Cronn, Richard; Kolpak, Scott; Dolan, Peter; Lorenz, W Walter; Dean, Jeffrey F D

    2013-02-28

    Douglas-fir (Pseudotsuga menziesii), one of the most economically and ecologically important tree species in the world, also has one of the largest tree breeding programs. Although the coastal and interior varieties of Douglas-fir (vars. menziesii and glauca) are native to North America, the coastal variety is also widely planted for timber production in Europe, New Zealand, Australia, and Chile. Our main goal was to develop a SNP resource large enough to facilitate genomic selection in Douglas-fir breeding programs. To accomplish this, we developed a 454-based reference transcriptome for coastal Douglas-fir, annotated and evaluated the quality of the reference, identified putative SNPs, and then validated a sample of those SNPs using the Illumina Infinium genotyping platform. We assembled a reference transcriptome consisting of 25,002 isogroups (unique gene models) and 102,623 singletons from 2.76 million 454 and Sanger cDNA sequences from coastal Douglas-fir. We identified 278,979 unique SNPs by mapping the 454 and Sanger sequences to the reference, and by mapping four datasets of Illumina cDNA sequences from multiple seed sources, genotypes, and tissues. The Illumina datasets represented coastal Douglas-fir (64.00 and 13.41 million reads), interior Douglas-fir (80.45 million reads), and a Yakima population similar to interior Douglas-fir (8.99 million reads). We assayed 8067 SNPs on 260 trees using an Illumina Infinium SNP genotyping array. Of these SNPs, 5847 (72.5%) were called successfully and were polymorphic. Based on our validation efficiency, our SNP database may contain as many as ~200,000 true SNPs, and as many as ~69,000 SNPs that could be genotyped at ~20,000 gene loci using an Infinium II array, which is more SNPs than are needed to use genomic selection in tree breeding programs. Ultimately, these genomic resources will enhance Douglas-fir breeding and allow us to better understand landscape-scale patterns of genetic variation and potential responses to

  3. De novo assembly and characterization of the transcriptome of the parasitic weed dodder identifies genes associated with plant parasitism.

    Science.gov (United States)

    Ranjan, Aashish; Ichihashi, Yasunori; Farhi, Moran; Zumstein, Kristina; Townsley, Brad; David-Schwartz, Rakefet; Sinha, Neelima R

    2014-11-01

    Parasitic flowering plants are one of the most destructive agricultural pests and have major impact on crop yields throughout the world. Being dependent on finding a host plant for growth, parasitic plants penetrate their host using specialized organs called haustoria. Haustoria establish vascular connections with the host, which enable the parasite to steal nutrients and water. The underlying molecular and developmental basis of parasitism by plants is largely unknown. In order to investigate the process of parasitism, RNAs from different stages (i.e. seed, seedling, vegetative strand, prehaustoria, haustoria, and flower) were used to de novo assemble and annotate the transcriptome of the obligate plant stem parasite dodder (Cuscuta pentagona). The assembled transcriptome was used to dissect transcriptional dynamics during dodder development and parasitism and identified key gene categories involved in the process of plant parasitism. Host plant infection is accompanied by increased expression of parasite genes underlying transport and transporter categories, response to stress and stimuli, as well as genes encoding enzymes involved in cell wall modifications. By contrast, expression of photosynthetic genes is decreased in the dodder infective stages compared with normal stem. In addition, genes relating to biosynthesis, transport, and response of phytohormones, such as auxin, gibberellins, and strigolactone, were differentially expressed in the dodder infective stages compared with stems and seedlings. This analysis sheds light on the transcriptional changes that accompany plant parasitism and will aid in identifying potential gene targets for use in controlling the infestation of crops by parasitic weeds. © 2014 American Society of Plant Biologists. All Rights Reserved.

  4. Optimization of de novo transcriptome assembly from high-throughput short read sequencing data improves functional annotation for non-model organisms

    Directory of Open Access Journals (Sweden)

    Haznedaroglu Berat Z

    2012-07-01

    Full Text Available Abstract Background: The k-mer hash length is a key factor affecting the output of de novo transcriptome assembly packages using de Bruijn graph algorithms. Assemblies constructed with varying single k-mer choices might result in the loss of unique contiguous sequences (contigs) and relevant biological information. A common solution to this problem is the clustering of single k-mer assemblies. Even though annotation is one of the primary goals of a transcriptome assembly, the success of assembly strategies does not consider the impact of k-mer selection on the annotation output. This study provides an in-depth k-mer selection analysis that is focused on the degree of functional annotation achieved for a non-model organism where no reference genome information is available. Individual k-mers and clustered assemblies (CA) were considered using three representative software packages. Pair-wise comparison analyses (between individual k-mers and CAs) were produced to reveal missing Kyoto Encyclopedia of Genes and Genomes (KEGG) ortholog identifiers (KOIs), and to determine a strategy that maximizes the recovery of biological information in a de novo transcriptome assembly. Results: Analyses of single k-mer assemblies resulted in the generation of various quantities of contigs and functional annotations within the selection window of k-mers (k-19 to k-63). For each k-mer in this window, generated assemblies contained certain unique contigs and KOIs that were not present in the other k-mer assemblies. Producing a non-redundant CA of k-mers 19 to 63 resulted in a more complete functional annotation than any single k-mer assembly. However, a fraction of unique annotations remained (~0.19 to 0.27% of total KOIs) in the assemblies of individual k-mers (k-19 to k-63) that were not present in the non-redundant CA. A workflow to recover these unique annotations is presented. Conclusions: This study demonstrated that different k-mer choices result in various quantities
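
    To make the role of k in the record above concrete, the following Python sketch breaks two overlapping reads into k-mers and builds the corresponding de Bruijn edges for several values of k. It is only a toy illustration: the reads, the function names kmer_set and de_bruijn_edges, and the chosen k values are assumptions for this example and are not taken from the study or from any of the assembly packages it evaluated.

```python
def kmer_set(reads, k):
    """Collect every distinct substring of length k across all reads."""
    found = set()
    for read in reads:
        # A read shorter than k yields an empty range and contributes nothing.
        for i in range(len(read) - k + 1):
            found.add(read[i:i + k])
    return found


def de_bruijn_edges(kmers):
    """Map each (k-1)-mer prefix to the set of (k-1)-mer suffixes that follow it."""
    graph = {}
    for kmer in kmers:
        graph.setdefault(kmer[:-1], set()).add(kmer[1:])
    return graph


# Hypothetical toy reads that overlap by eight bases.
reads = ["ATGGCGTGCA", "GGCGTGCAAT"]

for k in (3, 5, 7):
    kmers = kmer_set(reads, k)
    graph = de_bruijn_edges(kmers)
    # Print k, the number of distinct k-mers, and the number of graph nodes with outgoing edges.
    print(k, len(kmers), len(graph))
```

    Changing k changes which k-mers exist and therefore which graph an assembler has to traverse, which is one intuition for why single-k assemblies can each contain contigs the others lack.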

  5. De novo assembly and characterization of the transcriptome, and development of SSR markers in wax gourd (Benicasa hispida).

    Directory of Open Access Journals (Sweden)

    Biao Jiang

    Full Text Available BACKGROUND: Wax gourd is a widely used vegetable of Cucurbitaceae, and also has important medicinal and health values. However, genomic resources for wax gourd are scarce, and only a few nucleotide sequences can be obtained from public databases. METHODOLOGY/PRINCIPAL FINDINGS: In this study, we examined the transcriptome of wax gourd. More than 44 million high quality reads were generated from five different tissues of wax gourd using Illumina paired-end sequencing technology. Approximately 4 Gbp of data were generated and de novo assembled into 65,059 unigenes, with an N50 of 1,132 bp. Based on a sequence similarity search against known protein databases, 36,070 (55.4%) showed significant similarity to known proteins in the Nr database, and 24,969 (38.4%) had BLAST hits in the Swiss-Prot database. Among the annotated unigenes, 14,994 wax gourd unigenes were assigned GO term annotations, and 23,977 were found to have COG classifications. In addition, a total of 18,713 unigenes were assigned to 281 KEGG pathways. Furthermore, 6,242 microsatellites (simple sequence repeats) were detected as potential molecular markers in wax gourd. Two hundred primer pairs for SSRs were designed for validation of the amplification and polymorphism. The result showed that 170 of the 200 primer pairs were successfully amplified and 49 (28.8%) of them exhibited polymorphisms. CONCLUSION/SIGNIFICANCE: Our study enriches the genomic resources of wax gourd and provides powerful information for future studies. The availability of this ample amount of information about the transcriptome and SSRs in wax gourd could serve as a valuable basis for studies on the physiology, biochemistry, molecular genetics and molecular breeding of this important vegetable crop.
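
    The record above summarizes its assembly with an N50 of 1,132 bp. For readers unfamiliar with the statistic, the Python sketch below computes N50 using the common definition: the length of the contig at which the running total of contigs, sorted from longest to shortest, first reaches half of the total assembled bases. The function name n50 and the toy contig lengths are assumptions for illustration; they are not data from the wax gourd study.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths (in bp)."""
    if not lengths:
        raise ValueError("need at least one contig length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first contig that pushes the running sum to half the total is the N50 contig.
        if running >= half_total:
            return length


# Hypothetical contig lengths; the total is 6,500 bp, so the halfway mark is 3,250 bp.
toy_lengths = [2000, 1500, 1200, 800, 500, 300, 200]
print(n50(toy_lengths))
```

    Because N50 weights long contigs heavily, it is usually read alongside the contig count and mean length rather than on its own.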

  6. Incomplete sex chromosome dosage compensation in the Indian meal moth, Plodia interpunctella, based on de novo transcriptome assembly.

    Science.gov (United States)

    Harrison, Peter W; Mank, Judith E; Wedell, Nina

    2012-01-01

    Males and females experience differences in gene dose for loci in the nonrecombining region of heteromorphic sex chromosomes. If not compensated, this leads to expression imbalances, with the homogametic sex on average exhibiting greater expression due to the doubled gene dose. Many organisms with heteromorphic sex chromosomes display global dosage compensation mechanisms, which equalize gene expression levels between the sexes. However, birds and Schistosoma have been previously shown to lack chromosome-wide dosage compensation mechanisms, and the status in other female heterogametic taxa including Lepidoptera remains unresolved. To further our understanding of dosage compensation in female heterogametic taxa and to resolve its status in the lepidopterans, we assessed the Indian meal moth, Plodia interpunctella. As P. interpunctella lacks a complete reference genome, we conducted de novo transcriptome assembly combined with orthologous genomic location prediction from the related silkworm genome, Bombyx mori, to compare Z-linked and autosomal gene expression levels for each sex. We demonstrate that P. interpunctella lacks complete Z chromosome dosage compensation, female Z-linked genes having just over half the expression level of males and autosomal genes. This finding suggests that the Lepidoptera and possibly all female heterogametic taxa lack global dosage compensation, although more species will need to be sampled to confirm this assertion.

  7. De novo sequencing, assembly, and analysis of Iris lactea var. chinensis roots' transcriptome in response to salt stress.

    Science.gov (United States)

    Gu, Chunsun; Xu, Sheng; Wang, Zhiquan; Liu, Liangqin; Zhang, Yongxia; Deng, Yanming; Huang, Suzhen

    2018-04-01

    As a halophyte, Iris lactea var. chinensis (I. lactea var. chinensis) is widely distributed and has good drought and heavy metal resistance. Moreover, it is an excellent ornamental plant. I. lactea var. chinensis has extensive application prospects owing to the global impacts of salinization. To better understand its molecular mechanism involved in salt resistance, the de novo sequencing, assembly, and analysis of I. lactea var. chinensis roots' transcriptome in response to salt-stress conditions was performed. On average, 74.17% of the clean reads were mapped to unigenes. A total of 121,093 unigenes were constructed and 56,398 (46.57%) were annotated. Among these, 13,522 differentially expressed genes (DEGs) were identified between salt-treated and control samples. Compared to the transcriptional level of control, 7037 DEGs were up-regulated and 6539 down-regulated. In addition, 129 up-regulated and 1609 down-regulated genes were simultaneously detected in all three pairwise comparisons between control and salt-stressed libraries. At least 247 and 250 DEGs encoding transcription factors and transporter proteins were identified. Meanwhile, 130 DEGs regarding reactive oxygen species (ROS) scavenging system were also summarized. Based on real-time quantitative RT-PCR, we verified the changes in the expression patterns of 10 unigenes. Our study identified potential salt-responsive candidate genes and increased the understanding of halophyte responses to salinity stress. Copyright © 2018 Elsevier Masson SAS. All rights reserved.

  8. Evaluation of de novo assembly technique in the South African abalone Haliotis midae transcriptome: A comparison from Illumina and 454 systems

    Directory of Open Access Journals (Sweden)

    Barbara Picone

    2016-12-01

    Full Text Available Next generation sequencing platforms have recently been used to rapidly characterize transcriptome sequences from a number of non-model organisms. The present study compares two of the most frequently used platforms, the Roche 454-pyrosequencing and the Illumina sequencing-by-synthesis (SBS), on the same RNA sample obtained from an intertidal gastropod mollusc species, Haliotis midae. All the sequencing reads were deposited in the Short Read Archive (SRA) database of NCBI and are retrievable under the accession numbers [SRR071314 (Illumina Genome Analyzer II)] and [SRR1737738, SRR1737737, SRR1737735, SRR1737734 (454 GS FLX)]. Three transcriptomes, composed of either pure 454 or Illumina reads or a mixture of read types (Hybrid), were assembled using CLC Genomics Workbench software. Illumina assemblies performed the best de novo transcriptome characterization in terms of contig length, whereas the 454 assemblies tended to improve the complete assembly of gene transcripts. Both the Hybrid and Illumina assemblies produced longer contigs covering more of the transcriptome than 454 assemblies. However, the addition of 454 significantly increased the number of genes annotated.

  9. Transcriptomic resources for the medicinal legume Mucuna pruriens: de novo transcriptome assembly, annotation, identification and validation of EST-SSR markers.

    Science.gov (United States)

    Sathyanarayana, N; Pittala, Ranjith Kumar; Tripathi, Pankaj Kumar; Chopra, Ratan; Singh, Heikham Russiachand; Belamkar, Vikas; Bhardwaj, Pardeep Kumar; Doyle, Jeff J; Egan, Ashley N

    2017-05-25

    The medicinal legume Mucuna pruriens (L.) DC. has attracted attention worldwide as a source of the anti-Parkinson's drug L-Dopa. It is also a popular green manure cover crop that offers many agronomic benefits including high protein content, nitrogen fixation and soil nutrients. The plant currently lacks genomic resources and there is limited knowledge on gene expression, metabolic pathways, and genetics of secondary metabolite production. Here, we present transcriptomic resources for M. pruriens, including a de novo transcriptome assembly and annotation, as well as differential transcript expression analyses between root, leaf, and pod tissues. We also develop microsatellite markers and analyze genetic diversity and population structure within a set of Indian germplasm accessions. A total of 191,233,242 bp cleaned reads were assembled into 67,561 transcripts with mean length of 626 bp and N50 of 987 bp. Assembled sequences were annotated using BLASTX against public databases with over 80% of transcripts annotated. We identified 7,493 simple sequence repeat (SSR) motifs, including 787 polymorphic repeats between the parents of a mapping population. 134 SSRs from expressed sequenced tags (ESTs) were screened against 23 M. pruriens accessions from India, with 52 EST-SSRs retained after quality control. Population structure analysis using a Bayesian framework implemented in fastSTRUCTURE showed nearly similar groupings as with distance-based (neighbor-joining) and principal component analyses, with most of the accessions clustering per geographical origins. Pair-wise comparison of transcript expression in leaves, roots and pods identified 4,387 differentially expressed transcripts with the highest number occurring between roots and leaves. Differentially expressed transcripts were enriched with transcription factors and transcripts annotated as belonging to secondary metabolite pathways. The M

  10. De Novo Assembly and Transcriptome Analysis of Bulb Onion (Allium cepa L.) during Cold Acclimation Using Contrasting Genotypes.

    Science.gov (United States)

    Han, Jeongsukhyeon; Thamilarasan, Senthil Kumar; Natarajan, Sathishkumar; Park, Jong-In; Chung, Mi-Young; Nou, Ill-Sup

    2016-01-01

    Bulb onion (Allium cepa) is the second most widely cultivated and consumed vegetable crop in the world. During winter, cold injury can limit the production of bulb onion. Genomic resources available for bulb onion are still very limited. To date, no studies on heritably durable cold and freezing tolerance have been carried out in bulb onion genotypes. We applied high-throughput sequencing technology to cold (2°C), freezing (-5 and -15°C), and control (25°C)-treated samples of cold tolerant (CT) and cold susceptible (CS) genotypes of A. cepa lines. A total of 452 million paired-end reads were de novo assembled into 54,047 genes with an average length of 1,331 bp. Based on similarity searches, these genes were aligned with entries in the public non-redundant (nr) database, as well as KEGG and COG database. Differentially expressed genes (DEGs) were identified using log10 values with the FPKM method. Among 5,167 DEGs, 491 genes were differentially expressed at freezing temperature compared to the control temperature in both CT and CS libraries. The DEG results were validated with qRT-PCR. We performed GO and KEGG pathway enrichment analyses of all DEGs and iPath interactive analysis found 31 pathways including those related to metabolism of carbohydrate, nucleotide, energy, cofactors and vitamins, other amino acids and xenobiotics biodegradation. Furthermore, a large number of molecular markers were identified from the assembled genes, including 4,437 simple sequence repeats (SSRs) and SNP substitutions of transition and transversion types of CT and CS. Our study is the first to provide a transcriptome sequence resource for Allium spp. with regard to cold and freezing stress. We identified a large set of genes and determined their DEG profiles under cold and freezing conditions using two different genotypes. These data represent a valuable resource for genetic and genomic studies of Allium spp.
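
    The record above states that DEGs were identified from log10 values of FPKM. As a minimal sketch of what that normalization involves, the Python below applies the standard FPKM definition (fragments per kilobase of transcript per million mapped fragments) and then takes a log10 ratio between two conditions. The function names, the pseudocount of 1 used to avoid taking the log of zero, and all numbers are assumptions for illustration; the abstract does not describe the authors' exact procedure or thresholds.

```python
import math


def fpkm(fragments, gene_length_bp, total_mapped_fragments):
    """Fragments per kilobase of transcript per million mapped fragments."""
    return fragments * 1e9 / (gene_length_bp * total_mapped_fragments)


def log10_ratio(fpkm_treated, fpkm_control, pseudocount=1.0):
    """log10 fold change between two FPKM values; the pseudocount is an assumed convention."""
    return math.log10((fpkm_treated + pseudocount) / (fpkm_control + pseudocount))


# Hypothetical gene: 200 fragments on a 2,000 bp transcript in a library of 10 million mapped fragments.
control = fpkm(200, 2000, 10_000_000)
# Same hypothetical gene with ten times as many fragments after treatment.
treated = fpkm(2000, 2000, 10_000_000)
print(control, treated, log10_ratio(treated, control))
```

    Dividing by transcript length and library size is what lets expression values be compared across genes of different lengths and across libraries of different sequencing depth.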

  11. De Novo Assembly and Transcriptome Analysis of Bulb Onion (Allium cepa L.) during Cold Acclimation Using Contrasting Genotypes.

    Directory of Open Access Journals (Sweden)

    Jeongsukhyeon Han

    Full Text Available Bulb onion (Allium cepa) is the second most widely cultivated and consumed vegetable crop in the world. During winter, cold injury can limit the production of bulb onion. Genomic resources available for bulb onion are still very limited. To date, no studies on heritably durable cold and freezing tolerance have been carried out in bulb onion genotypes. We applied high-throughput sequencing technology to cold (2°C), freezing (-5 and -15°C), and control (25°C)-treated samples of cold tolerant (CT) and cold susceptible (CS) genotypes of A. cepa lines. A total of 452 million paired-end reads were de novo assembled into 54,047 genes with an average length of 1,331 bp. Based on similarity searches, these genes were aligned with entries in the public non-redundant (nr) database, as well as KEGG and COG database. Differentially expressed genes (DEGs) were identified using log10 values with the FPKM method. Among 5,167 DEGs, 491 genes were differentially expressed at freezing temperature compared to the control temperature in both CT and CS libraries. The DEG results were validated with qRT-PCR. We performed GO and KEGG pathway enrichment analyses of all DEGs and iPath interactive analysis found 31 pathways including those related to metabolism of carbohydrate, nucleotide, energy, cofactors and vitamins, other amino acids and xenobiotics biodegradation. Furthermore, a large number of molecular markers were identified from the assembled genes, including 4,437 simple sequence repeats (SSRs) and SNP substitutions of transition and transversion types of CT and CS. Our study is the first to provide a transcriptome sequence resource for Allium spp. with regard to cold and freezing stress. We identified a large set of genes and determined their DEG profiles under cold and freezing conditions using two different genotypes. These data represent a valuable resource for genetic and genomic studies of Allium spp.
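
    Several records in this listing, including the one above, report simple sequence repeats (SSRs) mined from assembled transcripts. The Python sketch below shows one simple way such repeats can be located with a regular expression. The thresholds (motifs of 2 to 6 bases repeated at least four times), the function name find_ssrs, and the test sequence are assumptions for illustration; none of the abstracts state the criteria or software the authors actually used.

```python
import re

# A motif of 2-6 bases followed by at least three more exact copies (four copies in total).
# The lazy quantifier makes the regex try the shortest motif first.
SSR_PATTERN = re.compile(r"([ACGT]{2,6}?)\1{3,}")


def find_ssrs(sequence):
    """Return (start, motif, copies) for each non-overlapping perfect repeat found."""
    hits = []
    for match in SSR_PATTERN.finditer(sequence.upper()):
        motif = match.group(1)
        # Skip homopolymer runs such as AAAAAAAA, which would otherwise match as "AA" repeats.
        if len(set(motif)) == 1:
            continue
        copies = len(match.group(0)) // len(motif)
        hits.append((match.start(), motif, copies))
    return hits


# Hypothetical transcript fragment containing an (AT)5 and a (GTT)4 repeat.
toy_transcript = "GGC" + "AT" * 5 + "CC" + "GTT" * 4 + "GAC"
print(find_ssrs(toy_transcript))
```

    Candidate repeats found this way would still need primer design and wet-lab validation before they could serve as markers, as the wax gourd and Mucuna records in this listing describe.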

  12. De novo transcriptome assembly and comparative analysis of differentially expressed genes in Prunus dulcis Mill. in response to freezing stress.

    Directory of Open Access Journals (Sweden)

    Sadegh Mousavi

    Full Text Available Almond (Prunus dulcis Mill.), one of the most important nut crops, requires chilling during winter to develop fruiting buds. However, early spring chilling and late spring frost may damage the reproductive tissues leading to reduction in the rate of productivity. Despite the importance of transcriptional changes and regulation, little is known about the almond's transcriptome under the cold stress conditions. In the current research, we used RNA-seq technique to study the response of the reproductive tissues of almond (anther and ovary) to frost stress. RNA sequencing resulted in more than 20 million reads from anther and ovary tissues of almond, individually. About 40,000 contigs were assembled and annotated de novo in each tissue. Profile of gene expression in ovary showed significant alterations in 5,112 genes, whereas in anther 6,926 genes were affected by freezing stress. Around two thousand of these genes were common altered genes in both ovary and anther libraries. Gene ontology indicated the involvement of differentially expressed (DE) genes, responding to freezing stress, in metabolic and cellular processes. qRT-PCR analysis verified the expression pattern of eight genes randomly selected from the DE genes. In conclusion, the almond gene index assembled in this study and the reported DE genes can provide great insights on responses of almond and other Prunus species to abiotic stresses. The obtained results from current research would add to the limited available information on almond and Rosaceae. Besides, the findings would be very useful for comparative studies as the number of DE genes reported here is much higher than that of any previous reports in this plant.

  13. De novo transcriptome assembly and comparative analysis of differentially expressed genes in Prunus dulcis Mill. in response to freezing stress.

    Science.gov (United States)

    Mousavi, Sadegh; Alisoltani, Arghavan; Shiran, Behrouz; Fallahi, Hossein; Ebrahimie, Esameil; Imani, Ali; Houshmand, Saadollah

    2014-01-01

    Almond (Prunus dulcis Mill.), one of the most important nut crops, requires chilling during winter to develop fruiting buds. However, early spring chilling and late spring frost may damage the reproductive tissues leading to reduction in the rate of productivity. Despite the importance of transcriptional changes and regulation, little is known about the almond's transcriptome under the cold stress conditions. In the current research, we used RNA-seq technique to study the response of the reproductive tissues of almond (anther and ovary) to frost stress. RNA sequencing resulted in more than 20 million reads from anther and ovary tissues of almond, individually. About 40,000 contigs were assembled and annotated de novo in each tissue. Profile of gene expression in ovary showed significant alterations in 5,112 genes, whereas in anther 6,926 genes were affected by freezing stress. Around two thousands of these genes were common altered genes in both ovary and anther libraries. Gene ontology indicated the involvement of differentially expressed (DE) genes, responding to freezing stress, in metabolic and cellular processes. qRT-PCR analysis verified the expression pattern of eight genes randomly selected from the DE genes. In conclusion, the almond gene index assembled in this study and the reported DE genes can provide great insights on responses of almond and other Prunus species to abiotic stresses. The obtained results from current research would add to the limited available information on almond and Rosaceae. Besides, the findings would be very useful for comparative studies as the number of DE genes reported here is much higher than that of any previous reports in this plant.

  14. SSP: an interval integer linear programming for de novo transcriptome assembly and isoform discovery of RNA-seq reads.

    Science.gov (United States)

    Safikhani, Zhaleh; Sadeghi, Mehdi; Pezeshk, Hamid; Eslahchi, Changiz

    2013-01-01

    Recent advances in the sequencing technologies have provided a handful of RNA-seq datasets for transcriptome analysis. However, reconstruction of full-length isoforms and estimation of the expression level of transcripts with a low cost are challenging tasks. We propose a novel de novo method named SSP that incorporates interval integer linear programming to resolve alternatively spliced isoforms and reconstruct the whole transcriptome from short reads. Experimental results show that SSP is fast and precise in determining different alternatively spliced isoforms along with the estimation of reconstructed transcript abundances. The SSP software package is available at http://www.bioinf.cs.ipm.ir/software/ssp. © 2013.

  15. De Novo Assembly of the Donkey White Blood Cell Transcriptome and a Comparative Analysis of Phenotype-Associated Genes between Donkeys and Horses.

    Science.gov (United States)

    Xie, Feng-Yun; Feng, Yu-Long; Wang, Hong-Hui; Ma, Yun-Feng; Yang, Yang; Wang, Yin-Chao; Shen, Wei; Pan, Qing-Jie; Yin, Shen; Sun, Yu-Jiang; Ma, Jun-Yu

    2015-01-01

    Prior to the mechanization of agriculture and labor-intensive tasks, humans used donkeys (Equus africanus asinus) for farm work and packing. However, as mechanization increased, donkeys have been increasingly raised for meat, milk, and fur in China. To maintain the development of the donkey industry, breeding programs should focus on traits related to these new uses. Compared to conventional marker-assisted breeding plans, genome- and transcriptome-based selection methods are more efficient and effective. To analyze the coding genes of the donkey genome, we assembled the transcriptome of donkey white blood cells de novo. Using transcriptomic deep-sequencing data, we identified 264,714 distinct donkey unigenes and predicted 38,949 protein fragments. We annotated the donkey unigenes by BLAST searches against the non-redundant (NR) protein database. We also compared the donkey protein sequences with those of the horse (E. caballus) and wild horse (E. przewalskii), and linked the donkey protein fragments with mammalian phenotypes. As the outer ear size of donkeys and horses are obviously different, we compared the outer ear size-associated proteins in donkeys and horses. We identified three ear size-associated proteins, HIC1, PRKRA, and KMT2A, with sequence differences among the donkey, horse, and wild horse loci. Since the donkey genome sequence has not been released, the de novo assembled donkey transcriptome is helpful for preliminary investigations of donkey cultivars and for genetic improvement.

  16. De Novo Assembly of the Donkey White Blood Cell Transcriptome and a Comparative Analysis of Phenotype-Associated Genes between Donkeys and Horses.

    Directory of Open Access Journals (Sweden)

    Feng-Yun Xie

    Full Text Available Prior to the mechanization of agriculture and labor-intensive tasks, humans used donkeys (Equus africanus asinus) for farm work and packing. However, as mechanization increased, donkeys have been increasingly raised for meat, milk, and fur in China. To maintain the development of the donkey industry, breeding programs should focus on traits related to these new uses. Compared to conventional marker-assisted breeding plans, genome- and transcriptome-based selection methods are more efficient and effective. To analyze the coding genes of the donkey genome, we assembled the transcriptome of donkey white blood cells de novo. Using transcriptomic deep-sequencing data, we identified 264,714 distinct donkey unigenes and predicted 38,949 protein fragments. We annotated the donkey unigenes by BLAST searches against the non-redundant (NR) protein database. We also compared the donkey protein sequences with those of the horse (E. caballus) and wild horse (E. przewalskii), and linked the donkey protein fragments with mammalian phenotypes. As the outer ear size of donkeys and horses are obviously different, we compared the outer ear size-associated proteins in donkeys and horses. We identified three ear size-associated proteins, HIC1, PRKRA, and KMT2A, with sequence differences among the donkey, horse, and wild horse loci. Since the donkey genome sequence has not been released, the de novo assembled donkey transcriptome is helpful for preliminary investigations of donkey cultivars and for genetic improvement.

  17. De novo transcriptome assembly, functional annotation and differential gene expression analysis of juvenile and adult E. fetida, a model oligochaete used in ecotoxicological studies

    Directory of Open Access Journals (Sweden)

    Michelle Thunders

    Full Text Available Background: Earthworms are sensitive to toxic chemicals present in the soil and so are useful indicator organisms for soil health. Eisenia fetida are commonly used in ecotoxicological studies; therefore the assembly of a baseline transcriptome is important for subsequent analyses exploring the impact of toxin exposure on genome wide gene expression. Results: This paper reports on the de novo transcriptome assembly of E. fetida using Trinity, a freely available software tool. Trinotate was used to carry out functional annotation of the Trinity generated transcriptome file and the transdecoder generated peptide sequence file along with BLASTX, BLASTP and HMMER searches and were loaded into a Sqlite3 database. To identify differentially expressed transcripts, each of the original sequence files were aligned to the de novo assembled transcriptome using Bowtie and then RSEM was used to estimate expression values based on the alignment. EdgeR was used to calculate differential expression between the two conditions, with an FDR corrected P value cut off of 0.001; this returned six significantly differentially expressed genes. Initial BLASTX hits of these putative genes included hits with annelid ferritin and lysozyme proteins, as well as fungal NADH cytochrome b5 reductase and senescence associated proteins. At a cut off of P = 0.01 there were a further 26 differentially expressed genes. Conclusion: These data have been made publicly available, and to our knowledge represent the most comprehensive available transcriptome for E. fetida assembled from RNA sequencing data. This provides important groundwork for subsequent ecotoxicogenomic studies exploring the impact of the environment on global gene expression in E. fetida and other earthworm species.
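
    The abstract above reports an "FDR corrected P value" cut-off but does not say which correction was applied. As an illustration only, the sketch below assumes the Benjamini-Hochberg procedure, a common choice for false discovery rate control. The function name and the example p-values are hypothetical and are not taken from the study.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values, in the input order.

    Assumption: BH is used here only as an example of an FDR correction;
    the source abstract does not name the method it used.
    """
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values for four transcripts.
raw = [0.01, 0.04, 0.03, 0.005]
adj = benjamini_hochberg(raw)
# Transcripts whose adjusted p-value falls under a chosen cut-off.
significant = [i for i, p in enumerate(adj) if p <= 0.025]
```

    With these made-up inputs, a transcript is called significant only if its adjusted value, not its raw p-value, falls under the cut-off.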

  18. De novo assembly of the perennial ryegrass transcriptome using an RNA-seq strategy

    DEFF Research Database (Denmark)

    Farrell, Jacqueline Danielle; Byrne, Stephen; Paina, Cristiana

    2014-01-01

    a homozygous perennial ryegrass genotype can circumvent the challenge of heterozygosity. The goals of this study were to perform RNA-sequencing on multiple tissues from a highly inbred genotype to develop a reference transcriptome. This was complemented with RNA-sequencing of a highly heterozygous genotype...... for SNP calling. Result De novo transcriptome assembly of the inbred genotype created 185,833 transcripts with an average length of 830 base pairs. Within the inbred reference transcriptome 78,560 predicted open reading frames were found of which 24,434 were predicted as complete. Functional annotation...... multiple orthologs. Using the longest unique open reading frames as the reference sequence, 64,242 single nucleotide polymorphisms were found. One thousand sixty one open reading frames from the inbred genotype contained heterozygous sites, confirming the high degree of homozygosity. Conclusion Our study...

  19. Gene expression patterns regulating embryogenesis based on the integrated de novo transcriptome assembly of the Japanese flounder.

    Science.gov (United States)

    Fu, Yuanshuai; Jia, Liang; Shi, Zhiyi; Zhang, Junling; Li, Wenjuan

    2017-06-01

    The Japanese flounder (Paralichthys olivaceus) is one of the most important commercial and biological marine fishes. However, the molecular biology involved during embryogenesis and early development of the Japanese flounder remains largely unknown due to a lack of genomic resources. A comprehensive and integrated transcriptome is necessary to study the molecular mechanisms of early development and to allow for the detailed characterization of gene expression patterns during embryogenesis; this approach is critical to understanding the processes that occur prior to mesectoderm formation during early embryonic development. In this study, more than 117.8 million 100bp PE reads were generated from pooled RNA extracted from unfertilized eggs to 41dph (days post-hatching) embryos and were sequenced using Illumina pair-end sequencing technology. In total, 121,513 transcripts (≥200bp) were obtained using de novo assembly. A sequence similarity search indicated that 52,338 transcripts show significant similarity to 22,462 known proteins from the NCBI non-redundant database and the Swiss-Prot protein database and were annotated using Blast2GO. GO terms were assigned to 44,627 transcripts with 12,006 functional terms, and 10,024 transcripts were assigned to 133 KEGG pathways. Furthermore, gene expression differences between the unfertilized egg and the gastrula embryo were analysed using Illumina RNA-Seq with single-read sequencing technology, and 24,837 differentially and specifically expressed transcripts were identified and included 5,286 annotated transcripts and 19,569 non-annotated transcripts. All of the expressed transcripts in the unfertilized egg and gastrula embryo were further classified as maternal, zygotic, or maternal-zygotic transcripts, which may help us to understand the roles of these transcripts during the embryonic development of the Japanese flounder. Thus, the results will contribute to an improved understanding of the gene expression patterns and

  20. De novo Assembly of the Camellia nitidissima Transcriptome Reveals Key Genes of Flower Pigment Biosynthesis

    Directory of Open Access Journals (Sweden)

    Xingwen Zhou

    2017-09-01

    Full Text Available The golden camellia, Camellia nitidissima Chi., is a well-known ornamental plant that is known as “the queen of camellias” because of its golden yellow flowers. The principal pigments in the flowers are carotenoids and flavonol glycosides. Understanding the biosynthesis of the golden color and its regulation is important in camellia breeding. To obtain a comprehensive understanding of flower development in C. nitidissima, a number of cDNA libraries were independently constructed during flower development. Using the Illumina Hiseq2500 platform, approximately 71.8 million raw reads (about 10.8 gigabase pairs) were obtained and assembled into 583,194 transcripts and 466,594 unigenes. A differentially expressed genes (DEGs) and co-expression network was constructed to identify unigenes correlated with flower color. The analysis of DEGs and co-expressed network involved in the carotenoid pathway indicated that the biosynthesis of carotenoids is regulated mainly at the transcript level and that phytoene synthase (PSY), β-carotene 3-hydroxylase (CrtZ), and capsanthin synthase (CCS1) exert synergistic effects in carotenoid biosynthesis. The analysis of DEGs and co-expressed network involved in the flavonoid pathway indicated that chalcone synthase (CHS), naringenin 3-dioxygenase (F3H), leucoanthocyanidin dioxygenase (ANS), and flavonol synthase (FLS) play critical roles in regulating the formation of flavonols and anthocyanidin. Based on the gene expression analysis of the carotenoid and flavonoid pathways, and determinations of the pigments, we speculate that the high expression of PSY and CrtZ ensures the production of adequate levels of carotenoids, while the expression of CHS, FLS ensures the production of flavonols. The golden yellow color is then the result of the accumulation of carotenoids and flavonol glucosides in the petals. This study of the mechanism of color formation in golden camellia points the way to breeding strategies that exploit gene

  1. Analysis of insecticide resistance-related genes of the Carmine spider mite Tetranychus cinnabarinus based on a de novo assembled transcriptome.

    Science.gov (United States)

    Xu, Zhifeng; Zhu, Wenyi; Liu, Yanchao; Liu, Xing; Chen, Qiushuang; Peng, Miao; Wang, Xiangzun; Shen, Guangmao; He, Lin

    2014-01-01

    The carmine spider mite (CSM), Tetranychus cinnabarinus, is an important pest mite in agriculture, because it can develop insecticide resistance easily. To gain valuable gene information and molecular basis for the future insecticide resistance study of CSM, the first transcriptome analysis of CSM was conducted. A total of 45,016 contigs and 25,519 unigenes were generated from the de novo transcriptome assembly, and 15,167 unigenes were annotated via BLAST querying against current databases, including nr, SwissProt, the Clusters of Orthologous Groups (COGs), Kyoto Encyclopedia of Genes and Genomes (KEGG) and Gene Ontology (GO). Aligning the transcript to Tetranychus urticae genome, the 19255 (75.45%) of the transcripts had significant (e-value insecticide resistance in arthropod were generated from CSM transcriptome, including 53 P450-, 22 GSTs-, 23 CarEs-, 1 AChE-, 7 GluCls-, 9 nAChRs-, 8 GABA receptor-, 1 sodium channel-, 6 ATPase- and 12 Cyt b genes. We developed significant molecular resources for T. cinnabarinus putatively involved in insecticide resistance. The transcriptome assembly analysis will significantly facilitate our study on the mechanism of adapting environmental stress (including insecticide) in CSM at the molecular level, and will be very important for developing new control strategies against this pest mite.

  2. De novo transcriptome assembly for a non-model species, the blood-sucking bug Triatoma brasiliensis, a vector of Chagas disease.

    Science.gov (United States)

    Marchant, A; Mougel, F; Almeida, C; Jacquin-Joly, E; Costa, J; Harry, M

    2015-04-01

    High throughput sequencing (HTS) provides new research opportunities for work on non-model organisms, such as differential expression studies between populations exposed to different environmental conditions. However, such transcriptomic studies first require the production of a reference assembly. The choice of sampling procedure, sequencing strategy and assembly workflow is crucial. To develop a reliable reference transcriptome for Triatoma brasiliensis, the major Chagas disease vector in Northeastern Brazil, different de novo assembly protocols were generated using various datasets and software. Both 454 and Illumina sequencing technologies were applied on RNA extracted from antennae and mouthparts from single or pooled individuals. The 454 library yielded 278 Mb. Fifteen Illumina libraries were constructed and yielded nearly 360 million RNA-seq single reads and 46 million RNA-seq paired-end reads for nearly 45 Gb. For the 454 reads, we used three assemblers, Newbler, CAP3 and/or MIRA and for the Illumina reads, the Trinity assembler. Ten assembly workflows were compared using these programs separately or in combination. To compare the assemblies obtained, quantitative and qualitative criteria were used, including contig length, N50, contig number and the percentage of chimeric contigs. Completeness of the assemblies was estimated using the CEGMA pipeline. The best assembly (57,657 contigs, completeness of 80 %, <1 % chimeric contigs) was a hybrid assembly leading to recommend the use of (1) a single individual with large representation of biological tissues, (2) merging both long reads and short paired-end Illumina reads, (3) several assemblers in order to combine the specific advantages of each.
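
    The abstract above lists N50 among the criteria used to compare assemblies. As a minimal sketch, N50 is taken here in its usual sense: the length of the contig at which the running total, counted from the longest contig downwards, first reaches half of the summed assembly length. The function name and the contig lengths are hypothetical and are not data from the study.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths (in bp)."""
    if not lengths:
        raise ValueError("no contig lengths given")
    total = sum(lengths)
    running = 0
    # Add contigs from longest to shortest until half the total is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Hypothetical contig lengths for a toy assembly.
toy_contigs = [100, 200, 300, 400, 500]
toy_n50 = n50(toy_contigs)
```

    N50 describes contiguity only, which is presumably why the study paired it with contig number, chimera rate and a completeness estimate.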

  3. De novo transcriptome assembly facilitates characterisation of fast-evolving gene families, MHC class I in the bank vole (Myodes glareolus).

    Science.gov (United States)

    Migalska, M; Sebastian, A; Konczal, M; Kotlík, P; Radwan, J

    2017-04-01

    The major histocompatibility complex (MHC) plays a central role in the adaptive immune response and is the most polymorphic gene family in vertebrates. Although high-throughput sequencing has increasingly been used for genotyping families of co-amplifying MHC genes, its potential to facilitate early steps in the characterisation of MHC variation in nonmodel organism has not been fully explored. In this study we evaluated the usefulness of de novo transcriptome assembly in characterisation of MHC sequence diversity. We found that although de novo transcriptome assembly of MHC I genes does not reconstruct sequences of individual alleles, it does allow the identification of conserved regions for PCR primer design. Using the newly designed primers, we characterised MHC I sequences in the bank vole. Phylogenetic analysis of the partial MHC I coding sequence (2-4 exons) of the bank vole revealed a lack of orthology to MHC I of other Cricetidae, consistent with the high gene turnover of this region. The diversity of expressed alleles was characterised using ultra-deep sequencing of the third exon that codes for the peptide-binding region of the MHC molecule. High allelic diversity was demonstrated, with 72 alleles found in 29 individuals. Interindividual variation in the number of expressed loci was found, with the number of alleles per individual ranging from 5 to 14. Strong signatures of positive selection were found for 8 amino acid sites, most of which are inferred to bind antigens in human MHC, indicating conservation of structure despite rapid sequence evolution.

  4. Data of first de-novo transcriptome assembly of a non-model species, hawksbill sea turtle, Eretmochelys imbricate, nesting of the Colombian Caribean.

    Science.gov (United States)

    Hernández-Fernández, Javier

    2017-12-01

    The hawksbill sea turtle, Eretmochelys imbricata, is an endangered species of the Caribbean Colombian coast due to anthropic and natural factors that have decreased their population levels. Little is known about the genes that are involved in their immune system, sex determination, aging and others important functions. The data generated represents RNA sequencing and the first de-novo assembly of transcripts expressed in the blood of the hawksbill sea turtle. The raw FASTQ files were deposited in the NCBI SRA database with accession number SRX2653641. A total of 5.7 Gb raw sequence data were obtained, corresponding to 47,555,108 raw reads. Trinity was used to perform a first de-novo assembly, and we were able to identify 47,586 transcripts of the female hawksbill turtle transcriptome with an N50 of 1100 bp. The obtained transcriptome data will be useful for further studies of the physiology, biochemistry and evolution in this species.

  5. Data of first de-novo transcriptome assembly of a non-model species, hawksbill sea turtle, Eretmochelys imbricate, nesting of the Colombian Caribean

    Directory of Open Access Journals (Sweden)

    Javier Hernández-Fernández

    2017-12-01

    Full Text Available The hawksbill sea turtle, Eretmochelys imbricata, is an endangered species of the Caribbean Colombian coast due to anthropic and natural factors that have decreased their population levels. Little is known about the genes that are involved in their immune system, sex determination, aging and others important functions. The data generated represents RNA sequencing and the first de-novo assembly of transcripts expressed in the blood of the hawksbill sea turtle. The raw FASTQ files were deposited in the NCBI SRA database with accession number SRX2653641. A total of 5.7 Gb raw sequence data were obtained, corresponding to 47,555,108 raw reads. Trinity was used to perform a first de-novo assembly, and we were able to identify 47,586 transcripts of the female hawksbill turtle transcriptome with an N50 of 1100 bp. The obtained transcriptome data will be useful for further studies of the physiology, biochemistry and evolution in this species. Keywords: Hawksbill turtle, Trinity, RNAseq, illumina, N50

  6. Optimizing Hybrid de Novo Transcriptome Assembly and Extending Genomic Resources for Giant Freshwater Prawns (Macrobrachium rosenbergii): The Identification of Genes and Markers Associated with Reproduction

    Directory of Open Access Journals (Sweden)

    Hyungtaek Jung

    2016-05-01

    Full Text Available The giant freshwater prawn, Macrobrachium rosenbergii, a sexually dimorphic decapod crustacean is currently the world’s most economically important cultured freshwater crustacean species. Despite its economic importance, there is currently a lack of genomic resources available for this species, and this has limited exploration of the molecular mechanisms that control the M. rosenbergii sex-differentiation system more widely in freshwater prawns. Here, we present the first hybrid transcriptome from M. rosenbergii applying RNA-Seq technologies directed at identifying genes that have potential functional roles in reproductive-related traits. A total of 13,733,210 combined raw reads (1720 Mbp) were obtained from Ion-Torrent PGM and 454 FLX. Bioinformatic analyses based on three state-of-the-art assemblers, the CLC Genomic Workbench, Trans-ABySS, and Trinity, that use single and multiple k-mer methods respectively, were used to analyse the data. The influence of multiple k-mers on assembly performance was assessed to gain insight into transcriptome assembly from short reads. After optimisation, de novo assembly resulted in 44,407 contigs with a mean length of 437 bp, and the assembled transcripts were further functionally annotated to detect single nucleotide polymorphisms and simple sequence repeat motifs. Gene expression analysis was also used to compare expression patterns from ovary and testis tissue libraries to identify genes with potential roles in reproduction and sex differentiation. The large transcript set assembled here represents the most comprehensive set of transcriptomic resources ever developed for reproduction traits in M. rosenbergii, and the large number of genetic markers predicted should constitute an invaluable resource for future genetic research studies on M. rosenbergii and can be applied more widely on other freshwater prawn species in the genus Macrobrachium.

  7. Sequencing and de novo assembly of the Asian clam (Corbicula fluminea) transcriptome using the Illumina GAIIx method.

    Directory of Open Access Journals (Sweden)

    Huihui Chen

    Full Text Available BACKGROUND: The Asian clam (Corbicula fluminea) is currently one of the most economically important aquatic species in China and has been used as a test organism in many environmental studies. However, the lack of genomic resources, such as sequenced genome, expressed sequence tags (ESTs) and transcriptome sequences has hindered the research on C. fluminea. Recent advances in large-scale RNA-Seq enable generation of genomic resources in a short time, and provide large expression datasets for functional genomic analysis. METHODOLOGY/PRINCIPAL FINDINGS: We used a next-generation high-throughput DNA sequencing technique with an Illumina GAIIx method to analyze the transcriptome from the whole bodies of C. fluminea. More than 62,250,336 high-quality reads were generated based on the raw data, and 134,684 unigenes with a mean length of 791 bp were assembled using the Velvet and Oases software. All of the assembly unigenes were annotated by running BLASTx and BLASTn similarity searches on the Nt, Nr, Swiss-Prot, COG and KEGG databases. In addition, the Clusters of Orthologous Groups (COGs), Gene Ontology (GO) terms and Kyoto Encyclopedia of Gene and Genome (KEGG) annotations were also assigned to each unigene transcript. To provide a preliminary verification of the assembly and annotation results, and search for potential environmental pollution biomarkers, 15 functional genes (five antioxidase genes, two cytochrome P450 genes, three GABA receptor-related genes and five heat shock protein genes) were cloned and identified. Expressions of the 15 selected genes following fluoxetine exposure confirmed that the genes are indeed linked to environmental stress. CONCLUSIONS/SIGNIFICANCE: The C. fluminea transcriptome advances the underlying molecular understanding of this freshwater clam, provides a basis for further exploration of C. fluminea as an environmental test organism and promotes further studies on other bivalve organisms.

  8. De novo transcriptome assembly and quantification reveal differentially expressed genes between soft-seed and hard-seed pomegranate (Punica granatum L.).

    Directory of Open Access Journals (Sweden)

    Hui Xue

    Full Text Available Pomegranate (Punica granatum L.) belongs to Punicaceae, and is valued for its social, ecological, economic, and aesthetic values, as well as more recently for its health benefits. The 'Tunisia' variety has softer seeds and big arils that are easily swallowed. It is a widely popular fruit; however, the molecular mechanisms of the formation of hard and soft seeds is not yet clear. We conducted a de novo assembly of the seed transcriptome in P. granatum L. and revealed differential gene expression between the soft-seed and hard-seed pomegranate varieties. A total of 35.1 Gb of data were acquired in this study, including 280,881,106 raw reads. Additionally, de novo transcriptome assembly generated 132,287 transcripts and 105,743 representative unigenes; approximately 13,805 unigenes (37.7%) were longer than 1,000 bp. Using bioinformatics annotation libraries, a total of 76,806 unigenes were annotated and, among the high-quality reads, 72.63% had at least one significant match to an existing gene model. Gene expression and differentially expressed genes were analyzed. The seed formation of the two pomegranate cultivars involves lignin biosynthesis and metabolism, including some genes encoding laccase and peroxidase, WRKY, MYB, and NAC transcription factors. In the hard-seed pomegranate, lignin-related genes and cellulose synthesis-related genes were highly expressed; in soft-seed pomegranates, expression of genes related to flavonoids and programmed cell death was slightly higher. We validated selection of the identified genes using qRT-PCR. This is the first transcriptome analysis of P. granatum L. This transcription sequencing greatly enriched the pomegranate molecular database, and the high-quality SSRs generated in this study will aid the gene cloning from pomegranate in the future. It provides important insights into the molecular mechanisms underlying the formation of soft seeds in pomegranate.

  9. De novo assembly of the Indo-Pacific humpback dolphin leucocyte transcriptome to identify putative genes involved in the aquatic adaptation and immune response.

    Directory of Open Access Journals (Sweden)

    Duan Gui

    Full Text Available BACKGROUND: The Indo-Pacific humpback dolphin (Sousa chinensis), a marine mammal species inhabited in the waters of Southeast Asia, South Africa and Australia, has attracted much attention because of the dramatic decline in population size in the past decades, which raises the concern of extinction. So far, this species is poorly characterized at molecular level due to little sequence information available in public databases. Recent advances in large-scale RNA sequencing provide an efficient approach to generate abundant sequences for functional genomic analyses in the species with un-sequenced genomes. PRINCIPAL FINDINGS: We performed a de novo assembly of the Indo-Pacific humpback dolphin leucocyte transcriptome by Illumina sequencing. 108,751 high quality sequences from 47,840,388 paired-end reads were generated, and 48,868 and 46,587 unigenes were functionally annotated by BLAST search against the NCBI non-redundant and Swiss-Prot protein databases (E-value < 10^-5), respectively. In total, 16,467 unigenes were clustered into 25 functional categories by searching against the COG database, and BLAST2GO search assigned 37,976 unigenes to 61 GO terms. In addition, 36,345 unigenes were grouped into 258 KEGG pathways. We also identified 9,906 simple sequence repeats and 3,681 putative single nucleotide polymorphisms as potential molecular markers in our assembled sequences. A large number of unigenes were predicted to be involved in immune response, and many genes were predicted to be relevant to adaptive evolution and cetacean-specific traits. CONCLUSION: This study represented the first transcriptome analysis of the Indo-Pacific humpback dolphin, an endangered species. The de novo transcriptome analysis of the unique transcripts will provide valuable sequence information for discovery of new genes, characterization of gene expression, investigation of various pathways and adaptive evolution, as well as identification of genetic markers.

  10. De novo transcriptome assembly and quantification reveal differentially expressed genes between soft-seed and hard-seed pomegranate (Punica granatum L.).

    Science.gov (United States)

    Xue, Hui; Cao, Shangyin; Li, Haoxian; Zhang, Jie; Niu, Juan; Chen, Lina; Zhang, Fuhong; Zhao, Diguang

    2017-01-01

    Pomegranate (Punica granatum L.) belongs to Punicaceae, and is valued for its social, ecological, economic, and aesthetic values, as well as more recently for its health benefits. The 'Tunisia' variety has softer seeds and big arils that are easily swallowed. It is a widely popular fruit; however, the molecular mechanisms of the formation of hard and soft seeds is not yet clear. We conducted a de novo assembly of the seed transcriptome in P. granatum L. and revealed differential gene expression between the soft-seed and hard-seed pomegranate varieties. A total of 35.1 Gb of data were acquired in this study, including 280,881,106 raw reads. Additionally, de novo transcriptome assembly generated 132,287 transcripts and 105,743 representative unigenes; approximately 13,805 unigenes (37.7%) were longer than 1,000 bp. Using bioinformatics annotation libraries, a total of 76,806 unigenes were annotated and, among the high-quality reads, 72.63% had at least one significant match to an existing gene model. Gene expression and differentially expressed genes were analyzed. The seed formation of the two pomegranate cultivars involves lignin biosynthesis and metabolism, including some genes encoding laccase and peroxidase, WRKY, MYB, and NAC transcription factors. In the hard-seed pomegranate, lignin-related genes and cellulose synthesis-related genes were highly expressed; in soft-seed pomegranates, expression of genes related to flavonoids and programmed cell death was slightly higher. We validated selection of the identified genes using qRT-PCR. This is the first transcriptome analysis of P. granatum L. This transcription sequencing greatly enriched the pomegranate molecular database, and the high-quality SSRs generated in this study will aid the gene cloning from pomegranate in the future. It provides important insights into the molecular mechanisms underlying the formation of soft seeds in pomegranate.

  11. De novo assembly and analysis of the Artemisia argyi transcriptome and identification of genes involved in terpenoid biosynthesis.

    Science.gov (United States)

    Liu, Miaomiao; Zhu, Jinhang; Wu, Shengbing; Wang, Chenkai; Guo, Xingyi; Wu, Jiawen; Zhou, Meiqi

    2018-04-11

    Artemisia argyi Lev. et Vant. (A. argyi) is widely utilized for moxibustion in Chinese medicine, and the mechanism underlying terpenoid biosynthesis in its leaves is suggested to play an important role in its medicinal use. However, the A. argyi transcriptome has not been sequenced. Herein, we performed RNA sequencing for A. argyi leaf, root and stem tissues to identify as many transcribed genes as possible. In total, 99,807 unigenes were assembled by analysing the expression profiles generated from the three tissue types, and 67,446 of those unigenes were annotated in public databases. We further performed differential gene expression analysis to compare leaf tissue with the other two tissue types and identified numerous genes that were specifically expressed or up-regulated in leaf tissue. Specifically, we identified multiple genes encoding significant enzymes or transcription factors related to terpenoid synthesis. This study serves as a valuable resource for transcriptome information, as many transcribed genes related to terpenoid biosynthesis were identified in the A. argyi transcriptome, providing a functional genomic basis for additional studies on molecular mechanisms underlying the medicinal use of A. argyi.

  12. De novo assembly of the Indo-Pacific humpback dolphin leucocyte transcriptome to identify putative genes involved in the aquatic adaptation and immune response.

    Science.gov (United States)

    Gui, Duan; Jia, Kuntong; Xia, Jia; Yang, Lili; Chen, Jialin; Wu, Yuping; Yi, Meisheng

    2013-01-01

    The Indo-Pacific humpback dolphin (Sousa chinensis), a marine mammal species inhabiting the waters of Southeast Asia, South Africa and Australia, has attracted much attention because of the dramatic decline in its population size in the past decades, which raises the concern of extinction. So far, this species is poorly characterized at the molecular level because little sequence information is available in public databases. Recent advances in large-scale RNA sequencing provide an efficient approach to generate abundant sequences for functional genomic analyses in species with unsequenced genomes. We performed a de novo assembly of the Indo-Pacific humpback dolphin leucocyte transcriptome by Illumina sequencing. A total of 108,751 high-quality sequences were generated from 47,840,388 paired-end reads, and 48,868 and 46,587 unigenes were functionally annotated by BLAST search against the NCBI non-redundant and Swiss-Prot protein databases (E-value [...] Indo-Pacific humpback dolphin, an endangered species. The de novo transcriptome analysis of the unique transcripts will provide valuable sequence information for discovery of new genes, characterization of gene expression, investigation of various pathways and adaptive evolution, as well as identification of genetic markers.

  13. Transcriptome Analysis of the Emerald Ash Borer (EAB), Agrilus planipennis: De Novo Assembly, Functional Annotation and Comparative Analysis.

    Science.gov (United States)

    Duan, Jun; Ladd, Tim; Doucet, Daniel; Cusson, Michel; vanFrankenhuyzen, Kees; Mittapalli, Omprakash; Krell, Peter J; Quan, Guoxing

    2015-01-01

    The emerald ash borer (EAB), Agrilus planipennis, is an invasive phloem-feeding insect pest of ash trees. Since its initial discovery near the Detroit, US-Windsor, Canada area in 2002, the spread of EAB has had strong negative economic, social and environmental impacts in both countries. Several transcriptomes from specific tissues including midgut, fat body and antenna have recently been generated. However, the relatively low sequence depth, gene coverage and completeness limited the usefulness of these EAB databases. High-throughput deep RNA-Sequencing (RNA-Seq) was used to obtain 473.9 million pairs of 100 bp paired-end reads from various life stages and tissues. These reads were assembled into 88,907 contigs using the Trinity strategy and integrated into 38,160 unigenes after redundant sequences were removed. We annotated 11,229 unigenes by searching against the public nr, Swiss-Prot and COG databases. The EAB transcriptome assembly was compared with 13 other sequenced insect species, resulting in the prediction of 536 unigenes that are Coleoptera-specific. Differential gene expression analysis revealed that 290 unigenes are differentially expressed during larval molting and 3,911 unigenes during metamorphosis from larvae to pupae (FDR2). In addition, 1,167 differentially expressed unigenes were identified from larval and adult midguts; 435 unigenes were up-regulated in larval midgut and 732 unigenes were up-regulated in adult midgut. Most of the genes involved in RNA interference (RNAi) pathways were identified, which implies the existence of a systemic RNAi in EAB. This study provides one of the most fundamental and comprehensive transcriptome resources available for EAB to date. Identification of the tissue-, stage- or species-specific unigenes will benefit the further study of gene functions during growth and metamorphosis processes in EAB and other pest insects.

  14. De novo assembly of the transcriptome of Aegiceras corniculatum, a mangrove species in the Indo-West Pacific region.

    Science.gov (United States)

    Fang, Lu; Yang, Yuchen; Guo, Wuxia; Li, Jianfang; Zhong, Cairong; Huang, Yelin; Zhou, Renchao; Shi, Suhua

    2016-08-01

    Aegiceras corniculatum (L.) Blanco is one of the most salt-tolerant mangrove species and can thrive in 3% salinity at the seaward edge of mangrove forests. Here we sequenced the transcriptome of A. corniculatum using the Illumina GA platform to develop its genomic resources for ecological and evolutionary studies. We obtained about 50 million high-quality paired-end reads of 75 bp in length. Using the short read assembler Velvet, we obtained 49,437 contigs with an average length of 625 bp. A total of 32,744 (66.23%) contigs showed significant similarity to the GenBank non-redundant (NR) protein database. 30,911 and 18,004 of these sequences were assigned to Gene Ontology and eukaryotic orthologous groups of proteins (KOG), respectively. A total of 4942 transcripts from our assemblies had significant similarity with KEGG Orthologs and were involved in 144 KEGG pathways, while 9899 unigenes had enzyme commission (EC) numbers. In addition, 9792 transcriptome-derived SSRs were identified from 7342 sequences. With our strict criteria, 4165 candidate SNPs were also identified from 2058 contigs. Some of these SNPs were further validated by Sanger sequencing. Genomic resources generated in this study should be valuable in ecological, evolutionary, and functional genomics studies for this mangrove species. Copyright © 2016 Elsevier B.V. All rights reserved.

  15. Analysis of insecticide resistance-related genes of the Carmine spider mite Tetranychus cinnabarinus based on a de novo assembled transcriptome.

    Directory of Open Access Journals (Sweden)

    Zhifeng Xu

    The carmine spider mite (CSM), Tetranychus cinnabarinus, is an important pest mite in agriculture because it readily develops insecticide resistance. To obtain gene information and a molecular basis for future studies of insecticide resistance in CSM, the first transcriptome analysis of CSM was conducted. A total of 45,016 contigs and 25,519 unigenes were generated from the de novo transcriptome assembly, and 15,167 unigenes were annotated via BLAST querying against current databases, including nr, SwissProt, the Clusters of Orthologous Groups (COGs), Kyoto Encyclopedia of Genes and Genomes (KEGG) and Gene Ontology (GO). When the transcripts were aligned to the Tetranychus urticae genome, 19,255 (75.45%) of the transcripts had significant (e-value < 10^-5) matches to the T. urticae genome, and 19,111 sequences matched the T. urticae proteome with an average protein length coverage of 42.55%. Core Eukaryotic Genes Mapping Approach (CEGMA) analysis identified 435 core eukaryotic genes (CEGs) in the CSM dataset, corresponding to 95% coverage. Ten gene categories related to insecticide resistance in arthropods were identified in the CSM transcriptome, including 53 P450-, 22 GSTs-, 23 CarEs-, 1 AChE-, 7 GluCls-, 9 nAChRs-, 8 GABA receptor-, 1 sodium channel-, 6 ATPase- and 12 Cyt b genes. We developed significant molecular resources for T. cinnabarinus putatively involved in insecticide resistance. The transcriptome assembly analysis will significantly facilitate our study of the mechanisms of adaptation to environmental stress (including insecticides) in CSM at the molecular level, and will be very important for developing new control strategies against this pest mite.

  16. Sequencing, De Novo Assembly, and Annotation of the Transcriptome of the Endangered Freshwater Pearl Bivalve, Cristaria plicata, Provides Novel Insights into Functional Genes and Marker Discovery.

    Directory of Open Access Journals (Sweden)

    Bharat Bhusan Patnaik

    The freshwater mussel Cristaria plicata (Bivalvia: Eulamellibranchia: Unionidae) is an economically important species in molluscan aquaculture due to its use in pearl farming. The species has been listed as endangered in South Korea due to the loss of natural habitats caused by anthropogenic activities. The decreasing population and a lack of genomic information on the species are concerning for environmentalists and conservationists. In this study, we conducted a de novo transcriptome sequencing and annotation analysis of C. plicata using Illumina HiSeq 2500 next-generation sequencing (NGS) technology, the Trinity assembler, and bioinformatics databases to prepare a sustainable resource for the identification of candidate genes involved in immunity, defense, and reproduction. The C. plicata transcriptome analysis included a total of 286,152,584 raw reads and 281,322,837 clean reads. The de novo assembly identified a total of 453,931 contigs and 374,794 non-redundant unigenes with average lengths of 731.2 and 737.1 bp, respectively. Furthermore, 100% coverage of C. plicata mitochondrial genes within two unigenes supported the quality of the assembler. In total, 84,274 unigenes showed homology to entries in at least one database, and 23,246 unigenes were allocated to one or more Gene Ontology (GO) terms. The most prominent GO biological process, cellular component, and molecular function categories (level 2) were cellular process, membrane, and binding, respectively. A total of 4,776 unigenes were mapped to 123 biological pathways in the KEGG database. Based on the GO terms and KEGG annotation, the unigenes were suggested to be involved in immunity, stress responses, sex-determination, and reproduction. A total of 17,251 cDNA simple sequence repeats (cSSRs) were identified from 61,141 unigenes (size of >1 kb), with the most abundant being dinucleotide repeats. This dataset represents the first transcriptome analysis of the endangered mollusc, C. plicata

  17. De Novo Assembly and Characterization of the Transcriptome of the Parasitic Weed Dodder Identifies Genes Associated with Plant Parasitism

    Science.gov (United States)

    Ranjan, Aashish; Ichihashi, Yasunori; Farhi, Moran; Zumstein, Kristina; Townsley, Brad; David-Schwartz, Rakefet; Sinha, Neelima R.

    2014-01-01

    Parasitic flowering plants are one of the most destructive agricultural pests and have major impact on crop yields throughout the world. Being dependent on finding a host plant for growth, parasitic plants penetrate their host using specialized organs called haustoria. Haustoria establish vascular connections with the host, which enable the parasite to steal nutrients and water. The underlying molecular and developmental basis of parasitism by plants is largely unknown. In order to investigate the process of parasitism, RNAs from different stages (i.e. seed, seedling, vegetative strand, prehaustoria, haustoria, and flower) were used to de novo assemble and annotate the transcriptome of the obligate plant stem parasite dodder (Cuscuta pentagona). The assembled transcriptome was used to dissect transcriptional dynamics during dodder development and parasitism and identified key gene categories involved in the process of plant parasitism. Host plant infection is accompanied by increased expression of parasite genes underlying transport and transporter categories, response to stress and stimuli, as well as genes encoding enzymes involved in cell wall modifications. By contrast, expression of photosynthetic genes is decreased in the dodder infective stages compared with normal stem. In addition, genes relating to biosynthesis, transport, and response of phytohormones, such as auxin, gibberellins, and strigolactone, were differentially expressed in the dodder infective stages compared with stems and seedlings. This analysis sheds light on the transcriptional changes that accompany plant parasitism and will aid in identifying potential gene targets for use in controlling the infestation of crops by parasitic weeds. PMID:24399359

  18. De novo transcriptome sequence assembly from coconut leaves and seeds with a focus on factors involved in RNA-directed DNA methylation.

    Science.gov (United States)

    Huang, Ya-Yi; Lee, Chueh-Pai; Fu, Jason L; Chang, Bill Chia-Han; Matzke, Antonius J M; Matzke, Marjori

    2014-09-04

    Coconut palm (Cocos nucifera) is a symbol of the tropics and a source of numerous edible and nonedible products of economic value. Despite its nutritional and industrial significance, coconut remains under-represented in public repositories for genomic and transcriptomic data. We report de novo transcript assembly from RNA-seq data and analysis of gene expression in seed tissues (embryo and endosperm) and leaves of a dwarf coconut variety. Assembly of 10 GB sequencing data for each tissue resulted in 58,211 total unigenes in embryo, 61,152 in endosperm, and 33,446 in leaf. Within each unigene pool, 24,857 could be annotated in embryo, 29,731 could be annotated in endosperm, and 26,064 could be annotated in leaf. A KEGG analysis identified 138, 138, and 139 pathways, respectively, in transcriptomes of embryo, endosperm, and leaf tissues. Given the extraordinarily large size of coconut seeds and the importance of small RNA-mediated epigenetic regulation during seed development in model plants, we used homology searches to identify putative homologs of factors required for RNA-directed DNA methylation in coconut. The findings suggest that RNA-directed DNA methylation is important during coconut seed development, particularly in maturing endosperm. This dataset will expand the genomics resources available for coconut and provide a foundation for more detailed analyses that may assist molecular breeding strategies aimed at improving this major tropical crop. Copyright © 2014 Huang et al.

  19. De Novo Assembly and Transcriptome Analysis of Wheat with Male Sterility Induced by the Chemical Hybridizing Agent SQ-1.

    Directory of Open Access Journals (Sweden)

    Qidi Zhu

    Wheat (Triticum aestivum L.), one of the world's most important food crops, is a strictly autogamous (self-pollinating) species with exclusively perfect flowers. Male sterility induced by chemical hybridizing agents has increasingly attracted attention as a tool for hybrid seed production in wheat; however, the molecular mechanisms of male sterility induced by the agent SQ-1 remain poorly understood due to limited whole transcriptome data. Therefore, a comparative analysis of wheat anther transcriptomes for male fertile wheat and SQ-1-induced male sterile wheat was carried out using next-generation sequencing technology. In all, 42,634,123 sequence reads were generated and were assembled into 82,356 high-quality unigenes with an average length of 724 bp. Of these, 1,088 unigenes were significantly differentially expressed in the fertile and sterile wheat anthers, including 643 up-regulated unigenes and 445 down-regulated unigenes. The differentially expressed unigenes with functional annotations were mapped onto 60 pathways using the Kyoto Encyclopedia of Genes and Genomes database. They were mainly involved in coding for the components of ribosomes, photosynthesis, respiration, purine and pyrimidine metabolism, amino acid metabolism, glutathione metabolism, RNA transport and signal transduction, reactive oxygen species metabolism, mRNA surveillance pathways, protein processing in the endoplasmic reticulum, protein export, and ubiquitin-mediated proteolysis. This study is the first to provide a systematic overview comparing wheat anther transcriptomes of male fertile wheat with those of SQ-1-induced male sterile wheat and is a valuable source of data for future research in SQ-1-induced wheat male sterility.

  20. De novo assembly and characterization of the spleen transcriptome of common carp (Cyprinus carpio) using Illumina paired-end sequencing.

    Science.gov (United States)

    Li, Guoxi; Zhao, Yinli; Liu, Zhonghu; Gao, Chunsheng; Yan, Fengbin; Liu, Bianzhi; Feng, Jianxin

    2015-06-01

    Common carp (Cyprinus carpio) is one of the most important aquacultured species of the family Cyprinidae, and breeding this species for disease resistance is becoming increasingly important. However, studies of the immunogenetics of disease resistance in the common carp at the genome or transcriptome level are lacking. In this study, 60,316,906 and 75,200,328 paired-end clean reads were obtained from two cDNA libraries of the common carp spleen by Illumina paired-end sequencing technology. In total, 130,293 unique transcript fragments (unigenes) were assembled, with an average length of 1400.57 bp. Approximately 105,612 (81.06%) unigenes could be annotated according to their homology with matches in the Nr, Nt, Swiss-Prot, COG, GO, or KEGG databases, and they were found to represent 46,747 non-redundant genes. Comparative analysis showed that 59.82% of the unigenes have significant similarity to zebrafish Refseq proteins. Gene expression comparison revealed that 10,432 and 6889 annotated unigenes were, respectively, up- and down-regulated with at least twofold changes between two developmental stages of the common carp spleen. Gene ontology and KEGG analysis were performed to classify all unigenes into functional categories for understanding gene functions and regulation pathways. In addition, 46,847 simple sequence repeats (SSRs) were detected from 35,618 unigenes, and a large number of single nucleotide polymorphism (SNP) and insertion/deletion (INDEL) sites were identified in the spleen transcriptome of common carp. This study has characterized the spleen transcriptome of the common carp for the first time, providing a valuable resource for a better understanding of the common carp immune system and defense mechanisms. This knowledge will also facilitate future functional studies on common carp immunogenetics that may eventually be applied in breeding programs. Copyright © 2015 Elsevier Ltd. All rights reserved.
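
    The "at least twofold" criterion in the abstract above can be illustrated with a minimal sketch. The record does not say how expression was normalized or tested, so the function name, the pseudocount and the plain log2 ratio below are assumptions made for illustration, not the authors' pipeline.

```python
import math

def classify_fold_change(expr_a, expr_b, min_fold=2.0, pseudocount=1.0):
    """Label a unigene 'up', 'down' or 'unchanged' in stage B relative to stage A.

    expr_a and expr_b are normalized expression values (hypothetical inputs).
    The pseudocount avoids division by zero and is an assumption of this sketch.
    """
    log2_fc = math.log2((expr_b + pseudocount) / (expr_a + pseudocount))
    threshold = math.log2(min_fold)  # twofold corresponds to |log2 FC| >= 1
    if log2_fc >= threshold:
        return "up"
    if log2_fc <= -threshold:
        return "down"
    return "unchanged"

# Toy values for three hypothetical unigenes (not data from the study).
for name, a, b in [("u1", 10, 50), ("u2", 50, 10), ("u3", 10, 12)]:
    print(name, classify_fold_change(a, b))
```

    A real analysis would normally pair the fold-change cut-off with a statistical test and multiple-testing correction, which this sketch leaves out.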

  1. Illumina-based de novo transcriptome sequencing and analysis

    Indian Academy of Sciences (India)

    In the present study, we used Illumina HiSeq technology to perform de novo assembly of heart and musk gland transcriptomes from the Chinese forest musk deer. A total of 239,383 transcripts and 176,450 unigenes were obtained, of which 37,329 unigenes were matched to known sequences in the NCBI nonredundant ...

  2. De Novo assembly of the Japanese flounder (Paralichthys olivaceus) spleen transcriptome to identify putative genes involved in immunity.

    Directory of Open Access Journals (Sweden)

    Lin Huang

    Japanese flounder (Paralichthys olivaceus) is an economically important marine fish in Asia and has suffered from disease outbreaks caused by various pathogens, which calls for more genomic information on immune-relevant genes. However, genomic and transcriptomic data for Japanese flounder remain scarce, which limits studies on the immune system of this species. In this study, we characterized the Japanese flounder spleen transcriptome using an Illumina paired-end sequencing platform to identify putative genes involved in immunity. A cDNA library from the spleen of P. olivaceus was constructed and randomly sequenced using an Illumina technique. The removal of low quality reads generated 12,196,968 trimmed reads, which assembled into 96,627 unigenes. A total of 21,391 unigenes (22.14%) were annotated in the NCBI Nr database, and only 1.1% of the BLASTx top-hits matched P. olivaceus protein sequences. Approximately 12,503 (58.45%) unigenes were categorized into three Gene Ontology groups, 19,547 (91.38%) were classified into 26 Cluster of Orthologous Groups, and 10,649 (49.78%) were assigned to six Kyoto Encyclopedia of Genes and Genomes pathways. Furthermore, 40,928 putative simple sequence repeats and 47,362 putative single nucleotide polymorphisms were identified. Importantly, we identified 1,563 putative immune-associated unigenes that mapped to 15 immune signaling pathways. The P. olivaceus transcriptome data provide a rich source to discover and identify new genes, and the immune-relevant sequences identified here will facilitate our understanding of the mechanisms involved in the immune response. Furthermore, the plentiful potential SSRs and SNPs found in this study are important resources with respect to future development of a linkage map or marker assisted breeding programs for the flounder.

  3. De novo Transcriptome Assembly of Chinese Kale and Global Expression Analysis of Genes Involved in Glucosinolate Metabolism in Multiple Tissues

    Science.gov (United States)

    Wu, Shuanghua; Lei, Jianjun; Chen, Guoju; Chen, Hancai; Cao, Bihao; Chen, Changming

    2017-01-01

    Chinese kale, a vegetable of the cruciferous family, is a popular crop in southern China and Southeast Asia due to its high glucosinolate content and nutritional qualities. However, there is little research on the molecular genetics and genes involved in glucosinolate metabolism and its regulation in Chinese kale. In this study, we sequenced and characterized the transcriptomes and expression profiles of genes expressed in 11 tissues of Chinese kale. A total of 216 million 150-bp clean reads were generated using RNA-sequencing technology. From the sequences, 98,180 unigenes were assembled for the whole plant, and 49,582~98,423 unigenes were assembled for each tissue. Blast analysis indicated that a total of 80,688 (82.18%) unigenes exhibited similarity to known proteins. The functional annotation and classification tools used in this study suggested that genes principally expressed in Chinese kale were mostly involved in fundamental processes, such as cellular and molecular functions, signal transduction, and biosynthesis of secondary metabolites. The expression levels of all unigenes were analyzed in various tissues of Chinese kale. A large number of candidate genes involved in glucosinolate metabolism and its regulation were identified, and the expression patterns of these genes were analyzed. We found that most of the genes involved in glucosinolate biosynthesis were highly expressed in the root, petiole, and senescent leaves. The expression patterns of ten glucosinolate biosynthetic genes from RNA-seq were validated by quantitative RT-PCR in different tissues. These results provide an initial and global overview of Chinese kale gene functions and expression activities in different tissues. PMID:28228764

  4. De novo assembly of Eugenia uniflora L. transcriptome and identification of genes from the terpenoid biosynthesis pathway.

    Science.gov (United States)

    Guzman, Frank; Kulcheski, Franceli Rodrigues; Turchetto-Zolet, Andreia Carina; Margis, Rogerio

    2014-12-01

    Pitanga (Eugenia uniflora L.) is a member of the Myrtaceae family and is of particular interest due to its medicinal properties that are attributed to specialized metabolites with known biological activities. Among these molecules, terpenoids are the most abundant in essential oils that are found in the leaves and represent compounds with potential pharmacological benefits. The terpene diversity observed in Myrtaceae is determined by the activity of different members of the terpene synthase and oxidosqualene cyclase families. Therefore, the aim of this study was to perform a de novo assembly of transcripts from E. uniflora leaves and to annotate them to identify the genes potentially involved in the terpenoid biosynthesis pathway and terpene diversity. In total, 72,742 unigenes with a mean length of 1048 bp were identified. Of these, 43,631 and 36,289 were annotated with the NCBI non-redundant protein and Swiss-Prot databases, respectively. The gene ontology analysis categorized the sequences into 53 functional groups. A metabolic pathway analysis with KEGG revealed 8,625 unigenes assigned to 141 metabolic pathways and 40 unigenes predicted to be associated with the biosynthesis of terpenoids. Furthermore, we identified four putative full-length terpene synthase genes involved in sesquiterpene and monoterpene biosynthesis, and three putative full-length oxidosqualene cyclase genes involved in triterpene biosynthesis. The expression of these genes was validated in different E. uniflora tissues. Copyright © 2014 Elsevier Ireland Ltd. All rights reserved.

  5. De Novo Assembly and Analysis of Tartary Buckwheat (Fagopyrum tataricum Gaertn.) Transcriptome Discloses Key Regulators Involved in Salt-Stress Response

    Directory of Open Access Journals (Sweden)

    Qi Wu

    2017-10-01

    Soil salinization has been a tremendous obstacle for agricultural production. The regulatory networks underlying salinity adaptation in model plants have been extensively explored. However, limited understanding of the salt response mechanisms has hindered the planting and production of Fagopyrum tataricum, an economically important and health-beneficial plant mainly distributed in southwest China. In this study, we performed physiological analysis and found that salt stress of 200 mM NaCl solution significantly affected the relative water content (RWC), electrolyte leakage (EL), malondialdehyde (MDA) content, and peroxidase (POD) and superoxide dismutase (SOD) activities in tartary buckwheat seedlings. Further, we conducted transcriptome comparison between control and salt treatment to identify potential regulatory components involved in F. tataricum salt responses. A total of 53.15 million clean reads from control and salt-treated libraries were produced via an Illumina sequencing approach. Then we de novo assembled these reads into a transcriptome dataset containing 57,921 unigenes with N50 length of 1400 bp and total length of 44.5 Mb. A total of 36,688 unigenes could find matches in public databases. GO, KOG and KEGG classification suggested the enrichment of these unigenes in 56 sub-categories, 25 KOG, and 273 pathways, respectively. Comparison of the transcriptome expression patterns between control and salt treatment unveiled 455 differentially expressed genes (DEGs). Further, we found that the genes encoding protein kinases, phosphatases, heat shock proteins (HSPs), ATP-binding cassette (ABC) transporters, glutathione S-transferases (GSTs), abiotic-related transcription factors and circadian clock components might be relevant to the salinity adaptation of this species. Thus, this study offers an insight into salt tolerance mechanisms, and will serve as useful genetic information for tolerant elite breeding programs in future.
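
    The N50 statistic quoted in the abstract above is the length L such that sequences of length L or longer contain at least half of the total assembled bases. A minimal sketch of that standard definition follows; the function name and the toy lengths are illustrative only, and the record does not state which tool the authors used to compute it.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths."""
    if not lengths:
        raise ValueError("need at least one sequence length")
    ordered = sorted(lengths, reverse=True)  # longest first
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first length at which the running sum reaches half of all bases.
        if running >= half_total:
            return length

# Toy unigene lengths in bp (not data from the study).
toy_lengths = [200, 300, 400, 1400, 1700, 2000]
print(n50(toy_lengths))
```

    Because N50 is weighted by length, it is usually larger than the mean unigene length, so the two figures should not be compared directly across the records in this listing.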

  6. Improving transcriptome de novo assembly by using a reference genome of a related species: Translational genomics from oil palm to coconut.

    Directory of Open Access Journals (Sweden)

    Alix Armero

    The palms are a family of tropical origin and one of the main constituents of the ecosystems of these regions around the world. The two main species of palm represent different challenges: coconut (Cocos nucifera L.) is a source of multiple goods and services in tropical communities, while oil palm (Elaeis guineensis Jacq) is the main protagonist of the oil market. In this study, we present a workflow that exploits the comparative genomics between a target species (coconut) and a reference species (oil palm) to improve the transcriptomic data, providing a proteome useful to answer functional or evolutionary questions. This workflow reduces redundancy and fragmentation, two inherent problems of transcriptomic data, while preserving the functional representation of the target species. Our approach was validated in Arabidopsis thaliana using Arabidopsis lyrata and Capsella rubella as reference species. This analysis showed the high sensitivity and specificity of our strategy, relatively independent of the reference proteome. The workflow increased the length of protein products in A. thaliana by 13%, often allowing recovery of 100% of the protein sequence length. In addition, redundancy was reduced by a factor greater than 3. In coconut, the approach generated 29,366 proteins, 1,246 of these proteins deriving from new contigs obtained with the BRANCH software. The coconut proteome presented a functional profile similar to that observed in rice and an important number of metabolic pathways related to secondary metabolism. The new sequences found with BRANCH software were enriched in functions related to biotic stress. Our strategy can be used as a complementary step to de novo transcriptome assembly to get a representative proteome of a target species. The results of the current analysis are available on the website PalmComparomics (http://palm-comparomics.southgreen.fr/).

  7. Improving transcriptome de novo assembly by using a reference genome of a related species: Translational genomics from oil palm to coconut.

    Science.gov (United States)

    Armero, Alix; Baudouin, Luc; Bocs, Stéphanie; This, Dominique

    2017-01-01

    The palms are a family of tropical origin and one of the main constituents of the ecosystems of these regions around the world. The two main species of palm represent different challenges: coconut (Cocos nucifera L.) is a source of multiple goods and services in tropical communities, while oil palm (Elaeis guineensis Jacq) is the main protagonist of the oil market. In this study, we present a workflow that exploits the comparative genomics between a target species (coconut) and a reference species (oil palm) to improve the transcriptomic data, providing a proteome useful to answer functional or evolutionary questions. This workflow reduces redundancy and fragmentation, two inherent problems of transcriptomic data, while preserving the functional representation of the target species. Our approach was validated in Arabidopsis thaliana using Arabidopsis lyrata and Capsella rubella as reference species. This analysis showed the high sensitivity and specificity of our strategy, relatively independent of the reference proteome. The workflow increased the length of protein products in A. thaliana by 13%, often allowing recovery of 100% of the protein sequence length. In addition, redundancy was reduced by a factor greater than 3. In coconut, the approach generated 29,366 proteins, 1,246 of these proteins deriving from new contigs obtained with the BRANCH software. The coconut proteome presented a functional profile similar to that observed in rice and an important number of metabolic pathways related to secondary metabolism. The new sequences found with BRANCH software were enriched in functions related to biotic stress. Our strategy can be used as a complementary step to de novo transcriptome assembly to get a representative proteome of a target species. The results of the current analysis are available on the website PalmComparomics (http://palm-comparomics.southgreen.fr/).

  8. Development of novel EST-SSR markers for ploidy identification based on de novo transcriptome assembly for Misgurnus anguillicaudatus.

    Science.gov (United States)

    Feng, Bing; Yi, Soojin V; Zhang, Manman; Zhou, Xiaoyun

    2018-01-01

    The co-existence of several ploidy types in natural populations makes the cyprinid loach Misgurnus anguillicaudatus an exciting model system to study the genetic and phenotypic consequences of ploidy variations. A first step in such an effort is to identify the specific ploidy of an individual. Currently popular methods of karyotyping via cytological preparation or flow cytometry require a large amount of tissue (such as blood) samples, which can be damaging or fatal to the fishes. Here, we developed novel microsatellite markers (SSR markers) from M. anguillicaudatus and show that they can effectively discriminate ploidy using samples collected in a minimally invasive way. Specifically, we generated whole genome transcriptomes from multiple M. anguillicaudatus using Illumina paired-end sequencing. Approximately 150 million raw reads were assembled into 76,544 non-redundant unigenes. A total of 8,194 potential SSR markers were identified. We selected 98 pairs with more than five tandem repeats for further assays. Out of 45 putative EST-SSR markers that successfully amplified and harbored polymorphism in diploids, 11 markers displayed high variability in tetraploids. We further demonstrate that a set of five EST-SSR markers selected from these is sufficient to distinguish ploidy levels, by first validating them on 69 reference specimens with known ploidy levels and then applying them to 96 freshly collected specimens of unknown ploidy. The results from EST-SSR markers are highly concordant with those from independent flow cytometry analysis. The novel EST-SSR markers developed here should facilitate genetic studies of polyploidy in the emerging model system M. anguillicaudatus.
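
    The SSR screening step mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the function name `find_ssrs`, the motif length range (2 to 6 bp) and the default threshold of six copies ("more than five tandem repeats") are assumptions chosen for illustration, and only perfect repeats are detected.

```python
import re

def find_ssrs(seq, min_motif=2, max_motif=6, min_repeats=6):
    """Return (start, motif, repeat_count) for perfect tandem repeats in seq.

    Hypothetical helper: thresholds are illustrative assumptions, not
    values taken from the study.
    """
    seq = seq.upper()
    # Group 2 is the motif (shortest one that works at this position);
    # group 1 is the whole run: the motif plus at least min_repeats - 1 copies.
    pattern = re.compile(
        r"(([ACGT]{%d,%d}?)\2{%d,})" % (min_motif, max_motif, min_repeats - 1)
    )
    hits = []
    for m in pattern.finditer(seq):
        motif = m.group(2)
        hits.append((m.start(), motif, len(m.group(1)) // len(motif)))
    return hits

if __name__ == "__main__":
    print(find_ssrs("TTG" + "AC" * 7 + "GGT"))
```

    Because the minimum motif length is 2, a homopolymer run would be reported with a two-base motif such as "AA"; a production screen would normally treat mononucleotide repeats separately.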

  9. Exploring the genes of yerba mate (Ilex paraguariensis A. St.-Hil.) by NGS and de novo transcriptome assembly.

    Directory of Open Access Journals (Sweden)

    Humberto J Debat

    Full Text Available Yerba mate (Ilex paraguariensis A. St.-Hil.) is an important subtropical tree crop cultivated on 326,000 ha in Argentina, Brazil and Paraguay, with a total yield production of more than 1,000,000 t. Yerba mate presents a strong limitation regarding sequence information. The NCBI GenBank lacks an EST database of yerba mate and depicts only 80 DNA sequences, mostly uncharacterized. In this scenario, in order to elucidate the yerba mate gene landscape by means of NGS, we explored and discovered a vast collection of I. paraguariensis transcripts. Total RNA from I. paraguariensis was sequenced by Illumina HiSeq-2000 obtaining 72,031,388 paired-end 100 bp sequences. High quality reads were de novo assembled into 44,907 transcripts encompassing 40 million bases with an estimated coverage of 180X. Multiple sequence analysis allowed us to predict that yerba mate contains ∼32,355 genes and 12,551 gene variants or isoforms. We identified and categorized members of more than 100 metabolic pathways. Overall, we have identified ∼1,000 putative transcription factors, genes involved in heat and oxidative stress, pathogen response, as well as disease resistance and hormone response. We have also identified, based on sequence homology searches, novel transcripts related to osmotic, drought, salinity and cold stress, senescence and early flowering. We have also pinpointed several members of the gene silencing pathway, and characterized the silencing effector Argonaute1. We predicted a diverse supply of putative microRNA precursors involved in developmental processes. We present here the first draft of the transcribed genomes of the yerba mate chloroplast and mitochondrion. The putative sequence and predicted structure of the caffeine synthase of yerba mate is presented. Moreover, we provide a collection of over 10,800 SSR accessible to the scientific community interested in yerba mate genetic improvement. This contribution broadly expands the limited knowledge

  10. Exploring the Genes of Yerba Mate (Ilex paraguariensis A. St.-Hil.) by NGS and De Novo Transcriptome Assembly

    Science.gov (United States)

    Aguilera, Patricia M.; Bubillo, Rosana E.; Otegui, Mónica B.; Ducasse, Daniel A.; Zapata, Pedro D.; Marti, Dardo A.

    2014-01-01

    Yerba mate (Ilex paraguariensis A. St.-Hil.) is an important subtropical tree crop cultivated on 326,000 ha in Argentina, Brazil and Paraguay, with a total yield production of more than 1,000,000 t. Yerba mate presents a strong limitation regarding sequence information. The NCBI GenBank lacks an EST database of yerba mate and depicts only 80 DNA sequences, mostly uncharacterized. In this scenario, in order to elucidate the yerba mate gene landscape by means of NGS, we explored and discovered a vast collection of I. paraguariensis transcripts. Total RNA from I. paraguariensis was sequenced by Illumina HiSeq-2000 obtaining 72,031,388 paired-end 100 bp sequences. High quality reads were de novo assembled into 44,907 transcripts encompassing 40 million bases with an estimated coverage of 180X. Multiple sequence analysis allowed us to predict that yerba mate contains ∼32,355 genes and 12,551 gene variants or isoforms. We identified and categorized members of more than 100 metabolic pathways. Overall, we have identified ∼1,000 putative transcription factors, genes involved in heat and oxidative stress, pathogen response, as well as disease resistance and hormone response. We have also identified, based on sequence homology searches, novel transcripts related to osmotic, drought, salinity and cold stress, senescence and early flowering. We have also pinpointed several members of the gene silencing pathway, and characterized the silencing effector Argonaute1. We predicted a diverse supply of putative microRNA precursors involved in developmental processes. We present here the first draft of the transcribed genomes of the yerba mate chloroplast and mitochondrion. The putative sequence and predicted structure of the caffeine synthase of yerba mate is presented. Moreover, we provide a collection of over 10,800 SSR accessible to the scientific community interested in yerba mate genetic improvement. This contribution broadly expands the limited knowledge of yerba mate genes

  11. De Novo Transcriptomes of Forsythia koreana Using a Novel Assembly Method: Insight into Tissue- and Species-Specific Expression of Lignan Biosynthesis-Related Gene.

    Directory of Open Access Journals (Sweden)

    Akira Shiraishi

    Full Text Available Forsythia spp. are perennial woody plants which are one of the most extensively used medicinal sources of Chinese medicines and functional diets owing to their lignan contents. Lignans have received widespread attention as leading compounds in the development of antitumor drugs and healthy diets for reducing the risks of lifestyle-related diseases. However, the molecular basis of Forsythia has yet to be established. In this study, we have verified de novo deep transcriptome of Forsythia koreana leaf and callus using the Illumina HiSeq 1500 platform. A total of 89 million reads were assembled into 116,824 contigs using Trinity, and 1,576 of the contigs displayed sequence similarity to the enzymes responsible for plant specialized metabolism including lignan biosynthesis. Notably, gene ontology (GO) analysis indicated the remarkable enrichment of lignan-biosynthetic enzyme genes in the callus transcriptome. Nevertheless, precise annotation and molecular phylogenetic analyses were hindered by partial sequences of open reading frames (ORFs) of the Trinity-based contigs. To obtain more numerous contigs harboring a full-length ORF, we developed a novel overlapping layout consensus-based procedure, virtual primer-based sequence reassembly (VP-seq). VP-seq elucidated 709 full-length ORFs, whereas only 146 full-length ORFs were assembled by Trinity. The comparison of expression profiles of leaf and callus using VP-seq-based full-length ORFs revealed 50-fold upregulation of secoisolariciresinol dehydrogenase (SIRD) in callus. Expression and phylogenetic cluster analyses predicted candidates for matairesinol-glucosylating enzymes. We also performed VP-seq analysis of lignan-biosynthetic enzyme genes in the transcriptome data of other lignan-rich plants, Linum flavum, Linum usitatissimum and Podophyllum hexandrum. The comparative analysis indicated both common gene clusters involved in biosynthesis upstream of matairesinol such as SIRD and plant lineage

  12. De novo assembly and characterization of global transcriptome of coconut palm (Cocos nucifera L.) embryogenic calli using Illumina paired-end sequencing.

    Science.gov (United States)

    Rajesh, M K; Fayas, T P; Naganeeswaran, S; Rachana, K E; Bhavyashree, U; Sajini, K K; Karun, Anitha

    2016-05-01

    Production and supply of quality planting material is significant to coconut cultivation but is one of the major constraints in coconut productivity. Rapid multiplication of coconut through in vitro techniques, therefore, is of paramount importance. Although somatic embryogenesis in coconut is a promising technique that will allow for the mass production of high quality palms, coconut is highly recalcitrant to in vitro culture. In order to overcome the bottlenecks in coconut somatic embryogenesis and to develop a repeatable protocol, it is imperative to understand, identify, and characterize molecular events involved in the coconut somatic embryogenesis pathway. Transcriptome analysis (RNA-Seq) of coconut embryogenic calli, derived from plumular explants of West Coast Tall cultivar, was undertaken on an Illumina HiSeq 2000 platform. After de novo transcriptome assembly and functional annotation, we obtained 40,367 transcripts which showed significant BLASTx matches with similarity greater than 40 % and an E value of ≤10^-5. Fourteen genes known to be involved in somatic embryogenesis were identified. Quantitative real-time PCR (qRT-PCR) analyses of these 14 genes were carried out in six developmental stages. The results showed that CLV was upregulated in the initial stage of callogenesis. Transcripts GLP, GST, PKL, WUS, and WRKY were expressed more in the somatic embryo stage. The expression of SERK, MAPK, AP2, SAUR, ECP, AGP, LEA, and ANT was higher in the embryogenic callus stage compared to the initial culture and somatic embryo stages. This study provides the first insights into the gene expression patterns during somatic embryogenesis in coconut.
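
    The annotation cut-offs quoted in this abstract (similarity above 40 % and E value ≤10^-5) can be sketched as a filter over BLAST tabular output. This is an illustration under stated assumptions, not the authors' script: it assumes the default 12-column tabular layout (`-outfmt 6`), uses the percent-identity column as a stand-in for "similarity", and the function name `filter_hits` is hypothetical.

```python
def filter_hits(lines, min_pident=40.0, max_evalue=1e-5):
    """Return the set of query IDs with at least one hit passing both cut-offs.

    Assumes default BLAST tabular columns:
    qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
    """
    kept = set()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment headers
        fields = line.rstrip("\n").split("\t")
        pident = float(fields[2])    # percent identity (assumed proxy for similarity)
        evalue = float(fields[10])   # expectation value
        if pident > min_pident and evalue <= max_evalue:
            kept.add(fields[0])
    return kept

if __name__ == "__main__":
    demo = [
        "tx1\tP1\t85.0\t200\t30\t0\t1\t600\t1\t200\t1e-50\t300",
        "tx2\tP2\t35.0\t150\t97\t1\t1\t450\t5\t154\t1e-20\t90",
        "tx3\tP3\t60.0\t50\t20\t0\t1\t150\t1\t50\t0.01\t30",
    ]
    print(sorted(filter_hits(demo)))
```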

  13. NATpipe: an integrative pipeline for systematical discovery of natural antisense transcripts (NATs) and phase-distributed nat-siRNAs from de novo assembled transcriptomes

    Science.gov (United States)

    Yu, Dongliang; Meng, Yijun; Zuo, Ziwei; Xue, Jie; Wang, Huizhong

    2016-01-01

    Nat-siRNAs (small interfering RNAs originating from natural antisense transcripts) are a class of functional small RNA (sRNA) species discovered in both plants and animals. These siRNAs are highly enriched within the annealed regions of the NAT (natural antisense transcript) pairs. To date, considerable research effort has been devoted to the systematic identification of NATs in various organisms. However, researchers still need a freely available and easy-to-use program for NAT prediction. Here, we propose an integrative pipeline named NATpipe for systematic discovery of NATs from de novo assembled transcriptomes. By utilizing sRNA sequencing data, the pipeline also allows users to search for phase-distributed nat-siRNAs within the perfectly annealed regions of the NAT pairs. Additionally, more reliable nat-siRNA loci can be identified based on degradome sequencing data. A case study on the non-model plant Dendrobium officinale was performed to illustrate the utility of NATpipe. Finally, we hope that NATpipe will be a useful tool for NAT prediction, nat-siRNA discovery, and related functional studies. NATpipe is available at www.bioinfolab.cn/NATpipe/NATpipe.zip. PMID:26858106
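
    The idea of a "perfectly annealed region" between two transcripts can be illustrated with a small sketch. It does not reproduce NATpipe's algorithm, which the abstract does not describe: the approach shown (exact k-mer sharing between one transcript and the reverse complement of the other), the function names and the default k of 25 nt are all assumptions made for illustration.

```python
# Complement table; U is treated like T so RNA input also works.
_COMPLEMENT = str.maketrans("ACGTU", "TGCAA")

def reverse_complement(seq):
    """Return the reverse complement of a DNA/RNA sequence as DNA."""
    return seq.upper().translate(_COMPLEMENT)[::-1]

def share_annealed_region(a, b, k=25):
    """True if some k-nt stretch of `a` can pair perfectly with `b`.

    Hypothetical helper: a perfect antisense overlap of length >= k exists
    exactly when `a` and the reverse complement of `b` share a k-mer.
    """
    a = a.upper().replace("U", "T")
    rc_b = reverse_complement(b)
    kmers = {a[i:i + k] for i in range(len(a) - k + 1)}
    return any(rc_b[i:i + k] in kmers for i in range(len(rc_b) - k + 1))

if __name__ == "__main__":
    core = "ATGGCCTTAA"
    sense = "CC" + core + "GG"
    antisense = reverse_complement("TT" + core + "AC")
    print(share_annealed_region(sense, antisense, k=10))
```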

  14. Mango (Mangifera indica L.) cv. Kent fruit mesocarp de novo transcriptome assembly identifies gene families important for ripening

    Science.gov (United States)

    Fruit ripening is a physiological and biochemical process genetically programmed to regulate fruit quality parameters like firmness, flavor, odor and color, as well as production of ethylene in climacteric fruit. In this study, a transcriptomic analysis of mango (Mangifera indica L.) mesocarp cv. "K...

  15. De novo assembly and characterization of the transcriptome of seagrass Zostera marina using Illumina paired-end sequencing.

    Directory of Open Access Journals (Sweden)

    Fanna Kong

    Full Text Available BACKGROUND: The seagrass Zostera marina is a monocotyledonous angiosperm belonging to a polyphyletic group of plants that can live submerged in marine habitats. Zostera marina L. is one of the most common seagrasses and is considered a cornerstone of marine plant molecular ecology research and comparative studies. However, the mechanisms underlying its adaptation to the marine environment still remain poorly understood due to limited transcriptomic and genomic data. PRINCIPAL FINDINGS: Here we explored the transcriptome of Z. marina leaves under different environmental conditions using Illumina paired-end sequencing. Approximately 55 million sequencing reads were obtained, representing 58,457 transcripts that correspond to 24,216 unigenes. A total of 14,389 (59.41%) unigenes were annotated by blast searches against the NCBI non-redundant protein database. 45.18% and 46.91% of the unigenes had significant similarity with proteins in the Swiss-Prot database and Pfam database, respectively. Among these, 13,897 unigenes were assigned to 57 Gene Ontology (GO) terms and 4,745 unigenes were identified and mapped to 233 pathways via functional annotation against the Kyoto Encyclopedia of Genes and Genomes pathway database (KEGG). We compared the orthologous gene family of the Z. marina transcriptome to Oryza sativa and Pyropia yezoensis and 11,667 orthologous gene families are specific to Z. marina. Furthermore, we identified the photoreceptors sensing red/far-red light and blue light. Also, we identified a large number of genes that are involved in ion transporters and channels including Na+ efflux, K+ uptake, Cl- channels, and H+ pumping. CONCLUSIONS: Our study contains an extensive sequencing and gene-annotation analysis of Z. marina. This information represents a genetic resource for the discovery of genes related to light sensing and salt tolerance in this species. Our transcriptome can be further utilized in future studies on molecular adaptation to

  16. De novo assembly and functional annotation of Myrciaria dubia fruit transcriptome reveals multiple metabolic pathways for L-ascorbic acid biosynthesis.

    Science.gov (United States)

    Castro, Juan C; Maddox, J Dylan; Cobos, Marianela; Requena, David; Zimic, Mirko; Bombarely, Aureliano; Imán, Sixto A; Cerdeira, Luis A; Medina, Andersson E

    2015-11-24

    Myrciaria dubia is an Amazonian fruit shrub that produces numerous bioactive phytochemicals, but is best known for its high L-ascorbic acid (AsA) content in fruits. Pronounced variation in AsA content has been observed both within and among individuals, but the genetic factors responsible for this variation are largely unknown. The goals of this research, therefore, were to assemble, characterize, and annotate the fruit transcriptome of M. dubia in order to reconstruct metabolic pathways and determine if multiple pathways contribute to AsA biosynthesis. In total 24,551,882 high-quality sequence reads were de novo assembled into 70,048 unigenes (mean length = 1150 bp, N50 = 1775 bp). Assembled sequences were annotated using BLASTX against public databases such as TAIR, GR-protein, FB, MGI, RGD, ZFIN, SGN, WB, TIGR_CMR, and JCVI-CMR with 75.2 % of unigenes having annotations. Of the three core GO annotation categories, biological processes comprised 53.6 % of the total assigned annotations, whereas cellular components and molecular functions comprised 23.3 and 23.1 %, respectively. Based on the KEGG pathway assignment of the functionally annotated transcripts, five metabolic pathways for AsA biosynthesis were identified: animal-like pathway, myo-inositol pathway, L-gulose pathway, D-mannose/L-galactose pathway, and uronic acid pathway. All transcripts coding enzymes involved in the ascorbate-glutathione cycle were also identified. Finally, we used the assembly to identify 6314 genic microsatellites and 23,481 high quality SNPs. This study describes the first next-generation sequencing effort and transcriptome annotation of a non-model Amazonian plant that is relevant for AsA production and other bioactive phytochemicals. Genes encoding key enzymes were successfully identified and metabolic pathways involved in biosynthesis of AsA, anthocyanins, and other metabolic pathways have been reconstructed. The identification of these genes and pathways is in agreement with
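
    Assembly summaries such as the one in this record report mean length and N50. As a reminder of what N50 measures, here is a minimal sketch using the standard definition (the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the total assembled bases). The function name `n50` and the toy lengths are illustrative; they are not data from the study.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("n50() requires at least one sequence length")

if __name__ == "__main__":
    toy_lengths = [2, 3, 4, 5, 6, 7, 8, 9, 10]  # toy values, not real unigenes
    print(n50(toy_lengths))
```

    N50 is weighted toward long sequences, which is why it typically exceeds the mean length, as it does in the figures quoted above.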

  17. Characterization of the 'Xiangshui' lemon transcriptome by de novo assembly to discover genes associated with self-incompatibility.

    Science.gov (United States)

    Zhang, Shuwei; Ding, Feng; He, Xinhua; Luo, Cong; Huang, Guixiang; Hu, Ying

    2015-02-01

    Seedlessness is a desirable character in lemons and other citrus species. Seedless fruit can be induced in many ways, including through self-incompatibility (SI). SI is widely used as an intraspecific reproductive barrier that prevents self-fertilization in flowering plants. Although there have been many studies on SI, its mechanism remains unclear. The 'Xiangshui' lemon is an important seedless cultivar whose seedlessness has been caused by SI. It is essential to identify genes involved in SI in 'Xiangshui' lemon to clarify its molecular mechanism. In this study, candidate genes associated with SI were identified using high-throughput Illumina RNA sequencing (RNA-seq). A total of 61,224 unigenes were obtained (average, 948 bp; N50 of 1,457 bp), among which 47,260 unigenes were annotated by comparison to six public databases (Nr, Nt, Swiss-Prot, KEGG, COG, and GO). Differentially expressed genes were identified by comparing the transcriptomes of no-, self-, and cross-pollinated stigmas with styles of the 'Xiangshui' lemon. Several differentially expressed genes that might be associated with SI were identified, such as those involved in pollen tube growth, programmed cell death, signal transduction, and transcription. NADPH oxidase genes associated with apoptosis were highly upregulated in the self-pollinated transcriptome. The expression pattern of 12 genes was analyzed by quantitative real-time polymerase chain reaction. A putative S-RNase gene was identified that had not been previously associated with self-pollen rejection in lemon or citrus. This study provided a transcriptome dataset for further studies of SI and seedless lemon breeding.

  18. De Novo Transcriptome Assembly and Characterization of the Synthesis Genes of Bioactive Constituents in Abelmoschus esculentus (L.) Moench

    Science.gov (United States)

    Zhang, Chenghao; Dong, Wenqi; Gen, Wei; Xu, Baoyu; Shen, Chenjia

    2018-01-01

    Abelmoschus esculentus (okra or lady’s fingers) is a vegetable with high nutritional value, as well as having certain medicinal effects. It is widely used as food, in the food industry, and in herbal medicinal products, but also as an ornamental, in animal feed, and in other commercial sectors. Okra is rich in bioactive compounds, such as flavonoids, polysaccharides, polyphenols, caffeine, and pectin. In the present study, the concentrations of total flavonoids and polysaccharides in five organs of okra were determined and compared. Transcriptome sequencing was used to explore the biosynthesis pathways associated with the active constituents in okra. Transcriptome sequencing of five organs (roots, stem, leaves, flowers, and fruits) of okra enabled us to obtain 293,971 unigenes, of which 232,490 were annotated. Unigenes related to the enzymes involved in the flavonoid biosynthetic pathway or in fructose and mannose metabolism were identified, based on Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis. All of the transcriptional datasets were uploaded to Sequence Read Archive (SRA). In summary, our comprehensive analysis provides important information at the molecular level about the flavonoid and polysaccharide biosynthesis pathways in okra. PMID:29495525

  19. Sequencing, de novo assembly and characterization of the spotted scat Scatophagus argus (Linnaeus 1766) transcriptome for discovery of reproduction related genes and SSRs

    Science.gov (United States)

    Yang, Wei; Chen, Huapu; Cui, Xuefan; Zhang, Kewei; Jiang, Dongneng; Deng, Siping; Zhu, Chunhua; Li, Guangli

    2017-09-01

    Spotted scat (Scatophagus argus) is an economically important farmed fish, particularly in East and Southeast Asia. Because there has been little research on reproductive development and regulation in this species, the lack of a mature artificial reproduction technology remains a barrier for the sustainable development of the aquaculture industry. More genetic and genomic background knowledge is urgently needed for an in-depth understanding of the molecular mechanism of reproductive process and identification of functional genes related to sexual differentiation, gonad maturation and gametogenesis. For these reasons, we performed transcriptomic analysis on spotted scat using a multiple tissue sample mixing strategy. The Illumina RNA sequencing generated 118 510 486 raw reads. After trimming, de novo assembly was performed and yielded 99 888 unigenes with an average length of 905.75 bp. A total of 45 015 unigenes were successfully annotated to the Nr, Swiss-Prot, KOG and KEGG databases. Additionally, 23 783 and 27 183 annotated unigenes were assigned to 56 Gene Ontology (GO) functional groups and 228 KEGG pathways, respectively. Subsequently, 2 474 transcripts associated with reproduction were selected using GO term and KEGG pathway assignments, and a number of reproduction-related genes involved in sex differentiation, gonad development and gametogenesis were identified. Furthermore, 22 279 simple sequence repeat (SSR) loci were discovered and characterized. The comprehensive transcript dataset described here greatly increases the genetic information available for spotted scat and contributes valuable sequence resources for functional gene mining and analysis. Candidate transcripts involved in reproduction would make good starting points for future studies on reproductive mechanisms, and the putative sex differentiation-related genes will be helpful for sex-determining gene identification and sex-specific marker isolation. Lastly, the SSRs can serve as marker

  20. De novo assembly of the pepper transcriptome (Capsicum annuum): a benchmark for in silico discovery of SNPs, SSRs and candidate genes

    Science.gov (United States)

    2012-01-01

    Background: Molecular breeding of pepper (Capsicum spp.) can be accelerated by developing DNA markers associated with transcriptomes in breeding germplasm. Before the advent of next generation sequencing (NGS) technologies, the majority of sequencing data were generated by the Sanger sequencing method. By leveraging Sanger EST data, we have generated a wealth of genetic information for pepper including thousands of SNPs and Single Position Polymorphic (SPP) markers. To complement and enhance these resources, we applied NGS to three pepper genotypes: Maor, Early Jalapeño and Criollo de Morelos-334 (CM334) to identify SNPs and SSRs in the assembly of these three genotypes. Results: Two pepper transcriptome assemblies were developed with different purposes. The first reference sequence, assembled by CAP3 software, comprises 31,196 contigs from >125,000 Sanger-EST sequences that were mainly derived from a Korean F1-hybrid line, Bukang. Overlapping probes were designed for 30,815 unigenes to construct a pepper Affymetrix GeneChip® microarray for whole genome analyses. In addition, custom Python scripts were used to identify 4,236 SNPs in contigs of the assembly. A total of 2,489 simple sequence repeats (SSRs) were identified from the assembly, and primers were designed for the SSRs. Annotation of contigs using Blast2GO software resulted in information for 60% of the unigenes in the assembly. The second transcriptome assembly was constructed from more than 200 million Illumina Genome Analyzer II reads (80–120 nt) using a combination of Velvet, CLC workbench and CAP3 software packages. BWA, SAMtools and in-house Perl scripts were used to identify SNPs among three pepper genotypes. The SNPs were filtered to be at least 50 bp from any intron-exon junctions as well as flanking SNPs. More than 22,000 high-quality putative SNPs were identified. Using the MISA software, 10,398 SSR markers were also identified within the Illumina transcriptome assembly and primers were

  1. De novo Transcriptome Assembly of Floral Buds of Pineapple and Identification of Differentially Expressed Genes in Response to Ethephon Induction

    Science.gov (United States)

    Liu, Chuan-He; Fan, Chao

    2016-01-01

    A remarkable characteristic of pineapple is its ability to undergo floral induction in response to external ethylene stimulation. However, little information is available regarding the molecular mechanism underlying this process. In this study, the differentially expressed genes (DEGs) in plants exposed to 1.80 mL·L−1 (T1) or 2.40 mL·L−1 ethephon (T2) compared with Ct plants (control, cleaning water) were identified using RNA-seq and gene expression profiling. Illumina sequencing generated 65,825,224 high-quality reads that were assembled into 129,594 unigenes with an average sequence length of 1173 bp. Of these unigenes, 24,775 were assigned to specific KEGG pathways, of which metabolic pathways and biosynthesis of secondary metabolites were the most highly represented. Gene Ontology (GO) analysis of the annotated unigenes revealed that the majority were involved in metabolic and cellular processes, cell and cell part, catalytic activity and binding. Gene expression profiling analysis revealed 3788, 3062, and 758 DEGs in the comparisons of T1 with Ct, T2 with Ct, and T2 with T1, respectively. GO analysis indicated that these DEGs were predominantly annotated to metabolic and cellular processes, cell and cell part, catalytic activity, and binding. KEGG pathway analysis revealed the enrichment of several important pathways among the DEGs, including metabolic pathways, biosynthesis of secondary metabolites and plant hormone signal transduction. Thirteen DEGs were identified as candidate genes associated with the process of floral induction by ethephon, including three ERF-like genes, one ETR-like gene, one LTI-like gene, one FT-like gene, one VRN1-like gene, three FRI-like genes, one AP1-like gene, one CAL-like gene, and one AG-like gene. qPCR analysis indicated that the changes in the expression of these 13 candidate genes were consistent with the alterations in the corresponding RPKM values, confirming the accuracy and credibility of the RNA-seq and gene
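
    This abstract compares qPCR results with RPKM values. For readers unfamiliar with the unit, the standard definition (reads per kilobase of transcript per million mapped reads) can be written as a one-line calculation. The function name `rpkm` and the numbers in the example are illustrative assumptions, not values from the study.

```python
def rpkm(read_count, transcript_length_bp, total_mapped_reads):
    """Reads Per Kilobase of transcript per Million mapped reads.

    RPKM = read_count * 1e9 / (transcript_length_bp * total_mapped_reads)
    """
    if transcript_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("length and total mapped reads must be positive")
    return read_count * 1e9 / (transcript_length_bp * total_mapped_reads)

if __name__ == "__main__":
    # Toy numbers: 500 reads on a 2,000 bp unigene in a 10-million-read library.
    print(rpkm(500, 2000, 10_000_000))
```

    Normalizing by both length and library size is what makes values comparable between unigenes and between the control and treated libraries.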

  2. De novo Transcriptome Assembly and Comparison of C3, C3-C4, and C4 Species of Tribe Salsoleae (Chenopodiaceae)

    Directory of Open Access Journals (Sweden)

    Maximilian Lauterbach

    2017-11-01

    Full Text Available C4 photosynthesis is a carbon-concentrating mechanism that evolved independently more than 60 times in a wide range of angiosperm lineages. Among other alterations, the evolution of C4 from ancestral C3 photosynthesis requires changes in the expression of a vast number of genes. Differential gene expression analyses between closely related C3 and C4 species have significantly increased our understanding of C4 functioning and evolution. In Chenopodiaceae, a family that is rich in C4 origins and photosynthetic types, the anatomy, physiology and phylogeny of C4, C2, and C3 species of Salsoleae have been studied in great detail, which facilitated the choice of six samples of five representative species with different photosynthetic types for transcriptome comparisons. mRNA from assimilating organs of each species was sequenced in triplicate, and sequence reads were de novo assembled. These novel genetic resources were then analyzed to provide a better understanding of differential gene expression between C3, C2 and C4 species. All three analyzed C4 species belong to the NADP-ME type as most genes encoding core enzymes of this C4 cycle are highly expressed. The abundance of photorespiratory transcripts is decreased compared to the C3 and C2 species. Like in other C4 lineages of Caryophyllales, our results suggest that PEPC1 is the C4-specific isoform in Salsoleae. Two recently identified transporters from the PHT4 protein family may not only be related to the C4 syndrome, but also active in C2 photosynthesis in Salsoleae. In the two populations of the C2 species S. divaricata, transcript abundance of several C4 genes is slightly increased; however, a C4 cycle is not detectable in the carbon isotope values. Most of the core enzymes of photorespiration are highly increased in the C2 species compared to both C3 and C4 species, confirming a successful establishment of the C2 photosynthetic pathway. Furthermore, a function of PEP-CK in C2 photosynthesis

  3. Defining the maize transcriptome de novo using deep RNA-Seq

    Energy Technology Data Exchange (ETDEWEB)

    Martin, Jeffrey; Gross, Stephen; Choi, Cindy; Zhang, Tao; Lindquist, Erika; Wei, Chia-Lin; Wang, Zhong

    2011-06-01

    De novo assembly of the transcriptome is crucial for functional genomics studies in bioenergy research, since many of the organisms lack high quality reference genomes. In a previous study we successfully de novo assembled simple eukaryote transcriptomes exclusively from short Illumina RNA-Seq reads [1]. However, extensive alternative splicing, present in most of the higher eukaryotes, poses a significant challenge for current short read assembly processes. Furthermore, the size of next-generation datasets, often large for plant genomes, presents an informatics challenge. To tackle these challenges we present a combined experimental and informatics strategy for de novo assembly in higher eukaryotes. Using maize as a test case, preliminary results suggest our approach can resolve transcript variants and improve gene annotations.

  4. Defining the maize transcriptome de novo using deep RNA-Seq

    Energy Technology Data Exchange (ETDEWEB)

    Martin, Jeffrey; Gross, Stephen; Choi, Cindy; Zhang, Tao; Lindquist, Erika; Wei, Chia-Lin; Wang, Zhong

    2011-06-02

    De novo assembly of the transcriptome is crucial for functional genomics studies in bioenergy research, since many of the organisms lack high quality reference genomes. In a previous study we successfully de novo assembled simple eukaryote transcriptomes exclusively from short Illumina RNA-Seq reads [1]. However, extensive alternative splicing, present in most of the higher eukaryotes, poses a significant challenge for current short read assembly processes. Furthermore, the size of next-generation datasets, often large for plant genomes, presents an informatics challenge. To tackle these challenges we present a combined experimental and informatics strategy for de novo assembly in higher eukaryotes. Using maize as a test case, preliminary results suggest our approach can resolve transcript variants and improve gene annotations.

  5. De novo assembly and transcriptome analysis of five major tissues of Jatropha curcas L. using GS FLX titanium platform of 454 pyrosequencing.

    Science.gov (United States)

    Natarajan, Purushothaman; Parani, Madasamy

    2011-04-15

    Jatropha curcas L. is an important non-edible oilseed crop with promising future in biodiesel production. However, factors like oil yield, oil composition, toxic compounds in oil cake, pests and diseases limit its commercial potential. Well established genetic engineering methods using cloned genes could be used to address these limitations. Earlier, 10,983 unigenes from Sanger sequencing of ESTs, and 3,484 unique assembled transcripts from 454 pyrosequencing of uncloned cDNAs were reported. In order to expedite the process of gene discovery, we have undertaken 454 pyrosequencing of normalized cDNAs prepared from roots, mature leaves, flowers, developing seeds, and embryos of J. curcas. From 383,918 raw reads, we obtained 381,957 quality-filtered and trimmed reads that are suitable for the assembly of transcript sequences. De novo contig assembly of these reads generated 17,457 assembled transcripts (contigs) and 54,002 singletons. Average length of the assembled transcripts was 916 bp. About 30% of the transcripts were longer than 1000 bases, and the size of the longest transcript was 7,173 bases. BLASTX analysis revealed that 2,589 of these transcripts are full-length. The assembled transcripts were validated by RT-PCR analysis of 28 transcripts. The results showed that the transcripts were correctly assembled and represent actively expressed genes. KEGG pathway mapping showed that 2,320 transcripts are related to major biochemical pathways including the oil biosynthesis pathway. Overall, the current study reports 14,327 new assembled transcripts which included 2589 full-length transcripts and 27 transcripts that are directly involved in oil biosynthesis. The large number of transcripts reported in the current study together with existing ESTs and transcript sequences will serve as an invaluable genetic resource for crop improvement in jatropha. Sequence information of those genes that are involved in oil biosynthesis could be used for metabolic engineering of

  6. De novo assembly and comparative transcriptome analysis of the foot from Chinese green mussel (Perna viridis) in response to cadmium stimulation.

    Directory of Open Access Journals (Sweden)

    Xinhui Zhang

    Full Text Available The Chinese green mussel, Perna viridis, is a marine bivalve with important economic value as well as a biomonitoring role for aquatic pollution. Byssus, secreted by the foot gland, has been shown to bind heavy metals effectively. In this study, using RNA sequencing, we performed a comparative transcriptomic analysis of mussel feet with or without cadmium (Cd) induction. Our current work aims to provide insights into the molecular mechanisms by which byssus binds heavy metal ions. The transcriptome sequencing generated a total of 26.13 Gb of raw data. After assembly of the clean data, we obtained a primary set of 105,127 unigenes, of which 32,268 were annotated. Based on the expression profiles, we identified 9,048 differentially expressed genes (DEGs) between the Cd treatments (50 or 100 μg/L at 48 h) and the control, suggesting an extensive transcriptome response of the mussels during Cd stimulation. Moreover, we observed that the expression levels of 54 byssus protein coding genes increased significantly after the 48-h Cd stimulation. In addition, 16 critical byssus protein coding genes were picked for profiling by quantitative real-time PCR (qRT-PCR). Finally, we reached the preliminary conclusion that a high content of tyrosine (Tyr), cysteine (Cys), and histidine (His) residues, or the special motif, plays an important role in the accumulation of heavy metals in byssus. We also proposed a model for the confirmed byssal Cd accumulation, in which biosynthesis of byssus proteins may simultaneously play critical roles, since their transcription levels were significantly elevated.

  7. De novo assembly and transcriptome analysis of five major tissues of Jatropha curcas L. using GS FLX titanium platform of 454 pyrosequencing

    Directory of Open Access Journals (Sweden)

    Parani Madasamy

    2011-04-01

    Full Text Available Background: Jatropha curcas L. is an important non-edible oilseed crop with promising future in biodiesel production. However, factors like oil yield, oil composition, toxic compounds in oil cake, pests and diseases limit its commercial potential. Well established genetic engineering methods using cloned genes could be used to address these limitations. Earlier, 10,983 unigenes from Sanger sequencing of ESTs, and 3,484 unique assembled transcripts from 454 pyrosequencing of uncloned cDNAs were reported. In order to expedite the process of gene discovery, we have undertaken 454 pyrosequencing of normalized cDNAs prepared from roots, mature leaves, flowers, developing seeds, and embryos of J. curcas. Results: From 383,918 raw reads, we obtained 381,957 quality-filtered and trimmed reads that are suitable for the assembly of transcript sequences. De novo contig assembly of these reads generated 17,457 assembled transcripts (contigs) and 54,002 singletons. Average length of the assembled transcripts was 916 bp. About 30% of the transcripts were longer than 1000 bases, and the size of the longest transcript was 7,173 bases. BLASTX analysis revealed that 2,589 of these transcripts are full-length. The assembled transcripts were validated by RT-PCR analysis of 28 transcripts. The results showed that the transcripts were correctly assembled and represent actively expressed genes. KEGG pathway mapping showed that 2,320 transcripts are related to major biochemical pathways including the oil biosynthesis pathway. Overall, the current study reports 14,327 new assembled transcripts which included 2,589 full-length transcripts and 27 transcripts that are directly involved in oil biosynthesis. Conclusion: The large number of transcripts reported in the current study together with existing ESTs and transcript sequences will serve as an invaluable genetic resource for crop improvement in jatropha. Sequence information of those genes that are involved in oil

  8. A glance at quality score: implication for de novo transcriptome reconstruction of Illumina reads

    Directory of Open Access Journals (Sweden)

    Stanley Kimbung Mbandi

    2014-02-01

    Full Text Available Downstream analyses of short reads from next-generation sequencing platforms are often preceded by a pre-processing step that removes uncalled and wrongly called bases. Standard approaches rely on the associated base quality scores to retain a read, or a portion of it, when the score is above a predefined threshold. Without a reference, it is difficult to differentiate sequencing error from biological variation using quality scores. The effects of quality score based trimming have not been systematically studied in de novo transcriptome assembly. Using RNA-Seq data produced on the Illumina platform, we teased out the effects of quality score based filtering or trimming on de novo transcriptome reconstruction. We showed that assemblies produced from reads subjected to different quality score thresholds contain truncated and missing transfrags when compared to those from untrimmed reads. Our data support the view that de novo assembly of untrimmed data is challenging for de Bruijn graph assemblers. However, our results indicate that comparing the assemblies from untrimmed and trimmed read subsets can suggest appropriate filtering parameters and enable selection of the optimum de novo transcriptome assembly in non-model organisms.
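
    The threshold-based trimming described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names, the Phred+33 encoding, and the simple 3'-end rule are assumptions made for illustration only.

```python
def phred_scores(quality_string, offset=33):
    """Decode a FASTQ quality string into Phred scores (Phred+33 is assumed)."""
    return [ord(ch) - offset for ch in quality_string]


def trim_3prime(sequence, quality_string, threshold=20, offset=33):
    """Drop bases from the 3' end until one meets the quality threshold.

    Hypothetical helper: real trimmers often use sliding windows instead.
    """
    scores = phred_scores(quality_string, offset)
    end = len(scores)
    while end > 0 and scores[end - 1] < threshold:
        end -= 1
    return sequence[:end], quality_string[:end]


read = "ACGTACGTAC"
qual = "IIIIIIII#!"  # 'I' decodes to Q40, '#' to Q2, '!' to Q0

# Compare the same read under several thresholds, as the abstract suggests
# doing at the level of whole assemblies.
for threshold in (0, 20, 41):
    trimmed, _ = trim_3prime(read, qual, threshold)
    print(threshold, len(trimmed), trimmed)
```

    A threshold of 0 leaves the read untrimmed, while a threshold above the best score in the read removes it entirely, which shows how aggressive thresholds can truncate or eliminate the input to an assembler.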

  9. De novo assembly of mud loach (Misgurnus anguillicaudatus) skin transcriptome to identify putative genes involved in immunity and epidermal mucus secretion.

    Directory of Open Access Journals (Sweden)

    Yong Long

    Full Text Available Fish skin serves as the first line of defense against a wide variety of chemical, physical and biological stressors. Secretion of mucus is among the most prominent characteristics of fish skin and numerous innate immune factors have been identified in the epidermal mucus. However, molecular mechanisms underlying the mucus secretion and immune activities of fish skin remain largely unclear due to the lack of genomic and transcriptomic data for most economically important fish species. In this study, we characterized the skin transcriptome of mud loach using Illumina paired-end sequencing. A total of 40364 unigenes were assembled from 86.6 million (3.07 gigabases) filtered reads. The mean length, N50 size and maximum length of assembled transcripts were 387, 611 and 8670 bp, respectively. A total of 17336 (43.76%) unigenes were annotated by blast searches against the NCBI non-redundant protein database. Gene ontology mapping assigned a total of 108513 GO terms to 15369 (38.08%) unigenes. KEGG orthology mapping annotated 9337 (23.23%) unigenes. Among the identified KO categories, immune system is the largest category that contains various components of multiple immune pathways such as chemokine signaling, leukocyte transendothelial migration and T cell receptor signaling, suggesting the complexity of immune mechanisms in fish skin. As for mucin biosynthesis, 37 unigenes were mapped to 7 enzymes of the mucin type O-glycan biosynthesis pathway and 8 members of the polypeptide N-acetylgalactosaminyltransferase family were identified. Additionally, 38 unigenes were mapped to 23 factors of the SNARE interactions in vesicular transport pathway, indicating that the activity of this pathway is required for the processes of epidermal mucus storage and release. Moreover, 1754 simple sequence repeats (SSRs) were detected in 1564 unigenes and dinucleotide repeats represented the most abundant type. These findings have laid the foundation for further understanding

  10. De novo assembly of mud loach (Misgurnus anguillicaudatus) skin transcriptome to identify putative genes involved in immunity and epidermal mucus secretion.

    Science.gov (United States)

    Long, Yong; Li, Qing; Zhou, Bolan; Song, Guili; Li, Tao; Cui, Zongbin

    2013-01-01

    Fish skin serves as the first line of defense against a wide variety of chemical, physical and biological stressors. Secretion of mucus is among the most prominent characteristics of fish skin and numerous innate immune factors have been identified in the epidermal mucus. However, molecular mechanisms underlying the mucus secretion and immune activities of fish skin remain largely unclear due to the lack of genomic and transcriptomic data for most economically important fish species. In this study, we characterized the skin transcriptome of mud loach using Illumina paired-end sequencing. A total of 40364 unigenes were assembled from 86.6 million (3.07 gigabases) filtered reads. The mean length, N50 size and maximum length of assembled transcripts were 387, 611 and 8670 bp, respectively. A total of 17336 (43.76%) unigenes were annotated by blast searches against the NCBI non-redundant protein database. Gene ontology mapping assigned a total of 108513 GO terms to 15369 (38.08%) unigenes. KEGG orthology mapping annotated 9337 (23.23%) unigenes. Among the identified KO categories, immune system is the largest category that contains various components of multiple immune pathways such as chemokine signaling, leukocyte transendothelial migration and T cell receptor signaling, suggesting the complexity of immune mechanisms in fish skin. As for mucin biosynthesis, 37 unigenes were mapped to 7 enzymes of the mucin type O-glycan biosynthesis pathway and 8 members of the polypeptide N-acetylgalactosaminyltransferase family were identified. Additionally, 38 unigenes were mapped to 23 factors of the SNARE interactions in vesicular transport pathway, indicating that the activity of this pathway is required for the processes of epidermal mucus storage and release. Moreover, 1754 simple sequence repeats (SSRs) were detected in 1564 unigenes and dinucleotide repeats represented the most abundant type. These findings have laid the foundation for further understanding the secretary

  11. De novo transcriptome assembly of a Chinese locoweed (Oxytropis ochrocephala) species provides insights into genes associated with drought, salinity and cold tolerance

    Directory of Open Access Journals (Sweden)

    Wei He

    2015-12-01

    Full Text Available Background: Locoweeds (toxic Oxytropis and Astragalus species containing the toxic agent swainsonine) pose serious threats to animal husbandry on grasslands in both China and the US. Some locoweeds have evolved adaptations to resist various stress conditions such as drought, salt and cold. As a result they replace other plants in their communities and become an ecological problem. Currently very limited genetic information on locoweeds is available, and this hinders our understanding of the molecular basis of their environmental plasticity and of the interaction between locoweeds and their symbiotic swainsonine-producing endophytes. Next-generation sequencing provides a means of obtaining transcriptomic sequences in a timely manner, which is particularly useful for non-model plants. In this study, we performed transcriptome sequencing of Oxytropis ochrocephala plants followed by a de novo assembly. Our primary aim was to provide an enriched pool of genetic sequences of an Oxytropis sp. for further locoweed research. Results: Transcriptomes of four different O. ochrocephala samples, from control (CK) plants and from plants that had experienced drought (20% PEG), salt (150 mM NaCl) or cold (4 °C) stress, were sequenced using an Illumina HiSeq 2000 platform. From 232,209,506 clean reads (23,220,950,600 nucleotides, ~23 G), 182,430 transcripts and 88,942 unigenes were retrieved, with an N50 value of 1,237. Differential expression analysis revealed putative genes encoding heat shock proteins (HSPs) and late embryogenesis abundant (LEA) proteins, enzymes in secondary metabolite and plant hormone biosynthesis, and transcription factors that are involved in stress tolerance in O. ochrocephala. To validate our sequencing results, we further analyzed the expression profiles of nine genes by quantitative real-time PCR. Finally, we discuss the possible mechanism of O. ochrocephala’s adaptations to stress environment. Conclusion: Our

  12. De novo Assembly, Characterization of Immature Seed Transcriptome and Development of Genic-SSR Markers in Black Gram [Vigna mungo (L.) Hepper].

    Directory of Open Access Journals (Sweden)

    J Souframanien

    Full Text Available Black gram [V. mungo (L.) Hepper] is an important legume crop extensively grown in south and south-east Asia, where it is a major source of dietary protein for its predominantly vegetarian population. However, lack of genomic information and markers has become a limitation for genetic improvement of this crop. Here, we report the transcriptome sequencing of the immature seeds of black gram cv. TU94-2, by Illumina paired end sequencing technology to generate transcriptome sequences for gene discovery and genic-SSR marker development. A total of 17.2 million paired-end reads were generated and 48,291 transcript contigs (TCS) were assembled with an average length of 443 bp. Based on sequence similarity search, 33,766 TCS showed significant similarity to known proteins. Among these, only 29,564 TCS were annotated with gene ontology (GO) functional categories. A total number of 138 unique KEGG (Kyoto Encyclopedia of Genes and Genomes) pathways were identified, of which majority of TCS are grouped into purine metabolism (678) followed by pyrimidine metabolism (263). A total of 48,291 TCS were searched for SSRs and 1,840 SSRs were identified in 1,572 TCS with an average frequency of one SSR per 11.9 kb. The tri-nucleotide repeats were most abundant (35%) followed by di-nucleotide repeats (32%). PCR primer pairs were successfully designed for 933 SSR loci. Sequences analyses indicate that about 64.4% and 35.6% of the SSR motifs were present in the coding sequences (CDS) and untranslated regions (UTRs) respectively. Tri-nucleotide repeats (57.3%) were preferentially present in the CDS. The rate of successful amplification and polymorphism were investigated using selected primers among 18 black gram accessions. Genic-SSR markers developed from the Illumina paired end sequencing of black gram immature seed transcriptome will provide a valuable resource for genetic diversity, evolution, linkage mapping, comparative genomics and marker-assisted selection in black gram.
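
    As a rough illustration of how SSRs such as the di- and tri-nucleotide repeats reported above can be located in assembled contigs, the following Python sketch scans a sequence with regular expressions. The abstract does not name the SSR search tool or its parameters, so the function name and the minimum repeat counts (six for di-nucleotides, five for tri-nucleotides) are assumptions chosen only for the example.

```python
import re

# Hypothetical minimum repeat counts per motif length (not taken from the study).
MIN_REPEATS = {2: 6, 3: 5}


def find_ssrs(sequence, min_repeats=MIN_REPEATS):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs."""
    hits = []
    for motif_len, min_rep in min_repeats.items():
        # Group 2 captures the motif; \2{n,} requires at least n further copies.
        pattern = re.compile(r"(([ACGT]{%d})\2{%d,})" % (motif_len, min_rep - 1))
        for match in pattern.finditer(sequence):
            motif = match.group(2)
            if len(set(motif)) == 1:
                continue  # skip homopolymer runs such as AAAAAAAAAAAA
            hits.append((match.start(), motif, len(match.group(1)) // motif_len))
    return sorted(hits)


contig = "TT" + "AG" * 6 + "CC" + "ATG" * 5 + "TT"
for start, motif, count in find_ssrs(contig):
    print(start, motif, count)
```

    An SSR frequency like the one quoted in the abstract would then follow from dividing the total number of bases searched by the number of SSRs found.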

  13. De novo Assembly, Characterization of Immature Seed Transcriptome and Development of Genic-SSR Markers in Black Gram [Vigna mungo (L.) Hepper]

    Science.gov (United States)

    Souframanien, J.; Reddy, Kandali Sreenivasulu

    2015-01-01

    Black gram [V. mungo (L.) Hepper] is an important legume crop extensively grown in south and south-east Asia, where it is a major source of dietary protein for its predominantly vegetarian population. However, lack of genomic information and markers has become a limitation for genetic improvement of this crop. Here, we report the transcriptome sequencing of the immature seeds of black gram cv. TU94-2, by Illumina paired end sequencing technology to generate transcriptome sequences for gene discovery and genic-SSR marker development. A total of 17.2 million paired-end reads were generated and 48,291 transcript contigs (TCS) were assembled with an average length of 443 bp. Based on sequence similarity search, 33,766 TCS showed significant similarity to known proteins. Among these, only 29,564 TCS were annotated with gene ontology (GO) functional categories. A total number of 138 unique KEGG (Kyoto Encyclopedia of Genes and Genomes) pathways were identified, of which majority of TCS are grouped into purine metabolism (678) followed by pyrimidine metabolism (263). A total of 48,291 TCS were searched for SSRs and 1,840 SSRs were identified in 1,572 TCS with an average frequency of one SSR per 11.9 kb. The tri-nucleotide repeats were most abundant (35%) followed by di-nucleotide repeats (32%). PCR primer pairs were successfully designed for 933 SSR loci. Sequences analyses indicate that about 64.4% and 35.6% of the SSR motifs were present in the coding sequences (CDS) and untranslated regions (UTRs) respectively. Tri-nucleotide repeats (57.3%) were preferentially present in the CDS. The rate of successful amplification and polymorphism were investigated using selected primers among 18 black gram accessions. Genic-SSR markers developed from the Illumina paired end sequencing of black gram immature seed transcriptome will provide a valuable resource for genetic diversity, evolution, linkage mapping, comparative genomics and marker-assisted selection in black gram. PMID

  14. Evaluating de Bruijn graph assemblers on 454 transcriptomic data.

    Directory of Open Access Journals (Sweden)

    Xianwen Ren

    Full Text Available Next generation sequencing (NGS) technologies have greatly changed the landscape of transcriptomic studies of non-model organisms. Since there is no reference genome available, de novo assembly methods play key roles in the analysis of these data sets. Because of the huge amount of data generated by NGS technologies in each run, many assemblers, e.g., ABySS, Velvet and Trinity, have been developed based on a de Bruijn graph due to its time- and space-efficiency. However, most of these assemblers were developed initially for the Illumina/Solexa platform. The performance of these assemblers on 454 transcriptomic data is unknown. In this study, we evaluated and compared the relative performance of these de Bruijn graph based assemblers on both simulated and real 454 transcriptomic data. The results suggest that Trinity, the Illumina/Solexa-specialized transcriptomic assembler, performs the best among the multiple de Bruijn graph assemblers, comparable to or even outperforming the standard 454 assembler Newbler, which is based on the overlap-layout-consensus algorithm. Our evaluation is expected to provide helpful guidance for researchers choosing assemblers when analyzing 454 transcriptomic data.
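
    For readers unfamiliar with the de Bruijn graph idea that these assemblers share, a toy Python sketch follows. It is a teaching illustration under simplifying assumptions (error-free reads, a single strand, no coverage information) and does not reproduce the internals of ABySS, Velvet or Trinity; the function names are hypothetical.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Map each (k-1)-mer to the set of (k-1)-mers that follow it in some read."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def walk_unambiguous(graph, start):
    """Greedily extend a contig from `start` while there is exactly one successor.

    The walk stops at a branch, a dead end, or a node that was already visited.
    """
    contig, node, seen = start, start, {start}
    while len(graph.get(node, ())) == 1:
        (nxt,) = graph[node]
        if nxt in seen:
            break
        contig += nxt[-1]
        seen.add(nxt)
        node = nxt
    return contig


reads = ["ATGGCGT", "GGCGTGC"]  # two overlapping toy reads
graph = build_de_bruijn(reads, k=4)
print(walk_unambiguous(graph, "ATG"))
```

    Because reads are broken into fixed-length k-mers rather than compared pairwise, the graph size depends on the number of distinct k-mers instead of the number of reads, which is the efficiency argument made in the abstract.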

  15. Digital Gene Expression Analysis Based on De Novo Transcriptome Assembly Reveals New Genes Associated with Floral Organ Differentiation of the Orchid Plant Cymbidium ensifolium.

    Directory of Open Access Journals (Sweden)

    Fengxi Yang

    Full Text Available Cymbidium ensifolium belongs to the genus Cymbidium of the orchid family. Owing to its spectacular flower morphology, C. ensifolium has considerable ecological and cultural value. However, limited genetic data is available for this non-model plant, and the molecular mechanism underlying floral organ identity is still poorly understood. In this study, we characterize the floral transcriptome of C. ensifolium and present, for the first time, extensive sequence and transcript abundance data of individual floral organs. After sequencing, over 10 Gb clean sequence data were generated and assembled into 111,892 unigenes with an average length of 932.03 base pairs, including 1,227 clusters and 110,665 singletons. Assembled sequences were annotated with gene descriptions, gene ontology, clusters of orthologous group terms, the Kyoto Encyclopedia of Genes and Genomes, and the plant transcription factor database. From these annotations, 131 flowering-associated unigenes, 61 CONSTANS-LIKE (COL) unigenes and 90 floral homeotic genes were identified. In addition, four digital gene expression libraries were constructed for the sepal, petal, labellum and gynostemium, and 1,058 genes corresponding to individual floral organ development were identified. Among them, eight MADS-box genes were further investigated by full-length cDNA sequence analysis and expression validation, which revealed two APETALA1/AGL9-like MADS-box genes preferentially expressed in the sepal and petal, two AGAMOUS-like genes particularly restricted to the gynostemium, and four DEF-like genes distinctively expressed in different floral organs. The spatial expression of these genes varied distinctly in different floral mutants corresponding to different floral morphogenesis, which validated their specialized roles in floral patterning and further supported the effectiveness of our in silico analysis. This dataset generated in our study provides new insights into the molecular mechanisms

  16. De novo transcriptome sequence assembly and identification of AP2/ERF transcription factor related to abiotic stress in parsley (Petroselinum crispum).

    Directory of Open Access Journals (Sweden)

    Meng-Yao Li

    Full Text Available Parsley is an important biennial Apiaceae species that is widely cultivated as herb, spice, and vegetable. Previous studies on parsley principally focused on its physiological and biochemical properties, including phenolic compound and volatile oil contents. However, little is known about the molecular and genetic properties of parsley. In this study, 23,686,707 high-quality reads were obtained and assembled into 81,852 transcripts and 50,161 unigenes for the first time. Functional annotation showed that 30,516 unigenes had sequence similarity to known genes. In addition, 3,244 putative simple sequence repeats were detected in curly parsley. Finally, 1,569 of the identified unigenes belonged to 58 transcription factor families. Various abiotic stresses have a strong detrimental effect on the yield and quality of parsley. AP2/ERF transcription factors have important functions in plant development, hormonal regulation, and abiotic response. A total of 88 putative AP2/ERF factors were identified from the transcriptome sequence of parsley. Seven AP2/ERF transcription factors were selected in this study to analyze the expression profiles of parsley under different abiotic stresses. Our data provide a potentially valuable resource that can be used for intensive parsley research.

  17. De novo transcriptome sequence assembly and identification of AP2/ERF transcription factor related to abiotic stress in parsley (Petroselinum crispum).

    Science.gov (United States)

    Li, Meng-Yao; Tan, Hua-Wei; Wang, Feng; Jiang, Qian; Xu, Zhi-Sheng; Tian, Chang; Xiong, Ai-Sheng

    2014-01-01

    Parsley is an important biennial Apiaceae species that is widely cultivated as herb, spice, and vegetable. Previous studies on parsley principally focused on its physiological and biochemical properties, including phenolic compound and volatile oil contents. However, little is known about the molecular and genetic properties of parsley. In this study, 23,686,707 high-quality reads were obtained and assembled into 81,852 transcripts and 50,161 unigenes for the first time. Functional annotation showed that 30,516 unigenes had sequence similarity to known genes. In addition, 3,244 putative simple sequence repeats were detected in curly parsley. Finally, 1,569 of the identified unigenes belonged to 58 transcription factor families. Various abiotic stresses have a strong detrimental effect on the yield and quality of parsley. AP2/ERF transcription factors have important functions in plant development, hormonal regulation, and abiotic response. A total of 88 putative AP2/ERF factors were identified from the transcriptome sequence of parsley. Seven AP2/ERF transcription factors were selected in this study to analyze the expression profiles of parsley under different abiotic stresses. Our data provide a potentially valuable resource that can be used for intensive parsley research.

  18. Extreme-Scale De Novo Genome Assembly

    Energy Technology Data Exchange (ETDEWEB)

    Georganas, Evangelos [Intel Corporation, Santa Clara, CA (United States); Hofmeyr, Steven [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States). Joint Genome Inst.; Egan, Rob [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States). Computational Research Division; Buluc, Aydin [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States). Joint Genome Inst.; Oliker, Leonid [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States). Joint Genome Inst.; Rokhsar, Daniel [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States). Computational Research Division; Yelick, Katherine [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States). Joint Genome Inst.

    2017-09-26

    De novo whole genome assembly reconstructs genomic sequence from short, overlapping, and potentially erroneous DNA segments and is one of the most important computations in modern genomics. This work presents HipMer, a high-quality end-to-end de novo assembler designed for extreme scale analysis, via efficient parallelization of the Meraculous code. Genome assembly software has many components, each of which stresses different components of a computer system. This chapter explains the computational challenges involved in each step of the HipMer pipeline, the key distributed data structures, and communication costs in detail. We present performance results of assembling the human genome and the large hexaploid wheat genome on large supercomputers up to tens of thousands of cores.

  19. Spaced Seed Data Structures for De Novo Assembly

    Directory of Open Access Journals (Sweden)

    Inanç Birol

    2015-01-01

    Full Text Available De novo assembly of the genome of a species is essential in the absence of a reference genome sequence. Many scalable assembly algorithms use the de Bruijn graph (DBG) paradigm to reconstruct genomes, where a table of subsequences of a certain length is derived from the reads, and their overlaps are analyzed to assemble sequences. Despite longer subsequences unlocking longer genomic features for assembly, the associated increase in compute resources limits the practicability of DBG over other assembly archetypes already designed for longer reads. Here, we revisit the DBG paradigm to adapt it to the changing sequencing technology landscape and introduce three data structure designs for spaced seeds in the form of paired subsequences. These data structures address memory and run time constraints imposed by longer reads. We observe that when a fixed distance separates seed pairs, it provides increased sequence specificity with increased gap length. Further, we note that Bloom filters would be suitable to implicitly store spaced seeds and be tolerant to sequencing errors. Building on this concept, we describe a data structure for tracking the frequencies of observed spaced seeds. These data structure designs will have applications in genome, transcriptome and metagenome assemblies, and read error correction.
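
    The idea of paired subsequences separated by a fixed gap, stored implicitly in a Bloom filter, can be sketched in a few lines of Python. This is not one of the three data structures the paper introduces: the class, the hashing scheme (salted SHA-256), the key format, and the parameter values are all assumptions made for the sake of a small, self-contained example.

```python
import hashlib


def spaced_seeds(read, k, gap):
    """Yield (left k-mer, right k-mer) pairs separated by a fixed gap."""
    span = 2 * k + gap
    for i in range(len(read) - span + 1):
        yield read[i:i + k], read[i + k + gap:i + span]


class BloomFilter:
    """Minimal Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, n_bits=1 << 16, n_hashes=3):
        self.n_bits = n_bits
        self.n_hashes = n_hashes
        self.bits = bytearray(n_bits // 8 + 1)

    def _positions(self, item):
        # Derive several bit positions by salting one hash function.
        for salt in range(self.n_hashes):
            digest = hashlib.sha256(f"{salt}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.n_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))


seen = BloomFilter()
for left, right in spaced_seeds("ACGTTGCAAT", k=3, gap=2):
    seen.add(f"{left}|{right}")  # only the two ends are stored, not the gap

print("ACG|GCA" in seen)
```

    Only the two flanking k-mers are hashed, so a mismatch inside the gap does not change the stored key, which is one way to read the abstract's remark about tolerance to sequencing errors.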

  20. De novo analysis of transcriptome dynamics in the migratory locust during the development of phase traits.

    Directory of Open Access Journals (Sweden)

    Shuang Chen

    Full Text Available Locusts exhibit remarkable density-dependent phenotype (phase) changes from the solitary to the gregarious, making them one of the most destructive agricultural pests. This phenotype polyphenism arises from a single genome and diverse transcriptomes in different conditions. Here we report a de novo transcriptome for the migratory locust and a comprehensive, representative core gene set. We carried out assembly of 21.5 Gb Illumina reads, generated 72,977 transcripts with N50 2,275 bp and identified 11,490 locust protein-coding genes. Comparative genomics analysis with eight other sequenced insects was carried out to identify the genomic divergence between hemimetabolous and holometabolous insects for the first time, and 18 genes relevant to development were found. We further utilized the quantitative feature of RNA-seq to measure and compare gene expression among libraries. We first discovered how divergence in gene expression between two phases progresses as locusts develop and identified 242 transcripts as candidates for phase marker genes. Together with the detailed analysis of deep sequencing data of the 4th instar, we discovered a phase-dependent divergence of biological investment at the molecular level. Solitary locusts have higher activity in biosynthetic pathways while gregarious locusts show higher activity in environmental interaction, in which genes and pathways associated with regulation of neurotransmitter activities, such as neurotransmitter receptors, synthetase, transporters, and GPCR signaling pathways, are strongly involved. Our study, as the largest de novo transcriptome to date, with optimization of sequencing and assembly strategy, can further facilitate the application of de novo transcriptome. The locust transcriptome enriches genetic resources for hemimetabolous insects and our understanding of the origin of insect metamorphosis. Most importantly, we identified genes and pathways that might be involved in locust development

  1. De Novo Transcriptome Assembly and Identification of Gene Candidates for Rapid Evolution of Soil Al Tolerance in Anthoxanthum odoratum at the Long-Term Park Grass Experiment.

    Science.gov (United States)

    Gould, Billie; McCouch, Susan; Geber, Monica

    2015-01-01

    Studies of adaptation in the wild grass Anthoxanthum odoratum at the Park Grass Experiment (PGE) provided one of the earliest examples of rapid evolution in plants. Anthoxanthum has become locally adapted to differences in soil Al toxicity, which have developed there due to soil acidification from long-term experimental fertilizer treatments. In this study, we used transcriptome sequencing to identify Al stress responsive genes in Anthoxanthum and identify candidates among them for further molecular study of rapid Al tolerance evolution at the PGE. We examined the Al content of Anthoxanthum tissues and conducted RNA-sequencing of root tips, the primary site of Al induced damage. We found that despite its high tolerance Anthoxanthum is not an Al accumulating species. Genes similar to those involved in organic acid exudation (TaALMT1, ZmMATE), cell wall modification (OsSTAR1), and internal Al detoxification (OsNRAT1) in cultivated grasses were responsive to Al exposure. Expression of a large suite of novel loci was also triggered by early exposure to Al stress in roots. Three-hundred forty five transcripts were significantly more up- or down-regulated in tolerant vs. sensitive Anthoxanthum genotypes, providing important targets for future study of rapid evolution at the PGE.

  2. De Novo Transcriptome Assembly and Identification of Gene Candidates for Rapid Evolution of Soil Al Tolerance in Anthoxanthum odoratum at the Long-Term Park Grass Experiment.

    Directory of Open Access Journals (Sweden)

    Billie Gould

    Full Text Available Studies of adaptation in the wild grass Anthoxanthum odoratum at the Park Grass Experiment (PGE) provided one of the earliest examples of rapid evolution in plants. Anthoxanthum has become locally adapted to differences in soil Al toxicity, which have developed there due to soil acidification from long-term experimental fertilizer treatments. In this study, we used transcriptome sequencing to identify Al stress responsive genes in Anthoxanthum and identify candidates among them for further molecular study of rapid Al tolerance evolution at the PGE. We examined the Al content of Anthoxanthum tissues and conducted RNA-sequencing of root tips, the primary site of Al induced damage. We found that despite its high tolerance Anthoxanthum is not an Al accumulating species. Genes similar to those involved in organic acid exudation (TaALMT1, ZmMATE), cell wall modification (OsSTAR1), and internal Al detoxification (OsNRAT1) in cultivated grasses were responsive to Al exposure. Expression of a large suite of novel loci was also triggered by early exposure to Al stress in roots. Three hundred forty-five transcripts were significantly more up- or down-regulated in tolerant vs. sensitive Anthoxanthum genotypes, providing important targets for future study of rapid evolution at the PGE.

  3. De Novo Transcriptome Sequence Assembly and Identification of AP2/ERF Transcription Factor Related to Abiotic Stress in Parsley (Petroselinum crispum)

    OpenAIRE

    Li, Meng-Yao; Tan, Hua-Wei; Wang, Feng; Jiang, Qian; Xu, Zhi-Sheng; Tian, Chang; Xiong, Ai-Sheng

    2014-01-01

    Parsley is an important biennial Apiaceae species that is widely cultivated as herb, spice, and vegetable. Previous studies on parsley principally focused on its physiological and biochemical properties, including phenolic compound and volatile oil contents. However, little is known about the molecular and genetic properties of parsley. In this study, 23,686,707 high-quality reads were obtained and assembled into 81,852 transcripts and 50,161 unigenes for the first time. Functional annotation...

  4. A Tissue-Mapped Axolotl De Novo Transcriptome Enables Identification of Limb Regeneration Factors

    Directory of Open Access Journals (Sweden)

    Donald M. Bryant

    2017-01-01

    Full Text Available Mammals have extremely limited regenerative capabilities; however, axolotls are profoundly regenerative and can replace entire limbs. The mechanisms underlying limb regeneration remain poorly understood, partly because the enormous and incompletely sequenced genomes of axolotls have hindered the study of genes facilitating regeneration. We assembled and annotated a de novo transcriptome using RNA-sequencing profiles for a broad spectrum of tissues that is estimated to have near-complete sequence information for 88% of axolotl genes. We devised expression analyses that identified the axolotl orthologs of cirbp and kazald1 as highly expressed and enriched in blastemas. Using morpholino anti-sense oligonucleotides, we find evidence that cirbp plays a cytoprotective role during limb regeneration whereas manipulation of kazald1 expression disrupts regeneration. Our transcriptome and annotation resources greatly complement previous transcriptomic studies and will be a valuable resource for future research in regenerative biology.

  5. RNA-Seq de novo assembly and differential transcriptome analysis of the nematode Ascaridia galli in relation to in vivo exposure to flubendazole.

    Directory of Open Access Journals (Sweden)

    Mihaela M Martis

    Full Text Available The nematode Ascaridia galli (order Ascaridida) is an economically important intestinal parasite responsible for increased food consumption, reduced performance and elevated mortality in commercial poultry production. This roundworm is an emerging problem in several European countries on farms with laying hens, as a consequence of the recent European Union (EU) ban on conventional battery cages. As infection is associated with slow development of low levels of acquired protective immunity, parasite control relies on repeated use of dewormers (anthelmintics). Benzimidazoles (BZ) are currently the only anthelmintic registered in the EU for use in controlling A. galli and there is an obvious risk of overuse of one drug class, selecting for resistance. Thus we developed a reference transcriptome of A. galli to investigate the response in gene expression before and after exposure to the BZ drug flubendazole (FLBZ). Transcriptional variations between treated and untreated A. galli showed that transcripts annotated as mitochondrial glutamate dehydrogenase and cytochrome P450 were significantly down-regulated in treated worms, whereas transcripts homologous to heat shock proteins (HSP), catalase, phosphofructokinase, and a multidrug resistance P-glycoprotein (PGP1) were significantly up-regulated in treated worms. Investigation of candidate transcripts responsible for anthelmintic resistance in livestock nematodes led to identification of several tubulins, including six new isoforms of beta-tubulin, and several ligand-gated ionotropic receptors and ABC-transporters. We discovered several transcripts associated with drug binding and processing genes, but further characterisation using a larger set of worms exposed to BZs in functional assays is required to determine how these are involved in drug binding and metabolism.

  6. Development of EST-SSR markers in flowering Chinese cabbage (Brassica campestris L. ssp. chinensis var. utilis Tsen et Lee) based on de novo transcriptomic assemblies.

    Directory of Open Access Journals (Sweden)

    Jingfang Chen

    Full Text Available Flowering Chinese cabbage is one of the most important vegetable crops in southern China. Genetic improvement of various agronomic traits in this crop is underway to meet high market demand in the region, but the progress is hampered by limited number of molecular markers available in this crop. This study aimed to develop EST-SSR markers from transcriptome sequences generated by next-generation sequencing. RNA-seq of eight cabbage samples identified 48,975 unigenes. Of these unigenes, 23,267 were annotated in 56 gene ontology (GO) categories, 6,033 were mapped to 131 KEGG pathways, and 7,825 were assigned to clusters of orthologous groups (COGs). From the unigenes, 8,165 EST-SSR loci were identified and 98.57% of them were 1-3 nucleotide repeats with 14.32%, 41.08% and 43.17% of mono-, di- and tri-nucleotide repeats, respectively. Fifty-eight types of motifs were identified with A/T, AG/CT, AT/AT, AC/GT, AAG/CTT and AGG/CCT the most abundant. The lengths of repeated nucleotide sequences in all SSR loci ranged from 12 to 60 bp, with most (88.51%) under 20 bp. Among 170 primer pairs randomly selected from a total of 4,912 SSR primers we designed, 48 yielded unambiguously polymorphic bands with high reproducibility. Cluster analysis using 48 SSRs classified 34 flowering Chinese cabbage cultivars into three groups. A large number of EST-SSR markers identified in this study will facilitate marker-assisted selection in the breeding programs of flowering Chinese cabbage.

  7. Development of EST-SSR markers in flowering Chinese cabbage (Brassica campestris L. ssp. chinensis var. utilis Tsen et Lee) based on de novo transcriptomic assemblies.

    Science.gov (United States)

    Chen, Jingfang; Li, Ronghua; Xia, Yanshi; Bai, Guihua; Guo, Peiguo; Wang, Zhiliang; Zhang, Hua; Siddique, Kadambot H M

    2017-01-01

    Flowering Chinese cabbage is one of the most important vegetable crops in southern China. Genetic improvement of various agronomic traits in this crop is underway to meet high market demand in the region, but the progress is hampered by limited number of molecular markers available in this crop. This study aimed to develop EST-SSR markers from transcriptome sequences generated by next-generation sequencing. RNA-seq of eight cabbage samples identified 48,975 unigenes. Of these unigenes, 23,267 were annotated in 56 gene ontology (GO) categories, 6,033 were mapped to 131 KEGG pathways, and 7,825 were assigned to clusters of orthologous groups (COGs). From the unigenes, 8,165 EST-SSR loci were identified and 98.57% of them were 1-3 nucleotide repeats with 14.32%, 41.08% and 43.17% of mono-, di- and tri-nucleotide repeats, respectively. Fifty-eight types of motifs were identified with A/T, AG/CT, AT/AT, AC/GT, AAG/CTT and AGG/CCT the most abundant. The lengths of repeated nucleotide sequences in all SSR loci ranged from 12 to 60 bp, with most (88.51%) under 20 bp. Among 170 primer pairs randomly selected from a total of 4,912 SSR primers we designed, 48 yielded unambiguously polymorphic bands with high reproducibility. Cluster analysis using 48 SSRs classified 34 flowering Chinese cabbage cultivars into three groups. A large number of EST-SSR markers identified in this study will facilitate marker-assisted selection in the breeding programs of flowering Chinese cabbage.
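
    The SSR classes reported above (mono-, di- and tri-nucleotide repeats, whose shares of 14.32%, 41.08% and 43.17% add up to the quoted 98.57%) can be illustrated with a small regular-expression scan. This is only a sketch: the abstract does not name the SSR-detection tool or its thresholds, so the minimum repeat counts below (12, 6 and 4, each giving a 12 bp minimum length) and the function name `find_ssrs` are assumptions made for illustration.

```python
import re

# Assumed thresholds (not taken from the abstract): each class needs at
# least 12 bp of perfect repeats, i.e. 12 x mono, 6 x di, 4 x tri.
SSR_PATTERNS = {
    "mono": re.compile(r"([ACGT])\1{11,}"),
    # The negative lookahead rejects motifs made of a single repeated base,
    # so a homopolymer run is never reported as a di- or tri-repeat.
    "di": re.compile(r"((?!([ACGT])\2)[ACGT]{2})\1{5,}"),
    "tri": re.compile(r"((?!([ACGT])\2\2)[ACGT]{3})\1{3,}"),
}


def find_ssrs(sequence):
    """Return sorted (start, class, motif, repeat_count) tuples for perfect SSRs."""
    sequence = sequence.upper()
    hits = []
    for ssr_class, pattern in SSR_PATTERNS.items():
        for match in pattern.finditer(sequence):
            motif = match.group(1)
            repeats = len(match.group(0)) // len(motif)
            hits.append((match.start(), ssr_class, motif, repeats))
    return sorted(hits)


if __name__ == "__main__":
    # Toy unigene containing one repeat of each class.
    toy = "TTGC" + "A" * 12 + "CGT" + "AG" * 6 + "TCC" + "AAG" * 4 + "T"
    for hit in find_ssrs(toy):
        print(hit)
```

    Only perfect repeats are detected here; compound or interrupted SSRs, which dedicated tools usually also report, are outside the scope of this sketch.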

  8. De novo assembly and phasing of a Korean human genome.

    Science.gov (United States)

    Seo, Jeong-Sun; Rhie, Arang; Kim, Junsoo; Lee, Sangjin; Sohn, Min-Hwan; Kim, Chang-Uk; Hastie, Alex; Cao, Han; Yun, Ji-Young; Kim, Jihye; Kuk, Junho; Park, Gun Hwa; Kim, Juhyeok; Ryu, Hanna; Kim, Jongbum; Roh, Mira; Baek, Jeonghun; Hunkapiller, Michael W; Korlach, Jonas; Shin, Jong-Yeon; Kim, Changhoon

    2016-10-13

    Advances in genome assembly and phasing provide an opportunity to investigate the diploid architecture of the human genome and reveal the full range of structural variation across population groups. Here we report the de novo assembly and haplotype phasing of the Korean individual AK1 (ref. 1) using single-molecule real-time sequencing, next-generation mapping, microfluidics-based linked reads, and bacterial artificial chromosome (BAC) sequencing approaches. Single-molecule sequencing coupled with next-generation mapping generated a highly contiguous assembly, with a contig N50 size of 17.9 Mb and a scaffold N50 size of 44.8 Mb, resolving 8 chromosomal arms into single scaffolds. The de novo assembly, along with local assemblies and spanning long reads, closes 105 and extends into 72 out of 190 euchromatic gaps in the reference genome, adding 1.03 Mb of previously intractable sequence. High concordance between the assembly and paired-end sequences from 62,758 BAC clones provides strong support for the robustness of the assembly. We identify 18,210 structural variants by direct comparison of the assembly with the human reference, identifying thousands of breakpoints that, to our knowledge, have not been reported before. Many of the insertions are reflected in the transcriptome and are shared across the Asian population. We performed haplotype phasing of the assembly with short reads, long reads and linked reads from whole-genome sequencing and with short reads from 31,719 BAC clones, thereby achieving phased blocks with an N50 size of 11.6 Mb. Haplotigs assembled from single-molecule real-time reads assigned to haplotypes on phased blocks covered 89% of genes. The haplotigs accurately characterized the hypervariable major histocompatibility complex region as well as demonstrating allele configuration in clinically relevant genes such as CYP2D6. This work presents the most contiguous diploid human genome assembly so far, with extensive investigation of

  9. Characterization and analysis of a de novo transcriptome from the pygmy grasshopper Tetrix japonica.

    Science.gov (United States)

    Qiu, Zhongying; Liu, Fei; Lu, Huimeng; Huang, Yuan

    2017-05-01

    The pygmy grasshopper Tetrix japonica is a common insect distributed throughout the world, and it has the potential for use in studies of body colour polymorphism, genomics and the biology of Tetrigoidea (Insecta: Orthoptera). However, limited biological information is available for this insect. Here, we conducted a de novo transcriptome study of adult and larval T. japonica to provide a better understanding of its gene expression and develop genomic resources for future work. We sequenced and explored the characteristics of the de novo transcriptome of T. japonica using the Illumina HiSeq 2000 platform. A total of 107 608 206 paired-end clean reads were assembled into 61 141 unigenes using the Trinity software; the mean unigene size was 771 bp, and the N50 length was 1238 bp. A total of 29 225 unigenes were functionally annotated to the NCBI nonredundant protein sequences (Nr), NCBI nonredundant nucleotide sequences (Nt), a manually annotated and reviewed protein sequence database (Swiss-Prot), Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) databases. A large number of putative genes that are potentially involved in pigment pathways, juvenile hormone (JH) metabolism and signalling pathways were identified in the T. japonica transcriptome. Additionally, 165 769 and 156 796 putative single nucleotide polymorphisms occurred in the adult and larvae transcriptomes, respectively, and a total of 3162 simple sequence repeats were detected in this assembly. This comprehensive transcriptomic data for T. japonica will provide a usable resource for gene predictions, signalling pathway investigations and molecular marker development for this species and other pygmy grasshoppers. © 2016 John Wiley & Sons Ltd.
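
    Several records in this listing report both a mean length and an N50 for their assemblies (here, 771 bp and 1238 bp). The two statistics differ because N50 is length-weighted: it is the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the assembled bases. A minimal sketch of that standard definition follows; the function name `n50` and the toy lengths are illustrative and not taken from any of the studies.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or unigene lengths.

    Sequences are visited from longest to shortest; the N50 is the length
    at which the running total first reaches half of the assembled bases.
    """
    if not lengths:
        raise ValueError("need at least one sequence length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


if __name__ == "__main__":
    toy_lengths = [100, 200, 300, 400, 500]      # hypothetical unigene lengths
    mean_length = sum(toy_lengths) / len(toy_lengths)
    # 500 + 400 = 900 covers at least half of 1500, so N50 = 400 while the mean is 300.0
    print("N50:", n50(toy_lengths), "mean:", mean_length)
```

    Because long sequences dominate the cumulative sum, N50 typically exceeds the mean for transcriptome assemblies that contain many short fragments, which is consistent with the figures quoted in the abstract.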

  10. Identifying wrong assemblies in de novo short read primary ...

    Indian Academy of Sciences (India)

    2016-08-05

    Aug 5, 2016 ... Most of these assemblies are done using some de novo short read assemblers and other related approaches. .... benchmarking projects like Assemblathon 1, Assemblathon ... from a large insert library (at least 1000 bases).

  11. Icarus: visualizer for de novo assembly evaluation.

    Science.gov (United States)

    Mikheenko, Alla; Valin, Gleb; Prjibelski, Andrey; Saveliev, Vladislav; Gurevich, Alexey

    2016-11-01

    Data visualization plays an increasingly important role in NGS data analysis. With advances in both sequencing and computational technologies, it has become a new bottleneck in genomics studies. Indeed, evaluation of de novo genome assemblies is one of the areas that can benefit from the visualization. However, even though multiple quality assessment methods are now available, existing visualization tools are hardly suitable for this purpose. Here, we present Icarus, a novel genome visualizer for accurate assessment and analysis of genomic draft assemblies, which is based on the tool QUAST. Icarus can be used in studies where a related reference genome is available, as well as for non-model organisms. The tool is available online and as a standalone application at http://cab.spbu.ru/software/icarus (contact: aleksey.gurevich@spbu.ru). Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved.

  12. The Amaranth Genome: Genome, Transcriptome, and Physical Map Assembly

    Directory of Open Access Journals (Sweden)

    J. W. Clouse

    2016-03-01

    Full Text Available Amaranth ( L. is an emerging pseudocereal native to the New World that has garnered increased attention in recent years because of its nutritional quality, in particular its seed protein and more specifically its high levels of the essential amino acid lysine. It belongs to the Amaranthaceae family, is an ancient paleopolyploid that shows disomic inheritance (2n = 32), and has an estimated genome size of 466 Mb. Here we present a high-quality draft genome sequence of the grain amaranth. The genome assembly consisted of 377 Mb in 3518 scaffolds with an N50 of 371 kb. Repetitive element analysis predicted that 48% of the genome is comprised of repeat sequences, of which -like elements were the most commonly classified retrotransposon. A de novo transcriptome consisting of 66,370 contigs was assembled from eight different amaranth tissue and abiotic stress libraries. Annotation of the genome identified 23,059 protein-coding genes. Seven grain amaranths (, , and and their putative progenitor ( were resequenced. A single nucleotide polymorphism (SNP) phylogeny supported the classification of as the progenitor species of the grain amaranths. Lastly, we generated a de novo physical map for using the BioNano Genomics’ Genome Mapping platform. The physical map spanned 340 Mb and a hybrid assembly using the BioNano physical maps nearly doubled the N50 of the assembly to 697 kb. Moreover, we analyzed synteny between amaranth and sugar beet ( L. and estimated, using analysis, the age of the most recent polyploidization event in amaranth.

  13. De novo transcriptome sequencing of axolotl blastema for identification of differentially expressed genes during limb regeneration

    Science.gov (United States)

    2013-01-01

    Background: Salamanders are unique among vertebrates in their ability to completely regenerate amputated limbs through the mediation of blastema cells located at the stump ends. This regeneration is nerve-dependent because blastema formation and regeneration does not occur after limb denervation. To obtain the genomic information of blastema tissues, de novo transcriptomes from both blastema tissues and denervated stump ends of Ambystoma mexicanum (axolotls) 14 days post-amputation were sequenced and compared using Solexa DNA sequencing. Results: The sequencing done for this study produced 40,688,892 reads that were assembled into 307,345 transcribed sequences. The N50 of transcribed sequence length was 562 bases. A similarity search with known proteins identified 39,200 different genes to be expressed during limb regeneration with a cut-off E-value exceeding 10^-5. We annotated assembled sequences by using gene descriptions, gene ontology, and clusters of orthologous group terms. Targeted searches using these annotations showed that the majority of the genes were in the categories of essential metabolic pathways, transcription factors and conserved signaling pathways, and novel candidate genes for regenerative processes. We discovered and confirmed numerous sequences of the candidate genes by using quantitative polymerase chain reaction and in situ hybridization. Conclusion: The results of this study demonstrate that de novo transcriptome sequencing allows gene expression analysis in a species lacking genome information and provides the most comprehensive mRNA sequence resources for axolotls. The characterization of the axolotl transcriptome can help elucidate the molecular mechanisms underlying blastema formation during limb regeneration. PMID:23815514

  14. Comparative de novo transcriptome analysis of male and female Sea buckthorn.

    Science.gov (United States)

    Bansal, Ankush; Salaria, Mehul; Sharma, Tashil; Stobdan, Tsering; Kant, Anil

    2018-02-01

    Sea buckthorn is a dioecious medicinal plant found at high altitude. The plant has both male and female reproductive organs in separate individuals. In this article, whole transcriptome de novo assemblies of male and female flower bud samples were carried out using Illumina NextSeq 500 platform to determine the role of the genes involved in sex determination. Moreover, genes with differential expression in male and female transcriptomes were identified to understand the underlying sex determination mechanism. The current study showed 63,904 and 62,272 coding sequences (CDS) in female and male transcriptome data sets, respectively. 16,831 common CDS were screened out from both transcriptomes, out of which 625 were upregulated and 491 were found to be downregulated. To understand the potential regulatory roles of differentially expressed genes in metabolic networks and biosynthetic pathways, KEGG mapping, gene ontology, and co-expression network analysis were performed. Comparison with Flowering Interactive Database (FLOR-ID) resulted in eight differentially expressed genes, viz. CHD3-type chromatin-remodeling factor PICKLE (PKL), phytochrome-associated serine/threonine-protein phosphatase (FYPP), protein TOPLESS (TPL), sensitive to freezing 6 (SFR6), lysine-specific histone demethylase 1 homolog 1 (LDL1), pre-mRNA-processing-splicing factor 8A (PRP8A), sucrose synthase 4 (SUS4), ubiquitin carboxyl-terminal hydrolase 12 (UBP12), known to be broadly involved in flowering, photoperiodism, embryo development, and cold response pathways. Male and female flower bud transcriptome data of Sea buckthorn may provide comprehensive information at genomic level for the identification of genetic regulation involved in sex determination.

  15. De novo-based transcriptome profiling of male-sterile and fertile watermelon lines.

    Science.gov (United States)

    Rhee, Sun-Ju; Kwon, Taehyung; Seo, Minseok; Jang, Yoon Jeong; Sim, Tae Yong; Cho, Seoae; Han, Sang-Wook; Lee, Gung Pyo

    2017-01-01

    The whole-genome sequence of watermelon (Citrullus lanatus (Thunb.) Matsum. & Nakai), a valuable horticultural crop worldwide, was released in 2013. Here, we compared a de novo-based approach (DBA) to a reference-based approach (RBA) using RNA-seq data, to aid in efforts to improve the annotation of the watermelon reference genome and to obtain biological insight into male-sterility in watermelon. We applied these techniques to available data from two watermelon lines: the male-sterile line DAH3615-MS and the male-fertile line DAH3615. Using DBA, we newly annotated 855 watermelon transcripts, and found gene functional clusters predicted to be related to stimulus responses, nucleic acid binding, transmembrane transport, homeostasis, and Golgi/vesicles. Among the DBA-annotated transcripts, 138 de novo-exclusive differentially-expressed genes (DEDEGs) related to male sterility were detected. Out of 33 randomly selected newly annotated transcripts and DEDEGs, 32 were validated by RT-qPCR. This study demonstrates the usefulness and reliability of the de novo transcriptome assembly in watermelon, and provides new insights for researchers exploring transcriptional blueprints with regard to the male sterility.

  16. Examination of triacylglycerol biosynthetic pathways via de novo transcriptomic and proteomic analyses in an unsequenced microalga.

    Directory of Open Access Journals (Sweden)

    Michael T Guarnieri

    Full Text Available Biofuels derived from algal lipids represent an opportunity to dramatically impact the global energy demand for transportation fuels. Systems biology analyses of oleaginous algae could greatly accelerate the commercialization of algal-derived biofuels by elucidating the key components involved in lipid productivity and leading to the initiation of hypothesis-driven strain-improvement strategies. However, higher-level systems biology analyses, such as transcriptomics and proteomics, are highly dependent upon available genomic sequence data, and the lack of these data has hindered the pursuit of such analyses for many oleaginous microalgae. In order to examine the triacylglycerol biosynthetic pathway in the unsequenced oleaginous microalga, Chlorella vulgaris, we have established a strategy with which to bypass the necessity for genomic sequence information by using the transcriptome as a guide. Our results indicate an upregulation of both fatty acid and triacylglycerol biosynthetic machinery under oil-accumulating conditions, and demonstrate the utility of a de novo assembled transcriptome as a search model for proteomic analysis of an unsequenced microalga.

  17. Transcriptome sequencing and de novo analysis of the copepod Calanus sinicus using 454 GS FLX.

    Directory of Open Access Journals (Sweden)

    Juan Ning

    Full Text Available BACKGROUND: Despite their species abundance and primary economic importance, genomic information about copepods is still limited. In particular, genomic resources are lacking for the copepod Calanus sinicus, which is a dominant species in the coastal waters of East Asia. In this study, we performed de novo transcriptome sequencing to produce a large number of expressed sequence tags for the copepod C. sinicus. RESULTS: Copepodid larvae and adults were used as the basic material for transcriptome sequencing. Using 454 pyrosequencing, a total of 1,470,799 reads were obtained, which were assembled into 56,809 high quality expressed sequence tags. Based on their sequence similarity to known proteins, about 14,000 different genes were identified, including members of all major conserved signaling pathways. Transcripts that were putatively involved with growth, lipid metabolism, molting, and diapause were also identified among these genes. Differentially expressed genes related to several processes were found in C. sinicus copepodid larvae and adults. We detected 284,154 single nucleotide polymorphisms (SNPs) that provide a resource for gene function studies. CONCLUSION: Our data provide the most comprehensive transcriptome resource available for C. sinicus. This resource allowed us to identify genes associated with primary physiological processes and SNPs in coding regions, which facilitated the quantitative analysis of differential gene expression. These data should provide a foundation for future genetic and genomic studies of this and related species.

  18. Transcriptome sequencing and De Novo analysis of Youngia japonica using the illumina platform.

    Directory of Open Access Journals (Sweden)

    Yulan Peng

    Full Text Available Youngia japonica, a weed species distributed worldwide, has been widely used in traditional Chinese medicine. It is an ideal plant for studying the evolution of Asteraceae plants because of its short life history and abundant source. However, little is known about its evolution and genetic diversity. In this study, de novo transcriptome sequencing was conducted for the first time for the comprehensive analysis of the genetic diversity of Y. japonica. The Y. japonica transcriptome was sequenced using Illumina paired-end sequencing technology. We produced 21,847,909 high-quality reads for Y. japonica and assembled them into contigs. A total of 51,850 unigenes were identified, among which 46,087 were annotated in the NCBI non-redundant protein database and 41,752 were annotated in the Swiss-Prot database. We mapped 9,125 unigenes onto 163 pathways using the Kyoto Encyclopedia of Genes and Genomes Pathway database. In addition, 3,648 simple sequence repeats (SSRs) were detected. Our data provide the most comprehensive transcriptome resource currently available for Y. japonica. C4 photosynthesis unigenes were found in the biological process of Y. japonica. There were 5596 unigenes related to defense response and 1344 unigenes related to signal transduction mechanisms (10.95%). These data provide insights into the genetic diversity of Y. japonica. Numerous SSRs contributed to the development of novel markers. These data may serve as a new valuable resource for genomic studies on Youngia and, more generally, Cichoraceae.

  19. Rapid centriole assembly in Naegleria reveals conserved roles for both de novo and mentored assembly.

    Science.gov (United States)

    Fritz-Laylin, Lillian K; Levy, Yaron Y; Levitan, Edward; Chen, Sean; Cande, W Zacheus; Lai, Elaine Y; Fulton, Chandler

    2016-03-01

    Centrioles are eukaryotic organelles whose number and position are critical for cilia formation and mitosis. Many cell types assemble new centrioles next to existing ones ("templated" or mentored assembly). Under certain conditions, centrioles also form without pre-existing centrioles (de novo). The synchronous differentiation of Naegleria amoebae to flagellates represents a unique opportunity to study centriole assembly, as nearly 100% of the population transitions from having no centrioles to having two within minutes. Here, we find that Naegleria forms its first centriole de novo, immediately followed by mentored assembly of the second. We also find both de novo and mentored assembly distributed among all major eukaryote lineages. We therefore propose that both modes are ancestral and have been conserved because they serve complementary roles, with de novo assembly as the default when no pre-existing centriole is available, and mentored assembly allowing precise regulation of number, timing, and location of centriole assembly. © 2016 Wiley Periodicals, Inc.

  20. Towards accurate de novo assembly for genomes with repeats

    NARCIS (Netherlands)

    Bucur, Doina

    2017-01-01

    De novo genome assemblers designed for short k-mer length or using short raw reads are unlikely to recover complex features of the underlying genome, such as repeats hundreds of bases long. We implement a stochastic machine-learning method which obtains accurate assemblies with repeats and

  1. A practical guide to build de-novo assemblies for single tissues of non-model organisms: the example of a Neotropical frog

    Directory of Open Access Journals (Sweden)

    Santiago Montero-Mendieta

    2017-09-01

    Full Text Available Whole genome sequencing (WGS) is a very valuable resource to understand the evolutionary history of poorly known species. However, in organisms with large genomes, as most amphibians, WGS is still excessively challenging and transcriptome sequencing (RNA-seq) represents a cost-effective tool to explore genome-wide variability. Non-model organisms do not usually have a reference genome and the transcriptome must be assembled de-novo. We used RNA-seq to obtain the transcriptomic profile for Oreobates cruralis, a poorly known South American direct-developing frog. In total, 550,871 transcripts were assembled, corresponding to 422,999 putative genes. Of those, we identified 23,500, 37,349, 38,120 and 45,885 genes present in the Pfam, EggNOG, KEGG and GO databases, respectively. Interestingly, our results suggested that genes related to immune system and defense mechanisms are abundant in the transcriptome of O. cruralis. We also present a pipeline to assist with pre-processing, assembling, evaluating and functionally annotating a de-novo transcriptome from RNA-seq data of non-model organisms. Our pipeline guides the inexperienced user in an intuitive way through all the necessary steps to build de-novo transcriptome assemblies using readily available software and is freely available at: https://github.com/biomendi/TRANSCRIPTOME-ASSEMBLY-PIPELINE/wiki.

  2. Faucet: streaming de novo assembly graph construction.

    Science.gov (United States)

    Rozov, Roye; Goldshlager, Gil; Halperin, Eran; Shamir, Ron

    2018-01-01

    We present Faucet, a two-pass streaming algorithm for assembly graph construction. Faucet builds an assembly graph incrementally as each read is processed. Thus, reads need not be stored locally, as they can be processed while downloading data and then discarded. We demonstrate this functionality by performing streaming graph assembly of publicly available data, and observe that the ratio of disk use to raw data size decreases as coverage is increased. Faucet pairs the de Bruijn graph obtained from the reads with additional metadata derived from them. We show these metadata (coverage counts collected at junction k-mers and connections bridging between junction pairs) contain most salient information needed for assembly, and demonstrate they enable cleaning of metagenome assembly graphs, greatly improving contiguity while maintaining accuracy. We compared Faucet's resource use and assembly quality to state of the art metagenome assemblers, as well as leading resource-efficient genome assemblers. Faucet used orders of magnitude less time and disk space than the specialized metagenome assemblers MetaSPAdes and Megahit, while also improving on their memory use; this broadly matched performance of other assemblers optimizing resource efficiency, namely Minia and LightAssembler. However, on metagenomes tested, Faucet's outputs had 14-110% higher mean NGA50 lengths compared with Minia, and 2- to 11-fold higher mean NGA50 lengths compared with LightAssembler, the only other streaming assembler available. Faucet is available at https://github.com/Shamir-Lab/Faucet. Contact: rshamir@tau.ac.il or eranhalperin@gmail.com. Supplementary data are available at Bioinformatics online. © The Author 2017. Published by Oxford University Press.
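
    The abstract above relies on two ideas that a toy example may make concrete: a de Bruijn graph built read by read, and "junction" k-mers where the graph branches. The sketch below is not Faucet's implementation (the abstract does not describe its data structures); it is a plain in-memory illustration in which every name is hypothetical, reverse complements are ignored, and a junction is assumed to mean a k-mer with more than one distinct successor or predecessor.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Build successor/predecessor maps over the k-mers seen in the reads.

    Reads are consumed one at a time, so nothing but the graph itself
    has to be kept once a read has been processed.
    """
    successors = defaultdict(set)
    predecessors = defaultdict(set)
    for read in reads:
        kmers = [read[i:i + k] for i in range(len(read) - k + 1)]
        for left, right in zip(kmers, kmers[1:]):
            successors[left].add(right)
            predecessors[right].add(left)
    return successors, predecessors


def junction_kmers(successors, predecessors):
    """Return k-mers that have more than one successor or predecessor."""
    nodes = set(successors) | set(predecessors)
    return sorted(
        node for node in nodes
        if len(successors.get(node, ())) > 1 or len(predecessors.get(node, ())) > 1
    )


if __name__ == "__main__":
    # Two toy reads that share a prefix and then diverge after "CGT".
    toy_reads = ["ACGTAC", "ACGTTG"]
    succ, pred = build_de_bruijn(toy_reads, k=3)
    print(junction_kmers(succ, pred))
```

    In this toy graph the only branching k-mer is CGT, which is followed by GTA in one read and GTT in the other; unbranched stretches between such junctions are what an assembler would collapse into contigs.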

  3. KrillDB: A de novo transcriptome database for the Antarctic krill (Euphausia superba).

    Directory of Open Access Journals (Sweden)

    Gabriele Sales

    Full Text Available Antarctic krill (Euphausia superba) is a key species in the Southern Ocean with an estimated biomass between 100 and 500 million tonnes. Changes in krill population viability would have a catastrophic effect on the Antarctic ecosystem. One looming threat due to elevated levels of anthropogenic atmospheric carbon dioxide (CO2) is ocean acidification (lowering of sea water pH by CO2 dissolving into the oceans). The genetics of Antarctic krill has long been of scientific interest both for the analysis of population structure and for the analysis of functional genetics. However, the genetic resources available for the species are relatively modest. We have developed the most advanced genetic database on Euphausia superba, KrillDB, which includes comprehensive data sets of former and present transcriptome projects. In particular, we have built a de novo transcriptome assembly using more than 360 million Illumina sequence reads generated from larval krill including individuals subjected to different CO2 levels. The database gives access to: (1) the full list of assembled genes and transcripts; (2) their level of similarity to transcripts and proteins from other species; (3) the predicted protein domains contained within each transcript; (4) their predicted GO terms; (5) the level of expression of each transcript in the different larval stages and CO2 treatments. All references to external entities (sequences, domains, GO terms) are equipped with a link to the appropriate source database. Moreover, the software implements a full-text search engine that makes it possible to submit free-form queries. KrillDB represents the first large-scale attempt at classifying and annotating the full krill transcriptome. For this reason, we believe it will constitute a cornerstone of future approaches devoted to physiological and molecular study of this key species in the Southern Ocean food web.

  4. De novo transcriptomic analysis of cowpea (Vigna unguiculata L. Walp.) for genic SSR marker development.

    Science.gov (United States)

    Chen, Honglin; Wang, Lixia; Liu, Xiaoyan; Hu, Liangliang; Wang, Suhua; Cheng, Xuzhen

    2017-07-11

    Cowpea [Vigna unguiculata (L.) Walp.] is one of the most important legumes in tropical and semi-arid regions. However, there is relatively little genomic information available for genetic research on and breeding of cowpea. The objectives of this study were to analyse the cowpea transcriptome and develop genic molecular markers for future genetic studies of this genus. Approximately 54 million high-quality cDNA sequence reads were obtained from cowpea based on Illumina paired-end sequencing technology and were de novo assembled to generate 47,899 unigenes with an N50 length of 1534 bp. Sequence similarity analysis revealed 36,289 unigenes (75.8%) with significant similarity to known proteins in the non-redundant (Nr) protein database, 23,471 unigenes (49.0%) with BLAST hits in the Swiss-Prot database, and 20,654 unigenes (43.1%) with high similarity in the Kyoto Encyclopedia of Genes and Genomes (KEGG) database. Further analysis identified 5560 simple sequence repeats (SSRs) as potential genic molecular markers. Validating a random set of 500 SSR markers yielded 54 polymorphic markers among 32 cowpea accessions. This transcriptomic analysis of cowpea provided a valuable set of genomic data for characterizing genes with important agronomic traits in Vigna unguiculata and a new set of genic SSR markers for further genetic studies and breeding in cowpea and related Vigna species.
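
    How SSRs are located in assembled unigenes can be sketched with a regular expression. The abstract does not name the tool or thresholds used, so everything below is an assumption for illustration: the function name `find_ssrs` is hypothetical, only di-, tri- and tetranucleotide motifs are searched, and the minimum repeat counts (6, 5 and 4) are placeholder values.

```python
import re


def is_primitive(unit):
    """True if the motif is not itself a repeat of a shorter motif (e.g. ATAT)."""
    n = len(unit)
    return not any(
        n % d == 0 and unit == unit[:d] * (n // d) for d in range(1, n)
    )


def find_ssrs(seq, min_repeats=None):
    """Return (start, motif, repeat_count) for each perfect SSR in seq.

    min_repeats maps motif length -> minimum number of tandem copies;
    the defaults are assumed values, not those of the study.
    """
    if min_repeats is None:
        min_repeats = {2: 6, 3: 5, 4: 4}
    hits = []
    for unit_len, n in sorted(min_repeats.items()):
        # Capture one motif, then require at least n-1 further copies of it.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, n - 1))
        for m in pattern.finditer(seq):
            unit = m.group(1)
            if is_primitive(unit):
                hits.append((m.start(), unit, len(m.group(0)) // unit_len))
    return sorted(hits)


if __name__ == "__main__":
    unigene = "GGC" + "AG" * 7 + "TTC" + "CAT" * 5 + "G"
    for start, motif, count in find_ssrs(unigene):
        print(start, motif, count)
```

    Primers flanking each reported position could then be designed and screened for polymorphism across accessions, as the abstract describes; that step is not shown here.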

  5. De novo assembly and characterization of the leaf, bud, and fruit transcriptome from the vulnerable tree Juglans mandshurica for the development of 20 new microsatellite markers using Illumina sequencing

    Science.gov (United States)

    Zhuang Hu; Tian Zhang; Xiao-Xiao Gao; Yang Wang; Qiang Zhang; Hui-Juan Zhou; Gui-Fang Zhao; Ma-Li Wang; Keith E. Woeste; Peng Zhao

    2016-01-01

    Manchurian walnut (Juglans mandshurica Maxim.) is a vulnerable, temperate deciduous tree valued for its wood and nut, but transcriptomic and genomic data for the species are very limited. Next generation sequencing (NGS) has made it possible to develop molecular markers for this species rapidly and efficiently. Our goal is to use transcriptome...

  6. De novo assembly of highly diverse viral populations

    Directory of Open Access Journals (Sweden)

    Yang Xiao

    2012-09-01

    Full Text Available Abstract Background: Extensive genetic diversity in viral populations within infected hosts and the divergence of variants from existing reference genomes impede the analysis of deep viral sequencing data. A de novo population consensus assembly is valuable both as a single linear representation of the population and as a backbone on which intra-host variants can be accurately mapped. The availability of consensus assemblies and robustly mapped variants is crucial to the genetic study of viral disease progression, transmission dynamics, and viral evolution. Existing de novo assembly techniques fail to robustly assemble ultra-deep sequence data from genetically heterogeneous populations such as viruses into full-length genomes due to the presence of extensive genetic variability, contaminants, and variable sequence coverage. Results: We present VICUNA, a de novo assembly algorithm suitable for generating consensus assemblies from genetically heterogeneous populations. We demonstrate its effectiveness on Dengue, Human Immunodeficiency and West Nile viral populations, representing a range of intra-host diversity. Compared to state-of-the-art assemblers designed for haploid or diploid systems, VICUNA recovers full-length consensus and captures insertion/deletion polymorphisms in diverse samples. Final assemblies maintain a high base calling accuracy. The VICUNA program is publicly available at: http://www.broadinstitute.org/scientific-community/science/projects/viral-genomics/viral-genomics-analysis-software. Conclusions: We developed VICUNA, a publicly available software tool that enables consensus assembly of ultra-deep sequence derived from diverse viral populations. While VICUNA was developed for the analysis of viral populations, its application to other heterogeneous sequence data sets such as metagenomic or tumor cell population samples may prove beneficial in these fields of research.

  7. Illumina–based de novo transcriptome sequencing and analysis of ...

    Indian Academy of Sciences (India)

    Administrator

    2017-10-25

    Oct 25, 2017 ... (Shanghai, China) following manufacturer's protocols (Illumina, San .... suggests that pathways involved in musk production are expressed at a ..... Strickler S. R., Aureliano B. and Mueller L. A. 2012 Designing a transcriptome.

  8. De novo transcriptome analysis of Sinapis alba in revealing the glucosinolate and phytochelatin pathways

    Directory of Open Access Journals (Sweden)

    Xiaohui Zhang

    2016-03-01

    Full Text Available Sinapis alba is an important condiment crop and can also be used as a phytoremediation plant. Though it has important economic and agronomic values, sequence data and the genetic tools are still rare in this plant. In the present study, a de novo transcriptome based on the transcriptions of leaves, stems and roots was assembled for S. alba for the first time. The transcriptome contains 47,972 unigenes with a mean length of 1,185 nt and an N50 of 1,672 nt. Among these unigenes, 46,535 (97%) unigenes were annotated by at least one of the following databases: NCBI non-redundant (Nr), Swiss-Prot, Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway, Gene Ontology (GO), and Clusters of Orthologous Groups of proteins (COGs). The tissue expression pattern profiles revealed that 3,489, 1,361 and 8,482 unigenes were predominantly expressed in the leaves, stems and roots of S. alba, respectively. Genes predominantly expressed in the leaf were enriched in photosynthesis- and carbon fixation-related pathways. Genes predominantly expressed in the stem were enriched in not only pathways related to sugar, ether lipid and amino acid metabolisms but also plant hormone signal transduction and circadian rhythm pathways, while the root-dominant genes were enriched in pathways related to lignin and cellulose syntheses, involved in plant-pathogen interactions, and potentially responsible for heavy metal chelating and detoxification. Based on this transcriptome, 14,727 simple sequence repeats (SSRs) were identified, and 12,830 pairs of primers were developed for 2,522 SSR-containing unigenes. Additionally, the glucosinolate (GSL) and phytochelatin metabolic pathways, which give the characteristic flavor and the heavy metal tolerance of this plant, were intensively analyzed. The genes of aliphatic GSLs pathway were predominantly expressed in roots. The absence of aliphatic GSLs in leaf tissues was due to the shutdown of BCAT4, MAM1 and CYP79F1 expressions.
Glutathione was

  9. De novo post-pollen mitosis II tobacco pollen tube transcriptome

    Czech Academy of Sciences Publication Activity Database

    Hafidh, Said; Breznenová, Katarína; Honys, David

    2012-01-01

    Vol. 7, No. 8 (2012), pp. 918-921 ISSN 1559-2316 R&D Projects: GA ČR GPP501/11/P321; GA ČR GA522/09/0858 Institutional research plan: CEZ:AV0Z50380511 Keywords: de novo pollen tube transcriptome * male gametophyte development * pollen tube growth Subject RIV: ED - Physiology

  10. De novo assembly of a haplotype-resolved human genome.

    Science.gov (United States)

    Cao, Hongzhi; Wu, Honglong; Luo, Ruibang; Huang, Shujia; Sun, Yuhui; Tong, Xin; Xie, Yinlong; Liu, Binghang; Yang, Hailong; Zheng, Hancheng; Li, Jian; Li, Bo; Wang, Yu; Yang, Fang; Sun, Peng; Liu, Siyang; Gao, Peng; Huang, Haodong; Sun, Jing; Chen, Dan; He, Guangzhu; Huang, Weihua; Huang, Zheng; Li, Yue; Tellier, Laurent C A M; Liu, Xiao; Feng, Qiang; Xu, Xun; Zhang, Xiuqing; Bolund, Lars; Krogh, Anders; Kristiansen, Karsten; Drmanac, Radoje; Drmanac, Snezana; Nielsen, Rasmus; Li, Songgang; Wang, Jian; Yang, Huanming; Li, Yingrui; Wong, Gane Ka-Shu; Wang, Jun

    2015-06-01

    The human genome is diploid, and knowledge of the variants on each chromosome is important for the interpretation of genomic information. Here we report the assembly of a haplotype-resolved diploid genome without using a reference genome. Our pipeline relies on fosmid pooling together with whole-genome shotgun strategies, based solely on next-generation sequencing and hierarchical assembly methods. We applied our sequencing method to the genome of an Asian individual and generated a 5.15-Gb assembled genome with a haplotype N50 of 484 kb. Our analysis identified previously undetected indels and 7.49 Mb of novel coding sequences that could not be aligned to the human reference genome, which include at least six predicted genes. This haplotype-resolved genome represents the most complete de novo human genome assembly to date. Application of our approach to identify individual haplotype differences should aid in translating genotypes to phenotypes for the development of personalized medicine.

  11. De novo transcriptome and small RNA analysis of two Chinese willow cultivars reveals stress response genes in Salix matsudana.

    Directory of Open Access Journals (Sweden)

    Guodong Rao

    Full Text Available Salix matsudana Koidz. is a deciduous, rapidly growing, and drought resistant tree and is one of the most widely distributed and commonly cultivated willow species in China. Currently little transcriptomic and small RNAomic data are available to reveal the genes involved in stress resistance in S. matsudana. Here, we report the RNA-seq analysis results of both transcriptome and small RNAome data using Illumina deep sequencing of shoot tips from two willow variants (Salix matsudana and Salix matsudana Koidz. cultivar 'Tortuosa'). De novo gene assembly was used to generate the consensus transcriptome and small RNAome, which contained 106,403 unique transcripts with an average length of 944 bp and a total length of 100.45 MB, and 166 known miRNAs representing 35 miRNA families. Comparison of transcriptomes and small RNAomes combined with quantitative real-time PCR from the two Salix libraries revealed a total of 292 differentially expressed genes (DEGs) and 36 differentially expressed miRNAs (DEMs). Among the DEGs and DEMs, 196 genes and 24 miRNAs were up regulated, and 96 genes and 12 miRNAs were down regulated in S. matsudana. Functional analysis of DEGs and miRNA targets showed that many genes were involved in stress resistance in S. matsudana. Our global gene expression profiling presents a comprehensive view of the transcriptome and small RNAome which provide valuable information and sequence resources for uncovering the stress response genes in S. matsudana. Moreover the transcriptome and small RNAome data provide a basis for future study of genetic resistance in Salix.

  12. De Novo Transcriptomic Analysis of an Oleaginous Microalga: Pathway Description and Gene Discovery for Production of Next-Generation Biofuels

    Science.gov (United States)

    Wan, LingLin; Han, Juan; Sang, Min; Li, AiFen; Wu, Hong; Yin, ShunJi; Zhang, ChengWu

    2012-01-01

    Background Eustigmatos cf. polyphem is a yellow-green unicellular soil microalga belonging to the eustimatophyte with high biomass and considerable production of triacylglycerols (TAGs) for biofuels, which is thus referred to as an oleaginous microalga. The paucity of microalgae genome sequences, however, limits development of gene-based biofuel feedstock optimization studies. Here we describe the sequencing and de novo transcriptome assembly for a non-model microalgae species, E. cf. polyphem, and identify pathways and genes of importance related to biofuel production. Results We performed the de novo assembly of E. cf. polyphem transcriptome using Illumina paired-end sequencing technology. In a single run, we produced 29,199,432 sequencing reads corresponding to 2.33 Gb total nucleotides. These reads were assembled into 75,632 unigenes with a mean size of 503 bp and an N50 of 663 bp, ranging from 100 bp to >3,000 bp. Assembled unigenes were subjected to BLAST similarity searches and annotated with Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) orthology identifiers. These analyses identified the majority of carbohydrate, fatty acids, TAG and carotenoids biosynthesis and catabolism pathways in E. cf. polyphem. Conclusions Our data provides the construction of metabolic pathways involved in the biosynthesis and catabolism of carbohydrate, fatty acids, TAG and carotenoids in E. cf. polyphem and provides a foundation for the molecular genetics and functional genomics required to direct metabolic engineering efforts that seek to enhance the quantity and character of microalgae-based biofuel feedstock. PMID:22536352
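
    The abstract reports both a mean unigene size (503 bp) and an N50 (663 bp). The difference between the two statistics can be shown with a minimal sketch; the function name `n50` and the example lengths are illustrative assumptions, not data from the study.

```python
def n50(lengths):
    """Return the N50 of a collection of contig/unigene lengths.

    N50 is the length L such that sequences of length >= L together
    contain at least half of all assembled bases.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:   # reached half of the total bases
            return length


if __name__ == "__main__":
    toy = [2, 2, 2, 3, 3, 4, 8, 8]   # made-up lengths
    print("mean:", sum(toy) / len(toy))
    print("N50:", n50(toy))
```

    Because N50 weights sequences by the bases they contain, a few long unigenes pull it above the mean, which is consistent with the N50 exceeding the mean size in the figures quoted above.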

  13. De novo transcriptomic analysis of an oleaginous microalga: pathway description and gene discovery for production of next-generation biofuels.

    Directory of Open Access Journals (Sweden)

    LingLin Wan

    Full Text Available Eustigmatos cf. polyphem is a yellow-green unicellular soil microalga belonging to the eustimatophyte with high biomass and considerable production of triacylglycerols (TAGs) for biofuels, which is thus referred to as an oleaginous microalga. The paucity of microalgae genome sequences, however, limits development of gene-based biofuel feedstock optimization studies. Here we describe the sequencing and de novo transcriptome assembly for a non-model microalgae species, E. cf. polyphem, and identify pathways and genes of importance related to biofuel production. We performed the de novo assembly of E. cf. polyphem transcriptome using Illumina paired-end sequencing technology. In a single run, we produced 29,199,432 sequencing reads corresponding to 2.33 Gb total nucleotides. These reads were assembled into 75,632 unigenes with a mean size of 503 bp and an N50 of 663 bp, ranging from 100 bp to >3,000 bp. Assembled unigenes were subjected to BLAST similarity searches and annotated with Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) orthology identifiers. These analyses identified the majority of carbohydrate, fatty acids, TAG and carotenoids biosynthesis and catabolism pathways in E. cf. polyphem. Our data provides the construction of metabolic pathways involved in the biosynthesis and catabolism of carbohydrate, fatty acids, TAG and carotenoids in E. cf. polyphem and provides a foundation for the molecular genetics and functional genomics required to direct metabolic engineering efforts that seek to enhance the quantity and character of microalgae-based biofuel feedstock.

  14. De novo Sequencing and Analysis of Lemongrass Transcriptome Provides First Insights into the Essential Oil Biosynthesis of Aromatic Grasses

    Directory of Open Access Journals (Sweden)

    Seema Meena

    2016-07-01

    Full Text Available Aromatic grasses of the genus Cymbopogon (Poaceae family) represent a unique group of plants that produce diverse composition of monoterpene rich essential oils, which have great value in flavour, fragrance, cosmetic and aromatherapy industries. Despite the commercial importance of these natural aromatic oils, their biosynthesis at the molecular level remains unexplored. As the first step towards understanding the essential oil biosynthesis, we performed de novo transcriptome assembly and analysis of C. flexuosus (lemongrass) by employing Illumina sequencing. Mining of transcriptome data and subsequent phylogenetic analysis led to identification of terpene synthases (TPS), pyrophosphatases (PPase), alcohol dehydrogenases (ADH), aldo-keto reductases (AKR), carotenoid cleavage dioxygenases (CCD), alcohol acetyltransferases (AAT) and aldehyde dehydrogenases (ALDH), which are potentially involved in essential oil biosynthesis. Comparative essential oil profiling and mRNA expression analysis in three Cymbopogon species (C. flexuosus, aldehyde type; C. martinii, alcohol type; and C. winterianus, intermediate type) with varying essential oil composition indicated the involvement of identified candidate genes in the formation of alcohols, aldehydes and acetates. Molecular modeling and docking further supported the role of identified enzymes in aroma formation in Cymbopogon. Also, simple sequence repeats (SSRs) were found in the transcriptome with many linked to terpene pathway genes including the genes potentially involved in aroma biosynthesis. This work provides the first insights into the essential oil biosynthesis of aromatic grasses, and the identified candidate genes and markers can be a great resource for biotechnological and molecular breeding approaches to modulate the essential oil composition.

  15. De Novo Sequencing and Analysis of Lemongrass Transcriptome Provide First Insights into the Essential Oil Biosynthesis of Aromatic Grasses.

    Science.gov (United States)

    Meena, Seema; Kumar, Sarma R; Venkata Rao, D K; Dwivedi, Varun; Shilpashree, H B; Rastogi, Shubhra; Shasany, Ajit K; Nagegowda, Dinesh A

    2016-01-01

    Aromatic grasses of the genus Cymbopogon (Poaceae family) represent unique group of plants that produce diverse composition of monoterpene rich essential oils, which have great value in flavor, fragrance, cosmetic, and aromatherapy industries. Despite the commercial importance of these natural aromatic oils, their biosynthesis at the molecular level remains unexplored. As the first step toward understanding the essential oil biosynthesis, we performed de novo transcriptome assembly and analysis of C. flexuosus (lemongrass) by employing Illumina sequencing. Mining of transcriptome data and subsequent phylogenetic analysis led to identification of terpene synthases, pyrophosphatases, alcohol dehydrogenases, aldo-keto reductases, carotenoid cleavage dioxygenases, alcohol acetyltransferases, and aldehyde dehydrogenases, which are potentially involved in essential oil biosynthesis. Comparative essential oil profiling and mRNA expression analysis in three Cymbopogon species (C. flexuosus, aldehyde type; C. martinii, alcohol type; and C. winterianus, intermediate type) with varying essential oil composition indicated the involvement of identified candidate genes in the formation of alcohols, aldehydes, and acetates. Molecular modeling and docking further supported the role of identified protein sequences in aroma formation in Cymbopogon. Also, simple sequence repeats were found in the transcriptome with many linked to terpene pathway genes including the genes potentially involved in aroma biosynthesis. This work provides the first insights into the essential oil biosynthesis of aromatic grasses, and the identified candidate genes and markers can be a great resource for biotechnological and molecular breeding approaches to modulate the essential oil composition.

  16. De Novo Sequencing and Analysis of Lemongrass Transcriptome Provide First Insights into the Essential Oil Biosynthesis of Aromatic Grasses

    Science.gov (United States)

    Meena, Seema; Kumar, Sarma R.; Venkata Rao, D. K.; Dwivedi, Varun; Shilpashree, H. B.; Rastogi, Shubhra; Shasany, Ajit K.; Nagegowda, Dinesh A.

    2016-01-01

    Aromatic grasses of the genus Cymbopogon (Poaceae family) represent unique group of plants that produce diverse composition of monoterpene rich essential oils, which have great value in flavor, fragrance, cosmetic, and aromatherapy industries. Despite the commercial importance of these natural aromatic oils, their biosynthesis at the molecular level remains unexplored. As the first step toward understanding the essential oil biosynthesis, we performed de novo transcriptome assembly and analysis of C. flexuosus (lemongrass) by employing Illumina sequencing. Mining of transcriptome data and subsequent phylogenetic analysis led to identification of terpene synthases, pyrophosphatases, alcohol dehydrogenases, aldo-keto reductases, carotenoid cleavage dioxygenases, alcohol acetyltransferases, and aldehyde dehydrogenases, which are potentially involved in essential oil biosynthesis. Comparative essential oil profiling and mRNA expression analysis in three Cymbopogon species (C. flexuosus, aldehyde type; C. martinii, alcohol type; and C. winterianus, intermediate type) with varying essential oil composition indicated the involvement of identified candidate genes in the formation of alcohols, aldehydes, and acetates. Molecular modeling and docking further supported the role of identified protein sequences in aroma formation in Cymbopogon. Also, simple sequence repeats were found in the transcriptome with many linked to terpene pathway genes including the genes potentially involved in aroma biosynthesis. This work provides the first insights into the essential oil biosynthesis of aromatic grasses, and the identified candidate genes and markers can be a great resource for biotechnological and molecular breeding approaches to modulate the essential oil composition. PMID:27516768

  17. Sequencing and de novo assembly of 150 genomes from Denmark as a population reference

    DEFF Research Database (Denmark)

    Maretty, Lasse; Jensen, Jacob Malte; Petersen, Bent

    2017-01-01

    or by performing local assembly. However, these approaches are biased against discovery of structural variants and variation in the more complex parts of the genome. Hence, large-scale de novo assembly is needed. Here we show that it is possible to construct excellent de novo assemblies from high......-coverage sequencing with mate-pair libraries extending up to 20 kilobases. We report de novo assemblies of 150 individuals (50 trios) from the GenomeDenmark project. The quality of these assemblies is similar to those obtained using the more expensive long-read technology. We use the assemblies to identify a rich set...

  18. De novo assembly of Phlomis purpurea after challenging with Phytophthora cinnamomi.

    Science.gov (United States)

    Baldé, Aladje; Neves, Dina; García-Breijo, Francisco J; Pais, Maria Salomé; Cravador, Alfredo

    2017-09-06

    Phlomis plants are a source of biologically active substances with potential applications in the control of phytopathogens. Phlomis purpurea (Lamiaceae) is autochthonous to the southern Iberian Peninsula and Morocco and was found to be resistant to Phytophthora cinnamomi. Phlomis purpurea has shown an antagonistic effect in the rhizosphere of Quercus suber and Q. ilex against P. cinnamomi. Phlomis purpurea roots produce bioactive compounds exhibiting antitumor and anti-Phytophthora activities with potential to protect susceptible plants. Although these important capacities of P. purpurea have been demonstrated, there is no transcriptomic or genomic information available in public databases that could bring insights on the genes underlying this anti-oomycete activity. Using Illumina technology we obtained a de novo assembly of P. purpurea transcriptome and differential transcript abundance to identify putative defence related genes in challenged versus non-challenged plants. A total of 1,272,600,000 reads from 18 cDNA libraries were merged and assembled into 215,739 transcript contigs. BLASTX alignment to Nr NCBI database identified 124,386 unique annotated transcripts (57.7%) with significant hits. Functional annotation identified 83,550 out of 124,386 unique transcripts, which were mapped to 141 pathways. 39% of unigenes were assigned GO terms. Their functions cover biological processes, cellular component and molecular functions. Genes associated with response to stimuli, cellular and primary metabolic processes, catalytic and transporter functions were among those identified. Differential transcript abundance analysis using DESeq revealed significant differences among libraries depending on post-challenge times. Comparative cyto-histological studies of P. purpurea roots challenged with P. cinnamomi zoospores and controls revealed specific morphological features (exodermal strips and epi-cuticular layer), that may provide a constitutive efficient barrier against

  19. De novo transcriptome sequencing of the Octopus vulgaris hemocytes using Illumina RNA-Seq technology: response to the infection by the gastrointestinal parasite Aggregata octopiana.

    Science.gov (United States)

    Castellanos-Martínez, Sheila; Arteta, David; Catarino, Susana; Gestal, Camino

    2014-01-01

    Octopus vulgaris is a highly valuable species of great commercial interest and excellent candidate for aquaculture diversification; however, the octopus' well-being is impaired by pathogens, of which the gastrointestinal coccidian parasite Aggregata octopiana is one of the most important. The knowledge of the molecular mechanisms of the immune response in cephalopods, especially in octopus is scarce. The transcriptome of the hemocytes of O. vulgaris was de novo sequenced using the high-throughput paired-end Illumina technology to identify genes involved in immune defense and to understand the molecular basis of octopus tolerance/resistance to coccidiosis. A bi-directional mRNA library was constructed from hemocytes of two groups of octopus according to the infection by A. octopiana, sick octopus, suffering coccidiosis, and healthy octopus, and reads were de novo assembled together. The differential expression of transcripts was analysed using the general assembly as a reference for mapping the reads from each condition. After sequencing, a total of 75,571,280 high quality reads were obtained from the sick octopus group and 74,731,646 from the healthy group. The general transcriptome of the O. vulgaris hemocytes was assembled in 254,506 contigs. A total of 48,225 contigs were successfully identified, and 538 transcripts exhibited differential expression between groups of infection. The general transcriptome revealed genes involved in pathways like NF-kB, TLR and Complement. Differential expression of TLR-2, PGRP, C1q and PRDX genes due to infection was validated using RT-qPCR. In sick octopuses, only TLR-2 was up-regulated in hemocytes, but all of them were up-regulated in caecum and gills. The transcriptome reported here de novo establishes the first molecular clues to understand how the octopus immune system works and interacts with a highly pathogenic coccidian. 
The data provided here will contribute to identification of biomarkers for octopus resistance against

  20. De novo transcriptome profiling of cold-stressed siliques during pod filling stages of Indian mustard (Brassica juncea L.)

    Directory of Open Access Journals (Sweden)

    Somya Sinha

    2015-10-01

    Full Text Available Low temperature is a major abiotic stress that impedes plant growth and development. Brassica juncea is an economically important oil seed crop and is sensitive to freezing stress during pod filling subsequently leading to abortion of seeds. To understand the cold stress mediated global perturbations in gene expression, whole transcriptome of B. juncea siliques that were exposed to sub-optimal temperature was sequenced. Manually self-pollinated siliques at different stages of development were subjected to either short (6 h) or long (12 h) durations of chilling stress followed by construction of RNA-seq libraries and deep sequencing using Illumina’s NGS platform. De-novo assembly of B. juncea transcriptome resulted in 133641 transcripts, whose combined length was 117 Mb and N50 value was 1428 bp. We identified 13342 differentially regulated transcripts by pair-wise comparison of 18 transcriptome libraries. Hierarchical clustering of these differentially expressed transcripts along with Spearman correlation analysis identified two major clusters representing early (5-15 DAP) and late stages (20-30 DAP) of silique development. Detailed analysis led to the discovery of two gene expression clusters whose transcripts were inducible at both durations of the cold stress irrespective of the developmental stages. We further explored the expression patterns of gene families encoding for transcription factors (TFs), transcription regulators (TRs) and kinases, and found that cold stress induced protein kinases specifically during early silique development. We validated the digital gene expression profiles of selected transcripts by qPCR and found a high degree of concordance between the two analyses. To our knowledge this is the first report of transcriptome sequencing of cold-stressed B. juncea siliques. The data generated in this study would be a valuable resource for not only understanding the cold stress signaling pathway but also for introducing cold

  1. Improving transcriptome assembly through error correction of high-throughput sequence reads

    Directory of Open Access Journals (Sweden)

    Matthew D. MacManes

    2013-07-01

    Full Text Available The study of functional genomics, particularly in non-model organisms, has been dramatically improved over the last few years by the use of transcriptomes and RNAseq. While these studies are potentially extremely powerful, a computationally intensive procedure, the de novo construction of a reference transcriptome must be completed as a prerequisite to further analyses. The accurate reference is critically important as all downstream steps, including estimating transcript abundance are critically dependent on the construction of an accurate reference. Though a substantial amount of research has been done on assembly, only recently have the pre-assembly procedures been studied in detail. Specifically, several stand-alone error correction modules have been reported on and, while they have shown to be effective in reducing errors at the level of sequencing reads, how error correction impacts assembly accuracy is largely unknown. Here, we show via use of a simulated and empiric dataset, that applying error correction to sequencing reads has significant positive effects on assembly accuracy, and should be applied to all datasets. A complete collection of commands which will allow for the production of Reptile corrected reads is available at https://github.com/macmanes/error_correction/tree/master/scripts and as File S1.

  2. De Novo Transcriptome Sequencing of Desert Herbaceous Achnatherum splendens (Achnatherum) Seedlings and Identification of Salt Tolerance Genes

    Directory of Open Access Journals (Sweden)

    Jiangtao Liu

    2016-03-01

    Full Text Available Achnatherum splendens is an important forage herb in Northwestern China. It has a high tolerance to salinity and is, thus, considered one of the most important constructive plants in saline and alkaline areas of land in Northwest China. However, the mechanisms of salt stress tolerance in A. splendens remain unknown. Next-generation sequencing (NGS) technologies can be used for global gene expression profiling. In this study, we examined sequence and transcript abundance data for the root/leaf transcriptome of A. splendens obtained using an Illumina HiSeq 2500. Over 35 million clean reads were obtained from the leaf and root libraries. All of the RNA sequencing (RNA-seq) reads were assembled de novo into a total of 126,235 unigenes and 36,511 coding DNA sequences (CDS). We further identified 1663 differentially-expressed genes (DEGs) between the salt stress treatment and control. Functional annotation of the DEGs by gene ontology (GO), using Arabidopsis and rice as references, revealed enrichment of salt stress-related GO categories, including “oxidation reduction”, “transcription factor activity”, and “ion channel transporter”. Thus, this global transcriptome analysis of A. splendens has provided an important genetic resource for the study of salt tolerance in this halophyte. The identified sequences and their putative functional data will facilitate future investigations of the tolerance of Achnatherum species to various types of abiotic stress.

  3. De Novo Transcriptome Sequencing of Olea europaea L. to Identify Genes Involved in the Development of the Pollen Tube.

    Science.gov (United States)

    Iaria, Domenico; Chiappetta, Adriana; Muzzalupo, Innocenzo

    2016-01-01

    In olive (Olea europaea L.), the processes controlling self-incompatibility are still unclear and the molecular bases underlying this process are still not fully characterized. In order to determine compatibility relationships, using next-generation sequencing techniques and a de novo transcriptome assembly strategy, we show that pollen tubes from different olive plants, grown in vitro in a medium containing its own pistil and in combination pollen/pistil from self-sterile and self-fertile cultivars, have a distinct gene expression profile and many of the differentially expressed sequences between the samples fall within gene families involved in the development of the pollen tube, such as lipase, carboxylesterase, pectinesterase, pectin methylesterase, and callose synthase. Moreover, different genes involved in signal transduction, transcription, and growth are overrepresented. The analysis also allowed us to identify members of the actin, actin depolymerization factor and fibrin gene families and a member of the Ca(2+) binding gene family related to the development and polarization of the pollen apical tip. The whole transcriptomic analysis, through the identification of the differentially expressed transcripts set and an extended functional annotation analysis, will lead to a better understanding of the mechanisms of pollen germination and pollen tube growth in the olive.

  4. Transcriptome survey of Patagonian southern beech Nothofagus nervosa (= N. Alpina): assembly, annotation and molecular marker discovery

    Directory of Open Access Journals (Sweden)

    Torales Susana L

    2012-07-01

    Full Text Available Abstract Background Nothofagus nervosa is one of the most emblematic native tree species of Patagonian temperate forests. Here, the shotgun RNA-sequencing (RNA-Seq) of the transcriptome of N. nervosa, including de novo assembly, functional annotation, and in silico discovery of potential molecular markers to support population and association genetics studies, are described. Results Pyrosequencing of a young leaf cDNA library generated a total of 111,814 high quality reads, with an average length of 447 bp. De novo assembly using Newbler resulted in 3,005 tentative isotigs (including alternative transcripts). The non-assembled sequences (singletons) were clustered with CD-HIT-454 to identify natural and artificial duplicates from pyrosequencing reads, leading to 21,881 unique singletons. 15,497 out of 24,886 non-redundant sequences, or unigenes, were successfully annotated against a plant protein database. A substantial number of simple sequence repeat markers (SSRs) were discovered in the assembled and annotated sequences. More than 40% of the SSR sequences were inside ORF sequences. To confirm the validity of these predicted markers, a subset of 73 SSRs selected through functional annotation evidence were successfully amplified from six seedling DNA samples, of which 14 were polymorphic. Conclusions This paper is the first report that shows a highly precise representation of the mRNA diversity present in young leaves of a native South American tree, N. nervosa, as well as its in silico deduced putative functionality. The reported Nothofagus transcriptome sequences represent a unique resource for genetic studies and provide a tool to discover genes of interest and genetic markers that will greatly aid questions involving evolution, ecology, and conservation using genetic and genomic approaches in the genus.

  5. De novo transcriptome sequencing of Isaria cateniannulata and comparative analysis of gene expression in response to heat and cold stresses.

    Directory of Open Access Journals (Sweden)

    Dingfeng Wang

    Full Text Available Isaria cateniannulata is a very important and virulent entomopathogenic fungus that infects many insect pest species. Although I. cateniannulata is commonly exposed to extreme environmental temperature conditions, little is known about its molecular response mechanism to temperature stress. Here, we sequenced and de novo assembled the transcriptome of I. cateniannulata in response to high and low temperature stresses using Illumina RNA-Seq technology. Our assembly encompassed 17,514 unigenes (mean length = 1,197 bp), in which 11,445 unigenes (65.34%) showed significant similarities to known sequences in the NCBI non-redundant protein sequences (Nr) database. Using digital gene expression analysis, 4,483 differentially expressed genes (DEGs) were identified after heat treatment, including 2,905 up-regulated genes and 1,578 down-regulated genes. Under cold stress, 1,927 DEGs were identified, including 1,245 up-regulated genes and 682 down-regulated genes. The expression patterns of 18 randomly selected candidate DEGs resulting from quantitative real-time PCR (qRT-PCR) were consistent with their transcriptome analysis results. Although DEGs were involved in many pathways, we focused on the genes that were involved in endocytosis: In heat stress, the pathway of clathrin-dependent endocytosis (CDE) was active; however at low temperature stresses, the pathway of clathrin-independent endocytosis (CIE) was active. Besides, four categories of DEGs acting as temperature sensors were observed, including cell-wall-major-components-metabolism-related (CWMCMR) genes, heat shock protein (Hsp) genes, intracellular-compatible-solutes-metabolism-related (ICSMR) genes and glutathione S-transferase (GST). These results enhance our understanding of the molecular mechanisms of I. cateniannulata in response to temperature stresses and provide a valuable resource for the future investigations.

  6. Sequencing and de novo assembly of 150 genomes from Denmark as a population reference

    DEFF Research Database (Denmark)

    Maretty, Lasse; Jensen, Jacob Malte; Petersen, Bent

    2017-01-01

    Hundreds of thousands of human genomes are now being sequenced to characterize genetic variation and use this information to augment association mapping studies of complex disorders and other phenotypic traits. Genetic variation is identified mainly by mapping short reads to the reference genome or by performing local assembly. However, these approaches are biased against discovery of structural variants and variation in the more complex parts of the genome. Hence, large-scale de novo assembly is needed. Here we show that it is possible to construct excellent de novo assemblies from high-coverage sequencing with mate-pair libraries extending up to 20 kilobases. We report de novo assemblies of 150 individuals (50 trios) from the GenomeDenmark project. The quality of these assemblies is similar to those obtained using the more expensive long-read technology. We use the assemblies to identify a rich set...

  8. Composite transcriptome assembly of RNA-seq data in a sheep model for delayed bone healing.

    Science.gov (United States)

    Jäger, Marten; Ott, Claus-Eric; Grünhagen, Johannes; Hecht, Jochen; Schell, Hanna; Mundlos, Stefan; Duda, Georg N; Robinson, Peter N; Lienau, Jasmin

    2011-03-24

    The sheep is an important model organism for many types of medically relevant research, but molecular genetic experiments in the sheep have been limited by the lack of knowledge about ovine gene sequences. Prior to our study, mRNA sequences for only 1,556 partial or complete ovine genes were publicly available. Therefore, we developed a composite de novo transcriptome assembly method for next-generation sequence data to combine known ovine mRNA and EST sequences, mRNA sequences from mouse and cow, and sequences assembled de novo from short read RNA-Seq data into a composite reference transcriptome, and identified transcripts from over 12 thousand previously undescribed ovine genes. Gene expression analysis based on these data revealed substantially different expression profiles in standard versus delayed bone healing in an ovine tibial osteotomy model. Hundreds of transcripts were differentially expressed between standard and delayed healing and between the time points of the standard and delayed healing groups. We used the sheep sequences to design quantitative RT-PCR assays with which we validated the differential expression of 26 genes that had been identified by RNA-seq analysis. A number of clusters of characteristic expression profiles could be identified, some of which showed striking differences between the standard and delayed healing groups. Gene Ontology (GO) analysis showed that the differentially expressed genes were enriched in terms including extracellular matrix, cartilage development, contractile fiber, and chemokine activity. Our results provide a first atlas of gene expression profiles and differentially expressed genes in standard and delayed bone healing in a large-animal model and provide a number of clues as to the shifts in gene expression that underlie delayed bone healing. In the course of our study, we identified transcripts of 13,987 ovine genes, including 12,431 genes for which no sequence information was previously available. This

  9. Identification of lignin genes and regulatory sequences involved in secondary cell wall formation in Acacia auriculiformis and Acacia mangium via de novo transcriptome sequencing

    Directory of Open Access Journals (Sweden)

    Cannon Charles H

    2011-07-01

    Full Text Available Abstract Background Acacia auriculiformis × Acacia mangium hybrids are commercially important trees for the timber and pulp industry in Southeast Asia. Increasing pulp yield while reducing pulping costs are major objectives of tree breeding programs. The general monolignol biosynthesis and secondary cell wall formation pathways are well-characterized but genes in these pathways are poorly characterized in Acacia hybrids. RNA-seq on short-read platforms is a rapid approach for obtaining comprehensive transcriptomic data and to discover informative sequence variants. Results We sequenced transcriptomes of A. auriculiformis and A. mangium from non-normalized cDNA libraries synthesized from pooled young stem and inner bark tissues using paired-end libraries and a single lane of an Illumina GAII machine. De novo assembly produced a total of 42,217 and 35,759 contigs with an average length of 496 bp and 498 bp for A. auriculiformis and A. mangium respectively. The assemblies of A. auriculiformis and A. mangium had a total length of 21,022,649 bp and 17,838,260 bp, respectively, with the largest contig 15,262 bp long. We detected all ten monolignol biosynthetic genes using Blastx and further analysis revealed 18 lignin isoforms for each species. We also identified five contigs homologous to R2R3-MYB proteins in other plant species that are involved in transcriptional regulation of secondary cell wall formation and lignin deposition. We searched the contigs against a public microRNA database and predicted the stem-loop structures of six highly conserved microRNA families (miR319, miR396, miR160, miR172, miR162 and miR168) and one legume-specific family (miR2086). Three microRNA target genes were predicted to be involved in wood formation and flavonoid biosynthesis. By using the assemblies as a reference, we discovered 16,648 and 9,335 high quality putative Single Nucleotide Polymorphisms (SNPs) in the transcriptomes of A. auriculiformis and A. mangium

  10. The de novo transcriptome and its analysis in the worldwide vegetable pest, Delia antiqua (Diptera: Anthomyiidae).

    Science.gov (United States)

    Zhang, Yu-Juan; Hao, Youjin; Si, Fengling; Ren, Shuang; Hu, Ganyu; Shen, Li; Chen, Bin

    2014-03-10

    The onion maggot Delia antiqua is a major insect pest of cultivated vegetables, especially the onion, and a good model to investigate the molecular mechanisms of diapause. To better understand the biology and diapause mechanism of the insect pest species, D. antiqua, the transcriptome was sequenced using Illumina paired-end sequencing technology. Approximately 54 million reads were obtained, trimmed, and assembled into 29,659 unigenes, with an average length of 607 bp and an N50 of 818 bp. Among these unigenes, 21,605 (72.8%) were annotated in the public databases. All unigenes were then compared against Drosophila melanogaster and Anopheles gambiae. Codon usage bias was analyzed and 332 simple sequence repeats (SSRs) were detected in this organism. These data represent the most comprehensive transcriptomic resource currently available for D. antiqua and will facilitate the study of genetics, genomics, diapause, and further pest control of D. antiqua. Copyright © 2014 Zhang et al.
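
    Several records in this listing report an N50 next to the mean unigene length (here 818 bp against a 607 bp mean). As a minimal sketch of how that statistic is usually defined, the Python function below sorts the sequence lengths from longest to shortest and returns the length at which the running total first reaches half of the assembly's total bases. The function name `n50` and the toy lengths are illustrative assumptions; none of the studies listed here publish this code.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths (in bp).

    N50 is the length L such that sequences of length >= L together
    contain at least half of all bases in the assembly.
    """
    if not lengths:
        raise ValueError("need at least one sequence length")
    ordered = sorted(lengths, reverse=True)  # longest first
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


if __name__ == "__main__":
    # Hypothetical contig lengths, not data from any study in this listing.
    toy = [200, 300, 400, 500, 1500]
    print("mean:", sum(toy) / len(toy), "N50:", n50(toy))
```

    With a skewed length distribution such as `[200, 300, 400, 500, 1500]` the mean is 580 while `n50` returns 1500, which is one reason a reported N50 often sits above the mean length.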

  11. De novo transcriptome sequencing and comparative analysis of differentially expressed genes in Dryopteris fragrans under temperature stress

    International Nuclear Information System (INIS)

    Wang, W.Z.; Tong, W.S.; Gao, R.

    2016-01-01

    Dryopteris fragrans is a species of fern and contains flavonoid compounds with medicinal value. This study examines how temperature stress affects flavonoid synthesis in D. fragrans tissue culture seedlings at low (4 °C), high (35 °C) and moderate (25 °C) temperatures. Using Illumina HiSeq 2000 sequencing, 80.9 million raw sequence reads were de novo assembled into 66,716 non-redundant unigenes. 38,486 unigenes (57.7%) were annotated for their function. 13,973 unigenes and 29,598 unigenes were allocated to gene ontology (GO) and clusters of orthologous group (COG), respectively. 18,989 sequences mapped to 118 pathways in the Kyoto Encyclopedia of Genes and Genomes (KEGG) Pathway database, and 204 genes were involved in flavonoid biosynthesis, regulation and transport. 25,292 and 16,817 unigenes exhibited marked differential expression in response to temperature shifts of 25 °C to 4 °C and 25 °C to 35 °C, respectively. 4CL and CHS genes involved in flavonoid biosynthesis were tested, and the results suggested that they were responsible for biosynthesis of flavonoids. This study provides the first published data to describe the D. fragrans transcriptome and should accelerate understanding of flavonoid biosynthesis, regulation and transport mechanisms. Since most unigenes described here were successfully annotated, these results should facilitate future functional genomic understanding and research of D. fragrans. (author)

  12. De novo transcriptome sequencing and sequence analysis of the malaria vector Anopheles sinensis (Diptera: Culicidae)

    Science.gov (United States)

    2014-01-01

    Background Anopheles sinensis is the major malaria vector in China and Southeast Asia. Vector control is one of the most effective measures to prevent malaria transmission. However, there is little transcriptome information available for the malaria vector. To better understand the biological basis of malaria transmission and to develop novel and effective means of vector control, there is a need to build a transcriptome dataset for functional genomics analysis by large-scale RNA sequencing (RNA-seq). Methods To provide a more comprehensive and complete transcriptome of An. sinensis, eggs, larvae, pupae, male adults and female adults RNA were pooled together for cDNA preparation, sequenced using the Illumina paired-end sequencing technology and assembled into unigenes. These unigenes were then analyzed in their genome mapping, functional annotation, homology, codon usage bias and simple sequence repeats (SSRs). Results Approximately 51.6 million clean reads were obtained, trimmed, and assembled into 38,504 unigenes with an average length of 571 bp, an N50 of 711 bp, and an average GC content 51.26%. Among them, 98.4% of unigenes could be mapped onto the reference genome, and 69% of unigenes could be annotated with known biological functions. Homology analysis identified certain numbers of An. sinensis unigenes that showed homology or being putative 1:1 orthologues with genomes of other Dipteran species. Codon usage bias was analyzed and 1,904 SSRs were detected, which will provide effective molecular markers for the population genetics of this species. Conclusions Our data and analysis provide the most comprehensive transcriptomic resource and characteristics currently available for An. sinensis, and will facilitate genetic, genomic studies, and further vector control of An. sinensis. PMID:25000941

  13. De novo characterization of fall dormant and nondormant alfalfa (Medicago sativa L.) leaf transcriptome and identification of candidate genes related to fall dormancy.

    Science.gov (United States)

    Zhang, Senhao; Shi, Yinghua; Cheng, Ningning; Du, Hongqi; Fan, Wenna; Wang, Chengzhang

    2015-01-01

    Alfalfa (Medicago sativa L.) is one of the most widely cultivated perennial forage legumes worldwide. Fall dormancy is an adaptive character related to the biomass production and winter survival in alfalfa. The physiological, biochemical and molecular mechanisms causing fall dormancy and the related genes have not been well studied. In this study, we sequenced two standard varieties of alfalfa (dormant and non-dormant) at two time points and generated approximately 160 million high quality paired-end sequence reads using sequencing by synthesis (SBS) technology. The de novo transcriptome assembly generated a set of 192,875 transcripts with an average length of 856 bp representing about 165.1 Mb of the alfalfa leaf transcriptome. After assembly, 111,062 (57.6%) transcripts were annotated against the NCBI non-redundant database. A total of 30,165 (15.6%) transcripts were mapped to 323 Kyoto Encyclopedia of Genes and Genomes pathways. We also identified 41,973 simple sequence repeats, which can be used to generate markers for alfalfa, and 1,541 transcription factors were identified across 1,350 transcripts. Gene expression comparisons between dormant and non-dormant alfalfa at different time points were performed, and we identified several differentially expressed genes potentially related to fall dormancy. The Gene Ontology and pathway information was also identified. We sequenced and assembled the leaf transcriptome of alfalfa related to fall dormancy, and also identified some genes of interest involved in the fall dormancy mechanism. Thus, our research focused on studying fall dormancy in alfalfa through transcriptome sequencing. The sequencing and gene expression data generated in this study may be used further to elucidate the complete mechanisms governing fall dormancy in alfalfa.

  14. De novo transcriptome sequencing and analysis of the juvenile and adult stages of Fasciola gigantica.

    Science.gov (United States)

    Zhang, Xiao-Xuan; Cong, Wei; Elsheikha, Hany M; Liu, Guo-Hua; Ma, Jian-Gang; Huang, Wei-Yi; Zhao, Quan; Zhu, Xing-Quan

    2017-07-01

    Fasciola gigantica is regarded as the major liver fluke causing fasciolosis in livestock in tropical countries. Despite the significant economic and public health impacts of F. gigantica there are few studies on the pathogenesis of this parasite and our understanding is further limited by the lack of genome and transcriptome information. In this study, de novo Illumina RNA sequencing (RNA-seq) was performed to obtain a comprehensive transcriptome profile of the juvenile (42 days post infection) and adult stages of F. gigantica. A total of 49,720 unigenes were produced from juvenile and adult stages of F. gigantica, with an average length of 1286 nucleotides (nt) and N50 of 2076 nt. A total of 27,862 (56.03%) unigenes were annotated by BLAST similarity searches against the NCBI non-redundant protein database. Because F. gigantica needs to feed and/or digest host tissues, some proteases (including cysteine proteases and aspartic proteases), which play a role in the degradation of host tissues (protein), have been paid more attention in the present study. A total of 6511 distinct genes were found differentially expressed between juveniles and adults, of which 3993 genes were up-regulated and 2518 genes were down-regulated in adults versus juveniles, respectively. Moreover, stage-specific differentially expressed genes were identified in juvenile (17,009) and adult (6517) F. gigantica. The significantly divergent pathways of differentially expressed genes included cAMP signaling pathway (226; 4.12%), proteoglycans in cancer (256; 4.67%) and focal adhesion (199; 3.63%). The transcription pattern also revealed two egg-laying-associated pathways: cGMP-PKG signaling pathway and TGF-β signaling pathway. This study provides the first comparative transcriptomic data concerning juvenile and adult stages of F. gigantica that will be of great value for future research efforts into understanding parasite pathogenesis and developing vaccines against this important parasite

  15. Comparative transcriptome sequencing and de novo analysis of Vaccinium corymbosum during fruit and color development.

    Science.gov (United States)

    Li, Lingli; Zhang, Hehua; Liu, Zhongshuai; Cui, Xiaoyue; Zhang, Tong; Li, Yanfang; Zhang, Lingyun

    2016-10-12

    Blueberry is an economically important fruit crop in the Ericaceae family. The substantial quantities of flavonoids in blueberry have been implicated in a broad range of health benefits. However, the information regarding fruit development and flavonoid metabolites based on the transcriptome level is still limited. In the present study, transcriptome and gene expression profiling over berry development, especially during color development, were initiated. A total of approximately 13.67 Gbp of data were obtained and assembled into 186,962 transcripts and 80,836 unigenes from three stages of blueberry fruit and color development. A large number of simple sequence repeats (SSRs) and candidate genes, which are potentially involved in plant development, metabolic and hormone pathways, were identified. A total of 6429 sequences containing 8796 SSRs were characterized from 15,457 unigenes and 1763 unigenes contained more than one SSR. The expression profiles of key genes involved in anthocyanin biosynthesis were also studied. In addition, a comparison between our dataset and other published results was carried out. Our high quality reads produced in this study are an important advancement and provide a new resource for the interpretation of high-throughput data for blueberry species whether regarding sequencing data depth or species extension. The use of this transcriptome data will serve as a valuable public information database for the studies of blueberry genome and would greatly boost the research of fruit and color development, flavonoid metabolisms and regulation and breeding of more healthful blueberries.
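
    SSR counts like the ones reported above come from scanning assembled sequences for short motifs repeated in tandem. The sketch below shows one simple way to do that with regular expressions in Python. It is not the tool used in this study (the abstract does not name one), and the minimum repeat counts in `MIN_REPEATS`, along with the names `find_ssrs` and `is_primitive`, are assumptions chosen for illustration. It only finds perfect repeats with motifs of 1 to 6 bp.

```python
import re

# Minimum number of tandem copies required per motif length (bp).
# Illustrative assumptions, not thresholds taken from any study in this listing.
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def is_primitive(motif):
    """True if the motif is not itself a repetition of a shorter unit (e.g. 'ATAT')."""
    n = len(motif)
    return not any(
        n % k == 0 and motif == motif[:k] * (n // k) for k in range(1, n)
    )


def find_ssrs(seq):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs in seq."""
    seq = seq.upper()
    hits = []
    for size, min_rep in MIN_REPEATS.items():
        # A motif of `size` bases followed by at least (min_rep - 1) more copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (size, min_rep - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            # Skip motifs such as 'AA' that are really a shorter repeat unit.
            if is_primitive(motif):
                hits.append((match.start(), motif, len(match.group(0)) // size))
    return sorted(hits)


if __name__ == "__main__":
    # Hypothetical sequence: an (AT)7 dinucleotide repeat and an (A)12 run.
    example = "GGC" + "AT" * 7 + "CCG" + "A" * 12 + "T"
    for start, motif, count in find_ssrs(example):
        print(start, motif, count)
```

    Filtering out non-primitive motifs keeps a long mononucleotide run from being counted again as a di- or trinucleotide repeat, which would otherwise inflate the SSR total.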

  16. Quantitative RNA-Seq analysis in non-model species: assessing transcriptome assemblies as a scaffold and the utility of evolutionary divergent genomic reference species

    Directory of Open Access Journals (Sweden)

    Hornett Emily A

    2012-08-01

    Full Text Available Abstract Background How well does RNA-Seq data perform for quantitative whole gene expression analysis in the absence of a genome? This is one unanswered question facing the rapidly growing number of researchers studying non-model species. Using Homo sapiens data and resources, we compared the direct mapping of sequencing reads to predicted genes from the genome with mapping to de novo transcriptomes assembled from RNA-Seq data. Gene coverage and expression analysis was further investigated in the non-model context by using increasingly divergent genomic reference species to group assembled contigs by unique genes. Results Eight transcriptome sets, composed of varying amounts of Illumina and 454 data, were assembled and assessed. Hybrid 454/Illumina assemblies had the highest transcriptome and individual gene coverage. Quantitative whole gene expression levels were highly similar between using a de novo hybrid assembly and the predicted genes as a scaffold, although mapping to the de novo transcriptome assembly provided data on fewer genes. Using non-target species as reference scaffolds does result in some loss of sequence and expression data, and bias and error increase with evolutionary distance. However, within a 100 million year window these effect sizes are relatively small. Conclusions Predicted gene sets from sequenced genomes of related species can provide a powerful method for grouping RNA-Seq reads and annotating contigs. Gene expression results can be produced that are similar to results obtained using gene models derived from a high quality genome, though biased towards conserved genes. Our results demonstrate the power and limitations of conducting RNA-Seq in non-model species.

  17. Sequencing and de novo analysis of a coral larval transcriptome using 454 GSFlx

    Directory of Open Access Journals (Sweden)

    Colbourne John K

    2009-05-01

    Full Text Available Abstract Background New methods are needed for genomic-scale analysis of emerging model organisms that exemplify important biological questions but lack fully sequenced genomes. For example, there is an urgent need to understand the potential for corals to adapt to climate change, but few molecular resources are available for studying these processes in reef-building corals. To facilitate genomics studies in corals and other non-model systems, we describe methods for transcriptome sequencing using 454, as well as strategies for assembling a useful catalog of genes from the output. We have applied these methods to sequence the transcriptome of planulae larvae from the coral Acropora millepora. Results More than 600,000 reads produced in a single 454 sequencing run were assembled into ~40,000 contigs with five-fold average sequencing coverage. Based on sequence similarity with known proteins, these analyses identified ~11,000 different genes expressed in a range of conditions including thermal stress and settlement induction. Assembled sequences were annotated with gene names, conserved domains, and Gene Ontology terms. Targeted searches using these annotations identified the majority of genes associated with essential metabolic pathways and conserved signaling pathways, as well as novel candidate genes for stress-related processes. Comparisons with the genome of the anemone Nematostella vectensis revealed ~8,500 pairs of orthologs and ~100 candidate coral-specific genes. More than 30,000 SNPs were detected in the coral sequences, and a subset of these validated by re-sequencing. Conclusion The methods described here for deep sequencing of the transcriptome should be widely applicable to generate catalogs of genes and genetic markers in emerging model organisms. Our data provide the most comprehensive sequence resource currently available for reef-building corals, and include an extensive collection of potential genetic markers for association and

  18. De novo transcriptome characterization and gene expression profiling of the desiccation tolerant moss Bryum argenteum following rehydration.

    Science.gov (United States)

    Gao, Bei; Zhang, Daoyuan; Li, Xiaoshuang; Yang, Honglan; Zhang, Yuanming; Wood, Andrew J

    2015-05-28

    The desiccation-tolerant moss Bryum argenteum is an important component of the Biological Soil Crusts (BSCs) found in the Gurbantunggut desert. Desiccation tolerance is defined as the ability to revive from the air dried state. To elucidate the molecular mechanisms related to desiccation tolerance, we employed RNA-Seq and digital gene expression (DGE) technologies to study the genome-wide expression profiles of the dehydration and rehydration processes in this important desert plant. We applied a two-step approach to investigate the gene expression profile upon rehydration in the moss Bryum argenteum using the Illumina HiSeq2000 sequencing platform. First, a total of 57,247 transcript assembly contigs (TACs) were obtained from 54.79 million reads by de novo assembly, with an average length of 863 bp and N50 of 1,372 bp. Among the reconstructed TACs, 36,916 (64.5%) revealed similarity with existing protein sequences in the public databases. 23,509 and 21,607 TACs were assigned GO and KEGG annotation information, respectively. Second, samples were taken from 3 hydration stages: desiccated (Dry), rehydrated 2 h (R2) and rehydrated 24 h (R24), and DEG libraries were constructed for Differentially Expressed Genes (DEGs) discovery. 4,081 and 6,709 DEGs were identified in R2 and R24, compared with Dry, respectively. Compared to the desiccated sample, up-regulated genes after two hours of hydration are primarily related to stress responses. GO function enrichment network, KEGG metabolic pathway and MapMan analyses support the idea of the rapid recovery of photosynthesis after 24 h of rehydration. We identified 770 transcription factors (TFs) which were classified into 50 TF families. 142 TF transcripts were up-regulated upon rehydration including 23 members of the ERF family. In this study, we constructed a pioneering, high-quality reference transcriptome in B. argenteum and generated three DGE libraries to elucidate the changes of gene expression upon rehydration. Expression

  19. Improved de novo genomic assembly for the domestic donkey

    DEFF Research Database (Denmark)

    Renaud, Gabriel; Petersen, Bent; Seguin-Orlando, Andaine

    2018-01-01

    Donkeys and horses share a common ancestor dating back to about 4 million years ago. Although a high-quality genome assembly at the chromosomal level is available for the horse, current assemblies available for the donkey are limited to moderately sized scaffolds. The absence of a better......-quality assembly for the donkey has hampered studies involving the characterization of patterns of genetic variation at the genome-wide scale. These range from the application of genomic tools to selective breeding and conservation to the more fundamental characterization of the genomic loci underlying speciation...... and domestication. We present a new high-quality donkey genome assembly obtained using the Chicago HiRise assembly technology, providing scaffolds of subchromosomal size. We make use of this new assembly to obtain more accurate measures of heterozygosity for equine species other than the horse, both genome...

  20. De novo transcriptome sequencing and comparative analysis of midgut tissues of four non-model insects pertaining to Hemiptera, Coleoptera, Diptera and Lepidoptera.

    Science.gov (United States)

    Gazara, Rajesh K; Cardoso, Christiane; Bellieny-Rabelo, Daniel; Ferreira, Clélia; Terra, Walter R; Venancio, Thiago M

    2017-09-05

    Despite the great morphological diversity of insects, there is a regularity in their digestive functions, which is apparently related to their physiology. In the present work we report the de novo midgut transcriptomes of four non-model insects from four distinct orders: Spodoptera frugiperda (Lepidoptera), Musca domestica (Diptera), Tenebrio molitor (Coleoptera) and Dysdercus peruvianus (Hemiptera). We employed a computational strategy to merge assemblies obtained with two different algorithms, which substantially increased the quality of the final transcriptomes. Unigenes were annotated and analyzed using the eggNOG database, which allowed us to assign some level of functional and evolutionary information to 79.7% to 93.1% of the transcriptomes. We found interesting transcriptional patterns, such as: i) the intense use of lysozymes in digestive functions of M. domestica larvae, which are streamlined and adapted to feed on bacteria; ii) the up-regulation of orthologous UDP-glycosyl transferase and cytochrome P450 genes in the whole midguts different species, supporting the existence of an ancient defense frontline to counter xenobiotics; iii) evidence supporting roles for juvenile hormone binding proteins in the midgut physiology, probably as a way to activate genes that help fight anti-nutritional substances (e.g. protease inhibitors). The results presented here shed light on the digestive and structural properties of the digestive systems of these distantly related species. Furthermore, the produced datasets will also be useful for scientists studying these insects. Copyright © 2017. Published by Elsevier B.V.

  1. Transcriptome sequencing and de novo analysis of a cytoplasmic male sterile line and its near-isogenic restorer line in chili pepper (Capsicum annuum L.).

    Directory of Open Access Journals (Sweden)

    Chen Liu

    Full Text Available BACKGROUND: The use of cytoplasmic male sterility (CMS) in F1 hybrid seed production of chili pepper is increasingly popular. However, the molecular mechanisms of cytoplasmic male sterility and fertility restoration remain poorly understood due to limited transcriptomic and genomic data. Therefore, we analyzed the difference between a CMS line 121A and its near-isogenic restorer line 121C at the transcriptome level using next generation sequencing technology (NGS), aiming to find critical genes and pathways associated with male sterility. RESULTS: We generated approximately 53 million sequencing reads and assembled them de novo, yielding 85,144 high quality unigenes with an average length of 643 bp. Among these unigenes, 27,191 were identified as putative homologs of annotated sequences in the public protein databases; 4,326 and 7,061 unigenes were found to be highly abundant in lines 121A and 121C, respectively. Many of the differentially expressed unigenes represent a set of potential candidate genes associated with the formation or abortion of pollen. CONCLUSIONS: Our study profiled anther transcriptomes of a chili pepper CMS line and its restorer line. The results shed light on the occurrence and recovery of the disturbances in nuclear-mitochondrial interaction and provide clues for further investigations.

  2. Meta-IDBA: a de Novo assembler for metagenomic data.

    Science.gov (United States)

    Peng, Yu; Leung, Henry C M; Yiu, S M; Chin, Francis Y L

    2011-07-01

    Next-generation sequencing techniques allow us to generate reads from a microbial environment in order to analyze the microbial community. However, assembling a set of mixed reads from different species to form contigs is a bottleneck of metagenomic research. Although there are many assemblers for assembling reads from a single genome, there are no assemblers for assembling reads in metagenomic data without reference genome sequences. Moreover, the performances of these assemblers on metagenomic data are far from satisfactory, because of the existence of common regions in the genomes of subspecies and species, which make the assembly problem much more complicated. We introduce the Meta-IDBA algorithm for assembling reads in metagenomic data, which contain multiple genomes from different species. There are two core steps in Meta-IDBA. It first tries to partition the de Bruijn graph into isolated components of different species based on an important observation. Then, for each component, it captures the slight variants of the genomes of subspecies from the same species by multiple alignments and represents the genome of one species, using a consensus sequence. Comparison of the performances of Meta-IDBA and existing assemblers, such as Velvet and Abyss, for different metagenomic datasets shows that Meta-IDBA can reconstruct longer contigs with similar accuracy. Meta-IDBA toolkit is available at our website http://www.cs.hku.hk/~alse/metaidba. Contact: chin@cs.hku.hk.
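
    The first core step described above, splitting a de Bruijn graph into isolated components, can be illustrated with a minimal sketch. It is not Meta-IDBA's implementation: the abstract does not spell out the observation its partitioning relies on, so plain weakly connected components are used here as a stand-in, and the function names and toy reads are hypothetical.

```python
from collections import defaultdict


def de_bruijn_edges(reads, k):
    """Yield (prefix, suffix) edges between (k-1)-mers for every k-mer in the reads."""
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            yield kmer[:-1], kmer[1:]


def connected_components(reads, k):
    """Group (k-1)-mer nodes into weakly connected components (stand-in partition)."""
    neighbours = defaultdict(set)
    for a, b in de_bruijn_edges(reads, k):
        neighbours[a].add(b)
        neighbours[b].add(a)

    seen, components = set(), []
    for start in neighbours:
        if start in seen:
            continue
        # Depth-first traversal collecting every node reachable from `start`.
        stack, component = [start], set()
        seen.add(start)
        while stack:
            node = stack.pop()
            component.add(node)
            for nxt in neighbours[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        components.append(component)
    return components


# Toy reads: the first two overlap, the third shares no k-mer with them.
toy_reads = ["ACGTAC", "GTACCA", "GGCCGG"]
print(len(connected_components(toy_reads, k=4)))
```

    In this toy input the two overlapping reads fall into one component and the unrelated read into another; real metagenomic graphs need further handling of regions shared between species, which this sketch ignores.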

  3. Identifying wrong assemblies in de novo short read primary

    Indian Academy of Sciences (India)

    Finally, some mis-assembly detecting tools have been evaluated for their ability to detect the wrongly assembled primary contigs, suggesting a lot of scope for improvement in this area. The present work also proposes a simple unsupervised learning-based novel approach to identify mis-assemblies in the contigs which was ...

  4. Genome sequencing of bacteria: sequencing, de novo assembly and rapid analysis using open source tools.

    Science.gov (United States)

    Kisand, Veljo; Lettieri, Teresa

    2013-04-01

    De novo genome sequencing of previously uncharacterized microorganisms has the potential to open up new frontiers in microbial genomics by providing insight into both functional capabilities and biodiversity. Until recently, Roche 454 pyrosequencing was the NGS method of choice for de novo assembly because it generates hundreds of thousands of long reads (tools for processing NGS data are increasingly free and open source and are often adopted for both their high quality and role in promoting academic freedom. The error rate of pyrosequencing the Alcanivorax borkumensis genome was such that thousands of insertions and deletions were artificially introduced into the finished genome. Despite a high coverage (~30 fold), it did not allow the reference genome to be fully mapped. Reads from regions with errors had low quality, low coverage, or were missing. The main defect of the reference mapping was the introduction of artificial indels into contigs through lower than 100% consensus and distracting gene calling due to artificial stop codons. No assembler was able to perform de novo assembly comparable to reference mapping. Automated annotation tools performed similarly on reference mapped and de novo draft genomes, and annotated most CDSs in the de novo assembled draft genomes. Free and open source software (FOSS) tools for assembly and annotation of NGS data are being developed rapidly to provide accurate results with less computational effort. Usability is not high priority and these tools currently do not allow the data to be processed without manual intervention. Despite this, genome assemblers now readily assemble medium short reads into long contigs (>97-98% genome coverage). A notable gap in pyrosequencing technology is the quality of base pair calling and conflicting base pairs between single reads at the same nucleotide position. Regardless, using draft whole genomes that are not finished and remain fragmented into tens of contigs allows one to characterize

  5. Improved de novo genomic assembly for the domestic donkey

    Science.gov (United States)

    Newton, Richard; Paillot, Romain; Bryant, Neil; Vaudin, Mark

    2018-01-01

    Donkeys and horses share a common ancestor dating back to about 4 million years ago. Although a high-quality genome assembly at the chromosomal level is available for the horse, current assemblies available for the donkey are limited to moderately sized scaffolds. The absence of a better-quality assembly for the donkey has hampered studies involving the characterization of patterns of genetic variation at the genome-wide scale. These range from the application of genomic tools to selective breeding and conservation to the more fundamental characterization of the genomic loci underlying speciation and domestication. We present a new high-quality donkey genome assembly obtained using the Chicago HiRise assembly technology, providing scaffolds of subchromosomal size. We make use of this new assembly to obtain more accurate measures of heterozygosity for equine species other than the horse, both genome-wide and locally, and to detect runs of homozygosity potentially pertaining to positive selection in domestic donkeys. Finally, this new assembly allowed us to identify fine-scale chromosomal rearrangements between the horse and the donkey that likely played an active role in their divergence and, ultimately, speciation. PMID:29740610

  6. Improved de novo genomic assembly for the domestic donkey.

    Science.gov (United States)

    Renaud, Gabriel; Petersen, Bent; Seguin-Orlando, Andaine; Bertelsen, Mads Frost; Waller, Andrew; Newton, Richard; Paillot, Romain; Bryant, Neil; Vaudin, Mark; Librado, Pablo; Orlando, Ludovic

    2018-04-01

    Donkeys and horses share a common ancestor dating back to about 4 million years ago. Although a high-quality genome assembly at the chromosomal level is available for the horse, current assemblies available for the donkey are limited to moderately sized scaffolds. The absence of a better-quality assembly for the donkey has hampered studies involving the characterization of patterns of genetic variation at the genome-wide scale. These range from the application of genomic tools to selective breeding and conservation to the more fundamental characterization of the genomic loci underlying speciation and domestication. We present a new high-quality donkey genome assembly obtained using the Chicago HiRise assembly technology, providing scaffolds of subchromosomal size. We make use of this new assembly to obtain more accurate measures of heterozygosity for equine species other than the horse, both genome-wide and locally, and to detect runs of homozygosity potentially pertaining to positive selection in domestic donkeys. Finally, this new assembly allowed us to identify fine-scale chromosomal rearrangements between the horse and the donkey that likely played an active role in their divergence and, ultimately, speciation.

  7. NxRepair: error correction in de novo sequence assembly using Nextera mate pairs

    Directory of Open Access Journals (Sweden)

    Rebecca R. Murphy

    2015-06-01

    Full Text Available Scaffolding errors and incorrect repeat disambiguation during de novo assembly can result in large scale misassemblies in draft genomes. Nextera mate pair sequencing data provide additional information to resolve assembly ambiguities during scaffolding. Here, we introduce NxRepair, an open source toolkit for error correction in de novo assemblies that uses Nextera mate pair libraries to identify and correct large-scale errors. We show that NxRepair can identify and correct large scaffolding errors, without use of a reference sequence, resulting in quantitative improvements in the assembly quality. NxRepair can be downloaded from GitHub or PyPI, the Python Package Index; a tutorial and user documentation are also available.

  8. Developmental Transcriptome for a Facultatively Eusocial Bee, Megalopta genalis

    OpenAIRE

    Jones, Beryl M.; Wcislo, William T.; Robinson, Gene E.

    2015-01-01

    Transcriptomes provide excellent foundational resources for mechanistic and evolutionary analyses of complex traits. We present a developmental transcriptome for the facultatively eusocial bee Megalopta genalis, which represents a potential transition point in the evolution of eusociality. A de novo transcriptome assembly of Megalopta genalis was generated using paired-end Illumina sequencing and the Trinity assembler. Males and females of all life stages were aligned to this transcriptome fo...

  9. De novo assembly of human genomes with massively parallel short read sequencing

    DEFF Research Database (Denmark)

    Li, Ruiqiang; Zhu, Hongmei; Ruan, Jue

    2010-01-01

    genomes from short read sequences. We successfully assembled both the Asian and African human genome sequences, achieving an N50 contig size of 7.4 and 5.9 kilobases (kb) and scaffold of 446.3 and 61.9 kb, respectively. The development of this de novo short read assembly method creates new opportunities...... for building reference sequences and carrying out accurate analyses of unexplored genomes in a cost-effective way....

  10. Long-read sequencing and de novo assembly of a Chinese genome

    Science.gov (United States)

    Short-read sequencing has enabled the de novo assembly of several individual human genomes, but with inherent limitations in characterizing repeat elements. Here we sequence a Chinese individual HX1 by single-molecule real-time (SMRT) long-read sequencing, construct a physical map by NanoChannel arr...

  11. Norgal: Extraction and de novo assembly of mitochondrial DNA from whole-genome sequencing data

    DEFF Research Database (Denmark)

    Al-Nakeeb, Kosai Ali Ahmed; Petersen, Thomas Nordahl; Sicheritz-Pontén, Thomas

    2017-01-01

    and performing a de novo assembly on a subset of reads that contains these k-mers. The method was applied to WGS data from a panda, brown algae seaweed, butterfly and filamentous fungus. We were able to extract full circular mitochondrial genomes and obtained sequence identities to the reference sequences...

  12. The Physalis peruviana leaf transcriptome: assembly, annotation and gene model prediction

    Directory of Open Access Journals (Sweden)

    Garzón-Martínez Gina A

    2012-04-01

    Full Text Available Background: Physalis peruviana, commonly known as Cape gooseberry, is a member of the Solanaceae family that has an increasing popularity due to its nutritional and medicinal values. A broad range of genomic tools is available for other Solanaceae, including tomato and potato. However, limited genomic resources are currently available for Cape gooseberry. Results: We report the generation of a total of 652,614 P. peruviana Expressed Sequence Tags (ESTs), using 454 GS FLX Titanium technology. ESTs, with an average length of 371 bp, were obtained from a normalized leaf cDNA library prepared using a Colombian commercial variety. De novo assembling was performed to generate a collection of 24,014 isotigs and 110,921 singletons, with an average length of 1,638 bp and 354 bp, respectively. Functional annotation was performed using NCBI’s BLAST tools and Blast2GO, which identified putative functions for 21,191 assembled sequences, including gene families involved in all the major biological processes and molecular functions as well as defense response and amino acid metabolism pathways. Gene model predictions in P. peruviana were obtained by using the genomes of Solanum lycopersicum (tomato) and Solanum tuberosum (potato). We predict 9,436 P. peruviana sequences with multiple-exon models and conserved intron positions with respect to the potato and tomato genomes. Additionally, to study species diversity we developed 5,971 SSR markers from assembled ESTs. Conclusions: We present the first comprehensive analysis of the Physalis peruviana leaf transcriptome, which will provide valuable resources for development of genetic tools in the species. Assembled transcripts with gene models could serve as potential candidates for marker discovery with a variety of applications including: functional diversity, conservation and improvement to increase productivity and fruit quality. P. peruviana was estimated to be phylogenetically branched out before the

  13. The Physalis peruviana leaf transcriptome: assembly, annotation and gene model prediction.

    Science.gov (United States)

    Garzón-Martínez, Gina A; Zhu, Z Iris; Landsman, David; Barrero, Luz S; Mariño-Ramírez, Leonardo

    2012-04-25

    Physalis peruviana commonly known as Cape gooseberry is a member of the Solanaceae family that has an increasing popularity due to its nutritional and medicinal values. A broad range of genomic tools is available for other Solanaceae, including tomato and potato. However, limited genomic resources are currently available for Cape gooseberry. We report the generation of a total of 652,614 P. peruviana Expressed Sequence Tags (ESTs), using 454 GS FLX Titanium technology. ESTs, with an average length of 371 bp, were obtained from a normalized leaf cDNA library prepared using a Colombian commercial variety. De novo assembling was performed to generate a collection of 24,014 isotigs and 110,921 singletons, with an average length of 1,638 bp and 354 bp, respectively. Functional annotation was performed using NCBI's BLAST tools and Blast2GO, which identified putative functions for 21,191 assembled sequences, including gene families involved in all the major biological processes and molecular functions as well as defense response and amino acid metabolism pathways. Gene model predictions in P. peruviana were obtained by using the genomes of Solanum lycopersicum (tomato) and Solanum tuberosum (potato). We predict 9,436 P. peruviana sequences with multiple-exon models and conserved intron positions with respect to the potato and tomato genomes. Additionally, to study species diversity we developed 5,971 SSR markers from assembled ESTs. We present the first comprehensive analysis of the Physalis peruviana leaf transcriptome, which will provide valuable resources for development of genetic tools in the species. Assembled transcripts with gene models could serve as potential candidates for marker discovery with a variety of applications including: functional diversity, conservation and improvement to increase productivity and fruit quality. P. peruviana was estimated to be phylogenetically branched out before the divergence of five other Solanaceae family members, S

  14. Norgal: extraction and de novo assembly of mitochondrial DNA from whole-genome sequencing data.

    Science.gov (United States)

    Al-Nakeeb, Kosai; Petersen, Thomas Nordahl; Sicheritz-Pontén, Thomas

    2017-11-21

    Whole-genome sequencing (WGS) projects provide short read nucleotide sequences from nuclear and possibly organelle DNA depending on the source of origin. Mitochondrial DNA is present in animals and fungi, while plants contain DNA from both mitochondria and chloroplasts. Current techniques for separating organelle reads from nuclear reads in WGS data require full reference or partial seed sequences for assembling. Norgal (de Novo ORGAneLle extractor) avoids this requirement by identifying a high frequency subset of k-mers that are predominantly of mitochondrial origin and performing a de novo assembly on a subset of reads that contains these k-mers. The method was applied to WGS data from a panda, brown algae seaweed, butterfly and filamentous fungus. We were able to extract full circular mitochondrial genomes and obtained sequence identities to the reference sequences in the range from 98.5 to 99.5%. We also assembled the chloroplasts of grape vines and cucumbers using Norgal together with seed-based de novo assemblers. Norgal is a pipeline that can extract and assemble full or partial mitochondrial and chloroplast genomes from WGS short reads without prior knowledge. The program is available at: https://bitbucket.org/kosaidtu/norgal .
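
    The idea of keeping only reads that contain high-frequency k-mers can be sketched as below. This is an illustration under stated assumptions, not Norgal's code: the abstract does not say how the frequency cut-off is chosen, so `min_count` is a hypothetical user-supplied parameter, and the function names and toy reads are invented for the example.

```python
from collections import Counter


def kmers(seq, k):
    """Return a generator over all overlapping k-mers of a sequence."""
    return (seq[i:i + k] for i in range(len(seq) - k + 1))


def select_high_depth_reads(reads, k, min_count):
    """Keep reads containing at least one k-mer seen `min_count` or more times.

    `min_count` is a hypothetical threshold standing in for whatever
    depth-based cut-off a real pipeline would derive from the data.
    """
    counts = Counter(km for read in reads for km in kmers(read, k))
    frequent = {km for km, n in counts.items() if n >= min_count}
    return [read for read in reads if any(km in frequent for km in kmers(read, k))]


# Toy data: three reads share the k-mer AAAA (mimicking a high-copy organelle),
# while the last read is covered only once.
toy_reads = ["AAAAC", "AAAAG", "AAAAT", "CGTCG"]
print(select_high_depth_reads(toy_reads, k=4, min_count=3))
```

    The selected subset would then be passed to a de novo assembler; that step is outside the scope of this sketch.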

  15. Comparison of de novo assembly statistics of Cucumis sativus L.

    Science.gov (United States)

    Wojcieszek, Michał; Kuśmirek, Wiktor; Pawełkowicz, Magdalena; Pląder, Wojciech; Nowak, Robert M.

    2017-08-01

    Genome sequencing is the core of genomic research. With the development of NGS and lowering the cost of procedure there is another tight gap - genome assembly. Developing the proper tool for this task is essential as quality of genome has important impact on further research. Here we present comparison of several de Bruijn assemblers tested on C. sativus genomic reads. The assessment shows that newly developed software - dnaasm provides better results in terms of quantity and quality. The number of generated sequences is lower by 5 - 33% with even two fold higher N50. Quality check showed reliable results were generated by dnaasm. This provides us with very strong base for future genomic analysis.
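
    Several records in this list compare assemblies by N50. As a reminder of what that statistic measures, here is a minimal sketch using the common definition (the length of the contig at which the running total of lengths, sorted from longest to shortest, first reaches half of the assembly size). The contig lengths are made-up values, not data from any of the studies listed.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths."""
    if not lengths:
        raise ValueError("need at least one contig length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first contig that pushes the running total past half the
        # assembly size defines the N50.
        if running >= half_total:
            return length


# Hypothetical contig lengths (total 290, so the half-way mark is 145).
toy_contigs = [80, 70, 50, 40, 30, 20]
print(n50(toy_contigs))  # 80 + 70 = 150 >= 145, so this prints 70
```

    Because N50 depends only on the length distribution, a higher value alongside fewer sequences indicates a more contiguous assembly but says nothing about correctness on its own.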

  16. Next-Generation Sequencing of the Chrysanthemum nankingense (Asteraceae) Transcriptome Permits Large-Scale Unigene Assembly and SSR Marker Discovery

    Science.gov (United States)

    Wang, Haibin; Jiang, Jiafu; Chen, Sumei; Qi, Xiangyu; Peng, Hui; Li, Pirui; Song, Aiping; Guan, Zhiyong; Fang, Weimin; Liao, Yuan; Chen, Fadi

    2013-01-01

    Background Simple sequence repeats (SSRs) are ubiquitous in eukaryotic genomes. Chrysanthemum is one of the largest genera in the Asteraceae family. Only few Chrysanthemum expressed sequence tag (EST) sequences have been acquired to date, so the number of available EST-SSR markers is very low. Methodology/Principal Findings Illumina paired-end sequencing technology produced over 53 million sequencing reads from C. nankingense mRNA. The subsequent de novo assembly yielded 70,895 unigenes, of which 45,789 (64.59%) unigenes showed similarity to the sequences in NCBI database. Out of 45,789 sequences, 107 have hits to the Chrysanthemum Nr protein database; 679 and 277 sequences have hits to the database of Helianthus and Lactuca species, respectively. MISA software identified a large number of putative EST-SSRs, allowing 1,788 primer pairs to be designed from the de novo transcriptome sequence and a further 363 from archival EST sequence. Among 100 primer pairs randomly chosen, 81 markers have amplicons and 20 are polymorphic for genotypes analysis in Chrysanthemum. The results showed that most (but not all) of the assays were transferable across species and that they exposed a significant amount of allelic diversity. Conclusions/Significance SSR markers acquired by transcriptome sequencing are potentially useful for marker-assisted breeding and genetic analysis in the genus Chrysanthemum and its related genera. PMID:23626799
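
    SSR discovery of the kind described above amounts to scanning sequences for short motifs repeated in tandem. The sketch below uses regular expressions for that scan. It is not MISA and does not reproduce the study's settings: the minimum repeat counts per motif length are hypothetical defaults, and the test sequence is invented.

```python
import re


def is_primitive(unit):
    """True if the motif is not itself a repetition of a shorter motif."""
    return unit not in (unit + unit)[1:-1]


def find_ssrs(seq, min_repeats=None):
    """Return sorted (start, motif, repeat_count) tuples for tandem repeats.

    `min_repeats` maps motif length to the minimum number of copies;
    the defaults are hypothetical illustration values.
    """
    if min_repeats is None:
        min_repeats = {2: 6, 3: 5, 4: 4}
    hits = []
    for unit_len, reps in min_repeats.items():
        # One captured motif followed by at least (reps - 1) further copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, reps - 1))
        for match in pattern.finditer(seq):
            unit = match.group(1)
            if not is_primitive(unit):
                continue  # e.g. skip ATAT, which is already reported as AT
            hits.append((match.start(), unit, len(match.group(0)) // unit_len))
    return sorted(hits)


toy_seq = "GGC" + "AT" * 7 + "CCG" + "AGC" * 5 + "T"
print(find_ssrs(toy_seq))
```

    Primer design around each hit, and the check for polymorphism across genotypes, are separate steps not covered here.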

  17. Dataset of the first transcriptome assembly of the tree crop “yerba mate” (Ilex paraguariensis) and systematic characterization of protein coding genes

    Directory of Open Access Journals (Sweden)

    Patricia M. Aguilera

    2018-04-01

    Full Text Available This contribution contains data associated with the research article entitled “Exploring the genes of yerba mate (Ilex paraguariensis A. St.-Hil.) by NGS and de novo transcriptome assembly” (Debat et al., 2014) [1]. By means of a bioinformatic approach involving extensive NGS data analyses, we provide a resource encompassing the full transcriptome assembly of yerba mate, the first available reference for the Ilex L. genus. This dataset (Supplementary files 1 and 2) consolidates the transcriptome-wide assembled sequences of I. paraguariensis with further comprehensive annotation of the protein coding genes of yerba mate via the integration of Arabidopsis thaliana databases. The generated data are pivotal for the characterization of agronomically relevant genes in the tree crop yerba mate, a non-model species, and related taxa in Ilex. The raw sequencing data dissected here are available at the DDBJ/ENA/GenBank (NCBI Resource Coordinators, 2016) [2] Sequence Read Archive (SRA) under the accession SRP043293 and the assembled sequences have been deposited at the Transcriptome Shotgun Assembly Sequence Database (TSA) under the accession GFHV00000000.

  18. De novo assembly of a 40 Mb eukaryotic genome from short sequence reads: Sordaria macrospora, a model organism for fungal morphogenesis.

    Science.gov (United States)

    Nowrousian, Minou; Stajich, Jason E; Chu, Meiling; Engh, Ines; Espagne, Eric; Halliday, Karen; Kamerewerd, Jens; Kempken, Frank; Knab, Birgit; Kuo, Hsiao-Che; Osiewacz, Heinz D; Pöggeler, Stefanie; Read, Nick D; Seiler, Stephan; Smith, Kristina M; Zickler, Denise; Kück, Ulrich; Freitag, Michael

    2010-04-08

    Filamentous fungi are of great importance in ecology, agriculture, medicine, and biotechnology. Thus, it is not surprising that genomes for more than 100 filamentous fungi have been sequenced, most of them by Sanger sequencing. While next-generation sequencing techniques have revolutionized genome resequencing, e.g. for strain comparisons, genetic mapping, or transcriptome and ChIP analyses, de novo assembly of eukaryotic genomes still presents significant hurdles, because of their large size and stretches of repetitive sequences. Filamentous fungi contain few repetitive regions in their 30-90 Mb genomes and thus are suitable candidates to test de novo genome assembly from short sequence reads. Here, we present a high-quality draft sequence of the Sordaria macrospora genome that was obtained by a combination of Illumina/Solexa and Roche/454 sequencing. Paired-end Solexa sequencing of genomic DNA to 85-fold coverage and an additional 10-fold coverage by single-end 454 sequencing resulted in approximately 4 Gb of DNA sequence. Reads were assembled to a 40 Mb draft version (N50 of 117 kb) with the Velvet assembler. Comparative analysis with Neurospora genomes increased the N50 to 498 kb. The S. macrospora genome contains even fewer repeat regions than its closest sequenced relative, Neurospora crassa. Comparison with genomes of other fungi showed that S. macrospora, a model organism for morphogenesis and meiosis, harbors duplications of several genes involved in self/nonself-recognition. Furthermore, S. macrospora contains more polyketide biosynthesis genes than N. crassa. Phylogenetic analyses suggest that some of these genes may have been acquired by horizontal gene transfer from a distantly related ascomycete group. Our study shows that, for typical filamentous fungi, de novo assembly of genomes from short sequence reads alone is feasible, that a mixture of Solexa and 454 sequencing substantially improves the assembly, and that the resulting data can be used for

  19. De novo assembly of a 40 Mb eukaryotic genome from short sequence reads: Sordaria macrospora, a model organism for fungal morphogenesis.

    Directory of Open Access Journals (Sweden)

    Minou Nowrousian

    2010-04-01

    Full Text Available Filamentous fungi are of great importance in ecology, agriculture, medicine, and biotechnology. Thus, it is not surprising that genomes for more than 100 filamentous fungi have been sequenced, most of them by Sanger sequencing. While next-generation sequencing techniques have revolutionized genome resequencing, e.g. for strain comparisons, genetic mapping, or transcriptome and ChIP analyses, de novo assembly of eukaryotic genomes still presents significant hurdles, because of their large size and stretches of repetitive sequences. Filamentous fungi contain few repetitive regions in their 30-90 Mb genomes and thus are suitable candidates to test de novo genome assembly from short sequence reads. Here, we present a high-quality draft sequence of the Sordaria macrospora genome that was obtained by a combination of Illumina/Solexa and Roche/454 sequencing. Paired-end Solexa sequencing of genomic DNA to 85-fold coverage and an additional 10-fold coverage by single-end 454 sequencing resulted in approximately 4 Gb of DNA sequence. Reads were assembled to a 40 Mb draft version (N50 of 117 kb) with the Velvet assembler. Comparative analysis with Neurospora genomes increased the N50 to 498 kb. The S. macrospora genome contains even fewer repeat regions than its closest sequenced relative, Neurospora crassa. Comparison with genomes of other fungi showed that S. macrospora, a model organism for morphogenesis and meiosis, harbors duplications of several genes involved in self/nonself-recognition. Furthermore, S. macrospora contains more polyketide biosynthesis genes than N. crassa. Phylogenetic analyses suggest that some of these genes may have been acquired by horizontal gene transfer from a distantly related ascomycete group. Our study shows that, for typical filamentous fungi, de novo assembly of genomes from short sequence reads alone is feasible, that a mixture of Solexa and 454 sequencing substantially improves the assembly, and that the resulting data

  20. Assembly and Analysis of Differential Transcriptome Responses of Hevea brasiliensis on Interaction with Microcyclus ulei.

    Directory of Open Access Journals (Sweden)

    Uriel Alonso Hurtado Páez

    Full Text Available Natural rubber (Hevea brasiliensis) is a tropical tree used commercially for the production of latex, from which 40,000 products are generated. The fungus Microcyclus ulei infects this tree, causing South American leaf blight (SALB) disease. This disease causes developmental delays and significant crop losses, thereby decreasing the production of latex. Currently several groups are working on obtaining clones of rubber tree with durable resistance to SALB through the use of extensive molecular biology techniques. In this study, we used a secondary clone that was resistant to M. ulei isolate GCL012. This clone, FX 3864, was obtained by crossing clones PB 86 and B 38 (H. brasiliensis x H. brasiliensis). RNA-Seq high-throughput sequencing technology was used to analyze the differential expression of the FX 3864 clone transcriptome at 0 and 48 h post infection (hpi) with the M. ulei isolate GCL012. A total of 158,134,220 reads were assembled using the de novo assembly strategy to generate 90,775 contigs with an N50 of 1672. Using a reference-based assembly, 76,278 contigs were generated with an N50 of 1324. We identified 86 differentially expressed genes associated with the defense response of FX 3864 to GCL012. Seven putative gene members of the AP2/ERF ethylene (ET)-dependent superfamily were found to be down-regulated. An increase in salicylic acid (SA) was associated with the up-regulation of three genes involved in cell wall synthesis and remodeling, as well as with the down-regulation of the putative gene CPR5. The defense response of FX 3864 against the GCL012 isolate was associated with the antagonistic SA, ET and jasmonic acid (JA) pathways. These responses are characteristic of plant resistance to biotrophic pathogens.

  1. A practical comparison of de novo genome assembly software tools for next-generation sequencing technologies.

    Directory of Open Access Journals (Sweden)

    Wenyu Zhang

    Full Text Available The advent of next-generation sequencing technologies has been accompanied by the development of many whole-genome sequence assembly methods and software, especially for de novo fragment assembly. Because little is known about the applicability and performance of these software tools, choosing a suitable assembler is a difficult task. Here, we provide information on the adaptivity of each program and, above all, compare the performance of eight distinct tools against eight groups of simulated datasets from the Solexa sequencing platform. Considering the computational time, maximum random access memory (RAM) occupancy, assembly accuracy and integrity, our study indicates that string-based assemblers and overlap-layout-consensus (OLC) assemblers are well-suited for very short reads and longer reads of small genomes, respectively. For large datasets of more than a hundred million short reads, De Bruijn graph-based assemblers would be more appropriate. In terms of software implementation, string-based assemblers are superior to graph-based ones, among which SOAPdenovo is complex with respect to creating its configuration file. Our comparison study will assist researchers in selecting a well-suited assembler and offer essential information for the improvement of existing assemblers or the development of novel assemblers.
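
    For readers unfamiliar with the De Bruijn graph approach named in this record, the following is a minimal sketch of the core data structure only: each k-mer in a read becomes an edge from its (k-1)-base prefix to its (k-1)-base suffix. It is an illustration under simplifying assumptions (error-free reads, one strand, no graph simplification) and does not reproduce any of the evaluated tools; the function name and example reads are hypothetical.

```python
from collections import defaultdict


def de_bruijn_edges(reads, k):
    """Build a De Bruijn graph as an adjacency list.

    Nodes are (k-1)-mers; every k-mer found in a read adds one edge
    from its prefix node to its suffix node.
    """
    graph = defaultdict(list)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].append(kmer[1:])
    return dict(graph)


# Two hypothetical overlapping reads.
for node, successors in de_bruijn_edges(["ACGT", "CGTA"], k=3).items():
    print(node, "->", successors)
```

    The repeated edge from CG to GT reflects the k-mer CGT occurring in both reads; practical assemblers typically store such multiplicities as coverage counts rather than duplicate edges.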

  2. De Novo Characterization of the Mung Bean Transcriptome and Transcriptomic Analysis of Adventitious Rooting in Seedlings Using RNA-Seq.

    Science.gov (United States)

    Li, Shi-Weng; Shi, Rui-Fang; Leng, Yan

    2015-01-01

    Adventitious rooting is the most important mechanism underlying vegetative propagation and an important strategy for plant propagation under environmental stress. The present study was conducted to obtain transcriptomic data and examine gene expression using RNA-Seq and bioinformatics analysis, thereby providing a foundation for understanding the molecular mechanisms controlling adventitious rooting. Three cDNA libraries constructed from mRNA samples from mung bean hypocotyls during adventitious rooting were sequenced. These three samples generated a total of 73 million, 60 million, and 59 million 100-bp reads, respectively. These reads were assembled into 78,697 unigenes with an average length of 832 bp, totaling 65 Mb. The unigenes were aligned against six public protein databases, and 29,029 unigenes (36.77%) were annotated using BLASTx. Among them, 28,225 (35.75%) and 28,119 (35.62%) unigenes had homologs in the TrEMBL and NCBI non-redundant (Nr) databases, respectively. Of these unigenes, 21,140 were assigned to gene ontology classes, and a total of 11,990 unigenes were classified into 25 KOG functional categories. A total of 7,357 unigenes were annotated to 4,524 KOs, and 4,651 unigenes were mapped onto 342 KEGG pathways using BLAST comparison against the KEGG database. A total of 11,717 unigenes were differentially expressed (fold change>2) during the root induction stage, with 8,772 unigenes down-regulated and 2,945 unigenes up-regulated. A total of 12,737 unigenes were differentially expressed during the root initiation stage, with 9,303 unigenes down-regulated and 3,434 unigenes up-regulated. A total of 5,334 unigenes were differentially expressed between the root induction and initiation stage, with 2,167 unigenes down-regulated and 3,167 unigenes up-regulated. qRT-PCR validation of the 39 genes with known functions indicated a strong correlation (92.3%) with the RNA-Seq data. The GO enrichment, pathway mapping, and gene expression profiles reveal

  3. De Novo Characterization of the Mung Bean Transcriptome and Transcriptomic Analysis of Adventitious Rooting in Seedlings Using RNA-Seq.

    Directory of Open Access Journals (Sweden)

    Shi-Weng Li

    Full Text Available Adventitious rooting is the most important mechanism underlying vegetative propagation and an important strategy for plant propagation under environmental stress. The present study was conducted to obtain transcriptomic data and examine gene expression using RNA-Seq and bioinformatics analysis, thereby providing a foundation for understanding the molecular mechanisms controlling adventitious rooting. Three cDNA libraries constructed from mRNA samples from mung bean hypocotyls during adventitious rooting were sequenced. These three samples generated a total of 73 million, 60 million, and 59 million 100-bp reads, respectively. These reads were assembled into 78,697 unigenes with an average length of 832 bp, totaling 65 Mb. The unigenes were aligned against six public protein databases, and 29,029 unigenes (36.77%) were annotated using BLASTx. Among them, 28,225 (35.75%) and 28,119 (35.62%) unigenes had homologs in the TrEMBL and NCBI non-redundant (Nr) databases, respectively. Of these unigenes, 21,140 were assigned to gene ontology classes, and a total of 11,990 unigenes were classified into 25 KOG functional categories. A total of 7,357 unigenes were annotated to 4,524 KOs, and 4,651 unigenes were mapped onto 342 KEGG pathways using BLAST comparison against the KEGG database. A total of 11,717 unigenes were differentially expressed (fold change>2) during the root induction stage, with 8,772 unigenes down-regulated and 2,945 unigenes up-regulated. A total of 12,737 unigenes were differentially expressed during the root initiation stage, with 9,303 unigenes down-regulated and 3,434 unigenes up-regulated. A total of 5,334 unigenes were differentially expressed between the root induction and initiation stage, with 2,167 unigenes down-regulated and 3,167 unigenes up-regulated. qRT-PCR validation of the 39 genes with known functions indicated a strong correlation (92.3%) with the RNA-Seq data. The GO enrichment, pathway mapping, and gene expression profiles

  4. Transcriptome sequencing of different narrow-leafed lupin tissue types provides a comprehensive uni-gene assembly and extensive gene-based molecular markers

    Science.gov (United States)

    Kamphuis, Lars G; Hane, James K; Nelson, Matthew N; Gao, Lingling; Atkins, Craig A; Singh, Karam B

    2015-01-01

    Narrow-leafed lupin (NLL; Lupinus angustifolius L.) is an important grain legume crop that is valuable for sustainable farming and is becoming recognized as a human health food. NLL breeding is directed at improving grain production, disease resistance, drought tolerance and health benefits. However, genetic and genomic studies have been hindered by a lack of extensive genomic resources for the species. Here, the generation, de novo assembly and annotation of transcriptome datasets derived from five different NLL tissue types of the reference accession cv. Tanjil are described. The Tanjil transcriptome was compared to transcriptomes of an early domesticated cv. Unicrop, a wild accession P27255, as well as accession 83A:476, together being the founding parents of two recombinant inbred line (RIL) populations. In silico predictions for transcriptome-derived gene-based length and SNP polymorphic markers were conducted and corroborated using a survey assembly sequence for NLL cv. Tanjil. This yielded extensive indel and SNP polymorphic markers for the two RIL populations. A total of 335 transcriptome-derived markers and 66 BAC-end sequence-derived markers were evaluated, and 275 polymorphic markers were selected to genotype the reference NLL 83A:476 × P27255 RIL population. This significantly improved the completeness, marker density and quality of the reference NLL genetic map. PMID:25060816

  5. De novo Transcriptome Sequencing Reveals a Considerable Bias in the Incidence of Simple Sequence Repeats towards the Downstream of ‘Pre-miRNAs’ of Black Pepper

    Science.gov (United States)

    Joy, Nisha; Asha, Srinivasan; Mallika, Vijayan; Soniya, Eppurathu Vasudevan

    2013-01-01

    Next generation sequencing has an advantage in the transformational development of species with limited available sequence data as it helps to decode the genome and transcriptome. We carried out the de novo sequencing using Illumina HiSeq™ 2000 to generate the first leaf transcriptome of black pepper (Piper nigrum L.), an important spice variety native to South India and also grown in other tropical regions. Despite the economic and biochemical importance of pepper, a scientifically rigorous study at the molecular level is far from complete due to lack of sufficient sequence information and cytological complexity of its genome. The 55 million raw reads obtained, when assembled using the Trinity program, generated 2,23,386 contigs and 1,28,157 unigenes. Reports suggest that the repeat-rich genomic regions give rise to small non-coding functional RNAs. MicroRNAs (miRNAs) are the most abundant type of non-coding regulatory RNAs. In spite of the widespread research on miRNAs, little is known about the hair-pin precursors of miRNAs bearing Simple Sequence Repeats (SSRs). We used the array of transcripts generated, for the in silico prediction and detection of ‘43 pre-miRNA candidates bearing different types of SSR motifs’. The analysis identified 3913 different types of SSR motifs with an average of one SSR per 3.04 MB of the transcriptome. About 0.033% of the transcriptome constituted ‘pre-miRNA candidates bearing SSRs’. The abundance, type and distribution of SSR motifs studied across the hair-pin miRNA precursors, showed a significant bias in the position of SSRs towards the downstream of predicted ‘pre-miRNA candidates’. The catalogue of transcripts identified, together with the demonstration of reliable existence of SSRs in the miRNA precursors, permits future opportunities for understanding the genetic mechanism of black pepper and likely functions of ‘tandem repeats’ in miRNAs. PMID:23469176

  6. Efficient assembly of de novo human artificial chromosomes from large genomic loci

    Directory of Open Access Journals (Sweden)

    Stromberg Gregory

    2005-07-01

    Full Text Available Background: Human Artificial Chromosomes (HACs) are potentially useful vectors for gene transfer studies and for functional annotation of the genome because of their suitability for cloning, manipulating and transferring large segments of the genome. However, development of HACs for the transfer of large genomic loci into mammalian cells has been limited by difficulties in manipulating high-molecular weight DNA, as well as by the low overall frequencies of de novo HAC formation. Indeed, to date, only a small number of large (>100 kb) genomic loci have been reported to be successfully packaged into de novo HACs. Results: We have developed novel methodologies to enable efficient assembly of HAC vectors containing any genomic locus of interest. We report here the creation of a novel, bimolecular system based on bacterial artificial chromosomes (BACs) for the construction of HACs incorporating any defined genomic region. We have utilized this vector system to rapidly design, construct and validate multiple de novo HACs containing large (100–200 kb) genomic loci including therapeutically significant genes for human growth hormone (HGH), polycystic kidney disease (PKD1) and ß-globin. We report significant differences in the ability of different genomic loci to support de novo HAC formation, suggesting possible effects of cis-acting genomic elements. Finally, as a proof of principle, we have observed sustained ß-globin gene expression from HACs incorporating the entire 200 kb ß-globin genomic locus for over 90 days in the absence of selection. Conclusion: Taken together, these results are significant for the development of HAC vector technology, as they enable high-throughput assembly and functional validation of HACs containing any large genomic locus. We have evaluated the impact of different genomic loci on the frequency of HAC formation and identified segments of genomic DNA that appear to facilitate de novo HAC formation. These genomic loci

  7. Evaluation of GRCh38 and de novo haploid genome assemblies demonstrates the enduring quality of the reference assembly.

    Science.gov (United States)

    Schneider, Valerie A; Graves-Lindsay, Tina; Howe, Kerstin; Bouk, Nathan; Chen, Hsiu-Chuan; Kitts, Paul A; Murphy, Terence D; Pruitt, Kim D; Thibaud-Nissen, Françoise; Albracht, Derek; Fulton, Robert S; Kremitzki, Milinn; Magrini, Vincent; Markovic, Chris; McGrath, Sean; Steinberg, Karyn Meltz; Auger, Kate; Chow, William; Collins, Joanna; Harden, Glenn; Hubbard, Timothy; Pelan, Sarah; Simpson, Jared T; Threadgold, Glen; Torrance, James; Wood, Jonathan M; Clarke, Laura; Koren, Sergey; Boitano, Matthew; Peluso, Paul; Li, Heng; Chin, Chen-Shan; Phillippy, Adam M; Durbin, Richard; Wilson, Richard K; Flicek, Paul; Eichler, Evan E; Church, Deanna M

    2017-05-01

    The human reference genome assembly plays a central role in nearly all aspects of today's basic and clinical research. GRCh38 is the first coordinate-changing assembly update since 2009; it reflects the resolution of roughly 1000 issues and encompasses modifications ranging from thousands of single base changes to megabase-scale path reorganizations, gap closures, and localization of previously orphaned sequences. We developed a new approach to sequence generation for targeted base updates and used data from new genome mapping technologies and single haplotype resources to identify and resolve larger assembly issues. For the first time, the reference assembly contains sequence-based representations for the centromeres. We also expanded the number of alternate loci to create a reference that provides a more robust representation of human population variation. We demonstrate that the updates render the reference an improved annotation substrate, alter read alignments in unchanged regions, and impact variant interpretation at clinically relevant loci. We additionally evaluated a collection of new de novo long-read haploid assemblies and conclude that although the new assemblies compare favorably to the reference with respect to continuity, error rate, and gene completeness, the reference still provides the best representation for complex genomic regions and coding sequences. We assert that the collected updates in GRCh38 make the newer assembly a more robust substrate for comprehensive analyses that will promote our understanding of human biology and advance our efforts to improve health. © 2017 Schneider et al.; Published by Cold Spring Harbor Laboratory Press.

  8. De novo characterization of Larimichthys crocea transcriptome for growth-/immune-related gene identification and massive microsatellite (SSR) marker development

    Science.gov (United States)

    Han, Zhaofang; Xiao, Shijun; Liu, Xiande; Liu, Yang; Li, Jiakai; Xie, Yangjie; Wang, Zhiyong

    2017-03-01

    The large yellow croaker, Larimichthys crocea, is an important marine fish in China with a high economic value. In the last decade, the stock conservation and aquaculture industry of this species have been facing severe challenges because of wild population collapse and degeneration of important economic traits. However, genes contributing to growth and immunity in L. crocea have not been thoroughly analyzed, and available molecular markers are still not sufficient for genetic resource management and molecular selection. In this work, we sequenced the transcriptome in L. crocea liver tissue with a Roche 454 sequencing platform and assembled the transcriptome into 93 801 transcripts. Of them, 38 856 transcripts were successfully annotated in nt, nr, Swiss-Prot, InterPro, COG, GO and KEGG databases. Based on the annotation information, 3 165 unigenes related to growth and immunity were identified. Additionally, a total of 6 391 simple sequence repeats (SSRs) were identified from the transcriptome, among which 4 498 SSRs had enough flanking regions to design primers for polymerase chain reactions (PCR). To assess the polymorphism of these markers, 30 primer pairs were randomly selected for PCR amplification and validation in 30 individuals, and 12 primer pairs (40.0%) exhibited obvious length polymorphisms. This work applied RNA-Seq to assemble and analyze a liver transcriptome in L. crocea. With gene annotation and sequence information, genes related to growth and immunity were identified and massive SSR markers were developed, providing valuable genetic resources for future gene functional analysis and selective breeding of L. crocea.
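
    The SSR screening step described in this record (finding repeats, then keeping those with enough flanking sequence for primer design) can be illustrated with a short regular-expression sketch. This is not the tool used in the study; the thresholds (`min_repeats`, `max_motif`, `min_flank`) and the example sequence are assumptions chosen only for illustration.

```python
import re


def find_ssrs(sequence, min_repeats=5, max_motif=6, min_flank=0):
    """Find perfect simple sequence repeats in a DNA sequence.

    Returns (start, motif, repeat_count) tuples for motifs of 1..max_motif
    bases repeated at least min_repeats times, keeping only hits with at
    least min_flank bases on each side (room for PCR primers).
    """
    # Lazy motif match so the shortest repeating unit is reported.
    pattern = re.compile(r"([ACGT]{1,%d}?)\1{%d,}" % (max_motif, min_repeats - 1))
    hits = []
    for match in pattern.finditer(sequence.upper()):
        motif = match.group(1)
        repeats = len(match.group(0)) // len(motif)
        left_flank = match.start()
        right_flank = len(sequence) - match.end()
        if left_flank >= min_flank and right_flank >= min_flank:
            hits.append((match.start(), motif, repeats))
    return hits


# Hypothetical transcript fragment with an (AC)6 repeat.
example = "GGTT" + "AC" * 6 + "TTGG"
print(find_ssrs(example))
print(find_ssrs(example, min_flank=5))
```

    Dedicated SSR tools usually apply different minimum repeat counts per motif length and also handle imperfect or compound repeats, which this sketch ignores.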

  9. De novo transcriptome analysis and molecular marker development of two Hemarthria species

    Directory of Open Access Journals (Sweden)

    Xiu Huang

    2016-04-01

    Full Text Available Hemarthria R. Br. is an important genus of perennial forage grasses that is widely used in subtropical and tropical regions. Hemarthria grasses have made remarkable contributions to the development of animal husbandry and agro-ecosystem maintenance; however, there is currently a lack of comprehensive genomic data available for these species. In this study, we used Illumina high-throughput deep sequencing to characterize two agriculturally important Hemarthria materials, H. compressa ‘Yaan’ and H. altissima ‘1110.’ Sequencing runs that used each of four normalized RNA samples from the leaves or roots of the two materials yielded more than 24 million high-quality reads. After de novo assembly, 137,142 and 77,150 unigenes were obtained for ‘Yaan’ and ‘1110’, respectively. In addition, a total of 86,731 ‘Yaan’ and 48,645 ‘1110’ unigenes were successfully annotated. After consolidating the unigenes for both materials, 42,646 high-quality SNPs were identified in 10,880 unigenes and 10,888 SSRs were identified in 8,330 unigenes. To validate the identified markers, high quality PCR primers were designed for both SNPs and SSRs. We randomly tested 16 of the SNP primers and 54 of the SSR primers and found that the majority of these primers successfully amplified the desired PCR product. In addition, high cross-species transferability (61.11%-87.04%) of SSR markers was achieved for four other Poaceae species. The amount of RNA sequencing data that was generated for these two Hemarthria species greatly increases the amount of genomic information available for Hemarthria, and the SSR and SNP markers identified in this study will facilitate further advancements in genetic and molecular studies of the Hemarthria genus.

  10. Evaluation of nine popular de novo assemblers in microbial genome assembly.

    Science.gov (United States)

    Forouzan, Esmaeil; Maleki, Masoumeh Sadat Mousavi; Karkhane, Ali Asghar; Yakhchali, Bagher

    2017-12-01

    Next generation sequencing (NGS) technologies are revolutionizing biology, with Illumina being the most popular NGS platform. Short read assembly is a critical part of most genome studies using NGS. Hence, in this study, the performance of nine well-known assemblers was evaluated in the assembly of seven different microbial genomes. Effect of different read coverage and k-mer parameters on the quality of the assembly were also evaluated on both simulated and actual read datasets. Our results show that the performance of assemblers on real and simulated datasets could be significantly different, mainly because of coverage bias. According to outputs on actual read datasets, for all studied read coverages (of 7×, 25× and 100×), SPAdes and IDBA-UD clearly outperformed other assemblers based on NGA50 and accuracy metrics. Velvet is the most conservative assembler with the lowest NGA50 and error rate. Copyright © 2017. Published by Elsevier B.V.

  11. When less is more: 'slicing' sequencing data improves read decoding accuracy and de novo assembly quality.

    Science.gov (United States)

    Lonardi, Stefano; Mirebrahim, Hamid; Wanamaker, Steve; Alpert, Matthew; Ciardo, Gianfranco; Duma, Denisa; Close, Timothy J

    2015-09-15

    Since the invention of DNA sequencing in the 70s, computational biologists have had to deal with the problem of de novo genome assembly with limited (or insufficient) depth of sequencing. In this work, we investigate the opposite problem, that is, the challenge of dealing with excessive depth of sequencing. We explore the effect of ultra-deep sequencing data in two domains: (i) the problem of decoding reads to bacterial artificial chromosome (BAC) clones (in the context of the combinatorial pooling design we have recently proposed), and (ii) the problem of de novo assembly of BAC clones. Using real ultra-deep sequencing data, we show that when the depth of sequencing increases over a certain threshold, sequencing errors make these two problems harder and harder (instead of easier, as one would expect with error-free data), and as a consequence the quality of the solution degrades with more and more data. For the first problem, we propose an effective solution based on 'divide and conquer': we 'slice' a large dataset into smaller samples of optimal size, decode each slice independently, and then merge the results. Experimental results on over 15 000 barley BACs and over 4000 cowpea BACs demonstrate a significant improvement in the quality of the decoding and the final assembly. For the second problem, we show for the first time that modern de novo assemblers cannot take advantage of ultra-deep sequencing data. Python scripts to process slices and resolve decoding conflicts are available from http://goo.gl/YXgdHT; software Hashfilter can be downloaded from http://goo.gl/MIyZHs. Contact: stelo@cs.ucr.edu or timothy.close@ucr.edu. Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
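
    The 'slice, process independently, merge' idea in this record can be sketched generically as follows. This is not the authors' published scripts: the shuffling step, the fixed `slice_size`, the `process_slice` callback, and the union-style merge are all assumptions made to keep the example self-contained, and the study's conflict-resolution step is not modeled.

```python
import random


def slice_dataset(reads, slice_size, seed=0):
    """Shuffle the reads and cut them into slices of at most slice_size."""
    shuffled = list(reads)
    random.Random(seed).shuffle(shuffled)
    return [shuffled[i:i + slice_size] for i in range(0, len(shuffled), slice_size)]


def process_in_slices(reads, slice_size, process_slice):
    """Run process_slice on each slice independently and merge the results.

    process_slice is a hypothetical callback that maps a list of reads to
    a dict of per-read results; the merge here is a plain dict union.
    """
    merged = {}
    for chunk in slice_dataset(reads, slice_size):
        merged.update(process_slice(chunk))
    return merged


# Hypothetical read identifiers and a stand-in "decoder".
reads = [f"read{i}" for i in range(10)]
result = process_in_slices(reads, 4, lambda chunk: {r: len(chunk) for r in chunk})
print(len(result))
```

    Since every slice is processed without reference to the others, the loop could be distributed across worker processes without changing the merged result.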

  12. The sequence and de novo assembly of the giant panda genome

    Science.gov (United States)

    Li, Ruiqiang; Fan, Wei; Tian, Geng; Zhu, Hongmei; He, Lin; Cai, Jing; Huang, Quanfei; Cai, Qingle; Li, Bo; Bai, Yinqi; Zhang, Zhihe; Zhang, Yaping; Wang, Wen; Li, Jun; Wei, Fuwen; Li, Heng; Jian, Min; Li, Jianwen; Zhang, Zhaolei; Nielsen, Rasmus; Li, Dawei; Gu, Wanjun; Yang, Zhentao; Xuan, Zhaoling; Ryder, Oliver A.; Leung, Frederick Chi-Ching; Zhou, Yan; Cao, Jianjun; Sun, Xiao; Fu, Yonggui; Fang, Xiaodong; Guo, Xiaosen; Wang, Bo; Hou, Rong; Shen, Fujun; Mu, Bo; Ni, Peixiang; Lin, Runmao; Qian, Wubin; Wang, Guodong; Yu, Chang; Nie, Wenhui; Wang, Jinhuan; Wu, Zhigang; Liang, Huiqing; Min, Jiumeng; Wu, Qi; Cheng, Shifeng; Ruan, Jue; Wang, Mingwei; Shi, Zhongbin; Wen, Ming; Liu, Binghang; Ren, Xiaoli; Zheng, Huisong; Dong, Dong; Cook, Kathleen; Shan, Gao; Zhang, Hao; Kosiol, Carolin; Xie, Xueying; Lu, Zuhong; Zheng, Hancheng; Li, Yingrui; Steiner, Cynthia C.; Lam, Tommy Tsan-Yuk; Lin, Siyuan; Zhang, Qinghui; Li, Guoqing; Tian, Jing; Gong, Timing; Liu, Hongde; Zhang, Dejin; Fang, Lin; Ye, Chen; Zhang, Juanbin; Hu, Wenbo; Xu, Anlong; Ren, Yuanyuan; Zhang, Guojie; Bruford, Michael W.; Li, Qibin; Ma, Lijia; Guo, Yiran; An, Na; Hu, Yujie; Zheng, Yang; Shi, Yongyong; Li, Zhiqiang; Liu, Qing; Chen, Yanling; Zhao, Jing; Qu, Ning; Zhao, Shancen; Tian, Feng; Wang, Xiaoling; Wang, Haiyin; Xu, Lizhi; Liu, Xiao; Vinar, Tomas; Wang, Yajun; Lam, Tak-Wah; Yiu, Siu-Ming; Liu, Shiping; Zhang, Hemin; Li, Desheng; Huang, Yan; Wang, Xia; Yang, Guohua; Jiang, Zhi; Wang, Junyi; Qin, Nan; Li, Li; Li, Jingxiang; Bolund, Lars; Kristiansen, Karsten; Wong, Gane Ka-Shu; Olson, Maynard; Zhang, Xiuqing; Li, Songgang; Yang, Huanming; Wang, Jian; Wang, Jun

    2013-01-01

    Using next-generation sequencing technology alone, we have successfully generated and assembled a draft sequence of the giant panda genome. The assembled contigs (2.25 gigabases (Gb)) cover approximately 94% of the whole genome, and the remaining gaps (0.05 Gb) seem to contain carnivore-specific repeats and tandem repeats. Comparisons with the dog and human showed that the panda genome has a lower divergence rate. The assessment of panda genes potentially underlying some of its unique traits indicated that its bamboo diet might be more dependent on its gut microbiome than its own genetic composition. We also identified more than 2.7 million heterozygous single nucleotide polymorphisms in the diploid genome. Our data and analyses provide a foundation for promoting mammalian genetic research, and demonstrate the feasibility for using next-generation sequencing technologies for accurate, cost-effective and rapid de novo assembly of large eukaryotic genomes. PMID:20010809

  13. Hybrid De Novo Genome Assembly Using MiSeq and SOLiD Short Read Data.

    Directory of Open Access Journals (Sweden)

    Tsutomu Ikegami

    Full Text Available A hybrid de novo assembly pipeline was constructed to utilize both MiSeq and SOLiD short read data in combination in the assembly. The short read data were converted to a standard format of the pipeline, and were supplied to the pipeline components such as ABySS and SOAPdenovo. The assembly pipeline proceeded through several stages, and either MiSeq paired-end data, SOLiD mate-paired data, or both of them could be specified as input data at each stage separately. The pipeline was examined on the filamentous fungus Aspergillus oryzae RIB40, by aligning the assembly results against the reference sequences. Using both the MiSeq and the SOLiD data in the hybrid assembly, the alignment length was improved by a factor of 3 to 8, compared with the assemblies using either one of the data types. The number of the reproduced gene cluster regions encoding secondary metabolite biosyntheses (SMB) was also improved by the hybrid assemblies. These results imply that the MiSeq data with long read length are essential to construct accurate nucleotide sequences, while the SOLiD mate-paired reads with long insertion length enhance long-range arrangements of the sequences. The pipeline was also tested on the actinomycete Streptomyces avermitilis MA-4680, whose genome is known to have high-GC content. Although the quality of the SOLiD reads was too low to perform any meaningful assemblies by themselves, the alignment length to the reference was improved by a factor of 2, compared with the assembly using only the MiSeq data.

  14. The genome of flax (Linum usitatissimum) assembled de novo from short shotgun sequence reads.

    Science.gov (United States)

    Wang, Zhiwen; Hobson, Neil; Galindo, Leonardo; Zhu, Shilin; Shi, Daihu; McDill, Joshua; Yang, Linfeng; Hawkins, Simon; Neutelings, Godfrey; Datla, Raju; Lambert, Georgina; Galbraith, David W; Grassa, Christopher J; Geraldes, Armando; Cronk, Quentin C; Cullis, Christopher; Dash, Prasanta K; Kumar, Polumetla A; Cloutier, Sylvie; Sharpe, Andrew G; Wong, Gane K-S; Wang, Jun; Deyholos, Michael K

    2012-11-01

    Flax (Linum usitatissimum) is an ancient crop that is widely cultivated as a source of fiber, oil and medicinally relevant compounds. To accelerate crop improvement, we performed whole-genome shotgun sequencing of the nuclear genome of flax. Seven paired-end libraries ranging in size from 300 bp to 10 kb were sequenced using an Illumina genome analyzer. A de novo assembly, comprised exclusively of deep-coverage (approximately 94× raw, approximately 69× filtered) short-sequence reads (44-100 bp), produced a set of scaffolds with N50 = 694 kb, including contigs with N50 = 20.1 kb. The contig assembly contained 302 Mb of non-redundant sequence representing an estimated 81% genome coverage. Up to 96% of published flax ESTs aligned to the whole-genome shotgun scaffolds. However, comparisons with independently sequenced BACs and fosmids showed some mis-assembly of regions at the genome scale. A total of 43,384 protein-coding genes were predicted in the whole-genome shotgun assembly, and up to 93% of published flax ESTs, and 86% of A. thaliana genes aligned to these predicted genes, indicating excellent coverage and accuracy at the gene level. Analysis of the synonymous substitution rates (Ks) observed within duplicate gene pairs was consistent with a recent (5-9 MYA) whole-genome duplication in flax. Within the predicted proteome, we observed enrichment of many conserved domains (Pfam-A) that may contribute to the unique properties of this crop, including agglutinin proteins. Together these results show that de novo assembly, based solely on whole-genome shotgun short-sequence reads, is an efficient means of obtaining nearly complete genome sequence information for some plant species. © 2012 The Authors. The Plant Journal © 2012 Blackwell Publishing Ltd.

  15. SWAP-Assembler 2: Optimization of De Novo Genome Assembler at Large Scale

    Energy Technology Data Exchange (ETDEWEB)

    Meng, Jintao; Seo, Sangmin; Balaji, Pavan; Wei, Yanjie; Wang, Bingqiang; Feng, Shengzhong

    2016-08-16

    In this paper, we analyze and optimize the most time-consuming steps of the SWAP-Assembler, a parallel genome assembler, so that it can scale to a large number of cores for huge genomes with the size of sequencing data ranging from terabytes to petabytes. According to the performance analysis results, the most time-consuming steps are input parallelization, k-mer graph construction, and graph simplification (edge merging). For the input parallelization, the input data is divided into virtual fragments with nearly equal size, and the start position and end position of each fragment are automatically separated at the beginning of the reads. In k-mer graph construction, in order to improve the communication efficiency, the message size is kept constant between any two processes by proportionally increasing the number of nucleotides to the number of processes in the input parallelization step for each round. The memory usage is also decreased because only a small part of the input data is processed in each round. With graph simplification, the communication protocol reduces the number of communication loops from four to two loops and decreases the idle communication time. The optimized assembler is denoted as SWAP-Assembler 2 (SWAP2). In our experiments using a 1000 Genomes project dataset of 4 terabytes (the largest dataset ever used for assembling) on the supercomputer Mira, the results show that SWAP2 scales to 131,072 cores with an efficiency of 40%. We also compared our work with both the HipMer assembler and the SWAP-Assembler. On the Yanhuang dataset of 300 gigabytes, SWAP2 shows a 3X speedup and 4X better scalability compared with the HipMer assembler and is 45 times faster than the SWAP-Assembler. The SWAP2 software is available at https://sourceforge.net/projects/swapassembler.

  16. MetaVelvet: An Extension of Velvet Assembler to de novo Metagenome Assembly from Short Sequence Reads (Metagenomics Informatics Challenges Workshop: 10K Genomes at a Time)

    Energy Technology Data Exchange (ETDEWEB)

    Sakakibara, Yasumbumi

    2011-10-13

    Keio University's Yasumbumi Sakakibara on "MetaVelvet: An Extension of Velvet Assembler to de novo Metagenome Assembly from Short Sequence Reads" at the Metagenomics Informatics Challenges Workshop held at the DOE JGI on October 12-13, 2011.

  17. Novel transcriptome assembly and comparative toxicity pathway analysis in mahi-mahi (Coryphaena hippurus) embryos and larvae exposed to Deepwater Horizon oil

    Science.gov (United States)

    Xu, Elvis Genbo; Mager, Edward M.; Grosell, Martin; Hazard, E. Starr; Hardiman, Gary; Schlenk, Daniel

    2017-03-01

    The impacts of Deepwater Horizon (DWH) oil on morphology and function during embryonic development have been documented for a number of fish species, including the economically and ecologically important pelagic species, mahi-mahi (Coryphaena hippurus). However, further investigations on molecular events and pathways responsible for developmental toxicity have been largely restricted due to the limited molecular data available for this species. We sought to establish the de novo transcriptomic database from the embryos and larvae of mahi-mahi exposed to water accommodated fractions (HEWAFs) of two DWH oil types (weathered and source oil), in an effort to advance our understanding of the molecular aspects involved during specific toxicity responses. By high throughput sequencing (HTS), we obtained the first de novo transcriptome of mahi-mahi, with 60,842 assembled transcripts and 30,518 BLAST hits. Among them, 2,345 genes were significantly regulated in 96hpf larvae after exposure to weathered oil. With comparative analysis to a reference-transcriptome-guided approach on gene ontology and tox-pathways, we confirmed the novel approach effective for exploring tox-pathways in non-model species, and also identified a list of co-expressed genes as potential biomarkers which will provide information for the construction of an Adverse Outcome Pathway which could be useful in Ecological Risk Assessments.

  18. De novo assembly of plant body plan: a step ahead of Deadpool.

    Science.gov (United States)

    Kareem, Abdul; Radhakrishnan, Dhanya; Sondhi, Yash; Aiyaz, Mohammed; Roy, Merin V; Sugimoto, Kaoru; Prasad, Kalika

    2016-08-01

    While in the movie Deadpool it is possible for a human to recreate an arm from scratch, in reality plants can even surpass that. Not only can they regenerate lost parts, but also the whole plant body can be reborn from a few existing cells. Despite the decades old realization that plant cells possess the ability to regenerate a complete shoot and root system, it is only now that the underlying mechanisms are being unraveled. De novo plant regeneration involves the initiation of regenerative mass, acquisition of the pluripotent state, reconstitution of stem cells and assembly of regulatory interactions. Recent studies have furthered our understanding on the making of a complete plant system in the absence of embryonic positional cues. We review the recent studies probing the molecular mechanisms of de novo plant regeneration in response to external inductive cues and our current knowledge of direct reprogramming of root to shoot and vice versa. We further discuss how de novo regeneration can be exploited to meet the demands of green culture industries and to serve as a general model to address the fundamental questions of regeneration across the plant kingdom.

  19. Identification of optimum sequencing depth especially for de novo genome assembly of small genomes using next generation sequencing data.

    Science.gov (United States)

    Desai, Aarti; Marwah, Veer Singh; Yadav, Akshay; Jha, Vineet; Dhaygude, Kishor; Bangar, Ujwala; Kulkarni, Vivek; Jere, Abhay

    2013-01-01

    Next Generation Sequencing (NGS) is a disruptive technology that has found widespread acceptance in the life sciences research community. The high throughput and low cost of sequencing have encouraged researchers to undertake ambitious genomic projects, especially in de novo genome sequencing. Currently, NGS systems generate sequence data as short reads, and de novo genome assembly using these short reads is computationally very intensive. Due to the lower cost of sequencing and higher throughput, NGS systems now provide the ability to sequence genomes at high depth. However, no report is currently available highlighting the impact of high sequence depth on genome assembly using real data sets and multiple assembly algorithms. Recently, some studies have evaluated the impact of sequence coverage, error rate and average read length on genome assembly using multiple assembly algorithms; however, these evaluations were performed using simulated datasets. One limitation of using simulated datasets is that variables such as error rates, read length and coverage, which are known to impact genome assembly, are carefully controlled. Hence, this study was undertaken to identify the minimum depth of sequencing required for de novo assembly of different sized genomes using graph based assembly algorithms and real datasets. Illumina reads for E. coli (4.6 Mb), S. kudriavzevii (11.18 Mb) and C. elegans (100 Mb) were assembled using SOAPdenovo, Velvet, ABySS, Meraculous and IDBA-UD. Our analysis shows that 50X is the optimum read depth for assembling these genomes using all assemblers except Meraculous, which requires 100X read depth. Moreover, our analysis shows that de novo assembly from 50X read data requires only 6-40 GB RAM depending on the genome size and assembly algorithm used. We believe that this information can be extremely valuable for researchers in designing experiments and multiplexing, which will enable optimum utilization of sequencing as well as analysis resources.
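
    The depth figures in this abstract can be related to read counts through the usual approximation depth = (number of reads × read length) / genome size. The following is a minimal sketch, not the authors' method: the 100 bp read length is an assumption (the abstract does not state one), the function names are hypothetical, and the genome sizes are the ones quoted above.

```python
import math

def reads_for_depth(genome_size_bp, read_length_bp, target_depth):
    """Number of reads needed to reach an average depth of target_depth."""
    return math.ceil(target_depth * genome_size_bp / read_length_bp)

def depth_from_reads(n_reads, read_length_bp, genome_size_bp):
    """Average sequencing depth implied by a given number of reads."""
    return n_reads * read_length_bp / genome_size_bp

# Genome sizes quoted in the abstract, in base pairs.
GENOMES = {
    "E. coli": 4_600_000,
    "S. kudriavzevii": 11_180_000,
    "C. elegans": 100_000_000,
}
READ_LENGTH = 100  # assumption: read length is not given in the abstract

if __name__ == "__main__":
    for name, size in GENOMES.items():
        n = reads_for_depth(size, READ_LENGTH, 50)
        print(f"{name}: about {n:,} reads for 50X at {READ_LENGTH} bp")
```

    This ignores uneven coverage and read filtering, so real experiments would normally budget extra reads above the estimate.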

  20. De novo RNA-Seq based transcriptome analysis of Papiliotrema laurentii strain RY1 under nitrogen starvation.

    Science.gov (United States)

    Sarkar, Soumyadev; Chakravorty, Somnath; Mukherjee, Avishek; Bhattacharya, Debanjana; Bhattacharya, Semantee; Gachhui, Ratan

    2018-03-01

    Nitrogen is a key nutrient for all cell forms. Most organisms respond to nitrogen scarcity by slowing down their growth rate. On the contrary, our previous studies have shown that Papiliotrema laurentii strain RY1 has a robust growth under nitrogen starvation. To understand the global regulation that leads to such an extraordinary response, we undertook a de novo approach for transcriptome analysis of the yeast. Close to 33 million sequence reads of high quality for nitrogen limited and enriched condition were generated using Illumina NextSeq500. Trinity analysis and clustered transcripts annotation of the reads produced 17,611 unigenes, out of which 14,157 could be annotated. Gene Ontology term analysis generated 44.92% cellular component terms, 39.81% molecular function terms and 15.24% biological process terms. The most over represented pathways in general were translation, carbohydrate metabolism, amino acid metabolism, general metabolism, folding, sorting, degradation followed by transport and catabolism, nucleotide metabolism, replication and repair, transcription and lipid metabolism. A total of 4256 Single Sequence Repeats were identified. Differential gene expression analysis detected 996 P-significant transcripts to reveal transmembrane transport, lipid homeostasis, fatty acid catabolism and translation as the enriched terms which could be essential for Papiliotrema laurentii strain RY1 to adapt during nitrogen deprivation. Transcriptome data was validated by quantitative real-time PCR analysis of twelve transcripts. To the best of our knowledge, this is the first report of Papiliotrema laurentii strain RY1 transcriptome which would play a pivotal role in understanding the biochemistry of the yeast under acute nitrogen stress and this study would be encouraging to initiate extensive investigations into this Papiliotrema system. Copyright © 2017 Elsevier B.V. All rights reserved.

  1. Transcriptome

    Science.gov (United States)

    ... Also: Talking Glossary of Genetic Terms Definitions for genetic terms used on this page En Español: Transcriptoma Transcriptome What is a transcriptome? What can a transcriptome tell us? How can transcriptome data be used to explore gene function? What is ...

  2. Examination of Triacylglycerol Biosynthetic Pathways via De Novo Transcriptomic and Proteomic Analyses in an Unsequenced Microalga

    Science.gov (United States)

    2011-10-17

    and none of the TAG enzymatic components. Conversely, utilization of the C. vulgaris transcriptome as a search database allowed us to identify all...for conversion to biodiesel or renewable diesel and jet fuel [1,2,3]. Many of these species can also grow rapidly under a large range of environmental...overnight. Approximately 5 mg of dry biomass was suspended in chloroform-methanol (2:1, v/v), and glycerolipids were transesterified in HCl-methanol (5

  3. De novo sequencing and analysis of the Ulva linza transcriptome to discover putative mechanisms associated with its successful colonization of coastal ecosystems

    Directory of Open Access Journals (Sweden)

    Zhang Xiaowen

    2012-10-01

    Full Text Available Abstract Background: The green algal genus Ulva Linnaeus (Ulvaceae, Ulvales, Chlorophyta) is well known for its wide distribution in marine, freshwater, and brackish environments throughout the world. The Ulva species are also highly tolerant of variations in salinity, temperature, and irradiance and are the main cause of green tides, which can have deleterious ecological effects. However, limited genomic information is currently available for these non-model and ecologically important species. Ulva linza is a species that inhabits bedrock in the mid to low intertidal zone, and it is a major contributor to biofouling. Here, we present the global characterization of the U. linza transcriptome using the Roche GS FLX Titanium platform, with the aim of uncovering the genomic mechanisms underlying rapid and successful colonization of the coastal ecosystems. Results: De novo assembly of 382,884 reads generated 13,426 contigs with an average length of 1,000 bases. Contiguous sequences were further assembled into 10,784 isotigs with an average length of 1,515 bases. A total of 304,101 reads were nominally identified by BLAST; 4,368 isotigs were functionally annotated with 13,550 GO terms, and 2,404 isotigs having enzyme commission (EC) numbers were assigned to 262 KEGG pathways. When compared with four other fully sequenced green algae, 3,457 unique isotigs were found in U. linza and 18 conserved in land plants. In addition, a specific photoprotective mechanism based on both LhcSR and PsbS proteins and a C4-like carbon-concentrating mechanism were found, which may help U. linza survive stress conditions. At least 19 transporters for essential inorganic nutrients (i.e., nitrogen, phosphorus, and sulphur) were responsible for its ability to take up inorganic nutrients, and at least 25 eukaryotic cytochrome P450s, which is a higher number than that found in other algae, may be related to their strong allelopathy. Multi-origination of the stress related proteins

  4. A hybrid reference-guided de novo assembly approach for generating Cyclospora mitochondrion genomes.

    Science.gov (United States)

    Gopinath, G R; Cinar, H N; Murphy, H R; Durigan, M; Almeria, M; Tall, B D; DaSilva, A J

    2018-01-01

    Cyclospora cayetanensis is a coccidian parasite associated with large and complex foodborne outbreaks worldwide. Linking samples from cyclosporiasis patients during foodborne outbreaks with suspected contaminated food sources, using conventional epidemiological methods, has been a persistent challenge. To address this issue, development of new methods based on potential genomically-derived markers for strain-level identification has been a priority for the food safety research community. The absence of reference genomes to identify nucleotide and structural variants with a high degree of confidence has limited the application of using sequencing data for source tracking during outbreak investigations. In this work, we determined the quality of a high resolution, curated, public mitochondrial genome assembly to be used as a reference genome by applying bioinformatic analyses. Using this reference genome, three new mitochondrial genome assemblies were built starting with metagenomic reads generated by sequencing DNA extracted from oocysts present in stool samples from cyclosporiasis patients. Nucleotide variants were identified in the new and other publicly available genomes in comparison with the mitochondrial reference genome. A consolidated workflow, presented here, to generate new mitochondrion genomes using our reference-guided de novo assembly approach could be useful in facilitating the generation of other mitochondrion sequences, and in their application for subtyping C. cayetanensis strains during foodborne outbreak investigations.

  5. De novo transcriptome sequencing and analysis of the cereal cyst nematode, Heterodera avenae.

    Directory of Open Access Journals (Sweden)

    Mukesh Kumar

    Full Text Available The cereal cyst nematode (CCN, Heterodera avenae) is a major pest of wheat (Triticum spp) that reduces crop yields in many countries. Cyst nematodes are obligate sedentary endoparasites that reproduce by amphimixis. Here, we report the first transcriptome analysis of two stages of H. avenae. After sequencing extracted RNA from pre-parasitic infective juvenile and adult stages of the life cycle, 131 million Illumina high quality paired end reads were obtained, which generated 27,765 contigs with N50 of 1,028 base pairs, of which 10,452 were annotated. Comparative analyses were undertaken to evaluate H. avenae sequences with those of other plant, animal and free living nematodes to identify differences in expressed genes. There were 4,431 transcripts common to H. avenae and the free living nematode Caenorhabditis elegans, and 9,462 in common with the more closely related potato cyst nematode, Globodera pallida. Annotation of H. avenae carbohydrate active enzymes (CAZy) revealed fewer glycoside hydrolases (GHs) but more glycosyl transferases (GTs) and carbohydrate esterases (CEs) when compared to M. incognita. A total of 1,280 transcripts were found to have a secretory signature, with a signal peptide present and transmembrane domains absent. In a comparison of genes expressed in the pre-parasitic juvenile and feeding female stages, expression levels of 30 genes with high RPKM (reads per kilobase per million mapped reads) values were analysed by qRT-PCR, which confirmed the observed differences in their expression levels. In addition, we have also developed a user-friendly resource, Heterodera transcriptome database (HATdb), for public access of the data generated in this study. The new data provided on the transcriptome of H. avenae adds to the genetic resources available to study plant parasitic nematodes and provides an opportunity to seek new effectors that are specifically involved in the H. avenae-cereal host interaction.
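
    The N50 statistic quoted in this abstract is conventionally the contig length L at which contigs of length L or longer together cover at least half of the total assembly. The sketch below is an illustration of that definition with a hypothetical helper name and toy contig lengths; it is not taken from the study.

```python
def n50(contig_lengths):
    """Return the N50 of a collection of contig lengths (in bases)."""
    if not contig_lengths:
        raise ValueError("need at least one contig length")
    total = sum(contig_lengths)
    running = 0
    # Walk from the longest contig down until half the assembly is covered.
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length

if __name__ == "__main__":
    toy_contigs = [2, 3, 4, 5, 6, 7, 8, 9, 10]  # toy data, total = 54
    print(n50(toy_contigs))  # 10 + 9 + 8 = 27 reaches half of 54, so N50 is 8
```

    Because N50 depends only on the length distribution, it says nothing about whether contigs are correctly assembled, which is why studies such as this one also report annotation rates.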

  6. De novo transcriptome sequencing and analysis of the cereal cyst nematode, Heterodera avenae.

    Science.gov (United States)

    Kumar, Mukesh; Gantasala, Nagavara Prasad; Roychowdhury, Tanmoy; Thakur, Prasoon Kumar; Banakar, Prakash; Shukla, Rohit N; Jones, Michael G K; Rao, Uma

    2014-01-01

    The cereal cyst nematode (CCN, Heterodera avenae) is a major pest of wheat (Triticum spp) that reduces crop yields in many countries. Cyst nematodes are obligate sedentary endoparasites that reproduce by amphimixis. Here, we report the first transcriptome analysis of two stages of H. avenae. After sequencing extracted RNA from pre-parasitic infective juvenile and adult stages of the life cycle, 131 million Illumina high quality paired end reads were obtained, which generated 27,765 contigs with N50 of 1,028 base pairs, of which 10,452 were annotated. Comparative analyses were undertaken to evaluate H. avenae sequences with those of other plant, animal and free living nematodes to identify differences in expressed genes. There were 4,431 transcripts common to H. avenae and the free living nematode Caenorhabditis elegans, and 9,462 in common with the more closely related potato cyst nematode, Globodera pallida. Annotation of H. avenae carbohydrate active enzymes (CAZy) revealed fewer glycoside hydrolases (GHs) but more glycosyl transferases (GTs) and carbohydrate esterases (CEs) when compared to M. incognita. A total of 1,280 transcripts were found to have a secretory signature, with a signal peptide present and transmembrane domains absent. In a comparison of genes expressed in the pre-parasitic juvenile and feeding female stages, expression levels of 30 genes with high RPKM (reads per kilobase per million mapped reads) values were analysed by qRT-PCR, which confirmed the observed differences in their expression levels. In addition, we have also developed a user-friendly resource, Heterodera transcriptome database (HATdb), for public access of the data generated in this study. The new data provided on the transcriptome of H. avenae adds to the genetic resources available to study plant parasitic nematodes and provides an opportunity to seek new effectors that are specifically involved in the H. avenae-cereal host interaction.
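
    RPKM is conventionally defined as reads per kilobase of transcript per million mapped reads, so it normalizes a raw read count by both transcript length and library size. The sketch below shows that formula with a hypothetical function name and made-up counts; it does not reproduce any value from the study.

```python
def rpkm(reads_on_transcript, transcript_length_bp, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads."""
    if transcript_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("length and library size must be positive")
    # 1e9 = 1e3 (bases per kilobase) * 1e6 (reads per million)
    return reads_on_transcript * 1e9 / (transcript_length_bp * total_mapped_reads)

if __name__ == "__main__":
    # Made-up example: 500 reads on a 2,000 bp transcript, 10 million mapped reads.
    print(rpkm(500, 2_000, 10_000_000))
```

    With the made-up numbers, doubling the transcript length at the same read count halves the RPKM, which is the length correction the measure is designed to provide.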

  7. De novo centriole formation in human cells is error-prone and does not require SAS-6 self-assembly.

    Science.gov (United States)

    Wang, Won-Jing; Acehan, Devrim; Kao, Chien-Han; Jane, Wann-Neng; Uryu, Kunihiro; Tsou, Meng-Fu Bryan

    2015-11-26

    Vertebrate centrioles normally propagate through duplication, but in the absence of preexisting centrioles, de novo synthesis can occur. Consistently, centriole formation is thought to strictly rely on self-assembly, involving self-oligomerization of the centriolar protein SAS-6. Here, through reconstitution of de novo synthesis in human cells, we surprisingly found that normal looking centrioles capable of duplication and ciliation can arise in the absence of SAS-6 self-oligomerization. Moreover, whereas canonically duplicated centrioles always form correctly, de novo centrioles are prone to structural errors, even in the presence of SAS-6 self-oligomerization. These results indicate that centriole biogenesis does not strictly depend on SAS-6 self-assembly, and may require preexisting centrioles to ensure structural accuracy, fundamentally deviating from the current paradigm.

  8. Structural variation in two human genomes mapped at single-nucleotide resolution by whole genome de novo assembly

    DEFF Research Database (Denmark)

    Li, Yingrui; Zheng, Hancheng; Luo, Ruibang

    2011-01-01

    Here we use whole-genome de novo assembly of second-generation sequencing reads to map structural variation (SV) in an Asian genome and an African genome. Our approach identifies small- and intermediate-size homozygous variants (1-50 kb) including insertions, deletions, inversions and their precise...

  9. Two low coverage bird genomes and a comparison of reference-guided versus de novo genome assemblies.

    Science.gov (United States)

    Card, Daren C; Schield, Drew R; Reyes-Velasco, Jacobo; Fujita, Matthew K; Andrew, Audra L; Oyler-McCance, Sara J; Fike, Jennifer A; Tomback, Diana F; Ruggiero, Robert P; Castoe, Todd A

    2014-01-01

    As a greater number and diversity of high-quality vertebrate reference genomes become available, it is increasingly feasible to use these references to guide new draft assemblies for related species. Reference-guided assembly approaches may substantially increase the contiguity and completeness of a new genome using only low levels of genome coverage that might otherwise be insufficient for de novo genome assembly. We used low-coverage (∼3.5-5.5x) Illumina paired-end sequencing to assemble draft genomes of two bird species (the Gunnison Sage-Grouse, Centrocercus minimus, and the Clark's Nutcracker, Nucifraga columbiana). We used these data to estimate de novo genome assemblies and reference-guided assemblies, and compared the information content and completeness of these assemblies by comparing CEGMA gene set representation, repeat element content, simple sequence repeat content, and GC isochore structure among assemblies. Our results demonstrate that even lower-coverage genome sequencing projects are capable of producing informative and useful genomic resources, particularly through the use of reference-guided assemblies.

  10. Two low coverage bird genomes and a comparison of reference-guided versus de novo genome assemblies

    Science.gov (United States)

    Card, Daren C.; Schield, Drew R.; Reyes-Velasco, Jacobo; Fujita, Matthew K.; Andrew, Audra L.; Oyler-McCance, Sara J.; Fike, Jennifer A.; Tomback, Diana F.; Ruggiero, Robert P.; Castoe, Todd A.

    2014-01-01

    As a greater number and diversity of high-quality vertebrate reference genomes become available, it is increasingly feasible to use these references to guide new draft assemblies for related species. Reference-guided assembly approaches may substantially increase the contiguity and completeness of a new genome using only low levels of genome coverage that might otherwise be insufficient for de novo genome assembly. We used low-coverage (~3.5–5.5x) Illumina paired-end sequencing to assemble draft genomes of two bird species (the Gunnison Sage-Grouse, Centrocercus minimus, and the Clark's Nutcracker, Nucifraga columbiana). We used these data to estimate de novo genome assemblies and reference-guided assemblies, and compared the information content and completeness of these assemblies by comparing CEGMA gene set representation, repeat element content, simple sequence repeat content, and GC isochore structure among assemblies. Our results demonstrate that even lower-coverage genome sequencing projects are capable of producing informative and useful genomic resources, particularly through the use of reference-guided assemblies.

  11. Characterization of transcriptome dynamics during watermelon fruit development: sequencing, assembly, annotation and gene expression profiles.

    Science.gov (United States)

    Guo, Shaogui; Liu, Jingan; Zheng, Yi; Huang, Mingyun; Zhang, Haiying; Gong, Guoyi; He, Hongju; Ren, Yi; Zhong, Silin; Fei, Zhangjun; Xu, Yong

    2011-09-21

    Cultivated watermelon [Citrullus lanatus (Thunb.) Matsum. & Nakai var. lanatus] is an important agriculture crop world-wide. The fruit of watermelon undergoes distinct stages of development with dramatic changes in its size, color, sweetness, texture and aroma. In order to better understand the genetic and molecular basis of these changes and significantly expand the watermelon transcript catalog, we have selected four critical stages of watermelon fruit development and used Roche/454 next-generation sequencing technology to generate a large expressed sequence tag (EST) dataset and a comprehensive transcriptome profile for watermelon fruit flesh tissues. We performed half Roche/454 GS-FLX run for each of the four watermelon fruit developmental stages (immature white, white-pink flesh, red flesh and over-ripe) and obtained 577,023 high quality ESTs with an average length of 302.8 bp. De novo assembly of these ESTs together with 11,786 watermelon ESTs collected from GenBank produced 75,068 unigenes with a total length of approximately 31.8 Mb. Overall 54.9% of the unigenes showed significant similarities to known sequences in GenBank non-redundant (nr) protein database and around two-thirds of them matched proteins of cucumber, the most closely-related species with a sequenced genome. The unigenes were further assigned with gene ontology (GO) terms and mapped to biochemical pathways. More than 5,000 SSRs were identified from the EST collection. Furthermore we carried out digital gene expression analysis of these ESTs and identified 3,023 genes that were differentially expressed during watermelon fruit development and ripening, which provided novel insights into watermelon fruit biology and a comprehensive resource of candidate genes for future functional analysis. We then generated profiles of several interesting metabolites that are important to fruit quality including pigmentation and sweetness. Integrative analysis of metabolite and digital gene expression

  12. De Novo Adult Transcriptomes of Two European Brittle Stars: Spotlight on Opsin-Based Photoreception.

    Directory of Open Access Journals (Sweden)

    Jérôme Delroisse

    Full Text Available Next generation sequencing (NGS) technology makes it possible to obtain a deeper and more complete view of transcriptomes. For non-model or emerging model marine organisms, NGS technologies offer a great opportunity for rapid access to genetic information. In this study, paired-end Illumina HiSeq™ technology has been employed to analyse transcriptomes from the arm tissues of two European brittle star species, Amphiura filiformis and Ophiopsila aranea. About 48 million Illumina reads were generated and 136,387 total unigenes were predicted from A. filiformis arm tissues. For O. aranea arm tissues, about 47 million reads were generated and 123,324 total unigenes were obtained. Twenty-four percent of the total unigenes from A. filiformis show significant matches with sequences present in reference online databases, whereas, for O. aranea, this percentage amounts to 23%. In both species, around 50% of the predicted annotated unigenes were significantly similar to transcripts from the purple sea urchin, the closest species to date that has undergone complete genome sequencing and annotation. GO, COG and KEGG analyses were performed on predicted brittle star unigenes. We focused our analyses on the phototransduction actors involved in light perception. Firstly, two new echinoderm opsins were identified in O. aranea: one rhabdomeric opsin (homologous to vertebrate melanopsin) and one RGR opsin. The RGR-opsin is supposed to be involved in retinal regeneration while the r-opsin is suspected to play a role in visual-like behaviour. Secondly, potential phototransduction actors were identified in both transcriptomes using the fly (rhabdomeric) and mammal (ciliary) classical phototransduction pathways as references. Finally, the sensitivity of O. aranea to monochromatic light was investigated to complement data available for A. filiformis. The presence of microlens-like structures at the surface of dorsal arm plate of O. aranea could potentially explain phototactic

  13. De novo transcriptome sequencing and digital gene expression analysis predict biosynthetic pathway of rhynchophylline and isorhynchophylline from Uncaria rhynchophylla, a non-model plant with potent anti-Alzheimer's properties.

    Science.gov (United States)

    Guo, Qianqian; Ma, Xiaojun; Wei, Shugen; Qiu, Deyou; Wilson, Iain W; Wu, Peng; Tang, Qi; Liu, Lijun; Dong, Shoukun; Zu, Wei

    2014-08-12

    The major medicinal alkaloids isolated from Uncaria rhynchophylla (gouteng in Chinese) capsules are rhynchophylline (RIN) and isorhynchophylline (IRN). Extracts containing these terpene indole alkaloids (TIAs) can inhibit the formation and destabilize preformed fibrils of amyloid β protein (a pathological marker of Alzheimer's disease), and have been shown to improve the cognitive function of mice with Alzheimer-like symptoms. The biosynthetic pathways of RIN and IRN are largely unknown. In this study, RNA-sequencing was performed on pooled Uncaria capsule RNA samples taken at three developmental stages that accumulate different amounts of RIN and IRN. More than 50 million high-quality reads from a cDNA library were generated and de novo assembled. Sequences for all of the known enzymes involved in TIA synthesis were identified. Additionally, 193 cytochrome P450 (CYP450), 280 methyltransferase and 144 isomerase genes were identified that are potential candidates for enzymes involved in RIN and IRN synthesis. Digital gene expression profile (DGE) analysis was performed on the three capsule developmental stages, and based on genes possessing expression profiles consistent with RIN and IRN levels, four CYP450s, three methyltransferases and three isomerases were identified as the candidates most likely to be involved in the later steps of RIN and IRN biosynthesis. A combination of de novo transcriptome assembly and DGE analysis was shown to be a powerful method for identifying genes encoding enzymes potentially involved in the biosynthesis of important secondary metabolites in a non-model plant. The transcriptome data from this study provide an important resource for understanding the formation of major bioactive constituents in the capsule extract from Uncaria, and provide information that may aid in metabolic engineering to increase yields of these important alkaloids.

  14. De Novo Deep Transcriptome Analysis of Medicinal Plants for Gene Discovery in Biosynthesis of Plant Natural Products.

    Science.gov (United States)

    Han, R; Rai, A; Nakamura, M; Suzuki, H; Takahashi, H; Yamazaki, M; Saito, K

    2016-01-01

    Study on transcriptome, the entire pool of transcripts in an organism or single cells at certain physiological or pathological stage, is indispensable in unraveling the connection and regulation between DNA and protein. Before the advent of deep sequencing, microarray was the main approach to handle transcripts. Despite obvious shortcomings, including limited dynamic range and difficulties to compare the results from distinct experiments, microarray was widely applied. During the past decade, next-generation sequencing (NGS) has revolutionized our understanding of genomics in a fast, high-throughput, cost-effective, and tractable manner. By adopting NGS, efficiency and fruitful outcomes concerning the efforts to elucidate genes responsible for producing active compounds in medicinal plants were profoundly enhanced. The whole process involves steps, from the plant material sampling, to cDNA library preparation, to deep sequencing, and then bioinformatics takes over to assemble enormous, yet fragmentary, data from which to comb and extract information. The unprecedentedly rapid development of such technologies provides so many choices to facilitate the task, which can cause confusion when choosing the suitable methodology for specific purposes. Here, we review the general approaches for deep transcriptome analysis and then focus on their application in discovering biosynthetic pathways of medicinal plants that produce important secondary metabolites. © 2016 Elsevier Inc. All rights reserved.

  15. A de novo transcriptome of European pollen beetle populations and its analysis, with special reference to insecticide action and resistance.

    Science.gov (United States)

    Zimmer, C T; Maiwald, F; Schorn, C; Bass, C; Ott, M-C; Nauen, R

    2014-08-01

    The pollen beetle Meligethes aeneus is the most important coleopteran pest in European oilseed rape cultivation, annually infesting millions of hectares and responsible for substantial yield losses if not kept under economic damage thresholds. This species is primarily controlled with insecticides but has recently developed high levels of resistance to the pyrethroid class. The aim of the present study was to provide a transcriptomic resource to investigate mechanisms of resistance. cDNA was sequenced on both Roche (Indianapolis, IN, USA) and Illumina (LGC Genomics, Berlin, Germany) platforms, resulting in a total of ∼53 million reads which assembled into 43 396 expressed sequence tags (ESTs). Manual annotation revealed good coverage of genes encoding insecticide target sites and detoxification enzymes. A total of 77 nonredundant cytochrome P450 genes were identified. Mapping of Illumina RNAseq sequences (from susceptible and pyrethroid-resistant strains) against the reference transcriptome identified a cytochrome P450 (CYP6BQ23) as highly overexpressed in pyrethroid-resistant strains. Single-nucleotide polymorphism analysis confirmed the presence of a target-site resistance mutation (L1014F) in the voltage-gated sodium channel of one resistant strain. Our results provide new insights into the important genes associated with pyrethroid resistance in M. aeneus. Furthermore, a comprehensive EST resource is provided for future studies on insecticide modes of action and resistance mechanisms in pollen beetle. © 2014 The Royal Entomological Society.

  16. De novo assembly of the Aedes aegypti genome using Hi-C yields chromosome-length scaffolds.

    Science.gov (United States)

    Dudchenko, Olga; Batra, Sanjit S; Omer, Arina D; Nyquist, Sarah K; Hoeger, Marie; Durand, Neva C; Shamim, Muhammad S; Machol, Ido; Lander, Eric S; Aiden, Aviva Presser; Aiden, Erez Lieberman

    2017-04-07

    The Zika outbreak, spread by the Aedes aegypti mosquito, highlights the need to create high-quality assemblies of large genomes in a rapid and cost-effective way. Here we combine Hi-C data with existing draft assemblies to generate chromosome-length scaffolds. We validate this method by assembling a human genome, de novo, from short reads alone (67× coverage). We then combine our method with draft sequences to create genome assemblies of the mosquito disease vectors Ae. aegypti and Culex quinquefasciatus, each consisting of three scaffolds corresponding to the three chromosomes in each species. These assemblies indicate that almost all genomic rearrangements among these species occur within, rather than between, chromosome arms. The genome assembly procedure we describe is fast, inexpensive, and accurate, and can be applied to many species. Copyright © 2017, American Association for the Advancement of Science.

  17. RNA-seq de novo Assembly Reveals Differential Gene Expression in Glossina palpalis gambiensis Infected with Trypanosoma brucei gambiense vs. Non-Infected and Self-Cured Flies.

    Science.gov (United States)

    Hamidou Soumana, Illiassou; Klopp, Christophe; Ravel, Sophie; Nabihoudine, Ibouniyamine; Tchicaya, Bernadette; Parrinello, Hugues; Abate, Luc; Rialle, Stéphanie; Geiger, Anne

    2015-01-01

    Trypanosoma brucei gambiense (Tbg), which causes the chronic form of sleeping sickness, completes its developmental cycle within the tsetse fly vector Glossina palpalis gambiensis (Gpg) before its transmission to humans. Within the framework of an anti-vector disease control strategy, a global gene expression profiling of trypanosome-infected (susceptible), non-infected, and self-cured (refractory) tsetse flies was performed on their midguts to determine the differential gene expression resulting from in vivo interactions between trypanosomes and tsetse flies (and their microbiome). An RNAseq de novo assembly was achieved. The assembled transcripts were mapped to reference sequences for functional annotation. Twenty-four percent of the 16,936 contigs could not be annotated, possibly representing untranslated mRNA regions, or Gpg- or Tbg-specific ORFs. The remaining contigs were classified into 65 functional groups. Only a few transposable elements were present in the Gpg midgut transcriptome, which may represent active transpositions and play regulatory roles. One thousand three hundred and seventy-three differentially expressed genes (DEGs) between stimulated and non-stimulated flies were identified at day 3 post-feeding; 52 and 1025 between infected and self-cured flies at 10 and 20 days post-feeding, respectively. The possible roles of several DEGs regarding fly susceptibility and refractoriness are discussed. The results provide new means to decipher fly infection mechanisms, crucial to develop anti-vector control strategies.

  18. Transcriptome Assembly and Comparison of an Allotetraploid Weed Species, Annual Bluegrass, with its Two Diploid Progenitor Species, Poa supina Schrad and Poa infirma Kunth

    Directory of Open Access Journals (Sweden)

    Shu Chen

    2016-03-01

    Full Text Available Annual bluegrass (Poa annua L.) is one of the most widespread weed species in the world. As a young allotetraploid, P. annua has occupied diverse environments from the Antarctic to subtropical regions. Unveiling the evolutionary mystery behind P. annua's wide distribution, extensive adaptability and phenotypic plasticity needs collaboration across multiple research fields, from ecology and plant physiology to population genetics and molecular biology. However, the lack of omic data and reference has greatly hampered the study. This is the first comprehensive transcriptome study on species. Total RNA was extracted from P. annua and its two proposed diploid parents, P. supina Schrad and P. infirma Kunth, and sequenced on an Illumina HiSeq 2000. Optimized, nonredundant transcriptome references were generated for each species using four de novo assemblers (Trinity, Velvet, SOAPdenovo, and CLC Genomics Workbench) and a redundancy-reducing pipeline (CD-HIT-EST and EvidentialGene tr2aacds). Using the constructed transcriptomes together with sequencing reads, we found high similarity in nucleotide sequences and homeologous polymorphisms between P. annua and the two proposed parents. Comparison of chloroplast and mitochondrion genes further confirmed as the maternal parent. Less nucleotide percentage differences were observed between and homeologs than between and homeologs, indicating a higher nucleotide substitution rates in homeologs than in homeologs. Gene ontology (GO) enrichment analysis suggested the more compatible cytoplasmic environment and cellular apparatus for homeologs as the major cause for this phenomenon.

  19. De novo transcriptome sequencing of two cultivated jute species under salinity stress.

    Directory of Open Access Journals (Sweden)

    Zemao Yang

    Full Text Available Soil salinity, a major environmental stress, reduces agricultural productivity by restricting plant development and growth. Jute (Corchorus spp.), a commercially important bast fiber crop, includes two commercially cultivated species, Corchorus capsularis and Corchorus olitorius. We conducted high-throughput transcriptome sequencing of 24 C. capsularis and C. olitorius samples under salt stress and found 127 common differentially expressed genes (DEGs); additionally, 4489 and 492 common DEGs were identified in the root and leaf tissues, respectively, of both Corchorus species. Further, 32, 196, and 11 common differentially expressed transcription factors (DTFs) were detected in the leaf, root, or both tissues, respectively. Several Gene Ontology (GO) terms were enriched in NY and YY. A Kyoto Encyclopedia of Genes and Genomes analysis revealed numerous DEGs in both species. Abscisic acid and cytokinin signal pathways enriched respectively about 20 DEGs in leaves and roots of both NY and YY. The Ca2+, mitogen-activated protein kinase signaling and oxidative phosphorylation pathways were also found to be related to the plant response to salt stress, as evidenced by the DEGs in the roots of both species. These results provide insight into salt stress response mechanisms in plants as well as a basis for future breeding of salt-tolerant cultivars.

  20. De novo transcriptome analyses of host-fungal interactions in oil palm (Elaeis guineensis Jacq.).

    Science.gov (United States)

    Ho, Chai-Ling; Tan, Yung-Chie; Yeoh, Keat-Ai; Ghazali, Ahmad-Kamal; Yee, Wai-Yan; Hoh, Chee-Choong

    2016-01-19

    Basal stem rot (BSR) is a fungal disease in oil palm (Elaeis guineensis Jacq.) which is caused by hemibiotrophic white rot fungi belonging to the Ganoderma genus. Molecular responses of oil palm to these pathogens are not well known although this information is crucial to strategize effective measures to eradicate BSR. In order to elucidate the molecular interactions between oil palm and G. boninense and its biocontrol fungus Trichoderma harzianum, we compared the root transcriptomes of untreated oil palm seedlings with those inoculated with G. boninense and T. harzianum, respectively. Differential gene expression analyses revealed that jasmonate (JA) and salicylate (SA) may act in an antagonistic manner in affecting the hormone biosynthesis, signaling, and downstream defense responses in G. boninense-treated oil palm roots. In addition, G. boninense may compete with the host to control disease symptoms through the transcriptional regulation of ethylene (ET) biosynthesis, reactive oxygen species (ROS) production and scavenging. The strengthening of host cell walls and the production of pathogenesis-related proteins as well as antifungal secondary metabolites in host plants are among the important defense mechanisms deployed by oil palm against G. boninense. Meanwhile, endophytic T. harzianum was shown to improve the nutrition status and nutrient transportation in host plants. The findings of this analysis have enhanced our understanding of the molecular interactions of G. boninense and oil palm, and also the biocontrol mechanisms involving T. harzianum, thus contributing to future formulations of better strategies for prevention and treatment of BSR.

  1. De novo transcriptome analysis in Dendrobium and identification of critical genes associated with flowering.

    Science.gov (United States)

    Chen, Yue; Shen, Qi; Lin, Renan; Zhao, Zhuangliu; Shen, Chenjia; Sun, Chongbo

    2017-10-01

    Artificial control of flowering time is pivotal for the ornamental value of orchids including the genus Dendrobium. Although various flowering pathways have been revealed in model plants, little information is available on the genetic regulation of flowering in Dendrobium. To identify the critical genes associated with flowering, transcriptomes from four organs (leaf, root, stem and flower) of D. officinale were analyzed in our study. In total, 2645 flower-specific transcripts were identified. Functional annotation and classification suggested that several metabolic pathways, including four sugar-related pathways and two fatty acid-related pathways, were enriched. A total of 24 flowering-related transcripts were identified in D. officinale according to the similarities to their homologous genes from Arabidopsis, suggesting that most classical flowering pathways existed in D. officinale. Furthermore, phylogenetic analysis suggested that the FLOWERING LOCUS T homologs in orchids are highly conserved during the evolutionary process. In addition, expression changes in nine randomly selected critical flowering-related transcripts between the vegetative stage and reproductive stage were quantified by qRT-PCR analysis. Our study provided a number of candidate genes and sequence resources for investigating the mechanisms underlying the flowering process of the Dendrobium genus. Copyright © 2017. Published by Elsevier Masson SAS.

  2. Analysis Of Transcriptomes In A Porcine Tissue Collection Using RNA-Seq And Genome Assembly 10

    DEFF Research Database (Denmark)

    Hornshøj, Henrik; Thomsen, Bo; Hedegaard, Jakob

    2011-01-01

    The release of Sus scrofa genome assembly 10 supports improvement of the pig genome annotation and in depth transcriptome analyses using next-generation sequencing technologies. In this study we analyze RNA-seq reads from a tissue collection, including 10 separate tissues from Duroc boars and 10...... short read alignment software we mapped the reads to the genome assembly 10. We extracted contig sequences of gene transcripts using the Cufflinks software. Based on this information we identified expressed genes that are present in the genome assembly. The portion of these genes being previously known...... was roughly estimated by sequence comparison to known genes. Similarly, we searched for genes that are expressed in the tissues but not present in the genome assembly by aligning the non-genome-mapped reads to known gene transcripts. For the genes predicted to have alternative transcript variants by Cufflinks...

  3. Distilled single-cell genome sequencing and de novo assembly for sparse microbial communities.

    Science.gov (United States)

    Taghavi, Zeinab; Movahedi, Narjes S; Draghici, Sorin; Chitsaz, Hamidreza

    2013-10-01

    Identification of every single genome present in a microbial sample is an important and challenging task with crucial applications. It is challenging because there are typically millions of cells in a microbial sample, the vast majority of which elude cultivation. The most accurate method to date is exhaustive single-cell sequencing using multiple displacement amplification, which is simply intractable for a large number of cells. However, there is hope for breaking this barrier, as the number of different cell types with distinct genome sequences is usually much smaller than the number of cells. Here, we present a novel divide and conquer method to sequence and de novo assemble all distinct genomes present in a microbial sample with a sequencing cost and computational complexity proportional to the number of genome types, rather than the number of cells. The method is implemented in a tool called Squeezambler. We evaluated Squeezambler on simulated data. The proposed divide and conquer method successfully reduces the cost of sequencing in comparison with the naïve exhaustive approach. Squeezambler and datasets are available at http://compbio.cs.wayne.edu/software/squeezambler/.

  4. The de novo assembly of mitochondrial genomes of the extinct passenger pigeon (Ectopistes migratorius) with next generation sequencing.

    Directory of Open Access Journals (Sweden)

    Chih-Ming Hung

    Full Text Available The information from ancient DNA (aDNA) provides an unparalleled opportunity to infer phylogenetic relationships and population history of extinct species and to investigate genetic evolution directly. However, the degraded and fragmented nature of aDNA has posed technical challenges for studies based on conventional PCR amplification. In this study, we present an approach based on next generation sequencing to efficiently sequence the complete mitochondrial genome (mitogenome) of two extinct passenger pigeons (Ectopistes migratorius) using de novo assembly of massive short (90 bp), paired-end or single-end reads. Although varying levels of human contamination and low levels of postmortem nucleotide lesion were observed, they did not impact sequencing accuracy. Our results demonstrated that the de novo assembly of shotgun sequence reads could be a potent approach to sequence mitogenomes, and offered an efficient way to infer evolutionary history of extinct species.

  5. The De Novo Assembly of Mitochondrial Genomes of the Extinct Passenger Pigeon (Ectopistes migratorius) with Next Generation Sequencing

    Science.gov (United States)

    Hung, Chih-Ming; Lin, Rong-Chien; Chu, Jui-Hua; Yeh, Chia-Fen; Yao, Chiou-Ju; Li, Shou-Hsien

    2013-01-01

    The information from ancient DNA (aDNA) provides an unparalleled opportunity to infer phylogenetic relationships and population history of extinct species and to investigate genetic evolution directly. However, the degraded and fragmented nature of aDNA has posed technical challenges for studies based on conventional PCR amplification. In this study, we present an approach based on next generation sequencing to efficiently sequence the complete mitochondrial genome (mitogenome) of two extinct passenger pigeons (Ectopistes migratorius) using de novo assembly of massive short (90 bp), paired-end or single-end reads. Although varying levels of human contamination and low levels of postmortem nucleotide lesion were observed, they did not impact sequencing accuracy. Our results demonstrated that the de novo assembly of shotgun sequence reads could be a potent approach to sequence mitogenomes, and offered an efficient way to infer evolutionary history of extinct species. PMID:23437111

  6. De novo sequencing and analysis of the transcriptome during the browning of fresh-cut Luffa cylindrica 'Fusi-3' fruits

    Science.gov (United States)

    Chen, Mindong; Wang, Bin; Zhang, Qianrong; Xue, Zhuzheng

    2017-01-01

    Fresh-cut luffa (Luffa cylindrica) fruits commonly undergo browning. However, little is known about the molecular mechanisms regulating this process. We used the RNA-seq technique to analyze the transcriptomic changes occurring during the browning of fresh-cut fruits from luffa cultivar ‘Fusi-3’. Over 90 million high-quality reads were assembled into 58,073 Unigenes, and 60.86% of these were annotated based on sequences in four public databases. We detected 35,282 Unigenes with significant hits to sequences in the NCBInr database, and 24,427 Unigenes encoded proteins with sequences that were similar to those of known proteins in the Swiss-Prot database. Additionally, 20,546 and 13,021 Unigenes were similar to existing sequences in the Eukaryotic Orthologous Groups of proteins and Kyoto Encyclopedia of Genes and Genomes databases, respectively. Furthermore, 27,301 Unigenes were differentially expressed during the browning of fresh-cut luffa fruits (i.e., after 1–6 h). Moreover, 11 genes from five gene families (i.e., PPO, PAL, POD, CAT, and SOD) identified as potentially associated with enzymatic browning as well as four WRKY transcription factors were observed to be differentially regulated in fresh-cut luffa fruits. With the assistance of rapid amplification of cDNA ends technology, we obtained the full-length sequences of the 15 Unigenes. We also confirmed these Unigenes were expressed by quantitative real-time polymerase chain reaction analysis. This study provides a comprehensive transcriptome sequence resource, and may facilitate further studies aimed at identifying genes affecting luffa fruit browning for the exploitation of the underlying mechanism. PMID:29145430

  7. De Novo Transcriptome Sequencing in Passiflora edulis Sims to Identify Genes and Signaling Pathways Involved in Cold Tolerance

    Directory of Open Access Journals (Sweden)

    Sian Liu

    2017-11-01

    Full Text Available The passion fruit (Passiflora edulis Sims), also known as the purple granadilla, is widely cultivated as the new darling of the fruit market throughout southern China. This exotic and perennial climber is adapted to warm and humid climates, and thus is generally intolerant of cold. There is limited information about gene regulation and signaling pathways related to the cold stress response in this species. In this study, two transcriptome libraries (KEDU_AP vs. GX_AP) were constructed from the aerial parts of cold-tolerant and cold-susceptible varieties of P. edulis, respectively. Overall, 126,284,018 clean reads were obtained, and 86,880 unigenes with a mean size of 1449 bp were assembled. Of these, there were 64,067 (73.74%) unigenes with significant similarity to publicly available plant protein sequences. Expression profiles were generated, and 3045 genes were found to be significantly differentially expressed between the KEDU_AP and GX_AP libraries, including 1075 (35.3%) up-regulated and 1970 (64.7%) down-regulated. These included 36 genes in enriched pathways of plant hormone signal transduction, and 56 genes encoding putative transcription factors. Six genes involved in the ICE1–CBF–COR pathway were induced in the cold-tolerant variety, and their expression levels were further verified using quantitative real-time PCR. This report is the first to identify genes and signaling pathways involved in cold tolerance using high-throughput transcriptome sequencing in P. edulis. These findings may provide useful insights into the molecular mechanisms regulating cold tolerance and genetic breeding in Passiflora spp.

  8. De novo sequencing and analysis of the transcriptome during the browning of fresh-cut Luffa cylindrica 'Fusi-3' fruits.

    Directory of Open Access Journals (Sweden)

    Haisheng Zhu

    Full Text Available Fresh-cut luffa (Luffa cylindrica) fruits commonly undergo browning. However, little is known about the molecular mechanisms regulating this process. We used the RNA-seq technique to analyze the transcriptomic changes occurring during the browning of fresh-cut fruits from luffa cultivar 'Fusi-3'. Over 90 million high-quality reads were assembled into 58,073 Unigenes, and 60.86% of these were annotated based on sequences in four public databases. We detected 35,282 Unigenes with significant hits to sequences in the NCBInr database, and 24,427 Unigenes encoded proteins with sequences that were similar to those of known proteins in the Swiss-Prot database. Additionally, 20,546 and 13,021 Unigenes were similar to existing sequences in the Eukaryotic Orthologous Groups of proteins and Kyoto Encyclopedia of Genes and Genomes databases, respectively. Furthermore, 27,301 Unigenes were differentially expressed during the browning of fresh-cut luffa fruits (i.e., after 1-6 h). Moreover, 11 genes from five gene families (i.e., PPO, PAL, POD, CAT, and SOD) identified as potentially associated with enzymatic browning as well as four WRKY transcription factors were observed to be differentially regulated in fresh-cut luffa fruits. With the assistance of rapid amplification of cDNA ends technology, we obtained the full-length sequences of the 15 Unigenes. We also confirmed these Unigenes were expressed by quantitative real-time polymerase chain reaction analysis. This study provides a comprehensive transcriptome sequence resource, and may facilitate further studies aimed at identifying genes affecting luffa fruit browning for the exploitation of the underlying mechanism.

  9. A de novo transcriptome of the Asian tiger mosquito, Aedes albopictus, to identify candidate transcripts for diapause preparation

    Directory of Open Access Journals (Sweden)

    Poelchau Monica F

    2011-12-01

    Full Text Available Abstract. Background: Many temperate insects survive the harsh conditions of winter by undergoing photoperiodic diapause, a pre-programmed developmental arrest initiated by short day lengths. Despite the well-established ecological significance of photoperiodic diapause, the molecular basis of this crucial adaptation remains largely unresolved. The Asian tiger mosquito, Aedes albopictus (Skuse), represents an outstanding emerging model to investigate the molecular basis of photoperiodic diapause in a well-defined ecological and evolutionary context. Ae. albopictus is a medically significant vector and is currently considered the most invasive mosquito in the world. Traits related to diapause appear to be important factors contributing to the rapid spread of this mosquito. To generate novel sequence information for this species, as well as to discover transcripts involved in diapause preparation, we sequenced the transcriptome of Ae. albopictus oocytes destined to become diapausing or non-diapausing pharate larvae. Results: 454 GS-FLX transcriptome sequencing yielded >1.1 million quality-filtered reads, which we assembled into 69,474 contigs (N50 = 1,009 bp). Our contig filtering approach, where we took advantage of strong sequence similarity to the fully sequenced genome of Aedes aegypti, as well as other reference organisms, resulted in 11,561 high-quality, conservative ESTs. Differential expression estimates based on normalized read counts revealed 57 genes with higher expression, and 257 with lower expression under diapause-inducing conditions. Analysis of expression by qPCR for 47 of these genes indicated a high correlation of expression levels between 454 sequence data and qPCR, but congruence of statistically significant differential expression was low. Seven genes identified as differentially expressed based on qPCR have putative functions that are consistent with the insect diapause syndrome; three genes have unknown function and represent
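
    The record above reports contig contiguity as an N50 value. As a reading aid, here is a minimal sketch of how N50 is commonly computed from a list of contig lengths: the length of the contig at which the running total of lengths, sorted longest first, reaches half of the total assembly size. The function name `n50` and the toy lengths are illustrative assumptions, not data or code from the study.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths (in bp)."""
    if not lengths:
        raise ValueError("no contig lengths given")
    ordered = sorted(lengths, reverse=True)  # longest contigs first
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first contig that pushes the running sum to at least half
        # of the total assembly length defines the N50.
        if running >= half_total:
            return length


# Hypothetical toy assembly, not the contigs from the study.
toy_contigs = [2000, 1500, 1009, 800, 400, 300]
print(n50(toy_contigs))
```

    Because N50 is weighted by length, a handful of long contigs can raise it even when most contigs are short, so it is usually read alongside the contig count.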

  10. CBrowse: a SAM/BAM-based contig browser for transcriptome assembly visualization and analysis.

    Science.gov (United States)

    Li, Pei; Ji, Guoli; Dong, Min; Schmidt, Emily; Lenox, Douglas; Chen, Liangliang; Liu, Qi; Liu, Lin; Zhang, Jie; Liang, Chun

    2012-09-15

    To address the pressing need to explore the rapidly increasing transcriptomics data generated for non-model organisms, we developed CBrowse, an AJAX-based web browser for visualizing and analyzing transcriptome assemblies and contigs. Designed in a standard three-tier architecture with a data pre-processing pipeline, CBrowse is essentially a Rich Internet Application that offers many seamlessly integrated web interfaces and allows users to navigate, sort, filter, search and visualize data smoothly. The pre-processing pipeline takes the contig sequence file in FASTA format and its relevant SAM/BAM file as the input; detects putative polymorphisms, simple sequence repeats and sequencing errors in contigs; and generates image, JSON and database-compatible CSV text files that are directly utilized by different web interfaces. CBrowse is a generic visualization and analysis tool that facilitates close examination of assembly quality, genetic polymorphisms, sequence repeats and/or sequencing errors in transcriptome sequencing projects. CBrowse is distributed under the GNU General Public License, available at http://bioinfolab.muohio.edu/CBrowse/. Contact: liangc@muohio.edu or liangc.mu@gmail.com; glji@xmu.edu.cn. Supplementary data are available at Bioinformatics online.

  11. The genome of flax (Linum usitatissimum) assembled de novo from short shotgun sequence reads

    DEFF Research Database (Denmark)

    Wang, Zhiwen; Hobson, Neil; Galindo, Leonardo

    2012-01-01

    Flax (Linum usitatissimum) is an ancient crop that is widely cultivated as a source of fiber, oil and medicinally relevant compounds. To accelerate crop improvement, we performed whole-genome shotgun sequencing of the nuclear genome of flax. Seven paired-end libraries ranging in size from 300 bp...... these results show that de novo assembly, based solely on whole-genome shotgun short-sequence reads, is an efficient means of obtaining nearly complete genome sequence information for some plant species....

  12. Analysis of the leaf transcriptome of Musa acuminata during interaction with Mycosphaerella musicola: gene assembly, annotation and marker development.

    Science.gov (United States)

    Passos, Marco A N; de Cruz, Viviane Oliveira; Emediato, Flavia L; de Teixeira, Cristiane Camargo; Azevedo, Vânia C Rennó; Brasileiro, Ana C M; Amorim, Edson P; Ferreira, Claudia F; Martins, Natalia F; Togawa, Roberto C; Júnior, Georgios J Pappas; da Silva, Orzenil Bonfim; Miller, Robert N G

    2013-02-05

    Although banana (Musa sp.) is an important edible crop, contributing towards poverty alleviation and food security, limited transcriptome datasets are available for use in accelerated molecular-based breeding in this genus. 454 GS-FLX Titanium technology was employed to determine the sequence of gene transcripts in genotypes of Musa acuminata ssp. burmannicoides Calcutta 4 and M. acuminata subgroup Cavendish cv. Grande Naine, contrasting in resistance to the fungal pathogen Mycosphaerella musicola, causal organism of Sigatoka leaf spot disease. To enrich for transcripts under biotic stress responses, full length-enriched cDNA libraries were prepared from whole plant leaf materials, both uninfected and artificially challenged with pathogen conidiospores. The study generated 846,762 high quality sequence reads, with an average length of 334 bp and totalling 283 Mbp. De novo assembly generated 36,384 and 35,269 unigene sequences for M. acuminata Calcutta 4 and Cavendish Grande Naine, respectively. A total of 64.4% of the unigenes were annotated through Basic Local Alignment Search Tool (BLAST) similarity analyses against public databases. Assembled sequences were functionally mapped to Gene Ontology (GO) terms, with unigene functions covering a diverse range of molecular functions, biological processes and cellular components. Genes from a number of defense-related pathways were observed in transcripts from each cDNA library. Over 99% of contig unigenes mapped to exon regions in the reference M. acuminata DH Pahang whole genome sequence. A total of 4068 genic-SSR loci were identified in Calcutta 4 and 4095 in Cavendish Grande Naine. A subset of 95 potential defense-related gene-derived simple sequence repeat (SSR) loci were validated for specific amplification and polymorphism across M. acuminata accessions. Fourteen loci were polymorphic, with alleles per polymorphic locus ranging from 3 to 8 and polymorphism information content ranging from 0.34 to 0.82. A large set
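
    The record above summarises marker informativeness as polymorphism information content (PIC). The abstract does not say which estimator was used; the sketch below assumes the widely used formula PIC = 1 - sum(p_i^2) - sum over i<j of 2 * p_i^2 * p_j^2, where p_i is the frequency of allele i at a locus. The function name `pic` and the allele frequencies are hypothetical.

```python
def pic(allele_freqs):
    """Polymorphism information content for one locus.

    Assumes the formula 1 - sum(p_i^2) - sum_{i<j} 2 * p_i^2 * p_j^2.
    """
    if abs(sum(allele_freqs) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    homozygosity = sum(p * p for p in allele_freqs)
    cross_term = 0.0
    n = len(allele_freqs)
    for i in range(n):
        for j in range(i + 1, n):
            cross_term += 2 * allele_freqs[i] ** 2 * allele_freqs[j] ** 2
    return 1.0 - homozygosity - cross_term


# Hypothetical locus with three equally frequent alleles (not study data).
print(round(pic([1 / 3, 1 / 3, 1 / 3]), 4))
```

    Under this formula a locus with a single allele scores 0, and the score rises as the number of alleles grows and their frequencies even out.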

  13. Genomics of Compositae crops: reference transcriptome assemblies and evidence of hybridization with wild relatives.

    Science.gov (United States)

    Hodgins, Kathryn A; Lai, Zhao; Oliveira, Luiz O; Still, David W; Scascitelli, Moira; Barker, Michael S; Kane, Nolan C; Dempewolf, Hannes; Kozik, Alex; Kesseli, Richard V; Burke, John M; Michelmore, Richard W; Rieseberg, Loren H

    2014-01-01

    Although the Compositae harbours only two major food crops, sunflower and lettuce, many other species in this family are utilized by humans and have experienced various levels of domestication. Here, we have used next-generation sequencing technology to develop 15 reference transcriptome assemblies for Compositae crops or their wild relatives. These data allow us to gain insight into the evolutionary and genomic consequences of plant domestication. Specifically, we performed Illumina sequencing of Cichorium endivia, Cichorium intybus, Echinacea angustifolia, Iva annua, Helianthus tuberosus, Dahlia hybrida, Leontodon taraxacoides and Glebionis segetum, as well as 454 sequencing of Guizotia scabra, Stevia rebaudiana, Parthenium argentatum and Smallanthus sonchifolius. Illumina reads were assembled using Trinity, and 454 reads were assembled using MIRA and CAP3. We evaluated the coverage of the transcriptomes using BLASTX analysis of a set of ultra-conserved orthologs (UCOs) and recovered most of these genes (88-98%). We found a correlation between contig length and read length for the 454 assemblies, and greater contig lengths for the 454 compared with the Illumina assemblies. This suggests that longer reads can aid in the assembly of more complete transcripts. Finally, we compared the divergence of orthologs at synonymous sites (Ks) between Compositae crops and their wild relatives and found greater divergence when the progenitors were self-incompatible. We also found greater divergence between pairs of taxa that had some evidence of postzygotic isolation. For several more distantly related congeners, such as chicory and endive, we identified a signature of introgression in the distribution of Ks values. © 2013 John Wiley & Sons Ltd.

  14. Insights into the development and evolution of exaggerated traits using de novo transcriptomes of two species of horned scarab beetles.

    Directory of Open Access Journals (Sweden)

    Ian A Warren

    Full Text Available Scarab beetles exhibit an astonishing variety of rigid exo-skeletal outgrowths, known as "horns". These traits are often sexually dimorphic and vary dramatically across species in size, shape, location, and allometry with body size. In many species, the horn exhibits disproportionate growth resulting in an exaggerated allometric relationship with body size, as compared to other traits, such as wings, that grow proportionately with body size. Depending on the species, the smallest males either do not produce a horn at all, or they produce a disproportionately small horn for their body size. While the diversity of horn shapes and their behavioural ecology have been reasonably well studied, we know far less about the proximate mechanisms that regulate horn growth. Thus, using 454 pyrosequencing, we generated transcriptome profiles, during horn growth and development, in two different scarab beetle species: the Asian rhinoceros beetle, Trypoxylus dichotomus, and the dung beetle, Onthophagus nigriventris. We obtained over half a million reads for each species that were assembled into over 6,000 and 16,000 contigs respectively. We combined these data with previously published studies to look for signatures of molecular evolution. We found a small subset of genes with horn-biased expression showing evidence for recent positive selection, as is expected with sexual selection on horn size. We also found evidence of relaxed selection present in genes that demonstrated biased expression between horned and horn-less morphs, consistent with the theory of developmental decoupling of phenotypically plastic traits.

  15. Insights into the development and evolution of exaggerated traits using de novo transcriptomes of two species of horned scarab beetles.

    Science.gov (United States)

    Warren, Ian A; Vera, J Cristobal; Johns, Annika; Zinna, Robert; Marden, James H; Emlen, Douglas J; Dworkin, Ian; Lavine, Laura C

    2014-01-01

    Scarab beetles exhibit an astonishing variety of rigid exo-skeletal outgrowths, known as "horns". These traits are often sexually dimorphic and vary dramatically across species in size, shape, location, and allometry with body size. In many species, the horn exhibits disproportionate growth resulting in an exaggerated allometric relationship with body size, as compared to other traits, such as wings, that grow proportionately with body size. Depending on the species, the smallest males either do not produce a horn at all, or they produce a disproportionately small horn for their body size. While the diversity of horn shapes and their behavioural ecology have been reasonably well studied, we know far less about the proximate mechanisms that regulate horn growth. Thus, using 454 pyrosequencing, we generated transcriptome profiles, during horn growth and development, in two different scarab beetle species: the Asian rhinoceros beetle, Trypoxylus dichotomus, and the dung beetle, Onthophagus nigriventris. We obtained over half a million reads for each species that were assembled into over 6,000 and 16,000 contigs respectively. We combined these data with previously published studies to look for signatures of molecular evolution. We found a small subset of genes with horn-biased expression showing evidence for recent positive selection, as is expected with sexual selection on horn size. We also found evidence of relaxed selection present in genes that demonstrated biased expression between horned and horn-less morphs, consistent with the theory of developmental decoupling of phenotypically plastic traits.

  16. De novo assembly and transcriptome characterization of an Armillaria solidipes mycelial fan

    Science.gov (United States)

    Amy L. Ross-Davis; Jane E. Stewart; John W. Hanna; Mee-Sook Kim; Rich C. Cronn; Hardeep S. Rai; Bryce A. Richardson; Geral I. McDonald; Ned B. Klopfenstein

    2012-01-01

    Armillaria (Fr.) Staude is a widely distributed fungal genus comprising approximately 40 species (Volk and Burdsall 1995) that display diverse ecological behaviors ranging from beneficial saprobe to virulent pathogen. Armillaria solidipes (formerly A. ostoyae; Burdsall and Volk 2008; pending vote to conserve A. ostoyae; Redhead et al. 2011), one of the causal agents of...

  17. A Long-Read Transcriptome Assembly of Cotton (Gossypium hirsutum L.) and Intraspecific Single Nucleotide Polymorphism Discovery

    Directory of Open Access Journals (Sweden)

    Hamid Ashrafi

    2015-07-01

    Full Text Available Upland cotton (Gossypium hirsutum L.) has a narrow germplasm base, which constrains marker development and hampers intraspecific breeding. A pressing need exists for high-throughput single nucleotide polymorphism (SNP) markers that can be readily applied to germplasm in breeding and breeding-related research programs. Despite progress made in developing new sequencing technologies during the past decade, the cost of sequencing remains substantial when one is dealing with numerous samples and large genomes. Several strategies have been proposed to lower the cost of sequencing for multiple genotypes of large-genome species like cotton, such as transcriptome sequencing and reduced-representation DNA sequencing. This paper reports the development of a transcriptome assembly of the inbred line Texas Marker-1 (TM-1), a genetic standard for cotton, its usefulness as a reference for RNA sequencing (RNA-seq)-based SNP identification, and the availability of transcriptome sequences of four other cotton cultivars. An assembly of TM-1 was made using Roche 454 transcriptome reads combined with an assembly of all available public expressed sequence tag (EST) sequences of TM-1. The TM-1 assembly consists of 72,450 contigs with a total of 70 million bp. Functional predictions of the transcripts were estimated by alignment to selected protein databases. Transcriptome sequences of the five lines, including TM-1, were obtained using an Illumina Genome Analyzer-II, and the short reads were mapped to the TM-1 assembly to discover SNPs among the five lines. We identified >14,000 unfiltered allelic SNPs, of which ∼3,700 SNPs were retained for assay development after applying several rigorous filters. This paper reports availability of the reference transcriptome assembly and shows its utility in developing intraspecific SNP markers in upland cotton.

  18. Comprehensive transcriptome assembly of Chickpea (Cicer arietinum L.) using Sanger and next generation sequencing platforms: development and applications.

    Directory of Open Access Journals (Sweden)

    Himabindu Kudapa

    Full Text Available A comprehensive transcriptome assembly of chickpea has been developed using 134.95 million Illumina single-end reads, 7.12 million single-end FLX/454 reads and 139,214 Sanger expressed sequence tags (ESTs) from >17 genotypes. This hybrid transcriptome assembly, referred to as Cicer arietinum Transcriptome Assembly version 2 (CaTA v2, available at http://data.comparative-legumes.org/transcriptomes/cicar/lista_cicar-201201), comprising 46,369 transcript assembly contigs (TACs), has an N50 length of 1,726 bp and a maximum contig size of 15,644 bp. Putative functions were determined for 32,869 (70.8%) of the TACs and gene ontology assignments were determined for 21,471 (46.3%). The new transcriptome assembly was compared with the previously available chickpea transcriptome assemblies as well as to the chickpea genome. Comparative analysis of CaTA v2 against transcriptomes of three legumes - Medicago, soybean and common bean - resulted in 27,771 TACs common to all three legumes, indicating strong conservation of genes across legumes. CaTA v2 was also used for identification of simple sequence repeats (SSRs) and intron spanning regions (ISRs) for developing molecular markers. ISRs were identified by aligning TACs to the Medicago genome, and their putative mapping positions at chromosomal level were identified using transcript map of chickpea. Primer pairs were designed for 4,990 ISRs, each representing a single contig for which predicted positions are inferred and distributed across eight linkage groups. A subset of randomly selected ISRs representing all eight chickpea linkage groups were validated on five chickpea genotypes and showed 20% polymorphism with average polymorphic information content (PIC) of 0.27. In summary, the hybrid transcriptome assembly developed and novel markers identified can be used for a variety of applications such as gene discovery, marker-trait association, diversity analysis etc., to advance genetics research and breeding

  19. Comparing genome guided assembly and phased variants based assembly approach to separate the homoeolog transcripts in tetraploid peanut (Arachis hypogaea L.)

    Science.gov (United States)

    Homoeologous copies of transcripts are abundant in many self-pollinating species including tetraploid peanut, and can impose a challenge to build a transcriptome reference without the merging of homoeologs. De novo transcriptome assembly of tetraploid OLin with single kmer and multiple kmer approach...

  20. Reconstructing a comprehensive transcriptome assembly of a white-pupal translocated strain of the pest fruit fly Bactrocera cucurbitae

    Science.gov (United States)

    Background: Bactrocera cucurbitae is an important agricultural pest. Basic genomic information is lacking for this species and this would be useful to inform methods of control, damage mitigation, and eradication efforts. Here, we have sequenced, assembled, and annotated a comprehensive transcriptom...

  1. De novo assembly of mitochondrial genomes provides insights into genetic diversity and molecular evolution in wild boars and domestic pigs.

    Science.gov (United States)

    Ni, Pan; Bhuiyan, Ali Akbar; Chen, Jian-Hai; Li, Jingjin; Zhang, Cheng; Zhao, Shuhong; Du, Xiaoyong; Li, Hua; Yu, Hui; Liu, Xiangdong; Li, Kui

    2018-05-10

    To date, the scarcity of publicly available complete mitochondrial sequences for European wild pigs hampers deeper understanding of the genetic changes following domestication. Here, we have assembled 26 de novo mtDNA sequences of European wild boars from next generation sequencing (NGS) data and downloaded 174 complete mtDNA sequences to assess the genetic relationship, nucleotide diversity, and selection. The Bayesian consensus tree reveals a clear divergence between the European and Asian clades and a very small portion (10 out of 200 samples) of maternal introgression. The overall nucleotide diversity of the mtDNA sequences has been reduced following domestication. Interestingly, the selection efficiencies in both European and Asian domestic pigs are reduced, probably caused by changes in both selection constraints and maternal population size following domestication. This study suggests that de novo assembled mitogenomes can be a great boon to uncover the genetic turnover following domestication. Further investigation is warranted to include more samples from the ever-increasing amounts of NGS data to help us to better understand the process of domestication.

  2. De novo transcriptome analysis and microsatellite marker development for population genetic study of a serious insect pest, Rhopalosiphum padi (L.) (Hemiptera: Aphididae).

    Science.gov (United States)

    Duan, Xinle; Wang, Kang; Su, Sha; Tian, Ruizheng; Li, Yuting; Chen, Maohua

    2017-01-01

    The bird cherry-oat aphid, Rhopalosiphum padi (L.), is one of the most abundant aphid pests of cereals and has a global distribution. Next-generation sequencing (NGS) is a rapid and efficient method for developing molecular markers. However, transcriptomic and genomic resources of R. padi have not been investigated. In this study, we used transcriptome information obtained by RNA-Seq to develop polymorphic microsatellites for investigating population genetics in this species. The transcriptome of R. padi was sequenced on an Illumina HiSeq 2000 platform. A total of 114.4 million raw reads with a GC content of 40.03% was generated. The raw reads were cleaned and assembled into 29,467 unigenes with an N50 length of 1,580 bp. Using several public databases, 82.47% of these unigenes were annotated. Of the annotated unigenes, 8,022 were assigned to COG pathways, 9,895 were assigned to GO pathways, and 14,586 were mapped to 257 KEGG pathways. A total of 7,936 potential microsatellites were identified in 5,564 unigenes, 60 of which were selected randomly and amplified using specific primer pairs. Fourteen loci were found to be polymorphic in the four R. padi populations. The transcriptomic data presented herein will facilitate gene discovery, gene analyses, and development of molecular markers for future studies of R. padi and other closely related aphid species.

  3. Brain transcriptome sequencing and assembly of three songbird model systems for the study of social behavior

    Directory of Open Access Journals (Sweden)

    Christopher N. Balakrishnan

    2014-05-01

    Full Text Available Emberizid sparrows (Emberizidae) have played a prominent role in the study of avian vocal communication and social behavior. We present here brain transcriptomes for three emberizid model systems, song sparrow Melospiza melodia, white-throated sparrow Zonotrichia albicollis, and Gambel’s white-crowned sparrow Zonotrichia leucophrys gambelii. Each of the assemblies covered, fully or in part, over 89% of the previously annotated protein coding genes in the zebra finch Taeniopygia guttata, with 16,846, 15,805, and 16,646 unique BLAST hits in song, white-throated and white-crowned sparrows, respectively. As in previous studies, we find tissue of origin (auditory forebrain versus hypothalamus and whole brain) as an important determinant of overall expression profile. We also demonstrate the successful isolation of RNA and RNA-sequencing from post-mortem samples from building strikes and suggest that such an approach could be useful when traditional sampling opportunities are limited. These transcriptomes will be an important resource for the study of social behavior in birds and for data driven annotation of forthcoming whole genome sequences for these and other bird species.

  4. Transcriptome of the Antarctic brooding gastropod mollusc Margarella antarctica.

    Science.gov (United States)

    Clark, Melody S; Thorne, Michael A S

    2015-12-01

    454 RNA-Seq transcriptome data were generated from foot tissue of the Antarctic brooding gastropod mollusc Margarella antarctica. A total of 6195 contigs were assembled de novo, providing a useful resource for researchers with an interest in Antarctic marine species, phylogenetics and mollusc biology, especially shell production. Copyright © 2015 Elsevier B.V. All rights reserved.

  5. Discovery, genotyping and characterization of structural variation and novel sequence at single nucleotide resolution from de novo genome assemblies on a population scale

    DEFF Research Database (Denmark)

    Liu, Siyang; Huang, Shujia; Rao, Junhua

    2015-01-01

    ...) as well as large deletions. However, these approaches consistently display a substantial bias against the recovery of complex structural variants and novel sequence in individual genomes and do not provide interpretation information such as the annotation of ancestral state and formation mechanism. We present a novel approach implemented in a single software package, AsmVar, to discover, genotype and characterize different forms of structural variation and novel sequence from population-scale de novo genome assemblies up to nucleotide resolution. Application of AsmVar to several human de novo genome assemblies captures a wide spectrum of structural variants and novel sequences present in the human population in high sensitivity and specificity. Our method provides a direct solution for investigating structural variants and novel sequences from de novo genome assemblies, facilitating the construction...

  6. Sequence protein identification by randomized sequence database and transcriptome mass spectrometry (SPIDER-TMS): from manual to automatic application of a 'de novo sequencing' approach.

    Science.gov (United States)

    Pascale, Raffaella; Grossi, Gerarda; Cruciani, Gabriele; Mecca, Giansalvatore; Santoro, Donatello; Sarli Calace, Renzo; Falabella, Patrizia; Bianco, Giuliana

    Sequence protein identification by a randomized sequence database and transcriptome mass spectrometry software package has been developed at the University of Basilicata in Potenza (Italy) and designed to facilitate the determination of the amino acid sequence of a peptide as well as an unequivocal identification of proteins in a high-throughput manner with enormous advantages of time, economical resource and expertise. The software package is a valid tool for the automation of a de novo sequencing approach, overcoming the main limits and a versatile platform useful in the proteomic field for an unequivocal identification of proteins, starting from tandem mass spectrometry data. The strength of this software is that it is a user-friendly and non-statistical approach, so protein identification can be considered unambiguous.

  7. Combining de novo and reference-guided assembly with scaffold_builder

    NARCIS (Netherlands)

    Silva, G.G.; Dutilh, B.E.; Matthews, T.D.; Elkins, K.; Schmieder, R.; Dinsdale, E.A.; Edwards, R.A.

    2013-01-01

    Genome sequencing has become routine, however genome assembly still remains a challenge despite the computational advances in the last decade. In particular, the abundance of repeat elements in genomes makes it difficult to assemble them into a single complete sequence. Identical repeats shorter

  8. Rapid hybrid de novo assembly of a microbial genome using only short reads: Corynebacterium pseudotuberculosis I19 as a case study.

    Science.gov (United States)

    Cerdeira, Louise Teixeira; Carneiro, Adriana Ribeiro; Ramos, Rommel Thiago Jucá; de Almeida, Sintia Silva; D'Afonseca, Vivian; Schneider, Maria Paula Cruz; Baumbach, Jan; Tauch, Andreas; McCulloch, John Anthony; Azevedo, Vasco Ariston Carvalho; Silva, Artur

    2011-08-01

    Due to the advent of the so-called Next-Generation Sequencing (NGS) technologies, the amount of monetary and temporal resources for whole-genome sequencing has been reduced by several orders of magnitude. Sequence reads can be assembled either by anchoring them directly onto an available reference genome (classical reference assembly), or can be concatenated by overlap (de novo assembly). The latter strategy is preferable because it tends to maintain the architecture of the genome sequence; however, depending on the NGS platform used, the shortness of read lengths causes tremendous problems in the subsequent genome assembly phase, impeding closing of the entire genome sequence. To address the problem, we developed a multi-pronged hybrid de novo strategy combining De Bruijn graph and Overlap-Layout-Consensus methods, which was used to assemble from short reads the entire genome of Corynebacterium pseudotuberculosis strain I19, a bacterium with immense importance in veterinary medicine that causes Caseous Lymphadenitis in ruminants, principally ovines and caprines. Briefly, contigs were assembled de novo from the short reads and were only oriented using a reference genome by anchoring. Remaining gaps were closed using iterative anchoring of short reads by craning to gap flanks. Finally, we compare the genome sequence assembled using our hybrid strategy to a classical reference assembly using the same data as input and show that with the availability of a reference genome, it pays off to use the hybrid de novo strategy, rather than a classical reference assembly, because more genome sequences are preserved using the former. Copyright © 2011 Elsevier B.V. All rights reserved.

  9. De novo assembly of plant body plan: a step ahead of Deadpool

    OpenAIRE

    Kareem, Abdul; Radhakrishnan, Dhanya; Sondhi, Yash; Aiyaz, Mohammed; Roy, Merin V.; Sugimoto, Kaoru; Prasad, Kalika

    2016-01-01

    While in the movie Deadpool it is possible for a human to recreate an arm from scratch, in reality plants can even surpass that. Not only can they regenerate lost parts, but also the whole plant body can be reborn from a few existing cells. Despite the decades-old realization that plant cells possess the ability to regenerate a complete shoot and root system, it is only now that the underlying mechanisms are being unraveled. De novo plant regeneration involves the initiation of regen...

  10. De novo design of an RNA tile that self-assembles into a homo-octameric nanoprism

    Science.gov (United States)

    Yu, Jinwen; Liu, Zhiyu; Jiang, Wen; Wang, Guansong; Mao, Chengde

    2015-01-01

    Rational, de novo design of RNA nanostructures can potentially integrate a wide array of structural and functional diversities. Such nanostructures have great promises in biomedical applications. Despite impressive progress in this field, all RNA building blocks (or tiles) reported so far are not geometrically well defined. They are generally flexible and can only assemble into a mixture of complexes with different sizes. To achieve defined structures, multiple tiles with different sequences are needed. In this study, we design an RNA tile that can homo-oligomerize into a uniform RNA nanostructure. The designed RNA nanostructure is characterized by gel electrophoresis, atomic force microscopy and cryogenic electron microscopy imaging. We believe that development along this line would help RNA nanotechnology to reach the structural control that is currently associated with DNA nanotechnology.

  11. SEQUENCING AND DE NOVO DRAFT ASSEMBLIES OF A FATHEAD MINNOW (Pimephales promelas) reference genome

    Data.gov (United States)

    U.S. Environmental Protection Agency — The dataset provides the URLs for accessing the genome sequence data and two draft assemblies as well as fathead minnow genotyping data associated with estimating...

  12. Multiple hybrid de novo genome assembly of finger millet, an orphan allotetraploid crop.

    Science.gov (United States)

    Hatakeyama, Masaomi; Aluri, Sirisha; Balachadran, Mathi Thumilan; Sivarajan, Sajeevan Radha; Patrignani, Andrea; Grüter, Simon; Poveda, Lucy; Shimizu-Inatsugi, Rie; Baeten, John; Francoijs, Kees-Jan; Nataraja, Karaba N; Reddy, Yellodu A Nanja; Phadnis, Shamprasad; Ravikumar, Ramapura L; Schlapbach, Ralph; Sreeman, Sheshshayee M; Shimizu, Kentaro K

    2017-09-05

    Finger millet (Eleusine coracana (L.) Gaertn) is an important crop for food security because of its tolerance to drought, which is expected to be exacerbated by global climate changes. Nevertheless, it is often classified as an orphan/underutilized crop because of the paucity of scientific attention. Among several small millets, finger millet is considered as an excellent source of essential nutrient elements, such as iron and zinc; hence, it has potential as an alternate coarse cereal. However, high-quality genome sequence data of finger millet are currently not available. One of the major problems encountered in the genome assembly of this species was its polyploidy, which hampers genome assembly compared with a diploid genome. To overcome this problem, we sequenced its genome using diverse technologies with sufficient coverage and assembled it via a novel multiple hybrid assembly workflow that combines next-generation with single-molecule sequencing, followed by whole-genome optical mapping using the Bionano Irys® system. The total number of scaffolds was 1,897 with an N50 length >2.6 Mb and detection of 96% of the universal single-copy orthologs. The majority of the homeologs were assembled separately. This indicates that the proposed workflow is applicable to the assembly of other allotetraploid genomes. © The Author 2017. Published by Oxford University Press on behalf of Kazusa DNA Research Institute.

  13. Meraculous: De Novo Genome Assembly with Short Paired-End Reads

    Energy Technology Data Exchange (ETDEWEB)

    Chapman, Jarrod A.; Ho, Isaac; Sunkara, Sirisha; Luo, Shujun; Schroth, Gary P.; Rokhsar, Daniel S.; Salzberg, Steven L.

    2011-08-18

    We describe a new algorithm, meraculous, for whole genome assembly of deep paired-end short reads, and apply it to the assembly of a dataset of paired 75-bp Illumina reads derived from the 15.4 megabase genome of the haploid yeast Pichia stipitis. More than 95% of the genome is recovered, with no errors; half the assembled sequence is in contigs longer than 101 kilobases and in scaffolds longer than 269 kilobases. Incorporating fosmid ends recovers entire chromosomes. Meraculous relies on an efficient and conservative traversal of the subgraph of the k-mer (deBruijn) graph of oligonucleotides with unique high quality extensions in the dataset, avoiding an explicit error correction step as used in other short-read assemblers. A novel memory-efficient hashing scheme is introduced. The resulting contigs are ordered and oriented using paired reads separated by ~280 bp or ~3.2 kbp, and many gaps between contigs can be closed using paired-end placements. Practical issues with the dataset are described, and prospects for assembling larger genomes are discussed.

  14. De novo assembly of a haplotype-resolved human genome

    DEFF Research Database (Denmark)

    Cao, Hongzhi; Wu, Honglong; Luo, Ruibang

    2015-01-01

    The human genome is diploid, and knowledge of the variants on each chromosome is important for the interpretation of genomic information. Here we report the assembly of a haplotype-resolved diploid genome without using a reference genome. Our pipeline relies on fosmid pooling together with whole-...

  15. Sequencing and De novo Draft Assemblies of the Fathead Minnow (Pimephales promelas) Reference Genome

    Science.gov (United States)

    This study was undertaken to develop genome-scale resources for the fathead minnow (Pimephales promelas), an important model organism widely used in both aquatic ecotoxicology research and in regulatory toxicity testing. We report on the first sequencing and two draft assemblies fo...

  16. Construction and in vivo assembly of a catalytically proficient and hyperthermostable de novo enzyme.

    Science.gov (United States)

    Watkins, Daniel W; Jenkins, Jonathan M X; Grayson, Katie J; Wood, Nicola; Steventon, Jack W; Le Vay, Kristian K; Goodwin, Matthew I; Mullen, Anna S; Bailey, Henry J; Crump, Matthew P; MacMillan, Fraser; Mulholland, Adrian J; Cameron, Gus; Sessions, Richard B; Mann, Stephen; Anderson, J L Ross

    2017-08-25

    Although catalytic mechanisms in natural enzymes are well understood, achieving the diverse palette of reaction chemistries in re-engineered native proteins has proved challenging. Wholesale modification of natural enzymes is potentially compromised by their intrinsic complexity, which often obscures the underlying principles governing biocatalytic efficiency. The maquette approach can circumvent this complexity by combining a robust de novo designed chassis with a design process that avoids atomistic mimicry of natural proteins. Here, we apply this method to the construction of a highly efficient, promiscuous, and thermostable artificial enzyme that catalyzes a diverse array of substrate oxidations coupled to the reduction of H2O2. The maquette exhibits kinetics that match and even surpass those of certain natural peroxidases, retains its activity at elevated temperature and in the presence of organic solvents, and provides a simple platform for interrogating catalytic intermediates common to natural heme-containing enzymes. Catalytic mechanisms of enzymes are well understood, but achieving diverse reaction chemistries in re-engineered proteins can be difficult. Here the authors show a highly efficient and thermostable artificial enzyme that catalyzes a diverse array of substrate oxidations coupled to the reduction of H2O2.

  17. De novo characterization of the Iris lactea var. chinensis transcriptome and an analysis of genes under cadmium or lead exposure.

    Science.gov (United States)

    Gu, Chun-Sun; Liu, Liang-Qin; Deng, Yan-Ming; Zhang, Yong-Xia; Wang, Zhi-Quan; Yuan, Hai-Yan; Huang, Su-Zhen

    2017-10-01

    Iris lactea var. chinensis (I. lactea var. chinensis) is tolerant to accumulations of cadmium (Cd) and lead (Pb). In this study, the transcriptome of I. lactea var. chinensis was investigated under Cd or Pb stresses. Using the gene ontology database, 31,974 unigenes were classified into biological process, cellular component and molecular function. In total, 13,132 unigenes were involved in enriched Kyoto Encyclopedia of Genes and Genomes (KEGG) metabolic pathways, and the expression levels of 5904 unigenes were significantly changed after exposure to Cd or Pb stresses. Of these, 974 were co-up-regulated and 1281 were co-down-regulated under the two stresses. The transcriptome expression profiles of I. lactea var. chinensis under Cd or Pb stresses obtained in this study provided a resource for identifying common mechanisms in the detoxification of different heavy metals. Furthermore, the identified unigenes may be used for the genetic breeding of heavy-metal tolerant plants. Copyright © 2017 Elsevier Inc. All rights reserved.

  18. De Novo Transcriptome Analysis of Plant Pathogenic Fungus Myrothecium roridum and Identification of Genes Associated with Trichothecene Mycotoxin Biosynthesis

    Directory of Open Access Journals (Sweden)

    Wei Ye

    2017-02-01

    Full Text Available Myrothecium roridum is a plant pathogenic fungus that infects different crops and decreases the yield of economical crops, including soybean, cotton, corn, pepper, and tomato. Until now, the pathogenic mechanism of M. roridum has remained unclear. Different types of trichothecene mycotoxins were isolated from M. roridum, and trichothecene was considered as a plant pathogenic factor of M. roridum. In this study, the transcriptome of M. roridum in different incubation durations was sequenced using an Illumina HiSeq 2000. A total of 35,485 transcripts and 25,996 unigenes for M. roridum were obtained from 8.0 Gb clean reads. The protein–protein network of the M. roridum transcriptome indicated that the mitogen-activated protein kinases signal pathway also played an important role in the pathogenicity of M. roridum. The genes related to trichothecene biosynthesis were annotated. The expression levels of these genes were also predicted and validated through quantitative real-time polymerase chain reaction. Tri5 gene encoding trichodiene synthase was cloned and expressed, and the purified trichodiene synthase was able to catalyze farnesyl pyrophosphate into different kinds of sesquiterpenoids. Tri4 and Tri11 genes were expressed in Escherichia coli, and their corresponding enzymatic properties were characterized. The phylogenetic tree of trichodiene synthase showed a great discrepancy between the trichodiene synthase from M. roridum and other species. Our study on the genes related to trichothecene biosynthesis establishes a foundation for the M. roridum hazard prevention, thus improving the yields of economical crops.

  19. The dual role of fragments in fragment-assembly methods for de novo protein structure prediction

    Science.gov (United States)

    Handl, Julia; Knowles, Joshua; Vernon, Robert; Baker, David; Lovell, Simon C.

    2013-01-01

    In fragment-assembly techniques for protein structure prediction, models of protein structure are assembled from fragments of known protein structures. This process is typically guided by a knowledge-based energy function and uses a heuristic optimization method. The fragments play two important roles in this process: they define the set of structural parameters available, and they also assume the role of the main variation operators that are used by the optimiser. Previous analysis has typically focused on the first of these roles. In particular, the relationship between local amino acid sequence and local protein structure has been studied by a range of authors. The correlation between the two has been shown to vary with the window length considered, and the results of these analyses have informed directly the choice of fragment length in state-of-the-art prediction techniques. Here, we focus on the second role of fragments and aim to determine the effect of fragment length from an optimization perspective. We use theoretical analyses to reveal how the size and structure of the search space changes as a function of insertion length. Furthermore, empirical analyses are used to explore additional ways in which the size of the fragment insertion influences the search both in a simulation model and for the fragment-assembly technique, Rosetta. PMID:22095594

  20. Transcriptome sequencing in an ecologically important tree species: assembly, annotation, and marker discovery

    Directory of Open Access Journals (Sweden)

    Benkman Craig W

    2010-03-01

    Full Text Available Background: Massively parallel sequencing of cDNA is now an efficient route for generating enormous sequence collections that represent expressed genes. This approach provides a valuable starting point for characterizing functional genetic variation in non-model organisms, especially where whole genome sequencing efforts are currently cost and time prohibitive. The large and complex genomes of pines (Pinus spp.) have hindered the development of genomic resources, despite the ecological and economical importance of the group. While most genomic studies have focused on a single species (P. taeda), genomic level resources for other pines are insufficiently developed to facilitate ecological genomic research. Lodgepole pine (P. contorta) is an ecologically important foundation species of montane forest ecosystems and exhibits substantial adaptive variation across its range in western North America. Here we describe a sequencing study of expressed genes from P. contorta, including their assembly and annotation, and their potential for molecular marker development to support population and association genetic studies. Results: We obtained 586,732 sequencing reads from a 454 GS XLR70 Titanium pyrosequencer (mean length: 306 base pairs). A combination of reference-based and de novo assemblies yielded 63,657 contigs, with 239,793 reads remaining as singletons. Based on sequence similarity with known proteins, these sequences represent approximately 17,000 unique genes, many of which are well covered by contig sequences. This sequence collection also included a surprisingly large number of retrotransposon sequences, suggesting that they are highly transcriptionally active in the tissues we sampled. We located and characterized thousands of simple sequence repeats and single nucleotide polymorphisms as potential molecular markers in our assembled and annotated sequences. High quality PCR primers were designed for a substantial number of the SSR loci

  1. A multi-platform draft de novo genome assembly and comparative analysis for the Scarlet Macaw (Ara macao).

    Directory of Open Access Journals (Sweden)

    Christopher M Seabury

    Full Text Available Data deposition to NCBI Genomes: This Whole Genome Shotgun project has been deposited at DDBJ/EMBL/GenBank under the accession AMXX00000000 (SMACv1.0, unscaffolded genome assembly). The version described in this paper is the first version (AMXX01000000). The scaffolded assembly (SMACv1.1) has been deposited at DDBJ/EMBL/GenBank under the accession AOUJ00000000, and is also the first version (AOUJ01000000). Strong biological interest in traits such as the acquisition and utilization of speech, cognitive abilities, and longevity catalyzed the utilization of two next-generation sequencing platforms to provide the first-draft de novo genome assembly for the large, New World parrot Ara macao (Scarlet Macaw). Despite the challenges associated with genome assembly for an outbred avian species, including 951,507 high-quality putative single nucleotide polymorphisms, the final genome assembly (>1.035 Gb) includes more than 997 Mb of unambiguous sequence data (excluding N's). Cytogenetic analyses including ZooFISH revealed complex rearrangements associated with two scarlet macaw macrochromosomes (AMA6, AMA7), which supports the hypothesis that translocations, fusions, and intragenomic rearrangements are key factors associated with karyotype evolution among parrots. In silico annotation of the scarlet macaw genome provided robust evidence for 14,405 nuclear gene annotation models, their predicted transcripts and proteins, and a complete mitochondrial genome. Comparative analyses involving the scarlet macaw, chicken, and zebra finch genomes revealed high levels of nucleotide-based conservation as well as evidence for overall genome stability among the three highly divergent species. Application of a new whole-genome analysis of divergence involving all three species yielded prioritized candidate genes and noncoding regions for parrot traits of interest (i.e., speech, intelligence, longevity), which were independently supported by the results of previous human GWAS

  2. Using RNA-Seq to assemble a rose transcriptome with more than 13,000 full-length expressed genes and to develop the WagRhSNP 68k Axiom SNP array for rose (Rosa L.).

    Directory of Open Access Journals (Sweden)

    Carole F S Koning-Boucoiran

    2015-04-01

    Full Text Available In order to develop a versatile and large SNP array for rose, we set out to mine ESTs from diverse sets of rose germplasm. For this RNA-Seq libraries containing about 700 million reads were generated from tetraploid cut and garden roses using Illumina paired-end sequencing, and from diploid Rosa multiflora using 454 sequencing. Separate de novo assemblies were performed in order to identify single nucleotide polymorphisms (SNPs) within and between rose varieties. SNPs among tetraploid roses were selected for constructing a genotyping array that can be employed for genetic mapping and marker-trait association discovery in breeding programs based on tetraploid germplasm, both from cut roses and from garden roses. In total 68,893 SNPs were included on the WagRhSNP Axiom array. Next, an orthology-guided assembly was performed for the construction of a non-redundant rose transcriptome database. A total of 21,740 transcripts had significant hits with orthologous genes in the strawberry (Fragaria vesca L.) genome. Of these 13,390 appeared to contain the full-length coding regions. This newly established transcriptome resource adds considerably to the currently available sequence resources for the Rosaceae family in general and the genus Rosa in particular.

  3. Using RNA-Seq to assemble a rose transcriptome with more than 13,000 full-length expressed genes and to develop the WagRhSNP 68k Axiom SNP array for rose (Rosa L.).

    Science.gov (United States)

    Koning-Boucoiran, Carole F S; Esselink, G Danny; Vukosavljev, Mirjana; van 't Westende, Wendy P C; Gitonga, Virginia W; Krens, Frans A; Voorrips, Roeland E; van de Weg, W Eric; Schulz, Dietmar; Debener, Thomas; Maliepaard, Chris; Arens, Paul; Smulders, Marinus J M

    2015-01-01

    In order to develop a versatile and large SNP array for rose, we set out to mine ESTs from diverse sets of rose germplasm. For this RNA-Seq libraries containing about 700 million reads were generated from tetraploid cut and garden roses using Illumina paired-end sequencing, and from diploid Rosa multiflora using 454 sequencing. Separate de novo assemblies were performed in order to identify single nucleotide polymorphisms (SNPs) within and between rose varieties. SNPs among tetraploid roses were selected for constructing a genotyping array that can be employed for genetic mapping and marker-trait association discovery in breeding programs based on tetraploid germplasm, both from cut roses and from garden roses. In total 68,893 SNPs were included on the WagRhSNP Axiom array. Next, an orthology-guided assembly was performed for the construction of a non-redundant rose transcriptome database. A total of 21,740 transcripts had significant hits with orthologous genes in the strawberry (Fragaria vesca L.) genome. Of these 13,390 appeared to contain the full-length coding regions. This newly established transcriptome resource adds considerably to the currently available sequence resources for the Rosaceae family in general and the genus Rosa in particular.

  4. Sequencing, de novo assembling, and annotating the genome of the endangered Chinese crocodile lizard Shinisaurus crocodilurus.

    Science.gov (United States)

    Gao, Jian; Li, Qiye; Wang, Zongji; Zhou, Yang; Martelli, Paolo; Li, Fang; Xiong, Zijun; Wang, Jian; Yang, Huanming; Zhang, Guojie

    2017-07-01

    The Chinese crocodile lizard, Shinisaurus crocodilurus, is the only living representative of the monotypic family Shinisauridae under the order Squamata. It is an obligate semi-aquatic, viviparous, diurnal species restricted to specific portions of mountainous locations in southwestern China and northeastern Vietnam. However, in the past several decades, this species has undergone a rapid decrease in population size due to illegal poaching and habitat disruption, making this unique reptile species endangered and listed in the Convention on International Trade in Endangered Species of Wild Fauna and Flora Appendix II since 1990. A proposal to uplist it to Appendix I was passed at the Convention on International Trade in Endangered Species of Wild Fauna and Flora Seventeenth meeting of the Conference of the Parties in 2016. To promote the conservation of this species, we sequenced the genome of a male Chinese crocodile lizard using a whole-genome shotgun strategy on the Illumina HiSeq 2000 platform. In total, we generated ∼291 Gb of raw sequencing data (×149 depth) from 13 libraries with insert sizes ranging from 250 bp to 40 kb. After filtering for polymerase chain reaction-duplicated and low-quality reads, ∼137 Gb of clean data (×70 depth) were obtained for genome assembly. We yielded a draft genome assembly with a total length of 2.24 Gb and an N50 scaffold size of 1.47 Mb. The assembled genome was predicted to contain 20 150 protein-coding genes and up to 1114 Mb (49.6%) of repetitive elements. The genomic resource of the Chinese crocodile lizard will contribute to deciphering the biology of this organism and provides an essential tool for conservation efforts. It also provides a valuable resource for future study of squamate evolution. © The Authors 2017. Published by Oxford University Press.
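
    The N50 scaffold size quoted above is a standard assembly-contiguity statistic: the length L such that scaffolds of length L or longer together cover at least half of the total assembly. The following is only a minimal sketch of that definition; the function name and the toy lengths are hypothetical and are not taken from the study.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    Sort lengths from longest to shortest and walk down the list until
    the running total reaches half of the summed length; the length at
    which that happens is the N50.
    """
    if not lengths:
        raise ValueError("need at least one sequence length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical toy scaffold lengths (illustration only, not study data).
toy_scaffolds = [8, 8, 4, 3, 3, 2, 2, 2]
print(n50(toy_scaffolds))
```

    A real assembly would supply millions of lengths read from a FASTA index, but the calculation is the same.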

  5. De novo transcriptome sequencing and comparative analysis to discover genes involved in ovarian maturity in Strongylocentrotus nudus.

    Science.gov (United States)

    Jia, Zhiying; Wang, Qiai; Wu, Kaikai; Wei, Zhenlin; Zhou, Zunchun; Liu, Xiaolin

    2017-09-01

    Strongylocentrotus nudus is an edible sea urchin, mainly harvested in China. Correlation studies indicated that S. nudus with larger diameter have a prolonged marketing time and better palatability owing to their precocious gonads and extended maturation process. However, the molecular mechanism underlying this phenomenon is still unknown. Here, transcriptome sequencing was applied to study the ovaries of adult S. nudus with different shell diameters to explore the possible mechanism. In this study, four independent cDNA libraries were constructed, including two from the big size urchins and two from the small ones using a HiSeq™2500 platform. A total of 88,581 unigenes were acquired with a mean length of 1354bp, of which 66,331 (74.88%) unigenes could be annotated using six major publicly available databases. Comparative analysis revealed that 353 unigenes were differentially expressed (with log2(ratio)≥1, FDR≤0.001) between the two groups. Of these, 20 differentially expressed genes (DEGs) were selected to confirm the accuracy of RNA-seq data by quantitative real-time RT-PCR. Furthermore, gene ontology and KEGG pathway enrichment analyses were performed to find the putative genes and pathways related to ovarian maturity. Eight unigenes were identified as significant DEGs involved in reproduction related pathways; these included Mos, Cdc20, Rec8, YP30, cytochrome P450 2U1, ovoperoxidase, proteoliaisin, and rendezvin. Our research fills the gap in the studies on the S. nudus ovaries using transcriptome analysis. Copyright © 2017 Elsevier Inc. All rights reserved.
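
    The differential-expression cut-off stated above (log2(ratio)≥1, FDR≤0.001) can be illustrated with a small filter. This is a sketch under assumptions: the abstract does not say whether the log2 ratio is taken as an absolute value, so the absolute value is assumed here in order to keep both up- and down-regulated unigenes; the record layout, function name, and toy values are hypothetical.

```python
import math


def select_degs(records, min_abs_log2=1.0, max_fdr=0.001):
    """Keep genes passing a fold-change and an FDR threshold.

    records: iterable of (gene_id, expr_a, expr_b, fdr) tuples, where
    expr_a and expr_b are positive expression values in groups A and B.
    Assumption: the fold-change test uses |log2(expr_b / expr_a)|.
    """
    selected = []
    for gene, expr_a, expr_b, fdr in records:
        log2_ratio = math.log2(expr_b / expr_a)
        if abs(log2_ratio) >= min_abs_log2 and fdr <= max_fdr:
            selected.append((gene, log2_ratio))
    return selected


# Hypothetical toy records (illustration only, not study data).
toy = [
    ("g1", 10.0, 40.0, 0.0001),  # 4-fold up, significant
    ("g2", 10.0, 15.0, 0.0001),  # fold change too small
    ("g3", 40.0, 10.0, 0.01),    # FDR too high
    ("g4", 40.0, 10.0, 0.001),   # 4-fold down, FDR at the threshold
]
for gene, log2_ratio in select_degs(toy):
    print(gene, log2_ratio)
```

    Zero expression values would need a pseudocount before taking the ratio; that step is omitted here.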

  6. De novo protein structure prediction by dynamic fragment assembly and conformational space annealing.

    Science.gov (United States)

    Lee, Juyong; Lee, Jinhyuk; Sasaki, Takeshi N; Sasai, Masaki; Seok, Chaok; Lee, Jooyoung

    2011-08-01

    Ab initio protein structure prediction is a challenging problem that requires both an accurate energetic representation of a protein structure and an efficient conformational sampling method for successful protein modeling. In this article, we present an ab initio structure prediction method which combines a recently suggested novel way of fragment assembly, dynamic fragment assembly (DFA) and conformational space annealing (CSA) algorithm. In DFA, model structures are scored by continuous functions constructed based on short- and long-range structural restraint information from a fragment library. Here, DFA is represented by the full-atom model by CHARMM with the addition of the empirical potential of DFIRE. The relative contributions between various energy terms are optimized using linear programming. The conformational sampling was carried out with CSA algorithm, which can find low energy conformations more efficiently than simulated annealing used in the existing DFA study. The newly introduced DFA energy function and CSA sampling algorithm are implemented into CHARMM. Test results on 30 small single-domain proteins and 13 template-free modeling targets of the 8th Critical Assessment of protein Structure Prediction show that the current method provides comparable and complementary prediction results to existing top methods. Copyright © 2011 Wiley-Liss, Inc.

  7. De novo characterisation of the greenlip abalone transcriptome (Haliotis laevigata) with a focus on the heat shock protein 70 (HSP70) family.

    Science.gov (United States)

    Shiel, Brett P; Hall, Nathan E; Cooke, Ira R; Robinson, Nicholas A; Strugnell, Jan M

    2015-02-01

    Abalone (Haliotis) are economically important molluscs for fisheries and aquaculture industries worldwide. Despite this, genomic resources for abalone and molluscs are still limited. Here we present a description and functional annotation of the greenlip abalone (Haliotis laevigata) transcriptome. We present a focused analysis on the heat shock protein 70 (HSP70) family of genes with putative functions affecting temperature stress and immunity. A total of ~38 million paired end Illumina reads were obtained, resulting in a Trinity assembly of 222,172 contigs with minimum length of 200 base pairs and maximum length of 33 kilobases. The 20,702 contigs were annotated with gene descriptions by BLAST. We created a program to maximise the number of functionally annotated genes, and over 10,000 contigs were assigned Gene ontologies (GO terms). By using CateGOrizer, immunity related GO terms for stressors such as heat, hypoxia, oxidative stress and wounding received the highest counts. Twenty-six contigs with homology to the HSP70 family of genes were identified. Ninety-one putative single-nucleotide polymorphisms were observed in the abalone HSP70 contigs. Eleven of these were considered non-synonymous. The annotated transcriptome described in this study will be a useful basis for future work investigating the genetic response of abalone to stress.

  8. De Novo Transcriptome Analysis Shows That SAV-3 Infection Upregulates Pattern Recognition Receptors of the Endosomal Toll-Like and RIG-I-Like Receptor Signaling Pathways in Macrophage/Dendritic Like TO-Cells

    Directory of Open Access Journals (Sweden)

    Cheng Xu

    2016-04-01

    Full Text Available A fundamental step in cellular defense mechanisms is the recognition of “danger signals” made of conserved pathogen associated molecular patterns (PAMPs) expressed by invading pathogens, by host cell germ line coded pattern recognition receptors (PRRs). In this study, we used RNA-seq and the Kyoto encyclopedia of genes and genomes (KEGG) to identify PRRs together with the network pathway of differentially expressed genes (DEGs) that recognize salmonid alphavirus subtype 3 (SAV-3) infection in macrophage/dendritic like TO-cells derived from Atlantic salmon (Salmo salar L.) headkidney leukocytes. Our findings show that recognition of SAV-3 in TO-cells was restricted to endosomal Toll-like receptors (TLRs) 3 and 8 together with RIG-I-like receptors (RLRs) and not the nucleotide-binding oligomerization domain-like receptors NOD-like receptor (NLRs) genes. Among the RLRs, upregulated genes included the retinoic acid inducible gene I (RIG-I), melanoma differentiation association 5 (MDA5) and laboratory of genetics and physiology 2 (LGP2). The study points to possible involvement of the tripartite motif containing 25 (TRIM25) and mitochondrial antiviral signaling protein (MAVS) in modulating RIG-I signaling being the first report that links these genes to the RLR pathway in SAV-3 infection in TO-cells. Downstream signaling suggests that both the TLR and RLR pathways use interferon (IFN) regulatory factors (IRFs) 3 and 7 to produce IFN-a2. The validity of RNA-seq data generated in this study was confirmed by quantitative real time qRT-PCR showing that genes up- or downregulated by RNA-seq were also up- or downregulated by RT-PCR. Overall, this study shows that de novo transcriptome assembly identifies key receptors of the TLR and RLR sensors engaged in host pathogen interaction at cellular level. We envisage that data presented here can open a road map for future intervention strategies in SAV infection of salmon.

  9. A de novo transcriptome of the Malpighian tubules in non-blood-fed and blood-fed Asian tiger mosquitoes Aedes albopictus: insights into diuresis, detoxification, and blood meal processing

    Directory of Open Access Journals (Sweden)

    Carlos J. Esquivel

    2016-03-01

    Full Text Available Background. In adult female mosquitoes, the renal (Malpighian) tubules play an important role in the post-prandial diuresis, which removes excess ions and water from the hemolymph of mosquitoes following a blood meal. After the post-prandial diuresis, the roles that Malpighian tubules play in the processing of blood meals are not well described. Methods. We used a combination of next-generation sequencing (paired-end RNA sequencing) and physiological/biochemical assays in adult female Asian tiger mosquitoes (Aedes albopictus) to generate molecular and functional insights into the Malpighian tubules and how they may contribute to blood meal processing (3–24 h after blood ingestion). Results/Discussion. Using RNA sequencing, we sequenced and assembled the first de novo transcriptome of Malpighian tubules from non-blood-fed (NBF) and blood-fed (BF) mosquitoes. We identified a total of 8,232 non-redundant transcripts. The Malpighian tubules of NBF mosquitoes were characterized by the expression of transcripts associated with active transepithelial fluid secretion/diuresis (e.g., ion transporters, water channels, V-type H+-ATPase subunits), xenobiotic detoxification (e.g., cytochrome P450 monooxygenases, glutathione S-transferases, ATP-binding cassette transporters), and purine metabolism (e.g., xanthine dehydrogenase). We also detected the expression of transcripts encoding sodium calcium exchangers, G protein coupled-receptors, and septate junctional proteins not previously described in mosquito Malpighian tubules. Within 24 h after a blood meal, transcripts associated with active transepithelial fluid secretion/diuresis exhibited a general downregulation, whereas those associated with xenobiotic detoxification and purine catabolism exhibited a general upregulation, suggesting a reinvestment of the Malpighian tubules’ molecular resources from diuresis to detoxification. Physiological and biochemical assays were conducted in mosquitoes and isolated

  10. The chaperonin-60 universal target is a barcode for bacteria that enables de novo assembly of metagenomic sequence data.

    Science.gov (United States)

    Links, Matthew G; Dumonceaux, Tim J; Hemmingsen, Sean M; Hill, Janet E

    2012-01-01

    Barcoding with molecular sequences is widely used to catalogue eukaryotic biodiversity. Studies investigating the community dynamics of microbes have relied heavily on gene-centric metagenomic profiling using two genes (16S rRNA and cpn60) to identify and track Bacteria. While there have been criteria formalized for barcoding of eukaryotes, these criteria have not been used to evaluate gene targets for other domains of life. Using the framework of the International Barcode of Life we evaluated DNA barcodes for Bacteria. Candidates from the 16S rRNA gene and the protein coding cpn60 gene were evaluated. Within complete bacterial genomes in the public domain representing 983 species from 21 phyla, the largest difference between median pairwise inter- and intra-specific distances ("barcode gap") was found from cpn60. Distribution of sequence diversity along the ∼555 bp cpn60 target region was remarkably uniform. The barcode gap of the cpn60 universal target facilitated the faithful de novo assembly of full-length operational taxonomic units from pyrosequencing data from a synthetic microbial community. Analysis supported the recognition of both 16S rRNA and cpn60 as DNA barcodes for Bacteria. The cpn60 universal target was found to have a much larger barcode gap than 16S rRNA suggesting cpn60 as a preferred barcode for Bacteria. A large barcode gap for cpn60 provided a robust target for species-level characterization of data. The assembly of consensus sequences for barcodes was shown to be a reliable method for the identification and tracking of novel microbes in metagenomic studies.
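
    The "barcode gap" defined above (the difference between median pairwise inter-specific and intra-specific distances) can be sketched in a few lines. The abstract does not state which distance measure was used, so this sketch assumes a simple p-distance (fraction of differing sites) on pre-aligned, equal-length sequences; the function names and toy sequences are hypothetical.

```python
from itertools import combinations
from statistics import median


def p_distance(a, b):
    """Fraction of aligned positions at which two sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def barcode_gap(labelled_seqs):
    """Median inter-specific minus median intra-specific distance.

    labelled_seqs: list of (species_label, aligned_sequence) pairs.
    statistics.median raises StatisticsError if either group of pairs
    is empty, for example when every species has a single sequence.
    """
    intra, inter = [], []
    for (sp1, s1), (sp2, s2) in combinations(labelled_seqs, 2):
        target = intra if sp1 == sp2 else inter
        target.append(p_distance(s1, s2))
    return median(inter) - median(intra)


# Hypothetical toy alignment (illustration only, not study data).
toy = [("A", "AAAA"), ("A", "AAAT"), ("B", "TTTT"), ("B", "TTTA")]
print(barcode_gap(toy))
```

    A larger value means that between-species distances sit further above within-species distances, which is the property the abstract reports for cpn60 relative to 16S rRNA.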

  11. ScanIndel: a hybrid framework for indel detection via gapped alignment, split reads and de novo assembly.

    Science.gov (United States)

    Yang, Rendong; Nelson, Andrew C; Henzler, Christine; Thyagarajan, Bharat; Silverstein, Kevin A T

    2015-12-07

    Comprehensive identification of insertions/deletions (indels) across the full size spectrum from second generation sequencing is challenging due to the relatively short read length inherent in the technology. Different indel calling methods exist but are limited in detection to specific sizes with varying accuracy and resolution. We present ScanIndel, an integrated framework for detecting indels with multiple heuristics including gapped alignment, split reads and de novo assembly. Using simulation data, we demonstrate ScanIndel's superior sensitivity and specificity relative to several state-of-the-art indel callers across various coverage levels and indel sizes. ScanIndel yields higher predictive accuracy with lower computational cost compared with existing tools for both targeted resequencing data from tumor specimens and high coverage whole-genome sequencing data from the human NIST standard NA12878. Thus, we anticipate ScanIndel will improve indel analysis in both clinical and research settings. ScanIndel is implemented in Python, and is freely available for academic use at https://github.com/cauyrd/ScanIndel.

  12. De novo assembly and next-generation sequencing to analyse full-length gene variants from codon-barcoded libraries.

    Science.gov (United States)

    Cho, Namjin; Hwang, Byungjin; Yoon, Jung-ki; Park, Sangun; Lee, Joongoo; Seo, Han Na; Lee, Jeewon; Huh, Sunghoon; Chung, Jinsoo; Bang, Duhee

    2015-09-21

    Interpreting epistatic interactions is crucial for understanding evolutionary dynamics of complex genetic systems and unveiling structure and function of genetic pathways. Although high resolution mapping of en masse variant libraries renders molecular biologists to address genotype-phenotype relationships, long-read sequencing technology remains indispensable to assess functional relationship between mutations that lie far apart. Here, we introduce JigsawSeq for multiplexed sequence identification of pooled gene variant libraries by combining a codon-based molecular barcoding strategy and de novo assembly of short-read data. We first validate JigsawSeq on small sub-pools and observed high precision and recall at various experimental settings. With extensive simulations, we then apply JigsawSeq to large-scale gene variant libraries to show that our method can be reliably scaled using next-generation sequencing. JigsawSeq may serve as a rapid screening tool for functional genomics and offer the opportunity to explore evolutionary trajectories of protein variants.
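
    The precision and recall mentioned above are the usual set-based measures: precision is the share of reported variants that are truly present, and recall is the share of truly present variants that were reported. The sketch below only illustrates those two definitions; it is not the JigsawSeq evaluation code, and the function name and toy variant labels are hypothetical.

```python
def precision_recall(called, truth):
    """Return (precision, recall) for a set of called items.

    called: variants reported by the method.
    truth:  variants known to be present in the library.
    An empty denominator yields 0.0 rather than raising an error.
    """
    called, truth = set(called), set(truth)
    true_positives = len(called & truth)
    precision = true_positives / len(called) if called else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall


# Hypothetical toy variant labels (illustration only, not study data).
called = {"v1", "v2", "v3", "v4"}
truth = {"v1", "v2", "v5"}
precision, recall = precision_recall(called, truth)
print(f"precision={precision:.2f} recall={recall:.2f}")
```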

  13. SNP detection from de novo transcriptome sequencing in the bivalve Macoma balthica: marker development for evolutionary studies.

    Directory of Open Access Journals (Sweden)

    Eric Pante

    Full Text Available Hybrid zones are noteworthy systems for the study of environmental adaptation to fast-changing environments, as they constitute reservoirs of polymorphism and are key to the maintenance of biodiversity. They can move in relation to climate fluctuations, as temperature can affect both selection and migration, or remain trapped by environmental and physical barriers. There is therefore a very strong incentive to study the dynamics of hybrid zones subjected to climate variations. The infaunal bivalve Macoma balthica emerges as a noteworthy model species, as divergent lineages hybridize, and its native NE Atlantic range is currently contracting to the North. To investigate the dynamics and functioning of hybrid zones in M. balthica, we developed new molecular markers by sequencing the collective transcriptome of 30 individuals. Ten individuals were pooled for each of the three populations sampled at the margins of two hybrid zones. A single 454 run generated 277 Mb from which 17K SNPs were detected. SNP density averaged 1 polymorphic site every 14 to 19 bases, for mitochondrial and nuclear loci, respectively. An [Formula: see text] scan detected high genetic divergence among several hundred SNPs, some of them involved in energetic metabolism, cellular respiration and physiological stress. The high population differentiation, recorded for nuclear-encoded ATP synthase and NADH dehydrogenase as well as most mitochondrial loci, suggests cytonuclear genetic incompatibilities. Results from this study will help pave the way to a high-resolution study of hybrid zone dynamics in M. balthica, and the relative importance of endogenous and exogenous barriers to gene flow in this system.

  14. Transcriptome sequencing and de novo analysis of cytoplasmic male sterility and maintenance in JA-CMS cotton.

    Science.gov (United States)

    Yang, Peng; Han, Jinfeng; Huang, Jinling

    2014-01-01

    Cytoplasmic male sterility (CMS) is the failure to produce functional pollen, which is inherited maternally. And it is known that anther development is modulated through complicated interactions between nuclear and mitochondrial genes in sporophytic and gametophytic tissues. However, an unbiased transcriptome sequencing analysis of CMS in cotton is currently lacking in the literature. This study compared differentially expressed (DE) genes of floral buds at the sporogenous cells stage (SS) and microsporocyte stage (MS) (the two most important stages for pollen abortion in JA-CMS) between JA-CMS and its fertile maintainer line JB cotton plants, using the Illumina HiSeq 2000 sequencing platform. A total of 709 (1.8%) DE genes including 293 up-regulated and 416 down-regulated genes were identified in JA-CMS line comparing with its maintainer line at the SS stage, and 644 (1.6%) DE genes with 263 up-regulated and 381 down-regulated genes were detected at the MS stage. By comparing the two stages in the same material, there were 8 up-regulated and 9 down-regulated DE genes in JA-CMS line and 29 up-regulated and 9 down-regulated DE genes in JB maintainer line at the MS stage. Quantitative RT-PCR was used to validate 7 randomly selected DE genes. Bioinformatics analysis revealed that genes involved in reduction-oxidation reactions and alpha-linolenic acid metabolism were down-regulated, while genes pertaining to photosynthesis and flavonoid biosynthesis were up-regulated in JA-CMS floral buds compared with their JB counterparts at the SS and/or MS stages. All these four biological processes play important roles in reactive oxygen species (ROS) homeostasis, which may be an important factor contributing to the sterile trait of JA-CMS. Further experiments are warranted to elucidate molecular mechanisms of these genes that lead to CMS.

  15. De novo characterization of the spleen transcriptome of the large yellow croaker (Pseudosciaena crocea) and analysis of the immune relevant genes and pathways involved in the antiviral response

    KAUST Repository

    Mu, Yinnan

    2014-05-12

    The large yellow croaker (Pseudosciaena crocea) is an economically important marine fish in China. To understand the molecular basis for antiviral defense in this species, we used Illumina paired-end sequencing to characterize the spleen transcriptome of polyriboinosinic:polyribocytidylic acid [poly(I:C)]-induced large yellow croakers. The library produced 56,355,728 reads and assembled into 108,237 contigs. As a result, 15,192 unigenes were found from this transcriptome. Gene ontology analysis showed that 4,759 genes were involved in three major functional categories: biological process, cellular component, and molecular function. We further ascertained that numerous consensus sequences were homologous to known immune-relevant genes. Kyoto Encyclopedia of Genes and Genomes orthology mapping annotated 5,389 unigenes and identified numerous immune-relevant pathways. These immune-relevant genes and pathways revealed major antiviral immunity effectors, including but not limited to: pattern recognition receptors, adaptors and signal transducers, the interferons and interferon-stimulated genes, inflammatory cytokines and receptors, complement components, and B-cell and T-cell antigen activation molecules. Moreover, the partial genes of Toll-like receptor signaling pathway, RIG-I-like receptors signaling pathway, Janus kinase-Signal Transducer and Activator of Transcription (JAK-STAT) signaling pathway, and T-cell receptor (TCR) signaling pathway were found to be changed after poly(I:C) induction by real-time polymerase chain reaction (PCR) analysis, suggesting that these signaling pathways may be regulated by poly(I:C), a viral mimic. Overall, the antivirus-related genes and signaling pathways that were identified in response to poly(I:C) challenge provide valuable leads for further investigation of the antiviral defense mechanism in the large yellow croaker. © 2014 Mu et al.

  16. De novo characterization of the spleen transcriptome of the large yellow croaker (Pseudosciaena crocea) and analysis of the immune relevant genes and pathways involved in the antiviral response.

    Directory of Open Access Journals (Sweden)

    Yinnan Mu

    Full Text Available The large yellow croaker (Pseudosciaena crocea) is an economically important marine fish in China. To understand the molecular basis for antiviral defense in this species, we used Illumina paired-end sequencing to characterize the spleen transcriptome of polyriboinosinic:polyribocytidylic acid [poly(I:C)]-induced large yellow croakers. The library produced 56,355,728 reads and assembled into 108,237 contigs. As a result, 15,192 unigenes were found from this transcriptome. Gene ontology analysis showed that 4,759 genes were involved in three major functional categories: biological process, cellular component, and molecular function. We further ascertained that numerous consensus sequences were homologous to known immune-relevant genes. Kyoto Encyclopedia of Genes and Genomes orthology mapping annotated 5,389 unigenes and identified numerous immune-relevant pathways. These immune-relevant genes and pathways revealed major antiviral immunity effectors, including but not limited to: pattern recognition receptors, adaptors and signal transducers, the interferons and interferon-stimulated genes, inflammatory cytokines and receptors, complement components, and B-cell and T-cell antigen activation molecules. Moreover, the partial genes of Toll-like receptor signaling pathway, RIG-I-like receptors signaling pathway, Janus kinase-Signal Transducer and Activator of Transcription (JAK-STAT) signaling pathway, and T-cell receptor (TCR) signaling pathway were found to be changed after poly(I:C) induction by real-time polymerase chain reaction (PCR) analysis, suggesting that these signaling pathways may be regulated by poly(I:C), a viral mimic. Overall, the antivirus-related genes and signaling pathways that were identified in response to poly(I:C) challenge provide valuable leads for further investigation of the antiviral defense mechanism in the large yellow croaker.

  17. De novo characterization of the spleen transcriptome of the large yellow croaker (Pseudosciaena crocea) and analysis of the immune relevant genes and pathways involved in the antiviral response

    KAUST Repository

    Mu, Yinnan; Li, Mingyu; Ding, Feng; Ding, Yang; Ao, Jingqun; Hu, Songnian; Chen, Xinhua

    2014-01-01

    The large yellow croaker (Pseudosciaena crocea) is an economically important marine fish in China. To understand the molecular basis for antiviral defense in this species, we used Illumina paired-end sequencing to characterize the spleen transcriptome of polyriboinosinic:polyribocytidylic acid [poly(I:C)]-induced large yellow croakers. The library produced 56,355,728 reads and assembled into 108,237 contigs. As a result, 15,192 unigenes were found from this transcriptome. Gene ontology analysis showed that 4,759 genes were involved in three major functional categories: biological process, cellular component, and molecular function. We further ascertained that numerous consensus sequences were homologous to known immune-relevant genes. Kyoto Encyclopedia of Genes and Genomes orthology mapping annotated 5,389 unigenes and identified numerous immune-relevant pathways. These immune-relevant genes and pathways revealed major antiviral immunity effectors, including but not limited to: pattern recognition receptors, adaptors and signal transducers, the interferons and interferon-stimulated genes, inflammatory cytokines and receptors, complement components, and B-cell and T-cell antigen activation molecules. Moreover, the partial genes of Toll-like receptor signaling pathway, RIG-I-like receptors signaling pathway, Janus kinase-Signal Transducer and Activator of Transcription (JAK-STAT) signaling pathway, and T-cell receptor (TCR) signaling pathway were found to be changed after poly(I:C) induction by real-time polymerase chain reaction (PCR) analysis, suggesting that these signaling pathways may be regulated by poly(I:C), a viral mimic. Overall, the antivirus-related genes and signaling pathways that were identified in response to poly(I:C) challenge provide valuable leads for further investigation of the antiviral defense mechanism in the large yellow croaker. © 2014 Mu et al.

  18. De novo analysis of the Adelphocoris suturalis Jakovlev metathoracic scent glands transcriptome and expression patterns of pheromone biosynthesis-related genes.

    Science.gov (United States)

    Luo, Jing; Liu, Xiangyang; Liu, Lang; Zhang, Poyao; Chen, Longjia; Gao, Qiao; Ma, Weihua; Chen, Lizhen; Lei, Chaoliang

    2014-11-10

    Adelphocoris suturalis Jakovlev is a major cotton pest in Southern China. Metathoracic scent glands (MTGs) produce pheromones that play an important role in survival and population propagation of this species, and also show great potential for pest control. Up to the present, there is little information that underlined the molecular basis of the pheromone biosynthesis of this bug. It is essential to clarify genes involved in the production of pheromone components, and also in the regulation of the variation of the blend ratio. We sequenced the transcriptome of metathoracic scent glands (MTGs) of A. suturalis. A total of 52 million 91-bp-long reads were obtained and assembled into 70,296 unigenes with a mean length of 691bp. Of these unigenes, a total of 26,744 (38%) unigenes showed significant similarity to known proteins in the NCBI database (E-valuepheromone biosynthesis were selected, and the gene expression patterns were verified by qRT-PCR. The qRT-PCR results indicated that Asdelta9-DES, AsFAR, AsAOX, Ascarboxylesterase, AsNT-ES and AsATFs have a higher expression level in the period when female A. suturalis release sex pheromones. These data constitute the first transcriptomic analysis exploring the repertoire of genes expressed in insect MTGs. We identified a large number of potential pheromone biosynthetic pathway genes. In this context, our study provides an invaluable resource for future exploration of molecular mechanisms of pheromone biosynthesis in A. suturalis, as well as other hemipteran species. Copyright © 2014 Elsevier B.V. All rights reserved.

  19. Transcriptome assembly and analysis of Tibetan Hulless Barley (Hordeum vulgare L. var. nudum) developing grains, with emphasis on quality properties.

    Directory of Open Access Journals (Sweden)

    Xin Chen

    Full Text Available BACKGROUND: Hulless barley is attracting increasing attention due to its unique nutritional value and potential health benefits. However, the molecular biology of barley grain development and nutrient storage is not well understood. Furthermore, the genetic potential of hulless barley has not been fully tapped for breeding. METHODOLOGY/PRINCIPAL FINDINGS: In the present study, we investigated the transcriptome features during hulless barley grain development. Using Illumina paired-end RNA-Sequencing, we generated two data sets of the developing grain transcriptomes from two hulless barley landraces. A total of 13.1 and 12.9 million paired-end reads with lengths of 90 bp were generated from the two varieties and were assembled into 48,863 and 45,788 unigenes, respectively. A combined dataset of 46,485 All-Unigenes with an average length of 542 bp was generated from the two transcriptomes, and 36,278 of them were annotated with gene descriptions, conserved protein domains or gene ontology terms. Furthermore, sequences and expression levels of genes related to the biosynthesis of storage reserve compounds (starch, protein, and β-glucan) were analyzed, and their temporal and spatial patterns were deduced from the transcriptome data of cultivated barley Morex. CONCLUSIONS/SIGNIFICANCE: We established an integrated sequence and functional annotation database and examined the expression profiles of the developing grains of Tibetan hulless barley. The characterization of genes encoding storage proteins and enzymes of starch synthesis and (1-3;1-4)-β-D-glucan synthesis provided an overview of changes in gene expression associated with grain nutrition and health properties. Furthermore, the characterization of these genes provides a gene reservoir, which helps in quality improvement of hulless barley.

  20. De Novo Transcriptome Sequencing and the Hypothetical Cold Response Mode of Saussurea involucrata in Extreme Cold Environments.

    Science.gov (United States)

    Li, Jin; Liu, Hailiang; Xia, Wenwen; Mu, Jianqiang; Feng, Yujie; Liu, Ruina; Yan, Panyao; Wang, Aiying; Lin, Zhongping; Guo, Yong; Zhu, Jianbo; Chen, Xianfeng

    2017-06-07

    Saussurea involucrata grows in high mountain areas covered by snow throughout the year. The temperature of this habitat can change drastically in one day. To gain a better understanding of the cold response signaling pathways and molecular metabolic reactions involved in cold stress tolerance, genome-wide transcriptional analyses were performed using RNA-Seq technologies. A total of 199,758 transcripts were assembled, producing 138,540 unigenes with 46.8 Gb clean data. Overall, 184,416 (92.32%) transcripts were successfully annotated. The 365 transcription factors identified (292 unigenes) belonged to 49 transcription factor families associated with cold stress responses. A total of 343 transcripts involved in signal transduction (132 upregulated and 212 downregulated in at least one of the conditions) were strongly affected by cold temperature, such as the CBL-interacting serine/threonine-protein kinases (CIPKs), receptor-like protein kinases, and protein kinases. The circadian rhythm pathway was activated by cold adaptation, which was necessary to endure the severe temperature changes within a day. There were 346 differentially expressed genes (DEGs) related to transport, of which 138 were upregulated and 22 were downregulated in at least one of the conditions. Under cold stress conditions, transcriptional regulation, molecular transport, and signal transduction were involved in the adaptation to low temperature in S. involucrata. These findings contribute to our understanding of the adaptation of plants to harsh environments and the survival traits of S. involucrata. In addition, the present study provides insight into the molecular mechanisms of chilling and freezing tolerance.

  1. De novo RNA sequencing transcriptome of Rhododendron obtusum identified the early heat response genes involved in the transcriptional regulation of photosynthesis.

    Directory of Open Access Journals (Sweden)

    Linchuan Fang

    Full Text Available Rhododendron spp. are important ornamental plants that are widely cultivated for landscaping worldwide. Heat stress is a major obstacle for their cultivation in south China. Previous studies on rhododendron principally focused on the physiological and biochemical processes involved in stress tolerance. However, the molecular and genetic properties of rhododendron's response to heat stress are still poorly understood. The phenotype and chlorophyll fluorescence kinetics parameters of four rhododendron cultivars were compared under normal or heat stress conditions, and the cultivar with the highest heat tolerance, "Yanzhimi" (R. obtusum), was selected for transcriptome sequencing. A total of 325,429,240 high quality reads were obtained and assembled into 395,561 transcripts and 92,463 unigenes. Functional annotation showed that 38,724 unigenes had sequence similarity to known genes in at least one of the protein or nucleotide databases used in this study. These 38,724 unigenes were categorized into 51 functional groups based on Gene Ontology classification and were blasted to 24 known clusters of orthologous groups. A total of 973 identified unigenes belonged to 57 transcription factor families, including the stress-related HSF, DREB, ZNF, and NAC genes. Photosynthesis was significantly enriched in the Kyoto Encyclopedia of Genes and Genomes pathway, and the changed expression pattern was illustrated. The key pathways and signaling components that contribute to heat tolerance in rhododendron were revealed. These results provide a potentially valuable resource that can be used for heat-tolerance breeding.

  2. De novo transcriptome analysis of pneumatophores (modified roots) in the true mangrove species Avicennia marina and identification of the genes related to root gas exchange

    Directory of Open Access Journals (Sweden)

    Purushothaman Natarajan

    2017-10-01

    Full Text Available Mangrove plants, which grow in estuaries, naturally tolerate extreme conditions of high salinity (90 ppt) and high light intensity. Avicennia marina is a true mangrove tree species with physiological adaptations such as a modified root system (pneumatophores) and salt excretion glands in the leaves. The pneumatophores are a special type of root with negative geotropism that project above the water surface or the level of flooded soils [1]. In contact with air these roots develop lenticels, which improve gas exchange between roots and environment [2]. In swamps and wetlands the presence of pneumatophores facilitates oxygen diffusion through the tissues, maintaining levels adequate for cellular respiration [3]. The objective of this study was to perform whole transcriptome analysis of pneumatophore tissue of A. marina by Illumina sequencing and to identify putative genes involved in the process of root gas exchange. We generated 19.73 million paired-end reads and assembled them into 86,856 unigenes with an average length of 772 bp. Further, annotation, tissue-specific gene expression and genes related to root gas exchange will be presented.

  3. De novo RNA sequencing transcriptome of Rhododendron obtusum identified the early heat response genes involved in the transcriptional regulation of photosynthesis

    Science.gov (United States)

    Tong, Jun; Dong, Yanfang; Xu, Dongyun; Mao, Jing; Zhou, Yuan

    2017-01-01

    Rhododendron spp. are important ornamental plants that are widely cultivated for landscaping worldwide. Heat stress is a major obstacle for their cultivation in south China. Previous studies on rhododendron principally focused on the physiological and biochemical processes involved in stress tolerance. However, the molecular and genetic properties of rhododendron’s response to heat stress are still poorly understood. The phenotype and chlorophyll fluorescence kinetics parameters of four rhododendron cultivars were compared under normal or heat stress conditions, and the cultivar with the highest heat tolerance, “Yanzhimi” (R. obtusum), was selected for transcriptome sequencing. A total of 325,429,240 high quality reads were obtained and assembled into 395,561 transcripts and 92,463 unigenes. Functional annotation showed that 38,724 unigenes had sequence similarity to known genes in at least one of the protein or nucleotide databases used in this study. These 38,724 unigenes were categorized into 51 functional groups based on Gene Ontology classification and were blasted to 24 known clusters of orthologous groups. A total of 973 identified unigenes belonged to 57 transcription factor families, including the stress-related HSF, DREB, ZNF, and NAC genes. Photosynthesis was significantly enriched in the Kyoto Encyclopedia of Genes and Genomes pathway, and the changed expression pattern was illustrated. The key pathways and signaling components that contribute to heat tolerance in rhododendron were revealed. These results provide a potentially valuable resource that can be used for heat-tolerance breeding. PMID:29059200

  4. Identification of Genes Related to Learning and Memory in the Brain Transcriptome of the Mollusc, "Hermissenda Crassicornis"

    Science.gov (United States)

    Tamvacakis, Arianna N.; Senatore, Adriano; Katz, Paul S.

    2015-01-01

    The sea slug "Hermissenda crassicornis" (Mollusca, Gastropoda, Nudibranchia) has been studied extensively in associative learning paradigms. However, lack of genetic information previously hindered molecular-level investigations. Here, the "Hermissenda" brain transcriptome was sequenced and assembled de novo, producing 165,743…

  5. Transcriptome profiling of Elettaria cardamomum (L.) Maton (small cardamom)

    Directory of Open Access Journals (Sweden)

    F. Nadiya

    2017-03-01

    Full Text Available Elettaria cardamomum (L.) Maton, known as the ‘queen of spices’, is a perennial herbaceous monocot of the family Zingiberaceae, native to southern India. Cardamom is an economically valuable spice crop and is used widely for culinary and medicinal purposes. In the present study, using Ion Proton RNA sequencing technology, we performed transcriptome sequencing and de novo transcriptome assembly of one wild and five cultivar genotypes of cardamom. RNA-seq generated a total of 22,811,983 (92 base) and 24,889,197 (75 base) raw reads, accounting for approximately 8.21 GB and 7.65 GB of sequence data for the wild and cultivar genotypes of cardamom, respectively. The raw data were submitted to the SRA database of NCBI under the accession numbers SRX1141272 (wild) and SRX1141276 (cultivars). The raw reads were quality filtered and assembled using the MIRA assembler, resulting in 112,208 and 264,161 contigs with N50 values of 616 and 664 for wild and cultivar cardamom, respectively. The assembled unigenes were functionally annotated using several databases, including PlantCyc for pathway annotation. This work represents the first report on cardamom transcriptome sequencing. In order to generate a comprehensive reference transcriptome, we further assembled the raw reads of the wild and cultivar genotypes, which may enrich the plant transcriptome database and trigger advanced research in cardamom genomics.
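
    The N50 values quoted in the record above summarize contig length distributions. As a minimal sketch of how such a value is usually defined (the function name and the toy lengths are illustrative assumptions, not taken from the study), N50 is the length of the contig at which the running total of lengths, sorted from longest to shortest, first reaches half of all assembled bases:

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    Contigs are visited from longest to shortest; the N50 is the length
    of the contig at which the running total first reaches half of the
    total assembled bases.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("n50() needs at least one contig length")


# Toy lengths (hypothetical, for illustration only).
toy_contigs = [100, 200, 300, 400, 500]
print(n50(toy_contigs))
```

    Because N50 depends only on the length distribution, it says nothing about whether the contigs are correct, so it is normally read alongside annotation or completeness measures.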

  6. Improving amphibian genomic resources: a multitissue reference transcriptome of an iconic invader.

    Science.gov (United States)

    Richardson, Mark F; Sequeira, Fernando; Selechnik, Daniel; Carneiro, Miguel; Vallinoto, Marcelo; Reid, Jack G; West, Andrea J; Crossland, Michael R; Shine, Richard; Rollins, Lee A

    2018-01-01

    Cane toads (Rhinella marina) are an iconic invasive species introduced to 4 continents and well utilized for studies of rapid evolution in introduced environments. Despite the long introduction history of this species, its profound ecological impacts, and its utility for demonstrating evolutionary principles, genetic information is sparse. Here we produce a de novo transcriptome spanning multiple tissues and life stages to enable investigation of the genetic basis of previously identified rapid phenotypic change over the introduced range. Using approximately 1.9 billion reads from developing tadpoles and 6 adult tissue-specific cDNA libraries, as well as a transcriptome assembly pipeline encompassing 100 separate de novo assemblies, we constructed 62 202 transcripts, of which we functionally annotated ∼50%. Our transcriptome assembly exhibits 90% full-length completeness of the Benchmarking Universal Single-Copy Orthologs data set. Robust assembly metrics and comparisons with several available anuran transcriptomes and genomes indicate that our cane toad assembly is one of the most complete anuran genomic resources available. This comprehensive anuran transcriptome will provide a valuable resource for investigation of genes under selection during invasion in cane toads, but will also greatly expand our general knowledge of anuran genomes, which are underrepresented in the literature. The data set is publicly available in NCBI and GigaDB to serve as a resource for other researchers. © The Authors 2017. Published by Oxford University Press.

  7. Transcriptome Analysis of the Response of Burmese Python to Digestion

    OpenAIRE

    Duan, Jinjie; Sanggaard, Kristian Wejse; Schauser, Leif; Lauridsen, Sanne Enok; Enghild, Jan J.; Schierup, Mikkel Heide; Wang, Tobias

    2017-01-01

    Exceptional and extreme feeding behaviour makes the Burmese python (Python bivittatus) an interesting model to study physiological remodelling and metabolic adaptation in response to refeeding after prolonged starvation. In this study, we used transcriptome sequencing of 5 visceral organs during fasting as well as 24 hours and 48 hours after ingestion of a large meal to unravel the postprandial changes in Burmese pythons. We first used the pooled data to perform a de novo assembly of...

  8. Prediction of the neuropeptidomes of members of the Astacidea (Crustacea, Decapoda) using publicly accessible transcriptome shotgun assembly (TSA) sequence data.

    Science.gov (United States)

    Christie, Andrew E; Chi, Megan

    2015-12-01

    The decapod infraorder Astacidea is comprised of clawed lobsters and freshwater crayfish. Due to their economic importance and their use as models for investigating neurochemical signaling, much work has focused on elucidating their neurochemistry, particularly their peptidergic systems. Interestingly, no astacidean has been the subject of large-scale peptidomic analysis via in silico transcriptome mining, this despite growing transcriptomic resources for members of this taxon. Here, the publicly accessible astacidean transcriptome shotgun assembly data were mined for putative peptide-encoding transcripts; these sequences were used to predict the structures of mature neuropeptides. One hundred seventy-six distinct peptides were predicted for Procambarus clarkii, including isoforms of adipokinetic hormone-corazonin-like peptide (ACP), allatostatin A (AST-A), allatostatin B, allatostatin C (AST-C), bursicon α, bursicon β, CCHamide, crustacean hyperglycemic hormone (CHH)/ion transport peptide (ITP), diuretic hormone 31 (DH31), eclosion hormone (EH), FMRFamide-like peptide, GSEFLamide, intocin, leucokinin, neuroparsin, neuropeptide F, pigment dispersing hormone, pyrokinin, RYamide, short neuropeptide F (sNPF), SIFamide, sulfakinin and tachykinin-related peptide (TRP). Forty-six distinct peptides, including isoforms of AST-A, AST-C, bursicon α, CCHamide, CHH/ITP, DH31, EH, intocin, myosuppressin, neuroparsin, red pigment concentrating hormone, sNPF and TRP, were predicted for Pontastacus leptodactylus, with a bursicon β and a neuroparsin predicted for Cherax quadricarinatus. The identification of ACP is the first from a decapod, while the predictions of CCHamide, EH, GSEFLamide, intocin, neuroparsin and RYamide are firsts for the Astacidea. Collectively, these data greatly expand the catalog of known astacidean neuropeptides and provide a foundation for functional studies of peptidergic signaling in members of this decapod infraorder. Copyright © 2015 Elsevier Inc

  9. Assembled genomic and tissue-specific transcriptomic data resources for two genetically distinct lines of Cowpea (Vigna unguiculata (L.) Walp).

    Science.gov (United States)

    Spriggs, Andrew; Henderson, Steven T; Hand, Melanie L; Johnson, Susan D; Taylor, Jennifer M; Koltunow, Anna

    2018-02-09

    Cowpea (Vigna unguiculata (L.) Walp) is an important legume crop for food security in areas of low-input and smallholder farming throughout Africa and Asia. Genetic improvements are required to increase yield and resilience to biotic and abiotic stress and to enhance cowpea crop performance. An integrated cowpea genomic and gene expression data resource has the potential to greatly accelerate breeding and the delivery of novel genetic traits for cowpea. Extensive genomic resources for cowpea have been absent from the public domain; however, a recent early release reference genome for IT97K-499-35 (Vigna unguiculata v1.0, NSF, UCR, USAID, DOE-JGI, http://phytozome.jgi.doe.gov/) has now been established in a collaboration between the Joint Genome Institute (JGI) and the University of California (UC) Riverside. Here we release supporting genomic and transcriptomic data for IT97K-499-35 and a second transformable cowpea variety, IT86D-1010. The transcriptome resource includes six tissue-specific datasets for each variety, with particular emphasis on reproductive tissues that extend and support the V. unguiculata v1.0 reference. Annotations have been included in our resource to allow direct mapping to the v1.0 cowpea reference. Access to the resource provided here is supported by raw and assembled data downloads.

  10. Transcriptome assembly and expression profiling of molecular responses to cadmium toxicity in hepatopancreas of the freshwater crab Sinopotamon henanense

    Science.gov (United States)

    Sun, Min; Ting Li, Yi; Liu, Yang; Chin Lee, Shao; Wang, Lan

    2016-01-01

    Cadmium (Cd) pollution is a serious global problem, which causes irreversible toxic effects on animals. The freshwater crab Sinopotamon henanense is a useful environmental indicator since it is widely distributed in benthic habitats, where it tends to accumulate Cd and other toxicants. However, its molecular responses to Cd toxicity remain unclear. In this study, we performed transcriptome sequencing and gene expression analyses of its hepatopancreas with and without Cd treatments. A total of 7.78 G clean reads were obtained from the pooled samples, and 68,648 unigenes with an average size of 622 bp were assembled, of which 5,436 were metabolism-associated and 2,728 were stimulus response-associated, the latter including 380 immunity-related unigenes. Expression profile analysis demonstrated that most genes involved in macromolecular metabolism, oxidative phosphorylation, detoxification and anti-oxidant defense were up-regulated by Cd exposure, whereas immunity-related genes were down-regulated, except for the genes involved in phagocytosis, which were up-regulated. The current data indicate that Cd exposure alters gene expression in a concentration-dependent manner. Therefore, our results provide the first comprehensive S. henanense transcriptome dataset, which is useful for biological and ecotoxicological studies on this crab and its related species at the molecular level, and some key Cd-responsive genes may provide candidate biomarkers for monitoring aquatic pollution by heavy metals.

  11. mPUMA: a computational approach to microbiota analysis by de novo assembly of operational taxonomic units based on protein-coding barcode sequences.

    Science.gov (United States)

    Links, Matthew G; Chaban, Bonnie; Hemmingsen, Sean M; Muirhead, Kevin; Hill, Janet E

    2013-08-15

    Formation of operational taxonomic units (OTU) is a common approach to data aggregation in microbial ecology studies based on amplification and sequencing of individual gene targets. The de novo assembly of OTU sequences has been recently demonstrated as an alternative to widely used clustering methods, providing robust information from experimental data alone, without any reliance on an external reference database. Here we introduce mPUMA (microbial Profiling Using Metagenomic Assembly, http://mpuma.sourceforge.net), a software package for identification and analysis of protein-coding barcode sequence data. It was developed originally for Cpn60 universal target sequences (also known as GroEL or Hsp60). Using an unattended process that is independent of external reference sequences, mPUMA forms OTUs by DNA sequence assembly and is capable of tracking OTU abundance. mPUMA processes microbial profiles both in terms of the direct DNA sequence as well as in the translated amino acid sequence for protein coding barcodes. By forming OTUs and calculating abundance through an assembly approach, mPUMA is capable of generating inputs for several popular microbiota analysis tools. Using SFF data from sequencing of a synthetic community of Cpn60 sequences derived from the human vaginal microbiome, we demonstrate that mPUMA can faithfully reconstruct all expected OTU sequences and produce compositional profiles consistent with actual community structure. mPUMA enables analysis of microbial communities while empowering the discovery of novel organisms through OTU assembly.

  12. De Novo Transcriptome Analysis of Two Seahorse Species (Hippocampus erectus and H. mohnikei) and the Development of Molecular Markers for Population Genetics.

    Directory of Open Access Journals (Sweden)

    Qiang Lin

    Full Text Available Seahorse conservation has been performed utilizing various strategies for many decades, and a deeper understanding of genomic information is necessary to more efficiently protect the germplasm resources of seahorse species. However, little genetic information about seahorses currently exists in the public databases. In this study, high-throughput RNA sequencing for two seahorse species, Hippocampus erectus and H. mohnikei, was carried out, and de novo assembly generated 37,506 unigenes for H. erectus and 36,113 unigenes for H. mohnikei. Among them, 17,338 (46.23%) unigenes for H. erectus and 17,900 (49.57%) for H. mohnikei were successfully annotated based on the information available from the public databases. Through comparing the unigenes of the two seahorse species, 7,802 candidate orthologous genes were identified and 5,268 genes among them could be annotated. In addition, gene ontology analysis of the two species was similarly performed on biological processes, cellular components, and molecular functions. Twenty-four and twenty-one unigenes in H. erectus and H. mohnikei, respectively, were annotated in the biosynthesis of unsaturated fatty acids pathways, and both seahorses lacked the Δ12 and Δ15 desaturases. A total of 8,992 and 9,116 SSR loci were obtained from H. erectus and H. mohnikei unigenes, respectively. Dozens of SSR markers were developed and then applied to assess the population genetic diversity, as well as cross-amplified in a related species, H. trimaculatus. The HO and HE values of the tested populations for H. erectus, H. mohnikei, and H. trimaculatus were medium. These resources would facilitate the conservation of the species through a better understanding of the genomics and comparative genome analysis within the Hippocampus genus.

  13. De Novo Transcriptome Analysis of Two Seahorse Species (Hippocampus erectus and H. mohnikei) and the Development of Molecular Markers for Population Genetics.

    Science.gov (United States)

    Lin, Qiang; Luo, Wei; Wan, Shiming; Gao, Zexia

    2016-01-01

    Seahorse conservation has been performed utilizing various strategies for many decades, and a deeper understanding of genomic information is necessary to more efficiently protect the germplasm resources of seahorse species. However, little genetic information about seahorses currently exists in the public databases. In this study, high-throughput RNA sequencing for two seahorse species, Hippocampus erectus and H. mohnikei, was carried out, and de novo assembly generated 37,506 unigenes for H. erectus and 36,113 unigenes for H. mohnikei. Among them, 17,338 (46.23%) unigenes for H. erectus and 17,900 (49.57%) for H. mohnikei were successfully annotated based on the information available from the public databases. Through comparing the unigenes of the two seahorse species, 7,802 candidate orthologous genes were identified and 5,268 genes among them could be annotated. In addition, gene ontology analysis of the two species was similarly performed on biological processes, cellular components, and molecular functions. Twenty-four and twenty-one unigenes in H. erectus and H. mohnikei, respectively, were annotated in the biosynthesis of unsaturated fatty acids pathways, and both seahorses lacked the Δ12 and Δ15 desaturases. A total of 8,992 and 9,116 SSR loci were obtained from H. erectus and H. mohnikei unigenes, respectively. Dozens of SSR markers were developed and then applied to assess the population genetic diversity, as well as cross-amplified in a related species, H. trimaculatus. The HO and HE values of the tested populations for H. erectus, H. mohnikei, and H. trimaculatus were medium. These resources would facilitate the conservation of the species through a better understanding of the genomics and comparative genome analysis within the Hippocampus genus.
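
    The seahorse records above report counts of SSR (simple sequence repeat) loci found in unigenes but do not say how they were detected. The following is only a rough sketch of one common idea, scanning a sequence for short motifs repeated in tandem with a regular expression; the function names, the motif lengths, and the minimum repeat counts are all assumptions made for illustration and are not the settings used in the study:

```python
import re


def is_primitive(unit):
    """True if the motif is not itself a repeat of a shorter motif (e.g. ACAC)."""
    for size in range(1, len(unit)):
        if len(unit) % size == 0 and unit[:size] * (len(unit) // size) == unit:
            return False
    return True


def find_ssrs(seq, min_repeats=None):
    """Return (start, motif, repeat_count) tuples for tandem repeats in seq.

    min_repeats maps motif length to the minimum number of tandem copies.
    The defaults below are illustrative assumptions only.
    """
    if min_repeats is None:
        min_repeats = {2: 6, 3: 5, 4: 4}
    seq = seq.upper()
    hits = []
    for unit_len, min_rep in min_repeats.items():
        # Group 2 is the motif; the backreference requires further tandem copies.
        pattern = re.compile(r"(([ACGT]{%d})\2{%d,})" % (unit_len, min_rep - 1))
        for match in pattern.finditer(seq):
            unit = match.group(2)
            if is_primitive(unit):
                hits.append((match.start(), unit, len(match.group(1)) // unit_len))
    return sorted(hits)


# Hypothetical toy sequence: an (AC)7 repeat and a (CAG)5 repeat.
toy_unigene = "GGG" + "AC" * 7 + "TTT" + "CAG" * 5 + "T"
print(find_ssrs(toy_unigene))
```

    Start positions are zero-based. Dedicated SSR tools also handle compound and interrupted repeats, which this sketch does not attempt.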

  14. The mitochondrial genomes of Atlas Geckos (Quedenfeldtia): mitogenome assembly from transcriptomes and anchored hybrid enrichment datasets

    OpenAIRE

    Lyra, Mariana L.; Joger, Ulrich; Schulte, Ulrich; Slimani, Tahar; El Mouden, El Hassan; Bouazza, Abdellah; Künzel, Sven; Lemmon, Alan R.; Moriarty Lemmon, Emily; Vences, Miguel

    2017-01-01

    The nearly complete mitogenomes of the two species of North African Atlas geckos, Quedenfeldtia moerens and Q. trachyblepharus, were assembled from anchored hybrid enrichment data and RNAseq data. Congruent assemblies were obtained for four samples included in both datasets. We recovered the 13 protein-coding genes, 22 tRNA genes, and two rRNA genes for both species, including a partial control region. The order of genes agrees with that of other geckos.

  15. Analysis of the transcriptome of Isodon rubescens and key enzymes involved in terpenoid biosynthesis

    Directory of Open Access Journals (Sweden)

    Xiuhong Su

    2016-05-01

    Full Text Available Isodon rubescens is an important medicinal plant in China that has been shown to reduce tumour growth due to the presence of the compound oridonin. In an effort to facilitate molecular research on oridonin biosynthesis, we used next generation massively parallel sequencing technologies and de novo transcriptome assembly to gain a comprehensive overview of the I. rubescens transcriptome. In our study, a total of 50,934,276 clean reads, 101,640 transcripts and 44,626 unigenes were generated through de novo transcriptome assembly. Of these unigenes, 23,987, 10,263, 7359, 18,245, 17,683, 19,485 and 9361 were annotated in the National Center for Biotechnology Information (NCBI) non-redundant protein (Nr), NCBI nucleotide sequences (Nt), Kyoto Encyclopedia of Genes and Genomes (KEGG) Orthology (KO), Swiss-Prot, protein family (Pfam), gene ontology (GO) and eukaryotic ortholog groups (KOG) databases, respectively. Furthermore, the annotated unigenes were functionally classified according to GO, KOG and KEGG. Based on these results, candidate genes encoding enzymes involved in terpenoid backbone biosynthesis were detected. Our data provide the most comprehensive sequence resource available for the study of I. rubescens and demonstrate the effective use of Illumina sequencing and de novo transcriptome assembly on a species lacking genomic information.

  16. Cutoffs and k-mers: implications from a transcriptome study in allopolyploid plants

    Directory of Open Access Journals (Sweden)

    Gruenheit Nicole

    2012-03-01

    Full Text Available Background: Transcriptome analysis is increasingly being used to study the evolutionary origins and ecology of non-model plants. One issue for both transcriptome assembly and differential gene expression analyses is the common occurrence in plants of whole genome duplication (WGD) and hybridisation resulting in allopolyploidy. The divergence of duplicated genes following WGD creates near identical homeologues that can be problematic for de novo assembly and also for reference-based assembly protocols that use short reads (35-100 bp). Results: Here we report a successful strategy for the assembly of two transcriptomes made using 75 bp Illumina reads from Pachycladon fastigiatum and Pachycladon cheesemanii. Both are allopolyploid plant species (2n = 20) that originated in the New Zealand Alps about 0.8 million years ago. In a systematic analysis of 19 different coverage cutoffs and 20 different k-mer sizes we showed that (i) none of the genes could be assembled across all of the parameter space, (ii) assembly of each gene required an optimal set of parameter values, and (iii) these parameter values could be explained in part by different gene expression levels and different degrees of similarity between genes. Conclusions: To obtain optimal transcriptome assemblies for allopolyploid plants, k-mer size and k-mer coverage need to be considered simultaneously across a broad parameter space. This is important for assembling a maximum number of full length ESTs and for avoiding chimeric assemblies of homeologous and paralogous gene copies.
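
    The record above varies k-mer size and coverage cutoff together; 19 cutoffs and 20 k-mer sizes give 380 parameter combinations. A minimal sketch of why the two parameters interact is to count the k-mers in a set of reads and keep only those seen at least a cutoff number of times, then repeat over a grid. The function names, the toy reads, and the grid values below are hypothetical, and real assemblers build graphs from these k-mers rather than just counting them:

```python
from collections import Counter


def solid_kmers(reads, k, min_coverage):
    """Return the set of k-mers that occur at least min_coverage times in reads."""
    counts = Counter(
        read[i:i + k]
        for read in reads
        for i in range(len(read) - k + 1)
    )
    return {kmer for kmer, n in counts.items() if n >= min_coverage}


def sweep(reads, k_values, cutoffs):
    """Map each (k, cutoff) pair to the number of k-mers that survive the cutoff."""
    return {
        (k, cutoff): len(solid_kmers(reads, k, cutoff))
        for k in k_values
        for cutoff in cutoffs
    }


# Hypothetical toy reads; real reads in the study were 75 bp long.
toy_reads = ["ACGTAC", "CGTACG", "GTACGT"]
for (k, cutoff), n_kept in sorted(sweep(toy_reads, [3, 5], [1, 2]).items()):
    print(k, cutoff, n_kept)
```

    Longer k-mers are more specific but are each seen fewer times, so a cutoff that suits one k-mer size can discard lowly expressed transcripts at another, which is in line with the record's conclusion that the two settings be explored jointly.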

  17. Improving transcriptome construction in non-model organisms: integrating manual and automated gene definition in Emiliania huxleyi.

    OpenAIRE

    Feldmesser, Ester; Rosenwasser, Shilo; Vardi, Assaf; Ben-Dor, Shifra

    2014-01-01

    Background The advent of Next Generation Sequencing technologies and corresponding bioinformatics tools allows the definition of transcriptomes in non-model organisms. Non-model organisms are of great ecological and biotechnological significance, and consequently the understanding of their unique metabolic pathways is essential. Several methods that integrate de novo assembly with genome-based assembly have been proposed. Yet, there are many open challenges in defining genes, particularly whe...

  18. Prediction of Scylla olivacea (Crustacea; Brachyura) peptide hormones using publicly accessible transcriptome shotgun assembly (TSA) sequences.

    Science.gov (United States)

    Christie, Andrew E

    2016-05-01

    The aquaculture of crabs from the genus Scylla is of increasing economic importance for many Southeast Asian countries. Expansion of Scylla farming has led to increased efforts to understand the physiology and behavior of these crabs, and as such, there are growing molecular resources for them. Here, publicly accessible Scylla olivacea transcriptomic data were mined for putative peptide-encoding transcripts; the proteins deduced from the identified sequences were then used to predict the structures of mature peptide hormones. Forty-nine pre/preprohormone-encoding transcripts were identified, allowing for the prediction of 187 distinct mature peptides. The identified peptides included isoforms of adipokinetic hormone-corazonin-like peptide, allatostatin A, allatostatin B, allatostatin C, bursicon β, CCHamide, corazonin, crustacean cardioactive peptide, crustacean hyperglycemic hormone/molt-inhibiting hormone, diuretic hormone 31, eclosion hormone, FMRFamide-like peptide, HIGSLYRamide, insulin-like peptide, intocin, leucokinin, myosuppressin, neuroparsin, neuropeptide F, orcokinin, pigment dispersing hormone, pyrokinin, red pigment concentrating hormone, RYamide, short neuropeptide F, SIFamide and tachykinin-related peptide, all well-known neuropeptide families. Surprisingly, the tissue used to generate the transcriptome mined here is reported to be testis. Whether or not the testis samples had neural contamination is unknown. However, if the peptides are truly produced by this reproductive organ, it could have far reaching consequences for the study of crustacean endocrinology, particularly in the area of reproductive control. Regardless, this peptidome is the largest thus far predicted for any brachyuran (true crab) species, and will serve as a foundation for future studies of peptidergic control in members of the commercially important genus Scylla. Copyright © 2016 Elsevier Inc. All rights reserved.

  19. The Carcinogenic Liver Fluke, Clonorchis sinensis: New Assembly, Reannotation and Analysis of the Genome and Characterization of Tissue Transcriptomes

    Science.gov (United States)

    Wang, Xiaoyun; Liu, Hailiang; Chen, Yangyi; Guo, Lei; Luo, Fang; Sun, Jiufeng; Mao, Qiang; Liang, Pei; Xie, Zhizhi; Zhou, Chenhui; Tian, Yanli; Lv, Xiaoli; Huang, Lisi; Zhou, Juanjuan; Hu, Yue; Li, Ran; Zhang, Fan; Lei, Huali; Li, Wenfang; Hu, Xuchu; Liang, Chi; Xu, Jin; Li, Xuerong; Yu, Xinbing

    2013-01-01

    Clonorchis sinensis (C. sinensis), an important food-borne parasite that inhabits the intrahepatic bile duct and causes clonorchiasis, is of interest to both the public health field and the scientific research community. To learn more about the migration, parasitism and pathogenesis of C. sinensis at the molecular level, the present study developed an upgraded genomic assembly and annotation by sequencing paired-end and mate-paired libraries. We also performed transcriptome sequence analyses on multiple C. sinensis tissues (sucker, muscle, ovary and testis). Genes encoding molecules involved in responses to stimuli and muscle-related development were abundantly expressed in the oral sucker. Compared with other species, genes encoding molecules that facilitate the recognition and transport of cholesterol were observed in high copy numbers in the genome and were highly expressed in the oral sucker. Genes encoding transporters for fatty acids, glucose, amino acids and oxygen were also highly expressed, along with other molecules involved in metabolizing these substrates. All genes involved in energy metabolism pathways, including the β-oxidation of fatty acids, the citrate cycle, oxidative phosphorylation, and fumarate reduction, were expressed in the adults. Finally, we also provide valuable insights into the mechanism underlying the process of pathogenesis by characterizing the secretome of C. sinensis. The characterization and elaborate analysis of the upgraded genome and the tissue transcriptomes not only form a detailed and fundamental C. sinensis resource but also provide novel insights into the physiology and pathogenesis of C. sinensis. We anticipate that this work will aid the development of innovative strategies for the prevention and control of clonorchiasis. PMID:23382950

  20. The carcinogenic liver fluke, Clonorchis sinensis: new assembly, reannotation and analysis of the genome and characterization of tissue transcriptomes.

    Directory of Open Access Journals (Sweden)

    Yan Huang

    Full Text Available Clonorchis sinensis (C. sinensis), an important food-borne parasite that inhabits the intrahepatic bile duct and causes clonorchiasis, is of interest to both the public health field and the scientific research community. To learn more about the migration, parasitism and pathogenesis of C. sinensis at the molecular level, the present study developed an upgraded genomic assembly and annotation by sequencing paired-end and mate-paired libraries. We also performed transcriptome sequence analyses on multiple C. sinensis tissues (sucker, muscle, ovary and testis). Genes encoding molecules involved in responses to stimuli and muscle-related development were abundantly expressed in the oral sucker. Compared with other species, genes encoding molecules that facilitate the recognition and transport of cholesterol were observed in high copy numbers in the genome and were highly expressed in the oral sucker. Genes encoding transporters for fatty acids, glucose, amino acids and oxygen were also highly expressed, along with other molecules involved in metabolizing these substrates. All genes involved in energy metabolism pathways, including the β-oxidation of fatty acids, the citrate cycle, oxidative phosphorylation, and fumarate reduction, were expressed in the adults. Finally, we also provide valuable insights into the mechanism underlying the process of pathogenesis by characterizing the secretome of C. sinensis. The characterization and elaborate analysis of the upgraded genome and the tissue transcriptomes not only form a detailed and fundamental C. sinensis resource but also provide novel insights into the physiology and pathogenesis of C. sinensis. We anticipate that this work will aid the development of innovative strategies for the prevention and control of clonorchiasis.

  1. Cost-effective sequencing of full-length cDNA clones powered by a de novo-reference hybrid assembly.

    Science.gov (United States)

    Kuroshu, Reginaldo M; Watanabe, Junichi; Sugano, Sumio; Morishita, Shinichi; Suzuki, Yutaka; Kasahara, Masahiro

    2010-05-07

    Sequencing full-length cDNA clones is important to determine gene structures including alternative splice forms, and provides valuable resources for experimental analyses to reveal the biological functions of coded proteins. However, previous approaches for sequencing cDNA clones were expensive or time-consuming, and therefore, a fast and efficient sequencing approach was demanded. We developed a program, MuSICA 2, that assembles millions of short (36-nucleotide) reads collected from a single flow cell lane of Illumina Genome Analyzer to shotgun-sequence approximately 800 human full-length cDNA clones. MuSICA 2 performs a hybrid assembly in which an external de novo assembler is run first and the result is then improved by reference alignment of shotgun reads. We compared the MuSICA 2 assembly with 200 pooled full-length cDNA clones finished independently by the conventional primer-walking using Sanger sequencers. The exon-intron structure of the coding sequence was correct for more than 95% of the clones with coding sequence annotation when we excluded cDNA clones insufficiently represented in the shotgun library due to PCR failure (42 out of 200 clones excluded), and the nucleotide-level accuracy of coding sequences of those correct clones was over 99.99%. We also applied MuSICA 2 to full-length cDNA clones from Toxoplasma gondii, to confirm that its ability was competent even for non-human species. The entire sequencing and shotgun assembly takes less than 1 week and the consumables cost only approximately US$3 per clone, demonstrating a significant advantage over previous approaches.

  2. A draft de novo genome assembly for the northern bobwhite (Colinus virginianus) reveals evidence for a rapid decline in effective population size beginning in the Late Pleistocene.

    Directory of Open Access Journals (Sweden)

    Yvette A Halley

    Full Text Available Wild populations of northern bobwhites (Colinus virginianus; hereafter bobwhite) have declined across nearly all of their U.S. range, and despite their importance as an experimental wildlife model for ecotoxicology studies, no bobwhite draft genome assembly currently exists. Herein, we present a bobwhite draft de novo genome assembly with annotation, comparative analyses including genome-wide analyses of divergence with the chicken (Gallus gallus) and zebra finch (Taeniopygia guttata) genomes, and coalescent modeling to reconstruct the demographic history of the bobwhite for comparison to other birds currently in decline (i.e., scarlet macaw; Ara macao). More than 90% of the assembled bobwhite genome was captured within 14,000 unique genes and proteins. Bobwhite analyses of divergence with the chicken and zebra finch genomes revealed many extremely conserved gene sequences, and evidence for lineage-specific divergence of noncoding regions. Coalescent models for reconstructing the demographic history of the bobwhite and the scarlet macaw provided evidence for population bottlenecks which were temporally coincident with human colonization of the New World, the late Pleistocene collapse of the megafauna, and the last glacial maximum. Demographic trends predicted for the bobwhite and the scarlet macaw also were concordant with how opposing natural selection strategies (i.e., skewness in the r-/K-selection continuum) would be expected to shape genome diversity and the effective population sizes in these species, which is directly relevant to future conservation efforts.

  3. Detection of an inversion in the Ty-2 region between S. lycopersicum and S. habrochaites by a combination of de novo genome assembly and BAC cloning.

    Science.gov (United States)

    Wolters, Anne-Marie A; Caro, Myluska; Dong, Shufang; Finkers, Richard; Gao, Jianchang; Visser, Richard G F; Wang, Xiaoxuan; Du, Yongchen; Bai, Yuling

    2015-10-01

    A chromosomal inversion associated with the tomato Ty-2 gene for TYLCV resistance is the cause of severe suppression of recombination in a tomato Ty-2 introgression line. Among tomato and its wild relatives inversions are often observed, which result in suppression of recombination. Such inversions hamper the transfer of important traits from a related species to the crop by introgression breeding. Suppression of recombination was reported for the TYLCV resistance gene, Ty-2, which has been introgressed in cultivated tomato (Solanum lycopersicum) from the wild relative S. habrochaites accession B6013. Ty-2 was mapped to a 300-kb region on the long arm of chromosome 11. The suppression of recombination in the Ty-2 region could be caused by chromosomal rearrangements in S. habrochaites compared with S. lycopersicum. With the aim of visualizing the genome structure of the Ty-2 region, we compared the draft de novo assembly of S. habrochaites accession LYC4 with the sequence of cultivated tomato ('Heinz'). Furthermore, using populations derived from intraspecific crosses of S. habrochaites accessions, the order of markers in the Ty-2 region was studied. Results showed the presence of an inversion of approximately 200 kb in the Ty-2 region when comparing S. lycopersicum and S. habrochaites. By sequencing a BAC clone from the Ty-2 introgression line, one inversion breakpoint was identified. Finally, the obtained results are discussed with respect to introgression breeding and the importance of a priori de novo sequencing of the species involved.

  4. Detection of a Usp-like gene in Calotropis procera plant from the de novo assembled genome contigs of the high-throughput sequencing dataset

    KAUST Repository

    Shokry, Ahmed M.

    2014-02-01

    The wild plant species Calotropis procera (C. procera) has many potential applications and beneficial uses in medicine, industry and ornamental field. It also represents an excellent source of genes for drought and salt tolerance. Genes encoding proteins that contain the conserved universal stress protein (USP) domain are known to provide organisms like bacteria, archaea, fungi, protozoa and plants with the ability to respond to a plethora of environmental stresses. However, information on the possible occurrence of Usp in C. procera is not available. In this study, we uncovered and characterized a one-class A Usp-like (UspA-like, NCBI accession No. KC954274) gene in this medicinal plant from the de novo assembled genome contigs of the high-throughput sequencing dataset. A number of GenBank accessions for Usp sequences were blasted with the recovered de novo assembled contigs. Homology modelling of the deduced amino acids (NCBI accession No. AGT02387) was further carried out using Swiss-Model, accessible via the EXPASY. Superimposition of C. procera USPA-like full sequence model on Thermus thermophilus USP UniProt protein (PDB accession No. Q5SJV7) was constructed using RasMol and Deep-View programs. The functional domains of the novel USPA-like amino acids sequence were identified from the NCBI conserved domain database (CDD) that provide insights into sequence structure/function relationships, as well as domain models imported from a number of external source databases (Pfam, SMART, COG, PRK, TIGRFAM). © 2014 Académie des sciences.

  5. De Novo Assembly of Complete Chloroplast Genomes from Non-model Species Based on a K-mer Frequency-Based Selection of Chloroplast Reads from Total DNA Sequences

    Directory of Open Access Journals (Sweden)

    Shairul Izan

    2017-08-01

    Full Text Available Whole Genome Shotgun (WGS) sequences of plant species often contain an abundance of reads that are derived from the chloroplast genome. Up to now these reads have generally been identified and assembled into chloroplast genomes based on homology to chloroplasts from related species. This re-sequencing approach may select against structural differences between the genomes especially in non-model species for which no close relatives have been sequenced before. The alternative approach is to de novo assemble the chloroplast genome from total genomic DNA sequences. In this study, we used k-mer frequency tables to identify and extract the chloroplast reads from the WGS reads and assemble these using a highly integrated and automated custom pipeline. Our strategy includes steps aimed at optimizing assemblies and filling gaps which are left due to coverage variation in the WGS dataset. We have successfully de novo assembled three complete chloroplast genomes from plant species with a range of nuclear genome sizes to demonstrate the universality of our approach: Solanum lycopersicum (0.9 Gb), Aegilops tauschii (4 Gb) and Paphiopedilum henryanum (25 Gb). We also highlight the need to optimize the choice of k and the amount of data used. This new and cost-effective method for de novo short read assembly will facilitate the study of complete chloroplast genomes with more accurate analyses and inferences, especially in non-model plant genomes.
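
    The abstract above does not spell out how the k-mer frequency tables are used, so the following is only a minimal sketch of the general idea, under stated assumptions: organelle reads are assumed to be far more abundant than nuclear reads, and a read is kept when the median count of its k-mers reaches a cutoff. The function names, the median rule and the `min_median_count` parameter are hypothetical and are not taken from the authors' pipeline; reverse complements are ignored for brevity.

```python
from collections import Counter
from statistics import median


def kmers(seq, k):
    """Yield every k-mer of seq, left to right."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def count_kmers(reads, k):
    """Build a k-mer frequency table over all reads."""
    table = Counter()
    for read in reads:
        table.update(kmers(read, k))
    return table


def select_high_depth_reads(reads, k, min_median_count):
    """Keep reads whose median k-mer count reaches min_median_count.

    Hypothetical selection rule: k-mers from a high-copy organelle genome
    are assumed to occur much more often than nuclear k-mers.
    """
    table = count_kmers(reads, k)
    selected = []
    for read in reads:
        counts = [table[km] for km in kmers(read, k)]
        if counts and median(counts) >= min_median_count:
            selected.append(read)
    return selected


if __name__ == "__main__":
    # Toy data: one sequence present five times plus two single-copy reads.
    reads = ["ACGTACGGTC"] * 5 + ["TTGACCATGA", "GGCATTCAGT"]
    kept = select_high_depth_reads(reads, k=4, min_median_count=3)
    print(len(kept), "of", len(reads), "reads kept")
```

    In practice both k and the cutoff would need tuning, which matches the abstract's remark about optimizing the choice of k and the amount of data used.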

  6. IDBA-UD: a de novo assembler for single-cell and metagenomic sequencing data with highly uneven depth.

    Science.gov (United States)

    Peng, Yu; Leung, Henry C M; Yiu, S M; Chin, Francis Y L

    2012-06-01

    Next-generation sequencing allows us to sequence reads from a microbial environment using single-cell sequencing or metagenomic sequencing technologies. However, both technologies suffer from the problem that sequencing depth of different regions of a genome or genomes from different species are highly uneven. Most existing genome assemblers usually have an assumption that sequencing depths are even. These assemblers fail to construct correct long contigs. We introduce the IDBA-UD algorithm that is based on the de Bruijn graph approach for assembling reads from single-cell sequencing or metagenomic sequencing technologies with uneven sequencing depths. Several non-trivial techniques have been employed to tackle the problems. Instead of using a simple threshold, we use multiple depth-relative thresholds to remove erroneous k-mers in both low-depth and high-depth regions. The technique of local assembly with paired-end information is used to solve the branch problem of low-depth short repeat regions. To speed up the process, an error correction step is conducted to correct reads of high-depth regions that can be aligned to high-confidence contigs. Comparison of the performances of IDBA-UD and existing assemblers (Velvet, Velvet-SC, SOAPdenovo and Meta-IDBA) for different datasets shows that IDBA-UD can reconstruct longer contigs with higher accuracy. The IDBA-UD toolkit is available at our website http://www.cs.hku.hk/~alse/idba_ud
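
    To make the contrast between a single global cutoff and a depth-relative cutoff concrete, here is a small sketch. It is not the IDBA-UD implementation: the rule used (drop a graph node when its depth is below `ratio` times the mean depth of its neighbors) and the names `filter_by_relative_depth`, `filter_by_global_depth`, `depth`, `adjacency` and `ratio` are assumptions chosen for illustration only.

```python
def filter_by_global_depth(depth, cutoff):
    """Keep nodes whose depth is at least a single global cutoff."""
    return {name: d for name, d in depth.items() if d >= cutoff}


def filter_by_relative_depth(depth, adjacency, ratio):
    """Keep nodes unless their depth is below ratio * mean neighbor depth.

    depth maps node name -> sequencing depth; adjacency maps node name ->
    list of neighboring node names. Nodes without neighbors are kept.
    """
    kept = {}
    for name, d in depth.items():
        neighbor_depths = [depth[n] for n in adjacency.get(name, ())]
        if neighbor_depths:
            local = sum(neighbor_depths) / len(neighbor_depths)
            if d < ratio * local:
                continue  # far shallower than its surroundings: likely an error
        kept[name] = d
    return kept


if __name__ == "__main__":
    # Toy graph: a high-depth genome (a1, a2) with an erroneous branch (err)
    # and a genuinely low-depth genome (b1, b2).
    depth = {"a1": 100, "a2": 95, "err": 6, "b1": 5, "b2": 6}
    adjacency = {
        "a1": ["a2", "err"],
        "a2": ["a1", "err"],
        "err": ["a1", "a2"],
        "b1": ["b2"],
        "b2": ["b1"],
    }
    print(sorted(filter_by_global_depth(depth, cutoff=7)))
    print(sorted(filter_by_relative_depth(depth, adjacency, ratio=0.1)))
```

    In this toy graph, any global cutoff high enough to remove the erroneous node also discards the low-depth genome, whereas the relative rule removes only the node that is shallow compared with its own neighborhood.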

  7. De Novo Assembly and Phasing of Dikaryotic Genomes from Two Isolates of Puccinia coronata f. sp. avenae, the Causal Agent of Oat Crown Rust

    Directory of Open Access Journals (Sweden)

    Marisa E. Miller

    2018-02-01

    Full Text Available Oat crown rust, caused by the fungus Puccinia coronata f. sp. avenae, is a devastating disease that impacts worldwide oat production. For much of its life cycle, P. coronata f. sp. avenae is dikaryotic, with two separate haploid nuclei that may vary in virulence genotype, highlighting the importance of understanding haplotype diversity in this species. We generated highly contiguous de novo genome assemblies of two P. coronata f. sp. avenae isolates, 12SD80 and 12NC29, from long-read sequences. In total, we assembled 603 primary contigs for 12SD80, for a total assembly length of 99.16 Mbp, and 777 primary contigs for 12NC29, for a total length of 105.25 Mbp; approximately 52% of each genome was assembled into alternate haplotypes. This revealed structural variation between haplotypes in each isolate equivalent to more than 2% of the genome size, in addition to about 260,000 and 380,000 heterozygous single-nucleotide polymorphisms in 12SD80 and 12NC29, respectively. Transcript-based annotation identified 26,796 and 28,801 coding sequences for isolates 12SD80 and 12NC29, respectively, including about 7,000 allele pairs in haplotype-phased regions. Furthermore, expression profiling revealed clusters of coexpressed secreted effector candidates, and the majority of orthologous effectors between isolates showed conservation of expression patterns. However, a small subset of orthologs showed divergence in expression, which may contribute to differences in virulence between 12SD80 and 12NC29. This study provides the first haplotype-phased reference genome for a dikaryotic rust fungus as a foundation for future studies into virulence mechanisms in P. coronata f. sp. avenae.

  8. A stochastic de novo assembly algorithm for viral-sized genomes obtains correct genomes and builds consensus

    NARCIS (Netherlands)

    Bucur, Doina

    2017-01-01

    A genetic algorithm with stochastic macro mutation operators which merge, split, move, reverse and align DNA contigs on a scaffold is shown to accurately and consistently assemble raw DNA reads from an accurately sequenced single-read library into a contiguous genome. A candidate solution is a

  9. Transcriptome differences between enrofloxacin-resistant and enrofloxacin-susceptible strains of Aeromonas hydrophila

    OpenAIRE

    Zhu, Fengjiao; Yang, Zongying; Zhang, Yiliu; Hu, Kun; Fang, Wenhong

    2017-01-01

    Enrofloxacin is the most commonly used antibiotic to control diseases in aquatic animals caused by A. hydrophila. This study conducted de novo transcriptome sequencing and compared the global transcriptomes of enrofloxacin-resistant and enrofloxacin-susceptible strains. A total of 4,714 unigenes were assembled. Of these, 4,122 were annotated. A total of 3,280 unigenes were assigned to GO, 3,388 unigenes were classified into Cluster of Orthologous Groups of proteins (COG) using BLAST an...

  10. Developmental Transcriptome for a Facultatively Eusocial Bee, Megalopta genalis.

    Science.gov (United States)

    Jones, Beryl M; Wcislo, William T; Robinson, Gene E

    2015-08-14

    Transcriptomes provide excellent foundational resources for mechanistic and evolutionary analyses of complex traits. We present a developmental transcriptome for the facultatively eusocial bee Megalopta genalis, which represents a potential transition point in the evolution of eusociality. A de novo transcriptome assembly of Megalopta genalis was generated using paired-end Illumina sequencing and the Trinity assembler. Males and females of all life stages were aligned to this transcriptome for analysis of gene expression profiles throughout development. Gene Ontology analysis indicates that stage-specific genes are involved in ion transport, cell-cell signaling, and metabolism. A number of distinct biological processes are upregulated in each life stage, and transitions between life stages involve shifts in dominant functional processes, including shifts from transcriptional regulation in embryos to metabolism in larvae, and increased lipid metabolism in adults. We expect that this transcriptome will provide a useful resource for future analyses to better understand the molecular basis of the evolution of eusociality and, more generally, phenotypic plasticity. Copyright © 2015 Jones et al.

  11. De novo assembly of the zucchini genome reveals a whole-genome duplication associated with the origin of the Cucurbita genus.

    Science.gov (United States)

    Montero-Pau, Javier; Blanca, José; Bombarely, Aureliano; Ziarsolo, Peio; Esteras, Cristina; Martí-Gómez, Carlos; Ferriol, María; Gómez, Pedro; Jamilena, Manuel; Mueller, Lukas; Picó, Belén; Cañizares, Joaquín

    2017-11-07

    The Cucurbita genus (squashes, pumpkins and gourds) includes important domesticated species such as C. pepo, C. maxima and C. moschata. In this study, we present a high-quality draft of the zucchini (C. pepo) genome. The assembly has a size of 263 Mb, a scaffold N50 of 1.8 Mb and 34 240 gene models. It includes 92% of the conserved BUSCO core gene set, and it is estimated to cover 93.0% of the genome. The genome is organized in 20 pseudomolecules that represent 81.4% of the assembly, and it is integrated with a genetic map of 7718 SNPs. Despite the small genome size, three independent lines of evidence support that the C. pepo genome is the result of a whole-genome duplication: the topology of the gene family phylogenies, the karyotype organization and the distribution of 4DTv distances. Additionally, 40 transcriptomes of 12 species of the genus were assembled and analysed together with all the other published genomes of the Cucurbitaceae family. The duplication was detected in all the Cucurbita species analysed, including C. maxima and C. moschata, but not in the more distant cucurbits belonging to the Cucumis and Citrullus genera, and it is likely to have occurred 30 ± 4 Mya in the ancestral species that gave rise to the genus. © 2017 The Authors. Plant Biotechnology Journal published by Society for Experimental Biology and The Association of Applied Biologists and John Wiley & Sons Ltd.

  12. Transcriptome sequencing of lentil based on second-generation technology permits large-scale unigene assembly and SSR marker discovery

    Directory of Open Access Journals (Sweden)

    Materne Michael

    2011-05-01

    Full Text Available Abstract Background Lentil (Lens culinaris Medik.) is a cool-season grain legume which provides a rich source of protein for human consumption. In terms of genomic resources, lentil is relatively underdeveloped, in comparison to other Fabaceae species, with limited available data. There is hence a significant need to enhance such resources in order to identify novel genes and alleles for molecular breeding to increase crop productivity and quality. Results Tissue-specific cDNA samples from six distinct lentil genotypes were sequenced using Roche 454 GS-FLX Titanium technology, generating c. 1.38 × 10^6 expressed sequence tags (ESTs). De novo assembly generated a total of 15,354 contigs and 68,715 singletons. The complete unigene set was sequence-analysed against genome drafts of the model legume species Medicago truncatula and Arabidopsis thaliana to identify 12,639 and 7,476 unique matches, respectively. When compared to the genome of Glycine max, a total of 20,419 unique hits were observed corresponding to c. 31% of the known gene space. A total of 25,592 lentil unigenes were subsequently annotated from GenBank. Simple sequence repeat (SSR)-containing ESTs were identified from consensus sequences and a total of 2,393 primer pairs were designed. A subset of 192 EST-SSR markers was screened for validation across a panel of 12 cultivated lentil genotypes and one wild relative species. A total of 166 primer pairs obtained successful amplification, of which 47.5% detected genetic polymorphism. Conclusions A substantial collection of ESTs has been developed from sequence analysis of lentil genotypes using second-generation technology, permitting unigene definition across a broad range of functional categories. As well as providing resources for functional genomics studies, the unigene set has permitted significant enhancement of the number of publicly-available molecular genetic markers as tools for improvement of this species.
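
    The abstract does not state which tool or criteria were used to find SSR-containing ESTs. As an illustration of what such a scan involves, the sketch below uses a regular expression with a back-reference to find a short motif repeated in tandem. The function name `find_ssrs` and the default thresholds (motifs of 2 to 6 bases, at least 5 copies) are hypothetical choices, not values from the study; as a simplification, a homopolymer run would be reported here with a two-base motif.

```python
import re


def find_ssrs(seq, min_repeats=5, min_motif=2, max_motif=6):
    """Return (start, motif, copies) for each tandem repeat found in seq.

    The pattern captures a motif lazily (shortest first) and then requires
    at least min_repeats - 1 further adjacent copies of it.
    """
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_motif, max_motif, min_repeats - 1)
    )
    hits = []
    for match in pattern.finditer(seq.upper()):
        motif = match.group(1)
        copies = len(match.group(0)) // len(motif)
        hits.append((match.start(), motif, copies))
    return hits


if __name__ == "__main__":
    # Toy sequence with six tandem copies of "AG" starting at offset 5.
    sequence = "TTGAC" + "AG" * 6 + "CCT"
    for start, motif, copies in find_ssrs(sequence):
        print(start, motif, copies)
```

    Primer design around each hit, as described in the abstract, would be a separate step and is not attempted here.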

  13. Developmental Transcriptome Analysis and Identification of Genes Involved in Larval Metamorphosis of the Razor Clam, Sinonovacula constricta.

    Science.gov (United States)

    Niu, Donghong; Wang, Fei; Xie, Shumei; Sun, Fanyue; Wang, Ze; Peng, Maoxiao; Li, Jiale

    2016-04-01

    The razor clam Sinonovacula constricta is an important commercial species. The deficiency of developmental transcriptomic data is becoming the bottleneck of further researches on the mechanisms underlying settlement and metamorphosis in early development. In this study, de novo transcriptome sequencing was performed for S. constricta at different early developmental stages by using Illumina HiSeq 2000 paired-end (PE) sequencing technology. A total of 112,209,077 PE clean reads were generated. De novo assembly generated 249,795 contigs with an average length of 585 bp. Gene annotation resulted in the identification of 22,870 unigene hits against the NCBI database. Eight unique sequences related to metamorphosis were identified and analyzed using real-time PCR. The razor clam reference transcriptome would provide useful information on early developmental and metamorphosis mechanisms and could be used in the genetic breeding of shellfish.

  14. De Novo Assembly and Phasing of Dikaryotic Genomes from Two Isolates of Puccinia coronata f. sp. avenae, the Causal Agent of Oat Crown Rust.

    Science.gov (United States)

    Miller, Marisa E; Zhang, Ying; Omidvar, Vahid; Sperschneider, Jana; Schwessinger, Benjamin; Raley, Castle; Palmer, Jonathan M; Garnica, Diana; Upadhyaya, Narayana; Rathjen, John; Taylor, Jennifer M; Park, Robert F; Dodds, Peter N; Hirsch, Cory D; Kianian, Shahryar F; Figueroa, Melania

    2018-02-20

    Oat crown rust, caused by the fungus Puccinia coronata f. sp. avenae, is a devastating disease that impacts worldwide oat production. For much of its life cycle, P. coronata f. sp. avenae is dikaryotic, with two separate haploid nuclei that may vary in virulence genotype, highlighting the importance of understanding haplotype diversity in this species. We generated highly contiguous de novo genome assemblies of two P. coronata f. sp. avenae isolates, 12SD80 and 12NC29, from long-read sequences. In total, we assembled 603 primary contigs for 12SD80, for a total assembly length of 99.16 Mbp, and 777 primary contigs for 12NC29, for a total length of 105.25 Mbp; approximately 52% of each genome was assembled into alternate haplotypes. This revealed structural variation between haplotypes in each isolate equivalent to more than 2% of the genome size, in addition to about 260,000 and 380,000 heterozygous single-nucleotide polymorphisms in 12SD80 and 12NC29, respectively. Transcript-based annotation identified 26,796 and 28,801 coding sequences for isolates 12SD80 and 12NC29, respectively, including about 7,000 allele pairs in haplotype-phased regions. Furthermore, expression profiling revealed clusters of coexpressed secreted effector candidates, and the majority of orthologous effectors between isolates showed conservation of expression patterns. However, a small subset of orthologs showed divergence in expression, which may contribute to differences in virulence between 12SD80 and 12NC29. This study provides the first haplotype-phased reference genome for a dikaryotic rust fungus as a foundation for future studies into virulence mechanisms in P. coronata f. sp. avenae. IMPORTANCE Disease management strategies for oat crown rust are challenged by the rapid evolution of Puccinia coronata f. sp. avenae, which renders resistance genes in oat varieties ineffective. Despite the economic importance of understanding P. coronata f. sp. avenae, resources to study the

  15. A snapshot of a coral "holobiont": a transcriptome assembly of the scleractinian coral, porites, captures a wide variety of genes from both the host and symbiotic zooxanthellae.

    Science.gov (United States)

    Shinzato, Chuya; Inoue, Mayuri; Kusakabe, Makoto

    2014-01-01

    Massive scleractinian corals of the genus Porites are important reef builders in the Indo-Pacific, and they are more resistant to thermal stress than other stony corals, such as the genus Acropora. Because coral health and survival largely depend on the interaction between a coral host and its symbionts, it is important to understand the molecular interactions of an entire "coral holobiont". We simultaneously sequenced transcriptomes of Porites australiensis and its symbionts using the Illumina Hiseq2000 platform. We obtained 14.3 Gbp of sequencing data and assembled it into 74,997 contigs (average: 1,263 bp, N50 size: 2,037 bp). We successfully distinguished contigs originating from the host (Porites) and the symbiont (Symbiodinium) by aligning nucleotide sequences with the decoded Acropora digitifera and Symbiodinium minutum genomes. In contrast to previous coral transcriptome studies, at least 35% of the sequences were found to have originated from the symbionts, indicating that it is possible to analyze both host and symbiont transcriptomes simultaneously. Conserved protein domain and KEGG analyses showed that the dataset contains broad gene repertoires of both Porites and Symbiodinium. Effective utilization of sequence reads revealed that the polymorphism rate in P. australiensis is 1.0% and identified the major symbiotic Symbiodinium as Type C15. Analyses of amino acid biosynthetic pathways suggested that this Porites holobiont is probably able to synthesize most of the common amino acids and that Symbiodinium is potentially able to provide essential amino acids to its host. We believe this to be the first molecular evidence of complementarity in amino acid metabolism between coral hosts and their symbionts. We successfully assembled genes originating from both the host coral and the symbiotic Symbiodinium to create a snapshot of the coral holobiont transcriptome. This dataset will facilitate a deeper understanding of molecular mechanisms of coral symbioses
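
    Several records in this list report assembly statistics such as average contig length and N50. For readers unfamiliar with N50, the sketch below shows the standard definition: the contig length at which the cumulative length of contigs, taken from longest to shortest, first reaches half of the total assembly length. The function names and the toy lengths are illustrative assumptions; no data from the records above are reproduced.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths."""
    if not lengths:
        raise ValueError("at least one contig length is required")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


def mean_length(lengths):
    """Return the arithmetic mean contig length."""
    return sum(lengths) / len(lengths)


if __name__ == "__main__":
    # Toy assembly: nine contigs with lengths 2..10 (total 54).
    contigs = [2, 3, 4, 5, 6, 7, 8, 9, 10]
    print("N50:", n50(contigs))
    print("mean:", mean_length(contigs))
```

    Because N50 is weighted toward long contigs, it usually exceeds the mean length, as in the coral assembly above where the reported N50 (2,037 bp) is larger than the reported average (1,263 bp).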

  16. A snapshot of a coral "holobiont": a transcriptome assembly of the scleractinian coral, porites, captures a wide variety of genes from both the host and symbiotic zooxanthellae.

    Directory of Open Access Journals (Sweden)

    Chuya Shinzato

    Full Text Available Massive scleractinian corals of the genus Porites are important reef builders in the Indo-Pacific, and they are more resistant to thermal stress than other stony corals, such as the genus Acropora. Because coral health and survival largely depend on the interaction between a coral host and its symbionts, it is important to understand the molecular interactions of an entire "coral holobiont". We simultaneously sequenced transcriptomes of Porites australiensis and its symbionts using the Illumina HiSeq 2000 platform. We obtained 14.3 Gbp of sequencing data and assembled it into 74,997 contigs (average: 1,263 bp; N50 size: 2,037 bp). We successfully distinguished contigs originating from the host (Porites) and the symbiont (Symbiodinium) by aligning nucleotide sequences with the decoded Acropora digitifera and Symbiodinium minutum genomes. In contrast to previous coral transcriptome studies, at least 35% of the sequences were found to have originated from the symbionts, indicating that it is possible to analyze both host and symbiont transcriptomes simultaneously. Conserved protein domain and KEGG analyses showed that the dataset contains broad gene repertoires of both Porites and Symbiodinium. Effective utilization of sequence reads revealed that the polymorphism rate in P. australiensis is 1.0% and identified the major symbiotic Symbiodinium as Type C15. Analyses of amino acid biosynthetic pathways suggested that this Porites holobiont is probably able to synthesize most of the common amino acids and that Symbiodinium is potentially able to provide essential amino acids to its host. We believe this to be the first molecular evidence of complementarity in amino acid metabolism between coral hosts and their symbionts. We successfully assembled genes originating from both the host coral and the symbiotic Symbiodinium to create a snapshot of the coral holobiont transcriptome. This dataset will facilitate a deeper understanding of molecular mechanisms of
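
    The N50 statistic quoted in this record (and in several others in this list) can be illustrated with a short sketch. It assumes the common definition: the length of the contig at which the running total, taken from longest to shortest, first reaches half of the assembly size. The function name and the toy lengths are hypothetical and are not taken from the Porites data.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths (in bp).

    Assumed definition: sort contigs from longest to shortest and return
    the length at which the cumulative sum first reaches half the total.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("no contig lengths supplied")


# Hypothetical toy lengths, for illustration only.
toy_contigs = [100, 200, 300, 400, 500]
print(n50(toy_contigs))
```

    Because N50 weights long contigs more heavily than the mean does, an assembly can plausibly report an N50 (2,037 bp here) well above its average contig length (1,263 bp).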

  17. De novo comparative transcriptome analysis of genes involved in fruit morphology of pumpkin cultivars with extreme size difference and development of EST-SSR markers.

    Science.gov (United States)

    Xanthopoulou, Aliki; Ganopoulos, Ioannis; Psomopoulos, Fotis; Manioudaki, Maria; Moysiadis, Theodoros; Kapazoglou, Aliki; Osathanunkul, Maslin; Michailidou, Sofia; Kalivas, Apostolos; Tsaftaris, Athanasios; Nianiou-Obeidat, Irini; Madesis, Panagiotis

    2017-07-30

    The genetic basis of fruit size and shape was investigated for the first time in Cucurbita species and genetic loci associated with fruit morphology have been identified. Although extensive genomic resources are available at present for tomato (Solanum lycopersicum), cucumber (Cucumis sativus), melon (Cucumis melo) and watermelon (Citrullus lanatus), genomic databases for Cucurbita species are limited. Recently, our group reported the generation of pumpkin (Cucurbita pepo) transcriptome databases from two contrasting cultivars with extreme fruit sizes. In the current study we used these databases to perform comparative transcriptome analysis in order to identify genes with potential roles in fruit morphology and fruit size. Differential Gene Expression (DGE) analysis between cv. 'Munchkin' (small-fruit) and cv. 'Big Moose' (large-fruit) revealed a variety of candidate genes associated with fruit morphology with significant differences in gene expression between the two cultivars. In addition, we have set the framework for generating EST-SSR markers, which discriminate different C. pepo cultivars and show transferability to related Cucurbitaceae species. The results of the present study will contribute to both further understanding the molecular mechanisms regulating fruit morphology and furthermore identifying the factors that determine fruit size. Moreover, they may lead to the development of molecular marker tools for selecting genotypes with desired morphological traits. Copyright © 2017. Published by Elsevier B.V.

  18. De Novo Assembly and Genome Analyses of the Marine-Derived Scopulariopsis brevicaulis Strain LF580 Unravels Life-Style Traits and Anticancerous Scopularide Biosynthetic Gene Cluster.

    Science.gov (United States)

    Kumar, Abhishek; Henrissat, Bernard; Arvas, Mikko; Syed, Muhammad Fahad; Thieme, Nils; Benz, J Philipp; Sørensen, Jens Laurids; Record, Eric; Pöggeler, Stefanie; Kempken, Frank

    2015-01-01

    The marine-derived Scopulariopsis brevicaulis strain LF580 produces scopularides A and B, which have anticancerous properties. We carried out genome sequencing using three next-generation DNA sequencing methods. De novo hybrid assembly yielded 621 scaffolds with a total size of 32.2 Mb and 16,298 putative gene models. We identified a large non-ribosomal peptide synthetase gene (nrps1) and a supporting pks2 gene in the same biosynthetic gene cluster. This cluster and the genes within the cluster are functionally active as confirmed by RNA-Seq. Characterization of carbohydrate-active enzymes and major facilitator superfamily (MFS)-type transporters led us to postulate that S. brevicaulis originated from a soil fungus, which came into contact with the marine sponge Tethya aurantium. This marine sponge seems to provide shelter to this fungus and a micro-environment suitable for its survival in the ocean. This study also builds the platform for further investigations of the role of life-style and secondary metabolites from S. brevicaulis.

  19. Multi-tissue RNA-seq and transcriptome characterisation of the spiny dogfish shark (Squalus acanthias) provides a molecular tool for biological research and reveals new genes involved in osmoregulation

    DEFF Research Database (Denmark)

    Chana Munoz, Andres; Jendroszek, Agnieszka; Sønnichsen, Malene

    2017-01-01

    The spiny dogfish shark (Squalus acanthias) is one of the most commonly used cartilaginous fishes in biological research, especially in the fields of nitrogen metabolism, ion transporters and osmoregulation. Nonetheless, transcriptomic data for this organism are scarce. In the present study, a multi-tissue RNA-seq experiment and de novo transcriptome assembly were performed in four different spiny dogfish tissues (brain, liver, kidney and ovary), providing an annotated sequence resource. The characterization of the transcriptome greatly increases the scarce sequence information for shark species. Reads...... and provides a new molecular tool to assist biological research in cartilaginous fishes....

  20. De novo transcriptome sequencing of black pepper (Piper nigrum L.) and an analysis of genes involved in phenylpropanoid metabolism in response to Phytophthora capsici.

    Science.gov (United States)

    Hao, Chaoyun; Xia, Zhiqiang; Fan, Rui; Tan, Lehe; Hu, Lisong; Wu, Baoduo; Wu, Huasong

    2016-10-21

    Piper nigrum L., or "black pepper", is an economically important spice crop in tropical regions. Black pepper production is markedly affected by foot rot disease caused by Phytophthora capsici, and genetic improvement of black pepper is essential for combating foot rot diseases. However, little is known about the mechanism of resistance to P. capsici in black pepper. The molecular mechanisms underlying foot rot susceptibility were studied by comparative transcriptome analysis of resistant (Piper flaviflorum) and susceptible (Piper nigrum cv. Reyin-1) black pepper species. A total of 116,432 unigenes were acquired from six libraries (three replicates of resistant and susceptible black pepper samples), which were integrated by applying BLAST similarity searches and annotated with Kyoto Encyclopedia of Genes and Genomes (KEGG) orthology and Gene Ontology (GO) identifiers. The reference transcriptome was mapped using two sets of digital gene expression data. Using GO enrichment analysis for the differentially expressed genes, the majority of the genes associated with the phenylpropanoid biosynthesis pathway were identified in P. flaviflorum. In addition, the expression of genes revealed that after susceptible and resistant species were inoculated with P. capsici, the majority of genes incorporated in the phenylpropanoid metabolism pathway were up-regulated in both species. Among various treatments and organs, all the genes were up-regulated to a relatively high degree in resistant species. Phenylalanine ammonia lyase and peroxidase enzyme activity increased in susceptible and resistant species after inoculation with P. capsici, and the increase was faster in the resistant species. Histochemical analysis of lignin revealed that the resistant plants retain their vascular structure. Our data provide critical information regarding target genes and a technological basis for future studies of black pepper genetic improvements, including transgenic breeding.

  1. Next generation sequencing and de novo transcriptome analysis of Costus pictus D. Don, a non-model plant with potent anti-diabetic properties

    Directory of Open Access Journals (Sweden)

    Annadurai Ramasamy S

    2012-11-01

    Full Text Available Abstract Background Phyto-remedies for diabetic control are popular among patients with Type II Diabetes mellitus (DM), in addition to other diabetic control measures. A number of plant species are known to possess diabetic control properties. Costus pictus D. Don is popularly known as “Insulin Plant” in Southern India whose leaves have been reported to increase insulin pools in blood plasma. Next Generation Sequencing is employed as a powerful tool for identifying molecular signatures in the transcriptome related to physiological functions of plant tissues. We sequenced the leaf transcriptome of C. pictus using Illumina reversible dye terminator sequencing technology and used a combination of bioinformatics tools for identifying transcripts related to anti-diabetic properties of C. pictus. Results A total of 55,006 transcripts were identified, of which 69.15% transcripts could be annotated. We identified transcripts related to pathways of bixin biosynthesis and geraniol and geranial biosynthesis as major transcripts from the class of isoprenoid secondary metabolites and validated the presence of putative norbixin methyltransferase, a precursor of Bixin. The transcripts encoding these terpenoids are known to be Peroxisome Proliferator-Activated Receptor (PPAR) agonists and anti-glycation agents. Sequential extraction and High Performance Liquid Chromatography (HPLC) confirmed the presence of bixin in C. pictus methanolic extracts. Another significant transcript identified in relation to anti-diabetic, anti-obesity and immuno-modulation is of Abscisic Acid biosynthetic pathway. We also report many other transcripts for the biosynthesis of antitumor, anti-oxidant and antimicrobial metabolites of C. pictus leaves. Conclusion Solid molecular signatures (transcripts related to bixin, abscisic acid, and geranial and geraniol biosynthesis) for the anti-diabetic properties of C. pictus leaves and vital clues related to the other phytochemical functions

  2. The head-regeneration transcriptome of the planarian Schmidtea mediterranea

    Science.gov (United States)

    2011-01-01

    Background Planarian flatworms can regenerate their head, including a functional brain, within less than a week. Despite the enormous potential of these animals for medical research and regenerative medicine, the mechanisms of regeneration and the molecules involved remain largely unknown. Results To identify genes that are differentially expressed during early stages of planarian head regeneration, we generated a de novo transcriptome assembly from more than 300 million paired-end reads from planarian fragments regenerating the head at 16 different time points. The assembly yielded 26,018 putative transcripts, including very long transcripts spanning multiple genomic supercontigs, and thousands of isoforms. Using short-read data from two platforms, we analyzed dynamic gene regulation during the first three days of head regeneration. We identified at least five different temporal synexpression classes, including genes specifically induced within a few hours after injury. Furthermore, we characterized the role of a conserved Runx transcription factor, smed-runt-like1. RNA interference (RNAi) knockdown and immunofluorescence analysis of the regenerating visual system indicated that smed-runt-like1 encodes a transcriptional regulator of eye morphology and photoreceptor patterning. Conclusions Transcriptome sequencing of short reads allowed for the simultaneous de novo assembly and differential expression analysis of transcripts, demonstrating highly dynamic regulation during head regeneration in planarians. PMID:21846378

  3. Analysis of Pigeon (Columba) Ovary Transcriptomes to Identify Genes Involved in Blue Light Regulation

    Science.gov (United States)

    Wang, Ying; Ding, Jia-tong; Yang, Hai-ming; Yan, Zheng-jie; Cao, Wei; Li, Yang-bai

    2015-01-01

    Monochromatic light is widely applied to promote poultry reproductive performance, yet little is currently known regarding the mechanism by which light wavelengths affect pigeon reproduction. Recently, high-throughput sequencing technologies have been used to provide genomic information for solving this problem. In this study, we employed Illumina Hiseq 2000 to identify differentially expressed genes in ovary tissue from pigeons under blue and white light conditions and de novo transcriptome assembly to construct a comprehensive sequence database containing information on the mechanisms of follicle development. A total of 157,774 unigenes (mean length: 790 bp) were obtained by the Trinity program, and 35.83% of these unigenes were matched to genes in a non-redundant protein database. Gene description, gene ontology, and the clustering of orthologous group terms were performed to annotate the transcriptome assembly. Differentially expressed genes between blue and white light conditions included those related to oocyte maturation, hormone biosynthesis, and circadian rhythm. Furthermore, 17,574 SSRs and 533,887 potential SNPs were identified in this transcriptome assembly. This work is the first transcriptome analysis of the Columba ovary using Illumina technology, and the resulting transcriptome and differentially expressed gene data can facilitate further investigations into the molecular mechanism of the effect of blue light on follicle development and reproduction in pigeons and other bird species. PMID:26599806

  4. Analysis of Pigeon (Columba) Ovary Transcriptomes to Identify Genes Involved in Blue Light Regulation.

    Directory of Open Access Journals (Sweden)

    Ying Wang

    Full Text Available Monochromatic light is widely applied to promote poultry reproductive performance, yet little is currently known regarding the mechanism by which light wavelengths affect pigeon reproduction. Recently, high-throughput sequencing technologies have been used to provide genomic information for solving this problem. In this study, we employed Illumina Hiseq 2000 to identify differentially expressed genes in ovary tissue from pigeons under blue and white light conditions and de novo transcriptome assembly to construct a comprehensive sequence database containing information on the mechanisms of follicle development. A total of 157,774 unigenes (mean length: 790 bp) were obtained by the Trinity program, and 35.83% of these unigenes were matched to genes in a non-redundant protein database. Gene description, gene ontology, and the clustering of orthologous group terms were performed to annotate the transcriptome assembly. Differentially expressed genes between blue and white light conditions included those related to oocyte maturation, hormone biosynthesis, and circadian rhythm. Furthermore, 17,574 SSRs and 533,887 potential SNPs were identified in this transcriptome assembly. This work is the first transcriptome analysis of the Columba ovary using Illumina technology, and the resulting transcriptome and differentially expressed gene data can facilitate further investigations into the molecular mechanism of the effect of blue light on follicle development and reproduction in pigeons and other bird species.

  5. De novo genome assembly and annotation of Australia's largest freshwater fish, the Murray cod (Maccullochella peelii), from Illumina and Nanopore sequencing read.

    Science.gov (United States)

    Austin, Christopher M; Tan, Mun Hua; Harrisson, Katherine A; Lee, Yin Peng; Croft, Laurence J; Sunnucks, Paul; Pavlova, Alexandra; Gan, Han Ming

    2017-08-01

    One of the most iconic Australian fish is the Murray cod, Maccullochella peelii (Mitchell 1838), a freshwater species that can grow to ∼1.8 metres in length and live to age ≥48 years. The Murray cod is of conservation concern as a result of strong population contractions, but it is also popular for recreational fishing and is of growing aquaculture interest. In this study, we report the whole genome sequence of the Murray cod to support ongoing population genetics, conservation, and management research, as well as to better understand the evolutionary ecology and history of the species. A draft Murray cod genome of 633 Mbp (N50 = 109 974 bp; BUSCO and CEGMA completeness of 94.2% and 91.9%, respectively) with an estimated 148 Mbp of putative repetitive sequences was assembled from the combined sequencing data of 2 fish individuals with an identical maternal lineage; 47.2 Gb of Illumina HiSeq data and 804 Mb of Nanopore data were generated from the first individual while 23.2 Gb of Illumina MiSeq data were generated from the second individual. The inclusion of Nanopore reads for scaffolding followed by subsequent gap-closing using Illumina data led to a 29% reduction in the number of scaffolds and a 55% and 54% increase in the scaffold and contig N50, respectively. We also report the first transcriptome of Murray cod that was subsequently used to annotate the Murray cod genome, leading to the identification of 26 539 protein-coding genes. We present the whole genome of the Murray cod and anticipate this will be a catalyst for a range of genetic, genomic, and phylogenetic studies of the Murray cod and more generally other fish species of the Percichthyidae family. © The Authors 2017. Published by Oxford University Press.

  6. De novo analysis of Wolfiporia cocos transcriptome to reveal the differentially expressed carbohydrate-active enzymes (CAZymes) genes during the early stage of sclerotial growth

    Directory of Open Access Journals (Sweden)

    Shaopeng Zhang

    2016-02-01

    Full Text Available The sclerotium of Wolfiporia cocos has been used as an edible mushroom and/or a traditional herbal medicine for centuries. W. cocos sclerotial formation is dependent on parasitism of the wood of Pinus species. Currently, the sclerotial development mechanisms of W. cocos remain largely unknown and the lack of pine resources limits commercial production. The CAZymes (carbohydrate-active enzymes) play important roles in degradation of the plant cell wall to provide carbohydrates for fungal growth, development and reproduction. In this study, the transcript profiles from W. cocos mycelium and two-month-old sclerotium, the early stage of sclerotial growth, were analyzed using de novo sequencing technology. A total of 142,428,180 high-quality reads of mycelium and 70,594,319 high-quality reads of two-month-old sclerotium were obtained. Additionally, differentially expressed genes from the W. cocos mycelium and two-month-old sclerotium stages were analyzed, resulting in identification of 69 CAZymes genes which were significantly up-regulated during the early stage of sclerotial growth compared with the mycelium stage, and more than half of them belonged to the glycosyl hydrolases (GHs) family, indicating the importance of the W. cocos GHs family for degrading pine wood. qRT-PCR was further used to confirm the expression pattern of these up-regulated CAZymes genes. Our results provide comprehensive CAZymes gene expression information during W. cocos sclerotial growth at the transcriptional level and lay a foundation for functional gene studies in this fungus. In addition, our study will also facilitate the efficient use of limited pine resources, which is significant for promoting steady development of the Chinese W. cocos industry.

  7. The transcriptome of the Didelphis virginiana opossum kidney OK proximal tubule cell line.

    Science.gov (United States)

    Eshbach, Megan L; Sethi, Rahil; Avula, Raghunandan; Lamb, Janette; Hollingshead, Deborah J; Finegold, David N; Locker, Joseph D; Chandran, Uma R; Weisz, Ora A

    2017-09-01

    The OK cell line derived from the kidney of a female opossum Didelphis virginiana has proven to be a useful model in which to investigate the unique regulation of ion transport and membrane trafficking mechanisms in the proximal tubule (PT). Sequence data and comparison of the transcriptome of this cell line to eutherian mammal PTs would further broaden the utility of this culture model. However, the genomic sequence for D. virginiana is not available and although a draft genome sequence for the opossum Monodelphis domestica (sequenced in 2012 by the Broad Institute) exists, transcripts sequenced from both species show significant divergence. The M. domestica sequence is not highly annotated, and the majority of transcripts are predicted rather than experimentally validated. Using deep RNA sequencing of the D. virginiana OK cell line, we characterized its transcriptome via de novo transcriptome assembly and alignment to the M. domestica genome. The quality of the de novo assembled transcriptome was assessed by the extent of homology to sequences in nucleotide and protein databases. Gene expression levels in the OK cell line, from both the de novo transcriptome and genes aligned to the M. domestica genome, were compared with publicly available rat kidney nephron segment expression data. Our studies demonstrate the expression in OK cells of numerous PT-specific ion transporters and other key proteins relevant for rodent and human PT function. Additionally, the sequence and expression data reported here provide an important resource for genetic manipulation and other studies on PT cell function using these cells. Copyright © 2017 the American Physiological Society.

  8. De novo transcriptome sequencing of Acer palmatum and comprehensive analysis of differentially expressed genes under salt stress in two contrasting genotypes.

    Science.gov (United States)

    Rong, Liping; Li, Qianzhong; Li, Shushun; Tang, Ling; Wen, Jing

    2016-04-01

    Maple (Acer palmatum) is an important species for landscape planting worldwide. Salt stress affects the normal growth of the Maple leaf directly, leading to loss of esthetic value. However, the limited availability of Maple genomic information has hindered research on the mechanisms underlying this tolerance. In this study, we performed comprehensive analyses of the salt tolerance in two genotypes of Maple using RNA-seq. Approximately 146.4 million paired-end reads, representing 181,769 unigenes, were obtained. The N50 length of the unigenes was 738 bp, and their total length was over 102.66 Mb. A total of 14,090 simple sequence repeats and over 500,000 single nucleotide polymorphisms were identified, which represent useful resources for marker development. Importantly, 181,769 genes were detected in at least one library, and 303 differentially expressed genes (DEGs) were identified between salt-sensitive and salt-tolerant genotypes. Among these DEGs, 125 were upregulated and 178 were downregulated. Two MYB-related proteins and one LEA protein were detected among the 10 most downregulated genes. Moreover, a methyltransferase-related gene was detected among the 10 most upregulated genes. The three most significantly enriched pathways were plant hormone signal transduction, arginine and proline metabolism, and photosynthesis. The transcriptome analysis provided a rich genetic resource for gene discovery related to salt tolerance in Maple, and in closely related species. The data will serve as an important public information platform to further our understanding of the molecular mechanisms involved in salt tolerance in Maple.
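
    Simple sequence repeats (SSRs) such as those counted in this record are typically found by scanning assembled sequences for short motifs repeated in tandem. The abstract does not name the tool or thresholds used, so the following is only a minimal sketch under stated assumptions: perfect repeats only, motifs of 1 to 6 bp, and a single hypothetical minimum of five repeat units for every motif length. The function name and test sequence are illustrative.

```python
import re


def find_ssrs(seq, min_repeats=5, max_motif=6):
    """Return (start, motif, repeat_count) for perfect tandem repeats.

    Assumed thresholds (not from the study): motifs of 1..max_motif bp
    repeated at least min_repeats times, with no mismatches allowed.
    """
    # The lazy quantifier makes the regex prefer the shortest motif,
    # so "ATATATATAT" is reported as (AT)5 rather than (ATAT)2.
    pattern = re.compile(
        r"([ACGT]{1,%d}?)\1{%d,}" % (max_motif, min_repeats - 1)
    )
    hits = []
    for match in pattern.finditer(seq.upper()):
        motif = match.group(1)
        hits.append((match.start(), motif, len(match.group(0)) // len(motif)))
    return hits


# Hypothetical toy sequence, for illustration only.
print(find_ssrs("GGATATATATATCC"))
```

    Dedicated SSR tools usually apply different minimum repeat counts per motif length and may tolerate imperfect repeats, so counts from this sketch would not be expected to match published figures.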

  9. Sequencing and characterization of the guppy (Poecilia reticulata) transcriptome

    Directory of Open Access Journals (Sweden)

    Rodd F Helen

    2011-04-01

    Full Text Available Abstract Background Next-generation sequencing is providing researchers with a relatively fast and affordable option for developing genomic resources for organisms that are not among the traditional genetic models. Here we present a de novo assembly of the guppy (Poecilia reticulata) transcriptome using 454 sequence reads, and we evaluate potential uses of this transcriptome, including detection of sex-specific transcripts and deployment as a reference for gene expression analysis in guppies and a related species. Guppies have been model organisms in ecology, evolutionary biology, and animal behaviour for over 100 years. An annotated transcriptome and other genomic tools will facilitate understanding the genetic and molecular bases of adaptation and variation in a vertebrate species with a uniquely well known natural history. Results We generated approximately 336 Mbp of mRNA sequence data from male brain, male body, female brain, and female body. The resulting 1,162,670 reads assembled into 54,921 contigs, creating a reference transcriptome for the guppy with an average read depth of 28×. We annotated nearly 40% of this reference transcriptome by searching protein and gene ontology databases. Using this annotated transcriptome database, we identified candidate genes of interest to the guppy research community, putative single nucleotide polymorphisms (SNPs), and male-specific expressed genes. We also showed that our reference transcriptome can be used for RNA-sequencing-based analysis of differential gene expression. We identified transcripts that, in juveniles, are regulated differently in the presence and absence of an important predator, Rivulus hartii, including two genes implicated in stress response. For each sample in the RNA-seq study, >50% of high-quality reads mapped to unique sequences in the reference database with high confidence. In addition, we evaluated the use of the guppy reference transcriptome for gene expression analyses in
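
    The read-depth figure in this record relates sequenced bases to assembly size. As a rough sketch, assuming mean depth is simply the number of bases placed in contigs divided by the total contig length (the abstract does not state how the 28× value was computed), the quoted numbers can be related as follows. The function names are hypothetical.

```python
def mean_depth(read_bases, assembly_bases):
    """Mean read depth: bases placed in contigs per assembled base."""
    return read_bases / assembly_bases


def implied_assembly_span(read_bases, depth):
    """Assembly length implied by a read total and a mean depth."""
    return read_bases / depth


# Figures quoted in the abstract: about 336 Mbp of sequence, 28x mean depth.
# Assumption: every sequenced base was placed in a contig, which makes the
# result an upper bound rather than the true assembly size.
span = implied_assembly_span(336e6, 28)
print(f"{span / 1e6:.1f} Mbp")
```

    If only a fraction of the reads were assembled, the implied span shrinks in proportion, which is why the figure should be read as a ceiling.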

  10. Novel transcriptome resources for three scleractinian coral species from the Indo-Pacific.

    Science.gov (United States)

    Kenkel, Carly D; Bay, Line K

    2017-09-01

    Transcriptomic resources for coral species can provide insight into coral evolutionary history and stress-response physiology. Goniopora columna, Galaxea astreata, and Galaxea acrhelia are scleractinian corals of the Indo-Pacific, representing a diversity of morphologies and life-history traits. G. columna and G. astreata are common and cosmopolitan, while G. acrhelia is largely restricted to the coral triangle and Great Barrier Reef. Reference transcriptomes for these species were assembled from replicate colony fragments exposed to elevated (31°C) and ambient (27°C) temperatures. Trinity was used to create de novo assemblies for each species from 92-102 million raw Illumina Hiseq 2 × 150 bp reads. Host-specific assemblies contained 65 460-72 405 contigs, representing 26 693-37 894 isogroups (∼genes) with an average N50 of 2254. Gene name and/or gene ontology annotations were possible for 58% of isogroups on average. Transcriptomes contained 93.1-94.3% of EuKaryotic Orthologous Groups comprising the core eukaryotic gene set, and 89.98-91.92% of the single-copy metazoan core gene set orthologs were complete, indicating fairly comprehensive assemblies. This work expands the complement of transcriptomic resources available for scleractinian coral species, including the first reference for a representative of Goniopora spp. as well as species with novel morphology. © The Authors 2017. Published by Oxford University Press.

  11. Comparative analyses of two Geraniaceae transcriptomes using next-generation sequencing.

    Science.gov (United States)

    Zhang, Jin; Ruhlman, Tracey A; Mower, Jeffrey P; Jansen, Robert K

    2013-12-29

    Organelle genomes of Geraniaceae exhibit several unusual evolutionary phenomena compared to other angiosperm families including accelerated nucleotide substitution rates, widespread gene loss, reduced RNA editing, and extensive genomic rearrangements. Since most organelle-encoded proteins function in multi-subunit complexes that also contain nuclear-encoded proteins, it is likely that the atypical organellar phenomena affect the evolution of nuclear genes encoding organellar proteins. To begin to unravel the complex co-evolutionary interplay between organellar and nuclear genomes in this family, we sequenced nuclear transcriptomes of two species, Geranium maderense and Pelargonium x hortorum. Normalized cDNA libraries of G. maderense and P. x hortorum were used for transcriptome sequencing. Five assemblers (MIRA, Newbler, SOAPdenovo, SOAPdenovo-trans [SOAPtrans], Trinity) and two next-generation technologies (454 and Illumina) were compared to determine the optimal transcriptome sequencing approach. Trinity provided the highest quality assembly of Illumina data with the deepest transcriptome coverage. An analysis to determine the amount of sequencing needed for de novo assembly revealed diminishing returns of coverage and quality with data sets larger than sixty million Illumina paired end reads for both species. The G. maderense and P. x hortorum transcriptomes contained fewer transcripts encoding the PLS subclass of PPR proteins relative to other angiosperms, consistent with reduced mitochondrial RNA editing activity in Geraniaceae. In addition, transcripts for all six plastid targeted sigma factors were identified in both transcriptomes, suggesting that one of the highly divergent rpoA-like ORFs in the P. x hortorum plastid genome is functional. The findings support the use of the Illumina platform and assemblers optimized for transcriptome assembly, such as Trinity or SOAPtrans, to generate high-quality de novo transcriptomes with broad coverage. In addition

  12. Comparative Transcriptomics Among Four White Pine Species

    Directory of Open Access Journals (Sweden)

    Ethan A. G. Baker

    2018-05-01

    Full Text Available Conifers are the dominant plant species throughout the high latitude boreal forests as well as some lower latitude temperate forests of North America, Europe, and Asia. As such, they play an integral economic and ecological role across much of the world. This study focused on the characterization of needle transcriptomes from four ecologically important and understudied North American white pines within the Pinus subgenus Strobus. The populations of many Strobus species are challenged by native and introduced pathogens, native insects, and abiotic factors. RNA from the needles of western white pine (Pinus monticola), limber pine (Pinus flexilis), whitebark pine (Pinus albicaulis), and sugar pine (Pinus lambertiana) was sampled, sequenced with Illumina short reads, and assembled de novo. The assembled transcripts and their subsequent structural and functional annotations were processed through custom pipelines to contend with the challenges of non-model organism transcriptome validation. Orthologous gene family analysis of over 58,000 translated transcripts, implemented through Tribe-MCL, estimated the shared and unique gene space among the four species. This revealed 2025 conserved gene families, of which 408 were aligned to estimate levels of divergence and reveal patterns of selection. Specific candidate genes previously associated with drought tolerance and white pine blister rust resistance in conifers were investigated.

  13. Cell type-specific responses to salinity - the epidermal bladder cell transcriptome of Mesembryanthemum crystallinum.

    Science.gov (United States)

    Oh, Dong-Ha; Barkla, Bronwyn J; Vera-Estrella, Rosario; Pantoja, Omar; Lee, Sang-Yeol; Bohnert, Hans J; Dassanayake, Maheshi

    2015-08-01

    Mesembryanthemum crystallinum (ice plant) exhibits extreme tolerance to salt. Epidermal bladder cells (EBCs), developing on the surface of aerial tissues and specialized in sodium sequestration and other protective functions, are critical for the plant's stress adaptation. We present the first transcriptome analysis of EBCs isolated from intact plants, to investigate cell type-specific responses during plant salt adaptation. We developed a de novo assembled, nonredundant EBC reference transcriptome. Using RNAseq, we compared the expression patterns of the EBC-specific transcriptome between control and salt-treated plants. The EBC reference transcriptome consists of 37 341 transcript-contigs, of which 7% showed significantly different expression between salt-treated and control samples. We identified significant changes in ion transport, metabolism related to energy generation and osmolyte accumulation, stress signalling, and organelle functions, as well as a number of lineage-specific genes of unknown function, in response to salt treatment. The salinity-induced EBC transcriptome includes active transcript clusters, refuting the view of EBCs as passive storage compartments in the whole-plant stress response. EBC transcriptomes, differing from those of whole plants or leaf tissue, exemplify the importance of cell type-specific resolution in understanding stress adaptive mechanisms. No claim to original US government works. New Phytologist © 2015 New Phytologist Trust.

  14. Comparative De Novo Transcriptome Analysis of Fertilized Ovules in Xanthoceras sorbifolium Uncovered a Pool of Genes Expressed Specifically or Preferentially in the Selfed Ovule That Are Potentially Involved in Late-Acting Self-Incompatibility.

    Directory of Open Access Journals (Sweden)

    Qingyuan Zhou

    Full Text Available Xanthoceras sorbifolium, a tree species endemic to northern China, has high oil content in its seeds and is recognized as an important biodiesel crop. The plant is characterized by late-acting self-incompatibility (LSI). LSI was found to occur in many angiosperm species and plays an important role in reducing inbreeding and its harmful effects, as do gametophytic self-incompatibility (GSI) and sporophytic self-incompatibility (SSI). Molecular mechanisms of conventional GSI and SSI have been well characterized in several families, but no effort has been made to identify the genes involved in the LSI process. The present study indicated that there were no significant differences in structural and histological features between the self- and cross-pollinated ovules during the early stages of ovule development until 5 days after pollination (DAP). This suggests that 5 DAP is likely to be a turning point for the development of the selfed ovules. Comparative de novo transcriptome analysis of the selfed and crossed ovules at 5 DAP identified 274 genes expressed specifically or preferentially in the selfed ovules. These genes contained a significant proportion of genes predicted to function in the biosynthesis of secondary metabolites, consistent with our histological observations in the fertilized ovules. The genes encoding signal transduction-related components, such as protein kinases and protein phosphatases, are overrepresented in the selfed ovules. X. sorbifolium selfed ovules also specifically or preferentially express many unique transcription factor (TF) genes that could potentially be involved in the novel mechanisms of LSI. We also identified 42 genes significantly up-regulated in the crossed ovules compared to the selfed ovules. The expression of all 16 genes selected from the RNA-seq data was validated using PCR in the selfed and crossed ovules. This study represents the first genome-wide identification of genes expressed in the fertilized

  15. Transcriptomic analysis, genic SSR development, and genetic diversity of proso millet (Panicum miliaceum; Poaceae).

    Science.gov (United States)

    Hou, Siyu; Sun, Zhaoxia; Li, Yaoshen; Wang, Yijie; Ling, Hubin; Xing, Guofang; Han, Yuanhuai; Li, Hongying

    2017-07-01

    Proso millet (Panicum miliaceum; Poaceae) is a minor crop with good nutritional qualities and strong tolerance to drought stress and soil infertility. However, studies on genetic diversity have been limited due to a lack of efficient genetic markers. Illumina sequencing technology was used to generate short read sequences of proso millet, and de novo transcriptome assemblies were used to develop a de novo assembly of proso millet. Genic simple sequence repeat (SSR) markers were identified and used to detect polymorphism among 56 accessions. Population structure and genetic similarity coefficient were estimated. In total, 25,341 unique gene sequences and 4724 SSR loci were obtained from the transcriptome, of which 229 pairs of SSR primers were validated, which resulted in 14 polymorphic genic SSR primers exhibiting 43 total alleles. According to the ratio of polymorphic markers (6.1%, 14/229), there are potentially 288 polymorphic genic SSR markers available for genetic assay development in the future. Bayesian population analyses showed that the 56 accessions comprised two distinct groups. A genetic structure and cluster assay indicated that the accessions from the Loess Plateau of China shared a high genetic similarity coefficient with those from other regions and that there was no correlation between genetic diversity and geographic origin. The transcriptome sequencing data and millet-specific SSR markers developed in this study establish an excellent resource for gene discovery and may improve the development of breeding programs in proso millet in the future.
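The extrapolation in the abstract above (14 polymorphic markers out of 229 validated primer pairs, applied to all 4724 SSR loci) can be reproduced with a few lines of arithmetic. This is only a sketch: the function name is hypothetical, and truncating the result to a whole number is an assumption about how the reported figure of 288 was obtained, not something the abstract states.

```python
def estimate_polymorphic_markers(polymorphic, validated, total_loci):
    """Scale the observed polymorphic ratio up to all SSR loci.

    Returns (ratio_percent, estimated_marker_count). The count is
    truncated to an integer; that rounding choice is an assumption.
    """
    if validated <= 0:
        raise ValueError("validated must be positive")
    ratio = polymorphic / validated
    return ratio * 100.0, int(total_loci * ratio)


# Figures taken from the abstract: 14 of 229 primer pairs, 4724 SSR loci.
ratio_percent, estimated = estimate_polymorphic_markers(14, 229, 4724)
print(f"{ratio_percent:.1f}% polymorphic, about {estimated} markers")
```

With the abstract's figures this gives a ratio of about 6.1% and an estimate of 288 markers.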

  16. Transcriptomic analysis, genic SSR development, and genetic diversity of proso millet (Panicum miliaceum; Poaceae)

    Science.gov (United States)

    Hou, Siyu; Sun, Zhaoxia; Li, Yaoshen; Wang, Yijie; Ling, Hubin; Xing, Guofang; Han, Yuanhuai; Li, Hongying

    2017-01-01

    Premise of the study: Proso millet (Panicum miliaceum; Poaceae) is a minor crop with good nutritional qualities and strong tolerance to drought stress and soil infertility. However, studies on genetic diversity have been limited due to a lack of efficient genetic markers. Methods: Illumina sequencing technology was used to generate short read sequences of proso millet, and de novo transcriptome assemblies were used to develop a de novo assembly of proso millet. Genic simple sequence repeat (SSR) markers were identified and used to detect polymorphism among 56 accessions. Population structure and genetic similarity coefficient were estimated. Results: In total, 25,341 unique gene sequences and 4724 SSR loci were obtained from the transcriptome, of which 229 pairs of SSR primers were validated, which resulted in 14 polymorphic genic SSR primers exhibiting 43 total alleles. According to the ratio of polymorphic markers (6.1%, 14/229), there are potentially 288 polymorphic genic SSR markers available for genetic assay development in the future. Bayesian population analyses showed that the 56 accessions comprised two distinct groups. Discussion: A genetic structure and cluster assay indicated that the accessions from the Loess Plateau of China shared a high genetic similarity coefficient with those from other regions and that there was no correlation between genetic diversity and geographic origin. The transcriptome sequencing data and millet-specific SSR markers developed in this study establish an excellent resource for gene discovery and may improve the development of breeding programs in proso millet in the future. PMID:28791202

  17. De novo transcriptome assembly analysis of weed Apera spica-venti from seven tissues and growth stages

    DEFF Research Database (Denmark)

    Babineau, Marielle; Mahmood, Khalid; Mathiassen, Solvejg Kopp

    2017-01-01

    Background Loose silky bentgrass (Apera spica-venti) is an important weed in Europe with a recent increase in herbicide resistance cases. The lack of genetic information about this noxious weed limits its biological understanding such as growth, reproduction, genetic variation, molecular ecology ...

  18. De novo assembly, functional annotation, and analysis of the giant reed (Arundo donax L.) leaf transcriptome provide tools for the development of a biofuel feedstock

    NARCIS (Netherlands)

    Evangelistella, Chiara; Valentini, Alessio; Ludovisi, Riccardo; Firrincieli, Andrea; Fabbrini, Francesco; Scalabrin, Simone; Cattonaro, Federica; Morgante, Michele; Mugnozza, Giuseppe Scarascia; Keurentjes, Joost J.B.; Harfouche, Antoine

    2017-01-01

    Background: Arundo donax has attracted renewed interest as a potential candidate energy crop for use in biomass-to-liquid fuel conversion processes and biorefineries. This is due to its high productivity, adaptability to marginal land conditions, and suitability for biofuel and biomaterial

  19. De novo transcriptome assembly facilitates characterisation of fast-evolving gene families, MHC class I in the bank vole (Myodes glareolus)

    Czech Academy of Sciences Publication Activity Database

    Migalska, M.; Sebastian, A.; Konczal, M.; Kotlík, Petr; Radwan, J.

    2017-01-01

    Vol. 118, No. 4 (2017), pp. 348-357. ISSN 0018-067X. R&D Projects: GA ČR GAP506/11/1872; GA ČR(CZ) GA16-03248S. Institutional support: RVO:67985904. Keywords: bank vole; major histocompatibility complex; RNA-seq data. Subject RIV: EB - Genetics; Molecular Biology. OECD field: Genetics and heredity (medical genetics to be 3). Impact factor: 3.961, year: 2016

  20. Development of EST-SSR markers in flowering Chinese cabbage (Brassica campestris L. ssp. chinensis var. utilis Tsen et Lee) based on de novo transcriptomic assemblies

    Science.gov (United States)

    Flowering Chinese cabbage is one of the most important vegetable crops in southern China. Genetic improvement of various agronomic traits in this crop is underway to meet high market demand in the region, but the progress is hampered by limited number of molecular markers available in this crop. Thi...

  1. Origins of De Novo Genes in Human and Chimpanzee.

    Science.gov (United States)

    Ruiz-Orera, Jorge; Hernandez-Rodriguez, Jessica; Chiva, Cristina; Sabidó, Eduard; Kondova, Ivanela; Bontrop, Ronald; Marqués-Bonet, Tomàs; Albà, M Mar

    2015-12-01

    The birth of new genes is an important motor of evolutionary innovation. Whereas many new genes arise by gene duplication, others originate at genomic regions that did not contain any genes or gene copies. Some of these newly expressed genes may acquire coding or non-coding functions and be preserved by natural selection. However, the prevalence of de novo gene emergence and its underlying mechanisms remain unclear. In order to obtain a comprehensive view of this process, we have performed in-depth sequencing of the transcriptomes of four mammalian species (human, chimpanzee, macaque, and mouse) and subsequently compared the assembled transcripts and the corresponding syntenic genomic regions. This has resulted in the identification of over five thousand new multiexonic transcriptional events in human and/or chimpanzee that are not observed in the other species. Using comparative genomics, we show that the expression of these transcripts is associated with the gain of regulatory motifs upstream of the transcription start site (TSS) and of U1 snRNP sites downstream of the TSS. In general, these transcripts show little evidence of purifying selection, suggesting that many of them are not functional. However, we find signatures of selection in a subset of de novo genes which have evidence of protein translation. Taken together, the data support a model in which frequently-occurring new transcriptional events in the genome provide the raw material for the evolution of new proteins.

  2. Desiccation tolerance in bryophytes: The dehydration and rehydration transcriptomes in the desiccation-tolerant bryophyte Bryum argenteum.

    Science.gov (United States)

    Gao, Bei; Li, Xiaoshuang; Zhang, Daoyuan; Liang, Yuqing; Yang, Honglan; Chen, Moxian; Zhang, Yuanming; Zhang, Jianhua; Wood, Andrew J

    2017-08-08

    The desiccation-tolerant bryophyte Bryum argenteum is an important component of desert biological soil crusts (BSCs) and is emerging as a model system for studying vegetative desiccation tolerance. Here we present and analyze the hydration-dehydration-rehydration transcriptomes in B. argenteum to establish a desiccation-tolerance transcriptomic atlas. B. argenteum gametophores representing five different hydration stages (hydrated (H0), dehydrated for 2 h (D2), 24 h (D24), then rehydrated for 2 h (R2) and 48 h (R48)) were sampled for transcriptome analyses. Illumina high-throughput RNA-Seq technology was employed and generated more than 488.46 million reads. An in-house de novo transcriptome assembly optimization pipeline based on the Trinity assembler was developed to obtain a reference Hydration-Dehydration-Rehydration (H-D-R) transcriptome comprising 76,206 transcripts, with an N50 of 2,016 bp and an average length of 1,222 bp. Comprehensive transcription factor (TF) annotation discovered 978 TFs in 62 families, among which 404 TFs within 40 families were differentially expressed upon dehydration-rehydration. Pfam term enrichment analysis revealed that 172 protein families/domains were significantly associated with the H-D-R cycle and confirmed early rehydration (i.e. the R2 stage) as exhibiting the maximum stress-induced changes in gene expression.
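Several records in this listing report an N50 alongside the transcript count. As a reminder of what that statistic measures, here is a minimal sketch of the usual definition: the length L such that sequences of length L or longer together contain at least half of all assembled bases. The function name and the toy lengths are illustrative only; none of the studies listed here publish this code.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or transcript lengths.

    Sort lengths from longest to shortest and accumulate them; the N50
    is the length at which the running sum first reaches half of the
    total assembled bases.
    """
    if not lengths:
        raise ValueError("lengths must be non-empty")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Toy example (made-up lengths, not data from any study above).
print(n50([2, 2, 2, 3, 3, 4, 8, 8]))  # prints 8
```

Because the N50 is weighted by length, it is usually larger than the mean length, which matches the pattern in the record above (N50 of 2,016 bp against an average of 1,222 bp).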

  3. Sex-specific differences in transcriptome profiles of brain and muscle tissue of the tropical gar.

    Science.gov (United States)

    Cribbin, Kayla M; Quackenbush, Corey R; Taylor, Kyle; Arias-Rodriguez, Lenin; Kelley, Joanna L

    2017-04-07

    The tropical gar (Atractosteus tropicus) is the southernmost species of the seven extant species of gar fishes in the world. In Mexico and Central America, the species is an important food source due to its nutritional quality and low price. Despite its regional importance and increasing concerns about overexploitation and habitat degradation, basic genetic information on the tropical gar is lacking. Determining genetic information on the tropical gar is important for the sustainable management of wild populations, implementation of best practices in aquaculture settings, evolutionary studies of ancient lineages, and an understanding of sex-specific gene expression. In this study, the transcriptome of the tropical gar was sequenced and assembled de novo using tissues from three males and three females using Illumina sequencing technology. Sex-specific and highly differentially expressed transcripts in brain and muscle tissues between adult males and females were subsequently identified. The transcriptome was assembled de novo resulting in 80,611 transcripts with a contig N50 of 3,355 base pairs and over 168 kilobases in total length. Male muscle, brain, and gonad as well as female muscle and brain were included in the assembly. The assembled transcriptome was annotated to identify the putative function of expressed transcripts using Trinotate and SwissProt, a database of well-annotated proteins. The brain and muscle datasets were then aligned to the assembled transcriptome to identify transcripts that were differentially expressed between males and females. The contrast between male and female brain identified 109 transcripts from 106 genes that were significantly differentially expressed. In the muscle comparison, 82 transcripts from 80 genes were identified with evidence for significant differential expression. Almost all genes identified as differentially expressed were sex-specific. The differentially expressed transcripts were enriched for genes involved in

  4. Whole transcriptome analysis using next-generation sequencing of model species Setaria viridis to support C4 photosynthesis research.

    Science.gov (United States)

    Xu, Jiajia; Li, Yuanyuan; Ma, Xiuling; Ding, Jianfeng; Wang, Kai; Wang, Sisi; Tian, Ye; Zhang, Hui; Zhu, Xin-Guang

    2013-09-01

    Setaria viridis is an emerging model species for genetic studies of C4 photosynthesis. Many basic molecular resources need to be developed to support research on this species. In this paper, we performed a comprehensive transcriptome analysis from multiple developmental stages and tissues of S. viridis using next-generation sequencing technologies. Sequencing of the transcriptome from multiple tissues across three developmental stages (seed germination, vegetative growth, and reproduction) yielded a total of 71 million single-end 100 bp reads. Reference-based assembly using the Setaria italica genome as a reference generated 42,754 transcripts. De novo assembly generated 60,751 transcripts. In addition, 9,576 and 7,056 potential simple sequence repeats (SSRs) covering the S. viridis genome were identified when using the reference-based assembled transcripts and the de novo assembled transcripts, respectively. The transcripts and SSRs identified in this study can be used for both reverse and forward genetic studies based on S. viridis.

  5. Characterization of mango (Mangifera indica L.) transcriptome and chloroplast genome.

    Science.gov (United States)

    Azim, M Kamran; Khan, Ishtaiq A; Zhang, Yong

    2014-05-01

    We characterized the mango leaf transcriptome and chloroplast genome using next generation DNA sequencing. The RNA-seq output of the mango transcriptome generated >12 million reads (total nucleotides sequenced >1 Gb). De novo transcriptome assembly generated 30,509 unigenes with lengths in the range of 300 to ≥3,000 nt and 67× depth of coverage. Blast searching against nonredundant nucleotide databases and several Viridiplantae genomic datasets annotated 24,593 mango unigenes (80% of total) and identified Citrus sinensis as the closest neighbor of mango with 9,141 (37%) matched sequences. The annotation with gene ontology and Clusters of Orthologous Group terms categorized unigene sequences into 57 and 25 classes, respectively. More than 13,500 unigenes were assigned to 293 KEGG pathways. Besides major plant biology related pathways, KEGG based gene annotation pointed out the active presence of an array of biochemical pathways involved in (a) biosynthesis of bioactive flavonoids, flavones and flavonols, (b) biosynthesis of terpenoids and lignins and (c) plant hormone signal transduction. The mango transcriptome sequences revealed 235 proteases belonging to five catalytic classes of proteolytic enzymes. The draft genome of the mango chloroplast (cp) was obtained by a combination of Sanger and next generation sequencing. The draft mango cp genome size is 151,173 bp with a pair of inverted repeats of 27,093 bp separated by small and large single copy regions, respectively. Out of 139 genes in the mango cp genome, 91 were found to be protein coding. Sequence analysis revealed the cp genome of C. sinensis as the closest neighbor of mango. We found 51 short repeats in the mango cp genome that are supposed to be associated with extensive rearrangements. This is the first report of transcriptome and chloroplast genome analysis of any Anacardiaceae family member.

  6. Analysis of the salivary gland transcriptome of Frankliniella occidentalis.

    Directory of Open Access Journals (Sweden)

    Candice A Stafford-Banks

    Full Text Available Saliva is known to play a crucial role in insect feeding behavior and virus transmission. Currently, little is known about the salivary glands and saliva of thrips, despite the fact that Frankliniella occidentalis (Pergande) (the western flower thrips) is a serious pest due to its destructive feeding, wide host range, and transmission of tospoviruses. As a first step towards characterizing thrips salivary gland functions, we sequenced the transcriptome of the primary salivary glands of F. occidentalis using short read sequencing (Illumina) technology. A de novo-assembled transcriptome revealed 31,392 high quality contigs with an average size of 605 bp. A total of 12,166 contigs had significant BLASTx or tBLASTx hits (E≤1.0E-6) to known proteins, whereas a high percentage (61.24%) of contigs had no apparent protein or nucleotide hits. Comparison of the F. occidentalis salivary gland transcriptome (sialotranscriptome) against a published F. occidentalis full body transcriptome assembled from Roche-454 reads revealed several contigs with putative annotations associated with salivary gland functions. KEGG pathway analysis of the sialotranscriptome revealed that the majority (18 out of the top 20 predicted KEGG pathways) of the salivary gland contig sequences match proteins involved in metabolism. We identified several genes likely to be involved in detoxification and inhibition of plant defense responses including aldehyde dehydrogenase, metalloprotease, glucose oxidase, glucose dehydrogenase, and regucalcin. We also identified several genes that may play a role in the extra-oral digestion of plant structural tissues including β-glucosidase and pectin lyase; and the extra-oral digestion of sugars, including α-amylase, maltase, sucrase, and α-glucosidase. This is the first analysis of a sialotranscriptome for any Thysanopteran species and it provides a foundational tool to further our understanding of how thrips interact with their plant hosts and the

  7. Global transcriptome analysis of developing chickpea (Cicer arietinum L.) seeds.

    Science.gov (United States)

    Pradhan, Seema; Bandhiwal, Nitesh; Shah, Niraj; Kant, Chandra; Gaur, Rashmi; Bhatia, Sabhyata

    2014-01-01

    Understanding developmental processes, especially in non-model crop plants, is extremely important in order to unravel unique mechanisms regulating development. Chickpea (C. arietinum L.) seeds are especially valued for their high carbohydrate and protein content. Therefore, in order to elucidate the mechanisms underlying seed development in chickpea, deep sequencing of transcriptomes from four developmental stages was undertaken. In this study, a next generation sequencing platform was utilized to sequence the transcriptome of four distinct stages of seed development in chickpea. About 1.3 million reads were generated, which were assembled into 51,099 unigenes by merging the de novo and reference assemblies. Functional annotation of the unigenes was carried out using the Uniprot, COG and KEGG databases. RPKM based digital expression analysis revealed specific gene activities at different stages of development, which were validated using real-time PCR analysis. More than 90% of the unigenes were found to be expressed in at least one of the four seed tissues. DEGseq was used to determine differentially expressed genes, which revealed that only 6.75% of the unigenes were differentially expressed at various stages. Homology based comparison revealed 17.5% of the unigenes to be putatively seed specific. Transcription factors were predicted based on HMM profiles built using TF sequences from five legume plants and analyzed for their differential expression during progression of seed development. Expression analysis of genes involved in biosynthesis of important secondary metabolites suggested that chickpea seeds can serve as a good source of antioxidants. Since transcriptomes are a valuable source of molecular markers like simple sequence repeats (SSRs), about 12,000 SSRs were mined in the chickpea seed transcriptome and a few of them were validated. In conclusion, this study will serve as a valuable resource for improved chickpea breeding.
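The record above relies on RPKM (reads per kilobase of transcript per million mapped reads) for its digital expression analysis. The sketch below shows the standard RPKM normalization; the function name and the example numbers are made up for illustration and are not taken from the chickpea study.

```python
def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads.

    RPKM = read_count * 1e9 / (gene_length_bp * total_mapped_reads),
    which normalizes a raw count for both transcript length and
    sequencing depth.
    """
    if gene_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("length and library size must be positive")
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


# Hypothetical unigene: 100 reads on a 2,000 bp sequence in a library
# of 1,000,000 mapped reads.
print(rpkm(100, 2000, 1_000_000))  # prints 50.0
```

Dividing by length keeps long transcripts from appearing more highly expressed simply because they collect more reads, and dividing by library size makes samples sequenced to different depths comparable.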

  8. Global transcriptome analysis of developing chickpea (Cicer arietinum L.) seeds

    Directory of Open Access Journals (Sweden)

    Seema Pradhan

    2014-12-01

    Full Text Available Understanding developmental processes, especially in non-model crop plants, is extremely important in order to unravel unique mechanisms regulating development. Chickpea (C. arietinum L.) seeds are especially valued for their high carbohydrate and protein content. Therefore, in order to elucidate the mechanisms underlying seed development in chickpea, deep sequencing of transcriptomes from four developmental stages was undertaken. In this study, a next generation sequencing platform was utilised to sequence the transcriptome of four distinct stages of seed development in chickpea. About 1.3 million reads were generated, which were assembled into 51,099 unigenes by merging the de novo and reference assemblies. Functional annotation of the unigenes was carried out using the Uniprot, COG and KEGG databases. RPKM based digital expression analysis revealed specific gene activities at different stages of development, which were validated using real-time PCR analysis. More than 90% of the unigenes were found to be expressed in at least one of the four seed tissues. DEGseq was used to determine differentially expressed genes, which revealed that only 6.75% of the unigenes were differentially expressed at various stages. Homology based comparison revealed 17.5% of the unigenes to be putatively seed specific. Transcription factors were predicted based on HMM profiles built using TF sequences from five legume plants and analysed for their differential expression during progression of seed development. Expression analysis of genes involved in biosynthesis of important secondary metabolites suggested that chickpea seeds can serve as a good source of antioxidants. Since transcriptomes are a valuable source of molecular markers like simple sequence repeats (SSRs), about 12,000 SSRs were mined in the chickpea seed transcriptome and a few of them were validated. In conclusion, this study will serve as a valuable resource for improved chickpea breeding.

  9. Transcriptome profile of Trichoderma harzianum IOC-3844 induced by sugarcane bagasse.

    Science.gov (United States)

    Horta, Maria Augusta Crivelente; Vicentini, Renato; Delabona, Priscila da Silva; Laborda, Prianda; Crucello, Aline; Freitas, Sindélia; Kuroshu, Reginaldo Massanobu; Polikarpov, Igor; Pradella, José Geraldo da Cruz; Souza, Anete Pereira

    2014-01-01

    Profiling the transcriptome that underlies biomass degradation by the fungus Trichoderma harzianum allows the identification of gene sequences with potential application in enzymatic hydrolysis processing. In the present study, the transcriptome of T. harzianum IOC-3844 was analyzed using RNA-seq technology. The sequencing generated 14.7 Gbp for downstream analyses. De novo assembly resulted in 32,396 contigs, which were submitted for identification and classified according to their identities. This analysis allowed us to define a principal set of T. harzianum genes that are involved in the degradation of cellulose and hemicellulose and the accessory genes that are involved in the depolymerization of biomass. An additional analysis of expression levels identified a set of carbohydrate-active enzymes that are upregulated under different conditions. The present study provides valuable information for future studies on biomass degradation and contributes to a better understanding of the role of the genes that are involved in this process.

  10. SNP discovery in the transcriptome of white Pacific shrimp Litopenaeus vannamei by next generation sequencing.

    Directory of Open Access Journals (Sweden)

    Yang Yu

    Full Text Available The application of next generation sequencing technology has greatly facilitated high throughput single nucleotide polymorphism (SNP) discovery and genotyping in genetic research. In the present study, SNPs were discovered based on two transcriptomes of Litopenaeus vannamei (L. vannamei) generated from the Illumina sequencing platform HiSeq 2000. One transcriptome of L. vannamei was obtained by sequencing RNA from larvae at the mysis stage, and its reference sequence was de novo assembled. The data from another transcriptome were downloaded from NCBI, and the reads of the two transcriptomes were mapped separately to the assembled reference by BWA. SNP calling was performed using SAMtools. A total of 58,717 and 36,277 SNPs with high quality were predicted from the two transcriptomes, respectively. SNP calling was also performed using the reads of the two transcriptomes together, and a total of 96,040 SNPs with high quality were predicted. Among these 96,040 SNPs, 5,242 and 29,129 were predicted as non-synonymous and synonymous SNPs, respectively. Characterization analysis of the predicted SNPs in L. vannamei showed that the estimated SNP frequency was 0.21% (one SNP per 476 bp) and the estimated ratio of transitions to transversions was 2.0. Fifty SNPs were randomly selected for validation by Sanger sequencing after PCR amplification and 76% of SNPs were confirmed, which indicated that the SNPs predicted in this study were reliable. These SNPs will be very useful for genetic study in L. vannamei, especially for high density linkage map construction and genome-wide association studies.
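The two ways the abstract above states SNP density (0.21% and one SNP per 476 bp) are the same quantity expressed differently. A small sketch of the conversion, using a hypothetical helper name:

```python
def snp_frequency_percent(bp_per_snp):
    """Convert 'one SNP per N bp' into a percentage of positions."""
    if bp_per_snp <= 0:
        raise ValueError("bp_per_snp must be positive")
    return 100.0 / bp_per_snp


# Figure from the abstract: one SNP per 476 bp.
print(f"{snp_frequency_percent(476):.2f}%")  # prints 0.21%
```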

  11. Genome Assembly of the Fungus Cochliobolus miyabeanus, and Transcriptome Analysis during Early Stages of Infection on American Wildrice (Zizania palustris L.).

    Directory of Open Access Journals (Sweden)

    Claudia V Castell-Miller

    Full Text Available The fungus Cochliobolus miyabeanus causes severe leaf spot disease on rice (Oryza sativa) and two North American specialty crops, American wildrice (Zizania palustris) and switchgrass (Panicum virgatum). Despite the importance of C. miyabeanus as a disease-causing agent in wildrice, little is known about either the mechanisms of pathogenicity or host defense responses. To start bridging these gaps, the genome of C. miyabeanus strain TG12bL2 was shotgun sequenced using Illumina technology. The genome assembly consists of 31.79 Mbp in 2,378 scaffolds with an N50 = 74,921. It contains 11,000 predicted genes of which 94.5% were annotated. Approximately 10% of the predicted gene products are expected to be secreted. The C. miyabeanus genome is rich in carbohydrate active enzymes, and harbors 187 small secreted peptides (SSPs) and some fungal effector homologs. Detoxification systems were represented by a variety of enzymes that could offer protection against plant defense compounds. The non-ribosomal peptide synthetases and polyketide synthases (PKS) present were common to other Cochliobolus species. Additionally, the fungal transcriptome was analyzed at 48 hours after inoculation in planta. A total of 10,674 genes were found to be expressed, some of which are known to be involved in pathogenicity or response to host defenses including hydrophobins, cutinase, cell wall degrading enzymes, enzymes related to reactive oxygen species scavenging, PKS, detoxification systems, SSPs, and a known fungal effector. This work will facilitate future research on C. miyabeanus pathogen-associated molecular patterns and effectors, and in the identification of their corresponding wildrice defense mechanisms.

  12. Transcriptomic analysis of the oleaginous microalga Neochloris oleoabundans reveals metabolic insights into triacylglyceride accumulation

    Directory of Open Access Journals (Sweden)

    Rismani-Yazdi Hamid

    2012-09-01

    Full Text Available Background: The lack of sequenced genomes for oleaginous microalgae limits our understanding of the mechanisms these organisms utilize to become enriched in triglycerides. Here we report the de novo transcriptome assembly and quantitative gene expression analysis of the oleaginous microalga Neochloris oleoabundans, with a focus on the complex interaction of pathways associated with the production of the triacylglycerol (TAG) biofuel precursor. Results: After growth under nitrogen replete and nitrogen limiting conditions, we quantified the cellular content of major biomolecules including total lipids, triacylglycerides, starch, protein, and chlorophyll. Transcribed genes were sequenced, the transcriptome was assembled de novo, and the expression of major functional categories, relevant pathways, and important genes was quantified through the mapping of reads to the transcriptome. Over 87 million, 77 base pair high quality reads were produced on the Illumina HiSeq sequencing platform. Metabolite measurements supported by genes and pathway expression results indicated that under the nitrogen-limiting condition, carbon is partitioned toward triglyceride production, which increased fivefold over the nitrogen-replete control. In addition to the observed overexpression of the fatty acid synthesis pathway, TAG production during nitrogen limitation was bolstered by repression of the β-oxidation pathway, up-regulation of genes encoding for the pyruvate dehydrogenase complex which funnels acetyl-CoA to lipid biosynthesis, activation of the pentose phosphate pathway to supply reducing equivalents to inorganic nitrogen assimilation and fatty acid biosynthesis, and the up-regulation of lipases, presumably to reconstruct cell membranes in order to supply additional fatty acids for TAG biosynthesis. Conclusions: Our quantitative transcriptome study reveals a broad overview of how nitrogen stress results in excess TAG production in N. oleoabundans, and

  13. De Novo Assembly of Candida sojae and Candida boidinii Genomes, Unexplored Xylose-Consuming Yeasts with Potential for Renewable Biochemical Production

    Science.gov (United States)

    Borelli, Guilherme; José, Juliana; Teixeira, Paulo José Pereira Lima; dos Santos, Leandro Vieira

    2016-01-01

    Candida boidinii and Candida sojae yeasts were isolated from energy cane bagasse and plague-insects. Both have fast xylose uptake rate and produce great amounts of xylitol, which are interesting features for food and 2G ethanol industries. Because they lack published genomes, we have sequenced and assembled them, offering new possibilities for gene prospection. PMID:26769937

  14. Peroxidase gene discovery from the horseradish transcriptome.

    Science.gov (United States)

    Näätsaari, Laura; Krainer, Florian W; Schubert, Michael; Glieder, Anton; Thallinger, Gerhard G

    2014-03-24

    Horseradish peroxidases (HRPs) from Armoracia rusticana have long been utilized as reporters in various diagnostic assays and histochemical stainings. Regardless of their increasing importance in the field of life sciences and suggested uses in medical applications, chemical synthesis and other industrial applications, the HRP isoenzymes, their substrate specificities and enzymatic properties are poorly characterized. Due to lacking sequence information of natural isoenzymes and the low levels of HRP expression in heterologous hosts, commercially available HRP is still extracted as a mixture of isoenzymes from the roots of A. rusticana. In this study, a normalized, size-selected A. rusticana transcriptome library was sequenced using 454 Titanium technology. The resulting reads were assembled into 14871 isotigs with an average length of 1133 bp. Sequence databases, ORF finding and ORF characterization were utilized to identify peroxidase genes from the 14871 isotigs generated by de novo assembly. The sequences were manually reviewed and verified with Sanger sequencing of PCR amplified genomic fragments, resulting in the discovery of 28 secretory peroxidases, 23 of them previously unknown. A total of 22 isoenzymes including allelic variants were successfully expressed in Pichia pastoris and showed peroxidase activity with at least one of the substrates tested, thus enabling their development into commercial pure isoenzymes. This study demonstrates that transcriptome sequencing combined with sequence motif search is a powerful concept for the discovery and quick supply of new enzymes and isoenzymes from any plant or other eukaryotic organisms. Identification and manual verification of the sequences of 28 HRP isoenzymes do not only contribute a set of peroxidases for industrial, biological and biomedical applications, but also provide valuable information on the reliability of the approach in identifying and characterizing a large group of isoenzymes.

  15. Transcriptome Analysis of Drought-Tolerant CAM plants Agave deserti and Agave tequilana

    Energy Technology Data Exchange (ETDEWEB)

    Gross, Stephen M.; Martin, Jeffrey A.; Simpson, June; Wang, Zhong; Visel, Axel

    2013-03-25

    Agaves are succulent monocotyledonous plants native to hot and arid environments of North America. Because of their adaptations to their environment, including crassulacean acid metabolism (CAM, a water-efficient form of photosynthesis) and existing technologies for ethanol production, agaves have gained attention both as potential lignocellulosic bioenergy feedstocks and models for exploring plant responses to abiotic stress. However, the lack of comprehensive Agave sequence datasets limits the scope of investigations into the molecular-genetic basis of Agave traits. Here, we present comprehensive, high quality de novo transcriptome assemblies of two Agave species, A. tequilana and A. deserti, from short-read RNA-seq data. Our analyses support completeness and accuracy of the de novo transcriptome assemblies, with each species having approximately 35,000 protein-coding genes. Comparison of agave proteomes to those of additional plant species identifies biological functions of gene families displaying sequence divergence in agave species. Additionally, we use RNA-seq data to gain insights into biological functions along the A. deserti juvenile leaf proximal-distal axis. Our work presents a foundation for further investigation of agave biology and their improvement for bioenergy development.

  16. Transcriptome datasets of oil palm pathogen Ganoderma boninense

    Directory of Open Access Journals (Sweden)

    Irene Liza Isaac

    2018-04-01

    Ganoderma boninense is known to be the causal agent of basal stem rot (BSR), which affects the oil palm industry worldwide and causes high economic losses every year. Several reports have shown that a compatible monokaryon pair needs to mate, producing dikaryotic mycelia, to initiate infection of the oil palm. However, the molecular events that occur during the mating process are not well understood. We performed transcriptome sequencing using Illumina RNA-seq technology and de novo assembly of the transcripts from monokaryon, mating junction and dikaryon mycelia of G. boninense. Raw reads from these three libraries were deposited in the NCBI database with accession numbers SRR1745787, SRR1745773 and SRR1745777, respectively.

  17. Comparative analysis of transcriptomes in aerial stems and roots of Ephedra sinica based on high-throughput mRNA sequencing

    Directory of Open Access Journals (Sweden)

    Taketo Okada

    2016-12-01

    Ephedra plants are taxonomically classified as gymnosperms, and are medicinally important as the botanical origin of crude drugs and as bioresources that contain pharmacologically active chemicals. Here we show a comparative analysis of the transcriptomes of aerial stems and roots of Ephedra sinica based on high-throughput mRNA sequencing by RNA-Seq. De novo assembly of short cDNA sequence reads generated 23,358, 13,373, and 28,579 contigs longer than 200 bases from aerial stems, roots, or both aerial stems and roots, respectively. The presumed functions encoded by these contig sequences were annotated by BLAST (blastx). Subsequently, these contigs were classified based on gene ontology slims, Enzyme Commission numbers, and the InterPro database. Furthermore, comparative gene expression analysis was performed between aerial stems and roots. These transcriptome analyses revealed differences and similarities between the transcriptomes of aerial stems and roots in E. sinica. Deep transcriptome sequencing of Ephedra should open the door to molecular biological studies based on the entire transcriptome, tissue- or organ-specific transcriptomes, or targeted genes of interest.

  18. The testis and ovary transcriptomes of the rock bream (Oplegnathus fasciatus): A bony fish with a unique neo Y chromosome

    Directory of Open Access Journals (Sweden)

    Dongdong Xu

    2016-03-01

    The rock bream (Oplegnathus fasciatus) is considered one of the most economically important marine fish in East Asia and has a unique neo-Y chromosome system that makes it a good model for studying sex determination and differentiation in fish. In the present study, we used Illumina sequencing technology (HiSeq2000) to sequence, assemble and annotate the transcriptome of the testis and ovary tissues of rock bream. A total of 40,004,378 (NCBI SRA database SRX1406649) and 53,108,992 (NCBI SRA database SRX1406648) high-quality reads were obtained from testis and ovary RNA sequencing, respectively, and 60,421 contigs (with an average length of 1301 bp) were obtained after de novo assembly with Trinity software. Digital gene expression analysis reveals 14,036 contigs that show a gender-enriched expression profile, either testis-enriched (237 contigs) or ovary-enriched (581 contigs) with RPKM >100. There are 237 male- and 582 female-abundant expressed genes that show sex dimorphic expression. We hope that the gonad transcriptome and these gender-enriched transcripts of rock bream can provide some insight into the genome-wide transcriptome profile of teleost gonad tissue and give useful information on fish gonad development. Keywords: Gonad transcriptome, Testis, Ovary, Rock bream
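
The RPKM >100 cut-off mentioned in the abstract above can be illustrated with a minimal sketch. The abstract does not say how its values were computed, so this assumes the conventional definition of RPKM (reads per kilobase of transcript per million mapped reads); the function names and read counts are hypothetical.

```python
def rpkm(mapped_reads: int, length_bp: int, total_mapped_reads: int) -> float:
    """Reads per kilobase of transcript per million mapped reads.

    Conventional definition (an assumption here, not taken from the abstract):
        RPKM = reads * 1e9 / (transcript_length_bp * total_mapped_reads)
    """
    if length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("length_bp and total_mapped_reads must be positive")
    return mapped_reads * 1e9 / (length_bp * total_mapped_reads)


def is_highly_expressed(value: float, threshold: float = 100.0) -> bool:
    """Apply a strict 'greater than' cut-off such as RPKM > 100."""
    return value > threshold


# Hypothetical contig: 500 reads on a 2,000 bp contig in a 10-million-read library.
example = rpkm(500, 2000, 10_000_000)
print(example, is_highly_expressed(example))
```

Normalizing by both contig length and library size is what lets a single threshold be applied across contigs of different lengths and across the testis and ovary libraries, which differ in read depth.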

  19. Detailed transcriptome description of the neglected cestode Taenia multiceps.

    Science.gov (United States)

    Wu, Xuhang; Fu, Yan; Yang, Deying; Zhang, Runhui; Zheng, Wanpeng; Nie, Huaming; Xie, Yue; Yan, Ning; Hao, Guiying; Gu, Xiaobin; Wang, Shuxian; Peng, Xuerong; Yang, Guangyou

    2012-01-01

    The larval stage of Taenia multiceps, a global cestode, encysts in the central nervous system (CNS) of sheep and other livestock. This frequently leads to their death and huge socioeconomic losses, especially in developing countries. This parasite can also cause zoonotic infections in humans, but has been largely neglected due to a lack of diagnostic techniques and studies. Recent developments in next-generation sequencing provide an opportunity to explore the transcriptome of T. multiceps. We obtained a total of 31,282 unigenes (mean length 920 bp) using Illumina paired-end sequencing technology and a new Trinity de novo assembler without a referenced genome. Individual transcription molecules were determined by sequence-based annotations and/or domain-based annotations against public databases (Nr, UniprotKB/Swiss-Prot, COG, KEGG, UniProtKB/TrEMBL, InterPro and Pfam). We identified 26,110 (83.47%) unigenes and inferred 20,896 (66.8%) coding sequences (CDS). Further comparative transcripts analysis with other cestodes (Taenia pisiformis, Taenia solium, Echinococcus granulosus and Echinococcus multilocularis) and intestinal parasites (Trichinella spiralis, Ancylostoma caninum and Ascaris suum) showed that 5,100 common genes were shared among three Taenia tapeworms, 261 conserved genes were detected among five Taeniidae cestodes, and 109 common genes were found in four zoonotic intestinal parasites. Some of the common genes were genes required for parasite survival, involved in parasite-host interactions. In addition, we amplified two full-length CDS of unigenes from the common genes using RT-PCR. This study provides an extensive transcriptome of the adult stage of T. multiceps, and demonstrates that comparative transcriptomic investigations deserve to be further studied. This transcriptome dataset forms a substantial public information platform to achieve a fundamental understanding of the biology of T. multiceps, and helps in the identification of drug targets and

  20. Detailed transcriptome description of the neglected cestode Taenia multiceps.

    Directory of Open Access Journals (Sweden)

    Xuhang Wu

    BACKGROUND: The larval stage of Taenia multiceps, a global cestode, encysts in the central nervous system (CNS) of sheep and other livestock. This frequently leads to their death and huge socioeconomic losses, especially in developing countries. This parasite can also cause zoonotic infections in humans, but has been largely neglected due to a lack of diagnostic techniques and studies. Recent developments in next-generation sequencing provide an opportunity to explore the transcriptome of T. multiceps. METHODOLOGY/PRINCIPAL FINDINGS: We obtained a total of 31,282 unigenes (mean length 920 bp) using Illumina paired-end sequencing technology and a new Trinity de novo assembler without a referenced genome. Individual transcription molecules were determined by sequence-based annotations and/or domain-based annotations against public databases (Nr, UniprotKB/Swiss-Prot, COG, KEGG, UniProtKB/TrEMBL, InterPro and Pfam). We identified 26,110 (83.47%) unigenes and inferred 20,896 (66.8%) coding sequences (CDS). Further comparative transcripts analysis with other cestodes (Taenia pisiformis, Taenia solium, Echinococcus granulosus and Echinococcus multilocularis) and intestinal parasites (Trichinella spiralis, Ancylostoma caninum and Ascaris suum) showed that 5,100 common genes were shared among three Taenia tapeworms, 261 conserved genes were detected among five Taeniidae cestodes, and 109 common genes were found in four zoonotic intestinal parasites. Some of the common genes were genes required for parasite survival, involved in parasite-host interactions. In addition, we amplified two full-length CDS of unigenes from the common genes using RT-PCR. CONCLUSIONS/SIGNIFICANCE: This study provides an extensive transcriptome of the adult stage of T. multiceps, and demonstrates that comparative transcriptomic investigations deserve to be further studied. This transcriptome dataset forms a substantial public information platform to achieve a fundamental understanding of

  1. Comparative Transcriptomics to Identify Novel Genes and Pathways in Dinoflagellates

    Science.gov (United States)

    Ryan, D.

    2016-02-01

    The unarmored dinoflagellate Karenia brevis is among the most prominent harmful, bloom-forming phytoplankton species in the Gulf of Mexico. During blooms, the polyketides PbTx-1 and PbTx-2 (brevetoxins) are produced by K. brevis. Brevetoxins negatively impact human health and the Gulf shellfish harvest. However, the genes underlying brevetoxin synthesis are currently unknown. Because the K. brevis genome is extremely large (~1 × 10^11 base pairs long) and has a high proportion of repetitive, non-coding DNA, it has not been sequenced. In fact, large, repetitive genomes are common among the dinoflagellate group. High-throughput RNA sequencing technology enabled us to assemble Karenia transcriptomes de novo and investigate potential genes in the brevetoxin pathway through comparative transcriptomics. The brevetoxin profile varies among K. brevis clonal cultures. For example, well-documented Wilson-CCFWC268 typically produces 8-10 pg PbTx per cell, whereas SP1 produces differences in gene expression. Of the 85,000 transcripts in the K. brevis transcriptome, 4,600 transcripts, including novel unannotated orthologs and putative polyketide synthases (PKSs), were only expressed by brevetoxin-producing K. brevis and K. papilionacea, not K. mikimotoi. Examination of gene expression between the typical- and low-toxin Wilson clones identified about 3,500 genes with significantly different expression levels, including 2 putative PKSs. One of the 2 PKSs was only found in the brevetoxin-producing Karenia species. These transcriptomes could not have been characterized without high-throughput RNA sequencing.

  2. Transcriptome analysis of the response of Burmese python to digestion.

    Science.gov (United States)

    Duan, Jinjie; Sanggaard, Kristian Wejse; Schauser, Leif; Lauridsen, Sanne Enok; Enghild, Jan J; Schierup, Mikkel Heide; Wang, Tobias

    2017-08-01

    Exceptional and extreme feeding behaviour makes the Burmese python (Python bivittatus) an interesting model to study physiological remodelling and metabolic adaptation in response to refeeding after prolonged starvation. In this study, we used transcriptome sequencing of 5 visceral organs during fasting as well as 24 hours and 48 hours after ingestion of a large meal to unravel the postprandial changes in Burmese pythons. We first used the pooled data to perform a de novo assembly of the transcriptome and supplemented this with a proteomic survey of enzymes in the plasma and gastric fluid. We constructed a high-quality transcriptome with 34 423 transcripts, of which 19 713 (57%) were annotated. Among highly expressed genes (fragments per kilo base per million sequenced reads > 100 in 1 tissue), we found that the transition from fasting to digestion was associated with differential expression of 43 genes in the heart, 206 genes in the liver, 114 genes in the stomach, 89 genes in the pancreas, and 158 genes in the intestine. We interrogated the function of these genes to test previous hypotheses on the response to feeding. We also used the transcriptome to identify 314 secreted proteins in the gastric fluid of the python. Digestion was associated with an upregulation of genes related to metabolic processes, and translational changes therefore appear to support the postprandial rise in metabolism. We identify stomach-related proteins from a digesting individual and demonstrate that the sensitivity of modern liquid chromatography/tandem mass spectrometry equipment allows the identification of gastric juice proteins that are present during digestion. © The Authors 2017. Published by Oxford University Press.

  3. Transcriptome response to copper heavy metal stress in hard-shelled mussel (Mytilus coruscus)

    Directory of Open Access Journals (Sweden)

    Meiying Xu

    2016-03-01

    The hard-shelled mussel (Mytilus coruscus) is considered one of the most economically important marine shellfish worldwide and has long been regarded as a good invertebrate model for ecotoxicity studies. In the present study, we used Illumina sequencing technology (HiSeq2000) to sequence, assemble and annotate the transcriptome of hard-shelled mussels challenged with copper pollution. A total of 21,723,913 paired-end clean reads (NCBI SRA database SRX1411195) were generated on the HiSeq2000 sequencer and 96,403 contigs (with N50 = 1118 bp) were obtained after de novo assembly with Trinity software. Digital gene expression analysis reveals that 1156 unigenes are upregulated and 1681 unigenes are downregulated when challenged with copper. By KEGG pathway enrichment analysis, we found that unigenes in four KEGG pathways (aminoacyl-tRNA biosynthesis, apoptosis, DNA replication and mismatch repair) show significantly differential expression between control and copper-treated groups. We hope that the gill transcriptome of copper-treated hard-shelled mussels can give useful information for understanding how mussels cope with heavy metal stress at the molecular level. Keywords: Hard-shelled mussel, Heavy metal, Transcriptome, Ecotoxicity
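
The N50 statistic quoted in the abstract above (N50 = 1118 bp) can be sketched as follows. This uses the standard definition: the length of the contig at which the running total, taken over contigs sorted longest first, reaches half of the total assembly length. The function name and the toy contig lengths are hypothetical, and the abstract does not state which tool produced its figure.

```python
from typing import Iterable


def n50(contig_lengths: Iterable[int]) -> int:
    """Return the N50 of an assembly.

    Sort contigs from longest to shortest and accumulate their lengths;
    N50 is the length of the contig at which the running total first
    reaches at least half of the total assembly length.
    """
    ordered = sorted(contig_lengths, reverse=True)
    if not ordered:
        raise ValueError("at least one contig length is required")
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise AssertionError("unreachable for non-empty input")


# Toy assembly (hypothetical lengths in bp): total 32, half 16.
print(n50([2, 2, 2, 3, 3, 4, 8, 8]))
```

Unlike the mean contig length, N50 is weighted toward long contigs, so a large number of short fragments lowers it less than it would lower the average.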

  4. Transcriptome Sequencing of Chemically Induced Aquilaria sinensis to Identify Genes Related to Agarwood Formation.

    Science.gov (United States)

    Ye, Wei; Wu, Hongqing; He, Xin; Wang, Lei; Zhang, Weimin; Li, Haohua; Fan, Yunfei; Tan, Guohui; Liu, Taomei; Gao, Xiaoxia

    2016-01-01

    Agarwood is a traditional Chinese medicine used as a clinical sedative, carminative, and antiemetic drug. Agarwood is formed in Aquilaria sinensis when A. sinensis trees are threatened by external physical, chemical injury or endophytic fungal irritation. However, the mechanism of agarwood formation via chemical induction remains unclear. In this study, we characterized the transcriptome of different parts of a chemically induced A. sinensis trunk sample with agarwood. The Illumina sequencing platform was used to identify the genes involved in agarwood formation. A five-year-old Aquilaria sinensis treated by formic acid was selected. The white wood part (B1 sample), the transition part between agarwood and white wood (W2 sample), the agarwood part (J3 sample), and the rotten wood part (F5 sample) were collected for transcriptome sequencing. Accordingly, 54,685,634 clean reads, which were assembled into 83,467 unigenes, were obtained with a Q20 value of 97.5%. A total of 50,565 unigenes were annotated using the Nr, Nt, SWISS-PROT, KEGG, COG, and GO databases. In particular, 171,331,352 unigenes were annotated by various p