WorldWideScience

Sample records for parallel transcriptome sequencing

  1. Massively parallel sequencing and analysis of the Necator americanus transcriptome.

    Directory of Open Access Journals (Sweden)

    Cinzia Cantacessi

    2010-05-01

    Full Text Available The blood-feeding hookworm Necator americanus infects hundreds of millions of people worldwide. In order to elucidate fundamental molecular biological aspects of this hookworm, the transcriptome of the adult stage of Necator americanus was explored using next-generation sequencing and bioinformatic analyses.A total of 19,997 contigs were assembled from the sequence data; 6,771 of these contigs had known orthologues in the free-living nematode Caenorhabditis elegans, and most of them encoded proteins with WD40 repeats (10.6%, proteinase inhibitors (7.8% or calcium-binding EF-hand proteins (6.7%. Bioinformatic analyses inferred that the C. elegans homologues are involved mainly in biological pathways linked to ribosome biogenesis (70%, oxidative phosphorylation (63% and/or proteases (60%; most of these molecules were predicted to be involved in more than one biological pathway. Comparative analyses of the transcriptomes of N. americanus and the canine hookworm, Ancylostoma caninum, revealed qualitative and quantitative differences. For instance, proteinase inhibitors were inferred to be highly represented in the former species, whereas SCP/Tpx-1/Ag5/PR-1/Sc7 proteins ( = SCP/TAPS or Ancylostoma-secreted proteins were predominant in the latter. In N. americanus, essential molecules were predicted using a combination of orthology mapping and functional data available for C. elegans. Further analyses allowed the prioritization of 18 predicted drug targets which did not have homologues in the human host. These candidate targets were inferred to be linked to mitochondrial (e.g., processing proteins or amino acid metabolism (e.g., asparagine t-RNA synthetase.This study has provided detailed insights into the transcriptome of the adult stage of N. americanus and examines similarities and differences between this species and A. caninum. Future efforts should focus on comparative transcriptomic and proteomic investigations of the other predominant human

  2. Survey of the transcriptome of Aspergillus oryzae via massively parallel mRNA sequencing

    OpenAIRE

    Wang, Bin; Guo, Guangwu; Wang, Chao; Lin, Ying; Wang, Xiaoning; Zhao, Mouming; Guo, Yong; He, Minghui; Zhang, Yong; Pan, Li

    2010-01-01

    Aspergillus oryzae, an important filamentous fungus used in food fermentation and the enzyme industry, has been shown through genome sequencing and various other tools to have prominent features in its genomic composition. However, the functional complexity of the A. oryzae transcriptome has not yet been fully elucidated. Here, we applied direct high-throughput paired-end RNA-sequencing (RNA-Seq) to the transcriptome of A. oryzae under four different culture conditions. With the high resoluti...

  3. Survey of the transcriptome of Aspergillus oryzae via massively parallel mRNA sequencing.

    Science.gov (United States)

    Wang, Bin; Guo, Guangwu; Wang, Chao; Lin, Ying; Wang, Xiaoning; Zhao, Mouming; Guo, Yong; He, Minghui; Zhang, Yong; Pan, Li

    2010-08-01

    Aspergillus oryzae, an important filamentous fungus used in food fermentation and the enzyme industry, has been shown through genome sequencing and various other tools to have prominent features in its genomic composition. However, the functional complexity of the A. oryzae transcriptome has not yet been fully elucidated. Here, we applied direct high-throughput paired-end RNA-sequencing (RNA-Seq) to the transcriptome of A. oryzae under four different culture conditions. With the high resolution and sensitivity afforded by RNA-Seq, we were able to identify a substantial number of novel transcripts, new exons, untranslated regions, alternative upstream initiation codons and upstream open reading frames, which provide remarkable insight into the A. oryzae transcriptome. We were also able to assess the alternative mRNA isoforms in A. oryzae and found a large number of genes undergoing alternative splicing. Many genes and pathways that might be involved in higher levels of protein production in solid-state culture than in liquid culture were identified by comparing gene expression levels between different cultures. Our analysis indicated that the transcriptome of A. oryzae is much more complex than previously anticipated, and these results may provide a blueprint for further study of the A. oryzae transcriptome.

  4. De novo assembly, characterization and functional annotation of pineapple fruit transcriptome through massively parallel sequencing.

    Science.gov (United States)

    Ong, Wen Dee; Voo, Lok-Yung Christopher; Kumar, Vijay Subbiah

    2012-01-01

    Pineapple (Ananas comosus var. comosus), is an important tropical non-climacteric fruit with high commercial potential. Understanding the mechanism and processes underlying fruit ripening would enable scientists to enhance the improvement of quality traits such as, flavor, texture, appearance and fruit sweetness. Although, the pineapple is an important fruit, there is insufficient transcriptomic or genomic information that is available in public databases. Application of high throughput transcriptome sequencing to profile the pineapple fruit transcripts is therefore needed. To facilitate this, we have performed transcriptome sequencing of ripe yellow pineapple fruit flesh using Illumina technology. About 4.7 millions Illumina paired-end reads were generated and assembled using the Velvet de novo assembler. The assembly produced 28,728 unique transcripts with a mean length of approximately 200 bp. Sequence similarity search against non-redundant NCBI database identified a total of 16,932 unique transcripts (58.93%) with significant hits. Out of these, 15,507 unique transcripts were assigned to gene ontology terms. Functional annotation against Kyoto Encyclopedia of Genes and Genomes pathway database identified 13,598 unique transcripts (47.33%) which were mapped to 126 pathways. The assembly revealed many transcripts that were previously unknown. The unique transcripts derived from this work have rapidly increased of the number of the pineapple fruit mRNA transcripts as it is now available in public databases. This information can be further utilized in gene expression, genomics and other functional genomics studies in pineapple.

  5. Nuclear RNA sequencing of the mouse erythroid cell transcriptome.

    Directory of Open Access Journals (Sweden)

    Jennifer A Mitchell

    Full Text Available In addition to protein coding genes a substantial proportion of mammalian genomes are transcribed. However, most transcriptome studies investigate steady-state mRNA levels, ignoring a considerable fraction of the transcribed genome. In addition, steady-state mRNA levels are influenced by both transcriptional and posttranscriptional mechanisms, and thus do not provide a clear picture of transcriptional output. Here, using deep sequencing of nuclear RNAs (nucRNA-Seq in parallel with chromatin immunoprecipitation sequencing (ChIP-Seq of active RNA polymerase II, we compared the nuclear transcriptome of mouse anemic spleen erythroid cells with polymerase occupancy on a genome-wide scale. We demonstrate that unspliced transcripts quantified by nucRNA-seq correlate with primary transcript frequencies measured by RNA FISH, but differ from steady-state mRNA levels measured by poly(A-enriched RNA-seq. Highly expressed protein coding genes showed good correlation between RNAPII occupancy and transcriptional output; however, genome-wide we observed a poor correlation between transcriptional output and RNAPII association. This poor correlation is due to intergenic regions associated with RNAPII which correspond with transcription factor bound regulatory regions and a group of stable, nuclear-retained long non-coding transcripts. In conclusion, sequencing the nuclear transcriptome provides an opportunity to investigate the transcriptional landscape in a given cell type through quantification of unspliced primary transcripts and the identification of nuclear-retained long non-coding RNAs.

  6. Massively parallel sequencing of forensic STRs

    DEFF Research Database (Denmark)

    Parson, Walther; Ballard, David; Budowle, Bruce

    2016-01-01

    The DNA Commission of the International Society for Forensic Genetics (ISFG) is reviewing factors that need to be considered ahead of the adoption by the forensic community of short tandem repeat (STR) genotyping by massively parallel sequencing (MPS) technologies. MPS produces sequence data that...

  7. Sequencing and analysis of the Mediterranean amphioxus (Branchiostoma lanceolatum transcriptome.

    Directory of Open Access Journals (Sweden)

    Silvan Oulion

    Full Text Available BACKGROUND: The basally divergent phylogenetic position of amphioxus (Cephalochordata, as well as its conserved morphology, development and genetics, make it the best proxy for the chordate ancestor. Particularly, studies using the amphioxus model help our understanding of vertebrate evolution and development. Thus, interest for the amphioxus model led to the characterization of both the transcriptome and complete genome sequence of the American species, Branchiostoma floridae. However, recent technical improvements allowing induction of spawning in the laboratory during the breeding season on a daily basis with the Mediterranean species Branchiostoma lanceolatum have encouraged European Evo-Devo researchers to adopt this species as a model even though no genomic or transcriptomic data have been available. To fill this need we used the pyrosequencing method to characterize the B. lanceolatum transcriptome and then compared our results with the published transcriptome of B. floridae. RESULTS: Starting with total RNA from nine different developmental stages of B. lanceolatum, a normalized cDNA library was constructed and sequenced on Roche GS FLX (Titanium mode. Around 1.4 million of reads were produced and assembled into 70,530 contigs (average length of 490 bp. Overall 37% of the assembled sequences were annotated by BlastX and their Gene Ontology terms were determined. These results were then compared to genomic and transcriptomic data of B. floridae to assess similarities and specificities of each species. CONCLUSION: We obtained a high-quality amphioxus (B. lanceolatum reference transcriptome using a high throughput sequencing approach. We found that 83% of the predicted genes in the B. floridae complete genome sequence are also found in the B. lanceolatum transcriptome, while only 41% were found in the B. floridae transcriptome obtained with traditional Sanger based sequencing. Therefore, given the high degree of sequence conservation

  8. Transcriptome sequences resolve deep relationships of the grape family.

    Science.gov (United States)

    Wen, Jun; Xiong, Zhiqiang; Nie, Ze-Long; Mao, Likai; Zhu, Yabing; Kan, Xian-Zhao; Ickert-Bond, Stefanie M; Gerrath, Jean; Zimmer, Elizabeth A; Fang, Xiao-Dong

    2013-01-01

    Previous phylogenetic studies of the grape family (Vitaceae) yielded poorly resolved deep relationships, thus impeding our understanding of the evolution of the family. Next-generation sequencing now offers access to protein coding sequences very easily, quickly and cost-effectively. To improve upon earlier work, we extracted 417 orthologous single-copy nuclear genes from the transcriptomes of 15 species of the Vitaceae, covering its phylogenetic diversity. The resulting transcriptome phylogeny provides robust support for the deep relationships, showing the phylogenetic utility of transcriptome data for plants over a time scale at least since the mid-Cretaceous. The pros and cons of transcriptome data for phylogenetic inference in plants are also evaluated.

  9. Transcriptome sequences resolve deep relationships of the grape family.

    Directory of Open Access Journals (Sweden)

    Jun Wen

    Full Text Available Previous phylogenetic studies of the grape family (Vitaceae yielded poorly resolved deep relationships, thus impeding our understanding of the evolution of the family. Next-generation sequencing now offers access to protein coding sequences very easily, quickly and cost-effectively. To improve upon earlier work, we extracted 417 orthologous single-copy nuclear genes from the transcriptomes of 15 species of the Vitaceae, covering its phylogenetic diversity. The resulting transcriptome phylogeny provides robust support for the deep relationships, showing the phylogenetic utility of transcriptome data for plants over a time scale at least since the mid-Cretaceous. The pros and cons of transcriptome data for phylogenetic inference in plants are also evaluated.

  10. Multiplexed microsatellite recovery using massively parallel sequencing

    Science.gov (United States)

    Jennings, T.N.; Knaus, B.J.; Mullins, T.D.; Haig, S.M.; Cronn, R.C.

    2011-01-01

    Conservation and management of natural populations requires accurate and inexpensive genotyping methods. Traditional microsatellite, or simple sequence repeat (SSR), marker analysis remains a popular genotyping method because of the comparatively low cost of marker development, ease of analysis and high power of genotype discrimination. With the availability of massively parallel sequencing (MPS), it is now possible to sequence microsatellite-enriched genomic libraries in multiplex pools. To test this approach, we prepared seven microsatellite-enriched, barcoded genomic libraries from diverse taxa (two conifer trees, five birds) and sequenced these on one lane of the Illumina Genome Analyzer using paired-end 80-bp reads. In this experiment, we screened 6.1 million sequences and identified 356958 unique microreads that contained di- or trinucleotide microsatellites. Examination of four species shows that our conversion rate from raw sequences to polymorphic markers compares favourably to Sanger- and 454-based methods. The advantage of multiplexed MPS is that the staggering capacity of modern microread sequencing is spread across many libraries; this reduces sample preparation and sequencing costs to less than $400 (USD) per species. This price is sufficiently low that microsatellite libraries could be prepared and sequenced for all 1373 organisms listed as 'threatened' and 'endangered' in the United States for under $0.5M (USD).

  11. Sequencing and characterization of the guppy (Poecilia reticulata transcriptome

    Directory of Open Access Journals (Sweden)

    Rodd F Helen

    2011-04-01

    Full Text Available Abstract Background Next-generation sequencing is providing researchers with a relatively fast and affordable option for developing genomic resources for organisms that are not among the traditional genetic models. Here we present a de novo assembly of the guppy (Poecilia reticulata transcriptome using 454 sequence reads, and we evaluate potential uses of this transcriptome, including detection of sex-specific transcripts and deployment as a reference for gene expression analysis in guppies and a related species. Guppies have been model organisms in ecology, evolutionary biology, and animal behaviour for over 100 years. An annotated transcriptome and other genomic tools will facilitate understanding the genetic and molecular bases of adaptation and variation in a vertebrate species with a uniquely well known natural history. Results We generated approximately 336 Mbp of mRNA sequence data from male brain, male body, female brain, and female body. The resulting 1,162,670 reads assembled into 54,921 contigs, creating a reference transcriptome for the guppy with an average read depth of 28×. We annotated nearly 40% of this reference transcriptome by searching protein and gene ontology databases. Using this annotated transcriptome database, we identified candidate genes of interest to the guppy research community, putative single nucleotide polymorphisms (SNPs, and male-specific expressed genes. We also showed that our reference transcriptome can be used for RNA-sequencing-based analysis of differential gene expression. We identified transcripts that, in juveniles, are regulated differently in the presence and absence of an important predator, Rivulus hartii, including two genes implicated in stress response. For each sample in the RNA-seq study, >50% of high-quality reads mapped to unique sequences in the reference database with high confidence. In addition, we evaluated the use of the guppy reference transcriptome for gene expression analyses in

  12. Characterization of the Zoarces viviparus liver transcriptome using massively parallel pyrosequencing

    Directory of Open Access Journals (Sweden)

    Asker Noomi

    2009-07-01

    Full Text Available Abstract Background The teleost Zoarces viviparus (eelpout lives along the coasts of Northern Europe and has long been an established model organism for marine ecology and environmental monitoring. The scarce information about this species genome has however restrained the use of efficient molecular-level assays, such as gene expression microarrays. Results In the present study we present the first comprehensive characterization of the Zoarces viviparus liver transcriptome. From 400,000 reads generated by massively parallel pyrosequencing, more than 50,000 pieces of putative transcripts were assembled, annotated and functionally classified. The data was estimated to cover roughly 40% of the total transcriptome and homologues for about half of the genes of Gasterosteus aculeatus (stickleback were identified. The sequence data was consequently used to design an oligonucleotide microarray for large-scale gene expression analysis. Conclusion Our results show that one run using a Genome Sequencer FLX from 454 Life Science/Roche generates enough genomic information for adequate de novo assembly of a large number of genes in a higher vertebrate. The generated sequence data, including the validated microarray probes, are publicly available to promote genome-wide research in Zoarces viviparus.

  13. Illumina-based de novo transcriptome sequencing and analysis

    Indian Academy of Sciences (India)

    In the present study, we used Illumina HiSeq technology to perform de novo assembly of heart and musk gland transcriptomes from the Chinese forest musk deer. A total of 239,383 transcripts and 176,450 unigenes were obtained, of which 37,329 unigenes were matched to known sequences in the NCBI nonredundant ...

  14. Comparison of next generation sequencing technologies for transcriptome characterization

    Directory of Open Access Journals (Sweden)

    Soltis Douglas E

    2009-08-01

    Full Text Available Abstract Background We have developed a simulation approach to help determine the optimal mixture of sequencing methods for most complete and cost effective transcriptome sequencing. We compared simulation results for traditional capillary sequencing with "Next Generation" (NG ultra high-throughput technologies. The simulation model was parameterized using mappings of 130,000 cDNA sequence reads to the Arabidopsis genome (NCBI Accession SRA008180.19. We also generated 454-GS20 sequences and de novo assemblies for the basal eudicot California poppy (Eschscholzia californica and the magnoliid avocado (Persea americana using a variety of methods for cDNA synthesis. Results The Arabidopsis reads tagged more than 15,000 genes, including new splice variants and extended UTR regions. Of the total 134,791 reads (13.8 MB, 119,518 (88.7% mapped exactly to known exons, while 1,117 (0.8% mapped to introns, 11,524 (8.6% spanned annotated intron/exon boundaries, and 3,066 (2.3% extended beyond the end of annotated UTRs. Sequence-based inference of relative gene expression levels correlated significantly with microarray data. As expected, NG sequencing of normalized libraries tagged more genes than non-normalized libraries, although non-normalized libraries yielded more full-length cDNA sequences. The Arabidopsis data were used to simulate additional rounds of NG and traditional EST sequencing, and various combinations of each. Our simulations suggest a combination of FLX and Solexa sequencing for optimal transcriptome coverage at modest cost. We have also developed ESTcalc http://fgp.huck.psu.edu/NG_Sims/ngsim.pl, an online webtool, which allows users to explore the results of this study by specifying individualized costs and sequencing characteristics. Conclusion NG sequencing technologies are a highly flexible set of platforms that can be scaled to suit different project goals. In terms of sequence coverage alone, the NG sequencing is a dramatic advance

  15. Bacillus anthracis genome organization in light of whole transcriptome sequencing

    Energy Technology Data Exchange (ETDEWEB)

    Martin, Jeffrey; Zhu, Wenhan; Passalacqua, Karla D.; Bergman, Nicholas; Borodovsky, Mark

    2010-03-22

    Emerging knowledge of whole prokaryotic transcriptomes could validate a number of theoretical concepts introduced in the early days of genomics. What are the rules connecting gene expression levels with sequence determinants such as quantitative scores of promoters and terminators? Are translation efficiency measures, e.g. codon adaptation index and RBS score related to gene expression? We used the whole transcriptome shotgun sequencing of a bacterial pathogen Bacillus anthracis to assess correlation of gene expression level with promoter, terminator and RBS scores, codon adaptation index, as well as with a new measure of gene translational efficiency, average translation speed. We compared computational predictions of operon topologies with the transcript borders inferred from RNA-Seq reads. Transcriptome mapping may also improve existing gene annotation. Upon assessment of accuracy of current annotation of protein-coding genes in the B. anthracis genome we have shown that the transcriptome data indicate existence of more than a hundred genes missing in the annotation though predicted by an ab initio gene finder. Interestingly, we observed that many pseudogenes possess not only a sequence with detectable coding potential but also promoters that maintain transcriptional activity.

  16. Transcriptome sequencing and comparative transcriptome analysis of the scleroglucan producer Sclerotium rolfsii

    Directory of Open Access Journals (Sweden)

    Stahl Ulf

    2010-05-01

    Full Text Available Abstract Background The plant pathogenic basidiomycete Sclerotium rolfsii produces the industrially exploited exopolysaccharide scleroglucan, a polymer that consists of (1 → 3-β-linked glucose with a (1 → 6-β-glycosyl branch on every third unit. Although the physicochemical properties of scleroglucan are well understood, almost nothing is known about the genetics of scleroglucan biosynthesis. Similarly, the biosynthetic pathway of oxalate, the main by-product during scleroglucan production, has not been elucidated yet. In order to provide a basis for genetic and metabolic engineering approaches, we studied scleroglucan and oxalate biosynthesis in S. rolfsii using different transcriptomic approaches. Results Two S. rolfsii transcriptomes obtained from scleroglucan-producing and scleroglucan-nonproducing conditions were pooled and sequenced using the 454 pyrosequencing technique yielding ~350,000 reads. These could be assembled into 21,937 contigs and 171,833 singletons, for which 6,951 had significant matches in public protein data bases. Sequence data were used to obtain first insights into the genomics of scleroglucan and oxalate production and to predict putative proteins involved in the synthesis of both metabolites. Using comparative transcriptomics, namely Agilent microarray hybridization and suppression subtractive hybridization, we identified ~800 unigenes which are differently expressed under scleroglucan-producing and non-producing conditions. From these, candidate genes were identified which could represent potential leads for targeted modification of the S. rolfsii metabolism for increased scleroglucan yields. Conclusions The results presented in this paper provide for the first time genomic and transcriptomic data about S. rolfsii and demonstrate the power and usefulness of combined transcriptome sequencing and comparative microarray analysis. The data obtained allowed us to predict the biosynthetic pathways of scleroglucan and

  17. Utility of RNA Sequencing for Analysis of Maize Reproductive Transcriptomes

    Directory of Open Access Journals (Sweden)

    Rebecca M. Davidson

    2011-11-01

    Full Text Available Transcriptome sequencing is a powerful method for studying global expression patterns in large, complex genomes. Evaluation of sequence-based expression profiles during reproductive development would provide functional annotation to genes underlying agronomic traits. We generated transcriptome profiles for 12 diverse maize ( L. reproductive tissues representing male, female, developing seed, and leaf tissues using high throughput transcriptome sequencing. Overall, ∼80% of annotated genes were expressed. Comparative analysis between sequence and hybridization-based methods demonstrated the utility of ribonucleic acid sequencing (RNA-seq for expression determination and differentiation of paralagous genes (∼85% of maize genes. Analysis of 4975 gene families across reproductive tissues revealed expression divergence is proportional to family size. In all pairwise comparisons between tissues, 7 (pre- vs. postemergence cobs to 48% (pollen vs. ovule of genes were differentially expressed. Genes with expression restricted to a single tissue within this study were identified with the highest numbers observed in leaves, endosperm, and pollen. Coexpression network analysis identified 17 gene modules with complex and shared expression patterns containing many previously described maize genes. The data and analyses in this study provide valuable tools through improved gene annotation, gene family characterization, and a core set of candidate genes to further characterize maize reproductive development and improve grain yield potential.

  18. Sequencing and De Novo Transcriptome Assembly of Brachypodium sylvaticum (Poaceae

    Directory of Open Access Journals (Sweden)

    Samuel E. Fox

    2013-03-01

    Full Text Available Premise of the study: We report the de novo assembly and characterization of the transcriptomes of Brachypodium sylvaticum (slender false-brome accessions from native populations of Spain and Greece, and an invasive population west of Corvallis, Oregon, USA. Methods and Results: More than 350 million sequence reads from the mRNA libraries prepared from three B. sylvaticum genotypes were assembled into 120,091 (Corvallis, 104,950 (Spain, and 177,682 (Greece transcript contigs. In comparison with the B. distachyon Bd21 reference genome and GenBank protein sequences, we estimate >90% exome coverage for B. sylvaticum. The transcripts were assigned Gene Ontology and InterPro annotations. Brachypodium sylvaticum sequence reads aligned against the Bd21 genome revealed 394,654 single-nucleotide polymorphisms (SNPs and >20,000 simple sequence repeat (SSR DNA sites. Conclusions: To our knowledge, this is the first report of transcriptome sequencing of invasive plant species with a closely related sequenced reference genome. The sequences and identified SNP variant and SSR sites will provide tools for developing novel genetic markers for use in genotyping and characterization of invasive behavior of B. sylvaticum.

  19. A combination of LongSAGE with Solexa sequencing is well suited to explore the depth and the complexity of transcriptome

    Directory of Open Access Journals (Sweden)

    Scoté-Blachon Céline

    2008-09-01

    Full Text Available Abstract Background "Open" transcriptome analysis methods allow to study gene expression without a priori knowledge of the transcript sequences. As of now, SAGE (Serial Analysis of Gene Expression, LongSAGE and MPSS (Massively Parallel Signature Sequencing are the mostly used methods for "open" transcriptome analysis. Both LongSAGE and MPSS rely on the isolation of 21 pb tag sequences from each transcript. In contrast to LongSAGE, the high throughput sequencing method used in MPSS enables the rapid sequencing of very large libraries containing several millions of tags, allowing deep transcriptome analysis. However, a bias in the complexity of the transcriptome representation obtained by MPSS was recently uncovered. Results In order to make a deep analysis of mouse hypothalamus transcriptome avoiding the limitation introduced by MPSS, we combined LongSAGE with the Solexa sequencing technology and obtained a library of more than 11 millions of tags. We then compared it to a LongSAGE library of mouse hypothalamus sequenced with the Sanger method. Conclusion We found that Solexa sequencing technology combined with LongSAGE is perfectly suited for deep transcriptome analysis. In contrast to MPSS, it gives a complex representation of transcriptome as reliable as a LongSAGE library sequenced by the Sanger method.

  20. Transcriptome sequencing of the Antarctic vascular plant Deschampsia antarctica Desv. under abiotic stress.

    Science.gov (United States)

    Lee, Jungeun; Noh, Eun Kyeung; Choi, Hyung-Seok; Shin, Seung Chul; Park, Hyun; Lee, Hyoungseok

    2013-03-01

    Antarctic hairgrass (Deschampsia antarctica Desv.) is the only natural grass species in the maritime Antarctic. It has been studied as an extremophile that has successfully adapted to marginal land with the harshest environment for terrestrial plants. However, limited genetic research has focused on this species due to the lack of genomic resources. Here, we present the first de novo assembly of its transcriptome by massive parallel sequencing and its expression profile using D. antarctica grown under various stress conditions. Total sequence reads generated by pyrosequencing were assembled into 60,765 unigenes (28,177 contigs and 32,588 singletons). A total of 29,173 unique protein-coding genes were identified based on sequence similarities to known proteins. The combined results from all three stress conditions indicated differential expression of 3,110 genes. Quantitative reverse transcription polymerase chain reaction showed that several well-known stress-responsive genes encoding late embryogenesis abundant protein, dehydrin 1, and ice recrystallization inhibition protein were induced dramatically and that genes encoding U-box-domain-containing protein, electron transfer flavoprotein-ubiquinone, and F-box-containing protein were induced by abiotic stressors in a manner conserved with other plant species. We identified more than 2,000 simple sequence repeats that can be developed as functional molecular markers. This dataset is the most comprehensive transcriptome resource currently available for D. antarctica and is therefore expected to be an important foundation for future genetic studies of grasses and extremophiles.

  1. Parallel motif extraction from very long sequences

    KAUST Repository

    Sahli, Majed; Mansour, Essam; Kalnis, Panos

    2013-01-01

    Motifs are frequent patterns used to identify biological functionality in genomic sequences, periodicity in time series, or user trends in web logs. In contrast to a lot of existing work that focuses on collections of many short sequences, modern

  2. Using next generation transcriptome sequencing to predict an ectomycorrhizal metabolome

    Directory of Open Access Journals (Sweden)

    Cseke Leland J

    2011-05-01

    Full Text Available Abstract Background Mycorrhizae, symbiotic interactions between soil fungi and tree roots, are ubiquitous in terrestrial ecosystems. The fungi contribute phosphorous, nitrogen and mobilized nutrients from organic matter in the soil and in return the fungus receives photosynthetically-derived carbohydrates. This union of plant and fungal metabolisms is the mycorrhizal metabolome. Understanding this symbiotic relationship at a molecular level provides important contributions to the understanding of forest ecosystems and global carbon cycling. Results We generated next generation short-read transcriptomic sequencing data from fully-formed ectomycorrhizae between Laccaria bicolor and aspen (Populus tremuloides roots. The transcriptomic data was used to identify statistically significantly expressed gene models using a bootstrap-style approach, and these expressed genes were mapped to specific metabolic pathways. Integration of expressed genes that code for metabolic enzymes and the set of expressed membrane transporters generates a predictive model of the ectomycorrhizal metabolome. The generated model of mycorrhizal metabolome predicts that the specific compounds glycine, glutamate, and allantoin are synthesized by L. bicolor and that these compounds or their metabolites may be used for the benefit of aspen in exchange for the photosynthetically-derived sugars fructose and glucose. Conclusions The analysis illustrates an approach to generate testable biological hypotheses to investigate the complex molecular interactions that drive ectomycorrhizal symbiosis. These models are consistent with experimental environmental data and provide insight into the molecular exchange processes for organisms in this complex ecosystem. The method used here for predicting metabolomic models of mycorrhizal systems from deep RNA sequencing data can be generalized and is broadly applicable to transcriptomic data derived from complex systems.

  3. High-throughput sequencing of black pepper root transcriptome

    Science.gov (United States)

    2012-01-01

    Background Black pepper (Piper nigrum L.) is one of the most popular spices in the world. It is used in cooking and the preservation of food and even has medicinal properties. Losses in production from disease are a major limitation in the culture of this crop. The major diseases are root rot and foot rot, which are results of root infection by Fusarium solani and Phytophtora capsici, respectively. Understanding the molecular interaction between the pathogens and the host’s root region is important for obtaining resistant cultivars by biotechnological breeding. Genetic and molecular data for this species, though, are limited. In this paper, RNA-Seq technology has been employed, for the first time, to describe the root transcriptome of black pepper. Results The root transcriptome of black pepper was sequenced by the NGS SOLiD platform and assembled using the multiple-k method. Blast2Go and orthoMCL methods were used to annotate 10338 unigenes. The 4472 predicted proteins showed about 52% homology with the Arabidopsis proteome. Two root proteomes identified 615 proteins, which seem to define the plant’s root pattern. Simple-sequence repeats were identified that may be useful in studies of genetic diversity and may have applications in biotechnology and ecology. Conclusions This dataset of 10338 unigenes is crucially important for the biotechnological breeding of black pepper and the ecogenomics of the Magnoliids, a major group of basal angiosperms. PMID:22984782

  4. High-throughput sequencing of black pepper root transcriptome

    Directory of Open Access Journals (Sweden)

    Gordo Sheila MC

    2012-09-01

    Full Text Available Abstract Background Black pepper (Piper nigrum L. is one of the most popular spices in the world. It is used in cooking and the preservation of food and even has medicinal properties. Losses in production from disease are a major limitation in the culture of this crop. The major diseases are root rot and foot rot, which are results of root infection by Fusarium solani and Phytophtora capsici, respectively. Understanding the molecular interaction between the pathogens and the host’s root region is important for obtaining resistant cultivars by biotechnological breeding. Genetic and molecular data for this species, though, are limited. In this paper, RNA-Seq technology has been employed, for the first time, to describe the root transcriptome of black pepper. Results The root transcriptome of black pepper was sequenced by the NGS SOLiD platform and assembled using the multiple-k method. Blast2Go and orthoMCL methods were used to annotate 10338 unigenes. The 4472 predicted proteins showed about 52% homology with the Arabidopsis proteome. Two root proteomes identified 615 proteins, which seem to define the plant’s root pattern. Simple-sequence repeats were identified that may be useful in studies of genetic diversity and may have applications in biotechnology and ecology. Conclusions This dataset of 10338 unigenes is crucially important for the biotechnological breeding of black pepper and the ecogenomics of the Magnoliids, a major group of basal angiosperms.

  5. Researches on Transcriptome Sequencing in the Study of Traditional Chinese Medicine

    Science.gov (United States)

    Xin, Jie; Zhang, Rong-chao; Wang, Lei

    2017-01-01

    Due to its incomparable advantages, the application of transcriptome sequencing in the study of traditional Chinese medicine attracts more and more attention of researchers, which greatly promote the development of traditional Chinese medicine. In this paper, the applications of transcriptome sequencing in traditional Chinese medicine were summarized by reviewing recent related papers. PMID:28900463

  6. Parallel motif extraction from very long sequences

    KAUST Repository

    Sahli, Majed

    2013-01-01

    Motifs are frequent patterns used to identify biological functionality in genomic sequences, periodicity in time series, or user trends in web logs. In contrast to a lot of existing work that focuses on collections of many short sequences, modern applications require mining of motifs in one very long sequence (i.e., in the order of several gigabytes). For this case, there exist statistical approaches that are fast but inaccurate; or combinatorial methods that are sound and complete. Unfortunately, existing combinatorial methods are serial and very slow. Consequently, they are limited to very short sequences (i.e., a few megabytes), small alphabets (typically 4 symbols for DNA sequences), and restricted types of motifs. This paper presents ACME, a combinatorial method for extracting motifs from a single very long sequence. ACME arranges the search space in contiguous blocks that take advantage of the cache hierarchy in modern architectures, and achieves almost an order of magnitude performance gain in serial execution. It also decomposes the search space in a smart way that allows scalability to thousands of processors with more than 90% speedup. ACME is the only method that: (i) scales to gigabyte-long sequences; (ii) handles large alphabets; (iii) supports interesting types of motifs with minimal additional cost; and (iv) is optimized for a variety of architectures such as multi-core systems, clusters in the cloud, and supercomputers. ACME reduces the extraction time for an exact-length query from 4 hours to 7 minutes on a typical workstation; handles 3 orders of magnitude longer sequences; and scales up to 16, 384 cores on a supercomputer. Copyright is held by the owner/author(s).

  7. Transcriptome sequencing in prostate cancer identifies inter-tumor heterogeneity

    Directory of Open Access Journals (Sweden)

    Janet Mendonca

    2015-06-01

    Full Text Available Given the dearth of gene mutations in prostate cancer, [1] ,[2] it is likely that genomic rearrangements play a significant role in the evolution of prostate cancer. However, in the search for recurrent genomic alterations, "private alterations" have received less attention. Such alterations may provide insights into the evolution, behavior, and clinical outcome of an individual tumor. In a recent report in "Genome Biology" Wyatt et al. [3] defines unique alterations in a cohort of high-risk prostate cancer patient with a lethal phenotype. Utilizing a transcriptome sequencing approach they observe high inter-tumor heterogeneity; however, the genes altered distill into three distinct cancer-relevant pathways. Their analysis reveals the presence of several non-ETS fusions, which may contribute to the phenotype of individual tumors, and have significance for disease progression.

  8. De novo transcriptome sequencing and sequence analysis of the malaria vector Anopheles sinensis (Diptera: Culicidae)

    Science.gov (United States)

    2014-01-01

    Background Anopheles sinensis is the major malaria vector in China and Southeast Asia. Vector control is one of the most effective measures to prevent malaria transmission. However, there is little transcriptome information available for the malaria vector. To better understand the biological basis of malaria transmission and to develop novel and effective means of vector control, there is a need to build a transcriptome dataset for functional genomics analysis by large-scale RNA sequencing (RNA-seq). Methods To provide a more comprehensive and complete transcriptome of An. sinensis, eggs, larvae, pupae, male adults and female adults RNA were pooled together for cDNA preparation, sequenced using the Illumina paired-end sequencing technology and assembled into unigenes. These unigenes were then analyzed in their genome mapping, functional annotation, homology, codon usage bias and simple sequence repeats (SSRs). Results Approximately 51.6 million clean reads were obtained, trimmed, and assembled into 38,504 unigenes with an average length of 571 bp, an N50 of 711 bp, and an average GC content 51.26%. Among them, 98.4% of unigenes could be mapped onto the reference genome, and 69% of unigenes could be annotated with known biological functions. Homology analysis identified certain numbers of An. sinensis unigenes that showed homology or being putative 1:1 orthologues with genomes of other Dipteran species. Codon usage bias was analyzed and 1,904 SSRs were detected, which will provide effective molecular markers for the population genetics of this species. Conclusions Our data and analysis provide the most comprehensive transcriptomic resource and characteristics currently available for An. sinensis, and will facilitate genetic, genomic studies, and further vector control of An. sinensis. PMID:25000941

  9. Transcriptome sequencing from diverse human populations reveals differentiated regulatory architecture.

    Directory of Open Access Journals (Sweden)

    Alicia R Martin

    2014-08-01

    Full Text Available Large-scale sequencing efforts have documented extensive genetic variation within the human genome. However, our understanding of the origins, global distribution, and functional consequences of this variation is far from complete. While regulatory variation influencing gene expression has been studied within a handful of populations, the breadth of transcriptome differences across diverse human populations has not been systematically analyzed. To better understand the spectrum of gene expression variation, alternative splicing, and the population genetics of regulatory variation in humans, we have sequenced the genomes, exomes, and transcriptomes of EBV transformed lymphoblastoid cell lines derived from 45 individuals in the Human Genome Diversity Panel (HGDP. The populations sampled span the geographic breadth of human migration history and include Namibian San, Mbuti Pygmies of the Democratic Republic of Congo, Algerian Mozabites, Pathan of Pakistan, Cambodians of East Asia, Yakut of Siberia, and Mayans of Mexico. We discover that approximately 25.0% of the variation in gene expression found amongst individuals can be attributed to population differences. However, we find few genes that are systematically differentially expressed among populations. Of this population-specific variation, 75.5% is due to expression rather than splicing variability, and we find few genes with strong evidence for differential splicing across populations. Allelic expression analyses indicate that previously mapped common regulatory variants identified in eight populations from the International Haplotype Map Phase 3 project have similar effects in our seven sampled HGDP populations, suggesting that the cellular effects of common variants are shared across diverse populations. Together, these results provide a resource for studies analyzing functional differences across populations by estimating the degree of shared gene expression, alternative splicing, and

  10. High-resolution analysis of the 5'-end transcriptome using a next generation DNA sequencer.

    Directory of Open Access Journals (Sweden)

    Shin-ichi Hashimoto

    Full Text Available Massively parallel, tag-based sequencing systems, such as the SOLiD system, hold the promise of revolutionizing the study of whole genome gene expression due to the number of data points that can be generated in a simple and cost-effective manner. We describe the development of a 5'-end transcriptome workflow for the SOLiD system and demonstrate the advantages in sensitivity and dynamic range offered by this tag-based application over traditional approaches for the study of whole genome gene expression. 5'-end transcriptome analysis was used to study whole genome gene expression within a colon cancer cell line, HT-29, treated with the DNA methyltransferase inhibitor, 5-aza-2'-deoxycytidine (5Aza. More than 20 million 25-base 5'-end tags were obtained from untreated and 5Aza-treated cells and matched to sequences within the human genome. Seventy three percent of the mapped unique tags were associated with RefSeq cDNA sequences, corresponding to approximately 14,000 different protein-coding genes in this single cell type. The level of expression of these genes ranged from 0.02 to 4,704 transcripts per cell. The sensitivity of a single sequence run of the SOLiD platform was 100-1,000 fold greater than that observed from 5'end SAGE data generated from the analysis of 70,000 tags obtained by Sanger sequencing. The high-resolution 5'end gene expression profiling presented in this study will not only provide novel insight into the transcriptional machinery but should also serve as a basis for a better understanding of cell biology.

  11. Transcriptome sequencing of the Microarray Quality Control (MAQC RNA reference samples using next generation sequencing

    Directory of Open Access Journals (Sweden)

    Thierry-Mieg Danielle

    2009-06-01

    Full Text Available Abstract Background Transcriptome sequencing using next-generation sequencing platforms will soon be competing with DNA microarray technologies for global gene expression analysis. As a preliminary evaluation of these promising technologies, we performed deep sequencing of cDNA synthesized from the Microarray Quality Control (MAQC reference RNA samples using Roche's 454 Genome Sequencer FLX. Results We generated more that 3.6 million sequence reads of average length 250 bp for the MAQC A and B samples and introduced a data analysis pipeline for translating cDNA read counts into gene expression levels. Using BLAST, 90% of the reads mapped to the human genome and 64% of the reads mapped to the RefSeq database of well annotated genes with e-values ≤ 10-20. We measured gene expression levels in the A and B samples by counting the numbers of reads that mapped to individual RefSeq genes in multiple sequencing runs to evaluate the MAQC quality metrics for reproducibility, sensitivity, specificity, and accuracy and compared the results with DNA microarrays and Quantitative RT-PCR (QRTPCR from the MAQC studies. In addition, 88% of the reads were successfully aligned directly to the human genome using the AceView alignment programs with an average 90% sequence similarity to identify 137,899 unique exon junctions, including 22,193 new exon junctions not yet contained in the RefSeq database. Conclusion Using the MAQC metrics for evaluating the performance of gene expression platforms, the ExpressSeq results for gene expression levels showed excellent reproducibility, sensitivity, and specificity that improved systematically with increasing shotgun sequencing depth, and quantitative accuracy that was comparable to DNA microarrays and QRTPCR. In addition, a careful mapping of the reads to the genome using the AceView alignment programs shed new light on the complexity of the human transcriptome including the discovery of thousands of new splice variants.

  12. Transcriptome

    Science.gov (United States)

    ... Also: Talking Glossary of Genetic Terms Definitions for genetic terms used on this page En Español: Transcriptoma Transcriptome What is a transcriptome? What can a transcriptome tell us? How can transcriptome data be used to explore gene function? What is ...

  13. Deep RNA Sequencing of the Skeletal Muscle Transcriptome in Swimming Fish

    NARCIS (Netherlands)

    Palstra, A.P.; Beltran, S.; Burgerhout, E.; Brittijn, S.A.; Magnoni, L.J.; Henkel, C.V.; Jansen, A.; Thillart, G.E.E.J.M.; Spaink, H.P.; Planas, J.V.

    2013-01-01

    Deep RNA sequencing (RNA-seq) was performed to provide an in-depth view of the transcriptome of red and white skeletal muscle of exercised and non-exercised rainbow trout (Oncorhynchus mykiss) with the specific objective to identify expressed genes and quantify the transcriptomic effects of

  14. Transcriptome sequencing and positive selected genes analysis of Bombyx mandarina.

    Directory of Open Access Journals (Sweden)

    Tingcai Cheng

    Full Text Available The wild silkworm Bombyx mandarina is widely believed to be an ancestor of the domesticated silkworm, Bombyx mori. Silkworms are often used as a model for studying the mechanism of species domestication. Here, we performed transcriptome sequencing of the wild silkworm using an Illumina HiSeq2000 platform. We produced 100,004,078 high-quality reads and assembled them into 50,773 contigs with an N50 length of 1764 bp and a mean length of 941.62 bp. A total of 33,759 unigenes were identified, with 12,805 annotated in the Nr database, 8273 in the Pfam database, and 9093 in the Swiss-Prot database. Expression profile analysis found significant differential expression of 1308 unigenes between the middle silk gland (MSG and posterior silk gland (PSG. Three sericin genes (sericin 1, sericin 2, and sericin 3 were expressed specifically in the MSG and three fibroin genes (fibroin-H, fibroin-L, and fibroin/P25 were expressed specifically in the PSG. In addition, 32,297 Single-nucleotide polymorphisms (SNPs and 361 insertion-deletions (INDELs were detected. Comparison with the domesticated silkworm p50/Dazao identified 5,295 orthologous genes, among which 400 might have experienced or to be experiencing positive selection by Ka/Ks analysis. These data and analyses presented here provide insights into silkworm domestication and an invaluable resource for wild silkworm genomics research.

  15. Parallel sequencing lives, or what makes large sequencing projects successful.

    Science.gov (United States)

    Quilez, Javier; Vidal, Enrique; Dily, François Le; Serra, François; Cuartero, Yasmina; Stadhouders, Ralph; Graf, Thomas; Marti-Renom, Marc A; Beato, Miguel; Filion, Guillaume

    2017-11-01

    T47D_rep2 and b1913e6c1_51720e9cf were 2 Hi-C samples. They were born and processed at the same time, yet their fates were very different. The life of b1913e6c1_51720e9cf was simple and fruitful, while that of T47D_rep2 was full of accidents and sorrow. At the heart of these differences lies the fact that b1913e6c1_51720e9cf was born under a lab culture of Documentation, Automation, Traceability, and Autonomy and compliance with the FAIR Principles. Their lives are a lesson for those who wish to embark on the journey of managing high-throughput sequencing data. © The Author 2017. Published by Oxford University Press.

  16. Detection of G-Quadruplex Structures Formed by G-Rich Sequences from Rice Genome and Transcriptome Using Combined Probes.

    Science.gov (United States)

    Chang, Tianjun; Li, Weiguo; Ding, Zhan; Cheng, Shaofei; Liang, Kun; Liu, Xiangjun; Bing, Tao; Shangguan, Dihua

    2017-08-01

    Putative G-quadruplex (G4) forming sequences (PQS) are highly prevalent in the genome and transcriptome of various organisms and are considered as potential regulation elements in many biological processes by forming G4 structures. The formation of G4 structures highly depends on the sequences and the environment. In most cases, it is difficult to predict G4 formation by PQS, especially PQS containing G2 tracts. Therefore, the experimental identification of G4 formation is essential in the study of G4-related biological functions. Herein, we report a rapid and simple method for the detection of G4 structures by using a pair of complementary reporters, hemin and BMSP. This method was applied to detect G4 structures formed by PQS (DNA and RNA) searched in the genome and transcriptome of Oryza sativa. Unlike most of the reported G4 probes that only recognize part of G4 structures, the proposed method based on combined probes positively responded to almost all G4 conformations, including parallel, antiparallel, and mixed/hybrid G4, but did not respond to non-G4 sequences. This method shows potential for high-throughput identification of G4 structures in genome and transcriptome. Furthermore, BMSP was observed to drive some PQS to form more stable G4 structures or induce the G4 formation of some PQS that cannot form G4 in normal physiological conditions, which may provide a powerful molecular tool for gene regulation.

  17. Identification and analysis of common bean (Phaseolus vulgaris L. transcriptomes by massively parallel pyrosequencing

    Directory of Open Access Journals (Sweden)

    Thimmapuram Jyothi

    2011-10-01

    Full Text Available Abstract Background Common bean (Phaseolus vulgaris is the most important food legume in the world. Although this crop is very important to both the developed and developing world as a means of dietary protein supply, resources available in common bean are limited. Global transcriptome analysis is important to better understand gene expression, genetic variation, and gene structure annotation in addition to other important features. However, the number and description of common bean sequences are very limited, which greatly inhibits genome and transcriptome research. Here we used 454 pyrosequencing to obtain a substantial transcriptome dataset for common bean. Results We obtained 1,692,972 reads with an average read length of 207 nucleotides (nt. These reads were assembled into 59,295 unigenes including 39,572 contigs and 19,723 singletons, in addition to 35,328 singletons less than 100 bp. Comparing the unigenes to common bean ESTs deposited in GenBank, we found that 53.40% or 31,664 of these unigenes had no matches to this dataset and can be considered as new common bean transcripts. Functional annotation of the unigenes carried out by Gene Ontology assignments from hits to Arabidopsis and soybean indicated coverage of a broad range of GO categories. The common bean unigenes were also compared to the bean bacterial artificial chromosome (BAC end sequences, and a total of 21% of the unigenes (12,724 including 9,199 contigs and 3,256 singletons match to the 8,823 BAC-end sequences. In addition, a large number of simple sequence repeats (SSRs and transcription factors were also identified in this study. Conclusions This work provides the first large scale identification of the common bean transcriptome derived by 454 pyrosequencing. This research has resulted in a 150% increase in the number of Phaseolus vulgaris ESTs. The dataset obtained through this analysis will provide a platform for functional genomics in common bean and related legumes and

  18. Illumina–based de novo transcriptome sequencing and analysis of ...

    Indian Academy of Sciences (India)

    Administrator

    2017-10-25

    Oct 25, 2017 ... (Shanghai, China) following manufacturer's protocols (Illumina, San .... suggests that pathways involved in musk production are expressed at a ..... Strickler S. R., Aureliano B. and Mueller L. A. 2012 Designing a transcriptome.

  19. Massively Parallel Interrogation of Aptamer Sequence, Structure and Function

    Energy Technology Data Exchange (ETDEWEB)

    Fischer, N O; Tok, J B; Tarasow, T M

    2008-02-08

    Optimization of high affinity reagents is a significant bottleneck in medicine and the life sciences. The ability to synthetically create thousands of permutations of a lead high-affinity reagent and survey the properties of individual permutations in parallel could potentially relieve this bottleneck. Aptamers are single stranded oligonucleotides affinity reagents isolated by in vitro selection processes and as a class have been shown to bind a wide variety of target molecules. Methodology/Principal Findings. High density DNA microarray technology was used to synthesize, in situ, arrays of approximately 3,900 aptamer sequence permutations in triplicate. These sequences were interrogated on-chip for their ability to bind the fluorescently-labeled cognate target, immunoglobulin E, resulting in the parallel execution of thousands of experiments. Fluorescence intensity at each array feature was well resolved and shown to be a function of the sequence present. The data demonstrated high intra- and interchip correlation between the same features as well as among the sequence triplicates within a single array. Consistent with aptamer mediated IgE binding, fluorescence intensity correlated strongly with specific aptamer sequences and the concentration of IgE applied to the array. The massively parallel sequence-function analyses provided by this approach confirmed the importance of a consensus sequence found in all 21 of the original IgE aptamer sequences and support a common stem:loop structure as being the secondary structure underlying IgE binding. The microarray application, data and results presented illustrate an efficient, high information content approach to optimizing aptamer function. It also provides a foundation from which to better understand and manipulate this important class of high affinity biomolecules.

  20. Massively parallel interrogation of aptamer sequence, structure and function.

    Directory of Open Access Journals (Sweden)

    Nicholas O Fischer

    Full Text Available BACKGROUND: Optimization of high affinity reagents is a significant bottleneck in medicine and the life sciences. The ability to synthetically create thousands of permutations of a lead high-affinity reagent and survey the properties of individual permutations in parallel could potentially relieve this bottleneck. Aptamers are single stranded oligonucleotides affinity reagents isolated by in vitro selection processes and as a class have been shown to bind a wide variety of target molecules. METHODOLOGY/PRINCIPAL FINDINGS: High density DNA microarray technology was used to synthesize, in situ, arrays of approximately 3,900 aptamer sequence permutations in triplicate. These sequences were interrogated on-chip for their ability to bind the fluorescently-labeled cognate target, immunoglobulin E, resulting in the parallel execution of thousands of experiments. Fluorescence intensity at each array feature was well resolved and shown to be a function of the sequence present. The data demonstrated high intra- and inter-chip correlation between the same features as well as among the sequence triplicates within a single array. Consistent with aptamer mediated IgE binding, fluorescence intensity correlated strongly with specific aptamer sequences and the concentration of IgE applied to the array. CONCLUSION AND SIGNIFICANCE: The massively parallel sequence-function analyses provided by this approach confirmed the importance of a consensus sequence found in all 21 of the original IgE aptamer sequences and support a common stem:loop structure as being the secondary structure underlying IgE binding. The microarray application, data and results presented illustrate an efficient, high information content approach to optimizing aptamer function. It also provides a foundation from which to better understand and manipulate this important class of high affinity biomolecules.

  1. Transcriptome sequencing and characterization for the sea cucumber Apostichopus japonicus (Selenka, 1867.

    Directory of Open Access Journals (Sweden)

    Huixia Du

    Full Text Available BACKGROUND: Sea cucumbers are a special group of marine invertebrates. They occupy a taxonomic position that is believed to be important for understanding the origin and evolution of deuterostomes. Some of them such as Apostichopus japonicus represent commercially important aquaculture species in Asian countries. Many efforts have been devoted to increasing the number of expressed sequence tags (ESTs for A. japonicus, but a comprehensive characterization of its transcriptome remains lacking. Here, we performed the large-scale transcriptome profiling and characterization by pyrosequencing diverse cDNA libraries from A. japonicus. RESULTS: In total, 1,061,078 reads were obtained by 454 sequencing of eight cDNA libraries representing different developmental stages and adult tissues in A. japonicus. These reads were assembled into 29,666 isotigs, which were further clustered into 21,071 isogroups. Nearly 40% of the isogroups showed significant matches to known proteins based on sequence similarity. Gene ontology (GO and KEGG pathway analyses recovered diverse biological functions and processes. Candidate genes that were potentially involved in aestivation were identified. Transcriptome comparison with the sea urchin Strongylocentrotus purpuratus revealed similar patterns of GO term representation. In addition, 4,882 putative orthologous genes were identified, of which 202 were not present in the non-echinoderm organisms. More than 700 simple sequence repeats (SSRs and 54,000 single nucleotide polymorphisms (SNPs were detected in the A. japonicus transcriptome. CONCLUSION: Pyrosequencing was proven to be efficient in rapidly identifying a large set of genes for the sea cucumber A. japonicus. Through the large-scale transcriptome sequencing as well as public EST data integration, we performed a comprehensive characterization of the A. japonicus transcriptome and identified candidate aestivation-related genes. A large number of potential genetic

  2. Sequence comparison of prefrontal cortical brain transcriptome from a tame and an aggressive silver fox (Vulpes vulpes)

    Science.gov (United States)

    2011-01-01

    Background Two strains of the silver fox (Vulpes vulpes), with markedly different behavioral phenotypes, have been developed by long-term selection for behavior. Foxes from the tame strain exhibit friendly behavior towards humans, paralleling the sociability of canine puppies, whereas foxes from the aggressive strain are defensive and exhibit aggression to humans. To understand the genetic differences underlying these behavioral phenotypes fox-specific genomic resources are needed. Results cDNA from mRNA from pre-frontal cortex of a tame and an aggressive fox was sequenced using the Roche 454 FLX Titanium platform (> 2.5 million reads & 0.9 Gbase of tame fox sequence; >3.3 million reads & 1.2 Gbase of aggressive fox sequence). Over 80% of the fox reads were assembled into contigs. Mapping fox reads against the fox transcriptome assembly and the dog genome identified over 30,000 high confidence fox-specific SNPs. Fox transcripts for approximately 14,000 genes were identified using SwissProt and the dog RefSeq databases. An at least 2-fold expression difference between the two samples (p fox transcriptome. Conclusions Transcriptome sequencing significantly expanded genomic resources available for the fox, a species without a sequenced genome. In a very cost efficient manner this yielded a large number of fox-specific SNP markers for genetic studies and provided significant insights into the gene expression profile of the fox pre-frontal cortex; expression differences between the two fox samples; and a catalogue of potentially important gene-specific sequence variants. This result demonstrates the utility of this approach for developing genomic resources in species with limited genomic information. PMID:21967120

  3. Analysis of the Macaca mulatta transcriptome and the sequence divergence between Macaca and human.

    Science.gov (United States)

    Magness, Charles L; Fellin, P Campion; Thomas, Matthew J; Korth, Marcus J; Agy, Michael B; Proll, Sean C; Fitzgibbon, Matthew; Scherer, Christina A; Miner, Douglas G; Katze, Michael G; Iadonato, Shawn P

    2005-01-01

    We report the initial sequencing and comparative analysis of the Macaca mulatta transcriptome. Cloned sequences from 11 tissues, nine animals, and three species (M. mulatta, M. fascicularis, and M. nemestrina) were sampled, resulting in the generation of 48,642 sequence reads. These data represent an initial sampling of the putative rhesus orthologs for 6,216 human genes. Mean nucleotide diversity within M. mulatta and sequence divergence among M. fascicularis, M. nemestrina, and M. mulatta are also reported.

  4. Increasing the reach of forensic genetics with massively parallel sequencing.

    Science.gov (United States)

    Budowle, Bruce; Schmedes, Sarah E; Wendt, Frank R

    2017-09-01

    The field of forensic genetics has made great strides in the analysis of biological evidence related to criminal and civil matters. More so, the discipline has set a standard of performance and quality in the forensic sciences. The advent of massively parallel sequencing will allow the field to expand its capabilities substantially. This review describes the salient features of massively parallel sequencing and how it can impact forensic genetics. The features of this technology offer increased number and types of genetic markers that can be analyzed, higher throughput of samples, and the capability of targeting different organisms, all by one unifying methodology. While there are many applications, three are described where massively parallel sequencing will have immediate impact: molecular autopsy, microbial forensics and differentiation of monozygotic twins. The intent of this review is to expose the forensic science community to the potential enhancements that have or are soon to arrive and demonstrate the continued expansion the field of forensic genetics and its service in the investigation of legal matters.

  5. Somatic sex-specific transcriptome differences in Drosophila revealed by whole transcriptome sequencing

    Directory of Open Access Journals (Sweden)

    Arbeitman Michelle N

    2011-07-01

    Full Text Available Abstract Background Understanding animal development and physiology at a molecular-biological level has been advanced by the ability to determine at high resolution the repertoire of mRNA molecules by whole transcriptome resequencing. This includes the ability to detect and quantify rare abundance transcripts and isoform-specific mRNA variants produced from a gene. The sex hierarchy consists of a pre-mRNA splicing cascade that directs the production of sex-specific transcription factors that specify nearly all sexual dimorphism. We have used deep RNA sequencing to gain insight into how the Drosophila sex hierarchy generates somatic sex differences, by examining gene and transcript isoform expression differences between the sexes in adult head tissues. Results Here we find 1,381 genes that differ in overall expression levels and 1,370 isoform-specific transcripts that differ between males and females. Additionally, we find 512 genes not regulated downstream of transformer that are significantly more highly expressed in males than females. These 512 genes are enriched on the × chromosome and reside adjacent to dosage compensation complex entry sites, which taken together suggests that their residence on the × chromosome might be sufficient to confer male-biased expression. There are no transcription unit structural features, from a set of features, that are robustly significantly different in the genes with significant sex differences in the ratio of isoform-specific transcripts, as compared to random isoform-specific transcripts, suggesting that there is no single molecular mechanism that generates isoform-specific transcript differences between the sexes, even though the sex hierarchy is known to include three pre-mRNA splicing factors. Conclusions We identify thousands of genes that show sex-specific differences in overall gene expression levels, and identify hundreds of additional genes that have differences in the abundance of isoform

  6. Parallel Mitogenome Sequencing Alleviates Random Rooting Effect in Phylogeography.

    Science.gov (United States)

    Hirase, Shotaro; Takeshima, Hirohiko; Nishida, Mutsumi; Iwasaki, Wataru

    2016-04-28

    Reliably rooted phylogenetic trees play irreplaceable roles in clarifying diversification in the patterns of species and populations. However, such trees are often unavailable in phylogeographic studies, particularly when the focus is on rapidly expanded populations that exhibit star-like trees. A fundamental bottleneck is known as the random rooting effect, where a distant outgroup tends to root an unrooted tree "randomly." We investigated whether parallel mitochondrial genome (mitogenome) sequencing alleviates this effect in phylogeography using a case study on the Sea of Japan lineage of the intertidal goby Chaenogobius annularis Eighty-three C. annularis individuals were collected and their mitogenomes were determined by high-throughput and low-cost parallel sequencing. Phylogenetic analysis of these mitogenome sequences was conducted to root the Sea of Japan lineage, which has a star-like phylogeny and had not been reliably rooted. The topologies of the bootstrap trees were investigated to determine whether the use of mitogenomes alleviated the random rooting effect. The mitogenome data successfully rooted the Sea of Japan lineage by alleviating the effect, which hindered phylogenetic analysis that used specific gene sequences. The reliable rooting of the lineage led to the discovery of a novel, northern lineage that expanded during an interglacial period with high bootstrap support. Furthermore, the finding of this lineage suggested the existence of additional glacial refugia and provided a new recent calibration point that revised the divergence time estimation between the Sea of Japan and Pacific Ocean lineages. This study illustrates the effectiveness of parallel mitogenome sequencing for solving the random rooting problem in phylogeographic studies. © The Author 2016. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.

  7. Comparative high-throughput transcriptome sequencing and development of SiESTa, the Silene EST annotation database

    Directory of Open Access Journals (Sweden)

    Marais Gabriel AB

    2011-07-01

    Full Text Available Abstract Background The genus Silene is widely used as a model system for addressing ecological and evolutionary questions in plants, but advances in using the genus as a model system are impeded by the lack of available resources for studying its genome. Massively parallel sequencing cDNA has recently developed into an efficient method for characterizing the transcriptomes of non-model organisms, generating massive amounts of data that enable the study of multiple species in a comparative framework. The sequences generated provide an excellent resource for identifying expressed genes, characterizing functional variation and developing molecular markers, thereby laying the foundations for future studies on gene sequence and gene expression divergence. Here, we report the results of a comparative transcriptome sequencing study of eight individuals representing four Silene and one Dianthus species as outgroup. All sequences and annotations have been deposited in a newly developed and publicly available database called SiESTa, the Silene EST annotation database. Results A total of 1,041,122 EST reads were generated in two runs on a Roche GS-FLX 454 pyrosequencing platform. EST reads were analyzed separately for all eight individuals sequenced and were assembled into contigs using TGICL. These were annotated with results from BLASTX searches and Gene Ontology (GO terms, and thousands of single-nucleotide polymorphisms (SNPs were characterized. Unassembled reads were kept as singletons and together with the contigs contributed to the unigenes characterized in each individual. The high quality of unigenes is evidenced by the proportion (49% that have significant hits in similarity searches with the A. thaliana proteome. The SiESTa database is accessible at http://www.siesta.ethz.ch. Conclusion The sequence collections established in the present study provide an important genomic resource for four Silene and one Dianthus species and will help to

  8. Comparative high-throughput transcriptome sequencing and development of SiESTa, the Silene EST annotation database

    Science.gov (United States)

    2011-01-01

    Background The genus Silene is widely used as a model system for addressing ecological and evolutionary questions in plants, but advances in using the genus as a model system are impeded by the lack of available resources for studying its genome. Massively parallel sequencing cDNA has recently developed into an efficient method for characterizing the transcriptomes of non-model organisms, generating massive amounts of data that enable the study of multiple species in a comparative framework. The sequences generated provide an excellent resource for identifying expressed genes, characterizing functional variation and developing molecular markers, thereby laying the foundations for future studies on gene sequence and gene expression divergence. Here, we report the results of a comparative transcriptome sequencing study of eight individuals representing four Silene and one Dianthus species as outgroup. All sequences and annotations have been deposited in a newly developed and publicly available database called SiESTa, the Silene EST annotation database. Results A total of 1,041,122 EST reads were generated in two runs on a Roche GS-FLX 454 pyrosequencing platform. EST reads were analyzed separately for all eight individuals sequenced and were assembled into contigs using TGICL. These were annotated with results from BLASTX searches and Gene Ontology (GO) terms, and thousands of single-nucleotide polymorphisms (SNPs) were characterized. Unassembled reads were kept as singletons and together with the contigs contributed to the unigenes characterized in each individual. The high quality of unigenes is evidenced by the proportion (49%) that have significant hits in similarity searches with the A. thaliana proteome. The SiESTa database is accessible at http://www.siesta.ethz.ch. Conclusion The sequence collections established in the present study provide an important genomic resource for four Silene and one Dianthus species and will help to further develop Silene as a

  9. Characterization of Liaoning cashmere goat transcriptome: sequencing, de novo assembly, functional annotation and comparative analysis.

    Directory of Open Access Journals (Sweden)

    Hongliang Liu

    Full Text Available Liaoning cashmere goat is a famous goat breed for cashmere wool. In order to increase the transcriptome data and accelerate genetic improvement for this breed, we performed de novo transcriptome sequencing to generate the first expressed sequence tag dataset for the Liaoning cashmere goat, using next-generation sequencing technology.Transcriptome sequencing of Liaoning cashmere goat on a Roche 454 platform yielded 804,601 high-quality reads. Clustering and assembly of these reads produced a non-redundant set of 117,854 unigenes, comprising 13,194 isotigs and 104,660 singletons. Based on similarity searches with known proteins, 17,356 unigenes were assigned to 6,700 GO categories, and the terms were summarized into three main GO categories and 59 sub-categories. 3,548 and 46,778 unigenes had significant similarity to existing sequences in the KEGG and COG databases, respectively. Comparative analysis revealed that 42,254 unigenes were aligned to 17,532 different sequences in NCBI non-redundant nucleotide databases. 97,236 (82.51% unigenes were mapped to the 30 goat chromosomes. 35,551 (30.17% unigenes were matched to 11,438 reported goat protein-coding genes. The remaining non-matched unigenes were further compared with cattle and human reference genes, 67 putative new goat genes were discovered. Additionally, 2,781 potential simple sequence repeats were initially identified from all unigenes.The transcriptome of Liaoning cashmere goat was deep sequenced, de novo assembled, and annotated, providing abundant data to better understand the Liaoning cashmere goat transcriptome. The potential simple sequence repeats provide a material basis for future genetic linkage and quantitative trait loci analyses.

  10. Haematobia irritans dataset of raw sequence reads from Illumina-based transcriptome sequencing of specific tissues and life stages

    Science.gov (United States)

    Illumina HiSeq technology was used to sequence the transcriptome from various dissected tissues and life stages from the horn fly, Haematobia irritans. These samples include eggs (0, 2, 4, and 9 hours post-oviposition), adult fly gut, adult fly legs, adult fly malpighian tubule, adult fly ovary, adu...

  11. Tumor transcriptome sequencing reveals allelic expression imbalances associated with copy number alterations.

    Directory of Open Access Journals (Sweden)

    Brian B Tuch

    Full Text Available Due to growing throughput and shrinking cost, massively parallel sequencing is rapidly becoming an attractive alternative to microarrays for the genome-wide study of gene expression and copy number alterations in primary tumors. The sequencing of transcripts (RNA-Seq should offer several advantages over microarray-based methods, including the ability to detect somatic mutations and accurately measure allele-specific expression. To investigate these advantages we have applied a novel, strand-specific RNA-Seq method to tumors and matched normal tissue from three patients with oral squamous cell carcinomas. Additionally, to better understand the genomic determinants of the gene expression changes observed, we have sequenced the tumor and normal genomes of one of these patients. We demonstrate here that our RNA-Seq method accurately measures allelic imbalance and that measurement on the genome-wide scale yields novel insights into cancer etiology. As expected, the set of genes differentially expressed in the tumors is enriched for cell adhesion and differentiation functions, but, unexpectedly, the set of allelically imbalanced genes is also enriched for these same cancer-related functions. By comparing the transcriptomic perturbations observed in one patient to his underlying normal and tumor genomes, we find that allelic imbalance in the tumor is associated with copy number mutations and that copy number mutations are, in turn, strongly associated with changes in transcript abundance. These results support a model in which allele-specific deletions and duplications drive allele-specific changes in gene expression in the developing tumor.

  12. Sequencing and De Novo Assembly of the Toxicodendron radicans (Poison Ivy) Transcriptome.

    Science.gov (United States)

    Weisberg, Alexandra J; Kim, Gunjune; Westwood, James H; Jelesko, John G

    2017-11-10

    Contact with poison ivy plants is widely dreaded because they produce a natural product called urushiol that is responsible for allergenic contact delayed-dermatitis symptoms lasting for weeks. For this reason, the catchphrase most associated with poison ivy is "leaves of three, let it be", which serves the purpose of both identification and an appeal for avoidance. Ironically, despite this notoriety, there is a dearth of specific knowledge about nearly all other aspects of poison ivy physiology and ecology. As a means of gaining a more molecular-oriented understanding of poison ivy physiology and ecology, Next Generation DNA sequencing technology was used to develop poison ivy root and leaf RNA-seq transcriptome resources. De novo assembled transcriptomes were analyzed to generate a core set of high quality expressed transcripts present in poison ivy tissue. The predicted protein sequences were evaluated for similarity to SwissProt homologs and InterProScan domains, as well as assigned both GO terms and KEGG annotations. Over 23,000 simple sequence repeats were identified in the transcriptome, and corresponding oligo nucleotide primer pairs were designed. A pan-transcriptome analysis of existing Anacardiaceae transcriptomes revealed conserved and unique transcripts among these species.

  13. SNP discovery in the transcriptome of white Pacific shrimp Litopenaeus vannamei by next generation sequencing.

    Directory of Open Access Journals (Sweden)

    Yang Yu

    Full Text Available The application of next generation sequencing technology has greatly facilitated high throughput single nucleotide polymorphism (SNP discovery and genotyping in genetic research. In the present study, SNPs were discovered based on two transcriptomes of Litopenaeus vannamei (L. vannamei generated from Illumina sequencing platform HiSeq 2000. One transcriptome of L. vannamei was obtained through sequencing on the RNA from larvae at mysis stage and its reference sequence was de novo assembled. The data from another transcriptome were downloaded from NCBI and the reads of the two transcriptomes were mapped separately to the assembled reference by BWA. SNP calling was performed using SAMtools. A total of 58,717 and 36,277 SNPs with high quality were predicted from the two transcriptomes, respectively. SNP calling was also performed using the reads of two transcriptomes together, and a total of 96,040 SNPs with high quality were predicted. Among these 96,040 SNPs, 5,242 and 29,129 were predicted as non-synonymous and synonymous SNPs respectively. Characterization analysis of the predicted SNPs in L. vannamei showed that the estimated SNP frequency was 0.21% (one SNP per 476 bp and the estimated ratio for transition to transversion was 2.0. Fifty SNPs were randomly selected for validation by Sanger sequencing after PCR amplification and 76% of SNPs were confirmed, which indicated that the SNPs predicted in this study were reliable. These SNPs will be very useful for genetic study in L. vannamei, especially for the high density linkage map construction and genome-wide association studies.

  14. Genome and Transcriptome Sequencing of the Ostreid herpesvirus 1 From Tomales Bay, California

    Science.gov (United States)

    Burge, C. A.; Langevin, S.; Closek, C. J.; Roberts, S. B.; Friedman, C. S.

    2016-02-01

    Mass mortalities of larval and seed bivalve molluscs attributed to the Ostreid herpesvirus 1 (OsHV-1) occur globally. OsHV-1 was fully sequenced and characterized as a member of the Family Malacoherpesviridae. Multiple strains of OsHV-1 exist and may vary in virulence, i.e. OsHV-1 µvar. For most global variants of OsHV-1, sequence data is limited to PCR-based sequencing of segments, including two recent genomes. In the United States, OsHV-1 is limited to detection in adjacent embayments in California, Tomales and Drakes bays. Limited DNA sequence data of OsHV-1 infecting oysters in Tomales Bay indicates the virus detected in Tomales Bay is similar but not identical to any one global variant of OsHV-1. In order to better understand both strain variation and virulence of OsHV-1 infecting oysters in Tomales Bay, we used genomic and transcriptomic sequencing. Meta-genomic sequencing (Illumina MiSeq) was conducted from infected oysters (n=4 per year) collected in 2003, 2007, and 2014, where full OsHV-1 genome sequences and low overall microbial diversity were achieved from highly infected oysters. Increased microbial diversity was detected in three of four samples sequenced from 2003, where qPCR based genome copy numbers of OsHV-1 were lower. Expression analysis (SOLiD RNA sequencing) of OsHV-1 genes expressed in oyster larvae at 24 hours post exposure revealed a nearly complete transcriptome, with several highly expressed genes, which are similar to recent transcriptomic analyses of other OsHV-1 variants. Taken together, our results indicate that genome and transcriptome sequencing may be powerful tools in understanding both strain variation and virulence of non-culturable marine viruses.

  15. Sequence comparison of prefrontal cortical brain transcriptome from a tame and an aggressive silver fox (Vulpes vulpes

    Directory of Open Access Journals (Sweden)

    Sun Qi

    2011-10-01

    Full Text Available Abstract Background Two strains of the silver fox (Vulpes vulpes, with markedly different behavioral phenotypes, have been developed by long-term selection for behavior. Foxes from the tame strain exhibit friendly behavior towards humans, paralleling the sociability of canine puppies, whereas foxes from the aggressive strain are defensive and exhibit aggression to humans. To understand the genetic differences underlying these behavioral phenotypes fox-specific genomic resources are needed. Results cDNA from mRNA from pre-frontal cortex of a tame and an aggressive fox was sequenced using the Roche 454 FLX Titanium platform (> 2.5 million reads & 0.9 Gbase of tame fox sequence; >3.3 million reads & 1.2 Gbase of aggressive fox sequence. Over 80% of the fox reads were assembled into contigs. Mapping fox reads against the fox transcriptome assembly and the dog genome identified over 30,000 high confidence fox-specific SNPs. Fox transcripts for approximately 14,000 genes were identified using SwissProt and the dog RefSeq databases. An at least 2-fold expression difference between the two samples (p Conclusions Transcriptome sequencing significantly expanded genomic resources available for the fox, a species without a sequenced genome. In a very cost efficient manner this yielded a large number of fox-specific SNP markers for genetic studies and provided significant insights into the gene expression profile of the fox pre-frontal cortex; expression differences between the two fox samples; and a catalogue of potentially important gene-specific sequence variants. This result demonstrates the utility of this approach for developing genomic resources in species with limited genomic information.

  16. Statistical method to compare massive parallel sequencing pipelines.

    Science.gov (United States)

    Elsensohn, M H; Leblay, N; Dimassi, S; Campan-Fournier, A; Labalme, A; Roucher-Boulez, F; Sanlaville, D; Lesca, G; Bardel, C; Roy, P

    2017-03-01

    Today, sequencing is frequently carried out by Massive Parallel Sequencing (MPS) that cuts drastically sequencing time and expenses. Nevertheless, Sanger sequencing remains the main validation method to confirm the presence of variants. The analysis of MPS data involves the development of several bioinformatic tools, academic or commercial. We present here a statistical method to compare MPS pipelines and test it in a comparison between an academic (BWA-GATK) and a commercial pipeline (TMAP-NextGENe®), with and without reference to a gold standard (here, Sanger sequencing), on a panel of 41 genes in 43 epileptic patients. This method used the number of variants to fit log-linear models for pairwise agreements between pipelines. To assess the heterogeneity of the margins and the odds ratios of agreement, four log-linear models were used: a full model, a homogeneous-margin model, a model with single odds ratio for all patients, and a model with single intercept. Then a log-linear mixed model was fitted considering the biological variability as a random effect. Among the 390,339 base-pairs sequenced, TMAP-NextGENe® and BWA-GATK found, on average, 2253.49 and 1857.14 variants (single nucleotide variants and indels), respectively. Against the gold standard, the pipelines had similar sensitivities (63.47% vs. 63.42%) and close but significantly different specificities (99.57% vs. 99.65%; p < 0.001). Same-trend results were obtained when only single nucleotide variants were considered (99.98% specificity and 76.81% sensitivity for both pipelines). The method allows thus pipeline comparison and selection. It is generalizable to all types of MPS data and all pipelines.

  17. Integrated analysis of whole genome and transcriptome sequencing reveals diverse transcriptomic aberrations driven by somatic genomic changes in liver cancers.

    Directory of Open Access Journals (Sweden)

    Yuichi Shiraishi

    Full Text Available Recent studies applying high-throughput sequencing technologies have identified several recurrently mutated genes and pathways in multiple cancer genomes. However, transcriptional consequences from these genomic alterations in cancer genome remain unclear. In this study, we performed integrated and comparative analyses of whole genomes and transcriptomes of 22 hepatitis B virus (HBV-related hepatocellular carcinomas (HCCs and their matched controls. Comparison of whole genome sequence (WGS and RNA-Seq revealed much evidence that various types of genomic mutations triggered diverse transcriptional changes. Not only splice-site mutations, but also silent mutations in coding regions, deep intronic mutations and structural changes caused splicing aberrations. HBV integrations generated diverse patterns of virus-human fusion transcripts depending on affected gene, such as TERT, CDK15, FN1 and MLL4. Structural variations could drive over-expression of genes such as WNT ligands, with/without creating gene fusions. Furthermore, by taking account of genomic mutations causing transcriptional aberrations, we could improve the sensitivity of deleterious mutation detection in known cancer driver genes (TP53, AXIN1, ARID2, RPS6KA3, and identified recurrent disruptions in putative cancer driver genes such as HNF4A, CPS1, TSC1 and THRAP3 in HCCs. These findings indicate genomic alterations in cancer genome have diverse transcriptomic effects, and integrated analysis of WGS and RNA-Seq can facilitate the interpretation of a large number of genomic alterations detected in cancer genome.

  18. Global transcriptional profiling of the toxic dinoflagellate Alexandrium fundyense using Massively Parallel Signature Sequencing

    Directory of Open Access Journals (Sweden)

    Anderson Donald M

    2006-04-01

    Full Text Available Abstract Background Dinoflagellates are one of the most important classes of marine and freshwater algae, notable both for their functional diversity and ecological significance. They occur naturally as free-living cells, as endosymbionts of marine invertebrates and are well known for their involvement in "red tides". Dinoflagellates are also notable for their unusual genome content and structure, which suggests that the organization and regulation of dinoflagellate genes may be very different from that of most eukaryotes. To investigate the content and regulation of the dinoflagellate genome, we performed a global analysis of the transcriptome of the toxic dinoflagellate Alexandrium fundyense under nitrate- and phosphate-limited conditions using Massively Parallel Signature Sequencing (MPSS. Results Data from the two MPSS libraries showed that the number of unique signatures found in A. fundyense cells is similar to that of humans and Arabidopsis thaliana, two eukaryotes that have been extensively analyzed using this method. The general distribution, abundance and expression patterns of the A. fundyense signatures were also quite similar to other eukaryotes, and at least 10% of the A. fundyense signatures were differentially expressed between the two conditions. RACE amplification and sequencing of a subset of signatures showed that multiple signatures arose from sequence variants of a single gene. Single signatures also mapped to different sequence variants of the same gene. Conclusion The MPSS data presented here provide a quantitative view of the transcriptome and its regulation in these unusual single-celled eukaryotes. The observed signature abundance and distribution in Alexandrium is similar to that of other eukaryotes that have been analyzed using MPSS. Results of signature mapping via RACE indicate that many signatures result from sequence variants of individual genes. These data add to the growing body of evidence for widespread gene

  19. Comparative analyses of two Geraniaceae transcriptomes using next-generation sequencing.

    Science.gov (United States)

    Zhang, Jin; Ruhlman, Tracey A; Mower, Jeffrey P; Jansen, Robert K

    2013-12-29

    Organelle genomes of Geraniaceae exhibit several unusual evolutionary phenomena compared to other angiosperm families including accelerated nucleotide substitution rates, widespread gene loss, reduced RNA editing, and extensive genomic rearrangements. Since most organelle-encoded proteins function in multi-subunit complexes that also contain nuclear-encoded proteins, it is likely that the atypical organellar phenomena affect the evolution of nuclear genes encoding organellar proteins. To begin to unravel the complex co-evolutionary interplay between organellar and nuclear genomes in this family, we sequenced nuclear transcriptomes of two species, Geranium maderense and Pelargonium x hortorum. Normalized cDNA libraries of G. maderense and P. x hortorum were used for transcriptome sequencing. Five assemblers (MIRA, Newbler, SOAPdenovo, SOAPdenovo-trans [SOAPtrans], Trinity) and two next-generation technologies (454 and Illumina) were compared to determine the optimal transcriptome sequencing approach. Trinity provided the highest quality assembly of Illumina data with the deepest transcriptome coverage. An analysis to determine the amount of sequencing needed for de novo assembly revealed diminishing returns of coverage and quality with data sets larger than sixty million Illumina paired end reads for both species. The G. maderense and P. x hortorum transcriptomes contained fewer transcripts encoding the PLS subclass of PPR proteins relative to other angiosperms, consistent with reduced mitochondrial RNA editing activity in Geraniaceae. In addition, transcripts for all six plastid targeted sigma factors were identified in both transcriptomes, suggesting that one of the highly divergent rpoA-like ORFs in the P. x hortorum plastid genome is functional. The findings support the use of the Illumina platform and assemblers optimized for transcriptome assembly, such as Trinity or SOAPtrans, to generate high-quality de novo transcriptomes with broad coverage. In addition

  20. A transcriptome resource for the koala (Phascolarctos cinereus): insights into koala retrovirus transcription and sequence diversity.

    Science.gov (United States)

    Hobbs, Matthew; Pavasovic, Ana; King, Andrew G; Prentis, Peter J; Eldridge, Mark D B; Chen, Zhiliang; Colgan, Donald J; Polkinghorne, Adam; Wilkins, Marc R; Flanagan, Cheyne; Gillett, Amber; Hanger, Jon; Johnson, Rebecca N; Timms, Peter

    2014-09-11

    The koala, Phascolarctos cinereus, is a biologically unique and evolutionarily distinct Australian arboreal marsupial. The goal of this study was to sequence the transcriptome from several tissues of two geographically separate koalas, and to create the first comprehensive catalog of annotated transcripts for this species, enabling detailed analysis of the unique attributes of this threatened native marsupial, including infection by the koala retrovirus. RNA-Seq data was generated from a range of tissues from one male and one female koala and assembled de novo into transcripts using Velvet-Oases. Transcript abundance in each tissue was estimated. Transcripts were searched for likely protein-coding regions and a non-redundant set of 117,563 putative protein sequences was produced. In similarity searches there were 84,907 (72%) sequences that aligned to at least one sequence in the NCBI nr protein database. The best alignments were to sequences from other marsupials. After applying a reciprocal best hit requirement of koala sequences to those from tammar wallaby, Tasmanian devil and the gray short-tailed opossum, we estimate that our transcriptome dataset represents approximately 15,000 koala genes. The marsupial alignment information was used to look for potential gene duplications and we report evidence for copy number expansion of the alpha amylase gene, and of an aldehyde reductase gene.Koala retrovirus (KoRV) transcripts were detected in the transcriptomes. These were analysed in detail and the structure of the spliced envelope gene transcript was determined. There was appreciable sequence diversity within KoRV, with 233 sites in the KoRV genome showing small insertions/deletions or single nucleotide polymorphisms. Both koalas had sequences from the KoRV-A subtype, but the male koala transcriptome has, in addition, sequences more closely related to the KoRV-B subtype. This is the first report of a KoRV-B-like sequence in a wild population. This transcriptomic

  1. A safe an easy method for building consensus HIV sequences from 454 massively parallel sequencing data.

    Science.gov (United States)

    Fernández-Caballero Rico, Jose Ángel; Chueca Porcuna, Natalia; Álvarez Estévez, Marta; Mosquera Gutiérrez, María Del Mar; Marcos Maeso, María Ángeles; García, Federico

    2018-02-01

    To show how to generate a consensus sequence from the information of massive parallel sequences data obtained from routine HIV anti-retroviral resistance studies, and that may be suitable for molecular epidemiology studies. Paired Sanger (Trugene-Siemens) and next-generation sequencing (NGS) (454 GSJunior-Roche) HIV RT and protease sequences from 62 patients were studied. NGS consensus sequences were generated using Mesquite, using 10%, 15%, and 20% thresholds. Molecular evolutionary genetics analysis (MEGA) was used for phylogenetic studies. At a 10% threshold, NGS-Sanger sequences from 17/62 patients were phylogenetically related, with a median bootstrap-value of 88% (IQR83.5-95.5). Association increased to 36/62 sequences, median bootstrap 94% (IQR85.5-98)], using a 15% threshold. Maximum association was at the 20% threshold, with 61/62 sequences associated, and a median bootstrap value of 99% (IQR98-100). A safe method is presented to generate consensus sequences from HIV-NGS data at 20% threshold, which will prove useful for molecular epidemiological studies. Copyright © 2016 Elsevier España, S.L.U. and Sociedad Española de Enfermedades Infecciosas y Microbiología Clínica. All rights reserved.

  2. Sequencing and de novo assembly of the transcriptome of the glassy-winged sharpshooter (Homalodisca vitripennis.

    Directory of Open Access Journals (Sweden)

    Raja Sekhar Nandety

    Full Text Available BACKGROUND: The glassy-winged sharpshooter Homalodisca vitripennis (Hemiptera: Cicadellidae, is a xylem-feeding leafhopper and important vector of the bacterium Xylella fastidiosa; the causal agent of Pierce's disease of grapevines. The functional complexity of the transcriptome of H. vitripennis has not been elucidated thus far. It is a necessary blueprint for an understanding of the development of H. vitripennis and for designing efficient biorational control strategies including those based on RNA interference. RESULTS: Here we elucidate and explore the transcriptome of adult H. vitripennis using high-throughput paired end deep sequencing and de novo assembly. A total of 32,803,656 paired-end reads were obtained with an average transcript length of 624 nucleotides. We assembled 32.9 Mb of the transcriptome of H. vitripennis that spanned across 47,265 loci and 52,708 transcripts. Comparison of our non-redundant database showed that 45% of the deduced proteins of H. vitripennis exhibit identity (e-value ≤1(-5 with known proteins. We assigned Gene Ontology (GO terms, Kyoto Encyclopedia of Genes and Genomes (KEGG annotations, and potential Pfam domains to each transcript isoform. In order to gain insight into the molecular basis of key regulatory genes of H. vitripennis, we characterized predicted proteins involved in the metabolism of juvenile hormone, and biogenesis of small RNAs (Dicer and Piwi sequences from the transcriptomic sequences. Analysis of transposable element sequences of H. vitripennis indicated that the genome is less expanded in comparison to many other insects with approximately 1% of the transcriptome carrying transposable elements. CONCLUSIONS: Our data significantly enhance the molecular resources available for future study and control of this economically important hemipteran. This transcriptional information not only provides a more nuanced understanding of the underlying biological and physiological mechanisms that

  3. Analysis of a native whitefly transcriptome and its sequence divergence with two invasive whitefly species

    Directory of Open Access Journals (Sweden)

    Wang Xiao-Wei

    2012-10-01

    Full Text Available Abstract Background Genomic divergence between invasive and native species may provide insight into the molecular basis underlying specific characteristics that drive the invasion and displacement of closely related species. In this study, we sequenced the transcriptome of an indigenous species, Asia II 3, of the Bemisia tabaci complex and compared its genetic divergence with the transcriptomes of two invasive whiteflies species, Middle East Asia Minor 1 (MEAM1 and Mediterranean (MED, respectively. Results More than 16 million reads of 74 base pairs in length were obtained for the Asia II 3 species using the Illumina sequencing platform. These reads were assembled into 52,535 distinct sequences (mean size: 466 bp and 16,596 sequences were annotated with an E-value above 10-5. Protein family comparisons revealed obvious diversification among the transcriptomes of these species suggesting species-specific adaptations during whitefly evolution. On the contrary, substantial conservation of the whitefly transcriptomes was also evident, despite their differences. The overall divergence of coding sequences between the orthologous gene pairs of Asia II 3 and MEAM1 is 1.73%, which is comparable to the average divergence of Asia II 3 and MED transcriptomes (1.84% and much higher than that of MEAM1 and MED (0.83%. This is consistent with the previous phylogenetic analyses and crossing experiments suggesting these are distinct species. We also identified hundreds of highly diverged genes and compiled sequence identify data into gene functional groups and found the most divergent gene classes are Cytochrome P450, Glutathione metabolism and Oxidative phosphorylation. These results strongly suggest that the divergence of genes related to metabolism might be the driving force of the MEAM1 and Asia II 3 differentiation. We also analyzed single nucleotide polymorphisms within the orthologous gene pairs of indigenous and invasive whiteflies which are helpful for

  4. Transcriptome sequencing and de novo analysis of the copepod Calanus sinicus using 454 GS FLX.

    Directory of Open Access Journals (Sweden)

    Juan Ning

    Full Text Available BACKGROUND: Despite their species abundance and primary economic importance, genomic information about copepods is still limited. In particular, genomic resources are lacking for the copepod Calanus sinicus, which is a dominant species in the coastal waters of East Asia. In this study, we performed de novo transcriptome sequencing to produce a large number of expressed sequence tags for the copepod C. sinicus. RESULTS: Copepodid larvae and adults were used as the basic material for transcriptome sequencing. Using 454 pyrosequencing, a total of 1,470,799 reads were obtained, which were assembled into 56,809 high quality expressed sequence tags. Based on their sequence similarity to known proteins, about 14,000 different genes were identified, including members of all major conserved signaling pathways. Transcripts that were putatively involved with growth, lipid metabolism, molting, and diapause were also identified among these genes. Differentially expressed genes related to several processes were found in C. sinicus copepodid larvae and adults. We detected 284,154 single nucleotide polymorphisms (SNPs that provide a resource for gene function studies. CONCLUSION: Our data provide the most comprehensive transcriptome resource available for C. sinicus. This resource allowed us to identify genes associated with primary physiological processes and SNPs in coding regions, which facilitated the quantitative analysis of differential gene expression. These data should provide foundation for future genetic and genomic studies of this and related species.

  5. Whole transcriptome sequencing enables discovery and analysis of viruses in archived primary central nervous system lymphomas.

    Directory of Open Access Journals (Sweden)

    Christopher DeBoever

    Full Text Available Primary central nervous system lymphomas (PCNSL have a dramatically increased prevalence among persons living with AIDS and are known to be associated with human Epstein Barr virus (EBV infection. Previous work suggests that in some cases, co-infection with other viruses may be important for PCNSL pathogenesis. Viral transcription in tumor samples can be measured using next generation transcriptome sequencing. We demonstrate the ability of transcriptome sequencing to identify viruses, characterize viral expression, and identify viral variants by sequencing four archived AIDS-related PCNSL tissue samples and analyzing raw sequencing reads. EBV was detected in all four PCNSL samples and cytomegalovirus (CMV, JC polyomavirus (JCV, and HIV were also discovered, consistent with clinical diagnoses. CMV was found to express three long non-coding RNAs recently reported as expressed during active infection. Single nucleotide variants were observed in each of the viruses observed and three indels were found in CMV. No viruses were found in several control tumor types including 32 diffuse large B-cell lymphoma samples. This study demonstrates the ability of next generation transcriptome sequencing to accurately identify viruses, including DNA viruses, in solid human cancer tissue samples.

  6. [Study on quality evaluation of sequence and SSR information in transcriptome of Astragalus membranacus].

    Science.gov (United States)

    Chang, Yue; Yang, Song; Liu, Zhen-Peng; Ren, Wei-Chao; Liu, Jie; Ma, Wei

    2016-04-01

    In this study, 454/Roche GS FLX sequencing technology was used to obtain the data of the Astragalus membranaceus. Four hundred and fifty-four Sequencing System Software was applied to carry out the transcription of the group from scratch. Using MISA tools, 9 893 unigenes were selected for the sequence of the genome of A. membranaceus, and the information of SSR locus was analyzed. According to the result, the average length of reads was 413 bp, about 86% of the reads was involved in the splicing, the length of the N50 was 1 205 bp, the number of unigenes was measured by the whole transcript. 1 729 SSR loci in the A. membranaceus transcriptome were searched, the occurrence frequency of SSR was 9.24%, the frequency of SSR in the whole transcriptome was 13.42%, the average length of SSR was 7.97 kb. One hundred and twenty-seven kinds of core repeat sequences were found, the dominant type was TG/AC type of dinucleotide, it appeared to account for 4.25% of the total SSR locus. The results of the sequence of the transcription of the A. membranaceus transcriptome revealed the overall expression, and a large number of unigenessequence was obtained, and the SSR locus in the genome of the A. membranaceus is high, and the type is diverse, and the polymorphism of the gene is high. Copyright© by the Chinese Pharmaceutical Association.

  7. Genetic signatures of adaptation revealed from transcriptome sequencing of Arctic and red foxes.

    Science.gov (United States)

    Kumar, Vikas; Kutschera, Verena E; Nilsson, Maria A; Janke, Axel

    2015-08-07

    The genus Vulpes (true foxes) comprises numerous species that inhabit a wide range of habitats and climatic conditions, including one species, the Arctic fox (Vulpes lagopus) which is adapted to the arctic region. A close relative to the Arctic fox, the red fox (Vulpes vulpes), occurs in subarctic to subtropical habitats. To study the genetic basis of their adaptations to different environments, transcriptome sequences from two Arctic foxes and one red fox individual were generated and analyzed for signatures of positive selection. In addition, the data allowed for a phylogenetic analysis and divergence time estimate between the two fox species. The de novo assembly of reads resulted in more than 160,000 contigs/transcripts per individual. Approximately 17,000 homologous genes were identified using human and the non-redundant databases. Positive selection analyses revealed several genes involved in various metabolic and molecular processes such as energy metabolism, cardiac gene regulation, apoptosis and blood coagulation to be under positive selection in foxes. Branch site tests identified four genes to be under positive selection in the Arctic fox transcriptome, two of which are fat metabolism genes. In the red fox transcriptome eight genes are under positive selection, including molecular process genes, notably genes involved in ATP metabolism. Analysis of the three transcriptomes and five Sanger re-sequenced genes in additional individuals identified a lower genetic variability within Arctic foxes compared to red foxes, which is consistent with distribution range differences and demographic responses to past climatic fluctuations. A phylogenomic analysis estimated that the Arctic and red fox lineages diverged about three million years ago. Transcriptome data are an economic way to generate genomic resources for evolutionary studies. Despite not representing an entire genome, this transcriptome analysis identified numerous genes that are relevant to arctic

  8. De novo transcriptome sequencing of axolotl blastema for identification of differentially expressed genes during limb regeneration

    Science.gov (United States)

    2013-01-01

    Background Salamanders are unique among vertebrates in their ability to completely regenerate amputated limbs through the mediation of blastema cells located at the stump ends. This regeneration is nerve-dependent because blastema formation and regeneration does not occur after limb denervation. To obtain the genomic information of blastema tissues, de novo transcriptomes from both blastema tissues and denervated stump ends of Ambystoma mexicanum (axolotls) 14 days post-amputation were sequenced and compared using Solexa DNA sequencing. Results The sequencing done for this study produced 40,688,892 reads that were assembled into 307,345 transcribed sequences. The N50 of transcribed sequence length was 562 bases. A similarity search with known proteins identified 39,200 different genes to be expressed during limb regeneration with a cut-off E-value exceeding 10-5. We annotated assembled sequences by using gene descriptions, gene ontology, and clusters of orthologous group terms. Targeted searches using these annotations showed that the majority of the genes were in the categories of essential metabolic pathways, transcription factors and conserved signaling pathways, and novel candidate genes for regenerative processes. We discovered and confirmed numerous sequences of the candidate genes by using quantitative polymerase chain reaction and in situ hybridization. Conclusion The results of this study demonstrate that de novo transcriptome sequencing allows gene expression analysis in a species lacking genome information and provides the most comprehensive mRNA sequence resources for axolotls. The characterization of the axolotl transcriptome can help elucidate the molecular mechanisms underlying blastema formation during limb regeneration. PMID:23815514

  9. Transcriptome Sequencing of Chemically Induced Aquilaria sinensis to Identify Genes Related to Agarwood Formation.

    Science.gov (United States)

    Ye, Wei; Wu, Hongqing; He, Xin; Wang, Lei; Zhang, Weimin; Li, Haohua; Fan, Yunfei; Tan, Guohui; Liu, Taomei; Gao, Xiaoxia

    2016-01-01

    Agarwood is a traditional Chinese medicine used as a clinical sedative, carminative, and antiemetic drug. Agarwood is formed in Aquilaria sinensis when A. sinensis trees are threatened by external physical, chemical injury or endophytic fungal irritation. However, the mechanism of agarwood formation via chemical induction remains unclear. In this study, we characterized the transcriptome of different parts of a chemically induced A. sinensis trunk sample with agarwood. The Illumina sequencing platform was used to identify the genes involved in agarwood formation. A five-year-old Aquilaria sinensis treated by formic acid was selected. The white wood part (B1 sample), the transition part between agarwood and white wood (W2 sample), the agarwood part (J3 sample), and the rotten wood part (F5 sample) were collected for transcriptome sequencing. Accordingly, 54,685,634 clean reads, which were assembled into 83,467 unigenes, were obtained with a Q20 value of 97.5%. A total of 50,565 unigenes were annotated using the Nr, Nt, SWISS-PROT, KEGG, COG, and GO databases. In particular, 171,331,352 unigenes were annotated by various pathways, including the sesquiterpenoid (ko00909) and plant-pathogen interaction (ko03040) pathways. These pathways were related to sesquiterpenoid biosynthesis and defensive responses to chemical stimulation. The transcriptome data of the different parts of the chemically induced A. sinensis trunk provide a rich source of materials for discovering and identifying the genes involved in sesquiterpenoid production and in defensive responses to chemical stimulation. This study is the first to use de novo sequencing and transcriptome assembly for different parts of chemically induced A. sinensis. Results demonstrate that the sesquiterpenoid biosynthesis pathway and WRKY transcription factor play important roles in agarwood formation via chemical induction. The comparative analysis of the transcriptome data of agarwood and A. sinensis lays the foundation

  10. Transcriptome sequencing and De Novo analysis of Youngia japonica using the illumina platform.

    Directory of Open Access Journals (Sweden)

    Yulan Peng

    Full Text Available Youngia japonica, a weed species distributed worldwide, has been widely used in traditional Chinese medicine. It is an ideal plant for studying the evolution of Asteraceae plants because of its short life history and abundant source. However, little is known about its evolution and genetic diversity. In this study, de novo transcriptome sequencing was conducted for the first time for the comprehensive analysis of the genetic diversity of Y. japonica. The Y. japonica transcriptome was sequenced using Illumina paired-end sequencing technology. We produced 21,847,909 high-quality reads for Y. japonica and assembled them into contigs. A total of 51,850 unigenes were identified, among which 46,087 were annotated in the NCBI non-redundant protein database and 41,752 were annotated in the Swiss-Prot database. We mapped 9,125 unigenes onto 163 pathways using the Kyoto Encyclopedia of Genes and Genomes Pathway database. In addition, 3,648 simple sequence repeats (SSRs were detected. Our data provide the most comprehensive transcriptome resource currently available for Y. japonica. C4 photosynthesis unigenes were found in the biological process of Y. japonica. There were 5596 unigenes related to defense response and 1344 ungienes related to signal transduction mechanisms (10.95%. These data provide insights into the genetic diversity of Y. japonica. Numerous SSRs contributed to the development of novel markers. These data may serve as a new valuable resource for genomic studies on Youngia and, more generally, Cichoraceae.

  11. Transcriptome analysis of blueberry using 454 EST sequencing

    Science.gov (United States)

    Blueberry (Vaccinium corymbosum) is a major berry crop in the United States, and one that has great nutritional and economical value. Next generation sequencing methodologies, such as 454, have been demonstrated to be successful and efficient in producing a snap-shot of transcriptional activities du...

  12. Sequencing and de novo analysis of a coral larval transcriptome using 454 GSFlx

    Directory of Open Access Journals (Sweden)

    Colbourne John K

    2009-05-01

    Full Text Available Abstract Background New methods are needed for genomic-scale analysis of emerging model organisms that exemplify important biological questions but lack fully sequenced genomes. For example, there is an urgent need to understand the potential for corals to adapt to climate change, but few molecular resources are available for studying these processes in reef-building corals. To facilitate genomics studies in corals and other non-model systems, we describe methods for transcriptome sequencing using 454, as well as strategies for assembling a useful catalog of genes from the output. We have applied these methods to sequence the transcriptome of planulae larvae from the coral Acropora millepora. Results More than 600,000 reads produced in a single 454 sequencing run were assembled into ~40,000 contigs with five-fold average sequencing coverage. Based on sequence similarity with known proteins, these analyses identified ~11,000 different genes expressed in a range of conditions including thermal stress and settlement induction. Assembled sequences were annotated with gene names, conserved domains, and Gene Ontology terms. Targeted searches using these annotations identified the majority of genes associated with essential metabolic pathways and conserved signaling pathways, as well as novel candidate genes for stress-related processes. Comparisons with the genome of the anemone Nematostella vectensis revealed ~8,500 pairs of orthologs and ~100 candidate coral-specific genes. More than 30,000 SNPs were detected in the coral sequences, and a subset of these validated by re-sequencing. Conclusion The methods described here for deep sequencing of the transcriptome should be widely applicable to generate catalogs of genes and genetic markers in emerging model organisms. Our data provide the most comprehensive sequence resource currently available for reef-building corals, and include an extensive collection of potential genetic markers for association and

  13. Genetic signatures of adaptation revealed from transcriptome sequencing of Arctic and red foxes

    OpenAIRE

    Kumar, Vikas; Kutschera, Verena E.; Nilsson, Maria A.; Janke, Axel

    2015-01-01

    Background The genus Vulpes (true foxes) comprises numerous species that inhabit a wide range of habitats and climatic conditions, including one species, the Arctic fox (Vulpes lagopus) which is adapted to the arctic region. A close relative to the Arctic fox, the red fox (Vulpes vulpes), occurs in subarctic to subtropical habitats. To study the genetic basis of their adaptations to different environments, transcriptome sequences from two Arctic foxes and one red fox individual were generated...

  14. Comparative transcriptome sequencing and de novo analysis of Vaccinium corymbosum during fruit and color development.

    Science.gov (United States)

    Li, Lingli; Zhang, Hehua; Liu, Zhongshuai; Cui, Xiaoyue; Zhang, Tong; Li, Yanfang; Zhang, Lingyun

    2016-10-12

    Blueberry is an economically important fruit crop in Ericaceae family. The substantial quantities of flavonoids in blueberry have been implicated in a broad range of health benefits. However, the information regarding fruit development and flavonoid metabolites based on the transcriptome level is still limited. In the present study, the transcriptome and gene expression profiling over berry development, especially during color development were initiated. A total of approximately 13.67 Gbp of data were obtained and assembled into 186,962 transcripts and 80,836 unigenes from three stages of blueberry fruit and color development. A large number of simple sequence repeats (SSRs) and candidate genes, which are potentially involved in plant development, metabolic and hormone pathways, were identified. A total of 6429 sequences containing 8796 SSRs were characterized from 15,457 unigenes and 1763 unigenes contained more than one SSR. The expression profiles of key genes involved in anthocyanin biosynthesis were also studied. In addition, a comparison between our dataset and other published results was carried out. Our high quality reads produced in this study are an important advancement and provide a new resource for the interpretation of high-throughput data for blueberry species whether regarding sequencing data depth or species extension. The use of this transcriptome data will serve as a valuable public information database for the studies of blueberry genome and would greatly boost the research of fruit and color development, flavonoid metabolisms and regulation and breeding of more healthful blueberries.

  15. De novo transcriptome sequencing and assembly from apomictic and sexual Eragrostis curvula genotypes.

    Directory of Open Access Journals (Sweden)

    Ingrid Garbus

    Full Text Available A long-standing goal in plant breeding has been the ability to confer apomixis to agriculturally relevant species, which would require a deeper comprehension of the molecular basis of apomictic regulatory mechanisms. Eragrostis curvula (Schrad. Nees is a perennial grass that includes both sexual and apomictic cytotypes. The availability of a reference transcriptome for this species would constitute a very important tool toward the identification of genes controlling key steps of the apomictic pathway. Here, we used Roche/454 sequencing technologies to generate reads from inflorescences of E. curvula apomictic and sexual genotypes that were de novo assembled into a reference transcriptome. Near 90% of the 49568 assembled isotigs showed sequence similarity to sequences deposited in the public databases. A gene ontology analysis categorized 27448 isotigs into at least one of the three main GO categories. We identified 11475 SSRs, and several of them were assayed in E curvula germoplasm using SSR-based primers, providing a valuable set of molecular markers that could allow direct allele selection. The differential contribution to each library of the spliced forms of several transcripts revealed the existence of several isotigs produced via alternative splicing of single genes. The reference transcriptome presented and validated in this work will be useful for the identification of a wide range of gene(s related to agronomic traits of E. curvula, including those controlling key steps of the apomictic pathway in this species, allowing the extrapolation of the findings to other plant species.

  16. Transcriptome sequencing and annotation for the Jamaican fruit bat (Artibeus jamaicensis.

    Directory of Open Access Journals (Sweden)

    Timothy I Shaw

    Full Text Available The Jamaican fruit bat (Artibeus jamaicensis is one of the most common bats in the tropical Americas. It is thought to be a potential reservoir host of Tacaribe virus, an arenavirus closely related to the South American hemorrhagic fever viruses. We performed transcriptome sequencing and annotation from lung, kidney and spleen tissues using 454 and Illumina platforms to develop this species as an animal model. More than 100,000 contigs were assembled, with 25,000 genes that were functionally annotated. Of the remaining unannotated contigs, 80% were found within bat genomes or transcriptomes. Annotated genes are involved in a broad range of activities ranging from cellular metabolism to genome regulation through ncRNAs. Reciprocal BLAST best hits yielded 8,785 sequences that are orthologous to mouse, rat, cattle, horse and human. Species tree analysis of sequences from 2,378 loci was used to achieve 95% bootstrap support for the placement of bat as sister to the clade containing horse, dog, and cattle. Through substitution rate estimation between bat and human, 32 genes were identified with evidence for positive selection. We also identified 466 immune-related genes, which may be useful for studying Tacaribe virus infection of this species. The Jamaican fruit bat transcriptome dataset is a resource that should provide additional candidate markers for studying bat evolution and ecology, and tools for analysis of the host response and pathology of disease.

  17. "Transcriptomics": molecular diagnosis of inborn errors of metabolism via RNA-sequencing.

    Science.gov (United States)

    Kremer, Laura S; Wortmann, Saskia B; Prokisch, Holger

    2018-01-25

    Exome wide sequencing techniques have revolutionized molecular diagnostics in patients with suspected inborn errors of metabolism or neuromuscular disorders. However, the diagnostic yield of 25-60% still leaves a large fraction of individuals without a diagnosis. This indicates a causative role for non-exonic regulatory variants not covered by whole exome sequencing. Here we review how systematic RNA-sequencing analysis (RNA-seq, "transcriptomics") lead to a molecular diagnosis in 10-35% of patients in whom whole exome sequencing failed to do so. Importantly, RNA-sequencing based discoveries cannot only guide molecular diagnosis but might also unravel therapeutic intervention points such as antisense oligonucleotide treatment for splicing defects as recently reported for spinal muscular atrophy.

  18. Application of affymetrix array and massively parallel signature sequencing for identification of genes involved in prostate cancer progression

    International Nuclear Information System (INIS)

    Oudes, Asa J; Roach, Jared C; Walashek, Laura S; Eichner, Lillian J; True, Lawrence D; Vessella, Robert L; Liu, Alvin Y

    2005-01-01

    Affymetrix GeneChip Array and Massively Parallel Signature Sequencing (MPSS) are two high throughput methodologies used to profile transcriptomes. Each method has certain strengths and weaknesses; however, no comparison has been made between the data derived from Affymetrix arrays and MPSS. In this study, two lineage-related prostate cancer cell lines, LNCaP and C4-2, were used for transcriptome analysis with the aim of identifying genes associated with prostate cancer progression. Affymetrix GeneChip array and MPSS analyses were performed. Data was analyzed with GeneSpring 6.2 and in-house perl scripts. Expression array results were verified with RT-PCR. Comparison of the data revealed that both technologies detected genes the other did not. In LNCaP, 3,180 genes were only detected by Affymetrix and 1,169 genes were only detected by MPSS. Similarly, in C4-2, 4,121 genes were only detected by Affymetrix and 1,014 genes were only detected by MPSS. Analysis of the combined transcriptomes identified 66 genes unique to LNCaP cells and 33 genes unique to C4-2 cells. Expression analysis of these genes in prostate cancer specimens showed CA1 to be highly expressed in bone metastasis but not expressed in primary tumor and EPHA7 to be expressed in normal prostate and primary tumor but not bone metastasis. Our data indicates that transcriptome profiling with a single methodology will not fully assess the expression of all genes in a cell line. A combination of transcription profiling technologies such as DNA array and MPSS provides a more robust means to assess the expression profile of an RNA sample. Finally, genes that were differentially expressed in cell lines were also differentially expressed in primary prostate cancer and its metastases

  19. De novo sequencing, assembly and characterization of antennal transcriptome of Anomala corpulenta Motschulsky (Coleoptera: Rutelidae.

    Directory of Open Access Journals (Sweden)

    Haoliang Chen

    Full Text Available Anomala corpulenta is an important insect pest and can cause enormous economic losses in agriculture, horticulture and forestry. It is widely distributed in China, and both larvae and adults can cause serious damage. It is difficult to control this pest because the larvae live underground. Any new control strategy should exploit alternatives to heavily and frequently used chemical insecticides. However, little genetic research has been carried out on A. corpulenta due to the lack of genomic resources. Genomic resources could be produced by next generation sequencing technologies with low cost and in a short time. In this study, we performed de novo sequencing, assembly and characterization of the antennal transcriptome of A. corpulenta.Illumina sequencing technology was used to sequence the antennal transcriptome of A. corpulenta. Approximately 76.7 million total raw reads and about 68.9 million total clean reads were obtained, and then 35,656 unigenes were assembled. Of these unigenes, 21,463 of them could be annotated in the NCBI nr database, and, among the annotated unigenes, 11,154 and 6,625 unigenes could be assigned to GO and COG, respectively. Additionally, 16,350 unigenes could be annotated in the Swiss-Prot database, and 14,499 unigenes could map onto 258 pathways in the KEGG Pathway database. We also found 24 unigenes related to OBPs, 6 to CSPs, and in total 167 unigenes related to chemodetection. We analyzed 4 OBPs and 3CSPs sequences and their RT-qPCR results agreed well with their FPKM values.We produced the first large-scale antennal transcriptome of A. corpulenta, which is a species that has little genomic information in public databases. The identified chemodetection unigenes can promote the molecular mechanistic study of behavior in A. corpulenta. These findings provide a general sequence resource for molecular genetics research on A. corpulenta.

  20. Optimizing and benchmarking de novo transcriptome sequencing: from library preparation to assembly evaluation.

    Science.gov (United States)

    Hara, Yuichiro; Tatsumi, Kaori; Yoshida, Michio; Kajikawa, Eriko; Kiyonari, Hiroshi; Kuraku, Shigehiro

    2015-11-18

    RNA-seq enables gene expression profiling in selected spatiotemporal windows and yields massive sequence information with relatively low cost and time investment, even for non-model species. However, there remains a large room for optimizing its workflow, in order to take full advantage of continuously developing sequencing capacity. Transcriptome sequencing for three embryonic stages of Madagascar ground gecko (Paroedura picta) was performed with the Illumina platform. The output reads were assembled de novo for reconstructing transcript sequences. In order to evaluate the completeness of transcriptome assemblies, we prepared a reference gene set consisting of vertebrate one-to-one orthologs. To take advantage of increased read length of >150 nt, we demonstrated shortened RNA fragmentation time, which resulted in a dramatic shift of insert size distribution. To evaluate products of multiple de novo assembly runs incorporating reads with different RNA sources, read lengths, and insert sizes, we introduce a new reference gene set, core vertebrate genes (CVG), consisting of 233 genes that are shared as one-to-one orthologs by all vertebrate genomes examined (29 species)., The completeness assessment performed by the computational pipelines CEGMA and BUSCO referring to CVG, demonstrated higher accuracy and resolution than with the gene set previously established for this purpose. As a result of the assessment with CVG, we have derived the most comprehensive transcript sequence set of the Madagascar ground gecko by means of assembling individual libraries followed by clustering the assembled sequences based on their overall similarities. Our results provide several insights into optimizing de novo RNA-seq workflow, including the coordination between library insert size and read length, which manifested in improved connectivity of assemblies. The approach and assembly assessment with CVG demonstrated here would be applicable to transcriptome analysis of other species as

  1. Parallel epigenomic and transcriptomic responses to viral infection in honey bees (Apis mellifera).

    Science.gov (United States)

    Galbraith, David A; Yang, Xingyu; Niño, Elina Lastro; Yi, Soojin; Grozinger, Christina

    2015-03-01

    Populations of honey bees are declining throughout the world, with US beekeepers losing 30% of their colonies each winter. Though multiple factors are driving these colony losses, it is increasingly clear that viruses play a major role. However, information about the molecular mechanisms mediating antiviral immunity in honey bees is surprisingly limited. Here, we examined the transcriptional and epigenetic (DNA methylation) responses to viral infection in honey bee workers. One-day old worker honey bees were fed solutions containing Israeli Acute Paralysis Virus (IAPV), a virus which causes muscle paralysis and death and has previously been associated with colony loss. Uninfected control and infected, symptomatic bees were collected within 20-24 hours after infection. Worker fat bodies, the primary tissue involved in metabolism, detoxification and immune responses, were collected for analysis. We performed transcriptome- and bisulfite-sequencing of the worker fat bodies to identify genome-wide gene expression and DNA methylation patterns associated with viral infection. There were 753 differentially expressed genes (FDR<0.05) in infected versus control bees, including several genes involved in epigenetic and antiviral pathways. DNA methylation status of 156 genes (FDR<0.1) changed significantly as a result of the infection, including those involved in antiviral responses in humans. There was no significant overlap between the significantly differentially expressed and significantly differentially methylated genes, and indeed, the genomic characteristics of these sets of genes were quite distinct. Our results indicate that honey bees have two distinct molecular pathways, mediated by transcription and methylation, that modulate protein levels and/or function in response to viral infections.

  2. Parallel epigenomic and transcriptomic responses to viral infection in honey bees (Apis mellifera.

    Directory of Open Access Journals (Sweden)

    David A Galbraith

    2015-03-01

    Full Text Available Populations of honey bees are declining throughout the world, with US beekeepers losing 30% of their colonies each winter. Though multiple factors are driving these colony losses, it is increasingly clear that viruses play a major role. However, information about the molecular mechanisms mediating antiviral immunity in honey bees is surprisingly limited. Here, we examined the transcriptional and epigenetic (DNA methylation responses to viral infection in honey bee workers. One-day old worker honey bees were fed solutions containing Israeli Acute Paralysis Virus (IAPV, a virus which causes muscle paralysis and death and has previously been associated with colony loss. Uninfected control and infected, symptomatic bees were collected within 20-24 hours after infection. Worker fat bodies, the primary tissue involved in metabolism, detoxification and immune responses, were collected for analysis. We performed transcriptome- and bisulfite-sequencing of the worker fat bodies to identify genome-wide gene expression and DNA methylation patterns associated with viral infection. There were 753 differentially expressed genes (FDR<0.05 in infected versus control bees, including several genes involved in epigenetic and antiviral pathways. DNA methylation status of 156 genes (FDR<0.1 changed significantly as a result of the infection, including those involved in antiviral responses in humans. There was no significant overlap between the significantly differentially expressed and significantly differentially methylated genes, and indeed, the genomic characteristics of these sets of genes were quite distinct. Our results indicate that honey bees have two distinct molecular pathways, mediated by transcription and methylation, that modulate protein levels and/or function in response to viral infections.

  3. Transcriptomic analysis of Petunia hybrida in response to salt stress using high throughput RNA sequencing.

    Directory of Open Access Journals (Sweden)

    Gonzalo H Villarino

    Full Text Available Salinity and drought stress are the primary cause of crop losses worldwide. In sodic saline soils sodium chloride (NaCl disrupts normal plant growth and development. The complex interactions of plant systems with abiotic stress have made RNA sequencing a more holistic and appealing approach to study transcriptome level responses in a single cell and/or tissue. In this work, we determined the Petunia transcriptome response to NaCl stress by sequencing leaf samples and assembling 196 million Illumina reads with Trinity software. Using our reference transcriptome we identified more than 7,000 genes that were differentially expressed within 24 h of acute NaCl stress. The proposed transcriptome can also be used as an excellent tool for biological and bioinformatics in the absence of an available Petunia genome and it is available at the SOL Genomics Network (SGN http://solgenomics.net. Genes related to regulation of reactive oxygen species, transport, and signal transductions as well as novel and undescribed transcripts were among those differentially expressed in response to salt stress. The candidate genes identified in this study can be applied as markers for breeding or to genetically engineer plants to enhance salt tolerance. Gene Ontology analyses indicated that most of the NaCl damage happened at 24 h inducing genotoxicity, affecting transport and organelles due to the high concentration of Na+ ions. Finally, we report a modification to the library preparation protocol whereby cDNA samples were bar-coded with non-HPLC purified primers, without affecting the quality and quantity of the RNA-seq data. The methodological improvement presented here could substantially reduce the cost of sample preparation for future high-throughput RNA sequencing experiments.

  4. Transcriptomic analysis of Petunia hybrida in response to salt stress using high throughput RNA sequencing.

    Science.gov (United States)

    Villarino, Gonzalo H; Bombarely, Aureliano; Giovannoni, James J; Scanlon, Michael J; Mattson, Neil S

    2014-01-01

    Salinity and drought stress are the primary cause of crop losses worldwide. In sodic saline soils sodium chloride (NaCl) disrupts normal plant growth and development. The complex interactions of plant systems with abiotic stress have made RNA sequencing a more holistic and appealing approach to study transcriptome level responses in a single cell and/or tissue. In this work, we determined the Petunia transcriptome response to NaCl stress by sequencing leaf samples and assembling 196 million Illumina reads with Trinity software. Using our reference transcriptome we identified more than 7,000 genes that were differentially expressed within 24 h of acute NaCl stress. The proposed transcriptome can also be used as an excellent tool for biological and bioinformatics in the absence of an available Petunia genome and it is available at the SOL Genomics Network (SGN) http://solgenomics.net. Genes related to regulation of reactive oxygen species, transport, and signal transductions as well as novel and undescribed transcripts were among those differentially expressed in response to salt stress. The candidate genes identified in this study can be applied as markers for breeding or to genetically engineer plants to enhance salt tolerance. Gene Ontology analyses indicated that most of the NaCl damage happened at 24 h inducing genotoxicity, affecting transport and organelles due to the high concentration of Na+ ions. Finally, we report a modification to the library preparation protocol whereby cDNA samples were bar-coded with non-HPLC purified primers, without affecting the quality and quantity of the RNA-seq data. The methodological improvement presented here could substantially reduce the cost of sample preparation for future high-throughput RNA sequencing experiments.

  5. Transcriptome profiling of testis during sexual maturation stages in Eriocheir sinensis using Illumina sequencing.

    Directory of Open Access Journals (Sweden)

    Lin He

    Full Text Available The testis is a highly specialized tissue that plays dual roles in ensuring fertility by producing spermatozoa and hormones. Spermatogenesis is a complex process, resulting in the production of mature sperm from primordial germ cells. Significant structural and biochemical changes take place in the seminiferous epithelium of the adult testis during spermatogenesis. The gene expression pattern of testis in Chinese mitten crab (Eriocheir sinensis has not been extensively studied, and limited genetic research has been performed on this species. The advent of high-throughput sequencing technologies enables the generation of genomic resources within a short period of time and at minimal cost. In the present study, we performed de novo transcriptome sequencing to produce a comprehensive transcript dataset for testis of E. sinensis. In two runs, we produced 25,698,778 sequencing reads corresponding with 2.31 Gb total nucleotides. These reads were assembled into 342,753 contigs or 141,861 scaffold sequences, which identified 96,311 unigenes. Based on similarity searches with known proteins, 39,995 unigenes were annotated based on having a Blast hit in the non-redundant database or ESTscan results with a cut-off E-value above 10(-5. This is the first report of a mitten crab transcriptome using high-throughput sequencing technology, and all these testes transcripts can help us understand the molecular mechanisms involved in spermatogenesis and testis maturation.

  6. Evaluating Methods for Isolating Total RNA and Predicting the Success of Sequencing Phylogenetically Diverse Plant Transcriptomes

    Science.gov (United States)

    Bruskiewich, Richard; Burris, Jason N.; Carrigan, Charlotte T.; Chase, Mark W.; Clarke, Neil D.; Covshoff, Sarah; dePamphilis, Claude W.; Edger, Patrick P.; Goh, Falicia; Graham, Sean; Greiner, Stephan; Hibberd, Julian M.; Jordon-Thaden, Ingrid; Kutchan, Toni M.; Leebens-Mack, James; Melkonian, Michael; Miles, Nicholas; Myburg, Henrietta; Patterson, Jordan; Pires, J. Chris; Ralph, Paula; Rolf, Megan; Sage, Rowan F.; Soltis, Douglas; Soltis, Pamela; Stevenson, Dennis; Stewart, C. Neal; Surek, Barbara; Thomsen, Christina J. M.; Villarreal, Juan Carlos; Wu, Xiaolei; Zhang, Yong; Deyholos, Michael K.; Wong, Gane Ka-Shu

    2012-01-01

    Next-generation sequencing plays a central role in the characterization and quantification of transcriptomes. Although numerous metrics are purported to quantify the quality of RNA, there have been no large-scale empirical evaluations of the major determinants of sequencing success. We used a combination of existing and newly developed methods to isolate total RNA from 1115 samples from 695 plant species in 324 families, which represents >900 million years of phylogenetic diversity from green algae through flowering plants, including many plants of economic importance. We then sequenced 629 of these samples on Illumina GAIIx and HiSeq platforms and performed a large comparative analysis to identify predictors of RNA quality and the diversity of putative genes (scaffolds) expressed within samples. Tissue types (e.g., leaf vs. flower) varied in RNA quality, sequencing depth and the number of scaffolds. Tissue age also influenced RNA quality but not the number of scaffolds ≥1000 bp. Overall, 36% of the variation in the number of scaffolds was explained by metrics of RNA integrity (RIN score), RNA purity (OD 260/230), sequencing platform (GAIIx vs HiSeq) and the amount of total RNA used for sequencing. However, our results show that the most commonly used measures of RNA quality (e.g., RIN) are weak predictors of the number of scaffolds because Illumina sequencing is robust to variation in RNA quality. These results provide novel insight into the methods that are most important in isolating high quality RNA for sequencing and assembling plant transcriptomes. The methods and recommendations provided here could increase the efficiency and decrease the cost of RNA sequencing for individual labs and genome centers. PMID:23185583

  7. Sequencing and de novo transcriptome assembly of the Chinese giant salamander (Andrias davidianus

    Directory of Open Access Journals (Sweden)

    Yong Huang

    2017-06-01

    Full Text Available Next-generation technologies for determination of genomics and transcriptomics composition have a wide range of applications. Andrias davidianus, has become an endangered amphibian species of salamander endemic in China. However, there is a lack of the molecular information. In this study, we obtained the RNA-Seq data from a pool of A. davidianus tissue including spleen, liver, muscle, kidney, skin, testis, gut and heart using Illumina HiSeq 2500 platform. A total of 15,398,997,600 bp were obtained, corresponding to 102,659,984 raw reads. A total of 102,659,984 reads were filtered after removing low-quality reads and trimming the adapter sequences. The Trinity program was used to de novo assemble 132,912 unigenes with an average length of 690 bp and N50 of 1263 bp. Unigenes were annotated through number of databases. These transcriptomic data of A. davidianus should open the door to molecular evolution studies based on the entire transcriptome or targeted genes of interest to sequence. The raw data in this study can be available in NCBI SRA database with accession number of SRP099564.

  8. The Pseudomonas aeruginosa transcriptome in planktonic cultures and static biofilms using RNA sequencing.

    Directory of Open Access Journals (Sweden)

    Andreas Dötsch

    Full Text Available In this study, we evaluated how gene expression differs in mature Pseudomonas aeruginosa biofilms as opposed to planktonic cells by the use of RNA sequencing technology that gives rise to both quantitative and qualitative information on the transcriptome. Although a large proportion of genes were consistently regulated in both the stationary phase and biofilm cultures as opposed to the late exponential growth phase cultures, the global biofilm gene expression pattern was clearly distinct indicating that biofilms are not just surface attached cells in stationary phase. A large amount of the genes found to be biofilm specific were involved in adaptation to microaerophilic growth conditions, repression of type three secretion and production of extracellular matrix components. Additionally, we found many small RNAs to be differentially regulated most of them similarly in stationary phase cultures and biofilms. A qualitative analysis of the RNA-seq data revealed more than 3000 putative transcriptional start sites (TSS. By the use of rapid amplification of cDNA ends (5'-RACE we confirmed the presence of three different TSS associated with the pqsABCDE operon, two in the promoter of pqsA and one upstream of the second gene, pqsB. Taken together, this study reports the first transcriptome study on P. aeruginosa that employs RNA sequencing technology and provides insights into the quantitative and qualitative transcriptome including the expression of small RNAs in P. aeruginosa biofilms.

  9. Transcriptome sequencing of two phenotypic mosaic Eucalyptus trees reveals large scale transcriptome re-modelling.

    Directory of Open Access Journals (Sweden)

    Amanda Padovan

    Full Text Available Phenotypic mosaic trees offer an ideal system for studying differential gene expression. We have investigated two mosaic eucalypt trees from two closely related species (Eucalyptus melliodora and E. sideroxylon, which each support two types of leaves: one part of the canopy is resistant to insect herbivory and the remaining leaves are susceptible. Driving this ecological distinction are differences in plant secondary metabolites. We used these phenotypic mosaics to investigate genome wide patterns of foliar gene expression with the aim of identifying patterns of differential gene expression and the somatic mutation(s that lead to this phenotypic mosaicism. We sequenced the mRNA pool from leaves of the resistant and susceptible ecotypes from both mosaic eucalypts using the Illumina HiSeq 2000 platform. We found large differences in pathway regulation and gene expression between the ecotypes of each mosaic. The expression of the genes in the MVA and MEP pathways is reflected by variation in leaf chemistry, however this is not the case for the terpene synthases. Apart from the terpene biosynthetic pathway, there are several other metabolic pathways that are differentially regulated between the two ecotypes, suggesting there is much more phenotypic diversity than has been described. Despite the close relationship between the two species, they show large differences in the global patterns of gene and pathway regulation.

  10. RNA sequencing of the exercise transcriptome in equine athletes.

    Directory of Open Access Journals (Sweden)

    Stefano Capomaccio

    Full Text Available The horse is an optimal model organism for studying the genomic response to exercise-induced stress, due to its natural aptitude for athletic performance and the relative homogeneity of its genetic and environmental backgrounds. Here, we applied RNA-sequencing analysis through the use of SOLiD technology in an experimental framework centered on exercise-induced stress during endurance races in equine athletes. We monitored the transcriptional landscape by comparing gene expression levels between animals at rest and after competition. Overall, we observed a shift from coding to non-coding regions, suggesting that the stress response involves the differential expression of not annotated regions. Notably, we observed significant post-race increases of reads that correspond to repeats, especially the intergenic and intronic L1 and L2 transposable elements. We also observed increased expression of the antisense strands compared to the sense strands in intronic and regulatory regions (1 kb up- and downstream of the genes, suggesting that antisense transcription could be one of the main mechanisms for transposon regulation in the horse under stress conditions. We identified a large number of transcripts corresponding to intergenic and intronic regions putatively associated with new transcriptional elements. Gene expression and pathway analysis allowed us to identify several biological processes and molecular functions that may be involved with exercise-induced stress. Ontology clustering reflected mechanisms that are already known to be stress activated (e.g., chemokine-type cytokines, Toll-like receptors, and kinases, as well as "nucleic acid binding" and "signal transduction activity" functions. There was also a general and transient decrease in the global rates of protein synthesis, which would be expected after strenuous global stress. In sum, our network analysis points toward the involvement of specific gene clusters in equine exercise

  11. Sequencing and Characterization of Divergent Marbling Levels in the Beef Cattle ( Muscle Transcriptome

    Directory of Open Access Journals (Sweden)

    Dong Chen

    2015-02-01

    Full Text Available Marbling is an important trait regarding the quality of beef. Analysis of beef cattle transcriptome and its expression profile data are essential to extend the genetic information resources and would support further studies on beef cattle. RNA sequencing was performed in beef cattle using the Illumina High-Seq2000 platform. Approximately 251.58 million clean reads were generated from a high marbling (H group and low marbling (L group. Approximately 80.12% of the 19,994 bovine genes (protein coding were detected in all samples, and 749 genes exhibited differential expression between the H and L groups based on fold change (>1.5-fold, p<0.05. Multiple gene ontology terms and biological pathways were found significantly enriched among the differentially expressed genes. The transcriptome data will facilitate future functional studies on marbling formation in beef cattle and may be applied to improve breeding programs for cattle and closely related mammals.

  12. Deep Sequencing Reveals Uncharted Isoform Heterogeneity of the Protein-Coding Transcriptome in Cerebral Ischemia.

    Science.gov (United States)

    Bhattarai, Sunil; Aly, Ahmed; Garcia, Kristy; Ruiz, Diandra; Pontarelli, Fabrizio; Dharap, Ashutosh

    2018-06-03

    Gene expression in cerebral ischemia has been a subject of intense investigations for several years. Studies utilizing probe-based high-throughput methodologies such as microarrays have contributed significantly to our existing knowledge but lacked the capacity to dissect the transcriptome in detail. Genome-wide RNA-sequencing (RNA-seq) enables comprehensive examinations of transcriptomes for attributes such as strandedness, alternative splicing, alternative transcription start/stop sites, and sequence composition, thus providing a very detailed account of gene expression. Leveraging this capability, we conducted an in-depth, genome-wide evaluation of the protein-coding transcriptome of the adult mouse cortex after transient focal ischemia at 6, 12, or 24 h of reperfusion using RNA-seq. We identified a total of 1007 transcripts at 6 h, 1878 transcripts at 12 h, and 1618 transcripts at 24 h of reperfusion that were significantly altered as compared to sham controls. With isoform-level resolution, we identified 23 splice variants arising from 23 genes that were novel mRNA isoforms. For a subset of genes, we detected reperfusion time-point-dependent splice isoform switching, indicating an expression and/or functional switch for these genes. Finally, for 286 genes across all three reperfusion time-points, we discovered multiple, distinct, simultaneously expressed and differentially altered isoforms per gene that were generated via alternative transcription start/stop sites. Of these, 165 isoforms derived from 109 genes were novel mRNAs. Together, our data unravel the protein-coding transcriptome of the cerebral cortex at an unprecedented depth to provide several new insights into the flexibility and complexity of stroke-related gene transcription and transcript organization.

  13. Development of genic-SSR markers by deep transcriptome sequencing in pigeonpea [Cajanus cajan (L.) Millspaugh

    Science.gov (United States)

    2011-01-01

    Background Pigeonpea [Cajanus cajan (L.) Millspaugh], one of the most important food legumes of semi-arid tropical and subtropical regions, has limited genomic resources, particularly expressed sequence based (genic) markers. We report a comprehensive set of validated genic simple sequence repeat (SSR) markers using deep transcriptome sequencing, and its application in genetic diversity analysis and mapping. Results In this study, 43,324 transcriptome shotgun assembly unigene contigs were assembled from 1.696 million 454 GS-FLX sequence reads of separate pooled cDNA libraries prepared from leaf, root, stem and immature seed of two pigeonpea varieties, Asha and UPAS 120. A total of 3,771 genic-SSR loci, excluding homopolymeric and compound repeats, were identified; of which 2,877 PCR primer pairs were designed for marker development. Dinucleotide was the most common repeat motif with a frequency of 60.41%, followed by tri- (34.52%), hexa- (2.62%), tetra- (1.67%) and pentanucleotide (0.76%) repeat motifs. Primers were synthesized and tested for 772 of these loci with repeat lengths of ≥18 bp. Of these, 550 markers were validated for consistent amplification in eight diverse pigeonpea varieties; 71 were found to be polymorphic on agarose gel electrophoresis. Genetic diversity analysis was done on 22 pigeonpea varieties and eight wild species using 20 highly polymorphic genic-SSR markers. The number of alleles at these loci ranged from 4-10 and the polymorphism information content values ranged from 0.46 to 0.72. Neighbor-joining dendrogram showed distinct separation of the different groups of pigeonpea cultivars and wild species. Deep transcriptome sequencing of the two parental lines helped in silico identification of polymorphic genic-SSR loci to facilitate the rapid development of an intra-species reference genetic map, a subset of which was validated for expected allelic segregation in the reference mapping population. Conclusion We developed 550 validated genic

  14. Deep RNA sequencing of the skeletal muscle transcriptome in swimming fish.

    Directory of Open Access Journals (Sweden)

    Arjan P Palstra

    Full Text Available Deep RNA sequencing (RNA-seq was performed to provide an in-depth view of the transcriptome of red and white skeletal muscle of exercised and non-exercised rainbow trout (Oncorhynchus mykiss with the specific objective to identify expressed genes and quantify the transcriptomic effects of swimming-induced exercise. Pubertal autumn-spawning seawater-raised female rainbow trout were rested (n = 10 or swum (n = 10 for 1176 km at 0.75 body-lengths per second in a 6,000-L swim-flume under reproductive conditions for 40 days. Red and white muscle RNA of exercised and non-exercised fish (4 lanes was sequenced and resulted in 15-17 million reads per lane that, after de novo assembly, yielded 149,159 red and 118,572 white muscle contigs. Most contigs were annotated using an iterative homology search strategy against salmonid ESTs, the zebrafish Danio rerio genome and general Metazoan genes. When selecting for large contigs (>500 nucleotides, a number of novel rainbow trout gene sequences were identified in this study: 1,085 and 1,228 novel gene sequences for red and white muscle, respectively, which included a number of important molecules for skeletal muscle function. Transcriptomic analysis revealed that sustained swimming increased transcriptional activity in skeletal muscle and specifically an up-regulation of genes involved in muscle growth and developmental processes in white muscle. The unique collection of transcripts will contribute to our understanding of red and white muscle physiology, specifically during the long-term reproductive migration of salmonids.

  15. Sequencing and Characterization of the Invasive Sycamore Lace Bug Corythucha ciliata (Hemiptera: Tingidae) Transcriptome

    Science.gov (United States)

    Qu, Cheng; Fu, Ningning; Xu, Yihua

    2016-01-01

    The sycamore lace bug, Corythucha ciliata (Hemiptera: Tingidae), is an invasive forestry pest rapidly expanding in many countries. This pest poses a considerable threat to the urban forestry ecosystem, especially to Platanus spp. However, its molecular biology and biochemistry are poorly understood. This study reports the first C. ciliata transcriptome, encompassing three different life stages (Nymphs, adults female (AF) and adults male (AM)). In total, 26.53 GB of clean data and 60,879 unigenes were obtained from three RNA-seq libraries. These unigenes were annotated and classified by Nr (NCBI non-redundant protein sequences), Nt (NCBI non-redundant nucleotide sequences), Pfam (Protein family), KOG/COG (Clusters of Orthologous Groups of proteins), Swiss-Prot (A manually annotated and reviewed protein sequence database), and KO (KEGG Ortholog database). After all pairwise comparisons between these three different samples, a large number of differentially expressed genes were revealed. The dramatic differences in global gene expression profiles were found between distinct life stages (nymphs and AF, nymphs and AM) and sex difference (AF and AM), with some of the significantly differentially expressed genes (DEGs) being related to metamorphosis, digestion, immune and sex difference. The different express of unigenes were validated through quantitative Real-Time PCR (qRT-PCR) for 16 randomly selected unigenes. In addition, 17,462 potential simple sequence repeat molecular markers were identified in these transcriptome resources. These comprehensive C. ciliata transcriptomic information can be utilized to promote the development of environmentally friendly methodologies to disrupt the processes of metamorphosis, digestion, immune and sex differences. PMID:27494615

  16. Analysis of Litopenaeus vannamei transcriptome using the next-generation DNA sequencing technique.

    Directory of Open Access Journals (Sweden)

    Chaozheng Li

    Full Text Available BACKGROUND: Pacific white shrimp (Litopenaeus vannamei, the major species of farmed shrimps in the world, has been attracting extensive studies, which require more and more genome background knowledge. The now available transcriptome data of L. vannamei are insufficient for research requirements, and have not been adequately assembled and annotated. METHODOLOGY/PRINCIPAL FINDINGS: This is the first study that used a next-generation high-throughput DNA sequencing technique, the Solexa/Illumina GA II method, to analyze the transcriptome from whole bodies of L. vannamei larvae. More than 2.4 Gb of raw data were generated, and 109,169 unigenes with a mean length of 396 bp were assembled using the SOAP denovo software. 73,505 unigenes (>200 bp with good quality sequences were selected and subjected to annotation analysis, among which 37.80% can be matched in NCBI Nr database, 37.3% matched in Swissprot, and 44.1% matched in TrEMBL. Using BLAST and BLAST2Go softwares, 11,153 unigenes were classified into 25 Clusters of Orthologous Groups of proteins (COG categories, 8171 unigenes were assigned into 51 Gene ontology (GO functional groups, and 18,154 unigenes were divided into 220 Kyoto Encyclopedia of Genes and Genomes (KEGG pathways. To primarily verify part of the results of assembly and annotations, 12 assembled unigenes that are homologous to many embryo development-related genes were chosen and subjected to RT-PCR for electrophoresis and Sanger sequencing analyses, and to real-time PCR for expression profile analyses during embryo development. CONCLUSIONS/SIGNIFICANCE: The L. vannamei transcriptome analyzed using the next-generation sequencing technique enriches the information of L. vannamei genes, which will facilitate our understanding of the genome background of crustaceans, and promote the studies on L. vannamei.

  17. Alignment-free Transcriptomic and Metatranscriptomic Comparison Using Sequencing Signatures with Variable Length Markov Chains.

    Science.gov (United States)

    Liao, Weinan; Ren, Jie; Wang, Kun; Wang, Shun; Zeng, Feng; Wang, Ying; Sun, Fengzhu

    2016-11-23

    The comparison between microbial sequencing data is critical to understand the dynamics of microbial communities. The alignment-based tools analyzing metagenomic datasets require reference sequences and read alignments. The available alignment-free dissimilarity approaches model the background sequences with Fixed Order Markov Chain (FOMC) yielding promising results for the comparison of microbial communities. However, in FOMC, the number of parameters grows exponentially with the increase of the order of Markov Chain (MC). Under a fixed high order of MC, the parameters might not be accurately estimated owing to the limitation of sequencing depth. In our study, we investigate an alternative to FOMC to model background sequences with the data-driven Variable Length Markov Chain (VLMC) in metatranscriptomic data. The VLMC originally designed for long sequences was extended to apply to high-throughput sequencing reads and the strategies to estimate the corresponding parameters were developed. The flexible number of parameters in VLMC avoids estimating the vast number of parameters of high-order MC under limited sequencing depth. Different from the manual selection in FOMC, VLMC determines the MC order adaptively. Several beta diversity measures based on VLMC were applied to compare the bacterial RNA-Seq and metatranscriptomic datasets. Experiments show that VLMC outperforms FOMC to model the background sequences in transcriptomic and metatranscriptomic samples. A software pipeline is available at https://d2vlmc.codeplex.com.

  18. Transcriptome characterization of the South African abalone Haliotis midae using sequencing-by-synthesis

    Directory of Open Access Journals (Sweden)

    Roodt-Wilding Rouvay

    2011-03-01

    Full Text Available Abstract Background Worldwide, the genus Haliotis is represented by 56 extant species and several of these are commercially cultured. Among the six abalone species found in South Africa, Haliotis midae is the only aquacultured species. Despite its economic importance, genomic sequence resources for H. midae, and for abalone in general, are still scarce. Next generation sequencing technologies provide a fast and efficient tool to generate large sequence collections that can be used to characterize the transcriptome and identify expressed genes associated with economically important traits like growth and disease resistance. Results More than 25 million short reads generated by the Illumina Genome Analyzer were de novo assembled in 22,761 contigs with an average size of 260 bp. With a stringent E-value threshold of 10-10, 3,841 contigs (16.8% had a BLAST homologous match against the Genbank non-redundant (NR protein database. Most of these sequences were annotated using the gene ontology (GO and eukaryotic orthologous groups of proteins (KOG databases and assigned to various functional categories. According to annotation results, many gene families involved in immune response were identified. Thousands of simple sequence repeats (SSR and single nucleotide polymorphisms (SNP were detected. Setting stringent parameters to ensure a high probability of amplification, 420 primer pairs in 181 contigs containing SSR loci were designed. Conclusion This data represents the most comprehensive genomic resource for the South African abalone H. midae to date. The amount of assembled sequences demonstrated the utility of the Illumina sequencing technology in the transcriptome characterization of a non-model species. It allowed the development of several markers and the identification of promising candidate genes for future studies on population and functional genomics in H. midae and in other abalone species.

  19. Comparative Transcriptomes and EVO-DEVO Studies Depending on Next Generation Sequencing

    Directory of Open Access Journals (Sweden)

    Tiancheng Liu

    2015-01-01

    Full Text Available High throughput technology has prompted the progressive omics studies, including genomics and transcriptomics. We have reviewed the improvement of comparative omic studies, which are attributed to the high throughput measurement of next generation sequencing technology. Comparative genomics have been successfully applied to evolution analysis while comparative transcriptomics are adopted in comparison of expression profile from two subjects by differential expression or differential coexpression, which enables their application in evolutionary developmental biology (EVO-DEVO studies. EVO-DEVO studies focus on the evolutionary pressure affecting the morphogenesis of development and previous works have been conducted to illustrate the most conserved stages during embryonic development. Old measurements of these studies are based on the morphological similarity from macro view and new technology enables the micro detection of similarity in molecular mechanism. Evolutionary model of embryo development, which includes the “funnel-like” model and the “hourglass” model, has been evaluated by combination of these new comparative transcriptomic methods with prior comparative genomic information. Although the technology has promoted the EVO-DEVO studies into a new era, technological and material limitation still exist and further investigations require more subtle study design and procedure.

  20. Deep Sequencing of Porphyromonas gingivalis and comparative transcriptome analysis of a LuxS mutant

    Directory of Open Access Journals (Sweden)

    Takanoi eHirano

    2012-06-01

    Full Text Available Porphyromonas gingivalis is a major etiological agent and chronic and aggressive forms of periodontal disease. The organism is an assacharolytic anaerobe and is a constituent of mixed species biofilms in a variety of microenvironments in the oral cavity. P. gingivalis expresses a range of virulence factors over which it exerts tight control. High-throughput sequencing technologies provide the opportunity to relate functional genomics to basic biology. In this study we report qualitative and quantitative RNA-Seq analysis of the transcriptome of P. gingivalis. We have also applied RNA-Seq to the transcriptome of a ΔluxS mutant of P. gingivalis deficient in AI-2-mediated bacterial communication. The transcriptome analysis confirmed the expression of all predicted ORFs for strain ATCC 33277, including 854 hypothetical proteins, and allowed the identification of hitherto unknown transcriptional units. Twelve noncoding RNAs were identified, including 11 small RNAs and one cobalamine riboswitch. Fifty seven genes were differentially regulated in the LuxS mutant. Addition of exogenous synthetic 4,5-dihydroxy-2,3-pentanedione (DPD, AI-2 precursor to the ΔluxS mutant culture complemented expression of a subset of genes, indicating that LuxS is involved in both AI-2 signaling and non-signaling dependent systems in P. gingivalis. This work provides an important dataset for future study of P. gingivalis pathophysiology and further defines the LuxS regulon in this oral pathogen.

  1. Deep transcriptome sequencing provides new insights into the structural and functional organization of the wheat genome.

    Science.gov (United States)

    Pingault, Lise; Choulet, Frédéric; Alberti, Adriana; Glover, Natasha; Wincker, Patrick; Feuillet, Catherine; Paux, Etienne

    2015-02-10

    Because of its size, allohexaploid nature, and high repeat content, the bread wheat genome is a good model to study the impact of the genome structure on gene organization, function, and regulation. However, because of the lack of a reference genome sequence, such studies have long been hampered and our knowledge of the wheat gene space is still limited. The access to the reference sequence of the wheat chromosome 3B provided us with an opportunity to study the wheat transcriptome and its relationships to genome and gene structure at a level that has never been reached before. By combining this sequence with RNA-seq data, we construct a fine transcriptome map of the chromosome 3B. More than 8,800 transcription sites are identified, that are distributed throughout the entire chromosome. Expression level, expression breadth, alternative splicing as well as several structural features of genes, including transcript length, number of exons, and cumulative intron length are investigated. Our analysis reveals a non-monotonic relationship between gene expression and structure and leads to the hypothesis that gene structure is determined by its function, whereas gene expression is subject to energetic cost. Moreover, we observe a recombination-based partitioning at the gene structure and function level. Our analysis provides new insights into the relationships between gene and genome structure and function. It reveals mechanisms conserved with other plant species as well as superimposed evolutionary forces that shaped the wheat gene space, likely participating in wheat adaptation.

  2. Transcriptome sequencing and whole genome expression profiling of chrysanthemum under dehydration stress

    Science.gov (United States)

    2013-01-01

    Background Chrysanthemum is one of the most important ornamental crops in the world and drought stress seriously limits its production and distribution. In order to generate a functional genomics resource and obtain a deeper understanding of the molecular mechanisms regarding chrysanthemum responses to dehydration stress, we performed large-scale transcriptome sequencing of chrysanthemum plants under dehydration stress using the Illumina sequencing technology. Results Two cDNA libraries constructed from mRNAs of control and dehydration-treated seedlings were sequenced by Illumina technology. A total of more than 100 million reads were generated and de novo assembled into 98,180 unique transcripts which were further extensively annotated by comparing their sequencing to different protein databases. Biochemical pathways were predicted from these transcript sequences. Furthermore, we performed gene expression profiling analysis upon dehydration treatment in chrysanthemum and identified 8,558 dehydration-responsive unique transcripts, including 307 transcription factors and 229 protein kinases and many well-known stress responsive genes. Gene ontology (GO) term enrichment and biochemical pathway analyses showed that dehydration stress caused changes in hormone response, secondary and amino acid metabolism, and light and photoperiod response. These findings suggest that drought tolerance of chrysanthemum plants may be related to the regulation of hormone biosynthesis and signaling, reduction of oxidative damage, stabilization of cell proteins and structures, and maintenance of energy and carbon supply. Conclusions Our transcriptome sequences can provide a valuable resource for chrysanthemum breeding and research and novel insights into chrysanthemum responses to dehydration stress and offer candidate genes or markers that can be used to guide future studies attempting to breed drought tolerant chrysanthemum cultivars. PMID:24074255

  3. Integrated analysis of whole-exome sequencing and transcriptome profiling in males with autism spectrum disorders.

    Science.gov (United States)

    Codina-Solà, Marta; Rodríguez-Santiago, Benjamín; Homs, Aïda; Santoyo, Javier; Rigau, Maria; Aznar-Laín, Gemma; Del Campo, Miguel; Gener, Blanca; Gabau, Elisabeth; Botella, María Pilar; Gutiérrez-Arumí, Armand; Antiñolo, Guillermo; Pérez-Jurado, Luis Alberto; Cuscó, Ivon

    2015-01-01

    Autism spectrum disorders (ASD) are a group of neurodevelopmental disorders with high heritability. Recent findings support a highly heterogeneous and complex genetic etiology including rare de novo and inherited mutations or chromosomal rearrangements as well as double or multiple hits. We performed whole-exome sequencing (WES) and blood cell transcriptome by RNAseq in a subset of male patients with idiopathic ASD (n = 36) in order to identify causative genes, transcriptomic alterations, and susceptibility variants. We detected likely monogenic causes in seven cases: five de novo (SCN2A, MED13L, KCNV1, CUL3, and PTEN) and two inherited X-linked variants (MAOA and CDKL5). Transcriptomic analyses allowed the identification of intronic causative mutations missed by the usual filtering of WES and revealed functional consequences of some rare mutations. These included aberrant transcripts (PTEN, POLR3C), deregulated expression in 1.7% of mutated genes (that is, SEMA6B, MECP2, ANK3, CREBBP), allele-specific expression (FUS, MTOR, TAF1C), and non-sense-mediated decay (RIT1, ALG9). The analysis of rare inherited variants showed enrichment in relevant pathways such as the PI3K-Akt signaling and the axon guidance. Integrative analysis of WES and blood RNAseq data has proven to be an efficient strategy to identify likely monogenic forms of ASD (19% in our cohort), as well as additional rare inherited mutations that can contribute to ASD risk in a multifactorial manner. Blood transcriptomic data, besides validating 88% of expressed variants, allowed the identification of missed intronic mutations and revealed functional correlations of genetic variants, including changes in splicing, expression levels, and allelic expression.

  4. The use of coded PCR primers enables high-throughput sequencing of multiple homolog amplification products by 454 parallel sequencing

    DEFF Research Database (Denmark)

    Binladen, Jonas; Gilbert, M Thomas P; Bollback, Jonathan P

    2007-01-01

    BACKGROUND: The invention of the Genome Sequence 20 DNA Sequencing System (454 parallel sequencing platform) has enabled the rapid and high-volume production of sequence data. Until now, however, individual emulsion PCR (emPCR) reactions and subsequent sequencing runs have been unable to combine...... primers that is dependent on the 5' nucleotide of the tag. In particular, primers 5' labelled with a cytosine are heavily overrepresented among the final sequences, while those 5' labelled with a thymine are strongly underrepresented. A weaker bias also exists with regards to the distribution...

  5. A programmable method for massively parallel targeted sequencing

    Science.gov (United States)

    Hopmans, Erik S.; Natsoulis, Georges; Bell, John M.; Grimes, Susan M.; Sieh, Weiva; Ji, Hanlee P.

    2014-01-01

    We have developed a targeted resequencing approach referred to as Oligonucleotide-Selective Sequencing. In this study, we report a series of significant improvements and novel applications of this method whereby the surface of a sequencing flow cell is modified in situ to capture specific genomic regions of interest from a sample and then sequenced. These improvements include a fully automated targeted sequencing platform through the use of a standard Illumina cBot fluidics station. Targeting optimization increased the yield of total on-target sequencing data 2-fold compared to the previous iteration, while simultaneously increasing the percentage of reads that could be mapped to the human genome. The described assays cover up to 1421 genes with a total coverage of 5.5 Megabases (Mb). We demonstrate a 10-fold abundance uniformity of greater than 90% in 1 log distance from the median and a targeting rate of up to 95%. We also sequenced continuous genomic loci up to 1.5 Mb while simultaneously genotyping SNPs and genes. Variants with low minor allele fraction were sensitively detected at levels of 5%. Finally, we determined the exact breakpoint sequence of cancer rearrangements. Overall, this approach has high performance for selective sequencing of genome targets, configuration flexibility and variant calling accuracy. PMID:24782526

  6. Prevalence of single nucleotide polymorphism among 27 diverse alfalfa genotypes as assessed by transcriptome sequencing

    Directory of Open Access Journals (Sweden)

    Li Xuehui

    2012-10-01

    Full Text Available Abstract Background Alfalfa, a perennial, outcrossing species, is a widely planted forage legume producing highly nutritious biomass. Currently, improvement of cultivated alfalfa mainly relies on recurrent phenotypic selection. Marker assisted breeding strategies can enhance alfalfa improvement efforts, particularly if many genome-wide markers are available. Transcriptome sequencing enables efficient high-throughput discovery of single nucleotide polymorphism (SNP markers for a complex polyploid species. Result The transcriptomes of 27 alfalfa genotypes, including elite breeding genotypes, parents of mapping populations, and unimproved wild genotypes, were sequenced using an Illumina Genome Analyzer IIx. De novo assembly of quality-filtered 72-bp reads generated 25,183 contigs with a total length of 26.8 Mbp and an average length of 1,065 bp, with an average read depth of 55.9-fold for each genotype. Overall, 21,954 (87.2% of the 25,183 contigs represented 14,878 unique protein accessions. Gene ontology (GO analysis suggested that a broad diversity of genes was represented in the resulting sequences. The realignment of individual reads to the contigs enabled the detection of 872,384 SNPs and 31,760 InDels. High resolution melting (HRM analysis was used to validate 91% of 192 putative SNPs identified by sequencing. Both allelic variants at about 95% of SNP sites identified among five wild, unimproved genotypes are still present in cultivated alfalfa, and all four US breeding programs also contain a high proportion of these SNPs. Thus, little evidence exists among this dataset for loss of significant DNA sequence diversity from either domestication or breeding of alfalfa. Structure analysis indicated that individuals from the subspecies falcata, the diploid subspecies caerulea, and the tetraploid subspecies sativa (cultivated tetraploid alfalfa were clearly separated. Conclusion We used transcriptome sequencing to discover large numbers of SNPs

  7. Sequence protein identification by randomized sequence database and transcriptome mass spectrometry (SPIDER-TMS): from manual to automatic application of a 'de novo sequencing' approach.

    Science.gov (United States)

    Pascale, Raffaella; Grossi, Gerarda; Cruciani, Gabriele; Mecca, Giansalvatore; Santoro, Donatello; Sarli Calace, Renzo; Falabella, Patrizia; Bianco, Giuliana

    Sequence protein identification by a randomized sequence database and transcriptome mass spectrometry software package has been developed at the University of Basilicata in Potenza (Italy) and designed to facilitate the determination of the amino acid sequence of a peptide as well as an unequivocal identification of proteins in a high-throughput manner with enormous advantages of time, economical resource and expertise. The software package is a valid tool for the automation of a de novo sequencing approach, overcoming the main limits and a versatile platform useful in the proteomic field for an unequivocal identification of proteins, starting from tandem mass spectrometry data. The strength of this software is that it is a user-friendly and non-statistical approach, so protein identification can be considered unambiguous.

  8. De novo transcriptome sequencing and analysis of the cereal cyst nematode, Heterodera avenae.

    Directory of Open Access Journals (Sweden)

    Mukesh Kumar

    Full Text Available The cereal cyst nematode (CCN, Heterodera avenae is a major pest of wheat (Triticum spp that reduces crop yields in many countries. Cyst nematodes are obligate sedentary endoparasites that reproduce by amphimixis. Here, we report the first transcriptome analysis of two stages of H. avenae. After sequencing extracted RNA from pre parasitic infective juvenile and adult stages of the life cycle, 131 million Illumina high quality paired end reads were obtained which generated 27,765 contigs with N50 of 1,028 base pairs, of which 10,452 were annotated. Comparative analyses were undertaken to evaluate H. avenae sequences with those of other plant, animal and free living nematodes to identify differences in expressed genes. There were 4,431 transcripts common to H. avenae and the free living nematode Caenorhabditis elegans, and 9,462 in common with more closely related potato cyst nematode, Globodera pallida. Annotation of H. avenae carbohydrate active enzymes (CAZy revealed fewer glycoside hydrolases (GHs but more glycosyl transferases (GTs and carbohydrate esterases (CEs when compared to M. incognita. 1,280 transcripts were found to have secretory signature, presence of signal peptide and absence of transmembrane. In a comparison of genes expressed in the pre-parasitic juvenile and feeding female stages, expression levels of 30 genes with high RPKM (reads per base per kilo million value, were analysed by qRT-PCR which confirmed the observed differences in their levels of expression levels. In addition, we have also developed a user-friendly resource, Heterodera transcriptome database (HATdb for public access of the data generated in this study. The new data provided on the transcriptome of H. avenae adds to the genetic resources available to study plant parasitic nematodes and provides an opportunity to seek new effectors that are specifically involved in the H. avenae-cereal host interaction.

  9. Analyzing AbrB-Knockout Effects through Genome and Transcriptome Sequencing of Bacillus licheniformis DW2

    Science.gov (United States)

    Shu, Cheng-Cheng; Wang, Dong; Guo, Jing; Song, Jia-Ming; Chen, Shou-Wen; Chen, Ling-Ling; Gao, Jun-Xiang

    2018-01-01

    As an industrial bacterium, Bacillus licheniformis DW2 produces bacitracin which is an important antibiotic for many pathogenic microorganisms. Our previous study showed AbrB-knockout could significantly increase the production of bacitracin. Accordingly, it was meaningful to understand its genome features, expression differences between wild and AbrB-knockout (ΔAbrB) strains, and the regulation of bacitracin biosynthesis. Here, we sequenced, de novo assembled and annotated its genome, and also sequenced the transcriptomes in three growth phases. The genome of DW2 contained a DNA molecule of 4,468,952 bp with 45.93% GC content and 4,717 protein coding genes. The transcriptome reads were mapped to the assembled genome, and obtained 4,102∼4,536 expressed genes from different samples. We investigated transcription changes in B. licheniformis DW2 and showed that ΔAbrB caused hundreds of genes up-regulation and down-regulation in different growth phases. We identified a complete bacitracin synthetase gene cluster, including the location and length of bacABC, bcrABC, and bacT, as well as their arrangement. The gene cluster bcrABC were significantly up-regulated in ΔAbrB strain, which supported the hypothesis in previous study of bcrABC transporting bacitracin out of the cell to avoid self-intoxication, and was consistent with the previous experimental result that ΔAbrB could yield more bacitracin. This study provided a high quality reference genome for B. licheniformis DW2, and the transcriptome data depicted global alterations across two strains and three phases offered an understanding of AbrB regulation and bacitracin biosynthesis through gene expression. PMID:29599755

  10. De novo transcriptome sequencing and analysis of the cereal cyst nematode, Heterodera avenae.

    Science.gov (United States)

    Kumar, Mukesh; Gantasala, Nagavara Prasad; Roychowdhury, Tanmoy; Thakur, Prasoon Kumar; Banakar, Prakash; Shukla, Rohit N; Jones, Michael G K; Rao, Uma

    2014-01-01

    The cereal cyst nematode (CCN, Heterodera avenae) is a major pest of wheat (Triticum spp) that reduces crop yields in many countries. Cyst nematodes are obligate sedentary endoparasites that reproduce by amphimixis. Here, we report the first transcriptome analysis of two stages of H. avenae. After sequencing extracted RNA from pre parasitic infective juvenile and adult stages of the life cycle, 131 million Illumina high quality paired end reads were obtained which generated 27,765 contigs with N50 of 1,028 base pairs, of which 10,452 were annotated. Comparative analyses were undertaken to evaluate H. avenae sequences with those of other plant, animal and free living nematodes to identify differences in expressed genes. There were 4,431 transcripts common to H. avenae and the free living nematode Caenorhabditis elegans, and 9,462 in common with more closely related potato cyst nematode, Globodera pallida. Annotation of H. avenae carbohydrate active enzymes (CAZy) revealed fewer glycoside hydrolases (GHs) but more glycosyl transferases (GTs) and carbohydrate esterases (CEs) when compared to M. incognita. 1,280 transcripts were found to have secretory signature, presence of signal peptide and absence of transmembrane. In a comparison of genes expressed in the pre-parasitic juvenile and feeding female stages, expression levels of 30 genes with high RPKM (reads per base per kilo million) value, were analysed by qRT-PCR which confirmed the observed differences in their levels of expression levels. In addition, we have also developed a user-friendly resource, Heterodera transcriptome database (HATdb) for public access of the data generated in this study. The new data provided on the transcriptome of H. avenae adds to the genetic resources available to study plant parasitic nematodes and provides an opportunity to seek new effectors that are specifically involved in the H. avenae-cereal host interaction.

  11. The contribution of 700,000 ORF sequence tags to the definition of the human transcriptome

    Science.gov (United States)

    Camargo, Anamaria A.; Samaia, Helena P. B.; Dias-Neto, Emmanuel; Simão, Daniel F.; Migotto, Italo A.; Briones, Marcelo R. S.; Costa, Fernando F.; Aparecida Nagai, Maria; Verjovski-Almeida, Sergio; Zago, Marco A.; Andrade, Luis Eduardo C.; Carrer, Helaine; El-Dorry, Hamza F. A.; Espreafico, Enilza M.; Habr-Gama, Angelita; Giannella-Neto, Daniel; Goldman, Gustavo H.; Gruber, Arthur; Hackel, Christine; Kimura, Edna T.; Maciel, Rui M. B.; Marie, Suely K. N.; Martins, Elizabeth A. L.; Nóbrega, Marina P.; Paçó-Larson, Maria Luisa; Pardini, Maria Inês M. C.; Pereira, Gonçalo G.; Pesquero, João Bosco; Rodrigues, Vanderlei; Rogatto, Silvia R.; da Silva, Ismael D. C. G.; Sogayar, Mari C.; Sonati, Maria de Fátima; Tajara, Eloiza H.; Valentini, Sandro R.; Alberto, Fernando L.; Amaral, Maria Elisabete J.; Aneas, Ivy; Arnaldi, Liliane A. T.; de Assis, Angela M.; Bengtson, Mário Henrique; Bergamo, Nadia Aparecida; Bombonato, Vanessa; de Camargo, Maria E. R.; Canevari, Renata A.; Carraro, Dirce M.; Cerutti, Janete M.; Corrêa, Maria Lucia C.; Corrêa, Rosana F. R.; Costa, Maria Cristina R.; Curcio, Cyntia; Hokama, Paula O. M.; Ferreira, Ari J. S.; Furuzawa, Gilberto K.; Gushiken, Tsieko; Ho, Paulo L.; Kimura, Elza; Krieger, José E.; Leite, Luciana C. C.; Majumder, Paromita; Marins, Mozart; Marques, Everaldo R.; Melo, Analy S. A.; Melo, Monica; Mestriner, Carlos Alberto; Miracca, Elisabete C.; Miranda, Daniela C.; Nascimento, Ana Lucia T. O.; Nóbrega, Francisco G.; Ojopi, Élida P. B.; Pandolfi, José Rodrigo C.; Pessoa, Luciana G.; Prevedel, Aline C.; Rahal, Paula; Rainho, Claudia A.; Reis, Eduardo M. R.; Ribeiro, Marcelo L.; da Rós, Nancy; de Sá, Renata G.; Sales, Magaly M.; Sant'anna, Simone Cristina; dos Santos, Mariana L.; da Silva, Aline M.; da Silva, Neusa P.; Silva, Wilson A.; da Silveira, Rosana A.; Sousa, Josane F.; Stecconi, Daniella; Tsukumo, Fernando; Valente, Valéria; Soares, Fernando; Moreira, Eloisa S.; Nunes, Diana N.; Correa, Ricardo G.; Zalcberg, Heloisa; Carvalho, Alex F.; Reis, Luis F. L.; Brentani, Ricardo R.; Simpson, Andrew J. G.; de Souza, Sandro J.

    2001-01-01

    Open reading frame expressed sequences tags (ORESTES) differ from conventional ESTs by providing sequence data from the central protein coding portion of transcripts. We generated a total of 696,745 ORESTES sequences from 24 human tissues and used a subset of the data that correspond to a set of 15,095 full-length mRNAs as a means of assessing the efficiency of the strategy and its potential contribution to the definition of the human transcriptome. We estimate that ORESTES sampled over 80% of all highly and moderately expressed, and between 40% and 50% of rarely expressed, human genes. In our most thoroughly sequenced tissue, the breast, the 130,000 ORESTES generated are derived from transcripts from an estimated 70% of all genes expressed in that tissue, with an equally efficient representation of both highly and poorly expressed genes. In this respect, we find that the capacity of the ORESTES strategy both for gene discovery and shotgun transcript sequence generation significantly exceeds that of conventional ESTs. The distribution of ORESTES is such that many human transcripts are now represented by a scaffold of partial sequences distributed along the length of each gene product. The experimental joining of the scaffold components, by reverse transcription–PCR, represents a direct route to transcript finishing that may represent a useful alternative to full-length cDNA cloning. PMID:11593022

  12. The contribution of 700,000 ORF sequence tags to the definition of the human transcriptome.

    Science.gov (United States)

    Camargo, A A; Samaia, H P; Dias-Neto, E; Simão, D F; Migotto, I A; Briones, M R; Costa, F F; Nagai, M A; Verjovski-Almeida, S; Zago, M A; Andrade, L E; Carrer, H; El-Dorry, H F; Espreafico, E M; Habr-Gama, A; Giannella-Neto, D; Goldman, G H; Gruber, A; Hackel, C; Kimura, E T; Maciel, R M; Marie, S K; Martins, E A; Nobrega, M P; Paco-Larson, M L; Pardini, M I; Pereira, G G; Pesquero, J B; Rodrigues, V; Rogatto, S R; da Silva, I D; Sogayar, M C; Sonati, M F; Tajara, E H; Valentini, S R; Alberto, F L; Amaral, M E; Aneas, I; Arnaldi, L A; de Assis, A M; Bengtson, M H; Bergamo, N A; Bombonato, V; de Camargo, M E; Canevari, R A; Carraro, D M; Cerutti, J M; Correa, M L; Correa, R F; Costa, M C; Curcio, C; Hokama, P O; Ferreira, A J; Furuzawa, G K; Gushiken, T; Ho, P L; Kimura, E; Krieger, J E; Leite, L C; Majumder, P; Marins, M; Marques, E R; Melo, A S; Melo, M B; Mestriner, C A; Miracca, E C; Miranda, D C; Nascimento, A L; Nobrega, F G; Ojopi, E P; Pandolfi, J R; Pessoa, L G; Prevedel, A C; Rahal, P; Rainho, C A; Reis, E M; Ribeiro, M L; da Ros, N; de Sa, R G; Sales, M M; Sant'anna, S C; dos Santos, M L; da Silva, A M; da Silva, N P; Silva, W A; da Silveira, R A; Sousa, J F; Stecconi, D; Tsukumo, F; Valente, V; Soares, F; Moreira, E S; Nunes, D N; Correa, R G; Zalcberg, H; Carvalho, A F; Reis, L F; Brentani, R R; Simpson, A J; de Souza, S J; Melo, M

    2001-10-09

    Open reading frame expressed sequences tags (ORESTES) differ from conventional ESTs by providing sequence data from the central protein coding portion of transcripts. We generated a total of 696,745 ORESTES sequences from 24 human tissues and used a subset of the data that correspond to a set of 15,095 full-length mRNAs as a means of assessing the efficiency of the strategy and its potential contribution to the definition of the human transcriptome. We estimate that ORESTES sampled over 80% of all highly and moderately expressed, and between 40% and 50% of rarely expressed, human genes. In our most thoroughly sequenced tissue, the breast, the 130,000 ORESTES generated are derived from transcripts from an estimated 70% of all genes expressed in that tissue, with an equally efficient representation of both highly and poorly expressed genes. In this respect, we find that the capacity of the ORESTES strategy both for gene discovery and shotgun transcript sequence generation significantly exceeds that of conventional ESTs. The distribution of ORESTES is such that many human transcripts are now represented by a scaffold of partial sequences distributed along the length of each gene product. The experimental joining of the scaffold components, by reverse transcription-PCR, represents a direct route to transcript finishing that may represent a useful alternative to full-length cDNA cloning.

  13. Brain transcriptome sequencing and assembly of three songbird model systems for the study of social behavior

    Directory of Open Access Journals (Sweden)

    Christopher N. Balakrishnan

    2014-05-01

    Full Text Available Emberizid sparrows (emberizidae have played a prominent role in the study of avian vocal communication and social behavior. We present here brain transcriptomes for three emberizid model systems, song sparrow Melospiza melodia, white-throated sparrow Zonotrichia albicollis, and Gambel’s white-crowned sparrow Zonotrichia leucophrys gambelii. Each of the assemblies covered fully or in part, over 89% of the previously annotated protein coding genes in the zebra finch Taeniopygia guttata, with 16,846, 15,805, and 16,646 unique BLAST hits in song, white-throated and white-crowned sparrows, respectively. As in previous studies, we find tissue of origin (auditory forebrain versus hypothalamus and whole brain as an important determinant of overall expression profile. We also demonstrate the successful isolation of RNA and RNA-sequencing from post-mortem samples from building strikes and suggest that such an approach could be useful when traditional sampling opportunities are limited. These transcriptomes will be an important resource for the study of social behavior in birds and for data driven annotation of forthcoming whole genome sequences for these and other bird species.

  14. Improving transcriptome assembly through error correction of high-throughput sequence reads

    Directory of Open Access Journals (Sweden)

    Matthew D. MacManes

    2013-07-01

    Full Text Available The study of functional genomics, particularly in non-model organisms, has been dramatically improved over the last few years by the use of transcriptomes and RNAseq. While these studies are potentially extremely powerful, a computationally intensive procedure, the de novo construction of a reference transcriptome must be completed as a prerequisite to further analyses. The accurate reference is critically important as all downstream steps, including estimating transcript abundance are critically dependent on the construction of an accurate reference. Though a substantial amount of research has been done on assembly, only recently have the pre-assembly procedures been studied in detail. Specifically, several stand-alone error correction modules have been reported on and, while they have shown to be effective in reducing errors at the level of sequencing reads, how error correction impacts assembly accuracy is largely unknown. Here, we show via use of a simulated and empiric dataset, that applying error correction to sequencing reads has significant positive effects on assembly accuracy, and should be applied to all datasets. A complete collection of commands which will allow for the production of Reptile corrected reads is available at https://github.com/macmanes/error_correction/tree/master/scripts and as File S1.

  15. Sequencing and analysis of the gastrula transcriptome of the brittle star Ophiocoma wendtii

    Directory of Open Access Journals (Sweden)

    Vaughn Roy

    2012-09-01

    Full Text Available Abstract Background The gastrula stage represents the point in development at which the three primary germ layers diverge. At this point the gene regulatory networks that specify the germ layers are established and the genes that define the differentiated states of the tissues have begun to be activated. These networks have been well-characterized in sea urchins, but not in other echinoderms. Embryos of the brittle star Ophiocoma wendtii share a number of developmental features with sea urchin embryos, including the ingression of mesenchyme cells that give rise to an embryonic skeleton. Notable differences are that no micromeres are formed during cleavage divisions and no pigment cells are formed during development to the pluteus larval stage. More subtle changes in timing of developmental events also occur. To explore the molecular basis for the similarities and differences between these two echinoderms, we have sequenced and characterized the gastrula transcriptome of O. wendtii. Methods Development of Ophiocoma wendtii embryos was characterized and RNA was isolated from the gastrula stage. A transcriptome data base was generated from this RNA and was analyzed using a variety of methods to identify transcripts expressed and to compare those transcripts to those expressed at the gastrula stage in other organisms. Results Using existing databases, we identified brittle star transcripts that correspond to 3,385 genes, including 1,863 genes shared with the sea urchin Strongylocentrotus purpuratus gastrula transcriptome. We characterized the functional classes of genes present in the transcriptome and compared them to those found in this sea urchin. We then examined those members of the germ-layer specific gene regulatory networks (GRNs of S. purpuratus that are expressed in the O. wendtii gastrula. Our results indicate that there is a shared ‘genetic toolkit’ central to the echinoderm gastrula, a key stage in embryonic development, though

  16. Whole-genome and Transcriptome Sequencing of Prostate Cancer Identify New Genetic Alterations Driving Disease Progression

    DEFF Research Database (Denmark)

    Ren, Shancheng; Wei, Gong-Hong; Liu, Dongbing

    2018-01-01

    BACKGROUND: Global disparities in prostate cancer (PCa) incidence highlight the urgent need to identify genomic abnormalities in prostate tumors in different ethnic populations including Asian men. OBJECTIVE: To systematically explore the genomic complexity and define disease-driven genetic......-scale and comprehensive genomic data of prostate cancer from Asian population. Identification of these genetic alterations may help advance prostate cancer diagnosis, prognosis, and treatment....... alterations in PCa. DESIGN, SETTING, AND PARTICIPANTS: The study sequenced whole-genome and transcriptome of tumor-benign paired tissues from 65 treatment-naive Chinese PCa patients. Subsequent targeted deep sequencing of 293 PCa-relevant genes was performed in another cohort of 145 prostate tumors. OUTCOME...

  17. Deep sequencing-based transcriptome analysis of Plutella xylostella larvae parasitized by Diadegma semiclausum

    Science.gov (United States)

    2011-01-01

    Background Parasitoid insects manipulate their hosts' physiology by injecting various factors into their host upon parasitization. Transcriptomic approaches provide a powerful approach to study insect host-parasitoid interactions at the molecular level. In order to investigate the effects of parasitization by an ichneumonid wasp (Diadegma semiclausum) on the host (Plutella xylostella), the larval transcriptome profile was analyzed using a short-read deep sequencing method (Illumina). Symbiotic polydnaviruses (PDVs) associated with ichneumonid parasitoids, known as ichnoviruses, play significant roles in host immune suppression and developmental regulation. In the current study, D. semiclausum ichnovirus (DsIV) genes expressed in P. xylostella were identified and their sequences compared with other reported PDVs. Five of these genes encode proteins of unknown identity, that have not previously been reported. Results De novo assembly of cDNA sequence data generated 172,660 contigs between 100 and 10000 bp in length; with 35% of > 200 bp in length. Parasitization had significant impacts on expression levels of 928 identified insect host transcripts. Gene ontology data illustrated that the majority of the differentially expressed genes are involved in binding, catalytic activity, and metabolic and cellular processes. In addition, the results show that transcription levels of antimicrobial peptides, such as gloverin, cecropin E and lysozyme, were up-regulated after parasitism. Expression of ichnovirus genes were detected in parasitized larvae with 19 unique sequences identified from five PDV gene families including vankyrin, viral innexin, repeat elements, a cysteine-rich motif, and polar residue rich protein. Vankyrin 1 and repeat element 1 genes showed the highest transcription levels among the DsIV genes. Conclusion This study provides detailed information on differential expression of P. xylostella larval genes following parasitization, DsIV genes expressed in the

  18. Gene discovery in the threatened elkhorn coral: 454 sequencing of the Acropora palmata transcriptome.

    Directory of Open Access Journals (Sweden)

    Nicholas R Polato

    Full Text Available BACKGROUND: Cnidarians, including corals and anemones, offer unique insights into metazoan evolution because they harbor genetic similarities with vertebrates beyond that found in model invertebrates and retain genes known only from non-metazoans. Cataloging genes expressed in Acropora palmata, a foundation-species of reefs in the Caribbean and western Atlantic, will advance our understanding of the genetic basis of ecologically important traits in corals and comes at a time when sequencing efforts in other cnidarians allow for multi-species comparisons. RESULTS: A cDNA library from a sample enriched for symbiont free larval tissue was sequenced on the 454 GS-FLX platform. Over 960,000 reads were obtained and assembled into 42,630 contigs. Annotation data was acquired for 57% of the assembled sequences. Analysis of the assembled sequences indicated that 83-100% of all A. palmata transcripts were tagged, and provided a rough estimate of the total number genes expressed in our samples (~18,000-20,000. The coral annotation data contained many of the same molecular components as in the Bilateria, particularly in pathways associated with oxidative stress and DNA damage repair, and provided evidence that homologs of p53, a key player in DNA repair pathways, has experienced selection along the branch separating Cnidaria and Bilateria. Transcriptome wide screens of paralog groups and transition/transversion ratios highlighted genes including: green fluorescent proteins, carbonic anhydrase, and oxidative stress proteins; and functional groups involved in protein and nucleic acid metabolism, and the formation of structural molecules. These results provide a starting point for study of adaptive evolution in corals. CONCLUSIONS: Currently available transcriptome data now make comparative studies of the mechanisms underlying coral's evolutionary success possible. Here we identified candidate genes that enable corals to maintain genomic integrity despite

  19. Gene discovery in the threatened elkhorn coral: 454 sequencing of the Acropora palmata transcriptome.

    Science.gov (United States)

    Polato, Nicholas R; Vera, J Cristobal; Baums, Iliana B

    2011-01-01

    Cnidarians, including corals and anemones, offer unique insights into metazoan evolution because they harbor genetic similarities with vertebrates beyond that found in model invertebrates and retain genes known only from non-metazoans. Cataloging genes expressed in Acropora palmata, a foundation-species of reefs in the Caribbean and western Atlantic, will advance our understanding of the genetic basis of ecologically important traits in corals and comes at a time when sequencing efforts in other cnidarians allow for multi-species comparisons. A cDNA library from a sample enriched for symbiont free larval tissue was sequenced on the 454 GS-FLX platform. Over 960,000 reads were obtained and assembled into 42,630 contigs. Annotation data was acquired for 57% of the assembled sequences. Analysis of the assembled sequences indicated that 83-100% of all A. palmata transcripts were tagged, and provided a rough estimate of the total number genes expressed in our samples (~18,000-20,000). The coral annotation data contained many of the same molecular components as in the Bilateria, particularly in pathways associated with oxidative stress and DNA damage repair, and provided evidence that homologs of p53, a key player in DNA repair pathways, has experienced selection along the branch separating Cnidaria and Bilateria. Transcriptome wide screens of paralog groups and transition/transversion ratios highlighted genes including: green fluorescent proteins, carbonic anhydrase, and oxidative stress proteins; and functional groups involved in protein and nucleic acid metabolism, and the formation of structural molecules. These results provide a starting point for study of adaptive evolution in corals. Currently available transcriptome data now make comparative studies of the mechanisms underlying coral's evolutionary success possible. Here we identified candidate genes that enable corals to maintain genomic integrity despite considerable exposure to genotoxic stress over long life

  20. The Porcelain Crab Transcriptome and PCAD, the Porcelain Crab Microarray and Sequence Database

    Energy Technology Data Exchange (ETDEWEB)

    Tagmount, Abderrahmane; Wang, Mei; Lindquist, Erika; Tanaka, Yoshihiro; Teranishi, Kristen S.; Sunagawa, Shinichi; Wong, Mike; Stillman, Jonathon H.

    2010-01-27

    Background: With the emergence of a completed genome sequence of the freshwater crustacean Daphnia pulex, construction of genomic-scale sequence databases for additional crustacean sequences are important for comparative genomics and annotation. Porcelain crabs, genus Petrolisthes, have been powerful crustacean models for environmental and evolutionary physiology with respect to thermal adaptation and understanding responses of marine organisms to climate change. Here, we present a large-scale EST sequencing and cDNA microarray database project for the porcelain crab Petrolisthes cinctipes. Methodology/Principal Findings: A set of ~;;30K unique sequences (UniSeqs) representing ~;;19K clusters were generated from ~;;98K high quality ESTs from a set of tissue specific non-normalized and mixed-tissue normalized cDNA libraries from the porcelain crab Petrolisthes cinctipes. Homology for each UniSeq was assessed using BLAST, InterProScan, GO and KEGG database searches. Approximately 66percent of the UniSeqs had homology in at least one of the databases. All EST and UniSeq sequences along with annotation results and coordinated cDNA microarray datasets have been made publicly accessible at the Porcelain Crab Array Database (PCAD), a feature-enriched version of the Stanford and Longhorn Array Databases.Conclusions/Significance: The EST project presented here represents the third largest sequencing effort for any crustacean, and the largest effort for any crab species. Our assembly and clustering results suggest that our porcelain crab EST data set is equally diverse to the much larger EST set generated in the Daphnia pulex genome sequencing project, and thus will be an important resource to the Daphnia research community. Our homology results support the pancrustacea hypothesis and suggest that Malacostraca may be ancestral to Branchiopoda and Hexapoda. Our results also suggest that our cDNA microarrays cover as much of the transcriptome as can reasonably be captured in

  1. Fast skeletal muscle transcriptome of the Gilthead sea bream (Sparus aurata determined by next generation sequencing

    Directory of Open Access Journals (Sweden)

    Garcia de la serrana Daniel

    2012-05-01

    Full Text Available Abstract Background The gilthead sea bream (Sparus aurata L. occurs around the Mediterranean and along Eastern Atlantic coasts from Great Britain to Senegal. It is tolerant of a wide range of temperatures and salinities and is often found in brackish coastal lagoons and estuarine areas, particularly early in its life cycle. Gilthead sea bream are extensively cultivated in the Mediterranean with an annual production of 125,000 metric tonnes. Here we present a de novo assembly of the fast skeletal muscle transcriptome of gilthead sea bream using 454 reads and identify gene paralogues, splice variants and microsatellite repeats. An annotated transcriptome of the skeletal muscle will facilitate understanding of the genetic and molecular basis of traits linked to production in this economically important species. Results Around 2.7 million reads of mRNA sequence data were generated from the fast myotomal of adult fish (~2 kg and juvenile fish (~0.09 kg that had been either fed to satiation, fasted for 3-5d or transferred to low (11°C or high (33°C temperatures for 3-5d. Newbler v2.5 assembly resulted in 43,461 isotigs >100 bp. The number of sequences annotated by searching protein and gene ontology databases was 10,465. The average coverage of the annotated isotigs was x40 containing 5655 unique gene IDs and 785 full-length cDNAs coding for proteins containing 58–1536 amino acids. The v2.5 assembly was found to be of good quality based on validation using 200 full-length cDNAs from GenBank. Annotated isotigs from the reference transcriptome were attributable to 344 KEGG pathway maps. We identified 26 gene paralogues (20 of them teleost-specific and 43 splice variants, of which 12 had functional domains missing that were likely to affect their biological function. Many key transcription factors, signaling molecules and structural proteins necessary for myogenesis and muscle growth have been identified. Physiological status affected the

  2. Parallel algorithms for large-scale biological sequence alignment on Xeon-Phi based clusters.

    Science.gov (United States)

    Lan, Haidong; Chan, Yuandong; Xu, Kai; Schmidt, Bertil; Peng, Shaoliang; Liu, Weiguo

    2016-07-19

    Computing alignments between two or more sequences are common operations frequently performed in computational molecular biology. The continuing growth of biological sequence databases establishes the need for their efficient parallel implementation on modern accelerators. This paper presents new approaches to high performance biological sequence database scanning with the Smith-Waterman algorithm and the first stage of progressive multiple sequence alignment based on the ClustalW heuristic on a Xeon Phi-based compute cluster. Our approach uses a three-level parallelization scheme to take full advantage of the compute power available on this type of architecture; i.e. cluster-level data parallelism, thread-level coarse-grained parallelism, and vector-level fine-grained parallelism. Furthermore, we re-organize the sequence datasets and use Xeon Phi shuffle operations to improve I/O efficiency. Evaluations show that our method achieves a peak overall performance up to 220 GCUPS for scanning real protein sequence databanks on a single node consisting of two Intel E5-2620 CPUs and two Intel Xeon Phi 7110P cards. It also exhibits good scalability in terms of sequence length and size, and number of compute nodes for both database scanning and multiple sequence alignment. Furthermore, the achieved performance is highly competitive in comparison to optimized Xeon Phi and GPU implementations. Our implementation is available at https://github.com/turbo0628/LSDBS-mpi .

  3. A parallel reconfigurable platform for efficient sequence alignment ...

    African Journals Online (AJOL)

    Bioinformatics is one of the emerging trends in today's world. The major part of bioinformatics is dealing with DNA. Analysis of DNA requires more memory and high efficient computations to produce accurate outputs. Researchers use various bioinformatics algorithms for sequencing and pattern detection techniques, but still ...

  4. An abundance of ubiquitously expressed genes revealed by tissue transcriptome sequence data.

    Directory of Open Access Journals (Sweden)

    Daniel Ramsköld

    2009-12-01

    Full Text Available The parts of the genome transcribed by a cell or tissue reflect the biological processes and functions it carries out. We characterized the features of mammalian tissue transcriptomes at the gene level through analysis of RNA deep sequencing (RNA-Seq data across human and mouse tissues and cell lines. We observed that roughly 8,000 protein-coding genes were ubiquitously expressed, contributing to around 75% of all mRNAs by message copy number in most tissues. These mRNAs encoded proteins that were often intracellular, and tended to be involved in metabolism, transcription, RNA processing or translation. In contrast, genes for secreted or plasma membrane proteins were generally expressed in only a subset of tissues. The distribution of expression levels was broad but fairly continuous: no support was found for the concept of distinct expression classes of genes. Expression estimates that included reads mapping to coding exons only correlated better with qRT-PCR data than estimates which also included 3' untranslated regions (UTRs. Muscle and liver had the least complex transcriptomes, in that they expressed predominantly ubiquitous genes and a large fraction of the transcripts came from a few highly expressed genes, whereas brain, kidney and testis expressed more complex transcriptomes with the vast majority of genes expressed and relatively small contributions from the most expressed genes. mRNAs expressed in brain had unusually long 3'UTRs, and mean 3'UTR length was higher for genes involved in development, morphogenesis and signal transduction, suggesting added complexity of UTR-based regulation for these genes. Our results support a model in which variable exterior components feed into a large, densely connected core composed of ubiquitously expressed intracellular proteins.

  5. De novo transcriptome sequencing and analysis of the juvenile and adult stages of Fasciola gigantica.

    Science.gov (United States)

    Zhang, Xiao-Xuan; Cong, Wei; Elsheikha, Hany M; Liu, Guo-Hua; Ma, Jian-Gang; Huang, Wei-Yi; Zhao, Quan; Zhu, Xing-Quan

    2017-07-01

    Fasciola gigantica is regarded as the major liver fluke causing fasciolosis in livestock in tropical countries. Despite the significant economic and public health impacts of F. gigantica there are few studies on the pathogenesis of this parasite and our understanding is further limited by the lack of genome and transcriptome information. In this study, de novo Illumina RNA sequencing (RNA-seq) was performed to obtain a comprehensive transcriptome profile of the juvenile (42days post infection) and adult stages of F. gigantica. A total of 49,720 unigenes were produced from juvenile and adult stages of F. gigantica, with an average length of 1286 nucleotides (nt) and N50 of 2076nt. A total of 27,862 (56.03%) unigenes were annotated by BLAST similarity searches against the NCBI non-redundant protein database. Because F. gigantica needs to feed and/or digest host tissues, some proteases (including cysteine proteases and aspartic proteases), which play a role in the degradation of host tissues (protein), have been paid more attention in the present study. A total of 6511 distinct genes were found differentially expressed between juveniles and adults, of which 3993 genes were up-regulated and 2518 genes were down-regulated in adults versus juveniles, respectively. Moreover, stage-specific differentially expressed genes were identified in juvenile (17,009) and adult (6517) F. gigantica. The significantly divergent pathways of differentially expressed genes included cAMP signaling pathway (226; 4.12%), proteoglycans in cancer (256; 4.67%) and focal adhesion (199; 3.63%). The transcription pattern also revealed two egg-laying-associated pathways: cGMP-PKG signaling pathway and TGF-β signaling pathway. This study provides the first comparative transcriptomic data concerning juvenile and adult stages of F. gigantica that will be of great value for future research efforts into understanding parasite pathogenesis and developing vaccines against this important parasite

  6. Deep sequencing-based analysis of the Cymbidium ensifolium floral transcriptome.

    Directory of Open Access Journals (Sweden)

    Xiaobai Li

    Full Text Available Cymbidium ensifolium is a Chinese Cymbidium with an elegant shape, beautiful appearance, and a fragrant aroma. C. ensifolium has a long history of cultivation in China and it has excellent commercial value as a potted plant and cut flower. The development of C. ensifolium genomic resources has been delayed because of its large genome size. Taking advantage of technical and cost improvement of RNA-Seq, we extracted total mRNA from flower buds and mature flowers and obtained a total of 9.52 Gb of filtered nucleotides comprising 98,819,349 filtered reads. The filtered reads were assembled into 101,423 isotigs, representing 51,696 genes. Of the 101,423 isotigs, 41,873 were putative homologs of annotated sequences in the public databases, of which 158 were associated with floral development and 119 were associated with flowering. The isotigs were categorized according to their putative functions. In total, 10,212 of the isotigs were assigned into 25 eukaryotic orthologous groups (KOGs, 41,690 into 58 gene ontology (GO terms, and 9,830 into 126 Arabidopsis Kyoto Encyclopedia of Genes and Genomes (KEGG pathways, and 9,539 isotigs into 123 rice pathways. Comparison of the isotigs with those of the two related orchid species P. equestris and C. sinense showed that 17,906 isotigs are unique to C. ensifolium. In addition, a total of 7,936 SSRs and 16,676 putative SNPs were identified. To our knowledge, this transcriptome database is the first major genomic resource for C. ensifolium and the most comprehensive transcriptomic resource for genus Cymbidium. These sequences provide valuable information for understanding the molecular mechanisms of floral development and flowering. Sequences predicted to be unique to C. ensifolium would provide more insights into C. ensifolium gene diversity. The numerous SNPs and SSRs identified in the present study will contribute to marker development for C. ensifolium.

  7. De novo assembly, gene annotation and marker development using Illumina paired-end transcriptome sequences in celery (Apium graveolens L..

    Directory of Open Access Journals (Sweden)

    Nan Fu

    Full Text Available BACKGROUND: Celery is an increasing popular vegetable species, but limited transcriptome and genomic data hinder the research to it. In addition, a lack of celery molecular markers limits the process of molecular genetic breeding. High-throughput transcriptome sequencing is an efficient method to generate a large transcriptome sequence dataset for gene discovery, molecular marker development and marker-assisted selection breeding. PRINCIPAL FINDINGS: Celery transcriptomes from four tissues were sequenced using Illumina paired-end sequencing technology. De novo assembling was performed to generate a collection of 42,280 unigenes (average length of 502.6 bp that represent the first transcriptome of the species. 78.43% and 48.93% of the unigenes had significant similarity with proteins in the National Center for Biotechnology Information (NCBI non-redundant protein database (Nr and Swiss-Prot database respectively, and 10,473 (24.77% unigenes were assigned to Clusters of Orthologous Groups (COG. 21,126 (49.97% unigenes harboring Interpro domains were annotated, in which 15,409 (36.45% were assigned to Gene Ontology(GO categories. Additionally, 7,478 unigenes were mapped onto 228 pathways using the Kyoto Encyclopedia of Genes and Genomes Pathway database (KEGG. Large numbers of simple sequence repeats (SSRs were indentified, and then the rate of successful amplication and polymorphism were investigated among 31 celery accessions. CONCLUSIONS: This study demonstrates the feasibility of generating a large scale of sequence information by Illumina paired-end sequencing and efficient assembling. Our results provide a valuable resource for celery research. The developed molecular markers are the foundation of further genetic linkage analysis and gene localization, and they will be essential to accelerate the process of breeding.

  8. Transcriptome analysis of the model protozoan, Tetrahymena thermophila, using Deep RNA sequencing.

    Directory of Open Access Journals (Sweden)

    Jie Xiong

    Full Text Available BACKGROUND: The ciliated protozoan Tetrahymena thermophila is a well-studied single-celled eukaryote model organism for cellular and molecular biology. However, the lack of extensive T. thermophila cDNA libraries or a large expressed sequence tag (EST database limited the quality of the original genome annotation. METHODOLOGY/PRINCIPAL FINDINGS: This RNA-seq study describes the first deep sequencing analysis of the T. thermophila transcriptome during the three major stages of the life cycle: growth, starvation and conjugation. Uniquely mapped reads covered more than 96% of the 24,725 predicted gene models in the somatic genome. More than 1,000 new transcribed regions were identified. The great dynamic range of RNA-seq allowed detection of a nearly six order-of-magnitude range of measurable gene expression orchestrated by this cell. RNA-seq also allowed the first prediction of transcript untranslated regions (UTRs and an updated (larger size estimate of the T. thermophila transcriptome: 57 Mb, or about 55% of the somatic genome. Our study identified nearly 1,500 alternative splicing (AS events distributed over 5.2% of T. thermophila genes. This percentage represents a two order-of-magnitude increase over previous EST-based estimates in Tetrahymena. Evidence of stage-specific regulation of alternative splicing was also obtained. Finally, our study allowed us to completely confirm about 26.8% of the genes originally predicted by the gene finder, to correct coding sequence boundaries and intron-exon junctions for about a third, and to reassign microarray probes and correct earlier microarray data. CONCLUSIONS/SIGNIFICANCE: RNA-seq data significantly improve the genome annotation and provide a fully comprehensive view of the global transcriptome of T. thermophila. To our knowledge, 5.2% of T. thermophila genes with AS is the highest percentage of genes showing AS reported in a unicellular eukaryote. Tetrahymena thus becomes an excellent unicellular

  9. Complete Genome Sequence of Sporisorium scitamineum and Biotrophic Interaction Transcriptome with Sugarcane.

    Directory of Open Access Journals (Sweden)

    Lucas M Taniguti

    Full Text Available Sporisorium scitamineum is a biotrophic fungus responsible for the sugarcane smut, a worldwide spread disease. This study provides the complete sequence of individual chromosomes of S. scitamineum from telomere to telomere achieved by a combination of PacBio long reads and Illumina short reads sequence data, as well as a draft sequence of a second fungal strain. Comparative analysis to previous available sequences of another strain detected few polymorphisms among the three genomes. The novel complete sequence described herein allowed us to identify and annotate extended subtelomeric regions, repetitive elements and the mitochondrial DNA sequence. The genome comprises 19,979,571 bases, 6,677 genes encoding proteins, 111 tRNAs and 3 assembled copies of rDNA, out of our estimated number of copies as 130. Chromosomal reorganizations were detected when comparing to sequences of S. reilianum, the closest smut relative, potentially influenced by repeats of transposable elements. Repetitive elements may have also directed the linkage of the two mating-type loci. The fungal transcriptome profiling from in vitro and from interaction with sugarcane at two time points (early infection and whip emergence revealed that 13.5% of the genes were differentially expressed in planta and particular to each developmental stage. Among them are plant cell wall degrading enzymes, proteases, lipases, chitin modification and lignin degradation enzymes, sugar transporters and transcriptional factors. The fungus also modulates transcription of genes related to surviving against reactive oxygen species and other toxic metabolites produced by the plant. Previously described effectors in smut/plant interactions were detected but some new candidates are proposed. Ten genomic islands harboring some of the candidate genes unique to S. scitamineum were expressed only in planta. RNAseq data was also used to reassure gene predictions.

  10. Deep sequencing of the Camellia chekiangoleosa transcriptome revealed candidate genes for anthocyanin biosynthesis.

    Science.gov (United States)

    Wang, Zhong-Wei; Jiang, Cong; Wen, Qiang; Wang, Na; Tao, Yuan-Yuan; Xu, Li-An

    2014-03-15

    Camellia chekiangoleosa is an important species of genus Camellia. It provides high-quality edible oil and has great ornamental value. The flowers are big and red which bloom between February and March. Flower pigmentation is closely related to the accumulation of anthocyanin. Although anthocyanin biosynthesis has been studied extensively in herbaceous plants, little molecular information on the anthocyanin biosynthesis pathway of C. chekiangoleosa is yet known. In the present study, a cDNA library was constructed to obtain detailed and general data from the flowers of C. chekiangoleosa. To explore the transcriptome of C. chekiangoleosa and investigate genes involved in anthocyanin biosynthesis, a 454 GS FLX Titanium platform was used to generate an EST dataset. About 46,279 sequences were obtained, and 24,593 (53.1%) were annotated. Using Blast search against the AGRIS, 1740 unigenes were found homologous to 599 Arabidopsis transcription factor genes. Based on the transcriptome dataset, nine anthocyanin biosynthesis pathway genes (PAL, CHS1, CHS2, CHS3, CHI, F3H, DFR, ANS, and UFGT) were identified and cloned. The spatio-temporal expression patterns of these genes were also analyzed using quantitative real-time polymerase chain reaction. The study results not only enrich the gene resource but also provide valuable information for further studies concerning anthocyanin biosynthesis. Copyright © 2014 Elsevier B.V. All rights reserved.

  11. Characterizing the Genetic Basis for Nicotine Induced Cancer Development: A Transcriptome Sequencing Study.

    Directory of Open Access Journals (Sweden)

    Jasmin H Bavarva

    Full Text Available Nicotine is a known risk factor for cancer development and has been shown to alter gene expression in cells and tissue upon exposure. We used Illumina® Next Generation Sequencing (NGS technology to gain unbiased biological insight into the transcriptome of normal epithelial cells (MCF-10A to nicotine exposure. We generated expression data from 54,699 transcripts using triplicates of control and nicotine stressed cells. As a result, we identified 138 differentially expressed transcripts, including 39 uncharacterized genes. Additionally, 173 transcripts that are primarily associated with DNA replication, recombination, and repair showed evidence for alternative splicing. We discovered the greatest nicotine stress response by HPCAL4 (up-regulated by 4.71 fold and NPAS3 (down-regulated by -2.73 fold; both are genes that have not been previously implicated in nicotine exposure but are linked to cancer. We also discovered significant down-regulation (-2.3 fold and alternative splicing of NEAT1 (lncRNA that may have an important, yet undiscovered regulatory role. Gene ontology analysis revealed nicotine exposure influenced genes involved in cellular and metabolic processes. This study reveals previously unknown consequences of nicotine stress on the transcriptome of normal breast epithelial cells and provides insight into the underlying biological influence of nicotine on normal cells, marking the foundation for future studies.

  12. Transcriptome analysis of the Chinese giant salamander (Andrias davidianus using RNA-sequencing

    Directory of Open Access Journals (Sweden)

    Yong Huang

    2017-12-01

    Full Text Available The Chinese giant salamander (Andrias davidianus is an economically important animal on academic value. However, the genomic information of this species has been less studied. In our study, the transcripts of A. davidianus were obtained by RNA-seq to conduct a transcriptomic analysis. In total 132,912 unigenes were generated with an average length of 690 bp and N50 of 1263 bp by de novo assembly using Trinity software. Using a sequence similarity search against the nine public databases (CDD, KOG, NR, NT, PFAM, Swiss-prot, TrEMBL, GO and KEGG databases, a total of 24,049, 18,406, 36,711, 15,858, 20,500, 27,515, 36,705, 28,879 and 10,958 unigenes were annotated in databases, respectively. Of these, 6323 unigenes were annotated in all database and 39,672 unigenes were annotated in at least one database. Blasted with KEGG pathway, 10,958 unigenes were annotated, and it was divided into 343 categories according to different pathways. In addition, we also identified 29,790 SSRs. This study provided a valuable resource for understanding transcriptomic information of A. davidianus and laid a foundation for further research on functional gene cloning, genomics, genetic diversity analysis and molecular marker exploitation in A. davidianus.

  13. Insight into the transcriptome of Arthrobotrys conoides using high throughput sequencing.

    Science.gov (United States)

    Ramesh, Pandit; Reena, Patel; Amitbikram, Mohapatra; Chaitanya, Joshi; Anju, Kunjadia

    2015-12-01

    Arthrobotrys conoides is a nematode-trapping fungus belonging to Orbiliales, Ascomycota group, and traps prey nematodes by means of adhesive network. Fungus has a potential to be used as a biocontrol agent against plant parasitic nematodes. In the present study, we characterized the transcriptome of A. conoides using high-throughput sequencing technology and characterized its virulence unigenes. Total 7,255 cDNA contigs with an average length of 425 bp were generated and 6184 (61.81%) transcripts were functionally annotated and characterized. Majority of unigenes were found analogous to the genes of plant pathogenic fungi. A total of 1749 transcripts were found to be orthologous with eukaryotic proteins of KOG database. Several carbohydrate active enzymes and peptidases were identified. We also analyzed classically and nonclassically secreted proteins and confirmed by BLASTP against fungal secretome database. A total of 916 contigs were analogous to 556 unique proteins of Pathogen Host Interaction (PHI) database. Further, we identified 91 unigenes homologous to the database of fungal virulence factor (DFVF). A total of 104 putative protein kinases coding transcripts were identified by BLASTP against KinBase database, which are major players in signaling pathways. This study provides a comprehensive look at the transcriptome of A. conoides and the identified unigenes might have a role in catching and killing prey nematodes by A. conoides. © 2015 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  14. Development of genic SSR markers from transcriptome sequencing of pear buds.

    Science.gov (United States)

    Yue, Xiao-yan; Liu, Guo-qin; Zong, Yu; Teng, Yuan-wen; Cai, Dan-ying

    2014-04-01

    A total of 8375 genic simple sequence repeat (SSR) loci were discovered from a unigene set assembled from 116282 transcriptomic unigenes in this study. Dinucleotide repeat motifs were the most common with a frequency of 65.11%, followed by trinucleotide (32.81%). A total of 4100 primer pairs were designed from the SSR loci. Of these, 343 primer pairs (repeat length ≥15 bp) were synthesized with an M13 tail and tested for stable amplification and polymorphism in four Pyrus accessions. After the preliminary test, 104 polymorphic genic SSR markers were developed; dinucleotide and trinucleotide repeats represented 97.11% (101) of these. Twenty-eight polymorphic genic SSR markers were selected randomly to further validate genetic diversity among 28 Pyrus accessions. These markers displayed a high level of polymorphism. The number of alleles at these SSR loci ranged from 2 to 17, with a mean of 9.43 alleles per locus, and the polymorphism information content (PIC) values ranged from 0.26 to 0.91. The UPGMA (unweighted pair-group method with arithmetic average) cluster analysis grouped the 28 Pyrus accessions into two groups: Oriental pears and Occidental pears, which are congruent to the traditional taxonomy, demonstrating their effectiveness in analyzing Pyrus phylogenetic relationships, enriching rare Pyrus EST-SSR resources, and confirming the potential value of a pear transcriptome database for the development of new SSR markers.

  15. De novo assembly and characterization of the garlic (Allium sativum) bud transcriptome by Illumina sequencing.

    Science.gov (United States)

    Sun, Xiudong; Zhou, Shumei; Meng, Fanlu; Liu, Shiqi

    2012-10-01

    Garlic is widely used as a spice throughout the world for the culinary value of its flavor and aroma, which are created by the chemical transformation of a series of organic sulfur compounds. To analyze the transcriptome of Allium sativum and discover the genes involved in sulfur metabolism, cDNAs derived from the total RNA of Allium sativum buds were analyzed by Illumina sequencing. Approximately 26.67 million 90 bp paired-end clean reads were achieved in two libraries. A total of 127,933 unigenes were generated by de novo assembly and were compared with the sequences in public databases. Of these, 45,286 unigenes had significant hits to the sequences in the Nr database, 29,514 showed significant similarity to known proteins in the Swiss-Prot database and, 20,706 and 21,952 unigenes had significant similarity to existing sequences in the KEGG and COG databases, respectively. Moreover, genes involved in organic sulfur biosynthesis were identified. These unigenes data will provide the foundation for research on gene expression, genomics and functional genomics in Allium sativum. Key message The obtained unigenes will provide the foundation for research on functional genomics in Allium sativum and its closely related species, and fill the gap of the existing plant EST database.

  16. Next generation sequencing and analysis of a conserved transcriptome of New Zealand's kiwi.

    Science.gov (United States)

    Subramanian, Sankar; Huynen, Leon; Millar, Craig D; Lambert, David M

    2010-12-15

    Kiwi is a highly distinctive, flightless and endangered ratite bird endemic to New Zealand. To understand the patterns of molecular evolution of the nuclear protein-coding genes in brown kiwi (Apteryx australis mantelli) and to determine the timescale of avian history we sequenced a transcriptome obtained from a kiwi embryo using next generation sequencing methods. We then assembled the conserved protein-coding regions using the chicken proteome as a scaffold. Using 1,543 conserved protein coding genes we estimated the neutral evolutionary divergence between the kiwi and chicken to be ~45%, which is approximately equal to the divergence computed for the human-mouse pair using the same set of genes. A large fraction of genes was found to be under high selective constraint, as most of the expressed genes appeared to be involved in developmental gene regulation. Our study suggests a significant relationship between gene expression levels and protein evolution. Using sequences from over 700 nuclear genes we estimated the divergence between the two basal avian groups, Palaeognathae and Neognathae to be 132 million years, which is consistent with previous studies using mitochondrial genes. The results of this investigation revealed patterns of mutation and purifying selection in conserved protein coding regions in birds. Furthermore this study suggests a relatively cost-effective way of obtaining a glimpse into the fundamental molecular evolutionary attributes of a genome, particularly when no closely related genomic sequence is available.

  17. Next generation sequencing and analysis of a conserved transcriptome of New Zealand's kiwi

    Directory of Open Access Journals (Sweden)

    Huynen Leon

    2010-12-01

    Full Text Available Abstract Background Kiwi is a highly distinctive, flightless and endangered ratite bird endemic to New Zealand. To understand the patterns of molecular evolution of the nuclear protein-coding genes in brown kiwi (Apteryx australis mantelli and to determine the timescale of avian history we sequenced a transcriptome obtained from a kiwi embryo using next generation sequencing methods. We then assembled the conserved protein-coding regions using the chicken proteome as a scaffold. Results Using 1,543 conserved protein coding genes we estimated the neutral evolutionary divergence between the kiwi and chicken to be ~45%, which is approximately equal to the divergence computed for the human-mouse pair using the same set of genes. A large fraction of genes was found to be under high selective constraint, as most of the expressed genes appeared to be involved in developmental gene regulation. Our study suggests a significant relationship between gene expression levels and protein evolution. Using sequences from over 700 nuclear genes we estimated the divergence between the two basal avian groups, Palaeognathae and Neognathae to be 132 million years, which is consistent with previous studies using mitochondrial genes. Conclusions The results of this investigation revealed patterns of mutation and purifying selection in conserved protein coding regions in birds. Furthermore this study suggests a relatively cost-effective way of obtaining a glimpse into the fundamental molecular evolutionary attributes of a genome, particularly when no closely related genomic sequence is available.

  18. SequenceL: Automated Parallel Algorithms Derived from CSP-NT Computational Laws

    Science.gov (United States)

    Cooke, Daniel; Rushton, Nelson

    2013-01-01

    With the introduction of new parallel architectures like the cell and multicore chips from IBM, Intel, AMD, and ARM, as well as the petascale processing available for highend computing, a larger number of programmers will need to write parallel codes. Adding the parallel control structure to the sequence, selection, and iterative control constructs increases the complexity of code development, which often results in increased development costs and decreased reliability. SequenceL is a high-level programming language that is, a programming language that is closer to a human s way of thinking than to a machine s. Historically, high-level languages have resulted in decreased development costs and increased reliability, at the expense of performance. In recent applications at JSC and in industry, SequenceL has demonstrated the usual advantages of high-level programming in terms of low cost and high reliability. SequenceL programs, however, have run at speeds typically comparable with, and in many cases faster than, their counterparts written in C and C++ when run on single-core processors. Moreover, SequenceL is able to generate parallel executables automatically for multicore hardware, gaining parallel speedups without any extra effort from the programmer beyond what is required to write the sequen tial/singlecore code. A SequenceL-to-C++ translator has been developed that automatically renders readable multithreaded C++ from a combination of a SequenceL program and sample data input. The SequenceL language is based on two fundamental computational laws, Consume-Simplify- Produce (CSP) and Normalize-Trans - pose (NT), which enable it to automate the creation of parallel algorithms from high-level code that has no annotations of parallelism whatsoever. In our anecdotal experience, SequenceL development has been in every case less costly than development of the same algorithm in sequential (that is, single-core, single process) C or C++, and an order of magnitude less

  19. RNA sequencing atopic dermatitis transcriptome profiling provides insights into novel disease mechanisms with potential therapeutic implications

    DEFF Research Database (Denmark)

    Suárez-Fariñas, Mayte; Ungar, Benjamin; Correa da Rosa, Joel

    2015-01-01

    . These limitations might be lessened with next-generation RNA sequencing (RNA-seq). Objective: We sought to define the lesional AD transcriptome using RNA-seq and compare it using microarrays performed on the same cohort. Methods: RNA-seq and microarrays were performed to identify differentially expressed genes...... RNA-seq showed somewhat better agreement with RT-PCR (intraclass correlation coefficient, 0.57 and 0.70 for microarrays and RNA-seq vs RT-PCR, respectively), bias was not eliminated. Among genes uniquely identified by using RNA-seq were triggering receptor expressed on myeloid cells 1 (TREM-1......) signaling (eg, CCL2, CCL3, and single immunoglobulin domain IL1R1 related [SIGIRR]) and IL-36 isoform genes. TREM-1 is a surface receptor implicated in innate and adaptive immunity that amplifies infection-related inflammation. Conclusions: This is the first report of a lesional AD phenotype using RNA...

  20. Highly parallel translation of DNA sequences into small molecules.

    Directory of Open Access Journals (Sweden)

    Rebecca M Weisinger

    Full Text Available A large body of in vitro evolution work establishes the utility of biopolymer libraries comprising 10(10 to 10(15 distinct molecules for the discovery of nanomolar-affinity ligands to proteins. Small-molecule libraries of comparable complexity will likely provide nanomolar-affinity small-molecule ligands. Unlike biopolymers, small molecules can offer the advantages of cell permeability, low immunogenicity, metabolic stability, rapid diffusion and inexpensive mass production. It is thought that such desirable in vivo behavior is correlated with the physical properties of small molecules, specifically a limited number of hydrogen bond donors and acceptors, a defined range of hydrophobicity, and most importantly, molecular weights less than 500 Daltons. Creating a collection of 10(10 to 10(15 small molecules that meet these criteria requires the use of hundreds to thousands of diversity elements per step in a combinatorial synthesis of three to five steps. With this goal in mind, we have reported a set of mesofluidic devices that enable DNA-programmed combinatorial chemistry in a highly parallel 384-well plate format. Here, we demonstrate that these devices can translate DNA genes encoding 384 diversity elements per coding position into corresponding small-molecule gene products. This robust and efficient procedure yields small molecule-DNA conjugates suitable for in vitro evolution experiments.

  1. Transcriptome sequencing of the naked mole rat (Heterocephalus glaber and identification of hypoxia tolerance genes

    Directory of Open Access Journals (Sweden)

    Bang Xiao

    2017-12-01

    Full Text Available The naked mole rat (NMR; Heterocephalus glaber is a small rodent species found in regions of Kenya, Ethiopia and Somalia. It has a high tolerance for hypoxia and is thus considered one of the most important natural models for studying hypoxia tolerance mechanisms. The various mechanisms underlying the NMR's hypoxia tolerance are beginning to be understood at different levels of organization, and next-generation sequencing methods promise to expand this understanding to the level of gene expression. In this study, we examined the sequence and transcript abundance data of the muscle transcriptome of NMRs exposed to hypoxia using the Illumina HiSeq 2500 system to clarify the possible genomic adaptive responses to the hypoxic underground surroundings. The RNA-seq raw FastQ data were mapped against the NMR genome. We identified 2337 differentially expressed genes (DEGs by comparison of the hypoxic and control groups. Functional annotation of the DEGs by gene ontology (GO analysis revealed enrichment of hypoxia stress-related GO categories, including ‘biological regulation’, ‘cellular process’, ‘ion transport’ and ‘cell-cell signaling’. Enrichment of DEGs in signaling pathways was analyzed against the Kyoto Encyclopedia of Genes and Genomes (KEGG database to identify possible interactions between DEGs. The results revealed significant enrichment of DEGs in focal adhesion, the mitogen-activated protein kinase (MAPK signaling pathway and the glycine, serine and threonine metabolism pathway. Furthermore, inhibition of DEGs (STMN1, MAPK8IP1 and MAPK10 expression induced apoptosis and arrested cell growth in NMR fibroblasts following hypoxia. Thus, this global transcriptome analysis of NMRs can provide an important genetic resource for the study of hypoxia tolerance in mammals. Furthermore, the identified DEGs may provide important molecular targets for biomedical research into therapeutic strategies for stroke and cardiovascular diseases.

  2. Transcriptome Sequencing and Analysis for Culm Elongation of the World's Largest Bamboo (Dendrocalamus sinicus.

    Directory of Open Access Journals (Sweden)

    Kai Cui

    Full Text Available Dendrocalamus sinicus is the world's largest bamboo species with strong woody culms, and known for its fast-growing culms. As an economic bamboo species, it was popularized for multi-functional applications including furniture, construction, and industrial paper pulp. To comprehensively elucidate the molecular processes involved in its culm elongation, Illumina paired-end sequencing was conducted. About 65.08 million high-quality reads were produced, and assembled into 81,744 unigenes with an average length of 723 bp. A total of 64,338 (79% unigenes were annotated for their functions, of which, 56,587 were annotated in the NCBI non-redundant protein database and 35,262 were annotated in the Swiss-Prot database. Also, 42,508 and 21,009 annotated unigenes were allocated to gene ontology (GO categories and clusters of orthologous groups (COG, respectively. By searching against the Kyoto Encyclopedia of Genes and Genomes Pathway database (KEGG, 33,920 unigenes were assigned to 128 KEGG pathways. Meanwhile, 8,553 simple sequence repeats (SSRs and 81,534 single-nucleotide polymorphism (SNPs were identified, respectively. Additionally, 388 transcripts encoding lignin biosynthesis were detected, among which, 27 transcripts encoding Shikimate O-hydroxycinnamoyltransferase (HCT specifically expressed in D. sinicus when compared to other bamboo species and rice. The phylogenetic relationship between D. sinicus and other plants was analyzed, suggesting functional diversity of HCT unigenes in D. sinicus. We conjectured that HCT might lead to the high lignin content and giant culm. Given that the leaves are not yet formed and culm is covered with sheaths during culm elongation, the existence of photosynthesis of bamboo culm is usually neglected. Surprisedly, 109 transcripts encoding photosynthesis were identified, including photosystem I and II, cytochrome b6/f complex, photosynthetic electron transport and F-type ATPase, and 24 transcripts were characterized

  3. Transcriptome Sequencing and Analysis for Culm Elongation of the World's Largest Bamboo (Dendrocalamus sinicus).

    Science.gov (United States)

    Cui, Kai; Wang, Haiying; Liao, Shengxi; Tang, Qi; Li, Li; Cui, Yongzhong; He, Yuan

    2016-01-01

    Dendrocalamus sinicus is the world's largest bamboo species with strong woody culms, and known for its fast-growing culms. As an economic bamboo species, it was popularized for multi-functional applications including furniture, construction, and industrial paper pulp. To comprehensively elucidate the molecular processes involved in its culm elongation, Illumina paired-end sequencing was conducted. About 65.08 million high-quality reads were produced, and assembled into 81,744 unigenes with an average length of 723 bp. A total of 64,338 (79%) unigenes were annotated for their functions, of which, 56,587 were annotated in the NCBI non-redundant protein database and 35,262 were annotated in the Swiss-Prot database. Also, 42,508 and 21,009 annotated unigenes were allocated to gene ontology (GO) categories and clusters of orthologous groups (COG), respectively. By searching against the Kyoto Encyclopedia of Genes and Genomes Pathway database (KEGG), 33,920 unigenes were assigned to 128 KEGG pathways. Meanwhile, 8,553 simple sequence repeats (SSRs) and 81,534 single-nucleotide polymorphism (SNPs) were identified, respectively. Additionally, 388 transcripts encoding lignin biosynthesis were detected, among which, 27 transcripts encoding Shikimate O-hydroxycinnamoyltransferase (HCT) specifically expressed in D. sinicus when compared to other bamboo species and rice. The phylogenetic relationship between D. sinicus and other plants was analyzed, suggesting functional diversity of HCT unigenes in D. sinicus. We conjectured that HCT might lead to the high lignin content and giant culm. Given that the leaves are not yet formed and culm is covered with sheaths during culm elongation, the existence of photosynthesis of bamboo culm is usually neglected. Surprisedly, 109 transcripts encoding photosynthesis were identified, including photosystem I and II, cytochrome b6/f complex, photosynthetic electron transport and F-type ATPase, and 24 transcripts were characterized as antenna

  4. Small RNA and transcriptome deep sequencing proffers insight into floral gene regulation in Rosa cultivars

    Directory of Open Access Journals (Sweden)

    Kim Jungeun

    2012-11-01

    Full Text Available Abstract Background Roses (Rosa sp., which belong to the family Rosaceae, are the most economically important ornamental plants—making up 30% of the floriculture market. However, given high demand for roses, rose breeding programs are limited in molecular resources which can greatly enhance and speed breeding efforts. A better understanding of important genes that contribute to important floral development and desired phenotypes will lead to improved rose cultivars. For this study, we analyzed rose miRNAs and the rose flower transcriptome in order to generate a database to expound upon current knowledge regarding regulation of important floral characteristics. A rose genetic database will enable comprehensive analysis of gene expression and regulation via miRNA among different Rosa cultivars. Results We produced more than 0.5 million reads from expressed sequences, totalling more than 110 million bp. From these, we generated 35,657, 31,434, 34,725, and 39,722 flower unigenes from Rosa hybrid: ‘Vital’, ‘Maroussia’, and ‘Sympathy’ and Rosa rugosa Thunb. , respectively. The unigenes were assigned functional annotations, domains, metabolic pathways, Gene Ontology (GO terms, Plant Ontology (PO terms, and MIPS Functional Catalogue (FunCat terms. Rose flower transcripts were compared with genes from whole genome sequences of Rosaceae members (apple, strawberry, and peach and grape. We also produced approximately 40 million small RNA reads from flower tissue for Rosa, representing 267 unique miRNA tags. Among identified miRNAs, 25 of them were novel and 242 of them were conserved miRNAs. Statistical analyses of miRNA profiles revealed both shared and species-specific miRNAs, which presumably effect flower development and phenotypes. Conclusions In this study, we constructed a Rose miRNA and transcriptome database, and we analyzed the miRNAs and transcriptome generated from the flower tissues of four Rosa cultivars. The database provides a

  5. Next-Generation Sequencing of the Chrysanthemum nankingense (Asteraceae) Transcriptome Permits Large-Scale Unigene Assembly and SSR Marker Discovery

    Science.gov (United States)

    Wang, Haibin; Jiang, Jiafu; Chen, Sumei; Qi, Xiangyu; Peng, Hui; Li, Pirui; Song, Aiping; Guan, Zhiyong; Fang, Weimin; Liao, Yuan; Chen, Fadi

    2013-01-01

    Background Simple sequence repeats (SSRs) are ubiquitous in eukaryotic genomes. Chrysanthemum is one of the largest genera in the Asteraceae family. Only few Chrysanthemum expressed sequence tag (EST) sequences have been acquired to date, so the number of available EST-SSR markers is very low. Methodology/Principal Findings Illumina paired-end sequencing technology produced over 53 million sequencing reads from C. nankingense mRNA. The subsequent de novo assembly yielded 70,895 unigenes, of which 45,789 (64.59%) unigenes showed similarity to the sequences in NCBI database. Out of 45,789 sequences, 107 have hits to the Chrysanthemum Nr protein database; 679 and 277 sequences have hits to the database of Helianthus and Lactuca species, respectively. MISA software identified a large number of putative EST-SSRs, allowing 1,788 primer pairs to be designed from the de novo transcriptome sequence and a further 363 from archival EST sequence. Among 100 primer pairs randomly chosen, 81 markers have amplicons and 20 are polymorphic for genotypes analysis in Chrysanthemum. The results showed that most (but not all) of the assays were transferable across species and that they exposed a significant amount of allelic diversity. Conclusions/Significance SSR markers acquired by transcriptome sequencing are potentially useful for marker-assisted breeding and genetic analysis in the genus Chrysanthemum and its related genera. PMID:23626799

  6. Transcriptome Sequencing Revealed Significant Alteration of Cortical Promoter Usage and Splicing in Schizophrenia

    Science.gov (United States)

    Wu, Jing Qin; Wang, Xi; Beveridge, Natalie J.; Tooney, Paul A.; Scott, Rodney J.; Carr, Vaughan J.; Cairns, Murray J.

    2012-01-01

    Background While hybridization based analysis of the cortical transcriptome has provided important insight into the neuropathology of schizophrenia, it represents a restricted view of disease-associated gene activity based on predetermined probes. By contrast, sequencing technology can provide un-biased analysis of transcription at nucleotide resolution. Here we use this approach to investigate schizophrenia-associated cortical gene expression. Methodology/Principal Findings The data was generated from 76 bp reads of RNA-Seq, aligned to the reference genome and assembled into transcripts for quantification of exons, splice variants and alternative promoters in postmortem superior temporal gyrus (STG/BA22) from 9 male subjects with schizophrenia and 9 matched non-psychiatric controls. Differentially expressed genes were then subjected to further sequence and functional group analysis. The output, amounting to more than 38 Gb of sequence, revealed significant alteration of gene expression including many previously shown to be associated with schizophrenia. Gene ontology enrichment analysis followed by functional map construction identified three functional clusters highly relevant to schizophrenia including neurotransmission related functions, synaptic vesicle trafficking, and neural development. Significantly, more than 2000 genes displayed schizophrenia-associated alternative promoter usage and more than 1000 genes showed differential splicing (FDRschizophrenia-associated transcriptional diversity within the STG, and revealed variants with important implications for the complex pathophysiology of schizophrenia. PMID:22558445

  7. [EST-SSR identification, markers development of Ligusticum chuanxiong based on Ligusticum chuanxiong transcriptome sequences].

    Science.gov (United States)

    Yuan, Can; Peng, Fang; Yang, Ze-Mao; Zhong, Wen-Juan; Mou, Fang-Sheng; Gong, Yi-Yun; Ji, Pei-Cheng; Pu, De-Qiang; Huang, Hai-Yan; Yang, Xiao; Zhang, Chao

    2017-09-01

    Ligusticum chuanxiong is a well-known traditional Chinese medicine plant. The study on its molecular markers development and germplasm resources is very important. In this study, we obtained 24 422 unigenes by assembling transcriptome sequencing reads of L. chuanxiong root. EST-SSR was detected and 4 073 SSR loci were identified. EST-SSR distribution and characteristic analysis results showed that the mono-nucleotide repeats were the main repeat types, accounting for 41.0%. In addition, the sequences containing SSR were functionally annotated in Gene Ontology (GO) and KEGG pathway and were assigned to 49 GO categories, 242 KEGG pathways, among them 2 201 sequences were annotated against Nr database. By validating 235 EST-SSRs,74 primer pairs were ultimately proved to have high quality amplification. Subsequently, genetic diversity analysis, UPGMA cluster analysis, PCoA analysis and population structure analysis of 34 L. chuanxiong germplasm resources were carried out with 74 primer pairs. In both UPGMA tree and PCoA results, L. chuanxiong resources were clustered into two groups, which are believed to be partial related to their geographical distribution. In this study, EST-SSRs in L. chuanxiong was firstly identified, and newly developed molecular markers would contribute significantly to further genetic diversity study, the purity detection, gene mapping, and molecular breeding. Copyright© by the Chinese Pharmaceutical Association.

  8. The use of coded PCR primers enables high-throughput sequencing of multiple homolog amplification products by 454 parallel sequencing.

    Directory of Open Access Journals (Sweden)

    Jonas Binladen

    2007-02-01

    Full Text Available The invention of the Genome Sequence 20 DNA Sequencing System (454 parallel sequencing platform has enabled the rapid and high-volume production of sequence data. Until now, however, individual emulsion PCR (emPCR reactions and subsequent sequencing runs have been unable to combine template DNA from multiple individuals, as homologous sequences cannot be subsequently assigned to their original sources.We use conventional PCR with 5'-nucleotide tagged primers to generate homologous DNA amplification products from multiple specimens, followed by sequencing through the high-throughput Genome Sequence 20 DNA Sequencing System (GS20, Roche/454 Life Sciences. Each DNA sequence is subsequently traced back to its individual source through 5'tag-analysis.We demonstrate that this new approach enables the assignment of virtually all the generated DNA sequences to the correct source once sequencing anomalies are accounted for (miss-assignment rate<0.4%. Therefore, the method enables accurate sequencing and assignment of homologous DNA sequences from multiple sources in single high-throughput GS20 run. We observe a bias in the distribution of the differently tagged primers that is dependent on the 5' nucleotide of the tag. In particular, primers 5' labelled with a cytosine are heavily overrepresented among the final sequences, while those 5' labelled with a thymine are strongly underrepresented. A weaker bias also exists with regards to the distribution of the sequences as sorted by the second nucleotide of the dinucleotide tags. As the results are based on a single GS20 run, the general applicability of the approach requires confirmation. However, our experiments demonstrate that 5'primer tagging is a useful method in which the sequencing power of the GS20 can be applied to PCR-based assays of multiple homologous PCR products. The new approach will be of value to a broad range of research areas, such as those of comparative genomics, complete mitochondrial

  9. ESPRIT-Forest: Parallel clustering of massive amplicon sequence data in subquadratic time.

    Science.gov (United States)

    Cai, Yunpeng; Zheng, Wei; Yao, Jin; Yang, Yujie; Mai, Volker; Mao, Qi; Sun, Yijun

    2017-04-01

    The rapid development of sequencing technology has led to an explosive accumulation of genomic sequence data. Clustering is often the first step to perform in sequence analysis, and hierarchical clustering is one of the most commonly used approaches for this purpose. However, it is currently computationally expensive to perform hierarchical clustering of extremely large sequence datasets due to its quadratic time and space complexities. In this paper we developed a new algorithm called ESPRIT-Forest for parallel hierarchical clustering of sequences. The algorithm achieves subquadratic time and space complexity and maintains a high clustering accuracy comparable to the standard method. The basic idea is to organize sequences into a pseudo-metric based partitioning tree for sub-linear time searching of nearest neighbors, and then use a new multiple-pair merging criterion to construct clusters in parallel using multiple threads. The new algorithm was tested on the human microbiome project (HMP) dataset, currently one of the largest published microbial 16S rRNA sequence dataset. Our experiment demonstrated that with the power of parallel computing it is now compu- tationally feasible to perform hierarchical clustering analysis of tens of millions of sequences. The software is available at http://www.acsu.buffalo.edu/∼yijunsun/lab/ESPRIT-Forest.html.

  10. Increasing phylogenetic resolution at low taxonomic levels using massively parallel sequencing of chloroplast genomes

    Directory of Open Access Journals (Sweden)

    Cronn Richard

    2009-12-01

    Full Text Available Abstract Background Molecular evolutionary studies share the common goal of elucidating historical relationships, and the common challenge of adequately sampling taxa and characters. Particularly at low taxonomic levels, recent divergence, rapid radiations, and conservative genome evolution yield limited sequence variation, and dense taxon sampling is often desirable. Recent advances in massively parallel sequencing make it possible to rapidly obtain large amounts of sequence data, and multiplexing makes extensive sampling of megabase sequences feasible. Is it possible to efficiently apply massively parallel sequencing to increase phylogenetic resolution at low taxonomic levels? Results We reconstruct the infrageneric phylogeny of Pinus from 37 nearly-complete chloroplast genomes (average 109 kilobases each of an approximately 120 kilobase genome generated using multiplexed massively parallel sequencing. 30/33 ingroup nodes resolved with ≥ 95% bootstrap support; this is a substantial improvement relative to prior studies, and shows massively parallel sequencing-based strategies can produce sufficient high quality sequence to reach support levels originally proposed for the phylogenetic bootstrap. Resampling simulations show that at least the entire plastome is necessary to fully resolve Pinus, particularly in rapidly radiating clades. Meta-analysis of 99 published infrageneric phylogenies shows that whole plastome analysis should provide similar gains across a range of plant genera. A disproportionate amount of phylogenetic information resides in two loci (ycf1, ycf2, highlighting their unusual evolutionary properties. Conclusion Plastome sequencing is now an efficient option for increasing phylogenetic resolution at lower taxonomic levels in plant phylogenetic and population genetic analyses. With continuing improvements in sequencing capacity, the strategies herein should revolutionize efforts requiring dense taxon and character sampling

  11. Prediction of Scylla olivacea (Crustacea; Brachyura) peptide hormones using publicly accessible transcriptome shotgun assembly (TSA) sequences.

    Science.gov (United States)

    Christie, Andrew E

    2016-05-01

    The aquaculture of crabs from the genus Scylla is of increasing economic importance for many Southeast Asian countries. Expansion of Scylla farming has led to increased efforts to understand the physiology and behavior of these crabs, and as such, there are growing molecular resources for them. Here, publicly accessible Scylla olivacea transcriptomic data were mined for putative peptide-encoding transcripts; the proteins deduced from the identified sequences were then used to predict the structures of mature peptide hormones. Forty-nine pre/preprohormone-encoding transcripts were identified, allowing for the prediction of 187 distinct mature peptides. The identified peptides included isoforms of adipokinetic hormone-corazonin-like peptide, allatostatin A, allatostatin B, allatostatin C, bursicon β, CCHamide, corazonin, crustacean cardioactive peptide, crustacean hyperglycemic hormone/molt-inhibiting hormone, diuretic hormone 31, eclosion hormone, FMRFamide-like peptide, HIGSLYRamide, insulin-like peptide, intocin, leucokinin, myosuppressin, neuroparsin, neuropeptide F, orcokinin, pigment dispersing hormone, pyrokinin, red pigment concentrating hormone, RYamide, short neuropeptide F, SIFamide and tachykinin-related peptide, all well-known neuropeptide families. Surprisingly, the tissue used to generate the transcriptome mined here is reported to be testis. Whether or not the testis samples had neural contamination is unknown. However, if the peptides are truly produced by this reproductive organ, it could have far reaching consequences for the study of crustacean endocrinology, particularly in the area of reproductive control. Regardless, this peptidome is the largest thus far predicted for any brachyuran (true crab) species, and will serve as a foundation for future studies of peptidergic control in members of the commercially important genus Scylla. Copyright © 2016 Elsevier Inc. All rights reserved.

  12. Prediction of Toxin Genes from Chinese Yellow Catfish Based on Transcriptomic and Proteomic Sequencing

    Directory of Open Access Journals (Sweden)

    Bing Xie

    2016-04-01

    Full Text Available Fish venom remains a virtually untapped resource. There are so few fish toxin sequences for reference, which increases the difficulty to study toxins from venomous fish and to develop efficient and fast methods to dig out toxin genes or proteins. Here, we utilized Chinese yellow catfish (Pelteobagrus fulvidraco as our research object, since it is a representative species in Siluriformes with its venom glands embedded in the pectoral and dorsal fins. In this study, we set up an in-house toxin database and a novel toxin-discovering protocol to dig out precise toxin genes by combination of transcriptomic and proteomic sequencing. Finally, we obtained 15 putative toxin proteins distributed in five groups, namely Veficolin, Ink toxin, Adamalysin, Za2G and CRISP toxin. It seems that we have developed a novel bioinformatics method, through which we could identify toxin proteins with high confidence. Meanwhile, these toxins can also be useful for comparative studies in other fish and development of potential drugs.

  13. Transcriptome sequencing in pediatric acute lymphoblastic leukemia identifies fusion genes associated with distinct DNA methylation profiles

    Directory of Open Access Journals (Sweden)

    Yanara Marincevic-Zuniga

    2017-08-01

    Full Text Available Abstract Background Structural chromosomal rearrangements that lead to expressed fusion genes are a hallmark of acute lymphoblastic leukemia (ALL. In this study, we performed transcriptome sequencing of 134 primary ALL patient samples to comprehensively detect fusion transcripts. Methods We combined fusion gene detection with genome-wide DNA methylation analysis, gene expression profiling, and targeted sequencing to determine molecular signatures of emerging ALL subtypes. Results We identified 64 unique fusion events distributed among 80 individual patients, of which over 50% have not previously been reported in ALL. Although the majority of the fusion genes were found only in a single patient, we identified several recurrent fusion gene families defined by promiscuous fusion gene partners, such as ETV6, RUNX1, PAX5, and ZNF384, or recurrent fusion genes, such as DUX4-IGH. Our data show that patients harboring these fusion genes displayed characteristic genome-wide DNA methylation and gene expression signatures in addition to distinct patterns in single nucleotide variants and recurrent copy number alterations. Conclusion Our study delineates the fusion gene landscape in pediatric ALL, including both known and novel fusion genes, and highlights fusion gene families with shared molecular etiologies, which may provide additional information for prognosis and therapeutic options in the future.

  14. Development of novel genic microsatellite markers from transcriptome sequencing in sugar maple (Acer saccharum Marsh.).

    Science.gov (United States)

    Harmon, Monica; Lane, Thomas; Staton, Margaret; Coggeshall, Mark V; Best, Teodora; Chen, Chien-Chih; Liang, Haiying; Zembower, Nicole; Drautz-Moses, Daniela I; Hwee, Yap Zhei; Schuster, Stephan C; Schlarbaum, Scott E; Carlson, John E; Gailing, Oliver

    2017-08-08

    Sugar maple (Acer saccharum Marsh.) is a hardwood tree species native to northeastern North America and economically valued for its wood and sap. Yet, few molecular genetic resources have been developed for this species to date. Microsatellite markers have been a useful tool in population genetics, e.g., to monitor genetic variation and to analyze gene flow patterns. The objective of this study is to develop a reference transcriptome and microsatellite markers in sugar maple. A set of 117,861 putative unique transcripts were assembled using 29.2 Gb of RNA sequencing data derived from different tissues and stress treatments. From this set of sequences a total of 1068 microsatellite motifs were identified. Out of 58 genic microsatellite markers tested on a population of 47 sugar maple trees in upper Michigan, 22 amplified well, of which 16 were polymorphic and 6 were monomorphic. Values for expected heterozygosity varied from 0.224 to 0.726 for individual loci. Of the 16 polymorphic markers, 15 exhibited transferability to other Acer L. species. Genic microsatellite markers can be applied to analyze genetic variation in potentially adaptive genes relative to genomic reference markers as a basis for the management of sugar maple genetic resources in the face of climate change.

  15. Transcriptome sequencing in an ecologically important tree species: assembly, annotation, and marker discovery

    Directory of Open Access Journals (Sweden)

    Benkman Craig W

    2010-03-01

    Full Text Available Abstract Background Massively parallel sequencing of cDNA is now an efficient route for generating enormous sequence collections that represent expressed genes. This approach provides a valuable starting point for characterizing functional genetic variation in non-model organisms, especially where whole genome sequencing efforts are currently cost and time prohibitive. The large and complex genomes of pines (Pinus spp. have hindered the development of genomic resources, despite the ecological and economical importance of the group. While most genomic studies have focused on a single species (P. taeda, genomic level resources for other pines are insufficiently developed to facilitate ecological genomic research. Lodgepole pine (P. contorta is an ecologically important foundation species of montane forest ecosystems and exhibits substantial adaptive variation across its range in western North America. Here we describe a sequencing study of expressed genes from P. contorta, including their assembly and annotation, and their potential for molecular marker development to support population and association genetic studies. Results We obtained 586,732 sequencing reads from a 454 GS XLR70 Titanium pyrosequencer (mean length: 306 base pairs. A combination of reference-based and de novo assemblies yielded 63,657 contigs, with 239,793 reads remaining as singletons. Based on sequence similarity with known proteins, these sequences represent approximately 17,000 unique genes, many of which are well covered by contig sequences. This sequence collection also included a surprisingly large number of retrotransposon sequences, suggesting that they are highly transcriptionally active in the tissues we sampled. We located and characterized thousands of simple sequence repeats and single nucleotide polymorphisms as potential molecular markers in our assembled and annotated sequences. High quality PCR primers were designed for a substantial number of the SSR loci

  16. Parallel implementation of DNA sequences matching algorithms using PWM on GPU architecture.

    Science.gov (United States)

    Sharma, Rahul; Gupta, Nitin; Narang, Vipin; Mittal, Ankush

    2011-01-01

    Positional Weight Matrices (PWMs) are widely used in representation and detection of Transcription Factor Of Binding Sites (TFBSs) on DNA. We implement online PWM search algorithm over parallel architecture. A large PWM data can be processed on Graphic Processing Unit (GPU) systems in parallel which can help in matching sequences at a faster rate. Our method employs extensive usage of highly multithreaded architecture and shared memory of multi-cored GPU. An efficient use of shared memory is required to optimise parallel reduction in CUDA. Our optimised method has a speedup of 230-280x over linear implementation on GPU named GeForce GTX 280.

  17. Lactation transcriptomics in the Australian marsupial, Macropus eugenii: transcript sequencing and quantification

    Directory of Open Access Journals (Sweden)

    Whitley Jane C

    2007-11-01

    Full Text Available Abstract Background Lactation is an important aspect of mammalian biology and, amongst mammals, marsupials show one of the most complex lactation cycles. Marsupials, such as the tammar wallaby (Macropus eugenii give birth to a relatively immature newborn and progressive changes in milk composition and milk production regulate early stage development of the young. Results In order to investigate gene expression in the marsupial mammary gland during lactation, a comprehensive set of cDNA libraries was derived from lactating tissues throughout the lactation cycle of the tammar wallaby. A total of 14,837 express sequence tags were produced by cDNA sequencing. Sequence analysis and sequence assembly were used to construct a comprehensive catalogue of mammary transcripts. Sequence data from pregnant and early or late lactating specific cDNA libraries and, data from early or late lactation massively parallel sequencing strategies were combined to analyse the variation of milk protein gene expression during the lactation cycle. Conclusion Results show a steady increase in expression of genes coding for secreted protein during the lactation cycle that is associated with high proportion of transcripts coding for milk proteins. In addition, genes involved in immune function, translation and energy or anabolic metabolism are expressed across the lactation cycle. A number of potential new milk proteins or mammary gland remodelling markers, including noncoding RNAs have been identified.

  18. Massively parallel whole genome amplification for single-cell sequencing using droplet microfluidics.

    Science.gov (United States)

    Hosokawa, Masahito; Nishikawa, Yohei; Kogawa, Masato; Takeyama, Haruko

    2017-07-12

    Massively parallel single-cell genome sequencing is required to further understand genetic diversities in complex biological systems. Whole genome amplification (WGA) is the first step for single-cell sequencing, but its throughput and accuracy are insufficient in conventional reaction platforms. Here, we introduce single droplet multiple displacement amplification (sd-MDA), a method that enables massively parallel amplification of single cell genomes while maintaining sequence accuracy and specificity. Tens of thousands of single cells are compartmentalized in millions of picoliter droplets and then subjected to lysis and WGA by passive droplet fusion in microfluidic channels. Because single cells are isolated in compartments, their genomes are amplified to saturation without contamination. This enables the high-throughput acquisition of contamination-free and cell specific sequence reads from single cells (21,000 single-cells/h), resulting in enhancement of the sequence data quality compared to conventional methods. This method allowed WGA of both single bacterial cells and human cancer cells. The obtained sequencing coverage rivals those of conventional techniques with superior sequence quality. In addition, we also demonstrate de novo assembly of uncultured soil bacteria and obtain draft genomes from single cell sequencing. This sd-MDA is promising for flexible and scalable use in single-cell sequencing.

  19. Transcriptome sequencing revealed significant alteration of cortical promoter usage and splicing in schizophrenia.

    Directory of Open Access Journals (Sweden)

    Jing Qin Wu

    Full Text Available While hybridization based analysis of the cortical transcriptome has provided important insight into the neuropathology of schizophrenia, it represents a restricted view of disease-associated gene activity based on predetermined probes. By contrast, sequencing technology can provide un-biased analysis of transcription at nucleotide resolution. Here we use this approach to investigate schizophrenia-associated cortical gene expression.The data was generated from 76 bp reads of RNA-Seq, aligned to the reference genome and assembled into transcripts for quantification of exons, splice variants and alternative promoters in postmortem superior temporal gyrus (STG/BA22 from 9 male subjects with schizophrenia and 9 matched non-psychiatric controls. Differentially expressed genes were then subjected to further sequence and functional group analysis. The output, amounting to more than 38 Gb of sequence, revealed significant alteration of gene expression including many previously shown to be associated with schizophrenia. Gene ontology enrichment analysis followed by functional map construction identified three functional clusters highly relevant to schizophrenia including neurotransmission related functions, synaptic vesicle trafficking, and neural development. Significantly, more than 2000 genes displayed schizophrenia-associated alternative promoter usage and more than 1000 genes showed differential splicing (FDR<0.05. Both types of transcriptional isoforms were exemplified by reads aligned to the neurodevelopmentally significant doublecortin-like kinase 1 (DCLK1 gene.This study provided the first deep and un-biased analysis of schizophrenia-associated transcriptional diversity within the STG, and revealed variants with important implications for the complex pathophysiology of schizophrenia.

  20. Comprehensive transcriptome assembly of Chickpea (Cicer arietinum L. using sanger and next generation sequencing platforms: development and applications.

    Directory of Open Access Journals (Sweden)

    Himabindu Kudapa

    Full Text Available A comprehensive transcriptome assembly of chickpea has been developed using 134.95 million Illumina single-end reads, 7.12 million single-end FLX/454 reads and 139,214 Sanger expressed sequence tags (ESTs from >17 genotypes. This hybrid transcriptome assembly, referred to as Cicer arietinumTranscriptome Assembly version 2 (CaTA v2, available at http://data.comparative-legumes.org/transcriptomes/cicar/lista_cicar-201201, comprising 46,369 transcript assembly contigs (TACs has an N50 length of 1,726 bp and a maximum contig size of 15,644 bp. Putative functions were determined for 32,869 (70.8% of the TACs and gene ontology assignments were determined for 21,471 (46.3%. The new transcriptome assembly was compared with the previously available chickpea transcriptome assemblies as well as to the chickpea genome. Comparative analysis of CaTA v2 against transcriptomes of three legumes - Medicago, soybean and common bean, resulted in 27,771 TACs common to all three legumes indicating strong conservation of genes across legumes. CaTA v2 was also used for identification of simple sequence repeats (SSRs and intron spanning regions (ISRs for developing molecular markers. ISRs were identified by aligning TACs to the Medicago genome, and their putative mapping positions at chromosomal level were identified using transcript map of chickpea. Primer pairs were designed for 4,990 ISRs, each representing a single contig for which predicted positions are inferred and distributed across eight linkage groups. A subset of randomly selected ISRs representing all eight chickpea linkage groups were validated on five chickpea genotypes and showed 20% polymorphism with average polymorphic information content (PIC of 0.27. In summary, the hybrid transcriptome assembly developed and novel markers identified can be used for a variety of applications such as gene discovery, marker-trait association, diversity analysis etc., to advance genetics research and breeding

  1. Elimination of zero sequence circulating current between parallel operating three-level inverters

    DEFF Research Database (Denmark)

    Li, Kai; Wang, Xiaodong; Dong, Zhenhua

    2016-01-01

    In order to suppress the zero sequence circulating currents (ZSCCs) between parallel operating three level voltage source inverters with common AC and DC buses, a common mode voltage reduction PWM (CMVR-PWM) technique and neural point potentials (NPPs) control based method is proposed in this paper...

  2. Increasing phylogenetic resolution at low taxonomic levels using massively parallel sequencing of chloroplast genomes

    Science.gov (United States)

    Matthew Parks; Richard Cronn; Aaron Liston

    2009-01-01

    We reconstruct the infrageneric phylogeny of Pinus from 37 nearly-complete chloroplast genomes (average 109 kilobases each of an approximately 120 kilobase genome) generated using multiplexed massively parallel sequencing. We found that 30/33 ingroup nodes resolved wlth > 95-percent bootstrap support; this is a substantial improvement relative...

  3. ACME: A scalable parallel system for extracting frequent patterns from a very long sequence

    KAUST Repository

    Sahli, Majed; Mansour, Essam; Kalnis, Panos

    2014-01-01

    -long sequences and is the first to support supermaximal motifs. ACME is a versatile parallel system that can be deployed on desktop multi-core systems, or on thousands of CPUs in the cloud. However, merely using more compute nodes does not guarantee efficiency

  4. De novo transcriptome sequencing of two cultivated jute species under salinity stress.

    Directory of Open Access Journals (Sweden)

    Zemao Yang

    Full Text Available Soil salinity, a major environmental stress, reduces agricultural productivity by restricting plant development and growth. Jute (Corchorus spp., a commercially important bast fiber crop, includes two commercially cultivated species, Corchorus capsularis and Corchorus olitorius. We conducted high-throughput transcriptome sequencing of 24 C. capsularis and C. olitorius samples under salt stress and found 127 common differentially expressed genes (DEGs; additionally, 4489 and 492 common DEGs were identified in the root and leaf tissues, respectively, of both Corchorus species. Further, 32, 196, and 11 common differentially expressed transcription factors (DTFs were detected in the leaf, root, or both tissues, respectively. Several Gene Ontology (GO terms were enriched in NY and YY. A Kyoto Encyclopedia of Genes and Genomes analysis revealed numerous DEGs in both species. Abscisic acid and cytokinin signal pathways enriched respectively about 20 DEGs in leaves and roots of both NY and YY. The Ca2+, mitogen-activated protein kinase signaling and oxidative phosphorylation pathways were also found to be related to the plant response to salt stress, as evidenced by the DEGs in the roots of both species. These results provide insight into salt stress response mechanisms in plants as well as a basis for future breeding of salt-tolerant cultivars.

  5. Transcriptome Sequencing in a Tibetan Barley Landrace with High Resistance to Powdery Mildew

    Directory of Open Access Journals (Sweden)

    Xing-Quan Zeng

    2014-01-01

    Full Text Available Hulless barley is an important cereal crop worldwide, especially in Tibet of China. However, this crop is usually susceptible to powdery mildew caused by Blumeria graminis f. sp. hordei. In this study, we aimed to understand the functions and pathways of genes involved in the disease resistance by transcriptome sequencing of a Tibetan barley landrace with high resistance to powdery mildew. A total of 831 significant differentially expressed genes were found in the infected seedlings, covering 19 functions. Either “cell,” “cell part,” and “extracellular region” in the cellular component category or “binding” and “catalytic” in the category of molecular function as well as “metabolic process” and “cellular process” in the biological process category together demonstrated that these functions may be involved in the resistance to powdery mildew of the hulless barley. In addition, 330 KEGG pathways were found using BLASTx with an E-value cut-off of <10−5. Among them, three pathways, namely, “photosynthesis,” “plant-pathogen interaction,” and “photosynthesis-antenna proteins” had significant matches in the database. Significant expressions of the three pathways were detected at 24 h, 48 h, and 96 h after infection, respectively. These results indicated a complex process of barley response to powdery mildew infection.

  6. Retinal transcriptome sequencing sheds light on the adaptation to nocturnal and diurnal lifestyles in raptors.

    Science.gov (United States)

    Wu, Yonghua; Hadly, Elizabeth A; Teng, Wenjia; Hao, Yuyang; Liang, Wei; Liu, Yu; Wang, Haitao

    2016-09-20

    Owls (Strigiformes) represent a fascinating group of birds that are the ecological night-time counterparts to diurnal raptors (Accipitriformes). The nocturnality of owls, unusual within birds, has favored an exceptional visual system that is highly tuned for hunting at night, yet the molecular basis for this adaptation is lacking. Here, using a comparative evolutionary analysis of 120 vision genes obtained by retinal transcriptome sequencing, we found strong positive selection for low-light vision genes in owls, which contributes to their remarkable nocturnal vision. Not surprisingly, we detected gene loss of the violet/ultraviolet-sensitive opsin (SWS1) in all owls we studied, but two other color vision genes, the red-sensitive LWS and the blue-sensitive SWS2, were found to be under strong positive selection, which may be linked to the spectral tunings of these genes toward maximizing photon absorption in crepuscular conditions. We also detected the only other positively selected genes associated with motion detection in falcons and positively selected genes associated with bright-light vision and eye protection in other diurnal raptors (Accipitriformes). Our results suggest the adaptive evolution of vision genes reflect differentiated activity time and distinct hunting behaviors.

  7. De Novo Sequencing and Assembly Analysis of Transcriptome in Pinus bungeana Zucc. ex Endl.

    Directory of Open Access Journals (Sweden)

    Qifei Cai

    2018-03-01

    Full Text Available To enrich the molecular data of Pinus bungeana Zucc. ex Endl. and study the regulating factors of different morphology controled by apical dominance. In this study, de novo assembly of transcriptome annotation was performed for two varieties of Pinus bungeana Zucc. ex Endl. that are obviously different in morphology. More than 147 million reads were produced, which were assembled into 88,092 unigenes. Based on a similarity search, 11,692 unigenes showed significant similarity to proteins from Picea sitchensis (Bong. Carr. From this collection of unigenes, a large number of molecular markers were identified, including 2829 simple sequence repeats (SSRs. A total of 158 unigenes expressed differently between two varieties, including 98 up-regulated and 60 down-regulated unigenes. Furthermore, among the differently expressed genes (DEGs, five genes which may impact the plant morphology were further validated by reverse transcription quantitative polymerase chain reaction (RT-qPCR. The five genes related to cytokinin oxidase/dehydrogenase (CKX, two-component response regulator ARR-A family (ARR-A, plant hormone signal transduction (AHP, and MADS-box transcription factors have a close relationship with apical dominance. This new dataset will be a useful resource for future genetic and genomic studies in Pinus bungeana Zucc. ex Endl.

  8. Transcriptome sequencing and metabolite analysis reveals the role of delphinidin metabolism in flower colour in grape hyacinth

    OpenAIRE

    Lou, Qian; Liu, Yali; Qi, Yinyan; Jiao, Shuzhen; Tian, Feifei; Jiang, Ling; Wang, Yuejin

    2014-01-01

    Grape hyacinth (Muscari) is an important ornamental bulbous plant with an extraordinary blue colour. Muscari armeniacum, whose flowers can be naturally white, provides an opportunity to unravel the complex metabolic networks underlying certain biochemical traits, especially colour. A blue flower cDNA library of M. armeniacum and a white flower library of M. armeniacum f. album were used for transcriptome sequencing. A total of 89 926 uni-transcripts were isolated, 143 of which could be identi...

  9. A simple method for the parallel deep sequencing of full influenza A genomes

    DEFF Research Database (Denmark)

    Kampmann, Marie-Louise; Fordyce, Sarah Louise; Avila Arcos, Maria del Carmen

    2011-01-01

    Given the major threat of influenza A to human and animal health, and its ability to evolve rapidly through mutation and reassortment, tools that enable its timely characterization are necessary to help monitor its evolution and spread. For this purpose, deep sequencing can be a very valuable tool....... This study reports a comprehensive method that enables deep sequencing of the complete genomes of influenza A subtypes using the Illumina Genome Analyzer IIx (GAIIx). By using this method, the complete genomes of nine viruses were sequenced in parallel, representing the 2009 pandemic H1N1 virus, H5N1 virus...

  10. Design of multiple sequence alignment algorithms on parallel, distributed memory supercomputers.

    Science.gov (United States)

    Church, Philip C; Goscinski, Andrzej; Holt, Kathryn; Inouye, Michael; Ghoting, Amol; Makarychev, Konstantin; Reumann, Matthias

    2011-01-01

    The challenge of comparing two or more genomes that have undergone recombination and substantial amounts of segmental loss and gain has recently been addressed for small numbers of genomes. However, datasets of hundreds of genomes are now common and their sizes will only increase in the future. Multiple sequence alignment of hundreds of genomes remains an intractable problem due to quadratic increases in compute time and memory footprint. To date, most alignment algorithms are designed for commodity clusters without parallelism. Hence, we propose the design of a multiple sequence alignment algorithm on massively parallel, distributed memory supercomputers to enable research into comparative genomics on large data sets. Following the methodology of the sequential progressiveMauve algorithm, we design data structures including sequences and sorted k-mer lists on the IBM Blue Gene/P supercomputer (BG/P). Preliminary results show that we can reduce the memory footprint so that we can potentially align over 250 bacterial genomes on a single BG/P compute node. We verify our results on a dataset of E.coli, Shigella and S.pneumoniae genomes. Our implementation returns results matching those of the original algorithm but in 1/2 the time and with 1/4 the memory footprint for scaffold building. In this study, we have laid the basis for multiple sequence alignment of large-scale datasets on a massively parallel, distributed memory supercomputer, thus enabling comparison of hundreds instead of a few genome sequences within reasonable time.

  11. De Novo Sequencing and Analysis of Lemongrass Transcriptome Provide First Insights into the Essential Oil Biosynthesis of Aromatic Grasses.

    Science.gov (United States)

    Meena, Seema; Kumar, Sarma R; Venkata Rao, D K; Dwivedi, Varun; Shilpashree, H B; Rastogi, Shubhra; Shasany, Ajit K; Nagegowda, Dinesh A

    2016-01-01

    Aromatic grasses of the genus Cymbopogon (Poaceae family) represent unique group of plants that produce diverse composition of monoterpene rich essential oils, which have great value in flavor, fragrance, cosmetic, and aromatherapy industries. Despite the commercial importance of these natural aromatic oils, their biosynthesis at the molecular level remains unexplored. As the first step toward understanding the essential oil biosynthesis, we performed de novo transcriptome assembly and analysis of C. flexuosus (lemongrass) by employing Illumina sequencing. Mining of transcriptome data and subsequent phylogenetic analysis led to identification of terpene synthases, pyrophosphatases, alcohol dehydrogenases, aldo-keto reductases, carotenoid cleavage dioxygenases, alcohol acetyltransferases, and aldehyde dehydrogenases, which are potentially involved in essential oil biosynthesis. Comparative essential oil profiling and mRNA expression analysis in three Cymbopogon species (C. flexuosus, aldehyde type; C. martinii, alcohol type; and C. winterianus, intermediate type) with varying essential oil composition indicated the involvement of identified candidate genes in the formation of alcohols, aldehydes, and acetates. Molecular modeling and docking further supported the role of identified protein sequences in aroma formation in Cymbopogon. Also, simple sequence repeats were found in the transcriptome with many linked to terpene pathway genes including the genes potentially involved in aroma biosynthesis. This work provides the first insights into the essential oil biosynthesis of aromatic grasses, and the identified candidate genes and markers can be a great resource for biotechnological and molecular breeding approaches to modulate the essential oil composition.

  12. De Novo Sequencing and Analysis of Lemongrass Transcriptome Provide First Insights into the Essential Oil Biosynthesis of Aromatic Grasses

    Science.gov (United States)

    Meena, Seema; Kumar, Sarma R.; Venkata Rao, D. K.; Dwivedi, Varun; Shilpashree, H. B.; Rastogi, Shubhra; Shasany, Ajit K.; Nagegowda, Dinesh A.

    2016-01-01

    Aromatic grasses of the genus Cymbopogon (Poaceae family) represent unique group of plants that produce diverse composition of monoterpene rich essential oils, which have great value in flavor, fragrance, cosmetic, and aromatherapy industries. Despite the commercial importance of these natural aromatic oils, their biosynthesis at the molecular level remains unexplored. As the first step toward understanding the essential oil biosynthesis, we performed de novo transcriptome assembly and analysis of C. flexuosus (lemongrass) by employing Illumina sequencing. Mining of transcriptome data and subsequent phylogenetic analysis led to identification of terpene synthases, pyrophosphatases, alcohol dehydrogenases, aldo-keto reductases, carotenoid cleavage dioxygenases, alcohol acetyltransferases, and aldehyde dehydrogenases, which are potentially involved in essential oil biosynthesis. Comparative essential oil profiling and mRNA expression analysis in three Cymbopogon species (C. flexuosus, aldehyde type; C. martinii, alcohol type; and C. winterianus, intermediate type) with varying essential oil composition indicated the involvement of identified candidate genes in the formation of alcohols, aldehydes, and acetates. Molecular modeling and docking further supported the role of identified protein sequences in aroma formation in Cymbopogon. Also, simple sequence repeats were found in the transcriptome with many linked to terpene pathway genes including the genes potentially involved in aroma biosynthesis. This work provides the first insights into the essential oil biosynthesis of aromatic grasses, and the identified candidate genes and markers can be a great resource for biotechnological and molecular breeding approaches to modulate the essential oil composition. PMID:27516768

  13. A transcriptome atlas of rabbit revealed by PacBio single-molecule long-read sequencing.

    Science.gov (United States)

    Chen, Shi-Yi; Deng, Feilong; Jia, Xianbo; Li, Cao; Lai, Song-Jia

    2017-08-09

    It is widely acknowledged that transcriptional diversity largely contributes to biological regulation in eukaryotes. Since the advent of second-generation sequencing technologies, a large number of RNA sequencing studies have considerably improved our understanding of transcriptome complexity. However, it still remains a huge challenge for obtaining full-length transcripts because of difficulties in the short read-based assembly. In the present study we employ PacBio single-molecule long-read sequencing technology for whole-transcriptome profiling in rabbit (Oryctolagus cuniculus). We totally obtain 36,186 high-confidence transcripts from 14,474 genic loci, among which more than 23% of genic loci and 66% of isoforms have not been annotated yet within the current reference genome. Furthermore, about 17% of transcripts are computationally revealed to be non-coding RNAs. Up to 24,797 alternative splicing (AS) and 11,184 alternative polyadenylation (APA) events are detected within this de novo constructed transcriptome, respectively. The results provide a comprehensive set of reference transcripts and hence contribute to the improved annotation of rabbit genome.

  14. In-Depth Transcriptome Sequencing of Mexican Lime Trees Infected with Candidatus Phytoplasma aurantifolia.

    Science.gov (United States)

    Mardi, Mohsen; Karimi Farsad, Laleh; Gharechahi, Javad; Salekdeh, Ghasem Hosseini

    2015-01-01

    Witches' broom disease of acid lime greatly affects the production of Mexican lime in Iran. It is caused by a phytoplasma (Candidatus Phytoplasma aurantifolia). However, the molecular mechanisms that underlie phytoplasma pathogenicity and the mode of interactions with host plants are largely unknown. Here, high-throughput transcriptome sequencing was conducted to explore gene expression signatures associated with phytoplasma infection in Mexican lime trees. We assembled 78,185 unique transcript sequences (unigenes) with an average length of 530 nt. Of these, 41,805 (53.4%) were annotated against the NCBI non-redundant (nr) protein database using a BLASTx search (e-value ≤ 1e-5). When the abundances of unigenes in healthy and infected plants were compared, 2,805 transcripts showed significant differences (false discovery rate ≤ 0.001 and log2 ratio ≥ 1.5). These differentially expressed genes (DEGs) were significantly enriched in 43 KEGG metabolic and regulatory pathways. The up-regulated DEGs were mainly categorized into pathways with possible implication in plant-pathogen interaction, including cell wall biogenesis and degradation, sucrose metabolism, secondary metabolism, hormone biosynthesis and signalling, amino acid and lipid metabolism, while down-regulated DEGs were predominantly enriched in ubiquitin proteolysis and oxidative phosphorylation pathways. Our analysis provides novel insight into the molecular pathways that are deregulated during the host-pathogen interaction in Mexican lime trees infected by phytoplasma. The findings can be valuable for unravelling the molecular mechanisms of plant-phytoplasma interactions and can pave the way for engineering lime trees with resistance to witches' broom disease.

  15. Peripheral blood transcriptome sequencing reveals rejection-relevant genes in long-term heart transplantation.

    Science.gov (United States)

    Chen, Yan; Zhang, Haibo; Xiao, Xue; Jia, Yixin; Wu, Weili; Liu, Licheng; Jiang, Jun; Zhu, Baoli; Meng, Xu; Chen, Weijun

    2013-10-03

    Peripheral blood-based gene expression patterns have been investigated as biomarkers to monitor the immune system and rule out rejection after heart transplantation. Recent advances in the high-throughput deep sequencing (HTS) technologies provide new leads in transcriptome analysis. By performing Solexa/Illumina's digital gene expression (DGE) profiling, we analyzed gene expression profiles of PBMCs from 6 quiescent (grade 0) and 6 rejection (grade 2R&3R) heart transplant recipients at more than 6 months after transplantation. Subsequently, quantitative real-time polymerase chain reaction (qRT-PCR) was carried out in an independent validation cohort of 47 individuals from three rejection groups (ISHLT, grade 0,1R, 2R&3R). Through DGE sequencing and qPCR validation, 10 genes were identified as informative genes for detection of cardiac transplant rejection. A further clustering analysis showed that the 10 genes were not only effective for distinguishing patients with acute cardiac allograft rejection, but also informative for discriminating patients with renal allograft rejection based on both blood and biopsy samples. Moreover, PPI network analysis revealed that the 10 genes were connected to each other within a short interaction distance. We proposed a 10-gene signature for heart transplant patients at high-risk of developing severe rejection, which was found to be effective as well in other organ transplant. Moreover, we supposed that these genes function systematically as biomarkers in long-time allograft rejection. Further validation in broad transplant population would be required before the non-invasive biomarkers can be generally utilized to predict the risk of transplant rejection. Copyright © 2013 Elsevier Ireland Ltd. All rights reserved.

  16. Genome and transcriptome sequencing characterises the gene space of Macadamia integrifolia (Proteaceae).

    Science.gov (United States)

    Nock, Catherine J; Baten, Abdul; Barkla, Bronwyn J; Furtado, Agnelo; Henry, Robert J; King, Graham J

    2016-11-17

    The large Gondwanan plant family Proteaceae is an early-diverging eudicot lineage renowned for its morphological, taxonomic and ecological diversity. Macadamia is the most economically important Proteaceae crop and represents an ancient rainforest-restricted lineage. The family is a focus for studies of adaptive radiation due to remarkable species diversification in Mediterranean-climate biodiversity hotspots, and numerous evolutionary transitions between biomes. Despite a long history of research, comparative analyses in the Proteaceae and macadamia breeding programs are restricted by a paucity of genetic information. To address this, we sequenced the genome and transcriptome of the widely grown Macadamia integrifolia cultivar 741. Over 95 gigabases of DNA and RNA-seq sequence data were de novo assembled and annotated. The draft assembly has a total length of 518 Mb and spans approximately 79% of the estimated genome size. Following annotation, 35,337 protein-coding genes were predicted of which over 90% were expressed in at least one of the leaf, shoot or flower tissues examined. Gene family comparisons with five other eudicot species revealed 13,689 clusters containing macadamia genes and 1005 macadamia-specific clusters, and provides evidence for linage-specific expansion of gene families involved in pathogen recognition, plant defense and monoterpene synthesis. Cyanogenesis is an important defense strategy in the Proteaceae, and a detailed analysis of macadamia gene homologues potentially involved in cyanogenic glycoside biosynthesis revealed several highly expressed candidate genes. The gene space of macadamia provides a foundation for comparative genomics, gene discovery and the acceleration of molecular-assisted breeding. This study presents the first available genomic resources for the large basal eudicot family Proteaceae, access to most macadamia genes and opportunities to uncover the genetic basis of traits of importance for adaptation and crop

  17. Transcriptome Analysis of Flower Sex Differentiation in Jatropha curcas L. Using RNA Sequencing.

    Science.gov (United States)

    Xu, Gang; Huang, Jian; Yang, Yong; Yao, Yin-an

    2016-01-01

    Jatropha curcas is thought to be a promising biofuel material, but its yield is restricted by a low ratio of instaminate/staminate flowers (1/10-1/30). Furthermore, valuable information about flower sex differentiation in this plant is scarce. To explore the mechanism of this process in J. curcas, transcriptome profiling of flower development was carried out, and certain genes related with sex differentiation were obtained through digital gene expression analysis of flower buds from different phases of floral development. After Illumina sequencing and clustering, 57,962 unigenes were identified. A total of 47,423 unigenes were annotated, with 85 being related to carpel and stamen differentiation, 126 involved in carpel and stamen development, and 592 functioning in the later development stage for the maturation of staminate or instaminate flowers. Annotation of these genes provided comprehensive information regarding the sex differentiation of flowers, including the signaling system, hormone biosynthesis and regulation, transcription regulation and ubiquitin-mediated proteolysis. A further expression pattern analysis of 15 sex-related genes using quantitative real-time PCR revealed that gibberellin-regulated protein 4-like protein and AMP-activated protein kinase are associated with stamen differentiation, whereas auxin response factor 6-like protein, AGAMOUS-like 20 protein, CLAVATA1, RING-H2 finger protein ATL3J, auxin-induced protein 22D, and r2r3-myb transcription factor contribute to embryo sac development in the instaminate flower. Cytokinin oxidase, Unigene28, auxin repressed-like protein ARP1, gibberellin receptor protein GID1 and auxin-induced protein X10A are involved in both stages mentioned above. In addition to its function in the differentiation and development of the stamens, the gibberellin signaling pathway also functions in embryo sac development for the instaminate flower. The auxin signaling pathway also participates in both stamen development

  18. Discovery of Putative Herbicide Resistance Genes and Its Regulatory Network in Chickpea Using Transcriptome Sequencing

    Directory of Open Access Journals (Sweden)

    Mir A. Iquebal

    2017-06-01

    Full Text Available Background: Chickpea (Cicer arietinum L. contributes 75% of total pulse production. Being cheaper than animal protein, makes it important in dietary requirement of developing countries. Weed not only competes with chickpea resulting into drastic yield reduction but also creates problem of harboring fungi, bacterial diseases and insect pests. Chemical approach having new herbicide discovery has constraint of limited lead molecule options, statutory regulations and environmental clearance. Through genetic approach, transgenic herbicide tolerant crop has given successful result but led to serious concern over ecological safety thus non-transgenic approach like marker assisted selection is desirable. Since large variability in tolerance limit of herbicide already exists in chickpea varieties, thus the genes offering herbicide tolerance can be introgressed in variety improvement programme. Transcriptome studies can discover such associated key genes with herbicide tolerance in chickpea.Results: This is first transcriptomic studies of chickpea or even any legume crop using two herbicide susceptible and tolerant genotypes exposed to imidazoline (Imazethapyr. Approximately 90 million paired-end reads generated from four samples were processed and assembled into 30,803 contigs using reference based assembly. We report 6,310 differentially expressed genes (DEGs, of which 3,037 were regulated by 980 miRNAs, 1,528 transcription factors associated with 897 DEGs, 47 Hub proteins, 3,540 putative Simple Sequence Repeat-Functional Domain Marker (SSR-FDM, 13,778 genic Single Nucleotide Polymorphism (SNP putative markers and 1,174 Indels. Randomly selected 20 DEGs were validated using qPCR. Pathway analysis suggested that xenobiotic degradation related gene, glutathione S-transferase (GST were only up-regulated in presence of herbicide. Down-regulation of DNA replication genes and up-regulation of abscisic acid pathway genes were observed. Study further reveals

  19. Characterization of transcriptome dynamics during watermelon fruit development: sequencing, assembly, annotation and gene expression profiles.

    Science.gov (United States)

    Guo, Shaogui; Liu, Jingan; Zheng, Yi; Huang, Mingyun; Zhang, Haiying; Gong, Guoyi; He, Hongju; Ren, Yi; Zhong, Silin; Fei, Zhangjun; Xu, Yong

    2011-09-21

    Cultivated watermelon [Citrullus lanatus (Thunb.) Matsum. & Nakai var. lanatus] is an important agriculture crop world-wide. The fruit of watermelon undergoes distinct stages of development with dramatic changes in its size, color, sweetness, texture and aroma. In order to better understand the genetic and molecular basis of these changes and significantly expand the watermelon transcript catalog, we have selected four critical stages of watermelon fruit development and used Roche/454 next-generation sequencing technology to generate a large expressed sequence tag (EST) dataset and a comprehensive transcriptome profile for watermelon fruit flesh tissues. We performed half Roche/454 GS-FLX run for each of the four watermelon fruit developmental stages (immature white, white-pink flesh, red flesh and over-ripe) and obtained 577,023 high quality ESTs with an average length of 302.8 bp. De novo assembly of these ESTs together with 11,786 watermelon ESTs collected from GenBank produced 75,068 unigenes with a total length of approximately 31.8 Mb. Overall 54.9% of the unigenes showed significant similarities to known sequences in GenBank non-redundant (nr) protein database and around two-thirds of them matched proteins of cucumber, the most closely-related species with a sequenced genome. The unigenes were further assigned with gene ontology (GO) terms and mapped to biochemical pathways. More than 5,000 SSRs were identified from the EST collection. Furthermore we carried out digital gene expression analysis of these ESTs and identified 3,023 genes that were differentially expressed during watermelon fruit development and ripening, which provided novel insights into watermelon fruit biology and a comprehensive resource of candidate genes for future functional analysis. We then generated profiles of several interesting metabolites that are important to fruit quality including pigmentation and sweetness. Integrative analysis of metabolite and digital gene expression

  20. Deep sequencing of the Mexican avocado transcriptome, an ancient angiosperm with a high content of fatty acids.

    Science.gov (United States)

    Ibarra-Laclette, Enrique; Méndez-Bravo, Alfonso; Pérez-Torres, Claudia Anahí; Albert, Victor A; Mockaitis, Keithanne; Kilaru, Aruna; López-Gómez, Rodolfo; Cervantes-Luevano, Jacob Israel; Herrera-Estrella, Luis

    2015-08-13

    Avocado (Persea americana) is an economically important tropical fruit considered to be a good source of fatty acids. Despite its importance, the molecular and cellular characterization of biochemical and developmental processes in avocado is limited due to the lack of transcriptome and genomic information. The transcriptomes of seeds, roots, stems, leaves, aerial buds and flowers were determined using different sequencing platforms. Additionally, the transcriptomes of three different stages of fruit ripening (pre-climacteric, climacteric and post-climacteric) were also analyzed. The analysis of the RNAseqatlas presented here reveals strong differences in gene expression patterns between different organs, especially between root and flower, but also reveals similarities among the gene expression patterns in other organs, such as stem, leaves and aerial buds (vegetative organs) or seed and fruit (storage organs). Important regulators, functional categories, and differentially expressed genes involved in avocado fruit ripening were identified. Additionally, to demonstrate the utility of the avocado gene expression atlas, we investigated the expression patterns of genes implicated in fatty acid metabolism and fruit ripening. A description of transcriptomic changes occurring during fruit ripening was obtained in Mexican avocado, contributing to a dynamic view of the expression patterns of genes involved in fatty acid biosynthesis and the fruit ripening process.

  1. Simultaneous activation of parallel sensory pathways promotes a grooming sequence in Drosophila

    Science.gov (United States)

    Hampel, Stefanie; McKellar, Claire E

    2017-01-01

    A central model that describes how behavioral sequences are produced features a neural architecture that readies different movements simultaneously, and a mechanism where prioritized suppression between the movements determines their sequential performance. We previously described a model whereby suppression drives a Drosophila grooming sequence that is induced by simultaneous activation of different sensory pathways that each elicit a distinct movement (Seeds et al., 2014). Here, we confirm this model using transgenic expression to identify and optogenetically activate sensory neurons that elicit specific grooming movements. Simultaneous activation of different sensory pathways elicits a grooming sequence that resembles the naturally induced sequence. Moreover, the sequence proceeds after the sensory excitation is terminated, indicating that a persistent trace of this excitation induces the next grooming movement once the previous one is performed. This reveals a mechanism whereby parallel sensory inputs can be integrated and stored to elicit a delayed and sequential grooming response. PMID:28887878

  2. De novo assembly and characterization of the transcriptome of seagrass Zostera marina using Illumina paired-end sequencing.

    Directory of Open Access Journals (Sweden)

    Fanna Kong

    Full Text Available BACKGROUND: The seagrass Zostera marina is a monocotyledonous angiosperm belonging to a polyphyletic group of plants that can live submerged in marine habitats. Zostera marina L. is one of the most common seagrasses and is considered a cornerstone of marine plant molecular ecology research and comparative studies. However, the mechanisms underlying its adaptation to the marine environment still remain poorly understood due to limited transcriptomic and genomic data. PRINCIPAL FINDINGS: Here we explored the transcriptome of Z. marina leaves under different environmental conditions using Illumina paired-end sequencing. Approximately 55 million sequencing reads were obtained, representing 58,457 transcripts that correspond to 24,216 unigenes. A total of 14,389 (59.41% unigenes were annotated by blast searches against the NCBI non-redundant protein database. 45.18% and 46.91% of the unigenes had significant similarity with proteins in the Swiss-Prot database and Pfam database, respectively. Among these, 13,897 unigenes were assigned to 57 Gene Ontology (GO terms and 4,745 unigenes were identified and mapped to 233 pathways via functional annotation against the Kyoto Encyclopedia of Genes and Genomes pathway database (KEGG. We compared the orthologous gene family of the Z. marina transcriptome to Oryza sativa and Pyropia yezoensis and 11,667 orthologous gene families are specific to Z. marina. Furthermore, we identified the photoreceptors sensing red/far-red light and blue light. Also, we identified a large number of genes that are involved in ion transporters and channels including Na+ efflux, K+ uptake, Cl- channels, and H+ pumping. CONCLUSIONS: Our study contains an extensive sequencing and gene-annotation analysis of Z. marina. This information represents a genetic resource for the discovery of genes related to light sensing and salt tolerance in this species. Our transcriptome can be further utilized in future studies on molecular adaptation to

  3. Deep sequencing of the Camellia sinensis transcriptome revealed candidate genes for major metabolic pathways of tea-specific compounds

    Energy Technology Data Exchange (ETDEWEB)

    Shi, CY; Yang, H; Wei, CL; Yu, O; Zhang, ZZ; Sun, J; Wan, XC

    2011-01-01

    Tea is one of the most popular non-alcoholic beverages worldwide. However, the tea plant, Camellia sinensis, is difficult to culture in vitro, to transform, and has a large genome, rendering little genomic information available. Recent advances in large-scale RNA sequencing (RNA-seq) provide a fast, cost-effective, and reliable approach to generate large expression datasets for functional genomic analysis, which is especially suitable for non-model species with un-sequenced genomes. Using high-throughput Illumina RNA-seq, the transcriptome from poly (A){sup +} RNA of C. sinensis was analyzed at an unprecedented depth (2.59 gigabase pairs). Approximate 34.5 million reads were obtained, trimmed, and assembled into 127,094 unigenes, with an average length of 355 bp and an N50 of 506 bp, which consisted of 788 contig clusters and 126,306 singletons. This number of unigenes was 10-fold higher than existing C. sinensis sequences deposited in GenBank (as of August 2010). Sequence similarity analyses against six public databases (Uniprot, NR and COGs at NCBI, Pfam, InterPro and KEGG) found 55,088 unigenes that could be annotated with gene descriptions, conserved protein domains, or gene ontology terms. Some of the unigenes were assigned to putative metabolic pathways. Targeted searches using these annotations identified the majority of genes associated with several primary metabolic pathways and natural product pathways that are important to tea quality, such as flavonoid, theanine and caffeine biosynthesis pathways. Novel candidate genes of these secondary pathways were discovered. Comparisons with four previously prepared cDNA libraries revealed that this transcriptome dataset has both a high degree of consistency with previous EST data and an approximate 20 times increase in coverage. Thirteen unigenes related to theanine and flavonoid synthesis were validated. Their expression patterns in different organs of the tea plant were analyzed by RT-PCR and quantitative real

  4. Deep sequencing of the Camellia sinensis transcriptome revealed candidate genes for major metabolic pathways of tea-specific compounds

    Directory of Open Access Journals (Sweden)

    Chen Qi

    2011-02-01

    Full Text Available Abstract Background Tea is one of the most popular non-alcoholic beverages worldwide. However, the tea plant, Camellia sinensis, is difficult to culture in vitro, to transform, and has a large genome, rendering little genomic information available. Recent advances in large-scale RNA sequencing (RNA-seq provide a fast, cost-effective, and reliable approach to generate large expression datasets for functional genomic analysis, which is especially suitable for non-model species with un-sequenced genomes. Results Using high-throughput Illumina RNA-seq, the transcriptome from poly (A+ RNA of C. sinensis was analyzed at an unprecedented depth (2.59 gigabase pairs. Approximate 34.5 million reads were obtained, trimmed, and assembled into 127,094 unigenes, with an average length of 355 bp and an N50 of 506 bp, which consisted of 788 contig clusters and 126,306 singletons. This number of unigenes was 10-fold higher than existing C. sinensis sequences deposited in GenBank (as of August 2010. Sequence similarity analyses against six public databases (Uniprot, NR and COGs at NCBI, Pfam, InterPro and KEGG found 55,088 unigenes that could be annotated with gene descriptions, conserved protein domains, or gene ontology terms. Some of the unigenes were assigned to putative metabolic pathways. Targeted searches using these annotations identified the majority of genes associated with several primary metabolic pathways and natural product pathways that are important to tea quality, such as flavonoid, theanine and caffeine biosynthesis pathways. Novel candidate genes of these secondary pathways were discovered. Comparisons with four previously prepared cDNA libraries revealed that this transcriptome dataset has both a high degree of consistency with previous EST data and an approximate 20 times increase in coverage. Thirteen unigenes related to theanine and flavonoid synthesis were validated. Their expression patterns in different organs of the tea plant were

  5. DIALIGN P: Fast pair-wise and multiple sequence alignment using parallel processors

    Directory of Open Access Journals (Sweden)

    Kaufmann Michael

    2004-09-01

    Full Text Available Abstract Background Parallel computing is frequently used to speed up computationally expensive tasks in Bioinformatics. Results Herein, a parallel version of the multi-alignment program DIALIGN is introduced. We propose two ways of dividing the program into independent sub-routines that can be run on different processors: (a pair-wise sequence alignments that are used as a first step to multiple alignment account for most of the CPU time in DIALIGN. Since alignments of different sequence pairs are completely independent of each other, they can be distributed to multiple processors without any effect on the resulting output alignments. (b For alignments of large genomic sequences, we use a heuristics by splitting up sequences into sub-sequences based on a previously introduced anchored alignment procedure. For our test sequences, this combined approach reduces the program running time of DIALIGN by up to 97%. Conclusions By distributing sub-routines to multiple processors, the running time of DIALIGN can be crucially improved. With these improvements, it is possible to apply the program in large-scale genomics and proteomics projects that were previously beyond its scope.

  6. De Novo Transcriptome Sequencing of Desert Herbaceous Achnatherum splendens (Achnatherum Seedlings and Identification of Salt Tolerance Genes

    Directory of Open Access Journals (Sweden)

    Jiangtao Liu

    2016-03-01

    Full Text Available Achnatherum splendens is an important forage herb in Northwestern China. It has a high tolerance to salinity and is, thus, considered one of the most important constructive plants in saline and alkaline areas of land in Northwest China. However, the mechanisms of salt stress tolerance in A. splendens remain unknown. Next-generation sequencing (NGS technologies can be used for global gene expression profiling. In this study, we examined sequence and transcript abundance data for the root/leaf transcriptome of A. splendens obtained using an Illumina HiSeq 2500. Over 35 million clean reads were obtained from the leaf and root libraries. All of the RNA sequencing (RNA-seq reads were assembled de novo into a total of 126,235 unigenes and 36,511 coding DNA sequences (CDS. We further identified 1663 differentially-expressed genes (DEGs between the salt stress treatment and control. Functional annotation of the DEGs by gene ontology (GO, using Arabidopsis and rice as references, revealed enrichment of salt stress-related GO categories, including “oxidation reduction”, “transcription factor activity”, and “ion channel transporter”. Thus, this global transcriptome analysis of A. splendens has provided an important genetic resource for the study of salt tolerance in this halophyte. The identified sequences and their putative functional data will facilitate future investigations of the tolerance of Achnatherum species to various types of abiotic stress.

  7. Transcriptome Sequencing, De Novo Assembly and Differential Gene Expression Analysis of the Early Development of Acipenser baeri.

    Directory of Open Access Journals (Sweden)

    Wei Song

    Full Text Available The molecular mechanisms that drive the development of the endangered fossil fish species Acipenser baeri are difficult to study due to the lack of genomic data. Recent advances in sequencing technologies and the reducing cost of sequencing offer exclusive opportunities for exploring important molecular mechanisms underlying specific biological processes. This manuscript describes the large scale sequencing and analyses of mRNA from Acipenser baeri collected at five development time points using the Illumina Hiseq2000 platform. The sequencing reads were de novo assembled and clustered into 278167 unigenes, of which 57346 (20.62% had 45837 known homologues proteins in Uniprot protein databases while 11509 proteins matched with at least one sequence of assembled unigenes. The remaining 79.38% of unigenes could stand for non-coding unigenes or unigenes specific to A. baeri. A number of 43062 unigenes were annotated into functional categories via Gene Ontology (GO annotation whereas 29526 unigenes were associated with 329 pathways by mapping to KEGG database. Subsequently, 3479 differentially expressed genes were scanned within developmental stages and clustered into 50 gene expression profiles. Genes preferentially expressed at each stage were also identified. Through GO and KEGG pathway enrichment analysis, relevant physiological variations during the early development of A. baeri could be better cognized. Accordingly, the present study gives insights into the transcriptome profile of the early development of A. baeri, and the information contained in this large scale transcriptome will provide substantial references for A. baeri developmental biology and promote its aquaculture research.

  8. RNA sequencing analysis to capture the transcriptome landscape during skin ulceration syndrome progression in sea cucumber Apostichopus japonicus.

    Science.gov (United States)

    Yang, Aifu; Zhou, Zunchun; Pan, Yongjia; Jiang, Jingwei; Dong, Ying; Guan, Xiaoyan; Sun, Hongjuan; Gao, Shan; Chen, Zhong

    2016-06-14

    Sea cucumber Apostichopus japonicus is an important economic species in China, which is affected by various diseases; skin ulceration syndrome (SUS) is the most serious. In this study, we characterized the transcriptomes in A. japonicus challenged with Vibrio splendidus to elucidate the changes in gene expression throughout the three stages of SUS progression. RNA sequencing of 21 cDNA libraries from various tissues and developmental stages of SUS-affected A. japonicus yielded 553 million raw reads, of which 542 million high-quality reads were generated by deep-sequencing using the Illumina HiSeq™ 2000 platform. The reference transcriptome comprised a combination of the Illumina reads, 454 sequencing data and Sanger sequences obtained from the public database to generate 93,163 unigenes (average length, 1,052 bp; N50 = 1,575 bp); 33,860 were annotated. Transcriptome comparisons between healthy and SUS-affected A. japonicus revealed greater differences in gene expression profiles in the body walls (BW) than in the intestines (Int), respiratory trees (RT) and coelomocytes (C). Clustering of expression models revealed stable up-regulation as the main pattern occurring in the BW throughout the three stages of SUS progression. Significantly affected pathways were associated with signal transduction, immune system, cellular processes, development and metabolism. Ninety-two differentially expressed genes (DEGs) were divided into four functional categories: attachment/pathogen recognition (17), inflammatory reactions (38), oxidative stress response (7) and apoptosis (30). Using quantitative real-time PCR, twenty representative DEGs were selected to validate the sequencing results. The Pearson's correlation coefficient (R) of the 20 DEGs ranged from 0.811 to 0.999, which confirmed the consistency and accuracy between these two approaches. Dynamic changes in global gene expression occur during SUS progression in A. japonicus. Elucidation of these changes is important

  9. Next generation sequencing based transcriptome analysis of septic-injury responsive genes in the beetle Tribolium castaneum.

    Directory of Open Access Journals (Sweden)

    Boran Altincicek

    Full Text Available Beetles (Coleoptera are the most diverse animal group on earth and interact with numerous symbiotic or pathogenic microbes in their environments. The red flour beetle Tribolium castaneum is a genetically tractable model beetle species and its whole genome sequence has recently been determined. To advance our understanding of the molecular basis of beetle immunity here we analyzed the whole transcriptome of T. castaneum by high-throughput next generation sequencing technology. Here, we demonstrate that the Illumina/Solexa sequencing approach of cDNA samples from T. castaneum including over 9.7 million reads with 72 base pairs (bp length (approximately 700 million bp sequence information with about 30× transcriptome coverage confirms the expression of most predicted genes and enabled subsequent qualitative and quantitative transcriptome analysis. This approach recapitulates our recent quantitative real-time PCR studies of immune-challenged and naïve T. castaneum beetles, validating our approach. Furthermore, this sequencing analysis resulted in the identification of 73 differentially expressed genes upon immune-challenge with statistical significance by comparing expression data to calculated values derived by fitting to generalized linear models. We identified up regulation of diverse immune-related genes (e.g. Toll receptor, serine proteinases, DOPA decarboxylase and thaumatin and of numerous genes encoding proteins with yet unknown functions. Of note, septic-injury resulted also in the elevated expression of genes encoding heat-shock proteins or cytochrome P450s supporting the view that there is crosstalk between immune and stress responses in T. castaneum. The present study provides a first comprehensive overview of septic-injury responsive genes in T. castaneum beetles. Identified genes advance our understanding of T. castaneum specific gene expression alteration upon immune-challenge in particular and may help to understand beetle immunity

  10. De novo Transcriptome Sequencing Reveals a Considerable Bias in the Incidence of Simple Sequence Repeats towards the Downstream of ‘Pre-miRNAs’ of Black Pepper

    Science.gov (United States)

    Joy, Nisha; Asha, Srinivasan; Mallika, Vijayan; Soniya, Eppurathu Vasudevan

    2013-01-01

    Next generation sequencing has an advantageon transformational development of species with limited available sequence data as it helps to decode the genome and transcriptome. We carried out the de novo sequencing using illuminaHiSeq™ 2000 to generate the first leaf transcriptome of black pepper (Piper nigrum L.), an important spice variety native to South India and also grown in other tropical regions. Despite the economic and biochemical importance of pepper, a scientifically rigorous study at the molecular level is far from complete due to lack of sufficient sequence information and cytological complexity of its genome. The 55 million raw reads obtained, when assembled using Trinity program generated 2,23,386 contigs and 1,28,157 unigenes. Reports suggest that the repeat-rich genomic regions give rise to small non-coding functional RNAs. MicroRNAs (miRNAs) are the most abundant type of non-coding regulatory RNAs. In spite of the widespread research on miRNAs, little is known about the hair-pin precursors of miRNAs bearing Simple Sequence Repeats (SSRs). We used the array of transcripts generated, for the in silico prediction and detection of ‘43 pre-miRNA candidates bearing different types of SSR motifs’. The analysis identified 3913 different types of SSR motifs with an average of one SSR per 3.04 MB of thetranscriptome. About 0.033% of the transcriptome constituted ‘pre-miRNA candidates bearing SSRs’. The abundance, type and distribution of SSR motifs studied across the hair-pin miRNA precursors, showed a significant bias in the position of SSRs towards the downstream of predicted ‘pre-miRNA candidates’. The catalogue of transcripts identified, together with the demonstration of reliable existence of SSRs in the miRNA precursors, permits future opportunities for understanding the genetic mechanism of black pepper and likely functions of ‘tandem repeats’ in miRNAs. PMID:23469176

  11. De novo Sequencing and Analysis of Lemongrass Transcriptome Provides First Insights into the Essential Oil Biosynthesis of Aromatic Grasses

    Directory of Open Access Journals (Sweden)

    Seema Meena

    2016-07-01

    Full Text Available Aromatic grasses of the genus Cymbopogon (Poaceae family represent unique group of plants that produce diverse composition of monoterpene rich essential oils, which have great value in flavour, fragrance, cosmetic and aromatherapy industries. Despite the commercial importance of these natural aromatic oils, their biosynthesis at the molecular level remains unexplored. As the first step towards understanding the essential oil biosynthesis, we performed de novo transcriptome assembly and analysis of C. flexuosus (lemongrass by employing Illumina sequencing. Mining of transcriptome data and subsequent phylogenetic analysis led to identification of terpene synthases (TPS, pyrophosphatases (PPase, alcohol dehydrogenases (ADH, aldo-keto reductases (AKR, carotenoid cleavage dioxygenases (CCD, alcohol acetyltransferases (AAT and aldehyde dehydrogenases (ALDH, which are potentially involved in essential oil biosynthesis. Comparative essential oil profiling and mRNA expression analysis in three Cymbopogon species (C. flexuosus, aldehyde type; C. martinii, alcohol type; and C. winterianus, intermediate type with varying essential oil composition indicated the involvement of identified candidate genes in the formation of alcohols, aldehydes and acetates. Molecular modeling and docking further supported the role of identified enzymes in aroma formation in Cymbopogon. Also, simple sequence repeats (SSRs were found in the transcriptome with many linked to terpene pathway genes including the genes potentially involved in aroma biosynthesis. This work provides the first insights into the essential oil biosynthesis of aromatic grasses, and the identified candidate genes and markers can be a great resource for biotechnological and molecular breeding approaches to modulate the essential oil composition.

  12. De novo Assembly and Characterization of Cajanus scarabaeoides (L. Thouars Transcriptome by Paired-End Sequencing

    Directory of Open Access Journals (Sweden)

    Deepti Nigam

    2017-07-01

    Full Text Available Pigeonpea [Cajanus cajan (L. Millsp.] is a heat and drought resilient legume crop grown mostly in Asia and Africa. Pigeonpea is affected by various biotic (diseases and insect pests and abiotic stresses (salinity and water logging which limit the yield potential of this crop. However, resistance to all these constraints is not readily available in the cultivated genotypes and some of the wild relatives have been found to withstand these resistances. Thus, the utilization of crop wild relatives (CWR in pigeonpea breeding has been effective in conferring resistance, quality and breeding efficiency traits to this crop. Bud and leaf tissue of Cajanus scarabaeoides, a wild relative of pigeon pea were used for transcriptome profiling. Approximately 30 million clean reads filtered from raw reads by removal of adaptors, ambiguous reads and low-quality reads (3.02 gigabase pairs were generated by Illumina paired-end RNA-seq technology. All of these clean reads were pooled and assembled de novo into 1,17,007 transcripts using the Trinity. Finally, a total of 98,664 unigenes were derived with mean length of 396 bp and N50 values of 1393. The assembly produced significant mapping results (73.68% in BLASTN searches of the Glycine max CDS sequence database (Ensembl. Further, uniprot database of Viridiplantae was used for unigene annotation; 81,799 of 98,664 (82.90% unigenes were finally annotated with gene descriptions or conserved protein domains. Further, a total of 23,475 SSRs were identified in 27,321 unigenes. This data will provide useful information for mining of functionally important genes and SSR markers for pigeonpea improvement.

  13. Differential genomic arrangements in Caryophyllales through deep transcriptome sequencing of A. hypochondriacus.

    Directory of Open Access Journals (Sweden)

    Meeta Sunil

    Full Text Available Genome duplication event in edible dicots under the orders Rosid and Asterid, common during the oligocene period, is missing for species under the order Caryophyllales. Despite this, grain amaranths not only survived this period but display many desirable traits missing in species under rosids and asterids. For example, grain amaranths display traits like C4 photosynthesis, high-lysine seeds, high-yield, drought resistance, tolerance to infection and resilience to stress. It is, therefore, of interest to look for minor genome rearrangements with potential functional implications that are unique to grain amaranths. Here, by deep sequencing and assembly of 16 transcriptomes (86.8 billion bases we have interrogated differential genome rearrangement unique to Amaranthus hypochondriacus with potential links to these phenotypes. We have predicted 125,581 non-redundant transcripts including 44,529 protein coding transcripts identified based on homology to known proteins and 13,529 predicted as novel/amaranth specific coding transcripts. Of the protein coding de novo assembled transcripts, we have identified 1810 chimeric transcripts. More than 30% and 19% of the gene pairs within the chimeric transcripts are found within the same loci in the genomes of A. hypochondriacus and Beta vulgaris respectively and are considered real positives. Interestingly, one of the chimeric transcripts comprises two important genes, namely DHDPS1, a key enzyme implicated in the biosynthesis of lysine, and alpha-glucosidase, an enzyme involved in sucrose catabolism, in close proximity to each other separated by a distance of 612 bases in the genome of A. hypochondriacus in a convergent configuration. We have experimentally validated that transcripts of these two genes are also overlapping in the 3' UTR with their expression negatively correlated from bud to mature seed, suggesting a potential link between the high seed lysine trait and unique genome organization.

  14. Inference of Interactions in Cyanobacterial-Heterotrophic Co-Cultures via Transcriptome Sequencing

    Energy Technology Data Exchange (ETDEWEB)

    Beliaev, Alex S.; Romine, Margaret F.; Serres, Margaret; Bernstein, Hans C.; Linggi, Bryan E.; Markillie, Lye Meng; Isern, Nancy G.; Chrisler, William B.; Kucek, Leo A.; Hill, Eric A.; Pinchuk, Grigoriy; Bryant, Donald A.; Wiley, H. S.; Fredrickson, Jim K.; Konopka, Allan

    2014-04-29

    We employed deep sequencing technology to identify transcriptional adaptation of the euryhaline unicellular cyanobacterium Synechococcus sp. PCC 7002 and the marine facultative aerobe Shewanella putrefaciens W3-18-1 to growth in a co-culture and infer the effect of carbon flux distributions on photoautotroph-heterotroph interactions. The overall transcriptome response of both organisms to co-cultivation was shaped by their respective physiologies and growth constraints. Carbon limitation resulted in the expansion of metabolic capacities which was manifested through the transcriptional upregulation of transport and catabolic pathways. While growth coupling occurred via lactate oxidation or secretion of photosynthetically fixed carbon, there was evidence of specific metabolic interactions between the two organisms. On one hand, the production and excretion of specific amino acids (methionine and alanine) by the cyanobacterium correlated with the putative downregulation of the corresponding biosynthetic machinery of Shewanella W3-18-1. On the other hand, the broad and consistent decrease of mRNA levels for many Fe-regulated Synechococcus 7002 genes during co-cultivation suggested increased Fe availability as well as more facile and energy-efficient mechanisms for Fe acquisition by the cyanobacterium. Furthermore, evidence pointed at potentially novel interactions between oxygenic photoautotrophs and heterotrophs related to the oxidative stress response as transcriptional patterns suggested that Synechococcus 7002 rather than Shewanella W3-18-1 provided scavenging functions for reactive oxygen species under co-culture conditions. This study provides an initial insight into the complexity of photoautotrophic-heterotrophic interactions and brings new perspectives of their role in the robustness and stability of the association.

  15. Massively parallel DNA sequencing facilitates diagnosis of patients with Usher syndrome type 1.

    Directory of Open Access Journals (Sweden)

    Hidekane Yoshimura

    Full Text Available Usher syndrome is an autosomal recessive disorder manifesting hearing loss, retinitis pigmentosa and vestibular dysfunction, and having three clinical subtypes. Usher syndrome type 1 is the most severe subtype due to its profound hearing loss, lack of vestibular responses, and retinitis pigmentosa that appears in prepuberty. Six of the corresponding genes have been identified, making early diagnosis through DNA testing possible, with many immediate and several long-term advantages for patients and their families. However, the conventional genetic techniques, such as direct sequence analysis, are both time-consuming and expensive. Targeted exon sequencing of selected genes using the massively parallel DNA sequencing technology will potentially enable us to systematically tackle previously intractable monogenic disorders and improve molecular diagnosis. Using this technique combined with direct sequence analysis, we screened 17 unrelated Usher syndrome type 1 patients and detected probable pathogenic variants in the 16 of them (94.1% who carried at least one mutation. Seven patients had the MYO7A mutation (41.2%, which is the most common type in Japanese. Most of the mutations were detected by only the massively parallel DNA sequencing. We report here four patients, who had probable pathogenic mutations in two different Usher syndrome type 1 genes, and one case of MYO7A/PCDH15 digenic inheritance. This is the first report of Usher syndrome mutation analysis using massively parallel DNA sequencing and the frequency of Usher syndrome type 1 genes in Japanese. Mutation screening using this technique has the power to quickly identify mutations of many causative genes while maintaining cost-benefit performance. In addition, the simultaneous mutation analysis of large numbers of genes is useful for detecting mutations in different genes that are possibly disease modifiers or of digenic inheritance.

  16. Massively parallel DNA sequencing facilitates diagnosis of patients with Usher syndrome type 1.

    Science.gov (United States)

    Yoshimura, Hidekane; Iwasaki, Satoshi; Nishio, Shin-Ya; Kumakawa, Kozo; Tono, Tetsuya; Kobayashi, Yumiko; Sato, Hiroaki; Nagai, Kyoko; Ishikawa, Kotaro; Ikezono, Tetsuo; Naito, Yasushi; Fukushima, Kunihiro; Oshikawa, Chie; Kimitsuki, Takashi; Nakanishi, Hiroshi; Usami, Shin-Ichi

    2014-01-01

    Usher syndrome is an autosomal recessive disorder manifesting hearing loss, retinitis pigmentosa and vestibular dysfunction, and having three clinical subtypes. Usher syndrome type 1 is the most severe subtype due to its profound hearing loss, lack of vestibular responses, and retinitis pigmentosa that appears in prepuberty. Six of the corresponding genes have been identified, making early diagnosis through DNA testing possible, with many immediate and several long-term advantages for patients and their families. However, the conventional genetic techniques, such as direct sequence analysis, are both time-consuming and expensive. Targeted exon sequencing of selected genes using the massively parallel DNA sequencing technology will potentially enable us to systematically tackle previously intractable monogenic disorders and improve molecular diagnosis. Using this technique combined with direct sequence analysis, we screened 17 unrelated Usher syndrome type 1 patients and detected probable pathogenic variants in the 16 of them (94.1%) who carried at least one mutation. Seven patients had the MYO7A mutation (41.2%), which is the most common type in Japanese. Most of the mutations were detected by only the massively parallel DNA sequencing. We report here four patients, who had probable pathogenic mutations in two different Usher syndrome type 1 genes, and one case of MYO7A/PCDH15 digenic inheritance. This is the first report of Usher syndrome mutation analysis using massively parallel DNA sequencing and the frequency of Usher syndrome type 1 genes in Japanese. Mutation screening using this technique has the power to quickly identify mutations of many causative genes while maintaining cost-benefit performance. In addition, the simultaneous mutation analysis of large numbers of genes is useful for detecting mutations in different genes that are possibly disease modifiers or of digenic inheritance.

  17. A multithreaded parallel implementation of a dynamic programming algorithm for sequence comparison.

    Science.gov (United States)

    Martins, W S; Del Cuvillo, J B; Useche, F J; Theobald, K B; Gao, G R

    2001-01-01

    This paper discusses the issues involved in implementing a dynamic programming algorithm for biological sequence comparison on a general-purpose parallel computing platform based on a fine-grain event-driven multithreaded program execution model. Fine-grain multithreading permits efficient parallelism exploitation in this application both by taking advantage of asynchronous point-to-point synchronizations and communication with low overheads and by effectively tolerating latency through the overlapping of computation and communication. We have implemented our scheme on EARTH, a fine-grain event-driven multithreaded execution and architecture model which has been ported to a number of parallel machines with off-the-shelf processors. Our experimental results show that the dynamic programming algorithm can be efficiently implemented on EARTH systems with high performance (e.g., speedup of 90 on 120 nodes), good programmability and reasonable cost.

  18. Simultaneous digital quantification and fluorescence-based size characterization of massively parallel sequencing libraries.

    Science.gov (United States)

    Laurie, Matthew T; Bertout, Jessica A; Taylor, Sean D; Burton, Joshua N; Shendure, Jay A; Bielas, Jason H

    2013-08-01

    Due to the high cost of failed runs and suboptimal data yields, quantification and determination of fragment size range are crucial steps in the library preparation process for massively parallel sequencing (or next-generation sequencing). Current library quality control methods commonly involve quantification using real-time quantitative PCR and size determination using gel or capillary electrophoresis. These methods are laborious and subject to a number of significant limitations that can make library calibration unreliable. Herein, we propose and test an alternative method for quality control of sequencing libraries using droplet digital PCR (ddPCR). By exploiting a correlation we have discovered between droplet fluorescence and amplicon size, we achieve the joint quantification and size determination of target DNA with a single ddPCR assay. We demonstrate the accuracy and precision of applying this method to the preparation of sequencing libraries.

  19. Molecular diagnosis of glycogen storage disease and disorders with overlapping clinical symptoms by massive parallel sequencing.

    Science.gov (United States)

    Vega, Ana I; Medrano, Celia; Navarrete, Rosa; Desviat, Lourdes R; Merinero, Begoña; Rodríguez-Pombo, Pilar; Vitoria, Isidro; Ugarte, Magdalena; Pérez-Cerdá, Celia; Pérez, Belen

    2016-10-01

    Glycogen storage disease (GSD) is an umbrella term for a group of genetic disorders that involve the abnormal metabolism of glycogen; to date, 23 types of GSD have been identified. The nonspecific clinical presentation of GSD and the lack of specific biomarkers mean that Sanger sequencing is now widely relied on for making a diagnosis. However, this gene-by-gene sequencing technique is both laborious and costly, which is a consequence of the number of genes to be sequenced and the large size of some genes. This work reports the use of massive parallel sequencing to diagnose patients at our laboratory in Spain using either a customized gene panel (targeted exome sequencing) or the Illumina Clinical-Exome TruSight One Gene Panel (clinical exome sequencing (CES)). Sequence variants were matched against biochemical and clinical hallmarks. Pathogenic mutations were detected in 23 patients. Twenty-two mutations were recognized (mostly loss-of-function mutations), including 11 that were novel in GSD-associated genes. In addition, CES detected five patients with mutations in ALDOB, LIPA, NKX2-5, CPT2, or ANO5. Although these genes are not involved in GSD, they are associated with overlapping phenotypic characteristics such as hepatic, muscular, and cardiac dysfunction. These results show that next-generation sequencing, in combination with the detection of biochemical and clinical hallmarks, provides an accurate, high-throughput means of making genetic diagnoses of GSD and related diseases.Genet Med 18 10, 1037-1043.

  20. Massive parallel sequencing in sarcoma pathobiology: state of the art and perspectives.

    Science.gov (United States)

    Brenca, Monica; Maestro, Roberta

    2015-01-01

    Sarcomas are an aggressive and highly heterogeneous group of mesenchymal malignancies with different morphologies and clinical behavior. Current therapeutic strategies remain unsatisfactory. Cytogenetic and molecular characterization of these tumors is resulting in the breakdown of the classical histopathological categories into molecular subgroups that better define sarcoma pathobiology and pave the way to more precise diagnostic criteria and novel therapeutic opportunities. The purpose of this short review is to summarize the state-of-the-art on the exploitation of massive parallel sequencing technologies, also known as next generation sequencing, in the elucidation of sarcoma pathobiology and to discuss how these applications may impact on diagnosis, prognosis and therapy of these tumors.

  1. ReadDepth: a parallel R package for detecting copy number alterations from short sequencing reads.

    Directory of Open Access Journals (Sweden)

    Christopher A Miller

    2011-01-01

    Full Text Available Copy number alterations are important contributors to many genetic diseases, including cancer. We present the readDepth package for R, which can detect these aberrations by measuring the depth of coverage obtained by massively parallel sequencing of the genome. In addition to achieving higher accuracy than existing packages, our tool runs much faster by utilizing multi-core architectures to parallelize the processing of these large data sets. In contrast to other published methods, readDepth does not require the sequencing of a reference sample, and uses a robust statistical model that accounts for overdispersed data. It includes a method for effectively increasing the resolution obtained from low-coverage experiments by utilizing breakpoint information from paired end sequencing to do positional refinement. We also demonstrate a method for inferring copy number using reads generated by whole-genome bisulfite sequencing, thus enabling integrative study of epigenomic and copy number alterations. Finally, we apply this tool to two genomes, showing that it performs well on genomes sequenced to both low and high coverage. The readDepth package runs on Linux and MacOSX, is released under the Apache 2.0 license, and is available at http://code.google.com/p/readdepth/.

  2. Parallel algorithms for testing finite state machines:Generating UIO sequences

    OpenAIRE

    Hierons, RM; Turker, UC

    2016-01-01

    This paper describes an efficient parallel algorithm that uses many-core GPUs for automatically deriving Unique Input Output sequences (UIOs) from Finite State Machines. The proposed algorithm uses the global scope of the GPU's global memory through coalesced memory access and minimises the transfer between CPU and GPU memory. The results of experiments indicate that the proposed method yields considerably better results compared to a single core UIO construction algorithm. Our algorithm is s...

  3. BitPAl: a bit-parallel, general integer-scoring sequence alignment algorithm.

    Science.gov (United States)

    Loving, Joshua; Hernandez, Yozen; Benson, Gary

    2014-11-15

    Mapping of high-throughput sequencing data and other bulk sequence comparison applications have motivated a search for high-efficiency sequence alignment algorithms. The bit-parallel approach represents individual cells in an alignment scoring matrix as bits in computer words and emulates the calculation of scores by a series of logic operations composed of AND, OR, XOR, complement, shift and addition. Bit-parallelism has been successfully applied to the longest common subsequence (LCS) and edit-distance problems, producing fast algorithms in practice. We have developed BitPAl, a bit-parallel algorithm for general, integer-scoring global alignment. Integer-scoring schemes assign integer weights for match, mismatch and insertion/deletion. The BitPAl method uses structural properties in the relationship between adjacent scores in the scoring matrix to construct classes of efficient algorithms, each designed for a particular set of weights. In timed tests, we show that BitPAl runs 7-25 times faster than a standard iterative algorithm. Source code is freely available for download at http://lobstah.bu.edu/BitPAl/BitPAl.html. BitPAl is implemented in C and runs on all major operating systems. jloving@bu.edu or yhernand@bu.edu or gbenson@bu.edu Supplementary data are available at Bioinformatics online. © The Author 2014. Published by Oxford University Press.

  4. Application of the whole-transcriptome shotgun sequencing approach to the study of Philadelphia-positive acute lymphoblastic leukemia

    International Nuclear Information System (INIS)

    Iacobucci, I; Ferrarini, A; Sazzini, M; Giacomelli, E; Lonetti, A; Xumerle, L; Ferrari, A; Papayannidis, C; Malerba, G; Luiselli, D; Boattini, A; Garagnani, P; Vitale, A; Soverini, S; Pane, F; Baccarani, M; Delledonne, M; Martinelli, G

    2012-01-01

    Although the pathogenesis of BCR–ABL1-positive acute lymphoblastic leukemia (ALL) is mainly related to the expression of the BCR–ABL1 fusion transcript, additional cooperating genetic lesions are supposed to be involved in its development and progression. Therefore, in an attempt to investigate the complex landscape of mutations, changes in expression profiles and alternative splicing (AS) events that can be observed in such disease, the leukemia transcriptome of a BCR–ABL1-positive ALL patient at diagnosis and at relapse was sequenced using a whole-transcriptome shotgun sequencing (RNA-Seq) approach. A total of 13.9 and 15.8 million sequence reads was generated from de novo and relapsed samples, respectively, and aligned to the human genome reference sequence. This led to the identification of five validated missense mutations in genes involved in metabolic processes (DPEP1, TMEM46), transport (MVP), cell cycle regulation (ABL1) and catalytic activity (CTSZ), two of which resulted in acquired relapse variants. In all, 6390 and 4671 putative AS events were also detected, as well as expression levels for 18 315 and 18 795 genes, 28% of which were differentially expressed in the two disease phases. These data demonstrate that RNA-Seq is a suitable approach for identifying a wide spectrum of genetic alterations potentially involved in ALL

  5. De novo assembly, gene annotation, and marker discovery in stored-product pest Liposcelis entomophila (Enderlein using transcriptome sequences.

    Directory of Open Access Journals (Sweden)

    Dan-Dan Wei

    Full Text Available BACKGROUND: As a major stored-product pest insect, Liposcelis entomophila has developed high levels of resistance to various insecticides in grain storage systems. However, the molecular mechanisms underlying resistance and environmental stress have not been characterized. To date, there is a lack of genomic information for this species. Therefore, studies aimed at profiling the L. entomophila transcriptome would provide a better understanding of the biological functions at the molecular levels. METHODOLOGY/PRINCIPAL FINDINGS: We applied Illumina sequencing technology to sequence the transcriptome of L. entomophila. A total of 54,406,328 clean reads were obtained and that de novo assembled into 54,220 unigenes, with an average length of 571 bp. Through a similarity search, 33,404 (61.61% unigenes were matched to known proteins in the NCBI non-redundant (Nr protein database. These unigenes were further functionally annotated with gene ontology (GO, cluster of orthologous groups of proteins (COG, and Kyoto Encyclopedia of Genes and Genomes (KEGG databases. A large number of genes potentially involved in insecticide resistance were manually curated, including 68 putative cytochrome P450 genes, 37 putative glutathione S-transferase (GST genes, 19 putative carboxyl/cholinesterase (CCE genes, and other 126 transcripts to contain target site sequences or encoding detoxification genes representing eight types of resistance enzymes. Furthermore, to gain insight into the molecular basis of the L. entomophila toward thermal stresses, 25 heat shock protein (Hsp genes were identified. In addition, 1,100 SSRs and 57,757 SNPs were detected and 231 pairs of SSR primes were designed for investigating the genetic diversity in future. CONCLUSIONS/SIGNIFICANCE: We developed a comprehensive transcriptomic database for L. entomophila. These sequences and putative molecular markers would further promote our understanding of the molecular mechanisms underlying

  6. Sequencing and de novo assembly of the Asian clam (Corbicula fluminea transcriptome using the Illumina GAIIx method.

    Directory of Open Access Journals (Sweden)

    Huihui Chen

    Full Text Available BACKGROUND: The Asian clam (Corbicula fluminea is currently one of the most economically important aquatic species in China and has been used as a test organism in many environmental studies. However, the lack of genomic resources, such as sequenced genome, expressed sequence tags (ESTs and transcriptome sequences has hindered the research on C. fluminea. Recent advances in large-scale RNA-Seq enable generation of genomic resources in a short time, and provide large expression datasets for functional genomic analysis. METHODOLOGY/PRINCIPAL FINDINGS: We used a next-generation high-throughput DNA sequencing technique with an Illumina GAIIx method to analyze the transcriptome from the whole bodies of C. fluminea. More than 62,250,336 high-quality reads were generated based on the raw data, and 134,684 unigenes with a mean length of 791 bp were assembled using the Velvet and Oases software. All of the assembly unigenes were annotated by running BLASTx and BLASTn similarity searches on the Nt, Nr, Swiss-Prot, COG and KEGG databases. In addition, the Clusters of Orthologous Groups (COGs, Gene Ontology (GO terms and Kyoto Encyclopedia of Gene and Genome (KEGG annotations were also assigned to each unigene transcript. To provide a preliminary verification of the assembly and annotation results, and search for potential environmental pollution biomarkers, 15 functional genes (five antioxidase genes, two cytochrome P450 genes, three GABA receptor-related genes and five heat shock protein genes were cloned and identified. Expressions of the 15 selected genes following fluoxetine exposure confirmed that the genes are indeed linked to environmental stress. CONCLUSIONS/SIGNIFICANCE: The C. fluminea transcriptome advances the underlying molecular understanding of this freshwater clam, provides a basis for further exploration of C. fluminea as an environmental test organism and promotes further studies on other bivalve organisms.

  7. Comparative analysis of transcriptomes in aerial stems and roots of Ephedra sinica based on high-throughput mRNA sequencing

    Directory of Open Access Journals (Sweden)

    Taketo Okada

    2016-12-01

    Full Text Available Ephedra plants are taxonomically classified as gymnosperms, and are medicinally important as the botanical origin of crude drugs and as bioresources that contain pharmacologically active chemicals. Here we show a comparative analysis of the transcriptomes of aerial stems and roots of Ephedra sinica based on high-throughput mRNA sequencing by RNA-Seq. De novo assembly of short cDNA sequence reads generated 23,358, 13,373, and 28,579 contigs longer than 200 bases from aerial stems, roots, or both aerial stems and roots, respectively. The presumed functions encoded by these contig sequences were annotated by BLAST (blastx. Subsequently, these contigs were classified based on gene ontology slims, Enzyme Commission numbers, and the InterPro database. Furthermore, comparative gene expression analysis was performed between aerial stems and roots. These transcriptome analyses revealed differences and similarities between the transcriptomes of aerial stems and roots in E. sinica. Deep transcriptome sequencing of Ephedra should open the door to molecular biological studies based on the entire transcriptome, tissue- or organ-specific transcriptomes, or targeted genes of interest.

  8. Parallel Sequencing of Expressed Sequence Tags from Two Complementary DNA Libraries for High and Low Phosphorus Adaptation in Common Beans

    Directory of Open Access Journals (Sweden)

    Matthew W. Blair

    2011-11-01

    Full Text Available Expressed sequence tags (ESTs have proven useful for gene discovery in many crops. In this work, our objective was to construct complementary DNA (cDNA libraries from root tissues of common beans ( L. grown under low and high P hydroponic conditions and to conduct EST sequencing and comparative analyses of the libraries. Expressed sequence tag analysis of 3648 clones identified 2372 unigenes, of which 1591 were annotated as known genes while a total of 465 unigenes were not associated with any known gene. Unigenes with hits were categorized according to biological processes, molecular function, and cellular compartmentalization. Given the young tissue used to make the root libraries, genes for catalytic activity and binding were highly expressed. Comparisons with previous root EST sequencing and between the two libraries made here resulted in a set of genes to study further for differential gene expression and adaptation to low P, such as a 14 kDa praline-rich protein, a metallopeptidase, tonoplast intrinsic protein, adenosine triphosphate (ATP citrate synthase, and cell proliferation genes expressed in the low P treated plants. Given that common beans are often grown on acid soils of the tropics and subtropics that are usually low in P these genes and the two parallel libraries will be useful for selection for better uptake of this essential macronutrient. The importance of EST generation for common bean root tissues under low P and other abiotic soil stresses is also discussed.

  9. Application of Massively Parallel Sequencing in the Clinical Diagnostic Testing of Inherited Cardiac Conditions

    Directory of Open Access Journals (Sweden)

    Ivone U. S. Leong

    2014-06-01

    Full Text Available Sudden cardiac death in people between the ages of 1–40 years is a devastating event and is frequently caused by several heritable cardiac disorders. These disorders include cardiac ion channelopathies, such as long QT syndrome, catecholaminergic polymorphic ventricular tachycardia and Brugada syndrome and cardiomyopathies, such as hypertrophic cardiomyopathy and arrhythmogenic right ventricular cardiomyopathy. Through careful molecular genetic evaluation of DNA from sudden death victims, the causative gene mutation can be uncovered, and the rest of the family can be screened and preventative measures implemented in at-risk individuals. The current screening approach in most diagnostic laboratories uses Sanger-based sequencing; however, this method is time consuming and labour intensive. The development of massively parallel sequencing has made it possible to produce millions of sequence reads simultaneously and is potentially an ideal approach to screen for mutations in genes that are associated with sudden cardiac death. This approach offers mutation screening at reduced cost and turnaround time. Here, we will review the current commercially available enrichment kits, massively parallel sequencing (MPS platforms, downstream data analysis and its application to sudden cardiac death in a diagnostic environment.

  10. Quantification of massively parallel sequencing libraries - a comparative study of eight methods

    DEFF Research Database (Denmark)

    Hussing, Christian; Kampmann, Marie-Louise; Mogensen, Helle Smidt

    2018-01-01

    Quantification of massively parallel sequencing libraries is important for acquisition of monoclonal beads or clusters prior to clonal amplification and to avoid large variations in library coverage when multiple samples are included in one sequencing analysis. No gold standard for quantification...... estimates followed by Qubit and electrophoresis-based instruments (Bioanalyzer, TapeStation, GX Touch, and Fragment Analyzer), while SYBR Green and TaqMan based qPCR assays gave the lowest estimates. qPCR gave more accurate predictions of sequencing coverage than Qubit and TapeStation did. Costs, time......-consumption, workflow simplicity, and ability to quantify multiple samples are discussed. Technical specifications, advantages, and disadvantages of the various methods are pointed out....

  11. Transcriptome sequencing of Crucihimalaya himalaica (Brassicaceae) reveals how Arabidopsis close relative adapt to the Qinghai-Tibet Plateau

    Science.gov (United States)

    Qiao, Qin; Wang, Qia; Han, Xi; Guan, Yanlong; Sun, Hang; Zhong, Yang; Huang, Jinling; Zhang, Ticao

    2016-02-01

    The extreme environment of the Qinghai-Tibet Plateau (QTP) provides an ideal natural laboratory for studies on adaptive evolution. Few genome/transcriptome based studies have been conducted on how plants adapt to the environments of QTP compared to numerous studies on vertebrates. Crucihimalaya himalaica is a close relative of Arabidopsis with typical QTP distribution, and is hoped to be a new model system to study speciation and ecological adaptation in extreme environment. In this study, we de novo generated a transcriptome sequence of C. himalaica, with a total of 49,438 unigenes. Compared to five relatives, 10,487 orthogroups were shared by all six species, and 4,286 orthogroups contain putative single copy gene. Further analysis identified 487 extremely significantly positively selected genes (PSGs) in C. himalaica transcriptome. Theses PSGs were enriched in functions related to specific adaptation traits, such as response to radiation, DNA repair, nitrogen metabolism, and stabilization of membrane. These functions are responsible for the adaptation of C. himalaica to the high radiation, soil depletion and low temperature environments on QTP. Our findings indicate that C. himalaica has evolved complex strategies for adapting to the extreme environments on QTP and provide novel insights into genetic mechanisms of highland adaptation in plants.

  12. Soybean (Glycine max) SWEET gene family: insights through comparative genomics, transcriptome profiling and whole genome re-sequence analysis.

    Science.gov (United States)

    Patil, Gunvant; Valliyodan, Babu; Deshmukh, Rupesh; Prince, Silvas; Nicander, Bjorn; Zhao, Mingzhe; Sonah, Humira; Song, Li; Lin, Li; Chaudhary, Juhi; Liu, Yang; Joshi, Trupti; Xu, Dong; Nguyen, Henry T

    2015-07-11

    SWEET (MtN3_saliva) domain proteins, a recently identified group of efflux transporters, play an indispensable role in sugar efflux, phloem loading, plant-pathogen interaction and reproductive tissue development. The SWEET gene family is predominantly studied in Arabidopsis and members of the family are being investigated in rice. To date, no transcriptome or genomics analysis of soybean SWEET genes has been reported. In the present investigation, we explored the evolutionary aspect of the SWEET gene family in diverse plant species including primitive single cell algae to angiosperms with a major emphasis on Glycine max. Evolutionary features showed expansion and duplication of the SWEET gene family in land plants. Homology searches with BLAST tools and Hidden Markov Model-directed sequence alignments identified 52 SWEET genes that were mapped to 15 chromosomes in the soybean genome as tandem duplication events. Soybean SWEET (GmSWEET) genes showed a wide range of expression profiles in different tissues and developmental stages. Analysis of public transcriptome data and expression profiling using quantitative real time PCR (qRT-PCR) showed that a majority of the GmSWEET genes were confined to reproductive tissue development. Several natural genetic variants (non-synonymous SNPs, premature stop codons and haplotype) were identified in the GmSWEET genes using whole genome re-sequencing data analysis of 106 soybean genotypes. A significant association was observed between SNP-haplogroup and seed sucrose content in three gene clusters on chromosome 6. Present investigation utilized comparative genomics, transcriptome profiling and whole genome re-sequencing approaches and provided a systematic description of soybean SWEET genes and identified putative candidates with probable roles in the reproductive tissue development. Gene expression profiling at different developmental stages and genomic variation data will aid as an important resource for the soybean research

  13. Transcriptome Characterization for Non-Model Endangered Lycaenids, Protantigius superans and Spindasis takanosis, Using Illumina HiSeq 2500 Sequencing

    Directory of Open Access Journals (Sweden)

    Bharat Bhusan Patnaik

    2015-12-01

    Full Text Available The Lycaenidae butterflies, Protantigius superans and Spindasis takanosis, are endangered insects in Korea known for their symbiotic association with ants. However, necessary genomic and transcriptomics data are lacking in these species, limiting conservation efforts. In this study, the P. superans and S. takanosis transcriptomes were deciphered using Illumina HiSeq 2500 sequencing. The P. superans and S. takanosis transcriptome data included a total of 254,340,693 and 245,110,582 clean reads assembled into 159,074 and 170,449 contigs and 107,950 and 121,140 unigenes, respectively. BLASTX hits (E-value of 1.0 × 10−5 against the known protein databases annotated a total of 46,754 and 51,908 transcripts for P. superans and S. takanosis. Approximately 41.25% and 38.68% of the unigenes for P. superans and S. takanosis found homologous sequences in Protostome DB (PANM-DB. BLAST2GO analysis confirmed 18,611 unigenes representing Gene Ontology (GO terms and a total of 5259 unigenes assigned to 116 pathways for P. superans. For S. takanosis, a total of 6697 unigenes were assigned to 119 pathways using the Kyoto Encyclopedia of Genes and Genomes (KEGG pathway database. Additionally, 382,164 and 390,516 Simple Sequence Repeats (SSRs were compiled from the unigenes of P. superans and S. takanosis, respectively. This is the first report to record new genes and their utilization for conservation of lycaenid species population and as a reference information for closely related species.

  14. Transcriptome sequencing and differential gene expression analysis in Viola yedoensis Makino (Fam. Violaceae) responsive to cadmium (Cd) pollution

    Energy Technology Data Exchange (ETDEWEB)

    Gao, Jian [Key Laboratory of Biology and Genetic Improvement of Maize in Southwest Region, Ministry of Agriculture, Maize Research Institute of Sichuan Agricultural University, Wenjiang, Sichuan (China); Luo, Mao [Drug Discovery Research Center of Luzhou Medical College, Luzhou, Sichuan (China); Zhu, Ye; He, Ying; Wang, Qin [Department of Pharmacy of Luzhou Medical College, Luzhou, Sichuan (China); Zhang, Chun, E-mail: zc83good@126.com [Department of Pharmacy of Luzhou Medical College, Luzhou, Sichuan (China)

    2015-03-27

    Viola yedoensis Makino is an important Chinese traditional medicine plant adapted to cadmium (Cd) pollution regions. Illumina sequencing technology was used to sequence the transcriptome of V. yedoensis Makino. We sequenced Cd-treated (VIYCd) and untreated (VIYCK) samples of V. yedoensis, and obtained 100,410,834 and 83,587,676 high quality reads, respectively. After de novo assembly and quantitative assessment, 109,800 unigenes were finally generated with an average length of 661 bp. We then obtained functional annotations by aligning unigenes with public protein databases including NR, NT, SwissProt, KEGG and COG. In addition, 892 differentially expressed genes (DEGs) were investigated between the two libraries of untreated (VIYCK) and Cd-treated (VIYCd) plants. Moreover, 15 randomly selected DEGs were further validated with qRT-PCR and the results were highly accordant with the Solexa analysis. This study firstly generated a successful global analysis of the V. yedoensis transcriptome and it will provide for further studies on gene expression, genomics, and functional genomics in Violaceae. - Highlights: • A de novo assembly generated 109,800 unigenes and 5,4479 of them were annotated. • 31,285 could be classified into 26 COG categories. • 263 biosynthesis pathways were predicted and classified into five categories. • 892 DEGs were detected and 15 of them were validated by qRT-PCR.

  15. Whole transcriptome analysis using next-generation sequencing of model species Setaria viridis to support C4 photosynthesis research.

    Science.gov (United States)

    Xu, Jiajia; Li, Yuanyuan; Ma, Xiuling; Ding, Jianfeng; Wang, Kai; Wang, Sisi; Tian, Ye; Zhang, Hui; Zhu, Xin-Guang

    2013-09-01

    Setaria viridis is an emerging model species for genetic studies of C4 photosynthesis. Many basic molecular resources need to be developed to support for this species. In this paper, we performed a comprehensive transcriptome analysis from multiple developmental stages and tissues of S. viridis using next-generation sequencing technologies. Sequencing of the transcriptome from multiple tissues across three developmental stages (seed germination, vegetative growth, and reproduction) yielded a total of 71 million single end 100 bp long reads. Reference-based assembly using Setaria italica genome as a reference generated 42,754 transcripts. De novo assembly generated 60,751 transcripts. In addition, 9,576 and 7,056 potential simple sequence repeats (SSRs) covering S. viridis genome were identified when using the reference based assembled transcripts and the de novo assembled transcripts, respectively. This identified transcripts and SSR provided by this study can be used for both reverse and forward genetic studies based on S. viridis.

  16. Transcriptome sequencing and differential gene expression analysis in Viola yedoensis Makino (Fam. Violaceae) responsive to cadmium (Cd) pollution

    International Nuclear Information System (INIS)

    Gao, Jian; Luo, Mao; Zhu, Ye; He, Ying; Wang, Qin; Zhang, Chun

    2015-01-01

    Viola yedoensis Makino is an important Chinese traditional medicine plant adapted to cadmium (Cd) pollution regions. Illumina sequencing technology was used to sequence the transcriptome of V. yedoensis Makino. We sequenced Cd-treated (VIYCd) and untreated (VIYCK) samples of V. yedoensis, and obtained 100,410,834 and 83,587,676 high quality reads, respectively. After de novo assembly and quantitative assessment, 109,800 unigenes were finally generated with an average length of 661 bp. We then obtained functional annotations by aligning unigenes with public protein databases including NR, NT, SwissProt, KEGG and COG. In addition, 892 differentially expressed genes (DEGs) were investigated between the two libraries of untreated (VIYCK) and Cd-treated (VIYCd) plants. Moreover, 15 randomly selected DEGs were further validated with qRT-PCR and the results were highly accordant with the Solexa analysis. This study firstly generated a successful global analysis of the V. yedoensis transcriptome and it will provide for further studies on gene expression, genomics, and functional genomics in Violaceae. - Highlights: • A de novo assembly generated 109,800 unigenes and 5,4479 of them were annotated. • 31,285 could be classified into 26 COG categories. • 263 biosynthesis pathways were predicted and classified into five categories. • 892 DEGs were detected and 15 of them were validated by qRT-PCR

  17. ASSET: Analysis of Sequences of Synchronous Events in Massively Parallel Spike Trains

    Science.gov (United States)

    Canova, Carlos; Denker, Michael; Gerstein, George; Helias, Moritz

    2016-01-01

    With the ability to observe the activity from large numbers of neurons simultaneously using modern recording technologies, the chance to identify sub-networks involved in coordinated processing increases. Sequences of synchronous spike events (SSEs) constitute one type of such coordinated spiking that propagates activity in a temporally precise manner. The synfire chain was proposed as one potential model for such network processing. Previous work introduced a method for visualization of SSEs in massively parallel spike trains, based on an intersection matrix that contains in each entry the degree of overlap of active neurons in two corresponding time bins. Repeated SSEs are reflected in the matrix as diagonal structures of high overlap values. The method as such, however, leaves the task of identifying these diagonal structures to visual inspection rather than to a quantitative analysis. Here we present ASSET (Analysis of Sequences of Synchronous EvenTs), an improved, fully automated method which determines diagonal structures in the intersection matrix by a robust mathematical procedure. The method consists of a sequence of steps that i) assess which entries in the matrix potentially belong to a diagonal structure, ii) cluster these entries into individual diagonal structures and iii) determine the neurons composing the associated SSEs. We employ parallel point processes generated by stochastic simulations as test data to demonstrate the performance of the method under a wide range of realistic scenarios, including different types of non-stationarity of the spiking activity and different correlation structures. Finally, the ability of the method to discover SSEs is demonstrated on complex data from large network simulations with embedded synfire chains. Thus, ASSET represents an effective and efficient tool to analyze massively parallel spike data for temporal sequences of synchronous activity. PMID:27420734

  18. De novo sequencing and analysis of the transcriptome during the browning of fresh-cut Luffa cylindrica 'Fusi-3' fruits

    Science.gov (United States)

    Chen, Mindong; Wang, Bin; Zhang, Qianrong; Xue, Zhuzheng

    2017-01-01

    Fresh-cut luffa (Luffa cylindrica) fruits commonly undergo browning. However, little is known about the molecular mechanisms regulating this process. We used the RNA-seq technique to analyze the transcriptomic changes occurring during the browning of fresh-cut fruits from luffa cultivar ‘Fusi-3’. Over 90 million high-quality reads were assembled into 58,073 Unigenes, and 60.86% of these were annotated based on sequences in four public databases. We detected 35,282 Unigenes with significant hits to sequences in the NCBInr database, and 24,427 Unigenes encoded proteins with sequences that were similar to those of known proteins in the Swiss-Prot database. Additionally, 20,546 and 13,021 Unigenes were similar to existing sequences in the Eukaryotic Orthologous Groups of proteins and Kyoto Encyclopedia of Genes and Genomes databases, respectively. Furthermore, 27,301 Unigenes were differentially expressed during the browning of fresh-cut luffa fruits (i.e., after 1–6 h). Moreover, 11 genes from five gene families (i.e., PPO, PAL, POD, CAT, and SOD) identified as potentially associated with enzymatic browning as well as four WRKY transcription factors were observed to be differentially regulated in fresh-cut luffa fruits. With the assistance of rapid amplification of cDNA ends technology, we obtained the full-length sequences of the 15 Unigenes. We also confirmed these Unigenes were expressed by quantitative real-time polymerase chain reaction analysis. This study provides a comprehensive transcriptome sequence resource, and may facilitate further studies aimed at identifying genes affecting luffa fruit browning for the exploitation of the underlying mechanism. PMID:29145430

  19. De novo sequencing and analysis of the transcriptome during the browning of fresh-cut Luffa cylindrica 'Fusi-3' fruits.

    Directory of Open Access Journals (Sweden)

    Haisheng Zhu

    Full Text Available Fresh-cut luffa (Luffa cylindrica fruits commonly undergo browning. However, little is known about the molecular mechanisms regulating this process. We used the RNA-seq technique to analyze the transcriptomic changes occurring during the browning of fresh-cut fruits from luffa cultivar 'Fusi-3'. Over 90 million high-quality reads were assembled into 58,073 Unigenes, and 60.86% of these were annotated based on sequences in four public databases. We detected 35,282 Unigenes with significant hits to sequences in the NCBInr database, and 24,427 Unigenes encoded proteins with sequences that were similar to those of known proteins in the Swiss-Prot database. Additionally, 20,546 and 13,021 Unigenes were similar to existing sequences in the Eukaryotic Orthologous Groups of proteins and Kyoto Encyclopedia of Genes and Genomes databases, respectively. Furthermore, 27,301 Unigenes were differentially expressed during the browning of fresh-cut luffa fruits (i.e., after 1-6 h. Moreover, 11 genes from five gene families (i.e., PPO, PAL, POD, CAT, and SOD identified as potentially associated with enzymatic browning as well as four WRKY transcription factors were observed to be differentially regulated in fresh-cut luffa fruits. With the assistance of rapid amplification of cDNA ends technology, we obtained the full-length sequences of the 15 Unigenes. We also confirmed these Unigenes were expressed by quantitative real-time polymerase chain reaction analysis. This study provides a comprehensive transcriptome sequence resource, and may facilitate further studies aimed at identifying genes affecting luffa fruit browning for the exploitation of the underlying mechanism.

  20. Identification of lignin genes and regulatory sequences involved in secondary cell wall formation in Acacia auriculiformis and Acacia mangium via de novo transcriptome sequencing

    Directory of Open Access Journals (Sweden)

    Cannon Charles H

    2011-07-01

    Full Text Available Abstract Background Acacia auriculiformis × Acacia mangium hybrids are commercially important trees for the timber and pulp industry in Southeast Asia. Increasing pulp yield while reducing pulping costs are major objectives of tree breeding programs. The general monolignol biosynthesis and secondary cell wall formation pathways are well-characterized but genes in these pathways are poorly characterized in Acacia hybrids. RNA-seq on short-read platforms is a rapid approach for obtaining comprehensive transcriptomic data and to discover informative sequence variants. Results We sequenced transcriptomes of A. auriculiformis and A. mangium from non-normalized cDNA libraries synthesized from pooled young stem and inner bark tissues using paired-end libraries and a single lane of an Illumina GAII machine. De novo assembly produced a total of 42,217 and 35,759 contigs with an average length of 496 bp and 498 bp for A. auriculiformis and A. mangium respectively. The assemblies of A. auriculiformis and A. mangium had a total length of 21,022,649 bp and 17,838,260 bp, respectively, with the largest contig 15,262 bp long. We detected all ten monolignol biosynthetic genes using Blastx and further analysis revealed 18 lignin isoforms for each species. We also identified five contigs homologous to R2R3-MYB proteins in other plant species that are involved in transcriptional regulation of secondary cell wall formation and lignin deposition. We searched the contigs against public microRNA database and predicted the stem-loop structures of six highly conserved microRNA families (miR319, miR396, miR160, miR172, miR162 and miR168 and one legume-specific family (miR2086. Three microRNA target genes were predicted to be involved in wood formation and flavonoid biosynthesis. By using the assemblies as a reference, we discovered 16,648 and 9,335 high quality putative Single Nucleotide Polymorphisms (SNPs in the transcriptomes of A. auriculiformis and A. mangium

  1. The contribution of 700,000 ORF sequence tags to the definition of the human transcriptome

    DEFF Research Database (Denmark)

    Camargo, A A; Samaia, H P; Dias-Neto, E

    2001-01-01

    ,095 full-length mRNAs as a means of assessing the efficiency of the strategy and its potential contribution to the definition of the human transcriptome. We estimate that ORESTES sampled over 80% of all highly and moderately expressed, and between 40% and 50% of rarely expressed, human genes. In our most...

  2. Detection of arboviruses and other micro-organisms in experimentally infected mosquitoes using massively parallel sequencing.

    Directory of Open Access Journals (Sweden)

    Sonja Hall-Mendelin

    Full Text Available Human disease incidence attributed to arbovirus infection is increasing throughout the world, with effective control interventions limited by issues of sustainability, insecticide resistance and the lack of effective vaccines. Several promising control strategies are currently under development, such as the release of mosquitoes trans-infected with virus-blocking Wolbachia bacteria. Implementation of any control program is dependent on effective virus surveillance and a thorough understanding of virus-vector interactions. Massively parallel sequencing has enormous potential for providing comprehensive genomic information that can be used to assess many aspects of arbovirus ecology, as well as to evaluate novel control strategies. To demonstrate proof-of-principle, we analyzed Aedes aegypti or Aedes albopictus experimentally infected with dengue, yellow fever or chikungunya viruses. Random amplification was used to prepare sufficient template for sequencing on the Personal Genome Machine. Viral sequences were present in all infected mosquitoes. In addition, in most cases, we were also able to identify the mosquito species and mosquito micro-organisms, including the bacterial endosymbiont Wolbachia. Importantly, naturally occurring Wolbachia strains could be differentiated from strains that had been trans-infected into the mosquito. The method allowed us to assemble near full-length viral genomes and detect other micro-organisms without prior sequence knowledge, in a single reaction. This is a step toward the application of massively parallel sequencing as an arbovirus surveillance tool. It has the potential to provide insight into virus transmission dynamics, and has applicability to the post-release monitoring of Wolbachia in mosquito populations.

  3. Race: A scalable and elastic parallel system for discovering repeats in very long sequences

    KAUST Repository

    Mansour, Essam

    2013-08-26

    A wide range of applications, including bioinformatics, time series, and log analysis, depend on the identification of repetitions in very long sequences. The problem of finding maximal pairs subsumes most important types of repetition-finding tasks. Existing solutions require both the input sequence and its index (typically an order of magnitude larger than the input) to fit in memory. Moreover, they are serial algorithms with long execution time. Therefore, they are limited to small datasets, despite the fact that modern applications demand orders of magnitude longer sequences. In this paper we present RACE, a parallel system for finding maximal pairs in very long sequences. RACE supports parallel execution on stand-alone multicore systems, in addition to scaling to thousands of nodes on clusters or supercomputers. RACE does not require the input or the index to fit in memory; therefore, it supports very long sequences with limited memory. Moreover, it uses a novel array representation that allows for cache-efficient implementation. RACE is particularly suitable for the cloud (e.g., Amazon EC2) because, based on availability, it can scale elastically to more or fewer machines during its execution. Since scaling out introduces overheads, mainly due to load imbalance, we propose a cost model to estimate the expected speedup, based on statistics gathered through sampling. The model allows the user to select the appropriate combination of cloud resources based on the provider\\'s prices and the required deadline. We conducted extensive experimental evaluation with large real datasets and large computing infrastructures. In contrast to existing methods, RACE can handle the entire human genome on a typical desktop computer with 16GB RAM. Moreover, for a problem that takes 10 hours of serial execution, RACE finishes in 28 seconds using 2,048 nodes on an IBM BlueGene/P supercomputer.

  4. A massive parallel sequencing workflow for diagnostic genetic testing of mismatch repair genes

    Science.gov (United States)

    Hansen, Maren F; Neckmann, Ulrike; Lavik, Liss A S; Vold, Trine; Gilde, Bodil; Toft, Ragnhild K; Sjursen, Wenche

    2014-01-01

    The purpose of this study was to develop a massive parallel sequencing (MPS) workflow for diagnostic analysis of mismatch repair (MMR) genes using the GS Junior system (Roche). A pathogenic variant in one of four MMR genes, (MLH1, PMS2, MSH6, and MSH2), is the cause of Lynch Syndrome (LS), which mainly predispose to colorectal cancer. We used an amplicon-based sequencing method allowing specific and preferential amplification of the MMR genes including PMS2, of which several pseudogenes exist. The amplicons were pooled at different ratios to obtain coverage uniformity and maximize the throughput of a single-GS Junior run. In total, 60 previously identified and distinct variants (substitutions and indels), were sequenced by MPS and successfully detected. The heterozygote detection range was from 19% to 63% and dependent on sequence context and coverage. We were able to distinguish between false-positive and true-positive calls in homopolymeric regions by cross-sample comparison and evaluation of flow signal distributions. In addition, we filtered variants according to a predefined status, which facilitated variant annotation. Our study shows that implementation of MPS in routine diagnostics of LS can accelerate sample throughput and reduce costs without compromising sensitivity, compared to Sanger sequencing. PMID:24689082

  5. Population genomics of parallel adaptation in threespine stickleback using sequenced RAD tags.

    Directory of Open Access Journals (Sweden)

    Paul A Hohenlohe

    2010-02-01

    Full Text Available Next-generation sequencing technology provides novel opportunities for gathering genome-scale sequence data in natural populations, laying the empirical foundation for the evolving field of population genomics. Here we conducted a genome scan of nucleotide diversity and differentiation in natural populations of threespine stickleback (Gasterosteus aculeatus. We used Illumina-sequenced RAD tags to identify and type over 45,000 single nucleotide polymorphisms (SNPs in each of 100 individuals from two oceanic and three freshwater populations. Overall estimates of genetic diversity and differentiation among populations confirm the biogeographic hypothesis that large panmictic oceanic populations have repeatedly given rise to phenotypically divergent freshwater populations. Genomic regions exhibiting signatures of both balancing and divergent selection were remarkably consistent across multiple, independently derived populations, indicating that replicate parallel phenotypic evolution in stickleback may be occurring through extensive, parallel genetic evolution at a genome-wide scale. Some of these genomic regions co-localize with previously identified QTL for stickleback phenotypic variation identified using laboratory mapping crosses. In addition, we have identified several novel regions showing parallel differentiation across independent populations. Annotation of these regions revealed numerous genes that are candidates for stickleback phenotypic evolution and will form the basis of future genetic analyses in this and other organisms. This study represents the first high-density SNP-based genome scan of genetic diversity and differentiation for populations of threespine stickleback in the wild. These data illustrate the complementary nature of laboratory crosses and population genomic scans by confirming the adaptive significance of previously identified genomic regions, elucidating the particular evolutionary and demographic history of such

  6. De novo assembly and characterization of the spleen transcriptome of common carp (Cyprinus carpio) using Illumina paired-end sequencing.

    Science.gov (United States)

    Li, Guoxi; Zhao, Yinli; Liu, Zhonghu; Gao, Chunsheng; Yan, Fengbin; Liu, Bianzhi; Feng, Jianxin

    2015-06-01

    Common carp (Cyprinus carpio) is one of the most important aquacultured species of the family Cyprinidae, and breeding this species for disease resistance is becoming more and more important. However, at the genome or transcriptome levels, study of the immunogenetics of disease resistance in the common carp is lacking. In this study, 60,316,906 and 75,200,328 paired-end clean reads were obtained from two cDNA libraries of the common carp spleen by Illumina paired-end sequencing technology. Totally, 130,293 unique transcript fragments (unigenes) were assembled, with an average length of 1400.57 bp. Approximately 105,612 (81.06%) unigenes could be annotated according to their homology with matches in the Nr, Nt, Swiss-Prot, COG, GO, or KEGG databases, and they were found to represent 46,747 non-redundant genes. Comparative analysis showed that 59.82% of the unigenes have significant similarity to zebrafish Refseq proteins. Gene expression comparison revealed that 10,432 and 6889 annotated unigenes were, respectively, up- and down-regulated with at least twofold changes between two developmental stages of the common carp spleen. Gene ontology and KEGG analysis were performed to classify all unigenes into functional categories for understanding gene functions and regulation pathways. In addition, 46,847 simple sequence repeats (SSRs) were detected from 35,618 unigenes, and a large number of single nucleotide polymorphism (SNP) and insertion/deletion (INDEL) sites were identified in the spleen transcriptome of common carp. This study has characterized the spleen transcriptome of the common carp for the first time, providing a valuable resource for a better understanding of the common carp immune system and defense mechanisms. This knowledge will also facilitate future functional studies on common carp immunogenetics that may eventually be applied in breeding programs. Copyright © 2015 Elsevier Ltd. All rights reserved.

  7. Comparative analyses of six solanaceous transcriptomes reveal a high degree of sequence conservation and species-specific transcripts

    Directory of Open Access Journals (Sweden)

    Ouyang Shu

    2005-09-01

    Full Text Available Abstract Background The Solanaceae is a family of closely related species with diverse phenotypes that have been exploited for agronomic purposes. Previous studies involving a small number of genes suggested sequence conservation across the Solanaceae. The availability of large collections of Expressed Sequence Tags (ESTs for the Solanaceae now provides the opportunity to assess sequence conservation and divergence on a genomic scale. Results All available ESTs and Expressed Transcripts (ETs, 449,224 sequences for six Solanaceae species (potato, tomato, pepper, petunia, tobacco and Nicotiana benthamiana, were clustered and assembled into gene indices. Examination of gene ontologies revealed that the transcripts within the gene indices encode a similar suite of biological processes. Although the ESTs and ETs were derived from a variety of tissues, 55–81% of the sequences had significant similarity at the nucleotide level with sequences among the six species. Putative orthologs could be identified for 28–58% of the sequences. This high degree of sequence conservation was supported by expression profiling using heterologous hybridizations to potato cDNA arrays that showed similar expression patterns in mature leaves for all six solanaceous species. 16–19% of the transcripts within the six Solanaceae gene indices did not have matches among Solanaceae, Arabidopsis, rice or 21 other plant gene indices. Conclusion Results from this genome scale analysis confirmed a high level of sequence conservation at the nucleotide level of the coding sequence among Solanaceae. Additionally, the results indicated that part of the Solanaceae transcriptome is likely to be unique for each species.

  8. Detection of reverse transcriptase termination sites using cDNA ligation and massive parallel sequencing

    DEFF Research Database (Denmark)

    Kielpinski, Lukasz J; Boyd, Mette; Sandelin, Albin

    2013-01-01

    Detection of reverse transcriptase termination sites is important in many different applications, such as structural probing of RNAs, rapid amplification of cDNA 5' ends (5' RACE), cap analysis of gene expression, and detection of RNA modifications and protein-RNA cross-links. The throughput...... of these methods can be increased by applying massive parallel sequencing technologies.Here, we describe a versatile method for detection of reverse transcriptase termination sites based on ligation of an adapter to the 3' end of cDNA with bacteriophage TS2126 RNA ligase (CircLigase™). In the following PCR...

  9. Transcriptome characterization and sequencing-based identification of salt-responsive genes in Millettia pinnata, a semi-mangrove plant.

    Science.gov (United States)

    Huang, Jianzi; Lu, Xiang; Yan, Hao; Chen, Shouyi; Zhang, Wanke; Huang, Rongfeng; Zheng, Yizhi

    2012-04-01

    Semi-mangroves form a group of transitional species between glycophytes and halophytes, and hold unique potential for learning molecular mechanisms underlying plant salt tolerance. Millettia pinnata is a semi-mangrove plant that can survive a wide range of saline conditions in the absence of specialized morphological and physiological traits. By employing the Illumina sequencing platform, we generated ~192 million short reads from four cDNA libraries of M. pinnata and processed them into 108,598 unisequences with a high depth of coverage. The mean length and total length of these unisequences were 606 bp and 65.8 Mb, respectively. A total of 54,596 (50.3%) unisequences were assigned Nr annotations. Functional classification revealed the involvement of unisequences in various biological processes related to metabolism and environmental adaptation. We identified 23,815 candidate salt-responsive genes with significantly differential expression under seawater and freshwater treatments. Based on the reverse transcription-polymerase chain reaction (RT-PCR) and real-time PCR analyses, we verified the changes in expression levels for a number of candidate genes. The functional enrichment analyses for the candidate genes showed tissue-specific patterns of transcriptome remodelling upon salt stress in the roots and the leaves. The transcriptome of M. pinnata will provide valuable gene resources for future application in crop improvement. In addition, this study sets a good example for large-scale identification of salt-responsive genes in non-model organisms using the sequencing-based approach.

  10. De Novo Transcriptome Sequencing of Olea europaea L. to Identify Genes Involved in the Development of the Pollen Tube.

    Science.gov (United States)

    Iaria, Domenico; Chiappetta, Adriana; Muzzalupo, Innocenzo

    2016-01-01

    In olive (Olea europaea L.), the processes controlling self-incompatibility are still unclear and the molecular basis underlying this process are still not fully characterized. In order to determine compatibility relationships, using next-generation sequencing techniques and a de novo transcriptome assembly strategy, we show that pollen tubes from different olive plants, grown in vitro in a medium containing its own pistil and in combination pollen/pistil from self-sterile and self-fertile cultivars, have a distinct gene expression profile and many of the differentially expressed sequences between the samples fall within gene families involved in the development of the pollen tube, such as lipase, carboxylesterase, pectinesterase, pectin methylesterase, and callose synthase. Moreover, different genes involved in signal transduction, transcription, and growth are overrepresented. The analysis also allowed us to identify members in actin and actin depolymerization factor and fibrin gene family and member of the Ca(2+) binding gene family related to the development and polarization of pollen apical tip. The whole transcriptomic analysis, through the identification of the differentially expressed transcripts set and an extended functional annotation analysis, will lead to a better understanding of the mechanisms of pollen germination and pollen tube growth in the olive.

  11. ACME: A scalable parallel system for extracting frequent patterns from a very long sequence

    KAUST Repository

    Sahli, Majed

    2014-10-02

    Modern applications, including bioinformatics, time series, and web log analysis, require the extraction of frequent patterns, called motifs, from one very long (i.e., several gigabytes) sequence. Existing approaches are either heuristics that are error-prone, or exact (also called combinatorial) methods that are extremely slow, therefore, applicable only to very small sequences (i.e., in the order of megabytes). This paper presents ACME, a combinatorial approach that scales to gigabyte-long sequences and is the first to support supermaximal motifs. ACME is a versatile parallel system that can be deployed on desktop multi-core systems, or on thousands of CPUs in the cloud. However, merely using more compute nodes does not guarantee efficiency, because of the related overheads. To this end, ACME introduces an automatic tuning mechanism that suggests the appropriate number of CPUs to utilize, in order to meet the user constraints in terms of run time, while minimizing the financial cost of cloud resources. Our experiments show that, compared to the state of the art, ACME supports three orders of magnitude longer sequences (e.g., DNA for the entire human genome); handles large alphabets (e.g., English alphabet for Wikipedia); scales out to 16,384 CPUs on a supercomputer; and supports elastic deployment in the cloud.

  12. Massively parallel amplicon sequencing reveals isotype-specific variability of antimicrobial peptide transcripts in Mytilus galloprovincialis.

    Directory of Open Access Journals (Sweden)

    Umberto Rosani

    Full Text Available BACKGROUND: Effective innate responses against potential pathogens are essential in the living world and possibly contributed to the evolutionary success of invertebrates. Taken together, antimicrobial peptide (AMP precursors of defensin, mytilin, myticin and mytimycin can represent about 40% of the hemocyte transcriptome in mussels injected with viral-like and bacterial preparations, and unique profiles of myticin C variants are expressed in single mussels. Based on amplicon pyrosequencing, we have ascertained and compared the natural and Vibrio-induced diversity of AMP transcripts in mussel hemocytes from three European regions. METHODOLOGY/PRINCIPAL FINDINGS: Hemolymph was collected from mussels farmed in the coastal regions of Palavas (France, Vigo (Spain and Venice (Italy. To represent the AMP families known in M. galloprovincialis, nine transcript sequences have been selected, amplified from hemocyte RNA and subjected to pyrosequencing. Hemolymph from farmed (offshore and wild (lagoon Venice mussels, both injected with 10(7 Vibrio cells, were similarly processed. Amplicon pyrosequencing emphasized the AMP transcript diversity, with Single Nucleotide Changes (SNC minimal for mytilin B/C and maximal for arthropod-like defensin and myticin C. Ratio of non-synonymous vs. synonymous changes also greatly differed between AMP isotypes. Overall, each amplicon revealed similar levels of nucleotidic variation across geographical regions, with two main sequence patterns confirmed for mytimycin and no substantial changes after immunostimulation. CONCLUSIONS/SIGNIFICANCE: Barcoding and bidirectional pyrosequencing allowed us to map and compare the transcript diversity of known mussel AMPs. Though most of the genuine cds variation was common to the analyzed samples we could estimate from 9 to 106 peptide variants in hemolymph pools representing 100 mussels, depending on the AMP isoform and sampling site. In this study, no prevailing SNC patterns related

  13. Application of massively parallel sequencing to genetic diagnosis in multiplex families with idiopathic sensorineural hearing impairment.

    Directory of Open Access Journals (Sweden)

    Chen-Chi Wu

    Full Text Available Despite the clinical utility of genetic diagnosis to address idiopathic sensorineural hearing impairment (SNHI, the current strategy for screening mutations via Sanger sequencing suffers from the limitation that only a limited number of DNA fragments associated with common deafness mutations can be genotyped. Consequently, a definitive genetic diagnosis cannot be achieved in many families with discernible family history. To investigate the diagnostic utility of massively parallel sequencing (MPS, we applied the MPS technique to 12 multiplex families with idiopathic SNHI in which common deafness mutations had previously been ruled out. NimbleGen sequence capture array was designed to target all protein coding sequences (CDSs and 100 bp of the flanking sequence of 80 common deafness genes. We performed MPS on the Illumina HiSeq2000, and applied BWA, SAMtools, Picard, GATK, Variant Tools, ANNOVAR, and IGV for bioinformatics analyses. Initial data filtering with allele frequencies (0.95 prioritized 5 indels (insertions/deletions and 36 missense variants in the 12 multiplex families. After further validation by Sanger sequencing, segregation pattern, and evolutionary conservation of amino acid residues, we identified 4 variants in 4 different genes, which might lead to SNHI in 4 families compatible with autosomal dominant inheritance. These included GJB2 p.R75Q, MYO7A p.T381M, KCNQ4 p.S680F, and MYH9 p.E1256K. Among them, KCNQ4 p.S680F and MYH9 p.E1256K were novel. In conclusion, MPS allows genetic diagnosis in multiplex families with idiopathic SNHI by detecting mutations in relatively uncommon deafness genes.

  14. Implementation of Amplicon Parallel Sequencing Leads to Improvement of Diagnosis and Therapy of Lung Cancer Patients.

    Science.gov (United States)

    König, Katharina; Peifer, Martin; Fassunke, Jana; Ihle, Michaela A; Künstlinger, Helen; Heydt, Carina; Stamm, Katrin; Ueckeroth, Frank; Vollbrecht, Claudia; Bos, Marc; Gardizi, Masyar; Scheffler, Matthias; Nogova, Lucia; Leenders, Frauke; Albus, Kerstin; Meder, Lydia; Becker, Kerstin; Florin, Alexandra; Rommerscheidt-Fuss, Ursula; Altmüller, Janine; Kloth, Michael; Nürnberg, Peter; Henkel, Thomas; Bikár, Sven-Ernö; Sos, Martin L; Geese, William J; Strauss, Lewis; Ko, Yon-Dschun; Gerigk, Ulrich; Odenthal, Margarete; Zander, Thomas; Wolf, Jürgen; Merkelbach-Bruse, Sabine; Buettner, Reinhard; Heukamp, Lukas C

    2015-07-01

    The Network Genomic Medicine Lung Cancer was set up to rapidly translate scientific advances into early clinical trials of targeted therapies in lung cancer performing molecular analyses of more than 3500 patients annually. Because sequential analysis of the relevant driver mutations on fixated samples is challenging in terms of workload, tissue availability, and cost, we established multiplex parallel sequencing in routine diagnostics. The aim was to analyze all therapeutically relevant mutations in lung cancer samples in a high-throughput fashion while significantly reducing turnaround time and amount of input DNA compared with conventional dideoxy sequencing of single polymerase chain reaction amplicons. In this study, we demonstrate the feasibility of a 102 amplicon multiplex polymerase chain reaction followed by sequencing on an Illumina sequencer on formalin-fixed paraffin-embedded tissue in routine diagnostics. Analysis of a validation cohort of 180 samples showed this approach to require significantly less input material and to be more reliable, robust, and cost-effective than conventional dideoxy sequencing. Subsequently, 2657 lung cancer patients were analyzed. We observed that comprehensive biomarker testing provided novel information in addition to histological diagnosis and clinical staging. In 2657 consecutively analyzed lung cancer samples, we identified driver mutations at the expected prevalence. Furthermore we found potentially targetable DDR2 mutations at a frequency of 3% in both adenocarcinomas and squamous cell carcinomas. Overall, our data demonstrate the utility of systematic sequencing analysis in a clinical routine setting and highlight the dramatic impact of such an approach on the availability of therapeutic strategies for the targeted treatment of individual cancer patients.

  15. Genotypic tropism testing by massively parallel sequencing: qualitative and quantitative analysis

    Directory of Open Access Journals (Sweden)

    Thiele Bernhard

    2011-05-01

    Full Text Available Abstract Background Inferring viral tropism from genotype is a fast and inexpensive alternative to phenotypic testing. While being highly predictive when performed on clonal samples, sensitivity of predicting CXCR4-using (X4 variants drops substantially in clinical isolates. This is mainly attributed to minor variants not detected by standard bulk-sequencing. Massively parallel sequencing (MPS detects single clones thereby being much more sensitive. Using this technology we wanted to improve genotypic prediction of coreceptor usage. Methods Plasma samples from 55 antiretroviral-treated patients tested for coreceptor usage with the Monogram Trofile Assay were sequenced with standard population-based approaches. Fourteen of these samples were selected for further analysis with MPS. Tropism was predicted from each sequence with geno2pheno[coreceptor]. Results Prediction based on bulk-sequencing yielded 59.1% sensitivity and 90.9% specificity compared to the trofile assay. With MPS, 7600 reads were generated on average per isolate. Minorities of sequences with high confidence in CXCR4-usage were found in all samples, irrespective of phenotype. When using the default false-positive-rate of geno2pheno[coreceptor] (10%, and defining a minority cutoff of 5%, the results were concordant in all but one isolate. Conclusions The combination of MPS and coreceptor usage prediction results in a fast and accurate alternative to phenotypic assays. The detection of X4-viruses in all isolates suggests that coreceptor usage as well as fitness of minorities is important for therapy outcome. The high sensitivity of this technology in combination with a quantitative description of the viral population may allow implementing meaningful cutoffs for predicting response to CCR5-antagonists in the presence of X4-minorities.

  16. Genotypic tropism testing by massively parallel sequencing: qualitative and quantitative analysis.

    Science.gov (United States)

    Däumer, Martin; Kaiser, Rolf; Klein, Rolf; Lengauer, Thomas; Thiele, Bernhard; Thielen, Alexander

    2011-05-13

    Inferring viral tropism from genotype is a fast and inexpensive alternative to phenotypic testing. While being highly predictive when performed on clonal samples, sensitivity of predicting CXCR4-using (X4) variants drops substantially in clinical isolates. This is mainly attributed to minor variants not detected by standard bulk-sequencing. Massively parallel sequencing (MPS) detects single clones thereby being much more sensitive. Using this technology we wanted to improve genotypic prediction of coreceptor usage. Plasma samples from 55 antiretroviral-treated patients tested for coreceptor usage with the Monogram Trofile Assay were sequenced with standard population-based approaches. Fourteen of these samples were selected for further analysis with MPS. Tropism was predicted from each sequence with geno2pheno[coreceptor]. Prediction based on bulk-sequencing yielded 59.1% sensitivity and 90.9% specificity compared to the trofile assay. With MPS, 7600 reads were generated on average per isolate. Minorities of sequences with high confidence in CXCR4-usage were found in all samples, irrespective of phenotype. When using the default false-positive-rate of geno2pheno[coreceptor] (10%), and defining a minority cutoff of 5%, the results were concordant in all but one isolate. The combination of MPS and coreceptor usage prediction results in a fast and accurate alternative to phenotypic assays. The detection of X4-viruses in all isolates suggests that coreceptor usage as well as fitness of minorities is important for therapy outcome. The high sensitivity of this technology in combination with a quantitative description of the viral population may allow implementing meaningful cutoffs for predicting response to CCR5-antagonists in the presence of X4-minorities.

  17. Prediction of the neuropeptidomes of members of the Astacidea (Crustacea, Decapoda) using publicly accessible transcriptome shotgun assembly (TSA) sequence data.

    Science.gov (United States)

    Christie, Andrew E; Chi, Megan

    2015-12-01

    The decapod infraorder Astacidea is comprised of clawed lobsters and freshwater crayfish. Due to their economic importance and their use as models for investigating neurochemical signaling, much work has focused on elucidating their neurochemistry, particularly their peptidergic systems. Interestingly, no astacidean has been the subject of large-scale peptidomic analysis via in silico transcriptome mining, this despite growing transcriptomic resources for members of this taxon. Here, the publicly accessible astacidean transcriptome shotgun assembly data were mined for putative peptide-encoding transcripts; these sequences were used to predict the structures of mature neuropeptides. One hundred seventy-six distinct peptides were predicted for Procambarus clarkii, including isoforms of adipokinetic hormone-corazonin-like peptide (ACP), allatostatin A (AST-A), allatostatin B, allatostatin C (AST-C) bursicon α, bursicon β, CCHamide, crustacean hyperglycemic hormone (CHH)/ion transport peptide (ITP), diuretic hormone 31 (DH31), eclosion hormone (EH), FMRFamide-like peptide, GSEFLamide, intocin, leucokinin, neuroparsin, neuropeptide F, pigment dispersing hormone, pyrokinin, RYamide, short neuropeptide F (sNPF), SIFamide, sulfakinin and tachykinin-related peptide (TRP). Forty-six distinct peptides, including isoforms of AST-A, AST-C, bursicon α, CCHamide, CHH/ITP, DH31, EH, intocin, myosuppressin, neuroparsin, red pigment concentrating hormone, sNPF and TRP, were predicted for Pontastacus leptodactylus, with a bursicon β and a neuroparsin predicted for Cherax quadricarinatus. The identification of ACP is the first from a decapod, while the predictions of CCHamide, EH, GSEFLamide, intocin, neuroparsin and RYamide are firsts for the Astacidea. Collectively, these data greatly expand the catalog of known astacidean neuropeptides and provide a foundation for functional studies of peptidergic signaling in members of this decapod infraorder. Copyright © 2015 Elsevier Inc

  18. De novo transcriptome sequencing of Isaria cateniannulata and comparative analysis of gene expression in response to heat and cold stresses.

    Directory of Open Access Journals (Sweden)

    Dingfeng Wang

    Full Text Available Isaria cateniannulata is a very important and virulent entomopathogenic fungus that infects many insect pest species. Although I. cateniannulata is commonly exposed to extreme environmental temperature conditions, little is known about its molecular response mechanism to temperature stress. Here, we sequenced and de novo assembled the transcriptome of I. cateniannulata in response to high and low temperature stresses using Illumina RNA-Seq technology. Our assembly encompassed 17,514 unigenes (mean length = 1,197 bp, in which 11,445 unigenes (65.34% showed significant similarities to known sequences in NCBI non-redundant protein sequences (Nr database. Using digital gene expression analysis, 4,483 differentially expressed genes (DEGs were identified after heat treatment, including 2,905 up-regulated genes and 1,578 down-regulated genes. Under cold stress, 1,927 DEGs were identified, including 1,245 up-regulated genes and 682 down-regulated genes. The expression patterns of 18 randomly selected candidate DEGs resulting from quantitative real-time PCR (qRT-PCR were consistent with their transcriptome analysis results. Although DEGs were involved in many pathways, we focused on the genes that were involved in endocytosis: In heat stress, the pathway of clathrin-dependent endocytosis (CDE was active; however at low temperature stresses, the pathway of clathrin-independent endocytosis (CIE was active. Besides, four categories of DEGs acting as temperature sensors were observed, including cell-wall-major-components-metabolism-related (CWMCMR genes, heat shock protein (Hsp genes, intracellular-compatible-solutes-metabolism-related (ICSMR genes and glutathione S-transferase (GST. These results enhance our understanding of the molecular mechanisms of I. cateniannulata in response to temperature stresses and provide a valuable resource for the future investigations.

  19. Genes involved in sex pheromone biosynthesis of Ephestia cautella, an important food storage pest, are determined by transcriptome sequencing

    KAUST Repository

    Antony, Binu

    2015-07-18

    Background Insects use pheromones, chemical signals that underlie all animal behaviors, for communication and for attracting mates. Synthetic pheromones are widely used in pest control strategies because they are environmentally safe. The production of insect pheromones in transgenic plants, which could be more economical and effective in producing isomerically pure compounds, has recently been successfully demonstrated. This research requires information regarding the pheromone biosynthetic pathways and the characterization of pheromone biosynthetic enzymes (PBEs). We used Illumina sequencing to characterize the pheromone gland (PG) transcriptome of the Pyralid moth, Ephestia cautella, a destructive storage pest, to reveal putative candidate genes involved in pheromone biosynthesis, release, transport and degradation. Results We isolated the E. cautella pheromone compound as (Z,E)-9,12-tetradecadienyl acetate, and the major pheromone precursors 16:acyl, 14:acyl, E14-16:acyl, E12-14:acyl and Z9,E12-14:acyl. Based on the abundance of precursors, two possible pheromone biosynthetic pathways are proposed. Both pathways initiate from C16:acyl-CoA, with one involving ∆14 and ∆9 desaturation to generate Z9,E12-14:acyl, and the other involving the chain shortening of C16:acyl-CoA to C14:acyl-CoA, followed by ∆12 and ∆9 desaturation to generate Z9,E12-14:acyl-CoA. Then, a final reduction and acetylation generates Z9,E12-14:OAc. Illumina sequencing yielded 83,792 transcripts, and we obtained a PG transcriptome of ~49.5 Mb. A total of 191 PBE transcripts, which included pheromone biosynthesis activating neuropeptides, fatty acid transport proteins, acetyl-CoA carboxylases, fatty acid synthases, desaturases, β-oxidation enzymes, fatty acyl-CoA reductases (FARs) and fatty acetyltransferases (FATs), were selected from the dataset. A comparison of the E. cautella transcriptome data with three other Lepidoptera PG datasets revealed that 45 % of the sequences were shared

  20. Frequency of Usher syndrome type 1 in deaf children by massively parallel DNA sequencing.

    Science.gov (United States)

    Yoshimura, Hidekane; Miyagawa, Maiko; Kumakawa, Kozo; Nishio, Shin-Ya; Usami, Shin-Ichi

    2016-05-01

    Usher syndrome type 1 (USH1) is the most severe of the three USH subtypes due to its profound hearing loss, absent vestibular response and retinitis pigmentosa appearing at a prepubescent age. Six causative genes have been identified for USH1, making early diagnosis and therapy possible through DNA testing. Targeted exon sequencing of selected genes using massively parallel DNA sequencing (MPS) technology enables clinicians to systematically tackle previously intractable monogenic disorders and improve molecular diagnosis. Using MPS along with direct sequence analysis, we screened 227 unrelated non-syndromic deaf children and detected recessive mutations in USH1 causative genes in five patients (2.2%): three patients harbored MYO7A mutations and one each carried CDH23 or PCDH15 mutations. As indicated by an earlier genotype-phenotype correlation study of the CDH23 and PCDH15 genes, we considered the latter two patients to have USH1. Based on clinical findings, it was also highly likely that one patient with MYO7A mutations possessed USH1 due to a late onset age of walking. This first report describing the frequency (1.3-2.2%) of USH1 among non-syndromic deaf children highlights the importance of comprehensive genetic testing for early disease diagnosis.

  1. Deep sequencing-based transcriptome analysis of chicken spleen in response to avian pathogenic Escherichia coli (APEC infection.

    Directory of Open Access Journals (Sweden)

    Qinghua Nie

    Full Text Available Avian pathogenic Escherichia coli (APEC leads to economic losses in poultry production and is also a threat to human health. The goal of this study was to characterize the chicken spleen transcriptome and to identify candidate genes for response and resistance to APEC infection using Solexa sequencing. We obtained 14422935, 14104324, and 14954692 Solexa read pairs for non-challenged (NC, challenged-mild pathology (MD, and challenged-severe pathology (SV, respectively. A total of 148197 contigs and 98461 unigenes were assembled, of which 134949 contigs and 91890 unigenes match the chicken genome. In total, 12272 annotated unigenes take part in biological processes (11664, cellular components (11927, and molecular functions (11963. Summing three specific contrasts, 13650 significantly differentially expressed unigenes were found in NC Vs. MD (6844, NC Vs. SV (7764, and MD Vs. SV (2320. Some unigenes (e.g. CD148, CD45 and LCK were involved in crucial pathways, such as the T cell receptor (TCR signaling pathway and microbial metabolism in diverse environments. This study facilitates understanding of the genetic architecture of the chicken spleen transcriptome, and has identified candidate genes for host response to APEC infection.

  2. RNA Sequencing Analysis Reveals Transcriptomic Variations in Tobacco (Nicotiana tabacum Leaves Affected by Climate, Soil, and Tillage Factors

    Directory of Open Access Journals (Sweden)

    Bo Lei

    2014-04-01

    Full Text Available The growth and development of plants are sensitive to their surroundings. Although numerous studies have analyzed plant transcriptomic variation, few have quantified the effect of combinations of factors or identified factor-specific effects. In this study, we performed RNA sequencing (RNA-seq analysis on tobacco leaves derived from 10 treatment combinations of three groups of ecological factors, i.e., climate factors (CFs, soil factors (SFs, and tillage factors (TFs. We detected 4980, 2916, and 1605 differentially expressed genes (DEGs that were affected by CFs, SFs, and TFs, which included 2703, 768, and 507 specific and 703 common DEGs (simultaneously regulated by CFs, SFs, and TFs, respectively. GO and KEGG enrichment analyses showed that genes involved in abiotic stress responses and secondary metabolic pathways were overrepresented in the common and CF-specific DEGs. In addition, we noted enrichment in CF-specific DEGs related to the circadian rhythm, SF-specific DEGs involved in mineral nutrient absorption and transport, and SF- and TF-specific DEGs associated with photosynthesis. Based on these results, we propose a model that explains how plants adapt to various ecological factors at the transcriptomic level. Additionally, the identified DEGs lay the foundation for future investigations of stress resistance, circadian rhythm and photosynthesis in tobacco.

  3. Characterization of the transcriptome, nucleotide sequence polymorphism, and natural selection in the desert adapted mouse Peromyscus eremicus

    Directory of Open Access Journals (Sweden)

    Matthew D. MacManes

    2014-10-01

    Full Text Available As a direct result of intense heat and aridity, deserts are thought to be among the most harsh of environments, particularly for their mammalian inhabitants. Given that osmoregulation can be challenging for these animals, with failure resulting in death, strong selection should be observed on genes related to the maintenance of water and solute balance. One such animal, Peromyscus eremicus, is native to the desert regions of the southwest United States and may live its entire life without oral fluid intake. As a first step toward understanding the genetics that underlie this phenotype, we present a characterization of the P. eremicus transcriptome. We assay four tissues (kidney, liver, brain, testes from a single individual and supplement this with population level renal transcriptome sequencing from 15 additional animals. We identified a set of transcripts undergoing both purifying and balancing selection based on estimates of Tajima’s D. In addition, we used the branch-site test to identify a transcript—Slc2a9, likely related to desert osmoregulation—undergoing enhanced selection in P. eremicus relative to a set of related non-desert rodents.

  4. Characterization and Development of EST-SSRs by Deep Transcriptome Sequencing in Chinese Cabbage (Brassica rapa L. ssp. pekinensis

    Directory of Open Access Journals (Sweden)

    Qian Ding

    2015-01-01

    Full Text Available Simple sequence repeats (SSRs are among the most important markers for population analysis and have been widely used in plant genetic mapping and molecular breeding. Expressed sequence tag-SSR (EST-SSR markers, located in the coding regions, are potentially more efficient for QTL mapping, gene targeting, and marker-assisted breeding. In this study, we investigated 51,694 nonredundant unigenes, assembled from clean reads from deep transcriptome sequencing with a Solexa/Illumina platform, for identification and development of EST-SSRs in Chinese cabbage. In total, 10,420 EST-SSRs with over 12 bp were identified and characterized, among which 2744 EST-SSRs are new and 2317 are known ones showing polymorphism with previously reported SSRs. A total of 7877 PCR primer pairs for 1561 EST-SSR loci were designed, and primer pairs for twenty-four EST-SSRs were selected for primer evaluation. In nineteen EST-SSR loci (79.2%, amplicons were successfully generated with high quality. Seventeen (89.5% showed polymorphism in twenty-four cultivars of Chinese cabbage. The polymorphic alleles of each polymorphic locus were sequenced, and the results showed that most polymorphisms were due to variations of SSR repeat motifs. The EST-SSRs identified and characterized in this study have important implications for developing new tools for genetics and molecular breeding in Chinese cabbage.

  5. Parallel computation for biological sequence comparison: comparing a portable model to the native model for the Intel Hypercube.

    Science.gov (United States)

    Nadkarni, P M; Miller, P L

    1991-01-01

    A parallel program for inter-database sequence comparison was developed on the Intel Hypercube using two models of parallel programming. One version was built using machine-specific Hypercube parallel programming commands. The other version was built using Linda, a machine-independent parallel programming language. The two versions of the program provide a case study comparing these two approaches to parallelization in an important biological application area. Benchmark tests with both programs gave comparable results with a small number of processors. As the number of processors was increased, the Linda version was somewhat less efficient. The Linda version was also run without change on Network Linda, a virtual parallel machine running on a network of desktop workstations.

  6. Population genetics, phylogenomics and hybrid speciation of Juglans in China determined from whole chloroplast genomes, transcriptomes, and genotyping-by-sequencing (GBS)

    Science.gov (United States)

    Peng Zhao; Hui-Juan Zhou; Daniel Potter; Yi-Heng Hu; Xiao-Jia Feng; Meng Dang; Li Feng; Saman Zulfiqar; Wen-Zhe Liu; Gui-Fang Zhao; Keith Woeste

    2018-01-01

    Genomic data are a powerful tool for elucidating the processes involved in the evolution and divergence of species. The speciation and phylogenetic relationships among Chinese Juglans remain unclear. Here, we used results from phylogenomic and population genetic analyses, transcriptomics, Genotyping-By-Sequencing (GBS), and whole chloroplast...

  7. Transcriptome sequencing of different narrow-leafed lupin tissue types provides a comprehensive uni-gene assembly and extensive gene-based molecular markers

    Science.gov (United States)

    Kamphuis, Lars G; Hane, James K; Nelson, Matthew N; Gao, Lingling; Atkins, Craig A; Singh, Karam B

    2015-01-01

    Narrow-leafed lupin (NLL; Lupinus angustifolius L.) is an important grain legume crop that is valuable for sustainable farming and is becoming recognized as a human health food. NLL breeding is directed at improving grain production, disease resistance, drought tolerance and health benefits. However, genetic and genomic studies have been hindered by a lack of extensive genomic resources for the species. Here, the generation, de novo assembly and annotation of transcriptome datasets derived from five different NLL tissue types of the reference accession cv. Tanjil are described. The Tanjil transcriptome was compared to transcriptomes of an early domesticated cv. Unicrop, a wild accession P27255, as well as accession 83A:476, together being the founding parents of two recombinant inbred line (RIL) populations. In silico predictions for transcriptome-derived gene-based length and SNP polymorphic markers were conducted and corroborated using a survey assembly sequence for NLL cv. Tanjil. This yielded extensive indel and SNP polymorphic markers for the two RIL populations. A total of 335 transcriptome-derived markers and 66 BAC-end sequence-derived markers were evaluated, and 275 polymorphic markers were selected to genotype the reference NLL 83A:476 × P27255 RIL population. This significantly improved the completeness, marker density and quality of the reference NLL genetic map. PMID:25060816

  8. Formation of Staphylococcus aureus Biofilm in the Presence of Sublethal Concentrations of Disinfectants Studied via a Transcriptomic Analysis Using Transcriptome Sequencing (RNA-seq)

    Science.gov (United States)

    Oppelt, J.; Cincarova, L.

    2017-01-01

    ABSTRACT Staphylococcus aureus is a common biofilm-forming pathogen. Low doses of disinfectants have previously been reported to promote biofilm formation and to increase virulence. The aim of this study was to use transcriptome sequencing (RNA-seq) analysis to investigate global transcriptional changes in S. aureus in response to sublethal concentrations of the commonly used food industry disinfectants ethanol (EtOH) and chloramine T (ChT) and their combination (EtOH_ChT) in order to better understand the effects of these agents on biofilm formation. Treatment with EtOH and EtOH_ChT resulted in more significantly altered expression profiles than treatment with ChT. Our results revealed that EtOH and EtOH_ChT treatments enhanced the expression of genes responsible for regulation of gene expression (sigB), cell surface factors (clfAB), adhesins (sdrDE), and capsular polysaccharides (cap8EFGL), resulting in more intact biofilm. In addition, in this study we were able to identify the pathways involved in the adaptation of S. aureus to the stress of ChT treatment. Further, EtOH suppressed the effect of ChT on gene expression when these agents were used together at sublethal concentrations. These data show that in the presence of sublethal concentrations of tested disinfectants, S. aureus cells trigger protective mechanisms and try to cope with them. IMPORTANCE So far, the effect of disinfectants is not satisfactorily explained. The presented data will allow a better understanding of the mode of disinfectant action with regard to biofilm formation and the ability of bacteria to survive the treatment. Such an understanding could contribute to the effort to eliminate possible sources of bacteria, making disinfectant application as efficient as possible. Biofilm formation plays significant role in the spread and pathogenesis of bacterial species. PMID:29030437

  9. Design of a 9K illumina BeadChip for polar bears (Ursus maritimus) from RAD and transcriptome sequencing.

    Science.gov (United States)

    Malenfant, René M; Coltman, David W; Davis, Corey S

    2015-05-01

    Single-nucleotide polymorphisms (SNPs) offer numerous advantages over anonymous markers such as microsatellites, including improved estimation of population parameters, finer-scale resolution of population structure and more precise genomic dissection of quantitative traits. However, many SNPs are needed to equal the resolution of a single microsatellite, and reliable large-scale genotyping of SNPs remains a challenge in nonmodel species. Here, we document the creation of a 9K Illumina Infinium BeadChip for polar bears (Ursus maritimus), which will be used to investigate: (i) the fine-scale population structure among Canadian polar bears and (ii) the genomic architecture of phenotypic traits in the Western Hudson Bay subpopulation. To this end, we used restriction-site associated DNA (RAD) sequencing from 38 bears across their circumpolar range, as well as blood/fat transcriptome sequencing of 10 individuals from Western Hudson Bay. Six-thousand RAD SNPs and 3000 transcriptomic SNPs were selected for the chip, based primarily on genomic spacing and gene function respectively. Of the 9000 SNPs ordered from Illumina, 8042 were successfully printed, and - after genotyping 1450 polar bears - 5441 of these SNPs were found to be well clustered and polymorphic. Using this array, we show rapid linkage disequilibrium decay among polar bears, we demonstrate that in a subsample of 78 individuals, our SNPs detect known genetic structure more clearly than 24 microsatellites genotyped for the same individuals and that these results are not driven by the SNP ascertainment scheme. Here, we present one of the first large-scale genotyping resources designed for a threatened species. © 2014 John Wiley & Sons Ltd.

  10. De Novo Transcriptome Sequencing in Passiflora edulis Sims to Identify Genes and Signaling Pathways Involved in Cold Tolerance

    Directory of Open Access Journals (Sweden)

    Sian Liu

    2017-11-01

    Full Text Available The passion fruit (Passiflora edulis Sims, also known as the purple granadilla, is widely cultivated as the new darling of the fruit market throughout southern China. This exotic and perennial climber is adapted to warm and humid climates, and thus is generally intolerant of cold. There is limited information about gene regulation and signaling pathways related to the cold stress response in this species. In this study, two transcriptome libraries (KEDU_AP vs. GX_AP were constructed from the aerial parts of cold-tolerant and cold-susceptible varieties of P. edulis, respectively. Overall, 126,284,018 clean reads were obtained, and 86,880 unigenes with a mean size of 1449 bp were assembled. Of these, there were 64,067 (73.74% unigenes with significant similarity to publicly available plant protein sequences. Expression profiles were generated, and 3045 genes were found to be significantly differentially expressed between the KEDU_AP and GX_AP libraries, including 1075 (35.3% up-regulated and 1970 (64.7% down-regulated. These included 36 genes in enriched pathways of plant hormone signal transduction, and 56 genes encoding putative transcription factors. Six genes involved in the ICE1–CBF–COR pathway were induced in the cold-tolerant variety, and their expression levels were further verified using quantitative real-time PCR. This report is the first to identify genes and signaling pathways involved in cold tolerance using high-throughput transcriptome sequencing in P. edulis. These findings may provide useful insights into the molecular mechanisms regulating cold tolerance and genetic breeding in Passiflora spp.

  11. 454 Transcriptome sequencing suggests a role for two-component signalling in cellularization and differentiation of barley endosperm transfer cells.

    Science.gov (United States)

    Thiel, Johannes; Hollmann, Julien; Rutten, Twan; Weber, Hans; Scholz, Uwe; Weschke, Winfriede

    2012-01-01

    Cell specification and differentiation in the endosperm of cereals starts at the maternal-filial boundary and generates the endosperm transfer cells (ETCs). Besides the importance in assimilate transfer, ETCs are proposed to play an essential role in the regulation of endosperm differentiation by affecting development of proximate endosperm tissues. We attempted to identify signalling elements involved in early endosperm differentiation by using a combination of laser-assisted microdissection and 454 transcriptome sequencing. 454 sequencing of the differentiating ETC region from the syncytial state until functionality in transfer processes captured a high proportion of novel transcripts which are not available in existing barley EST databases. Intriguingly, the ETC-transcriptome showed a high abundance of elements of the two-component signalling (TCS) system suggesting an outstanding role in ETC differentiation. All components and subfamilies of the TCS, including distinct kinds of membrane-bound receptors, have been identified to be expressed in ETCs. The TCS system represents an ancient signal transduction system firstly discovered in bacteria and has previously been shown to be co-opted by eukaryotes, like fungi and plants, whereas in animals and humans this signalling route does not exist. Transcript profiling of TCS elements by qRT-PCR suggested pivotal roles for specific phosphorelays activated in a coordinated time flow during ETC cellularization and differentiation. ETC-specificity of transcriptionally activated TCS phosphorelays was assessed for early differentiation and cellularization contrasting to an extension of expression to other grain tissues at the beginning of ETC maturation. Features of candidate genes of distinct phosphorelays and transcriptional activation of genes putatively implicated in hormone signalling pathways hint at a crosstalk of hormonal influences, putatively ABA and ethylene, and TCS signalling. Our findings suggest an integral

  12. Deep developmental transcriptome sequencing uncovers numerous new genes and enhances gene annotation in the sponge Amphimedon queenslandica.

    Science.gov (United States)

    Fernandez-Valverde, Selene L; Calcino, Andrew D; Degnan, Bernard M

    2015-05-15

    The demosponge Amphimedon queenslandica is amongst the few early-branching metazoans with an assembled and annotated draft genome, making it an important species in the study of the origin and early evolution of animals. Current gene models in this species are largely based on in silico predictions and low coverage expressed sequence tag (EST) evidence. Amphimedon queenslandica protein-coding gene models are improved using deep RNA-Seq data from four developmental stages and CEL-Seq data from 82 developmental samples. Over 86% of previously predicted genes are retained in the new gene models, although 24% have additional exons; there is also a marked increase in the total number of annotated 3' and 5' untranslated regions (UTRs). Importantly, these new developmental transcriptome data reveal numerous previously unannotated protein-coding genes in the Amphimedon genome, increasing the total gene number by 25%, from 30,060 to 40,122. In general, Amphimedon genes have introns that are markedly smaller than those in other animals and most of the alternatively spliced genes in Amphimedon undergo intron-retention; exon-skipping is the least common mode of alternative splicing. Finally, in addition to canonical polyadenylation signal sequences, Amphimedon genes are enriched in a number of unique AT-rich motifs in their 3' UTRs. The inclusion of developmental transcriptome data has substantially improved the structure and composition of protein-coding gene models in Amphimedon queenslandica, providing a more accurate and comprehensive set of genes for functional and comparative studies. These improvements reveal the Amphimedon genome is comprised of a remarkably high number of tightly packed genes. These genes have small introns and there is pervasive intron retention amongst alternatively spliced transcripts. These aspects of the sponge genome are more similar unicellular opisthokont genomes than to other animal genomes.

  13. Identification and characterization of large DNA deletions affecting oil quality traits in soybean seeds through transcriptome sequencing analysis.

    Science.gov (United States)

    Goettel, Wolfgang; Ramirez, Martha; Upchurch, Robert G; An, Yong-Qiang Charles

    2016-08-01

    Identification and characterization of a 254-kb genomic deletion on a duplicated chromosome segment that resulted in a low level of palmitic acid in soybean seeds using transcriptome sequencing. A large number of soybean genotypes varying in seed oil composition and content have been identified. Understanding the molecular mechanisms underlying these variations is important for breeders to effectively utilize them as a genetic resource. Through design and application of a bioinformatics approach, we identified nine co-regulated gene clusters by comparing seed transcriptomes of nine soybean genotypes varying in oil composition and content. We demonstrated that four gene clusters in the genotypes M23, Jack and N0304-303-3 coincided with large-scale genome rearrangements. The co-regulated gene clusters in M23 and Jack mapped to a previously described 164-kb deletion and a copy number amplification of the Rhg1 locus, respectively. The coordinately down-regulated gene clusters in N0304-303-3 were caused by a 254-kb deletion containing 19 genes including a fatty acyl-ACP thioesterase B gene (FATB1a). This deletion was associated with reduced palmitic acid content in seeds and was the molecular cause of a previously reported nonfunctional FATB1a allele, fap nc . The M23 and N0304-304-3 deletions were located in duplicated genome segments retained from the Glycine-specific whole genome duplication that occurred 13 million years ago. The homoeologous genes in these duplicated regions shared a strong similarity in both their encoded protein sequences and transcript accumulation levels, suggesting that they may have conserved and important functions in seeds. The functional conservation of homoeologous genes may result in genetic redundancy and gene dosage effects for their associated seed traits, explaining why the large deletion did not cause lethal effects or completely eliminate palmitic acid in N0304-303-3.

  14. Generation and analysis of blueberry transcriptome sequences from leaves, developing fruit, and flower buds from cold acclimation through deacclimation

    Directory of Open Access Journals (Sweden)

    Rowland Lisa J

    2012-04-01

    Full Text Available Abstract Background There has been increased consumption of blueberries in recent years fueled in part because of their many recognized health benefits. Blueberry fruit is very high in anthocyanins, which have been linked to improved night vision, prevention of macular degeneration, anti-cancer activity, and reduced risk of heart disease. Very few genomic resources have been available for blueberry, however. Further development of genomic resources like expressed sequence tags (ESTs, molecular markers, and genetic linkage maps could lead to more rapid genetic improvement. Marker-assisted selection could be used to combine traits for climatic adaptation with fruit and nutritional quality traits. Results Efforts to sequence the transcriptome of the commercial highbush blueberry (Vaccinium corymbosum cultivar Bluecrop and use the sequences to identify genes associated with cold acclimation and fruit development and develop SSR markers for mapping studies are presented here. Transcriptome sequences were generated from blueberry fruit at different stages of development, flower buds at different stages of cold acclimation, and leaves by next-generation Roche 454 sequencing. Over 600,000 reads were assembled into approximately 15,000 contigs and 124,000 singletons. The assembled sequences were annotated and functionally mapped to Gene Ontology (GO terms. Frequency of the most abundant sequences in each of the libraries was compared across all libraries to identify genes that are potentially differentially expressed during cold acclimation and fruit development. Real-time PCR was performed to confirm their differential expression patterns. Overall, 14 out of 17 of the genes examined had differential expression patterns similar to what was predicted from their reads alone. The assembled sequences were also mined for SSRs. From these sequences, 15,886 blueberry EST-SSR loci were identified. Primers were designed from 7,705 of the SSR-containing sequences

  15. Generation and analysis of blueberry transcriptome sequences from leaves, developing fruit, and flower buds from cold acclimation through deacclimation

    Science.gov (United States)

    2012-01-01

    Background There has been increased consumption of blueberries in recent years fueled in part because of their many recognized health benefits. Blueberry fruit is very high in anthocyanins, which have been linked to improved night vision, prevention of macular degeneration, anti-cancer activity, and reduced risk of heart disease. Very few genomic resources have been available for blueberry, however. Further development of genomic resources like expressed sequence tags (ESTs), molecular markers, and genetic linkage maps could lead to more rapid genetic improvement. Marker-assisted selection could be used to combine traits for climatic adaptation with fruit and nutritional quality traits. Results Efforts to sequence the transcriptome of the commercial highbush blueberry (Vaccinium corymbosum) cultivar Bluecrop and use the sequences to identify genes associated with cold acclimation and fruit development and develop SSR markers for mapping studies are presented here. Transcriptome sequences were generated from blueberry fruit at different stages of development, flower buds at different stages of cold acclimation, and leaves by next-generation Roche 454 sequencing. Over 600,000 reads were assembled into approximately 15,000 contigs and 124,000 singletons. The assembled sequences were annotated and functionally mapped to Gene Ontology (GO) terms. Frequency of the most abundant sequences in each of the libraries was compared across all libraries to identify genes that are potentially differentially expressed during cold acclimation and fruit development. Real-time PCR was performed to confirm their differential expression patterns. Overall, 14 out of 17 of the genes examined had differential expression patterns similar to what was predicted from their reads alone. The assembled sequences were also mined for SSRs. From these sequences, 15,886 blueberry EST-SSR loci were identified. Primers were designed from 7,705 of the SSR-containing sequences with adequate flanking

  16. Generation and analysis of blueberry transcriptome sequences from leaves, developing fruit, and flower buds from cold acclimation through deacclimation.

    Science.gov (United States)

    Rowland, Lisa J; Alkharouf, Nadim; Darwish, Omar; Ogden, Elizabeth L; Polashock, James J; Bassil, Nahla V; Main, Dorrie

    2012-04-02

    There has been increased consumption of blueberries in recent years fueled in part because of their many recognized health benefits. Blueberry fruit is very high in anthocyanins, which have been linked to improved night vision, prevention of macular degeneration, anti-cancer activity, and reduced risk of heart disease. Very few genomic resources have been available for blueberry, however. Further development of genomic resources like expressed sequence tags (ESTs), molecular markers, and genetic linkage maps could lead to more rapid genetic improvement. Marker-assisted selection could be used to combine traits for climatic adaptation with fruit and nutritional quality traits. Efforts to sequence the transcriptome of the commercial highbush blueberry (Vaccinium corymbosum) cultivar Bluecrop and use the sequences to identify genes associated with cold acclimation and fruit development and develop SSR markers for mapping studies are presented here. Transcriptome sequences were generated from blueberry fruit at different stages of development, flower buds at different stages of cold acclimation, and leaves by next-generation Roche 454 sequencing. Over 600,000 reads were assembled into approximately 15,000 contigs and 124,000 singletons. The assembled sequences were annotated and functionally mapped to Gene Ontology (GO) terms. Frequency of the most abundant sequences in each of the libraries was compared across all libraries to identify genes that are potentially differentially expressed during cold acclimation and fruit development. Real-time PCR was performed to confirm their differential expression patterns. Overall, 14 out of 17 of the genes examined had differential expression patterns similar to what was predicted from their reads alone. The assembled sequences were also mined for SSRs. From these sequences, 15,886 blueberry EST-SSR loci were identified. Primers were designed from 7,705 of the SSR-containing sequences with adequate flanking sequence. One hundred

  17. Transcriptome sequencing of the blind subterranean mole rat, Spalax galili: Utility and potential for the discovery of novel evolutionary patterns

    KAUST Repository

    Malik, Assaf

    2011-08-12

    The blind subterranean mole rat (Spalax ehrenbergi superspecies) is a model animal for survival under extreme environments due to its ability to live in underground habitats under severe hypoxic stress and darkness. Here we report the transcriptome sequencing of Spalax galili, a chromosomal type of S. ehrenbergi. cDNA pools from muscle and brain tissues isolated from animals exposed to hypoxic and normoxic conditions were sequenced using Sanger, GS FLX, and GS FLX Titanium technologies. Assembly of the sequences yielded over 51,000 isotigs with homology to ~12,000 mouse, rat or human genes. Based on these results, it was possible to detect large numbers of splice variants, SNPs, and novel transcribed regions. In addition, multiple differential expression patterns were detected between tissues and treatments. The results presented here will serve as a valuable resource for future studies aimed at identifying genes and gene regions evolved during the adaptive radiation associated with underground life of the blind mole rat. 2011 Malik et al.

  18. Genome Sequence and Transcriptome Analyses of Chrysochromulina tobin: Metabolic Tools for Enhanced Algal Fitness in the Prominent Order Prymnesiales (Haptophyceae.

    Directory of Open Access Journals (Sweden)

    Blake T Hovde

    Full Text Available Haptophytes are recognized as seminal players in aquatic ecosystem function. These algae are important in global carbon sequestration, form destructive harmful blooms, and given their rich fatty acid content, serve as a highly nutritive food source to a broad range of eco-cohorts. Haptophyte dominance in both fresh and marine waters is supported by the mixotrophic nature of many taxa. Despite their importance the nuclear genome sequence of only one haptophyte, Emiliania huxleyi (Isochrysidales, is available. Here we report the draft genome sequence of Chrysochromulina tobin (Prymnesiales, and transcriptome data collected at seven time points over a 24-hour light/dark cycle. The nuclear genome of C. tobin is small (59 Mb, compact (∼ 40% of the genome is protein coding and encodes approximately 16,777 genes. Genes important to fatty acid synthesis, modification, and catabolism show distinct patterns of expression when monitored over the circadian photoperiod. The C. tobin genome harbors the first hybrid polyketide synthase/non-ribosomal peptide synthase gene complex reported for an algal species, and encodes potential anti-microbial peptides and proteins involved in multidrug and toxic compound extrusion. A new haptophyte xanthorhodopsin was also identified, together with two "red" RuBisCO activases that are shared across many algal lineages. The Chrysochromulina tobin genome sequence provides new information on the evolutionary history, ecology and economic importance of haptophytes.

  19. Transcriptome sequencing and metabolite analysis reveals the role of delphinidin metabolism in flower colour in grape hyacinth.

    Science.gov (United States)

    Lou, Qian; Liu, Yali; Qi, Yinyan; Jiao, Shuzhen; Tian, Feifei; Jiang, Ling; Wang, Yuejin

    2014-07-01

    Grape hyacinth (Muscari) is an important ornamental bulbous plant with an extraordinary blue colour. Muscari armeniacum, whose flowers can be naturally white, provides an opportunity to unravel the complex metabolic networks underlying certain biochemical traits, especially colour. A blue flower cDNA library of M. armeniacum and a white flower library of M. armeniacum f. album were used for transcriptome sequencing. A total of 89 926 uni-transcripts were isolated, 143 of which could be identified as putative homologues of colour-related genes in other species. Based on a comprehensive analysis relating colour compounds to gene expression profiles, the mechanism of colour biosynthesis was studied in M. armeniacum. Furthermore, a new hypothesis explaining the lack of colour phenotype of the grape hyacinth flower is proposed. Alteration of the substrate competition between flavonol synthase (FLS) and dihydroflavonol 4-reductase (DFR) may lead to elimination of blue pigmentation while the multishunt from the limited flux in the cyanidin (Cy) synthesis pathway seems to be the most likely reason for the colour change in the white flowers of M. armeniacum. Moreover, mass sequence data obtained by the deep sequencing of M. armeniacum and its white variant provided a platform for future function and molecular biological research on M. armeniacum. © The Author 2014. Published by Oxford University Press on behalf of the Society for Experimental Biology.

  20. De novo transcriptome sequencing and comparative analysis of differentially expressed genes in dryoperis fragrans under temperature stress

    International Nuclear Information System (INIS)

    Wang, W.Z.; Tong, W.S.; Gao, R.

    2016-01-01

    Dryopteris fragrans is a species of fern and contains flavonoids compounds with medicinal value. This study explain the temperature stress impact flavonoids synthesis in D. fragrans tissue culture seedlings under the low temperature at 4 degree C, high temperature at 35 degree C and moderate temperature at 25 degree C. By using Illumina HiSeq 2000 sequencing, 80.9 million raw sequence reads were de novo assembled into 66,716 non-redundant unigenes. 38,486 unigenes (57.7%) were annotated for their function. 13,973 unigenes and 29,598 unigenes were allocated to gene ontology (GO) and clusters of orthologous group (COG), respectively. 18,989 sequences mapped to 118 Kyoto Encyclopedia of Genes and Genomes Pathway database (KEGG), 204 genes were involved in flavonoid biosynthesis, regulation and transport. 25,292 and 16,817 unigenes exhibited marked differential expression in response to temperature shifts of 25 degree C to 4 degree C and 25 degree C to 35 degree C, respectively. 4CL and CHS genes involved in flavonoid biosynthesis were tested and suggested that they were responsible for biosynthesis of flavonoids. This study provides the first published data to describe the D. fragrans transcriptome and should accelerate understanding of flavonoids biosynthesis, regulation and transport mechanisms. Since most unigenes described here were successfully annotated, these results should facilitate future functional genomic understanding and research of D. fragrans. (author)

  1. Transcriptome sequencing and de novo assembly in arecanut, Areca catechu L elucidates the secondary metabolite pathway genes

    Directory of Open Access Journals (Sweden)

    Ramaswamy Manimekalai

    2018-03-01

    Full Text Available Areca catechu L. belongs to the Arecaceae family which comprises many economically important palms. The palm is a source of alkaloids and carotenoids. The lack of ample genetic information in public databases has been a constraint for the genetic improvement of arecanut. To gain molecular insight into the palm, high throughput RNA sequencing and de novo assembly of arecanut leaf transcriptome was undertaken in the present study. A total 56,321,907 paired end reads of 101 bp length consisting of 11.343 Gb nucleotides were generated. De novo assembly resulted in 48,783 good quality transcripts, of which 67% of transcripts could be annotated against NCBI non – redundant database. The Gene Ontology (GO analysis with UniProt database identified 9222 biological process, 11268 molecular function and 7574 cellular components GO terms. Large scale expression profiling through Fragments per Kilobase per Million mapped reads (FPKM showed major genes involved in different metabolic pathways of the plant. Metabolic pathway analysis of the assembled transcripts identified 124 plant related pathways. The transcripts related to carotenoid and alkaloid biosynthetic pathways had more number of reads and FPKM values suggesting higher expression of these genes. The arecanut transcript sequences generated in the study showed high similarity with coconut, oil palm and date palm sequences retrieved from public domains. We also identified 6853 genic SSR regions in the arecanut. The possible primers were designed for SSR detection and this would simplify the future efforts in genetic characterization of arecanut.

  2. All 37 Mitochondrial Genes of Aphid Aphis craccivora Obtained from Transcriptome Sequencing: Implications for the Evolution of Aphids.

    Directory of Open Access Journals (Sweden)

    Nan Song

    Full Text Available The availability of mitochondrial genome data for Aphididae, one of the economically important insect pest families, in public databases is limited. The advent of next generation sequencing technology provides the potential to generate mitochondrial genome data for many species timely and cost-effectively. In this report, we used transcriptome sequencing technology to determine all the 37 mitochondrial genes of the cowpea aphid, Aphis craccivora. This method avoids the necessity of finding suitable primers for long PCRs or primer-walking amplicons, and is proved to be effective in obtaining the whole set of mitochondrial gene data for insects with difficulty in sequencing mitochondrial genome by PCR-based strategies. Phylogenetic analyses of aphid mitochondrial genome data show clustering based on tribe level, and strongly support the monophyly of the family Aphididae. Within the monophyletic Aphidini, three samples from Aphis grouped together. In another major clade of Aphididae, Pterocomma pilosum was recovered as a potential sister-group of Cavariella salicicola, as part of Macrosiphini.

  3. Bifunctional cis-Abienol Synthase from Abies balsamea Discovered by Transcriptome Sequencing and Its Implications for Diterpenoid Fragrance Production*

    Science.gov (United States)

    Zerbe, Philipp; Chiang, Angela; Yuen, Macaire; Hamberger, Björn; Hamberger, Britta; Draper, Jason A.; Britton, Robert; Bohlmann, Jörg

    2012-01-01

    The labdanoid diterpene alcohol cis-abienol is a major component of the aromatic oleoresin of balsam fir (Abies balsamea) and serves as a valuable bioproduct material for the fragrance industry. Using high-throughput 454 transcriptome sequencing and metabolite profiling of balsam fir bark tissue, we identified candidate diterpene synthase sequences for full-length cDNA cloning and functional characterization. We discovered a bifunctional class I/II cis-abienol synthase (AbCAS), along with the paralogous levopimaradiene/abietadiene synthase and isopimaradiene synthase, all of which are members of the gymnosperm-specific TPS-d subfamily. The AbCAS-catalyzed formation of cis-abienol proceeds via cyclization and hydroxylation at carbon C-8 of a postulated carbocation intermediate in the class II active site, followed by cleavage of the diphosphate group and termination of the reaction sequence without further cyclization in the class I active site. This reaction mechanism is distinct from that of synthases of the isopimaradiene- or levopimaradiene/abietadiene synthase type, which employ deprotonation reactions in the class II active site and secondary cyclizations in the class I active site, leading to tricyclic diterpenes. Comparative homology modeling suggested the active site residues Asp-348, Leu-617, Phe-696, and Gly-723 as potentially important for the specificity of AbCAS. As a class I/II bifunctional enzyme, AbCAS is a promising target for metabolic engineering of cis-abienol production. PMID:22337889

  4. Transcriptome sequence analysis of an ornamental plant, Ananas comosus var. bracteatus, revealed the potential unigenes involved in terpenoid and phenylpropanoid biosynthesis.

    Science.gov (United States)

    Ma, Jun; Kanakala, S; He, Yehua; Zhang, Junli; Zhong, Xiaolan

    2015-01-01

    Ananas comosus var. bracteatus (Red Pineapple) is an important ornamental plant for its colorful leaves and decorative red fruits. Because of its complex genome, it is difficult to understand the molecular mechanisms involved in the growth and development. Thus high-throughput transcriptome sequencing of Ananas comosus var. bracteatus is necessary to generate large quantities of transcript sequences for the purpose of gene discovery and functional genomic studies. The Ananas comosus var. bracteatus transcriptome was sequenced by the Illumina paired-end sequencing technology. We obtained a total of 23.5 million high quality sequencing reads, 1,555,808 contigs and 41,052 unigenes. In total 41,052 unigenes of Ananas comosus var. bracteatus, 23,275 unigenes were annotated in the NCBI non-redundant protein database and 23,134 unigenes were annotated in the Swiss-Port database. Out of these, 17,748 and 8,505 unigenes were assigned to gene ontology categories and clusters of orthologous groups, respectively. Functional annotation against Kyoto Encyclopedia of Genes and Genomes Pathway database identified 5,825 unigenes which were mapped to 117 pathways. The assembly predicted many unigenes that were previously unknown. The annotated unigenes were compared against pineapple, rice, maize, Arabidopsis, and sorghum. Unigenes that did not match any of those five sequence datasets are considered to be Ananas comosus var. bracteatus unique. We predicted unigenes encoding enzymes involved in terpenoid and phenylpropanoid biosynthesis. The sequence data provide the most comprehensive transcriptomic resource currently available for Ananas comosus var. bracteatus. To our knowledge; this is the first report on the de novo transcriptome sequencing of the Ananas comosus var. bracteatus. Unigenes obtained in this study, may help improve future gene expression, genetic and genomics studies in Ananas comosus var. bracteatus.

  5. Transcriptome Sequence Analysis of an Ornamental Plant, Ananas comosus var. bracteatus, Revealed the Potential Unigenes Involved in Terpenoid and Phenylpropanoid Biosynthesis

    OpenAIRE

    Ma, Jun; Kanakala, S.; He, Yehua; Zhang, Junli; Zhong, Xiaolan

    2015-01-01

    Background Ananas comosus var. bracteatus (Red Pineapple) is an important ornamental plant for its colorful leaves and decorative red fruits. Because of its complex genome, it is difficult to understand the molecular mechanisms involved in the growth and development. Thus high-throughput transcriptome sequencing of Ananas comosus var. bracteatus is necessary to generate large quantities of transcript sequences for the purpose of gene discovery and functional genomic studies. Results The Anana...

  6. Transcriptome sequence analysis of an ornamental plant, Ananas comosus var. bracteatus, revealed the potential unigenes involved in terpenoid and phenylpropanoid biosynthesis.

    Directory of Open Access Journals (Sweden)

    Jun Ma

    Full Text Available Ananas comosus var. bracteatus (Red Pineapple is an important ornamental plant for its colorful leaves and decorative red fruits. Because of its complex genome, it is difficult to understand the molecular mechanisms involved in the growth and development. Thus high-throughput transcriptome sequencing of Ananas comosus var. bracteatus is necessary to generate large quantities of transcript sequences for the purpose of gene discovery and functional genomic studies.The Ananas comosus var. bracteatus transcriptome was sequenced by the Illumina paired-end sequencing technology. We obtained a total of 23.5 million high quality sequencing reads, 1,555,808 contigs and 41,052 unigenes. In total 41,052 unigenes of Ananas comosus var. bracteatus, 23,275 unigenes were annotated in the NCBI non-redundant protein database and 23,134 unigenes were annotated in the Swiss-Port database. Out of these, 17,748 and 8,505 unigenes were assigned to gene ontology categories and clusters of orthologous groups, respectively. Functional annotation against Kyoto Encyclopedia of Genes and Genomes Pathway database identified 5,825 unigenes which were mapped to 117 pathways. The assembly predicted many unigenes that were previously unknown. The annotated unigenes were compared against pineapple, rice, maize, Arabidopsis, and sorghum. Unigenes that did not match any of those five sequence datasets are considered to be Ananas comosus var. bracteatus unique. We predicted unigenes encoding enzymes involved in terpenoid and phenylpropanoid biosynthesis.The sequence data provide the most comprehensive transcriptomic resource currently available for Ananas comosus var. bracteatus. To our knowledge; this is the first report on the de novo transcriptome sequencing of the Ananas comosus var. bracteatus. Unigenes obtained in this study, may help improve future gene expression, genetic and genomics studies in Ananas comosus var. bracteatus.

  7. Transcriptome sequencing of Mycosphaerella fijiensis during association with Musa acuminata reveals candidate pathogenicity genes.

    Science.gov (United States)

    Noar, Roslyn D; Daub, Margaret E

    2016-08-30

    Mycosphaerella fijiensis, causative agent of the black Sigatoka disease of banana, is considered the most economically damaging banana disease. Despite its importance, the genetics of pathogenicity are poorly understood. Previous studies have characterized polyketide pathways with possible roles in pathogenicity. To identify additional candidate pathogenicity genes, we compared the transcriptome of this fungus during the necrotrophic phase of infection with that during saprophytic growth in medium. Transcriptome analysis was conducted, and the functions of differentially expressed genes were predicted by identifying conserved domains, Gene Ontology (GO) annotation and GO enrichment analysis, Carbohydrate-Active EnZymes (CAZy) annotation, and identification of genes encoding effector-like proteins. The analysis showed that genes commonly involved in secondary metabolism have higher expression in infected leaf tissue, including genes encoding cytochrome P450s, short-chain dehydrogenases, and oxidoreductases in the 2-oxoglutarate and Fe(II)-dependent oxygenase superfamily. Other pathogenicity-related genes with higher expression in infected leaf tissue include genes encoding salicylate hydroxylase-like proteins, hydrophobic surface binding proteins, CFEM domain-containing proteins, and genes encoding secreted cysteine-rich proteins characteristic of effectors. More genes encoding amino acid transporters, oligopeptide transporters, peptidases, proteases, proteinases, sugar transporters, and proteins containing Domain of Unknown Function (DUF) 3328 had higher expression in infected leaf tissue, while more genes encoding inhibitors of peptidases and proteinases had higher expression in medium. Sixteen gene clusters with higher expression in leaf tissue were identified including clusters for the synthesis of a non-ribosomal peptide. A cluster encoding a novel fusicoccane was also identified. Two putative dispensable scaffolds were identified with a large proportion of

  8. Integrated massively parallel sequencing of 15 autosomal STRs and Amelogenin using a simplified library preparation approach.

    Science.gov (United States)

    Xue, Jian; Wu, Riga; Pan, Yajiao; Wang, Shunxia; Qu, Baowang; Qin, Ying; Shi, Yuequn; Zhang, Chuchu; Li, Ran; Zhang, Liyan; Zhou, Cheng; Sun, Hongyu

    2018-04-02

    Massively parallel sequencing (MPS) technologies, also termed as next-generation sequencing (NGS), are becoming increasingly popular in study of short tandem repeats (STR). However, current library preparation methods are usually based on ligation or two-round PCR that requires more steps, making it time-consuming (about 2 days), laborious and expensive. In this study, a 16-plex STR typing system was designed with fusion primer strategy based on the Ion Torrent S5 XL platform which could effectively resolve the above challenges for forensic DNA database-type samples (bloodstains, saliva stains, etc.). The efficiency of this system was tested in 253 Han Chinese participants. The libraries were prepared without DNA isolation and adapter ligation, and the whole process only required approximately 5 h. The proportion of thoroughly genotyped samples in which all the 16 loci were successfully genotyped was 86% (220/256). Of the samples, 99.7% showed 100% concordance between NGS-based STR typing and capillary electrophoresis (CE)-based STR typing. The inconsistency might have been caused by off-ladder alleles and mutations in primer binding sites. Overall, this panel enabled the large-scale genotyping of the DNA samples with controlled quality and quantity because it is a simple, operation-friendly process flow that saves labor, time and costs. © 2018 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  9. Computational sequence analysis of predicted long dsRNA transcriptomes of major crops reveals sequence complementarity with human genes.

    Science.gov (United States)

    Jensen, Peter D; Zhang, Yuanji; Wiggins, B Elizabeth; Petrick, Jay S; Zhu, Jin; Kerstetter, Randall A; Heck, Gregory R; Ivashuta, Sergey I

    2013-01-01

    Long double-stranded RNAs (long dsRNAs) are precursors for the effector molecules of sequence-specific RNA-based gene silencing in eukaryotes. Plant cells can contain numerous endogenous long dsRNAs. This study demonstrates that such endogenous long dsRNAs in plants have sequence complementarity to human genes. Many of these complementary long dsRNAs have perfect sequence complementarity of at least 21 nucleotides to human genes; enough complementarity to potentially trigger gene silencing in targeted human cells if delivered in functional form. However, the number and diversity of long dsRNA molecules in plant tissue from crops such as lettuce, tomato, corn, soy and rice with complementarity to human genes that have a long history of safe consumption supports a conclusion that long dsRNAs do not present a significant dietary risk.

  10. A massively parallel sequencing approach uncovers ancient origins and high genetic variability of endangered Przewalski's horses.

    Science.gov (United States)

    Goto, Hiroki; Ryder, Oliver A; Fisher, Allison R; Schultz, Bryant; Kosakovsky Pond, Sergei L; Nekrutenko, Anton; Makova, Kateryna D

    2011-01-01

    The endangered Przewalski's horse is the closest relative of the domestic horse and is the only true wild horse species surviving today. The question of whether Przewalski's horse is the direct progenitor of domestic horse has been hotly debated. Studies of DNA diversity within Przewalski's horses have been sparse but are urgently needed to ensure their successful reintroduction to the wild. In an attempt to resolve the controversy surrounding the phylogenetic position and genetic diversity of Przewalski's horses, we used massively parallel sequencing technology to decipher the complete mitochondrial and partial nuclear genomes for all four surviving maternal lineages of Przewalski's horses. Unlike single-nucleotide polymorphism (SNP) typing usually affected by ascertainment bias, the present method is expected to be largely unbiased. Three mitochondrial haplotypes were discovered-two similar ones, haplotypes I/II, and one substantially divergent from the other two, haplotype III. Haplotypes I/II versus III did not cluster together on a phylogenetic tree, rejecting the monophyly of Przewalski's horse maternal lineages, and were estimated to split 0.117-0.186 Ma, significantly preceding horse domestication. In the phylogeny based on autosomal sequences, Przewalski's horses formed a monophyletic clade, separate from the Thoroughbred domestic horse lineage. Our results suggest that Przewalski's horses have ancient origins and are not the direct progenitors of domestic horses. The analysis of the vast amount of sequence data presented here suggests that Przewalski's and domestic horse lineages diverged at least 0.117 Ma but since then have retained ancestral genetic polymorphism and/or experienced gene flow.

  11. RNA sequencing of the human milk fat layer transcriptome reveals distinct gene expression profiles at three stages of lactation.

    Directory of Open Access Journals (Sweden)

    Danielle G Lemay

    Full Text Available Aware of the important benefits of human milk, most U.S. women initiate breastfeeding but difficulties with milk supply lead some to quit earlier than intended. Yet, the contribution of maternal physiology to lactation difficulties remains poorly understood. Human milk fat globules, by enveloping cell contents during their secretion into milk, are a rich source of mammary cell RNA. Here, we pair this non-invasive mRNA source with RNA-sequencing to probe the milk fat layer transcriptome during three stages of lactation: colostral, transitional, and mature milk production. The resulting transcriptomes paint an exquisite portrait of human lactation. The resulting transcriptional profiles cluster not by postpartum day, but by milk Na:K ratio, indicating that women sampled during similar postpartum time frames could be at markedly different stages of gene expression. Each stage of lactation is characterized by a dynamic range (10(5-fold in transcript abundances not previously observed with microarray technology. We discovered that transcripts for isoferritins and cathepsins are strikingly abundant during colostrum production, highlighting the potential importance of these proteins for neonatal health. Two transcripts, encoding β-casein (CSN2 and α-lactalbumin (LALBA, make up 45% of the total pool of mRNA in mature lactation. Genes significantly expressed across all stages of lactation are associated with making, modifying, transporting, and packaging milk proteins. Stage-specific transcripts are associated with immune defense during the colostral stage, up-regulation of the machinery needed for milk protein synthesis during the transitional stage, and the production of lipids during mature lactation. We observed strong modulation of key genes involved in lactose synthesis and insulin signaling. In particular, protein tyrosine phosphatase, receptor type, F (PTPRF may serve as a biomarker linking insulin resistance with insufficient milk supply. This

  12. De novo transcriptome sequencing and comparative analysis to discover genes involved in ovarian maturity in Strongylocentrotus nudus.

    Science.gov (United States)

    Jia, Zhiying; Wang, Qiai; Wu, Kaikai; Wei, Zhenlin; Zhou, Zunchun; Liu, Xiaolin

    2017-09-01

    Strongylocentrotus nudus is an edible sea urchin, mainly harvested in China. Correlation studies indicated that S. nudus with larger diameter have a prolonged marketing time and better palatability owing to their precocious gonads and extended maturation process. However, the molecular mechanism underlying this phenomenon is still unknown. Here, transcriptome sequencing was applied to study the ovaries of adult S. nudus with different shell diameters to explore the possible mechanism. In this study, four independent cDNA libraries were constructed, including two from the big size urchins and two from the small ones using a HiSeq™2500 platform. A total of 88,581 unigenes were acquired with a mean length of 1354bp, of which 66,331 (74.88%) unigenes could be annotated using six major publicly available databases. Comparative analysis revealed that 353 unigenes were differentially expressed (with log2(ratio)≥1, FDR≤0.001) between the two groups. Of these, 20 differentially expressed genes (DEGs) were selected to confirm the accuracy of RNA-seq data by quantitative real-time RT-PCR. Furthermore, gene ontology and KEGG pathway enrichment analyses were performed to find the putative genes and pathways related to ovarian maturity. Eight unigenes were identified as significant DEGs involved in reproduction related pathways; these included Mos, Cdc20, Rec8, YP30, cytochrome P450 2U1, ovoperoxidase, proteoliaisin, and rendezvin. Our research fills the gap in the studies on the S. nudus ovaries using transcriptome analysis. Copyright © 2017 Elsevier Inc. All rights reserved.

  13. Development of SSR Markers Based on Transcriptome Sequencing and Association Analysis with Drought Tolerance in Perennial Grass Miscanthus from China

    Directory of Open Access Journals (Sweden)

    Gang Nie

    2017-05-01

    Full Text Available Drought has become a critical environmental stress affecting on plant in temperate area. As one of the promising bio-energy crops to sustainable biomass production, the genus Miscanthus has been widely studied around the world. However, the most widely used hybrid cultivar among this genus, Miscanthus × giganteus is proved poor drought tolerance compared to some parental species. Here we mainly focused on Miscanthus sinensis, which is one of the progenitors of M. × giganteus providing a comparable yield and well abiotic stress tolerance in some places. The main objectives were to characterize the physiological and photosynthetic respond to drought stress and to develop simple sequence repeats (SSRs markers associated with drought tolerance by transcriptome sequencing within an originally collection of 44 Miscanthus genotypes from southwest China. Significant phenotypic differences were observed among genotypes, and the average of leaf relative water content (RWC were severely affected by drought stress decreasing from 88.27 to 43.21%, which could well contribute to separating the drought resistant and drought sensitive genotype of Miscanthus. Furthermore, a total of 16,566 gene-associated SSRs markers were identified based on Illumina RNA sequencing under drought conditions, and 93 of them were randomly selected to validate. In total, 70 (75.3% SSRs were successfully amplified and the generated loci from 30 polymorphic SSRs were used to estimate the genetic differentiation and population structure. Finally, two optimum subgroups of the population were determined by structure analysis and based on association analysis, seven significant associations were identified including two markers with leaf RWC and five markers with photosynthetic traits. With the rich sequencing resources annotation, such associations would serve an efficient tool for Miscanthus drought response mechanism study and facilitate genetic improvement of drought resistant for

  14. Integrated mRNA and microRNA transcriptome sequencing characterizes sequence variants and mRNA–microRNA regulatory network in nasopharyngeal carcinoma model systems

    Directory of Open Access Journals (Sweden)

    Carol Ying-Ying Szeto

    2014-01-01

    Full Text Available Nasopharyngeal carcinoma (NPC is a prevalent malignancy in Southeast Asia among the Chinese population. Aberrant regulation of transcripts has been implicated in many types of cancers including NPC. Herein, we characterized mRNA and miRNA transcriptomes by RNA sequencing (RNASeq of NPC model systems. Matched total mRNA and small RNA of undifferentiated Epstein–Barr virus (EBV-positive NPC xenograft X666 and its derived cell line C666, well-differentiated NPC cell line HK1, and the immortalized nasopharyngeal epithelial cell line NP460 were sequenced by Solexa technology. We found 2812 genes and 149 miRNAs (human and EBV to be differentially expressed in NP460, HK1, C666 and X666 with RNASeq; 533 miRNA–mRNA target pairs were inversely regulated in the three NPC cell lines compared to NP460. Integrated mRNA/miRNA expression profiling and pathway analysis show extracellular matrix organization, Beta-1 integrin cell surface interactions, and the PI3K/AKT, EGFR, ErbB, and Wnt pathways were potentially deregulated in NPC. Real-time quantitative PCR was performed on selected mRNA/miRNAs in order to validate their expression. Transcript sequence variants such as short insertions and deletions (INDEL, single nucleotide variant (SNV, and isomiRs were characterized in the NPC model systems. A novel TP53 transcript variant was identified in NP460, HK1, and C666. Detection of three previously reported novel EBV-encoded BART miRNAs and their isomiRs were also observed. Meta-analysis of a model system to a clinical system aids the choice of different cell lines in NPC studies. This comprehensive characterization of mRNA and miRNA transcriptomes in NPC cell lines and the xenograft provides insights on miRNA regulation of mRNA and valuable resources on transcript variation and regulation in NPC, which are potentially useful for mechanistic and preclinical studies.

  15. A quantitative assessment of the Hadoop framework for analyzing massively parallel DNA sequencing data.

    Science.gov (United States)

    Siretskiy, Alexey; Sundqvist, Tore; Voznesenskiy, Mikhail; Spjuth, Ola

    2015-01-01

    New high-throughput technologies, such as massively parallel sequencing, have transformed the life sciences into a data-intensive field. The most common e-infrastructure for analyzing this data consists of batch systems that are based on high-performance computing resources; however, the bioinformatics software that is built on this platform does not scale well in the general case. Recently, the Hadoop platform has emerged as an interesting option to address the challenges of increasingly large datasets with distributed storage, distributed processing, built-in data locality, fault tolerance, and an appealing programming methodology. In this work we introduce metrics and report on a quantitative comparison between Hadoop and a single node of conventional high-performance computing resources for the tasks of short read mapping and variant calling. We calculate efficiency as a function of data size and observe that the Hadoop platform is more efficient for biologically relevant data sizes in terms of computing hours for both split and un-split data files. We also quantify the advantages of the data locality provided by Hadoop for NGS problems, and show that a classical architecture with network-attached storage will not scale when computing resources increase in numbers. Measurements were performed using ten datasets of different sizes, up to 100 gigabases, using the pipeline implemented in Crossbow. To make a fair comparison, we implemented an improved preprocessor for Hadoop with better performance for splittable data files. For improved usability, we implemented a graphical user interface for Crossbow in a private cloud environment using the CloudGene platform. All of the code and data in this study are freely available as open source in public repositories. From our experiments we can conclude that the improved Hadoop pipeline scales better than the same pipeline on high-performance computing resources, we also conclude that Hadoop is an economically viable

  16. Discovery of J chain in African lungfish (Protopterus dolloi, Sarcopterygii using high throughput transcriptome sequencing: implications in mucosal immunity.

    Directory of Open Access Journals (Sweden)

    Luca Tacchi

    Full Text Available J chain is a small polypeptide responsible for immunoglobulin (Ig polymerization and transport of Igs across mucosal surfaces in higher vertebrates. We identified a J chain in dipnoid fish, the African lungfish (Protopterus dolloi by high throughput sequencing of the transcriptome. P. dolloi J chain is 161 aa long and contains six of the eight Cys residues present in mammalian J chain. Phylogenetic studies place the lungfish J chain closer to tetrapod J chain than to the coelacanth or nurse shark sequences. J chain expression occurs in all P. dolloi immune tissues examined and it increases in the gut and kidney in response to an experimental bacterial infection. Double fluorescent in-situ hybridization shows that 88.5% of IgM⁺ cells in the gut co-express J chain, a significantly higher percentage than in the pre-pyloric spleen. Importantly, J chain expression is not restricted to the B-cell compartment since gut epithelial cells also express J chain. These results improve our current view of J chain from a phylogenetic perspective.

  17. Molecular adaptation in the world's deepest-living animal: Insights from transcriptome sequencing of the hadal amphipod Hirondellea gigas.

    Science.gov (United States)

    Lan, Yi; Sun, Jin; Tian, Renmao; Bartlett, Douglas H; Li, Runsheng; Wong, Yue Him; Zhang, Weipeng; Qiu, Jian-Wen; Xu, Ting; He, Li-Sheng; Tabata, Harry G; Qian, Pei-Yuan

    2017-07-01

    The Challenger Deep in the Mariana Trench is the deepest point in the oceans of our planet. Understanding how animals adapt to this harsh environment characterized by high hydrostatic pressure, food limitation, dark and cold is of great scientific interest. Of the animals dwelling in the Challenger Deep, amphipods have been captured using baited traps. In this study, we sequenced the transcriptome of the amphipod Hirondellea gigas collected at a depth of 10,929 m from the East Pond of the Challenger Deep. Assembly of these sequences resulted in 133,041 contigs and 22,046 translated proteins. Functional annotation of these contigs was made using the go and kegg databases. Comparison of these translated proteins with those of four shallow-water amphipods revealed 10,731 gene families, of which 5659 were single-copy orthologs. Base substitution analysis on these single-copy orthologs showed that 62 genes are positively selected in H. gigas, including genes related to β-alanine biosynthesis, energy metabolism and genetic information processing. For multiple-copy orthologous genes, gene family expansion analysis revealed that cold-inducible proteins (i.e., transcription factors II A and transcription elongation factor 1) as well as zinc finger domains are expanded in H. gigas. Overall, our results indicate that genetic adaptation to the hadal environment by H. gigas may be mediated by both gene family expansion and amino acid substitutions of specific proteins. © 2017 John Wiley & Sons Ltd.

  18. Transcriptome Sequencing and Development of Genic SSR Markers of an Endangered Chinese Endemic Genus Dipteronia Oliver (Aceraceae).

    Science.gov (United States)

    Zhou, Tao; Li, Zhong-Hu; Bai, Guo-Qing; Feng, Li; Chen, Chen; Wei, Yue; Chang, Yong-Xia; Zhao, Gui-Fang

    2016-02-23

    Dipteronia Oliver (Aceraceae) is an endangered Chinese endemic genus consisting of two living species, Dipteronia sinensis and Dipteronia dyeriana. However, studies on the population genetics and evolutionary analyses of Dipteronia have been hindered by limited genomic resources and genetic markers. Here, the generation, de novo assembly and annotation of transcriptome datasets, and a large set of microsatellite or simple sequence repeat (SSR) markers derived from Dipteronia have been described. After Illumina pair-end sequencing, approximately 93.2 million reads were generated and assembled to yield a total of 99,358 unigenes. A majority of these unigenes (53%, 52,789) had at least one blast hit against the public protein databases. Further, 12,377 SSR loci were detected and 4179 primer pairs were designed for experimental validation. Of these 4179 primer pairs, 435 primer pairs were randomly selected to test polymorphism. Our results show that products from 132 primer pairs were polymorphic, in which 97 polymorphic SSR markers were further selected to analyze the genetic diversity of 10 natural populations of Dipteronia. The identification of SSR markers during our research will provide the much valuable data for population genetic analyses and evolutionary studies in Dipteronia.

  19. Transcriptome analysis of carnation (Dianthus caryophyllus L.) based on next-generation sequencing technology.

    Science.gov (United States)

    Tanase, Koji; Nishitani, Chikako; Hirakawa, Hideki; Isobe, Sachiko; Tabata, Satoshi; Ohmiya, Akemi; Onozaki, Takashi

    2012-07-02

    Carnation (Dianthus caryophyllus L.), in the family Caryophyllaceae, can be found in a wide range of colors and is a model system for studies of flower senescence. In addition, it is one of the most important flowers in the global floriculture industry. However, few genomics resources, such as sequences and markers are available for carnation or other members of the Caryophyllaceae. To increase our understanding of the genetic control of important characters in carnation, we generated an expressed sequence tag (EST) database for a carnation cultivar important in horticulture by high-throughput sequencing using 454 pyrosequencing technology. We constructed a normalized cDNA library and a 3'-UTR library of carnation, obtaining a total of 1,162,126 high-quality reads. These reads were assembled into 300,740 unigenes consisting of 37,844 contigs and 262,896 singlets. The contigs were searched against an Arabidopsis sequence database, and 61.8% (23,380) of them had at least one BLASTX hit. These contigs were also annotated with Gene Ontology (GO) and were found to cover a broad range of GO categories. Furthermore, we identified 17,362 potential simple sequence repeats (SSRs) in 14,291 of the unigenes. We focused on gene discovery in the areas of flower color and ethylene biosynthesis. Transcripts were identified for almost every gene involved in flower chlorophyll and carotenoid metabolism and in anthocyanin biosynthesis. Transcripts were also identified for every step in the ethylene biosynthesis pathway. We present the first large-scale sequence data set for carnation, generated using next-generation sequencing technology. The large EST database generated from these sequences is an informative resource for identifying genes involved in various biological processes in carnation and provides an EST resource for understanding the genetic diversity of this plant.

  20. Transcriptome analysis of carnation (Dianthus caryophyllus L. based on next-generation sequencing technology

    Directory of Open Access Journals (Sweden)

    Tanase Koji

    2012-07-01

    Full Text Available Abstract Background Carnation (Dianthus caryophyllus L., in the family Caryophyllaceae, can be found in a wide range of colors and is a model system for studies of flower senescence. In addition, it is one of the most important flowers in the global floriculture industry. However, few genomics resources, such as sequences and markers are available for carnation or other members of the Caryophyllaceae. To increase our understanding of the genetic control of important characters in carnation, we generated an expressed sequence tag (EST database for a carnation cultivar important in horticulture by high-throughput sequencing using 454 pyrosequencing technology. Results We constructed a normalized cDNA library and a 3’-UTR library of carnation, obtaining a total of 1,162,126 high-quality reads. These reads were assembled into 300,740 unigenes consisting of 37,844 contigs and 262,896 singlets. The contigs were searched against an Arabidopsis sequence database, and 61.8% (23,380 of them had at least one BLASTX hit. These contigs were also annotated with Gene Ontology (GO and were found to cover a broad range of GO categories. Furthermore, we identified 17,362 potential simple sequence repeats (SSRs in 14,291 of the unigenes. We focused on gene discovery in the areas of flower color and ethylene biosynthesis. Transcripts were identified for almost every gene involved in flower chlorophyll and carotenoid metabolism and in anthocyanin biosynthesis. Transcripts were also identified for every step in the ethylene biosynthesis pathway. Conclusions We present the first large-scale sequence data set for carnation, generated using next-generation sequencing technology. The large EST database generated from these sequences is an informative resource for identifying genes involved in various biological processes in carnation and provides an EST resource for understanding the genetic diversity of this plant.

  1. Salmon louse (Lepeophtheirus salmonis transcriptomes during post molting maturation and egg production, revealed using EST-sequencing and microarray analysis

    Directory of Open Access Journals (Sweden)

    Jonassen Inge

    2008-03-01

    Full Text Available Abstract Background Lepeophtheirus salmonis is an ectoparasitic copepod feeding on skin, mucus and blood from salmonid hosts. Initial analysis of EST sequences from pre adult and adult stages of L. salmonis revealed a large proportion of novel transcripts. In order to link unknown transcripts to biological functions we have combined EST sequencing and microarray analysis to characterize female salmon louse transcriptomes during post molting maturation and egg production. Results EST sequence analysis shows that 43% of the ESTs have no significant hits in GenBank. Sequenced ESTs assembled into 556 contigs and 1614 singletons and whenever homologous genes were identified no clear correlation with homologous genes from any specific animal group was evident. Sequence comparison of 27 L. salmonis proteins with homologous proteins in humans, zebrafish, insects and crustaceans revealed an almost identical sequence identity with all species. Microarray analysis of maturing female adult salmon lice revealed two major transcription patterns; up-regulation during the final molting followed by down regulation and female specific up regulation during post molting growth and egg production. For a third minor group of ESTs transcription decreased during molting from pre-adult II to immature adults. Genes regulated during molting typically gave hits with cuticula proteins whilst transcripts up regulated during post molting growth were female specific, including two vitellogenins. Conclusion The copepod L.salmonis contains high a level of novel genes. Among analyzed L.salmonis proteins, sequence identities with homologous proteins in crustaceans are no higher than to homologous proteins in humans. Three distinct processes, molting, post molting growth and egg production correlate with transcriptional regulation of three groups of transcripts; two including genes related to growth, one including genes related to egg production. The function of the regulated

  2. Next-generation transcriptome assembly

    Energy Technology Data Exchange (ETDEWEB)

    Martin, Jeffrey A.; Wang, Zhong

    2011-09-01

    Transcriptomics studies often rely on partial reference transcriptomes that fail to capture the full catalog of transcripts and their variations. Recent advances in sequencing technologies and assembly algorithms have facilitated the reconstruction of the entire transcriptome by deep RNA sequencing (RNA-seq), even without a reference genome. However, transcriptome assembly from billions of RNA-seq reads, which are often very short, poses a significant informatics challenge. This Review summarizes the recent developments in transcriptome assembly approaches - reference-based, de novo and combined strategies-along with some perspectives on transcriptome assembly in the near future.

  3. Transcriptome sequencing for identification of diapause-associated genes in fall webworm, Hyphantria cunea Drury.

    Science.gov (United States)

    Deng, Yu; Li, Fei; Rieske, Lynne K; Sun, Li-Li; Sun, Shou-Hui

    2018-08-20

    Fall webworm, Hyphantria cunea Drury (Lepidoptera: Arctiidae) is extremely adaptable and highly invasive in China as a defoliator of ornamental and forest trees. Both voltinism and diapause strategies of fall webworm in China are variable, and this variability contributes to it invasiveness. Little is known about molecular regulation of diapause in fall webworm. To gain insight into possible mechanisms of diapause induction, high-throughput RNA-seq data were generated from non-diapause pupae (NDP) and diapause pupae (DP). A total of 58,151 unigenes were assembled and researched against nine public databases. In total, 29,013 up-regulated and 3451 down-regulated unigenes were differentially expressed by DP when compared with those of NDP. Genes encoding proteins such as UDP-glycosyl transferase (UGT), cytochrome P450 and Hsp70 were predicted to be involved in diapause. Moreover, GO function and KEGG pathway enrichments were performed on all differentially expressed genes (DEGs) and showed that cell cycle and insulin signaling pathways may be related to the diapause of the fall webworm. This study provides valuable information about the fall webworm transcriptome for future gene function research, especially as it relates to diapause. Copyright © 2018 Elsevier B.V. All rights reserved.

  4. Genome sequence and transcriptome analyses of the thermophilic zygomycete fungus Rhizomucor miehei.

    Science.gov (United States)

    Zhou, Peng; Zhang, Guoqiang; Chen, Shangwu; Jiang, Zhengqiang; Tang, Yanbin; Henrissat, Bernard; Yan, Qiaojuan; Yang, Shaoqing; Chen, Chin-Fu; Zhang, Bing; Du, Zhenglin

    2014-04-21

    The zygomycete fungi like Rhizomucor miehei have been extensively exploited for the production of various enzymes. As a thermophilic fungus, R. miehei is capable of growing at temperatures that approach the upper limits for all eukaryotes. To date, over hundreds of fungal genomes are publicly available. However, Zygomycetes have been rarely investigated both genetically and genomically. Here, we report the genome of R. miehei CAU432 to explore the thermostable enzymatic repertoire of this fungus. The assembled genome size is 27.6-million-base (Mb) with 10,345 predicted protein-coding genes. Even being thermophilic, the G + C contents of fungal whole genome (43.8%) and coding genes (47.4%) are less than 50%. Phylogenetically, R. miehei is more closerly related to Phycomyces blakesleeanus than to Mucor circinelloides and Rhizopus oryzae. The genome of R. miehei harbors a large number of genes encoding secreted proteases, which is consistent with the characteristics of R. miehei being a rich producer of proteases. The transcriptome profile of R. miehei showed that the genes responsible for degrading starch, glucan, protein and lipid were highly expressed. The genome information of R. miehei will facilitate future studies to better understand the mechanisms of fungal thermophilic adaptation and the exploring of the potential of R. miehei in industrial-scale production of thermostable enzymes. Based on the existence of a large repertoire of amylolytic, proteolytic and lipolytic genes in the genome, R. miehei has potential in the production of a variety of such enzymes.

  5. Novel Kunitz-like Peptides Discovered in the Zoanthid Palythoa caribaeorum through Transcriptome Sequencing.

    Science.gov (United States)

    Liao, Qiwen; Li, Shengnan; Siu, Shirley Weng In; Yang, Binrui; Huang, Chen; Chan, Judy Yuet-Wa; Morlighem, Jean-Étienne R L; Wong, Clarence Tsun Ting; Rádis-Baptista, Gandhi; Lee, Simon Ming-Yuen

    2018-02-02

    Palythoa caribaeorum (class Anthozoa) is a zoanthid that together jellyfishes, hydra, and sea anemones, which are venomous and predatory, belongs to the Phyllum Cnidaria. The distinguished feature in these marine animals is the cnidocytes in the body tissues, responsible for toxin production and injection that are used majorly for prey capture and defense. With exception for other anthozoans, the toxin cocktails of zoanthids have been scarcely studied and are poorly known. Here, on the basis of the analysis of P. caribaeorum transcriptome, numerous predicted venom-featured polypeptides were identified including allergens, neurotoxins, membrane-active, and Kunitz-like peptides (PcKuz). The three predicted PcKuz isotoxins (1-3) were selected for functional studies. Through computational processing comprising structural phylogenetic analysis, molecular docking, and dynamics simulation, PcKuz3 was shown to be a potential voltage gated potassium-channel inhibitor. PcKuz3 fitted well as new functional Kunitz-type toxins with strong antilocomotor activity as in vivo assessed in zebrafish larvae, with weak inhibitory effect toward proteases, as evaluated in vitro. Notably, PcKuz3 can suppress, at low concentration, the 6-OHDA-induced neurotoxicity on the locomotive behavior of zebrafish, which indicated PcKuz3 may have a neuroprotective effect. Taken together, PcKuz3 figures as a novel neurotoxin structure, which differs from known homologous peptides expressed in sea anemone. Moreover, the novel PcKuz3 provides an insightful hint for biodrug development for prospective neurodegenerative disease treatment.

  6. Comprehensive microRNA profiling in B-cells of human centenarians by massively parallel sequencing

    Directory of Open Access Journals (Sweden)

    Gombar Saurabh

    2012-07-01

    Full Text Available Abstract Background MicroRNAs (miRNAs are small, non-coding RNAs that regulate gene expression and play a critical role in development, homeostasis, and disease. Despite their demonstrated roles in age-associated pathologies, little is known about the role of miRNAs in human aging and longevity. Results We employed massively parallel sequencing technology to identify miRNAs expressed in B-cells from Ashkenazi Jewish centenarians, i.e., those living to a hundred and a human model of exceptional longevity, and younger controls without a family history of longevity. With data from 26.7 million reads comprising 9.4 × 108 bp from 3 centenarian and 3 control individuals, we discovered a total of 276 known miRNAs and 8 unknown miRNAs ranging several orders of magnitude in expression levels, a typical characteristics of saturated miRNA-sequencing. A total of 22 miRNAs were found to be significantly upregulated, with only 2 miRNAs downregulated, in centenarians as compared to controls. Gene Ontology analysis of the predicted and validated targets of the 24 differentially expressed miRNAs indicated enrichment of functional pathways involved in cell metabolism, cell cycle, cell signaling, and cell differentiation. A cross sectional expression analysis of the differentially expressed miRNAs in B-cells from Ashkenazi Jewish individuals between the 50th and 100th years of age indicated that expression levels of miR-363* declined significantly with age. Centenarians, however, maintained the youthful expression level. This result suggests that miR-363* may be a candidate longevity-associated miRNA. Conclusion Our comprehensive miRNA data provide a resource for further studies to identify genetic pathways associated with aging and longevity in humans.

  7. Optimizing a massive parallel sequencing workflow for quantitative miRNA expression analysis.

    Directory of Open Access Journals (Sweden)

    Francesca Cordero

    Full Text Available BACKGROUND: Massive Parallel Sequencing methods (MPS can extend and improve the knowledge obtained by conventional microarray technology, both for mRNAs and short non-coding RNAs, e.g. miRNAs. The processing methods used to extract and interpret the information are an important aspect of dealing with the vast amounts of data generated from short read sequencing. Although the number of computational tools for MPS data analysis is constantly growing, their strengths and weaknesses as part of a complex analytical pipe-line have not yet been well investigated. PRIMARY FINDINGS: A benchmark MPS miRNA dataset, resembling a situation in which miRNAs are spiked in biological replication experiments was assembled by merging a publicly available MPS spike-in miRNAs data set with MPS data derived from healthy donor peripheral blood mononuclear cells. Using this data set we observed that short reads counts estimation is strongly under estimated in case of duplicates miRNAs, if whole genome is used as reference. Furthermore, the sensitivity of miRNAs detection is strongly dependent by the primary tool used in the analysis. Within the six aligners tested, specifically devoted to miRNA detection, SHRiMP and MicroRazerS show the highest sensitivity. Differential expression estimation is quite efficient. Within the five tools investigated, two of them (DESseq, baySeq show a very good specificity and sensitivity in the detection of differential expression. CONCLUSIONS: The results provided by our analysis allow the definition of a clear and simple analytical optimized workflow for miRNAs digital quantitative analysis.

  8. Optimizing a massive parallel sequencing workflow for quantitative miRNA expression analysis.

    Science.gov (United States)

    Cordero, Francesca; Beccuti, Marco; Arigoni, Maddalena; Donatelli, Susanna; Calogero, Raffaele A

    2012-01-01

    Massive Parallel Sequencing methods (MPS) can extend and improve the knowledge obtained by conventional microarray technology, both for mRNAs and short non-coding RNAs, e.g. miRNAs. The processing methods used to extract and interpret the information are an important aspect of dealing with the vast amounts of data generated from short read sequencing. Although the number of computational tools for MPS data analysis is constantly growing, their strengths and weaknesses as part of a complex analytical pipe-line have not yet been well investigated. A benchmark MPS miRNA dataset, resembling a situation in which miRNAs are spiked in biological replication experiments was assembled by merging a publicly available MPS spike-in miRNAs data set with MPS data derived from healthy donor peripheral blood mononuclear cells. Using this data set we observed that short reads counts estimation is strongly under estimated in case of duplicates miRNAs, if whole genome is used as reference. Furthermore, the sensitivity of miRNAs detection is strongly dependent by the primary tool used in the analysis. Within the six aligners tested, specifically devoted to miRNA detection, SHRiMP and MicroRazerS show the highest sensitivity. Differential expression estimation is quite efficient. Within the five tools investigated, two of them (DESseq, baySeq) show a very good specificity and sensitivity in the detection of differential expression. The results provided by our analysis allow the definition of a clear and simple analytical optimized workflow for miRNAs digital quantitative analysis.

  9. De novo assembly and characterization of the leaf, bud, and fruit transcriptome from the vulnerable tree Juglans mandshurica for the development of 20 new microsatellite markers using Illumina sequencing

    Science.gov (United States)

    Zhuang Hu; Tian Zhang; Xiao-Xiao Gao; Yang Wang; Qiang Zhang; Hui-Juan Zhou; Gui-Fang Zhao; Ma-Li Wang; Keith E. Woeste; Peng Zhao

    2016-01-01

    Manchurian walnut (Juglans mandshurica Maxim.) is a vulnerable, temperate deciduous tree valued for its wood and nut, but transcriptomic and genomic data for the species are very limited. Next generation sequencing (NGS) has made it possible to develop molecular markers for this species rapidly and efficiently. Our goal is to use transcriptome...

  10. Robust Parallel Machine Scheduling Problem with Uncertainties and Sequence-Dependent Setup Time

    Directory of Open Access Journals (Sweden)

    Hongtao Hu

    2016-01-01

    Full Text Available A parallel machine scheduling problem in plastic production is studied in this paper. In this problem, the processing time and arrival time are uncertain but lie in their respective intervals. In addition, each job must be processed together with a mold while jobs which belong to one family can share the same mold. Therefore, time changing mold is required for two consecutive jobs that belong to different families, which is known as sequence-dependent setup time. This paper aims to identify a robust schedule by min–max regret criterion. It is proved that the scenario incurring maximal regret for each feasible solution lies in finite extreme scenarios. A mixed integer linear programming formulation and an exact algorithm are proposed to solve the problem. Moreover, a modified artificial bee colony algorithm is developed to solve large-scale problems. The performance of the presented algorithm is evaluated through extensive computational experiments and the results show that the proposed algorithm surpasses the exact method in terms of objective value and computational time.

  11. MSAProbs-MPI: parallel multiple sequence aligner for distributed-memory systems.

    Science.gov (United States)

    González-Domínguez, Jorge; Liu, Yongchao; Touriño, Juan; Schmidt, Bertil

    2016-12-15

    MSAProbs is a state-of-the-art protein multiple sequence alignment tool based on hidden Markov models. It can achieve high alignment accuracy at the expense of relatively long runtimes for large-scale input datasets. In this work we present MSAProbs-MPI, a distributed-memory parallel version of the multithreaded MSAProbs tool that is able to reduce runtimes by exploiting the compute capabilities of common multicore CPU clusters. Our performance evaluation on a cluster with 32 nodes (each containing two Intel Haswell processors) shows reductions in execution time of over one order of magnitude for typical input datasets. Furthermore, MSAProbs-MPI using eight nodes is faster than the GPU-accelerated QuickProbs running on a Tesla K20. Another strong point is that MSAProbs-MPI can deal with large datasets for which MSAProbs and QuickProbs might fail due to time and memory constraints, respectively. Source code in C ++ and MPI running on Linux systems as well as a reference manual are available at http://msaprobs.sourceforge.net CONTACT: jgonzalezd@udc.esSupplementary information: Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

  12. Development of a parallel zoomed EVI sequence for high temporal resolution analysis of the BOLD response

    International Nuclear Information System (INIS)

    Rabrait, C.

    2006-01-01

    The hemodynamic impulse response to any short stimulus typically lasts around 20 seconds. Thus, the detection of the Blood Oxygenation Level Dependent (BOLD) effect is usually performed using a 2D Echo Planar Imaging (EPI) sequence, with repetition times on the order of 1 or 2 seconds. This temporal resolution is generally enough for detection purposes. Nevertheless, when trying to accurately estimate the hemodynamic response functions (HRF), higher scanning rates represent a real advantage. Thus, in order to reach a temporal resolution around 200 ms, we developed a new acquisition method, based on Echo Volumar Imaging and 2D parallel acquisition (1). Echo Volumar Imaging (EVI) has been proposed in 1977 by Mansfield (2). EVI intrinsically possesses a lot of advantages for functional neuroimaging, as a 3 D single shot acquisition method. Nevertheless, to date, only a few applications have been reported (3, 4). Actually, very restricting hardware requirements make EVI difficult to perform in satisfactory experimental conditions, even today. The critical point in EVI is the echo train duration, which is longer than in EPI, due to 3D acquisition. Indeed, at equal field of view and spatial resolutions, EVI echo train duration must be approximately equal to EPI echo train duration multiplied by the number of slices acquired in EPI. Consequently, EVI is much more sensitive than EPI to geometric distortions, which are related to phase errors, and also to signal losses, which are due to long echo times (TE). Thus, a first improvement has been brought by 'zoomed' or 'localized' EVI (5), which allows to focus on a small volume of interest and thus limit echo train durations compared to full FOV acquisitions.To reduce echo train durations, we chose to apply parallel acquisition. Moreover, since EVI is a 3D acquisition method, we are able to perform parallel acquisition and SENSE reconstruction along the two phase directions (6). The R = 4 under-sampling consists in the

  13. Genome and Transcriptome sequence of Finger millet (Eleusine coracana (L.) Gaertn.) provides insights into drought tolerance and nutraceutical properties.

    Science.gov (United States)

    Hittalmani, Shailaja; Mahesh, H B; Shirke, Meghana Deepak; Biradar, Hanamareddy; Uday, Govindareddy; Aruna, Y R; Lohithaswa, H C; Mohanrao, A

    2017-06-15

    Finger millet (Eleusine coracana (L.) Gaertn.) is an important staple food crop widely grown in Africa and South Asia. Among the millets, finger millet has high amount of calcium, methionine, tryptophan, fiber, and sulphur containing amino acids. In addition, it has C4 photosynthetic carbon assimilation mechanism, which helps to utilize water and nitrogen efficiently under hot and arid conditions without severely affecting yield. Therefore, development and utilization of genomic resources for genetic improvement of this crop is immensely useful. Experimental results from whole genome sequencing and assembling process of ML-365 finger millet cultivar yielded 1196 Mb covering approximately 82% of total estimated genome size. Genome analysis showed the presence of 85,243 genes and one half of the genome is repetitive in nature. The finger millet genome was found to have higher colinearity with foxtail millet and rice as compared to other Poaceae species. Mining of simple sequence repeats (SSRs) yielded abundance of SSRs within the finger millet genome. Functional annotation and mining of transcription factors revealed finger millet genome harbors large number of drought tolerance related genes. Transcriptome analysis of low moisture stress and non-stress samples revealed the identification of several drought-induced candidate genes, which could be used in drought tolerance breeding. This genome sequencing effort will strengthen plant breeders for allele discovery, genetic mapping, and identification of candidate genes for agronomically important traits. Availability of genomic resources of finger millet will enhance the novel breeding possibilities to address potential challenges of finger millet improvement.

  14. Sex chromosomes and germline transcriptomics explored by single-cell sequencing and RNA-tomography

    NARCIS (Netherlands)

    Vértesy, Ábel

    2018-01-01

    In our study of germ cell differentiation, we applied two recently developed technologies on the germline of various model organisms: single-cell mRNA sequencing and RNA-tomography. For the first time we could look at gene expression with such a high resolution, and this led us to discover the

  15. Comparison of Pre-Analytical FFPE Sample Preparation Methods and Their Impact on Massively Parallel Sequencing in Routine Diagnostics

    Science.gov (United States)

    Heydt, Carina; Fassunke, Jana; Künstlinger, Helen; Ihle, Michaela Angelika; König, Katharina; Heukamp, Lukas Carl; Schildhaus, Hans-Ulrich; Odenthal, Margarete; Büttner, Reinhard; Merkelbach-Bruse, Sabine

    2014-01-01

    Over the last years, massively parallel sequencing has rapidly evolved and has now transitioned into molecular pathology routine laboratories. It is an attractive platform for analysing multiple genes at the same time with very little input material. Therefore, the need for high quality DNA obtained from automated DNA extraction systems has increased, especially to those laboratories which are dealing with formalin-fixed paraffin-embedded (FFPE) material and high sample throughput. This study evaluated five automated FFPE DNA extraction systems as well as five DNA quantification systems using the three most common techniques, UV spectrophotometry, fluorescent dye-based quantification and quantitative PCR, on 26 FFPE tissue samples. Additionally, the effects on downstream applications were analysed to find the most suitable pre-analytical methods for massively parallel sequencing in routine diagnostics. The results revealed that the Maxwell 16 from Promega (Mannheim, Germany) seems to be the superior system for DNA extraction from FFPE material. The extracts had a 1.3–24.6-fold higher DNA concentration in comparison to the other extraction systems, a higher quality and were most suitable for downstream applications. The comparison of the five quantification methods showed intermethod variations but all methods could be used to estimate the right amount for PCR amplification and for massively parallel sequencing. Interestingly, the best results in massively parallel sequencing were obtained with a DNA input of 15 ng determined by the NanoDrop 2000c spectrophotometer (Thermo Fisher Scientific, Waltham, MA, USA). No difference could be detected in mutation analysis based on the results of the quantification methods. These findings emphasise, that it is particularly important to choose the most reliable and constant DNA extraction system, especially when using small biopsies and low elution volumes, and that all common DNA quantification techniques can be used for

  16. Comparison of pre-analytical FFPE sample preparation methods and their impact on massively parallel sequencing in routine diagnostics.

    Directory of Open Access Journals (Sweden)

    Carina Heydt

    Full Text Available Over the last years, massively parallel sequencing has rapidly evolved and has now transitioned into molecular pathology routine laboratories. It is an attractive platform for analysing multiple genes at the same time with very little input material. Therefore, the need for high quality DNA obtained from automated DNA extraction systems has increased, especially to those laboratories which are dealing with formalin-fixed paraffin-embedded (FFPE material and high sample throughput. This study evaluated five automated FFPE DNA extraction systems as well as five DNA quantification systems using the three most common techniques, UV spectrophotometry, fluorescent dye-based quantification and quantitative PCR, on 26 FFPE tissue samples. Additionally, the effects on downstream applications were analysed to find the most suitable pre-analytical methods for massively parallel sequencing in routine diagnostics. The results revealed that the Maxwell 16 from Promega (Mannheim, Germany seems to be the superior system for DNA extraction from FFPE material. The extracts had a 1.3-24.6-fold higher DNA concentration in comparison to the other extraction systems, a higher quality and were most suitable for downstream applications. The comparison of the five quantification methods showed intermethod variations but all methods could be used to estimate the right amount for PCR amplification and for massively parallel sequencing. Interestingly, the best results in massively parallel sequencing were obtained with a DNA input of 15 ng determined by the NanoDrop 2000c spectrophotometer (Thermo Fisher Scientific, Waltham, MA, USA. No difference could be detected in mutation analysis based on the results of the quantification methods. These findings emphasise, that it is particularly important to choose the most reliable and constant DNA extraction system, especially when using small biopsies and low elution volumes, and that all common DNA quantification techniques can

  17. High Rhodotorula sequences in skin transcriptome of patients with diffuse systemic sclerosis.

    Science.gov (United States)

    Arron, Sarah T; Dimon, Michelle T; Li, Zhenghui; Johnson, Michael E; Wood, Tammara A; Feeney, Luzviminda; Angeles, Jorge G; Lafyatis, Robert; Whitfield, Michael L

    2014-08-01

    Previous studies have suggested a role for pathogens as a trigger of systemic sclerosis (SSc), although neither a pathogen nor a mechanism of pathogenesis is known. Here we show enrichment of Rhodotorula sequences in the skin of patients with early, diffuse SSc compared with that in normal controls. RNA-seq was performed on four SSc patients and four controls, to a depth of 200 million reads per patient. Data were analyzed to quantify the nonhuman sequence reads in each sample. We found little difference between bacterial microbiome and viral read counts, but found a significant difference between the read counts for a mycobiome component, R. glutinis. Normal samples contained almost no detected R. glutinis or other Rhodotorula sequence reads (mean score 0.021 for R. glutinis, 0.024 for all Rhodotorula). In contrast, SSc samples had a mean score of 5.039 for R. glutinis (5.232 for Rhodotorula). We were able to assemble the D1-D2 hypervariable region of the 28S ribosomal RNA (rRNA) of R. glutinis from each of the SSc samples. Taken together, these results suggest that R. glutinis may be present in the skin of early SSc patients at higher levels than in normal skin, raising the possibility that it may be triggering the inflammatory response found in SSc.

  18. Transcriptome Sequencing of Dianthus spiculifolius and Analysis of the Genes Involved in Responses to Combined Cold and Drought Stress.

    Science.gov (United States)

    Zhou, Aimin; Ma, Hongping; Liu, Enhui; Jiang, Tongtong; Feng, Shuang; Gong, Shufang; Wang, Jingang

    2017-04-17

    Dianthus spiculifolius , a perennial herbaceous flower and a member of the Caryophyllaceae family, has strong resistance to cold and drought stresses. To explore the transcriptional responses of D. spiculifolius to individual and combined stresses, we performed transcriptome sequencing of seedlings under normal conditions or subjected to cold treatment (CT), simulated drought treatment (DT), or their combination (CTDT). After de novo assembly of the obtained reads, 112,015 unigenes were generated. Analysis of differentially expressed genes (DEGs) showed that 2026, 940, and 2346 genes were up-regulated and 1468, 707, and 1759 were down-regulated in CT, DT, and CTDT samples, respectively. Among all the DEGs, 182 up-regulated and 116 down-regulated genes were identified in all the treatment groups. Analysis of metabolic pathways and regulatory networks associated with the DEGs revealed overlaps and cross-talk between cold and drought stress response pathways. The expression profiles of the selected DEGs in CT, DT, and CTDT samples were characterized and confirmed by quantitative RT-PCR. These DEGs and metabolic pathways may play important roles in the response of D. spiculifolius to the combined stress. Functional characterization of these genes and pathways will provide new targets for enhancement of plant stress tolerance through genetic manipulation.

  19. De novo sequencing, assembly, and analysis of Iris lactea var. chinensis roots' transcriptome in response to salt stress.

    Science.gov (United States)

    Gu, Chunsun; Xu, Sheng; Wang, Zhiquan; Liu, Liangqin; Zhang, Yongxia; Deng, Yanming; Huang, Suzhen

    2018-04-01

    As a halophyte, Iris lactea var. chinensis (I. lactea var. chinensis) is widely distributed and has good drought and heavy metal resistance. Moreover, it is an excellent ornamental plant. I. lactea var. chinensis has extensive application prospects owing to the global impacts of salinization. To better understand its molecular mechanism involved in salt resistance, the de novo sequencing, assembly, and analysis of I. lactea var. chinensis roots' transcriptome in response to salt-stress conditions was performed. On average, 74.17% of the clean reads were mapped to unigenes. A total of 121,093 unigenes were constructed and 56,398 (46.57%) were annotated. Among these, 13,522 differentially expressed genes (DEGs) were identified between salt-treated and control samples Compared to the transcriptional level of control, 7037 DEGs were up-regulated and 6539 down-regulated. In addition, 129 up-regulated and 1609 down-regulated genes were simultaneously detected in all three pairwise comparisons between control and salt-stressed libraries. At least 247 and 250 DEGs encoding transcription factors and transporter proteins were identified. Meanwhile, 130 DEGs regarding reactive oxygen species (ROS) scavenging system were also summarized. Based on real-time quantitative RT-PCR, we verified the changes in the expression patterns of 10 unigenes. Our study identified potential salt-responsive candidate genes and increased the understanding of halophyte responses to salinity stress. Copyright © 2018 Elsevier Masson SAS. All rights reserved.

  20. Transcriptome sequencing and de novo analysis of a cytoplasmic male sterile line and its near-isogenic restorer line in chili pepper (Capsicum annuum L..

    Directory of Open Access Journals (Sweden)

    Chen Liu

    Full Text Available BACKGROUND: The use of cytoplasmic male sterility (CMS in F1 hybrid seed production of chili pepper is increasingly popular. However, the molecular mechanisms of cytoplasmic male sterility and fertility restoration remain poorly understood due to limited transcriptomic and genomic data. Therefore, we analyzed the difference between a CMS line 121A and its near-isogenic restorer line 121C in transcriptome level using next generation sequencing technology (NGS, aiming to find out critical genes and pathways associated with the male sterility. RESULTS: We generated approximately 53 million sequencing reads and assembled de novo, yielding 85,144 high quality unigenes with an average length of 643 bp. Among these unigenes, 27,191 were identified as putative homologs of annotated sequences in the public protein databases, 4,326 and 7,061 unigenes were found to be highly abundant in lines 121A and 121C, respectively. Many of the differentially expressed unigenes represent a set of potential candidate genes associated with the formation or abortion of pollen. CONCLUSIONS: Our study profiled anther transcriptomes of a chili pepper CMS line and its restorer line. The results shed the lights on the occurrence and recovery of the disturbances in nuclear-mitochondrial interaction and provide clues for further investigations.

  1. Transcriptome sequencing and de novo analysis of cytoplasmic male sterility and maintenance in JA-CMS cotton.

    Science.gov (United States)

    Yang, Peng; Han, Jinfeng; Huang, Jinling

    2014-01-01

    Cytoplasmic male sterility (CMS) is the failure to produce functional pollen, which is inherited maternally. And it is known that anther development is modulated through complicated interactions between nuclear and mitochondrial genes in sporophytic and gametophytic tissues. However, an unbiased transcriptome sequencing analysis of CMS in cotton is currently lacking in the literature. This study compared differentially expressed (DE) genes of floral buds at the sporogenous cells stage (SS) and microsporocyte stage (MS) (the two most important stages for pollen abortion in JA-CMS) between JA-CMS and its fertile maintainer line JB cotton plants, using the Illumina HiSeq 2000 sequencing platform. A total of 709 (1.8%) DE genes including 293 up-regulated and 416 down-regulated genes were identified in JA-CMS line comparing with its maintainer line at the SS stage, and 644 (1.6%) DE genes with 263 up-regulated and 381 down-regulated genes were detected at the MS stage. By comparing the two stages in the same material, there were 8 up-regulated and 9 down-regulated DE genes in JA-CMS line and 29 up-regulated and 9 down-regulated DE genes in JB maintainer line at the MS stage. Quantitative RT-PCR was used to validate 7 randomly selected DE genes. Bioinformatics analysis revealed that genes involved in reduction-oxidation reactions and alpha-linolenic acid metabolism were down-regulated, while genes pertaining to photosynthesis and flavonoid biosynthesis were up-regulated in JA-CMS floral buds compared with their JB counterparts at the SS and/or MS stages. All these four biological processes play important roles in reactive oxygen species (ROS) homeostasis, which may be an important factor contributing to the sterile trait of JA-CMS. Further experiments are warranted to elucidate molecular mechanisms of these genes that lead to CMS.

  2. Deep sequencing reveals transcriptome re-programming of Taxus × media cells to the elicitation with methyl jasmonate.

    Science.gov (United States)

    Sun, Guiling; Yang, Yanfang; Xie, Fuliang; Wen, Jian-Fan; Wu, Jianqiang; Wilson, Iain W; Tang, Qi; Liu, Hongwei; Qiu, Deyou

    2013-01-01

    Plant cell culture represents an alternative source for producing high-value secondary metabolites including paclitaxel (Taxol®), which is mainly produced in Taxus and has been widely used in cancer chemotherapy. The phytohormone methyl jasmonate (MeJA) can significantly increase the production of paclitaxel, which is induced in plants as a secondary metabolite possibly in defense against herbivores and pathogens. In cell culture, MeJA also elicits the accumulation of paclitaxel; however, the mechanism is still largely unknown. To obtain insight into the global regulation mechanism of MeJA in the steady state of paclitaxel production (7 days after MeJA addition), especially on paclitaxel biosynthesis, we sequenced the transcriptomes of MeJA-treated and untreated Taxus × media cells and obtained ∼ 32.5 M high quality reads, from which 40,348 unique sequences were obtained by de novo assembly. Expression level analysis indicated that a large number of genes were associated with transcriptional regulation, DNA and histone modification, and MeJA signaling network. All the 29 known genes involved in the biosynthesis of terpenoid backbone and paclitaxel were found with 18 genes showing increased transcript abundance following elicitation of MeJA. The significantly up-regulated changes of 9 genes in paclitaxel biosynthesis were validated by qRT-PCR assays. According to the expression changes and the previously proposed enzyme functions, multiple candidates for the unknown steps in paclitaxel biosynthesis were identified. We also found some genes putatively involved in the transport and degradation of paclitaxel. Potential target prediction of miRNAs indicated that miRNAs may play an important role in the gene expression regulation following the elicitation of MeJA. Our results shed new light on the global regulation mechanism by which MeJA regulates the physiology of Taxus cells and is helpful to understand how MeJA elicits other plant species besides Taxus.

  3. Next-generation sequencing-based transcriptome analysis of Helicoverpa armigera Larvae immune-primed with Photorhabdus luminescens TT01.

    Directory of Open Access Journals (Sweden)

    Zengyang Zhao

    Full Text Available Although invertebrates are incapable of adaptive immunity, immunal reactions which are functionally similar to the adaptive immunity of vertebrates have been described in many studies of invertebrates including insects. The phenomenon was termed immune priming. In order to understand the molecular mechanism of immune priming, we employed Illumina/Solexa platform to investigate the transcriptional changes of the hemocytes and fat body of Helicoverpa armigera larvae immune-primed with the pathogenic bacteria Photorhabdus luminescens TT01. A total of 43.6 and 65.1 million clean reads with 4.4 and 6.5 gigabase sequence data were obtained from the TT01 (the immune-primed and PBS (non-primed cDNA libraries and assembled into 35,707 all-unigenes (non-redundant transcripts, which has a length varied from 201 to 16,947 bp and a N50 length of 1,997 bp. For 35,707 all-unigenes, 20,438 were functionally annotated and 2,494 were differentially expressed after immune priming. The differentially expressed genes (DEGs are mainly related to immunity, detoxification, development and metabolism of the host insect. Analysis on the annotated immune related DEGs supported a hypothesis that we proposed previously: the immune priming phenomenon observed in H. armigera larvae was achieved by regulation of key innate immune elements. The transcriptome profiling data sets (especially the sequences of 1,022 unannotated DEGs and the clues (such as those on immune-related signal and regulatory pathways obtained from this study will facilitate immune-related novel gene discovery and provide valuable information for further exploring the molecular mechanism of immune priming of invertebrates. All these will increase our understanding of invertebrate immunity which may provide new approaches to control insect pests or prevent epidemic of infectious diseases in economic invertebrates in the future.

  4. A comprehensive assessment of the transcriptome of cork oak (Quercus suber) through EST sequencing.

    Science.gov (United States)

    Pereira-Leal, José B; Abreu, Isabel A; Alabaça, Cláudia S; Almeida, Maria Helena; Almeida, Paulo; Almeida, Tânia; Amorim, Maria Isabel; Araújo, Susana; Azevedo, Herlânder; Badia, Aleix; Batista, Dora; Bohn, Andreas; Capote, Tiago; Carrasquinho, Isabel; Chaves, Inês; Coelho, Ana Cristina; Costa, Maria Manuela Ribeiro; Costa, Rita; Cravador, Alfredo; Egas, Conceição; Faro, Carlos; Fortes, Ana M; Fortunato, Ana S; Gaspar, Maria João; Gonçalves, Sónia; Graça, José; Horta, Marília; Inácio, Vera; Leitão, José M; Lino-Neto, Teresa; Marum, Liliana; Matos, José; Mendonça, Diogo; Miguel, Andreia; Miguel, Célia M; Morais-Cecílio, Leonor; Neves, Isabel; Nóbrega, Filomena; Oliveira, Maria Margarida; Oliveira, Rute; Pais, Maria Salomé; Paiva, Jorge A; Paulo, Octávio S; Pinheiro, Miguel; Raimundo, João A P; Ramalho, José C; Ribeiro, Ana I; Ribeiro, Teresa; Rocheta, Margarida; Rodrigues, Ana Isabel; Rodrigues, José C; Saibo, Nelson J M; Santo, Tatiana E; Santos, Ana Margarida; Sá-Pereira, Paula; Sebastiana, Mónica; Simões, Fernanda; Sobral, Rómulo S; Tavares, Rui; Teixeira, Rita; Varela, Carolina; Veloso, Maria Manuela; Ricardo, Cândido P P

    2014-05-15

    Cork oak (Quercus suber) is one of the rare trees with the ability to produce cork, a material widely used to make wine bottle stoppers, flooring and insulation materials, among many other uses. The molecular mechanisms of cork formation are still poorly understood, in great part due to the difficulty in studying a species with a long life-cycle and for which there is scarce molecular/genomic information. Cork oak forests are of great ecological importance and represent a major economic and social resource in Southern Europe and Northern Africa. However, global warming is threatening the cork oak forests by imposing thermal, hydric and many types of novel biotic stresses. Despite the economic and social value of the Q. suber species, few genomic resources have been developed, useful for biotechnological applications and improved forest management. We generated in excess of 7 million sequence reads, by pyrosequencing 21 normalized cDNA libraries derived from multiple Q. suber tissues and organs, developmental stages and physiological conditions. We deployed a stringent sequence processing and assembly pipeline that resulted in the identification of ~159,000 unigenes. These were annotated according to their similarity to known plant genes, to known Interpro domains, GO classes and E.C. numbers. The phylogenetic extent of this ESTs set was investigated, and we found that cork oak revealed a significant new gene space that is not covered by other model species or EST sequencing projects. The raw data, as well as the full annotated assembly, are now available to the community in a dedicated web portal at http://www.corkoakdb.org. This genomic resource represents the first trancriptome study in a cork producing species. It can be explored to develop new tools and approaches to understand stress responses and developmental processes in forest trees, as well as the molecular cascades underlying cork differentiation and disease response.

  5. Transcriptomic Modification in the Cerebral Cortex following Noninvasive Brain Stimulation: RNA-Sequencing Approach

    Science.gov (United States)

    2017-04-20

    2016; Accepted 14 November 2016 Academic Editor: Ming-Kuei Lu Copyright © 2016 Ben Holmes et al. This is an open access article distributed under the...associative learning in behav- ing rabbits,” Proceedings of the National Academy of Sciences of the United States of America, vol. 109, no. 17, pp. 6710...P. T. Pyl, and W. Huber, “HTSeq—a Python framework to work with high-throughput sequencing data,” Bioinformatics, vol. 31, no. 2, pp. 166–169, 2015

  6. Transcriptome profiling of the cancer, adjacent non-tumor and distant normal tissues from a colorectal cancer patient by deep sequencing.

    Directory of Open Access Journals (Sweden)

    Yan'an Wu

    Full Text Available Colorectal cancer (CRC is one of the most commonly diagnosed cancers in the world. A genome-wide screening of transcriptome dysregulation between cancer and normal tissue would provide insight into the molecular basis of CRC initiation and progression. Compared with microarray technology, which is commonly used to identify transcriptional changes, the recently developed RNA-seq technique has the ability to detect other abnormal regulations in the cancer transcriptome, such as alternative splicing, novel transcripts or gene fusion. In this study, we performed high-throughput transcriptome sequencing at ~50× coverage on CRC, adjacent non-tumor and distant normal tissue. The results revealed cancer-specific, differentially expressed genes and differential alternative splicing, suggesting that the extracellular matrix and metabolic pathways are activated and the genes related to cell homeostasis are suppressed in CRC. In addition, one tumor-restricted gene fusion, PRTEN-NOTCH2, was also detected and experimentally confirmed. This study reveals some common features in tumor invasion and provides a comprehensive survey of the CRC transcriptome, which provides better insight into the complexity of regulatory changes during tumorigenesis.

  7. Transcriptomic characterization of soybean (Glycine max) roots in response to rhizobium infection by RNA sequencing

    International Nuclear Information System (INIS)

    He, Q.; Li, Z.; Wang, S.; Huang, S.; Yang, H.

    2018-01-01

    Legumes interacting with rhizobium to convert N2 into ammonia for plant use has attracted worldwide interest. However, the plant basal nitrogen fixation mechanisms induced in response to Rhizobium, giving differential gene expression of plants, have not yet been fully realized. The differential expressed genes of soybean between inoculated and mock-inoculated were analyzed by a RNA-Seq. The results of the sequencing were aligned against the Williams 82 genome sequence, which contain 55787 transcripts; 280 and 316 transcripts were found to be up- and down-regulated, respectively, for inoculated and mock-inoculated soybean roots at stage V1. Gene ontology (GO) analyses detected 104, 182 and 178 genes associated with the cell component category, molecular function category and biological process category, respectively. Pathway analysis revealed that 98 differentially expressed genes (115 transcripts) were involved in 169 biological pathways. We selected 19 differentially expressed genes and analyzed their expressions in mock-inoculated, inoculated USDA110 and CCBAU45436 using qRT-PCR. The results were in accordance with those obtained from rhizobia infected RNA-Seq data. These showed that the results of RNA-Seq had reliability and universality. Additionally, this study showed some novel genes associated with the nitrogen fixation process in comparison to previously identified QTLs. (author)

  8. Parallel Algorithm for Solving TOV Equations for Sequence of Cold and Dense Nuclear Matter Models

    Science.gov (United States)

    Ayriyan, Alexander; Buša, Ján; Grigorian, Hovik; Poghosyan, Gevorg

    2018-04-01

    We have introduced parallel algorithm simulation of neutron star configurations for set of equation of state models. The performance of the parallel algorithm has been investigated for testing set of EoS models on two computational systems. It scales when using with MPI on modern CPUs and this investigation allowed us also to compare two different types of computational nodes.

  9. Comparative transcriptome analysis within the Lolium/Festuca species complex reveals high sequence conservation

    DEFF Research Database (Denmark)

    Czaban, Adrian; Sharma, Sapna; Byrne, Stephen

    2015-01-01

    species from the Lolium-Festuca complex, ranging from 52,166 to 72,133 transcripts per assembly. We have also predicted a set of proteins and validated it with a high-confidence protein database from three closely related species (H. vulgare, B. distachyon and O. sativa). We have obtained gene family...... clusters for the four species using OrthoMCL and analyzed their inferred phylogenetic relationships. Our results indicate that VRN2 is a candidate gene for differentiating vernalization and non-vernalization types in the Lolium-Festuca complex. Grouping of the gene families based on their BLAST identity...... enabled us to divide ortholog groups into those that are very conserved and those that are more evolutionarily relaxed. The ratio of the non-synonumous to synonymous substitutions enabled us to pinpoint protein sequences evolving in response to positive selection. These proteins may explain some...

  10. Genome-wide analysis of SRSF10-regulated alternative splicing by deep sequencing of chicken transcriptome

    Directory of Open Access Journals (Sweden)

    Xuexia Zhou

    2014-12-01

    Full Text Available Splicing factor SRSF10 is known to function as a sequence-specific splicing activator that is capable of regulating alternative splicing both in vitro and in vivo. We recently used an RNA-seq approach coupled with bioinformatics analysis to identify the extensive splicing network regulated by SRSF10 in chicken cells. We found that SRSF10 promoted both exon inclusion and exclusion. Functionally, many of the SRSF10-verified alternative exons are linked to pathways of response to external stimulus. Here we describe in detail the experimental design, bioinformatics analysis and GO/pathway enrichment analysis of SRSF10-regulated genes to correspond with our data in the Gene Expression Omnibus with accession number GSE53354. Our data thus provide a resource for studying regulation of alternative splicing in vivo that underlines biological functions of splicing regulatory proteins in cells.

  11. Deep sequencing-based transcriptome profiling analysis of bacteria-challenged Lateolabrax japonicus reveals insight into the immune-relevant genes in marine fish

    Directory of Open Access Journals (Sweden)

    Xiang Li-xin

    2010-08-01

    Full Text Available Abstract Background Systematic research on fish immunogenetics is indispensable in understanding the origin and evolution of immune systems. This has long been a challenging task because of the limited number of deep sequencing technologies and genome backgrounds of non-model fish available. The newly developed Solexa/Illumina RNA-seq and Digital gene expression (DGE are high-throughput sequencing approaches and are powerful tools for genomic studies at the transcriptome level. This study reports the transcriptome profiling analysis of bacteria-challenged Lateolabrax japonicus using RNA-seq and DGE in an attempt to gain insights into the immunogenetics of marine fish. Results RNA-seq analysis generated 169,950 non-redundant consensus sequences, among which 48,987 functional transcripts with complete or various length encoding regions were identified. More than 52% of these transcripts are possibly involved in approximately 219 known metabolic or signalling pathways, while 2,673 transcripts were associated with immune-relevant genes. In addition, approximately 8% of the transcripts appeared to be fish-specific genes that have never been described before. DGE analysis revealed that the host transcriptome profile of Vibrio harveyi-challenged L. japonicus is considerably altered, as indicated by the significant up- or down-regulation of 1,224 strong infection-responsive transcripts. Results indicated an overall conservation of the components and transcriptome alterations underlying innate and adaptive immunity in fish and other vertebrate models. Analysis suggested the acquisition of numerous fish-specific immune system components during early vertebrate evolution. Conclusion This study provided a global survey of host defence gene activities against bacterial challenge in a non-model marine fish. Results can contribute to the in-depth study of candidate genes in marine fish immunity, and help improve current understanding of host

  12. Genomic Aberrations in Crizotinib Resistant Lung Adenocarcinoma Samples Identified by Transcriptome Sequencing.

    Directory of Open Access Journals (Sweden)

    Ali Saber

    Full Text Available ALK-break positive non-small cell lung cancer (NSCLC patients initially respond to crizotinib, but resistance occurs inevitably. In this study we aimed to identify fusion genes in crizotinib resistant tumor samples. Re-biopsies of three patients were subjected to paired-end RNA sequencing to identify fusion genes using deFuse and EricScript. The IGV browser was used to determine presence of known resistance-associated mutations. Sanger sequencing was used to validate fusion genes and digital droplet PCR to validate mutations. ALK fusion genes were detected in all three patients with EML4 being the fusion partner. One patient had no additional fusion genes. Another patient had one additional fusion gene, but without a predicted open reading frame (ORF. The third patient had three additional fusion genes, of which two were derived from the same chromosomal region as the EML4-ALK. A predicted ORF was identified only in the CLIP4-VSNL1 fusion product. The fusion genes validated in the post-treatment sample were also present in the biopsy before crizotinib. ALK mutations (p.C1156Y and p.G1269A detected in the re-biopsies of two patients, were not detected in pre-treatment biopsies. In conclusion, fusion genes identified in our study are unlikely to be involved in crizotinib resistance based on presence in pre-treatment biopsies. The detection of ALK mutations in post-treatment tumor samples of two patients underlines their role in crizotinib resistance.

  13. Integrated analysis of 454 and Illumina transcriptomic sequencing characterizes carbon flux and energy source for fatty acid synthesis in developing Lindera glauca fruits for woody biodiesel.

    Science.gov (United States)

    Lin, Zixin; An, Jiyong; Wang, Jia; Niu, Jun; Ma, Chao; Wang, Libing; Yuan, Guanshen; Shi, Lingling; Liu, Lili; Zhang, Jinsong; Zhang, Zhixiang; Qi, Ji; Lin, Shanzhi

    2017-01-01

    Lindera glauca fruit with high quality and quantity of oil has emerged as a novel potential source of biodiesel in China, but the molecular regulatory mechanism of carbon flux and energy source for oil biosynthesis in developing fruits is still unknown. To better develop fruit oils of L. glauca as woody biodiesel, a combination of two different sequencing platforms (454 and Illumina) and qRT-PCR analysis was used to define a minimal reference transcriptome of developing L. glauca fruits, and to construct carbon and energy metabolic model for regulation of carbon partitioning and energy supply for FA biosynthesis and oil accumulation. We first analyzed the dynamic patterns of growth tendency, oil content, FA compositions, biodiesel properties, and the contents of ATP and pyridine nucleotide of L. glauca fruits from seven different developing stages. Comprehensive characterization of transcriptome of the developing L. glauca fruit was performed using a combination of two different next-generation sequencing platforms, of which three representative fruit samples (50, 125, and 150 DAF) and one mixed sample from seven developing stages were selected for Illumina and 454 sequencing, respectively. The unigenes separately obtained from long and short reads (201, and 259, respectively, in total) were reconciled using TGICL software, resulting in a total of 60,031 unigenes (mean length = 1061.95 bp) to describe a transcriptome for developing L. glauca fruits. Notably, 198 genes were annotated for photosynthesis, sucrose cleavage, carbon allocation, metabolite transport, acetyl-CoA formation, oil synthesis, and energy metabolism, among which some specific transporters, transcription factors, and enzymes were identified to be implicated in carbon partitioning and energy source for oil synthesis by an integrated analysis of transcriptomic sequencing and qRT-PCR. Importantly, the carbon and energy metabolic model was well established for oil biosynthesis of developing L

  14. Gene Expression Profiles in Paired Gingival Biopsies from Periodontitis-Affected and Healthy Tissues Revealed by Massively Parallel Sequencing

    Science.gov (United States)

    Båge, Tove; Lagervall, Maria; Jansson, Leif; Lundeberg, Joakim; Yucel-Lindberg, Tülay

    2012-01-01

    Periodontitis is a chronic inflammatory disease affecting the soft tissue and bone that surrounds the teeth. Despite extensive research, distinctive genes responsible for the disease have not been identified. The objective of this study was to elucidate transcriptome changes in periodontitis, by investigating gene expression profiles in gingival tissue obtained from periodontitis-affected and healthy gingiva from the same patient, using RNA-sequencing. Gingival biopsies were obtained from a disease-affected and a healthy site from each of 10 individuals diagnosed with periodontitis. Enrichment analysis performed among uniquely expressed genes for the periodontitis-affected and healthy tissues revealed several regulated pathways indicative of inflammation for the periodontitis-affected condition. Hierarchical clustering of the sequenced biopsies demonstrated clustering according to the degree of inflammation, as observed histologically in the biopsies, rather than clustering at the individual level. Among the top 50 upregulated genes in periodontitis-affected tissues, we investigated two genes which have not previously been demonstrated to be involved in periodontitis. These included interferon regulatory factor 4 and chemokine (C-C motif) ligand 18, which were also expressed at the protein level in gingival biopsies from patients with periodontitis. In conclusion, this study provides a first step towards a quantitative comprehensive insight into the transcriptome changes in periodontitis. We demonstrate for the first time site-specific local variation in gene expression profiles of periodontitis-affected and healthy tissues obtained from patients with periodontitis, using RNA-seq. Further, we have identified novel genes expressed in periodontitis tissues, which may constitute potential therapeutic targets for future treatment strategies of periodontitis. PMID:23029519

  15. Identification of the bovine Arachnomelia mutation by massively parallel sequencing implicates sulfite oxidase (SUOX in bone development.

    Directory of Open Access Journals (Sweden)

    Cord Drögemüller

    2010-08-01

    Full Text Available Arachnomelia is a monogenic recessive defect of skeletal development in cattle. The causative mutation was previously mapped to a ∼7 Mb interval on chromosome 5. Here we show that array-based sequence capture and massively parallel sequencing technology, combined with the typical family structure in livestock populations, facilitates the identification of the causative mutation. We re-sequenced the entire critical interval in a healthy partially inbred cow carrying one copy of the critical chromosome segment in its ancestral state and one copy of the same segment with the arachnomelia mutation, and we detected a single heterozygous position. The genetic makeup of several partially inbred cattle provides extremely strong support for the causality of this mutation. The mutation represents a single base insertion leading to a premature stop codon in the coding sequence of the SUOX gene and is perfectly associated with the arachnomelia phenotype. Our findings suggest an important role for sulfite oxidase in bone development.

  16. Identification of the Bovine Arachnomelia Mutation by Massively Parallel Sequencing Implicates Sulfite Oxidase (SUOX) in Bone Development

    Science.gov (United States)

    Drögemüller, Cord; Tetens, Jens; Sigurdsson, Snaevar; Gentile, Arcangelo; Testoni, Stefania; Lindblad-Toh, Kerstin; Leeb, Tosso

    2010-01-01

    Arachnomelia is a monogenic recessive defect of skeletal development in cattle. The causative mutation was previously mapped to a ∼7 Mb interval on chromosome 5. Here we show that array-based sequence capture and massively parallel sequencing technology, combined with the typical family structure in livestock populations, facilitates the identification of the causative mutation. We re-sequenced the entire critical interval in a healthy partially inbred cow carrying one copy of the critical chromosome segment in its ancestral state and one copy of the same segment with the arachnomelia mutation, and we detected a single heterozygous position. The genetic makeup of several partially inbred cattle provides extremely strong support for the causality of this mutation. The mutation represents a single base insertion leading to a premature stop codon in the coding sequence of the SUOX gene and is perfectly associated with the arachnomelia phenotype. Our findings suggest an important role for sulfite oxidase in bone development. PMID:20865119

  17. HAlign-II: efficient ultra-large multiple sequence alignment and phylogenetic tree reconstruction with distributed and parallel computing.

    Science.gov (United States)

    Wan, Shixiang; Zou, Quan

    2017-01-01

    Multiple sequence alignment (MSA) plays a key role in biological sequence analyses, especially in phylogenetic tree construction. Extreme increase in next-generation sequencing results in shortage of efficient ultra-large biological sequence alignment approaches for coping with different sequence types. Distributed and parallel computing represents a crucial technique for accelerating ultra-large (e.g. files more than 1 GB) sequence analyses. Based on HAlign and Spark distributed computing system, we implement a highly cost-efficient and time-efficient HAlign-II tool to address ultra-large multiple biological sequence alignment and phylogenetic tree construction. The experiments in the DNA and protein large scale data sets, which are more than 1GB files, showed that HAlign II could save time and space. It outperformed the current software tools. HAlign-II can efficiently carry out MSA and construct phylogenetic trees with ultra-large numbers of biological sequences. HAlign-II shows extremely high memory efficiency and scales well with increases in computing resource. THAlign-II provides a user-friendly web server based on our distributed computing infrastructure. HAlign-II with open-source codes and datasets was established at http://lab.malab.cn/soft/halign.

  18. Discovery and validation of Barrett's esophagus microRNA transcriptome by next generation sequencing.

    Directory of Open Access Journals (Sweden)

    Ajay Bansal

    Full Text Available Barrett's esophagus (BE is transition from squamous to columnar mucosa as a result of gastroesophageal reflux disease (GERD. The role of microRNA during this transition has not been systematically studied.For initial screening, total RNA from 5 GERD and 6 BE patients was size fractionated. RNA <70 nucleotides was subjected to SOLiD 3 library preparation and next generation sequencing (NGS. Bioinformatics analysis was performed using R package "DEseq". A p value<0.05 adjusted for a false discovery rate of 5% was considered significant. NGS-identified miRNA were validated using qRT-PCR in an independent group of 40 GERD and 27 BE patients. MicroRNA expression of human BE tissues was also compared with three BE cell lines.NGS detected 19.6 million raw reads per sample. 53.1% of filtered reads mapped to miRBase version 18. NGS analysis followed by qRT-PCR validation found 10 differentially expressed miRNA; several are novel (-708-5p, -944, -224-5p and -3065-5p. Up- or down- regulation predicted by NGS was matched by qRT-PCR in every case. Human BE tissues and BE cell lines showed a high degree of concordance (70-80% in miRNA expression. Prediction analysis identified targets that mapped to developmental signaling pathways such as TGFβ and Notch and inflammatory pathways such as toll-like receptor signaling and TGFβ. Cluster analysis found similarly regulated (up or down miRNA to share common targets suggesting coordination between miRNA.Using highly sensitive next-generation sequencing, we have performed a comprehensive genome wide analysis of microRNA in BE and GERD patients. Differentially expressed miRNA between BE and GERD have been further validated. Expression of miRNA between BE human tissues and BE cell lines are highly correlated. These miRNA should be studied in biological models to further understand BE development.

  19. Global Transcriptome Sequencing Identifies Chlamydospore Specific Markers in Candida albicans and Candida dubliniensis

    LENUS (Irish Health Repository)

    Palige, Katja

    2013-04-15

    Candida albicans and Candida dubliniensis are pathogenic fungi that are highly related but differ in virulence and in some phenotypic traits. During in vitro growth on certain nutrient-poor media, C. albicans and C. dubliniensis are the only yeast species which are able to produce chlamydospores, large thick-walled cells of unknown function. Interestingly, only C. dubliniensis forms pseudohyphae with abundant chlamydospores when grown on Staib medium, while C. albicans grows exclusively as a budding yeast. In order to further our understanding of chlamydospore development and assembly, we compared the global transcriptional profile of both species during growth in liquid Staib medium by RNA sequencing. We also included a C. albicans mutant in our study which lacks the morphogenetic transcriptional repressor Nrg1. This strain, which is characterized by its constitutive pseudohyphal growth, specifically produces masses of chlamydospores in Staib medium, similar to C. dubliniensis. This comparative approach identified a set of putatively chlamydospore-related genes. Two of the homologous C. albicans and C. dubliniensis genes (CSP1 and CSP2) which were most strongly upregulated during chlamydospore development were analysed in more detail. By use of the green fluorescent protein as a reporter, the encoded putative cell wall related proteins were found to exclusively localize to C. albicans and C. dubliniensis chlamydospores. Our findings uncover the first chlamydospore specific markers in Candida species and provide novel insights in the complex morphogenetic development of these important fungal pathogens.

  20. Pseudo-random Trees: Multiple Independent Sequence Generators for Parallel and Branching Computations

    Science.gov (United States)

    Halton, John H.

    1989-09-01

    A class of families of linear congruential pseudo-random sequences is defined, for which it is possible to branch at any event without changing the sequence of random numbers used in the original random walk and for which the sequences in different branches show properties analogous to mutual statistical independence. This is a hitherto unavailable, and computationally desirable, tool.

  1. Sequencing of the needle transcriptome from Norway spruce (Picea abies Karst L. reveals lower substitution rates, but similar selective constraints in gymnosperms and angiosperms

    Directory of Open Access Journals (Sweden)

    Chen Jun

    2012-11-01

    Full Text Available Abstract Background A detailed knowledge about spatial and temporal gene expression is important for understanding both the function of genes and their evolution. For the vast majority of species, transcriptomes are still largely uncharacterized and even in those where substantial information is available it is often in the form of partially sequenced transcriptomes. With the development of next generation sequencing, a single experiment can now simultaneously identify the transcribed part of a species genome and estimate levels of gene expression. Results mRNA from actively growing needles of Norway spruce (Picea abies was sequenced using next generation sequencing technology. In total, close to 70 million fragments with a length of 76 bp were sequenced resulting in 5 Gbp of raw data. A de novo assembly of these reads, together with publicly available expressed sequence tag (EST data from Norway spruce, was used to create a reference transcriptome. Of the 38,419 PUTs (putative unique transcripts longer than 150 bp in this reference assembly, 83.5% show similarity to ESTs from other spruce species and of the remaining PUTs, 3,704 show similarity to protein sequences from other plant species, leaving 4,167 PUTs with limited similarity to currently available plant proteins. By predicting coding frames and comparing not only the Norway spruce PUTs, but also PUTs from the close relatives Picea glauca and Picea sitchensis to both Pinus taeda and Taxus mairei, we obtained estimates of synonymous and non-synonymous divergence among conifer species. In addition, we detected close to 15,000 SNPs of high quality and estimated gene expression differences between samples collected under dark and light conditions. Conclusions Our study yielded a large number of single nucleotide polymorphisms as well as estimates of gene expression on transcriptome scale. In agreement with a recent study we find that the synonymous substitution rate per year (0.6 × 10

  2. Characterization of the Burkholderia thailandensis SOS response by using whole-transcriptome shotgun sequencing.

    Science.gov (United States)

    Ulrich, Ricky L; Deshazer, David; Kenny, Tara A; Ulrich, Melanie P; Moravusova, Anna; Opperman, Timothy; Bavari, Sina; Bowlin, Terry L; Moir, Donald T; Panchal, Rekha G

    2013-10-01

    The bacterial SOS response is a well-characterized regulatory network encoded by most prokaryotic bacterial species and is involved in DNA repair. In addition to nucleic acid repair, the SOS response is involved in pathogenicity, stress-induced mutagenesis, and the emergence and dissemination of antibiotic resistance. Using high-throughput sequencing technology (SOLiD RNA-Seq), we analyzed the Burkholderia thailandensis global SOS response to the fluoroquinolone antibiotic, ciprofloxacin (CIP), and the DNA-damaging chemical, mitomycin C (MMC). We demonstrate that a B. thailandensis recA mutant (RU0643) is ∼4-fold more sensitive to CIP in contrast to the parental strain B. thailandensis DW503. Our RNA-Seq results show that CIP and MMC treatment (P SOS response were induced and include lexA, uvrA, dnaE, dinB, recX, and recA. At the genome-wide level, we found an overall decrease in gene expression, especially for genes involved in amino acid and carbohydrate transport and metabolism, following both CIP and MMC exposure. Interestingly, we observed the upregulation of several genes involved in bacterial motility and enhanced transcription of a B. thailandensis genomic island encoding a Siphoviridae bacteriophage designated E264. Using B. thailandensis plaque assays and PCR with B. mallei ATCC 23344 as the host, we demonstrate that CIP and MMC exposure in B. thailandensis DW503 induces the transcription and translation of viable bacteriophage in a RecA-dependent manner. This is the first report of the SOS response in Burkholderia spp. to DNA-damaging agents. We have identified both common and unique adaptive responses of B. thailandensis to chemical stress and DNA damage.

  3. Exploration of the gene fusion landscape of glioblastoma using transcriptome sequencing and copy number data.

    Science.gov (United States)

    Shah, Nameeta; Lankerovich, Michael; Lee, Hwahyung; Yoon, Jae-Geun; Schroeder, Brett; Foltz, Greg

    2013-11-22

    RNA-seq has spurred important gene fusion discoveries in a number of different cancers, including lung, prostate, breast, brain, thyroid and bladder carcinomas. Gene fusion discovery can potentially lead to the development of novel treatments that target the underlying genetic abnormalities. In this study, we provide comprehensive view of gene fusion landscape in 185 glioblastoma multiforme patients from two independent cohorts. Fusions occur in approximately 30-50% of GBM patient samples. In the Ivy Center cohort of 24 patients, 33% of samples harbored fusions that were validated by qPCR and Sanger sequencing. We were able to identify high-confidence gene fusions from RNA-seq data in 53% of the samples in a TCGA cohort of 161 patients. We identified 13 cases (8%) with fusions retaining a tyrosine kinase domain in the TCGA cohort and one case in the Ivy Center cohort. Ours is the first study to describe recurrent fusions involving non-coding genes. Genomic locations 7p11 and 12q14-15 harbor majority of the fusions. Fusions on 7p11 are formed in focally amplified EGFR locus whereas 12q14-15 fusions are formed by complex genomic rearrangements. All the fusions detected in this study can be further visualized and analyzed using our website: http://ivygap.swedish.org/fusions. Our study highlights the prevalence of gene fusions as one of the major genomic abnormalities in GBM. The majority of the fusions are private fusions, and a minority of these recur with low frequency. A small subset of patients with fusions of receptor tyrosine kinases can benefit from existing FDA approved drugs and drugs available in various clinical trials. Due to the low frequency and rarity of clinically relevant fusions, RNA-seq of GBM patient samples will be a vital tool for the identification of patient-specific fusions that can drive personalized therapy.

  4. Deep sequencing whole transcriptome exploration of the σE regulon in Neisseria meningitidis.

    Directory of Open Access Journals (Sweden)

    Robert Antonius Gerhardus Huis in 't Veld

    Full Text Available Bacteria live in an ever-changing environment and must alter protein expression promptly to adapt to these changes and survive. Specific response genes that are regulated by a subset of alternative σ(70-like transcription factors have evolved in order to respond to this changing environment. Recently, we have described the existence of a σ(E regulon including the anti-σ-factor MseR in the obligate human bacterial pathogen Neisseria meningitidis. To unravel the complete σ(E regulon in N. meningitidis, we sequenced total RNA transcriptional content of wild type meningococci and compared it with that of mseR mutant cells (ΔmseR in which σ(E is highly expressed. Eleven coding genes and one non-coding gene were found to be differentially expressed between H44/76 wildtype and H44/76ΔmseR cells. Five of the 6 genes of the σ(E operon, msrA/msrB, and the gene encoding a pepSY-associated TM helix family protein showed enhanced transcription, whilst aniA encoding a nitrite reductase and nspA encoding the vaccine candidate Neisserial surface protein A showed decreased transcription. Analysis of differential expression in IGRs showed enhanced transcription of a non-coding RNA molecule, identifying a σ(E dependent small non-coding RNA. Together this constitutes the first complete exploration of an alternative σ-factor regulon in N. meningitidis. The results direct to a relatively small regulon indicative for a strictly defined response consistent with a relatively stable niche, the human throat, where N. meningitidis resides.

  5. Genome Sequencing and Comparative Transcriptomics of the Model Entomopathogenic Fungi Metarhizium anisopliae and M. acridum

    Science.gov (United States)

    Shang, Yanfang; Duan, Zhibing; Hu, Xiao; Xie, Xue-Qin; Zhou, Gang; Peng, Guoxiong; Luo, Zhibing; Huang, Wei; Wang, Bing; Fang, Weiguo; Wang, Sibao; Zhong, Yi; Ma, Li-Jun; St. Leger, Raymond J.; Zhao, Guo-Ping; Pei, Yan; Feng, Ming-Guang; Xia, Yuxian; Wang, Chengshu

    2011-01-01

    Metarhizium spp. are being used as environmentally friendly alternatives to chemical insecticides, as model systems for studying insect-fungus interactions, and as a resource of genes for biotechnology. We present a comparative analysis of the genome sequences of the broad-spectrum insect pathogen Metarhizium anisopliae and the acridid-specific M. acridum. Whole-genome analyses indicate that the genome structures of these two species are highly syntenic and suggest that the genus Metarhizium evolved from plant endophytes or pathogens. Both M. anisopliae and M. acridum have a strikingly larger proportion of genes encoding secreted proteins than other fungi, while ∼30% of these have no functionally characterized homologs, suggesting hitherto unsuspected interactions between fungal pathogens and insects. The analysis of transposase genes provided evidence of repeat-induced point mutations occurring in M. acridum but not in M. anisopliae. With the help of pathogen-host interaction gene database, ∼16% of Metarhizium genes were identified that are similar to experimentally verified genes involved in pathogenicity in other fungi, particularly plant pathogens. However, relative to M. acridum, M. anisopliae has evolved with many expanded gene families of proteases, chitinases, cytochrome P450s, polyketide synthases, and nonribosomal peptide synthetases for cuticle-degradation, detoxification, and toxin biosynthesis that may facilitate its ability to adapt to heterogenous environments. Transcriptional analysis of both fungi during early infection processes provided further insights into the genes and pathways involved in infectivity and specificity. Of particular note, M. acridum transcribed distinct G-protein coupled receptors on cuticles from locusts (the natural hosts) and cockroaches, whereas M. anisopliae transcribed the same receptor on both hosts. This study will facilitate the identification of virulence genes and the development of improved biocontrol strains

  6. SNP detection from de novo transcriptome sequencing in the bivalve Macoma balthica: marker development for evolutionary studies.

    Directory of Open Access Journals (Sweden)

    Eric Pante

    Full Text Available Hybrid zones are noteworthy systems for the study of environmental adaptation to fast-changing environments, as they constitute reservoirs of polymorphism and are key to the maintenance of biodiversity. They can move in relation to climate fluctuations, as temperature can affect both selection and migration, or remain trapped by environmental and physical barriers. There is therefore a very strong incentive to study the dynamics of hybrid zones subjected to climate variations. The infaunal bivalve Macoma balthica emerges as a noteworthy model species, as divergent lineages hybridize, and its native NE Atlantic range is currently contracting to the North. To investigate the dynamics and functioning of hybrid zones in M. balthica, we developed new molecular markers by sequencing the collective transcriptome of 30 individuals. Ten individuals were pooled for each of the three populations sampled at the margins of two hybrid zones. A single 454 run generated 277 Mb from which 17K SNPs were detected. SNP density averaged 1 polymorphic site every 14 to 19 bases, for mitochondrial and nuclear loci, respectively. An [Formula: see text] scan detected high genetic divergence among several hundred SNPs, some of them involved in energetic metabolism, cellular respiration and physiological stress. The high population differentiation, recorded for nuclear-encoded ATP synthase and NADH dehydrogenase as well as most mitochondrial loci, suggests cytonuclear genetic incompatibilities. Results from this study will help pave the way to a high-resolution study of hybrid zone dynamics in M. balthica, and the relative importance of endogenous and exogenous barriers to gene flow in this system.

  7. Transcriptome analysis of salinity responsiveness in contrasting genotypes of finger millet (Eleusine coracana L.) through RNA-sequencing.

    Science.gov (United States)

    Rahman, Hifzur; Jagadeeshselvam, N; Valarmathi, R; Sachin, B; Sasikala, R; Senthil, N; Sudhakar, D; Robin, S; Muthurajan, Raveendran

    2014-07-01

    Finger millet (Eleusine coracana L.) is a hardy cereal known for its superior level of tolerance against drought, salinity, diseases and its nutritional properties. In this study, attempts were made to unravel the physiological and molecular basis of salinity tolerance in two contrasting finger millet genotypes viz., CO 12 and Trichy 1. Physiological studies revealed that the tolerant genotype Trichy 1 had lower Na(+) to K(+) ratio in leaves and shoots, higher growth rate (osmotic tolerance) and ability to accumulate higher amount of total soluble sugar in leaves under salinity stress. We sequenced the salinity responsive leaf transcriptome of contrasting finger millet genotypes using IonProton platform and generated 27.91 million reads. Mapping and annotation of finger millet transcripts against rice gene models led to the identification of salinity responsive genes and genotype specific responses. Several functional groups of genes like transporters, transcription factors, genes involved in cell signaling, osmotic homeostasis and biosynthesis of compatible solutes were found to be highly up-regulated in the tolerant Trichy 1. Salinity stress inhibited photosynthetic capacity and photosynthesis related genes in the susceptible genotype CO 12. Several genes involved in cell growth and differentiation were found to be up-regulated in both the genotypes but more specifically in tolerant genotype. Genes involved in flavonoid biosynthesis were found to be down-regulated specifically in the salinity tolerant Trichy 1. This study provides a genome-wide transcriptional analysis of two finger millet genotypes differing in their level of salinity tolerance during a gradually progressing salinity stress under greenhouse conditions.

  8. Transcriptome profiling and digital gene expression by deep sequencing in early somatic embryogenesis of endangered medicinal Eleutherococcus senticosus Maxim.

    Science.gov (United States)

    Tao, Lei; Zhao, Yue; Wu, Ying; Wang, Qiuyu; Yuan, Hongmei; Zhao, Lijuan; Guo, Wendong; You, Xiangling

    2016-03-01

    Somatic embryogenesis (SE) has been studied as a model system to understand molecular events in physiology, biochemistry, and cytology during plant embryo development. In particular, it is exceedingly difficult to access the morphological and early regulatory events in zygotic embryos. To understand the molecular mechanisms regulating early SE in Eleutherococcus senticosus Maxim., we used high-throughput RNA-Seq technology to investigate its transcriptome. We obtained 58,327,688 reads, which were assembled into 75,803 unique unigenes. To better understand their functions, the unigenes were annotated using the Clusters of Orthologous Groups, Gene Ontology, and Kyoto Encyclopedia of Genes and Genomes databases. Digital gene expression libraries revealed differences in gene expression profiles at different developmental stages (embryogenic callus, yellow embryogenic callus, global embryo). We obtained a sequencing depth of >5.6 million tags per sample and identified many differentially expressed genes at various stages of SE. The initiation of SE affected gene expression in many KEGG pathways, but predominantly that in metabolic pathways, biosynthesis of secondary metabolites, and plant hormone signal transduction. This information on the changes in the multiple pathways related to SE induction in E. senticosus Maxim. embryogenic tissue will contribute to a more comprehensive understanding of the mechanisms involved in early SE. Additionally, the differentially expressed genes may act as molecular markers and could play very important roles in the early stage of SE. The results are a comprehensive molecular biology resource for investigating SE of E. senticosus Maxim. Copyright © 2015 Elsevier B.V. All rights reserved.

  9. High-throughput sequencing of microRNA transcriptome and expression assay in the sturgeon, Acipenser schrenckii.

    Directory of Open Access Journals (Sweden)

    Lihong Yuan

    Full Text Available Sturgeons are considered as living fossils and have very high evolutionary, economical and conservation values. The multiploidy of sturgeon that has been caused by chromosome duplication may lead to the emergence of new microRNAs (miRNAs involved in the ploidy and physiological processes. In the present study, we performed the first sturgeon miRNAs analysis by RNA-seq high-throughput sequencing combined with expression assay of microarray and real-time PCR, and aimed to discover the sturgeon-specific miRNAs, confirm the expressed pattern of miRNAs and illustrate the potential role of miRNAs-targets on sturgeon biological processes. A total of 103 miRNAs were identified, including 58 miRNAs with strongly detected signals (signal >500 and P≤0.01, which were detected by microarray. Real-time PCR assay supported the expression pattern obtained by microarray. Moreover, co-expression of 21 miRNAs in all five tissues and tissue-specific expression of 16 miRNAs implied the crucial and particular function of them in sturgeon physiological processes. Target gene prediction, especially the enriched functional gene groups (369 GO terms and pathways (37 KEGG regulated by 58 miRNAs (P<0.05, illustrated the interaction of miRNAs and putative mRNAs, and also the potential mechanism involved in these biological processes. Our new findings of sturgeon miRNAs expand the public database of transcriptome information for this species, contribute to our understanding of sturgeon biology, and also provide invaluable data that may be applied in sturgeon breeding.

  10. High-throughput sequencing of microRNA transcriptome and expression assay in the sturgeon, Acipenser schrenckii.

    Science.gov (United States)

    Yuan, Lihong; Zhang, Xiujuan; Li, Linmiao; Jiang, Haiying; Chen, Jinping

    2014-01-01

    Sturgeons are considered as living fossils and have very high evolutionary, economical and conservation values. The multiploidy of sturgeon that has been caused by chromosome duplication may lead to the emergence of new microRNAs (miRNAs) involved in the ploidy and physiological processes. In the present study, we performed the first sturgeon miRNAs analysis by RNA-seq high-throughput sequencing combined with expression assay of microarray and real-time PCR, and aimed to discover the sturgeon-specific miRNAs, confirm the expressed pattern of miRNAs and illustrate the potential role of miRNAs-targets on sturgeon biological processes. A total of 103 miRNAs were identified, including 58 miRNAs with strongly detected signals (signal >500 and P≤0.01), which were detected by microarray. Real-time PCR assay supported the expression pattern obtained by microarray. Moreover, co-expression of 21 miRNAs in all five tissues and tissue-specific expression of 16 miRNAs implied the crucial and particular function of them in sturgeon physiological processes. Target gene prediction, especially the enriched functional gene groups (369 GO terms) and pathways (37 KEGG) regulated by 58 miRNAs (P<0.05), illustrated the interaction of miRNAs and putative mRNAs, and also the potential mechanism involved in these biological processes. Our new findings of sturgeon miRNAs expand the public database of transcriptome information for this species, contribute to our understanding of sturgeon biology, and also provide invaluable data that may be applied in sturgeon breeding.

  11. Transcriptome Profiling Using Single-Molecule Direct RNA Sequencing Approach for In-depth Understanding of Genes in Secondary Metabolism Pathways of Camellia sinensis

    Directory of Open Access Journals (Sweden)

    Qingshan Xu

    2017-07-01

    Full Text Available Characteristic secondary metabolites, including flavonoids, theanine and caffeine, are important components of Camellia sinensis, and their biosynthesis has attracted widespread interest. Previous studies on the biosynthesis of these major secondary metabolites using next-generation sequencing technologies limited the accurately prediction of full-length (FL splice isoforms. Herein, we applied single-molecule sequencing to pooled tea plant tissues, to provide a more complete transcriptome of C. sinensis. Moreover, we identified 94 FL transcripts and four alternative splicing events for enzyme-coding genes involved in the biosynthesis of flavonoids, theanine and caffeine. According to the comparison between long-read isoforms and assemble transcripts, we improved the quality and accuracy of genes sequenced by short-read next-generation sequencing technology. The resulting FL transcripts, together with the improved assembled transcripts and identified alternative splicing events, enhance our understanding of genes involved in the biosynthesis of characteristic secondary metabolites in C. sinensis.

  12. Transcriptome Analysis of Ceriops tagal in Saline Environments Using RNA-Sequencing.

    Directory of Open Access Journals (Sweden)

    Xiaorong Xiao

    Full Text Available Identification of genes involved in mangrove species' adaptation to salt stress can provide valuable information for developing salt-tolerant crops and understanding the molecular evolution of salt tolerance in halophiles. Ceriops tagal is a salt-tolerant mangrove tree growing in mudflats and marshes in tropical and subtropical areas, without any prior genome information. In this study, we assessed the biochemical and transcriptional responses of C. tagal to high salt treatment (500 mmol/L NaCl by hydroponic experiments and RNA-seq. In C. tagal root tissues under salt stress, proline accumulated strongly from 3 to 12 h of treatment; meanwhile, malondialdehyde content progressively increased from 0 to 9 h, then dropped to lower than control levels by 24 h. These implied that C. tagal plants could survive salt stress through biochemical modification. Using the Illumina sequencing platform, approximately 27.39 million RNA-seq reads were obtained from three salt-treated and control (untreated root samples. These reads were assembled into 47,111 transcripts with an average length of 514 bp and an N50 of 632 bp. Approximately 78% of the transcripts were annotated, and a total of 437 genes were putative transcription factors. Digital gene expression analysis was conducted by comparing transcripts from the untreated control to the three salt treated samples, and 7,330 differentially expressed transcripts were identified. Using k-means clustering, these transcripts were divided into six clusters that differed in their expression patterns across four treatment time points. The genes identified as being up- or downregulated are involved in salt stress responses, signal transduction, and DNA repair. Our study shows the main adaptive pathway of C. tagal in saline environments, under short-term and long-term treatments of salt stress. This provides vital clues as to which genes may be candidates for breeding salt-tolerant crops and clarifying molecular

  13. De novo transcriptome sequencing of the Octopus vulgaris hemocytes using Illumina RNA-Seq technology: response to the infection by the gastrointestinal parasite Aggregata octopiana.

    Science.gov (United States)

    Castellanos-Martínez, Sheila; Arteta, David; Catarino, Susana; Gestal, Camino

    2014-01-01

    Octopus vulgaris is a highly valuable species of great commercial interest and excellent candidate for aquaculture diversification; however, the octopus' well-being is impaired by pathogens, of which the gastrointestinal coccidian parasite Aggregata octopiana is one of the most important. The knowledge of the molecular mechanisms of the immune response in cephalopods, especially in octopus is scarce. The transcriptome of the hemocytes of O. vulgaris was de novo sequenced using the high-throughput paired-end Illumina technology to identify genes involved in immune defense and to understand the molecular basis of octopus tolerance/resistance to coccidiosis. A bi-directional mRNA library was constructed from hemocytes of two groups of octopus according to the infection by A. octopiana, sick octopus, suffering coccidiosis, and healthy octopus, and reads were de novo assembled together. The differential expression of transcripts was analysed using the general assembly as a reference for mapping the reads from each condition. After sequencing, a total of 75,571,280 high quality reads were obtained from the sick octopus group and 74,731,646 from the healthy group. The general transcriptome of the O. vulgaris hemocytes was assembled in 254,506 contigs. A total of 48,225 contigs were successfully identified, and 538 transcripts exhibited differential expression between groups of infection. The general transcriptome revealed genes involved in pathways like NF-kB, TLR and Complement. Differential expression of TLR-2, PGRP, C1q and PRDX genes due to infection was validated using RT-qPCR. In sick octopuses, only TLR-2 was up-regulated in hemocytes, but all of them were up-regulated in caecum and gills. The transcriptome reported here de novo establishes the first molecular clues to understand how the octopus immune system works and interacts with a highly pathogenic coccidian. The data provided here will contribute to identification of biomarkers for octopus resistance against

  14. Transcriptome Sequencing and Differential Gene Expression Analysis of Delayed Gland Morphogenesis in Gossypium australe during Seed Germination

    Science.gov (United States)

    Tao, Tao; Zhao, Liang; Lv, Yuanda; Chen, Jiedan; Hu, Yan; Zhang, Tianzhen; Zhou, Baoliang

    2013-01-01

    The genus Gossypium is a globally important crop that is used to produce textiles, oil and protein. However, gossypol, which is found in cultivated cottonseed, is toxic to humans and non-ruminant animals. Efforts have been made to breed improved cultivated cotton with lower gossypol content. The delayed gland morphogenesis trait possessed by some Australian wild cotton species may enable the widespread, direct usage of cottonseed. However, the mechanisms about the delayed gland morphogenesis are still unknown. Here, we sequenced the first Australian wild cotton species ( Gossypium australe ) and a diploid cotton species ( Gossypium arboreum ) using the Illumina Hiseq 2000 RNA-seq platform to help elucidate the mechanisms underlying gossypol synthesis and gland development. Paired-end Illumina short reads were de novo assembled into 226,184, 213,257 and 275,434 transcripts, clustering into 61,048, 47,908 and 72,985 individual clusters with N50 lengths of 1,710 bp, 1544 BP and 1,743 bp, respectively. The clustered Unigenes were searched against three public protein databases (TrEMBL, SwissProt and RefSeq) and the nucleotide and protein sequences of Gossypium raimondii using BLASTx and BLASTn. A total of 21,987, 17,209 and 25,325 Unigenes were annotated. Of these, 18,766 (85.4%), 14,552 (84.6%) and 21,374 (84.4%) Unigenes could be assigned to GO-term classifications. We identified and analyzed 13,884 differentially expressed Unigenes by clustering and functional enrichment. Terpenoid-related biosynthesis pathways showed differentially regulated expression patterns between the two cotton species. Phylogenetic analysis of the terpene synthases family was also carried out to clarify the classifications of TPSs. RNA-seq data from two distinct cotton species provide comprehensive transcriptome annotation resources and global gene expression profiles during seed germination and gland and gossypol formation. These data may be used to further elucidate various mechanisms and

  15. Transcriptome Sequencing Analysis and Functional Identification of Sex Differentiation Genes from the Mosquito Parasitic Nematode, Romanomermis wuchangensis.

    Directory of Open Access Journals (Sweden)

    Mingyue Duan

    Full Text Available Mosquito-transmitted diseases like malaria and dengue fever are global problem and an estimated 50-100 million of dengue or dengue hemorrhagic fever cases are reported worldwide every year. The mermithid nematode Romanomermis wuchangensis has been successfully used as an ecosystem-friendly biocontrol agent for mosquito prevention in laboratory studies. However, this nematode can not undergo sex differentiation in vitro culture, which has seriously affected their application of biocontrol in the field. In this study, based on transcriptome sequencing analysis of R. wuchangensis, Rwucmab-3, Rwuclaf-1 and Rwuctra-2 were cloned and used to investigate molecular regulatory function of sex differentiation. qRT-PCR results demonstrated that the expression level of Rwucmab-3 between male and female displayed obvious difference on the 3rd day of parasitic stage, which was earlier than Rwuclaf-1 and Rwuctra-2, highlighting sex differentiation process may start on the 3rd day of parasitic stage. Besides, FITC was used as a marker to test dsRNA uptake efficiency of R. wuchangensis, which fluorescence intensity increased with FITC concentration after 16 h incubation, indicating this nematode can successfully ingest soaking solution via its cuticle. RNAi results revealed the sex ratio of R. wuchangensis from RNAi treated groups soaked in dsRNA of Rwucmab-3 was significantly higher than gfp dsRNA treated groups and control groups, highlighting RNAi of Rwumab-3 may hinder the development of male nematodes. These results suggest that Rwucmab-3 mainly involves in the initiation of sex differentiation and the development of male sexual dimorphism. Rwuclaf-1 and Rwuctra-2 may play vital role in nematode reproductive and developmental system. In conclusion, transcript sequences presented in this study could provide more bioinformatics resources for future studies on gene cloning and other molecular regulatory mechanism in R. wuchangensis. Moreover, identification

  16. High-throughput sequencing of small RNA transcriptome reveals salt stress regulated microRNAs in sugarcane.

    Directory of Open Access Journals (Sweden)

    Mariana Carnavale Bottino

    Full Text Available Salt stress is a primary cause of crop losses worldwide, and it has been the subject of intense investigation to unravel the complex mechanisms responsible for salinity tolerance. MicroRNA is implicated in many developmental processes and in responses to various abiotic stresses, playing pivotal roles in plant adaptation. Deep sequencing technology was chosen to determine the small RNA transcriptome of Saccharum sp cultivars grown on saline conditions. We constructed four small RNAs libraries prepared from plants grown on hydroponic culture submitted to 170 mM NaCl and harvested after 1 h, 6 hs and 24 hs. Each library was sequenced individually and together generated more than 50 million short reads. Ninety-eight conserved miRNAs and 33 miRNAs* were identified by bioinformatics. Several of the microRNA showed considerable differences of expression in the four libraries. To confirm the results of the bioinformatics-based analysis, we studied the expression of the 10 most abundant miRNAs and 1 miRNA* in plants treated with 170 mM NaCl and in plants with a severe treatment of 340 mM NaCl. The results showed that 11 selected miRNAs had higher expression in samples treated with severe salt treatment compared to the mild one. We also investigated the regulation of the same miRNAs in shoots of four cultivars grown on soil treated with 170 mM NaCl. Cultivars could be grouped according to miRNAs expression in response to salt stress. Furthermore, the majority of the predicted target genes had an inverse regulation with their correspondent microRNAs. The targets encode a wide range of proteins, including transcription factors, metabolic enzymes and genes involved in hormone signaling, probably assisting the plants to develop tolerance to salinity. Our work provides insights into the regulatory functions of miRNAs, thereby expanding our knowledge on potential salt-stressed regulated genes.

  17. Deep mRNA sequencing of the Tritonia diomedea brain transcriptome provides access to gene homologues for neuronal excitability, synaptic transmission and peptidergic signalling.

    Directory of Open Access Journals (Sweden)

    Adriano Senatore

    Full Text Available The sea slug Tritonia diomedea (Mollusca, Gastropoda, Nudibranchia, has a simple and highly accessible nervous system, making it useful for studying neuronal and synaptic mechanisms underlying behavior. Although many important contributions have been made using Tritonia, until now, a lack of genetic information has impeded exploration at the molecular level.We performed Illumina sequencing of central nervous system mRNAs from Tritonia, generating 133.1 million 100 base pair, paired-end reads. De novo reconstruction of the RNA-Seq data yielded a total of 185,546 contigs, which partitioned into 123,154 non-redundant gene clusters (unigenes. BLAST comparison with RefSeq and Swiss-Prot protein databases, as well as mRNA data from other invertebrates (gastropod molluscs: Aplysia californica, Lymnaea stagnalis and Biomphalaria glabrata; cnidarian: Nematostella vectensis revealed that up to 76,292 unigenes in the Tritonia transcriptome have putative homologues in other databases, 18,246 of which are below a more stringent E-value cut-off of 1x10-6. In silico prediction of secreted proteins from the Tritonia transcriptome shotgun assembly (TSA produced a database of 579 unique sequences of secreted proteins, which also exhibited markedly higher expression levels compared to other genes in the TSA.Our efforts greatly expand the availability of gene sequences available for Tritonia diomedea. We were able to extract full length protein sequences for most queried genes, including those involved in electrical excitability, synaptic vesicle release and neurotransmission, thus confirming that the transcriptome will serve as a useful tool for probing the molecular correlates of behavior in this species. We also generated a neurosecretome database that will serve as a useful tool for probing peptidergic signalling systems in the Tritonia brain.

  18. Transcriptome Sequence and Plasmid Copy Number Analysis of the Brewery Isolate Pediococcus claussenii ATCC BAA-344T during Growth in Beer

    Science.gov (United States)

    Pittet, Vanessa; Phister, Trevor G.; Ziola, Barry

    2013-01-01

    Growth of specific lactic acid bacteria in beer leads to spoiled product and economic loss for the brewing industry. Microbial growth is typically inhibited by the combined stresses found in beer (e.g., ethanol, hops, low pH, minimal nutrients); however, certain bacteria have adapted to grow in this harsh environment. Considering little is known about the mechanisms used by bacteria to grow in and spoil beer, transcriptome sequencing was performed on a variant of the beer-spoilage organism Pediococcus claussenii ATCC BAA-344T (Pc344-358). Illumina sequencing was used to compare the transcript levels in Pc344-358 growing mid-exponentially in beer to those in nutrient-rich MRS broth. Various operons demonstrated high gene expression in beer, several of which are involved in nutrient acquisition and overcoming the inhibitory effects of hop compounds. As well, genes functioning in cell membrane modification and biosynthesis demonstrated significantly higher transcript levels in Pc344-358 growing in beer. Three plasmids had the majority of their genes showing increased transcript levels in beer, whereas the two cryptic plasmids showed slightly decreased gene expression. Follow-up analysis of plasmid copy number in both growth environments revealed similar trends, where more copies of the three non-cryptic plasmids were found in Pc344-358 growing in beer. Transcriptome sequencing also enabled the addition of several genes to the P . claussenii ATCC BAA-344T genome annotation, some of which are putatively transcribed as non-coding RNAs. The sequencing results not only provide the first transcriptome description of a beer-spoilage organism while growing in beer, but they also highlight several targets for future exploration, including genes that may have a role in the general stress response of lactic acid bacteria. PMID:24040005

  19. Extended RF shimming: Sequence-level parallel transmission optimization applied to steady-state free precession MRI of the heart.

    Science.gov (United States)

    Beqiri, Arian; Price, Anthony N; Padormo, Francesco; Hajnal, Joseph V; Malik, Shaihan J

    2017-06-01

    Cardiac magnetic resonance imaging (MRI) at high field presents challenges because of the high specific absorption rate and significant transmit field (B 1 + ) inhomogeneities. Parallel transmission MRI offers the ability to correct for both issues at the level of individual radiofrequency (RF) pulses, but must operate within strict hardware and safety constraints. The constraints are themselves affected by sequence parameters, such as the RF pulse duration and TR, meaning that an overall optimal operating point exists for a given sequence. This work seeks to obtain optimal performance by performing a 'sequence-level' optimization in which pulse sequence parameters are included as part of an RF shimming calculation. The method is applied to balanced steady-state free precession cardiac MRI with the objective of minimizing TR, hence reducing the imaging duration. Results are demonstrated using an eight-channel parallel transmit system operating at 3 T, with an in vivo study carried out on seven male subjects of varying body mass index (BMI). Compared with single-channel operation, a mean-squared-error shimming approach leads to reduced imaging durations of 32 ± 3% with simultaneous improvement in flip angle homogeneity of 32 ± 8% within the myocardium. © 2017 The Authors. NMR in Biomedicine published by John Wiley & Sons Ltd.

  20. De novo assembly of human genomes with massively parallel short read sequencing

    DEFF Research Database (Denmark)

    Li, Ruiqiang; Zhu, Hongmei; Ruan, Jue

    2010-01-01

    genomes from short read sequences. We successfully assembled both the Asian and African human genome sequences, achieving an N50 contig size of 7.4 and 5.9 kilobases (kb) and scaffold of 446.3 and 61.9 kb, respectively. The development of this de novo short read assembly method creates new opportunities...... for building reference sequences and carrying out accurate analyses of unexplored genomes in a cost-effective way....

  1. A REVISED PARALLEL-SEQUENCE MORPHOLOGICAL CLASSIFICATION OF GALAXIES: STRUCTURE AND FORMATION OF S0 AND SPHEROIDAL GALAXIES

    International Nuclear Information System (INIS)

    Kormendy, John; Bender, Ralf

    2012-01-01

    We update van den Bergh's parallel-sequence galaxy classification in which S0 galaxies form a sequence S0a-S0b-S0c that parallels the sequence Sa-Sb-Sc of spiral galaxies. The ratio B/T of bulge-to-total light defines the position of a galaxy in this tuning-fork diagram. Our classification makes one major improvement. We extend the S0a-S0b-S0c sequence to spheroidal ('Sph') galaxies that are positioned in parallel to irregular galaxies in a similarly extended Sa-Sb-Sc-Im sequence. This provides a natural 'home' for spheroidals, which previously were omitted from galaxy classification schemes or inappropriately combined with ellipticals. To motivate our juxtaposition of Sph and Im galaxies, we present photometry and bulge-disk decompositions of four rare, late-type S0s that bridge the gap between the more common S0b and Sph galaxies. NGC 4762 is an edge-on SB0bc galaxy with a very small classical-bulge-to-total ratio of B/T = 0.13 ± 0.02. NGC 4452 is an edge-on SB0 galaxy with an even tinier pseudobulge-to-total ratio of PB/T = 0.017 ± 0.004. It is therefore an SB0c. VCC 2048, whose published classification is S0, contains an edge-on disk, but its 'bulge' plots in the structural parameter sequence of spheroidals. It is therefore a disky Sph. And NGC 4638 is similarly a 'missing link' between S0s and Sphs—it has a tiny bulge and an edge-on disk embedded in an Sph halo. In the Appendix, we present photometry and bulge-disk decompositions of all Hubble Space Telescope Advanced Camera for Surveys Virgo Cluster Survey S0s that do not have published decompositions. We use these data to update the structural parameter correlations of Sph, S+Im, and E galaxies. We show that Sph galaxies of increasing luminosity form a continuous sequence with the disks (but not bulges) of S0c-S0b-S0a galaxies. Remarkably, the Sph-S0-disk sequence is almost identical to that of Im galaxies and spiral galaxy disks. We review published observations for galaxy transformation processes

  2. A Revised Parallel-sequence Morphological Classification of Galaxies: Structure and Formation of S0 and Spheroidal Galaxies

    Science.gov (United States)

    Kormendy, John; Bender, Ralf

    2012-01-01

    We update van den Bergh's parallel-sequence galaxy classification in which S0 galaxies form a sequence S0a-S0b-S0c that parallels the sequence Sa-Sb-Sc of spiral galaxies. The ratio B/T of bulge-to-total light defines the position of a galaxy in this tuning-fork diagram. Our classification makes one major improvement. We extend the S0a-S0b-S0c sequence to spheroidal ("Sph") galaxies that are positioned in parallel to irregular galaxies in a similarly extended Sa-Sb-Sc-Im sequence. This provides a natural "home" for spheroidals, which previously were omitted from galaxy classification schemes or inappropriately combined with ellipticals. To motivate our juxtaposition of Sph and Im galaxies, we present photometry and bulge-disk decompositions of four rare, late-type S0s that bridge the gap between the more common S0b and Sph galaxies. NGC 4762 is an edge-on SB0bc galaxy with a very small classical-bulge-to-total ratio of B/T = 0.13 ± 0.02. NGC 4452 is an edge-on SB0 galaxy with an even tinier pseudobulge-to-total ratio of PB/T = 0.017 ± 0.004. It is therefore an SB0c. VCC 2048, whose published classification is S0, contains an edge-on disk, but its "bulge" plots in the structural parameter sequence of spheroidals. It is therefore a disky Sph. And NGC 4638 is similarly a "missing link" between S0s and Sphs—it has a tiny bulge and an edge-on disk embedded in an Sph halo. In the Appendix, we present photometry and bulge-disk decompositions of all Hubble Space Telescope Advanced Camera for Surveys Virgo Cluster Survey S0s that do not have published decompositions. We use these data to update the structural parameter correlations of Sph, S+Im, and E galaxies. We show that Sph galaxies of increasing luminosity form a continuous sequence with the disks (but not bulges) of S0c-S0b-S0a galaxies. Remarkably, the Sph-S0-disk sequence is almost identical to that of Im galaxies and spiral galaxy disks. We review published observations for galaxy transformation processes

  3. A Targeted Enrichment Strategy for Massively Parallel Sequencing of Angiosperm Plastid Genomes

    Directory of Open Access Journals (Sweden)

    Gregory W. Stull

    2013-02-01

    Full Text Available Premise of the study: We explored a targeted enrichment strategy to facilitate rapid and low-cost next-generation sequencing (NGS of numerous complete plastid genomes from across the phylogenetic breadth of angiosperms. Methods and Results: A custom RNA probe set including the complete sequences of 22 previously sequenced eudicot plastomes was designed to facilitate hybridization-based targeted enrichment of eudicot plastid genomes. Using this probe set and an Agilent SureSelect targeted enrichment kit, we conducted an enrichment experiment including 24 angiosperms (22 eudicots, two monocots, which were subsequently sequenced on a single lane of the Illumina GAIIx with single-end, 100-bp reads. This approach yielded nearly complete to complete plastid genomes with exceptionally high coverage (mean coverage: 717×, even for the two monocots. Conclusions: Our enrichment experiment was highly successful even though many aspects of the capture process employed were suboptimal. Hence, significant improvements to this methodology are feasible. With this general approach and probe set, it should be possible to sequence more than 300 essentially complete plastid genomes in a single Illumina GAIIx lane (achieving 50× mean coverage. However, given the complications of pooling numerous samples for multiplex sequencing and the limited number of barcodes (e.g., 96 available in commercial kits, we recommend 96 samples as a current practical maximum for multiplex plastome sequencing. This high-throughput approach should facilitate large-scale plastid genome sequencing at any level of phylogenetic diversity in angiosperms.

  4. Optimal control design of turbo spin-echo sequences with applications to parallel-transmit systems

    NARCIS (Netherlands)

    Sbrizzi, Alessandro; Hoogduin, Hans; Hajnal, Joseph V; van den Berg, CAT; Luijten, Peter R; Malik, Shaihan J

    PURPOSE: The design of turbo spin-echo sequences is modeled as a dynamic optimization problem which includes the case of inhomogeneous transmit radiofrequency fields. This problem is efficiently solved by optimal control techniques making it possible to design patient-specific sequences online.

  5. Transcriptome sequencing and annotation of the microalgae Dunaliella tertiolecta: Pathway description and gene discovery for production of next-generation biofuels

    Directory of Open Access Journals (Sweden)

    Bibby Kyle

    2011-03-01

    Full Text Available Abstract Background Biodiesel or ethanol derived from lipids or starch produced by microalgae may overcome many of the sustainability challenges previously ascribed to petroleum-based fuels and first generation plant-based biofuels. The paucity of microalgae genome sequences, however, limits gene-based biofuel feedstock optimization studies. Here we describe the sequencing and de novo transcriptome assembly for the non-model microalgae species, Dunaliella tertiolecta, and identify pathways and genes of importance related to biofuel production. Results Next generation DNA pyrosequencing technology applied to D. tertiolecta transcripts produced 1,363,336 high quality reads with an average length of 400 bases. Following quality and size trimming, ~ 45% of the high quality reads were assembled into 33,307 isotigs with a 31-fold coverage and 376,482 singletons. Assembled sequences and singletons were subjected to BLAST similarity searches and annotated with Gene Ontology (GO and Kyoto Encyclopedia of Genes and Genomes (KEGG orthology (KO identifiers. These analyses identified the majority of lipid and starch biosynthesis and catabolism pathways in D. tertiolecta. Conclusions The construction of metabolic pathways involved in the biosynthesis and catabolism of fatty acids, triacylglycrols, and starch in D. tertiolecta as well as the assembled transcriptome provide a foundation for the molecular genetics and functional genomics required to direct metabolic engineering efforts that seek to enhance the quantity and character of microalgae-based biofuel feedstock.

  6. Kmerind: A Flexible Parallel Library for K-mer Indexing of Biological Sequences on Distributed Memory Systems.

    Science.gov (United States)

    Pan, Tony; Flick, Patrick; Jain, Chirag; Liu, Yongchao; Aluru, Srinivas

    2017-10-09

    Counting and indexing fixed length substrings, or k-mers, in biological sequences is a key step in many bioinformatics tasks including genome alignment and mapping, genome assembly, and error correction. While advances in next generation sequencing technologies have dramatically reduced the cost and improved latency and throughput, few bioinformatics tools can efficiently process the datasets at the current generation rate of 1.8 terabases every 3 days. We present Kmerind, a high performance parallel k-mer indexing library for distributed memory environments. The Kmerind library provides a set of simple and consistent APIs with sequential semantics and parallel implementations that are designed to be flexible and extensible. Kmerind's k-mer counter performs similarly or better than the best existing k-mer counting tools even on shared memory systems. In a distributed memory environment, Kmerind counts k-mers in a 120 GB sequence read dataset in less than 13 seconds on 1024 Xeon CPU cores, and fully indexes their positions in approximately 17 seconds. Querying for 1% of the k-mers in these indices can be completed in 0.23 seconds and 28 seconds, respectively. Kmerind is the first k-mer indexing library for distributed memory environments, and the first extensible library for general k-mer indexing and counting. Kmerind is available at https://github.com/ParBLiSS/kmerind.

  7. Transcriptional Slippage and RNA Editing Increase the Diversity of Transcripts in Chloroplasts: Insight from Deep Sequencing of Vigna radiata Genome and Transcriptome.

    Directory of Open Access Journals (Sweden)

    Ching-Ping Lin

    Full Text Available We performed deep sequencing of the nuclear and organellar genomes of three mungbean genotypes: Vigna radiata ssp. sublobata TC1966, V. radiata var. radiata NM92 and the recombinant inbred line RIL59 derived from a cross between TC1966 and NM92. Moreover, we performed deep sequencing of the RIL59 transcriptome to investigate transcript variability. The mungbean chloroplast genome has a quadripartite structure including a pair of inverted repeats separated by two single copy regions. A total of 213 simple sequence repeats were identified in the chloroplast genomes of NM92 and RIL59; 78 single nucleotide variants and nine indels were discovered in comparing the chloroplast genomes of TC1966 and NM92. Analysis of the mungbean chloroplast transcriptome revealed mRNAs that were affected by transcriptional slippage and RNA editing. Transcriptional slippage frequency was positively correlated with the length of simple sequence repeats of the mungbean chloroplast genome (R2=0.9911. In total, 41 C-to-U editing sites were found in 23 chloroplast genes and in one intergenic spacer. No editing site that swapped U to C was found. A combination of bioinformatics and experimental methods revealed that the plastid-encoded RNA polymerase-transcribed genes psbF and ndhA are affected by transcriptional slippage in mungbean and in main lineages of land plants, including three dicots (Glycine max, Brassica rapa, and Nicotiana tabacum, two monocots (Oryza sativa and Zea mays, two gymnosperms (Pinus taeda and Ginkgo biloba and one moss (Physcomitrella patens. Transcript analysis of the rps2 gene showed that transcriptional slippage could affect transcripts at single sequence repeat regions with poly-A runs. It showed that transcriptional slippage together with incomplete RNA editing may cause sequence diversity of transcripts in chloroplasts of land plants.

  8. Differential transcriptome profiling of chilling stress response between shoots and rhizomes of Oryza longistaminata using RNA sequencing.

    Directory of Open Access Journals (Sweden)

    Ting Zhang

    Full Text Available Rice (Oryza sativa is very sensitive to chilling stress at seedling and reproductive stages, whereas wild rice, O. longistaminata, tolerates non-freezing cold temperatures and has overwintering ability. Elucidating the molecular mechanisms of chilling tolerance (CT in O. longistaminata should thus provide a basis for rice CT improvement through molecular breeding. In this study, high-throughput RNA sequencing was performed to profile global transcriptome alterations and crucial genes involved in response to long-term low temperature in O. longistaminata shoots and rhizomes subjected to 7 days of chilling stress. A total of 605 and 403 genes were respectively identified as up- and down-regulated in O. longistaminata under 7 days of chilling stress, with 354 and 371 differentially expressed genes (DEGs found exclusively in shoots and rhizomes, respectively. GO enrichment and KEGG pathway analyses revealed that multiple transcriptional regulatory pathways were enriched in commonly induced genes in both tissues; in contrast, only the photosynthesis pathway was prevalent in genes uniquely induced in shoots, whereas several key metabolic pathways and the programmed cell death process were enriched in genes induced only in rhizomes. Further analysis of these tissue-specific DEGs showed that the CBF/DREB1 regulon and other transcription factors (TFs, including AP2/EREBPs, MYBs, and WRKYs, were synergistically involved in transcriptional regulation of chilling stress response in shoots. Different sets of TFs, such as OsERF922, OsNAC9, OsWRKY25, and WRKY74, and eight genes encoding antioxidant enzymes were exclusively activated in rhizomes under long-term low-temperature treatment. Furthermore, several cis-regulatory elements, including the ICE1-binding site, the GATA element for phytochrome regulation, and the W-box for WRKY binding, were highly abundant in both tissues, confirming the involvement of multiple regulatory genes and complex networks in the

  9. Sequencing, De Novo Assembly, and Annotation of the Transcriptome of the Endangered Freshwater Pearl Bivalve, Cristaria plicata, Provides Novel Insights into Functional Genes and Marker Discovery.

    Directory of Open Access Journals (Sweden)

    Bharat Bhusan Patnaik

    Full Text Available The freshwater mussel Cristaria plicata (Bivalvia: Eulamellibranchia: Unionidae, is an economically important species in molluscan aquaculture due to its use in pearl farming. The species have been listed as endangered in South Korea due to the loss of natural habitats caused by anthropogenic activities. The decreasing population and a lack of genomic information on the species is concerning for environmentalists and conservationists. In this study, we conducted a de novo transcriptome sequencing and annotation analysis of C. plicata using Illumina HiSeq 2500 next-generation sequencing (NGS technology, the Trinity assembler, and bioinformatics databases to prepare a sustainable resource for the identification of candidate genes involved in immunity, defense, and reproduction.The C. plicata transcriptome analysis included a total of 286,152,584 raw reads and 281,322,837 clean reads. The de novo assembly identified a total of 453,931 contigs and 374,794 non-redundant unigenes with average lengths of 731.2 and 737.1 bp, respectively. Furthermore, 100% coverage of C. plicata mitochondrial genes within two unigenes supported the quality of the assembler. In total, 84,274 unigenes showed homology to entries in at least one database, and 23,246 unigenes were allocated to one or more Gene Ontology (GO terms. The most prominent GO biological process, cellular component, and molecular function categories (level 2 were cellular process, membrane, and binding, respectively. A total of 4,776 unigenes were mapped to 123 biological pathways in the KEGG database. Based on the GO terms and KEGG annotation, the unigenes were suggested to be involved in immunity, stress responses, sex-determination, and reproduction. A total of 17,251 cDNA simple sequence repeats (cSSRs were identified from 61,141 unigenes (size of >1 kb with the most abundant being dinucleotide repeats.This dataset represents the first transcriptome analysis of the endangered mollusc, C. plicata

  10. A bumpy ride on the diagnostic bench of massive parallel sequencing, the case of the mitochondrial genome.

    Directory of Open Access Journals (Sweden)

    Kim Vancampenhout

    Full Text Available The advent of massive parallel sequencing (MPS has revolutionized the field of human molecular genetics, including the diagnostic study of mitochondrial (mt DNA dysfunction. The analysis of the complete mitochondrial genome using MPS platforms is now common and will soon outrun conventional sequencing. However, the development of a robust and reliable protocol is rather challenging. A previous pilot study for the re-sequencing of human mtDNA revealed an uneven coverage, affecting predominantly part of the plus strand. In an attempt to address this problem, we undertook a comparative study of standard and modified protocols for the Ion Torrent PGM system. We could not improve strand representation by altering the recommended shearing methodology of the standard workflow or omitting the DNA polymerase amplification step from the library construction process. However, we were able to associate coverage bias of the plus strand with a specific sequence motif. Additionally, we compared coverage and variant calling across technologies. The same samples were also sequenced on a MiSeq device which showed that coverage and heteroplasmic variant calling were much improved.

  11. Convergent Evolution of Hemoglobin Function in High-Altitude Andean Waterfowl Involves Limited Parallelism at the Molecular Sequence Level.

    Directory of Open Access Journals (Sweden)

    Chandrasekhar Natarajan

    2015-12-01

    Full Text Available A fundamental question in evolutionary genetics concerns the extent to which adaptive phenotypic convergence is attributable to convergent or parallel changes at the molecular sequence level. Here we report a comparative analysis of hemoglobin (Hb function in eight phylogenetically replicated pairs of high- and low-altitude waterfowl taxa to test for convergence in the oxygenation properties of Hb, and to assess the extent to which convergence in biochemical phenotype is attributable to repeated amino acid replacements. Functional experiments on native Hb variants and protein engineering experiments based on site-directed mutagenesis revealed the phenotypic effects of specific amino acid replacements that were responsible for convergent increases in Hb-O2 affinity in multiple high-altitude taxa. In six of the eight taxon pairs, high-altitude taxa evolved derived increases in Hb-O2 affinity that were caused by a combination of unique replacements, parallel replacements (involving identical-by-state variants with independent mutational origins in different lineages, and collateral replacements (involving shared, identical-by-descent variants derived via introgressive hybridization. In genome scans of nucleotide differentiation involving high- and low-altitude populations of three separate species, function-altering amino acid polymorphisms in the globin genes emerged as highly significant outliers, providing independent evidence for adaptive divergence in Hb function. The experimental results demonstrate that convergent changes in protein function can occur through multiple historical paths, and can involve multiple possible mutations. Most cases of convergence in Hb function did not involve parallel substitutions and most parallel substitutions did not affect Hb-O2 affinity, indicating that the repeatability of phenotypic evolution does not require parallelism at the molecular level.

  12. New strategy for eliminating zero-sequence circulating current between parallel operating three-level NPC voltage source inverters

    DEFF Research Database (Denmark)

    Li, Kai; Dong, Zhenhua; Wang, Xiaodong

    2018-01-01

    A novel strategy based on a zero common mode voltage pulse-width modulation (ZCMV-PWM) technique and zero-sequence circulating current (ZSCC) feedback control is proposed in this study to eliminate ZSCCs between three-level neutral point clamped (NPC) voltage source inverters, with common AC and DC......, the ZCMV-PWM method is presented to reduce CMVs, and a simple electric circuit is adopted to control ZSCCs and neutral point potential. Finally, simulation and experiment are conducted to illustrate effectiveness of the proposed strategy. Results show that ZSCCs between paralleled inverters can...

  13. FDSTools: A software package for analysis of massively parallel sequencing data with the ability to recognise and correct STR stutter and other PCR or sequencing noise.

    Science.gov (United States)

    Hoogenboom, Jerry; van der Gaag, Kristiaan J; de Leeuw, Rick H; Sijen, Titia; de Knijff, Peter; Laros, Jeroen F J

    2017-03-01

    Massively parallel sequencing (MPS) is on the advent of a broad scale application in forensic research and casework. The improved capabilities to analyse evidentiary traces representing unbalanced mixtures is often mentioned as one of the major advantages of this technique. However, most of the available software packages that analyse forensic short tandem repeat (STR) sequencing data are not well suited for high throughput analysis of such mixed traces. The largest challenge is the presence of stutter artefacts in STR amplifications, which are not readily discerned from minor contributions. FDSTools is an open-source software solution developed for this purpose. The level of stutter formation is influenced by various aspects of the sequence, such as the length of the longest uninterrupted stretch occurring in an STR. When MPS is used, STRs are evaluated as sequence variants that each have particular stutter characteristics which can be precisely determined. FDSTools uses a database of reference samples to determine stutter and other systemic PCR or sequencing artefacts for each individual allele. In addition, stutter models are created for each repeating element in order to predict stutter artefacts for alleles that are not included in the reference set. This information is subsequently used to recognise and compensate for the noise in a sequence profile. The result is a better representation of the true composition of a sample. Using Promega Powerseq™ Auto System data from 450 reference samples and 31 two-person mixtures, we show that the FDSTools correction module decreases stutter ratios above 20% to below 3%. Consequently, much lower levels of contributions in the mixed traces are detected. FDSTools contains modules to visualise the data in an interactive format allowing users to filter data with their own preferred thresholds. Copyright © 2016 The Authors. Published by Elsevier B.V. All rights reserved.

  14. Massively Parallel RNA Sequencing Identifies a Complex Immune Gene Repertoire in the lophotrochozoan Mytilus edulis

    Science.gov (United States)

    Philipp, Eva E. R.; Kraemer, Lars; Melzner, Frank; Poustka, Albert J.; Thieme, Sebastian; Findeisen, Ulrike; Schreiber, Stefan; Rosenstiel, Philip

    2012-01-01

    The marine mussel Mytilus edulis and its closely related sister species are distributed world-wide and play an important role in coastal ecology and economy. The diversification in different species and their hybrids, broad ecological distribution, as well as the filter feeding mode of life has made this genus an attractive model to investigate physiological and molecular adaptations and responses to various biotic and abiotic environmental factors. In the present study we investigated the immune system of Mytilus, which may contribute to the ecological plasticity of this species. We generated a large Mytilus transcriptome database from different tissues of immune challenged and stress treated individuals from the Baltic Sea using 454 pyrosequencing. Phylogenetic comparison of orthologous groups of 23 species demonstrated the basal position of lophotrochozoans within protostomes. The investigation of immune related transcripts revealed a complex repertoire of innate recognition receptors and downstream pathway members including transcripts for 27 toll-like receptors and 524 C1q domain containing transcripts. NOD-like receptors on the other hand were absent. We also found evidence for sophisticated TNF, autophagy and apoptosis systems as well as for cytokines. Gill tissue and hemocytes showed highest expression of putative immune related contigs and are promising tissues for further functional studies. Our results partly contrast with findings of a less complex immune repertoire in ecdysozoan and other lophotrochozoan protostomes. We show that bivalves are interesting candidates to investigate the evolution of the immune system from basal metazoans to deuterostomes and protostomes and provide a basis for future molecular work directed to immune system functioning in Mytilus. PMID:22448234

  15. CUDAMPF: a multi-tiered parallel framework for accelerating protein sequence search in HMMER on CUDA-enabled GPU.

    Science.gov (United States)

    Jiang, Hanyu; Ganesan, Narayan

    2016-02-27

    HMMER software suite is widely used for analysis of homologous protein and nucleotide sequences with high sensitivity. The latest version of hmmsearch in HMMER 3.x, utilizes heuristic-pipeline which consists of MSV/SSV (Multiple/Single ungapped Segment Viterbi) stage, P7Viterbi stage and the Forward scoring stage to accelerate homology detection. Since the latest version is highly optimized for performance on modern multi-core CPUs with SSE capabilities, only a few acceleration attempts report speedup. However, the most compute intensive tasks within the pipeline (viz., MSV/SSV and P7Viterbi stages) still stand to benefit from the computational capabilities of massively parallel processors. A Multi-Tiered Parallel Framework (CUDAMPF) implemented on CUDA-enabled GPUs presented here, offers a finer-grained parallelism for MSV/SSV and Viterbi algorithms. We couple SIMT (Single Instruction Multiple Threads) mechanism with SIMD (Single Instructions Multiple Data) video instructions with warp-synchronism to achieve high-throughput processing and eliminate thread idling. We also propose a hardware-aware optimal allocation scheme of scarce resources like on-chip memory and caches in order to boost performance and scalability of CUDAMPF. In addition, runtime compilation via NVRTC available with CUDA 7.0 is incorporated into the presented framework that not only helps unroll innermost loop to yield upto 2 to 3-fold speedup than static compilation but also enables dynamic loading and switching of kernels depending on the query model size, in order to achieve optimal performance. CUDAMPF is designed as a hardware-aware parallel framework for accelerating computational hotspots within the hmmsearch pipeline as well as other sequence alignment applications. It achieves significant speedup by exploiting hierarchical parallelism on single GPU and takes full advantage of limited resources based on their own performance features. In addition to exceeding performance of other

  16. Rapid sequencing of the bamboo mitochondrial genome using Illumina technology and parallel episodic evolution of organelle genomes in grasses.

    Science.gov (United States)

    Ma, Peng-Fei; Guo, Zhen-Hua; Li, De-Zhu

    2012-01-01

    Compared to their counterparts in animals, the mitochondrial (mt) genomes of angiosperms exhibit a number of unique features. However, unravelling their evolution is hindered by the few completed genomes, of which are essentially Sanger sequenced. While next-generation sequencing technologies have revolutionized chloroplast genome sequencing, they are just beginning to be applied to angiosperm mt genomes. Chloroplast genomes of grasses (Poaceae) have undergone episodic evolution and the evolutionary rate was suggested to be correlated between chloroplast and mt genomes in Poaceae. It is interesting to investigate whether correlated rate change also occurred in grass mt genomes as expected under lineage effects. A time-calibrated phylogenetic tree is needed to examine rate change. We determined a largely completed mt genome from a bamboo, Ferrocalamus rimosivaginus (Poaceae), through Illumina sequencing of total DNA. With combination of de novo and reference-guided assembly, 39.5-fold coverage Illumina reads were finally assembled into scaffolds totalling 432,839 bp. The assembled genome contains nearly the same genes as the completed mt genomes in Poaceae. For examining evolutionary rate in grass mt genomes, we reconstructed a phylogenetic tree including 22 taxa based on 31 mt genes. The topology of the well-resolved tree was almost identical to that inferred from chloroplast genome with only minor difference. The inconsistency possibly derived from long branch attraction in mtDNA tree. By calculating absolute substitution rates, we found significant rate change (∼4-fold) in mt genome before and after the diversification of Poaceae both in synonymous and nonsynonymous terms. Furthermore, the rate change was correlated with that of chloroplast genomes in grasses. Our result demonstrates that it is a rapid and efficient approach to obtain angiosperm mt genome sequences using Illumina sequencing technology. The parallel episodic evolution of mt and chloroplast

  17. Optimization of de novo transcriptome assembly from high-throughput short read sequencing data improves functional annotation for non-model organisms

    Directory of Open Access Journals (Sweden)

    Haznedaroglu Berat Z

    2012-07-01

    Full Text Available Abstract Background The k-mer hash length is a key factor affecting the output of de novo transcriptome assembly packages using de Bruijn graph algorithms. Assemblies constructed with varying single k-mer choices might result in the loss of unique contiguous sequences (contigs and relevant biological information. A common solution to this problem is the clustering of single k-mer assemblies. Even though annotation is one of the primary goals of a transcriptome assembly, the success of assembly strategies does not consider the impact of k-mer selection on the annotation output. This study provides an in-depth k-mer selection analysis that is focused on the degree of functional annotation achieved for a non-model organism where no reference genome information is available. Individual k-mers and clustered assemblies (CA were considered using three representative software packages. Pair-wise comparison analyses (between individual k-mers and CAs were produced to reveal missing Kyoto Encyclopedia of Genes and Genomes (KEGG ortholog identifiers (KOIs, and to determine a strategy that maximizes the recovery of biological information in a de novo transcriptome assembly. Results Analyses of single k-mer assemblies resulted in the generation of various quantities of contigs and functional annotations within the selection window of k-mers (k-19 to k-63. For each k-mer in this window, generated assemblies contained certain unique contigs and KOIs that were not present in the other k-mer assemblies. Producing a non-redundant CA of k-mers 19 to 63 resulted in a more complete functional annotation than any single k-mer assembly. However, a fraction of unique annotations remained (~0.19 to 0.27% of total KOIs in the assemblies of individual k-mers (k-19 to k-63 that were not present in the non-redundant CA. A workflow to recover these unique annotations is presented. Conclusions This study demonstrated that different k-mer choices result in various quantities

  18. TIA: algorithms for development of identity-linked SNP islands for analysis by massively parallel DNA sequencing.

    Science.gov (United States)

    Farris, M Heath; Scott, Andrew R; Texter, Pamela A; Bartlett, Marta; Coleman, Patricia; Masters, David

    2018-04-11

    Single nucleotide polymorphisms (SNPs) located within the human genome have been shown to have utility as markers of identity in the differentiation of DNA from individual contributors. Massively parallel DNA sequencing (MPS) technologies and human genome SNP databases allow for the design of suites of identity-linked target regions, amenable to sequencing in a multiplexed and massively parallel manner. Therefore, tools are needed for leveraging the genotypic information found within SNP databases for the discovery of genomic targets that can be evaluated on MPS platforms. The SNP island target identification algorithm (TIA) was developed as a user-tunable system to leverage SNP information within databases. Using data within the 1000 Genomes Project SNP database, human genome regions were identified that contain globally ubiquitous identity-linked SNPs and that were responsive to targeted resequencing on MPS platforms. Algorithmic filters were used to exclude target regions that did not conform to user-tunable SNP island target characteristics. To validate the accuracy of TIA for discovering these identity-linked SNP islands within the human genome, SNP island target regions were amplified from 70 contributor genomic DNA samples using the polymerase chain reaction. Multiplexed amplicons were sequenced using the Illumina MiSeq platform, and the resulting sequences were analyzed for SNP variations. 166 putative identity-linked SNPs were targeted in the identified genomic regions. Of the 309 SNPs that provided discerning power across individual SNP profiles, 74 previously undefined SNPs were identified during evaluation of targets from individual genomes. Overall, DNA samples of 70 individuals were uniquely identified using a subset of the suite of identity-linked SNP islands. TIA offers a tunable genome search tool for the discovery of targeted genomic regions that are scalable in the population frequency and numbers of SNPs contained within the SNP island regions

  19. Tumor Genomic Profiling in Breast Cancer Patients Using Targeted Massively Parallel Sequencing

    Science.gov (United States)

    2016-03-01

    2015 “Cancer Care as a Model for Precision Medicine” MIT Collaborative Series Massachusetts Institute of Technology Invited Talk 2016 “Cancer...Precision Medicine” MIT -CHIEF Series Massachusetts Institute of Technology Invited Talk National 2013 “CanSeq: The Use of Whole Exome Sequencing To...Pennsylvania Philadelphia, PA Invited Talk 2014 “Clinical Genomics and Precision Cancer Medicine” Center for Molecular Oncology Memorial Sloan

  20. A NGS approach to the encrusting Mediterranean sponge Crella elegans (Porifera, Demospongiae, Poecilosclerida): transcriptome sequencing, characterization and overview of the gene expression along three life cycle stages.

    Science.gov (United States)

    Pérez-Porro, A R; Navarro-Gómez, D; Uriz, M J; Giribet, G

    2013-05-01

    Sponges can be dominant organisms in many marine and freshwater habitats where they play essential ecological roles. They also represent a key group to address important questions in early metazoan evolution. Recent approaches for improving knowledge on sponge biological and ecological functions as well as on animal evolution have focused on the genetic toolkits involved in ecological responses to environmental changes (biotic and abiotic), development and reproduction. These approaches are possible thanks to newly available, massive sequencing technologies-such as the Illumina platform, which facilitate genome and transcriptome sequencing in a cost-effective manner. Here we present the first NGS (next-generation sequencing) approach to understanding the life cycle of an encrusting marine sponge. For this we sequenced libraries of three different life cycle stages of the Mediterranean sponge Crella elegans and generated de novo transcriptome assemblies. Three assemblies were based on sponge tissue of a particular life cycle stage, including non-reproductive tissue, tissue with sperm cysts and tissue with larvae. The fourth assembly pooled the data from all three stages. By aggregating data from all the different life cycle stages we obtained a higher total number of contigs, contigs with blast hit and annotated contigs than from one stage-based assemblies. In that multi-stage assembly we obtained a larger number of the developmental regulatory genes known for metazoans than in any other assembly. We also advance the differential expression of selected genes in the three life cycle stages to explore the potential of RNA-seq for improving knowledge on functional processes along the sponge life cycle. © 2013 Blackwell Publishing Ltd.

  1. Transcriptomic SNP discovery for custom genotyping arrays: impacts of sequence data, SNP calling method and genotyping technology on the probability of validation success.

    Science.gov (United States)

    Humble, Emily; Thorne, Michael A S; Forcada, Jaume; Hoffman, Joseph I

    2016-08-26

    Single nucleotide polymorphism (SNP) discovery is an important goal of many studies. However, the number of 'putative' SNPs discovered from a sequence resource may not provide a reliable indication of the number that will successfully validate with a given genotyping technology. For this it may be necessary to account for factors such as the method used for SNP discovery and the type of sequence data from which it originates, suitability of the SNP flanking sequences for probe design, and genomic context. To explore the relative importance of these and other factors, we used Illumina sequencing to augment an existing Roche 454 transcriptome assembly for the Antarctic fur seal (Arctocephalus gazella). We then mapped the raw Illumina reads to the new hybrid transcriptome using BWA and BOWTIE2 before calling SNPs with GATK. The resulting markers were pooled with two existing sets of SNPs called from the original 454 assembly using NEWBLER and SWAP454. Finally, we explored the extent to which SNPs discovered using these four methods overlapped and predicted the corresponding validation outcomes for both Illumina Infinium iSelect HD and Affymetrix Axiom arrays. Collating markers across all discovery methods resulted in a global list of 34,718 SNPs. However, concordance between the methods was surprisingly poor, with only 51.0 % of SNPs being discovered by more than one method and 13.5 % being called from both the 454 and Illumina datasets. Using a predictive modeling approach, we could also show that SNPs called from the Illumina data were on average more likely to successfully validate, as were SNPs called by more than one method. Above and beyond this pattern, predicted validation outcomes were also consistently better for Affymetrix Axiom arrays. Our results suggest that focusing on SNPs called by more than one method could potentially improve validation outcomes. They also highlight possible differences between alternative genotyping technologies that could be

  2. Simultaneous sequencing of coding and noncoding RNA reveals a human transcriptome dominated by a small number of highly expressed noncoding genes.

    Science.gov (United States)

    Boivin, Vincent; Deschamps-Francoeur, Gabrielle; Couture, Sonia; Nottingham, Ryan M; Bouchard-Bourelle, Philia; Lambowitz, Alan M; Scott, Michelle S; Abou-Elela, Sherif

    2018-07-01

    Comparing the abundance of one RNA molecule to another is crucial for understanding cellular functions but most sequencing techniques can target only specific subsets of RNA. In this study, we used a new fragmented ribodepleted TGIRT sequencing method that uses a thermostable group II intron reverse transcriptase (TGIRT) to generate a portrait of the human transcriptome depicting the quantitative relationship of all classes of nonribosomal RNA longer than 60 nt. Comparison between different sequencing methods indicated that FRT is more accurate in ranking both mRNA and noncoding RNA than viral reverse transcriptase-based sequencing methods, even those that specifically target these species. Measurements of RNA abundance in different cell lines using this method correlate with biochemical estimates, confirming tRNA as the most abundant nonribosomal RNA biotype. However, the single most abundant transcript is 7SL RNA, a component of the signal recognition particle. S tructured n on c oding RNAs (sncRNAs) associated with the same biological process are expressed at similar levels, with the exception of RNAs with multiple functions like U1 snRNA. In general, sncRNAs forming RNPs are hundreds to thousands of times more abundant than their mRNA counterparts. Surprisingly, only 50 sncRNA genes produce half of the non-rRNA transcripts detected in two different cell lines. Together the results indicate that the human transcriptome is dominated by a small number of highly expressed sncRNAs specializing in functions related to translation and splicing. © 2018 Boivin et al.; Published by Cold Spring Harbor Laboratory Press for the RNA Society.

  3. Massively parallel digital high resolution melt for rapid and absolutely quantitative sequence profiling

    Science.gov (United States)

    Velez, Daniel Ortiz; Mack, Hannah; Jupe, Julietta; Hawker, Sinead; Kulkarni, Ninad; Hedayatnia, Behnam; Zhang, Yang; Lawrence, Shelley; Fraley, Stephanie I.

    2017-02-01

    In clinical diagnostics and pathogen detection, profiling of complex samples for low-level genotypes represents a significant challenge. Advances in speed, sensitivity, and extent of multiplexing of molecular pathogen detection assays are needed to improve patient care. We report the development of an integrated platform enabling the identification of bacterial pathogen DNA sequences in complex samples in less than four hours. The system incorporates a microfluidic chip and instrumentation to accomplish universal PCR amplification, High Resolution Melting (HRM), and machine learning within 20,000 picoliter scale reactions, simultaneously. Clinically relevant concentrations of bacterial DNA molecules are separated by digitization across 20,000 reactions and amplified with universal primers targeting the bacterial 16S gene. Amplification is followed by HRM sequence fingerprinting in all reactions, simultaneously. The resulting bacteria-specific melt curves are identified by Support Vector Machine learning, and individual pathogen loads are quantified. The platform reduces reaction volumes by 99.995% and achieves a greater than 200-fold increase in dynamic range of detection compared to traditional PCR HRM approaches. Type I and II error rates are reduced by 99% and 100% respectively, compared to intercalating dye-based digital PCR (dPCR) methods. This technology could impact a number of quantitative profiling applications, especially infectious disease diagnostics.

  4. High-throughput sequencing and pathway analysis reveal alteration of the pituitary transcriptome by 17α-ethynylestradiol (EE2) in female coho salmon, Oncorhynchus kisutch

    Energy Technology Data Exchange (ETDEWEB)

    Harding, Louisa B. [School of Aquatic and Fishery Sciences, University of Washington, Seattle, WA 98195 (United States); Schultz, Irvin R. [Battelle, Marine Sciences Laboratory – Pacific Northwest National Laboratory, 1529 West Sequim Bay Road, Sequim, WA 98382 (United States); Goetz, Giles W. [School of Aquatic and Fishery Sciences, University of Washington, Seattle, WA 98195 (United States); Luckenbach, J. Adam [Northwest Fisheries Science Center, National Marine Fisheries Service, National Oceanic and Atmospheric Administration, 2725 Montlake Blvd E, Seattle, WA 98112 (United States); Center for Reproductive Biology, Washington State University, Pullman, WA 98164 (United States); Young, Graham [School of Aquatic and Fishery Sciences, University of Washington, Seattle, WA 98195 (United States); Center for Reproductive Biology, Washington State University, Pullman, WA 98164 (United States); Goetz, Frederick W. [Northwest Fisheries Science Center, National Marine Fisheries Service, National Oceanic and Atmospheric Administration, Manchester Research Station, P.O. Box 130, Manchester, WA 98353 (United States); Swanson, Penny, E-mail: penny.swanson@noaa.gov [Northwest Fisheries Science Center, National Marine Fisheries Service, National Oceanic and Atmospheric Administration, 2725 Montlake Blvd E, Seattle, WA 98112 (United States); Center for Reproductive Biology, Washington State University, Pullman, WA 98164 (United States)

    2013-10-15

    Highlights: •Studied impacts of ethynylestradiol (EE2) exposure on salmon pituitary transcriptome. •High-throughput sequencing, RNAseq, and pathway analysis were performed. •EE2 altered mRNAs for genes in circadian rhythm, GnRH, and TGFβ signaling pathways. •LH and FSH beta subunit mRNAs were most highly up- and down-regulated by EE2, respectively. •Estrogens may alter processes associated with reproductive timing in salmon. -- Abstract: Considerable research has been done on the effects of endocrine disrupting chemicals (EDCs) on reproduction and gene expression in the brain, liver and gonads of teleost fish, but information on impacts to the pituitary gland are still limited despite its central role in regulating reproduction. The aim of this study was to further our understanding of the potential effects of natural and synthetic estrogens on the brain–pituitary–gonad axis in fish by determining the effects of 17α-ethynylestradiol (EE2) on the pituitary transcriptome. We exposed sub-adult coho salmon (Oncorhynchus kisutch) to 0 or 12 ng EE2/L for up to 6 weeks and effects on the pituitary transcriptome of females were assessed using high-throughput Illumina{sup ®} sequencing, RNA-Seq and pathway analysis. After 1 or 6 weeks, 218 and 670 contiguous sequences (contigs) respectively, were differentially expressed in pituitaries of EE2-exposed fish relative to control. Two of the most highly up- and down-regulated contigs were luteinizing hormone β subunit (241-fold and 395-fold at 1 and 6 weeks, respectively) and follicle-stimulating hormone β subunit (−3.4-fold at 6 weeks). Additional contigs related to gonadotropin synthesis and release were differentially expressed in EE2-exposed fish relative to controls. These included contigs involved in gonadotropin releasing hormone (GNRH) and transforming growth factor-β signaling. There was an over-representation of significantly affected contigs in 33 and 18 canonical pathways at 1 and 6 weeks

  5. High-throughput sequencing and pathway analysis reveal alteration of the pituitary transcriptome by 17α-ethynylestradiol (EE2) in female coho salmon, Oncorhynchus kisutch

    International Nuclear Information System (INIS)

    Harding, Louisa B.; Schultz, Irvin R.; Goetz, Giles W.; Luckenbach, J. Adam; Young, Graham; Goetz, Frederick W.; Swanson, Penny

    2013-01-01

    Highlights: •Studied impacts of ethynylestradiol (EE2) exposure on salmon pituitary transcriptome. •High-throughput sequencing, RNAseq, and pathway analysis were performed. •EE2 altered mRNAs for genes in circadian rhythm, GnRH, and TGFβ signaling pathways. •LH and FSH beta subunit mRNAs were most highly up- and down-regulated by EE2, respectively. •Estrogens may alter processes associated with reproductive timing in salmon. -- Abstract: Considerable research has been done on the effects of endocrine disrupting chemicals (EDCs) on reproduction and gene expression in the brain, liver and gonads of teleost fish, but information on impacts to the pituitary gland are still limited despite its central role in regulating reproduction. The aim of this study was to further our understanding of the potential effects of natural and synthetic estrogens on the brain–pituitary–gonad axis in fish by determining the effects of 17α-ethynylestradiol (EE2) on the pituitary transcriptome. We exposed sub-adult coho salmon (Oncorhynchus kisutch) to 0 or 12 ng EE2/L for up to 6 weeks and effects on the pituitary transcriptome of females were assessed using high-throughput Illumina ® sequencing, RNA-Seq and pathway analysis. After 1 or 6 weeks, 218 and 670 contiguous sequences (contigs) respectively, were differentially expressed in pituitaries of EE2-exposed fish relative to control. Two of the most highly up- and down-regulated contigs were luteinizing hormone β subunit (241-fold and 395-fold at 1 and 6 weeks, respectively) and follicle-stimulating hormone β subunit (−3.4-fold at 6 weeks). Additional contigs related to gonadotropin synthesis and release were differentially expressed in EE2-exposed fish relative to controls. These included contigs involved in gonadotropin releasing hormone (GNRH) and transforming growth factor-β signaling. There was an over-representation of significantly affected contigs in 33 and 18 canonical pathways at 1 and 6 weeks

  6. Massively parallel sequencing and the emergence of forensic genomics: Defining the policy and legal issues for law enforcement.

    Science.gov (United States)

    Scudder, Nathan; McNevin, Dennis; Kelty, Sally F; Walsh, Simon J; Robertson, James

    2018-03-01

    Use of DNA in forensic science will be significantly influenced by new technology in coming years. Massively parallel sequencing and forensic genomics will hasten the broadening of forensic DNA analysis beyond short tandem repeats for identity towards a wider array of genetic markers, in applications as diverse as predictive phenotyping, ancestry assignment, and full mitochondrial genome analysis. With these new applications come a range of legal and policy implications, as forensic science touches on areas as diverse as 'big data', privacy and protected health information. Although these applications have the potential to make a more immediate and decisive forensic intelligence contribution to criminal investigations, they raise policy issues that will require detailed consideration if this potential is to be realised. The purpose of this paper is to identify the scope of the issues that will confront forensic and user communities. Copyright © 2017 The Chartered Society of Forensic Sciences. All rights reserved.

  7. Molecular profiling of appendiceal epithelial tumors using massively parallel sequencing to identify somatic mutations.

    Science.gov (United States)

    Liu, Xiaoying; Mody, Kabir; de Abreu, Francine B; Pipas, J Marc; Peterson, Jason D; Gallagher, Torrey L; Suriawinata, Arief A; Ripple, Gregory H; Hourdequin, Kathryn C; Smith, Kerrington D; Barth, Richard J; Colacchio, Thomas A; Tsapakos, Michael J; Zaki, Bassem I; Gardner, Timothy B; Gordon, Stuart R; Amos, Christopher I; Wells, Wendy A; Tsongalis, Gregory J

    2014-07-01

    Some epithelial neoplasms of the appendix, including low-grade appendiceal mucinous neoplasm and adenocarcinoma, can result in pseudomyxoma peritonei (PMP). Little is known about the mutational spectra of these tumor types and whether mutations may be of clinical significance with respect to therapeutic selection. In this study, we identified somatic mutations using the Ion Torrent AmpliSeq Cancer Hotspot Panel v2. Specimens consisted of 3 nonneoplastic retention cysts/mucocele, 15 low-grade mucinous neoplasms (LAMNs), 8 low-grade/well-differentiated mucinous adenocarcinomas with pseudomyxoma peritonei, and 12 adenocarcinomas with/without goblet cell/signet ring cell features. Barcoded libraries were prepared from up to 10 ng of extracted DNA and multiplexed on single 318 chips for sequencing. Data analysis was performed using Golden Helix SVS. Variants that remained after the analysis pipeline were individually interrogated using the Integrative Genomics Viewer. A single Janus kinase 3 (JAK3) mutation was detected in the mucocele group. Eight mutations were identified in the V-Ki-ras2 Kirsten rat sarcoma viral oncogene homolog (KRAS) and GNAS complex locus (GNAS) genes among LAMN samples. Additional gene mutations were identified in the AKT1 (v-akt murine thymoma viral oncogene homolog 1), APC (adenomatous polyposis coli), JAK3, MET (met proto-oncogene), phosphatidylinositol-4,5-bisphosphate 3-kinase (PIK3CA), RB1 (retinoblastoma 1), STK11 (serine/threonine kinase 11), and tumor protein p53 (TP53) genes. Among the PMPs, 6 mutations were detected in the KRAS gene and also in the GNAS, TP53, and RB1 genes. Appendiceal cancers showed mutations in the APC, ATM (ataxia telangiectasia mutated), KRAS, IDH1 [isocitrate dehydrogenase 1 (NADP+)], NRAS [neuroblastoma RAS viral (v-ras) oncogene homolog], PIK3CA, SMAD4 (SMAD family member 4), and TP53 genes. Our results suggest molecular heterogeneity among epithelial tumors of the appendix. Next generation sequencing efforts

  8. De novo transcriptome sequence assembly from coconut leaves and seeds with a focus on factors involved in RNA-directed DNA methylation.

    Science.gov (United States)

    Huang, Ya-Yi; Lee, Chueh-Pai; Fu, Jason L; Chang, Bill Chia-Han; Matzke, Antonius J M; Matzke, Marjori

    2014-09-04

    Coconut palm (Cocos nucifera) is a symbol of the tropics and a source of numerous edible and nonedible products of economic value. Despite its nutritional and industrial significance, coconut remains under-represented in public repositories for genomic and transcriptomic data. We report de novo transcript assembly from RNA-seq data and analysis of gene expression in seed tissues (embryo and endosperm) and leaves of a dwarf coconut variety. Assembly of 10 GB sequencing data for each tissue resulted in 58,211 total unigenes in embryo, 61,152 in endosperm, and 33,446 in leaf. Within each unigene pool, 24,857 could be annotated in embryo, 29,731 could be annotated in endosperm, and 26,064 could be annotated in leaf. A KEGG analysis identified 138, 138, and 139 pathways, respectively, in transcriptomes of embryo, endosperm, and leaf tissues. Given the extraordinarily large size of coconut seeds and the importance of small RNA-mediated epigenetic regulation during seed development in model plants, we used homology searches to identify putative homologs of factors required for RNA-directed DNA methylation in coconut. The findings suggest that RNA-directed DNA methylation is important during coconut seed development, particularly in maturing endosperm. This dataset will expand the genomics resources available for coconut and provide a foundation for more detailed analyses that may assist molecular breeding strategies aimed at improving this major tropical crop. Copyright © 2014 Huang et al.

  9. A resource of large-scale molecular markers for monitoring Agropyron cristatum chromatin introgression in wheat background based on transcriptome sequences.

    Science.gov (United States)

    Zhang, Jinpeng; Liu, Weihua; Lu, Yuqing; Liu, Qunxing; Yang, Xinming; Li, Xiuquan; Li, Lihui

    2017-09-20

    Agropyron cristatum is a wild grass of the tribe Triticeae and serves as a gene donor for wheat improvement. However, very few markers can be used to monitor A. cristatum chromatin introgressions in wheat. Here, we reported a resource of large-scale molecular markers for tracking alien introgressions in wheat based on transcriptome sequences. By aligning A. cristatum unigenes with the Chinese Spring reference genome sequences, we designed 9602 A. cristatum expressed sequence tag-sequence-tagged site (EST-STS) markers for PCR amplification and experimental screening. As a result, 6063 polymorphic EST-STS markers were specific for the A. cristatum P genome in the single-receipt wheat background. A total of 4956 randomly selected polymorphic EST-STS markers were further tested in eight wheat variety backgrounds, and 3070 markers displaying stable and polymorphic amplification were validated. These markers covered more than 98% of the A. cristatum genome, and the marker distribution density was approximately 1.28 cM. An application case of all EST-STS markers was validated on the A. cristatum 6 P chromosome. These markers were successfully applied in the tracking of alien A. cristatum chromatin. Altogether, this study provided a universal method of large-scale molecular marker development to monitor wild relative chromatin in wheat.

  10. Development and Evaluation of a Novel Set of EST-SSR Markers Based on Transcriptome Sequences of Black Locust (Robinia pseudoacacia L.).

    Science.gov (United States)

    Guo, Qi; Wang, Jin-Xing; Su, Li-Zhuo; Lv, Wei; Sun, Yu-Han; Li, Yun

    2017-07-07

    Black locust ( Robinia pseudoacacia L. of the family Fabaceae) is an ecologically and economically important deciduous tree. However, few genomic resources are available for this forest species, and few effective expressed sequence tag-derived simple sequence repeat (EST-SSR) markers have been developed to date. In this study, paired-end sequencing was used to sequence transcriptomes of R. pseudoacacia by the Illumina HiSeq TM2000 platform, and EST-SSR loci were identified by de novo assembly. Furthermore, a total of 1697 primer pairs were successfully designed, from which 286 primers met the selection screening criteria; 94 pairs were randomly selected and tested for validation using polymerase chain reaction amplification. Forty-five primers were verified as polymorphic, with clear bands. The polymorphism information content values were 0.033-0.765, the number of alleles per locus ranged from 2 to 10, and the observed and expected heterozygosities were 0.000-0.931 and 0.035-0.810, respectively, indicating a high level of informativeness. Subsequently, 45 polymorphic EST-SSR loci were tested for amplification efficiency, using the verified primers, in an additional nine species of Leguminosae, 23 loci were amplified in more than three species, of which two loci were amplified successfully in all species. These EST-SSR markers provide a valuable tool for investigating the genetic diversity and population structure of R . pseudoacacia , constructing a DNA fingerprint database, performing quantitative trait locus mapping, and preserving genetic information.

  11. Targeted capture massively parallel sequencing analysis of LCIS and invasive lobular cancer: Repertoire of somatic genetic alterations and clonal relationships.

    Science.gov (United States)

    Sakr, Rita A; Schizas, Michail; Carniello, Jose V Scarpa; Ng, Charlotte K Y; Piscuoglio, Salvatore; Giri, Dilip; Andrade, Victor P; De Brot, Marina; Lim, Raymond S; Towers, Russell; Weigelt, Britta; Reis-Filho, Jorge S; King, Tari A

    2016-02-01

    Lobular carcinoma in situ (LCIS) has been proposed as a non-obligate precursor of invasive lobular carcinoma (ILC). Here we sought to define the repertoire of somatic genetic alterations in pure LCIS and in synchronous LCIS and ILC using targeted massively parallel sequencing. DNA samples extracted from microdissected LCIS, ILC and matched normal breast tissue or peripheral blood from 30 patients were subjected to massively parallel sequencing targeting all exons of 273 genes, including the genes most frequently mutated in breast cancer and DNA repair-related genes. Single nucleotide variants and insertions and deletions were identified using state-of-the-art bioinformatics approaches. The constellation of somatic mutations found in LCIS (n = 34) and ILC (n = 21) were similar, with the most frequently mutated genes being CDH1 (56% and 66%, respectively), PIK3CA (41% and 52%, respectively) and CBFB (12% and 19%, respectively). Among 19 LCIS and ILC synchronous pairs, 14 (74%) had at least one identical mutation in common, including identical PIK3CA and CDH1 mutations. Paired analysis of independent foci of LCIS from 3 breasts revealed at least one common mutation in each of the 3 pairs (CDH1, PIK3CA, CBFB and PKHD1L1). LCIS and ILC have a similar repertoire of somatic mutations, with PIK3CA and CDH1 being the most frequently mutated genes. The presence of identical mutations between LCIS-LCIS and LCIS-ILC pairs demonstrates that LCIS is a clonal neoplastic lesion, and provides additional evidence that at least some LCIS are non-obligate precursors of ILC. Copyright © 2015 Federation of European Biochemical Societies. Published by Elsevier B.V. All rights reserved.

  12. Massive parallel sequencing as a new diagnostic approach for phenylketonuria and tetrahydrobiopterin-deficiency in Thailand.

    Science.gov (United States)

    Chaiyasap, Pongsathorn; Ittiwut, Chupong; Srichomthong, Chalurmpon; Sangsin, Apiruk; Suphapeetiporn, Kanya; Shotelersuk, Vorasuk

    2017-09-16

    Hyperphenylalaninemia (HPA) can be classified into phenylketonuria (PKU) which is caused by mutations in the phenylalanine hydroxylase (PAH) gene, and BH4 deficiency caused by alterations in genes involved in tetrahydrobiopterin (BH4) biosynthesis pathway. Dietary restriction of phenylalanine is considered to be the main treatment of PKU to prevent irreversible intellectual disability. However, the same dietary intervention in BH4 deficiency patients is not as effective, as BH4 is also a cofactor in many neurotransmitter syntheses. We utilized next generation sequencing (NGS) technique to investigate four unrelated Thai patients with hyperphenylalaninemia. We successfully identified all eight mutant alleles in PKU or BH4-deficiency associated genes including three novel mutations, one in PAH and two in PTS, thus giving a definite diagnosis to these patients. Appropriate management can then be provided. This study identified three novel mutations in either the PAH or PTS gene and supported the use of NGS as an alternative molecular genetic approach for definite diagnosis of hyperphenylalaninemia, thus leading to proper management of these patients in Thailand.

  13. Whole-Genome Analyses of Korean Native and Holstein Cattle Breeds by Massively Parallel Sequencing

    Science.gov (United States)

    Stothard, Paul; Chung, Won-Hyong; Jeon, Heoyn-Jeong; Miller, Stephen P.; Choi, So-Young; Lee, Jeong-Koo; Yang, Bokyoung; Lee, Kyung-Tai; Han, Kwang-Jin; Kim, Hyeong-Cheol; Jeong, Dongkee; Oh, Jae-Don; Kim, Namshin; Kim, Tae-Hun; Lee, Hak-Kyo; Lee, Sung-Jin

    2014-01-01

    A main goal of cattle genomics is to identify DNA differences that account for variations in economically important traits. In this study, we performed whole-genome analyses of three important cattle breeds in Korea—Hanwoo, Jeju Heugu, and Korean Holstein—using the Illumina HiSeq 2000 sequencing platform. We achieved 25.5-, 29.6-, and 29.5-fold coverage of the Hanwoo, Jeju Heugu, and Korean Holstein genomes, respectively, and identified a total of 10.4 million single nucleotide polymorphisms (SNPs), of which 54.12% were found to be novel. We also detected 1,063,267 insertions–deletions (InDels) across the genomes (78.92% novel). Annotations of the datasets identified a total of 31,503 nonsynonymous SNPs and 859 frameshift InDels that could affect phenotypic variations in traits of interest. Furthermore, genome-wide copy number variation regions (CNVRs) were detected by comparing the Hanwoo, Jeju Heugu, and previously published Chikso genomes against that of Korean Holstein. A total of 992, 284, and 1881 CNVRs, respectively, were detected throughout the genome. Moreover, 53, 65, 45, and 82 putative regions of homozygosity (ROH) were identified in Hanwoo, Jeju Heugu, Chikso, and Korean Holstein respectively. The results of this study provide a valuable foundation for further investigations to dissect the molecular mechanisms underlying variation in economically important traits in cattle and to develop genetic markers for use in cattle breeding. PMID:24992012

  14. Massively parallel sequencing of 165 ancestry informative SNPs in two Chinese Tibetan-Burmese minority ethnicities.

    Science.gov (United States)

    Wang, Zheng; He, Guanglin; Luo, Tao; Zhao, Xueying; Liu, Jing; Wang, Mengge; Zhou, Di; Chen, Xu; Li, Chengtao; Hou, Yiping

    2018-05-01

    The Tibeto-Burman language, one subfamily of the Sino-Tibetan languages, is spoken by over 60 million people all over East Asia. Yet the ethnic origin and genetic architecture of Tibeto-Burman speaking populations remain largely unexplored. In the present study, 169 Chinese individuals from Tibeto-Burman speaking populations (two ethnic groups: Tibetan and Yi) in four different geographic regions in western China were analyzed using the Precision ID Ancestry Panel (165 AISNPs) and the Ion PGM System. The performance and corresponding forensic statistical parameters of this AISNPs panel were investigated. Comprehensive population genetic comparisons (143 populations based on Kidd' SNPs, 92 populations on the basis of Seldin' SNPs and 31 populations based on the Precision ID Ancestry Panel) and ancestry inference were further performed. Sequencing performance demonstrated that the Precision ID Ancestry Panel is effective and robust. Forensic characteristics suggested that this panel not only can be used for ancestry estimation of Tibeto-Burman populations but also for individual identification. Tibetan and Yi shared a common genetic ancestry origin but experienced the complex history of gene flow, local adaptation, and isolation, and constructed the specific genetic landscape of human genetic diversity of Highlander and Lowlander populations. Tibetan-Burman populations and other East Asian populations showed sufficient genetic difference and could be distinguished into three distinct groups. Furthermore, analysis of population structure revealed that significant genetic difference was existed inter-continent populations and strong genetic affinity was observed within-continent populations. Additional population-specific AISNPs and a relatively more comprehensive database with sufficient reference population data remain necessary to get better-scale resolution within a geographically proximate populations in East Asia. Copyright © 2018 Elsevier B.V. All rights

  15. Targeted massively parallel sequencing of angiosarcomas reveals frequent activation of the mitogen activated protein kinase pathway

    Science.gov (United States)

    Murali, Rajmohan; Chandramohan, Raghu; Möller, Inga; Scholz, Simone L.; Berger, Michael; Huberman, Kety; Viale, Agnes; Pirun, Mono; Socci, Nicholas D.; Bouvier, Nancy; Bauer, Sebastian; Artl, Monika; Schilling, Bastian; Schimming, Tobias; Sucker, Antje; Schwindenhammer, Benjamin; Grabellus, Florian; Speicher, Michael R.; Schaller, Jörg; Hillen, Uwe; Schadendorf, Dirk; Mentzel, Thomas; Cheng, Donavan T.; Wiesner, Thomas; Griewank, Klaus G.

    2015-01-01

    Angiosarcomas are rare malignant mesenchymal tumors of endothelial differentiation. The clinical behavior is usually aggressive and the prognosis for patients with advanced disease is poor with no effective therapies. The genetic bases of these tumors have been partially revealed in recent studies reporting genetic alterations such as amplifications of MYC (primarily in radiation-associated angiosarcomas), inactivating mutations in PTPRB and R707Q hotspot mutations of PLCG1. Here, we performed a comprehensive genomic analysis of 34 angiosarcomas using a clinically-approved, hybridization-based targeted next-generation sequencing assay for 341 well-established oncogenes and tumor suppressor genes. Over half of the angiosarcomas (n = 18, 53%) harbored genetic alterations affecting the MAPK pathway, involving mutations in KRAS, HRAS, NRAS, BRAF, MAPK1 and NF1, or amplifications in MAPK1/CRKL, CRAF or BRAF. The most frequently detected genetic aberrations were mutations in TP53 in 12 tumors (35%) and losses of CDKN2A in 9 tumors (26%). MYC amplifications were generally mutually exclusive of TP53 alterations and CDKN2A loss and were identified in 8 tumors (24%), most of which (n = 7, 88%) arose post-irradiation. Previously reported mutations in PTPRB (n = 10, 29%) and one (3%) PLCG1 R707Q mutation were also identified. Our results demonstrate that angiosarcomas are a genetically heterogeneous group of tumors, harboring a wide range of genetic alterations. The high frequency of genetic events affecting the MAPK pathway suggests that targeted therapies inhibiting MAPK signaling may be promising therapeutic avenues in patients with advanced angiosarcomas. PMID:26440310

  16. Transcriptome sequencing of the blind subterranean mole rat, Spalax galili: Utility and potential for the discovery of novel evolutionary patterns

    KAUST Repository

    Malik, Assaf; Korol, Abraham; Hü bner, Sariel; Hernandez, Alvaro G.; Thimmapuram, Jyothi; Ali, Shahjahan; Glaser, Fabian; Paz, Arnon; Avivi, Aaron; Band, Mark

    2011-01-01

    sequencing of Spalax galili, a chromosomal type of S. ehrenbergi. cDNA pools from muscle and brain tissues isolated from animals exposed to hypoxic and normoxic conditions were sequenced using Sanger, GS FLX, and GS FLX Titanium technologies. Assembly

  17. SNP design from 454 sequencing of Podosphaera plantaginis transcriptome reveals a genetically diverse pathogen metapopulation with high levels of mixed-genotype infection.

    Directory of Open Access Journals (Sweden)

    Charlotte Tollenaere

    Full Text Available Molecular tools may greatly improve our understanding of pathogen evolution and epidemiology but technical constraints have hindered the development of genetic resources for parasites compared to free-living organisms. This study aims at developing molecular tools for Podosphaera plantaginis, an obligate fungal pathogen of Plantago lanceolata. This interaction has been intensively studied in the Åland archipelago of Finland with epidemiological data collected from over 4,000 host populations annually since year 2001.A cDNA library of a pooled sample of fungal conidia was sequenced on the 454 GS-FLX platform. Over 549,411 reads were obtained and annotated into 45,245 contigs. Annotation data was acquired for 65.2% of the assembled sequences. The transcriptome assembly was screened for SNP loci, as well as for functionally important genes (mating-type genes and potential effector proteins. A genotyping assay of 27 SNP loci was designed and tested on 380 infected leaf samples from 80 populations within the Åland archipelago. With this panel we identified 85 multilocus genotypes (MLG with uneven frequencies across the pathogen metapopulation. Approximately half of the sampled populations contain polymorphism. Our genotyping protocol revealed mixed-genotype infection within a single host leaf to be common. Mixed infection has been proposed as one of the main drivers of pathogen evolution, and hence may be an important process in this pathosystem.The developed SNP panel offers exciting research perspectives for future studies in this well-characterized pathosystem. Also, the transcriptome provides an invaluable novel genomic resource for powdery mildews, which cause significant yield losses on commercially important crops annually. Furthermore, the features that render genetic studies in this system a challenge are shared with the majority of obligate parasitic species, and hence our results provide methodological insights from SNP calling to field

  18. Whole transcriptome analysis of Acinetobacter baumannii assessed by RNA-sequencing reveals different mRNA expression profiles in biofilm compared to planktonic cells.

    Directory of Open Access Journals (Sweden)

    Soraya Rumbo-Feal

    Full Text Available Acinetobacterbaumannii has emerged as a dangerous opportunistic pathogen, with many strains able to form biofilms and thus cause persistent infections. The aim of the present study was to use high-throughput sequencing techniques to establish complete transcriptome profiles of planktonic (free-living and sessile (biofilm forms of A. baumannii ATCC 17978 and thereby identify differences in their gene expression patterns. Collections of mRNA from planktonic (both exponential and stationary phase cultures and sessile (biofilm cells were sequenced. Six mRNA libraries were prepared following the mRNA-Seq protocols from Illumina. Reads were obtained in a HiScanSQ platform and mapped against the complete genome to describe the complete mRNA transcriptomes of planktonic and sessile cells. The results showed that the gene expression pattern of A. baumannii biofilm cells was distinct from that of planktonic cells, including 1621 genes over-expressed in biofilms relative to stationary phase cells and 55 genes expressed only in biofilms. These differences suggested important changes in amino acid and fatty acid metabolism, motility, active transport, DNA-methylation, iron acquisition, transcriptional regulation, and quorum sensing, among other processes. Disruption or deletion of five of these genes caused a significant decrease in biofilm formation ability in the corresponding mutant strains. Among the genes over-expressed in biofilm cells were those in an operon involved in quorum sensing. One of them, encoding an acyl carrier protein, was shown to be involved in biofilm formation as demonstrated by the significant decrease in biofilm formation by the corresponding knockout strain. The present work serves as a basis for future studies examining the complex network systems that regulate bacterial biofilm formation and maintenance.

  19. Transcriptome-Wide Analysis of Botrytis elliptica Responsive microRNAs and Their Targets in Lilium Regale Wilson by High-Throughput Sequencing and Degradome Analysis

    Directory of Open Access Journals (Sweden)

    Xue Gao

    2017-05-01

    Full Text Available MicroRNAs, as master regulators of gene expression, have been widely identified and play crucial roles in plant-pathogen interactions. A fatal pathogen, Botrytis elliptica, causes the serious folia disease of lily, which reduces production because of the high susceptibility of most cultivated species. However, the miRNAs related to Botrytis infection of lily, and the miRNA-mediated gene regulatory networks providing resistance to B. elliptica in lily remain largely unexplored. To systematically dissect B. elliptica-responsive miRNAs and their target genes, three small RNA libraries were constructed from the leaves of Lilium regale, a promising Chinese wild Lilium species, which had been subjected to mock B. elliptica treatment or B. elliptica infection for 6 and 24 h. By high-throughput sequencing, 71 known miRNAs belonging to 47 conserved families and 24 novel miRNA were identified, of which 18 miRNAs were downreguleted and 13 were upregulated in response to B. elliptica. Moreover, based on the lily mRNA transcriptome, 22 targets for 9 known and 1 novel miRNAs were identified by the degradome sequencing approach. Most target genes for elliptica-responsive miRNAs were involved in metabolic processes, few encoding different transcription factors, including ELONGATION FACTOR 1 ALPHA (EF1a and TEOSINTE BRANCHED1/CYCLOIDEA/PROLIFERATING CELL FACTOR 2 (TCP2. Furthermore, the expression patterns of a set of elliptica-responsive miRNAs and their targets were validated by quantitative real-time PCR. This study represents the first transcriptome-based analysis of miRNAs responsive to B. elliptica and their targets in lily. The results reveal the possible regulatory roles of miRNAs and their targets in B. elliptica interaction, which will extend our understanding of the mechanisms of this disease in lily.

  20. Massively parallel sequencing and targeted exomes in familial kidney disease can diagnose underlying genetic disorders.

    Science.gov (United States)

    Mallett, Andrew J; McCarthy, Hugh J; Ho, Gladys; Holman, Katherine; Farnsworth, Elizabeth; Patel, Chirag; Fletcher, Jeffery T; Mallawaarachchi, Amali; Quinlan, Catherine; Bennetts, Bruce; Alexander, Stephen I

    2017-12-01

    Inherited kidney disease encompasses a broad range of disorders, with both multiple genes contributing to specific phenotypes and single gene defects having multiple clinical presentations. Advances in sequencing capacity may allow a genetic diagnosis for familial renal disease, by testing the increasing number of known causative genes. However, there has been limited translation of research findings of causative genes into clinical settings. Here, we report the results of a national accredited diagnostic genetic service for familial renal disease. An expert multidisciplinary team developed a targeted exomic sequencing approach with ten curated multigene panels (207 genes) and variant assessment individualized to the patient's phenotype. A genetic diagnosis (pathogenic genetic variant[s]) was identified in 58 of 135 families referred in two years. The genetic diagnosis rate was similar between families with a pediatric versus adult proband (46% vs 40%), although significant differences were found in certain panels such as atypical hemolytic uremic syndrome (88% vs 17%). High diagnostic rates were found for Alport syndrome (22 of 27) and tubular disorders (8 of 10), whereas the monogenic diagnostic rate for congenital anomalies of the kidney and urinary tract was one of 13. Quality reporting was aided by a strong clinical renal and genetic multidisciplinary committee review. Importantly, for a diagnostic service, few variants of uncertain significance were found with this targeted, phenotype-based approach. Thus, use of targeted massively parallel sequencing approaches in inherited kidney disease has a significant capacity to diagnose the underlying genetic disorder across most renal phenotypes. Copyright © 2017 International Society of Nephrology. Published by Elsevier Inc. All rights reserved.

  1. Genomic Characterization of Non–Small-Cell Lung Cancer in African Americans by Targeted Massively Parallel Sequencing

    Science.gov (United States)

    Araujo, Luiz H.; Timmers, Cynthia; Bell, Erica Hlavin; Shilo, Konstantin; Lammers, Philip E.; Zhao, Weiqiang; Natarajan, Thanemozhi G.; Miller, Clinton J.; Zhang, Jianying; Yilmaz, Ayse S.; Liu, Tom; Coombes, Kevin; Amann, Joseph; Carbone, David P.

    2015-01-01

    Purpose Technologic advances have enabled the comprehensive analysis of genetic perturbations in non–small-cell lung cancer (NSCLC); however, African Americans have often been underrepresented in these studies. This ethnic group has higher lung cancer incidence and mortality rates, and some studies have suggested a lower incidence of epidermal growth factor receptor mutations. Herein, we report the most in-depth molecular profile of NSCLC in African Americans to date. Methods A custom panel was designed to cover the coding regions of 81 NSCLC-related genes and 40 ancestry-informative markers. Clinical samples were sequenced on a massively parallel sequencing instrument, and anaplastic lymphoma kinase translocation was evaluated by fluorescent in situ hybridization. Results The study cohort included 99 patients (61% males, 94% smokers) comprising 31 squamous and 68 nonsquamous cell carcinomas. We detected 227 nonsilent variants in the coding sequence, including 24 samples with nonoverlapping, classic driver alterations. The frequency of driver mutations was not significantly different from that of whites, and no association was found between genetic ancestry and the presence of somatic mutations. Copy number alteration analysis disclosed distinguishable amplifications in the 3q chromosome arm in squamous cell carcinomas and pointed toward a handful of targetable alterations. We also found frequent SMARCA4 mutations and protein loss, mostly in driver-negative tumors. Conclusion Our data suggest that African American ancestry may not be significantly different from European/white background for the presence of somatic driver mutations in NSCLC. Furthermore, we demonstrated that using a comprehensive genotyping approach could identify numerous targetable alterations, with potential impact on therapeutic decisions. PMID:25918285

  2. My-Forensic-Loci-queries (MyFLq) framework for analysis of forensic STR data generated by massive parallel sequencing.

    Science.gov (United States)

    Van Neste, Christophe; Vandewoestyne, Mado; Van Criekinge, Wim; Deforce, Dieter; Van Nieuwerburgh, Filip

    2014-03-01

    Forensic scientists are currently investigating how to transition from capillary electrophoresis (CE) to massive parallel sequencing (MPS) for analysis of forensic DNA profiles. MPS offers several advantages over CE such as virtually unlimited multiplexy of loci, combining both short tandem repeat (STR) and single nucleotide polymorphism (SNP) loci, small amplicons without constraints of size separation, more discrimination power, deep mixture resolution and sample multiplexing. We present our bioinformatic framework My-Forensic-Loci-queries (MyFLq) for analysis of MPS forensic data. For allele calling, the framework uses a MySQL reference allele database with automatically determined regions of interest (ROIs) by a generic maximal flanking algorithm which makes it possible to use any STR or SNP forensic locus. Python scripts were designed to automatically make allele calls starting from raw MPS data. We also present a method to assess the usefulness and overall performance of a forensic locus with respect to MPS, as well as methods to estimate whether an unknown allele, which sequence is not present in the MySQL database, is in fact a new allele or a sequencing error. The MyFLq framework was applied to an Illumina MiSeq dataset of a forensic Illumina amplicon library, generated from multilocus STR polymerase chain reaction (PCR) on both single contributor samples and multiple person DNA mixtures. Although the multilocus PCR was not yet optimized for MPS in terms of amplicon length or locus selection, the results show excellent results for most loci. The results show a high signal-to-noise ratio, correct allele calls, and a low limit of detection for minor DNA contributors in mixed DNA samples. Technically, forensic MPS affords great promise for routine implementation in forensic genomics. The method is also applicable to adjacent disciplines such as molecular autopsy in legal medicine and in mitochondrial DNA research. Copyright © 2013 The Authors. Published by

  3. Bioinformatic prediction of G protein-coupled receptor encoding sequences from the transcriptome of the foreleg, including the Haller's organ, of the cattle tick, Rhipicephalus australis.

    Directory of Open Access Journals (Sweden)

    Sergio Munoz

    Full Text Available The cattle tick of Australia, Rhipicephalus australis, is a vector for microbial parasites that cause serious bovine diseases. The Haller's organ, located in the tick's forelegs, is crucial for host detection and mating. To facilitate the development of new technologies for better control of this agricultural pest, we aimed to sequence and annotate the transcriptome of the R. australis forelegs and associated tissues, including the Haller's organ. As G protein-coupled receptors (GPCRs are an important family of eukaryotic proteins studied as pharmaceutical targets in humans, we prioritized the identification and classification of the GPCRs expressed in the foreleg tissues. The two forelegs from adult R. australis were excised, RNA extracted, and pyrosequenced with 454 technology. Reads were assembled into unigenes and annotated by sequence similarity. Python scripts were written to find open reading frames (ORFs from each unigene. These ORFs were analyzed by different GPCR prediction approaches based on sequence alignments, support vector machines, hidden Markov models, and principal component analysis. GPCRs consistently predicted by multiple methods were further studied by phylogenetic analysis and 3D homology modeling. From 4,782 assembled unigenes, 40,907 possible ORFs were predicted. Using Blastp, Pfam, GPCRpred, TMHMM, and PCA-GPCR, a basic set of 46 GPCR candidates were compiled and a phylogenetic tree was constructed. With further screening of tertiary structures predicted by RaptorX, 6 likely GPCRs emerged and the strongest candidate was classified by PCA-GPCR to be a GABAB receptor.

  4. Characterization of Fusobacterium varium Fv113-g1 isolated from a patient with ulcerative colitis based on complete genome sequence and transcriptome analysis.

    Directory of Open Access Journals (Sweden)

    Tsuyoshi Sekizuka

    Full Text Available Fusobacterium spp. present in the oral and gut flora is carcinogenic and is associated with the risk of pancreatic and colorectal cancers. Fusobacterium spp. is also implicated in a broad spectrum of human pathologies, including Crohn's disease and ulcerative colitis (UC. Here we report the complete genome sequence of Fusobacterium varium Fv113-g1 (genome size, 3.96 Mb isolated from a patient with UC. Comparative genome analyses totally suggested that Fv113-g1 is basically assigned as F. varium, in particular, it could be reclassified as notable F. varium subsp. similar to F. ulcerans because of partial shared orthologs. Compared with the genome sequences of F. varium ATCC 27725 (genome size, 3.30 Mb and other strains of Fusobacterium spp., Fv113-g1 possesses many accessary pan-genome sequences with noteworthy multiple virulence factors, including 44 autotransporters (type V secretion system, T5SS and 13 Fusobacterium adhesion (FadA paralogs involved in potential mucosal inflammation. Indeed, transcriptome analysis demonstrated that Fv113-g1-specific accessary genes, such as multiple T5SS and fadA paralogs, showed notably increased expression with D-MEM cultivation than with brain heart infusion broth. This implied that growth condition may enhance the expression of such potential virulence factors, leading to remarkable survival against other gut microorganisms and to the pathogenicity to human intestinal epithelium.

  5. De novo transcriptome sequence assembly and identification of AP2/ERF transcription factor related to abiotic stress in parsley (Petroselinum crispum.

    Directory of Open Access Journals (Sweden)

    Meng-Yao Li

    Full Text Available Parsley is an important biennial Apiaceae species that is widely cultivated as herb, spice, and vegetable. Previous studies on parsley principally focused on its physiological and biochemical properties, including phenolic compound and volatile oil contents. However, little is known about the molecular and genetic properties of parsley. In this study, 23,686,707 high-quality reads were obtained and assembled into 81,852 transcripts and 50,161 unigenes for the first time. Functional annotation showed that 30,516 unigenes had sequence similarity to known genes. In addition, 3,244 putative simple sequence repeats were detected in curly parsley. Finally, 1,569 of the identified unigenes belonged to 58 transcription factor families. Various abiotic stresses have a strong detrimental effect on the yield and quality of parsley. AP2/ERF transcription factors have important functions in plant development, hormonal regulation, and abiotic response. A total of 88 putative AP2/ERF factors were identified from the transcriptome sequence of parsley. Seven AP2/ERF transcription factors were selected in this study to analyze the expression profiles of parsley under different abiotic stresses. Our data provide a potentially valuable resource that can be used for intensive parsley research.

  6. De novo transcriptome sequence assembly and identification of AP2/ERF transcription factor related to abiotic stress in parsley (Petroselinum crispum).

    Science.gov (United States)

    Li, Meng-Yao; Tan, Hua-Wei; Wang, Feng; Jiang, Qian; Xu, Zhi-Sheng; Tian, Chang; Xiong, Ai-Sheng

    2014-01-01

    Parsley is an important biennial Apiaceae species that is widely cultivated as herb, spice, and vegetable. Previous studies on parsley principally focused on its physiological and biochemical properties, including phenolic compound and volatile oil contents. However, little is known about the molecular and genetic properties of parsley. In this study, 23,686,707 high-quality reads were obtained and assembled into 81,852 transcripts and 50,161 unigenes for the first time. Functional annotation showed that 30,516 unigenes had sequence similarity to known genes. In addition, 3,244 putative simple sequence repeats were detected in curly parsley. Finally, 1,569 of the identified unigenes belonged to 58 transcription factor families. Various abiotic stresses have a strong detrimental effect on the yield and quality of parsley. AP2/ERF transcription factors have important functions in plant development, hormonal regulation, and abiotic response. A total of 88 putative AP2/ERF factors were identified from the transcriptome sequence of parsley. Seven AP2/ERF transcription factors were selected in this study to analyze the expression profiles of parsley under different abiotic stresses. Our data provide a potentially valuable resource that can be used for intensive parsley research.

  7. High-throughput sequencing and analysis of the gill tissue transcriptome from the deep-sea hydrothermal vent mussel Bathymodiolus azoricus

    Directory of Open Access Journals (Sweden)

    Gomes Paula

    2010-10-01

    Full Text Available Abstract Background Bathymodiolus azoricus is a deep-sea hydrothermal vent mussel found in association with large faunal communities living in chemosynthetic environments at the bottom of the sea floor near the Azores Islands. Investigation of the exceptional physiological reactions that vent mussels have adopted in their habitat, including responses to environmental microbes, remains a difficult challenge for deep-sea biologists. In an attempt to reveal genes potentially involved in the deep-sea mussel innate immunity we carried out a high-throughput sequence analysis of freshly collected B. azoricus transcriptome using gills tissues as the primary source of immune transcripts given its strategic role in filtering the surrounding waterborne potentially infectious microorganisms. Additionally, a substantial EST data set was produced and from which a comprehensive collection of genes coding for putative proteins was organized in a dedicated database, "DeepSeaVent" the first deep-sea vent animal transcriptome database based on the 454 pyrosequencing technology. Results A normalized cDNA library from gills tissue was sequenced in a full 454 GS-FLX run, producing 778,996 sequencing reads. Assembly of the high quality reads resulted in 75,407 contigs of which 3,071 were singletons. A total of 39,425 transcripts were conceptually translated into amino-sequences of which 22,023 matched known proteins in the NCBI non-redundant protein database, 15,839 revealed conserved protein domains through InterPro functional classification and 9,584 were assigned with Gene Ontology terms. Queries conducted within the database enabled the identification of genes putatively involved in immune and inflammatory reactions which had not been previously evidenced in the vent mussel. Their physical counterpart was confirmed by semi-quantitative quantitative Reverse-Transcription-Polymerase Chain Reactions (RT-PCR and their RNA transcription level by quantitative PCR (q

  8. Optimal Golomb Ruler Sequences Generation for Optical WDM Systems: A Novel Parallel Hybrid Multi-objective Bat Algorithm

    Science.gov (United States)

    Bansal, Shonak; Singh, Arun Kumar; Gupta, Neena

    2017-02-01

    In real-life, multi-objective engineering design problems are very tough and time consuming optimization problems due to their high degree of nonlinearities, complexities and inhomogeneity. Nature-inspired based multi-objective optimization algorithms are now becoming popular for solving multi-objective engineering design problems. This paper proposes original multi-objective Bat algorithm (MOBA) and its extended form, namely, novel parallel hybrid multi-objective Bat algorithm (PHMOBA) to generate shortest length Golomb ruler called optimal Golomb ruler (OGR) sequences at a reasonable computation time. The OGRs found their application in optical wavelength division multiplexing (WDM) systems as channel-allocation algorithm to reduce the four-wave mixing (FWM) crosstalk. The performances of both the proposed algorithms to generate OGRs as optical WDM channel-allocation is compared with other existing classical computing and nature-inspired algorithms, including extended quadratic congruence (EQC), search algorithm (SA), genetic algorithms (GAs), biogeography based optimization (BBO) and big bang-big crunch (BB-BC) optimization algorithms. Simulations conclude that the proposed parallel hybrid multi-objective Bat algorithm works efficiently as compared to original multi-objective Bat algorithm and other existing algorithms to generate OGRs for optical WDM systems. The algorithm PHMOBA to generate OGRs, has higher convergence and success rate than original MOBA. The efficiency improvement of proposed PHMOBA to generate OGRs up to 20-marks, in terms of ruler length and total optical channel bandwidth (TBW) is 100 %, whereas for original MOBA is 85 %. Finally the implications for further research are also discussed.

  9. Massively parallel signature sequencing and bioinformatics analysis identifies up-regulation of TGFBI and SOX4 in human glioblastoma.

    Directory of Open Access Journals (Sweden)

    Biaoyang Lin

    Full Text Available BACKGROUND: A comprehensive network-based understanding of molecular pathways abnormally altered in glioblastoma multiforme (GBM is essential for developing effective therapeutic approaches for this deadly disease. METHODOLOGY/PRINCIPAL FINDINGS: Applying a next generation sequencing technology, massively parallel signature sequencing (MPSS, we identified a total of 4535 genes that are differentially expressed between normal brain and GBM tissue. The expression changes of three up-regulated genes, CHI3L1, CHI3L2, and FOXM1, and two down-regulated genes, neurogranin and L1CAM, were confirmed by quantitative PCR. Pathway analysis revealed that TGF- beta pathway related genes were significantly up-regulated in GBM tumor samples. An integrative pathway analysis of the TGF beta signaling network identified two alternative TGF-beta signaling pathways mediated by SOX4 (sex determining region Y-box 4 and TGFBI (Transforming growth factor beta induced. Quantitative RT-PCR and immunohistochemistry staining demonstrated that SOX4 and TGFBI expression is elevated in GBM tissues compared with normal brain tissues at both the RNA and protein levels. In vitro functional studies confirmed that TGFBI and SOX4 expression is increased by TGF-beta stimulation and decreased by a specific inhibitor of TGF-beta receptor 1 kinase. CONCLUSIONS/SIGNIFICANCE: Our MPSS database for GBM and normal brain tissues provides a useful resource for the scientific community. The identification of non-SMAD mediated TGF-beta signaling pathways acting through SOX4 and TGFBI (GENE ID:7045 in GBM indicates that these alternative pathways should be considered, in addition to the canonical SMAD mediated pathway, in the development of new therapeutic strategies targeting TGF-beta signaling in GBM. Finally, the construction of an extended TGF-beta signaling network with overlaid gene expression changes between GBM and normal brain extends our understanding of the biology of GBM.

  10. Population genetics, phylogenomics and hybrid speciation of Juglans in China determined from whole chloroplast genomes, transcriptomes, and genotyping-by-sequencing (GBS).

    Science.gov (United States)

    Zhao, Peng; Zhou, Hui-Juan; Potter, Daniel; Hu, Yi-Heng; Feng, Xiao-Jia; Dang, Meng; Feng, Li; Zulfiqar, Saman; Liu, Wen-Zhe; Zhao, Gui-Fang; Woeste, Keith

    2018-04-18

    Genomic data are a powerful tool for elucidating the processes involved in the evolution and divergence of species. The speciation and phylogenetic relationships among Chinese Juglans remain unclear. Here, we used results from phylogenomic and population genetic analyses, transcriptomics, Genotyping-By-Sequencing (GBS), and whole chloroplast genomes (Cp genome) data to infer processes of lineage formation among the five native Chinese species of the walnut genus (Juglans, Juglandaceae), a widespread, economically important group. We found that the processes of isolation generated diversity during glaciations, but that the recent range expansion of J. regia, probably from multiple refugia, led to hybrid formation both within and between sections of the genus. In southern China, human dispersal of J. regia brought it into contact with J. sigillata, which we determined to be an ecotype of J. regia that is now maintained as a landrace. In northern China, walnut hybridized with a distinct lineage of J. mandshurica to form J. hopeiensis, a controversial taxon (considered threatened) that our data indicate is a horticultural variety. Comparisons among whole chloroplast genomes and nuclear transcriptome analyses provided conflicting evidence for the timing of the divergence of Chinese Juglans taxa. J. cathayensis and J. mandshurica are poorly differentiated based our genomic data. Reconstruction of Juglans evolutionary history indicate that episodes of climatic variation over the past 4.5 to 33.80 million years, associated with glacial advances and retreats and population isolation, have shaped Chinese walnut demography and evolution, even in the presence of gene flow and introgression. Copyright © 2018 Elsevier Inc. All rights reserved.

  11. Next generation sequencing and de novo transcriptome analysis of Costus pictus D. Don, a non-model plant with potent anti-diabetic properties

    Directory of Open Access Journals (Sweden)

    Annadurai Ramasamy S

    2012-11-01

    Full Text Available Abstract Background Phyto-remedies for diabetic control are popular among patients with Type II Diabetes mellitus (DM, in addition to other diabetic control measures. A number of plant species are known to possess diabetic control properties. Costus pictus D. Don is popularly known as “Insulin Plant” in Southern India whose leaves have been reported to increase insulin pools in blood plasma. Next Generation Sequencing is employed as a powerful tool for identifying molecular signatures in the transcriptome related to physiological functions of plant tissues. We sequenced the leaf transcriptome of C. pictus using Illumina reversible dye terminator sequencing technology and used combination of bioinformatics tools for identifying transcripts related to anti-diabetic properties of C. pictus. Results A total of 55,006 transcripts were identified, of which 69.15% transcripts could be annotated. We identified transcripts related to pathways of bixin biosynthesis and geraniol and geranial biosynthesis as major transcripts from the class of isoprenoid secondary metabolites and validated the presence of putative norbixin methyltransferase, a precursor of Bixin. The transcripts encoding these terpenoids are known to be Peroxisome Proliferator-Activated Receptor (PPAR agonists and anti-glycation agents. Sequential extraction and High Performance Liquid Chromatography (HPLC confirmed the presence of bixin in C. pictus methanolic extracts. Another significant transcript identified in relation to anti-diabetic, anti-obesity and immuno-modulation is of Abscisic Acid biosynthetic pathway. We also report many other transcripts for the biosynthesis of antitumor, anti-oxidant and antimicrobial metabolites of C. pictus leaves. Conclusion Solid molecular signatures (transcripts related to bixin, abscisic acid, and geranial and geraniol biosynthesis for the anti-diabetic properties of C. pictus leaves and vital clues related to the other phytochemical functions

  12. Massively parallel sequencing and genome-wide copy number analysis revealed a clonal relationship in benign metastasizing leiomyoma.

    Science.gov (United States)

    Wu, Ren-Chin; Chao, An-Shine; Lee, Li-Yu; Lin, Gigin; Chen, Shu-Jen; Lu, Yen-Jung; Huang, Huei-Jean; Yen, Chi-Feng; Han, Chien Min; Lee, Yun-Shien; Wang, Tzu-Hao; Chao, Angel

    2017-07-18

    Benign metastasizing leiomyoma (BML) is a rare disease entity typically presenting as multiple extrauterine leiomyomas associated with a uterine leiomyoma. It has been hypothesized that the extrauterine leiomyomata represent distant metastasis of the uterine leiomyoma. To date, the only molecular evidence supporting this hypothesis was derived from clonality analyses based on X-chromosome inactivation assays. Here, we sought to address this issue by examining paired specimens of synchronous pulmonary and uterine leiomyomata from three patients using targeted massively parallel sequencing and molecular inversion probe array analysis for detecting somatic mutations and copy number aberrations. We detected identical non-hot-spot somatic mutations and similar patterns of copy number aberrations (CNAs) in paired pulmonary and uterine leiomyomata from two patients, indicating the clonal relationship between pulmonary and uterine leiomyomata. In addition to loss of chromosome 22q found in the literature, we identified additional recurrent CNAs including losses of chromosome 3q and 11q. In conclusion, our findings of the clonal relationship between synchronous pulmonary and uterine leiomyomas support the hypothesis that BML represents a condition wherein a uterine leiomyoma disseminates to distant extrauterine locations.

  13. Massively parallel sequencing and genome-wide copy number analysis revealed a clonal relationship in benign metastasizing leiomyoma

    Science.gov (United States)

    Lee, Li-Yu; Lin, Gigin; Chen, Shu-Jen; Lu, Yen-Jung; Huang, Huei-Jean; Yen, Chi-Feng; Han, Chien Min; Lee, Yun-Shien; Wang, Tzu-Hao; Chao, Angel

    2017-01-01

    Benign metastasizing leiomyoma (BML) is a rare disease entity typically presenting as multiple extrauterine leiomyomas associated with a uterine leiomyoma. It has been hypothesized that the extrauterine leiomyomata represent distant metastasis of the uterine leiomyoma. To date, the only molecular evidence supporting this hypothesis was derived from clonality analyses based on X-chromosome inactivation assays. Here, we sought to address this issue by examining paired specimens of synchronous pulmonary and uterine leiomyomata from three patients using targeted massively parallel sequencing and molecular inversion probe array analysis for detecting somatic mutations and copy number aberrations. We detected identical non-hot-spot somatic mutations and similar patterns of copy number aberrations (CNAs) in paired pulmonary and uterine leiomyomata from two patients, indicating the clonal relationship between pulmonary and uterine leiomyomata. In addition to loss of chromosome 22q found in the literature, we identified additional recurrent CNAs including losses of chromosome 3q and 11q. In conclusion, our findings of the clonal relationship between synchronous pulmonary and uterine leiomyomas support the hypothesis that BML represents a condition wherein a uterine leiomyoma disseminates to distant extrauterine locations. PMID:28533481

  14. An Extended Flexible Job Shop Scheduling Model for Flight Deck Scheduling with Priority, Parallel Operations, and Sequence Flexibility

    Directory of Open Access Journals (Sweden)

    Lianfei Yu

    2017-01-01

    Full Text Available Efficient scheduling for the supporting operations of aircrafts in flight deck is critical to the aircraft carrier, and even several seconds’ improvement may lead to totally converse outcome of a battle. In the paper, we ameliorate the supporting operations of carrier-based aircrafts and investigate three simultaneous operation relationships during the supporting process, including precedence constraints, parallel operations, and sequence flexibility. Furthermore, multifunctional aircrafts have to take off synergistically and participate in a combat cooperatively. However, their takeoff order must be restrictively prioritized during the scheduling period accorded by certain operational regulations. To efficiently prioritize the takeoff order while minimizing the total time budget on the whole takeoff duration, we propose a novel mixed integer liner programming formulation (MILP for the flight deck scheduling problem. Motivated by the hardness of MILP, we design an improved differential evolution algorithm combined with typical local search strategies to improve computational efficiency. We numerically compare the performance of our algorithm with the classical genetic algorithm and normal differential evolution algorithm and the results show that our algorithm obtains better scheduling schemes that can meet both the operational relations and the takeoff priority requirements.

  15. Retrospective Identification of Herpes Simplex 2 Virus-Associated Acute Liver Failure in an Immunocompetent Patient Detected Using Whole Transcriptome Shotgun Sequencing.

    Science.gov (United States)

    Ono, Atsushi; Hayes, C Nelson; Akamatsu, Sakura; Imamura, Michio; Aikata, Hiroshi; Chayama, Kazuaki

    2017-01-01

    Acute liver failure (ALF) is a severe condition in which liver function rapidly deteriorates in individuals without prior history of liver disease. While most cases result from acetaminophen overdose or viral hepatitis, in up to a third of patients, no clear cause can be identified. Liver transplantation has greatly reduced mortality among these patients, but 40% of patients recover without liver transplantation. Therefore, there is an urgent need for rapid determination of the etiology of acute liver failure. In this case report, we present a case of herpes simplex 2 virus- (HSV-) associated ALF in an immunocompetent patient. The patient recovered without LT, but the presence of HSV was not suspected at the time, precluding more effective treatment with acyclovir. To determine the etiology, stored blood samples were analyzed using whole transcriptome shotgun sequencing followed by mapping to a panel of viral reference sequences. The presence of HSV-DNA in blood samples at the time of admission was confirmed using real-time polymerase chain reaction, and, at the time of discharge, HSV-DNA levels had decreased by a factor of 10 6 . Conclusions. In ALF cases of undetermined etiology, uncommon causes should be considered, especially those for which an effective treatment is available.

  16. Retrospective Identification of Herpes Simplex 2 Virus-Associated Acute Liver Failure in an Immunocompetent Patient Detected Using Whole Transcriptome Shotgun Sequencing

    Directory of Open Access Journals (Sweden)

    Atsushi Ono

    2017-01-01

    Full Text Available Acute liver failure (ALF is a severe condition in which liver function rapidly deteriorates in individuals without prior history of liver disease. While most cases result from acetaminophen overdose or viral hepatitis, in up to a third of patients, no clear cause can be identified. Liver transplantation has greatly reduced mortality among these patients, but 40% of patients recover without liver transplantation. Therefore, there is an urgent need for rapid determination of the etiology of acute liver failure. In this case report, we present a case of herpes simplex 2 virus- (HSV- associated ALF in an immunocompetent patient. The patient recovered without LT, but the presence of HSV was not suspected at the time, precluding more effective treatment with acyclovir. To determine the etiology, stored blood samples were analyzed using whole transcriptome shotgun sequencing followed by mapping to a panel of viral reference sequences. The presence of HSV-DNA in blood samples at the time of admission was confirmed using real-time polymerase chain reaction, and, at the time of discharge, HSV-DNA levels had decreased by a factor of 106. Conclusions. In ALF cases of undetermined etiology, uncommon causes should be considered, especially those for which an effective treatment is available.

  17. De novo RNA sequencing transcriptome of Rhododendron obtusum identified the early heat response genes involved in the transcriptional regulation of photosynthesis.

    Directory of Open Access Journals (Sweden)

    Linchuan Fang

    Full Text Available Rhododendron spp. is an important ornamental species that is widely cultivated for landscape worldwide. Heat stress is a major obstacle for its cultivation in south China. Previous studies on rhododendron principally focused on its physiological and biochemical processes, which are involved in a series of stress tolerance. However, molecular or genetic properties of rhododendron's response to heat stress are still poorly understood. The phenotype and chlorophyll fluorescence kinetics parameters of four rhododendron cultivars were compared under normal or heat stress conditions, and a cultivar with highest heat tolerance, "Yanzhimi" (R. obtusum was selected for transcriptome sequencing. A total of 325,429,240 high quality reads were obtained and assembled into 395,561 transcripts and 92,463 unigenes. Functional annotation showed that 38,724 unigenes had sequence similarity to known genes in at least one of the proteins or nucleotide databases used in this study. These 38,724 unigenes were categorized into 51 functional groups based on Gene Ontology classification and were blasted to 24 known cluster of orthologous groups. A total of 973 identified unigenes belonged to 57 transcription factor families, including the stress-related HSF, DREB, ZNF, and NAC genes. Photosynthesis was significantly enriched in the Kyoto Encyclopedia of Genes and Genomes pathway, and the changed expression pattern was illustrated. The key pathways and signaling components that contribute to heat tolerance in rhododendron were revealed. These results provide a potentially valuable resource that can be used for heat-tolerance breeding.

  18. De novo RNA sequencing transcriptome of Rhododendron obtusum identified the early heat response genes involved in the transcriptional regulation of photosynthesis

    Science.gov (United States)

    Tong, Jun; Dong, Yanfang; Xu, Dongyun; Mao, Jing; Zhou, Yuan

    2017-01-01

    Rhododendron spp. is an important ornamental species that is widely cultivated for landscape worldwide. Heat stress is a major obstacle for its cultivation in south China. Previous studies on rhododendron principally focused on its physiological and biochemical processes, which are involved in a series of stress tolerance. However, molecular or genetic properties of rhododendron’s response to heat stress are still poorly understood. The phenotype and chlorophyll fluorescence kinetics parameters of four rhododendron cultivars were compared under normal or heat stress conditions, and a cultivar with highest heat tolerance, “Yanzhimi” (R. obtusum) was selected for transcriptome sequencing. A total of 325,429,240 high quality reads were obtained and assembled into 395,561 transcripts and 92,463 unigenes. Functional annotation showed that 38,724 unigenes had sequence similarity to known genes in at least one of the proteins or nucleotide databases used in this study. These 38,724 unigenes were categorized into 51 functional groups based on Gene Ontology classification and were blasted to 24 known cluster of orthologous groups. A total of 973 identified unigenes belonged to 57 transcription factor families, including the stress-related HSF, DREB, ZNF, and NAC genes. Photosynthesis was significantly enriched in the Kyoto Encyclopedia of Genes and Genomes pathway, and the changed expression pattern was illustrated. The key pathways and signaling components that contribute to heat tolerance in rhododendron were revealed. These results provide a potentially valuable resource that can be used for heat-tolerance breeding. PMID:29059200

  19. Inter-laboratory evaluation of the EUROFORGEN Global ancestry-informative SNP panel by massively parallel sequencing using the Ion PGM™

    DEFF Research Database (Denmark)

    Eduardoff, M; Gross, T E; Santos, C

    2016-01-01

    Seq™ PCR primers was designed for the Global AIM-SNPs to perform massively parallel sequencing using the Ion PGM™ system. This study assessed individual SNP genotyping precision using the Ion PGM™, the forensic sensitivity of the multiplex using dilution series, degraded DNA plus simple mixtures...

  20. Impact of SO(2) on Arabidopsis thaliana transcriptome in wildtype and sulfite oxidase knockout plants analyzed by RNA deep sequencing.

    Science.gov (United States)

    Hamisch, Domenica; Randewig, Dörte; Schliesky, Simon; Bräutigam, Andrea; Weber, Andreas P M; Geffers, Robert; Herschbach, Cornelia; Rennenberg, Heinz; Mendel, Ralf R; Hänsch, Robert

    2012-12-01

    High concentrations of sulfur dioxide (SO(2) ) as an air pollutant, and its derivative sulfite, cause abiotic stress that can lead to cell death. It is currently unknown to what extent plant fumigation triggers specific transcriptional responses. To address this question, and to test the hypothesis that sulfite oxidase (SO) is acting in SO(2) detoxification, we compared Arabidopsis wildtype (WT) and SO knockout lines (SO-KO) facing the impact of 600 nl l(-1) SO(2) , using RNAseq to quantify absolute transcript abundances. These transcriptome data were correlated to sulfur metabolism-related enzyme activities and metabolites obtained from identical samples in a previous study. SO-KO plants exhibited remarkable and broad regulative responses at the mRNA level, especially in transcripts related to sulfur metabolism enzymes, but also in those related to stress response and senescence. Focusing on SO regulation, no alterations were detectable in the WT, whereas in SO-KO plants we found up-regulation of two splice variants of the SO gene, although this gene is not functional in this line. Our data provide evidence for the highly specific coregulation between SO and sulfur-related enzymes like APS reductase, and suggest two novel candidates for involvement in SO(2) detoxification: an apoplastic peroxidase, and defensins as putative cysteine mass storages. © 2012 The Authors. New Phytologist © 2012 New Phytologist Trust.

  1. The genome sequence and transcriptome of Potentilla micrantha and their comparison to Fragaria vesca (the woodland strawberry).

    Science.gov (United States)

    Buti, Matteo; Moretto, Marco; Barghini, Elena; Mascagni, Flavia; Natali, Lucia; Brilli, Matteo; Lomsadze, Alexandre; Sonego, Paolo; Giongo, Lara; Alonge, Michael; Velasco, Riccardo; Varotto, Claudio; Šurbanovski, Nada; Borodovsky, Mark; Ward, Judson A; Engelen, Kristof; Cavallini, Andrea; Cestaro, Alessandro; Sargent, Daniel James

    2018-04-01

    The genus Potentilla is closely related to that of Fragaria, the economically important strawberry genus. Potentilla micrantha is a species that does not develop berries but shares numerous morphological and ecological characteristics with Fragaria vesca. These similarities make P. micrantha an attractive choice for comparative genomics studies with F. vesca. In this study, the P. micrantha genome was sequenced and annotated, and RNA-Seq data from the different developmental stages of flowering and fruiting were used to develop a set of gene predictions. A 327 Mbp sequence and annotation of the genome of P. micrantha, spanning 2674 sequence contigs, with an N50 size of 335,712, estimated to cover 80% of the total genome size of the species was developed. The genus Potentilla has a characteristically larger genome size than Fragaria, but the recovered sequence scaffolds were remarkably collinear at the micro-syntenic level with the genome of F. vesca, its closest sequenced relative. A total of 33,602 genes were predicted, and 95.1% of bench-marking universal single-copy orthologous genes were complete within the presented sequence. Thus, we argue that the majority of the gene-rich regions of the genome have been sequenced. Comparisons of RNA-Seq data from the stages of floral and fruit development revealed genes differentially expressed between P. micrantha and F. vesca.The data presented are a valuable resource for future studies of berry development in Fragaria and the Rosaceae and they also shed light on the evolution of genome size and organization in this family.

  2. Leaf Transcriptome Sequencing for Identifying Genic-SSR Markers and SNP Heterozygosity in Crossbred Mango Variety 'Amrapali' (Mangifera indica L.).

    Science.gov (United States)

    Mahato, Ajay Kumar; Sharma, Nimisha; Singh, Akshay; Srivastav, Manish; Jaiprakash; Singh, Sanjay Kumar; Singh, Anand Kumar; Sharma, Tilak Raj; Singh, Nagendra Kumar

    2016-01-01

    Mango (Mangifera indica L.) is called "king of fruits" due to its sweetness, richness of taste, diversity, large production volume and a variety of end usage. Despite its huge economic importance genomic resources in mango are scarce and genetics of useful horticultural traits are poorly understood. Here we generated deep coverage leaf RNA sequence data for mango parental varieties 'Neelam', 'Dashehari' and their hybrid 'Amrapali' using next generation sequencing technologies. De-novo sequence assembly generated 27,528, 20,771 and 35,182 transcripts for the three genotypes, respectively. The transcripts were further assembled into a non-redundant set of 70,057 unigenes that were used for SSR and SNP identification and annotation. Total 5,465 SSR loci were identified in 4,912 unigenes with 288 type I SSR (n ≥ 20 bp). One hundred type I SSR markers were randomly selected of which 43 yielded PCR amplicons of expected size in the first round of validation and were designated as validated genic-SSR markers. Further, 22,306 SNPs were identified by aligning high quality sequence reads of the three mango varieties to the reference unigene set, revealing significantly enhanced SNP heterozygosity in the hybrid Amrapali. The present study on leaf RNA sequencing of mango varieties and their hybrid provides useful genomic resource for genetic improvement of mango.

  3. Co-transcriptomic Analysis by RNA Sequencing to Simultaneously Measure Regulated Gene Expression in Host and Bacterial Pathogen

    KAUST Repository

    Ravasi, Timothy; Mavromatis, Charalampos Harris; Bokil, Nilesh J.; Schembri, Mark A.; Sweet, Matthew J.

    2016-01-01

    Intramacrophage pathogens subvert antimicrobial defence pathways using various mechanisms, including the targeting of host TLR-mediated transcriptional responses. Conversely, TLR-inducible host defence mechanisms subject intramacrophage pathogens to stress, thus altering pathogen gene expression programs. Important biological insights can thus be gained through the analysis of gene expression changes in both the host and the pathogen during an infection. Traditionally, research methods have involved the use of qPCR, microarrays and/or RNA sequencing to identify transcriptional changes in either the host or the pathogen. Here we describe the application of RNA sequencing using samples obtained from in vitro infection assays to simultaneously quantify both host and bacterial pathogen gene expression changes, as well as general approaches that can be undertaken to interpret the RNA sequencing data that is generated. These methods can be used to provide insights into host TLR-regulated transcriptional responses to microbial challenge, as well as pathogen subversion mechanisms against such responses.

  4. Co-transcriptomic Analysis by RNA Sequencing to Simultaneously Measure Regulated Gene Expression in Host and Bacterial Pathogen

    KAUST Repository

    Ravasi, Timothy

    2016-01-24

    Intramacrophage pathogens subvert antimicrobial defence pathways using various mechanisms, including the targeting of host TLR-mediated transcriptional responses. Conversely, TLR-inducible host defence mechanisms subject intramacrophage pathogens to stress, thus altering pathogen gene expression programs. Important biological insights can thus be gained through the analysis of gene expression changes in both the host and the pathogen during an infection. Traditionally, research methods have involved the use of qPCR, microarrays and/or RNA sequencing to identify transcriptional changes in either the host or the pathogen. Here we describe the application of RNA sequencing using samples obtained from in vitro infection assays to simultaneously quantify both host and bacterial pathogen gene expression changes, as well as general approaches that can be undertaken to interpret the RNA sequencing data that is generated. These methods can be used to provide insights into host TLR-regulated transcriptional responses to microbial challenge, as well as pathogen subversion mechanisms against such responses.

  5. Transcriptome sequencing of lentil based on second-generation technology permits large-scale unigene assembly and SSR marker discovery

    Directory of Open Access Journals (Sweden)

    Materne Michael

    2011-05-01

    Full Text Available Abstract Background Lentil (Lens culinaris Medik. is a cool-season grain legume which provides a rich source of protein for human consumption. In terms of genomic resources, lentil is relatively underdeveloped, in comparison to other Fabaceae species, with limited available data. There is hence a significant need to enhance such resources in order to identify novel genes and alleles for molecular breeding to increase crop productivity and quality. Results Tissue-specific cDNA samples from six distinct lentil genotypes were sequenced using Roche 454 GS-FLX Titanium technology, generating c. 1.38 × 106 expressed sequence tags (ESTs. De novo assembly generated a total of 15,354 contigs and 68,715 singletons. The complete unigene set was sequence-analysed against genome drafts of the model legume species Medicago truncatula and Arabidopsis thaliana to identify 12,639, and 7,476 unique matches, respectively. When compared to the genome of Glycine max, a total of 20,419 unique hits were observed corresponding to c. 31% of the known gene space. A total of 25,592 lentil unigenes were subsequently annoated from GenBank. Simple sequence repeat (SSR-containing ESTs were identified from consensus sequences and a total of 2,393 primer pairs were designed. A subset of 192 EST-SSR markers was screened for validation across a panel 12 cultivated lentil genotypes and one wild relative species. A total of 166 primer pairs obtained successful amplification, of which 47.5% detected genetic polymorphism. Conclusions A substantial collection of ESTs has been developed from sequence analysis of lentil genotypes using second-generation technology, permitting unigene definition across a broad range of functional categories. As well as providing resources for functional genomics studies, the unigene set has permitted significant enhancement of the number of publicly-available molecular genetic markers as tools for improvement of this species.

  6. Sequencing, de novo assembly and characterization of the spotted scat Scatophagus argus (Linnaeus 1766) transcriptome for discovery of reproduction related genes and SSRs

    Science.gov (United States)

    Yang, Wei; Chen, Huapu; Cui, Xuefan; Zhang, Kewei; Jiang, Dongneng; Deng, Siping; Zhu, Chunhua; Li, Guangli

    2017-09-01

    Spotted scat (Scatophagus argus) is an economically important farmed fish, particularly in East and Southeast Asia. Because there has been little research on reproductive development and regulation in this species, the lack of a mature artificial reproduction technology remains a barrier for the sustainable development of the aquaculture industry. More genetic and genomic background knowledge is urgently needed for an in-depth understanding of the molecular mechanism of reproductive process and identification of functional genes related to sexual differentiation, gonad maturation and gametogenesis. For these reasons, we performed transcriptomic analysis on spotted scat using a multiple tissue sample mixing strategy. The Illumina RNA sequencing generated 118 510 486 raw reads. After trimming, de novo assembly was performed and yielded 99 888 unigenes with an average length of 905.75 bp. A total of 45 015 unigenes were successfully annotated to the Nr, Swiss-Prot, KOG and KEGG databases. Additionally, 23 783 and 27 183 annotated unigenes were assigned to 56 Gene Ontology (GO) functional groups and 228 KEGG pathways, respectively. Subsequently, 2 474 transcripts associated with reproduction were selected using GO term and KEGG pathway assignments, and a number of reproduction-related genes involved in sex differentiation, gonad development and gametogenesis were identified. Furthermore, 22 279 simple sequence repeat (SSR) loci were discovered and characterized. The comprehensive transcript dataset described here greatly increases the genetic information available for spotted scat and contributes valuable sequence resources for functional gene mining and analysis. Candidate transcripts involved in reproduction would make good starting points for future studies on reproductive mechanisms, and the putative sex differentiation-related genes will be helpful for sex-determining gene identification and sex-specific marker isolation. Lastly, the SSRs can serve as marker

  7. De novo sequencing and analysis of the Ulva linza transcriptome to discover putative mechanisms associated with its successful colonization of coastal ecosystems

    Directory of Open Access Journals (Sweden)

    Zhang Xiaowen

    2012-10-01

    Full Text Available Abstract Background The green algal genus Ulva Linnaeus (Ulvaceae, Ulvales, Chlorophyta is well known for its wide distribution in marine, freshwater, and brackish environments throughout the world. The Ulva species are also highly tolerant of variations in salinity, temperature, and irradiance and are the main cause of green tides, which can have deleterious ecological effects. However, limited genomic information is currently available in this non-model and ecologically important species. Ulva linza is a species that inhabits bedrock in the mid to low intertidal zone, and it is a major contributor to biofouling. Here, we presented the global characterization of the U. linza transcriptome using the Roche GS FLX Titanium platform, with the aim of uncovering the genomic mechanisms underlying rapid and successful colonization of the coastal ecosystems. Results De novo assembly of 382,884 reads generated 13,426 contigs with an average length of 1,000 bases. Contiguous sequences were further assembled into 10,784 isotigs with an average length of 1,515 bases. A total of 304,101 reads were nominally identified by BLAST; 4,368 isotigs were functionally annotated with 13,550 GO terms, and 2,404 isotigs having enzyme commission (EC numbers were assigned to 262 KEGG pathways. When compared with four other full sequenced green algae, 3,457 unique isotigs were found in U. linza and 18 conserved in land plants. In addition, a specific photoprotective mechanism based on both LhcSR and PsbS proteins and a C4-like carbon-concentrating mechanism were found, which may help U. linza survive stress conditions. At least 19 transporters for essential inorganic nutrients (i.e., nitrogen, phosphorus, and sulphur were responsible for its ability to take up inorganic nutrients, and at least 25 eukaryotic cytochrome P450s, which is a higher number than that found in other algae, may be related to their strong allelopathy. Multi-origination of the stress related proteins

  8. High-accuracy and robust face recognition system based on optical parallel correlator using a temporal image sequence

    Science.gov (United States)

    Watanabe, Eriko; Ishikawa, Mami; Ohta, Maiko; Kodate, Kashiko

    2005-09-01

    Face recognition is used in a wide range of security systems, such as monitoring credit card use, searching for individuals with street cameras via Internet and maintaining immigration control. There are still many technical subjects under study. For instance, the number of images that can be stored is limited under the current system, and the rate of recognition must be improved to account for photo shots taken at different angles under various conditions. We implemented a fully automatic Fast Face Recognition Optical Correlator (FARCO) system by using a 1000 frame/s optical parallel correlator designed and assembled by us. Operational speed for the 1: N (i.e. matching a pair of images among N, where N refers to the number of images in the database) identification experiment (4000 face images) amounts to less than 1.5 seconds, including the pre/post processing. From trial 1: N identification experiments using FARCO, we acquired low error rates of 2.6% False Reject Rate and 1.3% False Accept Rate. By making the most of the high-speed data-processing capability of this system, much more robustness can be achieved for various recognition conditions when large-category data are registered for a single person. We propose a face recognition algorithm for the FARCO while employing a temporal image sequence of moving images. Applying this algorithm to a natural posture, a two times higher recognition rate scored compared with our conventional system. The system has high potential for future use in a variety of purposes such as search for criminal suspects by use of street and airport video cameras, registration of babies at hospitals or handling of an immeasurable number of images in a database.

  9. Time-resolved echo-shared parallel MRA of the lung: observer preference study of image quality in comparison with non-echo-shared sequences

    International Nuclear Information System (INIS)

    Fink, C.; Puderbach, M.; Zaporozhan, J.; Plathow, C.; Kauczor, H.-U.; Ley, S.

    2005-01-01

    The aim of this study was to evaluate the image quality of time-resolved echo-shared parallel MRA of the lung. The pulmonary vasculature of nine patients (seven females, two males; median age: 44 years) with pulmonary disease was examined using a time-resolved MRA sequence combining echo sharing with parallel imaging (time-resolved echo-shared angiography technique, or TREAT). The sharpness of the vessel borders, conspicuousness of peripheral lung vessels, artifact level, and overall image quality of TREAT was assessed independently by four readers in a side-by-side comparison with non-echo-shared time-resolved parallel MRA data (pMRA) previously acquired in the same patients. Furthermore, the SNR of pulmonary arteries (PA) and veins (PV) achieved with both pulse sequences was compared. The mean voxel size of TREAT MRA was decreased by 24% compared with the non-echo-shared MRA. Regarding the sharpness of the vessel borders, conspicuousness of peripheral lung vessels, and overall image quality the TREAT sequence was rated superior in 75-76% of all cases. If the TREAT images were preferred over the pMRA images, the advantage was rated as major in 61-71% of all cases. The level of artifacts was not increased with the TREAT sequence. The mean interobserver agreement for all categories ranged between fair (artifact level) and good (overall image quality). The maximum SNR of TREAT did not differ from non-echo-shared parallel MRA (PA: TREAT: 273±45; pMRA: 280±71; PV: TREAT: 273±33; pMRA: 258±62). TREAT achieves a higher spatial resolution than non-echo-shared parallel MRA which is also perceived as an improved image quality. (orig.)

  10. Transcriptome sequencing of mung bean (Vigna radiate L.) genes and the identification of EST-SSR markers.

    Science.gov (United States)

    Chen, Honglin; Wang, Lixia; Wang, Suhua; Liu, Chunji; Blair, Matthew Wohlgemuth; Cheng, Xuzhen

    2015-01-01

    Mung bean (Vigna radiate (L.) Wilczek) is an important traditional food legume crop, with high economic and nutritional value. It is widely grown in China and other Asian countries. Despite its importance, genomic information is currently unavailable for this crop plant species or some of its close relatives in the Vigna genus. In this study, more than 103 million high quality cDNA sequence reads were obtained from mung bean using Illumina paired-end sequencing technology. The processed reads were assembled into 48,693 unigenes with an average length of 874 bp. Of these unigenes, 25,820 (53.0%) and 23,235 (47.7%) showed significant similarity to proteins in the NCBI non-redundant protein and nucleotide sequence databases, respectively. Furthermore, 19,242 (39.5%) could be classified into gene ontology categories, 18,316 (37.6%) into Swiss-Prot categories and 10,918 (22.4%) into KOG database categories (E-value SSR), and 2,303 sequences contained more than one SSR together in the same expressed sequence tag (EST). A total of 13,134 EST-SSRs were identified as potential molecular markers, with mono-nucleotide A/T repeats being the most abundant motif class and G/C repeats being rare. In this SSR analysis, we found five main repeat motifs: AG/CT (30.8%), GAA/TTC (12.6%), AAAT/ATTT (6.8%), AAAAT/ATTTT (6.2%) and AAAAAT/ATTTTT (1.9%). A total of 200 SSR loci were randomly selected for validation by PCR amplification as EST-SSR markers. Of these, 66 marker primer pairs produced reproducible amplicons that were polymorphic among 31 mung bean accessions selected from diverse geographical locations. The large number of SSR-containing sequences found in this study will be valuable for the construction of a high-resolution genetic linkage maps, association or comparative mapping and genetic analyses of various Vigna species.

  11. Uncovering the immune responses of Apis mellifera ligustica larval gut to Ascosphaera apis infection utilizing transcriptome sequencing.

    Science.gov (United States)

    Chen, Dafu; Guo, Rui; Xu, Xijian; Xiong, Cuiling; Liang, Qin; Zheng, Yanzhen; Luo, Qun; Zhang, Zhaonan; Huang, Zhijian; Kumar, Dhiraj; Xi, Weijun; Zou, Xuan; Liu, Min

    2017-07-20

    Honeybees are susceptible to a variety of diseases, including chalkbrood, which is capable of causing huge losses of both the number of bees and colony productivity. This research is designed to characterize the transcriptome profiles of Ascosphaera apis-treated and un-treated larval guts of Apis mellifera ligustica in an attempt to unravel the molecular mechanism underlying the immune responses of western honeybee larval guts to mycosis. In this study, 24, 296 and 2157 genes were observed to be differentially expressed in A. apis-treated Apis mellifera (4-, 5- and 6-day-old) compared with un-treated larval guts. Moreover, the expression patterns of differentially expressed genes (DEGs) were examined via trend analysis, and subsequently, gene ontology analysis and KEGG pathway enrichment analysis were conducted for DEGs involved in up- and down-regulated profiles. Immunity-related pathways were selected for further analysis, and our results demonstrated that a total of 13 and 50 DEGs were annotated in the humoral immune-related and cellular immune-related pathways, respectively. Additionally, we observed that many DEGs up-regulated in treated guts were part of cellular immune pathways, such as the lysosome, ubiquitin mediated proteolysis, and insect hormone biosynthesis pathways and were induced by A. apis invasion. However, more down-regulated DEGs were restrained. Surprisingly, a majority of DEGs within the Toll-like receptor signaling pathway, and the MAPK signaling pathway were up-regulated in treated guts, while all but two genes involved in the NF-κB signaling pathway were down-regulated, which suggested that most genes involved in humoral immune-related pathways were activated in response to the invasive fungal pathogen. This study's findings provide valuable information regarding the investigation of the molecular mechanism of immunity defenses of A. m. ligustica larval guts to infection with A. apis. Furthermore, these studies lay the groundwork for

  12. Transcriptome analysis of carnation (Dianthus caryophyllus L.) based on next-generation sequencing technology

    OpenAIRE

    Tanase Koji; Nishitani Chikako; Hirakawa Hideki; Isobe Sachiko; Tabata Satoshi; Ohmiya Akemi; Onozaki Takashi

    2012-01-01

    Abstract Background Carnation (Dianthus caryophyllus L.), in the family Caryophyllaceae, can be found in a wide range of colors and is a model system for studies of flower senescence. In addition, it is one of the most important flowers in the global floriculture industry. However, few genomics resources, such as sequences and markers are available for carnation or other members of the Caryophyllaceae. To increase our understanding of the genetic control of important characters in carnation, ...

  13. Transcriptome Characterization and Sequencing-Based Identification of Salt-Responsive Genes in Millettia pinnata, a Semi-Mangrove Plant

    OpenAIRE

    Huang, Jianzi; Lu, Xiang; Yan, Hao; Chen, Shouyi; Zhang, Wanke; Huang, Rongfeng; Zheng, Yizhi

    2012-01-01

    Semi-mangroves form a group of transitional species between glycophytes and halophytes, and hold unique potential for learning molecular mechanisms underlying plant salt tolerance. Millettia pinnata is a semi-mangrove plant that can survive a wide range of saline conditions in the absence of specialized morphological and physiological traits. By employing the Illumina sequencing platform, we generated ∼192 million short reads from four cDNA libraries of M. pinnata and processed them into 108 ...

  14. Profiling of secondary metabolite gene clusters regulated by LaeA in Aspergillus niger FGSC A1279 based on genome sequencing and transcriptome analysis.

    Science.gov (United States)

    Wang, Bin; Lv, Yangyong; Li, Xuejie; Lin, Yiying; Deng, Hai; Pan, Li

    The global regulator LaeA controls the production of many fungal secondary metabolites, possibly via chromatin remodeling. Here we aimed to survey the secondary metabolite profile regulated by LaeA in Aspergillus niger FGSC A1279 by genome sequencing and comparative transcriptomics between the laeA deletion (ΔlaeA) and overexpressing (OE-laeA) mutants. Genome sequencing revealed four putative polyketide synthase genes specific to FGSC A1279, suggesting that the corresponding polyketide compounds might be unique to FGSC A1279. RNA-seq data revealed 281 putative secondary metabolite genes upregulated in the OE-laeA mutants, including 22 secondary metabolite backbone genes. LC-MS chemical profiling illustrated that many secondary metabolites were produced in OE-laeA mutants compared to wild type and ΔlaeA mutants, providing potential resources for drug discovery. KEGG analysis annotated 16 secondary metabolite clusters putatively linked to metabolic pathways. Furthermore, 34 of 61 Zn 2 Cys 6 transcription factors located in secondary metabolite clusters were differentially expressed between ΔlaeA and OE-laeA mutants. Three secondary metabolite clusters (cluster 18, 30 and 33) containing Zn 2 Cys 6 transcription factors that were upregulated in OE-laeA mutants were putatively linked to KEGG pathways, suggesting that Zn 2 Cys 6 transcription factors might play an important role in synthesizing secondary metabolites regulated by LaeA. Taken together, LaeA dramatically influences the secondary metabolite profile in FGSC A1279. Copyright © 2017 Institut Pasteur. Published by Elsevier Masson SAS. All rights reserved.

  15. Diversity and Genome Analysis of Australian and Global Oilseed Brassica napus L. Germplasm Using Transcriptomics and Whole Genome Re-sequencing

    Directory of Open Access Journals (Sweden)

    M. Michelle Malmberg

    2018-04-01

    Full Text Available Intensive breeding of Brassica napus has resulted in relatively low diversity, such that B. napus would benefit from germplasm improvement schemes that sustain diversity. As such, samples representative of global germplasm pools need to be assessed for existing population structure, diversity and linkage disequilibrium (LD. Complexity reduction genotyping-by-sequencing (GBS methods, including GBS-transcriptomics (GBS-t, enable cost-effective screening of a large number of samples, while whole genome re-sequencing (WGR delivers the ability to generate large numbers of unbiased genomic single nucleotide polymorphisms (SNPs, and identify structural variants (SVs. Furthermore, the development of genomic tools based on whole genomes representative of global oilseed diversity and orientated by the reference genome has substantial industry relevance and will be highly beneficial for canola breeding. As recent studies have focused on European and Chinese varieties, a global diversity panel as well as a substantial number of Australian spring types were included in this study. Focusing on industry relevance, 633 varieties were initially genotyped using GBS-t to examine population structure using 61,037 SNPs. Subsequently, 149 samples representative of global diversity were selected for WGR and both data sets used for a side-by-side evaluation of diversity and LD. The WGR data was further used to develop genomic resources consisting of a list of 4,029,750 high-confidence SNPs annotated using SnpEff, and SVs in the form of 10,976 deletions and 2,556 insertions. These resources form the basis of a reliable and repeatable system allowing greater integration between canola genomics studies, with a strong focus on breeding germplasm and industry applicability.

  16. Transcriptome Sequencing Analysis Reveals a Difference in Monoterpene Biosynthesis between Scented Lilium ‘Siberia’ and Unscented Lilium ‘Novano’

    Directory of Open Access Journals (Sweden)

    Zenghui Hu

    2017-08-01

    Full Text Available Lilium is a world famous fragrant bulb flower with high ornamental and