WorldWideScience

Sample records for cdna deep sequencing

  1. Construction of small RNA cDNA libraries for deep sequencing.

    Science.gov (United States)

    Lu, Cheng; Meyers, Blake C; Green, Pamela J

    2007-10-01

    Small RNAs (21-24 nucleotides) including microRNAs (miRNAs) and small interfering RNAs (siRNAs) are potent regulators of gene expression in both plants and animals. Several hundred genes encoding miRNAs and thousands of siRNAs have been experimentally identified by cloning approaches. New sequencing technologies facilitate the identification of these molecules and provide global quantitative expression data in a given biological sample. Here, we describe the methods used in our laboratory to construct small RNA cDNA libraries for high-throughput sequencing using technologies such as MPSS, 454 or SBS.
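
    Downstream of such a library, reads are often restricted to the small RNA size range quoted above. The following is a minimal Python sketch, assuming adapter-trimmed reads held as plain strings; the function name and the default 21-24 nt window are illustrative and are not taken from the authors' protocol.

```python
def select_small_rna_reads(reads, min_len=21, max_len=24):
    """Keep reads whose length falls inside the small RNA window (inclusive).

    `reads` is assumed to be an iterable of adapter-trimmed sequences.
    The 21-24 nt default mirrors the size range given in the abstract.
    """
    return [read for read in reads if min_len <= len(read) <= max_len]


# Hypothetical reads of length 20, 21, 24 and 25 nt.
example_reads = ["A" * 20, "C" * 21, "G" * 24, "U" * 25]
kept = select_small_rna_reads(example_reads)
kept_lengths = [len(read) for read in kept]
```

    Only the 21 nt and 24 nt reads survive in this example; a real pipeline would also collapse identical reads and track their counts.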

  2. The venom gland transcriptome of Latrodectus tredecimguttatus revealed by deep sequencing and cDNA library analysis.

    Directory of Open Access Journals (Sweden)

    Quanze He

    Latrodectus tredecimguttatus, commonly known as the black widow spider, is well known for its dangerous bite. Although its venom has been characterized extensively, some fundamental questions about its molecular composition remain unanswered. The limited transcriptome and genome data available prevent further understanding of spider venom at the molecular level. In the present study, we combined next-generation sequencing and conventional DNA sequencing to construct a venom gland transcriptome of the spider L. tredecimguttatus, which resulted in the identification of 9,666 and 480 high-confidence proteins among 34,334 de novo sequences and 1,024 cDNA sequences, respectively, by assembly, translation, filtering, quantification and annotation. Extensive functional analyses of these proteins indicated that mRNAs involved in RNA transport and spliceosome, protein translation, processing and transport were highly enriched in the venom gland, which is consistent with the specific function of venom glands, namely the production of toxins. Furthermore, we identified 146 toxin-like proteins forming 12 families, including 6 new families in this spider, in which α-LTX-Lt1a family2 is identified for the first time as a subfamily of the α-LTX-Lt1a family. The toxins were classified according to their bioactivities into five categories that functioned in a coordinated way. Few ion channels were expressed in venom gland cells, suggesting a possible mechanism of protection from the attack of their own toxins. The present study provides a gland transcriptome profile and extends our understanding of the toxinome of spiders and of the coordination mechanism for toxin production in terms of protein expression quantity.

  3. RNA-ligase-dependent biases in miRNA representation in deep-sequenced small RNA cDNA libraries

    Science.gov (United States)

    Hafner, Markus; Renwick, Neil; Brown, Miguel; Mihailović, Aleksandra; Holoch, Daniel; Lin, Carolina; Pena, John T.G.; Nusbaum, Jeffrey D.; Morozov, Pavel; Ludwig, Janos; Ojo, Tolulope; Luo, Shujun; Schroth, Gary; Tuschl, Thomas

    2011-01-01

    Sequencing of small RNA cDNA libraries is an important tool for the discovery of new RNAs and the analysis of their mutational status as well as expression changes across samples. It requires multiple enzyme-catalyzed steps, including sequential oligonucleotide adapter ligations to the 3′ and 5′ ends of the small RNAs, reverse transcription (RT), and PCR. We assessed biases in representation of miRNAs relative to their input concentration, using a pool of 770 synthetic miRNAs and 45 calibrator oligoribonucleotides, and tested the influence of Rnl1 and two variants of Rnl2, Rnl2(1–249) and Rnl2(1–249)K227Q, for 3′-adapter ligation. The use of the Rnl2 variants for adapter ligations yielded substantially fewer side products compared with Rnl1; however, the benefits of using Rnl2 remained largely obscured by additional biases in the 5′-adapter ligation step; RT and PCR steps did not have a significant impact on read frequencies. Intramolecular secondary structures of miRNA and/or miRNA/3′-adapter products contributed to these biases, which were highly reproducible under defined experimental conditions. We used the synthetic miRNA cocktail to derive correction factors for approximation of the absolute levels of individual miRNAs in biological samples. Finally, we evaluated the influence of 5′-terminal 5-nt barcode extensions for a set of 20 barcoded 3′ adapters and observed similar biases in miRNA read distribution, thereby enabling cost-saving multiplex analysis for large-scale miRNA profiling. PMID:21775473
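
    The idea of deriving correction factors from a synthetic pool can be sketched as follows. This is a minimal Python illustration, not the authors' procedure: it assumes that the input amount of each synthetic miRNA is known (equimolar by default) and defines the factor as the expected read fraction divided by the observed read fraction. The function names and the toy counts are hypothetical.

```python
def correction_factors(observed_counts, input_amounts=None):
    """Per-miRNA factor = expected read fraction / observed read fraction.

    observed_counts: reads per synthetic miRNA in the calibration library.
    input_amounts:   relative input amount per miRNA; equimolar if omitted
                     (an assumption made for this sketch).
    miRNAs with zero reads are skipped because no factor can be derived.
    """
    if input_amounts is None:
        input_amounts = {name: 1.0 for name in observed_counts}
    total_reads = sum(observed_counts.values())
    total_input = sum(input_amounts[name] for name in observed_counts)
    factors = {}
    for name, count in observed_counts.items():
        if count == 0:
            continue
        observed_fraction = count / total_reads
        expected_fraction = input_amounts[name] / total_input
        factors[name] = expected_fraction / observed_fraction
    return factors


def apply_correction(sample_counts, factors):
    """Scale raw sample counts by the calibration factors where available."""
    return {
        name: count * factors[name]
        for name, count in sample_counts.items()
        if name in factors
    }


# Toy calibration: two synthetic miRNAs pooled in equal amounts.
calibration = {"miR-a": 300, "miR-b": 100}
factors = correction_factors(calibration)
corrected = apply_correction({"miR-a": 30, "miR-b": 10}, factors)
```

    In this toy case the over-represented miR-a is scaled down and the under-represented miR-b is scaled up, so both corrected values come out equal. Because the abstract reports that the biases are reproducible only under defined experimental conditions, factors of this kind would apply only to libraries prepared with the same protocol.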

  4. Cloning, sequencing and expression of cDNA encoding growth ...

    Indian Academy of Sciences (India)

    Unknown

    2.4 cDNA sequencing and analysis. The nucleotide sequence of the cloned H. fossilis GH cDNA was determined by Sanger's dideoxy chain termination method, using the Perkin Elmer BigDye terminator kit in an ABI Prism 377 automated DNA sequencer. All other computational analysis of the GH cDNA was done using.

  5. cDNA sequence quality data - Budding yeast cDNA sequencing project | LSDB Archive [Life Science Database Archive metadata

    Lifescience Database Archive (English)

    Budding yeast cDNA sequencing project. Data name: cDNA sequence quality data. DOI: 10.18908/lsdba.nbdc00838-003. Description of data contents: Phred's quality score. P...

  6. Cloning, sequencing and expression of cDNA encoding growth ...

    Indian Academy of Sciences (India)

    Using polymerase chain reaction (PCR) primers representing the conserved regions of fish GH sequences the 3′ region of catfish GH cDNA (540 bp) was cloned by random amplification of cDNA ends and the clone was used as a probe to isolate recombinant phages carrying the full-length cDNA sequence. The full-length ...

  7. cDNA, genomic sequence cloning and overexpression of ribosomal ...

    African Journals Online (AJOL)

    RPS16 of eukaryote is a component of the 40S small ribosomal subunit encoded by RPS16 gene and is also a homolog of prokaryotic RPS9. The cDNA and genomic sequence of RPS16 was cloned successfully for the first time from the Giant Panda (Ailuropoda melanoleuca) using reverse transcription-polymerase chain ...

  8. cDNA encoding a polypeptide including a hevein sequence

    Science.gov (United States)

    Raikhel, Natasha V.; Broekaert, Willem F.; Chua, Nam-Hai; Kush, Anil

    1995-03-21

    A cDNA clone (HEV1) encoding hevein was isolated via polymerase chain reaction (PCR) using mixed oligonucleotides corresponding to two regions of hevein as primers and a Hevea brasiliensis latex cDNA library as a template. HEV1 is 1018 nucleotides long and includes an open reading frame of 204 amino acids. The deduced amino acid sequence contains a putative signal sequence of 17 amino acid residues followed by a 187 amino acid polypeptide. The amino-terminal region (43 amino acids) is identical to hevein and shows homology to several chitin-binding proteins and to the amino-termini of wound-induced genes in potato and poplar. The carboxyl-terminal portion of the polypeptide (144 amino acids) is 74-79% homologous to the carboxyl-terminal region of wound-inducible genes of potato. Wounding, as well as application of the plant hormones abscisic acid and ethylene, resulted in accumulation of hevein transcripts in leaves, stems and latex, but not in roots, as shown by using the cDNA as a probe. A fusion protein was produced in E. coli from the protein of the present invention and maltose binding protein produced by the E. coli.
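
    The region lengths quoted in this abstract can be checked against each other. The Python sketch below assumes that the signal sequence, the hevein-identical amino-terminal region and the carboxyl-terminal portion are contiguous and non-overlapping; the residue coordinates it produces are therefore derived arithmetic, not values stated in the record, and the function name is illustrative.

```python
def partition_orf(total_aa, signal_aa, n_terminal_aa, c_terminal_aa):
    """Return 1-based inclusive residue ranges for three contiguous regions.

    Raises ValueError if the region lengths do not add up to the ORF length.
    """
    if signal_aa + n_terminal_aa + c_terminal_aa != total_aa:
        raise ValueError("region lengths do not sum to the ORF length")
    n_start = signal_aa + 1
    c_start = signal_aa + n_terminal_aa + 1
    return {
        "signal": (1, signal_aa),
        "n_terminal": (n_start, signal_aa + n_terminal_aa),
        "c_terminal": (c_start, total_aa),
    }


# Lengths as given in the abstract: 204 = 17 + 43 + 144 amino acids.
regions = partition_orf(total_aa=204, signal_aa=17, n_terminal_aa=43, c_terminal_aa=144)
mature_length = 204 - 17  # length of the polypeptide after the signal sequence
```

    The lengths are mutually consistent: the 43 and 144 residue regions add up to the 187 amino acid polypeptide that follows the 17 residue signal sequence.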

  9. cDNA encoding a polypeptide including a hevein sequence

    Energy Technology Data Exchange (ETDEWEB)

    Raikhel, N.V.; Broekaert, W.F.; Chua, N.H.; Kush, A.

    2000-07-04

    A cDNA clone (HEV1) encoding hevein was isolated via polymerase chain reaction (PCR) using mixed oligonucleotides corresponding to two regions of hevein as primers and a Hevea brasiliensis latex cDNA library as a template. HEV1 is 1018 nucleotides long and includes an open reading frame of 204 amino acids. The deduced amino acid sequence contains a putative signal sequence of 17 amino acid residues followed by a 187 amino acid polypeptide. The amino-terminal region (43 amino acids) is identical to hevein and shows homology to several chitin-binding proteins and to the amino-termini of wound-induced genes in potato and poplar. The carboxyl-terminal portion of the polypeptide (144 amino acids) is 74-79% homologous to the carboxyl-terminal region of wound-inducible genes of potato. Wounding, as well as application of the plant hormones abscisic acid and ethylene, resulted in accumulation of hevein transcripts in leaves, stems and latex, but not in roots, as shown by using the cDNA as a probe. A fusion protein was produced in E. coli from the protein of the present invention and maltose binding protein produced by the E. coli.

  10. Cloning and sequence analysis of H. contortus HC58cDNA gene ...

    African Journals Online (AJOL)

    Phylogenetic analysis revealed close evolutionary proximity of the protein sequence to counterpart sequences in the cathepsin B like proteases, suggesting that HC58cDNA was a member of the papain family. Keywords: Haemonchus contortus, HC58cDNA, cathepsin B like protease, papain family. Kenya Veterinarian Vol.

  11. Generation of cDNA expression libraries enriched for in-frame sequences

    OpenAIRE

    Davis, Claytus A.; Benzer, Seymour

    1997-01-01

    Bacterial cDNA expression libraries are made to reproduce protein sequences present in the mRNA source tissue. However, there is no control over which frame of the cDNA is translated, because translation of the cDNA must be initiated on vector sequence. In a library of nondirectionally cloned cDNAs, only some 8% of the protein sequences produced are expected to be correct. Directional cloning can increase this by a factor of two, but it does not solve the frame problem. We have therefore deve...

  12. Cloning, sequencing and expression of cDNA encoding growth ...

    Indian Academy of Sciences (India)

    Unknown

    cell embryo and the expression was monitored continuously. The expression shown here is in developing embryo and freshly hatched fish. The intensity of the green colour indicates strong expression of EGFP in all the tissues of the embryo/fry. The expression of EGFP indicates the co-expression of catfish GH cDNA and the ...

  13. 5'-end sequences of budding yeast full-length cDNA clones and quality scores - Budding yeast cDNA sequencing project | LSDB Archive [Life Science Database Archive metadata

    Lifescience Database Archive (English)

    Budding yeast cDNA sequencing project. Data name: 5'-end sequences of budding yeast full-length cDNA clones and quality...or-capping method, the sequence quality score generated by the Phred software, and links to SGD, dbEST and U...es. FASTA format. Quality: Phred's quality score.

  14. Cloning and sequencing of complete τ-crystallin cDNA from ...

    Indian Academy of Sciences (India)

    Unknown

    length τ-crystallin cDNA from crocodilian lens and α-enolase from other tissues. ... human (Acc. No. NM_001428). The sequences were used to construct a phylogenetic tree depicting gene lineage, using the clustering program DNAML.

  15. Mouse tetranectin: cDNA sequence, tissue-specific expression, and chromosomal mapping

    DEFF Research Database (Denmark)

    Ibaraki, K; Kozak, C A; Wewer, U M

    1995-01-01

    regulation, mouse tetranectin cDNA was cloned from a 16-day-old mouse embryo library. Sequence analysis revealed a 992-bp cDNA with an open reading frame of 606 bp, which is identical in length to the human tetranectin cDNA. The deduced amino acid sequence showed high homology to the human cDNA with 76......(s) of tetranectin. The sequence analysis revealed a difference in both sequence and size of the noncoding regions between mouse and human cDNAs. Northern analysis of the various tissues from mouse, rat, and cow showed the major transcript(s) to be approximately 1 kb, which is similar in size to that observed...

  16. Cloning and sequencing of dolphinfish (Coryphaena hippurus, Coryphaenidae) growth hormone-encoding cDNA.

    Science.gov (United States)

    Peduel, A D; Elizur, A; Knibb, W

    1994-01-01

    The cDNA encoding the preprotein growth hormone from the dolphinfish (Coryphaena hippurus) has been cloned and sequenced. The cDNA was derived by reverse transcription of RNA from the pituitary of a young fish using the method known as Rapid Amplification of cDNA Ends (RACE). An oligonucleotide primer corresponding to the 5' region of Pagrus major and the universal RACE primer enabled amplification using the Polymerase Chain Reaction (PCR). The dolphinfish and the yellowtail, Seriola quinqueradiata, are both members of the sub-order Percoidei (Perciformes) and their GH sequences show a high level of homology.

  17. Biases in small RNA deep sequencing data.

    Science.gov (United States)

    Raabe, Carsten A; Tang, Thean-Hock; Brosius, Juergen; Rozhdestvensky, Timofey S

    2014-02-01

    High-throughput RNA sequencing (RNA-seq) is considered a powerful tool for novel gene discovery and fine-tuned transcriptional profiling. The digital nature of RNA-seq is also believed to simplify meta-analysis and to reduce background noise associated with hybridization-based approaches. The development of multiplex sequencing enables efficient and economic parallel analysis of gene expression. In addition, RNA-seq is of particular value when low RNA expression or modest changes between samples are monitored. However, recent data uncovered severe bias in the sequencing of small non-protein coding RNA (small RNA-seq or sRNA-seq), such that the expression levels of some RNAs appeared to be artificially enhanced and others diminished or even undetectable. The use of different adapters and barcodes during ligation as well as complex RNA structures and modifications drastically influence cDNA synthesis efficacies and exemplify sources of bias in deep sequencing. In addition, variable specific RNA G/C-content is associated with unequal polymerase chain reaction amplification efficiencies. Given the central importance of RNA-seq to molecular biology and personalized medicine, we review recent findings that challenge small non-protein coding RNA-seq data and suggest approaches and precautions to overcome or minimize bias.
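
    Since the review links G/C content to unequal PCR amplification efficiency, a first diagnostic is simply the G/C fraction of each small RNA. The Python sketch below is illustrative only; the function name is hypothetical and the example sequences are invented, not taken from the reviewed data.

```python
def gc_fraction(sequence):
    """Fraction of G and C bases in an RNA or DNA sequence (case-insensitive).

    Returns 0.0 for an empty sequence so callers need not special-case it.
    """
    seq = sequence.upper()
    if not seq:
        return 0.0
    gc_count = sum(1 for base in seq if base in "GC")
    return gc_count / len(seq)


# Invented sequences spanning low to high G/C content.
examples = {"low_gc": "AUUUAAUA", "mid_gc": "AUGCAUGC", "high_gc": "GGCCGCGC"}
gc_by_name = {name: gc_fraction(seq) for name, seq in examples.items()}
```

    Plotting read counts against values like these for a calibration pool is one way to see whether G/C-rich or G/C-poor species are systematically over- or under-represented.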

  18. Mouse tetranectin: cDNA sequence, tissue-specific expression, and chromosomal mapping

    DEFF Research Database (Denmark)

    Ibaraki, K; Kozak, C A; Wewer, U M

    1995-01-01

    regulation, mouse tetranectin cDNA was cloned from a 16-day-old mouse embryo library. Sequence analysis revealed a 992-bp cDNA with an open reading frame of 606 bp, which is identical in length to the human tetranectin cDNA. The deduced amino acid sequence showed high homology to the human cDNA with 76...... in human. Although additional minor bands of 1.5 and 3.3 kb were found in Northern blots, RT-PCR (reverse transcription polymerase chain reaction) analysis failed to provide evidence that these minor bands are products of the tetranectin gene. Finally, the genetic map location for this gene, Tna...

  19. Cloning of human purine-nucleoside phosphorylase cDNA sequences by complementation in Escherichia coli.

    OpenAIRE

    Goddard, J M; Caput, D; Williams, S R; Martin, D W

    1983-01-01

    We have obtained cDNA clones that contain the entire coding region of the human purine-nucleoside phosphorylase (PNP; EC 2.4.2.1) mRNA. The cDNA sequences were generated by reverse transcription of PNP-enriched mRNA obtained by immunoadsorption of HeLa cell polyribosomes with monospecific antibody to human PNP. cDNA molecules that were close in length to PNP mRNA were separated by agarose gel electrophoresis and inserted into the Pst I site of the plasmid pBR322. Plasmid DNA from the pooled c...

  20. Generation of cDNA expression libraries enriched for in-frame sequences.

    Science.gov (United States)

    Davis, C A; Benzer, S

    1997-03-18

    Bacterial cDNA expression libraries are made to reproduce protein sequences present in the mRNA source tissue. However, there is no control over which frame of the cDNA is translated, because translation of the cDNA must be initiated on vector sequence. In a library of nondirectionally cloned cDNAs, only some 8% of the protein sequences produced are expected to be correct. Directional cloning can increase this by a factor of two, but it does not solve the frame problem. We have therefore developed and tested a library construction methodology using a novel vector, pKE-1, with which translation in the correct reading frame confers kanamycin resistance on the host. Following kanamycin selection, the cDNA libraries contained 60-80% open, in-frame clones. These, compared with unselected libraries, showed a 10-fold increase in the number of matches between the cDNA-encoded proteins made by the bacteria and database protein sequences. cDNA sequencing programs will benefit from the enrichment for correct coding sequences, and screening methods requiring protein expression will benefit from the enrichment for authentic translation products.

  1. Sequence of a cDNA encoding turtle high mobility group 1 protein.

    Science.gov (United States)

    Zheng, Jifang; Hu, Bi; Wu, Duansheng

    2005-07-01

    To obtain sequence information about the turtle HMG1 gene, a cDNA encoding the HMG1 protein of the Chinese soft-shell turtle (Pelodiscus sinensis) was amplified by RT-PCR from kidney total RNA, and was cloned, sequenced and analyzed. The results revealed that the open reading frame (ORF) of turtle HMG1 cDNA is 606 bp long. The ORF encodes 202 amino acid residues, from which two DNA-binding domains and one polyacidic region are derived. The DNA-binding domains share higher amino acid identity with the homologous sequences of chicken (96.5%) and mammals (74%) than with the homologous sequence of rainbow trout (67%). The polyacidic region shows 84.6% amino acid homology with the equivalent region of chicken HMG1 cDNA. Turtle HMG1 protein contains 3 Cys residues located at completely conserved positions. Conservation in sequence and structure suggests that the functions of turtle HMG1 cDNA may be highly conserved during evolution. To our knowledge, this is the first report of an HMG1 cDNA sequence in any reptile.

  2. cDNA sequencing improves the detection of P53 missense mutations in colorectal cancer

    International Nuclear Information System (INIS)

    Szybka, Malgorzata; Kordek, Radzislaw; Zakrzewska, Magdalena; Rieske, Piotr; Pasz-Walczak, Grazyna; Kulczycka-Wojdala, Dominika; Zawlik, Izabela; Stawski, Robert; Jesionek-Kupnicka, Dorota; Liberski, Pawel P

    2009-01-01

    Recently published data showed discrepancies between P53 cDNA and DNA sequencing in glioblastomas. We hypothesised that similar discrepancies may be observed in other human cancers. To this end, we analyzed 23 colorectal cancers for P53 mutations and gene expression using both DNA and cDNA sequencing, real-time PCR and immunohistochemistry. We found P53 gene mutations in 16 cases (15 missense and 1 nonsense). Two of the 15 cases with missense mutations showed alterations based only on cDNA, and not DNA sequencing. Moreover, in 6 of the 15 cases with a cDNA mutation those mutations were difficult to detect in the DNA sequencing, so the results of DNA analysis alone could be misinterpreted if the cDNA sequencing results had not also been available. In all those 15 cases, we observed a higher ratio of the mutated to the wild type template by cDNA analysis, but not by the DNA analysis. Interestingly, a similar overexpression of P53 mRNA was present in samples with and without P53 mutations. In terms of colorectal cancer, those discrepancies might be explained under three conditions: 1, overexpression of mutated P53 mRNA in cancer cells as compared with normal cells; 2, a higher content of cells without P53 mutation (normal cells and cells showing K-RAS and/or APC but not P53 mutation) in samples presenting P53 mutation; 3, heterozygous or hemizygous mutations of P53 gene. Additionally, for heterozygous mutations unknown mechanism(s) causing selective overproduction of mutated allele should also be considered. Our data offer new clues for studying discrepancy in P53 cDNA and DNA sequencing analysis

  3. cDNA, genomic sequence cloning and overexpression of ribosomal ...

    African Journals Online (AJOL)

    Alignment analysis indicated that the nucleotide sequence of the coding sequence shows high homology to those of Homo sapiens, Pongo abelii, Macaca fascicularis, Mus musculus, Bos taurus and Rattus norvegicus (93.1, 92.5, 92.2, 91.1, 90.6 and 90.0%, respectively). The amino acid sequence encoded by RPS20 ...

  4. Expressed sequence tags: normalization and subtraction of cDNA libraries.

    Science.gov (United States)

    Soares, Marcelo Bento; de Fatima Bonaldo, Maria; Hackett, Jeremiah D; Bhattacharya, Debashish

    2009-01-01

    Expressed Sequence Tags (ESTs) provide a rapid and efficient approach for gene discovery and analysis of gene expression in eukaryotes. ESTs have also become particularly important with recent expanded efforts in complete genome sequencing of understudied, nonmodel eukaryotes such as protists and algae. For these projects, ESTs provide an invaluable source of data for gene identification and prediction of exon-intron boundaries. The generation of EST data, although straightforward in concept, requires nonetheless great care to ensure the highest efficiency and return for the investment in time and funds. To this end, key steps in the process include generation of a normalized cDNA library to facilitate a high gene discovery rate followed by serial subtraction of normalized libraries to maintain the discovery rate. Here we describe in detail, protocols for normalization and subtraction of cDNA libraries followed by an example using the toxic dinoflagellate Alexandrium tamarense.

  5. Cloning, sequencing, and expression of cDNA for human β-glucuronidase

    International Nuclear Information System (INIS)

    Oshima, A.; Kyle, J.W.; Miller, R.D.

    1987-01-01

    The authors report here the cDNA sequence for human placental β-glucuronidase (β-D-glucuronoside glucuronosohydrolase, EC 3.2.1.31) and demonstrate expression of the human enzyme in transfected COS cells. They also sequenced a partial cDNA clone from human fibroblasts that contained a 153-base-pair deletion within the coding sequence and found a second type of cDNA clone from placenta that contained the same deletion. Nuclease S1 mapping studies demonstrated two types of mRNAs in human placenta that corresponded to the two types of cDNA clones isolated. The NH2-terminal amino acid sequence determined for human spleen β-glucuronidase agreed with that inferred from the DNA sequence of the two placental clones, beginning at amino acid 23, suggesting a cleaved signal sequence of 22 amino acids. When transfected into COS cells, plasmids containing either placental clone expressed an immunoprecipitable protein that contained N-linked oligosaccharides as evidenced by sensitivity to endoglycosidase F. However, only transfection with the clone containing the 153-base-pair segment led to expression of human β-glucuronidase activity. These studies provide the sequence for the full-length cDNA for human β-glucuronidase, demonstrate the existence of two populations of mRNA for β-glucuronidase in human placenta, only one of which specifies a catalytically active enzyme, and illustrate the importance of expression studies in verifying that a cDNA is functionally full-length

  6. New Approaches to Attenuated Hepatitis a Vaccine Development: Cloning and Sequencing of Cell-Culture Adapted Viral cDNA.

    Science.gov (United States)

    1987-10-13

    Insert fragments from p16 cDNA clones were subcloned into the phage vector M13mp8 or M13mp19 and subjected to rapid sequencing using the...2), and selected cDNA insert fragments were subcloned into M13 vectors for sequencing. The sequence of the complete genome was determined, with over

  7. Construction and EST sequencing of full-length, drought stress cDNA libraries for common beans (Phaseolus vulgaris L.).

    Science.gov (United States)

    Blair, Matthew W; Fernandez, Andrea C; Ishitani, Manabu; Moreta, Danilo; Seki, Motoaki; Ayling, Sarah; Shinozaki, Kazuo

    2011-11-25

    Common bean is an important legume crop with only a moderate number of short expressed sequence tags (ESTs) made with traditional methods. The goal of this research was to use full-length cDNA technology to develop ESTs that would overlap with the beginning of open reading frames and therefore be useful for gene annotation of genomic sequences. The library was also constructed to represent genes expressed under drought, low soil phosphorus and high soil aluminum toxicity. We also undertook comparisons of the full-length cDNA library to two previous non-full clone EST sets for common bean. Two full-length cDNA libraries were constructed: one for the drought tolerant Mesoamerican genotype BAT477 and the other one for the acid-soil tolerant Andean genotype G19833 which has been selected for genome sequencing. Plants were grown in three soil types using deep rooting cylinders subjected to drought and non-drought stress and tissues were collected from both roots and above ground parts. A total of 20,000 clones were selected robotically, half from each library. Then, nearly 10,000 clones from the G19833 library were sequenced with an average read length of 850 nucleotides. A total of 4,219 unigenes were identified consisting of 2,981 contigs and 1,238 singletons. These were functionally annotated with gene ontology terms and placed into KEGG pathways. Compared to other EST sequencing efforts in common bean, about half of the sequences were novel or represented the 5' ends of known genes. The present full-length cDNA libraries add to the technological toolbox available for common bean and our sequencing of these clones substantially increases the number of unique EST sequences available for the common bean genome. All of this should be useful for both functional gene annotation, analysis of splice site variants and intron/exon boundary determination by comparison to soybean genes or with common bean whole-genome sequences. In addition the library has a large number of

  8. Construction and EST sequencing of full-length, drought stress cDNA libraries for common beans (Phaseolus vulgaris L.)

    Directory of Open Access Journals (Sweden)

    Blair Matthew W

    2011-11-01

    Background: Common bean is an important legume crop with only a moderate number of short expressed sequence tags (ESTs) made with traditional methods. The goal of this research was to use full-length cDNA technology to develop ESTs that would overlap with the beginning of open reading frames and therefore be useful for gene annotation of genomic sequences. The library was also constructed to represent genes expressed under drought, low soil phosphorus and high soil aluminum toxicity. We also undertook comparisons of the full-length cDNA library to two previous non-full clone EST sets for common bean. Results: Two full-length cDNA libraries were constructed: one for the drought tolerant Mesoamerican genotype BAT477 and the other one for the acid-soil tolerant Andean genotype G19833 which has been selected for genome sequencing. Plants were grown in three soil types using deep rooting cylinders subjected to drought and non-drought stress and tissues were collected from both roots and above ground parts. A total of 20,000 clones were selected robotically, half from each library. Then, nearly 10,000 clones from the G19833 library were sequenced with an average read length of 850 nucleotides. A total of 4,219 unigenes were identified consisting of 2,981 contigs and 1,238 singletons. These were functionally annotated with gene ontology terms and placed into KEGG pathways. Compared to other EST sequencing efforts in common bean, about half of the sequences were novel or represented the 5' ends of known genes. Conclusions: The present full-length cDNA libraries add to the technological toolbox available for common bean and our sequencing of these clones substantially increases the number of unique EST sequences available for the common bean genome. All of this should be useful for both functional gene annotation, analysis of splice site variants and intron/exon boundary determination by comparison to soybean genes or with common bean whole

  9. cDNA sequence of human transforming gene hst and identification of the coding sequence required for transforming activity

    International Nuclear Information System (INIS)

    Taira, M.; Yoshida, T.; Miyagawa, K.; Sakamoto, H.; Terada, M.; Sugimura, T.

    1987-01-01

    The hst gene was originally identified as a transforming gene in DNAs from human stomach cancers and from a noncancerous portion of stomach mucosa by DNA-mediated transfection assay using NIH3T3 cells. cDNA clones of hst were isolated from the cDNA library constructed from poly(A)+ RNA of a secondary transformant induced by the DNA from a stomach cancer. The sequence analysis of the hst cDNA revealed the presence of two open reading frames. When this cDNA was inserted into an expression vector containing the simian virus 40 promoter, it efficiently induced the transformation of NIH3T3 cells upon transfection. It was found that one of the reading frames, which coded for 206 amino acids, was responsible for the transforming activity

  10. cDNA, genomic sequence cloning and overexpression of ribosomal ...

    African Journals Online (AJOL)

    PRECIOUS

    2009-11-02

    Nov 2, 2009 ... Rattus norvegicus are 93.1, 92.5, 92.2, 91.1, 90.6 and 90.0% respectively. The amino acid sequence encoded by RPS20 gene of the Giant Panda shared a high homology (100%) with those of H. sapiens,. Mac. fascicularis, Mus musculus, B. taurus and R. norvegicus, except for P. abelii (99.88%). Primary.

  11. cDNA, genomic cloning and sequence analysis of ribosomal protein ...

    African Journals Online (AJOL)

    Ribosomal protein S4X (RPS4X) is one of the 40S ribosomal proteins encoded by the RPS4X gene. The cDNA and the genomic sequence of RPS4X were cloned successfully from giant panda (Ailuropoda melanoleuca) using reverse transcriptase-polymerase chain reaction (RT-PCR) and touchdown-PCR technology ...

  12. Cloning and sequencing of complete τ-crystallin cDNA from ...

    Indian Academy of Sciences (India)

    Unknown

    brain, heart and gonad, suggesting both to be the product of the same gene. The study thus provides the first report on cDNA sequence of τ-crystallin from a reptilian species and also re-confirms it to be an example of the phenomenon of gene sharing as was demonstrated earlier in the case of peking duck. Moreover, the ...

  13. cDNA, genomic sequence cloning and analysis of the ribosomal ...

    African Journals Online (AJOL)

    Ribosomal protein L37A (RPL37A) is a component of 60S large ribosomal subunit encoded by the RPL37A gene, which belongs to the family of ribosomal L37AE proteins, located in the cytoplasm. The complementary deoxyribonucleic acid (cDNA) and the genomic sequence of RPL37A were cloned successfully from giant ...

  14. Complete amino acid sequence of human intestinal aminopeptidase N as deduced from cloned cDNA

    DEFF Research Database (Denmark)

    Cowell, G M; Kønigshøfer, E; Danielsen, E M

    1988-01-01

    The complete primary structure (967 amino acids) of an intestinal human aminopeptidase N (EC 3.4.11.2) was deduced from the sequence of a cDNA clone. Aminopeptidase N is anchored to the microvillar membrane via an uncleaved signal for membrane insertion. A domain constituting amino acid 250...

  15. cDNA sequence of the long mRNA for human glutamine synthase

    NARCIS (Netherlands)

    van den Hoff, M. J.; Geerts, W. J.; Das, A. T.; Moorman, A. F.; Lamers, W. H.

    1991-01-01

    Screening a human liver cDNA library in lambda ZAP revealed several clones for the mRNA of glutamine synthase. The longest clone was completely sequenced and consists of a 109 bp 5' untranslated region, a 1119 bp protein coding region, a 1498 bp 3' untranslated region and a poly(A) tract of 12 bp.

  16. Detection of reverse transcriptase termination sites using cDNA ligation and massive parallel sequencing

    DEFF Research Database (Denmark)

    Kielpinski, Lukasz J; Boyd, Mette; Sandelin, Albin

    2013-01-01

    of these methods can be increased by applying massive parallel sequencing technologies. Here, we describe a versatile method for detection of reverse transcriptase termination sites based on ligation of an adapter to the 3' end of cDNA with bacteriophage TS2126 RNA ligase (CircLigase™). In the following PCR...

  17. Primary structure of a lipoxygenase from barley grain as deduced from its cDNA sequence

    NARCIS (Netherlands)

    Mechelen, J.R. van; Smits, M.; Douma, A.C.; Rouster, J.; Cameron-Mills, V.; Heidekamp, F.; Valk, B.E.

    1995-01-01

    A full length cDNA sequence for a barley grain lipoxygenase was obtained. It includes a 5' untranslated region of 69 nucleotides, an open reading frame of 2586 nucleotides encoding a protein of 862 amino acid residues and a 3' untranslated region of 142 nucleotides. The molecular mass of the encoded
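
    The segment lengths quoted above can be cross-checked with a few lines of arithmetic. The following is a minimal sketch, not part of the cited work: the function name `cdna_summary` is hypothetical, and it assumes that the three segments are contiguous and that the 2586-nucleotide open reading frame is counted without the stop codon (the abstract does not say either way).

```python
def cdna_summary(utr5: int, orf: int, utr3: int) -> dict:
    """Summarise a cDNA made of a 5' UTR, an ORF and a 3' UTR (lengths in nt)."""
    if orf % 3 != 0:
        raise ValueError("ORF length must be a multiple of 3")
    return {
        "total_nt": utr5 + orf + utr3,  # assumes the segments are contiguous
        "codons": orf // 3,             # one codon per amino acid residue
    }


# Lengths as reported for the barley grain lipoxygenase cDNA.
summary = cdna_summary(utr5=69, orf=2586, utr3=142)
print(summary)
```

    Under these assumptions the codon count agrees with the 862 residues reported for the encoded protein.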

  18. cDNA, genomic cloning and sequence analysis of ribosomal protein ...

    African Journals Online (AJOL)

    2012-03-13

    Ribosomal protein S4X (RPS4X) is one of the 40S ribosomal proteins encoded by the RPS4X gene. The cDNA and the genomic sequence of RPS4X were cloned successfully from giant panda (Ailuropoda melanoleuca) using reverse transcriptase-polymerase chain reaction (RT-PCR) and touchdown- ...

  19. Isolation and sequence analysis of a cDNA clone encoding the fifth complement component

    DEFF Research Database (Denmark)

    Lundwall, Åke B; Wetsel, Rick A; Kristensen, Torsten

    1985-01-01

    We have used available protein sequence data for the anaphylatoxin (C5a) portion of the fifth component of human complement (residues 19-25) to synthesize a mixed-sequence oligonucleotide probe. The labeled oligonucleotide was then used to screen a human liver cDNA library, and a single candidate cDNA clone of 1.85 kilobase pairs was isolated. Hybridization of the mixed-sequence probe to the complementary strand of the plasmid insert and sequence analysis by the dideoxy method predicted the expected protein sequence of C5a (positions 1-12), amino-terminal to the anticipated priming site. The sequence obtained further predicted an arginine-rich sequence (RPRR) immediately upstream of the N-terminal threonine of C5a, indicating that the promolecule form of C5 is synthesized with a beta alpha-chain orientation as previously shown for pro-C3 and pro-C4. The C5 cDNA clone was sheared randomly by sonication...

  20. Selective and flexible depletion of problematic sequences from RNA-seq libraries at the cDNA stage.

    Science.gov (United States)

    Archer, Stuart K; Shirokikh, Nikolay E; Preiss, Thomas

    2014-05-26

    A major hurdle to transcriptome profiling by deep-sequencing technologies is that abundant transcripts, such as rRNAs, can overwhelm the libraries, severely reducing transcriptome-wide coverage. Methods for depletion of such unwanted sequences typically require treatment of RNA samples prior to library preparation, are costly and not suited to unusual species and applications. Here we describe Probe-Directed Degradation (PDD), an approach that employs hybridisation to DNA oligonucleotides at the single-stranded cDNA library stage and digestion with Duplex-Specific Nuclease (DSN). Targeting Saccharomyces cerevisiae rRNA sequences in Illumina HiSeq libraries generated by the split adapter method, we show that PDD results in efficient removal of rRNA. The probes generate extended zones of depletion as a function of library insert size and the requirements for DSN cleavage. Using intact total RNA as starting material, probes can be spaced at the minimum anticipated library size minus 20 nucleotides to achieve continuous depletion. No off-target bias is detectable when comparing PDD-treated with untreated libraries. We further provide a bioinformatics tool to design suitable PDD probe sets. We find that PDD is a rapid procedure that results in effective and specific depletion of unwanted sequences from deep-sequencing libraries. Because PDD acts at the cDNA stage, handling of fragile RNA samples can be minimised and it should further be feasible to remediate existing libraries. Importantly, PDD preserves the original RNA fragment boundaries as is required for nucleotide-resolution footprinting or base-cleavage studies. Finally, as PDD utilises unmodified DNA oligonucleotides it can provide a low-cost option for large-scale projects, or be flexibly customised to suit different depletion targets, sample types and organisms.
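
    The spacing rule stated in the abstract (minimum anticipated library size minus 20 nucleotides) can be sketched as follows. This is only an illustration and not the authors' probe-design tool: the function names are hypothetical, probe length and sequence choice are not modelled, and the example numbers (a 1,800-nt target and a 120-nt minimum insert) are made up.

```python
def pdd_probe_spacing(min_library_size: int, margin: int = 20) -> int:
    """Spacing between probes: minimum anticipated library size minus a margin (nt)."""
    spacing = min_library_size - margin
    if spacing <= 0:
        raise ValueError("minimum library size must exceed the margin")
    return spacing


def pdd_probe_positions(target_length: int, min_library_size: int, margin: int = 20) -> list:
    """Evenly spaced probe offsets (0-based, in nt) along a target sequence."""
    spacing = pdd_probe_spacing(min_library_size, margin)
    return list(range(0, target_length, spacing))


# Illustrative values only: 1,800-nt target, 120-nt minimum library insert.
positions = pdd_probe_positions(target_length=1800, min_library_size=120)
print(len(positions), positions[:3])
```

    Smaller minimum insert sizes give tighter spacing and therefore more probes per target.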

  1. Isolation and sequence analysis of a chalcone synthase cDNA of Matthiola incana R. Br. (Brassicaceae).

    Science.gov (United States)

    Epping, B; Kittel, M; Ruhnau, B; Hemleben, V

    1990-06-01

    A cDNA clone (pcM12) of the chalcone synthase (CHS) of Matthiola incana R. Br. (Brassicaceae) was isolated from a cDNA library, sequenced and analysed. It comprises the complete coding sequence for the CHS and 5' and 3' untranslated regions. The deduced amino acid sequence shows that the Matthiola incana CHS consists of 394 amino acid residues. Comparison with CHS amino acid sequences of other plants indicates more than 82% homology.

  2. cDNA Library Enrichment of Full Length Transcripts for SMRT Long Read Sequencing.

    Science.gov (United States)

    Cartolano, Maria; Huettel, Bruno; Hartwig, Benjamin; Reinhardt, Richard; Schneeberger, Korbinian

    2016-01-01

    The utility of genome assemblies does not only rely on the quality of the assembled genome sequence, but also on the quality of the gene annotations. The Pacific Biosciences Iso-Seq technology is a powerful support for accurate eukaryotic gene model annotation as it allows for direct readout of full-length cDNA sequences without the need for noisy short read-based transcript assembly. We propose the implementation of the TeloPrime Full Length cDNA Amplification kit to the Pacific Biosciences Iso-Seq technology in order to enrich for genuine full-length transcripts in the cDNA libraries. We provide evidence that TeloPrime outperforms the commonly used SMARTer PCR cDNA Synthesis Kit in identifying transcription start and end sites in Arabidopsis thaliana. Furthermore, we show that TeloPrime-based Pacific Biosciences Iso-Seq can be successfully applied to the polyploid genome of bread wheat (Triticum aestivum) not only to efficiently annotate gene models, but also to identify novel transcription sites, gene homeologs, splicing isoforms and previously unidentified gene loci.

  3. Generation and Analysis of Full-length cDNA Sequences from Elephant Shark (Callorhinchus milii)

    KAUST Repository

    Kodzius, Rimantas

    2009-03-17

    Cartilaginous fishes are the oldest living group of jawed vertebrates and are therefore an important group for understanding the evolution of vertebrate genomes, including the human genome. Our laboratory has proposed the elephant shark (C. milii) as a model cartilaginous fish genome because of its relatively small genome size (910 Mb). The whole genome of C. milii is being sequenced (the first cartilaginous fish genome to be sequenced completely). To characterize the transcriptome of C. milii and to assist in annotating exon-intron boundaries, transcriptional start sites and alternatively spliced transcripts, we are generating full-length cDNA sequences from C. milii.

  4. Cloning and sequencing of Indian Water buffalo (Bubalus bubalis) interleukin-3 cDNA

    KAUST Repository

    Sugumar, Thennarasu

    2011-12-12

    Full-length cDNA (435 bp) of the interleukin-3 (IL-3) gene of the Indian water buffalo was amplified by reverse transcriptase-polymerase chain reaction and sequenced. This sequence had 96% nucleotide identity and 92% amino acid identity with bovine IL-3. There are 10 amino acid substitutions in buffalo IL-3 compared with bovine IL-3. The amino acid sequence of buffalo IL-3 also showed very high identity with that of other ruminants, indicating functional cross-reactivity. Structural homology modelling of buffalo IL-3 protein with human IL-3 showed the presence of five helical structures.
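
    Identity figures such as those above are computed from a pairwise alignment. The sketch below shows one common way to do this; it is not the method used in the cited study, the function name `percent_identity` is hypothetical, and the two short sequences are made up (they are not buffalo or bovine IL-3). Tools differ in whether gapped columns count towards the denominator; this version skips them.

```python
def percent_identity(a: str, b: str, gap: str = "-") -> float:
    """Percent identity over the ungapped columns of two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == gap or y == gap:
            continue  # skip columns where either sequence has a gap
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Toy aligned sequences with a single mismatch in ten columns.
identity = percent_identity("ATGGCCAAGT", "ATGGCTAAGT")
print(identity)
```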

  5. Characterization of full-length sequenced cDNA inserts (FLIcs) from Atlantic salmon (Salmo salar)

    Directory of Open Access Journals (Sweden)

    Lunner Sigbjørn

    2009-10-01

    Full Text Available Background: Sequencing of the Atlantic salmon genome is now being planned by an international research consortium. Full-length sequenced inserts from cDNAs (FLIcs) are an important tool for correct annotation and clustering of the genomic sequence in any species. The large number of highly similar duplicate sequences caused by the relatively recent genome duplication in the salmonid ancestor represents a particular challenge for the genome project. FLIcs will therefore be an extremely useful resource for the Atlantic salmon sequencing project. In addition to helping to distinguish between duplicate genome regions and to determine correct gene structures, FLIcs are an important resource for functional genomic studies and for investigation of regulatory elements controlling gene expression. In contrast to the large number of ESTs available, including the ESTs from 23 developmental and tissue-specific cDNA libraries contributed by the Salmon Genome Project (SGP), the number of sequences where the full length of the cDNA insert has been determined has been small. Results: High-quality full-length insert sequences from 560 pre-smolt white muscle tissue-specific cDNAs were generated, accession numbers [GenBank: BT043497 - BT044056]. Five hundred and ten (91%) of the transcripts were annotated using Gene Ontology (GO) terms, and 440 of the FLIcs are likely to contain a complete coding sequence (cCDS). The sequence information was used to identify putative paralogs, to characterize salmon Kozak motifs and polyadenylation signal variation, and to identify motifs likely to be involved in the regulation of particular genes. Finally, conserved 7-mers in the 3'UTRs were identified, of which some were identical to miRNA target sequences. Conclusion: This paper describes the first Atlantic salmon FLIcs from a tissue and developmental stage specific cDNA library. We have demonstrated that many FLIcs contained a complete coding sequence (cCDS). This

  6. cDNA sequences reveal considerable gene prediction inaccuracy in the Plasmodium falciparum genome

    Directory of Open Access Journals (Sweden)

    Valenzuela Jesus G

    2007-07-01

    Full Text Available Background: The completion of the Plasmodium falciparum genome represents a milestone in malaria research. The genome sequence allows for the development of genome-wide approaches such as microarray and proteomics that will greatly facilitate our understanding of the parasite biology and accelerate new drug and vaccine development. Design and application of these genome-wide assays, however, require accurate information on gene prediction and genome annotation. Unfortunately, the genes in the parasite genome databases were mostly identified using computer software that could make some erroneous predictions. Results: We aimed to obtain cDNA sequences to examine the accuracy of gene prediction in silico. We constructed cDNA libraries from mixed blood stages of the P. falciparum parasite using the SMART cDNA library construction technique and generated 17332 high-quality expressed sequence tags (ESTs), including 2198 from primer-walking experiments. Assembly of our sequence tags produced 2548 contigs and 2671 singletons, versus 5220 contigs and 5910 singletons when our ESTs were assembled with ESTs in public databases. Comparison of all the assembled ESTs/contigs with predicted CDS and genomic sequences in the PlasmoDB database identified 356 genes with predicted coding sequences fully covered by ESTs, including 85 genes (23.6%) with introns incorrectly predicted. Careful automatic software and manual alignments found an additional 308 genes that have introns different from those predicted, with 152 new introns discovered and 182 introns with sizes or locations different from those predicted. Alternatively spliced and antisense transcripts were also detected. Matching cDNA to predicted genes also revealed silent chromosomal regions, mostly at subtelomere regions. Conclusion: Our data indicated that approximately 24% of the genes in the current databases were predicted incorrectly, although some of these inaccuracies could represent alternatively

  7. Sequencing of first-strand cDNA library reveals full-length transcriptomes.

    Science.gov (United States)

    Agarwal, Saurabh; Macfarlan, Todd S; Sartor, Maureen A; Iwase, Shigeki

    2015-01-21

    Massively parallel strand-specific sequencing of RNA (ssRNA-seq) has emerged as a powerful tool for profiling complex transcriptomes. However, many current methods for ssRNA-seq suffer from the underrepresentation of both the 5' and 3' ends of RNAs, which can be attributed to second-strand cDNA synthesis. The 5' and 3' ends of RNA harbour crucial information for gene regulation; namely, transcription start sites (TSSs) and polyadenylation sites. Here we report a novel ssRNA-seq method that does not involve second-strand cDNA synthesis, as we Directly Ligate sequencing Adaptors to the First-strand cDNA (DLAF). This novel method with fewer enzymatic reactions results in a higher quality of the libraries than the conventional method. Sequencing of DLAF libraries followed by a novel analysis pipeline enables the profiling of both 5' ends and polyadenylation sites at near-base resolution. Therefore, DLAF offers the first genomics tool to obtain the 'full-length' transcriptome with a single library.

  8. Deep sequencing approach for investigating infectious agents causing fever.

    Science.gov (United States)

    Susilawati, T N; Jex, A R; Cantacessi, C; Pearson, M; Navarro, S; Susianto, A; Loukas, A C; McBride, W J H

    2016-07-01

    Acute undifferentiated fever (AUF) poses a diagnostic challenge due to the variety of possible aetiologies. While the majority of AUFs resolve spontaneously, some cases become prolonged and cause significant morbidity and mortality, necessitating improved diagnostic methods. This study evaluated the utility of deep sequencing in fever investigation. DNA and RNA were isolated from plasma/sera of AUF cases being investigated at Cairns Hospital in northern Australia, including eight control samples from patients with a confirmed diagnosis. Following isolation, DNA and RNA were bulk amplified and RNA was reverse transcribed to cDNA. The resulting DNA and cDNA amplicons were subjected to deep sequencing on an Illumina HiSeq 2000 platform. Bioinformatics analysis was performed using the program Kraken and the CLC assembly-alignment pipeline. The results were compared with the outcomes of clinical tests. We generated between 4 and 20 million reads per sample. The results of Kraken and CLC analyses concurred with diagnoses obtained by other means in 87.5 % (7/8) and 25 % (2/8) of control samples, respectively. Some plausible causes of fever were identified in ten patients who remained undiagnosed following routine hospital investigations, including Escherichia coli bacteraemia and scrub typhus that eluded conventional tests. Achromobacter xylosoxidans, Alteromonas macleodii and Enterobacteria phage were prevalent in all samples. A deep sequencing approach of patient plasma/serum samples led to the identification of aetiological agents putatively implicated in AUFs and enabled the study of microbial diversity in human blood. The application of this approach in hospital practice is currently limited by sequencing input requirements and complicated data analysis.

  9. [Cloning and sequence analysis of Eg95 cDNA from different stages of Echinococcus granulosus in Xinjiang].

    Science.gov (United States)

    Lin, Ren-yong; Ding, Jian-bing; Wen, Hao; Zhang, Wen-bao; Li, Jun; Lu, Xiao-mei

    2003-01-01

    To study expression and sequence differences of Echinococcus granulosus 95 (Eg95) antigen cDNA from the protoscolex, oncosphere and adult worm stages of E. granulosus from the Xinjiang Uighur Autonomous Region. Primers for Eg95 were designed in accordance with the sequence of Eg95 antigen cDNA. Eg95 antigen cDNAs were amplified by PCR from protoscolex, oncosphere and adult worm cDNA libraries of E. granulosus, respectively, cloned into pUCm-T plasmid and sequenced. The sequences were analyzed with DNAman and GenBank/BLAST software. PCR results showed that Eg95 antigen cDNA was amplified from the cDNA libraries of all three stages of E. granulosus. Sequencing analysis indicated that the Eg95 cDNA length was 402 bp, the same as the data reported in GenBank. The Eg95 antigen cDNA was expressed in the different life-cycle stages of E. granulosus in Xinjiang, and there was no nucleic acid sequence difference in Eg95 antigen among the protoscolex, oncosphere and adult worm of E. granulosus.

  10. cDNA sequences of two inducible T-cell genes

    Energy Technology Data Exchange (ETDEWEB)

    Kwon, B.S. (Indiana Univ. School of Medicine, Indianapolis (USA) Guthrie Research Institute, Sayre, PA (USA)); Weissman, S.M. (Yale Univ., New Haven, CT (USA))

    1989-03-01

    The authors have previously described a set of human T-lymphocyte-specific cDNA clones isolated by a modified differential screening procedure. Apparent full-length cDNAs containing the sequences of 14 of the 16 initial isolates were sequenced and were found to represent five different species of mRNA; three of the five species were identical to previously reported cDNA sequences of preproenkephalin, T-cell-replacing factor, and a serine esterase, respectively. The other two species, 4-1BB and L2G25B, were inducible sequences found in mRNA from both a cytolytic T-lymphocyte and a helper T-lymphocyte clone and were not previously described in T-cell mRNA; these mRNA sequences encode peptides of 256 and 92 amino acids, respectively. Both peptides contain putative leader sequences. The protein encoded by 4-1BB also has a potential membrane anchor segment and other features also seen in known receptor proteins.

  11. Application of deep sequence technology in hepatology.

    Science.gov (United States)

    Ninomiya, Masashi; Ueno, Yoshiyuki; Shimosegawa, Tooru

    2014-02-01

    Deep sequencing technologies are currently cutting edge and are opening fascinating opportunities in biomedicine, producing over 100 times more data than conventional capillary sequencers based on the Sanger method. Next-generation sequencing (NGS) is now generally defined as sequencing technology that, by employing parallel sequencing processes, produces thousands or millions of sequence reads simultaneously. Since the GS20 was released as the first NGS sequencer on the market by 454 Life Sciences, the competition in the development of new sequencers has become intense. In this review, we describe the current deep sequencing systems and discuss the application of advanced technologies in the field of hepatology. © 2013 The Japan Society of Hepatology.

  12. Construction of small RNA cDNA libraries for high-throughput sequencing.

    Science.gov (United States)

    Lu, Cheng; Shedge, Vikas

    2011-01-01

    Small RNAs (smRNAs) play an essential role in virtually every aspect of growth and development, by regulating gene expression at the post-transcriptional and/or transcriptional level. New high-throughput sequencing technology allows for a comprehensive coverage of smRNAs in any given biological sample, and has been widely used for profiling smRNA populations in various developmental stages, tissue and cell types, or normal and disease states. In this article, we describe the method used in our laboratory to construct smRNA cDNA libraries for high-throughput sequencing.

  13. cDNA encoding a polypeptide including a hevein sequence

    Energy Technology Data Exchange (ETDEWEB)

    Raikhel, Natasha V. (Okemos, MI); Broekaert, Willem F. (Dilbeek, BE); Chua, Nam-Hai (Scarsdale, NY); Kush, Anil (New York, NY)

    2000-07-04

    A cDNA clone (HEV1) encoding hevein was isolated via polymerase chain reaction (PCR) using mixed oligonucleotides corresponding to two regions of hevein as primers and a Hevea brasiliensis latex cDNA library as a template. HEV1 is 1018 nucleotides long and includes an open reading frame of 204 amino acids. The deduced amino acid sequence contains a putative signal sequence of 17 amino acid residues followed by a 187 amino acid polypeptide. The amino-terminal region (43 amino acids) is identical to hevein and shows homology to several chitin-binding proteins and to the amino-termini of wound-induced genes in potato and poplar. The carboxyl-terminal portion of the polypeptide (144 amino acids) is 74-79% homologous to the carboxyl-terminal region of wound-inducible genes of potato. Wounding, as well as application of the plant hormones abscisic acid and ethylene, resulted in accumulation of hevein transcripts in leaves, stems and latex, but not in roots, as shown by using the cDNA as a probe. A fusion protein was produced in E. coli from the protein of the present invention and maltose binding protein produced by the E. coli.
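
    The residue counts in this abstract can be laid out as coordinates along the 204-residue open reading frame. This is a minimal sketch under stated assumptions: the three regions are taken to be contiguous and non-overlapping, residues are numbered from 1 at the start of the reading frame, and the segment labels and the `layout` helper are hypothetical names chosen for illustration.

```python
from itertools import accumulate

# (label, length in amino acids), in N- to C-terminal order as read from the abstract.
segments = [
    ("signal sequence", 17),
    ("hevein-identical amino-terminal region", 43),
    ("carboxyl-terminal region", 144),
]


def layout(parts):
    """Return (label, start, end) with 1-based inclusive residue coordinates."""
    ends = list(accumulate(length for _, length in parts))
    starts = [1] + [end + 1 for end in ends[:-1]]
    return [(label, s, e) for (label, _), s, e in zip(parts, starts, ends)]


regions = layout(segments)
for label, start, end in regions:
    print(f"{label}: {start}-{end}")
```

    Under these assumptions the last coordinate equals the reported reading frame length (17 + 43 + 144 = 204), and the two regions after the signal sequence add up to the 187-residue polypeptide.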

  14. Generation of longer 3' cDNA fragments from massively parallel signature sequencing tags.

    Science.gov (United States)

    Silva, Ana Paula M; Chen, Jianjun; Carraro, Dirce M; Wang, San Ming; Camargo, Anamaria A

    2004-07-06

    Massively Parallel Signature Sequencing (MPSS) is a powerful technique for genome-wide gene expression analysis, which, similar to SAGE, relies on the production of short tags proximal to the 3' end of transcripts. A single MPSS experiment can generate over 10^7 tags, providing a 10-fold coverage of the transcripts expressed in a human cell. A significant fraction of MPSS tags cannot be assigned to known transcripts (orphan tags) and are likely to be derived from transcripts expressed at very low levels (approximately 1 copy per cell). In order to explore the potential of MPSS for the characterization of the human transcriptome, we have adapted the GLGI protocol (Generation of Longer cDNA fragments from SAGE tags for Gene Identification) to convert MPSS tags into their corresponding 3' cDNA fragments. GLGI-MPSS was applied to 83 orphan tags and 41 cDNA fragments were obtained. The analysis of these 41 fragments allowed the identification of novel transcripts, alternative tags generated from polymorphic and alternatively spliced transcripts, as well as the detection of artefactual MPSS tags. A systematic large-scale analysis of the genome by MPSS, in combination with the use of GLGI-MPSS protocol, will certainly provide a complementary approach to generate the complete catalog of human transcripts.
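
    The figures in this abstract imply a few simple quantities, worked through below. This is a back-of-the-envelope sketch rather than an analysis from the cited paper: it reads "over 10^7 tags" as exactly 10^7, and the Poisson sampling model used for the single-copy transcript is an assumption introduced here for illustration.

```python
import math

tags_per_experiment = 10**7  # "over 10^7 tags", taken here as exactly 10^7
fold_coverage = 10           # "10-fold coverage of the transcripts"

# Transcript molecules per cell implied by the two figures above.
implied_transcripts_per_cell = tags_per_experiment // fold_coverage

# Assumed Poisson sampling: a transcript present at 1 copy per cell is
# expected to yield `fold_coverage` tags, so it is missed with probability exp(-10).
p_single_copy_missed = math.exp(-fold_coverage)

# Conversion of orphan tags into 3' cDNA fragments by GLGI-MPSS.
orphan_tags_tested = 83
fragments_obtained = 41
conversion_rate = fragments_obtained / orphan_tags_tested

print(implied_transcripts_per_cell)
print(f"{p_single_copy_missed:.1e}")
print(f"{conversion_rate:.1%}")
```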

  15. Generation of longer 3′ cDNA fragments from massively parallel signature sequencing tags

    Science.gov (United States)

    Silva, Ana Paula M.; Chen, Jianjun; Carraro, Dirce M.; Wang, San Ming; Camargo, Anamaria A.

    2004-01-01

    Massively Parallel Signature Sequencing (MPSS) is a powerful technique for genome-wide gene expression analysis, which, similar to SAGE, relies on the production of short tags proximal to the 3′ end of transcripts. A single MPSS experiment can generate over 10^7 tags, providing a 10-fold coverage of the transcripts expressed in a human cell. A significant fraction of MPSS tags cannot be assigned to known transcripts (orphan tags) and are likely to be derived from transcripts expressed at very low levels (∼1 copy per cell). In order to explore the potential of MPSS for the characterization of the human transcriptome, we have adapted the GLGI protocol (Generation of Longer cDNA fragments from SAGE tags for Gene Identification) to convert MPSS tags into their corresponding 3′ cDNA fragments. GLGI-MPSS was applied to 83 orphan tags and 41 cDNA fragments were obtained. The analysis of these 41 fragments allowed the identification of novel transcripts, alternative tags generated from polymorphic and alternatively spliced transcripts, as well as the detection of artefactual MPSS tags. A systematic large-scale analysis of the genome by MPSS, in combination with the use of GLGI-MPSS protocol, will certainly provide a complementary approach to generate the complete catalog of human transcripts. PMID:15247327

  16. Construction of cDNA library and preliminary analysis of expressed sequence tags from Siberian tiger.

    Science.gov (United States)

    Liu, Chang-Qing; Lu, Tao-Feng; Feng, Bao-Gang; Liu, Dan; Guan, Wei-Jun; Ma, Yue-Hui

    2010-10-01

    In this study we successfully constructed a full-length cDNA library from the Siberian tiger, Panthera tigris altaica, the most well-known wild animal. Total RNA was extracted from Siberian tiger fibroblasts cultured in vitro. The titers of the primary and amplified libraries were 1.30×10^6 pfu/ml and 1.62×10^9 pfu/ml, respectively. The proportion of recombinants in the unamplified library was 90.5% and the average length of exogenous inserts was 1.13 kb. A total of 282 individual ESTs with sizes ranging from 328 to 1,142 bp were then analyzed. The BLASTX scores revealed that 53.9% of the sequences were classified as strong matches, 38.6% as nominal and 7.4% as weak matches. Of the ESTs, 28.0% were found to be related to enzyme/catalytic protein, 20.9% to metabolism, 13.1% to transport, 12.1% to signal transducer/cell communication, 9.9% to structure protein, 3.9% to immunity protein/defense metabolism and 3.2% to cell cycle, and 8.9% were classified as novel genes. These results demonstrated that the reliability and representativeness of the cDNA library met the requirements of a standard cDNA library. This library provided a useful platform for the functional genomic research of Siberian tigers.
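
    The functional breakdown reported above can be checked for internal consistency. The sketch below is an illustration only, not part of the cited study: it takes the final figure (8.9) as a percentage, which the total supports, and the per-category EST counts it derives are approximations obtained by rounding, not values reported by the authors.

```python
import math

n_ests = 282  # individual ESTs analysed

# Functional categories and their reported shares (percent of ESTs).
category_percent = {
    "enzyme/catalytic protein": 28.0,
    "metabolism": 20.9,
    "transport": 13.1,
    "signal transducer/cell communication": 12.1,
    "structure protein": 9.9,
    "immunity protein/defense metabolism": 3.9,
    "cell cycle": 3.2,
    "novel genes": 8.9,  # assumption: the last figure is also a percentage
}

total_percent = sum(category_percent.values())

# Approximate EST counts per category, rounded to whole ESTs.
approx_counts = {name: round(n_ests * pct / 100) for name, pct in category_percent.items()}

print(f"{total_percent:.1f}")
print(approx_counts)
```

    The shares add up to 100%, and the rounded counts add up to the 282 ESTs analysed.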

  17. Construction of cDNA library and preliminary analysis of expressed sequence tags from Siberian tiger

    Science.gov (United States)

    Liu, Chang-Qing; Lu, Tao-Feng; Feng, Bao-Gang; Liu, Dan; Guan, Wei-Jun; Ma, Yue-Hui

    2010-01-01

    In this study we successfully constructed a full-length cDNA library from the Siberian tiger, Panthera tigris altaica, the most well-known wild animal. Total RNA was extracted from Siberian tiger fibroblasts cultured in vitro. The titers of the primary and amplified libraries were 1.30×10^6 pfu/ml and 1.62×10^9 pfu/ml, respectively. The proportion of recombinants in the unamplified library was 90.5% and the average length of exogenous inserts was 1.13 kb. A total of 282 individual ESTs with sizes ranging from 328 to 1,142 bp were then analyzed. The BLASTX scores revealed that 53.9% of the sequences were classified as strong matches, 38.6% as nominal and 7.4% as weak matches. Of the ESTs, 28.0% were found to be related to enzyme/catalytic protein, 20.9% to metabolism, 13.1% to transport, 12.1% to signal transducer/cell communication, 9.9% to structure protein, 3.9% to immunity protein/defense metabolism and 3.2% to cell cycle, and 8.9% were classified as novel genes. These results demonstrated that the reliability and representativeness of the cDNA library met the requirements of a standard cDNA library. This library provided a useful platform for the functional genomic research of Siberian tigers. PMID:20941376

  18. De-novo transcriptome sequencing of a normalized cDNA pool from influenza infected ferrets.

    Directory of Open Access Journals (Sweden)

    Jeremy V Camp

    Full Text Available The ferret is commonly used as a model for studies of infectious diseases. The genomic sequence of this animal model is not yet characterized, and only a limited number of fully annotated cDNAs are currently available in GenBank. The majority of genes involved in the innate or adaptive immune response are still lacking, restricting molecular genetic analysis of the host response in the ferret model. To enable de novo identification of transcriptionally active ferret genes in response to infection, we performed de novo transcriptome sequencing of animals infected with H1N1 A/California/07/2009. We also included splenocytes induced with bacterial lipopolysaccharide to allow for identification of transcripts specifically induced by gram-negative bacteria. We pooled and normalized the cDNA library in order to limit the risk of sequencing only highly expressed genes. While normalization of the cDNA library removes the possibility of assessing expression changes between individual animals, it has been shown to increase identification of low-abundance transcripts. In this study, we identified more than 19,000 partial ferret transcripts, including more than 1000 gene orthologs known to be involved in the innate and the adaptive immune response.

  19. Identification and complete sequencing of novel human transcripts through the use of mouse orthologs and testis cDNA sequences

    DEFF Research Database (Denmark)

    Ferreira, Elisa N; Pires, Lilian C; Parmigiani, Raphael B

    2004-01-01

    The correct identification of all human genes, and their derived transcripts, has not yet been achieved, and it remains one of the major aims of the worldwide genomics community. Computational programs suggest the existence of 30,000 to 40,000 human genes. However, definitive gene identification can only be achieved by experimental approaches. We used two distinct methodologies, one based on the alignment of mouse orthologous sequences to the human genome, and another based on the construction of a high-quality human testis cDNA library, in an attempt to identify new human transcripts within the human genome sequence. We generated 47 complete human transcript sequences, comprising 27 unannotated and 20 annotated sequences. Eight of these transcripts are variants of previously known genes. These transcripts were characterized according to size, number of exons, and chromosomal localization...

  20. Molecular cloning and nucleotide sequence of full-length cDNA for sweet potato catalase mRNA.

    Science.gov (United States)

    Sakajo, S; Nakamura, K; Asahi, T

    1987-06-01

    A nearly full-length cDNA clone for catalase (pCAS01) was obtained through immunological screening of cDNA expression library constructed from size-fractionated poly(A)-rich RNA of wounded sweet potato tuberous roots by Escherichia coli expression vector-primed cDNA synthesis. Two additional catalase cDNA clones (pCAS10 and pCAS13), which contained cDNA inserts slightly longer than that of pCAS01 at their 5'-termini, were identified by colony hybridization of another cDNA library. Those three catalase cDNAs contained primary structures not identical, but closely related, to one another based on their restriction enzyme and RNase cleavage mapping analyses, suggesting that microheterogeneity exists in catalase mRNAs. The cDNA insert of pCAS13 carried the entire catalase coding capacity, since the RNA transcribed in vitro from the cDNA under the SP6 phage promoter directed the synthesis of a catalase polypeptide in the wheat germ in vitro translation assay. The nucleotide sequencing of these catalase cDNAs indicated that 1900-base catalase mRNA contained a coding region of 1476 bases. The amino acid sequence of sweet potato catalase deduced from the nucleotide sequence was 35 amino acids shorter than rat liver catalase [Furuta, S., Hayashi, H., Hijikata, M., Miyazawa, S., Osumi, T. & Hashimoto, T. (1986) Proc. Natl Acad. Sci. USA 83, 313-317]. Although these two sequences showed only 38% homology, the sequences around the amino acid residues implicated in catalytic function, heme ligand or heme contact had been well conserved during evolution.

  1. Molecular cloning and sequence analysis of growth hormone cDNA of Neotropical freshwater fish Pacu (Piaractus mesopotamicus)

    Directory of Open Access Journals (Sweden)

    Janeth Silva Pinheiro

    2008-01-01

    Full Text Available RT-PCR was used for amplifying Piaractus mesopotamicus growth hormone (GH) cDNA obtained from mRNA extracted from pituitary cells. The amplified fragment was cloned and the complete cDNA sequence was determined. The cloned cDNA encompassed a sequence of 543 nucleotides that encoded a polypeptide of 178 amino acids corresponding to mature P. mesopotamicus GH. Comparison with other GH sequences showed a gap of 10 amino acids localized in the N terminus of the putative polypeptide of P. mesopotamicus. This same gap was also observed in other members of the family. Neighbor-joining tree analysis with GH sequences from fishes belonging to different taxonomic groups placed the P. mesopotamicus GH within the Otophysi group. To our knowledge, this is the first GH sequence of a Neotropical characiform fish deposited in GenBank.

  2. Crustacean hyperglycemic hormones of two cold water crab species, Chionoecetes opilio and C. japonicus: isolation of cDNA sequences and localization of CHH neuropeptide in eyestalk ganglia.

    Science.gov (United States)

    Chung, J Sook; Ahn, I S; Yu, O H; Kim, D S

    2015-04-01

    Crustacean hyperglycemic hormone (CHH) is primarily known for its prototypical function in hyperglycemia, which is induced by the release of CHH. CHH release takes place as an adaptive response to the energy demands of animals experiencing stressful environmental, physiological or behavioral conditions. Although >63 decapod CHH nucleotide sequences are known (GenBank), the majority of them come from species inhabiting shallow, warm water. In order to understand the adaptive role of CHH in Chionoecetes opilio and Chionoecetes japonicus inhabiting deep water environments, we first aimed for the isolation of the full-length cDNA sequence of CHH from the eyestalk ganglia of C. opilio (ChoCHH) and C. japonicus (ChjCHH) using degenerate PCR and 5' and 3' RACE. Cho- and ChjCHH cDNA sequences are identical in 5' UTR and ORF with 100% sequence identity of the putative 138aa of preproCHHs. The length of 3' UTR ChjCHH cDNA sequence is 39 nucleotides shorter than that of ChoCHH. This is the first report in decapod crustaceans that two different species have the identical sequence of CHH. ChoCHH expression increases during embryogenesis of C. opilio and is significantly higher in adult males and females. C. japonicus males have slightly higher ChjCHH expression than C. opilio males, but the difference is not statistically significant. In both species, the immunostaining intensity of CHH is stronger in the sinus gland than in X-organ cells. Future studies will enable us to gain better understanding of the comparative metabolic physiology and endocrinology of cold, deep water species of Chionoecetes spp. Copyright © 2014 Elsevier Inc. All rights reserved.

  3. DeepSimulator: a deep simulator for Nanopore sequencing

    KAUST Repository

    Li, Yu

    2017-12-23

    Motivation: Oxford Nanopore sequencing has developed rapidly in recent years. To keep pace with the explosion of the downstream data analytical tools, a versatile Nanopore sequencing simulator is needed to complement the experimental data as well as to benchmark those newly developed tools. However, all the currently available simulators are based on simple statistics of the produced reads, which have difficulty in capturing the complex nature of the Nanopore sequencing procedure, the main task of which is the generation of raw electrical current signals. Results: Here we propose a deep learning based simulator, DeepSimulator, to mimic the entire pipeline of Nanopore sequencing. Starting from a given reference genome or assembled contigs, we simulate the electrical current signals by a context-dependent deep learning model, followed by a base-calling procedure to yield simulated reads. This workflow mimics the sequencing procedure more naturally. The thorough experiments performed across four species show that the signals generated by our context-dependent model are more similar to the experimentally obtained signals than the ones generated by the official context-independent pore model. In terms of the simulated reads, we provide a parameter interface to users so that they can obtain the reads with different accuracies ranging from 83% to 97%. The reads generated by the default parameter have almost the same properties as the real data. Two case studies demonstrate the application of DeepSimulator to benefit the development of tools in de novo assembly and in low coverage SNP detection. Availability: The software can be accessed freely at: https://github.com/lykaust15/DeepSimulator.

  4. [Cloning, sequencing and subcloning of cDNA coding for group I allergen of Dermatophagoides farinae].

    Science.gov (United States)

    Yang, Qing-gui; Li, Chao-pin

    2004-06-01

    To clone, sequence and subclone the cDNA coding for group 1 allergen of Dermatophagoides farinae (Der f 1). The cDNA of Der f 1 was amplified by RT-PCR and PCR. After purification, the gene fragment was cloned into the vector pMD-18T. The recombinant plasmid pMD-18T-Der f 1 was transformed into E. coli JM109. Positive clones were screened and identified by PCR and restriction enzyme digestion. The sequence of the inserted Der f 1 gene fragment was also determined. Der f 1 was then subcloned into the vector pET-32a(+). The Der f 1 gene fragment of Dermatophagoides farinae was specifically amplified from RNA by RT-PCR and PCR. The recombinant plasmids pMD-18T-Der f 1 and pET-32a(+)-Der f 1 were constructed and digested with BamHI and SacI; the gene fragment was 646 bp, in accordance with the expected size. The pET-32a(+)-Der f 1 subclone was constructed successfully.

  5. Primary analysis of the expressed sequence tags in a pentastomid nymph cDNA library.

    Directory of Open Access Journals (Sweden)

    Jing Zhang

    Full Text Available BACKGROUND: Pentastomiasis is a rare zoonotic disease caused by pentastomids. Despite their worm-like appearance, they are commonly placed into a separate sub-class of the subphylum Crustacea, phylum Arthropoda. However, until now, the systematic classification of the pentastomids and the diagnosis of pentastomiasis are immature, and genetic information about pentastomid nymphs is almost nonexistent. The objective of this study was to obtain information on pentastomid nymph genes and identify the gene homologues related to host-parasite interactions or stage-specific antigens. METHODOLOGY/PRINCIPAL FINDINGS: Total pentastomid nymph RNA was used to construct a cDNA library and 500 colonies were sequenced. Analysis identified 197 unigenes, of which 147 were annotated, and 75 unigenes (53.19%) were mapped to 82 KEGG pathways, including 29 metabolism pathways, 29 genetic information processing pathways, 4 environmental information processing pathways, 7 cell motility pathways and 5 organismal systems pathways. Additionally, two host-parasite interaction-related gene homologues were identified: a putative Kunitz inhibitor and a putative cysteine protease. CONCLUSION/SIGNIFICANCE: We successfully constructed, for the first time, a cDNA library and obtained a number of expressed sequence tags (ESTs) from pentastomid nymphs, which will lay the foundation for further study of pentastomids and pentastomiasis.

  6. Nucleotide sequence of cloned cDNA for human sphingolipid activator protein 1 precursor

    International Nuclear Information System (INIS)

    Dewji, N.N.; Wenger, D.A.; O'Brien, J.S.

    1987-01-01

    Two cDNA clones encoding prepro-sphingolipid activator protein 1 (SAP-1) were isolated from a λ gt11 human hepatoma expression library using polyclonal antibodies. These had inserts of ≅ 2 kilobases (λ-S-1.2 and λ-S-1.3) and both were homologous with a previously isolated clone (λ-S-1.1) for mature SAP-1. The authors report here the nucleotide sequence of the longer two EcoRI fragments of S-1.2 and S-1.3 that were not the same and the derived amino acid sequences of mature SAP-1 and its prepro form. The open reading frame encodes 19 amino acids, which are colinear with the amino-terminal sequence of mature SAP-1, and extends far beyond the predicted carboxyl terminus of mature SAP-1, indicating extensive carboxyl-terminal processing. The nucleotide sequence of cDNA encoding prepro-SAP-1 includes 1449 bases from the assigned initiation codon ATG at base-pair 472 to the stop codon TGA at base-pair 1921. The first 23 amino acids coded after the initiation ATG are characteristic of a signal peptide. The calculated molecular mass for a polypeptide encoded by 1449 bases is ≅ 53 kDa, in keeping with the reported value for pro-SAP-1. The data indicate that after removal of the signal peptide mature SAP-1 is generated by removing an additional 7 amino acids from the amino terminus and ≅ 373 amino acids from the carboxyl terminus. One potential glycosylation site was previously found in mature SAP-1. Three additional potential glycosylation sites are present in the processed carboxyl-terminal polypeptide, which they designate as P-2
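
    The coordinate arithmetic in this abstract can be checked with a short sketch. It assumes that base-pairs 472 and 1921 mark the first base of the ATG and of the TGA, respectively, and it uses a rule-of-thumb average residue mass of 110 Da; neither assumption is stated in the record, and the function names are illustrative only.

```python
# Hypothetical helper names; the 110 Da average residue mass is a common
# rule of thumb, not a figure taken from the abstract.
AVG_RESIDUE_MASS_DA = 110.0


def coding_length(start_codon_pos: int, stop_codon_pos: int) -> int:
    """Bases from the first base of the start codon up to, but not
    including, the first base of the stop codon."""
    return stop_codon_pos - start_codon_pos


def codon_count(n_bases: int) -> int:
    """Number of codons in a coding region; rejects out-of-frame lengths."""
    if n_bases % 3 != 0:
        raise ValueError("coding length is not a multiple of three")
    return n_bases // 3


def approx_mass_kda(n_residues: int,
                    avg_mass_da: float = AVG_RESIDUE_MASS_DA) -> float:
    """Rough polypeptide mass in kDa from the residue count."""
    return n_residues * avg_mass_da / 1000.0


bases = coding_length(472, 1921)   # ATG at 472, TGA at 1921
residues = codon_count(bases)
mass_kda = approx_mass_kda(residues)
print(bases, residues, round(mass_kda, 1))
```

    Under these assumptions the coding region is 1449 bases, or 483 codons, and the estimated mass is close to the ≅ 53 kDa quoted above.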

  7. Characterization of cDNA clones encoding rabbit and human serum paraoxonase: The mature protein retains its signal sequence

    Energy Technology Data Exchange (ETDEWEB)

    Hassett, C.; Richter, R.J.; Humbert, R.; Omiecinski, C.J.; Furlong, C.E. (Univ. of Washington, Seattle (United States)); Chapline, C.; Crabb, J.W. (W.Alton Jones Cell Science Center, Lake Placid, NY (United States))

    1991-10-22

    Serum paraoxonase hydrolyzes the toxic metabolites of a variety of organophosphorus insecticides. High serum paraoxonase levels appear to protect against the neurotoxic effects of organophosphorus substrates of this enzyme. The amino acid sequence accounting for 42% of rabbit paraoxonase was determined. From these data, two oligonucleotide probes were synthesized and used to screen a rabbit liver cDNA library. Human paraoxonase clones were isolated from a liver cDNA library by using the rabbit cDNA as a hybridization probe. Inserts from three of the longest clones were sequenced, and one full-length clone contained an open reading frame encoding 355 amino acids, four less than the rabbit paraoxonase protein. Amino-terminal sequences derived from purified rabbit and human paraoxonase proteins suggested that the signal sequence is retained, with the exception of the initiator methionine residue. Characterization of the rabbit and human paraoxonase cDNA clones confirms that the signal sequences are not processed, except for the N-terminal methionine residue. The rabbit and human cDNA clones demonstrate striking nucleotide and deduced amino acid similarities (greater than 85%), suggesting an important metabolic role and constraints on the evolution of this protein.

  8. Determination of cDNA and genomic DNA sequences of hevamine, a chitinase from the rubber tree Hevea brasiliensis

    NARCIS (Netherlands)

    Bokma, E; Spiering, M; Chow, KS; Mulder, PPMFA; Subroto, T; Beintema, JJ

    Hevamine is a chitinase from the rubber tree Hevea brasiliensis and belongs to the family 18 glycosyl hydrolases. This paper describes the cloning of hevamine DNA and cDNA sequences. Hevamine contains a signal peptide at the N-terminus and a putative vacuolar targeting sequence at the C-terminus

  9. Differential representation of sunflower ESTs in enriched organ-specific cDNA libraries in a small scale sequencing project

    Directory of Open Access Journals (Sweden)

    Heinz Ruth A

    2003-09-01

    Full Text Available Abstract Background Subtractive hybridization methods are valuable tools for identifying differentially regulated genes in a given tissue, avoiding redundant sequencing of clones representing the same expressed genes and maximizing detection of low abundant transcripts, thus affecting the efficiency and cost effectiveness of small scale cDNA sequencing projects aimed at the specific identification of useful genes for breeding purposes. The objective of this work is to evaluate alternative strategies to high-throughput sequencing projects for the identification of novel genes differentially expressed in sunflower as a source of organ-specific genetic markers that can be functionally associated with important traits. Results Differential organ-specific ESTs were generated from leaf, stem, root and flower bud at two developmental stages (R1 and R4). The use of different sources of RNA as tester and driver cDNA for the construction of differential libraries was evaluated as a tool for detection of rare or low abundant transcripts. Organ-specificity ranged from 75 to 100% of non-redundant sequences in the different cDNA libraries. Sequence redundancy varied according to the target and driver cDNA used in each case. The R4 flower cDNA library was the least redundant library with 62% of unique sequences. Out of a total of 919 sequences that were edited and annotated, 318 were non-redundant sequences. Comparison against sequences in public databases showed that 60% of non-redundant sequences showed significant similarity to known sequences. The number of predicted novel genes varied among the different cDNA libraries, ranging from 56% in the R4 flower to 16% in the R1 flower bud library. Comparison with sunflower ESTs on public databases showed that 197 of non-redundant sequences (60%) did not exhibit significant similarity to previously reported sunflower ESTs. This approach helped to successfully isolate a significant number of new reported sequences

  10. Isolation and characterization of sequences homologous to the tobacco clone axi 1 (auxin independent) from a Vicia sativa nodule cDNA library

    NARCIS (Netherlands)

    Yalçin-Mendi, Y.; Çetiner, S.; Bisseling, T.

    2001-01-01

    In this research, partial nucleotide sequences of the axi 1 gene, which is related to auxin perception and transduction, isolated from Vicia sativa using cDNA library screening were investigated. Four V. sativa cDNA clones representing homologues of the tobacco axi 1 (auxin independent) cDNA clone

  11. [Construction and sequence analysis of a normalized full-length cDNA library of Dendrobium officinale].

    Science.gov (United States)

    Jiang, Min; Wang, Jiang; Wen, Guo-Song; Xu, Shao-Zhong; Zha, Ying-Hong; Rong, Tian-Ju; Qian, Xiong

    2013-02-01

    In order to obtain functional genes, a normalized stem cDNA library was constructed from the medicinal plant Dendrobium officinale. SMART (switching mechanism at 5' end of RNA transcript) cDNA synthesis combined with DSN (duplex-specific nuclease) normalization was applied to construct the normalized full-length cDNA library of D. officinale. The titer of the cDNA library was about 1.3 × 10^6 cfu/mL and the average insert size was about 1.5 kb with a high recombination rate (93.9%). A total of 163 randomly selected positive clones were sequenced from one end. Bioinformatic analysis indicated that 147 of 150 high-quality unique sequences matched corresponding homologous proteins, and they participated in various biological processes based on GO (gene ontology). There were 8 clones with complete coding sequences, which were presumed to be full-length genes. These preliminary results showed that we successfully constructed a normalized full-length cDNA library of D. officinale which could be used to screen the functional genes related to metabolic pathways of medicinal ingredients.

  12. The cDNA sequence of three hemocyanin subunits from the garden snail Helix lucorum.

    Science.gov (United States)

    De Smet, Lina; Dimitrov, Ivan; Debyser, Griet; Dolashka-Angelova, Pavlina; Dolashki, Aleksandar; Van Beeumen, Jozef; Devreese, Bart

    2011-11-10

    Hemocyanins are blue copper containing respiratory proteins residing in the hemolymph of many molluscs and arthropods. They can have different molecular masses and quaternary structures. Moreover, several molluscan hemocyanins are isolated with one, two or three isoforms occurring as decameric, didecameric, multidecameric or tubule aggregates. We could recently isolate three different hemocyanin isopolypeptides from the hemolymph of the garden snail Helix lucorum (HlH). These three structural subunits were named α(D)-HlH, α(N)-HlH and β-HlH. We have cloned and sequenced their cDNA which is the first result ever reported for three isoforms of a molluscan hemocyanin. Whereas the complete gene sequence of α(D)-HlH and β-HlH was obtained, including the 5' and 3' UTR, 180bp of the 5' end and around 900bp at the 3' end are missing for the third subunit. The subunits α(D)-HlH and β-HlH comprise a signal sequence of 19 amino acids plus a polypeptide of 3409 and 3414 amino acids, respectively. We could determine 3031 residues of the α(N)-HlH subunit. Sequence comparison with other molluscan hemocyanins shows that α(D)-HlH is more related to Aplysia californica hemocyanin than to each of its own isopolypeptides. The structural subunits comprise 8 different functional units (FUs: a, b, c, d, e, f, g, h) and each functional unit possesses a highly conserved copper-A and copper-B site for reversible oxygen binding. Potential N-glycosylation sites are present in all three structural subunits. We confirmed that all three different isoforms are effectively produced and secreted in the hemolymph of H. lucorum by analyzing a tryptic digest of the purified native hemocyanin by MALDI-TOF and LC-FTICR mass spectrometry. Copyright © 2011 Elsevier B.V. All rights reserved.

  13. An analysis of expressed sequence tags of developing castor endosperm using a full-length cDNA library

    Directory of Open Access Journals (Sweden)

    Wallis James G

    2007-07-01

    Full Text Available Abstract Background Castor seeds are a major source for ricinoleate, an important industrial raw material. Genomics studies of castor plant will provide critical information for understanding seed metabolism, for effectively engineering ricinoleate production in transgenic oilseeds, or for genetically improving castor plants by eliminating toxic and allergenic proteins in seeds. Results Full-length cDNAs are useful resources in annotating genes and in providing functional analysis of genes and their products. We constructed a full-length cDNA library from developing castor endosperm, and obtained 4,720 ESTs from 5'-ends of the cDNA clones representing 1,908 unique sequences. The most abundant transcripts are genes encoding storage proteins, ricin, agglutinin and oleosins. Several other sequences are also very numerous, including two acidic triacylglycerol lipases, and the oleate hydroxylase (FAH12) gene that is responsible for ricinoleate biosynthesis. The role(s) of the lipases in developing castor seeds are not clear, and co-expression of a lipase and the FAH12 did not result in significant changes in hydroxy fatty acid accumulation in transgenic Arabidopsis seeds. Only one oleate desaturase (FAD2) gene was identified in our cDNA sequences. Sequence and functional analyses of the castor FAD2 were carried out since it had not been characterized previously. Overexpression of castor FAD2 in a FAH12-expressing Arabidopsis line resulted in decreased accumulation of hydroxy fatty acids in transgenic seeds. Conclusion Our results suggest that transcriptional regulation of FAD2 and FAH12 genes may be one of the mechanisms that contribute to a high level of ricinoleate accumulation in castor endosperm. The full-length cDNA library will be used to search for additional genes that affect ricinoleate accumulation in seed oils. Our EST sequences will also be useful to annotate the castor genome, whose whole sequence is being generated by shotgun sequencing at

  14. Cloning and Sequencing of Protein Kinase cDNA from Harbor Seal (Phoca vitulina) Lymphocytes

    Directory of Open Access Journals (Sweden)

    Jennifer C. C. Neale

    2004-01-01

    Full Text Available Protein kinases (PKs) play critical roles in signal transduction and activation of lymphocytes. The identification of PK genes provides a tool for understanding mechanisms of immunotoxic xenobiotics. As part of a larger study investigating persistent organic pollutants in the harbor seal and their possible immunomodulatory actions, we sequenced harbor seal cDNA fragments encoding PKs. The procedure, using degenerate primers based on conserved motifs of human protein tyrosine kinases (PTKs), successfully amplified nine phocid PK gene fragments with high homology to human and rodent orthologs. We identified eight PTKs and one dual (serine/threonine and tyrosine) kinase. Among these were several PKs important in early signaling events through the B- and T-cell receptors (FYN, LYN, ITK and SYK) and a MAP kinase involved in downstream signal transduction. V-FGR, RET and DDR2 were also expressed. Sequential activation of protein kinases ultimately induces gene transcription leading to the proliferation and differentiation of lymphocytes critical to adaptive immunity. PKs are potential targets of bioactive xenobiotics, including persistent organic pollutants of the marine environment; characterization of these molecules in the harbor seal provides a foundation for further research illuminating mechanisms of action of contaminants speculated to contribute to large-scale die-offs of marine mammals via immunosuppression.

  15. Construction of a cDNA library from the ephemeral plant Olimarabidopsis pumila and preliminary analysis of expressed sequence tags.

    Science.gov (United States)

    Zhao, Yun-Xia; Wei, Yan-Ling; Zhao, Ping; Xiang, Cheng-Bin; Xu, Fang; Li, Chao; Huang, Xian-Zhong

    2013-01-01

    Olimarabidopsis pumila is a close relative of the model plant Arabidopsis thaliana but, unlike A. thaliana, it is a salt-tolerant ephemeral plant that is widely distributed in semi-arid and semi-salinized regions of the Xinjiang region of China, thus providing an ideal candidate plant system for salt tolerance gene mining. A good-quality cDNA library was constructed using cap antibody to enrich full-length cDNA with the gateway technology allowing library construction without traditional methods of cloning by use of restriction enzymes. A preliminary analysis of expressed sequence tags (ESTs) was carried out. The titers of the primary and the normalized cDNA library were 1.6 × 10^6 cfu/mL and 6.7 × 10^6 cfu/mL, respectively. A total of 1093 clones were randomly selected from the normalized library for EST sequencing. By sequence analysis, 894 high-quality ESTs were generated and assembled into 736 unique sequences consisting of 72 contigs and 664 singletons. The resulting unigenes were categorized according to the gene ontology (GO) hierarchy. The potential roles of gene products associated with stress-related ESTs are discussed. The 736 unigenes were similar to A. thaliana, A. lyrata, or Thellungiella salsuginea. This research provides an overview of the mRNA expression profile and first-hand information of gene sequence expressed in young leaves of O. pumila.
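
    The EST assembly counts reported here lend themselves to a quick consistency check. The sketch below is illustrative only: the dataclass and its field names are hypothetical, and "redundancy" is taken, as an assumption, to mean the fraction of high-quality ESTs that did not add a new unigene.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EstAssembly:
    """Summary counts for an EST assembly (hypothetical container)."""
    clones_sequenced: int
    high_quality_ests: int
    contigs: int
    singletons: int

    @property
    def unigenes(self) -> int:
        # Each contig and each singleton counts as one unique sequence.
        return self.contigs + self.singletons

    @property
    def ests_in_contigs(self) -> int:
        # Every singleton is exactly one EST; the rest sit in contigs.
        return self.high_quality_ests - self.singletons

    @property
    def redundancy(self) -> float:
        # Assumed definition: share of ESTs that did not add a new unigene.
        return 1.0 - self.unigenes / self.high_quality_ests

    @property
    def sequencing_success(self) -> float:
        # Share of picked clones that yielded a high-quality EST.
        return self.high_quality_ests / self.clones_sequenced


# Counts as reported in the abstract above.
library = EstAssembly(clones_sequenced=1093, high_quality_ests=894,
                      contigs=72, singletons=664)
print(library.unigenes, library.ests_in_contigs)
print(f"redundancy {library.redundancy:.1%}, "
      f"success {library.sequencing_success:.1%}")
```

    With the reported figures, the contigs and singletons sum to the 736 unigenes stated in the abstract, and the 72 contigs together account for 230 ESTs.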

  16. In-depth cDNA library sequencing provides quantitative gene expression profiling in cancer biomarker discovery.

    Science.gov (United States)

    Yang, Wanling; Ying, Dingge; Lau, Yu-Lung

    2009-06-01

    Quantitative gene expression analysis plays an important role in identifying differentially expressed genes in various pathological states, gene expression regulation and co-regulation, shedding light on gene functions. Although microarray is widely used as a powerful tool in this regard, it is suboptimal quantitatively and unable to detect unknown gene variants. Here we demonstrated effective detection of differential expression and co-regulation of certain genes by expressed sequence tag analysis using a selected subset of cDNA libraries. We discuss the issues of sequencing depth and library preparation, and propose that increased sequencing depth and improved preparation procedures may allow detection of many expression features for less abundant gene variants. With the reduction of sequencing cost and the emergence of new generation sequencing technology, in-depth sequencing of cDNA pools or libraries may represent a better and more powerful tool in gene expression profiling and cancer biomarker detection. We also propose using sequence-specific subtraction to remove hundreds of the most abundant housekeeping genes to increase sequencing depth without affecting relative expression ratio of other genes, as transcripts from as few as 300 most abundantly expressed genes constitute about 20% of the total transcriptome. In-depth sequencing also represents a unique advantage of detecting unknown forms of transcripts, such as alternative splicing variants, fusion genes, and regulatory RNAs, as well as detecting mutations and polymorphisms that may play important roles in disease pathogenesis.
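
    The subtraction argument can be made concrete with a toy calculation. This is a minimal sketch that assumes reads are sampled in proportion to transcript abundance and that subtraction removes the targeted transcripts completely; the gene labels, abundances and read count are invented for illustration and do not come from the study.

```python
def depth_gain(removed_fraction: float) -> float:
    """Fold increase in reads per remaining transcript when a fraction
    of the transcriptome is removed before sequencing."""
    if not 0.0 <= removed_fraction < 1.0:
        raise ValueError("removed_fraction must be in [0, 1)")
    return 1.0 / (1.0 - removed_fraction)


def expected_reads(total_reads: int, abundance: dict) -> dict:
    """Expected reads per gene, assuming sampling proportional to abundance."""
    total = sum(abundance.values())
    return {gene: total_reads * a / total for gene, a in abundance.items()}


# Toy transcriptome in arbitrary abundance units (hypothetical values):
# the abundant housekeeping genes make up 20 of 100 units.
transcriptome = {"housekeeping": 20, "gene_a": 60, "gene_b": 20}
subtracted = {g: a for g, a in transcriptome.items() if g != "housekeeping"}

before = expected_reads(1_000_000, transcriptome)
after = expected_reads(1_000_000, subtracted)

print(depth_gain(0.20))
print(before["gene_a"] / before["gene_b"], after["gene_a"] / after["gene_b"])
print(after["gene_a"] / before["gene_a"])
```

    In this model, removing 20% of the transcriptome raises the read count of every remaining gene by the same factor, 1/(1 - 0.2), so the ratio between gene_a and gene_b is the same before and after subtraction.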

  17. Nucleotide sequence and infectious cDNA clone of the L1 isolate of Pea seed-borne mosaic potyvirus.

    Science.gov (United States)

    Olsen, B S; Johansen, I E

    2001-01-01

    The complete nucleotide sequence of Pea seed-borne mosaic potyvirus isolate L1 has been determined from cloned virus cDNA. The PSbMV L1 genome is 9895 nucleotides in length excluding the poly(A) tail. Computer analysis of the sequence revealed a single long open reading frame (ORF) of 9594 nucleotides. The ORF potentially encodes a polyprotein of 3198 amino acids with a deduced Mr of 363537. Nine putative proteolytic cleavage sites were identified by analogy to consensus sequences and genome arrangement in other potyviruses. Two full-length cDNA clones, p35S-L1-4 and p35S-L1-5, were assembled under control of an enhanced 35S promoter and nopaline synthase terminator. Clone p35S-L1-4 was constructed with four introns and p35S-L1-5 with five introns inserted in the cDNA. Clone p35S-L1-4 was unstable in Escherichia coli often resulting in amplification of plasmids with deletions. Clone p35S-L1-5 was stable and apparently less toxic to Escherichia coli resulting in larger bacterial colonies and higher plasmid yield. Both clones were infectious upon mechanical inoculation of plasmid DNA on susceptible pea cultivars Fjord, Scout, and Brutus. Eight pea genotypes resistant to L1 virus were also resistant to the cDNA derived L1 virus. Both native PSbMV L1 and the cDNA derived virus infected Chenopodium quinoa systemically giving rise to characteristic necrotic lesions on uninoculated leaves.

  18. Construction of cDNA library and preliminary analysis of expressed sequence tags from green microalga Ankistrodesmus convolutus Corda.

    Science.gov (United States)

    Thanh, Tran; Chi, Vu Thi Quynh; Abdullah, Mohd Puad; Omar, Hishamuddin; Noroozi, Mostafa; Ky, Huynh; Napis, Suhaimi

    2011-01-01

    Green microalga Ankistrodesmus convolutus Corda is a fast growing alga which produces appreciable amounts of carotenoids and polyunsaturated fatty acids. To our knowledge, this is the first report on the construction of cDNA library and preliminary analysis of ESTs for this species. The titers of the primary and amplified cDNA libraries were 1.1 × 10^6 and 6.0 × 10^9 pfu/ml respectively. The percentage of recombinants was 97% in the primary library and a total of 337 out of 415 original cDNA clones selected randomly contained inserts ranging from 600 to 1,500 bps. A total of 201 individual ESTs with sizes ranging from 390 to 1,038 bps were then analyzed and the BLASTX score revealed that 35.8% of the sequences were classified as strong match, 38.3% as nominal and 25.9% as weak match. Among the ESTs with known putative function, 21.4% of them were found to be related to gene expression, 14.4% ESTs to photosynthesis, 10.9% ESTs to metabolism, 5.5% ESTs to miscellaneous, 2.0% to stress response, and the remaining 45.8% were classified as novel genes. Analysis of ESTs described in this paper can be an effective approach to isolate and characterize new genes from A. convolutus and thus the sequences obtained represented a significant contribution to the extensive database of sequences from green microalgae.

  19. Assessing the utility of the Oxford Nanopore MinION for snake venom gland cDNA sequencing

    Directory of Open Access Journals (Sweden)

    Adam D. Hargreaves

    2015-11-01

    Full Text Available Portable DNA sequencers such as the Oxford Nanopore MinION device have the potential to be truly disruptive technologies, facilitating new approaches and analyses and, in some cases, taking sequencing out of the lab and into the field. However, the capabilities of these technologies are still being revealed. Here we show that single-molecule cDNA sequencing using the MinION accurately characterises venom toxin-encoding genes in the painted saw-scaled viper, Echis coloratus. We find the raw sequencing error rate to be around 12%, improved to 0–2% with hybrid error correction and 3% with de novo error correction. Our corrected data provides full coding sequences and 5′ and 3′ UTRs for 29 of 33 candidate venom toxins detected, far superior to Illumina data (13/40 complete) and Sanger-based ESTs (15/29). We suggest that, should the current pace of improvement continue, the MinION will become the default approach for cDNA sequencing in a variety of species.

  20. 5'-end sequences of budding yeast full-length cDNA clones - Budding yeast cDNA sequencing project | LSDB Archive [Life Science Database Archive metadata]

    Lifescience Database Archive (English)

  1. Generation of a large scale repertoire of Expressed Sequence Tags (ESTs) from normalised rainbow trout cDNA libraries

    Directory of Open Access Journals (Sweden)

    Guiguen Yann

    2006-08-01

    Full Text Available Abstract Background Within the framework of a genomics project on livestock species (AGENAE), we initiated a high-throughput DNA sequencing program of Expressed Sequence Tags (ESTs) in rainbow trout, Oncorhynchus mykiss. Results We constructed three cDNA libraries including one highly complex pooled-tissue library. These libraries were normalized and subtracted to reduce clone redundancy. EST sequences were produced, and 96 472 ESTs corresponding to high quality sequence reads were released on the international database, currently representing 42.5% of the overall sequence knowledge in this species. All these EST sequences and other publicly available ESTs in rainbow trout have been included on a publicly available Website (SIGENAE) and have been clustered into a total of 52 930 clusters of putative transcript groups, including 24 616 singletons. 57.1% of these 52 930 clusters are represented by at least one Agenae EST and 14 343 clusters (27.1%) are composed only of Agenae ESTs. Sequence analysis also reveals that normalization and especially subtraction were effective in decreasing redundancy, and that the pooled-tissue library was representative of the initial tissue complexity. Conclusion Due to present work on the construction of rainbow trout normalized cDNA libraries and their extensive sequencing, along with other large scale sequencing programs, rainbow trout is now one of the major fish models in terms of EST sequences available in a public database, just after Zebrafish, Danio rerio. This information is now used for the selection of a non redundant set of clones for producing DNA micro-arrays in order to examine global gene expression.

  2. Human uroporphyrinogen III synthase: Molecular cloning, nucleotide sequence, and expression of a full-length cDNA

    International Nuclear Information System (INIS)

    Tsai, Shihfeng; Bishop, D.F.; Desnick, R.J.

    1988-01-01

    Uroporphyrinogen III synthase, the fourth enzyme in the heme biosynthetic pathway, is responsible for conversion of the linear tetrapyrrole, hydroxymethylbilane, to the cyclic tetrapyrrole, uroporphyrinogen III. The deficient activity of URO-synthase is the enzymatic defect in the autosomal recessive disorder congenital erythropoietic porphyria. To facilitate the isolation of a full-length cDNA for human URO-synthase, the human erythrocyte enzyme was purified to homogeneity and 81 nonoverlapping amino acids were determined by microsequencing the N terminus and four tryptic peptides. Two synthetic oligonucleotide mixtures were used to screen 1.2 x 10^6 recombinants from a human adult liver cDNA library. Eight clones were positive with both oligonucleotide mixtures. Of these, dideoxy sequencing of the 1.3 kilobase insert from clone pUROS-2 revealed 5' and 3' untranslated sequences of 196 and 284 base pairs, respectively, and an open reading frame of 798 base pairs encoding a protein of 265 amino acids with a predicted molecular mass of 28,607 Da. The isolation and expression of this full-length cDNA for human URO-synthase should facilitate studies of the structure, organization, and chromosomal localization of this heme biosynthetic gene as well as the characterization of the molecular lesions causing congenital erythropoietic porphyria.
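
    The figures reported in this abstract can be cross-checked with a few lines of arithmetic. The sketch below is not from the paper; it assumes that the 798-base-pair open reading frame includes the stop codon (which is what makes 798 / 3 = 266 codons consistent with 265 amino acids), and the function and variable names are illustrative only.

```python
def protein_length_from_orf(orf_bp: int) -> int:
    """Amino acids encoded by an ORF, assuming its length includes the stop codon."""
    if orf_bp % 3 != 0:
        raise ValueError("ORF length must be a multiple of 3")
    return orf_bp // 3 - 1  # one codon is the stop codon (assumption)


# Values taken from the abstract.
FIVE_PRIME_UTR_BP = 196
ORF_BP = 798
THREE_PRIME_UTR_BP = 284
REPORTED_MASS_DA = 28_607

protein_len = protein_length_from_orf(ORF_BP)                    # 265
transcript_bp = FIVE_PRIME_UTR_BP + ORF_BP + THREE_PRIME_UTR_BP  # 1278
mean_residue_mass = REPORTED_MASS_DA / protein_len               # Da per residue

print(protein_len, transcript_bp, round(mean_residue_mass, 1))
```

    The sum of the three segments, 1278 base pairs, agrees with the reported 1.3 kilobase insert, and the mean residue mass falls in the range expected for a typical protein.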

  3. cDNA sequence analysis of a 29-kDa cysteine-rich surface antigen of pathogenic Entamoeba histolytica

    International Nuclear Information System (INIS)

    Torian, B.E.; Stroeher, V.L.; Stamm, W.E.; Flores, B.M.; Hagen, F.S.

    1990-01-01

    A λgt11 cDNA library was constructed from poly(U)-Sepharose-selected Entamoeba histolytica trophozoite RNA in order to clone and identify surface antigens. The library was screened with rabbit polyclonal anti-E. histolytica serum. A 700-base-pair cDNA insert was isolated and the nucleotide sequence was determined. The deduced amino acid sequence of the cDNA revealed a cysteine-rich protein. DNA hybridizations showed that the gene was specific to E. histolytica since the cDNA probe reacted with DNA from four axenic strains of E. histolytica but did not react with DNA from Entamoeba invadens, Acanthamoeba castellanii, or Trichomonas vaginalis. The insert was subcloned into the expression vector pGEX-1 and the protein was expressed as a fusion with the C terminus of glutathione S-transferase. Purified fusion protein was used to generate 22 monoclonal antibodies (mAbs) and a mouse polyclonal antiserum specific for the E. histolytica portion of the fusion protein. A 29-kDa protein was identified as a surface antigen when mAbs were used to immunoprecipitate the antigen from metabolically 35S-labeled live trophozoites. The surface location of the antigen was corroborated by mAb immunoprecipitation of a 29-kDa protein from surface-125I-labeled whole trophozoites as well as by the reaction of mAbs with live trophozoites in an indirect immunofluorescence assay performed at 4 °C. Immunoblotting with mAbs demonstrated that the antigen was present on four axenic isolates tested. mAbs recognized epitopes on the 29-kDa native antigen on some but not all clinical isolates tested.

  4. cDNA sequence analysis of a 29-kDa cysteine-rich surface antigen of pathogenic Entamoeba histolytica

    Energy Technology Data Exchange (ETDEWEB)

    Torian, B.E.; Stroeher, V.L.; Stamm, W.E. (Univ. of Washington, Seattle (USA)); Flores, B.M. (Louisiana State Univ. Medical Center, New Orleans (USA)); Hagen, F.S. (Zymogenetics Incorporated, Seattle, WA (USA))

    1990-08-01

    A λgt11 cDNA library was constructed from poly(U)-Sepharose-selected Entamoeba histolytica trophozoite RNA in order to clone and identify surface antigens. The library was screened with rabbit polyclonal anti-E. histolytica serum. A 700-base-pair cDNA insert was isolated and the nucleotide sequence was determined. The deduced amino acid sequence of the cDNA revealed a cysteine-rich protein. DNA hybridizations showed that the gene was specific to E. histolytica since the cDNA probe reacted with DNA from four axenic strains of E. histolytica but did not react with DNA from Entamoeba invadens, Acanthamoeba castellanii, or Trichomonas vaginalis. The insert was subcloned into the expression vector pGEX-1 and the protein was expressed as a fusion with the C terminus of glutathione S-transferase. Purified fusion protein was used to generate 22 monoclonal antibodies (mAbs) and a mouse polyclonal antiserum specific for the E. histolytica portion of the fusion protein. A 29-kDa protein was identified as a surface antigen when mAbs were used to immunoprecipitate the antigen from metabolically 35S-labeled live trophozoites. The surface location of the antigen was corroborated by mAb immunoprecipitation of a 29-kDa protein from surface-125I-labeled whole trophozoites as well as by the reaction of mAbs with live trophozoites in an indirect immunofluorescence assay performed at 4 °C. Immunoblotting with mAbs demonstrated that the antigen was present on four axenic isolates tested. mAbs recognized epitopes on the 29-kDa native antigen on some but not all clinical isolates tested.

  5. Nucleotide sequence of a cDNA for branched chain acyltransferase with analysis of the deduced protein structure

    International Nuclear Information System (INIS)

    Hummel, K.B.; Litwer, S.; Bradford, A.P.; Aitken, A.; Danner, D.J.; Yeaman, S.J.

    1988-01-01

    Nucleotide sequence was determined for a 1.6-kilobase human cDNA putative for the branched chain acyltransferase protein of the branched chain α-ketoacid dehydrogenase complex. Translation of the sequence reveals an open reading frame encoding a 315-amino acid protein of molecular weight 35,759 followed by 560 bases of 3'-untranslated sequence. Three repeats of the polyadenylation signal hexamer ATTAAA are present prior to the polyadenylate tail. Within the open reading frame is a 10-amino acid fragment which matches exactly the amino acid sequence around the lipoate-lysine residue in bovine kidney branched chain acyltransferase, thus confirming the identity of the cDNA. Analysis of the deduced protein structure for the human branched chain acyltransferase revealed an organization into domains similar to that reported for the acyltransferase proteins of the pyruvate and α-ketoglutarate dehydrogenase complexes. This similarity in organization suggests that a more detailed analysis of the proteins will be required to explain the individual substrate and multienzyme complex specificity shown by these acyltransferases

  6. Analysis of expressed sequence tags generated from full-length enriched cDNA libraries of melon

    Directory of Open Access Journals (Sweden)

    Bendahmane Abdelhafid

    2011-05-01

    Full Text Available Abstract Background: Melon (Cucumis melo), an economically important vegetable crop, belongs to the Cucurbitaceae family, which includes several other important crops such as watermelon, cucumber, and pumpkin. It has served as a model system for sex determination and vascular biology studies. However, genomic resources currently available for melon are limited. Results: We constructed eleven full-length enriched and four standard cDNA libraries from fruits, flowers, leaves, roots, cotyledons, and calluses of four different melon genotypes, and generated 71,577 and 22,179 ESTs from full-length enriched and standard cDNA libraries, respectively. These ESTs, together with ~35,000 ESTs available in public domains, were assembled into 24,444 unigenes, which were extensively annotated by comparing their sequences to different protein and functional domain databases, assigning them Gene Ontology (GO) terms, and mapping them onto metabolic pathways. Comparative analysis of melon unigenes and other plant genomes revealed that 75% to 85% of melon unigenes had homologs in other dicot plants, while approximately 70% had homologs in monocot plants. The analysis also identified 6,972 gene families that were conserved across dicot and monocot plants, and 181, 1,192, and 220 gene families specific to fleshy fruit-bearing plants, the Cucurbitaceae family, and melon, respectively. Digital expression analysis identified a total of 175 tissue-specific genes, which provides a valuable gene sequence resource for future genomics and functional studies. Furthermore, we identified 4,068 simple sequence repeats (SSRs) and 3,073 single nucleotide polymorphisms (SNPs) in the melon EST collection. Finally, we obtained a total of 1,382 melon full-length transcripts through the analysis of full-length enriched cDNA clones that were sequenced from both ends. Analysis of these full-length transcripts indicated that sizes of melon 5' and 3' UTRs were similar to those of tomato, but

  7. Primary structure of bovine pituitary secretory protein I (chromogranin A) deduced from the cDNA sequence

    Energy Technology Data Exchange (ETDEWEB)

    Ahn, T.G.; Cohn, D.V.; Gorr, S.U.; Ornstein, D.L.; Kashdan, M.A.; Levine, M.A.

    1987-07-01

    Secretory protein I (SP-I), also referred to as chromogranin A, is an acidic glycoprotein that has been found in every tissue of endocrine and neuroendocrine origin examined but never in exocrine or epithelial cells. Its co-storage and co-secretion with peptide hormones and neurotransmitters suggest that it has an important endocrine or secretory function. The authors have isolated cDNA clones from a bovine pituitary λgt11 expression library using an antiserum to parathyroid SP-I. The largest clone (SP4B) hybridized to a transcript of 2.1 kilobases in RNA from parathyroid, pituitary, and adrenal medulla. Immunoblots of bacterial lysates derived from SP4B lysogens demonstrated specific antibody binding to an SP4B/β-galactosidase fusion protein (160 kDa) with a cDNA-derived component of 46 kDa. Radioimmunoassay of the bacterial lysates with SP-I antiserum yielded parallel displacement curves of 125I-labeled SP-I by the SP4B lysate and authentic SP-I. SP4B contains a cDNA of 1614 nucleotides that encodes a 449-amino acid protein (calculated mass, 50 kDa). The nucleotide sequences of the pituitary SP-I cDNA and adrenal medullary SP-I cDNAs are nearly identical. Analysis of genomic DNA suggests that pituitary, adrenal, and parathyroid SP-I are products of the same gene.

  8. Primary structure of bovine pituitary secretory protein I (chromogranin A) deduced from the cDNA sequence

    International Nuclear Information System (INIS)

    Ahn, T.G.; Cohn, D.V.; Gorr, S.U.; Ornstein, D.L.; Kashdan, M.A.; Levine, M.A.

    1987-01-01

    Secretory protein I (SP-I), also referred to as chromogranin A, is an acidic glycoprotein that has been found in every tissue of endocrine and neuroendocrine origin examined but never in exocrine or epithelial cells. Its co-storage and co-secretion with peptide hormones and neurotransmitters suggest that it has an important endocrine or secretory function. The authors have isolated cDNA clones from a bovine pituitary λgt11 expression library using an antiserum to parathyroid SP-I. The largest clone (SP4B) hybridized to a transcript of 2.1 kilobases in RNA from parathyroid, pituitary, and adrenal medulla. Immunoblots of bacterial lysates derived from SP4B lysogens demonstrated specific antibody binding to an SP4B/β-galactosidase fusion protein (160 kDa) with a cDNA-derived component of 46 kDa. Radioimmunoassay of the bacterial lysates with SP-I antiserum yielded parallel displacement curves of 125I-labeled SP-I by the SP4B lysate and authentic SP-I. SP4B contains a cDNA of 1614 nucleotides that encodes a 449-amino acid protein (calculated mass, 50 kDa). The nucleotide sequences of the pituitary SP-I cDNA and adrenal medullary SP-I cDNAs are nearly identical. Analysis of genomic DNA suggests that pituitary, adrenal, and parathyroid SP-I are products of the same gene.

  9. Bioinformatic analysis of barcoded cDNA libraries for small RNA profiling by next-generation sequencing.

    Science.gov (United States)

    Farazi, Thalia A; Brown, Miguel; Morozov, Pavel; Ten Hoeve, Jelle J; Ben-Dov, Iddo Z; Hovestadt, Volker; Hafner, Markus; Renwick, Neil; Mihailović, Aleksandra; Wessels, Lodewyk F A; Tuschl, Thomas

    2012-10-01

    The characterization of post-transcriptional gene regulation by small regulatory RNAs of 20-30 nt length, particularly miRNAs and piRNAs, has become a major focus of research in recent years. A prerequisite for the characterization of small RNAs is their identification and quantification across different developmental stages, normal and diseased tissues, as well as model cell lines. Here we present a step-by-step protocol for the bioinformatic analysis of barcoded cDNA libraries for small RNA profiling generated by Illumina sequencing, thereby facilitating miRNA and other small RNA profiling of large sample collections. Copyright © 2012 Elsevier Inc. All rights reserved.

  10. Acetylcholinesterase of the sand fly, Phlebotomus papatasi (Scopoli): cDNA sequence, baculovirus expression, and biochemical properties.

    Science.gov (United States)

    Temeyer, Kevin B; Brake, Danett K; Tuckow, Alexander P; Li, Andrew Y; Pérez de León, Adalberto A

    2013-02-04

    Millions of people and domestic animals around the world are affected by leishmaniasis, a disease caused by various species of flagellated protozoans in the genus Leishmania that are transmitted by several sand fly species. Insecticides are widely used for sand fly population control to try to reduce or interrupt Leishmania transmission. Zoonotic cutaneous leishmaniasis caused by L. major is vectored mainly by Phlebotomus papatasi (Scopoli) in Asia and Africa. Organophosphates comprise a class of insecticides used for sand fly control, which act through the inhibition of acetylcholinesterase (AChE) in the central nervous system. Point mutations producing an altered, insensitive AChE are a major mechanism of organophosphate resistance in insects and preliminary evidence for organophosphate-insensitive AChE has been reported in sand flies. This report describes the identification of complementary DNA for an AChE in P. papatasi and the biochemical characterization of recombinant P. papatasi AChE. A P. papatasi Israeli strain laboratory colony was utilized to prepare total RNA utilized as template for RT-PCR amplification and sequencing of cDNA encoding acetylcholinesterase 1 using gene specific primers and 3'-5'-RACE. The cDNA was cloned into pBlueBac4.5/V5-His TOPO, and expressed by baculovirus in Sf21 insect cells in serum-free medium. Recombinant P. papatasi acetylcholinesterase was biochemically characterized using a modified Ellman's assay in microplates. A 2309 nucleotide sequence of PpAChE1 cDNA [GenBank: JQ922267] of P. papatasi from a laboratory colony susceptible to insecticides is reported with 73-83% nucleotide identity to acetylcholinesterase mRNA sequences of Culex tritaeniorhynchus and Lutzomyia longipalpis, respectively. The P. papatasi cDNA ORF encoded a 710-amino acid protein [GenBank: AFP20868] exhibiting 85% amino acid identity with acetylcholinesterases of Cx. pipiens, Aedes aegypti, and 92% amino acid identity for L. longipalpis. Recombinant P

  11. Intervening sequences in a plant gene-comparison of the partial sequence of cDNA and genomic DNA of French bean phaseolin

    Science.gov (United States)

    Sun, S. M.; Slightom, J. L.; Hall, T. C.

    1981-01-01

    A plant gene coding for the major storage protein (phaseolin, G1-globulin) of the French bean was isolated from a genomic library constructed in the phage vector Charon 24A. Comparison of the nucleotide sequence of part of the gene with that of the cloned messenger RNA (cDNA) revealed the presence of three intervening sequences, all beginning with GT and ending with AG. The 5' and 3' boundaries of intervening sequences IVS-A (88 base pairs) and IVS-B (124 base pairs) are similar to those described for animal and viral genes, but the 3' boundary of IVS-C (129 base pairs) shows some differences. A sequence of 185 amino acids deduced from the cloned DNAs represents about 40% of a phaseolin polypeptide.
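
    The comparison described in this abstract, in which intervening sequences are located by aligning cDNA against genomic DNA and each is found to begin with GT and end with AG, can be illustrated with a short sketch. The sequence below is a made-up toy example, not phaseolin sequence, and the helper names are hypothetical.

```python
def follows_gt_ag_rule(intron: str) -> bool:
    """True if the intron starts with GT and ends with AG (case-insensitive)."""
    seq = intron.upper()
    return len(seq) >= 4 and seq.startswith("GT") and seq.endswith("AG")


def splice(genomic: str, introns) -> str:
    """Remove half-open [start, end) intron intervals and join the exons."""
    exons = []
    position = 0
    for start, end in sorted(introns):
        exons.append(genomic[position:start])
        position = end
    exons.append(genomic[position:])
    return "".join(exons)


# Toy genomic fragment: exon + intron + exon (invented for illustration).
genomic = "ATGGCC" + "GTAAGTTTTCAG" + "GATTAA"
introns = [(6, 18)]

intron_seq = genomic[6:18]
cdna_like = splice(genomic, introns)

print(follows_gt_ag_rule(intron_seq), cdna_like)
```

    Removing the annotated interval yields the sequence that a cDNA clone would carry, which is the comparison the authors used to infer the three intervening sequences.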

  12. Cost-effective sequencing of full-length cDNA clones powered by a de novo-reference hybrid assembly.

    Directory of Open Access Journals (Sweden)

    Reginaldo M Kuroshu

    Full Text Available BACKGROUND: Sequencing full-length cDNA clones is important to determine gene structures including alternative splice forms, and provides valuable resources for experimental analyses to reveal the biological functions of coded proteins. However, previous approaches for sequencing cDNA clones were expensive or time-consuming, and therefore, a fast and efficient sequencing approach was demanded. METHODOLOGY: We developed a program, MuSICA 2, that assembles millions of short (36-nucleotide) reads collected from a single flow cell lane of Illumina Genome Analyzer to shotgun-sequence approximately 800 human full-length cDNA clones. MuSICA 2 performs a hybrid assembly in which an external de novo assembler is run first and the result is then improved by reference alignment of shotgun reads. We compared the MuSICA 2 assembly with 200 pooled full-length cDNA clones finished independently by the conventional primer-walking using Sanger sequencers. The exon-intron structure of the coding sequence was correct for more than 95% of the clones with coding sequence annotation when we excluded cDNA clones insufficiently represented in the shotgun library due to PCR failure (42 out of 200 clones excluded), and the nucleotide-level accuracy of coding sequences of those correct clones was over 99.99%. We also applied MuSICA 2 to full-length cDNA clones from Toxoplasma gondii, to confirm that its ability was competent even for non-human species. CONCLUSIONS: The entire sequencing and shotgun assembly takes less than 1 week and the consumables cost only approximately US$3 per clone, demonstrating a significant advantage over previous approaches.

  13. Cost-effective sequencing of full-length cDNA clones powered by a de novo-reference hybrid assembly.

    Science.gov (United States)

    Kuroshu, Reginaldo M; Watanabe, Junichi; Sugano, Sumio; Morishita, Shinichi; Suzuki, Yutaka; Kasahara, Masahiro

    2010-05-07

    Sequencing full-length cDNA clones is important to determine gene structures including alternative splice forms, and provides valuable resources for experimental analyses to reveal the biological functions of coded proteins. However, previous approaches for sequencing cDNA clones were expensive or time-consuming, and therefore, a fast and efficient sequencing approach was demanded. We developed a program, MuSICA 2, that assembles millions of short (36-nucleotide) reads collected from a single flow cell lane of Illumina Genome Analyzer to shotgun-sequence approximately 800 human full-length cDNA clones. MuSICA 2 performs a hybrid assembly in which an external de novo assembler is run first and the result is then improved by reference alignment of shotgun reads. We compared the MuSICA 2 assembly with 200 pooled full-length cDNA clones finished independently by the conventional primer-walking using Sanger sequencers. The exon-intron structure of the coding sequence was correct for more than 95% of the clones with coding sequence annotation when we excluded cDNA clones insufficiently represented in the shotgun library due to PCR failure (42 out of 200 clones excluded), and the nucleotide-level accuracy of coding sequences of those correct clones was over 99.99%. We also applied MuSICA 2 to full-length cDNA clones from Toxoplasma gondii, to confirm that its ability was competent even for non-human species. The entire sequencing and shotgun assembly takes less than 1 week and the consumables cost only approximately US$3 per clone, demonstrating a significant advantage over previous approaches.

  14. An integrated PCR colony hybridization approach to screen cDNA libraries for full-length coding sequences.

    Science.gov (United States)

    Pollier, Jacob; González-Guzmán, Miguel; Ardiles-Diaz, Wilson; Geelen, Danny; Goossens, Alain

    2011-01-01

    cDNA-Amplified Fragment Length Polymorphism (cDNA-AFLP) is a commonly used technique for genome-wide expression analysis that does not require prior sequence knowledge. Typically, quantitative expression data and sequence information are obtained for a large number of differentially expressed gene tags. However, most of the gene tags do not correspond to full-length (FL) coding sequences, which is a prerequisite for subsequent functional analysis. A medium-throughput screening strategy, based on integration of polymerase chain reaction (PCR) and colony hybridization, was developed that allows in parallel screening of a cDNA library for FL clones corresponding to incomplete cDNAs. The method was applied to screen for the FL open reading frames of a selection of 163 cDNA-AFLP tags from three different medicinal plants, leading to the identification of 109 (67%) FL clones. Furthermore, the protocol allows for the use of multiple probes in a single hybridization event, thus significantly increasing the throughput when screening for rare transcripts. The presented strategy offers an efficient method for the conversion of incomplete expressed sequence tags (ESTs), such as cDNA-AFLP tags, to FL-coding sequences.

  15. Construction and evaluation of normalized cDNA libraries enriched with full-length sequences for rapid discovery of new genes from Sisal (Agave sisalana Perr.) different developmental stages.

    Science.gov (United States)

    Zhou, Wen-Zhao; Zhang, Yan-Mei; Lu, Jun-Ying; Li, Jun-Feng

    2012-10-12

    To provide a resource of sisal-specific expressed sequence data and facilitate this powerful approach in new gene research, the preparation of normalized cDNA libraries enriched with full-length sequences is necessary. Four libraries were produced with RNA pooled from Agave sisalana multiple tissues to increase efficiency of normalization and maximize the number of independent genes by SMART™ method and the duplex-specific nuclease (DSN). This procedure kept the proportion of full-length cDNAs in the subtracted/normalized libraries and dramatically enhanced the discovery of new genes. Sequencing of 3875 cDNA clones of libraries revealed 3320 unigenes with an average insert length about 1.2 kb, indicating that the non-redundancy of libraries was about 85.7%. These unigene functions were predicted by comparing their sequences to functional domain databases and extensively annotated with Gene Ontology (GO) terms. Comparative analysis of sisal unigenes and other plant genomes revealed that four putative MADS-box genes and knotted-like homeobox (knox) gene were obtained from a total of 1162 full-length transcripts. Furthermore, real-time PCR showed that the characteristics of their transcripts mainly depended on the tight expression regulation of a number of genes during the leaf and flower development. Analysis of individual library sequence data indicated that the pooled-tissue approach was highly effective in discovering new genes and preparing libraries for efficient deep sequencing.
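
    The non-redundancy figure quoted in this abstract follows directly from the clone and unigene counts. The short sketch below reproduces it, assuming that non-redundancy is defined as unigenes divided by sequenced clones; the function name is illustrative.

```python
def non_redundancy_percent(unigenes: int, clones_sequenced: int) -> float:
    """Percentage of sequenced clones that represent distinct unigenes."""
    if clones_sequenced <= 0:
        raise ValueError("clones_sequenced must be positive")
    return 100.0 * unigenes / clones_sequenced


# Counts taken from the abstract.
rate = non_redundancy_percent(unigenes=3320, clones_sequenced=3875)
print(f"{rate:.1f}%")  # 85.7%
```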

  16. Construction and Evaluation of Normalized cDNA Libraries Enriched with Full-Length Sequences for Rapid Discovery of New Genes from Sisal (Agave sisalana Perr.) Different Developmental Stages

    Directory of Open Access Journals (Sweden)

    Jun-Feng Li

    2012-10-01

    Full Text Available To provide a resource of sisal-specific expressed sequence data and facilitate this powerful approach in new gene research, the preparation of normalized cDNA libraries enriched with full-length sequences is necessary. Four libraries were produced with RNA pooled from Agave sisalana multiple tissues to increase efficiency of normalization and maximize the number of independent genes by SMART™ method and the duplex-specific nuclease (DSN). This procedure kept the proportion of full-length cDNAs in the subtracted/normalized libraries and dramatically enhanced the discovery of new genes. Sequencing of 3875 cDNA clones of libraries revealed 3320 unigenes with an average insert length about 1.2 kb, indicating that the non-redundancy of libraries was about 85.7%. These unigene functions were predicted by comparing their sequences to functional domain databases and extensively annotated with Gene Ontology (GO) terms. Comparative analysis of sisal unigenes and other plant genomes revealed that four putative MADS-box genes and knotted-like homeobox (knox) gene were obtained from a total of 1162 full-length transcripts. Furthermore, real-time PCR showed that the characteristics of their transcripts mainly depended on the tight expression regulation of a number of genes during the leaf and flower development. Analysis of individual library sequence data indicated that the pooled-tissue approach was highly effective in discovering new genes and preparing libraries for efficient deep sequencing.

  17. cDNA sequence and tissue expression analysis of glucokinase from ...

    African Journals Online (AJOL)

    Yomi

    2012-01-10

    Jan 10, 2012 ... (Rattus norvegicus) were 98.1, 96.8, 80.3 and 79.8%, respectively. Phylogenetic analysis based on GK amino acid sequences. Phylogenetic analysis among eight fish species, eleven endothermic species and one amphibian species based on glucokinase amino acid sequences is shown in Figure. 3.

  18. Cloning, sequencing and expression of a novel xylanase cDNA from ...

    African Journals Online (AJOL)

    A strain SH 2016, capable of producing xylanase, was isolated and identified as Aspergillus awamori, based on its physiological and biochemical characteristics as well as its ITS rDNA gene sequence analysis. A xylanase gene of 591 bp was cloned from this newly isolated A. awamori and the ORF sequence predicted a ...

  19. ANDES: Statistical tools for the ANalyses of DEep Sequencing

    Directory of Open Access Journals (Sweden)

    Denison Mark R

    2010-07-01

    Full Text Available Abstract Background: The advancements in DNA sequencing technologies have allowed researchers to progress from the analyses of a single organism towards the deep sequencing of a sample of organisms. With sufficient sequencing depth, it is now possible to detect subtle variations between members of the same species, or between mixed species with shared biomarkers, such as the 16S rRNA gene. However, traditional sequencing analyses of samples from largely homogeneous populations are often still based on multiple sequence alignments (MSA), where each sequence is placed along a separate row and similarities between aligned bases can be followed down each column. While this visual format is intuitive for a small set of aligned sequences, the representation quickly becomes cumbersome as sequencing depths cover loci hundreds or thousands of reads deep. Findings: We have developed ANDES, a software library and a suite of applications, written in Perl and R, for the statistical ANalyses of DEep Sequencing. The fundamental data structure underlying ANDES is the position profile, which contains the nucleotide distributions for each genomic position resultant from a multiple sequence alignment (MSA). Tools include the root mean square deviation (RMSD) plot, which allows for the visual comparison of multiple samples on a position-by-position basis, and the computation of base conversion frequencies (transition/transversion rates), variation (Shannon entropy), inter-sample clustering and visualization (dendrogram and multidimensional scaling (MDS) plot), threshold-driven consensus sequence generation and polymorphism detection, and the estimation of empirically determined sequencing quality values. Conclusions: As new sequencing technologies evolve, deep sequencing will become increasingly cost-efficient and the inter and intra-sample comparisons of largely homogeneous sequences will become more common. We have provided a software package and demonstrated its
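
    The position profile described in this abstract, together with the per-position Shannon entropy and RMSD comparisons built on it, can be sketched in a few lines. ANDES itself is written in Perl and R; the Python below is not its API, only a minimal illustration. It assumes equal-length aligned reads, ignores gap and ambiguity characters, and uses one plausible per-position RMSD definition (root mean square of the differences in the four base frequencies), which may differ from the one ANDES implements.

```python
import math
from collections import Counter

ALPHABET = "ACGT"


def position_profile(aligned_reads):
    """Per-column nucleotide frequency distributions from aligned reads."""
    length = len(aligned_reads[0])
    if any(len(read) != length for read in aligned_reads):
        raise ValueError("all aligned reads must have the same length")
    profile = []
    for column in range(length):
        counts = Counter(read[column].upper() for read in aligned_reads)
        total = sum(counts[base] for base in ALPHABET)  # gaps/N are ignored
        profile.append(
            {base: (counts[base] / total if total else 0.0) for base in ALPHABET}
        )
    return profile


def shannon_entropy(distribution):
    """Shannon entropy in bits of one position's nucleotide distribution."""
    return -sum(p * math.log2(p) for p in distribution.values() if p > 0)


def rmsd(dist_a, dist_b):
    """Root mean square deviation between two per-position distributions."""
    squared = sum((dist_a[base] - dist_b[base]) ** 2 for base in ALPHABET)
    return math.sqrt(squared / len(ALPHABET))


# Two toy samples (invented data): sample_a varies at the last column.
sample_a = ["ACGT", "ACGT", "ACGA", "ACGC"]
sample_b = ["ACGT", "ACGT", "ACGT", "ACGT"]

profile_a = position_profile(sample_a)
profile_b = position_profile(sample_b)

entropies_a = [shannon_entropy(d) for d in profile_a]
per_position_rmsd = [rmsd(a, b) for a, b in zip(profile_a, profile_b)]
print(entropies_a, per_position_rmsd)
```

    Conserved columns give an entropy of zero and an RMSD of zero between the two samples, while the variable last column stands out in both measures, which is the kind of position-by-position comparison the abstract describes.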

  20. cDNA, amino acid and carbohydrate sequence of barley seed-specific peroxidase BP 1

    DEFF Research Database (Denmark)

    Johansson, A.; Rasmussen, Søren Kjærsgård; Harthill, J.E.

    1992-01-01

    of 69% in the translated region, a 90% G or C preference in the wobble position of the codons and a typical signal peptide sequence. N-terminal amino acid sequencing and sequence analysis of tryptic peptides verified 98% of the sequence of the mature BP 1 which contains 309 amino acid residues. BP 1...... biological role of this enzyme. The barley peroxidase is processed at the C-terminus and might be targeted to the vacuole. The single site of glycosylation is located near the C-terminus in the N-glycosylation sequon -Asn-Cys-Ser- in which Cys forms part of a disulphide bridge. The major glycan is a typical...

  1. Biases in small RNA deep sequencing data

    OpenAIRE

    Raabe, Carsten A.; Tang, Thean-Hock; Brosius, Juergen; Rozhdestvensky, Timofey S.

    2013-01-01

    High-throughput RNA sequencing (RNA-seq) is considered a powerful tool for novel gene discovery and fine-tuned transcriptional profiling. The digital nature of RNA-seq is also believed to simplify meta-analysis and to reduce background noise associated with hybridization-based approaches. The development of multiplex sequencing enables efficient and economic parallel analysis of gene expression. In addition, RNA-seq is of particular value when low RNA expression or modest changes between samp...

  2. Genome-scale validation of deep-sequencing libraries.

    Directory of Open Access Journals (Sweden)

    Dominic Schmidt

    Full Text Available Chromatin immunoprecipitation followed by high-throughput (HTP) sequencing (ChIP-seq) is a powerful tool to establish protein-DNA interactions genome-wide. The primary limitation of its broad application at present is the often-limited access to sequencers. Here we report a protocol, Mab-seq, that generates genome-scale quality evaluations for nucleic acid libraries intended for deep-sequencing. We show how commercially available genomic microarrays can be used to maximize the efficiency of library creation and quickly generate reliable preliminary data on a chromosomal scale in advance of deep sequencing. We also exploit this technique to compare enriched regions identified using microarrays with those identified by sequencing, demonstrating that they agree on a core set of clearly identified enriched regions, while characterizing the additional enriched regions identifiable using HTP sequencing.

  3. Geoseq: a tool for dissecting deep-sequencing datasets

    Directory of Open Access Journals (Sweden)

    Homann Robert

    2010-10-01

    Full Text Available Abstract Background: Datasets generated on deep-sequencing platforms have been deposited in various public repositories such as the Gene Expression Omnibus (GEO), the Sequence Read Archive (SRA) hosted by the NCBI, or the DNA Data Bank of Japan (DDBJ). Despite being rich data sources, they have not been used much due to the difficulty in locating and analyzing datasets of interest. Results: Geoseq (http://geoseq.mssm.edu) provides a new method of analyzing short reads from deep sequencing experiments. Instead of mapping the reads to reference genomes or sequences, Geoseq maps a reference sequence against the sequencing data. It is web-based, and holds pre-computed data from public libraries. The analysis reduces the input sequence to tiles and measures the coverage of each tile in a sequence library through the use of suffix arrays. The user can upload custom target sequences or use gene/miRNA names for the search and get back results as plots and spreadsheet files. Geoseq organizes the public sequencing data using a controlled vocabulary, allowing identification of relevant libraries by organism, tissue and type of experiment. Conclusions: Analysis of small sets of sequences against deep-sequencing datasets, as well as identification of public datasets of interest, is simplified by Geoseq. We applied Geoseq to (a) identify differential isoform expression in mRNA-seq datasets, (b) identify miRNAs (microRNAs) in libraries and identify mature and star sequences in miRNAs, and (c) identify potentially mis-annotated miRNAs. The ease of using Geoseq for these analyses suggests its utility and uniqueness as an analysis tool.
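
    The tiling idea in this abstract, cutting the query sequence into tiles and measuring how well each tile is covered by a read library, can be illustrated as follows. This is a simplified sketch, not Geoseq's implementation: Geoseq uses suffix arrays over pre-computed libraries, whereas the code below swaps in a plain k-mer hash index, counts exact matches only, and uses invented toy reads. The function name and the `tile_len` and `step` parameters are assumptions made for illustration.

```python
from collections import Counter


def tile_coverage(reference, reads, tile_len, step=1):
    """For each tile of the reference, count the reads that contain it exactly.

    Returns a list of (tile_start, read_count) pairs.
    """
    if tile_len <= 0 or step <= 0:
        raise ValueError("tile_len and step must be positive")
    # Index every distinct substring of length tile_len, once per read.
    index = Counter()
    for read in reads:
        kmers = {read[i:i + tile_len] for i in range(len(read) - tile_len + 1)}
        index.update(kmers)
    return [
        (start, index[reference[start:start + tile_len]])
        for start in range(0, len(reference) - tile_len + 1, step)
    ]


# Toy query and read library (invented data).
reference = "ACGTACGTTT"
reads = ["ACGTAC", "GTACGT", "CGTTTA", "TTTTTT"]

coverage = tile_coverage(reference, reads, tile_len=4, step=2)
print(coverage)  # [(0, 2), (2, 2), (4, 2), (6, 1)]
```

    A hash index is enough for a small example; a suffix array, as used by Geoseq, serves the same lookup role while scaling to large pre-computed libraries and variable tile lengths.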

  4. Generation of longer 3′ cDNA fragments from massively parallel signature sequencing tags

    OpenAIRE

    Silva, Ana Paula M.; Chen, Jianjun; Carraro, Dirce M.; Wang, San Ming; Camargo, Anamaria A.

    2004-01-01

    Massively Parallel Signature Sequencing (MPSS) is a powerful technique for genome-wide gene expression analysis, which, similar to SAGE, relies on the production of short tags proximal to the 3′ end of transcripts. A single MPSS experiment can generate over 10^7 tags, providing a 10-fold coverage of the transcripts expressed in a human cell. A significant fraction of MPSS tags cannot be assigned to known transcripts (orphan tags) and are likely to be derived from transcripts expressed at very l...

  5. Deep sequencing as a method of typing bluetongue virus isolates.

    Science.gov (United States)

    Rao, Pavuluri Panduranga; Reddy, Yella Narasimha; Ganesh, Kapila; Nair, Shreeja G; Niranjan, Vidya; Hegde, Nagendra R

    2013-11-01

    Bluetongue (BT) is an economically important endemic disease of livestock in tropics and subtropics. In addition, its recent spread to temperate regions like North America and Northern Europe is of serious concern. Rapid serotyping and characterization of BT virus (BTV) is an essential step in the identification of origin of the virus and for controlling the disease. Serotyping of BTV is typically performed by serum neutralization, and of late by nucleotide sequencing. This report describes the near complete genome sequencing and typing of two isolates of BTV using Illumina next generation sequencing platform. Two of the BTV RNAs were multiplexed with ten other unknown samples. Viral RNA was isolated and fragmented, reverse transcribed, the cDNA ends were repaired and ligated with a multiplex oligo. The genome library was amplified using primers complementary to the ligated oligo and subjected to single and paired end sequencing. The raw reads were assembled using a de novo method and reference-based assembly was performed based on the contig data. Near complete sequences of all segments of BTV were obtained with more than 20× coverage, and single read sequencing method was sufficient to identify the genotype and serotype of the virus. The two viruses used in this study were typed as BTV-1 and BTV-9E. Copyright © 2013 Elsevier B.V. All rights reserved.

  6. DeepSol: A Deep Learning Framework for Sequence-Based Protein Solubility Prediction.

    Science.gov (United States)

    Khurana, Sameer; Rawi, Reda; Kunji, Khalid; Chuang, Gwo-Yu; Bensmail, Halima; Mall, Raghvendra

    2018-03-15

    Protein solubility plays a vital role in pharmaceutical research and production yield. For a given protein, the extent of its solubility can represent the quality of its function, and is ultimately defined by its sequence. Thus, it is imperative to develop novel, highly accurate in silico sequence-based protein solubility predictors. In this work, we propose DeepSol, a novel Deep Learning based protein solubility predictor. The backbone of our framework is a Convolutional Neural Network (CNN) that exploits k-mer structure and additional sequence and structural features extracted from the protein sequence. DeepSol outperformed all known sequence-based state-of-the-art solubility prediction methods and attained an accuracy of 0.77 and a Matthews correlation coefficient of 0.55. The superior prediction accuracy of DeepSol allows screening for sequences with enhanced production capacity and more reliable prediction of the solubility of novel proteins. DeepSol's best performing models and results are publicly deposited at https://doi.org/10.5281/zenodo.1162886 (Khurana and Mall, 2018). Contact: skhurana@mit.edu and rmall@hbku.edu.qa. Supplementary data are available at Bioinformatics online.
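
    For readers unfamiliar with the two metrics quoted in this abstract, the sketch below computes accuracy and the Matthews correlation coefficient from a binary confusion matrix. The counts are hypothetical toy values, not DeepSol results, and the function names are illustrative only.

```python
import math


def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def matthews_corrcoef(tp, tn, fp, fn):
    """Matthews correlation coefficient; returns 0.0 when a marginal total is zero."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0
    return (tp * tn - fp * fn) / denominator


# Hypothetical counts: 6 true positives, 2 true negatives, 1 false positive, 1 false negative.
print(accuracy(6, 2, 1, 1))
print(matthews_corrcoef(6, 2, 1, 1))
```

    Unlike accuracy, the coefficient uses all four cells of the confusion matrix, so on an imbalanced test set it can be much lower than the accuracy suggests.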

  7. Deep sequencing in pre- and clinical vaccine research.

    Science.gov (United States)

    Prachi, P; Donati, C; Masciopinto, F; Rappuoli, R; Bagnoli, F

    2013-01-01

    Vaccine research has experienced a quantum leap after the beginning of the genomics era. High-throughput sequencing techniques, unlimited computing resources, as well as new bioinformatic algorithms are now changing the way we perform genomic studies. Whole genome sequencing will soon become the gold standard for phylogenetic and epidemiology studies and is already shedding new light on the dynamics of bacterial evolution. We believe that deep sequencing projects, together with structural studies on vaccine candidates, will allow targeting constant epitopes and avoid vaccine failure due to antigenic variability. Systems biology, which is expected to revolutionize vaccine research and clinical studies, greatly relies on high-throughput technologies such as RNA-seq. Furthermore, genomics is a key element to develop safer vaccines, and the accuracy of deep sequencing will allow monitoring vaccine coverage after their introduction on the market. Copyright © 2013 S. Karger AG, Basel.

  8. Increased mRNA expression of a laminin-binding protein in human colon carcinoma: Complete sequence of a full-length cDNA encoding the protein

    International Nuclear Information System (INIS)

    Yow, Hsiukang; Wong, Jau Min; Chen, Hai Shiene; Lee, C.; Steele, G.D. Jr.; Chen, Lanbo

    1988-01-01

    Reliable markers to distinguish human colon carcinoma from normal colonic epithelium are needed particularly for poorly differentiated tumors where no useful marker is currently available. To search for markers the authors constructed cDNA libraries from human colon carcinoma cell lines and screened for clones that hybridize to a greater degree with mRNAs of colon carcinomas than with their normal counterparts. Here they report one such cDNA clone that hybridizes with a 1.2-kilobase (kb) mRNA, the level of which is ∼9-fold greater in colon carcinoma than in adjacent normal colonic epithelium. Blot hybridization of total RNA from a variety of human colon carcinoma cell lines shows that the level of this 1.2-kb mRNA in poorly differentiated colon carcinomas is as high as or higher than that in well-differentiated carcinomas. Molecular cloning and complete sequencing of cDNA corresponding to the full-length open reading frame of this 1.2-kb mRNA unexpectedly show it to contain all the partial cDNA sequence encoding 135 amino acid residues previously reported for a human laminin receptor. The deduced amino acid sequence suggests that this putative laminin-binding protein from human colon carcinomas consists of 295 amino acid residues with interesting features. There is an unusual C-terminal 70-amino acid segment, which is trypsin-resistant and highly negatively charged

  9. Sequencing and comparative genomics analysis in Senecio scandens Buch.-Ham. ex D. Don, based on full-length cDNA library.

    Science.gov (United States)

    Qian, Gang; Ping, Junjiao; Zhang, Zhen; Xu, Delin

    2014-09-03

    Senecio scandens Buch.-Ham. ex D. Don, an important antibacterial source of Chinese traditional medicine, has a widespread distribution in a few ecological habitats of China. We generated a full-length complementary DNA (cDNA) library from a sample of elite individuals with superior antibacterial properties, with satisfactory parameters such as library storage (4.30 × 10^6 CFU), efficiency of titre (1.30 × 10^6 CFU/mL), transformation efficiency (96.35%), full-length ratio (64.00%) and redundancy ratio (3.28%). The BLASTN search revealed the facile formation of counterparts between the experimental sample and Arabidopsis thaliana in view of high-homology cDNA sequence (90.79%) with e-values cDNA clones consist of the major of functional genes identified by a large set of microarray data from the present experimental material. For other Compositae species, a large set of full-length cDNA clones reported in the present article will serve as a useful resource to facilitate further research on the transferability of expressed sequence tag-derived simple sequence repeats (EST-SSR) development, comparative genomics and novel transcript profiles.

  10. Nucleotide sequence of Phaseolus vulgaris L. alcohol dehydrogenase encoding cDNA and three-dimensional structure prediction of the deduced protein.

    Science.gov (United States)

    Amelia, Kassim; Khor, Chin Yin; Shah, Farida Habib; Bhore, Subhash J

    2015-01-01

    Common beans (Phaseolus vulgaris L.) are widely consumed as a source of proteins and natural products. However, their yield needs to be increased. In line with the agenda of Phaseomics (an international consortium), work on expressed sequence tag (EST) generation from bean pods was initiated. Altogether, 5972 ESTs have been isolated. Alcohol dehydrogenase (AD) encoding gene cDNA was a noticeable transcript among the generated ESTs. This AD is an important enzyme; therefore, this study was undertaken to understand more about it. The objective of this study was to elucidate the P. vulgaris L. AD (PvAD) gene cDNA sequence and to predict the three-dimensional (3D) structure of the deduced protein. Positive and negative strands of the PvAD cDNA clone were sequenced using M13 forward and M13 reverse primers to elucidate the nucleotide sequence. The deduced PvAD cDNA and protein sequences were analyzed for their basic features using online bioinformatics tools. Sequence comparison was carried out using the bl2seq program, and the tree-view program was used to construct a phylogenetic tree. The secondary structures and 3D structure of the PvAD protein were predicted using the PHYRE automatic fold recognition server. Analysis of the sequencing results showed that the PvAD cDNA is 1294 bp in length. Its open reading frame encodes a protein that contains 371 amino acids. Deduced protein sequence analysis showed the presence of putative substrate binding, catalytic Zn binding, and NAD binding sites. Results indicate that the predicted 3D structure of the PvAD protein is analogous to the experimentally determined crystal structure of s-nitrosoglutathione reductase from an Arabidopsis species. The 1294 bp long PvAD cDNA encodes a 371 amino acid long protein that contains conserved domains required for biological functions of AD. The predicted 3D structure of the deduced PvAD protein reflects the analogy with the crystal structure of Arabidopsis thaliana s-nitrosoglutathione reductase. 
Further study is required

  11. Sequencing over 13 000 expressed sequence tags from six subtractive cDNA libraries of wild and modern wheats following slow drought stress.

    Science.gov (United States)

    Ergen, Neslihan Z; Budak, Hikmet

    2009-03-01

    A deeper understanding of the drought response and genetic improvement of the cultivated crops for better tolerance requires attention because of the complexity of the drought response syndrome and the loss of genetic diversity during domestication. We initially screened about 200 wild emmer wheat genotypes and then focused on 26 of these lines, which led to the selection of two genotypes with contrasting responses to water deficiency. Six subtractive cDNA libraries were constructed, and over 13 000 expressed sequence tags (ESTs) were sequenced using leaf and root tissues of wild emmer wheat genotypes TR39477 (tolerant) and TTD-22 (sensitive), and modern wheat variety Kiziltan drought stressed for 7 d. Clustering and assembly of ESTs resulted in 2376 unique sequences (1159 without hypothetical proteins and no hits), 75% of which were represented only once. At this level of EST sampling, each tissue shared a very low percentage of transcripts (13-26%). The data obtained indicated that the genotypes shared common elements of drought stress as well as distinctly differential expression patterns that might be illustrative of their contrasting ability to tolerate water deficiencies. The new EST data generated here provide a highly diverse and rich source for gene discovery in wheat and other grasses.

  12. Full-Length Venom Protein cDNA Sequences from Venom-Derived mRNA: Exploring Compositional Variation and Adaptive Multigene Evolution.

    Science.gov (United States)

    Modahl, Cassandra M; Mackessy, Stephen P

    2016-06-01

    Envenomation of humans by snakes is a complex and continuously evolving medical emergency, and treatment is made that much more difficult by the diverse biochemical composition of many venoms. Venomous snakes and their venoms also provide models for the study of molecular evolutionary processes leading to adaptation and genotype-phenotype relationships. To compare venom complexity and protein sequences, venom gland transcriptomes are assembled, which usually requires the sacrifice of snakes for tissue. However, toxin transcripts are also present in venoms, offering the possibility of obtaining cDNA sequences directly from venom. This study provides evidence that unknown full-length venom protein transcripts can be obtained from the venoms of multiple species from all major venomous snake families. These unknown venom protein cDNAs are obtained by the use of primers designed from conserved signal peptide sequences within each venom protein superfamily. This technique was used to assemble a partial venom gland transcriptome for the Middle American Rattlesnake (Crotalus simus tzabcan) by amplifying sequences for phospholipases A2, serine proteases, C-lectins, and metalloproteinases from within venom. Phospholipase A2 sequences were also recovered from the venoms of several rattlesnakes and an elapid snake (Pseudechis porphyriacus), and three-finger toxin sequences were recovered from multiple rear-fanged snake species, demonstrating that the three major clades of advanced snakes (Elapidae, Viperidae, Colubridae) have stable mRNA present in their venoms. These cDNA sequences from venom were then used to explore potential activities derived from protein sequence similarities and evolutionary histories within these large multigene superfamilies. Venom-derived sequences can also be used to aid in characterizing venoms that lack proteomic profiles and identify sequence characteristics indicating specific envenomation profiles. 
This approach, requiring only venom, provides

  13. Cloning and sequencing of a cDNA for the delta-subunit of photosynthetic ATP-synthase (EC 3.6.1.34) from pea (Pisum sativum).

    Science.gov (United States)

    Hoesche, J A; Berzborn, R J

    1992-12-29

    lambda gt10 cDNA clones for the nuclear encoded subunit delta of chloroplast ATP-synthase from Pisum sativum have been isolated. The 5' end was completed by PCR. The sequenced cDNA codes for the import precursor. N-Terminal sequencing of the mature protein isolated from chloroplasts revealed that the processing sites of the transit peptide from Pisum sativum and Spinacea oleracea are similar. The overall homology of the deduced amino acid sequences of the mature delta proteins from higher plants is about 40%. The conservation among hydrophilic residues is higher than for hydrophobic ones, indicating that the surface of delta is important for its function within the ATP-synthase.

  14. CONSTRUCTION OF SILKWORM MIDGUT cDNA LIBRARY FOR SCREEN AND SEQUENCE ANALYSIS OF PERITROPHIC MEMBRANE PROTEIN GENES.

    Science.gov (United States)

    Zhou, Yi-Jun; Xue, Bin; Li, Yang-Yang; Li, Fan-Chi; Ni, Min; Shen, Wei-De; Gu, Zhi-Ya; Li, Bing

    2016-01-01

    Silkworm is an important economic insect and the model species for Lepidoptera. The midgut of silkworm is an important physiological barrier, as its peritrophic membrane (PM) can resist pathogen invasion. In this study, a silkworm midgut cDNA library was constructed in order to identify silkworm PM genes. The capacity of the initial library was 6.92 × 10(6) pfu/ml, along with a recombination rate of 92.14% and a postamplification titer of 4.10 × 10(9) pfu/ml. Three silkworm PM protein genes were obtained by immunoscreening, two of which were chitin-binding protein (CBP) genes and one of which was a chitin deacetylase (CDA) gene as revealed by sequence analysis. Three genes were named BmCBP02, BmCBP13, and BmCDA17, and their ORF sizes are 678, 1,029, and 645 bp, respectively; all of them contain sequences of chitin-binding domains. Phylogenetic analysis indicated that BmCBP02 has the highest consensus with Mamestra configurata CBP at 61.0%; BmCBP13 has the highest consensus with Loxostege sticticalis PM CBP at 53.35%; BmCDA17 has the highest consensus with Helicoverpa armigera CDA5a at 70.83%. Tissue transcriptional analysis revealed that all three genes were specifically expressed in the midgut, and during the developmental process of fifth-instar silkworms, the transcription of all the genes showed an upward trend. This study laid a foundation for further studies on the functions of silkworm PM genes. © 2015 Wiley Periodicals, Inc.

  15. Ferritin from the Pacific abalone Haliotis discus hannai: Analysis of cDNA sequence, expression, and activity.

    Science.gov (United States)

    Qiu, Reng; Kan, Yunchao; Li, Dandan

    2016-02-01

    Ferritin plays an important role in iron homeostasis due to its ability to bind and sequester large amounts of iron. In this study, the gene encoding a ferritin (HdhFer2) was cloned from Pacific abalone (Haliotis discus hannai). The full-length cDNA of HdhFer2 contains a 5'-UTR of 121 bp, an ORF of 516 bp, and a 3'-UTR of 252 bp with a polyadenylation signal sequence of AATAAA and a poly(A) tail. It also contains a 31 bp iron-responsive element (IRE) in the 5'-UTR position, which is conserved in many ferritins. HdhFer2 consists of 171 amino acid residues with a predicted molecular weight (MW) ∼19.8 kDa and a theoretical isoelectric point (PI) of 4.84. The deduced amino acid sequence of HdhFer2 contains two ferritin iron-binding region signatures (IBRSs). HdhFer2 mRNA was detected in a wide range of tissues and was dominantly expressed in the gill. Infection with the bacterial pathogen Vibrio anguillarum significantly upregulated HdhFer2 expression in a time-dependent manner. Recombinant HdhFer2 (rHdhFer2) purified from Escherichia coli was able to bind ferrous iron in a concentration-dependent manner. In summary, these results suggest that HdhFer2 is a crucial protein in the iron-withholding defense system, and plays an important role in the innate immune response of abalone. Copyright © 2016 Elsevier Ltd. All rights reserved.
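
    The ORF length and residue count reported in this abstract can be cross-checked with simple codon arithmetic. The sketch below assumes that the reported ORF length includes the stop codon; the abstract does not state this, and the function name is illustrative only.

```python
def residues_from_orf(orf_bp, includes_stop=True):
    """Number of amino acid residues encoded by an ORF of the given length in bp.

    Assumption: when includes_stop is True, the final codon is a stop codon
    and does not contribute a residue.
    """
    if orf_bp % 3 != 0:
        raise ValueError("ORF length must be a multiple of 3")
    codons = orf_bp // 3
    return codons - 1 if includes_stop else codons


# HdhFer2: a 516 bp ORF gives 172 codons, i.e. 171 residues plus a stop codon.
print(residues_from_orf(516))
```

    Under that assumption the arithmetic agrees with the 171 residues stated for HdhFer2.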

  16. Cloning, sequence analysis, and expression of cDNA coding for the major house dust mite allergen, Der f 1, in Escherichia coli

    Directory of Open Access Journals (Sweden)

    Y. Cui

    2008-05-01

    Our objective was to clone, express and characterize adult Dermatophagoides farinae group 1 (Der f 1) allergens to further produce recombinant allergens for future clinical applications in order to eliminate side reactions from crude extracts of mites. Based on GenBank data, we designed primers and amplified the cDNA fragment coding for Der f 1 by nested-PCR. After purification and recovery, the cDNA fragment was cloned into the pMD19-T vector. The fragment was then sequenced, subcloned into the plasmid pET28a(+), expressed in Escherichia coli BL21 and identified by Western blotting. The cDNA coding for Der f 1 was cloned, sequenced and expressed successfully. Sequence analysis showed the presence of an open reading frame containing 966 bp that encodes a protein of 321 amino acids. Interestingly, homology analysis showed that the Der p 1 shared more than 87% identity in amino acid sequence with Eur m 1 but only 80% with Der f 1. Furthermore, phylogenetic analyses suggested that D. pteronyssinus was evolutionarily closer to Euroglyphus maynei than to D. farinae, even though D. pteronyssinus and D. farinae belong to the same Dermatophagoides genus. A total of three cysteine peptidase active sites were found in the predicted amino acid sequence, including 127-138 (QGGCGSCWAFSG), 267-277 (NYHAVNIVGYG) and 284-303 (YWIVRNSWDTTWGDSGYGYF). Moreover, secondary structure analysis revealed that Der f 1 contained an α helix (33.96%), an extended strand (17.13%), a β turn (5.61%), and a random coil (43.30%). A simple three-dimensional model of this protein was constructed using a Swiss-model server. The cDNA coding for Der f 1 was cloned, sequenced and expressed successfully. Alignment and phylogenetic analysis suggests that D. pteronyssinus is evolutionarily more similar to E. maynei than to D. farinae.

  17. Construction of cDNA library and preliminary analysis of expressed sequence tags from tea plant [Camellia sinensis (L) O. Kuntze].

    Science.gov (United States)

    Phukon, Munmi; Namdev, Richa; Deka, Diganta; Modi, Mahendra K; Sen, Priyabrata

    2012-09-10

    Tea is the most popular non-alcoholic and healthy beverage across the world. The genetic organization and molecular biology of the tea plant are very poorly understood at present, and a better understanding is required for a quantum increase in productivity and efficient use of germplasm for either cultivation or breeding programs. Single-pass sequencing of randomly selected cDNA clones is the most widely accepted technique for gene identification and cloning. In the present study, a good quality cDNA library was constructed and preliminary analysis of ESTs was carried out. The titers of the unamplified and amplified libraries were 1.4 × 10(6) pfu/ml and 5.27 × 10(8) pfu/ml, respectively. A total of 210 cDNA clones from the constructed cDNA library were sequenced and analyzed. A total of 84 high quality Expressed Sequence Tags (ESTs) were generated, among which 71 ESTs had significant homology with sequences in the NCBI non-redundant protein database by BLAST X analysis. About 80% of ESTs had a poly(A) tail at the 3' end, indicating that the cDNAs were full length. The database-matched ESTs were classified into putative cellular roles, viz. energy-related category (corresponding to 20% of total BLAST X matched ESTs), transcription (14.2%), protein synthesis (14.2%), cell growth and division (8.6%), cell structure (5.7%), signal transduction (5.7%), transporters (2.9%), disease and defenses (2.9%), secondary metabolism (2.9%) and gene regulation (2.9%). This study provides an overview of the mRNA expression profile and first-hand information on gene sequences expressed in tender leaves and apical buds of the tea plant. Copyright © 2012 Elsevier B.V. All rights reserved.

  18. Cloning and sequencing of cDNA encoding human DNA topoisomerase II and localization of the gene to chromosome region 17q21-22

    International Nuclear Information System (INIS)

    Tsai-Pflugfelder, M.; Liu, L.F.; Liu, A.A.; Tewey, K.M.; Whang-Peng, J.; Knutsen, T.; Huebner, K.; Croce, C.M.; Wang, J.C.

    1988-01-01

    Two overlapping cDNA clones encoding human DNA topoisomerase II were identified by two independent methods. In one, a human cDNA library in phage λ was screened by hybridization with a mixed oligonucleotide probe encoding a stretch of seven amino acids found in yeast and Drosophila DNA topoisomerase II; in the other, a different human cDNA library in a λgt11 expression vector was screened for the expression of antigenic determinants that are recognized by rabbit antibodies specific to human DNA topoisomerase II. The entire coding sequences of the human DNA topoisomerase II gene were determined from these and several additional clones, identified through the use of the cloned human TOP2 gene sequences as probes. Hybridization between the cloned sequences and mRNA and genomic DNA indicates that the human enzyme is encoded by a single-copy gene. The location of the gene was mapped to chromosome 17q21-22 by in situ hybridization of a cloned fragment to metaphase chromosomes and by hybridization analysis with a panel of mouse-human hybrid cell lines, each retaining a subset of human chromosomes

  19. Deep sequencing reveals complex spurious transcription from transiently transfected plasmids

    Czech Academy of Sciences Publication Activity Database

    Nejepínská, Jana; Malík, Radek; Moravec, Martin

    2012-01-01

    Roč. 7, č. 8 (2012), e43283 E-ISSN 1932-6203 R&D Projects: GA ČR GA204/09/0085 Grant - others:EMBO(XE) 0001488 Institutional research plan: CEZ:AV0Z50520514 Institutional support: RVO:68378050 Keywords : transient plasmid transfection * deep sequencing Subject RIV: EB - Genetics ; Molecular Biology Impact factor: 3.730, year: 2012

  20. Frameshift mutations in infectious cDNA clones of Citrus tristeza virus: a strategy to minimize the toxicity of viral sequences to Escherichia coli

    International Nuclear Information System (INIS)

    Satyanarayana, Tatineni; Gowda, Siddarame; Ayllon, Maria A.; Dawson, William O.

    2003-01-01

    The advent of reverse genetics revolutionized the study of positive-stranded RNA viruses that were amenable for cloning as cDNAs into high-copy-number plasmids of Escherichia coli. However, some viruses are inherently refractory to cloning in high-copy-number plasmids due to toxicity of viral sequences to E. coli. We report a strategy that is a compromise between infectivity of the RNA transcripts and toxicity to E. coli effected by introducing frameshift mutations into 'slippery sequences' near the viral 'toxicity sequences' in the viral cDNA. Citrus tristeza virus (CTV) has cDNA sequences that are toxic to E. coli. The original full-length infectious cDNA of CTV and a derivative replicon, CTV-ΔCla, cloned into pUC119, resulted in unusually limited E. coli growth. However, upon sequencing of these cDNAs, an additional uridinylate (U) was found in a stretch of U's between nts 3726 and 3731 that resulted in a change to a reading frame with a stop codon at nt 3734. Yet, in vitro produced RNA transcripts from these clones infected protoplasts, and the resulting progeny virus was repaired. Correction of the frameshift mutation in the CTV cDNA constructs resulted in increased infectivity of in vitro produced RNA transcripts, but also caused a substantial increase of toxicity to E. coli, now requiring 3 days to develop visible colonies. Frameshift mutations created in sequences not suspected to facilitate reading frame shifting and silent mutations introduced into oligo(U) regions resulted in complete loss of infectivity, suggesting that the oligo(U) region facilitated the repair of the frameshift mutation. Additional frameshift mutations introduced into other oligo(U) regions also resulted in transcripts with reduced infectivity similarly to the original clones with the +1 insertion. However, only the frameshift mutations introduced into oligo(U) regions that were near and before the toxicity region improved growth and stability in E. coli. 
These data demonstrate that

  1. Construction of full-length cDNA library and development of EST-derived simple sequence repeat (EST-SSR) markers in Senecio scandens.

    Science.gov (United States)

    Qian, Gang; Ping, Junjiao; Lu, Jian; Zhang, Zhen; Wang, Lei; Xu, Delin

    2014-12-01

    Senecio scandens Buch.-Ham. ex D. Don (Compositae) is a crucial source of Chinese traditional medicine with antibacterial properties. We constructed a cDNA library and obtained expressed sequence tags (ESTs) to show the distribution of gene ontology annotations for mRNAs, using an individual plant with superior antibacterial characteristics. Analysis of comparative genomics indicates that the putative uncharacterized proteins (21.07%) might be derived from "molecular function unknown" clones or rare transcripts. Furthermore, the Compositae had high cross-species transferability of EST-derived simple sequence repeats (EST-SSR), based on valid amplifications of 206 primer pairs developed from the newly assembled expressed sequence tag sequences in Artemisia annua L. Among those EST-SSR markers, 52 primers showed polymorphic amplifications between individuals with contrasting diverse antibacterial traits. Our sequence data and molecular markers will be cost-effective tools for further studies such as genome annotation, molecular breeding, and novel transcript profiles within Compositae species.

  2. Isolation and sequence of a cDNA clone for human tyrosinase that maps at the mouse c-albino locus.

    Science.gov (United States)

    Kwon, B S; Haq, A K; Pomerantz, S H; Halaban, R

    1987-01-01

    Screening of a lambda gt11 human melanocyte cDNA library with antibodies against hamster tyrosinase (monophenol, L-dopa:oxygen oxidoreductase, EC 1.14.18.1) resulted in the isolation of 16 clones. The cDNA inserts from 13 of the 16 clones cross-hybridized with each other, indicating that they were from related mRNA species. One of the cDNA clones, Pmel34, detected one mRNA species with an approximate length of 2.4 kilobases that was expressed preferentially in normal and malignant melanocytes but not in other cell types. The amino acid sequence deduced from the nucleotide sequence showed that the putative human tyrosinase is composed of 548 amino acids with a molecular weight of 62,610. The deduced protein contains glycosylation sites and histidine-rich sites that could be used for copper binding. Southern blot analysis of DNA derived from newborn mice carrying lethal albino deletion mutations revealed that Pmel34 maps near or at the c-albino locus, the position of the structural gene for tyrosinase. PMID: 2823263

  3. Construction of a full-length cDNA library and preliminary analysis of expressed sequence tags from lymphocytes of half-pipe snowboarding athletes.

    Science.gov (United States)

    Zhao, Y H; Zhang, Z B; Zhao, C Q; Zhang, Y; Wang, Y F; Guan, W J; Zhu, Z Q

    2015-10-21

    The genes of top athletes are a valuable genetic resource for the human race, and could be exploited to identify novel genes related to sports ability, as well as other functions. We analyzed the expressed sequence tags from top half-pipe snowboarding athletes using the SMART complementary DNA (cDNA) library construction method to elucidate the characteristics of the athlete genome and the differential expression of the genes it contains. Overall, we established a full-length cDNA library from the lymphocytes of half-pipe snowboarding athletes and analyzed the inserted gene fragments. We also classified those genes according to molecular function, biological characteristics, cellular composition, protein types, and signal paths. A total of 201 functional genes were noted, which were distributed in 27 pathways. TXN, MDH1, ARL1, ARPC3, ACTG1, and other genes measured in sequence may be associated with physical ability. This suggests that the SMART cDNA library constructed from the genetic material from top athletes is an effective tool for preserving genetic sports resources and providing genetic markers of physical ability for athlete selection.

  4. Cloning and sequencing of the cDNA encoding a core protein of the paired helical filament of Alzheimer's disease: Identification as the microtubule-associated protein tau

    International Nuclear Information System (INIS)

    Goedert, M.; Wischik, C.M.; Crowther, R.A.; Walker, J.E.; Klug, A.

    1988-01-01

    Screening of cDNA libraries prepared from the frontal cortex of an Alzheimer's disease patient and from fetal human brain has led to isolation of the cDNA for a core protein of the paired helical filament of Alzheimer's disease. The partial amino acid sequence of this core protein was used to design synthetic oligonucleotide probes. The cDNA encodes a protein of 352 amino acids that contains a characteristic amino acid repeat in its carboxyl-terminal half. This protein is highly homologous to the sequence of the mouse microtubule-associated protein tau and thus constitutes the human equivalent of mouse tau. RNA blot analysis indicates the presence of two major transcripts, 6 and 2 kilobases long, with a wide distribution in normal human brain. Tau protein mRNAs were found in normal amounts in the frontal cortex from patients with Alzheimer's disease. The proof that at least part of tau protein forms a component of the paired helical filament core opens the way to understanding the mode of formation of paired helical filaments and thus, ultimately, the pathogenesis of Alzheimer's disease

  5. Deep Sequencing Analysis of the Ixodes ricinus Haemocytome.

    Directory of Open Access Journals (Sweden)

    Michalis Kotsyfakis

    2015-05-01

    Ixodes ricinus is the main tick vector of the microbes that cause Lyme disease and tick-borne encephalitis in Europe. Pathogens transmitted by ticks have to overcome innate immunity barriers present in tick tissues, including midgut, salivary glands epithelia and the hemocoel. Molecularly, invertebrate immunity is initiated when pathogen recognition molecules trigger serum or cellular signalling cascades leading to the production of antimicrobials, pathogen opsonization and phagocytosis. We presently aimed at identifying hemocyte transcripts from semi-engorged female I. ricinus ticks by mass sequencing a hemocyte cDNA library and annotating immune-related transcripts based on their hemocyte abundance as well as their ubiquitous distribution. De novo assembly of 926,596 pyrosequence reads plus 49,328,982 Illumina reads (148 nt length) from a hemocyte library, together with over 189 million Illumina reads from salivary gland and midgut libraries, generated 15,716 extracted coding sequences (CDS); these are displayed in an annotated hyperlinked spreadsheet format. Read mapping allowed the identification and annotation of tissue-enriched transcripts. A total of 327 transcripts were found significantly overexpressed in the hemocyte libraries, including those coding for scavenger receptors, antimicrobial peptides, pathogen recognition proteins, proteases and protease inhibitors. Vitellogenin and lipid metabolism transcription enrichment suggests fat body components. We additionally annotated ubiquitously distributed transcripts associated with immune function, including immune-associated signal transduction proteins and transcription factors, including the STAT transcription factor. This is the first systems biology approach to describe the genes expressed in the haemocytes of this neglected disease vector. A total of 2,860 coding sequences were deposited to GenBank, increasing to 27,547 the number so far deposited by our previous transcriptome studies.

  6. Nucleotide sequence of a cDNA coding for the barley seed protein CMa: an inhibitor of insect α-amylase

    DEFF Research Database (Denmark)

    Rasmussen, Søren Kjærsgård; Johansson, A.

    1992-01-01

    The primary structure of the insect alpha-amylase inhibitor CMa of barley seeds was deduced from a full-length cDNA clone pc43F6. Analysis of RNA from barley endosperm shows high levels 15 and 20 days after flowering. The cDNA predicts an amino acid sequence of 119 residues preceded by a signal...... peptide of 25 amino acids. Ala and Leu account for 55% of the signal peptide. CMa is 60-85% identical with alpha-amylase inhibitors of wheat, but shows less than 50% identity to trypsin inhibitors of barley and wheat. The 10 Cys residues are located in identical positions compared to the cereal inhibitor...

  7. Construction of a muscle cDNA library of Chinese shrimp Fenneropenaeus chinensis and sequence analysis of the troponin I gene

    Science.gov (United States)

    Li, Jitao; Chen, Ping; Li, Jian; Liu, Ping; He, Yuying; Wang, Qingyin

    2010-03-01

    A muscle cDNA library of Chinese shrimp (Fenneropenaeus chinensis) was constructed with the SMART™ cDNA Library Construction Kit. The titer of the optimal primary library was 7.7×10^5 pfu mL^-1 and that of the amplified library was 3.0×10^9 pfu mL^-1. The percentages of recombinant clones in the primary and amplified libraries were over 98%. The insert sizes were longer than 400 bp with an average of 1000 bp. A positive clone containing a 794 bp insert was sequenced and identified as encoding the fast skeletal troponin I gene. This library provided a useful resource for the functional genomic research of F. chinensis.

  8. deepTools2: a next generation web server for deep-sequencing data analysis.

    Science.gov (United States)

    Ramírez, Fidel; Ryan, Devon P; Grüning, Björn; Bhardwaj, Vivek; Kilpert, Fabian; Richter, Andreas S; Heyne, Steffen; Dündar, Friederike; Manke, Thomas

    2016-07-08

    We present an update to our Galaxy-based web server for processing and visualizing deeply sequenced data. Its core tool set, deepTools, allows users to perform complete bioinformatic workflows ranging from quality controls and normalizations of aligned reads to integrative analyses, including clustering and visualization approaches. Since we first described our deepTools Galaxy server in 2014, we have implemented new solutions for many requests from the community and our users. Here, we introduce significant enhancements and new tools to further improve data visualization and interpretation. deepTools continue to be open to all users and freely available as a web service at deeptools.ie-freiburg.mpg.de. The new deepTools2 suite can be easily deployed within any Galaxy framework via the toolshed repository, and we also provide source code for command line usage under Linux and Mac OS X. A public and documented API for access to deepTools functionality is also available. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.

  9. Acetylcholinesterase of the Sand Fly, Phlebotomus papatasi (Scopoli): cDNA Sequence, Baculovirus Expression, and Biochemical Properties

    Science.gov (United States)

    2013-01-01

    Background: Millions of people and domestic animals around the world are affected by leishmaniasis, a disease caused by various species of flagellated protozoans in the genus...nucleotides, [GenBank: DQ898276]) respectively, as well as to other arthropod AChEs. The P. papatasi AChE cDNA ORF encodes a 710-amino acid protein

  10. Identification and characterization of a novel legume-like lectin cDNA sequence from the red marine algae Gracilaria fisheri.

    Science.gov (United States)

    Suttisrisung, Sukanya; Senapin, Saengchan; Withyachumnarnkul, Boonsirm; Wongprasert, Kanokpan

    2011-12-01

    A legume-type lectin (L-lectin) gene of the red alga Gracilaria fisheri (GFL) was cloned by rapid amplification of cDNA ends (RACE). The full-length cDNA of GFL was 1714 bp and contained a 1542 bp open reading frame encoding 513 amino acids with a predicted molecular mass of 56.5 kDa. Analysis of the putative amino acid sequence with NCBI-BLAST revealed a high homology (30-68%) with legume-type lectins (L-lectins) from Griffithsia japonica, Clavispora lusitaniae, Acyrthosiphon pisum, Tetraodon nigroviridis and Xenopus tropicalis. Phylogenetic relationship analysis showed the highest sequence identity to a glycoprotein of the red alga Griffithsia japonica (68%) (GenBank number AAM93989). Conserved Domain Database analysis detected an N-terminal carbohydrate recognition domain (CRD), the characteristic of L-lectins, which contained two sugar binding sites and a metal binding site. The secondary structure prediction of GFL showed a beta-sheet structure, connected with turn and coil. The most abundant structural element of GFL was the random coil, while the alpha-helices were distributed at the N- and C-termini, and 21 beta-sheets were distributed in the CRD. Computer analysis of the three-dimensional structure showed that GFL shares the common features of L-lectins, including an overall globular shape composed of a beta-sandwich of two anti-parallel beta-sheets, with monosaccharide binding sites on the top of the structure in proximity to a metal binding site. Northern blot analysis using a DIG-labelled probe derived from a partial GFL sequence revealed a hybridization signal of approximately 1.7 kb, consistent with the length of the full-length GFL cDNA identified by RACE. No detectable band was observed from control total RNA extracted from filamentous green algae.

  11. Deep Sequencing in Microdissected Renal Tubules Identifies Nephron Segment-Specific Transcriptomes.

    Science.gov (United States)

    Lee, Jae Wook; Chou, Chung-Lin; Knepper, Mark A

    2015-11-01

    The function of each renal tubule segment depends on the genes expressed therein. High-throughput methods used for global profiling of gene expression in unique cell types have shown low sensitivity and high false positivity, thereby limiting the usefulness of these methods in transcriptomic research. However, deep sequencing of RNA species (RNA-seq) achieves highly sensitive and quantitative transcriptomic profiling by sequencing RNAs in a massive, parallel manner. Here, we used RNA-seq coupled with classic renal tubule microdissection to comprehensively profile gene expression in each of 14 renal tubule segments from the proximal tubule through the inner medullary collecting duct of rat kidneys. Polyadenylated mRNAs were captured by oligo-dT primers and processed into adapter-ligated cDNA libraries that were sequenced using an Illumina platform. Transcriptomes were identified to a median depth of 8261 genes in microdissected renal tubule samples (105 replicates in total) and glomeruli (5 replicates). Manual microdissection allowed a high degree of sample purity, which was evidenced by the observed distributions of well established cell-specific markers. The main product of this work is an extensive database of gene expression along the nephron provided as a publicly accessible webpage (https://helixweb.nih.gov/ESBL/Database/NephronRNAseq/index.html). The data also provide genome-wide maps of alternative exon usage and polyadenylation sites in the kidney. We illustrate the use of the data by profiling transcription factor expression along the renal tubule and mapping metabolic pathways. Copyright © 2015 by the American Society of Nephrology.

  12. Construction of a Full-Length Enriched cDNA Library and Preliminary Analysis of Expressed Sequence Tags from Bengal Tiger Panthera tigris tigris

    Directory of Open Access Journals (Sweden)

    Changqing Liu

    2013-05-01

    In this study, a full-length enriched cDNA library was successfully constructed from Bengal tiger, Panthera tigris tigris, the most well-known wild animal. Total RNA was extracted from cultured Bengal tiger fibroblasts in vitro. The titers of primary and amplified libraries were 1.28 × 10^6 pfu/mL and 1.56 × 10^9 pfu/mL respectively. The percentage of recombinants from unamplified library was 90.2% and average length of exogenous inserts was 0.98 kb. A total of 212 individual ESTs with sizes ranging from 356 to 1108 bp were then analyzed. The BLASTX score revealed that 48.1% of the sequences were classified as a strong match, 45.3% as nominal and 6.6% as a weak match. Among the ESTs with known putative function, 26.4% ESTs were found to be related to all kinds of metabolisms, 19.3% ESTs to information storage and processing, 11.3% ESTs to posttranslational modification, protein turnover, chaperones, 11.3% ESTs to transport, 9.9% ESTs to signal transducer/cell communication, 9.0% ESTs to structure protein, 3.8% ESTs to cell cycle, and only 6.6% ESTs classified as novel genes. By EST sequencing, a full-length gene coding ferritin was identified and characterized. The recombinant plasmid pET32a-TAT-Ferritin was constructed, coding for the TAT-Ferritin fusion protein with two 6× His-tags at the N- and C-termini. After BCA assay, the concentration of soluble Trx-TAT-Ferritin recombinant protein was 2.32 ± 0.12 mg/mL. These results demonstrated that the reliability and representativeness of the cDNA library met the requirements of a standard cDNA library. This library provided a useful platform for the functional genome and transcriptome research of Bengal tigers.

  13. Construction of a full-length enriched cDNA library and preliminary analysis of expressed sequence tags from Bengal Tiger Panthera tigris tigris.

    Science.gov (United States)

    Liu, Changqing; Liu, Dan; Guo, Yu; Lu, Taofeng; Li, Xiangchen; Zhang, Minghai; Ma, Jianzhang; Ma, Yuehui; Guan, Weijun

    2013-05-24

    In this study, a full-length enriched cDNA library was successfully constructed from Bengal tiger, Panthera tigris tigris, the most well-known wild animal. Total RNA was extracted from cultured Bengal tiger fibroblasts in vitro. The titers of primary and amplified libraries were 1.28 × 10^6 pfu/mL and 1.56 × 10^9 pfu/mL respectively. The percentage of recombinants from unamplified library was 90.2% and average length of exogenous inserts was 0.98 kb. A total of 212 individual ESTs with sizes ranging from 356 to 1108 bp were then analyzed. The BLASTX score revealed that 48.1% of the sequences were classified as a strong match, 45.3% as nominal and 6.6% as a weak match. Among the ESTs with known putative function, 26.4% ESTs were found to be related to all kinds of metabolisms, 19.3% ESTs to information storage and processing, 11.3% ESTs to posttranslational modification, protein turnover, chaperones, 11.3% ESTs to transport, 9.9% ESTs to signal transducer/cell communication, 9.0% ESTs to structure protein, 3.8% ESTs to cell cycle, and only 6.6% ESTs classified as novel genes. By EST sequencing, a full-length gene coding ferritin was identified and characterized. The recombinant plasmid pET32a-TAT-Ferritin was constructed, coding for the TAT-Ferritin fusion protein with two 6× His-tags at the N- and C-termini. After BCA assay, the concentration of soluble Trx-TAT-Ferritin recombinant protein was 2.32 ± 0.12 mg/mL. These results demonstrated that the reliability and representativeness of the cDNA library met the requirements of a standard cDNA library. This library provided a useful platform for the functional genome and transcriptome research of Bengal tigers.

  14. Construction of a Full-Length Enriched cDNA Library and Preliminary Analysis of Expressed Sequence Tags from Bengal Tiger Panthera tigris tigris

    Science.gov (United States)

    Liu, Changqing; Liu, Dan; Guo, Yu; Lu, Taofeng; Li, Xiangchen; Zhang, Minghai; Ma, Jianzhang; Ma, Yuehui; Guan, Weijun

    2013-01-01

    In this study, a full-length enriched cDNA library was successfully constructed from Bengal tiger, Panthera tigris tigris, the most well-known wild animal. Total RNA was extracted from cultured Bengal tiger fibroblasts in vitro. The titers of primary and amplified libraries were 1.28 × 10^6 pfu/mL and 1.56 × 10^9 pfu/mL respectively. The percentage of recombinants from unamplified library was 90.2% and average length of exogenous inserts was 0.98 kb. A total of 212 individual ESTs with sizes ranging from 356 to 1108 bp were then analyzed. The BLASTX score revealed that 48.1% of the sequences were classified as a strong match, 45.3% as nominal and 6.6% as a weak match. Among the ESTs with known putative function, 26.4% ESTs were found to be related to all kinds of metabolisms, 19.3% ESTs to information storage and processing, 11.3% ESTs to posttranslational modification, protein turnover, chaperones, 11.3% ESTs to transport, 9.9% ESTs to signal transducer/cell communication, 9.0% ESTs to structure protein, 3.8% ESTs to cell cycle, and only 6.6% ESTs classified as novel genes. By EST sequencing, a full-length gene coding ferritin was identified and characterized. The recombinant plasmid pET32a-TAT-Ferritin was constructed, coding for the TAT-Ferritin fusion protein with two 6× His-tags at the N- and C-termini. After BCA assay, the concentration of soluble Trx-TAT-Ferritin recombinant protein was 2.32 ± 0.12 mg/mL. These results demonstrated that the reliability and representativeness of the cDNA library met the requirements of a standard cDNA library. This library provided a useful platform for the functional genome and transcriptome research of Bengal tigers. PMID:23708105

  15. Nasopharyngeal metagenomic deep sequencing data, Lancaster, UK, 2014-2015.

    Science.gov (United States)

    Atkinson, Kate V; Bishop, Lisa A; Rhodes, Glenn; Salez, Nicolas; McEwan, Neil R; Hegarty, Matthew J; Robey, Julie; Harding, Nicola; Wetherell, Simon; Lauder, Robert M; Pickup, Roger W; Wilkinson, Mark; Gatherer, Derek

    2017-10-24

    Nasopharyngeal swabs were taken from volunteers attending a general medical practice and a general hospital in Lancaster, UK, and at Lancaster University, in the winter of 2014-2015. 51 swabs were selected based on high RNA yield and allocated to deep sequencing pools as follows: patients with chronic obstructive pulmonary disease; asthmatics; adults with no respiratory symptoms; adults with feverish respiratory symptoms; adults with respiratory symptoms and presence of antibodies against influenza C; paediatric patients with respiratory symptoms (2 pools); adults with influenza C infection (2 pools), giving a total of 9 pools. Illumina sequencing was performed, with data yields per pool in the range of 345.6 megabases to 14 gigabases after removal of reads aligning to the human genome. The data were deposited in the Sequence Read Archive at NCBI, and constitute a resource for study of the viral, bacterial and fungal metagenome of the human nasopharynx in healthy and diseased states and comparison with other metagenomic studies on the human respiratory tract.

  16. Error Analysis of Deep Sequencing of Phage Libraries: Peptides Censored in Sequencing

    Directory of Open Access Journals (Sweden)

    Wadim L. Matochko

    2013-01-01

    Next-generation sequencing techniques empower selection of ligands from phage-display libraries because they can detect low abundant clones and quantify changes in the copy numbers of clones without excessive selection rounds. Identification of errors in deep sequencing data is the most critical step in this process because these techniques have error rates >1%. Mechanisms that yield errors in Illumina and other techniques have been proposed, but no reports to date describe error analysis in phage libraries. Our paper focuses on error analysis of 7-mer peptide libraries sequenced by the Illumina method. The low theoretical complexity of this phage library, as compared to the complexity of long genetic reads and genomes, allowed us to describe this library using a convenient linear vector and operator framework. We describe a phage library as an N×1 frequency vector n = (n_i), where n_i is the copy number of the i-th sequence and N is the theoretical diversity, that is, the total number of all possible sequences. Any manipulation to the library is an operator acting on n. Selection, amplification, or sequencing could be described as a product of an N×N matrix and a stochastic sampling operator (Sa). The latter is a random diagonal matrix that describes sampling of a library. In this paper, we focus on the properties of Sa and use them to define the sequencing operator (Seq). Sequencing without any bias and errors is Seq = Sa I_N, where I_N is an N×N unity matrix. Any bias in sequencing changes I_N to a nonunity matrix. We identified a diagonal censorship matrix (CEN), which describes elimination, or statistically significant downsampling, of specific reads during the sequencing process.
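
    The operator picture in this abstract can be made concrete with a small sketch. The Python below is illustrative only and is not the authors' implementation: it assumes that Sa is a diagonal matrix whose i-th entry is the fraction of clone i's copies that happen to be drawn, and that censorship is a second diagonal matrix with entries between 0 and 1. The function names, the toy four-sequence library, and the sampling-without-replacement scheme are all assumptions made for illustration.

```python
import random


def sampling_diagonal(n, depth, rng):
    """Diagonal of a hypothetical sampling operator Sa (assumed form).

    n     : copy numbers, one entry per possible sequence (length N)
    depth : number of reads drawn without replacement
    rng   : random.Random instance, so the draw is reproducible
    """
    # One pool entry per physical copy of each clone.
    pool = [i for i, copies in enumerate(n) for _ in range(copies)]
    counts = [0] * len(n)
    for i in rng.sample(pool, depth):
        counts[i] += 1
    # Entry i is the sampled fraction of clone i; absent clones get 0.
    return [c / copies if copies else 0.0 for c, copies in zip(counts, n)]


def apply_diagonal(diag, vec):
    """Multiply a diagonal matrix (stored as its diagonal) by a vector."""
    return [d * v for d, v in zip(diag, vec)]


n = [5, 0, 3, 2]                      # toy library with N = 4 possible sequences
rng = random.Random(0)
sa = sampling_diagonal(n, 4, rng)     # one random realisation of Sa
unbiased = apply_diagonal(sa, n)      # Seq = Sa I_N acting on n
cen = [1.0, 1.0, 0.0, 1.0]            # assumed CEN: sequence 2 is fully censored
censored = apply_diagonal(cen, unbiased)
```

    Storing each operator as its diagonal keeps the sketch short. A biased sequencing step would replace the implicit identity with a non-identity matrix, which this toy version does not model.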

  17. Assessment of adaptive evolution between wheat and rice as deduced from full-length common wheat cDNA sequence data and expression patterns

    Directory of Open Access Journals (Sweden)

    Hayashizaki Yoshihide

    2009-06-01

    Background: Wheat is an allopolyploid plant that harbors a huge, complex genome. Therefore, accumulation of expressed sequence tags (ESTs) for wheat is becoming particularly important for functional genomics and molecular breeding. We prepared a comprehensive collection of ESTs from the various tissues that develop during the wheat life cycle and from tissues subjected to stress. We also examined their expression profiles in silico. As full-length cDNAs are indispensable to certify the collected ESTs and annotate the genes in the wheat genome, we performed a systematic survey and sequencing of the full-length cDNA clones. This sequence information is a valuable genetic resource for functional genomics and will enable carrying out comparative genomics in cereals. Results: As part of the functional genomics and development of genomic wheat resources, we have generated a collection of full-length cDNAs from common wheat. By grouping the ESTs of recombinant clones randomly selected from the full-length cDNA library, we were able to sequence 6,162 independent clones with high accuracy. About 10% of the clones were wheat-unique genes, without any counterparts within the DNA database. Wheat clones that showed high homology to those of rice were selected in order to investigate their expression patterns in various tissues throughout the wheat life cycle and in response to abiotic-stress treatments. To assess the variability of genes that have evolved differently in wheat and rice, we calculated the substitution rate (Ka/Ks) of the counterparts in wheat and rice. Genes that were preferentially expressed in certain tissues or treatments had higher Ka/Ks values than those in other tissues and treatments, which suggests that the genes with the higher variability expressed in these tissues are under adaptive selection. Conclusion: We have generated a high-quality full-length cDNA resource for common wheat, which is essential for continuation of the

  18. Sequencing Infrastructure Investments under Deep Uncertainty Using Real Options Analysis

    Directory of Open Access Journals (Sweden)

    Nishtha Manocha

    2018-02-01

    The adaptation tipping point and adaptation pathway approaches developed to make decisions under deep uncertainty do not shed light on which of the multiple available pathways should be chosen as the preferred pathway. This creates the need to extend these approaches by means of suitable tools that can help sequence actions and subsequently enable the outlining of relevant policies. This paper presents two sequencing approaches, namely the “Build to Target” and “Build Up” approaches, to aid in sub-selecting a set of preferred pathways. The two approaches differ in the levels of flexibility they offer. They are exemplified by means of two case studies wherein Net Present Valuation and Real Options Analysis are employed as selection criteria. The results demonstrate the benefit of these two approaches when used in conjunction with the adaptation pathways and show how the pathways selected by means of a Build to Target approach generally have a value greater than, or at least the same as, the pathways selected by the Build Up approach. Further, this paper also demonstrates the capacity of Real Options to quantify and capture the economic value of flexibility, which cannot be done by traditional valuation approaches such as Net Present Valuation.
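
    The contrast between Net Present Valuation and Real Options Analysis can be illustrated with a deliberately simplified sketch. The Python below is not the paper's model: it assumes two equally likely demand scenarios, made-up cash flows, a 5% discount rate, and a single option to skip the investment once the scenario is known. All names and numbers are hypothetical.

```python
def npv(cash_flows, rate):
    """Net present value; cash_flows[t] is the cash flow at the end of year t."""
    return sum(cf / (1.0 + rate) ** t for t, cf in enumerate(cash_flows))


def expected_npv(scenario_npvs, probabilities):
    """Commit up front: the pathway is built whichever scenario occurs."""
    return sum(p * v for v, p in zip(scenario_npvs, probabilities))


def expected_npv_with_option(scenario_npvs, probabilities):
    """Keep the option: build only in scenarios where the NPV is positive."""
    return sum(p * max(v, 0.0) for v, p in zip(scenario_npvs, probabilities))


rate = 0.05                                  # hypothetical discount rate
high = npv([-100.0, 80.0, 80.0], rate)       # hypothetical high-demand scenario
low = npv([-100.0, 30.0, 30.0], rate)        # hypothetical low-demand scenario
probs = [0.5, 0.5]

committed = expected_npv([high, low], probs)
flexible = expected_npv_with_option([high, low], probs)
flexibility_value = flexible - committed     # what plain NPV does not capture
```

    In this toy setting the value of flexibility is the expected loss avoided in the low-demand scenario, which is why a plain expected NPV understates the worth of a pathway that can be adjusted later.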

  19. A protein deep sequencing evaluation of metastatic melanoma tissues.

    Directory of Open Access Journals (Sweden)

    Charlotte Welinder

    Malignant melanoma has the highest increase of incidence of malignancies in the western world. In early stages, front line therapy is surgical excision of the primary tumor. Metastatic disease has very limited possibilities for cure. Recently, several protein kinase inhibitors and immune modifiers have shown promising clinical results but drug resistance in metastasized melanoma remains a major problem. The need for routine clinical biomarkers to follow disease progression and treatment efficacy is high. The aim of the present study was to build a protein sequence database in metastatic melanoma, searching for novel, relevant biomarkers. Ten lymph node metastases (South-Swedish Malignant Melanoma Biobank) were subjected to global protein expression analysis using two proteomics approaches (with/without orthogonal fractionation). Fractionation produced higher numbers of protein identifications (4284). Combining both methods, 5326 unique proteins were identified (2641 proteins overlapping). Deep mining proteomics may contribute to the discovery of novel biomarkers for metastatic melanoma; for example, dividing the samples into two metastatic melanoma "genomic subtypes" ("pigmentation" and "high immune") revealed several proteins showing differential levels of expression. In conclusion, the present study provides an initial version of a metastatic melanoma protein sequence database producing a total of more than 5000 unique protein identifications. The raw data have been deposited to the ProteomeXchange with identifiers PXD001724 and PXD001725.
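
    The three counts quoted in this abstract are tied together by inclusion-exclusion, which the short Python check below works through. It assumes that 4284 is the total for the fractionated approach and that 2641 is the number of proteins found by both approaches; the count for the unfractionated approach is not stated in the abstract and is only implied by that reading. Variable names are illustrative.

```python
# Figures quoted in the abstract.
with_fractionation = 4284   # identifications with orthogonal fractionation
combined = 5326             # unique proteins across both approaches
overlap = 2641              # proteins identified by both approaches

# |A or B| = |A| + |B| - |A and B|, solved for the unfractionated set B.
without_fractionation = combined - with_fractionation + overlap

only_fractionated = with_fractionation - overlap
only_unfractionated = without_fractionation - overlap
```

    Under these assumptions the unfractionated approach would account for 3683 identifications, of which 1042 were not seen with fractionation.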

  20. Protein model discrimination using mutational sensitivity derived from deep sequencing.

    Science.gov (United States)

    Adkar, Bharat V; Tripathi, Arti; Sahoo, Anusmita; Bajaj, Kanika; Goswami, Devrishi; Chakrabarti, Purbani; Swarnkar, Mohit K; Gokhale, Rajesh S; Varadarajan, Raghavan

    2012-02-08

    A major bottleneck in protein structure prediction is the selection of correct models from a pool of decoys. Relative activities of ∼1,200 individual single-site mutants in a saturation library of the bacterial toxin CcdB were estimated by determining their relative populations using deep sequencing. This phenotypic information was used to define an empirical score for each residue (RankScore), which correlated with the residue depth, and identify active-site residues. Using these correlations, ∼98% of correct models of CcdB (RMSD ≤ 4Å) were identified from a large set of decoys. The model-discrimination methodology was further validated on eleven different monomeric proteins using simulated RankScore values. The methodology is also a rapid, accurate way to obtain relative activities of each mutant in a large pool and derive sequence-structure-function relationships without protein isolation or characterization. It can be applied to any system in which mutational effects can be monitored by a phenotypic readout. Copyright © 2012 Elsevier Ltd. All rights reserved.

  1. Characterization of the venom from the Australian scorpion Urodacus yaschenkoi: Molecular mass analysis of components, cDNA sequences and peptides with antimicrobial activity.

    Science.gov (United States)

    Luna-Ramírez, Karen; Quintero-Hernández, Veronica; Vargas-Jaimes, Leonel; Batista, Cesar V F; Winkel, Kenneth D; Possani, Lourival D

    2013-03-01

    The Urodacidae scorpions are the most widely distributed of the four families in Australia and represent half of the species in the continent, yet their venoms remain largely unstudied. This communication reports the first results of a proteome analysis of the venom of the scorpion Urodacus yaschenkoi performed by mass fingerprinting, after high performance liquid chromatography (HPLC) separation. A total of 74 fractions were obtained by HPLC separation, allowing the identification of approximately 274 different components with molecular masses varying from 287 to 43,437 Da. The most abundant peptides were those of 1 kDa and 4-5 kDa, representing antimicrobial peptides and putative potassium channel toxins, respectively. Three such peptides were chemically synthesized and tested against Gram-positive and Gram-negative bacteria, showing minimum inhibitory concentrations in the low micromolar range, but with moderate hemolytic activity. It also reports a transcriptome analysis of the venom glands of the same scorpion species, undertaken by constructing a cDNA library and conducting random sequencing screening of the transcripts. From the resultant cDNA library 172 expressed sequence tags (ESTs) were analyzed. These transcripts were further clustered into 120 unique sequences (23 contigs and 97 singlets). The identified putative proteins can be sorted into several groups, such as those implicated in common cellular processes, putative neurotoxins and antimicrobial peptides. The scorpion U. yaschenkoi is not known to be dangerous to humans and its venom contains peptides similar to those of Opisthacanthus cayaporum (antibacterial), Scorpio maurus palmatus (maurocalcin), Opistophthalmus carinatus (opistoporines) and Hadrurus gerstchi (scorpine-like molecules), amongst others. Copyright © 2012 Elsevier Ltd. All rights reserved.

  2. Human glutamate pyruvate transaminase (GPT): Localization to 8q24.3, cDNA and genomic sequences, and polymorphic sites

    Energy Technology Data Exchange (ETDEWEB)

    Sohocki, M.M.; Sullivan, L.S.; Daiger, S.P. [Univ. of Texas Health Science Center, Houston, TX (United States)] [and others]

    1997-03-01

    Two frequent protein variants of glutamate pyruvate transaminase (GPT) (E.C.2.6.1.2) have been used as genetic markers in humans for more than two decades, although chromosomal mapping of the GPT locus in the 1980s produced conflicting results. To resolve this conflict and develop useful DNA markers for this gene, we isolated and characterized cDNA and genomic clones of GPT. We have definitively mapped human GPT to the terminus of 8q using several methods. First, two cosmids shown to contain the GPT sequence were derived from a chromosome 8-specific library. Second, by fluorescence in situ hybridization, we mapped the cosmid containing the human GPT gene to chromosome band 8q24.3. Third, we mapped the rat gpt cDNA to the syntenic region of rat chromosome 7. Finally, PCR primers specific to human GPT amplify sequences contained within a "half-YAC" from the long arm of chromosome 8, that is, a YAC containing the 8q telomere. The human GPT genomic sequence spans 2.7 kb and consists of 11 exons, ranging in size from 79 to 243 bp. The exonic sequence encodes a protein of 495 amino acids that is nearly identical to the previously reported protein sequence of human GPT-1. The two polymorphic GPT isozymes are the result of a nucleotide substitution in codon 14. In addition, a cosmid containing the GPT sequence also contains a previously unmapped, polymorphic microsatellite sequence, D8S421. The cloned GPT gene and associated polymorphisms will be useful for linkage and physical mapping of disease loci that map to the terminus of 8q, including atypical vitelliform macular dystrophy (VMD1) and epidermolysis bullosa simplex, type Ogna (EBS1). In addition, this will be a useful system for characterizing the telomeric region of 8q. Finally, determination of the molecular basis of the GPT isozyme variants will permit PCR-based detection of this world-wide polymorphism. 22 refs., 3 figs.

  3. Gene discovery from Jatropha curcas by sequencing of ESTs from normalized and full-length enriched cDNA library from developing seeds

    Directory of Open Access Journals (Sweden)

    Sugantham Priyanka Annabel

    2010-10-01

    Full Text Available Abstract Background Jatropha curcas L. is promoted as an important non-edible biodiesel crop worldwide. Jatropha oil, which is a triacylglycerol, can be directly blended with petro-diesel or transesterified with methanol and used as biodiesel. Genetic improvement in jatropha is needed to increase the seed yield, oil content, drought and pest resistance, and to modify oil composition so that it becomes a technically and economically preferred source for biodiesel production. However, genetic improvement efforts in jatropha could not take advantage of genetic engineering methods due to lack of cloned genes from this species. To overcome this hurdle, the current gene discovery project was initiated with an objective of isolating as many functional genes as possible from J. curcas by large scale sequencing of expressed sequence tags (ESTs). Results A normalized and full-length enriched cDNA library was constructed from developing seeds of J. curcas. The cDNA library contained about 1 × 10^6 clones and the average insert size of the clones was 2.1 kb. In total, 12,084 ESTs were sequenced to an average high-quality read length of 576 bp. Contig analysis revealed 2258 contigs and 4751 singletons. Contig size ranged from 2 to 23 and there were 7333 ESTs in the contigs. This resulted in 7009 unigenes, which were annotated by BLASTX. Annotation showed 3982 unigenes with significant similarity to known genes and 2836 unigenes with significant similarity to genes of unknown, hypothetical and putative proteins. The remaining 191 unigenes, which did not show similarity with any genes in the public database, may encode unique genes. Functional classification revealed unigenes related to a broad range of cellular, molecular and biological functions. Among the 7009 unigenes, 6233 unigenes were identified to be potential full-length genes. Conclusions The high quality normalized cDNA library was constructed from developing seeds of J. curcas for the first time and 7009 unigenes coding
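
    The EST counts quoted in this record are internally consistent, which a short tally can confirm. The following is a minimal sketch in Python that uses only the figures given in the abstract; the variable names are illustrative and do not come from the source.

```python
# Figures quoted in the abstract; variable names are illustrative only.
total_ests = 12_084
contigs = 2_258
singletons = 4_751
ests_in_contigs = 7_333

# Unigenes are the contigs plus the singletons.
unigenes = contigs + singletons

# Every sequenced EST is either assembled into a contig or left as a singleton.
ests_accounted_for = ests_in_contigs + singletons

# Annotation classes reported for the unigenes.
known = 3_982
unknown_hypothetical_putative = 2_836
no_database_hit = 191
annotated_total = known + unknown_hypothetical_putative + no_database_hit

print(unigenes, ests_accounted_for, annotated_total)
```

    All three totals agree with the reported 7009 unigenes and 12,084 ESTs.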

  4. Probe-Directed Degradation (PDD) for Flexible Removal of Unwanted cDNA Sequences from RNA-Seq Libraries.

    Science.gov (United States)

    Archer, Stuart K; Shirokikh, Nikolay E; Preiss, Thomas

    2015-04-01

    Most applications for RNA-seq require the depletion of abundant transcripts to gain greater coverage of the underlying transcriptome. The sequences to be targeted for depletion depend on application and species and in many cases may not be supported by commercial depletion kits. This unit describes a method for generating RNA-seq libraries that incorporates probe-directed degradation (PDD), which can deplete any unwanted sequence set, with the low-bias split-adapter method of library generation (although many other library generation methods are in principle compatible). The overall strategy is suitable for applications requiring customized sequence depletion or where faithful representation of fragment ends and lack of sequence bias is paramount. We provide guidelines to rapidly design specific probes against the target sequence, and a detailed protocol for library generation using the split-adapter method including several strategies for streamlining the technique and reducing adapter dimer content. Copyright © 2015 John Wiley & Sons, Inc.

  5. Localization of the human fibromodulin gene (FMOD) to chromosome 1q32 and completion of the cDNA sequence

    Energy Technology Data Exchange (ETDEWEB)

    Sztrolovics, R.; Grover, J.; Roughley, P.J. [McGill Univ., Montreal (Canada)] [and others]

    1994-10-01

    This report describes the cloning of the 3′-untranslated region of the human fibromodulin cDNA and its use to map the gene. For somatic cell hybrids, the generation of the PCR product was concordant with the presence of chromosome 1 and discordant with the presence of all other chromosomes, confirming that the fibromodulin gene is located within region q32 of chromosome 1. The physical mapping of genes is a critical step in the process of identifying which genes may be responsible for various inherited disorders. Specifically, the mapping of the fibromodulin gene now provides the information necessary to evaluate its potential role in genetic disorders of connective tissues. The analysis of previously reported diseases mapped to chromosome 1 reveals two genes located in the proximity of the fibromodulin locus. These are Usher syndrome type II, a recessive disorder characterized by hearing loss and retinitis pigmentosa, and Van der Woude syndrome, a dominant condition associated with abnormalities such as cleft lip and palate and hyperdontia. The genes for both of these disorders have been projected to be localized to 1q32 of a physical map that integrates available genetic linkage and physical data. However, it seems improbable that either of these disorders, exhibiting restricted tissue involvement, could be linked to the fibromodulin gene, given the wide tissue distribution of the encoded proteoglycan, although it remains possible that the relative importance of the quantity and function of the proteoglycan may vary between tissues. 11 refs., 1 fig.

  6. Normalized cDNA libraries

    Science.gov (United States)

    Soares, Marcelo B.; Efstratiadis, Argiris

    1997-01-01

    This invention provides a method to normalize a directional cDNA library constructed in a vector that allows propagation in single-stranded circle form comprising: (a) propagating the directional cDNA library in single-stranded circles; (b) generating fragments complementary to the 3' noncoding sequence of the single-stranded circles in the library to produce partial duplexes; (c) purifying the partial duplexes; (d) melting and reassociating the purified partial duplexes to moderate Cot; and (e) purifying the unassociated single-stranded circles, thereby generating a normalized cDNA library.

  7. Human nuclear respiratory factor 2 alpha subunit cDNA: isolation, subcloning, sequencing, and in situ hybridization of transcripts in normal and monocularly deprived macaque visual system.

    Science.gov (United States)

    Guo, A; Nie, F; Wong-Riley, M

    2000-02-07

    Nuclear respiratory factor 2 (NRF-2) has been shown to contribute to the transcriptional regulation of a number of subunits of respiratory chain enzymes, including cytochrome c oxidase (CO). Our recent study demonstrated a parallel distribution of the alpha subunit proteins of NRF-2 (NRF-2 alpha) with CO in the monkey striate cortex, and that it can be regulated by neuronal activity. To determine whether this regulation is at the transcriptional level, the present study examined the expression of NRF-2 alpha mRNA in normal and monocularly deprived adult monkeys. A partial NRF-2 alpha cDNA was isolated from a human brain cDNA library. Sequence analysis revealed that it shared 99% identity with the published sequence from human HeLa cells. Riboprobes of NRF-2 alpha were generated and labeled with digoxigenin-11-UTP for in situ hybridization. The expression pattern of NRF-2 alpha mRNA in the normal striate cortex paralleled that of CO activity. It was highly expressed in layers IVC and VI, which contained high levels of CO, and more densely expressed in puffs of layers II and III than in interpuffs. In monkeys monocularly treated with tetrodotoxin for 1 day to 2 weeks, both NRF-2 alpha expression and CO activity were reduced in deprived ocular dominance columns of the visual cortex and in deprived layers of the lateral geniculate nucleus. These data indicate that, in the normal and visually deprived adult monkeys, NRF-2 alpha is regulated by neuronal activity at the transcriptional level.

  8. Thermostable group II intron reverse transcriptase fusion proteins and their use in cDNA synthesis and next-generation RNA sequencing

    Science.gov (United States)

    Mohr, Sabine; Ghanem, Eman; Smith, Whitney; Sheeter, Dennis; Qin, Yidan; King, Olga; Polioudakis, Damon; Iyer, Vishwanath R.; Hunicke-Smith, Scott; Swamy, Sajani; Kuersten, Scott; Lambowitz, Alan M.

    2013-01-01

    Mobile group II introns encode reverse transcriptases (RTs) that function in intron mobility (“retrohoming”) by a process that requires reverse transcription of a highly structured, 2–2.5-kb intron RNA with high processivity and fidelity. Although the latter properties are potentially useful for applications in cDNA synthesis and next-generation RNA sequencing (RNA-seq), group II intron RTs have been difficult to purify free of the intron RNA, and their utility as research tools has not been investigated systematically. Here, we developed general methods for the high-level expression and purification of group II intron-encoded RTs as fusion proteins with a rigidly linked, noncleavable solubility tag, and we applied them to group II intron RTs from bacterial thermophiles. We thus obtained thermostable group II intron RT fusion proteins that have higher processivity, fidelity, and thermostability than retroviral RTs, synthesize cDNAs at temperatures up to 81°C, and have significant advantages for qRT-PCR, capillary electrophoresis for RNA-structure mapping, and next-generation RNA sequencing. Further, we find that group II intron RTs differ from the retroviral enzymes in template switching with minimal base-pairing to the 3′ ends of new RNA templates, making it possible to efficiently and seamlessly link adaptors containing PCR-primer binding sites to cDNA ends without an RNA ligase step. This novel template-switching activity enables facile and less biased cloning of nonpolyadenylated RNAs, such as miRNAs or protein-bound RNA fragments. Our findings demonstrate novel biochemical activities and inherent advantages of group II intron RTs for research, biotechnological, and diagnostic methods, with potentially wide applications. PMID:23697550

  9. cDNA sequences and organization of IgM heavy chain genes in two holostean fish.

    Science.gov (United States)

    Wilson, M R; van Ravenstein, E; Miller, N W; Clem, L W; Middleton, D L; Warr, G W

    1995-01-01

    Immunoglobulin M heavy chain (mu) sequences of two holostean fish, the bowfin, Amia calva, and the longnose gar, Lepisosteus osseus, were amplified from spleen mRNA by RACE-PCR, cloned, and sequenced. Each mu chain showed the conserved four constant domain structure typical of a secreted mu chain. Southern blot analyses with specific heavy chain variable (VH) and constant (CH) region probes suggest that both fish possess an IgH locus that resembles that of the teleosts, amphibians, and mammals in its organization. The overall sequence similarity of gar and bowfin mu chains was 60% and 48% at the nucleotide and amino acid levels, respectively, while similarity to the mu chains of teleosts and elasmobranchs was lower. The bowfin mu chain possesses a distinctive proline-rich sequence at the C mu 1/C mu 2 boundary; a shorter proline-rich sequence is present at this position in the gar mu chain. Both gar and bowfin show, in their C mu 4 sequences, motifs that could serve as cryptic splice donor sites for the production of mRNA encoding the membrane-bound form of the mu chains, and the bowfin also shows a potential cryptic splice donor site in the C mu 3 exon.

  10. Model for a transcript map of human chromosome 21: isolation of new coding sequences from exon and enriched cDNA libraries.

    Science.gov (United States)

    Yaspo, M L; Gellen, L; Mott, R; Korn, B; Nizetic, D; Poustka, A M; Lehrach, H

    1995-08-01

    The construction of a transcriptional map for human chromosome 21 requires the generation of a specific catalogue of genes, together with corresponding mapping information. Towards this goal, we conducted a pilot study on a pool of random chromosome 21 cosmids representing 2 Mb of non-contiguous DNA. Exon-amplification and cDNA selection methods were used in combination to extract the coding content from these cosmids, and to derive expressed sequences libraries. These libraries and the source cosmid library were arrayed at high density for hybridisation screening. A strategy was used which related data obtained by multiple hybridisations of clones originating from one library, screened against the other libraries. In this way, it was possible to integrate the information with the physical map and to compare the gene recovery rate of each technique. cDNAs and exons were grouped into bins delineated by EcoRI cosmid fragments, and a subset of 91 cDNAs and 29 exons have been sequenced. These sequences defined 79 non-overlapping potential coding segments distributed in 24 transcriptional units, which were mapped along 21q. Northern blot analysis performed for a subset of cDNAs indicated the existence of a cognate transcript. Comparison to databases indicated three segments matching to known chromosome 21 genes: PFKL, COL6A1 and S100B and six segments matching to unmapped anonymous expressed sequence tags (ESTs). At the translated nucleotide level, strong homologies to known proteins were found with ATP-binding transporters of the ABC family and the dihydroorotase domain of pyrimidine synthetases. These data strongly suggest that bona fide partial genes have been isolated. Several of the newly isolated transcriptional units map to clinically important regions, in particular those involved in Down's syndrome, progressive myoclonus epilepsia and auto-immune polyglandular disease. The study presented here illustrates the complementarity of exon-amplification and cDNA

  11. Transcriptome analysis of the model protozoan, Tetrahymena thermophila, using Deep RNA sequencing.

    Directory of Open Access Journals (Sweden)

    Jie Xiong

    Full Text Available BACKGROUND: The ciliated protozoan Tetrahymena thermophila is a well-studied single-celled eukaryote model organism for cellular and molecular biology. However, the lack of extensive T. thermophila cDNA libraries or a large expressed sequence tag (EST) database limited the quality of the original genome annotation. METHODOLOGY/PRINCIPAL FINDINGS: This RNA-seq study describes the first deep sequencing analysis of the T. thermophila transcriptome during the three major stages of the life cycle: growth, starvation and conjugation. Uniquely mapped reads covered more than 96% of the 24,725 predicted gene models in the somatic genome. More than 1,000 new transcribed regions were identified. The great dynamic range of RNA-seq allowed detection of a nearly six order-of-magnitude range of measurable gene expression orchestrated by this cell. RNA-seq also allowed the first prediction of transcript untranslated regions (UTRs) and an updated (larger) size estimate of the T. thermophila transcriptome: 57 Mb, or about 55% of the somatic genome. Our study identified nearly 1,500 alternative splicing (AS) events distributed over 5.2% of T. thermophila genes. This percentage represents a two order-of-magnitude increase over previous EST-based estimates in Tetrahymena. Evidence of stage-specific regulation of alternative splicing was also obtained. Finally, our study allowed us to completely confirm about 26.8% of the genes originally predicted by the gene finder, to correct coding sequence boundaries and intron-exon junctions for about a third, and to reassign microarray probes and correct earlier microarray data. CONCLUSIONS/SIGNIFICANCE: RNA-seq data significantly improve the genome annotation and provide a fully comprehensive view of the global transcriptome of T. thermophila. To our knowledge, 5.2% of T. thermophila genes with AS is the highest percentage of genes showing AS reported in a unicellular eukaryote. Tetrahymena thus becomes an excellent unicellular

  12. Deep-Sea, Deep-Sequencing: Metabarcoding Extracellular DNA from Sediments of Marine Canyons.

    Directory of Open Access Journals (Sweden)

    Magdalena Guardiola

    Full Text Available Marine sediments are home to one of the richest species pools on Earth, but logistics and a dearth of taxonomic work-force hinder the knowledge of their biodiversity. We characterized α- and β-diversity of deep-sea assemblages from submarine canyons in the western Mediterranean using environmental DNA metabarcoding. We used a new primer set targeting a short eukaryotic 18S sequence (ca. 110 bp). We applied a protocol designed to obtain extractions enriched in extracellular DNA from replicated sediment corers. With this strategy we captured information from DNA (local or deposited from the water column) that persists adsorbed to inorganic particles and buffered short-term spatial and temporal heterogeneity. We analysed replicated samples from 20 localities including 2 deep-sea canyons, 1 shallower canal, and 2 open slopes (depth range 100-2,250 m). We identified 1,629 MOTUs, among which the dominant groups were Metazoa (with representatives of 19 phyla), Alveolata, Stramenopiles, and Rhizaria. There was a marked small-scale heterogeneity as shown by differences in replicates within corers and within localities. The spatial variability between canyons was significant, as was the depth component in one of the canyons where it was tested. Likewise, the composition of the first layer (1 cm) of sediment was significantly different from deeper layers. We found that qualitative (presence-absence) and quantitative (relative number of reads) data showed consistent trends of differentiation between samples and geographic areas. The subset of exclusively benthic MOTUs showed similar patterns of β-diversity and community structure as the whole dataset. Separate analyses of the main metazoan phyla (in number of MOTUs) showed some differences in distribution attributable to different lifestyles. Our results highlight the differentiation that can be found even between geographically close assemblages, and set the ground for future monitoring and conservation

  13. Nucleotide sequence of the cDNA encoding the precursor of the beta subunit of rat lutropin.

    OpenAIRE

    Chin, W W; Godine, J E; Klein, D R; Chang, A S; Tan, L K; Habener, J F

    1983-01-01

    We have determined the nucleotide sequences of cDNAs encoding the precursor of the beta subunit of rat lutropin, a polypeptide hormone that regulates gonadal function, including the development of gametes and the production of steroid sex hormones. The cDNAs were prepared from poly(A)+ RNA derived from the pituitary glands of rats 4 weeks after ovariectomy and were cloned in bacterial plasmids. Bacterial colonies containing transfected plasmids were screened by hybridization with a 32P-labele...

  14. Deep sequencing of phage-displayed peptide libraries reveals sequence motif that detects norovirus

    Science.gov (United States)

    Hurwitz, Amy M.; Huang, Wanzhi; Estes, Mary K.; Atmar, Robert L.; Palzkill, Timothy

    2017-01-01

    Norovirus infections are the leading cause of non-bacterial gastroenteritis and result in about 21 million new cases and $2 billion in costs per year in the United States. Existing diagnostics have limited feasibility for point-of-care applications, so there is a clear need for more reliable, rapid, and simple-to-use diagnostic tools in order to contain outbreaks and prevent inappropriate treatments. In this study, a combination of phage display technology, deep sequencing and computational analysis was used to identify 12-mer peptides with specific binding to norovirus genotype GI.1 virus-like particles (VLPs). After biopanning, phage populations were sequenced and analyzed to identify a consensus peptide motif—YRSWXP. Two 12-mer peptides containing this sequence, NV-O-R5-3 and NV-O-R5-6, were further characterized to evaluate the motif's functional ability to detect VLPs and virus. Results indicated that these peptides effectively detect GI.1 VLPs in solid-phase peptide arrays, ELISAs and dot blots. Further, their specificity for the S-domain of the major capsid protein enables them to detect a wide range of GI and GII norovirus genotypes. Both peptides were able to detect virus in norovirus-positive clinical stool samples. Overall, the work reported here demonstrates the application of phage display coupled with next generation sequencing and computational analysis to uncover peptides with specific binding ability to a target protein for diagnostic applications. Further, the reagents characterized here can be integrated into existing diagnostic formats to detect clinically relevant genotypes of norovirus in stool. PMID:28035012
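
    The consensus motif reported above can be scanned for in a set of sequenced peptides with a simple pattern match. The following is a minimal sketch in Python, assuming that X in YRSWXP denotes any residue; the function name and the three example 12-mers are hypothetical and are not sequences from the study.

```python
import re

# X in the consensus YRSWXP is assumed to mean "any residue", hence "." here.
MOTIF = re.compile(r"YRSW.P")


def find_motif(peptide: str):
    """Return the 0-based start of the first YRSWXP match, or None."""
    match = MOTIF.search(peptide.upper())
    return match.start() if match else None


# Hypothetical 12-mers for illustration only; not sequences from the study.
peptides = ["AAYRSWGPTLKD", "GYRSWAPQQQQQ", "MKTLLVAGSNDE"]
hits = {p: find_motif(p) for p in peptides}
print(hits)
```

    The first two example peptides contain the motif and the third does not, so its entry is None.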

  15. PCR amplification and sequences of cDNA clones for the small and large subunits of ADP-glucose pyrophosphorylase from barley tissues.

    Science.gov (United States)

    Villand, P; Aalen, R; Olsen, O A; Lüthi, E; Lönneborg, A; Kleczkowski, L A

    1992-06-01

    Several cDNAs encoding the small and large subunit of ADP-glucose pyrophosphorylase (AGP) were isolated from total RNA of the starchy endosperm, roots and leaves of barley by polymerase chain reaction (PCR). Sets of degenerate oligonucleotide primers, based on previously published conserved amino acid sequences of plant AGP, were used for synthesis and amplification of the cDNAs. For each of the endosperm, roots and leaves, restriction analysis of the PCR products (ca. 550 nucleotides each) revealed heterogeneity, suggesting the presence of three transcripts for AGP in the endosperm and roots, and up to two AGP transcripts in the leaf tissue. Based on the derived amino acid sequences, two clones from the endosperm, beps and bepl, were identified as coding for the small and large subunit of AGP, respectively, while a leaf transcript (blpl) encoded the putative large subunit of AGP. There was about 50% identity between the endosperm clones, and both of them were about 60% identical to the leaf cDNA. Northern blot analysis indicated that beps and bepl are expressed in both the endosperm and roots, while blpl is detectable only in leaves. Application of the PCR technique in studies on gene structure and gene expression of plant AGP is discussed.

  16. Hybrid Sequencing of Full-Length cDNA Transcripts of Stems and Leaves in Dendrobium officinale

    Directory of Open Access Journals (Sweden)

    Liu He

    2017-10-01

    Full Text Available Dendrobium officinale is an extremely valuable orchid used in traditional Chinese medicine, so sought after that it has a higher market value than gold. Although the expression profiles of some genes involved in the polysaccharide synthesis have previously been investigated, little research has been carried out on their alternatively spliced isoforms in D. officinale. In addition, information regarding the translocation of sugars from leaves to stems in D. officinale also remains limited. We analyzed the polysaccharide content of D. officinale leaves and stems, and completed in-depth transcriptome sequencing of these two diverse tissue types using second-generation sequencing (SGS) and single-molecule real-time (SMRT) sequencing technology. The results of this study yielded a digital inventory of gene and mRNA isoform expressions. A comparative analysis of both transcriptomes uncovered a total of 1414 differentially expressed genes, including 844 that were up-regulated and 570 that were down-regulated in stems. Of these genes, one sugars will eventually be exported transporter (SWEET) and one sucrose transporter (SUT) are expressed to a greater extent in D. officinale stems than in leaves. Two glycosyltransferase (GT) and four cellulose synthase (Ces) genes undergo a distinct degree of alternative splicing. In the stems, the content of polysaccharides is twice as much as that in the leaves. The differentially expressed GT and transcription factor (TF) genes will be the focus of further study. The genes DoSWEET4 and DoSUT1 are significantly expressed in the stem, and are likely to be involved in sugar loading in the phloem.

  17. Complete cDNA sequence of the preproform of human pregnancy-associated plasma protein-A. Evidence for expression in the brain and induction by cAMP

    DEFF Research Database (Denmark)

    Haaning, Jesper; Oxvig, Claus; Overgaard, Michael Toft

    1996-01-01

    A cDNA that encodes the prepropeptide of pregnancy-associated plasma protein-A (preproPAPP-A), a putative metalloproteinase, has been cloned and sequenced. PAPP-A is synthesized in the placenta as a 1627-residue precursor preproprotein with a putative 22-residue signal peptide and a highly basic...

  18. cDNA sequence and tissue distribution of the mRNA for bovine and murine p11, the S100-related light chain of the protein-tyrosine kinase substrate p36 (calpactin I)

    DEFF Research Database (Denmark)

    Saris, Chris J M; Kristensen, Torsten; D’Eustachio, Peter

    1987-01-01

    We have isolated and sequenced cDNA clones of bovine and murine p11 mRNAs. The nonpolyadenylated mRNAs are predicted to be 614 and 600 nucleotides, respectively. The p11 mRNAs both contain a 291 nucleotide open reading frame, preceded by a 5′-untranslated region of 73 nucleotides in bovine p11 m...

  19. Cloning and sequence of cDNA encoding 1-aminocyclopropane-1-carboxylate oxidase in Vanda flowers

    Directory of Open Access Journals (Sweden)

    Pattana Srifah Huehne

    2013-08-01

    Full Text Available The 1-aminocyclopropane-1-carboxylate oxidase (ACO) gene in the final step of ethylene biosynthesis was isolated from ethylene-sensitive Vanda Miss Joaquim flowers. This consists of 1,242 base pairs (bp) encoding 326 amino acid residues. To investigate the specific divergence in orchid ACO sequences, the deduced Vanda ACO was aligned with five other orchid ACOs. The results reveal that the ACO sequences within Doritaenopsis, Phalaenopsis and Vanda are highly conserved and almost 95% identical, while the ACOs isolated from Cymbidium, Dendrobium and Cattleya are 87-88% identical to Vanda ACO. In addition, the 2-oxoglutarate-Fe(II)_oxygenase (Oxy) domain of orchid ACOs shows a higher degree of amino acid conservation than the non-haem dioxygenase (DIOX_N) domain. The overall homology regions of Vanda ACO are commonly folded into 12 α-helices and 12 β-sheets similar to the three dimensional template-structure of Petunia ACO. This Vanda ACO cloned gene is highly expressed in flower tissue compared with root and leaf tissues. In particular, there is an abundance of ACO transcript accumulation in the column followed by the lip and the perianth of Vanda Miss Joaquim flowers at the fully-open stage.

  20. A Universal Next-Generation Sequencing Protocol To Generate Noninfectious Barcoded cDNA Libraries from High-Containment RNA Viruses

    Science.gov (United States)

    Moser, Lindsey A.; Ramirez-Carvajal, Lisbeth; Puri, Vinita; Pauszek, Steven J.; Matthews, Krystal; Dilley, Kari A.; Mullan, Clancy; McGraw, Jennifer; Khayat, Michael; Beeri, Karen; Yee, Anthony; Dugan, Vivien; Heise, Mark T.; Frieman, Matthew B.; Rodriguez, Luis L.; Bernard, Kristen A.; Wentworth, David E.

    2016-01-01

    ABSTRACT Several biosafety level 3 and/or 4 (BSL-3/4) pathogens are high-consequence, single-stranded RNA viruses, and their genomes, when introduced into permissive cells, are infectious. Moreover, many of these viruses are select agents (SAs), and their genomes are also considered SAs. For this reason, cDNAs and/or their derivatives must be tested to ensure the absence of infectious virus and/or viral RNA before transfer out of the BSL-3/4 and/or SA laboratory. This tremendously limits the capacity to conduct viral genomic research, particularly the application of next-generation sequencing (NGS). Here, we present a sequence-independent method to rapidly amplify viral genomic RNA while simultaneously abolishing both viral and genomic RNA infectivity across multiple single-stranded positive-sense RNA (ssRNA+) virus families. The process generates barcoded DNA amplicons that range in length from 300 to 1,000 bp, which cannot be used to rescue a virus and are stable to transport at room temperature. Our barcoding approach allows for up to 288 barcoded samples to be pooled into a single library and run across various NGS platforms without potential reconstitution of the viral genome. Our data demonstrate that this approach provides full-length genomic sequence information not only from high-titer virion preparations but it can also recover specific viral sequence from samples with limited starting material in the background of cellular RNA, and it can be used to identify pathogens from unknown samples. In summary, we describe a rapid, universal standard operating procedure that generates high-quality NGS libraries free of infectious virus and infectious viral RNA. IMPORTANCE This report establishes and validates a standard operating procedure (SOP) for select agents (SAs) and other biosafety level 3 and/or 4 (BSL-3/4) RNA viruses to rapidly generate noninfectious, barcoded cDNA amenable for next-generation sequencing (NGS). 
This eliminates the burden of testing all

  1. Analysis of expression sequence tags from a full-length-enriched cDNA library of developing sesame seeds (Sesamum indicum).

    Science.gov (United States)

    Ke, Tao; Dong, Caihua; Mao, Han; Zhao, Yingzhong; Chen, Hong; Liu, Hongyan; Dong, Xuyan; Tong, Chaobo; Liu, Shengyi

    2011-12-24

    Sesame (Sesamum indicum) is one of the most important oilseed crops with high oil contents and rich nutrient value. However, genetic improvement efforts in sesame could not benefit from molecular biology technology due to poor DNA and RNA sequence resources. In this study, we carried out large-scale sequencing of expressed sequence tags (ESTs) from developing sesame seeds and further conducted analysis on seed storage products-related genes. A normalized and full-length enriched cDNA library from 5 ~ 30 days old immature seeds was constructed and randomly sequenced, leading to generation of 41,248 expressed sequence tags (ESTs) which then formed 4,713 contigs and 27,708 singletons with 44.9% uniESTs being putative full-length open reading frames. Approximately 26,091 of all these uniESTs have significant matches to the counterparts in Nr database of GenBank, and 21,628 of them were assigned to one or more Gene ontology (GO) terms. Homologous genes involved in oil biosynthesis were identified including some conserved transcription factors regulating oil biosynthesis such as LEAFY COTYLEDON1 (LEC1), PICKLE (PKL), WRINKLED1 (WRI1) and the majority of them were found for the first time in sesame seeds. One hundred and seventeen ESTs were identified as possibly involved in biosynthesis of sesame lignans, sesamin and sesamolin. In total, 9,347 putative functional genes from developing seeds were identified, which accounts for one third of total genes in the sesame genome. Further analysis of the uniESTs identified 1,949 non-redundant simple sequence repeats (SSRs). This study has provided an overview of genes expressed during sesame seed development. This collection of sesame full-length cDNAs covered a wide variety of genes in seeds, in particular, candidate genes involved in biosynthesis of sesame oils and lignans. These EST sequences enriched with full length will contribute to comparative genomic studies on sesame and other oilseed plants and serve as an abundant
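
    The abstract reports simple sequence repeats (SSRs) found in the uniESTs but does not state the criteria used. The following is a minimal sketch in Python of one common way to locate perfect SSRs with a regular expression; the function name, the 2-6 bp unit range, the minimum of four repeats and the example sequence are all assumptions made for illustration, not the method of the study.

```python
import re


def find_ssrs(seq: str, min_repeats: int = 4):
    """List perfect repeats as (0-based start, repeat unit, copy number)."""
    # A unit of 2-6 bases followed by at least (min_repeats - 1) more copies.
    pattern = re.compile(r"([ACGT]{2,6}?)\1{%d,}" % (min_repeats - 1))
    ssrs = []
    for m in pattern.finditer(seq.upper()):
        unit = m.group(1)
        if len(set(unit)) == 1:
            # Skip homopolymer runs such as AAAAAAAA.
            continue
        ssrs.append((m.start(), unit, len(m.group(0)) // len(unit)))
    return ssrs


# Made-up sequence containing (AT)5 and (CAG)5; not data from the study.
example = "GGCATATATATATCCGTTCAGCAGCAGCAGCAGTT"
print(find_ssrs(example))
```

    Thresholds of this kind strongly affect how many SSRs are reported, so counts from different studies are comparable only when the criteria match.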

  2. Analysis of expression sequence tags from a full-length-enriched cDNA library of developing sesame seeds (Sesamum indicum)

    Directory of Open Access Journals (Sweden)

    Ke Tao

    2011-12-01

    Full Text Available Abstract Background Sesame (Sesamum indicum) is one of the most important oilseed crops with high oil contents and rich nutrient value. However, genetic improvement efforts in sesame could not benefit from molecular biology technology due to poor DNA and RNA sequence resources. In this study, we carried out large-scale sequencing of expressed sequence tags (ESTs) from developing sesame seeds and further conducted analysis on seed storage products-related genes. Results A normalized and full-length enriched cDNA library from 5 ~ 30 days old immature seeds was constructed and randomly sequenced, leading to generation of 41,248 expressed sequence tags (ESTs) which then formed 4,713 contigs and 27,708 singletons with 44.9% uniESTs being putative full-length open reading frames. Approximately 26,091 of all these uniESTs have significant matches to the counterparts in Nr database of GenBank, and 21,628 of them were assigned to one or more Gene ontology (GO) terms. Homologous genes involved in oil biosynthesis were identified including some conserved transcription factors regulating oil biosynthesis such as LEAFY COTYLEDON1 (LEC1), PICKLE (PKL), WRINKLED1 (WRI1) and the majority of them were found for the first time in sesame seeds. One hundred and seventeen ESTs were identified as possibly involved in biosynthesis of sesame lignans, sesamin and sesamolin. In total, 9,347 putative functional genes from developing seeds were identified, which accounts for one third of total genes in the sesame genome. Further analysis of the uniESTs identified 1,949 non-redundant simple sequence repeats (SSRs). Conclusions This study has provided an overview of genes expressed during sesame seed development. This collection of sesame full-length cDNAs covered a wide variety of genes in seeds, in particular, candidate genes involved in biosynthesis of sesame oils and lignans. These EST sequences enriched with full length will contribute to comparative genomic studies on sesame and

  3. Toward a better knowledge of the molecular evolution of phosphoenolpyruvate carboxylase by comparison of partial cDNA sequences.

    Science.gov (United States)

    Gehrig, H H; Heute, V; Kluge, M

    1998-01-01

    To get deeper insight into the evolution of phosphoenolpyruvate carboxylase (PEPC) we have identified PEPC fragments (about 1,100 bp) of another 12 plant species not yet investigated in this context. The selected plants include one Chlorophyta, two Bryophyta, four Pteridophyta, and five Spermatophyta species. The obtained phylogenetic trees on PEPC isoforms are the most complete ones available to date. Independent of their manner of construction, the resulting dendrograms are very similar and fully consistent with the main topology as it is postulated for the evolution of the higher terrestrial plants. We found a distinct clustering of the PEPC sequences of the prokaryotes, the algae, and the spermatophytes. PEPC isoforms of the archegoniates are located in the phylogenetic trees between the algae and spermatophytes. Our results strengthen the view that PEPC is a very useful molecular marker with which to visualize phylogenetic trends both on the metabolic and organismic levels.

  4. Deep Sequencing of RNA from Ancient Maize Kernels

    Science.gov (United States)

    Rasmussen, Morten; Cappellini, Enrico; Romero-Navarro, J. Alberto; Wales, Nathan; Alquezar-Planas, David E.; Penfield, Steven; Brown, Terence A.; Vielle-Calzada, Jean-Philippe; Montiel, Rafael; Jørgensen, Tina; Odegaard, Nancy; Jacobs, Michael; Arriaza, Bernardo; Higham, Thomas F. G.; Ramsey, Christopher Bronk; Willerslev, Eske; Gilbert, M. Thomas P.

    2013-01-01

    The characterization of biomolecules from ancient samples can shed otherwise unobtainable insights into the past. Despite the fundamental role of transcriptomal change in evolution, the potential of ancient RNA remains unexploited – perhaps due to dogma associated with the fragility of RNA. We hypothesize that seeds offer a plausible refuge for long-term RNA survival, due to the fundamental role of RNA during seed germination. Using RNA-Seq on cDNA synthesized from nucleic acid extracts, we validate this hypothesis through demonstration of partial transcriptomal recovery from two sources of ancient maize kernels. The results suggest that ancient seed transcriptomics may offer a powerful new tool with which to study plant domestication. PMID:23326310

  5. Whitefly (Bemisia tabaci) genome project: analysis of sequenced clones from egg, instar, and adult (viruliferous and non-viruliferous) cDNA libraries

    Directory of Open Access Journals (Sweden)

    Czosnek Henryk

    2006-04-01

    Full Text Available Abstract Background The past three decades have witnessed a dramatic increase in interest in the whitefly Bemisia tabaci, owing to its nature as a taxonomically cryptic species, the damage it causes to a large number of herbaceous plants because of its specialized feeding in the phloem, and its ability to serve as a vector of plant viruses. Among the most important plant viruses transmitted by B. tabaci are those in the genus Begomovirus (family Geminiviridae). Surprisingly, little is known about the genome of this whitefly. The haploid genome size for male B. tabaci has been estimated by flow cytometry analysis to be approximately one billion bp, about five times the size of that of the fruitfly Drosophila melanogaster. The genes involved in whitefly development, in host range plasticity, and in begomovirus vector specificity and competency are unknown. Results To address this general shortage of genomic sequence information, we have constructed three cDNA libraries from non-viruliferous whiteflies (eggs, immature instars, and adults) and two from adult insects that fed on tomato plants infected by two geminiviruses: Tomato yellow leaf curl virus (TYLCV) and Tomato mottle virus (ToMoV). In total, the sequence of 18,976 clones was determined. After quality control and removal of 5,542 clones of mitochondrial origin, 9,110 sequences remained, which included 3,843 singletons and 1,017 contigs. Comparisons with public databases indicated that the libraries contained genes involved in cellular and developmental processes. In addition, approximately 1,000 bases aligned with the genome of the B. tabaci endosymbiotic bacterium Candidatus Portiera aleyrodidarum, originating primarily from the egg and instar libraries. Apart from the mitochondrial sequences, the longest and most abundant sequence encodes vitellogenin, which originated from whitefly adult libraries, indicating that much of the gene expression in this insect is directed toward the production

  6. Salmo salar and Esox lucius full-length cDNA sequences reveal changes in evolutionary pressures on a post-tetraploidization genome

    Directory of Open Access Journals (Sweden)

    Holt Robert A

    2010-04-01

    Full Text Available Abstract Background Salmonids are among the most intensely studied fish, in part due to their economic and environmental importance, and in part due to a recent whole genome duplication in the common ancestor of salmonids. This duplication greatly impacts species diversification, functional specialization, and adaptation. Extensive new genomic resources have recently become available for Atlantic salmon (Salmo salar), but documentation of allelic versus duplicate reference genes remains a major uncertainty in the complete characterization of its genome and its evolution. Results From existing expressed sequence tag (EST) resources and three new full-length cDNA libraries, 9,057 reference quality full-length gene insert clones were identified for Atlantic salmon. A further 1,365 reference full-length clones were annotated from 29,221 northern pike (Esox lucius) ESTs. Pairwise dN/dS comparisons within each of 408 sets of duplicated salmon genes, using northern pike as a diploid out-group, show asymmetric relaxation of selection on salmon duplicates. Conclusions 9,057 full-length reference genes were characterized in S. salar and can be used to identify alleles and gene family members. Comparisons of duplicated genes show that while purifying selection is the predominant force acting on both duplicates, consistent with retention of functionality in both copies, some relaxation of pressure on gene duplicates can be identified. In addition, there is evidence that evolution has acted asymmetrically on paralogs, allowing one of the pair to diverge at a faster rate.

  7. Discovery radiomics via evolutionary deep radiomic sequencer discovery for pathologically proven lung cancer detection.

    Science.gov (United States)

    Shafiee, Mohammad Javad; Chung, Audrey G; Khalvati, Farzad; Haider, Masoom A; Wong, Alexander

    2017-10-01

    While lung cancer is the second most diagnosed form of cancer in men and women, a sufficiently early diagnosis can be pivotal in patient survival rates. Imaging-based, or radiomics-driven, detection methods have been developed to aid diagnosticians, but largely rely on hand-crafted features that may not fully encapsulate the differences between cancerous and healthy tissue. Recently, the concept of discovery radiomics was introduced, where custom abstract features are discovered from readily available imaging data. We propose an evolutionary deep radiomic sequencer discovery approach based on evolutionary deep intelligence. Motivated by patient privacy concerns and the idea of operational artificial intelligence, the evolutionary deep radiomic sequencer discovery approach organically evolves increasingly more efficient deep radiomic sequencers that produce significantly more compact yet similarly descriptive radiomic sequences over multiple generations. As a result, this framework improves operational efficiency and enables diagnosis to be run locally at the radiologist's computer while maintaining detection accuracy. We evaluated the evolved deep radiomic sequencer (EDRS) discovered via the proposed evolutionary deep radiomic sequencer discovery framework against state-of-the-art radiomics-driven and discovery radiomics methods using clinical lung CT data with pathologically proven diagnostic data from the LIDC-IDRI dataset. The EDRS shows improved sensitivity (93.42%), specificity (82.39%), and diagnostic accuracy (88.78%) relative to previous radiomics approaches.

  8. Analysis of expressed sequence tags (ESTs) from a normalized cDNA library and isolation of EST simple sequence repeats from the invasive cotton mealybug Phenacoccus solenopsis.

    Science.gov (United States)

    Li, Hui; Lang, Kun-Ling; Fu, Hai-Bin; Shen, Chang-Peng; Wan, Fang-Hao; Chu, Dong

    2015-12-01

    The cotton mealybug, Phenacoccus solenopsis Tinsley, is a serious and invasive pest. At present, genetic resources for studying P. solenopsis are limited, and this negatively affects genetic research on the organism and, consequently, translational work to improve management of this pest. In the present study, expressed sequence tags (ESTs) were analyzed from a normalized complementary DNA library of P. solenopsis. In addition, EST-derived microsatellite loci (also known as simple sequence repeats or SSRs) were isolated and characterized. A total of 1107 high-quality ESTs were acquired from the library. Clustering and assembly analysis resulted in 785 unigenes, which were classified functionally into 23 categories according to the Gene Ontology database. Seven EST-based SSR markers were developed in this study and are expected to be useful in characterizing how this invasive species was introduced, as well as providing insights into its genetic microevolution. © 2014 Institute of Zoology, Chinese Academy of Sciences.
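
    The abstract above mentions isolating simple sequence repeats (SSRs) from ESTs but does not say which tool or thresholds were used. As a rough illustration only, the sketch below scans a sequence for perfect di-, tri- and tetranucleotide repeats with a regular expression. The function name `find_ssrs` and the minimum repeat counts in `MIN_REPEATS` are assumptions chosen for the example, not values taken from the study.

```python
import re

# Hypothetical thresholds: motif length -> minimum number of tandem copies.
MIN_REPEATS = {2: 6, 3: 5, 4: 4}


def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return (start, motif, copies) for each perfect SSR found in seq."""
    seq = seq.upper()
    hits = []
    for size, min_count in sorted(min_repeats.items()):
        # A motif of `size` bases followed by at least min_count - 1 copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (size, min_count - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            # Skip motifs that are just a shorter unit repeated (AAAA, ATAT).
            if any(
                motif == motif[:k] * (size // k)
                for k in range(1, size)
                if size % k == 0
            ):
                continue
            hits.append((match.start(), motif, len(match.group(0)) // size))
    return sorted(hits)


if __name__ == "__main__":
    example = "GG" + "AT" * 7 + "CCG" + "GAA" * 5 + "T"
    for start, motif, copies in find_ssrs(example):
        print(start, motif, copies)
```

    Dedicated SSR finders also handle imperfect and compound repeats; this sketch reports perfect repeats only.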

  9. Deep Sequencing Analysis of Nucleolar Small RNAs: Bioinformatics.

    Science.gov (United States)

    Bai, Baoyan; Laiho, Marikki

    2016-01-01

    Small RNAs (size 20-30 nt) of various types have been actively investigated in recent years, and their subcellular compartmentalization and relative concentrations are likely to be of importance to their cellular and physiological functions. Comprehensive data on this subset of the transcriptome can only be obtained by application of high-throughput sequencing, which yields data that are inherently complex and multidimensional, as sequence composition, length, and abundance all inform small RNA function. Subsequent data analysis, hypothesis testing, and presentation/visualization of the results are correspondingly challenging. We have constructed small RNA libraries derived from different cellular compartments, including the nucleolus, and asked whether small RNAs exist in the nucleolus and whether they are distinct from cytoplasmic and nuclear small RNAs, the miRNAs. Here, we present a workflow for analysis of small RNA sequencing data generated by the Ion Torrent PGM sequencer from samples derived from different cellular compartments.

  10. Protein sequences bound to mineral surfaces persist into deep time

    DEFF Research Database (Denmark)

    Demarchi, Beatrice; Hall, Shaun; Roncal-Herrero, Teresa

    2016-01-01

    Proteins persist longer in the fossil record than DNA, but the longevity, survival mechanisms and substrates remain contested. Here, we demonstrate the role of mineral binding in preserving the protein sequence in ostrich (Struthionidae) eggshell, including from the palaeontological sites...

  11. Deep amplicon sequencing reveals mixed phytoplasma infection within single grapevine plants

    DEFF Research Database (Denmark)

    Nicolaisen, Mogens; Contaldo, Nicoletta; Makarova, Olga

    2011-01-01

    The diversity of phytoplasmas within single plants has not yet been fully investigated. In this project, deep amplicon sequencing was used to generate 50,926 phytoplasma sequences from 11 phytoplasma-infected grapevine samples from a PCR amplicon in the 5' end of the 16S region. After clustering ...

  12. Use of S1 nuclease in deep sequencing for detection of double-stranded RNA viruses.

    Science.gov (United States)

    Shimada, Saya; Nagai, Makoto; Moriyama, Hiromitsu; Fukuhara, Toshiyuki; Koyama, Satoshi; Omatsu, Tsutomu; Furuya, Tetsuya; Shirai, Junsuke; Mizutani, Tetsuya

    2015-09-01

    The metagenomic approach using next-generation DNA sequencing has facilitated the detection of many pathogenic viruses in fecal samples. However, in many cases, the majority of the detected sequences originate from the host genome and the bacterial flora in the gut. Here, to improve the efficiency of detecting double-stranded (ds) RNA viruses in samples, we evaluated the applicability of S1 nuclease to deep sequencing. Treating total RNA with S1 nuclease resulted in 1.5-28.4- and 10.1-208.9-fold increases in sequence reads of group A rotavirus in fecal and viral culture samples, respectively. Moreover, the increased coverage of mapping to reference sequences allowed for sufficient genotyping using analytical software. These results suggest that library construction using S1 nuclease is useful for deep sequencing in the detection of dsRNA viruses.

  13. A deep sequencing analysis of transcriptomes and the development ...

    Indian Academy of Sciences (India)

    Mungbean (Vigna radiata L. Wilczek) is one of the most important leguminous food crops in Asia. We employed Illumina paired-end sequencing to analyse transcriptomes of three different mungbean genotypes. A total of 38.3–39.8 million paired-end reads with 73 bp lengths were generated. The pooled reads from the ...

  14. Identification of Multiple Stress Responsive Genes by Sequencing a Normalized cDNA Library from Sea-Land Cotton (Gossypium barbadense L.).

    Directory of Open Access Journals (Sweden)

    Bin Zhou

    Full Text Available Plants often face multiple stresses including drought, extreme temperature, salinity, nutrition deficiency and biotic stresses during growth and development. All of these stresses trigger a series of physiological and metabolic reactions, leading to reversible inhibition of metabolism and growth, and can cause serious irreversible damage, even death. At each stage of cotton growth, environmental stress conditions pose devastating threats to plant growth and development, especially yield and quality. Due to the complex stress conditions and unclear molecular mechanisms of stress response, there is an urgent need to explore the mechanisms of cotton response against abiotic stresses. A normalized cDNA library was constructed using Gossypium barbadense Hai-7124 treated with different stress conditions (heat, cold, salt, drought, potassium and phosphorus deficit and Verticillium dahliae infection). Random sequencing of this library generated 6,047 high-quality expressed sequence tags (ESTs). The ESTs were clustered and assembled into 3,135 uniESTs, composed of 2,497 contigs and 638 singletons. The blastx results demonstrated 2,746 unigenes showing significant similarity to known genes, 74 uniESTs displaying significant similarity to genes of predicted proteins, and 315 uniESTs that remain uncharacterized. Functional classification unveiled the abundance of uniESTs in binding, catalytic activity, and structural molecule activity. Annotations of the uniESTs by the plant transcription factor database (PlantTFDB) and Plant Stress Protein Database (PSPDB) disclosed that transcription factors and stress-related genes were enriched in the current library. The expression of some transcription factors and specific stress-related genes was verified by RT-PCR under various stress conditions. Annotation results showed that a huge number of genes respond to stress in our study, such as MYB-related, C2H2, FAR1, bHLH, bZIP, MADS, and mTERF. These results will improve our

  15. DeepGO: predicting protein functions from sequence and interactions using a deep ontology-aware classifier.

    Science.gov (United States)

    Kulmanov, Maxat; Khan, Mohammed Asif; Hoehndorf, Robert; Wren, Jonathan

    2018-02-15

    A large number of protein sequences are becoming available through the application of novel high-throughput sequencing technologies. Experimental functional characterization of these proteins is time-consuming and expensive, and is often only done rigorously for a few selected model organisms. Computational function prediction approaches have been suggested to fill this gap. The functions of proteins are classified using the Gene Ontology (GO), which contains over 40 000 classes. Additionally, proteins have multiple functions, making function prediction a large-scale, multi-class, multi-label problem. We have developed a novel method to predict protein function from sequence. We use deep learning to learn features from protein sequences as well as a cross-species protein-protein interaction network. Our approach specifically outputs information in the structure of the GO and utilizes the dependencies between GO classes as background information to construct a deep learning model. We evaluate our method using the standards established by the Computational Assessment of Function Annotation (CAFA) and demonstrate a significant improvement over baseline methods such as BLAST, in particular for predicting cellular locations. Web server: http://deepgo.bio2vec.net. Source code: https://github.com/bio-ontology-research-group/deepgo. Contact: robert.hoehndorf@kaust.edu.sa. Supplementary data are available at Bioinformatics online. © The Author(s) 2017. Published by Oxford University Press.

  16. DeepGO: predicting protein functions from sequence and interactions using a deep ontology-aware classifier

    KAUST Repository

    Kulmanov, Maxat

    2017-09-27

    Motivation A large number of protein sequences are becoming available through the application of novel high-throughput sequencing technologies. Experimental functional characterization of these proteins is time-consuming and expensive, and is often only done rigorously for few selected model organisms. Computational function prediction approaches have been suggested to fill this gap. The functions of proteins are classified using the Gene Ontology (GO), which contains over 40 000 classes. Additionally, proteins have multiple functions, making function prediction a large-scale, multi-class, multi-label problem. Results We have developed a novel method to predict protein function from sequence. We use deep learning to learn features from protein sequences as well as a cross-species protein–protein interaction network. Our approach specifically outputs information in the structure of the GO and utilizes the dependencies between GO classes as background information to construct a deep learning model. We evaluate our method using the standards established by the Computational Assessment of Function Annotation (CAFA) and demonstrate a significant improvement over baseline methods such as BLAST, in particular for predicting cellular locations.

  17. Determining mutant spectra of three RNA viral samples using ultra-deep sequencing

    Energy Technology Data Exchange (ETDEWEB)

    Chen, H

    2012-06-06

    RNA viruses have extremely high mutation rates that enable the virus to adapt to new host environments and even jump from one species to another. As part of a viral transmission study, three viral samples collected from naturally infected animals were sequenced using Illumina paired-end technology at ultra-deep coverage. In order to determine the mutant spectra within the viral quasispecies, it is critical to understand the sequencing error rates and control for false positive calls of viral variants (point mutations). I will estimate the sequencing error rate from two control sequences and characterize the mutant spectra in the natural samples with this error rate.
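
    The abstract above describes estimating a sequencing error rate from control sequences and using it to guard against false-positive variant calls, but it does not specify the statistical test. One common way to do this, shown here purely as a sketch, is to treat errors at a site as binomial draws and call a variant only when the observed count of non-reference bases is very unlikely under the control error rate. The function names, the significance cutoff `alpha`, and all numbers in the example are assumptions made for illustration.

```python
from math import exp, lgamma, log


def estimate_error_rate(mismatches, bases):
    """Per-base error rate from a control of known sequence."""
    return mismatches / bases


def log_binom_pmf(i, n, p):
    """log P(X = i) for X ~ Binomial(n, p), with 0 < p < 1."""
    return (
        lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1)
        + i * log(p) + (n - i) * log(1.0 - p)
    )


def binomial_tail(k, n, p):
    """P(X >= k): chance of k or more errors among n reads at one site."""
    return sum(exp(log_binom_pmf(i, n, p)) for i in range(k, n + 1))


def is_variant(alt_count, depth, error_rate, alpha=1e-6):
    """Call a variant if alt_count is implausible as sequencing error alone."""
    return binomial_tail(alt_count, depth, error_rate) < alpha


if __name__ == "__main__":
    # Hypothetical control: 50 mismatches observed in 100,000 sequenced bases.
    rate = estimate_error_rate(50, 100_000)
    print(is_variant(5, 10_000, rate))   # count close to the expected error level
    print(is_variant(40, 10_000, rate))  # count far above the expected error level
```

    In practice the error rate varies with base quality, sequence context and position in the read, so a single global rate is a simplification.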

  18. Deep sequencing as a probe of normal stem cell fate and preneoplasia in human epidermis.

    Science.gov (United States)

    Simons, Benjamin D

    2016-01-05

    Using deep sequencing technology, methods based on the sporadic acquisition of somatic DNA mutations in human tissues have been used to trace the clonal evolution of progenitor cells in diseased states. However, the potential of these approaches to explore cell fate behavior of normal tissues and the initiation of preneoplasia remains underexploited. Focusing on the results of a recent deep sequencing study of eyelid epidermis, we show that the quantitative analysis of mutant clone size provides a general method to resolve the pattern of normal stem cell fate and to detect and characterize the mutational signature of rare field transformations in human tissues, with implications for the early detection of preneoplasia.

  19. Deep sequencing analysis of the developing mouse brain reveals a novel microRNA

    OpenAIRE

    Ling, King-Hwa; Brautigan, Peter J; Hahn, Christopher N; Daish, Tasman; Rayner, John R; Cheah, Pike-See; Raison, Joy M; Piltz, Sandra; Mann, Jeffrey R; Mattiske, Deidre M; Thomas, Paul Q; Adelson, David L; Scott, Hamish S

    2011-01-01

    Abstract Background MicroRNAs (miRNAs) are small non-coding RNAs that can exert multilevel inhibition/repression at a post-transcriptional or protein synthesis level during disease or development. Characterisation of miRNAs in adult mammalian brains by deep sequencing has been reported previously. However, to date, no small RNA profiling of the developing brain has been undertaken using this method. We have performed deep sequencing and small RNA analysis of a developing (E15.5) mouse brain. ...

  20. Deep sequencing extends the diversity of human papillomaviruses in human skin.

    Science.gov (United States)

    Bzhalava, Davit; Mühr, Laila Sara Arroyo; Lagheden, Camilla; Ekström, Johanna; Forslund, Ola; Dillner, Joakim; Hultin, Emilie

    2014-07-24

    Most viruses in human skin are known to be human papillomaviruses (HPVs). Previous sequencing of skin samples has identified 273 different cutaneous HPV types, including 47 previously unknown types. In the present study, we wished to extend prior studies using deeper sequencing. This deeper sequencing without prior PCR of a pool of 142 whole genome amplified skin lesions identified 23 known HPV types, 3 novel putative HPV types and 4 non-HPV viruses. The complete sequence was obtained for one of the known putative types and almost the complete sequence was obtained for one of the novel putative types. In addition, sequencing of amplimers from HPV consensus PCR of 326 skin lesions detected 385 different HPV types, including 226 previously unknown putative types. In conclusion, metagenomic deep sequencing of human skin samples identified no less than 396 different HPV types in human skin, out of which 229 putative HPV types were previously unknown.

  1. Deep Sequencing Reveals a MicroRNA Expression Signature in Triple-Negative Breast Cancer.

    Science.gov (United States)

    Chang, Yao-Yin; Lai, Liang-Chuan; Tsai, Mong-Hsun; Chuang, Eric Y

    2018-01-01

    Deep sequencing is an advanced technology in genomic biology for determining the precise order of nucleotides in a DNA or RNA molecule. The analysis of deep sequencing data also requires sophisticated knowledge of both computational software and bioinformatics. In this chapter, the procedures for deep sequencing analysis of the microRNA (miRNA) transcriptome in triple-negative breast cancer and adjacent normal tissue are described in detail. As miRNAs are critical regulators of gene expression and many of them were previously reported to be associated with the malignant progression of human cancer, an analytical method that accurately identifies deregulated miRNAs in a specific type of cancer is important for understanding its tumor behavior. We obtained raw sequence reads of miRNA expression from 24 triple-negative breast cancers and 14 adjacent normal tissues using deep sequencing technology in this work. Expression data of miRNA reads were normalized with the quantile-quantile scaling method and were analyzed statistically. A miRNA expression signature composed of 25 differentially expressed miRNAs proved to be an effective classifier between triple-negative breast cancers and adjacent normal tissues in a hierarchical clustering analysis.

  2. Evaluation and Adaptation of a Laboratory-Based cDNA Library Preparation Protocol for Retrospective Sequencing of Archived MicroRNAs from up to 35-Year-Old Clinical FFPE Specimens

    Directory of Open Access Journals (Sweden)

    Olivier Loudig

    2017-03-01

    Full Text Available Formalin-fixed paraffin-embedded (FFPE) specimens, when used in conjunction with patient clinical data history, represent an invaluable resource for molecular studies of cancer. Even though nucleic acids extracted from archived FFPE tissues are degraded, their molecular analysis has become possible. In this study, we optimized a laboratory-based next-generation sequencing barcoded cDNA library preparation protocol for analysis of small RNAs recovered from archived FFPE tissues. Using matched fresh and FFPE specimens, we evaluated the robustness and reproducibility of our optimized approach, as well as its applicability to archived clinical specimens stored for up to 35 years. We then evaluated this cDNA library preparation protocol by performing a miRNA expression analysis of archived breast ductal carcinoma in situ (DCIS) specimens, selected for their relation to the risk of subsequent breast cancer development and obtained from six different institutions. Our analyses identified six miRNAs (miR-29a, miR-221, miR-375, miR-184, miR-363, miR-455-5p) differentially expressed between DCIS lesions from women who subsequently developed an invasive breast cancer (cases) and women who did not develop invasive breast cancer within the same time interval (control). Our thorough evaluation and application of this laboratory-based miRNA sequencing analysis indicates that the preparation of small RNA cDNA libraries can reliably be performed on older, archived, clinically-classified specimens.

  3. Evaluation and Adaptation of a Laboratory-Based cDNA Library Preparation Protocol for Retrospective Sequencing of Archived MicroRNAs from up to 35-Year-Old Clinical FFPE Specimens.

    Science.gov (United States)

    Loudig, Olivier; Wang, Tao; Ye, Kenny; Lin, Juan; Wang, Yihong; Ramnauth, Andrew; Liu, Christina; Stark, Azadeh; Chitale, Dhananjay; Greenlee, Robert; Multerer, Deborah; Honda, Stacey; Daida, Yihe; Spencer Feigelson, Heather; Glass, Andrew; Couch, Fergus J; Rohan, Thomas; Ben-Dov, Iddo Z

    2017-03-14

    Formalin-fixed paraffin-embedded (FFPE) specimens, when used in conjunction with patient clinical data history, represent an invaluable resource for molecular studies of cancer. Even though nucleic acids extracted from archived FFPE tissues are degraded, their molecular analysis has become possible. In this study, we optimized a laboratory-based next-generation sequencing barcoded cDNA library preparation protocol for analysis of small RNAs recovered from archived FFPE tissues. Using matched fresh and FFPE specimens, we evaluated the robustness and reproducibility of our optimized approach, as well as its applicability to archived clinical specimens stored for up to 35 years. We then evaluated this cDNA library preparation protocol by performing a miRNA expression analysis of archived breast ductal carcinoma in situ (DCIS) specimens, selected for their relation to the risk of subsequent breast cancer development and obtained from six different institutions. Our analyses identified six miRNAs (miR-29a, miR-221, miR-375, miR-184, miR-363, miR-455-5p) differentially expressed between DCIS lesions from women who subsequently developed an invasive breast cancer (cases) and women who did not develop invasive breast cancer within the same time interval (control). Our thorough evaluation and application of this laboratory-based miRNA sequencing analysis indicates that the preparation of small RNA cDNA libraries can reliably be performed on older, archived, clinically-classified specimens.

  4. Evaluation and Adaptation of a Laboratory-Based cDNA Library Preparation Protocol for Retrospective Sequencing of Archived MicroRNAs from up to 35-Year-Old Clinical FFPE Specimens

    Science.gov (United States)

    Loudig, Olivier; Wang, Tao; Ye, Kenny; Lin, Juan; Wang, Yihong; Ramnauth, Andrew; Liu, Christina; Stark, Azadeh; Chitale, Dhananjay; Greenlee, Robert; Multerer, Deborah; Honda, Stacey; Daida, Yihe; Spencer Feigelson, Heather; Glass, Andrew; Couch, Fergus J.; Rohan, Thomas; Ben-Dov, Iddo Z.

    2017-01-01

    Formalin-fixed paraffin-embedded (FFPE) specimens, when used in conjunction with patient clinical data history, represent an invaluable resource for molecular studies of cancer. Even though nucleic acids extracted from archived FFPE tissues are degraded, their molecular analysis has become possible. In this study, we optimized a laboratory-based next-generation sequencing barcoded cDNA library preparation protocol for analysis of small RNAs recovered from archived FFPE tissues. Using matched fresh and FFPE specimens, we evaluated the robustness and reproducibility of our optimized approach, as well as its applicability to archived clinical specimens stored for up to 35 years. We then evaluated this cDNA library preparation protocol by performing a miRNA expression analysis of archived breast ductal carcinoma in situ (DCIS) specimens, selected for their relation to the risk of subsequent breast cancer development and obtained from six different institutions. Our analyses identified six miRNAs (miR-29a, miR-221, miR-375, miR-184, miR-363, miR-455-5p) differentially expressed between DCIS lesions from women who subsequently developed an invasive breast cancer (cases) and women who did not develop invasive breast cancer within the same time interval (control). Our thorough evaluation and application of this laboratory-based miRNA sequencing analysis indicates that the preparation of small RNA cDNA libraries can reliably be performed on older, archived, clinically-classified specimens. PMID:28335433

  5. cDNA library generation from ribonucleoprotein particles.

    Science.gov (United States)

    Rederstorff, Mathieu; Hüttenhofer, Alexander

    2011-02-01

    Most, if not all, known noncoding RNAs (ncRNAs) are associated with RNA binding proteins, thus forming ribonucleoprotein particles or RNPs. Here we describe a protocol for the generation of a specialized cDNA library from RNPs, thereby increasing the proportion of functional ncRNA species in the library. To that end, cellular extracts are fractionated on 10-30% glycerol gradients. Subsequently, RNP-derived ncRNAs are isolated and 3'-tailed by cytidine triphosphate and poly(A) polymerase; this is followed by 5' adapter ligation by T4 RNA ligase. Reverse transcription of ncRNAs into cDNAs is carried out with an oligo-d(G) anchor primer. The generated cDNA libraries are subsequently submitted to high-throughput sequencing. This RNP selection procedure increases the probability of the presence of biologically relevant ncRNA species in the library compared with libraries generation methods that use size-selected, protein-devoid ncRNAs. The protocol enables the generation of deep-sequencing-compatible cDNA libraries that code for functional ncRNAs within 1 week.

  6. Ultra-deep sequencing reveals the subclonal structure and genomic evolution of oral squamous cell carcinoma

    DEFF Research Database (Denmark)

    Tabatabaeifar, Siavosh; Thomassen, Mads; Larsen, Martin Jakob

    Background: Oral squamous cell carcinoma (OSCC), a subgroup of head and neck squamous cell carcinoma (HNSCC), is primarily caused by alcohol consumption and tobacco use. Recent DNA sequencing studies suggest that HNSCC are very heterogeneous between patients; however, the intra-patient subclonal...... structure remains unexplored due to a lack of sampling of multiple tumor biopsies from each patient. Materials and methods: To examine the clonal structure and describe the genomic cancer evolution we applied whole-exome sequencing combined with targeted ultra-deep sequencing on biopsies from 5 stage IV...... OSCC patients. From each patient, a series of biopsies were sampled from 3 distinct geographical sites in primary tumor and 1 lymph node metastasis. A whole blood sample was taken as the matched reference. Results and discussion: Our results demonstrate that ultra-deep sequencing gives a level...

  7. Porcine transcriptome analysis based on 97 non-normalized cDNA libraries and assembly of 1,021,891 expressed sequence tags

    DEFF Research Database (Denmark)

    Gorodkin, Jan; Cirera, Susanna; Hedegaard, Jacob

    2007-01-01

    with the greatest number of different expressed genes, whereas tissues with more specialized function, such as developing liver, have fewer expressed genes. There are at least 65 high confidence housekeeping gene candidates and 876 cDNA library-specific gene candidates. We identified differential expression...

  8. The tetramethylammonium chloride method for screening of cDNA libraries using highly degenerate oligonucleotides obtained by backtranslation of amino-acid sequences

    DEFF Research Database (Denmark)

    Honoré, B; Madsen, Peder; Leffers, H

    1993-01-01

    We describe a method for screening of cDNA libraries with highly degenerate oligonucleotides using tetramethylammonium chloride (TMAC). This method is a convenient alternative to using probes generated by the polymerase chain reaction (PCR), especially when these cannot easily be made. Nylon...

  9. Porcine transcriptome analysis based on 97 non-normalized cDNA libraries and assembly of 1,021,891 expressed sequence tags

    DEFF Research Database (Denmark)

    Gorodkin, Jan; Cirera, Susanna; Hedegaard, Jacob

    2007-01-01

    public databases. The Sino-Danish ESTs were generated from one normalized and 97 non-normalized cDNA libraries representing 35 different tissues and three developmental stages. RESULTS: Using the Distiller package, the ESTs were assembled to roughly 48,000 contigs and 73,000 singletons, of which...

  10. Deep sequencing of the Camellia sinensis transcriptome revealed candidate genes for major metabolic pathways of tea-specific compounds

    Energy Technology Data Exchange (ETDEWEB)

    Shi, CY; Yang, H; Wei, CL; Yu, O; Zhang, ZZ; Sun, J; Wan, XC

    2011-01-01

    Tea is one of the most popular non-alcoholic beverages worldwide. However, the tea plant, Camellia sinensis, is difficult to culture in vitro, to transform, and has a large genome, rendering little genomic information available. Recent advances in large-scale RNA sequencing (RNA-seq) provide a fast, cost-effective, and reliable approach to generate large expression datasets for functional genomic analysis, which is especially suitable for non-model species with un-sequenced genomes. Using high-throughput Illumina RNA-seq, the transcriptome from poly(A)+ RNA of C. sinensis was analyzed at an unprecedented depth (2.59 gigabase pairs). Approximately 34.5 million reads were obtained, trimmed, and assembled into 127,094 unigenes, with an average length of 355 bp and an N50 of 506 bp, which consisted of 788 contig clusters and 126,306 singletons. This number of unigenes was 10-fold higher than the number of existing C. sinensis sequences deposited in GenBank (as of August 2010). Sequence similarity analyses against six public databases (Uniprot, NR and COGs at NCBI, Pfam, InterPro and KEGG) found 55,088 unigenes that could be annotated with gene descriptions, conserved protein domains, or gene ontology terms. Some of the unigenes were assigned to putative metabolic pathways. Targeted searches using these annotations identified the majority of genes associated with several primary metabolic pathways and natural product pathways that are important to tea quality, such as flavonoid, theanine and caffeine biosynthesis pathways. Novel candidate genes of these secondary pathways were discovered. Comparisons with four previously prepared cDNA libraries revealed that this transcriptome dataset has both a high degree of consistency with previous EST data and an approximately 20-fold increase in coverage. Thirteen unigenes related to theanine and flavonoid synthesis were validated. Their expression patterns in different organs of the tea plant were analyzed by RT-PCR and quantitative real
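
    The abstract above reports an N50 of 506 bp alongside the average unigene length. As a reminder of what that statistic measures, here is a minimal sketch of the usual N50 definition (the length L such that sequences of length L or longer contain at least half of all assembled bases). The function name and the toy lengths are illustrative assumptions, not taken from the study.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    Illustrative helper (not from the cited study): walk the lengths from
    longest to shortest and return the first length at which the running
    total reaches at least half of the summed length.
    """
    if not lengths:
        raise ValueError("need at least one sequence length")
    half_total = sum(lengths) / 2
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half_total:
            return length


if __name__ == "__main__":
    # Toy assembly: total is 20 bases, so half is 10; 6 + 5 = 11 >= 10.
    print(n50([2, 3, 4, 5, 6]))
```

    Because N50 weights long sequences more heavily than the mean does, an assembly dominated by short singletons can plausibly show an N50 well above its average length, as in the figures quoted above.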

  11. Deep sequencing of the Camellia sinensis transcriptome revealed candidate genes for major metabolic pathways of tea-specific compounds

    Directory of Open Access Journals (Sweden)

    Chen Qi

    2011-02-01

    Abstract Background Tea is one of the most popular non-alcoholic beverages worldwide. However, the tea plant, Camellia sinensis, is difficult to culture in vitro, to transform, and has a large genome, rendering little genomic information available. Recent advances in large-scale RNA sequencing (RNA-seq) provide a fast, cost-effective, and reliable approach to generate large expression datasets for functional genomic analysis, which is especially suitable for non-model species with un-sequenced genomes. Results Using high-throughput Illumina RNA-seq, the transcriptome from poly(A)+ RNA of C. sinensis was analyzed at an unprecedented depth (2.59 gigabase pairs). Approximately 34.5 million reads were obtained, trimmed, and assembled into 127,094 unigenes, with an average length of 355 bp and an N50 of 506 bp, which consisted of 788 contig clusters and 126,306 singletons. This number of unigenes was 10-fold higher than the number of existing C. sinensis sequences deposited in GenBank (as of August 2010). Sequence similarity analyses against six public databases (Uniprot, NR and COGs at NCBI, Pfam, InterPro and KEGG) found 55,088 unigenes that could be annotated with gene descriptions, conserved protein domains, or gene ontology terms. Some of the unigenes were assigned to putative metabolic pathways. Targeted searches using these annotations identified the majority of genes associated with several primary metabolic pathways and natural product pathways that are important to tea quality, such as flavonoid, theanine and caffeine biosynthesis pathways. Novel candidate genes of these secondary pathways were discovered. Comparisons with four previously prepared cDNA libraries revealed that this transcriptome dataset has both a high degree of consistency with previous EST data and an approximately 20-fold increase in coverage. Thirteen unigenes related to theanine and flavonoid synthesis were validated. Their expression patterns in different organs of the tea plant were

  12. A simple method for the parallel deep sequencing of full influenza A genomes

    DEFF Research Database (Denmark)

    Kampmann, Marie-Louise; Fordyce, Sarah Louise; Avila Arcos, Maria del Carmen

    2011-01-01

    Given the major threat of influenza A to human and animal health, and its ability to evolve rapidly through mutation and reassortment, tools that enable its timely characterization are necessary to help monitor its evolution and spread. For this purpose, deep sequencing can be a very valuable too...

  13. Workup of Human Blood Samples for Deep Sequencing of HIV-1 Genomes

    NARCIS (Netherlands)

    Cornelissen, Marion; Gall, Astrid; van der Kuyl, Antoinette; Wymant, Chris; Blanquart, François; Fraser, Christophe; Berkhout, Ben

    2018-01-01

    We describe a detailed protocol for the manual workup of blood (plasma/serum) samples from individuals infected with the human immunodeficiency virus type 1 (HIV-1) for deep sequence analysis of the viral genome. The study optimizing the assay was performed in the context of the BEEHIVE (Bridging

  14. Efficient forward propagation of time-sequences in convolutional neural networks using Deep Shifting

    NARCIS (Netherlands)

    K.L. Groenland (Koen); S.M. Bohte (Sander)

    2016-01-01

    When a Convolutional Neural Network is used for on-the-fly evaluation of continuously updating time-sequences, many redundant convolution operations are performed. We propose the method of Deep Shifting, which remembers previously calculated results of convolution operations in order

  15. VirusDetect: An automated pipeline for efficient virus discovery using deep sequencing of small RNAs

    Science.gov (United States)

    Accurate detection of viruses in plants and animals is critical for agriculture production and human health. Deep sequencing and assembly of virus-derived siRNAs has proven to be a highly efficient approach for virus discovery. However, to date no computational tools specifically designed for both k...

  16. Deep sequencing of voodoo lily (Amorphophallus konjac): an approach to identify relevant genes involved in the synthesis of the hemicellulose glucomannan.

    Science.gov (United States)

    Gille, Sascha; Cheng, Kun; Skinner, Mary E; Liepman, Aaron H; Wilkerson, Curtis G; Pauly, Markus

    2011-09-01

    A Roche 454 cDNA deep sequencing experiment was performed on a developing corm of Amorphophallus konjac--also known as voodoo lily. The dominant storage polymer in the corm of this plant is the polysaccharide glucomannan, a hemicellulose known to exist in the cell walls of higher plants and a major component of plant biomass derived from softwoods. A total of 246 mega base pairs of sequence data was obtained from which 4,513 distinct contigs were assembled. Within this voodoo lily expressed sequence tag collection genes representing the carbohydrate related pathway of glucomannan biosynthesis were identified, including sucrose metabolism, nucleotide sugar conversion pathways for the formation of activated precursors as well as a putative glucomannan synthase. In vivo expression of the putative glucomannan synthase and subsequent in vitro activity assays unambiguously demonstrate that the enzyme has indeed glucomannan mannosyl- and glucosyl transferase activities. Based on the expressed sequence tag analysis hitherto unknown pathways for the synthesis of GDP-glucose, a necessary precursor for glucomannan biosynthesis, could be proposed. Moreover, the results highlight transcriptional bottlenecks for the synthesis of this hemicellulose.

  17. Enhanced arbovirus surveillance with deep sequencing: Identification of novel rhabdoviruses and bunyaviruses in Australian mosquitoes.

    Science.gov (United States)

    Coffey, Lark L; Page, Brady L; Greninger, Alexander L; Herring, Belinda L; Russell, Richard C; Doggett, Stephen L; Haniotis, John; Wang, Chunlin; Deng, Xutao; Delwart, Eric L

    2014-01-05

    Viral metagenomics characterizes known and identifies unknown viruses based on sequence similarities to any previously sequenced viral genomes. A metagenomics approach was used to identify virus sequences in Australian mosquitoes causing cytopathic effects in inoculated mammalian cell cultures. Sequence comparisons revealed strains of Liao Ning virus (Reovirus, Seadornavirus), previously detected only in China, livestock-infecting Stretch Lagoon virus (Reovirus, Orbivirus), two novel dimarhabdoviruses, named Beaumont and North Creek viruses, and two novel orthobunyaviruses, named Murrumbidgee and Salt Ash viruses. The novel virus proteomes diverged by ≥ 50% relative to their closest previously genetically characterized viral relatives. Deep sequencing also generated genomes of Warrego and Wallal viruses, orbiviruses linked to kangaroo blindness, whose genomes had not been fully characterized. This study highlights viral metagenomics in concert with traditional arbovirus surveillance to characterize known and new arboviruses in field-collected mosquitoes. Follow-up epidemiological studies are required to determine whether the novel viruses infect humans. © 2013 Elsevier Inc. All rights reserved.

  18. Ultra-deep sequencing of intra-host rabies virus populations during cross-species transmission.

    Science.gov (United States)

    Borucki, Monica K; Chen-Harris, Haiyin; Lao, Victoria; Vanier, Gilda; Wadford, Debra A; Messenger, Sharon; Allen, Jonathan E

    2013-11-01

    One of the hurdles to understanding the role of viral quasispecies in RNA virus cross-species transmission (CST) events is the need to analyze a densely sampled outbreak using deep sequencing in order to measure the amount of mutation occurring on a small time scale. In 2009, the California Department of Public Health reported a dramatic increase (350) in the number of gray foxes infected with a rabies virus variant for which striped skunks serve as a reservoir host in Humboldt County. To better understand the evolution of rabies, deep sequencing was applied to 40 unpassaged rabies virus samples from the Humboldt outbreak. For each sample, approximately 11 kb of the 12 kb genome was amplified and sequenced using the Illumina platform. Average coverage was 17,448 and this allowed characterization of the rabies virus population present in each sample at unprecedented depths. Phylogenetic analysis of the consensus sequence data demonstrated that samples clustered according to date (1995 vs. 2009) and geographic location (northern vs. southern). A single amino acid change in the G protein distinguished a subset of northern foxes from a haplotype present in both foxes and skunks, suggesting this mutation may have played a role in the observed increased transmission among foxes in this region. Deep-sequencing data indicated that many genetic changes associated with the CST event occurred prior to 2009, since several nonsynonymous mutations that were present in the consensus sequences of skunk and fox rabies samples obtained from 2003-2010 were present at the sub-consensus level (as rare variants in the viral population) in skunk and fox samples from 1995. These results suggest that analysis of rare variants within a viral population may yield clues to ancestral genomes and identify rare variants that have the potential to be selected for if environmental conditions change.

  19. Ultra-deep sequencing of intra-host rabies virus populations during cross-species transmission.

    Directory of Open Access Journals (Sweden)

    Monica K Borucki

    2013-11-01

    One of the hurdles to understanding the role of viral quasispecies in RNA virus cross-species transmission (CST) events is the need to analyze a densely sampled outbreak using deep sequencing in order to measure the amount of mutation occurring on a small time scale. In 2009, the California Department of Public Health reported a dramatic increase (350) in the number of gray foxes infected with a rabies virus variant for which striped skunks serve as a reservoir host in Humboldt County. To better understand the evolution of rabies, deep sequencing was applied to 40 unpassaged rabies virus samples from the Humboldt outbreak. For each sample, approximately 11 kb of the 12 kb genome was amplified and sequenced using the Illumina platform. Average coverage was 17,448 and this allowed characterization of the rabies virus population present in each sample at unprecedented depths. Phylogenetic analysis of the consensus sequence data demonstrated that samples clustered according to date (1995 vs. 2009) and geographic location (northern vs. southern). A single amino acid change in the G protein distinguished a subset of northern foxes from a haplotype present in both foxes and skunks, suggesting this mutation may have played a role in the observed increased transmission among foxes in this region. Deep-sequencing data indicated that many genetic changes associated with the CST event occurred prior to 2009, since several nonsynonymous mutations that were present in the consensus sequences of skunk and fox rabies samples obtained from 2003-2010 were present at the sub-consensus level (as rare variants in the viral population) in skunk and fox samples from 1995. These results suggest that analysis of rare variants within a viral population may yield clues to ancestral genomes and identify rare variants that have the potential to be selected for if environmental conditions change.

  20. A comprehensive deep sequencing strategy for full-length genomes of influenza A.

    Directory of Open Access Journals (Sweden)

    Dirk Höper

    Driven by the impact of influenza A viruses on human and animal health, much research is conducted on this pathogen. To support this research, we designed an all influenza A-embracing reverse transcription-PCR (RT-PCR) for the generation of DNA from influenza A virus negative strand RNA genome segments for full-length genome deep sequencing on a Genome Sequencer FLX instrument. For high reliability, the RT-PCRs are designed such that every genome segment is divided into two amplicons, and for the most variable segments redundancy is included. Moreover, to minimize the risk of contamination of diagnostic real-time PCRs by sequencing amplicons, the RT-PCR does not generate amplicons that are amenable to RT-qPCR detection. With the presented protocol we were able to generate virtually all amplicons (99.3% success rate) from isolates representing all 16 hemagglutinin and 9 neuraminidase subtypes known so far and from an additional 2009 pandemic influenza A H1N1 virus. Three isolates were sequenced to analyze the suitability of the DNA for sequencing. Moreover, we provide a short R script that disambiguates the sequences of the primers used. We show that using unambiguous primer sequences for read trimming prior to assembly with the genome sequencer assembler software results in higher quality of the final genome sequences. Using the disambiguated primer sequences, high quality full-length sequences for the three isolates used for sequencing trials could be established from the raw data in de novo assemblies.
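
    The abstract above mentions an R script that disambiguates primer sequences before read trimming. The authors' script is not reproduced here; the following is only a rough Python sketch of what such disambiguation could look like, assuming it means expanding IUPAC ambiguity codes into every concrete A/C/G/T sequence. The function name and the example primer are hypothetical.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes mapped to the bases they cover.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
    "N": "ACGT",
}


def disambiguate(primer):
    """Expand a degenerate primer into all unambiguous sequences.

    Hypothetical helper: an assumption about what "disambiguating" a
    primer means, not the published R script.
    """
    choices = [IUPAC[base] for base in primer.upper()]
    return ["".join(combo) for combo in product(*choices)]


if __name__ == "__main__":
    # Made-up degenerate primer with two ambiguous positions (R and Y).
    for sequence in disambiguate("ACRYT"):
        print(sequence)
```

    The number of expanded sequences grows multiplicatively with each ambiguous position, so a trimming step built on this idea would presumably be practical only for primers with a handful of degenerate bases.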

  1. Metagenomes obtained by "deep sequencing" - what do they tell about the EBPR communities

    DEFF Research Database (Denmark)

    Albertsen, Mads; Saunders, Aaron Marc; Nielsen, Kåre Lehmann

    Albertsen Keywords: Metagenomics; Accumulibacter; Micro-diversity; Enhanced Biological Phosphorus Removal Introduction Metagenomics, or environmental genomics, provides comprehensive information about the entire microbial community of a certain ecosystem, e.g. a wastewater treatment plant. So far......, metagenomic analyses have been hampered by high costs and high level of expertise needed to conduct the investigations, but it is changing now with development of new technologies allowing analyses of billions of DNA sequences (deep-sequencing) and user-friendly pipelines for analyses of the huge data sets...... in Albertsen et al., (2011). Results and Discussion We sequenced two metagenomes from Aalborg East and West EBPR wastewater treatment plants at a depth of 12 and 8 Gb using Illumina short read sequencing. The EBPR plants form a distinct group when compared to metagenomes from a wide range of environments, both...

  2. Modeling positional effects of regulatory sequences with spline transformations increases prediction accuracy of deep neural networks.

    Science.gov (United States)

    Avsec, Žiga; Barekatain, Mohammadamin; Cheng, Jun; Gagneur, Julien

    2017-11-16

    Regulatory sequences are not solely defined by their nucleic acid sequence but also by their relative distances to genomic landmarks such as transcription start site, exon boundaries, or polyadenylation site. Deep learning has become the approach of choice for modeling regulatory sequences because of its strength to learn complex sequence features. However, modeling relative distances to genomic landmarks in deep neural networks has not been addressed. Here we developed spline transformation, a neural network module based on splines to flexibly and robustly model distances. Modeling distances to various genomic landmarks with spline transformations significantly increased state-of-the-art prediction accuracy of in vivo RNA-binding protein binding sites for 120 out of 123 proteins. We also developed a deep neural network for human splice branchpoint based on spline transformations that outperformed the current best, already distance-based, machine learning model. Compared to piecewise linear transformation, as obtained by composition of rectified linear units, spline transformation yields higher prediction accuracy as well as faster and more robust training. As spline transformation can be applied to further quantities beyond distances, such as methylation or conservation, we foresee it as a versatile component in the genomics deep learning toolbox. Spline transformation is implemented as a Keras layer in the CONCISE python package: https://github.com/gagneurlab/concise. Analysis code is available at goo.gl/3yMY5w. avsec@in.tum.de; gagneur@in.tum.de. Supplementary data are available at Bioinformatics online. © The Author(s) 2017. Published by Oxford University Press.

  3. The Interlaboratory RObustness of Next-generation sequencing (IRON) study: a deep sequencing investigation of TET2, CBL and KRAS mutations by an international consortium involving 10 laboratories.

    NARCIS (Netherlands)

    Kohlmann, A.; Klein, H.U.; Weissmann, S.; Bresolin, S.; Chaplin, T.; Cuppens, H.; Haschke-Becher, E.; Garicochea, B.; Grossmann, V.; Hanczaruk, B.; Hebestreit, K.; Gabriel, C.; Iacobucci, I.; Jansen, J.H.; Kronnie, G. Te; Locht, L.T. van de; Martinelli, G.; McGowan, K.; Schweiger, M.R.; Timmermann, B.; Vandenberghe, P.; Young, B.D.; Dugas, M.; Haferlach, T.

    2011-01-01

    Massively parallel pyrosequencing allows sensitive deep sequencing to detect molecular aberrations. Thus far, data are limited on the technical performance in a clinical diagnostic setting. Here, we investigated as an international consortium the robustness, precision and reproducibility of amplicon

  4. Deep Sequencing Insights in Therapeutic shRNA Processing and siRNA Target Cleavage Precision

    Directory of Open Access Journals (Sweden)

    Hubert Denise

    2014-01-01

    TT-034 (PF-05095808) is a recombinant adeno-associated virus serotype 8 (AAV8) agent expressing three short hairpin RNA (shRNA) pro-drugs that target the hepatitis C virus (HCV) RNA genome. The cytosolic enzyme Dicer cleaves each shRNA into multiple, potentially active small interfering RNA (siRNA) drugs. Using next-generation sequencing (NGS) to identify and characterize active shRNA maturation products, we observed that each TT-034–encoded shRNA could be processed into as many as 95 separate siRNA strands. Few of these appeared active as determined by Sanger 5′ RNA Ligase-Mediated Rapid Amplification of cDNA Ends (5-RACE) and through synthetic shRNA and siRNA analogue studies. Moreover, NGS scrutiny applied on 5-RACE products (RACE-seq) suggested that synthetic siRNAs could direct cleavage in not one, but up to five separate positions on targeted RNA, in a sequence-dependent manner. These data support an on-target mechanism of action for TT-034 without cytotoxicity and question the accepted precision of substrate processing by the key RNA interference (RNAi) enzymes Dicer and siRNA-induced silencing complex (siRISC).

  5. Continuous Distributed Representation of Biological Sequences for Deep Proteomics and Genomics.

    Directory of Open Access Journals (Sweden)

    Ehsaneddin Asgari

    We introduce a new representation and feature extraction method for biological sequences. Named bio-vectors (BioVec) to refer to biological sequences in general, with protein-vectors (ProtVec) for proteins (amino-acid sequences) and gene-vectors (GeneVec) for gene sequences, this representation can be widely used in applications of deep learning in proteomics and genomics. In the present paper, we focus on protein-vectors that can be utilized in a wide array of bioinformatics investigations such as family classification, protein visualization, structure prediction, disordered protein identification, and protein-protein interaction prediction. In this method, we adopt artificial neural network approaches and represent a protein sequence with a single dense n-dimensional vector. To evaluate this method, we apply it in classification of 324,018 protein sequences obtained from Swiss-Prot belonging to 7,027 protein families, where an average family classification accuracy of 93%±0.06% is obtained, outperforming existing family classification methods. In addition, we use ProtVec representation to predict disordered proteins from structured proteins. Two databases of disordered sequences are used: the DisProt database as well as a database featuring the disordered regions of nucleoporins rich with phenylalanine-glycine repeats (FG-Nups). Using support vector machine classifiers, FG-Nup sequences are distinguished from structured protein sequences found in Protein Data Bank (PDB) with a 99.8% accuracy, and unstructured DisProt sequences are differentiated from structured DisProt sequences with 100.0% accuracy. These results indicate that by only providing sequence data for various proteins into this model, accurate information about protein structure can be determined. Importantly, this model needs to be trained only once and can then be applied to extract a comprehensive set of information regarding proteins of interest. Moreover, this representation can be

  6. CPSS: a computational platform for the analysis of small RNA deep sequencing data.

    Science.gov (United States)

    Zhang, Yuanwei; Xu, Bo; Yang, Yifan; Ban, Rongjun; Zhang, Huan; Jiang, Xiaohua; Cooke, Howard J; Xue, Yu; Shi, Qinghua

    2012-07-15

    Next generation sequencing (NGS) techniques have been widely used to document the small ribonucleic acids (RNAs) implicated in a variety of biological, physiological and pathological processes. An integrated computational tool is needed for handling and analysing the enormous datasets from small RNA deep sequencing approach. Herein, we present a novel web server, CPSS (a computational platform for the analysis of small RNA deep sequencing data), designed to completely annotate and functionally analyse microRNAs (miRNAs) from NGS data on one platform with a single data submission. Small RNA NGS data can be submitted to this server with analysis results being returned in two parts: (i) annotation analysis, which provides the most comprehensive analysis for small RNA transcriptome, including length distribution and genome mapping of sequencing reads, small RNA quantification, prediction of novel miRNAs, identification of differentially expressed miRNAs, piwi-interacting RNAs and other non-coding small RNAs between paired samples and detection of miRNA editing and modifications and (ii) functional analysis, including prediction of miRNA targeted genes by multiple tools, enrichment of gene ontology terms, signalling pathway involvement and protein-protein interaction analysis for the predicted genes. CPSS, a ready-to-use web server that integrates most functions of currently available bioinformatics tools, provides all the information wanted by the majority of users from small RNA deep sequencing datasets. CPSS is implemented in PHP/PERL+MySQL+R and can be freely accessed at http://mcg.ustc.edu.cn/db/cpss/index.html or http://mcg.ustc.edu.cn/sdap1/cpss/index.html.

  7. Evaluation of Nine Somatic Variant Callers for Detection of Somatic Mutations in Exome and Targeted Deep Sequencing Data

    DEFF Research Database (Denmark)

    Krøigård, Anne Bruun; Thomassen, Mads; Lænkholm, Anne-Vibeke

    2016-01-01

    callers, as for some callers, increased sequencing depth highly improved sensitivity. For SNV calling, we report EBCall, Mutect, Virmid and Strelka to be the most reliable somatic variant callers for both exome sequencing and targeted deep sequencing. For indel calling, EBCall is superior due to high...

  8. miRBase: annotating high confidence microRNAs using deep sequencing data.

    Science.gov (United States)

    Kozomara, Ana; Griffiths-Jones, Sam

    2014-01-01

    We describe an update of the miRBase database (http://www.mirbase.org/), the primary microRNA sequence repository. The latest miRBase release (v20, June 2013) contains 24 521 microRNA loci from 206 species, processed to produce 30 424 mature microRNA products. The rate of deposition of novel microRNAs and the number of researchers involved in their discovery continue to increase, driven largely by small RNA deep sequencing experiments. In the face of these increases, and a range of microRNA annotation methods and criteria, maintaining the quality of the microRNA sequence data set is a significant challenge. Here, we describe recent developments of the miRBase database to address this issue. In particular, we describe the collation and use of deep sequencing data sets to assign levels of confidence to miRBase entries. We now provide a high confidence subset of miRBase entries, based on the pattern of mapped reads. The high confidence microRNA data set is available alongside the complete microRNA collection at http://www.mirbase.org/. We also describe embedding microRNA-specific Wikipedia pages on the miRBase website to encourage the microRNA community to contribute and share textual and functional information.

  9. Ultra-deep sequencing reveals the subclonal structure and genomic evolution of oral squamous cell carcinoma

    DEFF Research Database (Denmark)

    Tabatabaeifar, Siavosh; Thomassen, Mads; Larsen, Martin Jakob

    Background: Oral squamous cell carcinoma (OSCC), a subgroup of head and neck squamous cell carcinoma (HNSCC), is primarily caused by alcohol consumption and tobacco use. Recent DNA sequencing studies suggest that HNSCC are very heterogeneous between patients; however, the intra-patient subclonal...... structure remains unexplored due to lack of sampling multiple tumor biopsies from each patient. Materials and methods: To examine the clonal structure and describe the genomic cancer evolution we applied whole-exome sequencing combined with targeted ultra-deep sequencing on biopsies from 5 stage IV...... of unprecedented high resolution enabling clear detection of subclonal structure and observation of otherwise undetectable mutations. Furthermore, we demonstrate that OSCC show a high degree of inter-patient heterogeneity but a low degree of intra-patient/tumor heterogeneity. However, some OSCC cancers contain...

  10. Exploring the Mechanisms of Gastrointestinal Cancer Development Using Deep Sequencing Analysis

    International Nuclear Information System (INIS)

    Matsumoto, Tomonori; Shimizu, Takahiro; Takai, Atsushi; Marusawa, Hiroyuki

    2015-01-01

    Next-generation sequencing (NGS) technologies have revolutionized cancer genomics due to their high throughput sequencing capacity. Reports of the gene mutation profiles of various cancers by many researchers, including international cancer genome research consortia, have increased over recent years. In addition to detecting somatic mutations in tumor cells, NGS technologies enable us to approach the subject of carcinogenic mechanisms from new perspectives. Deep sequencing, a method of optimizing the high throughput capacity of NGS technologies, allows for the detection of genetic aberrations in small subsets of premalignant and/or tumor cells in noncancerous chronically inflamed tissues. Genome-wide NGS data also make it possible to clarify the mutational signatures of each cancer tissue by identifying the precise pattern of nucleotide alterations in the cancer genome, providing new information regarding the mechanisms of tumorigenesis. In this review, we highlight these new methods taking advantage of NGS technologies, and discuss our current understanding of carcinogenic mechanisms elucidated from such approaches

  11. High-throughput sequencing and analysis of the gill tissue transcriptome from the deep-sea hydrothermal vent mussel Bathymodiolus azoricus

    Directory of Open Access Journals (Sweden)

    Gomes Paula

    2010-10-01

    Abstract Background Bathymodiolus azoricus is a deep-sea hydrothermal vent mussel found in association with large faunal communities living in chemosynthetic environments at the bottom of the sea floor near the Azores Islands. Investigation of the exceptional physiological reactions that vent mussels have adopted in their habitat, including responses to environmental microbes, remains a difficult challenge for deep-sea biologists. In an attempt to reveal genes potentially involved in the deep-sea mussel innate immunity, we carried out a high-throughput sequence analysis of the freshly collected B. azoricus transcriptome, using gill tissue as the primary source of immune transcripts given its strategic role in filtering the surrounding waterborne, potentially infectious microorganisms. Additionally, a substantial EST data set was produced, from which a comprehensive collection of genes coding for putative proteins was organized in a dedicated database, "DeepSeaVent", the first deep-sea vent animal transcriptome database based on the 454 pyrosequencing technology. Results A normalized cDNA library from gill tissue was sequenced in a full 454 GS-FLX run, producing 778,996 sequencing reads. Assembly of the high quality reads resulted in 75,407 contigs of which 3,071 were singletons. A total of 39,425 transcripts were conceptually translated into amino acid sequences, of which 22,023 matched known proteins in the NCBI non-redundant protein database, 15,839 revealed conserved protein domains through InterPro functional classification and 9,584 were assigned Gene Ontology terms. Queries conducted within the database enabled the identification of genes putatively involved in immune and inflammatory reactions which had not been previously evidenced in the vent mussel. Their physical counterpart was confirmed by semi-quantitative Reverse-Transcription-Polymerase Chain Reactions (RT-PCR) and their RNA transcription level by quantitative PCR (q

  12. An introduction to deep learning on biological sequence data: examples and solutions.

    Science.gov (United States)

    Jurtz, Vanessa Isabell; Johansen, Alexander Rosenberg; Nielsen, Morten; Almagro Armenteros, Jose Juan; Nielsen, Henrik; Sønderby, Casper Kaae; Winther, Ole; Sønderby, Søren Kaae

    2017-11-15

    Deep neural network architectures such as convolutional and long short-term memory networks have become increasingly popular as machine learning tools in recent years. The availability of greater computational resources, more data, new algorithms for training deep models and easy-to-use libraries for implementation and training of neural networks are the drivers of this development. The use of deep learning has been especially successful in image recognition, and the tools, applications and code examples developed are in most cases centered within this field rather than within biology. Here, we aim to further the development of deep learning methods within biology by providing application examples and ready-to-apply, adaptable code templates. Given such examples, we illustrate how architectures consisting of convolutional and long short-term memory neural networks can relatively easily be designed and trained to state-of-the-art performance on three biological sequence problems: prediction of subcellular localization, protein secondary structure and the binding of peptides to MHC Class II molecules. All implementations and datasets are available online to the scientific community at https://github.com/vanessajurtz/lasagne4bio. Contact: skaaesonderby@gmail.com. Supplementary data are available at Bioinformatics online. © The Author (2017). Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com

  13. Deep sequence characterisation of a divergent HPIV-4a from an adult with prolonged influenza-like illness

    Directory of Open Access Journals (Sweden)

    Katherine E. Arden

    2015-12-01

    Deep sequencing allowed identification and genomic characterisation of a possible pathogen from an ILI as well as being an important tool to aid future understanding of the linkages between viral genetic variation, transmission and disease prognosis.

  14. Generation and analysis of a large-scale expressed sequence Tag database from a full-length enriched cDNA library of developing leaves of Gossypium hirsutum L.

    Directory of Open Access Journals (Sweden)

    Min Lin

    BACKGROUND: Cotton (Gossypium hirsutum L.) is one of the world's most economically important crops. However, its entire genome has not been sequenced, and limited resources are available in GenBank for understanding the molecular mechanisms underlying leaf development and senescence. METHODOLOGY/PRINCIPAL FINDINGS: In this study, 9,874 high-quality ESTs were generated from a normalized, full-length cDNA library derived from pooled RNA isolated from throughout leaf development during the plant blooming stage. After clustering and assembly of these ESTs, 5,191 unique sequences, representing 1,652 contigs and 3,539 singletons, were obtained. The average unique sequence length was 682 bp. Annotation of these unique sequences revealed that 84.4% showed significant homology to sequences in the NCBI non-redundant protein database, and 57.3% had significant hits to known proteins in the Swiss-Prot database. Comparative analysis indicated that our library added 2,400 ESTs and 991 unique sequences to those known for cotton. The unigenes were functionally characterized by gene ontology annotation. We identified 1,339 and 200 unigenes as potential leaf senescence-related genes and transcription factors, respectively. Moreover, nine genes related to leaf senescence and eleven MYB transcription factors were randomly selected for quantitative real-time PCR (qRT-PCR), which revealed that these genes were regulated differentially during senescence. The qRT-PCR for three GhYLSs revealed that these genes are expressed preferentially in senescent leaves. CONCLUSIONS/SIGNIFICANCE: These EST resources will provide valuable sequence information for gene expression profiling analyses and functional genomics studies to elucidate their roles, as well as for studying the mechanisms of leaf development and senescence in cotton and discovering candidate genes related to important agronomic traits of cotton. These data will also facilitate future whole-genome sequence

  15. Deep sequencing discovery of novel and conserved microRNAs in trifoliate orange (Citrus trifoliata)

    Directory of Open Access Journals (Sweden)

    Yu Huaping

    2010-07-01

    Background: MicroRNAs (miRNAs) play a critical role in post-transcriptional gene regulation and have been shown to control many genes involved in various biological and metabolic processes. There have been extensive studies to discover miRNAs and analyze their functions in model plant species, such as Arabidopsis and rice. Deep sequencing technologies have facilitated identification of species-specific or lowly expressed as well as conserved or highly expressed miRNAs in plants. Results: In this research, we used Solexa sequencing to discover new microRNAs in trifoliate orange (Citrus trifoliata), which is an important rootstock of citrus. A total of 13,106,753 reads representing 4,876,395 distinct sequences were obtained from a short RNA library generated from small RNA extracted from C. trifoliata flower and fruit tissues. Based on sequence similarity and hairpin structure prediction, we found that 156,639 reads representing 63 sequences from 42 highly conserved miRNA families have perfect matches to known miRNAs. We also identified 10 novel miRNA candidates whose precursors were all potentially generated from citrus ESTs. In addition, five miRNA* sequences were also sequenced. These sequences had not been earlier described in other plant species, and accumulation of the 10 novel miRNAs was confirmed by qRT-PCR analysis. Potential target genes were predicted for most conserved and novel miRNAs. Moreover, four target genes, including one encoding IRX12 copper ion binding/oxidoreductase and three genes encoding NB-LRR disease resistance protein, have been experimentally verified by detection of the miRNA-mediated mRNA cleavage in C. trifoliata. Conclusion: Deep sequencing of short RNAs from C. trifoliata flowers and fruits identified 10 new potential miRNAs and 42 highly conserved miRNA families, indicating that specific miRNAs exist in C. trifoliata.
These results show that regulatory miRNAs exist in agronomically important trifoliate orange
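
    The abstract above distinguishes total reads from distinct sequences and counts the reads that perfectly match known miRNAs. A minimal sketch of that bookkeeping follows, assuming reads are already adapter-trimmed strings; the function names and the toy sequences are hypothetical and are not taken from the study's pipeline.

```python
from collections import Counter


def collapse_reads(reads):
    """Collapse raw reads into distinct sequences with their read counts."""
    return Counter(r.strip().upper() for r in reads if r.strip())


def perfect_matches(collapsed, known):
    """Return (total reads, distinct sequences) that exactly match a known set."""
    hits = {seq: n for seq, n in collapsed.items() if seq in known}
    return sum(hits.values()), len(hits)


# Toy data for illustration only (not real miRNA sequences).
reads = ["AAACCCGGG", "AAACCCGGG", "UUUGGGCCC", "ACGUACGUA"]
known = {"AAACCCGGG", "UUUGGGCCC"}

collapsed = collapse_reads(reads)
total_reads, distinct_seqs = perfect_matches(collapsed, known)
```

    Collapsing first keeps the later matching step proportional to the number of distinct sequences rather than the number of reads.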

  16. Genetic variation of human papillomavirus type 16 in individual clinical specimens revealed by deep sequencing.

    Directory of Open Access Journals (Sweden)

    Iwao Kukimoto

    Viral genetic diversity within infected cells or tissues, called viral quasispecies, has been mostly studied for RNA viruses, but has also been described among DNA viruses, including human papillomavirus type 16 (HPV16) present in cervical precancerous lesions. However, the extent of HPV genetic variation in cervical specimens, and its involvement in HPV-induced carcinogenesis, remains unclear. Here, we employ deep sequencing to comprehensively analyze genetic variation in the HPV16 genome isolated from individual clinical specimens. Through overlapping full-circle PCR, approximately 8-kb DNA fragments covering the whole HPV16 genome were amplified from HPV16-positive cervical exfoliated cells collected from patients with either low-grade squamous intraepithelial lesion (LSIL) or invasive cervical cancer (ICC). Deep sequencing of the amplified HPV16 DNA enabled de novo assembly of the full-length HPV16 genome sequence for each of 7 specimens (5 LSIL and 2 ICC samples). Subsequent alignment of read sequences to the assembled HPV16 sequence revealed that 2 LSILs and 1 ICC contained nucleotide variations within E6, E1 and the non-coding region between E5 and L2, with mutation frequencies of 0.60% to 5.42%. In transient replication assays, a novel E1 mutant found in ICC, E1 Q381E, showed reduced ability to support HPV16 origin-dependent replication. In addition, partially deleted E2 genes were detected in 1 LSIL sample in a mixed state with the intact E2 gene. Thus, the methods used in this study provide a fundamental framework for investigating the influence of HPV somatic genetic variation on cervical carcinogenesis.
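
    The mutation frequencies quoted above are per-position fractions of aligned reads that differ from the assembled sequence. A minimal sketch of that calculation follows, assuming gap-free alignments given as (0-based start, read sequence) pairs; the function name and toy data are hypothetical, and the study's actual alignment and variant-calling tools are not stated in the abstract.

```python
from collections import Counter


def variant_frequencies(reference, aligned_reads):
    """Per-position fraction of covering reads whose base differs from the reference.

    aligned_reads: iterable of (start, sequence) gap-free alignments, 0-based start.
    Positions with no coverage are omitted from the result.
    """
    depth = Counter()
    mismatches = Counter()
    for start, seq in aligned_reads:
        for offset, base in enumerate(seq):
            pos = start + offset
            if pos >= len(reference):
                break  # ignore bases hanging off the end of the reference
            depth[pos] += 1
            if base != reference[pos]:
                mismatches[pos] += 1
    return {pos: mismatches[pos] / depth[pos] for pos in depth}


# Toy data for illustration only.
reference = "ACGTACGT"
aligned = [(0, "ACGT"), (0, "ACGA"), (2, "GTAC"), (2, "GAAC")]
freqs = variant_frequencies(reference, aligned)
```

    In this toy case position 3 is covered by four reads, two of which carry A instead of T, so its frequency is 0.5; real data would also need base-quality filtering to separate true variants from sequencing errors.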

  17. Protein contact prediction by integrating deep multiple sequence alignments, coevolution and machine learning.

    Science.gov (United States)

    Adhikari, Badri; Hou, Jie; Cheng, Jianlin

    2018-03-01

    In this study, we report the evaluation of the residue-residue contacts predicted by our three different methods in the CASP12 experiment, focusing on studying the impact of multiple sequence alignment, residue coevolution, and machine learning on contact prediction. The first method (MULTICOM-NOVEL) uses only traditional features (sequence profile, secondary structure, and solvent accessibility) with deep learning to predict contacts and serves as a baseline. The second method (MULTICOM-CONSTRUCT) uses our new alignment algorithm to generate deep multiple sequence alignment to derive coevolution-based features, which are integrated by a neural network method to predict contacts. The third method (MULTICOM-CLUSTER) is a consensus combination of the predictions of the first two methods. We evaluated our methods on 94 CASP12 domains. On a subset of 38 free-modeling domains, our methods achieved an average precision of up to 41.7% for top L/5 long-range contact predictions. The comparison of the three methods shows that the quality and effective depth of multiple sequence alignments, coevolution-based features, and machine learning integration of coevolution-based features and traditional features drive the quality of predicted protein contacts. On the full CASP12 dataset, the coevolution-based features alone can improve the average precision from 28.4% to 41.6%, and the machine learning integration of all the features further raises the precision to 56.3%, when top L/5 predicted long-range contacts are evaluated. And the correlation between the precision of contact prediction and the logarithm of the number of effective sequences in alignments is 0.66. © 2017 Wiley Periodicals, Inc.
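
    The "top L/5 long-range" precision reported above can be sketched as follows. The sketch assumes the common convention that long-range contacts are residue pairs separated by at least 24 positions in sequence; the function name, argument layout and toy data are hypothetical and do not reproduce the authors' evaluation code.

```python
def top_l5_long_range_precision(predicted, true_contacts, length, min_sep=24):
    """Precision of the top L/5 highest-scoring long-range predicted contacts.

    predicted: iterable of (i, j, score) residue pairs.
    true_contacts: set of (i, j) pairs with i < j.
    length: sequence length L.
    min_sep: minimum sequence separation |i - j| (24 is an assumed convention).
    """
    long_range = [
        (min(i, j), max(i, j), score)
        for i, j, score in predicted
        if abs(i - j) >= min_sep
    ]
    long_range.sort(key=lambda c: c[2], reverse=True)
    top = long_range[: max(1, length // 5)]
    if not top:
        return 0.0
    hits = sum((i, j) in true_contacts for i, j, _ in top)
    return hits / len(top)


# Toy data: a 10-residue chain, so min_sep is lowered to 4 for illustration.
predicted = [(0, 9, 0.9), (1, 8, 0.8), (2, 3, 0.99), (0, 5, 0.1)]
true_contacts = {(0, 9), (0, 5)}
precision = top_l5_long_range_precision(predicted, true_contacts, length=10, min_sep=4)
```

    Here L/5 = 2, the short-range pair (2, 3) is filtered out despite its high score, and one of the two retained predictions is a true contact, giving a precision of 0.5.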

  18. Role of normalization in the elimination of abundant myelin sequences in spinal cord cDNA libraries produced by suppression subtractive hybridization.

    Science.gov (United States)

    Lathia, K B; Yan, Z; Clapshaw, P A

    2009-12-01

    Spinal cord libraries subtracted against visual cortex using suppression subtractive hybridization (SSH) are dominated by abundant gene sequences derived from myelin elements. We compared our subtracted library results for three of these abundant sequences to published expressed sequence tag libraries that are neither normalized nor subtracted and are presumed to be representative of murine spinal cord mRNA abundance. We show that: all three abundant sequences, myelin basic protein (Mbp), proteolipid protein (Plp1) and ferritin heavy chain (Fth1), are highly expressed in spinal cord when this structure is compared to visual cortex; myelin basic protein is represented in our subtracted libraries but at a low frequency, whereas Plp1 and Fth1 represent nearly one-third of all sequences in these libraries; mirror orientation selection, a procedure designed to reduce background sequences, generates libraries very similar in abundance to SSH; proteolipid protein can be reduced in these libraries by adding Plp1 sequences to the driver in the SSH procedure and also by subtracting Plp1 directly from tester and driver. We conclude that adequate normalization is essential to reduce the presence of abundant sequences in SSH libraries.

  19. Molecular cloning of growth hormone encoding cDNA of Indian ...

    Indian Academy of Sciences (India)

    A modified rapid amplification of cDNA ends (RACE) strategy has been developed for cloning highly conserved cDNA sequences. Using this modified method, the growth hormone (GH) encoding cDNA sequences of Labeo rohita, Cirrhina mrigala and Catla catla have been cloned, characterized and overexpressed in ...

  1. Hybridization-based reconstruction of small non-coding RNA transcripts from deep sequencing data.

    Science.gov (United States)

    Ragan, Chikako; Mowry, Bryan J; Bauer, Denis C

    2012-09-01

    Recent advances in RNA sequencing technology (RNA-Seq) enable comprehensive profiling of RNAs by producing millions of short sequence reads from size-fractionated RNA libraries. Although conventional tools for detecting and distinguishing non-coding RNAs (ncRNAs) from reference-genome data can be applied to sequence data, ncRNA detection can be improved by harnessing the full information content provided by this new technology. Here we present NorahDesk, the first unbiased and universally applicable method for small ncRNA detection from RNA-Seq data. NorahDesk utilizes the coverage distribution of small RNA sequence data as well as thermodynamic assessments of secondary structure to reliably predict and annotate ncRNA classes. Using publicly available mouse sequence data from brain, skeletal muscle, testis and ovary, we evaluated our method with an emphasis on the performance for microRNAs (miRNAs) and piwi-interacting RNAs (piRNAs). We compared our method with Dario and mirDeep2 and found that NorahDesk produces longer transcripts with higher read coverage. This feature makes it the first method particularly suitable for the prediction of both known and novel piRNAs.
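
    The abstract above relies on the coverage distribution of mapped reads. As a generic illustration of that idea (not NorahDesk's actual algorithm, which the abstract does not describe), the sketch below computes per-base read depth from mapped intervals and extracts contiguous covered blocks as candidate transcript regions. The function names, the half-open interval convention and the toy data are assumptions.

```python
def coverage_profile(length, reads):
    """Per-base read depth over a region of the given length.

    reads: iterable of (start, end) half-open intervals in region coordinates.
    """
    diff = [0] * (length + 1)
    for start, end in reads:
        start, end = max(0, start), min(length, end)
        if start < end:
            diff[start] += 1   # depth rises where a read begins
            diff[end] -= 1     # and falls just past where it ends
    depth, running = [], 0
    for delta in diff[:length]:
        running += delta
        depth.append(running)
    return depth


def covered_blocks(depth, min_depth=1):
    """Contiguous (start, end) half-open blocks with depth >= min_depth."""
    blocks, start = [], None
    for pos, d in enumerate(depth):
        if d >= min_depth and start is None:
            start = pos
        elif d < min_depth and start is not None:
            blocks.append((start, pos))
            start = None
    if start is not None:
        blocks.append((start, len(depth)))
    return blocks


# Toy data for illustration only.
depth = coverage_profile(12, [(1, 5), (2, 6), (8, 10)])
blocks = covered_blocks(depth)
core = covered_blocks(depth, min_depth=2)
```

    A real tool would combine such blocks with secondary-structure scoring, as the abstract indicates, before assigning an ncRNA class.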

  2. 3′ terminal diversity of MRP RNA and other human noncoding RNAs revealed by deep sequencing

    Science.gov (United States)

    2013-01-01

    Background: Post-transcriptional 3′ end processing is a key component of RNA regulation. The abundant and essential RNA subunit of RNase MRP has been proposed to function in three distinct cellular compartments and therefore may utilize this mode of regulation. Here we employ 3′ RACE coupled with high-throughput sequencing to characterize the 3′ terminal sequences of human MRP RNA and other noncoding RNAs that form RNP complexes. Results: The 3′ terminal sequence of MRP RNA from HEK293T cells has a distinctive distribution of genomically encoded termini (including an assortment of U residues) with a portion of these selectively tagged by oligo(A) tails. This profile contrasts with the relatively homogenous 3′ terminus of an in vitro transcribed MRP RNA control and the differing 3′ terminal profiles of U3 snoRNA, RNase P RNA, and telomerase RNA (hTR). Conclusions: 3′ RACE coupled with deep sequencing provides a valuable framework for the functional characterization of 3′ terminal sequences of noncoding RNAs. PMID:24053768

  3. 3' terminal diversity of MRP RNA and other human noncoding RNAs revealed by deep sequencing.

    Science.gov (United States)

    Goldfarb, Katherine C; Cech, Thomas R

    2013-09-21

    Post-transcriptional 3' end processing is a key component of RNA regulation. The abundant and essential RNA subunit of RNase MRP has been proposed to function in three distinct cellular compartments and therefore may utilize this mode of regulation. Here we employ 3' RACE coupled with high-throughput sequencing to characterize the 3' terminal sequences of human MRP RNA and other noncoding RNAs that form RNP complexes. The 3' terminal sequence of MRP RNA from HEK293T cells has a distinctive distribution of genomically encoded termini (including an assortment of U residues) with a portion of these selectively tagged by oligo(A) tails. This profile contrasts with the relatively homogenous 3' terminus of an in vitro transcribed MRP RNA control and the differing 3' terminal profiles of U3 snoRNA, RNase P RNA, and telomerase RNA (hTR). 3' RACE coupled with deep sequencing provides a valuable framework for the functional characterization of 3' terminal sequences of noncoding RNAs.

  4. Graphical classification of DNA sequences of HLA alleles by deep learning.

    Science.gov (United States)

    Miyake, Jun; Kaneshita, Yuhei; Asatani, Satoshi; Tagawa, Seiichi; Niioka, Hirohiko; Hirano, Takashi

    2018-04-01

    Alleles of human leukocyte antigen (HLA)-A DNA are classified and displayed graphically using a deep learning method (stacked autoencoder). Nucleotide sequence data 822 bp in length, collected from the Immuno Polymorphism Database, were compressed to a two-dimensional representation and plotted. The profiles of the two-dimensional plots indicate that the alleles form clusters and can therefore be classified. The two-dimensional plot of HLA-A DNAs gives a clear outlook for characterizing the various alleles.

  5. Virus pathotype and deep sequencing of the HA gene of a low pathogenicity H7N1 avian influenza virus causing mortality in Turkeys.

    Directory of Open Access Journals (Sweden)

    Munir Iqbal

    Low pathogenicity avian influenza (LPAI) viruses of the H7 subtype generally cause mild disease in poultry. However, the evolution of a LPAI virus into a highly pathogenic avian influenza (HPAI) virus results in the generation of a virus that can cause severe disease and death. The classification of these two pathotypes is based, in part, on disease signs and death in chickens, as assessed in an intravenous pathogenicity test, but the effect of LPAI viruses in turkeys is less well understood. During an investigation of LPAI virus infection of turkeys, groups of three-week-old birds inoculated with A/chicken/Italy/1279/99 (H7N1) showed severe disease signs and died or were euthanised within seven days of infection. Virus was detected in many internal tissues and organs from culled birds. To examine the possible evolution of the infecting virus to a highly pathogenic form in these turkeys, sequence analysis of the haemagglutinin (HA) gene cleavage site was carried out by analysing multiple cDNA amplicons made from swabs and tissue sample extracts employing Sanger and Next Generation Sequencing. In addition, an RT-PCR assay to detect HPAI virus was developed. There was no evidence of the presence of HPAI virus in either the virus used as inoculum or from swabs taken from infected birds. However, a small proportion (<0.5%) of virus carried in individual tracheal or liver samples did contain a molecular signature typical of a HPAI virus at the HA cleavage site. All the signature sequences were identical and were similar to HPAI viruses collected during the Italian epizootic in 1999/2000. We assume that the detection of HPAI virus in tissue samples following infection with A/chicken/Italy/1279/99 reflected amplification of a virus present at very low levels within the mixed inoculum but, strikingly, we observed no new HPAI virus signatures in the amplified DNA analysed by deep sequencing.

  6. Analysis of 4,664 high-quality sequence-finished poplar full-length cDNA clones and their utility for the discovery of genes responding to insect feeding

    Directory of Open Access Journals (Sweden)

    Douglas Carl J

    2008-01-01

    Background: The genus Populus includes poplars, aspens and cottonwoods, which will be collectively referred to as poplars hereafter unless otherwise specified. Poplars are the dominant tree species in many forest ecosystems in the Northern Hemisphere and are of substantial economic value in plantation forestry. Poplar has been established as a model system for genomics studies of growth, development, and adaptation of woody perennial plants including secondary xylem formation, dormancy, adaptation to local environments, and biotic interactions. Results: As part of the poplar genome sequencing project and the development of genomic resources for poplar, we have generated a full-length (FL)-cDNA collection using the biotinylated CAP trapper method. We constructed four FLcDNA libraries using RNA from xylem, phloem and cambium, and green shoot tips and leaves from the P. trichocarpa Nisqually-1 genotype, as well as insect-attacked leaves of the P. trichocarpa × P. deltoides hybrid. Following careful selection of candidate cDNA clones, we used a combined strategy of paired end reads and primer walking to generate a set of 4,664 high-accuracy, sequence-verified FLcDNAs, which clustered into 3,990 putative unique genes. Mapping FLcDNAs to the poplar genome sequence combined with BLAST comparisons to previously predicted protein coding sequences in the poplar genome identified 39 FLcDNAs that likely localize to gaps in the current genome sequence assembly. Another 173 FLcDNAs mapped to the genome sequence but were not included among the previously predicted genes in the poplar genome. Comparative sequence analysis against Arabidopsis thaliana and other species in the non-redundant database of GenBank revealed that 11.5% of the poplar FLcDNAs display no significant sequence similarity to other plant proteins.
By mapping the poplar FLcDNAs against transcriptome data previously obtained with a 15.5 K cDNA microarray, we identified 153 FLcDNA clones

  7. Metatranscriptomic analysis of small RNAs present in soybean deep sequencing libraries

    Directory of Open Access Journals (Sweden)

    Lorrayne Gomes Molina

    2012-01-01

    A large number of small RNAs unrelated to the soybean genome were identified after deep sequencing of soybean small RNA libraries. A metatranscriptomic analysis was carried out to identify the origin of these sequences. Comparative analyses were made of small interfering RNAs (siRNAs) present in samples collected in open areas corresponding to soybean field plantations and in samples from soybean cultivated in greenhouses under a controlled environment. Different pathogenic, symbiotic and free-living organisms were identified from samples of both growth systems. They included viruses, bacteria and different groups of fungi. This approach can be useful not only to identify potentially unknown pathogens and pests, but also to understand the relations that soybean plants establish with microorganisms that may affect, directly or indirectly, plant health and crop production.

  8. Optimization of affinity, specificity and function of designed influenza inhibitors using deep sequencing

    Energy Technology Data Exchange (ETDEWEB)

    Whitehead, Timothy A.; Chevalier, Aaron; Song, Yifan; Dreyfus, Cyrille; Fleishman, Sarel J.; De Mattos, Cecilia; Myers, Chris A.; Kamisetty, Hetunandan; Blair, Patrick; Wilson, Ian A.; Baker, David (UWASH); (Scripps); (NRL)

    2012-06-19

    We show that comprehensive sequence-function maps obtained by deep sequencing can be used to reprogram interaction specificity and to leapfrog over bottlenecks in affinity maturation by combining many individually small contributions not detectable in conventional approaches. We use this approach to optimize two computationally designed inhibitors against H1N1 influenza hemagglutinin and, in both cases, obtain variants with subnanomolar binding affinity. The most potent of these, a 51-residue protein, is broadly cross-reactive against all influenza group 1 hemagglutinins, including human H2, and neutralizes H1N1 viruses with a potency that rivals that of several human monoclonal antibodies, demonstrating that computational design followed by comprehensive energy landscape mapping can generate proteins with potential therapeutic utility.

  9. cDNA encoding the chicken ortholog of the mouse dilute gene product. Sequence comparison reveals a myosin I subfamily with conserved C-terminal domains.

    Science.gov (United States)

    Sanders, G; Lichte, B; Meyer, H E; Kilimann, M W

    1992-10-26

    We report the cDNA-deduced primary structure of the chicken counterpart of the murine dilute gene product, a member of the myosin I family. Comparison of the chicken and mouse sequences reveals a distinct pattern of domains of high and low sequence conservation. An internal deletion of 25 amino acids probably reflects differential mRNA processing. Compared with other myosin heavy chain molecules, sequence similarity is highest with the MYO2 gene product of Saccharomyces cerevisiae. The MYO2 protein, implicated in vectorial vesicle transport, is homologous to the dilute protein over practically its entire length. In addition, the C-terminal domain of the dilute protein is highly similar to a putative glutamic acid decarboxylase sequence cloned from mouse brain. Alternatively, this closely related clone might represent an isoform of the dilute protein derived from a second gene, potentially involved in genetic conditions related to dilute.

  10. Identification of miRNAs and their target genes in developing soybean seeds by deep sequencing

    Directory of Open Access Journals (Sweden)

    Chen Shou-Yi

    2011-01-01

    Background: MicroRNAs (miRNAs) regulate gene expression by mediating gene silencing at transcriptional and post-transcriptional levels in higher plants. miRNAs and related target genes have been widely studied in model plants such as Arabidopsis and rice; however, the number of identified miRNAs in soybean (Glycine max) is limited, and global identification of the related miRNA targets has not been reported in previous research. Results: In our study, a small RNA library and a degradome library were constructed from developing soybean seeds for deep sequencing. We identified 26 new miRNAs in soybean by bioinformatic analysis and further confirmed their expression by stem-loop RT-PCR. The miRNA star sequences of 38 known miRNAs and 8 new miRNAs were also discovered, providing additional evidence for the existence of miRNAs. Through degradome sequencing, 145 and 25 genes were identified as targets of annotated miRNAs and new miRNAs, respectively. GO analysis indicated that many of the identified miRNA targets may function in soybean seed development. Additionally, a soybean homolog of Arabidopsis SUPPRESSOR OF GENE SILENCING 3 (AtSGS3) was detected as a target of the newly identified miRNA Soy_25, suggesting the presence of feedback control of miRNA biogenesis. Conclusions: We have identified large numbers of miRNAs and their related target genes through deep sequencing of a small RNA library and a degradome library. Our study provides more information about the regulatory network of miRNAs in soybean and advances our understanding of miRNA functions during seed development.

  11. Molecular cloning of lupin leghemoglobin cDNA

    DEFF Research Database (Denmark)

    Konieczny, A; Jensen, E O; Marcker, K A

    1987-01-01

    Poly(A)+ RNA isolated from root nodules of yellow lupin (Lupinus luteus, var. Ventus) has been used as a template for the construction of a cDNA library. The ds cDNA was synthesized and inserted into the HindIII site of plasmid pBR322 using synthetic HindIII linkers. Clones containing sequences...... specific for nodules were selected by differential colony hybridization using 32P-labeled cDNA synthesized either from nodule poly(A)+ RNA or from poly(A)+ RNA of uninfected root as probes. Among the recombinant plasmids, the cDNA gene for leghemoglobin was identified. The protein structure derived from...... its nucleotide sequence was consistent with the known amino acid sequence of lupin Lb II. The cloned lupin Lb cDNA hybridized to poly(A)+ RNA from nodules only, which is in accordance with the general concept that leghemoglobin is expressed exclusively in nodules.

  12. Generation and analysis of a large-scale expressed sequence tags from a full-length enriched cDNA library of Siberian tiger (Panthera tigris altaica).

    Science.gov (United States)

    Guo, Yu; Liu, Changqing; Lu, Taofeng; Liu, Dan; Bai, Chunyu; Li, Xiangchen; Ma, Yuehui; Guan, Weijun

    2014-05-15

    In this study, a full-length enriched cDNA library was successfully constructed from Siberian tiger, the world's most endangered species. The titers of the primary and amplified libraries were 1.28×10^6 pfu/mL and 1.59×10^10 pfu/mL, respectively. The proportion of recombinants from the unamplified library was 91.3% and the average length of exogenous inserts was 1.06 kb. A total of 279 individual ESTs with sizes ranging from 316 to 1,258 bp were then analyzed. Furthermore, 204 unigenes were successfully annotated and assigned to 49 functions of the GO classification; cell (175, 85.5%), cellular process (165, 80.9%), and binding (152, 74.5%) were the dominant terms. 198 unigenes were assigned to 156 KEGG pathways, and the pathways with the most representation were metabolic pathways (18, 9.1%). The proportion pattern of each COG subcategory was similar among Panthera tigris altaica, P. tigris tigris and Homo sapiens; the "general function prediction only" cluster (44, 15.8%) represented the largest group, followed by translation, ribosomal structure and biogenesis (33, 11.8%), and replication, recombination and repair (24, 8.6%), and only 7.2% of ESTs were classified as novel genes. Moreover, the recombinant plasmid pET32a-TAT-COL6A2 was constructed, coding for the Trx-TAT-COL6A2 fusion protein with two 6× His-tags at the N- and C-termini. As measured by BCA assay, the concentration of soluble Trx-TAT-COL6A2 recombinant protein was 2.64 ± 0.18 mg/mL. This library will provide a useful platform for functional genome and transcriptome research on P. tigris and other felid animals in the future. Copyright © 2014 Elsevier B.V. All rights reserved.

  13. SECLAF: A Webserver and Deep Neural Network Design Tool for Hierarchical Biological Sequence Classification.

    Science.gov (United States)

    Szalkai, Balázs; Grolmusz, Vince

    2018-02-27

    Artificial intelligence (AI) tools are gaining more and more ground each year in bioinformatics. Learning algorithms can be trained for specific tasks by using the existing enormous biological databases, and the resulting models can be used for the high-quality classification of novel, uncategorized data in numerous areas, including biological sequence analysis. Here we introduce SECLAF, a webserver that uses deep neural networks for hierarchical biological sequence classification. By applying SECLAF to residue sequences, we have reported (Methods (2018), https://doi.org/10.1016/j.ymeth.2017.06.034) the most accurate multi-label protein classifier to date (UniProt, into 698 classes: AUC 99.99%; Gene Ontology, into 983 classes: AUC 99.45%). Our framework SECLAF can be applied to other sequence classification tasks, as we describe in the present contribution. The program SECLAF is implemented in Python and is available for download, with example datasets, at the website https://pitgroup.org/seclaf/. For Gene Ontology and UniProt based classifications a webserver is also available at the address above. Contact: grolmusz@pitgroup.org and szalkai@pitgroup.org.

  14. Deep sequencing reveals as-yet-undiscovered small RNAs in Escherichia coli

    Directory of Open Access Journals (Sweden)

    Hirano Reiko

    2011-08-01

    Background: In Escherichia coli, approximately 100 regulatory small RNAs (sRNAs) have been identified experimentally and many more have been predicted by various methods. To provide a comprehensive overview of sRNAs, we analysed the low-molecular-weight RNAs of E. coli with deep sequencing, because the regulatory RNAs in bacteria are usually 50-200 nt in length. Results: We discovered 229 novel candidate sRNAs (≥ 50 nt) with computational or experimental evidence of transcription initiation. Among them, the expression of seven intergenic sRNAs and three cis-antisense sRNAs was detected by northern blot analysis. Interestingly, five novel sRNAs are expressed from prophage regions and we note that these sRNAs have several specific characteristics. Furthermore, we conducted an evolutionary conservation analysis of the candidate sRNAs and summarised the data among closely related bacterial strains. Conclusions: This comprehensive screen for E. coli sRNAs using a deep sequencing approach has shown that many as-yet-undiscovered sRNAs are potentially encoded in the E. coli genome. We constructed the Escherichia coli Small RNA Browser (ECSBrowser; http://rna.iab.keio.ac.jp/), which integrates the data for previously identified sRNAs and the novel sRNAs found in this study.

  15. Molecular cloning of lupin leghemoglobin cDNA

    DEFF Research Database (Denmark)

    Konieczny, A; Jensen, E O; Marcker, K A

    1987-01-01

    Poly(A)+ RNA isolated from root nodules of yellow lupin (Lupinus luteus, var. Ventus) has been used as a template for the construction of a cDNA library. The ds cDNA was synthesized and inserted into the Hind III site of plasmid pBR 322 using synthetic Hind III linkers. Clones containing sequences...... its nucleotide sequence was consistent with the known amino acid sequence of lupin Lb II. The cloned lupin Lb cDNA hybridized to poly(A)+ RNA from nodules only, which is in accordance with the general concept that leghemoglobin is expressed exclusively in nodules.

  16. Deep sequencing analysis of the developing mouse brain reveals a novel microRNA

    Directory of Open Access Journals (Sweden)

    Piltz Sandra

    2011-04-01

    Full Text Available Abstract Background: MicroRNAs (miRNAs) are small non-coding RNAs that can exert multilevel inhibition/repression at a post-transcriptional or protein synthesis level during disease or development. Characterisation of miRNAs in adult mammalian brains by deep sequencing has been reported previously. However, to date, no small RNA profiling of the developing brain has been undertaken using this method. We have performed deep sequencing and small RNA analysis of a developing (E15.5) mouse brain. Results: We identified the expression of 294 known miRNAs in the E15.5 developing mouse brain, which were mostly represented by let-7 family and other brain-specific miRNAs such as miR-9 and miR-124. We also discovered 4 putative 22-23 nt miRNAs: mm_br_e15_1181, mm_br_e15_279920, mm_br_e15_96719 and mm_br_e15_294354, each with a 70-76 nt predicted pre-miRNA. We validated the 4 putative miRNAs and further characterised one of them, mm_br_e15_1181, throughout embryogenesis. Mm_br_e15_1181 biogenesis was Dicer1-dependent and was expressed in E3.5 blastocysts and E7 whole embryos. Embryo-wide expression patterns were observed at E9.5 and E11.5 followed by a near complete loss of expression by E13.5, with expression restricted to a specialised layer of cells within the developing and early postnatal brain. Mm_br_e15_1181 was upregulated during neurodifferentiation of P19 teratocarcinoma cells. This novel miRNA has been identified as miR-3099. Conclusions: We have generated and analysed the first deep sequencing dataset of small RNA sequences of the developing mouse brain. The analysis revealed a novel miRNA, miR-3099, with potential regulatory effects on early embryogenesis, and involvement in neuronal cell differentiation/function in the brain during late embryonic and early neonatal development.

  17. Single pass cDNA sequencing - a powerful tool to analyse gene expression in preparasitic juveniles of the southern root-knot nematode Meloidogyne incognita

    NARCIS (Netherlands)

    Dautova, M.; Rosso, M.N.; Abad, P.; Gommers, F.L.; Bakker, J.; Smant, G.

    2001-01-01

    Expressed sequence tags (EST) have been widely used to assist in gene discovery in various organisms (e.g., Arabidopsis thaliana, Caenorhabditis elegans, Mus musculus, and Homo sapiens). In this paper we describe an EST project, which aims to investigate gene expression in Meloidogyne incognita at

  18. Breaking the 1000-gene barrier for Mimivirus using ultra-deep genome and transcriptome sequencing

    Directory of Open Access Journals (Sweden)

    Claverie Jean-Michel

    2011-03-01

    Full Text Available Abstract Background: Mimivirus, a giant dsDNA virus infecting Acanthamoeba, is the prototype of the mimiviridae family, the latest addition to the family of the nucleocytoplasmic large DNA viruses (NCLDVs). Its 1.2 Mb-genome was initially predicted to encode 917 genes. A subsequent RNA-Seq analysis precisely mapped many transcript boundaries and identified 75 new genes. Findings: We now report a much deeper analysis using the SOLiD™ technology combining RNA-Seq of the Mimivirus transcriptome during the infectious cycle (202.4 Million reads), and a complete genome re-sequencing (45.3 Million reads). This study corrected the genome sequence and identified several single nucleotide polymorphisms. Our results also provided clear evidence of previously overlooked transcription units, including an important RNA polymerase subunit distantly related to Euryarchea homologues. The total Mimivirus gene count is now 1018, 11% greater than the original annotation. Conclusions: This study highlights the huge progress brought about by ultra-deep sequencing for the comprehensive annotation of virus genomes, opening the door to a complete one-nucleotide resolution level description of their transcriptional activity, and to the realistic modeling of the viral genome expression at the ultimate molecular level. This work also illustrates the need to go beyond bioinformatics-only approaches for the annotation of short protein and non-coding genes in viral genomes.

  19. Anchoring a Defined Sequence to the 5' Ends of mRNAs: The Bolt to Clone Rare Full Length mRNAs and Generate cDNA Libraries from a Few Cells.

    Science.gov (United States)

    Baptiste, J; Milne Edwards, D; Delort, J; Mallet, J

    1993-01-01

    Among numerous applications, the polymerase chain reaction (PCR) (1,2) provides a convenient means to clone 5' ends of rare mRNAs and to generate cDNA libraries from tissue available in amounts too low to be processed by conventional methods. Basically, the amplification of cDNAs by the PCR requires the availability of the sequences of two stretches of the molecule to be amplified. A sequence can easily be imposed at the 5' end of the first-strand cDNAs (corresponding to the 3' end of the mRNAs) by priming the reverse transcription with a specific primer (for cloning the 5' end of rare messenger) or with an oligonucleotide tailored with a poly (dT) stretch (for cDNA library construction), taking advantage of the poly (A) sequence that is located at the 3' end of mRNAs. Several strategies have been devised to tag the 3' end of the ss-cDNAs (corresponding to the 5' end of the mRNAs). We (3) and others have described strategies based on the addition of a homopolymeric dG (4,5) or dA (6,7) tail using terminal deoxyribonucleotide transferase (TdT) ("anchor-PCR" [4]). However, this strategy has important limitations. The TdT reaction is difficult to control and has a low efficiency (unpublished observations). But most importantly, the return primers containing a homopolymeric (dC or dT) tail generate nonspecific amplifications, a phenomenon that prevents the isolation of low abundance mRNA species and/or interferes with the relative abundance of primary clones in the library. To circumvent these drawbacks, we have used two approaches. First, we devised a strategy based on a cRNA enrichment procedure, which has been useful to eliminate nonspecific-PCR products and to allow detection and cloning of cDNAs of low abundance (3). More recently, to avoid the nonspecific amplification resulting from the annealing of the homopolymeric tail oligonucleotide, we have developed a novel anchoring strategy that is based on the ligation of an oligonucleotide to the 3' end of ss

  20. Advancing Eucalyptus genomics: identification and sequencing of lignin biosynthesis genes from deep-coverage BAC libraries

    Directory of Open Access Journals (Sweden)

    Kudrna David

    2011-03-01

    Full Text Available Abstract Background: Eucalyptus species are among the most planted hardwoods in the world because of their rapid growth, adaptability and valuable wood properties. The development and integration of genomic resources into breeding practice will be increasingly important in the decades to come. Bacterial artificial chromosome (BAC) libraries are key genomic tools that enable positional cloning of important traits, synteny evaluation, and the development of genome framework physical maps for genetic linkage and genome sequencing. Results: We describe the construction and characterization of two deep-coverage BAC libraries, EG_Ba and EG_Bb, obtained from nuclear DNA fragments of E. grandis (clone BRASUZ1) digested with HindIII and BstYI, respectively. Genome coverages of 17 and 15 haploid genome equivalents were estimated for EG_Ba and EG_Bb, respectively. Both libraries contained large inserts, with average sizes ranging from 135 Kb (EG_Bb) to 157 Kb (EG_Ba), and very low extra-nuclear genome contamination, providing a probability of finding a single copy gene ≥ 99.99%. Libraries were screened for the presence of several genes of interest via hybridizations to high-density BAC filters followed by PCR validation. Five selected BAC clones were sequenced and assembled using the Roche GS FLX technology, providing the whole sequence of the E. grandis chloroplast genome and complete genomic sequences of important lignin biosynthesis genes. Conclusions: The two E. grandis BAC libraries described in this study represent an important milestone for the advancement of Eucalyptus genomics and forest tree research. These BAC resources have a highly redundant genome coverage (> 15×), contain large average inserts and have a very low percentage of clones with organellar DNA or empty vectors. These publicly available BAC libraries are thus suitable for a broad range of applications in genetic and genomic research in Eucalyptus and possibly in related species of Myrtaceae.
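    The ≥ 99.99% figure quoted above can be sanity-checked with the standard Clarke-Carbon approximation, P = 1 - e^(-c), where c is the library coverage in haploid genome equivalents. The abstract does not say which formula the authors used, so this is only a minimal sketch under that assumption; the function name is illustrative.

```python
import math


def prob_clone_present(coverage: float) -> float:
    """Clarke-Carbon approximation (assumed here, not stated in the abstract):
    probability that a single-copy locus is present in at least one clone
    of a library with the given coverage in haploid genome equivalents."""
    return 1.0 - math.exp(-coverage)


# Coverage values taken from the abstract: 17x for EG_Ba, 15x for EG_Bb.
for library, coverage in (("EG_Ba", 17), ("EG_Bb", 15)):
    print(f"{library}: P = {prob_clone_present(coverage):.8f}")
```

    Under this approximation both libraries clear the 99.99% threshold, which is consistent with the claim in the abstract.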

  1. Deep Sequencing Analysis of Nucleolar Small RNAs: RNA Isolation and Library Preparation.

    Science.gov (United States)

    Bai, Baoyan; Laiho, Marikki

    2016-01-01

    The nucleolus is a subcellular compartment with an essential function in ribosome biogenesis. The nucleolus is rich in noncoding RNAs, mostly the ribosomal RNAs and small nucleolar RNAs. Surprisingly, several miRNAs have also been detected in the nucleolus, raising the question as to whether other small RNA species are present and functional in the nucleolus. We have developed a strategy for stepwise enrichment of nucleolar small RNAs from total nucleolar RNA extracts and subsequent construction of nucleolar small RNA libraries that are suitable for deep sequencing. Our method successfully isolates the small RNA population from total RNAs and monitors the RNA quality in each step to ensure that the small RNAs recovered represent the actual small RNA population in the nucleolus and not degradation products from larger RNAs. We have further applied this approach to characterize the distribution of small RNAs in different cellular compartments.

  2. Polymorphism identification and improved genome annotation of Brassica rapa through Deep RNA sequencing.

    Science.gov (United States)

    Devisetty, Upendra Kumar; Covington, Michael F; Tat, An V; Lekkala, Saradadevi; Maloof, Julin N

    2014-08-12

    The mapping and functional analysis of quantitative traits in Brassica rapa can be greatly improved with the availability of physically positioned, gene-based genetic markers and accurate genome annotation. In this study, deep transcriptome RNA sequencing (RNA-Seq) of Brassica rapa was undertaken with two objectives: SNP detection and improved transcriptome annotation. We performed SNP detection on two varieties that are parents of a mapping population to aid in development of a marker system for this population and subsequent development of a high-resolution genetic map. An improved Brassica rapa transcriptome was constructed to detect novel transcripts and to improve the current genome annotation. This is useful for accurate estimation of mRNA abundance and detection of expression QTL (eQTLs) in mapping populations. Deep RNA-Seq of two Brassica rapa genotypes, R500 (var. trilocularis, Yellow Sarson) and IMB211 (a rapid cycling variety), using eight different tissues (root, internode, leaf, petiole, apical meristem, floral meristem, silique, and seedling) grown across three different environments (growth chamber, greenhouse and field) and under two different treatments (simulated sun and simulated shade) generated 2.3 billion high-quality Illumina reads. A total of 330,995 SNPs were identified in transcribed regions between the two genotypes, with an average frequency of one SNP in every 200 bases. The deep RNA-Seq reassembled Brassica rapa transcriptome identified 44,239 protein-coding genes. Compared with current gene models of B. rapa, we detected 3537 novel transcripts, 23,754 gene models had structural modifications, and 3655 annotated proteins changed. Gaps in the current genome assembly of B. rapa are highlighted by our identification of 780 unmapped transcripts. All the SNPs, annotations, and predicted transcripts can be viewed at http://phytonetworks.ucdavis.edu/. Copyright © 2014 Devisetty et al.

  3. Molecular cloning and characterization of a cDNA encoding ...

    African Journals Online (AJOL)

    enoh

    2012-03-29

    Nanjing) co., Ltd. The nucleotide sequences of these primers are as follows: ..... Ebizuka Y (2000). Molecular cloning and characterization of a cDNA for Glycyrrhiza glabra cycloartenol synthase. Biol. Pharm. Bull. 23(2):231-234.

  4. Deep sequencing reveals low incidence of endogenous LINE-1 retrotransposition in human induced pluripotent stem cells.

    Directory of Open Access Journals (Sweden)

    Hubert Arokium

    Full Text Available Long interspersed element-1 (LINE-1 or L1) retrotransposition induces insertional mutations that can result in diseases. It was recently shown that the copy number of L1 and other retroelements is stable in induced pluripotent stem cells (iPSCs). However, by using an engineered reporter construct over-expressing L1, another study suggests that reprogramming activates L1 mobility in iPSCs. Given the potential of human iPSCs in therapeutic applications, it is important to clarify whether these cells harbor somatic insertions resulting from endogenous L1 retrotransposition. Here, we verified L1 expression during and after reprogramming as well as potential somatic insertions driven by the most active human endogenous L1 subfamily (L1Hs). Our results indicate that L1 over-expression is initiated during the reprogramming process and is subsequently sustained in isolated clones. To detect potential somatic insertions in iPSCs caused by L1Hs retrotransposition, we used a novel sequencing strategy. As opposed to conventional sequencing direction, we sequenced from the 3' end of L1Hs to the genomic DNA, thus enabling the direct detection of the polyA tail signature of retrotransposition for verification of true insertions. Deep coverage sequencing thus allowed us to detect seven potential somatic insertions with low read counts from two iPSC clones. Negative PCR amplification in parental cells, presence of a polyA tail and absence from seven L1 germline insertion databases highly suggested true somatic insertions in iPSCs. Furthermore, these insertions could not be detected in iPSCs by PCR, likely due to low abundance. We conclude that L1Hs retrotransposes at low levels in iPSCs and therefore warrants careful analyses for genotoxic effects.

  5. Deep RNA sequencing of the skeletal muscle transcriptome in swimming fish.

    Directory of Open Access Journals (Sweden)

    Arjan P Palstra

    Full Text Available Deep RNA sequencing (RNA-seq) was performed to provide an in-depth view of the transcriptome of red and white skeletal muscle of exercised and non-exercised rainbow trout (Oncorhynchus mykiss) with the specific objective to identify expressed genes and quantify the transcriptomic effects of swimming-induced exercise. Pubertal autumn-spawning seawater-raised female rainbow trout were rested (n = 10) or swum (n = 10) for 1176 km at 0.75 body-lengths per second in a 6,000-L swim-flume under reproductive conditions for 40 days. Red and white muscle RNA of exercised and non-exercised fish (4 lanes) was sequenced and resulted in 15-17 million reads per lane that, after de novo assembly, yielded 149,159 red and 118,572 white muscle contigs. Most contigs were annotated using an iterative homology search strategy against salmonid ESTs, the zebrafish Danio rerio genome and general Metazoan genes. When selecting for large contigs (>500 nucleotides), a number of novel rainbow trout gene sequences were identified in this study: 1,085 and 1,228 novel gene sequences for red and white muscle, respectively, which included a number of important molecules for skeletal muscle function. Transcriptomic analysis revealed that sustained swimming increased transcriptional activity in skeletal muscle and specifically an up-regulation of genes involved in muscle growth and developmental processes in white muscle. The unique collection of transcripts will contribute to our understanding of red and white muscle physiology, specifically during the long-term reproductive migration of salmonids.

  6. On the prediction of DNA-binding proteins only from primary sequences: A deep learning approach.

    Directory of Open Access Journals (Sweden)

    Yu-Hui Qu

    Full Text Available DNA-binding proteins play pivotal roles in alternative splicing, RNA editing, methylating and many other biological functions for both eukaryotic and prokaryotic proteomes. Predicting the functions of these proteins from primary amino acid sequences is becoming one of the major challenges in functional annotations of genomes. Traditional prediction methods often focus on extracting physicochemical features from sequences while ignoring motif information and location information between motifs. Meanwhile, small data volumes and noisy training data result in lower accuracy and reliability of predictions. In this paper, we propose a deep learning based method to identify DNA-binding proteins from primary sequences alone. It utilizes two stages of convolutional neural network to detect the function domains of protein sequences, a long short-term memory neural network to identify their long term dependencies, and binary cross entropy to evaluate the quality of the neural networks. When the proposed method is tested with a realistic DNA binding protein dataset, it achieves a prediction accuracy of 94.2% at a Matthews correlation coefficient of 0.961. Compared with LibSVM on the arabidopsis and yeast datasets via independent tests, the accuracy rises by 9% and 4% respectively. Comparative experiments using different feature extraction methods show that our model achieves accuracy similar to the best of the others, but its values of sensitivity, specificity and AUC increase by 27.83%, 1.31% and 16.21% respectively. These results suggest that our method is a promising tool for identifying DNA-binding proteins.
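    For readers unfamiliar with the two quantities named in this abstract, the following is a minimal sketch of their textbook definitions: the Matthews correlation coefficient computed from a binary confusion matrix, and the mean binary cross entropy between labels and predicted probabilities. It is not the authors' code; the function names and the clipping constant `eps` are assumptions made for illustration.

```python
import math


def matthews_corrcoef(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from confusion-matrix counts.
    Returns 0.0 when the denominator is zero (a common convention)."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


def binary_cross_entropy(y_true, y_prob, eps: float = 1e-12) -> float:
    """Mean binary cross entropy; probabilities are clipped to
    [eps, 1 - eps] so that log(0) is never evaluated."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(y_true)


# Toy inputs, not data from the study.
print(matthews_corrcoef(tp=45, tn=40, fp=10, fn=5))
print(binary_cross_entropy([1, 0, 1], [0.9, 0.2, 0.6]))
```

    A perfect classifier gives an MCC of 1 and a cross entropy near 0, while uninformative predictions of 0.5 give a cross entropy of ln 2.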

  7. Evaluation of Nine Somatic Variant Callers for Detection of Somatic Mutations in Exome and Targeted Deep Sequencing Data.

    Directory of Open Access Journals (Sweden)

    Anne Bruun Krøigård

    Full Text Available Next generation sequencing is extensively applied to catalogue somatic mutations in cancer, in research settings and increasingly in clinical settings for molecular diagnostics, guiding therapy decisions. Somatic variant callers perform paired comparisons of sequencing data from cancer tissue and matched normal tissue in order to detect somatic mutations. The advent of many new somatic variant callers creates a need for comparison and validation of the tools, as no de facto standard for detection of somatic mutations exists and only limited comparisons have been reported. We have performed a comprehensive evaluation using exome sequencing and targeted deep sequencing data of paired tumor-normal samples from five breast cancer patients to evaluate the performance of nine publicly available somatic variant callers: EBCall, Mutect, Seurat, Shimmer, Indelocator, Somatic Sniper, Strelka, VarScan 2 and Virmid for the detection of single nucleotide mutations and small deletions and insertions. We report a large variation in the number of calls from the nine somatic variant callers on the same sequencing data and highly variable agreement. Sequencing depth had markedly diverse impact on individual callers, as for some callers, increased sequencing depth highly improved sensitivity. For SNV calling, we report EBCall, Mutect, Virmid and Strelka to be the most reliable somatic variant callers for both exome sequencing and targeted deep sequencing. For indel calling, EBCall is superior due to high sensitivity and robustness to changes in sequencing depths.
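    Agreement between callers of the kind reported above can be quantified in several ways, and the abstract does not say which measure was used. As a minimal sketch, assuming each caller's output has been reduced to a set of (chromosome, position, ref, alt) tuples, pairwise Jaccard similarity gives one simple view of concordance. The caller names and variants below are hypothetical toy data.

```python
from itertools import combinations


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity of two call sets; two empty sets count as identical."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def pairwise_concordance(calls: dict) -> dict:
    """Jaccard similarity for every unordered pair of callers."""
    return {
        (x, y): jaccard(calls[x], calls[y])
        for x, y in combinations(sorted(calls), 2)
    }


# Hypothetical call sets keyed by (chrom, pos, ref, alt).
calls = {
    "caller_A": {("chr1", 100, "A", "T"), ("chr2", 200, "G", "C")},
    "caller_B": {("chr1", 100, "A", "T"), ("chr3", 300, "C", "G")},
    "caller_C": {("chr1", 100, "A", "T"), ("chr2", 200, "G", "C")},
}

for pair, score in pairwise_concordance(calls).items():
    print(pair, round(score, 3))
```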

  8. Enhanced methods for unbiased deep sequencing of Lassa and Ebola RNA viruses from clinical and biological samples.

    Science.gov (United States)

    Matranga, Christian B; Andersen, Kristian G; Winnicki, Sarah; Busby, Michele; Gladden, Adrianne D; Tewhey, Ryan; Stremlau, Matthew; Berlin, Aaron; Gire, Stephen K; England, Eleina; Moses, Lina M; Mikkelsen, Tarjei S; Odia, Ikponmwonsa; Ehiane, Philomena E; Folarin, Onikepe; Goba, Augustine; Kahn, S Humarr; Grant, Donald S; Honko, Anna; Hensley, Lisa; Happi, Christian; Garry, Robert F; Malboeuf, Christine M; Birren, Bruce W; Gnirke, Andreas; Levin, Joshua Z; Sabeti, Pardis C

    2014-01-01

    We have developed a robust RNA sequencing method for generating complete de novo assemblies with intra-host variant calls of Lassa and Ebola virus genomes in clinical and biological samples. Our method uses targeted RNase H-based digestion to remove contaminating poly(rA) carrier and ribosomal RNA. This depletion step improves both the quality of data and quantity of informative reads in unbiased total RNA sequencing libraries. We have also developed a hybrid-selection protocol to further enrich the viral content of sequencing libraries. These protocols have enabled rapid deep sequencing of both Lassa and Ebola virus and are broadly applicable to other viral genomics studies.

  9. Deep sequencing of the MHC region in the Chinese population contributes to studies of complex disease.

    Science.gov (United States)

    Zhou, Fusheng; Cao, Hongzhi; Zuo, Xianbo; Zhang, Tao; Zhang, Xiaoguang; Liu, Xiaomin; Xu, Ricong; Chen, Gang; Zhang, Yuanwei; Zheng, Xiaodong; Jin, Xin; Gao, Jinping; Mei, Junpu; Sheng, Yujun; Li, Qibin; Liang, Bo; Shen, Juan; Shen, Changbing; Jiang, Hui; Zhu, Caihong; Fan, Xing; Xu, Fengping; Yue, Min; Yin, Xianyong; Ye, Chen; Zhang, Cuicui; Liu, Xiao; Yu, Liang; Wu, Jinghua; Chen, Mengyun; Zhuang, Xuehan; Tang, Lili; Shao, Haojing; Wu, Longmao; Li, Jian; Xu, Yu; Zhang, Yijie; Zhao, Suli; Wang, Yu; Li, Ge; Xu, Hanshi; Zeng, Lei; Wang, Jianan; Bai, Mingzhou; Chen, Yanling; Chen, Wei; Kang, Tian; Wu, Yanyan; Xu, Xun; Zhu, Zhengwei; Cui, Yong; Wang, Zaixing; Yang, Chunjun; Wang, Peiguang; Xiang, Leihong; Chen, Xiang; Zhang, Anping; Gao, Xinghua; Zhang, Furen; Xu, Jinhua; Zheng, Min; Zheng, Jie; Zhang, Jianzhong; Yu, Xueqing; Li, Yingrui; Yang, Sen; Yang, Huanming; Wang, Jian; Liu, Jianjun; Hammarström, Lennart; Sun, Liangdan; Wang, Jun; Zhang, Xuejun

    2016-07-01

    The human major histocompatibility complex (MHC) region has been shown to be associated with numerous diseases. However, it remains a challenge to pinpoint the causal variants for these associations because of the extreme complexity of the region. We thus sequenced the entire 5-Mb MHC region in 20,635 individuals of Han Chinese ancestry (10,689 controls and 9,946 patients with psoriasis) and constructed a Han-MHC database that includes both variants and HLA gene typing results of high accuracy. We further identified multiple independent new susceptibility loci in HLA-C, HLA-B, HLA-DPB1 and BTNL2 and an intergenic variant, rs118179173, associated with psoriasis and confirmed the well-established risk allele HLA-C*06:02. We anticipate that our Han-MHC reference panel built by deep sequencing of a large number of samples will serve as a useful tool for investigating the role of the MHC region in a variety of diseases and thus advance understanding of the pathogenesis of these disorders.

  10. Ultra Deep Sequencing of a Baculovirus Population Reveals Widespread Genomic Variations

    Directory of Open Access Journals (Sweden)

    Aurélien Chateigner

    2015-07-01

    Full Text Available Viruses rely on widespread genetic variation and large population size for adaptation. Large DNA virus populations are thought to harbor little variation though natural populations may be polymorphic. To measure the genetic variation present in a dsDNA virus population, we deep sequenced a natural strain of the baculovirus Autographa californica multiple nucleopolyhedrovirus. With 124,221X average genome coverage of our 133,926 bp long consensus, we could detect low frequency mutations (0.025%. K-means clustering was used to classify the mutations in four categories according to their frequency in the population. We found 60 high frequency non-synonymous mutations under balancing selection distributed in all functional classes. These mutants could alter viral adaptation dynamics, either through competitive or synergistic processes. Lastly, we developed a technique for the delimitation of large deletions in next generation sequencing data. We found that large deletions occur along the entire viral genome, with hotspots located in homologous repeat regions (hrs. Present in 25.4% of the genomes, these deletion mutants presumably require functional complementation to complete their infection cycle. They might thus have a large impact on the fitness of the baculovirus population. Altogether, we found a wide breadth of genomic variation in the baculovirus population, suggesting it has high adaptive potential.
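    As a rough check on the detection limit quoted above: at 124,221X average coverage, a variant present at 0.025% of genomes would be expected in about 31 reads covering its position. The sketch below uses only numbers from the abstract; the function names are illustrative, and it ignores sequencing error, which any real low-frequency variant analysis would have to model.

```python
def expected_variant_reads(depth: float, allele_frequency: float) -> float:
    """Expected number of reads carrying a variant at one position."""
    return depth * allele_frequency


def total_sequenced_bases(depth: float, genome_length: int) -> float:
    """Approximate number of aligned bases implied by an average depth."""
    return depth * genome_length


depth = 124_221          # average genome coverage reported in the abstract
genome_length = 133_926  # consensus length in bp reported in the abstract
frequency = 0.025 / 100  # 0.025% expressed as a fraction

print(expected_variant_reads(depth, frequency))
print(total_sequenced_bases(depth, genome_length))
```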

  11. Deep RNA sequencing analysis of readthrough gene fusions in human prostate adenocarcinoma and reference samples

    Directory of Open Access Journals (Sweden)

    Nacu Serban

    2011-01-01

    Full Text Available Abstract Background: Readthrough fusions across adjacent genes in the genome, or transcription-induced chimeras (TICs), have been estimated using expressed sequence tag (EST) libraries to involve 4-6% of all genes. Deep transcriptional sequencing (RNA-Seq) now makes it possible to study the occurrence and expression levels of TICs in individual samples across the genome. Methods: We performed single-end RNA-Seq on three human prostate adenocarcinoma samples and their corresponding normal tissues, as well as brain and universal reference samples. We developed two bioinformatics methods to specifically identify TIC events: a targeted alignment method using artificial exon-exon junctions within 200,000 bp from adjacent genes, and genomic alignment allowing splicing within individual reads. We performed further experimental verification and characterization of selected TIC and fusion events using quantitative RT-PCR and comparative genomic hybridization microarrays. Results: Targeted alignment against artificial exon-exon junctions yielded 339 distinct TIC events, including 32 gene pairs with multiple isoforms. The false discovery rate was estimated to be 1.5%. Spliced alignment to the genome was less sensitive, finding only 18% of those found by targeted alignment in 33-nt reads and 59% of those in 50-nt reads. However, spliced alignment revealed 30 cases of TICs with intervening exons, in addition to distant inversions, scrambled genes, and translocations. Our findings increase the catalog of observed TIC gene pairs by 66%. We verified 6 of 6 predicted TICs in all prostate samples, and 2 of 5 predicted novel distant gene fusions, both private events among 54 prostate tumor samples tested. Expression of TICs correlates with that of the upstream gene, which can explain the prostate-specific pattern of some TIC events and the restriction of the SLC45A3-ELK4 e4-e2 TIC to ERG-negative prostate samples, as confirmed in 20 matched prostate tumor and normal

  12. [Whole cDNA sequence cloning and expression of chicken L-FABP gene and its relationship with lipid deposition of hybrid chickens].

    Science.gov (United States)

    Yu, Ying; Wang, Dong; Sun, Dong-Xiao; Xu, Gui-Yun; Li, Jun-Ying; Zhang, Yuan

    2011-07-01

    Liver fatty acid-binding protein (L-FABP) is closely related to intracellular transportation and deposition of lipids. A positive differentially displayed fragment was found in the liver tissue among Silkie (CC), CAU-brown chicken (DD), and their reciprocal hybrids (CD and DC) at 8 weeks old using differential display RT-PCR techniques (DDRT-PCR). Through recycling, sequencing, and alignment analysis, the fragment was identified as the chicken liver fatty acid-binding protein gene (L-FABP, GenBank accession number AY321365). Reverse Northern dot blot and semi-quantitative RT-PCR revealed that the avian L-FABP gene was over-expressed in the liver tissue of the reciprocal hybrids (CD and DC) compared to their parental lines (CC and DD), which was consistent with the higher abdominal fat weight and wider inter-muscular fat width observed in the reciprocal hybrids. Considering that the higher expression of L-FABP may contribute to the increased lipid deposition in the hybrid chickens, functional study of avian L-FABP is warranted in the future.

  13. Ultra-deep sequencing of mouse mitochondrial DNA: mutational patterns and their origins.

    Directory of Open Access Journals (Sweden)

    Adam Ameur

    2011-03-01

    Full Text Available Somatic mutations of mtDNA are implicated in the aging process, but there is no universally accepted method for their accurate quantification. We have used ultra-deep sequencing to study genome-wide mtDNA mutation load in the liver of normally- and prematurely-aging mice. Mice that are homozygous for an allele expressing a proof-reading-deficient mtDNA polymerase (mtDNA mutator mice) have 10-times-higher point mutation loads than their wildtype siblings. In addition, the mtDNA mutator mice have increased levels of a truncated linear mtDNA molecule, resulting in decreased sequence coverage in the deleted region. In contrast, circular mtDNA molecules with large deletions occur at extremely low frequencies in mtDNA mutator mice and can therefore not drive the premature aging phenotype. Sequence analysis shows that the main proportion of the mutation load in heterozygous mtDNA mutator mice and their wildtype siblings is inherited from their heterozygous mothers consistent with germline transmission. We found no increase in levels of point mutations or deletions in wildtype C57Bl/6N mice with increasing age, thus questioning the causative role of these changes in aging. In addition, there was no increased frequency of transversion mutations with time in any of the studied genotypes, arguing against oxidative damage as a major cause of mtDNA mutations. Our results from studies of mice thus indicate that most somatic mtDNA mutations occur as replication errors during development and do not result from damage accumulation in adult life.

  14. Transcriptome walking: a laboratory-oriented GUI-based approach to mRNA identification from deep-sequenced data.

    Science.gov (United States)

    French, Andrew S

    2012-12-05

    Deep sequencing technology provides efficient and economical production of large numbers of randomly positioned, relatively short estimates of base identities in DNA molecules. Application of this technology to mRNA samples allows rapid examination of the molecular genetic environment in individual cells or tissues, the transcriptome. However, assembly of such short sequences into complete mRNA creates a challenge that limits the usefulness of the technology, particularly when no, or limited, genomic data are available. Several approaches to this problem have been developed, but there is still no general method to rapidly obtain an mRNA sequence from deep sequence data when a specific molecule, or family of molecules, is of interest. A frequent requirement is to identify specific mRNA molecules from tissues that are being investigated by methods such as electrophysiology, immunocytology and pharmacology. To be widely useful, any approach must be relatively simple to use in the laboratory by operators without extensive statistical or bioinformatics knowledge, and with readily available hardware. An approach was developed that allows de novo assembly of individual mRNA sequences in two linked stages: sequence discovery and sequence completion. Both stages rely on computer-assisted, Graphical User Interface (GUI)-guided user interaction with the data, but proceed relatively efficiently once discovery is complete. The method grows a discovered sequence by repeated passes through the complete raw data in a series of steps, and is hence termed 'transcriptome walking'. All of the operations required for transcriptome analysis are combined in one program that presents a relatively simple user interface and runs on a standard desktop or laptop computer, but takes advantage of multi-core processors when available. Complete mRNA sequence identifications usually require less than 24 hours. This approach has already identified previously unknown mRNA sequences in two animal
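
    The idea of growing a discovered sequence by repeated passes through the raw reads can be illustrated with a minimal sketch. This is not the program described in the record above: the function name `walk_right`, the anchor length `k`, and the greedy "longest tail wins" rule are all assumptions made for illustration, and real data would also need error handling, reverse complements, and extension to the left.

```python
def walk_right(seed, reads, k=8, max_passes=100):
    """Greedily extend `seed` to the right using short reads.

    Hypothetical illustration only: each pass scans every read for the
    last k bases of the growing contig and appends the longest tail found.
    """
    if len(seed) < k:
        raise ValueError("seed must be at least k bases long")
    contig = seed
    for _ in range(max_passes):          # cap passes so repeats cannot loop forever
        anchor = contig[-k:]             # last k bases of the current contig
        best_tail = ""
        for read in reads:               # one full pass through the raw data
            pos = read.find(anchor)
            if pos == -1:
                continue
            tail = read[pos + k:]        # bases in the read beyond the anchor
            if len(tail) > len(best_tail):
                best_tail = tail
        if not best_tail:                # no read extends the contig: stop walking
            break
        contig += best_tail
    return contig


if __name__ == "__main__":
    # Toy data: overlapping 16-base windows cut from a made-up 40-base sequence.
    target = "ATGGCGTACGTTAGCCTAGGATCCGATTACGGCATGCAAT"
    reads = [target[i:i + 16] for i in range(0, 25, 8)]
    print(walk_right(target[:10], reads))
```

    Capping the number of passes is a deliberate choice in this sketch: with repetitive sequence a greedy walk could otherwise extend indefinitely.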

  15. CHH family peptides from an 'eyeless' deep-sea hydrothermal vent shrimp, Rimicaris kairei: characterization and sequence analysis.

    Science.gov (United States)

    Qian, Ye-Qing; Dai, Li; Yang, Jin-Shu; Yang, Fan; Chen, Dian-Fu; Fujiwara, Yoshihiro; Tsuchida, Shinji; Nagasawa, Hiromichi; Yang, Wei-Jun

    2009-09-01

    The crustacean eyestalk synthesizes and secretes several structurally-related peptides belonging to the crustacean hyperglycemic hormone (CHH) family, which are considered major physiological regulators during the crustacean life cycle. However, it is intriguing that eyestalks of many hydrothermal vent crustaceans prove to have varying degrees of reduction. In the present study, we characterized full-length cDNAs encoding two important eyestalk hormones of the CHH family, CHH and VIH (vitellogenesis-inhibiting hormone), from the 'eyeless' hydrothermal vent shrimp Rimicaris kairei. The two isoforms of Chh cDNA were 1027 and 1877 bp in length, respectively, and the deduced preprohormones contained 137 and 138 aa, respectively. The Vih cDNA was 907 bp in length, encoding a putative preprohormone of 113 aa. When compared with other known protein sequences of CHHs and VIHs, these polypeptides from hydrothermal vents show high similarity with their non-vent counterparts. These results may provide evidence for the mechanisms of eyestalk reduction and vent-adapting evolution of crustaceans. The hydrothermal vent shrimp with reduced eyestalks may take a different evolutionary pathway than eyestalk-holding crustaceans, and the reduced eyestalks can be considered a good example for the investigation of the diversity of crustacean evolution in different environments.

  16. Identification of MicroRNAs in Meloidogyne incognita Using Deep Sequencing.

    Directory of Open Access Journals (Sweden)

    Yunsheng Wang

    MicroRNAs play important regulatory roles in eukaryotic lineages. In this paper, we employed deep sequencing technology to sequence and identify microRNAs in the genome of M. incognita, one of the important plant parasitic nematodes. We identified 102 M. incognita microRNA genes, which can be grouped into 71 nonredundant miRNAs based on mature sequences. Among the 71 miRNAs, 27 are known miRNAs and 44 are novel miRNAs. We identified seven miRNA clusters in the M. incognita genome. Four of the seven clusters, miR-100/let-7, miR-71-1/miR-2a-1, miR-71-2/miR-2a-2 and miR-279/miR-2b, are conserved in other species. We validated the expression of 5 M. incognita microRNAs, including 3 known microRNAs (miR-71, miR-100b and let-7) and 2 novel microRNAs (NOVEL-1 and NOVEL-2), using RT-PCR. All 5 microRNAs were detected. The expression levels obtained using RT-PCR were consistent with those obtained by high-throughput sequencing for four of the microRNAs; let-7 was the exception. We also examined how M. incognita miRNAs are conserved in four other nematode species: C. elegans, A. suum, B. malayi and P. pacificus. We found that four microRNAs, miR-100, miR-92, miR-279 and miR-137, exist only in the genomes of parasitic nematodes and not in the genome of the free-living nematode C. elegans. Our research created a unique resource for the research of plant parasitic nematodes. The candidate microRNAs could help elucidate the genomic structure, gene regulation, evolutionary processes, and developmental features of plant parasitic nematodes and nematode-plant interaction.

  17. Complete nucleotide sequences and construction of full-length infectious cDNA clones of cucumber green mottle mosaic virus (CGMMV) in a versatile newly developed binary vector including both 35S and T7 promoters.

    Science.gov (United States)

    Park, Chan-Hwan; Ju, Hye-Kyoung; Han, Jae-Yeong; Park, Jong-Seo; Kim, Ik-Hyun; Seo, Eun-Young; Kim, Jung-Kyu; Hammond, John; Lim, Hyoun-Sub

    2017-04-01

    Seed-transmitted viruses have caused significant damage to watermelon crops in Korea in recent years, with cucumber green mottle mosaic virus (CGMMV) infection widespread as a result of infected seed lots. To determine the likely origin of CGMMV infection, we collected CGMMV isolates from watermelon and melon fields and generated full-length infectious cDNA clones. The full-length cDNAs were cloned into the newly constructed binary vector pJY, which includes both the 35S and T7 promoters for versatile usage (agroinfiltration and in vitro RNA transcription) and a modified hepatitis delta virus ribozyme sequence to precisely cleave RNA transcripts at the 3' end of the tobamovirus genome. Three CGMMV isolates (OMpj, Wpj, and Mpj) were separately evaluated for infectivity in Nicotiana benthamiana, demonstrated by either agroinfiltration or inoculation with in vitro RNA transcripts. CGMMV nucleotide identities to other tobamoviruses were calculated from pairwise alignments using DNAMAN. CGMMV identities were 49.89% to tobacco mosaic virus; 49.85% to pepper mild mottle virus; 50.47% to tomato mosaic virus; 60.9% to zucchini green mottle mosaic virus; and 60.96% to kyuri green mottle mosaic virus, confirming that CGMMV is a distinct species most similar to other cucurbit-infecting tobamoviruses. We further performed phylogenetic analysis to determine relationships of our new Korean CGMMV isolates to previously characterized isolates from Canada, China, India, Israel, Japan, Korea, Russia, Spain, and Taiwan available from NCBI. Analysis of CGMMV amino acid sequences showed three major clades, broadly typified as 'Russian,' 'Israeli,' and 'Asian' groups. All of our new Korean isolates fell within the 'Asian' clade. Neither the 128 nor 186 kDa RdRps of the three new isolates showed any detectable gene silencing suppressor function.

  18. Identification of miRNAs involved in fruit ripening in Cavendish bananas by deep sequencing.

    Science.gov (United States)

    Bi, Fangcheng; Meng, Xiangchun; Ma, Chao; Yi, Ganjun

    2015-10-13

    MicroRNAs (miRNAs) are a family of non-coding small RNAs that play an important regulatory role in various biological processes. Previous studies have reported that miRNAs are closely related to the ripening process in model plants. However, the miRNAs that are closely involved in the banana fruit ripening process remain unknown. Here, we investigated the miRNA populations from banana fruits in response to ethylene or 1-MCP treatment using a deep sequencing approach and bioinformatics analysis combined with quantitative RT-PCR validation. A total of 125 known miRNAs and 26 novel miRNAs were identified from three libraries. MiRNA profiling of bananas in response to ethylene treatment compared with 1-MCP treatment showed differential expression of 82 miRNAs. Furthermore, the differentially expressed miRNAs were predicted to target a total of 815 genes. Interestingly, some targets were annotated as transcription factors and other functional proteins closely involved in development and the ripening process in other plant species. Analysis by qRT-PCR validated the contrasting expression patterns between several miRNAs and their target genes. The miRNAome of the banana fruit in response to ethylene or 1-MCP treatment was identified by high-throughput sequencing. A total of 82 differentially expressed miRNAs were found to be closely associated with the ripening process. The miRNA target genes encode transcription factors and other functional proteins, including SPL, APETALA2, EIN3, E3 ubiquitin ligase, β-galactosidase, and β-glucosidase. These findings provide valuable information for further functional research of the miRNAs involved in banana fruit ripening.

  19. Deep sequencing of 71 candidate genes to characterize variation associated with alcohol dependence

    Science.gov (United States)

    Clark, Shaunna L.; Adkins, Daniel E.; Kumar, Gaurav; Aberg, Karolina A.; Nerella, Sri; Xie, Linying; Collins, Ann L.; Crowley, James J.; Quackenbush, Corey R.; Hilliard, Christopher E.; Shabalin, Andrey A.; Vrieze, Scott I.; Peterson, Roseann E.; Copeland, William E.; Silberg, Judy L.; McGue, Matt; Maes, Hermine; Iacono, William G.; Sullivan, Patrick F.; Costello, Elizabeth J.; van den Oord, Edwin J.

    2017-01-01

    Background: Previous genome-wide association studies (GWASs) have identified a number of putative risk loci for alcohol dependence (AD). However, only a few loci have replicated, and these replicated variants explain only a small proportion of AD risk. Using an innovative approach, the goal of this study was to generate hypotheses about potentially causal variants for AD that can be explored further through functional studies. Methods: We employed targeted capture of 71 candidate loci and flanking regions followed by next-generation deep sequencing (mean coverage 78X) in 806 European Americans. Regions included in our targeted capture library were genes identified through published GWAS of alcohol, all human alcohol and aldehyde dehydrogenases, reward system genes including dopaminergic and opioid receptors, prioritized candidate genes based on previous associations, and genes involved in the absorption, distribution, metabolism and excretion of drugs. We performed single locus tests to determine if any single variant was associated with AD symptom count. Sets of variants that overlapped with biologically meaningful annotations were tested for association in aggregate. Results: No single, common variant was significantly associated with AD in our study. We did, however, find evidence for association with several variant sets. Two variant sets were significant at the q-value < 0.10 level: a genic enhancer for ADHFE1 (p = 1.47×10^-5; q = 0.019), an alcohol dehydrogenase, and ADORA1 (p = 5.29×10^-5; q = 0.035), an adenosine receptor that belongs to a G-protein coupled receptor gene family. Conclusions: To our knowledge, this is the first sequencing study of AD to examine variants in entire genes, including flanking and regulatory regions. We found that in addition to protein coding variant sets, regulatory variant sets may play a role in AD. From these findings, we have generated initial functional hypotheses about how these sets may influence AD. PMID:28196272

  20. Deep sequencing-based analysis of the anaerobic stimulon in Neisseria gonorrhoeae

    Directory of Open Access Journals (Sweden)

    Clark Virginia L

    2011-01-01

    Background: Maintenance of an anaerobic denitrification system in the obligate human pathogen, Neisseria gonorrhoeae, suggests that an anaerobic lifestyle may be important during the course of infection. Furthermore, mounting evidence suggests that reduction of host-produced nitric oxide has several immunomodulatory effects on the host. However, at this point there have been no studies analyzing the complete gonococcal transcriptome response to anaerobiosis. Here we performed deep sequencing to compare the gonococcal transcriptomes of aerobically and anaerobically grown cells. Using the information derived from this sequencing, we discuss the implications of the robust transcriptional response to anaerobic growth. Results: We determined that 198 chromosomal genes were differentially expressed (~10% of the genome) in response to anaerobic conditions. We also observed a large induction of genes encoded within the cryptic plasmid, pJD1. Validation of RNA-seq data using translational-lacZ fusions or RT-PCR demonstrated the RNA-seq results to be very reproducible. Surprisingly, many genes of prophage origin were induced anaerobically, as well as several transcriptional regulators previously unknown to be involved in anaerobic growth. We also confirmed expression and regulation of a small RNA, likely a functional equivalent of fnrS in the Enterobacteriaceae family. We also determined that many genes found to be responsive to anaerobiosis have also been shown to be responsive to iron and/or oxidative stress. Conclusions: Gonococci will be subject to many forms of environmental stress, including oxygen-limitation, during the course of infection. Here we determined that the anaerobic stimulon in gonococci was larger than previous studies would suggest. Many new targets for future research have been uncovered, and the results derived from this study may have helped to elucidate factors or mechanisms of virulence that may have otherwise been overlooked.

  1. Multiple platform assessment of the EGF dependent transcriptome by microarray and deep tag sequencing analysis

    Science.gov (United States)

    2011-01-01

    Background Epidermal Growth Factor (EGF) is a key regulatory growth factor activating many processes relevant to normal development and disease, affecting cell proliferation and survival. Here we use a combined approach to study the EGF dependent transcriptome of HeLa cells by using multiple long oligonucleotide based microarray platforms (from Agilent, Operon, and Illumina) in combination with digital gene expression profiling (DGE) with the Illumina Genome Analyzer. Results By applying a procedure for cross-platform data meta-analysis based on RankProd and GlobalAncova tests, we establish a well validated gene set with transcript levels altered after EGF treatment. We use this robust gene list to build higher order networks of gene interaction by interconnecting associated networks, supporting and extending the important role of the EGF signaling pathway in cancer. In addition, we find an entirely new set of genes previously unrelated to the currently accepted EGF associated cellular functions. Conclusions We propose that the use of global genomic cross-validation derived from high content technologies (microarrays or deep sequencing) can be used to generate more reliable datasets. This approach should help to improve the confidence of downstream in silico functional inference analyses based on high content data. PMID:21699700

  2. Deep sequencing reveals unique small RNA repertoire that is regulated during head regeneration in Hydra magnipapillata.

    Science.gov (United States)

    Krishna, Srikar; Nair, Aparna; Cheedipudi, Sirisha; Poduval, Deepak; Dhawan, Jyotsna; Palakodeti, Dasaradhi; Ghanekar, Yashoda

    2013-01-07

    Small non-coding RNAs such as miRNAs, piRNAs and endo-siRNAs fine-tune gene expression through post-transcriptional regulation, modulating important processes in development, differentiation, homeostasis and regeneration. Using deep sequencing, we have profiled small non-coding RNAs in Hydra magnipapillata and investigated changes in small RNA expression pattern during head regeneration. Our results reveal a unique repertoire of small RNAs in hydra. We have identified 126 miRNA loci; 123 of these miRNAs are unique to hydra. Less than 50% are conserved across two different strains of Hydra vulgaris tested in this study, indicating a highly diverse nature of hydra miRNAs in contrast to bilaterian miRNAs. We also identified siRNAs derived from precursors with perfect stem-loop structure and that arise from inverted repeats. piRNAs were the most abundant small RNAs in hydra, mapping to transposable elements, the annotated transcriptome and unique non-coding regions on the genome. piRNAs that map to transposable elements and the annotated transcriptome display a ping-pong signature. Further, we have identified several miRNAs and piRNAs whose expression is regulated during hydra head regeneration. Our study defines different classes of small RNAs in this cnidarian model system, which may play a role in orchestrating gene expression essential for hydra regeneration.

  3. Genomic region operation kit for flexible processing of deep sequencing data.

    Science.gov (United States)

    Ovaska, Kristian; Lyly, Lauri; Sahu, Biswajyoti; Jänne, Olli A; Hautaniemi, Sampsa

    2013-01-01

    Computational analysis of data produced in deep sequencing (DS) experiments is challenging due to large data volumes and requirements for flexible analysis approaches. Here, we present a mathematical formalism based on set algebra for frequently performed operations in DS data analysis to facilitate translation of biomedical research questions to language amenable for computational analysis. With the help of this formalism, we implemented the Genomic Region Operation Kit (GROK), which supports various DS-related operations such as preprocessing, filtering, file conversion, and sample comparison. GROK provides high-level interfaces for R, Python, Lua, and command line, as well as an extension C++ API. It supports major genomic file formats and allows storing custom genomic regions in efficient data structures such as red-black trees and SQL databases. To demonstrate the utility of GROK, we have characterized the roles of two major transcription factors (TFs) in prostate cancer using data from 10 DS experiments. GROK is freely available with a user guide from http://csbi.ltdk.helsinki.fi/grok/.
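
    To make the notion of set algebra on genomic regions concrete, the sketch below implements union, intersection, and difference on lists of half-open (start, end) intervals for a single chromosome. It does not use GROK or reproduce its API; the function names and the interval representation are assumptions chosen purely for illustration.

```python
def merge(regions):
    """Sort regions and fuse any that overlap or touch."""
    merged = []
    for start, end in sorted(regions):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def union(a, b):
    """Positions covered by a or b."""
    return merge(list(a) + list(b))


def intersection(a, b):
    """Positions covered by both a and b (two-pointer sweep)."""
    a, b = merge(a), merge(b)
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        lo, hi = max(a[i][0], b[j][0]), min(a[i][1], b[j][1])
        if lo < hi:
            out.append((lo, hi))
        if a[i][1] < b[j][1]:        # advance whichever region ends first
            i += 1
        else:
            j += 1
    return out


def difference(a, b):
    """Positions covered by a but not by b."""
    b = merge(b)
    out = []
    for start, end in merge(a):
        cursor = start
        for b_start, b_end in b:
            if b_end <= cursor or b_start >= end:
                continue             # no overlap with the remaining piece
            if b_start > cursor:
                out.append((cursor, b_start))
            cursor = max(cursor, b_end)
        if cursor < end:
            out.append((cursor, end))
    return out


if __name__ == "__main__":
    # Made-up binding regions for two hypothetical experiments.
    sample_a = [(100, 200), (300, 400)]
    sample_b = [(150, 350)]
    print(union(sample_a, sample_b))
    print(intersection(sample_a, sample_b))
    print(difference(sample_a, sample_b))
```

    A sample comparison such as "regions bound in one experiment but not the other" then reduces to a single `difference` call, which is the kind of translation from research question to operation that the abstract describes.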

  4. Normalizing cDNA libraries.

    Science.gov (United States)

    Bogdanov, Ekaterina A; Shagina, Irina; Barsova, Ekaterina V; Kelmanson, Ilya; Shagin, Dmitry A; Lukyanov, Sergey A

    2010-04-01

    The characterization of rare messages in cDNA libraries is complicated by the substantial variations that exist in the abundance levels of different transcripts in cells and tissues. The equalization (normalization) of cDNA is a helpful approach for decreasing the prevalence of abundant transcripts, thereby facilitating the assessment of rare transcripts. This unit provides a method for duplex-specific nuclease (DSN)-based normalization, which allows for the fast and reliable equalization of cDNA, thereby facilitating the generation of normalized, full-length-enriched cDNA libraries, and enabling efficient RNA analyses. (c) 2010 by John Wiley & Sons, Inc.

  5. High-Throughput Plasmid cDNA Library Screening

    Energy Technology Data Exchange (ETDEWEB)

    Wan, Kenneth H.; Yu, Charles; George, Reed A.; Carlson, JosephW.; Hoskins, Roger A.; Svirskas, Robert; Stapleton, Mark; Celniker, SusanE.

    2006-05-24

    Libraries of cDNA clones are valuable resources for analysing the expression, structure, and regulation of genes, as well as for studying protein functions and interactions. Full-length cDNA clones provide information about intron and exon structures, splice junctions and 5'- and 3'-untranslated regions (UTRs). Open reading frames (ORFs) derived from cDNA clones can be used to generate constructs allowing expression of native proteins and N- or C-terminally tagged proteins. Thus, obtaining full-length cDNA clones and sequences for most or all genes in an organism is critical for understanding genome functions. Expressed sequence tag (EST) sequencing samples cDNA libraries at random, which is most useful at the beginning of large-scale screening projects. However, as projects progress towards completion, the probability of identifying unique cDNAs via EST sequencing diminishes, resulting in poor recovery of rare transcripts. We describe an adapted, high-throughput protocol intended for recovery of specific, full-length clones from plasmid cDNA libraries in five days.

  6. Genome-wide identification of Schistosoma japonicum microRNAs using a deep-sequencing approach.

    Directory of Open Access Journals (Sweden)

    Jian Huang

    BACKGROUND: Human schistosomiasis is one of the most prevalent and serious parasitic diseases worldwide. Schistosoma japonicum is one of the important pathogens of this disease. MicroRNAs (miRNAs) are a large group of non-coding RNAs that play important roles in regulating gene expression and protein translation in animals. Genome-wide identification of miRNAs in a given organism is a critical step to facilitating our understanding of genome organization, genome biology, evolution, and posttranscriptional regulation. METHODOLOGY/PRINCIPAL FINDINGS: We sequenced two small RNA libraries prepared from different stages of the life cycle of S. japonicum, immature schistosomula and mature pairing adults, through a deep DNA sequencing approach, which yielded approximately 12 million high-quality short sequence reads containing a total of approximately 2 million non-redundant tags. Based on a bioinformatics pipeline, we identified 176 new S. japonicum miRNAs, of which some exhibited a differential pattern of expression between the two stages. Although 21 S. japonicum miRNAs are orthologs of known miRNAs within the metazoans, some nucleotides at many positions of Schistosoma miRNAs, such as miR-8, let-7, miR-10, miR-31, miR-92, miR-124, and miR-125, are indeed significantly distinct from other bilaterian orthologs. In addition, both miR-71 and some miR-2 family members in tandem are found to be clustered in a reversal direction model on two genomic loci, and two pairs of novel S. japonicum miRNAs were derived from sense and antisense DNA strands at the same genomic loci. CONCLUSIONS/SIGNIFICANCE: The collection of S. japonicum miRNAs could be used as a new platform to study the genomic structure, gene regulation and networks, evolutionary processes, development, and host-parasite interactions. Some S. japonicum miRNAs and their clusters could represent the ancestral forms of the conserved orthologues and a model for the genesis of novel miRNAs.

  7. High-throughput verification of transcriptional starting sites by Deep-RACE

    DEFF Research Database (Denmark)

    Olivarius, Signe; Plessy, Charles; Carninci, Piero

    2009-01-01

    We present a high-throughput method for investigating the transcriptional starting sites of genes of interest, which we named Deep-RACE (Deep–rapid amplification of cDNA ends). Taking advantage of the latest sequencing technology, it allows the parallel analysis of multiple genes and is free of t...

  8. Screening of cDNA libraries on glass slide microarrays.

    Science.gov (United States)

    Berger, Dave K; Crampton, Bridget G; Hein, Ingo; Vos, Wiesner

    2007-01-01

    A quantitative screening method was developed to evaluate the quality of cDNA libraries constructed by suppression subtraction hybridization (SSH) or other enrichment techniques. The SSH technique was adapted to facilitate screening of the resultant library on a small number of glass slide microarrays. A simple data analysis pipeline named SSHscreen using "linear models for microarray data" (limma) functions in the R computing environment was developed to identify clones in the cDNA libraries that are significantly differentially expressed, and to determine if they were rare or abundant in the original treated sample. This approach facilitates the choice of clones from the cDNA library for further analysis, such as DNA sequencing, Northern blotting, RT-PCR, or detailed expression profiling using a custom cDNA microarray. Furthermore, this strategy is particularly useful for studies of nonmodel organisms for which there is little genome sequence information.

  9. [cDNA library construction from panicle meristem of finger millet].

    Science.gov (United States)

    Radchuk, V; Pirko, Ia V; Isaenkov, S V; Emets, A I; Blium, Ia B

    2014-01-01

    The protocol for producing full-length cDNA with the SuperScript Full-Length cDNA Library Construction Kit II (Invitrogen) was tested, and a high-quality cDNA library was created from meristematic tissue of the finger millet panicle (Eleusine coracana (L.) Gaertn). The titer of the resulting cDNA library averaged 3.01 × 10^5 CFU/ml. The average cDNA insert length was about 1070 base pairs, and the efficiency of cDNA fragment insertion was 99.5%. Selected cDNA clones from the library were sequenced, and their sequences were identified by BLAST search. The results of the library analysis and selective sequencing demonstrate the good functionality and full-length character of the inserted cDNA clones. The cDNA library from meristematic tissue of the finger millet panicle is a valuable source for isolating and identifying key genes that regulate metabolism and meristematic development, and for mining new molecular markers for high-quality genetic investigations and molecular breeding.

  10. Microbial Dark Matter: Unusual intervening sequences in 16S rRNA genes of candidate phyla from the deep subsurface

    Energy Technology Data Exchange (ETDEWEB)

    Jarett, Jessica; Stepanauskas, Ramunas; Kieft, Thomas; Onstott, Tullis; Woyke, Tanja

    2014-03-17

    The Microbial Dark Matter project has sequenced genomes from over 200 single cells from candidate phyla, greatly expanding our knowledge of the ecology, inferred metabolism, and evolution of these widely distributed, yet poorly understood lineages. The second phase of this project aims to sequence an additional 800 single cells from known as well as potentially novel candidate phyla derived from a variety of environments. In order to identify whole genome amplified single cells, screening based on phylogenetic placement of 16S rRNA gene sequences is being conducted. Briefly, derived 16S rRNA gene sequences are aligned to a custom version of the Greengenes reference database and added to a reference tree in ARB using parsimony. In multiple samples from deep subsurface habitats but not from other habitats, a large number of sequences proved difficult to align and therefore to place in the tree. Based on comparisons to reference sequences and structural alignments using SSU-ALIGN, many of these 'difficult' sequences appear to originate from candidate phyla, and contain intervening sequences (IVSs) within the 16S rRNA genes. These IVSs are short (39 - 79 nt) and do not appear to be self-splicing or to contain open reading frames. IVSs were found in the loop regions of stem-loop structures in several different taxonomic groups. Phylogenetic placement of sequences is strongly affected by IVSs; two out of three groups investigated were classified as different phyla after their removal. Based on data from samples screened in this project, IVSs appear to be more common in microbes occurring in deep subsurface habitats, although the reasons for this remain elusive.

  11. Mitochondrial matR sequences help to resolve deep phylogenetic relationships in rosids

    Directory of Open Access Journals (Sweden)

    Dilcher David L

    2007-11-01

    Background: Rosids are a major clade in the angiosperms containing 13 orders and about one-third of angiosperm species. Recent molecular analyses recognized two major groups (i.e., fabids with seven orders and malvids with three orders). However, phylogenetic relationships within the two groups and among fabids, malvids, and potentially basal rosids including Geraniales, Myrtales, and Crossosomatales remain to be resolved with more data and a broader taxon sampling. In this study, we obtained DNA sequences of the mitochondrial matR gene from 174 species representing 72 families of putative rosids and examined phylogenetic relationships and phylogenetic utility of matR in rosids. We also inferred phylogenetic relationships within the "rosid clade" based on a combined data set of 91 taxa and four genes including matR, two plastid genes (rbcL, atpB), and one nuclear gene (18S rDNA). Results: Comparison of mitochondrial matR and two plastid genes (rbcL and atpB) showed that the synonymous substitution rate in matR was approximately four times slower than those of rbcL and atpB; however, the nonsynonymous substitution rate in matR was relatively high, close to its synonymous substitution rate, indicating that matR has experienced a relaxed evolutionary history. Analyses of our matR sequences supported the monophyly of malvids and most orders of the rosids. However, fabids did not form a clade; instead, the COM clade of fabids (Celastrales, Oxalidales, Malpighiales, and Huaceae) was sister to malvids. Analyses of the four-gene data set suggested that Geraniales and Myrtales were successively sister to other rosids, and that Crossosomatales were sister to malvids. Conclusion: Compared to plastid genes such as rbcL and atpB, slowly evolving matR produced less homoplasious but not less informative substitutions. Thus, matR appears useful in higher-level angiosperm phylogenetics. Analysis of matR alone identified a novel deep relationship within

  12. Deep sequencing reveals a novel closterovirus associated with wild rose leaf rosette disease.

    Science.gov (United States)

    He, Yan; Yang, Zuokun; Hong, Ni; Wang, Guoping; Ning, Guogui; Xu, Wenxing

    2015-06-01

    A bizarre virus-like symptom of a leaf rosette formed by dense small leaves on branches of wild roses (Rosa multiflora Thunb.), designated as 'wild rose leaf rosette disease' (WRLRD), was observed in China. To investigate the presumed causal virus, a wild rose sample affected by WRLRD was subjected to deep sequencing of small interfering RNAs (siRNAs) for a complete survey of the infecting viruses and viroids. The assembly of siRNAs led to the reconstruction of the complete genomes of three known viruses, namely Apple stem grooving virus (ASGV), Blackberry chlorotic ringspot virus (BCRV) and Prunus necrotic ringspot virus (PNRSV), and of a novel virus provisionally named 'rose leaf rosette-associated virus' (RLRaV). Phylogenetic analysis clearly placed RLRaV alongside members of the genus Closterovirus, family Closteroviridae. Genome organization of RLRaV RNA (17,653 nucleotides) showed 13 open reading frames (ORFs), except ORF1 and the quintuple gene block, most of which showed no significant similarities with known viral proteins, but, instead, had detectable identities to fungal or bacterial proteins. Additional novel molecular features indicated that RLRaV seems to be the most complex virus among the known genus members. To our knowledge, this is the first report of WRLRD and its associated closterovirus, as well as two ilarviruses and one capilovirus, infecting wild roses. Our findings present novel information about the closterovirus and the aetiology of this rose disease which should facilitate its control. More importantly, the novel features of RLRaV help to clarify the molecular and evolutionary features of the closterovirus. © 2014 BSPP AND JOHN WILEY & SONS LTD.

  13. The Ebola virus VP35 protein binds viral immunostimulatory and host RNAs identified through deep sequencing.

    Directory of Open Access Journals (Sweden)

    Kari A Dilley

    Ebola virus and Marburg virus are members of the Filoviridae family and causative agents of hemorrhagic fever with high fatality rates in humans. Filovirus virulence is partially attributed to the VP35 protein, a well-characterized inhibitor of the RIG-I-like receptor pathway that triggers the antiviral interferon (IFN) response. Prior work demonstrates the ability of VP35 to block potent RIG-I activators, such as Sendai virus (SeV), and this IFN-antagonist activity is directly correlated with its ability to bind RNA. Several structural studies demonstrate that VP35 binds short synthetic dsRNAs; yet, there are no data that identify viral immunostimulatory RNAs (isRNA) or host RNAs bound to VP35 in cells. Utilizing a SeV infection model, we demonstrate that both viral isRNA and host RNAs are bound to Ebola and Marburg VP35s in cells. By deep sequencing the purified VP35-bound RNA, we identified the SeV copy-back defective interfering (DI) RNA, previously identified as a robust RIG-I activator, as the isRNA bound by multiple filovirus VP35 proteins, including the VP35 protein from the West African outbreak strain (Makona EBOV). Moreover, RNAs isolated from a VP35 RNA-binding mutant were not immunostimulatory and did not include the SeV DI RNA. Strikingly, an analysis of host RNAs bound by wild-type, but not mutant, VP35 revealed that select host RNAs are preferentially bound by VP35 in cell culture. Taken together, these data support a model in which VP35 sequesters isRNA in virus-infected cells to avert RIG-I like receptor (RLR) activation.

  14. Cloning of the cDNA for human 12-lipoxygenase

    International Nuclear Information System (INIS)

    Izumi, T.; Hoshiko, S.; Radmark, O.; Samuelsson, B.

    1990-01-01

    A full-length cDNA clone encoding 12-lipoxygenase was isolated from a human platelet cDNA library by using a cDNA for human reticulocyte 15-lipoxygenase as probe for the initial screening. The cDNA had an open reading frame encoding 662 amino acid residues with a calculated molecular weight of 75,590. Three independent clones revealed minor heterogeneities in their DNA sequences. Thus, in three positions of the deduced amino acid sequence, there is a choice between two different amino acids. The deduced sequence from the clone plT3 showed 65% identity with human reticulocyte 15-lipoxygenase and 42% identity with human leukocyte 5-lipoxygenase. The 12-lipoxygenase cDNA recognized a 3.0-kilobase mRNA species in platelets and human erythroleukemia cells (HEL cells). Phorbol 12-tetradecanoyl 13-acetate induced megakaryocytic differentiation of HEL cells and 12-lipoxygenase activity and increased mRNA for 12-lipoxygenase. The identity of the cloned 12-lipoxygenase was assured by expression in a mammalian cell line (COS cells). Human platelet 12-lipoxygenase has been difficult to purify to homogeneity. The cloning of this cDNA will increase the possibilities to elucidate the structure and function of this enzyme

  15. Soluble forms of tumor necrosis factor receptors (TNF-Rs). The cDNA for the type I TNF-R, cloned using amino acid sequence data of its soluble form, encodes both the cell surface and a soluble form of the receptor

    DEFF Research Database (Denmark)

    Nophar, Y; Kemper, O; Brakebusch, C

    1990-01-01

    the extracellular domain of the type I TNF-R matches the COOH-terminal sequence of TBPI. Amino acid sequences in the extracellular domain also fully match other sequences found in TBPI. On the other hand, amino acid sequences in the soluble form of the type II TNF-R (TBPII), while indicating a marked homology...... found to have effects characteristic of TNF, including stimulating phosphorylation of specific cellular proteins. Oligonucleotide probes designed on the basis of the NH2-terminal amino acid sequence of TBPI were used to clone the cDNA for the structurally related cell surface type 1 TNF-R. It is notable...... that although this receptor can signal the phosphorylation of cellular proteins, it appears from its amino acid sequence to be devoid of intrinsic protein kinase activity. The extracellular domain of the receptor is composed of four internal cysteine-rich repeats, homologous to structures repeated four times...

  16. The subclonal structure and genomic evolution of oral squamous cell carcinoma revealed by ultra-deep sequencing

    DEFF Research Database (Denmark)

    Tabatabaeifar, Siavosh; Thomassen, Mads; Larsen, Martin J

    2017-01-01

    Recent studies suggest that head and neck squamous cell carcinomas are very heterogeneous between patients; however the subclonal structure remains unexplored mainly due to studies using only a single biopsy per patient. To deconvolute the clonal structure and describe the genomic cancer evolution......, we applied whole-exome sequencing combined with ultra-deep targeted sequencing on oral squamous cell carcinomas (OSCC). From each patient, a set of biopsies was sampled from distinct geographical sites in primary tumor and lymph node metastasis. We demonstrate that the included OSCCs show a high...

  17. Constructing and detecting a cDNA library for mites.

    Science.gov (United States)

    Hu, Li; Zhao, YaE; Cheng, Juan; Yang, YuanJun; Li, Chen; Lu, ZhaoHui

    2015-10-01

    RNA extraction and construction of complementary DNA (cDNA) libraries for mites have been quite challenging due to difficulties in acquiring tiny living mites and breaking their hard chitin. The present study explores a better method to construct a cDNA library for mites that will lay the foundation for transcriptome and molecular pathogenesis research. We selected Psoroptes cuniculi as an experimental subject and took the following steps to construct and verify the cDNA library. First, we combined liquid nitrogen grinding with TRIzol for total RNA extraction. Then, the switching mechanism at 5' end of the RNA transcript (SMART) technique was used to construct a full-length cDNA library. To evaluate the quality of the cDNA library, the library titer and recombination rate were calculated. The reliability of the cDNA library was assessed by sequencing and analyzing positive clones and genes amplified by specific primers. The results showed that the RNA concentration was 836 ng/μl and the absorbance ratio at 260/280 nm was 1.82. The library titer was 5.31 × 10^5 plaque-forming units (PFU)/ml and the recombination rate was 98.21%, indicating that the library was of good quality. In the 33 expressed sequence tags (ESTs) of P. cuniculi, two clones of 1656 and 1658 bp were almost identical with only three variable sites detected, which had an identity of 99.63% with that of Psoroptes ovis, indicating that the cDNA library was reliable. Further detection by specific primers demonstrated that the 553-bp Pso c II gene sequences of P. cuniculi had an identity of 98.56% with those of P. ovis, confirming that the cDNA library was not only reliable but also feasible.
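
    The two quality metrics named in this abstract, library titer and recombination rate, are simple ratios. The sketch below shows the conventional plaque-assay arithmetic; the abstract does not state which formula or plate counts the authors used, so the function names and all input numbers here are hypothetical illustrations, not the study's data.

```python
def library_titer(plaques, dilution_factor, plated_volume_ml):
    """Titer in PFU/ml = plaques counted x dilution factor / volume plated (ml).

    Conventional plaque-assay formula; assumed here, not taken from the abstract.
    """
    if plated_volume_ml <= 0:
        raise ValueError("plated volume must be positive")
    return plaques * dilution_factor / plated_volume_ml


def recombination_rate(recombinant_clones, total_clones):
    """Percentage of screened clones that carry an insert."""
    if total_clones <= 0:
        raise ValueError("total clones must be positive")
    return 100.0 * recombinant_clones / total_clones


# Hypothetical plate: 120 plaques from 0.1 ml of a 1:1000 dilution.
titer = library_titer(plaques=120, dilution_factor=1000, plated_volume_ml=0.1)
# Hypothetical screen: 48 of 50 picked clones contain an insert.
rate = recombination_rate(recombinant_clones=48, total_clones=50)
print(f"titer = {titer:.2e} PFU/ml, recombination rate = {rate:.2f}%")
```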

  18. Development of a simple and powerful method, cDNA AFLP-SSPAG ...

    African Journals Online (AJOL)

    Differential cDNAs were easily obtained from silver stained cDNA-AFLP separated on polyacrylamide gels. The cDNA was then reamplified, cloned and fragments were sequenced. Sequenced clones were used as probes in northern dot blot analyses and library screening. Full-length cDNA was cloned from a library ...

  19. cDNA library preparation.

    Science.gov (United States)

    Kooiker, Maarten; Xue, Gang-Ping

    2014-01-01

    The construction of full-length cDNA libraries allows researchers to study gene expression and protein interactions and undertake gene discovery. Recent improvements allow the construction of high-quality cDNA libraries, with small amounts of mRNA. In parallel, these improvements allow for the incorporation of adapters into the cDNA, both at the 5' and 3' end of the cDNA. The 3' adapter is attached to the oligo-dT primer that is used by the reverse transcriptase, whereas the 5' adapter is incorporated by the template switching properties of the MMLV reverse transcriptase. This allows directional cloning and eliminates inefficient steps like adapter ligation, phosphorylation, and methylation. Another important step in the construction of high-quality cDNA libraries is the normalization. The difference in the levels of expression between genes might be several orders of magnitude. Therefore, it is essential that the cDNA library is normalized. With a recently discovered enzyme, duplex-specific nuclease, it is possible to normalize the cDNA library, based on the fact that more abundant molecules are more likely to reanneal after denaturation compared to rare molecules.
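
    The normalization principle stated above (abundant molecules reanneal faster than rare ones) can be illustrated with ideal second-order reassociation kinetics. This is a minimal sketch under that textbook assumption; the rate constant, time, and starting concentrations are arbitrary hypothetical values, and the model is not taken from the chapter itself.

```python
def single_stranded_remaining(c0, k, t):
    """Concentration still single-stranded after time t.

    Assumes ideal second-order reassociation: C(t) = C0 / (1 + k * C0 * t).
    """
    return c0 / (1.0 + k * c0 * t)


# Hypothetical library: one transcript 1000x more abundant than another.
k, t = 1.0, 10.0                      # arbitrary units
abundant = single_stranded_remaining(1000.0, k, t)
rare = single_stranded_remaining(1.0, k, t)

# Before reannealing the ratio is 1000:1; afterwards the single-stranded
# fractions are nearly equal, which is what removing duplexes exploits.
print(f"ratio before: {1000.0 / 1.0:.0f}:1, ratio after: {abundant / rare:.2f}:1")
```

    For large values of k * C0 * t the remaining single-stranded concentration approaches 1 / (k * t) regardless of the starting abundance, which is why removing the double-stranded fraction flattens the abundance range.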

  20. Distinctive Drug-resistant Mutation Profiles and Interpretations of HIV-1 Proviral DNA Revealed by Deep Sequencing in Reverse Transcriptase.

    Science.gov (United States)

    Yin, Qian Qian; Li, Zhen Peng; Zhao, Hai; Pan, Dong; Wang, Yan; Xu, Wei Si; Xing, Hui; Feng, Yi; Jiang, Shi Bo; Shao, Yi Ming; Ma, Li Ying

    2016-04-01

    To investigate distinctive features in drug-resistant mutations (DRMs) and interpretations for reverse transcriptase inhibitors (RTIs) between proviral DNA and paired viral RNA in HIV-1-infected patients. Forty-three HIV-1-infected individuals receiving first-line antiretroviral therapy were recruited to participate in a multicenter AIDS Cohort Study in Anhui and Henan Provinces in China in 2004. Drug resistance genotyping was performed by bulk sequencing and deep sequencing on the plasma and whole blood of 77 samples, respectively. Drug-resistance interpretation was compared between viral RNA and paired proviral DNA. Compared with bulk sequencing, deep sequencing could detect more DRMs and samples with DRMs in both viral RNA and proviral DNA. The mutations M184I and M230I were more prevalent in proviral DNA than in viral RNA (Fisher's exact test, PDNA, and 5 of these samples with different DRMs between proviral DNA and paired viral RNA showed a higher level of drug resistance to the first-line drugs. Considering 'minority resistant variants', 22 samples (28.57%) were associated with a higher level of drug resistance to the tested RTIs for proviral DNA when compared with paired viral RNA. Compared with viral RNA, the distinctive information of DRMs and drug resistance interpretations for proviral DNA could be obtained by deep sequencing, which could provide more detailed and precise information for drug resistance monitoring and the rational design of optimal antiretroviral therapy regimens. Copyright © 2016 The Editorial Board of Biomedical and Environmental Sciences. Published by China CDC. All rights reserved.

  1. PolyA_DB 3 catalogs cleavage and polyadenylation sites identified by deep sequencing in multiple genomes.

    Science.gov (United States)

    Wang, Ruijia; Nambiar, Ram; Zheng, Dinghai; Tian, Bin

    2018-01-04

    PolyA_DB is a database cataloging cleavage and polyadenylation sites (PASs) in several genomes. Previous versions were based mainly on expressed sequence tags (ESTs), which had a limited amount and could lead to inaccurate PAS identification due to the presence of internal A-rich sequences in transcripts. Here, we present an updated version of the database based solely on deep sequencing data. First, PASs are mapped by the 3' region extraction and deep sequencing (3'READS) method, ensuring unequivocal PAS identification. Second, a large volume of data based on diverse biological samples increases PAS coverage by 3.5-fold over the EST-based version and provides PAS usage information. Third, strand-specific RNA-seq data are used to extend annotated 3' ends of genes to obtain more thorough annotations of alternative polyadenylation (APA) sites. Fourth, conservation information of PAS across mammals sheds light on significance of APA sites. The database (URL: http://www.polya-db.org/v3) currently holds PASs in human, mouse, rat and chicken, and has links to the UCSC genome browser for further visualization and for integration with other genomic data. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.

  2. A Systematic Assessment of Accuracy in Detecting Somatic Mosaic Variants by Deep Amplicon Sequencing: Application to NF2 Gene.

    Directory of Open Access Journals (Sweden)

    Elisa Contini

    Full Text Available The accurate detection of low-allelic variants is still challenging, particularly for the identification of somatic mosaicism, where a matched control sample is not available. High throughput sequencing, by the simultaneous and independent analysis of thousands of different DNA fragments, might overcome many of the limits of traditional methods, greatly increasing the sensitivity. However, it is necessary to take into account the high number of false positives that may arise due to the lack of matched control samples. Here, we applied deep amplicon sequencing to the analysis of samples with known genotype and variant allele fraction (VAF), followed by a tailored statistical analysis. This method allowed us to define a minimum value of VAF for detecting mosaic variants with high accuracy. Then, we exploited the estimated VAF to select candidate alterations in the NF2 gene in 34 samples with unknown genotype (30 blood and 4 tumor DNAs), demonstrating the suitability of our method. The strategy we propose optimizes the use of deep amplicon sequencing for the identification of low abundance variants. Moreover, our method can be applied to different high throughput sequencing approaches to estimate the background noise and define the accuracy of the experimental design.
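
    The idea of deriving a minimum callable VAF from background noise can be sketched with a simple binomial error model. The abstract only mentions a "tailored statistical analysis" without describing it, so the binomial model, the function names, and the example depth, error rate, and significance level below are all assumptions made for illustration.

```python
import math


def log_binom_pmf(k, n, p):
    """Log of the Binomial(n, p) probability mass at k (log space avoids overflow)."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log1p(-p))


def binom_sf(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), with 0 < p < 1."""
    tail = sum(math.exp(log_binom_pmf(i, n, p)) for i in range(k, n + 1))
    return min(1.0, tail)


def vaf(alt_reads, depth):
    """Variant allele fraction: alternate-allele reads over total depth."""
    return alt_reads / depth


def min_alt_reads(depth, error_rate, alpha):
    """Smallest alt-read count unlikely (probability < alpha) to arise from noise alone."""
    for k in range(depth + 1):
        if binom_sf(k, depth, error_rate) < alpha:
            return k
    return None


# Hypothetical site: 1000x depth, 0.1% background error, 1% significance.
k_min = min_alt_reads(depth=1000, error_rate=0.001, alpha=0.01)
print(k_min, vaf(k_min, 1000))
```

    In practice the error rate would be estimated per position from control samples of known genotype, as the abstract describes, rather than fixed by hand.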

  3. An introduction to Deep learning on biological sequence data - Examples and solutions

    DEFF Research Database (Denmark)

    Jurtz, Vanessa Isabell; Johansen, Alexander Rosenberg; Nielsen, Morten

    2017-01-01

    Deep neural network architectures such as convolutional and long short-term memory networks have become increasingly popular as machine learning tools during the recent years. The availability of greater computational resources, more data, new algorithms for training deep models and easy to use...... libraries for implementation and training of neural networks are the drivers of this development. The use of deep learning has been especially successful in image recognition; and the development of tools, applications and code examples are in most cases centered within this field rather than within biology....... Here, we aim to further the development of deep learning methods within biology by providing application examples and ready to apply and adapt code templates. Given such examples, we illustrate how architectures consisting of convolutional and long short-term memory neural networks can relatively...

  4. Method for construction of normalized cDNA libraries

    Science.gov (United States)

    Soares, Marcelo B.; Efstratiadis, Argiris

    1998-01-01

    This invention provides a method to normalize a directional cDNA library constructed in a vector that allows propagation in single-stranded circle form comprising: (a) propagating the directional cDNA library in single-stranded circles; (b) generating fragments complementary to the 3' noncoding sequence of the single-stranded circles in the library to produce partial duplexes; (c) purifying the partial duplexes; (d) melting and reassociating the purified partial duplexes to appropriate Cot; and (e) purifying the unassociated single-stranded circles, thereby generating a normalized cDNA library. This invention also provides normalized cDNA libraries generated by the above-described method and uses of the generated libraries.

  5. Identification and characterization of a maize-associated mastrevirus in China by deep sequencing small RNA populations.

    Science.gov (United States)

    Chen, Sha; Huang, Qingqing; Wu, Liqi; Qian, Yajuan

    2015-10-05

    Maize streak Reunion virus (MSRV) is a member of the Mastrevirus genus in the family Geminiviridae. Of the diverse and increasing number of mastrevirus species found so far, only Wheat dwarf virus and Sweetpotato symptomless virus 1 have been discovered in China. Recently, a novel, unbiased approach based on deep sequencing of small interfering RNAs (siRNAs), followed by de novo assembly of the siRNAs, has opened new opportunities for plant virus identification. Samples collected from maize leaves were deep sequenced for virus identification. Subsequently, PCR, rolling circle amplification and Southern blot assays were used to confirm the presence of a mastrevirus. Maize streak Reunion virus Yunnan isolate (MSRV-[China:Yunnan 06:2014], abbreviated to MSRV-YN) was identified from maize collected from Yunnan Province, China, by small RNA deep sequencing. The complete genome of this virus was ascertained as 2,880 nucleotides long by conventional sequencing. A phylogenetic analysis showed it shared 96.3% nucleotide sequence identity with the isolate of Maize streak Reunion virus from La Reunion Island. To our knowledge, this is the first identification of MSRV in China. Analyses of the viral derived small interfering RNAs (vsiRNAs) profile showed that the most abundant MSRV-YN vsiRNAs were 21, 22 and 24 nt long and biased for A and G at their 5' terminal residue. There was a slightly higher representation of MSRV-YN siRNAs derived from the virion-sense strand genome than the complementary-sense strand genome. Moreover, MSRV-YN vsiRNAs were not uniformly distributed along the genome, and hotspots were detected in the movement protein and coat protein-coding region. A mastrevirus MSRV-YN collected in Yunnan Province, China, was identified by small RNA deep sequencing. The vsiRNA profile derived from MSRV-YN was characterized, which might provide insight into the host RNA silencing defense induced by MSRV-YN, and provide guidelines on designing antiviral
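
    The vsiRNA profile described in this record (size classes and 5'-terminal nucleotide bias) amounts to two tallies over the mapped reads. The sketch below shows one minimal way to compute them; the function name, the 18-30 nt window, and the toy reads are hypothetical and do not reproduce the authors' pipeline or data.

```python
from collections import Counter


def vsirna_profile(reads, min_len=18, max_len=30):
    """Tally read lengths and 5'-terminal nucleotides for small-RNA reads.

    `reads` is an iterable of sequence strings already mapped to the virus.
    Reads outside [min_len, max_len] are ignored (window is an assumption).
    """
    length_counts = Counter()
    five_prime_counts = Counter()
    for read in reads:
        seq = read.strip().upper().replace("T", "U")  # report as RNA
        if not (min_len <= len(seq) <= max_len):
            continue
        length_counts[len(seq)] += 1
        five_prime_counts[seq[0]] += 1
    return length_counts, five_prime_counts


# Toy input, invented purely to exercise the function.
toy_reads = [
    "ATGGCTAGCTAGGATCCATGA",     # 21 nt, 5' A
    "GTGGCTAGCTAGGATCCATGAC",    # 22 nt, 5' G
    "ATGGCTAGCTAGGATCCATGACGT",  # 24 nt, 5' A
    "ACGT",                      # too short, skipped
]
lengths, five_prime = vsirna_profile(toy_reads)
print(dict(lengths), dict(five_prime))
```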

  6. The Subclonal Structure and Genomic Evolution of Oral Squamous Cell Carcinoma Revealed by Ultra-deep Sequencing

    DEFF Research Database (Denmark)

    Tabatabaeifar, Siavosh; Thomassen, Mads; Larsen, Martin Jakob

    Background: Oral squamous cell carcinoma (OSCC), a subgroup of head and neck squamous cell carcinoma (HNSCC), is primarily caused by alcohol consumption and tobacco use. Recent DNA sequencing studies suggest that HNSCC are very heterogeneous between patients; however the intra-patient subclonal...... structure remains unexplored due to lack of sampling multiple tumor biopsies from each patient. Materials and methods: To examine the clonal structure and describe the genomic cancer evolution we applied whole-exome sequencing combined with targeted ultra-deep sequencing on biopsies from 5 stage IV...... of unprecedented high resolution enabling clear detection of subclonal structure and observation of otherwise undetectable mutations. Furthermore, we demonstrate that OSCC show a high degree of inter-patient heterogeneity but a low degree of intra-patient/tumor heterogeneity. However, some OSCC cancers contain...

  7. Analyses of deep mammalian sequence alignments and constraint predictions for 1% of the human genome

    OpenAIRE

    Margulies, Elliott H.; Cooper, Gregory M.; Asimenos, George; Thomas, Daryl J.; Dewey, Colin N.; Siepel, Adam; Birney, Ewan; Keefe, Damian; Schwartz, Ariel S.; Hou, Minmei; Taylor, James; Nikolaev, Sergey; Montoya-Burgos, Juan I.; Löytynoja, Ari; Whelan, Simon

    2007-01-01

    A key component of the ongoing ENCODE project involves rigorous comparative sequence analyses for the initially targeted 1% of the human genome. Here, we present orthologous sequence generation, alignment, and evolutionary constraint analyses of 23 mammalian species for all ENCODE targets. Alignments were generated using four different methods; comparisons of these methods reveal large-scale consistency but substantial differences in terms of small genomic rearrangements, sensitivity (sequenc...

  8. Comparison of illumina and 454 deep sequencing in participants failing raltegravir-based antiretroviral therapy.

    Directory of Open Access Journals (Sweden)

    Jonathan Z Li

    Full Text Available The impact of raltegravir-resistant HIV-1 minority variants (MVs) on raltegravir treatment failure is unknown. Illumina sequencing offers greater throughput than 454, but sequence analysis tools for viral sequencing are needed. We evaluated Illumina and 454 for the detection of HIV-1 raltegravir-resistant MVs. A5262 was a single-arm study of raltegravir and darunavir/ritonavir in treatment-naïve patients. Pre-treatment plasma was obtained from 5 participants with raltegravir resistance at the time of virologic failure. A control library was created by pooling integrase clones at predefined proportions. Multiplexed sequencing was performed with Illumina and 454 platforms at comparable costs. Illumina sequence analysis was performed with the novel snp-assess tool and 454 sequencing was analyzed with V-Phaser. Illumina sequencing resulted in significantly higher sequence coverage and a 0.095% limit of detection. Illumina accurately detected all MVs in the control library at ≥0.5% and 7/10 MVs expected at 0.1%. 454 sequencing failed to detect any MVs at 0.1% with 5 false positive calls. For MVs detected in the patient samples by both 454 and Illumina, the correlation in the detected variant frequencies was high (R2 = 0.92, P<0.001). Illumina sequencing detected 2.4-fold greater nucleotide MVs and 2.9-fold greater amino acid MVs compared to 454. The only raltegravir-resistant MV detected was an E138K mutation in one participant by Illumina sequencing, but not by 454. In participants of A5262 with raltegravir resistance at virologic failure, baseline raltegravir-resistant MVs were rarely detected. At comparable costs to 454 sequencing, Illumina demonstrated greater depth of coverage, increased sensitivity for detecting HIV MVs, and fewer false positive variant calls.

  9. High-resolution deep sequencing reveals biodiversity, population structure, and persistence of HIV-1 quasispecies within host ecosystems

    Directory of Open Access Journals (Sweden)

    Yin Li

    2012-12-01

    Full Text Available Background: Deep sequencing provides the basis for analysis of biodiversity of taxonomically similar organisms in an environment. While extensively applied to microbiome studies, population genetics studies of viruses are limited. To define the scope of HIV-1 population biodiversity within infected individuals, a suite of phylogenetic and population genetic algorithms was applied to HIV-1 envelope hypervariable domain 3 (Env V3) within peripheral blood mononuclear cells from a group of perinatally HIV-1 subtype B infected, therapy-naïve children. Results: Biodiversity of HIV-1 Env V3 quasispecies ranged from about 70 to 270 unique sequence clusters across individuals. Viral population structure was organized into a limited number of clusters that included the dominant variants combined with multiple clusters of low frequency variants. Next generation viral quasispecies evolved from low frequency variants at earlier time points through multiple non-synonymous changes in lineages within the evolutionary landscape. Minor V3 variants detected as long as four years after infection co-localized in phylogenetic reconstructions with early transmitting viruses or with subsequent plasma virus circulating two years later. Conclusions: Deep sequencing defines HIV-1 population complexity and structure, reveals the ebb and flow of dominant and rare viral variants in the host ecosystem, and identifies an evolutionary record of low-frequency cell-associated viral V3 variants that persist for years. The bioinformatics pipeline developed for HIV-1 can be applied for biodiversity studies of virome populations in human, animal, or plant ecosystems.
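
    Biodiversity over sequence clusters, as discussed in this record, is commonly summarized by richness and the Shannon index. The abstract does not name the specific measures in its "suite of phylogenetic and population genetic algorithms", so the choice of Shannon diversity here is an assumption, and the cluster labels are invented toy data.

```python
import math
from collections import Counter


def shannon_diversity(cluster_sizes):
    """Shannon index H = -sum(p_i * ln p_i) over cluster read counts."""
    total = sum(cluster_sizes)
    if total <= 0:
        raise ValueError("need at least one read")
    return -sum((n / total) * math.log(n / total) for n in cluster_sizes if n > 0)


def cluster_summary(cluster_ids):
    """Richness, Shannon index, and evenness from one cluster label per read."""
    sizes = list(Counter(cluster_ids).values())
    h = shannon_diversity(sizes)
    evenness = h / math.log(len(sizes)) if len(sizes) > 1 else 0.0
    return {"richness": len(sizes), "shannon": h, "evenness": evenness}


# Toy sample: one dominant cluster plus several low-frequency clusters.
toy_labels = ["c1"] * 90 + ["c2"] * 5 + ["c3"] * 3 + ["c4"] * 2
print(cluster_summary(toy_labels))
```

    A population dominated by one cluster with many rare ones, as the abstract reports, gives high richness but low evenness under this summary.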

  10. Comparison of illumina and 454 deep sequencing in participants failing raltegravir-based antiretroviral therapy.

    Science.gov (United States)

    Li, Jonathan Z; Chapman, Brad; Charlebois, Patrick; Hofmann, Oliver; Weiner, Brian; Porter, Alyssa J; Samuel, Reshmi; Vardhanabhuti, Saran; Zheng, Lu; Eron, Joseph; Taiwo, Babafemi; Zody, Michael C; Henn, Matthew R; Kuritzkes, Daniel R; Hide, Winston; Wilson, Cara C; Berzins, Baiba I; Acosta, Edward P; Bastow, Barbara; Kim, Peter S; Read, Sarah W; Janik, Jennifer; Meres, Debra S; Lederman, Michael M; Mong-Kryspin, Lori; Shaw, Karl E; Zimmerman, Louis G; Leavitt, Randi; De La Rosa, Guy; Jennings, Amy

    2014-01-01

    The impact of raltegravir-resistant HIV-1 minority variants (MVs) on raltegravir treatment failure is unknown. Illumina sequencing offers greater throughput than 454, but sequence analysis tools for viral sequencing are needed. We evaluated Illumina and 454 for the detection of HIV-1 raltegravir-resistant MVs. A5262 was a single-arm study of raltegravir and darunavir/ritonavir in treatment-naïve patients. Pre-treatment plasma was obtained from 5 participants with raltegravir resistance at the time of virologic failure. A control library was created by pooling integrase clones at predefined proportions. Multiplexed sequencing was performed with Illumina and 454 platforms at comparable costs. Illumina sequence analysis was performed with the novel snp-assess tool and 454 sequencing was analyzed with V-Phaser. Illumina sequencing resulted in significantly higher sequence coverage and a 0.095% limit of detection. Illumina accurately detected all MVs in the control library at ≥0.5% and 7/10 MVs expected at 0.1%. 454 sequencing failed to detect any MVs at 0.1% with 5 false positive calls. For MVs detected in the patient samples by both 454 and Illumina, the correlation in the detected variant frequencies was high (R2 = 0.92, P<0.001). Illumina sequencing detected 2.4-fold greater nucleotide MVs and 2.9-fold greater amino acid MVs compared to 454. The only raltegravir-resistant MV detected was an E138K mutation in one participant by Illumina sequencing, but not by 454. In participants of A5262 with raltegravir resistance at virologic failure, baseline raltegravir-resistant MVs were rarely detected. At comparable costs to 454 sequencing, Illumina demonstrated greater depth of coverage, increased sensitivity for detecting HIV MVs, and fewer false positive variant calls.

  11. Genome-wide small nucleolar RNA expression analysis of lung cancer by next-generation deep sequencing.

    Science.gov (United States)

    Gao, Lu; Ma, Jie; Mannoor, Kaiissar; Guarnera, Maria A; Shetty, Amol; Zhan, Min; Xing, Lingxiao; Stass, Sanford A; Jiang, Feng

    2015-03-15

    Emerging evidence indicates that small nucleolar RNAs (snoRNAs), a class of small noncoding RNAs, may play important function in tumorigenesis. Nonsmall-cell lung cancer (NSCLC) is the number one cancer killer for men and women. Systematically characterizing snoRNAs in NSCLC will develop biomarkers for its early detection and prognostication. We used next-generation deep sequencing to comprehensively characterize snoRNA profiles in 12 NSCLC tissues. We used quantitative reverse transcription polymerase chain reaction (qRT-PCR) to verify the findings in 40 surgical Stage I NSCLC specimens and 126 frozen NSCLC tissues of different stages. The 126 NSCLC tissues were divided into a training set and a testing set. Deep sequencing identified 458 snoRNAs, of which, 29 had a ≥3.0-fold expression level change in Stage I NSCLC tissues versus normal tissues. qRT-PCR analysis showed that 16 of 29 snoRNAs exhibited consistent changes with deep sequencing data. The 16 snoRNAs exhibited 0.75-0.94 area under receiver-operator characteristic curve values in distinguishing lung tumor from normal lung tissues (all ≤0.0001) with 70.0-95.0% sensitivity and 70.0-95.0% specificity. Six genes (snoRA47, snoRA68, snoRA78, snoRA21, snoRD28 and snoRD66) were identified whose expressions were associated with overall survival of the NSCLC patients. A prediction model consisting of three genes (snoRA47, snoRA68 and snoRA78) was developed in the training set of 77 cases, which could significantly predict overall survival of the NSCLC patients (p < 0.0001). The prognostic performance of the prediction model was confirmed in the testing set of 49 NSCLC patients. The identified snoRNA signatures may provide potential biomarkers for the early detection and prognostication of NSCLC. © 2014 UICC.

  12. Deep Sequencing of Cell-Free Peripheral Blood DNA as a Reliable Method for Confirming the Diagnosis of Myelodysplastic Syndrome.

    Science.gov (United States)

    Albitar, Ferras; Ma, Wanlong; Diep, Kevin; De Dios, Ivan; Agersborg, Sally; Thangavelu, Maya; Brodie, Steve; Albitar, Maher

    2016-07-01

    Demonstrating the presence of myelodysplastic syndrome (MDS)-specific molecular abnormalities can aid in diagnosis and patient management. We explored the potential of using peripheral blood (PB) cell-free DNA (cf-DNA) and next-generation sequencing (NGS). We performed NGS on a panel of 14 target genes using total nucleic acid extracted from the plasma of 16 patients, all of whom had confirmed diagnoses for early MDS with blasts DNA from the same patients was sequenced using conventional Sanger sequencing and NGS. Deep sequencing of the cf-DNA identified one or more mutated gene(s), confirming the diagnosis of MDS in all cases. Five samples (31%) showed abnormalities in cf-DNA by NGS that were not detected by Sanger sequencing on cellular PB DNA. NGS of PB cell DNA showed the same findings as those of cf-DNA in four of five patients, but failed to show a mutation in the RUNX1 gene that was detected in one patient's cf-DNA. Mutant allele frequency was significantly higher in cf-DNA compared with cellular DNA (p = 0.008). These data suggest that cf-DNA when analyzed using NGS is a reliable approach for detecting molecular abnormalities in MDS and should be used to determine if bone marrow aspiration and biopsy are necessary.

  13. Quantitative deep sequencing reveals dynamic HIV-1 escape and large population shifts during CCR5 antagonist therapy in vivo.

    Directory of Open Access Journals (Sweden)

    Athe M N Tsibris

    2009-05-01

    Full Text Available High-throughput sequencing platforms provide an approach for detecting rare HIV-1 variants and documenting more fully quasispecies diversity. We applied this technology to the V3 loop-coding region of env in samples collected from 4 chronically HIV-infected subjects in whom CCR5 antagonist (vicriviroc [VVC] therapy failed. Between 25,000-140,000 amplified sequences were obtained per sample. Profound baseline V3 loop sequence heterogeneity existed; predicted CXCR4-using populations were identified in a largely CCR5-using population. The V3 loop forms associated with subsequent virologic failure, either through CXCR4 use or the emergence of high-level VVC resistance, were present as minor variants at 0.8-2.8% of baseline samples. Extreme, rapid shifts in population frequencies toward these forms occurred, and deep sequencing provided a detailed view of the rapid evolutionary impact of VVC selection. Greater V3 diversity was observed post-selection. This previously unreported degree of V3 loop sequence diversity has implications for viral pathogenesis, vaccine design, and the optimal use of HIV-1 CCR5 antagonists.

  14. Molecular cloning and mammalian expression of human beta 2-glycoprotein I cDNA

    DEFF Research Database (Denmark)

    Kristensen, Torsten; Schousboe, Inger; Boel, Espen

    1991-01-01

    Human β2-glycoprotein (β2gpI) cDNA was isolated from a liver cDNA library and sequenced. The cDNA encoded a 19-residue hydrophobic signal peptide followed by the mature β2gpI of 326 amino acid residues. In liver and in the hepatoma cell line HepG2 there are two mRNA species of about 1.4 and 4.3 kb...

  15. Simultaneous identification of DNA and RNA viruses present in pig faeces using process-controlled deep sequencing.

    Directory of Open Access Journals (Sweden)

    Jana Sachsenröder

    Full Text Available BACKGROUND: Animal faeces comprise a community of many different microorganisms including bacteria and viruses. Only scarce information is available about the diversity of viruses present in the faeces of pigs. Here we describe a protocol, which was optimized for the purification of the total fraction of viral particles from pig faeces. The genomes of the purified DNA and RNA viruses were simultaneously amplified by PCR and subjected to deep sequencing followed by bioinformatic analyses. The efficiency of the method was monitored using a process control consisting of three bacteriophages (T4, M13 and MS2 with different morphology and genome types. Defined amounts of the bacteriophages were added to the sample and their abundance was assessed by quantitative PCR during the preparation procedure. RESULTS: The procedure was applied to a pooled faecal sample of five pigs. From this sample, 69,613 sequence reads were generated. All of the added bacteriophages were identified by sequence analysis of the reads. In total, 7.7% of the reads showed significant sequence identities with published viral sequences. They mainly originated from bacteriophages (73.9% and mammalian viruses (23.9%; 0.8% of the sequences showed identities to plant viruses. The most abundant detected porcine viruses were kobuvirus, rotavirus C, astrovirus, enterovirus B, sapovirus and picobirnavirus. In addition, sequences with identities to the chimpanzee stool-associated circular ssDNA virus were identified. Whole genome analysis indicates that this virus, tentatively designated as pig stool-associated circular ssDNA virus (PigSCV, represents a novel pig virus. CONCLUSION: The established protocol enables the simultaneous detection of DNA and RNA viruses in pig faeces including the identification of so far unknown viruses. It may be applied in studies investigating aetiology, epidemiology and ecology of diseases. 
The implemented process control serves as quality control, ensures

  16. The natriuretic peptide/helokinestatin precursor from Mexican beaded lizard (Heloderma horridum) venom: Amino acid sequence deduced from cloned cDNA and identification of two novel encoded helokinestatins.

    Science.gov (United States)

    Ma, Chengbang; Yang, Mu; Zhou, Mei; Wu, Yuxin; Wang, Lei; Chen, Tianbao; Ding, Anwei; Shaw, Chris

    2011-06-01

    Natriuretic peptides are common components of reptile venoms and molecular cloning of their biosynthetic precursors has revealed that in snakes, they co-encode bradykinin-potentiating peptides and in venomous lizards, some co-encode bradykinin inhibitory peptides such as the helokinestatins. The common natriuretic peptide/helokinestatin precursor of the Gila Monster, Heloderma suspectum, encodes five helokinestatins of differing primary structures. Here we report the molecular cloning of a natriuretic peptide/helokinestatin precursor cDNA from a venom-derived cDNA library of the Mexican beaded lizard (Heloderma horridum). Deduction of the primary structure of the encoded precursor protein from this cloned cDNA template revealed that it consisted of 196 amino acid residues encoding a single natriuretic peptide and five helokinestatins. While the natriuretic peptide was of identical primary structure to its Gila Monster (H. suspectum) homolog, the encoded helokinestatins were not, with this region of the common precursor displaying some significant differences to its H. suspectum homolog. The helokinestatin-encoding region contained a single copy of helokinestatin-1, 2 copies of helokinestatin-3 and single copies of 2 novel peptides, (Phe)(5)-helokinestatin-2 (VPPAFVPLVPR) and helokinestatin-6 (GPPFNPPPFVDYEPR). All predicted peptides were found in reverse phase HPLC fractions of the same venom. Synthetic replicates of both novel helokinestatins were found to antagonize the relaxing effect of bradykinin on rat tail artery smooth muscle. Thus lizard venom continues to provide a source of novel biologically active peptides. Copyright © 2011. Published by Elsevier Inc.

  17. Dissection of the octoploid strawberry genome by deep sequencing of the genomes of Fragaria species.

    Science.gov (United States)

    Hirakawa, Hideki; Shirasawa, Kenta; Kosugi, Shunichi; Tashiro, Kosuke; Nakayama, Shinobu; Yamada, Manabu; Kohara, Mistuyo; Watanabe, Akiko; Kishida, Yoshie; Fujishiro, Tsunakazu; Tsuruoka, Hisano; Minami, Chiharu; Sasamoto, Shigemi; Kato, Midori; Nanri, Keiko; Komaki, Akiko; Yanagi, Tomohiro; Guoxin, Qin; Maeda, Fumi; Ishikawa, Masami; Kuhara, Satoru; Sato, Shusei; Tabata, Satoshi; Isobe, Sachiko N

    2014-01-01

    Cultivated strawberry (Fragaria x ananassa) is octoploid and shows allogamous behaviour. The present study aims at dissecting this octoploid genome through comparison with its wild relatives, F. iinumae, F. nipponica, F. nubicola, and F. orientalis, by de novo whole-genome sequencing on Illumina and Roche 454 platforms. The total length of the assembled Illumina genome sequences obtained was 698 Mb for F. x ananassa, and ∼200 Mb each for the four wild species. Subsequently, a virtual reference genome termed FANhybrid_r1.2 was constructed by integrating the sequences of the four homoeologous subgenomes of F. x ananassa, from which heterozygous regions in the Roche 454 and Illumina genome sequences were eliminated. The total length of FANhybrid_r1.2 thus created was 173.2 Mb with an N50 length of 5137 bp. The Illumina-assembled genome sequences of F. x ananassa and the four wild species were then mapped onto the reference genome, along with the previously published F. vesca genome sequence, to establish the subgenomic structure of F. x ananassa. The strategy adopted in this study proved successful in dissecting the genome of octoploid F. x ananassa and appears promising for the analysis of other polyploid plant species.

  18. Characterization and Development of EST-SSRs by Deep Transcriptome Sequencing in Chinese Cabbage (Brassica rapa L. ssp. pekinensis)

    Directory of Open Access Journals (Sweden)

    Qian Ding

    2015-01-01

    Full Text Available Simple sequence repeats (SSRs) are among the most important markers for population analysis and have been widely used in plant genetic mapping and molecular breeding. Expressed sequence tag-SSR (EST-SSR) markers, located in the coding regions, are potentially more efficient for QTL mapping, gene targeting, and marker-assisted breeding. In this study, we investigated 51,694 nonredundant unigenes, assembled from clean reads from deep transcriptome sequencing with a Solexa/Illumina platform, for identification and development of EST-SSRs in Chinese cabbage. In total, 10,420 EST-SSRs with over 12 bp were identified and characterized, among which 2744 EST-SSRs are new and 2317 are known ones showing polymorphism with previously reported SSRs. A total of 7877 PCR primer pairs for 1561 EST-SSR loci were designed, and primer pairs for twenty-four EST-SSRs were selected for primer evaluation. In nineteen EST-SSR loci (79.2%), amplicons were successfully generated with high quality. Seventeen (89.5%) showed polymorphism in twenty-four cultivars of Chinese cabbage. The polymorphic alleles of each polymorphic locus were sequenced, and the results showed that most polymorphisms were due to variations of SSR repeat motifs. The EST-SSRs identified and characterized in this study have important implications for developing new tools for genetics and molecular breeding in Chinese cabbage.
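
    The primer-evaluation percentages quoted in this abstract can be checked with a few lines of arithmetic. The sketch below is only an illustration: the variable names and the `percent` helper are hypothetical, and the final "overall" rate is a derived figure that the abstract itself does not report.

```python
def percent(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100.0 * part / whole, 1)

# Counts taken from the abstract; names are illustrative only.
tested_loci = 24       # EST-SSR primer pairs selected for evaluation
amplified_loci = 19    # loci that gave high-quality amplicons
polymorphic_loci = 17  # amplified loci that were polymorphic

amplification_rate = percent(amplified_loci, tested_loci)      # reported as 79.2%
polymorphism_rate = percent(polymorphic_loci, amplified_loci)  # reported as 89.5%
# Derived figure, not stated in the abstract: polymorphic loci per tested locus.
overall_rate = percent(polymorphic_loci, tested_loci)

print(amplification_rate, polymorphism_rate, overall_rate)
```

    Note that the 89.5% figure uses the nineteen amplified loci as its denominator, not the twenty-four tested loci.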

  19. [Construction and identification of the expression library of album pollen allergens cDNA].

    Science.gov (United States)

    Zhang, Jie; Sun, Xiu-zhen; Yan, Hong; Zhang, Ni; Feng, Xiang-li

    2011-05-01

    To construct and characterize a cDNA expression library of album pollen allergens. Total RNA was extracted from album pollen with TRIzol reagent, and the mRNA was isolated for subsequent amplification. Double-stranded cDNA (ds cDNA) was synthesized with the ZAP Express® cDNA synthesis kit using primers containing an Xho I site and a poly(dT) sequence. The ds cDNA was modified and purified by gel chromatography, yielding cDNA fragments longer than 400 bp with sticky ends. These fragments were ligated into the Uni-ZAP XR vector and then packaged in vitro into phage using the ZAP-cDNA express GigapackIII Gold cloning kit, producing the album pollen cDNA expression library. The recombination rate and the insert lengths of the library were determined by polymerase chain reaction. The titer and the recombination rate of the library were 9.7×10^5 and 100%, respectively. The capacity of the library was 4.85 Pfu. The average insert length was about 1.0 kb. Based on its capacity and insert length, the cDNA expression library is suitable for screening target cDNA clones, laying the foundation for the preparation of a recombinant pollen allergen vaccine.

  20. Analyses of deep mammalian sequence alignments and constraint predictions for 1% of the human genome.

    Science.gov (United States)

    Margulies, Elliott H; Cooper, Gregory M; Asimenos, George; Thomas, Daryl J; Dewey, Colin N; Siepel, Adam; Birney, Ewan; Keefe, Damian; Schwartz, Ariel S; Hou, Minmei; Taylor, James; Nikolaev, Sergey; Montoya-Burgos, Juan I; Löytynoja, Ari; Whelan, Simon; Pardi, Fabio; Massingham, Tim; Brown, James B; Bickel, Peter; Holmes, Ian; Mullikin, James C; Ureta-Vidal, Abel; Paten, Benedict; Stone, Eric A; Rosenbloom, Kate R; Kent, W James; Bouffard, Gerard G; Guan, Xiaobin; Hansen, Nancy F; Idol, Jacquelyn R; Maduro, Valerie V B; Maskeri, Baishali; McDowell, Jennifer C; Park, Morgan; Thomas, Pamela J; Young, Alice C; Blakesley, Robert W; Muzny, Donna M; Sodergren, Erica; Wheeler, David A; Worley, Kim C; Jiang, Huaiyang; Weinstock, George M; Gibbs, Richard A; Graves, Tina; Fulton, Robert; Mardis, Elaine R; Wilson, Richard K; Clamp, Michele; Cuff, James; Gnerre, Sante; Jaffe, David B; Chang, Jean L; Lindblad-Toh, Kerstin; Lander, Eric S; Hinrichs, Angie; Trumbower, Heather; Clawson, Hiram; Zweig, Ann; Kuhn, Robert M; Barber, Galt; Harte, Rachel; Karolchik, Donna; Field, Matthew A; Moore, Richard A; Matthewson, Carrie A; Schein, Jacqueline E; Marra, Marco A; Antonarakis, Stylianos E; Batzoglou, Serafim; Goldman, Nick; Hardison, Ross; Haussler, David; Miller, Webb; Pachter, Lior; Green, Eric D; Sidow, Arend

    2007-06-01

    A key component of the ongoing ENCODE project involves rigorous comparative sequence analyses for the initially targeted 1% of the human genome. Here, we present orthologous sequence generation, alignment, and evolutionary constraint analyses of 23 mammalian species for all ENCODE targets. Alignments were generated using four different methods; comparisons of these methods reveal large-scale consistency but substantial differences in terms of small genomic rearrangements, sensitivity (sequence coverage), and specificity (alignment accuracy). We describe the quantitative and qualitative trade-offs concomitant with alignment method choice and the levels of technical error that need to be accounted for in applications that require multisequence alignments. Using the generated alignments, we identified constrained regions using three different methods. While the different constraint-detecting methods are in general agreement, there are important discrepancies relating to both the underlying alignments and the specific algorithms. However, by integrating the results across the alignments and constraint-detecting methods, we produced constraint annotations that were found to be robust based on multiple independent measures. Analyses of these annotations illustrate that most classes of experimentally annotated functional elements are enriched for constrained sequences; however, large portions of each class (with the exception of protein-coding sequences) do not overlap constrained regions. The latter elements might not be under primary sequence constraint, might not be constrained across all mammals, or might have expendable molecular functions. Conversely, 40% of the constrained sequences do not overlap any of the functional elements that have been experimentally identified. Together, these findings demonstrate and quantify how many genomic functional elements await basic molecular characterization.

  1. cDNA cloning, structural analysis, SNP detection and tissue ...

    Indian Academy of Sciences (India)

    Journal of Genetics; Volume 96; Issue 2. Abstract. Insulin-like growth factor 1 (IGF1) plays an important role in growth, reproduction, foetal development and cell proliferation. The present study was conducted to clone and sequence the ...

  2. Geochemical features and effects on deep-seated fluids during the May-June 2012 southern Po Valley seismic sequence

    Directory of Open Access Journals (Sweden)

    Francesco Italiano

    2012-10-01

    Full Text Available Periodic sampling of the groundwaters and of dissolved and free gases in selected deep wells located in the area affected by the May-June 2012 southern Po Valley seismic sequence has provided insight into seismogenic-induced changes of the local aquifer systems. The results show progressive changes in the fluid geochemistry, establishing that deep-seated fluids were mobilized during the seismic sequence and reached surface layers along faults and fractures, which generated significant geochemical anomalies. The May-June 2012 seismic swarm (mainshock on May 29, 2012, M 5.8; 7 shocks with M > 5; about 200 events with 3 < M < 5) induced several modifications in the circulating fluids. This study reports the preliminary results obtained for the geochemical features of the waters and gases collected over the epicentral area from boreholes drilled at different depths, thus intercepting water and gases with different origins and circulation. The aim of the investigations was to improve our knowledge of the fluids circulating over the seismic area (e.g. origin, provenance, interactions, mixing of different components, temporal changes). This was achieved by collecting samples from both shallow and deep-drilled boreholes, and then, after the selection of the relevant sites, we looked for temporal changes with mid-to-long-term monitoring activity following a constant sampling rate. This allowed us to gain better insight into the relationships between the fluid circulation and the faulting activity. The sampling sites are listed in Table 1, along with the analytical results of the gas phase. […

  3. A simple and novel method for RNA-seq library preparation of single cell cDNA analysis by hyperactive Tn5 transposase.

    Science.gov (United States)

    Brouilette, Scott; Kuersten, Scott; Mein, Charles; Bozek, Monika; Terry, Anna; Dias, Kerith-Rae; Bhaw-Rosun, Leena; Shintani, Yasunori; Coppen, Steven; Ikebe, Chiho; Sawhney, Vinit; Campbell, Niall; Kaneko, Masahiro; Tano, Nobuko; Ishida, Hidekazu; Suzuki, Ken; Yashiro, Kenta

    2012-10-01

    Deep sequencing of single cell-derived cDNAs offers novel insights into oncogenesis and embryogenesis. However, traditional library preparation for RNA-seq analysis requires multiple steps with consequent sample loss and stochastic variation at each step significantly affecting output. Thus, a simpler and better protocol is desirable. The recently developed hyperactive Tn5-mediated library preparation, which brings high quality libraries, is likely one of the solutions. Here, we tested the applicability of hyperactive Tn5-mediated library preparation to deep sequencing of single cell cDNA, optimized the protocol, and compared it with the conventional method based on sonication. This new technique does not require any expensive or special equipment, which secures wider availability. A library was constructed from only 100 ng of cDNA, which enables the saving of precious specimens. Only a few steps of robust enzymatic reaction resulted in saved time, enabling more specimens to be prepared at once, and with a more reproducible size distribution among the different specimens. The obtained RNA-seq results were comparable to the conventional method. Thus, this Tn5-mediated preparation is applicable for anyone who aims to carry out deep sequencing for single cell cDNAs. Copyright © 2012 Wiley Periodicals, Inc.

  4. Construction and analysis of a cDNA library from yellow-fruit ginseng

    African Journals Online (AJOL)

    Total RNA was isolated from yellow-fruit ginseng (Panax ginseng C.A. Meyer) leaf tissue. A cDNA library of Panax ginseng leaves was constructed using the pDNR-LIB vector according to the SMART cDNA library construction kit protocol. We obtained 378 high quality sequences (GenBank accession number: ...

  5. Genomic and cDNA cloning of a novel mouse lipoxygenase gene

    NARCIS (Netherlands)

    Willems van Dijk, K.; Steketee, K.; Havekes, L.; Frants, R.; Hofker, M.

    1995-01-01

    A novel 12- and 15-lipoxygenase related gene was isolated from a mouse strain 129 genomic phage library in a screen with a human 15-lipoxygenase cDNA probe. The complete genomic sequence revealed 14 exons and 13 introns covering 7.3 kb of DNA. The splice junctions were verified from the cDNA

  6. Mitochondrial genome sequences reveal deep divergences among Anopheles punctulatus sibling species in Papua New Guinea

    Directory of Open Access Journals (Sweden)

    Logue Kyle

    2013-02-01

    Full Text Available Abstract. Background: Members of the Anopheles punctulatus group (AP group) are the primary vectors of human malaria in Papua New Guinea. The AP group includes 13 sibling species, most of them morphologically indistinguishable. Understanding why only certain species are able to transmit malaria requires a better comprehension of their evolutionary history. In particular, understanding relationships and divergence times among Anopheles species may enable assessing how malaria-related traits (e.g. blood feeding behaviours, vector competence) have evolved. Methods: DNA sequences of 14 mitochondrial (mt) genomes from five AP sibling species and two species of the Anopheles dirus complex of Southeast Asia were sequenced. DNA sequences from all concatenated protein coding genes (10,770 bp) were then analysed using a Bayesian approach to reconstruct phylogenetic relationships and date the divergence of the AP sibling species. Results: Phylogenetic reconstruction using the concatenated DNA sequence of all mitochondrial protein coding genes indicates that the ancestors of the AP group arrived in Papua New Guinea 25 to 54 million years ago and rapidly diverged to form the current sibling species. Conclusion: Through evaluation of newly described mt genome sequences, this study has revealed a divergence among members of the AP group in Papua New Guinea that would significantly predate the arrival of humans in this region, 50 thousand years ago. The divergence observed among the mtDNA sequences studied here may have resulted from reproductive isolation during historical changes in sea-level through glacial minima and maxima. This leads to a hypothesis that the AP sibling species have evolved independently for potentially thousands of generations. This suggests that the evolution of many phenotypes, such as insecticide resistance, will arise independently in each of the AP sibling species studied here.

  7. Hepatitis C virus deep sequencing for sub-genotype identification in mixed infections: A real-life experience.

    Science.gov (United States)

    Del Campo, José A; Parra-Sánchez, Manuel; Figueruela, Blanca; García-Rey, Silvia; Quer, Josep; Gregori, Josep; Bernal, Samuel; Grande, Lourdes; Palomares, José C; Romero-Gómez, Manuel

    2018-02-01

    The effectiveness of the new generation of hepatitis C treatments named direct-acting antiviral agents (DAAs) depends on the genotype, subtype, and resistance-associated substitutions present in individual patients. The aim of this study was to evaluate a massive sequencing platform for the analysis of genotypes and subtypes of hepatitis C virus (HCV) in order to optimize therapy. A total of 84 patients with hepatitis C were analyzed. The routine genotyping methodology for HCV used at the study institution (Versant HCV Assay, LiPA) was compared with a deep sequencing platform (454/GS-Junior and Illumina MiSeq). The mean viral load in these HCV patients was 6.89×10^6 ± 7.02×10^5. Viral genotypes analyzed by LiPA were distributed as follows: 26% genotype 1a (22/84), 55% genotype 1b (46/84), 1% genotype 1 (1/84), 2.5% genotype 3 (2/84), 6% genotype 3a (5/84), 6% genotype 4a/c/d (5/84). When analyzed by deep sequencing, the samples were distributed as follows: 27% genotype 1a (23/84), 56% genotype 1b (47/84), 8% genotype 3a (7/84), 5% genotype 4d (4/84), 2.5% genotype 4f (2/84). Six of the 84 patients (7%) were infected with more than one subtype. Among these, 33% (2/6) failed DAA-based triple therapy. The detection of mixed infection could explain some treatment failures. Accurate determination of viral genotypes and subtypes would allow optimal patient management and improve the effectiveness of DAA therapy. Copyright © 2017 The Author(s). Published by Elsevier Ltd. All rights reserved.
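
    The genotype shares in this abstract are simple fractions of the 84 patients, so they can be recomputed directly from the quoted counts. The following is a minimal sketch under that assumption; the dictionary names and the `shares` helper are hypothetical and are not part of the study.

```python
TOTAL_PATIENTS = 84

# Counts as quoted in the abstract, keyed by genotype label.
lipa_counts = {"1a": 22, "1b": 46, "1": 1, "3": 2, "3a": 5, "4a/c/d": 5}
deep_seq_counts = {"1a": 23, "1b": 47, "3a": 7, "4d": 4, "4f": 2}

def shares(counts, total=TOTAL_PATIENTS):
    """Map each genotype to its percentage of all patients (one decimal)."""
    return {genotype: round(100.0 * n / total, 1) for genotype, n in counts.items()}

lipa_shares = shares(lipa_counts)
deep_seq_shares = shares(deep_seq_counts)
# Six of the 84 patients carried more than one subtype.
mixed_infection_share = round(100.0 * 6 / TOTAL_PATIENTS, 1)

for label, table in (("LiPA", lipa_shares), ("deep sequencing", deep_seq_shares)):
    print(label, table)
print("mixed infections (%):", mixed_infection_share)
```

    Recomputed this way, the shares match the rounded figures in the abstract to within about 0.1 percentage point (2/84 comes out at 2.4% against the quoted 2.5%). The quoted categories sum to 81 samples for LiPA and 83 for deep sequencing rather than 84; the abstract does not say how the remaining samples were classified.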

  8. Deep-sequencing revealed Citrus bark cracking viroid (CBCVd) as a highly aggressive pathogen on hop

    Czech Academy of Sciences Publication Activity Database

    Jakše, J.; Radišek, S.; Pokorn, T.; Matoušek, Jaroslav; Javornik, B.

    2015-01-01

    Vol. 64, No. 4 (2015), pp. 831-842. ISSN 0032-0862. R&D Projects: GA MŠk(CZ) LH14255. Institutional support: RVO:60077344. Keywords: Bioinformatic * Citrus bark cracking viroid * Hop * Next-generation sequencing. Subject RIV: EB - Genetics; Molecular Biology. Impact factor: 2.383, year: 2015

  9. High diversity of picornaviruses in rats from different continents revealed by deep sequencing

    DEFF Research Database (Denmark)

    Arn Hansen, Thomas; Mollerup, Sarah; Nguyen, Nam-Phuong

    2016-01-01

    Outbreaks of zoonotic diseases in humans and livestock are not uncommon, and an important component in containment of such emerging viral diseases is rapid and reliable diagnostics. Such methods are often PCR-based and hence require the availability of sequence data from the pathogen. Rattus norv...

  10. Deep sequencing of uveal melanoma identifies a recurrent mutation in PLCB4

    DEFF Research Database (Denmark)

    Johansson, Peter; Aoude, Lauren G; Wadt, Karin

    2016-01-01

    Next generation sequencing of uveal melanoma (UM) samples has identified a number of recurrent oncogenic or loss-of-function mutations in key driver genes including: GNAQ, GNA11, EIF1AX, SF3B1 and BAP1. To search for additional driver mutations in this tumor type we carried out whole-genome or wh......, instead, a BRCA mutation signature predominated. In addition to mutations in the known UM driver genes, we found a recurrent mutation in PLCB4 (c.G1888T, p.D630Y, NM_000933), which was validated using Sanger sequencing. The identical mutation was also found in published UM sequence data (1 of 56 tumors...

  11. RNAi-mediated endogene silencing in strawberry fruit: detection of primary and secondary siRNAs by deep sequencing.

    Science.gov (United States)

    Härtl, Katja; Kalinowski, Gregor; Hoffmann, Thomas; Preuss, Anja; Schwab, Wilfried

    2017-05-01

    RNA interference (RNAi) has been exploited as a reverse genetic tool for functional genomics in the nonmodel species strawberry (Fragaria × ananassa) since 2006. Here, we analysed for the first time different but overlapping nucleotide sections (>200 nt) of two endogenous genes, FaCHS (chalcone synthase) and FaOMT (O-methyltransferase), as inducer sequences and a transitive vector system to compare their gene silencing efficiencies. In total, ten vectors were assembled, each containing the nucleotide sequence of one fragment in sense and corresponding antisense orientation separated by an intron (inverted hairpin construct, ihp). All sequence fragments along the full lengths of both target genes resulted in a significant down-regulation of the respective gene expression and related metabolite levels. Quantitative PCR data and successful application of a transitive vector system coinciding with a phenotypic change suggested propagation of the silencing signal. The spreading of the signal in strawberry fruit in the 3' direction was shown for the first time by the detection of secondary small interfering RNAs (siRNAs) outside of the primary targets by deep sequencing. Down-regulation of endogenes by the transitive method was less effective than silencing by ihp constructs, probably because the numbers of primary siRNAs exceeded the quantity of secondary siRNAs by three orders of magnitude. In addition, we observed consistent hotspots of primary and secondary siRNA formation along the target sequence which fall within a distance of less than 200 nt. Thus, ihp vectors seem to be superior to the transitive vector system for functional genomics in strawberry fruit. © 2016 The Authors. Plant Biotechnology Journal published by Society for Experimental Biology and The Association of Applied Biologists and John Wiley & Sons Ltd.

  12. Transmission of single HIV-1 genomes and dynamics of early immune escape revealed by ultra-deep sequencing.

    Directory of Open Access Journals (Sweden)

    Will Fischer

    2010-08-01

    Full Text Available We used ultra-deep sequencing to obtain tens of thousands of HIV-1 sequences from regions targeted by CD8+ T lymphocytes from longitudinal samples from three acutely infected subjects, and modeled viral evolution during the critical first weeks of infection. Previous studies suggested that a single virus established productive infection, but these conclusions were tempered because of limited sampling; now, we have greatly increased our confidence in this observation through modeling the observed earliest sample diversity based on vastly more extensive sampling. Conventional sequencing of HIV-1 from acute/early infection has shown different patterns of escape at different epitopes; we investigated the earliest escapes in exquisite detail. Over 3-6 weeks, ultradeep sequencing revealed that the virus explored an extraordinary array of potential escape routes in the process of evading the earliest CD8 T-lymphocyte responses--using 454 sequencing, we identified over 50 variant forms of each targeted epitope during early immune escape, while only 2-7 variants were detected in the same samples via conventional sequencing. In contrast to the diversity seen within epitopes, non-epitope regions, including the Envelope V3 region, which was sequenced as a control in each subject, displayed very low levels of variation. In early infection, in the regions sequenced, the consensus forms did not have a fitness advantage large enough to trigger reversion to consensus amino acids in the absence of immune pressure. In one subject, a genetic bottleneck was observed, with extensive diversity at the second time point narrowing to two dominant escape forms by the third time point, all within two months of infection. 
Traces of immune escape were observed in the earliest samples, suggesting that immune pressure is present and effective earlier than previously reported; quantifying the loss rate of the founder virus suggests a direct role for CD8 T-lymphocyte responses

  13. Characterization of a deep-coverage carrot (Daucus carota L.) BAC library and initial analysis of BAC-end sequences.

    Science.gov (United States)

    Cavagnaro, Pablo F; Chung, Sang-Min; Szklarczyk, Marek; Grzebelus, Dariusz; Senalik, Douglas; Atkins, Anne E; Simon, Philipp W

    2009-03-01

    Carrot is the most economically important member of the Apiaceae family and a major source of provitamin A carotenoids in the human diet. However, carrot molecular resources are relatively underdeveloped, hampering a number of genetic studies. Here, we report on the synthesis and characterization of a bacterial artificial chromosome (BAC) library of carrot. The library is 17.3-fold redundant and consists of 92,160 clones with an average insert size of 121 kb. To provide an overview of the composition and organization of the carrot nuclear genome we generated and analyzed 2,696 BAC-end sequences (BES) from nearly 2,000 BACs, totaling 1.74 Mb of BES. This analysis revealed that 14% of the BES consists of known repetitive elements, with transposable elements representing more than 80% of this fraction. Eleven novel carrot repetitive elements were identified, covering 8.5% of the BES. Analysis of microsatellites showed a comparably low frequency for these elements in the carrot BES. Comparisons of the translated BES with protein databases indicated that approximately 10% of the carrot genome represents coding sequences. Moreover, among eight dicot species used for comparison purposes, carrot BES had highest homology to protein-coding sequences from tomato. This deep-coverage library will aid carrot breeding and genetics.

  14. Transfer RNA detection by small RNA deep sequencing and disease association with myelodysplastic syndromes.

    Science.gov (United States)

    Guo, Yan; Bosompem, Amma; Mohan, Sanjay; Erdogan, Begum; Ye, Fei; Vickers, Kasey C; Sheng, Quanhu; Zhao, Shilin; Li, Chung-I; Su, Pei-Fang; Jagasia, Madan; Strickland, Stephen A; Griffiths, Elizabeth A; Kim, Annette S

    2015-09-24

    Although advances in sequencing technologies have popularized the use of microRNA (miRNA) sequencing (miRNA-seq) for the quantification of miRNA expression, questions remain concerning the optimal methodologies for analysis and utilization of the data. The construction of a miRNA sequencing library selects RNA by length rather than type. However, as we have previously described, miRNAs represent only a subset of the species obtained by size selection. Consequently, the libraries obtained for miRNA sequencing also contain a variety of additional species of small RNAs. This study looks at the prevalence of these other species obtained from bone marrow aspirate specimens and explores the predictive value of these small RNAs in the determination of response to therapy in myelodysplastic syndromes (MDS). Paired pre- and post-treatment bone marrow aspirate specimens were obtained from patients with MDS who were treated with either azacytidine or decitabine (24 pre-treatment specimens, 23 post-treatment specimens) with 22 additional non-MDS control specimens. Total RNA was extracted from these specimens and submitted for next generation sequencing after an additional size exclusion step to enrich for small RNAs. The species of small RNAs were enumerated, single nucleotide variants (SNVs) identified, and finally the differential expression of tRNA-derived species (tDRs) in the specimens correlated with disease status and response to therapy. Using miRNA sequencing data generated from bone marrow aspirate samples of patients with known MDS (N = 47) and controls (N = 23), we demonstrated that transfer RNA (tRNA) fragments (specifically tRNA halves, tRHs) are one of the most common species of small RNA isolated from size selection. Using tRNA expression values extracted from miRNA sequencing data, we identified six tRNA fragments that are differentially expressed between MDS and normal samples. Using the elastic net method, we identified four tRNA-derived small RNAs (t

  15. A Retrospective Examination of Feline Leukemia Subgroup Characterization: Viral Interference Assays to Deep Sequencing

    Directory of Open Access Journals (Sweden)

    Elliott S. Chiu

    2018-01-01

    Full Text Available Feline leukemia virus (FeLV) was the first feline retrovirus discovered, and is associated with multiple fatal disease syndromes in cats, including lymphoma. The original research conducted on FeLV employed classical virological techniques. As methods have evolved to allow FeLV genetic characterization, investigators have continued to unravel the molecular pathology associated with this fascinating agent. In this review, we discuss how FeLV classification, transmission, and disease-inducing potential have been defined sequentially by viral interference assays, Sanger sequencing, PCR, and next-generation sequencing. In particular, we highlight the influences of endogenous FeLV and host genetics that represent FeLV research opportunities on the near horizon.

  16. Deep sequencing of the oral microbiome reveals signatures of periodontal disease.

    Directory of Open Access Journals (Sweden)

    Bo Liu

    Full Text Available The oral microbiome, the complex ecosystem of microbes inhabiting the human mouth, harbors several thousands of bacterial types. The proliferation of pathogenic bacteria within the mouth gives rise to periodontitis, an inflammatory disease known to also constitute a risk factor for cardiovascular disease. While much is known about individual species associated with pathogenesis, the system-level mechanisms underlying the transition from health to disease are still poorly understood. Through the sequencing of the 16S rRNA gene and of whole community DNA we provide a glimpse at the global genetic, metabolic, and ecological changes associated with periodontitis in 15 subgingival plaque samples, four from each of two periodontitis patients, and the remaining samples from three healthy individuals. We also demonstrate the power of whole-metagenome sequencing approaches in characterizing the genomes of key players in the oral microbiome, including an unculturable TM7 organism. We reveal the disease microbiome to be enriched in virulence factors, and adapted to a parasitic lifestyle that takes advantage of the disrupted host homeostasis. Furthermore, diseased samples share a common structure that was not found in completely healthy samples, suggesting that the disease state may occupy a narrow region within the space of possible configurations of the oral microbiome. Our pilot study demonstrates the power of high-throughput sequencing as a tool for understanding the role of the oral microbiome in periodontal disease. Despite a modest level of sequencing (~2 lanes Illumina 76 bp PE) and high human DNA contamination (up to ~90%), we were able to partially reconstruct several oral microbes and to preliminarily characterize some systems-level differences between the healthy and diseased oral microbiomes.

  17. MicroRNA identity and abundance in porcine skeletal muscles determined by deep sequencing

    DEFF Research Database (Denmark)

    Nielsen, M; Hansen, J H; Hedegaard, J

    2010-01-01

    MicroRNAs (miRNA) are short single-stranded RNA molecules that regulate gene expression post-transcriptionally by binding to complementary sequences in the 3' untranslated region (3' UTR) of target mRNAs. MiRNAs participate in the regulation of myogenesis, and identification of the complete set o...... that highly expressed miRNAs are involved in skeletal muscle development and regeneration, signal transduction, cell-cell and cell-extracellular matrix communication and neural development and function....

  18. Transcriptome dynamics through alternative polyadenylation in developmental and environmental responses in plants revealed by deep sequencing.

    Science.gov (United States)

    Shen, Yingjia; Venu, R C; Nobuta, Kan; Wu, Xiaohui; Notibala, Varun; Demirci, Caghan; Meyers, Blake C; Wang, Guo-Liang; Ji, Guoli; Li, Qingshun Q

    2011-09-01

    Polyadenylation sites mark the ends of mRNA transcripts. Alternative polyadenylation (APA) may alter sequence elements and/or the coding capacity of transcripts, a mechanism that has been demonstrated to regulate gene expression and transcriptome diversity. To study the role of APA in transcriptome dynamics, we analyzed a large-scale data set of RNA "tags" that signify poly(A) sites and expression levels of mRNA. These tags were derived from a wide range of tissues and developmental stages that were mutated or exposed to environmental treatments, and generated using digital gene expression (DGE)-based protocols of the massively parallel signature sequencing (MPSS-DGE) and the Illumina sequencing-by-synthesis (SBS-DGE) sequencing platforms. The data offer a global view of APA and how it contributes to transcriptome dynamics. Upon analysis of these data, we found that ∼60% of Arabidopsis genes have multiple poly(A) sites. Likewise, ∼47% and 82% of rice genes use APA, supported by MPSS-DGE and SBS-DGE tags, respectively. In both species, ∼49%-66% of APA events were mapped upstream of annotated stop codons. Interestingly, 10% of the transcriptomes are made up of APA transcripts that are differentially distributed among developmental stages and in tissues responding to environmental stresses, providing an additional level of transcriptome dynamics. Examples of pollen-specific APA switching and salicylic acid treatment-specific APA clearly demonstrated such dynamics. The significance of these APAs is more evident in the 3034 genes that have conserved APA events between rice and Arabidopsis.

  19. MicroRNA repertoire for functional genome research in tilapia identified by deep sequencing.

    Science.gov (United States)

    Yan, Biao; Wang, Zhen-Hua; Zhu, Chang-Dong; Guo, Jin-Tao; Zhao, Jin-Liang

    2014-08-01

    The Nile tilapia (Oreochromis niloticus; Cichlidae) is an economically important species in aquaculture and occupies a prominent position in the aquaculture industry. MicroRNAs (miRNAs) are a class of noncoding RNAs that post-transcriptionally regulate gene expression involved in diverse biological and metabolic processes. To increase the repertoire of miRNAs characterized in tilapia, we used the Illumina/Solexa sequencing technology to sequence a small RNA library using a pooled RNA sample isolated from the different developmental stages of tilapia. Bioinformatic analyses suggest that 197 conserved and 27 novel miRNAs are expressed in tilapia. Sequence alignments indicate that all tested miRNAs and miRNAs* are highly conserved across many species. In addition, we characterized the tissue expression patterns of five miRNAs using real-time quantitative PCR. We found that miR-1/206, miR-7/9, and miR-122 are abundantly expressed in muscle, brain, and liver, respectively, implying a potential role in the regulation of tissue differentiation or the maintenance of tissue identity. Overall, our results expand the number of tilapia miRNAs, and the discovery of miRNAs in the tilapia genome contributes to a better understanding of the role of miRNAs in regulating diverse biological processes.

  20. cDNA library construction of two human Demodex species.

    Science.gov (United States)

    Niu, DongLing; Wang, RuiLing; Zhao, YaE; Yang, Rui; Hu, Li; Lei, YuYang; Dan, WeiChao

    2017-06-01

    Research on Demodex, a type of pathogen causing various dermatoses in animals and human beings, is lacking at the RNA level. This study aims at extracting RNA and constructing a cDNA library for Demodex. First, P. cuniculi and D. farinae were mixed to establish a homogenization method for RNA extraction. Second, D. folliculorum and D. brevis were collected and preserved in Trizol, and were each mixed with D. farinae to extract RNA. Finally, the cDNA library was constructed and its quality was assessed. The results indicated that for D. folliculorum & D. farinae, the recombination rate of the cDNA library was 90.67% and the library titer was 7.50 × 10^4 pfu/ml; 17 of the 59 positive clones were predicted to be of D. folliculorum. For D. brevis & D. farinae, the recombination rate was 90.96% and the library titer was 7.85 × 10^4 pfu/ml; 40 of the 59 positive clones were predicted to be of D. brevis. Further detection by specific primers demonstrated that mtDNA cox1, cox3 and ATP6 detected from the cDNA libraries had 96.52%-99.73% identities with the corresponding sequences in GenBank. In conclusion, the cDNA libraries constructed for Demodex mixed with D. farinae were successful and could satisfy the requirements for functional gene detection.

  1. Identification of microRNAs Involved in the Host Response to Enterovirus 71 Infection by a Deep Sequencing Approach

    Directory of Open Access Journals (Sweden)

    Lunbiao Cui

    2010-01-01

    The role of microRNAs (miRNAs) in pathogen-host interactions has been highlighted recently. To identify cellular miRNAs involved in the host response to enterovirus 71 (EV71) infection, we performed comprehensive miRNA profiling in EV71-infected Hep2 cells through deep sequencing. We found 64 miRNAs whose expression levels changed by more than 2-fold in response to EV71 infection. Gene ontology analysis revealed that many of these mRNAs play roles in neurological process, immune response, and cell death pathways, which are known to be associated with the extreme virulence of EV71. To our knowledge, this is the first study of alterations in host miRNA expression in response to EV71 infection. Our findings support the hypothesis that certain miRNAs might be essential in host-pathogen interactions.
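    The "more than 2-fold" criterion above can be illustrated with a minimal sketch. This is not the authors' pipeline: the pseudocount, the function names, and the miRNA names and read counts below are all hypothetical, chosen only to show how a symmetric fold-change filter on normalized counts might work.

```python
import math

def log2_fold_change(control, infected, pseudocount=1.0):
    """log2 ratio of infected to control counts.

    The pseudocount (an assumption of this sketch) avoids division by zero
    for miRNAs that are absent in one condition.
    """
    return math.log2((infected + pseudocount) / (control + pseudocount))

def changed_mirnas(counts, min_fold=2.0):
    """Return {name: log2FC} for miRNAs changed by more than min_fold, up or down."""
    cutoff = math.log2(min_fold)
    result = {}
    for name, (control, infected) in counts.items():
        lfc = log2_fold_change(control, infected)
        if abs(lfc) > cutoff:
            result[name] = lfc
    return result

# Hypothetical normalized read counts: (mock-infected, EV71-infected).
example_counts = {
    "miR-A": (99, 399),   # roughly 4-fold up
    "miR-B": (399, 99),   # roughly 4-fold down
    "miR-C": (99, 109),   # essentially unchanged
}

if __name__ == "__main__":
    for name, lfc in changed_mirnas(example_counts).items():
        print(f"{name}: log2FC = {lfc:+.2f}")
```

    Working on the log2 scale treats up- and down-regulation symmetrically, so a single cutoff of log2(2) = 1 covers both directions.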

  2. Insights into the genetic structure and diversity of 38 South Asian Indians from deep whole-genome sequencing.

    Directory of Open Access Journals (Sweden)

    Lai-Ping Wong

    2014-05-01

    South Asia possesses a significant amount of genetic diversity due to considerable intergroup differences in culture and language. There have been numerous reports on the genetic structure of Asian Indians, although these have mostly relied on genotyping microarrays or targeted sequencing of the mitochondria and Y chromosomes. Asian Indians in Singapore are primarily descendants of immigrants from Dravidian-language-speaking states in south India, and 38 individuals from the general population underwent deep whole-genome sequencing with a target coverage of 30X as part of the Singapore Sequencing Indian Project (SSIP). The genetic structure and diversity of these samples were compared against samples from the Singapore Sequencing Malay Project and populations in Phase 1 of the 1,000 Genomes Project (1 KGP). SSIP samples exhibited greater intra-population genetic diversity and possessed a higher heterozygous-to-homozygous genotype ratio than other Asian populations. When compared against a panel of well-defined Asian Indians, the genetic makeup of the SSIP samples was closely related to South Indians. However, even though the SSIP samples clustered distinctly from the Europeans in the global population structure analysis with autosomal SNPs, eight samples were assigned to mitochondrial haplogroups that were predominantly present in Europeans and possessed higher European admixture than the remaining samples. An analysis of the relative relatedness between SSIP and two archaic hominins (Denisovan, Neanderthal) identified higher ancient admixture in East Asian populations than in SSIP. The data resource for these samples is publicly available and is expected to serve as a valuable complement to the South Asian samples in Phase 3 of 1 KGP.

  3. Deep sequencing of plant and animal DNA contained within traditional Chinese medicines reveals legality issues and health safety concerns.

    Directory of Open Access Journals (Sweden)

    Megan L Coghlan

    Traditional Chinese medicine (TCM) has been practiced for thousands of years, but only within the last few decades has its use become more widespread outside of Asia. Concerns continue to be raised about the efficacy, legality, and safety of many popular complementary alternative medicines, including TCMs. Ingredients of some TCMs are known to include derivatives of endangered, trade-restricted species of plants and animals, and therefore contravene the Convention on International Trade in Endangered Species (CITES) legislation. Chromatographic studies have detected the presence of heavy metals and plant toxins within some TCMs, and there are numerous cases of adverse reactions. It is in the interests of both biodiversity conservation and public safety that techniques are developed to screen medicinals like TCMs. Targeting both the p-loop region of the plastid trnL gene and the mitochondrial 16S ribosomal RNA gene, over 49,000 amplicon sequence reads were generated from 15 TCM samples presented in the form of powders, tablets, capsules, bile flakes, and herbal teas. Here we show that second-generation, high-throughput sequencing (HTS) of DNA represents an effective means to genetically audit organic ingredients within complex TCMs. Comparison of DNA sequence data to reference databases revealed the presence of 68 different plant families and included genera, such as Ephedra and Asarum, that are potentially toxic. Similarly, animal families were identified that include genera that are classified as vulnerable, endangered, or critically endangered, including Asiatic black bear (Ursus thibetanus) and Saiga antelope (Saiga tatarica). Bovidae, Cervidae, and Bufonidae DNA were also detected in many of the TCM samples and were rarely declared on the product packaging. This study demonstrates that deep sequencing via HTS is an efficient and cost-effective way to audit highly processed TCM products and will assist in monitoring their legality and safety.

  4. Deep sequencing-based analysis of the Cymbidium ensifolium floral transcriptome.

    Directory of Open Access Journals (Sweden)

    Xiaobai Li

    Cymbidium ensifolium is a Chinese Cymbidium with an elegant shape, beautiful appearance, and a fragrant aroma. C. ensifolium has a long history of cultivation in China and it has excellent commercial value as a potted plant and cut flower. The development of C. ensifolium genomic resources has been delayed because of its large genome size. Taking advantage of technical and cost improvements in RNA-Seq, we extracted total mRNA from flower buds and mature flowers and obtained a total of 9.52 Gb of filtered nucleotides comprising 98,819,349 filtered reads. The filtered reads were assembled into 101,423 isotigs, representing 51,696 genes. Of the 101,423 isotigs, 41,873 were putative homologs of annotated sequences in the public databases, of which 158 were associated with floral development and 119 were associated with flowering. The isotigs were categorized according to their putative functions. In total, 10,212 of the isotigs were assigned into 25 eukaryotic orthologous groups (KOGs), 41,690 into 58 gene ontology (GO) terms, 9,830 into 126 Arabidopsis Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways, and 9,539 into 123 rice pathways. Comparison of the isotigs with those of the two related orchid species P. equestris and C. sinense showed that 17,906 isotigs are unique to C. ensifolium. In addition, a total of 7,936 SSRs and 16,676 putative SNPs were identified. To our knowledge, this transcriptome database is the first major genomic resource for C. ensifolium and the most comprehensive transcriptomic resource for genus Cymbidium. These sequences provide valuable information for understanding the molecular mechanisms of floral development and flowering. Sequences predicted to be unique to C. ensifolium would provide more insights into C. ensifolium gene diversity. The numerous SNPs and SSRs identified in the present study will contribute to marker development for C. ensifolium.

  5. Identification of conserved and novel microRNAs in cashmere goat skin by deep sequencing.

    Directory of Open Access Journals (Sweden)

    Zhihong Liu

    MicroRNAs (miRNAs) are a class of small RNAs that play significant roles in the post-transcriptional regulation of skin and hair follicle gene expression. In recent years, extensive studies on these microRNAs have been carried out in mammals such as mice, rats, pigs and cattle. By comparison, the number of microRNAs that have been identified in goats is relatively low; in particular, the miRNAs associated with the processes of skin and hair follicle development remain largely unknown. In this study, areas of skin where the cashmere grows in anagen were sampled. A total of 10,943,292 reads were obtained using Solexa sequencing, a high-throughput sequencing technology. From 10,644,467 reads, we identified 3,381 distinct reads and after applying the classification statistics we obtained 316 miRNAs. Among them, using conservative identification, we found that 68 miRNAs (55 of these are confirmed to match known sheep and goat miRNAs in miRBase) are conserved in goat and have been reported in NCBI; the remaining 248 miRNAs were conserved in other species but have not been reported in goat. Furthermore, we identified 22 novel miRNAs. Both the known and novel miRNAs were confirmed by a second sequencing run using the same method as the first. This study confirmed the authenticity of 316 known miRNAs and the discovery of 22 novel miRNAs in goat. We found that the miRNAs that were co-expressed in goat and sheep were located in the same region of the respective chromosomes and may play an essential role in skin and follicle development. Identification of novel miRNAs resulted in significant enrichment of the repertoire of goat miRNAs.

  6. Sequence-of-events-driven automation of the deep space network

    Science.gov (United States)

    Hill, R., Jr.; Fayyad, K.; Smyth, C.; Santos, T.; Chen, R.; Chien, S.; Bevan, R.

    1996-01-01

    In February 1995, sequence-of-events (SOE)-driven automation technology was demonstrated for a Voyager telemetry downlink track at DSS 13. This demonstration entailed automated generation of an operations procedure (in the form of a temporal dependency network) from project SOE information using artificial intelligence planning technology and automated execution of the temporal dependency network using the link monitor and control operator assistant system. This article describes the overall approach to SOE-driven automation that was demonstrated, identifies gaps in SOE definitions and project profiles that hamper automation, and provides detailed measurements of the knowledge engineering effort required for automation.

  7. Deep sequencing uncovers numerous small RNAs on all four replicons of the plant pathogen Agrobacterium tumefaciens

    Science.gov (United States)

    Wilms, Ina; Overlöper, Aaron; Nowrousian, Minou; Sharma, Cynthia M.; Narberhaus, Franz

    2012-01-01

    Agrobacterium species are capable of interkingdom gene transfer between bacteria and plants. The genome of Agrobacterium tumefaciens consists of a circular and a linear chromosome, the At-plasmid and the Ti-plasmid, which harbors bacterial virulence genes required for tumor formation in plants. Little is known about promoter sequences and the small RNA (sRNA) repertoire of this and other α-proteobacteria. We used a differential RNA sequencing (dRNA-seq) approach to map transcriptional start sites of 388 annotated genes and operons. In addition, a total number of 228 sRNAs was revealed from all four Agrobacterium replicons. Twenty-two of these were confirmed by independent RNA gel blot analysis and several sRNAs were differentially expressed in response to growth media, growth phase, temperature or pH. One sRNA from the Ti-plasmid was massively induced under virulence conditions. The presence of 76 cis-antisense sRNAs, two of them on the reverse strand of virulence genes, suggests considerable antisense transcription in Agrobacterium. The information gained from this study provides a valuable reservoir for an in-depth understanding of sRNA-mediated regulation of the complex physiology and infection process of Agrobacterium. PMID:22336765

  8. Evolutionary Relations of Hexanchiformes Deep-Sea Sharks Elucidated by Whole Mitochondrial Genome Sequences

    Directory of Open Access Journals (Sweden)

    Keiko Tanaka

    2013-01-01

    Hexanchiformes is regarded as a monophyletic taxon, but the morphological and genetic relationships between the five extant species within the order are still uncertain. In this study, we determined the whole mitochondrial DNA (mtDNA) sequences of seven sharks including representatives of the five Hexanchiformes, one squaliform, and one carcharhiniform and inferred the phylogenetic relationships among those species and 12 other Chondrichthyes (cartilaginous fishes) species for which the complete mitogenome is available. The monophyly of Hexanchiformes and its close relation with all other Squaliformes sharks were strongly supported by likelihood and Bayesian phylogenetic analysis of 13,749 aligned nucleotides of 13 protein coding genes and two rRNA genes that were derived from the whole mtDNA sequences of the 19 species. The phylogeny suggested that Hexanchiformes is in the superorder Squalomorphi, Chlamydoselachus anguineus (frilled shark) is the sister species to all other Hexanchiformes, and the relations within Hexanchiformes are well resolved as Chlamydoselachus, (Notorynchus, (Heptranchias, (Hexanchus griseus, H. nakamurai))). Based on our phylogeny, we discussed evolutionary scenarios of the jaw suspension mechanism and gill slit numbers that are significant features in the sharks.

  9. DEEPre: sequence-based enzyme EC number prediction by deep learning

    KAUST Repository

    Li, Yu

    2017-10-20

    Annotation of enzyme function has a broad range of applications, such as metagenomics, industrial biotechnology, and diagnosis of enzyme deficiency-caused diseases. However, the time and resource required make it prohibitively expensive to experimentally determine the function of every enzyme. Therefore, computational enzyme function prediction has become increasingly important. In this paper, we develop such an approach, determining the enzyme function by predicting the Enzyme Commission number. We propose an end-to-end feature selection and classification model training approach, as well as an automatic and robust feature dimensionality uniformization method, DEEPre, in the field of enzyme function prediction. Instead of extracting manually crafted features from enzyme sequences, our model takes the raw sequence encoding as inputs, extracting convolutional and sequential features from the raw encoding based on the classification result to directly improve the prediction performance. The thorough cross-fold validation experiments conducted on two large-scale datasets show that DEEPre improves the prediction performance over the previous state-of-the-art methods. In addition, our server outperforms five other servers in determining the main class of enzymes on a separate low-homology dataset. Two case studies demonstrate DEEPre's ability to capture the functional difference of enzyme isoforms. The server could be accessed freely at http://www.cbrc.kaust.edu.sa/DEEPre.

  10. Evolutionary Relations of Hexanchiformes Deep-Sea Sharks Elucidated by Whole Mitochondrial Genome Sequences

    Science.gov (United States)

    Tanaka, Keiko; Tomita, Taketeru; Suzuki, Shingo; Hosomichi, Kazuyoshi; Sano, Kazumi; Doi, Hiroyuki; Kono, Azumi; Inoko, Hidetoshi; Kulski, Jerzy K.; Tanaka, Sho

    2013-01-01

    Hexanchiformes is regarded as a monophyletic taxon, but the morphological and genetic relationships between the five extant species within the order are still uncertain. In this study, we determined the whole mitochondrial DNA (mtDNA) sequences of seven sharks including representatives of the five Hexanchiformes, one squaliform, and one carcharhiniform and inferred the phylogenetic relationships among those species and 12 other Chondrichthyes (cartilaginous fishes) species for which the complete mitogenome is available. The monophyly of Hexanchiformes and its close relation with all other Squaliformes sharks were strongly supported by likelihood and Bayesian phylogenetic analysis of 13,749 aligned nucleotides of 13 protein coding genes and two rRNA genes that were derived from the whole mtDNA sequences of the 19 species. The phylogeny suggested that Hexanchiformes is in the superorder Squalomorphi, Chlamydoselachus anguineus (frilled shark) is the sister species to all other Hexanchiformes, and the relations within Hexanchiformes are well resolved as Chlamydoselachus, (Notorynchus, (Heptranchias, (Hexanchus griseus, H. nakamurai))). Based on our phylogeny, we discussed evolutionary scenarios of the jaw suspension mechanism and gill slit numbers that are significant features in the sharks. PMID:24089661

  11. DEEPre: sequence-based enzyme EC number prediction by deep learning.

    Science.gov (United States)

    Li, Yu; Wang, Sheng; Umarov, Ramzan; Xie, Bingqing; Fan, Ming; Li, Lihua; Gao, Xin

    2018-03-01

    Annotation of enzyme function has a broad range of applications, such as metagenomics, industrial biotechnology, and diagnosis of enzyme deficiency-caused diseases. However, the time and resource required make it prohibitively expensive to experimentally determine the function of every enzyme. Therefore, computational enzyme function prediction has become increasingly important. In this paper, we develop such an approach, determining the enzyme function by predicting the Enzyme Commission number. We propose an end-to-end feature selection and classification model training approach, as well as an automatic and robust feature dimensionality uniformization method, DEEPre, in the field of enzyme function prediction. Instead of extracting manually crafted features from enzyme sequences, our model takes the raw sequence encoding as inputs, extracting convolutional and sequential features from the raw encoding based on the classification result to directly improve the prediction performance. The thorough cross-fold validation experiments conducted on two large-scale datasets show that DEEPre improves the prediction performance over the previous state-of-the-art methods. In addition, our server outperforms five other servers in determining the main class of enzymes on a separate low-homology dataset. Two case studies demonstrate DEEPre's ability to capture the functional difference of enzyme isoforms. The server could be accessed freely at http://www.cbrc.kaust.edu.sa/DEEPre. Contact: xin.gao@kaust.edu.sa. Supplementary data are available at Bioinformatics online. © The Author(s) 2017. Published by Oxford University Press.
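    To make "raw sequence encoding" concrete, here is a minimal sketch of one common choice, one-hot encoding over the 20 standard amino acids. The abstract does not specify the encoding DEEPre actually uses, so the scheme, the alphabet order, and the function name `one_hot_encode` are assumptions made purely for illustration.

```python
# One-hot encoding of a protein sequence (illustrative assumption, not
# necessarily the encoding used by DEEPre).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}

def one_hot_encode(sequence):
    """Return a list of 20-element rows, one row per residue.

    Residues outside the standard alphabet (e.g. 'X') become all-zero rows.
    """
    rows = []
    for residue in sequence.upper():
        row = [0] * len(AMINO_ACIDS)
        idx = INDEX.get(residue)
        if idx is not None:
            row[idx] = 1
        rows.append(row)
    return rows

if __name__ == "__main__":
    encoded = one_hot_encode("MKV")
    print(len(encoded), "residues x", len(encoded[0]), "channels")
```

    A matrix of this shape (sequence length by alphabet size) is the kind of input that convolutional and recurrent layers can consume directly, without hand-crafted features.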

  12. Deep sequencing of subseafloor eukaryotic rRNA reveals active Fungi across marine subsurface provinces.

    Directory of Open Access Journals (Sweden)

    William Orsi

    The deep marine subsurface is a vast habitat for microbial life where cells may live on geologic timescales. Because DNA in sediments may be preserved on long timescales, ribosomal RNA (rRNA) is suggested to be a proxy for the active fraction of a microbial community in the subsurface. During an investigation of eukaryotic 18S rRNA by amplicon pyrosequencing, unique profiles of Fungi were found across a range of marine subsurface provinces including ridge flanks, continental margins, and abyssal plains. Subseafloor fungal populations exhibit statistically significant correlations with total organic carbon (TOC), nitrate, sulfide, and dissolved inorganic carbon (DIC). These correlations are supported by terminal restriction length polymorphism (TRFLP) analyses of fungal rRNA. Geochemical correlations with fungal pyrosequencing and TRFLP data from this geographically broad sample set suggest environmental selection of active Fungi in the marine subsurface. Within the same dataset, ancient rRNA signatures were recovered from plants and diatoms in marine sediments ranging from 0.03 to 2.7 million years old, suggesting that rRNA from some eukaryotic taxa may be much more stable than previously considered in the marine subsurface.

  13. Deep Canonical Time Warping for Simultaneous Alignment and Representation Learning of Sequences.

    Science.gov (United States)

    Trigeorgis, George; Nicolaou, Mihalis A; Schuller, Bjorn W; Zafeiriou, Stefanos

    2018-05-01

    Machine learning algorithms for the analysis of time-series often depend on the assumption that utilised data are temporally aligned. Any temporal discrepancies arising in the data are certain to lead to ill-generalisable models, which in turn fail to correctly capture properties of the task at hand. The temporal alignment of time-series is thus a crucial challenge manifesting in a multitude of applications. Nevertheless, the vast majority of algorithms oriented towards temporal alignment are either applied directly on the observation space or simply utilise linear projections, thus failing to capture complex, hierarchical non-linear representations that may prove beneficial, especially when dealing with multi-modal data (e.g., visual and acoustic information). To this end, we present Deep Canonical Time Warping (DCTW), a method that automatically learns non-linear representations of multiple time-series that are (i) maximally correlated in a shared subspace, and (ii) temporally aligned. Furthermore, we extend DCTW to a supervised setting, where during training, available labels can be utilised towards enhancing the alignment process. By means of experiments on four datasets, we show that the representations learnt significantly outperform state-of-the-art methods in temporal alignment, elegantly handling scenarios with heterogeneous feature sets, such as the temporal alignment of acoustic and visual information.
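    For context, the kind of alignment "applied directly on the observation space" can be sketched with classic dynamic time warping (DTW). This is the standard textbook baseline, not DCTW itself, which additionally learns non-linear representations; the function name and the absolute-difference cost are choices made for this illustration.

```python
def dtw_distance(a, b):
    """Classic dynamic time warping cost between two 1-D numeric sequences.

    cost[i][j] holds the minimal cumulative cost of aligning a[:i] with b[:j],
    using absolute difference as the local cost.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of: step in a only, step in b only, step in both.
            cost[i][j] = local + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]

if __name__ == "__main__":
    # The second series repeats one sample; warping absorbs the repetition.
    print(dtw_distance([1, 2, 3], [1, 2, 2, 3]))
```

    Because this recursion operates on raw observations with a fixed local cost, it cannot relate series with different feature sets (such as audio and video), which is the gap the abstract says DCTW addresses.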

  14. Analysis of microRNA profile of Anopheles sinensis by deep sequencing and bioinformatic approaches.

    Science.gov (United States)

    Feng, Xinyu; Zhou, Xiaojian; Zhou, Shuisen; Wang, Jingwen; Hu, Wei

    2018-03-12

    MicroRNAs (miRNAs) are small non-coding RNAs widely identified in many mosquitoes. They are reported to play important roles in development, differentiation and innate immunity. However, miRNAs in Anopheles sinensis, one of the Chinese malaria mosquitoes, remain largely unknown. We investigated the global miRNA expression profile of An. sinensis using Illumina Hiseq 2000 sequencing. Meanwhile, we applied a bioinformatic approach to identify potential miRNAs in An. sinensis. The identified miRNA profiles were compared and analyzed by two approaches. The selected miRNAs from the sequencing result and the bioinformatic approach were confirmed with qRT-PCR. Moreover, target prediction, GO annotation and pathway analysis were carried out to understand the role of miRNAs in An. sinensis. We identified 49 conserved miRNAs and 12 novel miRNAs by next-generation high-throughput sequencing technology. In contrast, 43 miRNAs were predicted by the bioinformatic approach, of which two were assigned as novel. Comparative analysis of miRNA profiles by two approaches showed that 21 miRNAs were shared between them. Twelve novel miRNAs did not match any known miRNAs of any organism, indicating that they are possibly species-specific. Forty miRNAs were found in many mosquito species, indicating that these miRNAs are evolutionally conserved and may have critical roles in the process of life. Both the selected known and novel miRNAs (asi-miR-281, asi-miR-184, asi-miR-14, asi-miR-nov5, asi-miR-nov4, asi-miR-9383, and asi-miR-2a) could be detected by quantitative real-time PCR (qRT-PCR) in the sequenced sample, and the expression patterns of these miRNAs measured by qRT-PCR were in concordance with the original miRNA sequencing data. The predicted targets for the known and the novel miRNAs covered many important biological roles and pathways indicating the diversity of miRNA functions. We also found 21 conserved miRNAs and eight counterparts of target immune pathway genes in An. sinensis.

  15. Focused Evolution of HIV-1 Neutralizing Antibodies Revealed by Structures and Deep Sequencing

    Energy Technology Data Exchange (ETDEWEB)

    Wu, Xueling; Zhou, Tongqing; Zhu, Jiang; Zhang, Baoshan; Georgiev, Ivelin; Wang, Charlene; Chen, Xuejun; Longo, Nancy S.; Louder, Mark; McKee, Krisha; O’Dell, Sijy; Perfetto, Stephen; Schmidt, Stephen D.; Shi, Wei; Wu, Lan; Yang, Yongping; Yang, Zhi-Yong; Yang, Zhongjia; Zhang, Zhenhai; Bonsignori, Mattia; Crump, John A.; Kapiga, Saidi H.; Sam, Noel E.; Haynes, Barton F.; Simek, Melissa; Burton, Dennis R.; Koff, Wayne C.; Doria-Rose, Nicole A.; Connors, Mark; Mullikin, James C.; Nabel, Gary J.; Roederer, Mario; Shapiro, Lawrence; Kwong, Peter D.; Mascola, John R. (Tumaini); (NIH); (Duke); (Kilimanjaro Repro.); (IAVI)

    2013-03-04

    Antibody VRC01 is a human immunoglobulin that neutralizes about 90% of HIV-1 isolates. To understand how such broadly neutralizing antibodies develop, we used x-ray crystallography and 454 pyrosequencing to characterize additional VRC01-like antibodies from HIV-1-infected individuals. Crystal structures revealed a convergent mode of binding for diverse antibodies to the same CD4-binding-site epitope. A functional genomics analysis of expressed heavy and light chains revealed common pathways of antibody-heavy chain maturation, confined to the IGHV1-2*02 lineage, involving dozens of somatic changes, and capable of pairing with different light chains. Broadly neutralizing HIV-1 immunity associated with VRC01-like antibodies thus involves the evolution of antibodies to a highly affinity-matured state required to recognize an invariant viral structure, with lineages defined from thousands of sequences providing a genetic roadmap of their development.

  16. cDNA fingerprinting of osteoprogenitor cells to isolate differentiation stage-specific genes.

    OpenAIRE

    Candeliere, G A; Rao, Y; Floh, A; Sandler, S D; Aubin, J E

    1999-01-01

    A cDNA fingerprinting strategy was developed to identify genes based on their differential expression pattern during osteoblast development. Preliminary biological and molecular staging of cDNA pools prepared by global amplification PCR allowed discriminating choices to be made in selection of expressed sequence tags (ESTs) to be isolated. Sequencing of selected ESTs confirmed that both known and novel genes can be isolated from any developmental stage of interest, e.g. from primitive progen...

  17. CDNA cloning, characterization and expression of an endosperm-specific barley peroxidase

    DEFF Research Database (Denmark)

    Rasmussen, Søren Kjærsgård; Welinder, K.G.; Hejgaard, J.

    1991-01-01

    A barley peroxidase (BP 1) of pI ca. 8.5 and M(r) 37000 has been purified from mature barley grains. Using antibodies towards peroxidase BP 1, a cDNA clone (pcR7) was isolated from a cDNA expression library. The nucleotide sequence of pcR7 gave a derived amino acid sequence identical to the 158 C...

  18. Small RNA and transcriptome deep sequencing proffers insight into floral gene regulation in Rosa cultivars.

    Science.gov (United States)

    Kim, Jungeun; Park, June Hyun; Lim, Chan Ju; Lim, Jae Yun; Ryu, Jee-Youn; Lee, Bong-Woo; Choi, Jae-Pil; Kim, Woong Bom; Lee, Ha Yeon; Choi, Yourim; Kim, Donghyun; Hur, Cheol-Goo; Kim, Sukweon; Noh, Yoo-Sun; Shin, Chanseok; Kwon, Suk-Yoon

    2012-11-21

    Roses (Rosa sp.), which belong to the family Rosaceae, are the most economically important ornamental plants--making up 30% of the floriculture market. However, given high demand for roses, rose breeding programs are limited in molecular resources which can greatly enhance and speed breeding efforts. A better understanding of important genes that contribute to important floral development and desired phenotypes will lead to improved rose cultivars. For this study, we analyzed rose miRNAs and the rose flower transcriptome in order to generate a database to expound upon current knowledge regarding regulation of important floral characteristics. A rose genetic database will enable comprehensive analysis of gene expression and regulation via miRNA among different Rosa cultivars. We produced more than 0.5 million reads from expressed sequences, totalling more than 110 million bp. From these, we generated 35,657, 31,434, 34,725, and 39,722 flower unigenes from Rosa hybrid: 'Vital', 'Maroussia', and 'Sympathy' and Rosa rugosa Thunb., respectively. The unigenes were assigned functional annotations, domains, metabolic pathways, Gene Ontology (GO) terms, Plant Ontology (PO) terms, and MIPS Functional Catalogue (FunCat) terms. Rose flower transcripts were compared with genes from whole genome sequences of Rosaceae members (apple, strawberry, and peach) and grape. We also produced approximately 40 million small RNA reads from flower tissue for Rosa, representing 267 unique miRNA tags. Among identified miRNAs, 25 of them were novel and 242 of them were conserved miRNAs. Statistical analyses of miRNA profiles revealed both shared and species-specific miRNAs, which presumably affect flower development and phenotypes. In this study, we constructed a Rose miRNA and transcriptome database, and we analyzed the miRNAs and transcriptome generated from the flower tissues of four Rosa cultivars.
The database provides a comprehensive genetic resource which can be used to better understand

  19. Real-Time Human Detection for Aerial Captured Video Sequences via Deep Models

    Directory of Open Access Journals (Sweden)

    Nouar AlDahoul

    2018-01-01

    Human detection in videos plays an important role in various real-life applications. Most traditional approaches depend on handcrafted features, which are problem-dependent and optimal for specific tasks. Moreover, they are highly susceptible to dynamical events such as illumination changes, camera jitter, and variations in object sizes. On the other hand, the proposed feature learning approaches are cheaper and easier because highly abstract and discriminative features can be produced automatically without the need for expert knowledge. In this paper, we utilize automatic feature learning methods which combine optical flow and three different deep models (i.e., supervised convolutional neural network (S-CNN), pretrained CNN feature extractor, and hierarchical extreme learning machine (H-ELM)) for human detection in videos captured using a nonstatic camera on an aerial platform with varying altitudes. The models are trained and tested on the publicly available and highly challenging UCF-ARG aerial dataset. The comparison between these models in terms of training, testing accuracy, and learning speed is analyzed. The performance evaluation considers five human actions (digging, waving, throwing, walking, and running). Experimental results demonstrated that the proposed methods are successful for the human detection task. The pretrained CNN produces an average accuracy of 98.09%. S-CNN produces an average accuracy of 95.6% with soft-max and 91.7% with Support Vector Machines (SVM). H-ELM has an average accuracy of 95.9%. Using a normal Central Processing Unit (CPU), H-ELM's training takes 445 seconds. Learning in S-CNN takes 770 seconds with a high-performance Graphical Processing Unit (GPU).

  20. Identification of Retinopathy of Prematurity Related miRNAs in Hyperoxia-Induced Neonatal Rats by Deep Sequencing

    Directory of Open Access Journals (Sweden)

    Ruibin Zhao

    2014-12-01

    Retinopathy of prematurity (ROP) remains a major problem for many preterm infants. MicroRNAs (miRNAs) are a class of small noncoding RNAs that regulate gene expression at the posttranscriptional level and have been studied in many diseases. To understand the roles of miRNAs in ROP model rats, we constructed two small RNA libraries from the plasma of hyperoxia-induced rats and normal controls. Deep sequencing identified 44 down-regulated and 22 up-regulated microRNAs in the hyperoxia-induced rats. Some of the differentially expressed miRNAs were confirmed by quantitative reverse transcription-PCR (qRT-PCR). A total of 594 target genes of the differentially expressed microRNAs were identified using a bioinformatics approach. Functional annotation analysis indicated that a number of the affected pathways might be involved in angiogenesis, cell proliferation and cell differentiation, and thus in the genesis and development of ROP. The elevated expression level of the vascular endothelial growth factor (VEGF) protein in the hyperoxia-induced neonatal rats was also confirmed by enzyme-linked immunosorbent assay (ELISA). This study provides some insights into the molecular mechanisms that underlie ROP development, thereby aiding the diagnosis and treatment of this disease.

  1. Identifying genomic changes associated with insecticide resistance in the dengue mosquito Aedes aegypti by deep targeted sequencing

    Science.gov (United States)

    Faucon, Frederic; Dusfour, Isabelle; Gaude, Thierry; Navratil, Vincent; Boyer, Frederic; Chandre, Fabrice; Sirisopa, Patcharawan; Thanispong, Kanutcharee; Juntarajumnong, Waraporn; Poupardin, Rodolphe; Chareonviriyaphap, Theeraphap; Girod, Romain; Corbel, Vincent; Reynaud, Stephane; David, Jean-Philippe

    2015-01-01

    The capacity of mosquitoes to resist insecticides threatens the control of diseases such as dengue and malaria. Until alternative control tools are implemented, characterizing resistance mechanisms is crucial for managing resistance in natural populations. Insecticide biodegradation by detoxification enzymes is a common resistance mechanism; however, the genomic changes underlying this mechanism have rarely been identified, precluding individual resistance genotyping. In particular, the role of copy number variations (CNVs) and polymorphisms of detoxification enzymes has never been investigated at the genome level, although they can represent robust markers of metabolic resistance. In this context, we combined target enrichment with high-throughput sequencing for conducting the first comprehensive screening of gene amplifications and polymorphisms associated with insecticide resistance in mosquitoes. More than 760 candidate genes were captured and deep sequenced in several populations of the dengue mosquito Ae. aegypti displaying distinct genetic backgrounds and contrasting resistance levels to the insecticide deltamethrin. CNV analysis identified 41 gene amplifications associated with resistance, most affecting cytochrome P450s overtranscribed in resistant populations. Polymorphism analysis detected more than 30,000 variants and strong selection footprints in specific genomic regions. Combining Bayesian and allele frequency filtering approaches identified 55 nonsynonymous variants strongly associated with resistance. Both CNVs and polymorphisms were conserved within regions but differed across continents, confirming that genomic changes underlying metabolic resistance to insecticides are not universal. By identifying novel DNA markers of insecticide resistance, this study opens the way for tracking down metabolic changes developed by mosquitoes to resist insecticides within and among populations. PMID:26206155

  2. Deep transcriptome sequencing reveals differences in global gene expression between normal and pale, soft, and exudative turkey meat.

    Science.gov (United States)

    Malila, Y; Carr, K M; Ernst, C W; Velleman, S G; Reed, K M; Strasburg, G M

    2014-03-01

    Previous studies from our laboratory suggested that differential expression of genes between normal and pale, soft, and exudative (PSE) turkey is associated with development of the PSE syndrome. However, a detailed understanding of molecular mechanisms responsible for the development of this meat defect remains unclear. The objective of this study was to extend and complement our previous work by using deep transcriptome RNA sequence analysis to compare the respective transcriptome profiles and identify molecular mechanisms responsible for the etiology of PSE turkey meat. Turkey breasts (n = 43) were previously classified as normal or PSE using marinade uptake as an indicator of quality (high = normal; low = PSE). Total RNA from breast muscle samples with the highest (n = 4) and lowest (n = 4) marinade uptake was isolated and sequenced using the Illumina GA(IIX) platform. The results indicated differential expression of 494 loci (false discovery rate ... turkey was suggested by both dramatic downregulation of pyruvate dehydrogenase kinase, isozyme 4 (PDK4) mRNA, the most downregulated gene, and a decrease in the protein product (P = 0.0007) as determined by immunoblot analysis. These results support the hypothesis that differential expression of several genes and their protein products contribute to development of PSE turkey.

  3. Deciphering KRAS and NRAS mutated clone dynamics in MLL-AF4 paediatric leukaemia by ultra deep sequencing analysis.

    Science.gov (United States)

    Trentin, Luca; Bresolin, Silvia; Giarin, Emanuela; Bardini, Michela; Serafin, Valentina; Accordi, Benedetta; Fais, Franco; Tenca, Claudya; De Lorenzo, Paola; Valsecchi, Maria Grazia; Cazzaniga, Giovanni; Kronnie, Geertruy Te; Basso, Giuseppe

    2016-10-04

    To induce and sustain the leukaemogenic process, MLL-AF4+ leukaemia seems to require very few genetic alterations in addition to the fusion gene itself. Studies of infant and paediatric patients with MLL-AF4+ B cell precursor acute lymphoblastic leukaemia (BCP-ALL) have reported mutations in KRAS and NRAS with incidences ranging from 25 to 50%. Whereas previous studies employed Sanger sequencing, here we used next generation amplicon deep sequencing for in-depth evaluation of RAS mutations in 36 paediatric patients at diagnosis of MLL-AF4+ leukaemia. RAS mutations, including those in small sub-clones, were detected in 63.9% of patients. Furthermore, the mutational analysis of 17 paired samples at diagnosis and relapse revealed complex RAS clone dynamics and showed that the mutated clones present at relapse almost all originated from clones that were already detectable at diagnosis and survived the initial therapy. Finally, we showed that mutated patients were indeed characterized by a RAS-related signature at both transcriptional and protein levels and that targeting the RAS pathway could be beneficial for treatment of MLL-AF4+ BCP-ALL clones carrying somatic RAS mutations.

  4. Deep sequencing and proteomic analysis of the microRNA-induced silencing complex in human red blood cells.

    Science.gov (United States)

    Azzouzi, Imane; Moest, Hansjoerg; Wollscheid, Bernd; Schmugge, Markus; Eekels, Julia J M; Speer, Oliver

    2015-05-01

    During maturation, erythropoietic cells extrude their nuclei but retain their ability to respond to oxidant stress by tightly regulating protein translation. Several studies have reported microRNA-mediated regulation of translation during terminal stages of erythropoiesis, even after enucleation. In the present study, we performed a detailed examination of the endogenous microRNA machinery in human red blood cells using a combination of deep sequencing analysis of microRNAs and proteomic analysis of the microRNA-induced silencing complex. Among the 197 different microRNAs detected, miR-451a was the most abundant, representing more than 60% of all read sequences. In addition, miR-451a and its known target, 14-3-3ζ mRNA, were bound to the microRNA-induced silencing complex, implying their direct interaction in red blood cells. The proteomic characterization of endogenous Argonaute 2-associated microRNA-induced silencing complex revealed 26 cofactor candidates. Among these cofactors, we identified several RNA-binding proteins, as well as motor proteins and vesicular trafficking proteins. Our results demonstrate that red blood cells contain complex microRNA machinery, which might enable immature red blood cells to control protein translation independent of de novo nuclei information.

  5. Deep sequencing reveals the complex and coordinated transcriptional regulation of genes related to grain quality in rice cultivars

    Directory of Open Access Journals (Sweden)

    An Gynheung

    2011-04-01

    Background: Milling yield and eating quality are two important grain quality traits in rice. To identify the genes involved in these two traits, we performed a deep transcriptional analysis of developing seeds using both massively parallel signature sequencing (MPSS) and sequencing-by-synthesis (SBS). Five MPSS and five SBS libraries were constructed from 6-day-old developing seeds of Cypress (high milling yield), LaGrue (low milling yield), Ilpumbyeo (high eating quality), YR15965 (low eating quality), and Nipponbare (control). Results: The transcriptomes revealed by MPSS and SBS had a high correlation co-efficient (0.81 to 0.90), and about 70% of the transcripts were commonly identified in both types of the libraries. SBS, however, identified 30% more transcripts than MPSS. Among the highly expressed genes in Cypress and Ilpumbyeo, over 100 conserved cis regulatory elements were identified. Numerous specifically expressed transcription factor (TF) genes were identified in Cypress (282), LaGrue (312), Ilpumbyeo (363), YR15965 (260), and Nipponbare (357). Many key grain quality-related genes (i.e., genes involved in starch metabolism, aspartate amino acid metabolism, storage and allergenic protein synthesis, and seed maturation) that were expressed at high levels underwent alternative splicing and produced antisense transcripts either in Cypress or Ilpumbyeo. Further, a time course RT-PCR analysis confirmed a higher expression level of genes involved in starch metabolism such as those encoding ADP glucose pyrophosphorylase (AGPase) and granule bound starch synthase I (GBSS I) in Cypress than that in LaGrue during early seed development. Conclusion: This study represents the most comprehensive analysis of the developing seed transcriptome of rice available to date. Using two high throughput sequencing methods, we identified many differentially expressed genes that may affect milling yield or eating quality in rice.
Many of the identified genes are involved

  6. Deep sequencing reveals the complex and coordinated transcriptional regulation of genes related to grain quality in rice cultivars.

    Science.gov (United States)

    Venu, Rc; Sreerekha, Mv; Nobuta, Kan; Beló, André; Ning, Yuese; An, Gynheung; Meyers, Blake C; Wang, Guo-Liang

    2011-04-14

    Milling yield and eating quality are two important grain quality traits in rice. To identify the genes involved in these two traits, we performed a deep transcriptional analysis of developing seeds using both massively parallel signature sequencing (MPSS) and sequencing-by-synthesis (SBS). Five MPSS and five SBS libraries were constructed from 6-day-old developing seeds of Cypress (high milling yield), LaGrue (low milling yield), Ilpumbyeo (high eating quality), YR15965 (low eating quality), and Nipponbare (control). The transcriptomes revealed by MPSS and SBS had a high correlation co-efficient (0.81 to 0.90), and about 70% of the transcripts were commonly identified in both types of the libraries. SBS, however, identified 30% more transcripts than MPSS. Among the highly expressed genes in Cypress and Ilpumbyeo, over 100 conserved cis regulatory elements were identified. Numerous specifically expressed transcription factor (TF) genes were identified in Cypress (282), LaGrue (312), Ilpumbyeo (363), YR15965 (260), and Nipponbare (357). Many key grain quality-related genes (i.e., genes involved in starch metabolism, aspartate amino acid metabolism, storage and allergenic protein synthesis, and seed maturation) that were expressed at high levels underwent alternative splicing and produced antisense transcripts either in Cypress or Ilpumbyeo. Further, a time course RT-PCR analysis confirmed a higher expression level of genes involved in starch metabolism such as those encoding ADP glucose pyrophosphorylase (AGPase) and granule bound starch synthase I (GBSS I) in Cypress than that in LaGrue during early seed development. This study represents the most comprehensive analysis of the developing seed transcriptome of rice available to date. Using two high throughput sequencing methods, we identified many differentially expressed genes that may affect milling yield or eating quality in rice. 
Many of the identified genes are involved in the biosynthesis of starch, aspartate

  7. Isolation of full-length putative rat lysophospholipase cDNA using improved methods for mRNA isolation and cDNA cloning

    International Nuclear Information System (INIS)

    Han, J.H.; Stratowa, C.; Rutter, W.J.

    1987-01-01

    The authors have cloned a full-length putative rat pancreatic lysophospholipase cDNA by an improved mRNA isolation method and cDNA cloning strategy using [32P]-labelled nucleotides. These new methods allow the construction of a cDNA library from the adult rat pancreas in which the majority of recombinant clones contained complete sequences for the corresponding mRNAs. A previously recognized but unidentified long and relatively rare cDNA clone containing the entire sequence from the cap site at the 5' end to the poly(A) tail at the 3' end of the mRNA was isolated by single-step screening of the library. The size, amino acid composition, and the activity of the protein expressed in heterologous cells strongly suggest this mRNA codes for lysophospholipase.

  8. cDNA cloning of a major allergen from timothy grass (Phleum pratense) pollen; characterization of the recombinant Phl pV allergen

    NARCIS (Netherlands)

    Vrtala, S.; Sperr, W. R.; Reimitzer, I.; van Ree, R.; Laffer, S.; Müller, W. D.; Valent, P.; Lechner, K.; Rumpold, H.; Kraft, D.

    1993-01-01

    We isolated a cDNA encoding a major grass pollen allergen from a timothy grass (Phleum pratense) pollen expression cDNA library using allergic patients' IgE. The complete cDNA encoded an allergen that binds IgE from about 80% of grass pollen-allergic patients. Significant sequence homology was found

  9. Small RNA and transcriptome deep sequencing proffers insight into floral gene regulation in Rosa cultivars

    Directory of Open Access Journals (Sweden)

    Kim Jungeun

    2012-11-01

    Background: Roses (Rosa sp.), which belong to the family Rosaceae, are the most economically important ornamental plants, making up 30% of the floriculture market. However, given high demand for roses, rose breeding programs are limited in molecular resources which can greatly enhance and speed breeding efforts. A better understanding of important genes that contribute to important floral development and desired phenotypes will lead to improved rose cultivars. For this study, we analyzed rose miRNAs and the rose flower transcriptome in order to generate a database to expound upon current knowledge regarding regulation of important floral characteristics. A rose genetic database will enable comprehensive analysis of gene expression and regulation via miRNA among different Rosa cultivars. Results: We produced more than 0.5 million reads from expressed sequences, totalling more than 110 million bp. From these, we generated 35,657, 31,434, 34,725, and 39,722 flower unigenes from Rosa hybrid: ‘Vital’, ‘Maroussia’, and ‘Sympathy’ and Rosa rugosa Thunb., respectively. The unigenes were assigned functional annotations, domains, metabolic pathways, Gene Ontology (GO) terms, Plant Ontology (PO) terms, and MIPS Functional Catalogue (FunCat) terms. Rose flower transcripts were compared with genes from whole genome sequences of Rosaceae members (apple, strawberry, and peach) and grape. We also produced approximately 40 million small RNA reads from flower tissue for Rosa, representing 267 unique miRNA tags. Among identified miRNAs, 25 of them were novel and 242 of them were conserved miRNAs. Statistical analyses of miRNA profiles revealed both shared and species-specific miRNAs, which presumably affect flower development and phenotypes. Conclusions: In this study, we constructed a Rose miRNA and transcriptome database, and we analyzed the miRNAs and transcriptome generated from the flower tissues of four Rosa cultivars. The database provides a

  10. Systematic Analysis of Small RNAs Associated with Human Mitochondria by Deep Sequencing: Detailed Analysis of Mitochondrial Associated miRNA

    Science.gov (United States)

    Sripada, Lakshmi; Tomar, Dhanendra; Prajapati, Paresh; Singh, Rochika; Singh, Arun Kumar; Singh, Rajesh

    2012-01-01

    Mitochondria are central regulators of many cellular processes beyond their well-established role in energy metabolism. Inter-organellar crosstalk is critical for the optimal function of mitochondria. Many nuclear-encoded proteins and RNAs are imported into mitochondria. The translocation of small RNA (sRNA), including miRNA, to mitochondria and other sub-cellular organelles is still not clear. Here we characterized sRNA, including miRNA, associated with human mitochondria by cellular fractionation and a deep sequencing approach. Mitochondria were purified from HEK293 and HeLa cells for RNA isolation. The sRNA library was generated and sequenced using the Illumina system. The analysis showed the presence of a unique population of sRNA associated with mitochondria, including miRNA. Putative novel miRNAs were characterized from unannotated sRNA sequences. The study showed the association of 428 known and 196 putative novel miRNAs with mitochondria of HEK293 cells, and of 327 known and 13 putative novel miRNAs with mitochondria of HeLa cells. The alignment of sRNA to the mitochondrial genome was also studied. The targets were analyzed using DAVID to classify them in unique networks using GO and KEGG tools. Analysis of identified targets showed that miRNAs associated with mitochondria regulate critical cellular processes like RNA turnover, apoptosis, cell cycle and nucleotide metabolism. The six miRNAs (counts >1000) associated with mitochondria of both HEK293 and HeLa were validated by RT-qPCR. To our knowledge, this is the first systematic study demonstrating the association of sRNA, including miRNA, with mitochondria that may regulate site-specific turnover of target mRNA important for mitochondria-related functions. PMID:22984580

  11. Characterization of microRNAs in Taenia saginata of zoonotic significance by Solexa deep sequencing and bioinformatics analysis.

    Science.gov (United States)

    Ai, L; Xu, M J; Chen, M X; Zhang, Y N; Chen, S H; Guo, J; Cai, Y C; Zhou, X N; Zhu, X Q; Chen, J X

    2012-06-01

    The beef tapeworm Taenia saginata infects human beings with symptoms ranging from nausea and abdominal discomfort to digestive disturbances and intestinal blockage. In the present study, the microRNA (miRNA) expression profile in adult T. saginata was analyzed using Solexa deep sequencing and bioinformatics analysis. A total of 15.8 million reads was obtained by Solexa sequencing, and 13.3 million clean reads (1.73 million unique sequences) were obtained after removing reads smaller than 18 nt. Ten conserved miRNAs corresponding to 607,382 reads were found when matching the reads against known miRNAs of Schistosoma japonicum in the miRBase database. miR-71 had the most abundant expression in T. saginata, followed by miR-219-5p, but some other common miRNAs such as let-7, miR-40, and miR-103 were not identified in T. saginata. Nucleotide bias analysis found that the known miRNAs showed high bias and that uracil was the dominant nucleotide, particularly at the first and 11th positions, which were almost at the beginning and middle of conserved miRNAs. One novel miRNA (Tsa-miR-001) corresponding to ten precursors was identified and confirmed by stem-loop RT-PCR. To our knowledge, this is the first report of miRNA profiles in T. saginata, which will contribute to better understanding of the complex biology of this zoonotic cestode. The reported data of T. saginata miRNAs should provide valuable references for miRNA studies of closely related zoonotic Taenia cestodes such as Taenia solium and Taenia asiatica.

  12. Trehalose as a good candidate for enriching full-length cDNAs in cDNA library construction.

    Science.gov (United States)

    Chen, Lei; Cao, Lixue; Zhou, Longhai; Jing, Yudong; Chen, Zuozhou; Deng, Cheng; Shen, Yu; Chen, Liangbiao

    2007-01-10

    It has been reported that the disaccharide trehalose is capable of increasing the thermostability and thermoactivity of reverse transcriptase, and therefore improving the length of cDNA synthesis. However, no test has been done on how trehalose performs in the context of the entire cDNA synthesis process, or whether it can be seamlessly integrated into a commercially available cDNA synthesis kit. In this report, we optimized a protocol to incorporate trehalose in Stratagene's cDNA library construction kit and demonstrated a great improvement in cDNA length (average length of 1.8 kb in the trehalose group versus 1.0 kb in the control). Sequence analysis of the cDNA clones showed that the addition of trehalose did not increase the error rate of the RT products but greatly increased the quantity of full-length cDNAs in the library.

  13. Deep sequencing reveals new aspects of progesterone receptor signaling in breast cancer cells.

    Directory of Open Access Journals (Sweden)

    Anastasia Kougioumtzi

    Despite the pleiotropic effects of the progesterone receptor in breast cancer, the molecular mechanisms in play remain largely unknown. To gain a global view of the PR-orchestrated networks, we used next-generation sequencing to determine the progestin-regulated transcriptome in T47D breast cancer cells. We identify a large number of PR target genes involved in critical cellular programs, such as regulation of transcription, apoptosis, cell motion and angiogenesis. Integration of the transcriptomic data with the PR-binding profiling of hormonally treated cells identifies numerous components of the small-GTPases signaling pathways as direct PR targets. Progestin-induced deregulation of the small GTPases may contribute to the PR's role in mammary tumorigenesis. Transcript expression analysis reveals significant expression changes of specific transcript variants in response to the extracellular hormonal stimulus. Using the NET1 gene as an example, we show that the PR can dictate alternative promoter usage leading to the upregulation of an isoform that may play a role in metastatic breast cancer. Future studies should aim to characterize these selectively regulated variants and evaluate their clinical utility in prognosis and targeted therapy of hormonally responsive breast tumors.

  14. Deep sequencing and ecological characterization of gut microbial communities of diverse bumble bee species.

    Directory of Open Access Journals (Sweden)

    Haw Chuan Lim

    Gut bacterial communities of bumble bees are correlated with defense against pathogens. Further understanding this host-microbe association is vitally important as bumble bees are currently experiencing global population declines, potentially due in part to emergent diseases. In this study, we used pyrosequencing and community fingerprinting (ARISA) to characterize the gut microbial communities of nine bumble bee species from across the Bombus phylogeny. Overall, we delimited 74 bacterial taxa (operational taxonomic units or OTUs) belonging to Betaproteobacteria, Gammaproteobacteria, Bacilli, Actinobacteria, Flavobacteria and Alphaproteobacteria. Each bacterial community was taxonomically simple, containing an average of 1.9 common (relative abundance per sample > 5%) bacterial OTUs. The most abundant and prevalent (occurring in 92% of the samples) bacterial OTU, based on 16S rRNA sequences, closely matched that of the previously described Betaproteobacteria species Snodgrassella alvi. Bacteria that were first described in bee-related external environments dominated a number of gut bacterial communities, suggesting that they are not strictly dependent on the internal gut environment. The ARISA data showed a correlation between bacterial community structures and the geographic locations where the bees were sampled, suggesting that at least a subset of the bacterial species may be transmitted environmentally. Using light and fluorescent microscopy, we demonstrated that the gut bacteria form a biofilm on the internal epithelial surface of the ileum, corroborating results obtained from Apis mellifera.

  15. Deep Sequencing of Porphyromonas gingivalis and comparative transcriptome analysis of a LuxS mutant

    Directory of Open Access Journals (Sweden)

    Takanoi Hirano

    2012-06-01

    Porphyromonas gingivalis is a major etiological agent of chronic and aggressive forms of periodontal disease. The organism is an asaccharolytic anaerobe and is a constituent of mixed species biofilms in a variety of microenvironments in the oral cavity. P. gingivalis expresses a range of virulence factors over which it exerts tight control. High-throughput sequencing technologies provide the opportunity to relate functional genomics to basic biology. In this study we report qualitative and quantitative RNA-Seq analysis of the transcriptome of P. gingivalis. We have also applied RNA-Seq to the transcriptome of a ΔluxS mutant of P. gingivalis deficient in AI-2-mediated bacterial communication. The transcriptome analysis confirmed the expression of all predicted ORFs for strain ATCC 33277, including 854 hypothetical proteins, and allowed the identification of hitherto unknown transcriptional units. Twelve noncoding RNAs were identified, including 11 small RNAs and one cobalamin riboswitch. Fifty-seven genes were differentially regulated in the LuxS mutant. Addition of exogenous synthetic 4,5-dihydroxy-2,3-pentanedione (DPD, AI-2 precursor) to the ΔluxS mutant culture complemented expression of a subset of genes, indicating that LuxS is involved in both AI-2 signaling and non-signaling dependent systems in P. gingivalis. This work provides an important dataset for future study of P. gingivalis pathophysiology and further defines the LuxS regulon in this oral pathogen.

  16. Genomic DNA sequences from mastodon and woolly mammoth reveal deep speciation of forest and savanna elephants.

    Directory of Open Access Journals (Sweden)

    Nadin Rohland

    2010-12-01

    To elucidate the history of living and extinct elephantids, we generated 39,763 bp of aligned nuclear DNA sequence across 375 loci for African savanna elephant, African forest elephant, Asian elephant, the extinct American mastodon, and the woolly mammoth. Our data establish that the Asian elephant is the closest living relative of the extinct mammoth in the nuclear genome, extending previous findings from mitochondrial DNA analyses. We also find that savanna and forest elephants, which some have argued are the same species, are as or more divergent in the nuclear genome as mammoths and Asian elephants, which are considered to be distinct genera, thus resolving a long-standing debate about the appropriate taxonomic classification of the African elephants. Finally, we document a much larger effective population size in forest elephants compared with the other elephantid taxa, likely reflecting species differences in ancient geographic structure and range and differences in life history traits such as variance in male reproductive success.

  17. Deep sequencing analysis of viral short RNAs from an infected Pinot Noir grapevine.

    Science.gov (United States)

    Pantaleo, Vitantonio; Saldarelli, Pasquale; Miozzi, Laura; Giampetruzzi, Annalisa; Gisel, Andreas; Moxon, Simon; Dalmay, Tamas; Bisztray, György; Burgyan, Jozsef

    2010-12-05

    Virus-derived short interfering RNAs (vsiRNAs) isolated from grapevine V. vinifera Pinot Noir clone ENTAV 115 were analyzed by high-throughput sequencing using the Illumina Solexa platform. We identified and characterized vsiRNAs derived from grapevine field plants naturally infected with different viruses belonging to the genera Foveavirus, Maculavirus, Marafivirus and Nepovirus. These vsiRNAs were mainly of 21 and 22 nucleotides (nt) in size and were discontinuously distributed throughout Grapevine rupestris stem-pitting associated virus (GRSPaV) and Grapevine fleck virus (GFkV) genomic RNAs. Among the studied viruses, GRSPaV and GFkV vsiRNAs had a 5' terminal nucleotide bias, which differed from that described for experimental viral infections in Arabidopsis thaliana. VsiRNAs were found to originate from both genomic and antigenomic GRSPaV RNA strands, whereas with the grapevine tymoviruses GFkV and Grapevine Red Globe associated virus (GRGV), the large majority derived from the antigenomic viral strand, a feature never observed in other plant-virus interactions. Copyright © 2010 Elsevier Inc. All rights reserved.

  18. Deep sequencing analyses of low density microbial communities: working at the boundary of accurate microbiota detection.

    Directory of Open Access Journals (Sweden)

    Giske Biesbroek

    Full Text Available INTRODUCTION: Accurate analyses of microbiota composition of low-density communities (10^3-10^4 bacteria/sample) can be challenging. Background DNA from chemicals and consumables, extraction biases as well as differences in PCR efficiency can significantly interfere with microbiota assessment. This study aimed to establish protocols for accurate microbiota analysis at low microbial density. METHODS: To examine possible effects of bacterial density on microbiota analyses we compared microbiota profiles of serially diluted saliva and low-density (nares, nasopharynx) and high-density (oropharynx) upper airway communities in four healthy individuals. DNA was extracted with four different extraction methods (Epicentre Masterpure, Qiagen DNeasy, Mobio Powersoil and a phenol bead-beating protocol combined with Agowa-Mag-mini). Bacterial DNA recovery was analysed by 16S qPCR and microbiota profiles through GS-FLX-Titanium-Sequencing of 16S rRNA gene amplicons spanning the V5-V7 regions. RESULTS: Lower template concentrations significantly impacted microbiota profiling results. With higher dilutions, low abundant species were overrepresented. In samples of <10^5 bacteria per ml, e.g. DNA <1 pg/µl, microbiota profiling deviated from the original sample and other dilutions, showing a significant increase in the taxa Proteobacteria and decrease in Bacteroidetes. In similar low-density samples, DNA extraction method determined if DNA levels were below or above 1 pg/µl and, together with lysis preferences per method, had profound impact on microbiota analyses in both relative abundance as well as representation of species. CONCLUSION: This study aimed to interpret microbiota analyses of low-density communities. Bacterial density seemed to interfere with microbiota analyses at <10^6 bacteria per ml or DNA <1 pg/µl. We therefore recommend this threshold for working with low density materials. This study underlines that bias reduction is crucial for adequate

  19. Deep small RNA sequencing from the nematode Ascaris reveals conservation, functional diversification, and novel developmental profiles.

    Science.gov (United States)

    Wang, Jianbin; Czech, Benjamin; Crunk, Amanda; Wallace, Adam; Mitreva, Makedonka; Hannon, Gregory J; Davis, Richard E

    2011-09-01

    Eukaryotic cells express several classes of small RNAs that regulate gene expression and ensure genome maintenance. Endogenous siRNAs (endo-siRNAs) and Piwi-interacting RNAs (piRNAs) mainly control gene and transposon expression in the germline, while microRNAs (miRNAs) generally function in post-transcriptional gene silencing in both somatic and germline cells. To provide an evolutionary and developmental perspective on small RNA pathways in nematodes, we identified and characterized known and novel small RNA classes through gametogenesis and embryo development in the parasitic nematode Ascaris suum and compared them with known small RNAs of Caenorhabditis elegans. piRNAs, Piwi-clade Argonautes, and other proteins associated with the piRNA pathway have been lost in Ascaris. miRNAs are synthesized immediately after fertilization in utero, before pronuclear fusion, and before the first cleavage of the zygote. This is the earliest expression of small RNAs ever described at a developmental stage long thought to be transcriptionally quiescent. A comparison of the two classes of Ascaris endo-siRNAs, 22G-RNAs and 26G-RNAs, to those in C. elegans suggests great diversification and plasticity in the use of small RNA pathways during spermatogenesis in different nematodes. Our data reveal conserved characteristics of nematode small RNAs as well as features unique to Ascaris that illustrate significant flexibility in the use of small RNA pathways, some of which are likely an adaptation to Ascaris' life cycle and parasitism. The transcriptome assembly has been submitted to NCBI Transcriptome Shotgun Assembly Sequence Database (http://www.ncbi.nlm.nih.gov/genbank/TSA.html) under accession numbers JI163767–JI182837 and JI210738–JI257410.

  20. dsPIG: a tool to predict imprinted genes from the deep sequencing of whole transcriptomes

    Directory of Open Access Journals (Sweden)

    Li Hua

    2012-10-01

    Full Text Available Abstract Background Dysregulation of imprinted genes, which are expressed in a parent-of-origin-specific manner, plays an important role in various human diseases, such as cancer and behavioral disorder. To date, however, fewer than 100 imprinted genes have been identified in the human genome. The recent availability of high-throughput technology makes it possible to have large-scale prediction of imprinted genes. Here we propose a Bayesian model (dsPIG to predict imprinted genes on the basis of allelic expression observed in mRNA-Seq data of independent human tissues. Results Our model (dsPIG was capable of identifying imprinted genes with high sensitivity and specificity and a low false discovery rate when the number of sequenced tissue samples was fairly large, according to simulations. By applying dsPIG to the mRNA-Seq data, we predicted 94 imprinted genes in 20 cerebellum samples and 57 imprinted genes in 9 diverse tissue samples with expected low false discovery rates. We also assessed dsPIG using previously validated imprinted and non-imprinted genes. With simulations, we further analyzed how imbalanced allelic expression of non-imprinted genes or different minor allele frequencies affected the predictions of dsPIG. Interestingly, we found that, among biallelically expressed genes, at least 18 genes expressed significantly more transcripts from one allele than the other among different individuals and tissues. Conclusion With the prevalence of the mRNA-Seq technology, dsPIG has become a useful tool for analysis of allelic expression and large-scale prediction of imprinted genes. For ease of use, we have set up a web service and also provided an R package for dsPIG at http://www.shoudanliang.com/dsPIG/.

  1. Deep sequencing whole transcriptome exploration of the σE regulon in Neisseria meningitidis.

    Directory of Open Access Journals (Sweden)

    Robert Antonius Gerhardus Huis in 't Veld

    Full Text Available Bacteria live in an ever-changing environment and must alter protein expression promptly to adapt to these changes and survive. Specific response genes that are regulated by a subset of alternative σ70-like transcription factors have evolved in order to respond to this changing environment. Recently, we have described the existence of a σE regulon including the anti-σ-factor MseR in the obligate human bacterial pathogen Neisseria meningitidis. To unravel the complete σE regulon in N. meningitidis, we sequenced total RNA transcriptional content of wild type meningococci and compared it with that of mseR mutant cells (ΔmseR), in which σE is highly expressed. Eleven coding genes and one non-coding gene were found to be differentially expressed between H44/76 wildtype and H44/76ΔmseR cells. Five of the 6 genes of the σE operon, msrA/msrB, and the gene encoding a pepSY-associated TM helix family protein showed enhanced transcription, whilst aniA encoding a nitrite reductase and nspA encoding the vaccine candidate Neisserial surface protein A showed decreased transcription. Analysis of differential expression in IGRs showed enhanced transcription of a non-coding RNA molecule, identifying a σE-dependent small non-coding RNA. Together this constitutes the first complete exploration of an alternative σ-factor regulon in N. meningitidis. The results point to a relatively small regulon, indicative of a strictly defined response consistent with a relatively stable niche, the human throat, where N. meningitidis resides.

  2. Draft Genome Sequences of Two Thiomicrospira Strains Isolated from the Brine-Seawater Interface of Kebrit Deep in the Red Sea

    KAUST Repository

    Zhang, Guishan

    2016-03-11

    Two Thiomicrospira strains, WB1 and XS5, were isolated from the Kebrit Deep brine-seawater interface in the Red Sea, Saudi Arabia. Here, we present the draft genome sequences of these gammaproteobacteria, which both produce sulfuric acid from thiosulfate in culture.

  3. Draft Genome Sequence of Pseudoalteromonas sp. Strain XI10 Isolated from the Brine-Seawater Interface of Erba Deep in the Red Sea

    KAUST Repository

    Zhang, Guishan

    2016-03-10

    Pseudoalteromonas sp. strain XI10 was isolated from the brine-seawater interface of Erba Deep in the Red Sea, Saudi Arabia. Here, we present the draft genome sequence of strain XI10, a gammaproteobacterium that synthesizes polysaccharides for biofilm formation when grown in liquid culture.

  4. Isolation of an ATP synthase cDNA from Sinonovacula constricta ...

    African Journals Online (AJOL)

    Yomi

    2012-01-24

    Jan 24, 2012 ... Complete cDNA sequence of ScATPase and its deduced amino acid sequence. Nucleotides were numbered from the first base at the 5'end. The canonical polyadenylation signal-sequence was italic and underlined. The asterisk indicated the stop codon. The domain for ATP synthase C was underlined.

  5. [Construction and characterization of normalized cDNA library of maize inbred Mo17 from multiple tissues and developmental stages].

    Science.gov (United States)

    Zhang, Z X; Zhang, F D; Tang, W H; Pi, Y J; Zheng, Y L

    2005-01-01

    A comprehensive complementary DNA (cDNA) library is a valuable resource for functional genomics. In this study, we set up a normalized cDNA library of Mo17 (MONL) by saturation hybridization with genomic DNA, which contained expressed genes of eight tissues and organs from inbred Mo17 of maize (Zea mays L.). In this library, the insert sizes range from 0.4 kb to 4 kb and the average size is 1.18 kb. A total of 10,830 clones were spotted on nylon membrane to make a cDNA microarray. Three hundred clones randomly picked from the cDNA library were sequenced. The cDNA microarray was hybridized with pooled tissue mRNA probes or housekeeping gene cDNA probes. The results showed that the normalized cDNA library comprehensively includes tissue-specific genes, of which 71% are unique ESTs (expressed sequence tags) based on the 300 sequences analyzed. Using the BLAST program to compare the sequences against online nucleotide databases, 88% of the sequences were found in ZmDB or NCBI, and 12% were not found in existing nucleotide databases. More than 73% of the sequences are of unknown function. The library could be extensively used in developing DNA markers, sequencing ESTs, mining new genes, positional cloning and candidate gene identification, and developing microarrays in maize genomics research.

  6. [A novel vector for construction of a cDNA library].

    Science.gov (United States)

    Fedchenko, V I; Kaloshin, A A; Medvedev, A E

    2010-01-01

    A new original vector, pGEM-(dT)40f(+), has been prepared. It can be used for cDNA library construction from polyadenylated mRNA isolated from various sources. The pGEM-(dT)40f(+) vector is first converted into a single-stranded and then into a linear form, and the (dT)40 tail at its 3'-end is used as the vector-primer for synthesis of the first-strand cDNA. The use of a synthetic oligonucleotide complementary to the vector and recombinant DNA results in vector cyclization and synthesis of the second-strand cDNA. This approach significantly simplifies cDNA library construction: it requires neither PCR (which can introduce artifactual mutations into cDNA sequences) nor restriction enzyme treatment.

  7. Deep Sea Coral voucher sequence dataset - Identification of deep-sea corals collected during the 2009 - 2014 West Coast Groundfish Bottom Trawl Survey

    Data.gov (United States)

    National Oceanic and Atmospheric Administration, Department of Commerce — Data for this project resides in the West Coast Groundfish Bottom Trawl Survey Database. Deep-sea corals are often components of trawling bycatch, though their...

  8. InFusion: Advancing Discovery of Fusion Genes and Chimeric Transcripts from Deep RNA-Sequencing Data.

    Directory of Open Access Journals (Sweden)

    Konstantin Okonechnikov

    Full Text Available Analysis of fusion transcripts has become increasingly important due to their link with cancer development. Since high-throughput sequencing approaches survey fusion events exhaustively, several computational methods for the detection of gene fusions from RNA-seq data have been developed. This kind of analysis, however, is complicated by native trans-splicing events, the splicing-induced complexity of the transcriptome and biases and artefacts introduced in experiments and data analysis. There are a number of tools available for the detection of fusions from RNA-seq data; however, certain differences in specificity and sensitivity between commonly used approaches have been found. The ability to detect gene fusions of different types, including isoform fusions and fusions involving non-coding regions, has not been thoroughly studied yet. Here, we propose a novel computational toolkit called InFusion for fusion gene detection from RNA-seq data. InFusion introduces several unique features, such as discovery of fusions involving intergenic regions, and detection of anti-sense transcription in chimeric RNAs based on strand-specificity. Our approach demonstrates superior detection accuracy on simulated data and several public RNA-seq datasets. This improved performance was also evident when evaluating data from RNA deep-sequencing of two well-established prostate cancer cell lines. InFusion identified 26 novel fusion events that were validated in vitro, including alternatively spliced gene fusion isoforms and chimeric transcripts that include intergenic regions. The toolkit is freely available to download from http://bitbucket.org/kokonech/infusion.

  9. InFusion: Advancing Discovery of Fusion Genes and Chimeric Transcripts from Deep RNA-Sequencing Data.

    Science.gov (United States)

    Okonechnikov, Konstantin; Imai-Matsushima, Aki; Paul, Lukas; Seitz, Alexander; Meyer, Thomas F; Garcia-Alcalde, Fernando

    2016-01-01

    Analysis of fusion transcripts has become increasingly important due to their link with cancer development. Since high-throughput sequencing approaches survey fusion events exhaustively, several computational methods for the detection of gene fusions from RNA-seq data have been developed. This kind of analysis, however, is complicated by native trans-splicing events, the splicing-induced complexity of the transcriptome and biases and artefacts introduced in experiments and data analysis. There are a number of tools available for the detection of fusions from RNA-seq data; however, certain differences in specificity and sensitivity between commonly used approaches have been found. The ability to detect gene fusions of different types, including isoform fusions and fusions involving non-coding regions, has not been thoroughly studied yet. Here, we propose a novel computational toolkit called InFusion for fusion gene detection from RNA-seq data. InFusion introduces several unique features, such as discovery of fusions involving intergenic regions, and detection of anti-sense transcription in chimeric RNAs based on strand-specificity. Our approach demonstrates superior detection accuracy on simulated data and several public RNA-seq datasets. This improved performance was also evident when evaluating data from RNA deep-sequencing of two well-established prostate cancer cell lines. InFusion identified 26 novel fusion events that were validated in vitro, including alternatively spliced gene fusion isoforms and chimeric transcripts that include intergenic regions. The toolkit is freely available to download from http://bitbucket.org/kokonech/infusion.

  10. Deep RNA sequencing reveals dynamic regulation of myocardial noncoding RNAs in failing human heart and remodeling with mechanical circulatory support.

    Science.gov (United States)

    Yang, Kai-Chien; Yamada, Kathryn A; Patel, Akshar Y; Topkara, Veli K; George, Isaac; Cheema, Faisal H; Ewald, Gregory A; Mann, Douglas L; Nerbonne, Jeanne M

    2014-03-04

    Microarrays have been used extensively to profile transcriptome remodeling in failing human heart, although the genomic coverage provided is limited and fails to provide a detailed picture of the myocardial transcriptome landscape. Here, we describe sequencing-based transcriptome profiling, providing comprehensive analysis of myocardial mRNA, microRNA (miRNA), and long noncoding RNA (lncRNA) expression in failing human heart before and after mechanical support with a left ventricular (LV) assist device (LVAD). Deep sequencing of RNA isolated from paired nonischemic (NICM; n=8) and ischemic (ICM; n=8) human failing LV samples collected before and after LVAD and from nonfailing human LV (n=8) was conducted. These analyses revealed high abundance of mRNA (37%) and lncRNA (71%) of mitochondrial origin. miRNASeq revealed 160 and 147 differentially expressed miRNAs in ICM and NICM, respectively, compared with nonfailing LV. Among these, only 2 (ICM) and 5 (NICM) miRNAs are normalized with LVAD. RNASeq detected 18 480, including 113 novel, lncRNAs in human LV. Among the 679 (ICM) and 570 (NICM) lncRNAs differentially expressed with heart failure, ≈10% are improved or normalized with LVAD. In addition, the expression signature of lncRNAs, but not miRNAs or mRNAs, distinguishes ICM from NICM. Further analysis suggests that cis-gene regulation represents a major mechanism of action of human cardiac lncRNAs. The myocardial transcriptome is dynamically regulated in advanced heart failure and after LVAD support. The expression profiles of lncRNAs, but not mRNAs or miRNAs, can discriminate failing hearts of different pathologies and are markedly altered in response to LVAD support. These results suggest an important role for lncRNAs in the pathogenesis of heart failure and in reverse remodeling observed with mechanical support.

  11. Analyses of Tissue Culture Adaptation of Human Herpesvirus-6A by Whole Genome Deep Sequencing Redefines the Reference Sequence and Identifies Virus Entry Complex Changes.

    Science.gov (United States)

    Tweedy, Joshua G; Escriva, Eric; Topf, Maya; Gompels, Ursula A

    2017-12-31

    Tissue-culture adaptation of viruses can modulate infection. Laboratory passage and bacterial artificial chromosome (BACmid) cloning of human cytomegalovirus, HCMV, resulted in genomic deletions and rearrangements altering genes encoding the virus entry complex, which affected cellular tropism, virulence, and vaccine development. Here, we analyse these effects on the reference genome for related betaherpesviruses, Roseolovirus, human herpesvirus 6A (HHV-6A) strain U1102. This virus is also naturally "cloned" by germline subtelomeric chromosomal-integration in approximately 1% of human populations, and accurate references are key to understanding pathological relationships between exogenous and endogenous virus. Using whole genome next-generation deep-sequencing Illumina-based methods, we compared the original isolate to tissue-culture passaged and the BACmid-cloned virus. This re-defined the reference genome showing 32 corrections and 5 polymorphisms. Furthermore, minor variant analyses of passaged and BACmid virus identified emerging populations of a further 32 single nucleotide polymorphisms (SNPs) in 10 loci, half non-synonymous indicating cell-culture selection. Analyses of the BAC-virus genome showed deletion of the BAC cassette via loxP recombination removing green fluorescent protein (GFP)-based selection. As shown for HCMV culture effects, select HHV-6A SNPs mapped to genes encoding mediators of virus cellular entry, including virus envelope glycoprotein genes gB and the gH/gL complex. Comparative models suggest stabilisation of the post-fusion conformation. These SNPs are essential to consider in vaccine-design, antimicrobial-resistance, and pathogenesis.

  12. The stability of aerobic granular sludge treating municipal sludge deep dewatering filtrate in a bench scale sequencing batch reactor.

    Science.gov (United States)

    Long, Bei; Yang, Chang-Zhu; Pu, Wen-Hong; Yang, Jia-Kuan; Shi, Ya-Fei; Wang, Jing; Bai, Jun; Zhou, Xuan-Yue; Jiang, Guo-Sheng; Li, Chun-Yang; Liu, Fu-Biao

    2014-10-01

    A sequencing batch reactor was inoculated with mature aerobic granular sludge (AGS), and the proportion of municipal sludge deep dewatering filtrate in the influent was gradually increased; the AGS was domesticated after 84 days and maintained its structure during operation. The domesticated AGS was yellowish-brown, dense and irregularly spherical, with an average size of 1.49 mm, a water content of 98.13%, a specific density of 1.0114, an SVI of 40 ml/g and a settling velocity of 46.5 m/h. After 38 days, NO3⁻-N accumulated markedly in the reactor owing to a lack of carbon sources. When 1-3 g of solid CH3COONa was added at 4.5 and 5.5 h of each cycle from the 57th day, the TN removal rate rose above 90% after 20 days, so that effective COD removal and denitrification were achieved in a single bioreactor. Finally, the removal rates of COD, TP, TN and NH4⁺-N were higher than 95%, 88%, 96% and 99%, respectively. Copyright © 2014 Elsevier Ltd. All rights reserved.

  13. Expression analysis of a Cucurbita cDNA encoding endonuclease

    International Nuclear Information System (INIS)

    Szopa, J.

    1995-01-01

    The nuclear matrices of plant cell nuclei display intrinsic nuclease activity which consists in nicking supercoiled DNA. A cDNA encoding a 32 kDa endonuclease has been cloned and sequenced. The nucleotide and deduced amino-acid sequences show high homology to known 14-3-3-protein sequences from other sources. The amino-acid sequence shows agreement with consensus sequences for potential phosphorylation by protein kinase A and C and for calcium, lipid and membrane-binding sites. The nucleotide-binding site is also present within the conserved part of the sequence. By Northern blot analysis, the differential expression of the corresponding mRNA was detected; it was the strongest in sink tissues. The endonuclease activity found on DNA-polyacrylamide gel electrophoresis coincided with mRNA content and was the highest in tuber. (author). 22 refs, 6 figs

  14. [Screening of specifically expressed genes in amphioxus neurula by construction of a subtractive cDNA library].

    Science.gov (United States)

    Zhang, Lei; Yang, Yong-Jie; Zhang, Yan-Jun

    2010-12-01

    To screen specifically expressed genes in the development of nerve, muscle, and body axis of amphioxus, Branchiostoma belcheri tsingtauense. A subtractive cDNA library was constructed from the 12-hour amphioxus neurula cDNA after subtractive hybridization with the 6-hour amphioxus gastrula cDNA. Total RNA was extracted from the 12-hour neurula and the 6-hour gastrula, then reverse transcribed into cDNA. The 12-hour neurula cDNA was designated as the experimental group (the tester) and the 6-hour gastrula cDNA as the control group (the driver). The differentially expressed sequences were exponentially amplified using suppression PCR. Background was subtracted and differentially expressed sequences were further enriched. The PCR products were ligated to the T Vector. After transformation of the recombinant plasmid carrying inserted amphioxus cDNA into E. coli host cells, the cDNA library was constructed successfully. Two hundred randomly chosen positive clones were sequenced and some neurula-specifically expressed genes were obtained. SSH is an effective method for searching for differentially expressed genes. The subtractive cDNA library we generated provides a tool for further study of regulatory mechanisms of amphioxus early embryonic development.

  15. Transmission Bottleneck Size Estimation from Pathogen Deep-Sequencing Data, with an Application to Human Influenza A Virus.

    Science.gov (United States)

    Sobel Leonard, Ashley; Weissman, Daniel B; Greenbaum, Benjamin; Ghedin, Elodie; Koelle, Katia

    2017-07-15

    The bottleneck governing infectious disease transmission describes the size of the pathogen population transferred from the donor to the recipient host. Accurate quantification of the bottleneck size is particularly important for rapidly evolving pathogens such as influenza virus, as narrow bottlenecks reduce the amount of transferred viral genetic diversity and, thus, may decrease the rate of viral adaptation. Previous studies have estimated bottleneck sizes governing viral transmission by using statistical analyses of variants identified in pathogen sequencing data. These analyses, however, did not account for variant calling thresholds and stochastic viral replication dynamics within recipient hosts. Because these factors can skew bottleneck size estimates, we introduce a new method for inferring bottleneck sizes that accounts for these factors. Through the use of a simulated data set, we first show that our method, based on beta-binomial sampling, accurately recovers transmission bottleneck sizes, whereas other methods fail to do so. We then apply our method to a data set of influenza A virus (IAV) infections for which viral deep-sequencing data from transmission pairs are available. We find that the IAV transmission bottleneck size estimates in this study are highly variable across transmission pairs, while the mean bottleneck size of 196 virions is consistent with a previous estimate for this data set. Furthermore, regression analysis shows a positive association between estimated bottleneck size and donor infection severity, as measured by temperature. These results support findings from experimental transmission studies showing that bottleneck sizes across transmission events can be variable and influenced in part by epidemiological factors. IMPORTANCE The transmission bottleneck size describes the size of the pathogen population transferred from the donor to the recipient host and may affect the rate of pathogen adaptation within host populations. 
Recent
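The beta-binomial sampling idea described in the abstract above can be sketched in a few lines. This is a simplified illustration only, not the authors' implementation: it assumes every variant is observed in both donor and recipient at frequencies strictly between 0 and 1, it ignores the variant calling threshold that the abstract says the full method accounts for, and the function names, the grid search, and the example frequencies are all hypothetical.

```python
import math

def log_beta_pdf(x, a, b):
    """Log density of Beta(a, b) at x, for 0 < x < 1 and a, b > 0."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return log_norm + (a - 1) * math.log(x) + (b - 1) * math.log(1 - x)

def log_binom_pmf(k, n, p):
    """Log probability of k successes in n Bernoulli(p) trials."""
    log_comb = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return log_comb + k * math.log(p) + (n - k) * math.log(1 - p)

def site_log_likelihood(nu_donor, nu_recipient, nb):
    """Log-likelihood of one variant site for bottleneck size nb.

    k of the nb founding virions carry the variant (binomial draw from the
    donor frequency); the recipient frequency is then Beta(k, nb - k).
    k = 0 and k = nb are skipped because the beta density is degenerate there.
    """
    terms = [
        log_binom_pmf(k, nb, nu_donor) + log_beta_pdf(nu_recipient, k, nb - k)
        for k in range(1, nb)
    ]
    peak = max(terms)  # log-sum-exp for numerical stability
    return peak + math.log(sum(math.exp(t - peak) for t in terms))

def estimate_bottleneck(pairs, nb_max=200):
    """Grid-search maximum-likelihood bottleneck size over 2..nb_max.

    pairs: iterable of (donor_frequency, recipient_frequency) per variant.
    """
    pairs = list(pairs)
    return max(
        range(2, nb_max + 1),
        key=lambda nb: sum(site_log_likelihood(d, r, nb) for d, r in pairs),
    )

# Hypothetical frequencies, for illustration only (not data from the study).
similar_pairs = [(0.2, 0.2), (0.4, 0.4), (0.3, 0.3)]
divergent_pairs = [(0.2, 0.6), (0.4, 0.1), (0.3, 0.7)]
loose = estimate_bottleneck(similar_pairs, nb_max=100)
tight = estimate_bottleneck(divergent_pairs, nb_max=100)
```

When recipient frequencies track donor frequencies closely, the likelihood favours a wide bottleneck; large frequency shifts between donor and recipient favour a narrow one.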

  16. Cloning and functional expression of a human pancreatic islet glucose-transporter cDNA

    International Nuclear Information System (INIS)

    Permutt, M.A.; Koranyi, L.; Keller, K.; Lacy, P.E.; Scharp, D.W.; Mueckler, M.

    1989-01-01

    Previous studies have suggested that pancreatic islet glucose transport is mediated by a high-Km, low-affinity facilitated transporter similar to that expressed in liver. To determine the relationship between islet and liver glucose transporters, liver-type glucose-transporter cDNA clones were isolated from a human liver cDNA library. The liver-type glucose-transporter cDNA clone hybridized to mRNA transcripts of the same size in human liver and pancreatic islet RNA. A cDNA library was prepared from purified human pancreatic islet tissue and screened with human liver-type glucose-transporter cDNA. The authors isolated two overlapping cDNA clones encompassing 2600 base pairs, which encode a pancreatic islet protein identical in sequence to that of the putative liver-type glucose-transporter protein. Xenopus oocytes injected with synthetic mRNA transcribed from a full-length cDNA construct exhibited increased uptake of 2-deoxyglucose, confirming the functional identity of the clone. These cDNA clones can now be used to study regulation of expression of the gene and to assess the role of inherited defects in this gene as a candidate for inherited susceptibility to non-insulin-dependent diabetes mellitus.

  17. [cDNA library constructing and specific antigen expression of Streptomyces thermohydroscopicus].

    Science.gov (United States)

    Xu, Lei; Wang, Ling-ling; Liu, Shuo; Ling, Yuan; Ma, Lie; Wang, Qun; Zhang, Li-jiao; He, Xiao-yu; Zhao, Ming-jing; Wang, Xiao-ge

    2012-03-01

    To construct a cDNA library from Streptomyces thermohydroscopicus, screen it for virulence genes, and obtain recombinant fusion virulence proteins using a prokaryotic expression system. The Streptomyces thermohydroscopicus cDNA library was constructed by the switching mechanism at 5' end of RNA transcript approach. A total of 1020 clones randomly selected from the cDNA library were sequenced, and these expressed sequence tags (ESTs) were further analyzed to screen for antigen-specific genes. The two candidate genes were subcloned into the expression vector pET-28a. The recombinants were transformed into BL2 and protein expression was induced with isopropyl-β-D-1-thiogalactopyranoside (IPTG). A high-quality cDNA library from Streptomyces thermohydroscopicus was constructed and a set of 978 valid sequences was obtained. Clustering and assembly of these cDNA sequences resulted in 347 unique genes, among which 2 potential antigen-specific genes showed high similarity to outer membrane lipoprotein (51%) and transferrin-binding protein B (42%) from Actinobacillus pleuropneumoniae serotype (APP). The open reading frames (ORFs) of the two candidate genes are 1554 bp and 726 bp, encoding peptides of 517 and 241 amino acids, respectively. The molecular weights of the recombinant fusion proteins were 63 000 and 30 000. The cDNA library of Streptomyces thermohydroscopicus reached the quality requirement of a gene library. The EST database in the library would greatly facilitate further screening of virulence genes.
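The ORF lengths and peptide lengths reported above are consistent with each other, as a short check shows. The sketch assumes that the stated ORF length includes the stop codon, which is what makes the reported numbers agree; the function name is hypothetical.

```python
def orf_to_peptide_length(orf_bp):
    """Number of amino acids encoded by an ORF whose length includes the stop codon."""
    if orf_bp % 3 != 0:
        raise ValueError("ORF length must be a multiple of 3")
    # Every codon encodes one residue except the final stop codon.
    return orf_bp // 3 - 1

# ORF lengths from the abstract and the peptide lengths they imply.
peptide_lengths = [orf_to_peptide_length(bp) for bp in (1554, 726)]
```

The two values match the 517 and 241 amino acids given in the abstract.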

  18. Deep sequencing of the T-cell receptor repertoire in CD8+ T-large granular lymphocyte leukemia identifies signature landscapes.

    Science.gov (United States)

    Clemente, Michael J; Przychodzen, Bartlomiej; Jerez, Andres; Dienes, Brittney E; Afable, Manuel G; Husseinzadeh, Holleh; Rajala, Hanna L M; Wlodarski, Marcin W; Mustjoki, Satu; Maciejewski, Jaroslaw P

    2013-12-12

    New massively parallel sequencing technology enables, through deep sequencing of rearranged T-cell receptor (TCR) Vβ complementarity-determining region 3 (CDR3) regions, a previously inaccessible level of TCR repertoire analysis. The CDR3 repertoire diversity reflects clonal composition, the potential antigenic recognition spectrum, and the quantity of available T-cell responses. In this context, T-large granular lymphocyte (T-LGL) leukemia is a chronic clonal lymphoproliferation of cytotoxic T cells often associated with autoimmune diseases and various cytopenias. Using CD8(+) T-LGL leukemia as a model disease, we set out to evaluate and compare the TCR deep-sequencing spectra of both patients and healthy controls to better understand how TCR deep sequencing could be used in the diagnosis and monitoring of not only T-LGL leukemia but also reactive processes such as autoimmune disease and infection. Our data demonstrate, with high resolution, significantly decreased diversity of the T-cell repertoire in CD8(+) T-LGL leukemia and suggest that many T-LGL clonotypes may be private to the disease and may not be present in the general public, even at the basal level.

  19. Deep sequencing of HPV E6/E7 genes reveals loss of genotypic diversity and gain of clonal dominance in high-grade intraepithelial lesions of the cervix.

    Science.gov (United States)

    Shen-Gunther, Jane; Wang, Yufeng; Lai, Zhao; Poage, Graham M; Perez, Luis; Huang, Tim H M

    2017-03-14

    Human papillomavirus (HPV) is the carcinogen of almost all invasive cervical cancer and a major cause of oral and other anogenital malignancies. HPV genotyping by dideoxy (Sanger) sequencing is currently the reference method of choice for clinical diagnostics. However, for samples with multiple HPV infections, genotype identification is singular and occasionally imprecise or indeterminable due to overlapping chromatograms. Our aim was to explore and compare HPV metagenomes in abnormal cervical cytology by deep sequencing for correlation with disease states. Low- and high-grade intraepithelial lesion (LSIL and HSIL) cytology samples were DNA extracted for PCR-amplification of the HPV E6/E7 genes. HPV+ samples were sequenced by dideoxy and deep methods. Deep sequencing revealed ~60% of all samples (n = 72) were multi-HPV infected. Among LSIL samples (n = 43), 27 different genotypes were found. The 3 dominant (most abundant) genotypes were: HPV-39, 11/43 (26%); -16, 9/43 (21%); and -35, 4/43 (9%). Among HSIL (n = 29), 17 HPV genotypes were identified; the 3 dominant genotypes were: HPV-16, 21/29 (72%); -35, 4/29 (14%); and -39, 3/29 (10%). Phylogenetically, type-specific E6/E7 genetic distances correlated with carcinogenic potential. Species diversity analysis between LSIL and HSIL revealed loss of HPV diversity and domination by HPV-16 in HSIL samples. Deep sequencing resolves HPV genotype composition within multi-infected cervical cytology. Biodiversity analysis reveals loss of diversity and gain of dominance by carcinogenic genotypes in high-grade cytology. Metagenomic profiles may therefore serve as a biomarker of disease severity and a population surveillance tool for emerging genotypes.
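
    The loss of diversity and gain of dominance reported here can be illustrated with a standard diversity calculation. The following is a minimal Python sketch, not the authors' analysis: it computes the Shannon index and the Berger-Parker dominance (the share held by the most abundant category) from a table of counts. The function names are hypothetical. Only the dominant-genotype counts (11, 9 and 4 of 43 LSIL samples; 21, 4 and 3 of 29 HSIL samples) come from the abstract; lumping the remaining samples into one "other" category is an assumption that understates the true diversity.

```python
import math
from collections import Counter


def shannon_index(counts):
    """Shannon diversity H' = -sum(p_i * ln p_i) over non-zero categories."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no observations")
    return -sum((n / total) * math.log(n / total) for n in counts.values() if n > 0)


def dominance(counts):
    """Berger-Parker dominance: proportion held by the most abundant category."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no observations")
    return max(counts.values()) / total


# Dominant-genotype counts per group from the abstract; "other" lumps the
# remaining samples (an assumption that understates true diversity).
lsil = Counter({"HPV-39": 11, "HPV-16": 9, "HPV-35": 4, "other": 19})
hsil = Counter({"HPV-16": 21, "HPV-35": 4, "HPV-39": 3, "other": 1})

for name, group in (("LSIL", lsil), ("HSIL", hsil)):
    print(f"{name}: H' = {shannon_index(group):.2f}, dominance = {dominance(group):.2f}")
```

    With these inputs the HSIL group gives a lower Shannon index and a higher dominance than the LSIL group, which is the direction of change the abstract describes.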

  20. Generation of full-length cDNA libraries: focus on plants.

    Science.gov (United States)

    Seki, Motoaki; Kamiya, Asako; Carninci, Piero; Hayashizaki, Yoshihide; Shinozaki, Kazuo

    2009-01-01

    Full-length cDNAs are essential for the correct annotation of transcriptional units and gene products from genomic sequence data and for functional analysis of the genes. Full-length cDNA libraries are very important resources for isolation of the full-length cDNAs. The biotinylated cap trapper method using the trehalose-thermostabilized reverse transcriptase has been developed and has become an efficient method for construction of high-content full-length cDNA libraries. We have constructed full-length cDNA libraries from various plants and animals using this method. The protocol of the method is described in this chapter.

  1. High-throughput deep sequencing reveals that microRNAs play important roles in salt tolerance of euhalophyte Salicornia europaea.

    Science.gov (United States)

    Feng, Juanjuan; Wang, Jinhui; Fan, Pengxiang; Jia, Weitao; Nie, Lingling; Jiang, Ping; Chen, Xianyang; Lv, Sulian; Wan, Lichuan; Chang, Sandra; Li, Shizhong; Li, Yinxin

    2015-02-26

    microRNAs (miRNAs) are implicated in plant development processes and play pivotal roles in plant adaptation to environmental stresses. Salicornia europaea, a salt marsh euhalophyte, is a suitable model plant to study salt adaptation mechanisms. S. europaea is also a vegetable, forage, and oilseed that can be used for saline land reclamation and biofuel precursor production on marginal lands. Despite its importance, no miRNA has been identified from S. europaea thus far. Deep sequencing was performed to investigate the small RNA transcriptome of S. europaea. Two hundred and ten conserved miRNAs comprising 51 families and 31 novel miRNAs (including seven miRNA star sequences) belonging to 30 families were identified. About half (13 out of 31) of the novel miRNAs were only detected in salt-treated samples. The expression of 43 conserved and 13 novel miRNAs significantly changed in response to salinity. In addition, 53 conserved and 13 novel miRNAs were differentially expressed between the shoots and roots. Furthermore, 306 and 195 S. europaea unigenes were predicted to be targets of 41 conserved and 29 novel miRNA families, respectively. These targets encoded a wide range of proteins, and genes involved in transcription regulation constituted the largest category. Four of these genes encoding laccase, F-box family protein, SAC3/GANP family protein, and NADPH cytochrome P-450 reductase were validated using 5'-RACE. Our results indicate that specific miRNAs are tightly regulated by salinity in the shoots and/or roots of S. europaea, which may play important roles in salt tolerance of this euhalophyte. The S. europaea salt-responsive miRNAs and miRNAs that target transcription factors, nucleotide binding site-leucine-rich repeat proteins and enzymes involved in lignin biosynthesis as well as carbon and nitrogen metabolism may be applied in genetic engineering of crops with high stress tolerance, and genetic modification of biofuel crops with high biomass and regulatable

  2. Cloning of cDNA encoding steroid 11β-hydroxylase (P450c11)

    International Nuclear Information System (INIS)

    Chua, S.C.; Szabo, P.; Vitek, A.; Grzeschik, K.H.; John, M.; White, P.C.

    1987-01-01

    The authors have isolated bovine and human adrenal cDNA clones encoding the adrenal cytochrome P-450 specific for 11β-hydroxylation (P450c11). A bovine adrenal cDNA library constructed in the bacteriophage λ vector gt10 was probed with a previously isolated cDNA clone corresponding to part of the 3' untranslated region of the 4.2-kilobase (kb) mRNA encoding P450c11. Several clones with 3.2-kb cDNA inserts were isolated. Sequence analysis showed that they overlapped the original probe by 300 base pairs (bp). Combined cDNA and RNA sequence data demonstrated a continuous open reading frame of 1509 bases. P450c11 is predicted to contain 479 amino acid residues in the mature protein in addition to a 24-residue amino-terminal mitochondrial signal sequence. A bovine clone was used to isolate a homologous clone with a 3.5-kb insert from a human adrenal cDNA library. A region of 1100 bp was 81% homologous to 769 bp of the coding sequence of the bovine cDNA except for a 400-bp segment presumed to be an unprocessed intron. Hybridization of the human cDNA to DNA from a panel of human-rodent somatic cell hybrid lines and in situ hybridization to metaphase spreads of human chromosomes localized the gene to the middle of the long arm of chromosome 8. These data should be useful in developing reagents for heterozygote detection and prenatal diagnosis of 11β-hydroxylase deficiency, the second most frequent cause of congenital adrenal hyperplasia
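
    The figures in this abstract can be cross-checked with simple codon arithmetic. The snippet below is an illustrative Python sketch (the function name is hypothetical): at three bases per codon, the 479-residue mature protein plus the 24-residue signal sequence account for the 1509-base open reading frame, assuming the stop codon is not counted in that figure.

```python
def orf_length_bases(n_residues, include_stop=False):
    """Bases needed to encode n_residues amino acids at 3 bases per codon."""
    codons = n_residues + (1 if include_stop else 0)
    return 3 * codons


mature, signal = 479, 24  # residue counts quoted in the abstract
print(orf_length_bases(mature + signal))  # 3 * 503 = 1509
```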

  3. A deep sequencing approach to comparatively analyze the transcriptome of lifecycle stages of the filarial worm, Brugia malayi.

    Directory of Open Access Journals (Sweden)

    Young-Jun Choi

    2011-12-01

    Developing intervention strategies for the control of parasitic nematodes continues to be a significant challenge. Genomic and post-genomic approaches play an increasingly important role for providing fundamental molecular information about these parasites, thus enhancing basic as well as translational research. Here we report a comprehensive genome-wide survey of the developmental transcriptome of the human filarial parasite Brugia malayi. Using deep sequencing, we profiled the transcriptome of eggs and embryos, immature (≤3 days of age) and mature microfilariae (MF), third- and fourth-stage larvae (L3 and L4), and adult male and female worms. Comparative analysis across these stages provided a detailed overview of the molecular repertoires that define and differentiate distinct lifecycle stages of the parasite. Genome-wide assessment of the overall transcriptional variability indicated that the cuticle collagen family and those implicated in molting exhibit noticeably dynamic stage-dependent patterns. Of particular interest was the identification of genes displaying sex-biased or germline-enriched profiles due to their potential involvement in reproductive processes. The study also revealed discrete transcriptional changes during larval development, namely those accompanying the maturation of MF and the L3 to L4 transition that are vital in establishing successful infection in mosquito vectors and vertebrate hosts, respectively. Characterization of the transcriptional program of the parasite's lifecycle is an important step toward understanding the developmental processes required for the infectious cycle. We find that the transcriptional program has a number of stage-specific pathways activated during worm development. In addition to advancing our understanding of transcriptome dynamics, these data will aid in the study of genome structure and organization by facilitating the identification of novel transcribed elements and splice variants.

  4. Characterization of small interfering RNAs derived from Sugarcane mosaic virus in infected maize plants by deep sequencing.

    Science.gov (United States)

    Xia, Zihao; Peng, Jun; Li, Yongqiang; Chen, Ling; Li, Shuai; Zhou, Tao; Fan, Zaifeng

    2014-01-01

    RNA silencing is a conserved surveillance mechanism against viruses in plants. It is mediated by Dicer-like (DCL) proteins producing small interfering RNAs (siRNAs), which guide specific Argonaute (AGO)-containing complexes to inactivate viral genomes and may promote the silencing of host mRNAs. In this study, we obtained the profile of virus-derived siRNAs (vsiRNAs) from Sugarcane mosaic virus (SCMV) in infected maize (Zea mays L.) plants by deep sequencing. Our data showed that vsiRNAs which derived almost equally from sense and antisense SCMV RNA strands accumulated preferentially as 21- and 22-nucleotide (nt) species and had an adenosine bias at the 5'-terminus. The single-nucleotide resolution maps revealed that vsiRNAs were almost continuously but heterogeneously distributed throughout the SCMV genome and the hotspots of sense and antisense strands were mainly distributed in the HC-Pro coding region. Moreover, dozens of host transcripts targeted by vsiRNAs were predicted, several of which encode putative proteins involved in ribosome biogenesis and in biotic and abiotic stresses. We also found that ZmDCL2 mRNAs were up-regulated in SCMV-infected maize plants, which may be the cause of abundant 22-nt vsiRNAs production. However, ZmDCL4 mRNAs were down-regulated slightly regardless of the most abundant 21-nt vsiRNAs. Our results also showed that SCMV infection induced the accumulation of AGO2 mRNAs, which may indicate a role for AGO2 in antiviral defense. To our knowledge, this is the first report on vsiRNAs in maize plants.

  5. Characterization of small interfering RNAs derived from Sugarcane mosaic virus in infected maize plants by deep sequencing.

    Directory of Open Access Journals (Sweden)

    Zihao Xia

    RNA silencing is a conserved surveillance mechanism against viruses in plants. It is mediated by Dicer-like (DCL) proteins producing small interfering RNAs (siRNAs), which guide specific Argonaute (AGO)-containing complexes to inactivate viral genomes and may promote the silencing of host mRNAs. In this study, we obtained the profile of virus-derived siRNAs (vsiRNAs) from Sugarcane mosaic virus (SCMV) in infected maize (Zea mays L.) plants by deep sequencing. Our data showed that vsiRNAs which derived almost equally from sense and antisense SCMV RNA strands accumulated preferentially as 21- and 22-nucleotide (nt) species and had an adenosine bias at the 5'-terminus. The single-nucleotide resolution maps revealed that vsiRNAs were almost continuously but heterogeneously distributed throughout the SCMV genome and the hotspots of sense and antisense strands were mainly distributed in the HC-Pro coding region. Moreover, dozens of host transcripts targeted by vsiRNAs were predicted, several of which encode putative proteins involved in ribosome biogenesis and in biotic and abiotic stresses. We also found that ZmDCL2 mRNAs were up-regulated in SCMV-infected maize plants, which may be the cause of abundant 22-nt vsiRNAs production. However, ZmDCL4 mRNAs were down-regulated slightly regardless of the most abundant 21-nt vsiRNAs. Our results also showed that SCMV infection induced the accumulation of AGO2 mRNAs, which may indicate a role for AGO2 in antiviral defense. To our knowledge, this is the first report on vsiRNAs in maize plants.

  6. Characterization of small interfering RNAs derived from Rice black streaked dwarf virus in infected maize plants by deep sequencing.

    Science.gov (United States)

    Li, Mingjun; Li, Yongqiang; Xia, Zihao; Di, Dianping; Zhang, Aihong; Miao, Hongqin; Zhou, Tao; Fan, Zaifeng

    2017-01-15

    Rice black streaked dwarf virus (RBSDV) is the causal agent of maize rough dwarf disease, which frequently causes severe yield loss in China. However, the interaction between RBSDV and maize plants is largely unknown. RNA silencing is a conserved mechanism against viruses in plants. To understand the antiviral RNA interference response in RBSDV-infected plants, the profile of virus-derived small interfering RNAs (vsiRNAs) from RBSDV in infected maize plants was obtained by deep sequencing in this study. Our data showed that vsiRNAs, which accumulated preferentially as 21- and 22-nucleotide (nt) species, mapped to all 10 genomic RNA segments of RBSDV and derived almost equally overall from both positive and negative strands, while there were significant differences in the accumulation level of vsiRNAs from segments 2, 4, 6, 7 and 10. The vsiRNAs (21 and 22 nt) generated from each segment of the RBSDV genome had a 5'-terminal nucleotide bias toward adenine and uracil. The single-nucleotide resolution maps showed that RBSDV-derived siRNAs were preferentially distributed in the 5'- or 3'-terminal regions of several genomic segments. In addition, our results showed that the mRNA levels of some components involved in the antiviral RNA silencing pathway were differentially modified during RBSDV infection. Among them, the accumulation levels of ZmDCL1, ZmDCL2, ZmDCL3a, ZmAGO1a, ZmAGO1b, ZmAGO2a, ZmAGO18a and ZmRDR6 mRNAs were significantly up-regulated, while those of ZmDCL3b, ZmDCL4 and ZmAGO1c mRNAs showed no obvious changes in RBSDV-infected maize plants. Copyright © 2016 Elsevier B.V. All rights reserved.

  7. Deep Sequencing of Suppression Subtractive Hybridisation Drought and Recovery Libraries of the Non-model Crop Trifolium repens L.

    Science.gov (United States)

    Bisaga, Maciej; Lowe, Matthew; Hegarty, Matthew; Abberton, Michael; Ravagnani, Adriana

    2017-01-01

    White clover is a short-lived perennial whose persistence is greatly affected by abiotic stresses, particularly drought. The aim of this work was to characterize its molecular response to water deficit and recovery following re-hydration to identify targets for the breeding of tolerant varieties. We created a white clover reference transcriptome of 16,193 contigs by deep sequencing (mean base coverage 387x) four Suppression Subtractive Hybridization (SSH) libraries (a forward and a reverse library for each treatment) constructed from young leaf tissue of white clover at the onset of the response to drought and recovery. Reads from individual libraries were then mapped to the reference transcriptome and processed comparing expression level data. The pipeline generated four robust sets of transcripts induced and repressed in the leaves of plants subjected to water deficit stress (6,937 and 3,142, respectively) and following re-hydration (6,695 and 4,897, respectively). Semi-quantitative polymerase chain reaction was used to verify the expression pattern of 16 genes. The differentially expressed transcripts were functionally annotated and mapped to biological processes and pathways. In agreement with similar studies in other crops, the majority of transcripts up-regulated in response to drought belonged to metabolic processes, such as amino acid, carbohydrate, and lipid metabolism, while transcripts involved in photosynthesis, such as components of the photosystem and the biosynthesis of photosynthetic pigments, were up-regulated during recovery. The data also highlighted the role of raffinose family oligosaccharides (RFOs) and the possible delayed response of the flavonoid pathways in the initial response of white clover to water withdrawal. The work presented in this paper is to our knowledge the first large scale molecular analysis of the white clover response to drought stress and re-hydration. The data generated provide a valuable genomic resource for marker

  8. MicroRNA deep sequencing reveals chamber-specific miR-208 family expression patterns in the human heart.

    Science.gov (United States)

    Kakimoto, Yu; Tanaka, Masayuki; Kamiguchi, Hiroshi; Hayashi, Hideki; Ochiai, Eriko; Osawa, Motoki

    2016-05-15

    Heart chamber-specific mRNA expression patterns have been extensively studied, and dynamic changes have been reported in many cardiovascular diseases. MicroRNAs (miRNAs) are also important regulators of normal cardiac development and functions that generally suppress gene expression at the posttranscriptional level. Recent focus has been placed on circulating miRNAs as potential biomarkers for cardiac disorders. However, miRNA expression levels in human normal hearts have not been thoroughly studied, and chamber-specific miRNA expression signatures in particular remain unclear. We performed miRNA deep sequencing on human paired left atria (LA) and ventricles (LV) under normal physiologic conditions. Among 438 miRNAs, miR-1 was the most abundant in both chambers, representing 21% of the miRNAs in LA and 26% in LV. A total of 25 miRNAs were differentially expressed between LA and LV; 14 were upregulated in LA, and 11 were highly expressed in LV. Notably, the miR-208 family in particular showed prominent chamber specificity; miR-208a-3p and miR-208a-5p were abundant in LA, whereas miR-208b-3p and miR-208b-5p were preferentially expressed in LV. Subsequent real-time polymerase chain reaction analysis validated the predominant expression of miR-208a in LA and miR-208b in LV. Human atrial and ventricular tissues display characteristic miRNA expression signatures under physiological conditions. Notably, miR-208a and miR-208b show significant chamber-specificity as do their host genes, α-MHC and β-MHC, which are mainly expressed in the atria and ventricles, respectively. These findings might also serve to enhance our understanding of cardiac miRNAs and various heart diseases. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.

  9. RNA deep sequencing reveals novel candidate genes and polymorphisms in boar testis and liver tissues with divergent androstenone levels.

    Directory of Open Access Journals (Sweden)

    Asep Gunawan

    Boar taint is an unpleasant smell and taste of pork meat derived from some entire male pigs. The main causes of boar taint are the two compounds androstenone (5α-androst-16-en-3-one) and skatole (3-methylindole). It is crucial to understand the genetic mechanism of boar taint to select pigs for lower androstenone levels and thus reduce boar taint. The aim of the present study was to investigate transcriptome differences in boar testis and liver tissues with divergent androstenone levels using RNA deep sequencing (RNA-Seq). The total number of reads produced for each testis and liver sample ranged from 13,221,550 to 33,206,723 and 12,755,487 to 46,050,468, respectively. In testis samples 46 genes were differentially regulated whereas 25 genes showed differential expression in the liver. The fold change values ranged from -4.68 to 2.90 in testis samples and -2.86 to 3.89 in liver samples. Differentially regulated genes in high androstenone testis and liver samples were enriched in metabolic processes such as lipid metabolism, small molecule biochemistry and molecular transport. This study provides evidence for the transcriptome profile and gene polymorphisms of boars with divergent androstenone levels using RNA-Seq technology. Digital gene expression analysis identified candidate genes in the flavin monooxygenase family, cytochrome P450 family and hydroxysteroid dehydrogenase family. Moreover, polymorphism and association analysis revealed that mutations in the IRG6, MX1, IFIT2, CYP7A1, FMO5 and KRT18 genes could be potential candidate markers for androstenone levels in boars. Further studies are required to prove the role of the candidate genes before they are used in genomic selection against boar taint in pig breeding programs.

  10. Sensitive Deep-Sequencing-Based HIV-1 Genotyping Assay To Simultaneously Determine Susceptibility to Protease, Reverse Transcriptase, Integrase, and Maturation Inhibitors, as Well as HIV-1 Coreceptor Tropism

    Science.gov (United States)

    Gibson, Richard M.; Meyer, Ashley M.; Winner, Dane; Archer, John; Feyertag, Felix; Ruiz-Mateos, Ezequiel; Leal, Manuel; Robertson, David L.; Schmotzer, Christine L.

    2014-01-01

    With 29 individual antiretroviral drugs available from six classes that are approved for the treatment of HIV-1 infection, a combination of different phenotypic and genotypic tests is currently needed to monitor HIV-infected individuals. In this study, we developed a novel HIV-1 genotypic assay based on deep sequencing (DeepGen HIV) to simultaneously assess HIV-1 susceptibilities to all drugs targeting the three viral enzymes and to predict HIV-1 coreceptor tropism. Patient-derived gag-p2/NCp7/p1/p6/pol-PR/RT/IN- and env-C2V3 PCR products were sequenced using the Ion Torrent Personal Genome Machine. Reads spanning the 3′ end of the Gag, protease (PR), reverse transcriptase (RT), integrase (IN), and V3 regions were extracted, truncated, translated, and assembled for genotype and HIV-1 coreceptor tropism determination. DeepGen HIV consistently detected both minority drug-resistant viruses and non-R5 HIV-1 variants from clinical specimens with viral loads of ≥1,000 copies/ml and from B and non-B subtypes. Additional mutations associated with resistance to PR, RT, and IN inhibitors, previously undetected by standard (Sanger) population sequencing, were reliably identified at frequencies as low as 1%. DeepGen HIV results correlated with phenotypic (original Trofile, 92%; enhanced-sensitivity Trofile assay [ESTA], 80%; TROCAI, 81%; and VeriTrop, 80%) and genotypic (population sequencing/Geno2Pheno with a 10% false-positive rate [FPR], 84%) HIV-1 tropism test results. DeepGen HIV (83%) and Trofile (85%) showed similar concordances with the clinical response following an 8-day course of maraviroc monotherapy (MCT). In summary, this novel all-inclusive HIV-1 genotypic and coreceptor tropism assay, based on deep sequencing of the PR, RT, IN, and V3 regions, permits simultaneous multiplex detection of low-level drug-resistant and/or non-R5 viruses in up to 96 clinical samples. This comprehensive test, the first of its class, will be instrumental in the development of new

  11. Deep sequencing reveals different compositions of mRNA transcribed from the F8 gene in a panel of FVIII-producing CHO cell lines

    DEFF Research Database (Denmark)

    Kaas, Christian Schrøder; Bolt, Gert; Hansen, Jens J

    2015-01-01

    productivities was selected for RNA sequencing analysis. The analysis showed distinct differences in F8 RNA composition between the clones. The exogenous F8-dhfr transcript was found to make up the most abundant transcript in the present clones. No correlation was seen between F8 mRNA levels and the measured...... FVIII productivity. It was found that three MTX resistant, nonproducing clones had different truncations of the F8 transcripts. We find that by using deep sequencing, in contrast to microarray technology, for determining the transcriptome from CHO transfectants, we are able to accurately deduce...

  12. [cDNA libraries construction and screening in gene expression profiling of disease resistance in wheat].

    Science.gov (United States)

    Luo, Meng; Kong, Xiu-Ying; Liu, Yue; Zhou, Rong-Hua; Jia, Ji-Zeng

    2002-09-01

    A wheat line, Bai Nong 3217/Mardler BC5F4, with resistance to powdery mildew was used to construct a conventional cDNA library and a suppression subtractive hybridization (SSH) cDNA library from wheat leaves inoculated with Erysiphe graminis DC. Three hundred and eighty-seven non-redundant ESTs from the conventional cDNA library and 760 ESTs from the SSH cDNA library were obtained, and EST similarity analysis was conducted by comparing these ESTs with sequences in GenBank using BLASTn and BLASTx. The results showed that the redundancy of some kinds of genes, such as photosynthesis-related and ribosome-related genes, was higher in the conventional cDNA library, whereas the variety and quantity of disease resistance genes were lower than in the SSH cDNA library. The SSH cDNA library was found to have obvious advantages in gene expression profiling of disease resistance, such as a simple library construction procedure, abundant specific DRR (disease-resistance-related) genes and a reduced sequencing load. To acquire genes involved in the powdery mildew resistance of wheat, hybridization with high-density dot membranes was used to screen the two libraries. The results showed that the method was relatively simple to perform and that the membranes could be reused many times. However, this screening method also had drawbacks: a large amount of mRNA and radioactive isotope was needed, and the hybridization procedure had to be repeated several times to obtain stable results. About 54.1% of the ESTs of known function in the SSH cDNA library were identified as DRR genes by screening. There were 247 clones of the SSH cDNA library that gave a positive signal in the repeated hybridizations with the pathogen-uninfected probe. The identified DRR genes were distributed across the whole process of powdery mildew resistance but were mainly concentrated in systemic acquired resistance (SAR).

  13. Comparative clinical sample preparation of DNA and RNA viral nucleic acids for a commercial deep sequencing system (Illumina MiSeq(®)).

    Science.gov (United States)

    Ullmann, Leila Sabrina; de Camargo Tozato, Claudia; Malossi, Camila Dantas; da Cruz, Tais Fukuta; Cavalcante, Raíssa Vasconcelos; Kurissio, Jacqueline Kazue; Cagnini, Didier Quevedo; Rodrigues, Marianna Vaz; Biondo, Alexander Welker; Araujo, João Pessoa

    2015-08-01

    Sequence-independent methods for viral discovery have been widely used for whole genome sequencing of viruses. Different protocols for viral enrichment, library preparation and sequencing have become increasingly available and at lower cost. However, no study to date has focused on optimization of viral sample preparation for commercial deep sequencing. Accordingly, the aim of the present study was to evaluate an in-house enzymatic protocol for double-stranded DNA (dsDNA) synthesis and also to compare the use of a commercially available kit protocol (Nextera XT, Illumina Inc, San Diego, CA, USA) and its combination with a library quantitation kit (Kapa, Kapa Biosystems, Wilmington, MA, USA) for deep sequencing (Illumina MiSeq). Two RNA viruses (canine distemper virus and dengue virus) and one ssDNA virus (porcine circovirus type 2) were tested with the optimized protocols. The tested method for dsDNA synthesis showed satisfactory results and may be used in a laboratory setting, particularly when enzymes are already available. Library preparation combining commercial kits (Nextera XT and Kapa) yielded more reads and genome coverage, probably due to a lack of small fragment recovery at the normalization step of Nextera XT. In addition, libraries may be diluted or concentrated to increase genome coverage with Kapa quantitation. Copyright © 2015 Elsevier B.V. All rights reserved.

  14. Construction of cDNA library of Pyrocystis lunula (Pyrophyta)

    Science.gov (United States)

    Sui, Zhenghong; Kowallik, Klaus V.

    2004-10-01

    A complementary DNA library of the dinoflagellate Pyrocystis lunula was constructed for expressed sequence tag analysis. The RNA isolated from this alga was about 20 µg g⁻¹ net cells, and the 28S/18S band intensity ratio in the electrophoresis pattern was nearly 1 to 1. Different cDNA/vector molar ratios were tested to optimize the ligation reaction. The clones produced at a cDNA/vector molar ratio of 3.75 to 1 were desirable, with most inserts longer than 300 bp. Insert lengths in the unfractionated cDNA library were mostly shorter than 500 bp. However, in the fractionated library made from the high-molecular-weight cDNA fraction, over seventy percent of the recombinants contained inserts longer than 1 kb, some of which were even longer than 3 kb. Operating concerns are discussed at the end.

  15. Deep sequencing analysis of the heterogeneity of seed and commercial lots of the bacillus Calmette-Guérin (BCG) tuberculosis vaccine substrain Tokyo-172

    Science.gov (United States)

    Wada, Takayuki; Maruyama, Fumito; Iwamoto, Tomotada; Maeda, Shinji; Yamamoto, Taro; Nakagawa, Ichiro; Yamamoto, Saburo; Ohara, Naoya

    2015-01-01

    BCG, the only vaccine available to prevent tuberculosis, was established in the early 20th century by prolonged passaging of a virulent clinical strain of Mycobacterium bovis. BCG Tokyo-172, originally distributed within Japan in 1924, is one of the currently used reference substrains for the vaccine. Recently, this substrain was reported to contain two spontaneously arising, heterogeneous subpopulations (Types I and II). The proportions of the subpopulations changed over time in both distributed seed lots and commercial lots. To maintain the homogeneity of live vaccines, such variations and subpopulational mutations in lots should be restrained and monitored. We incorporated deep sequencing techniques to validate such heterogeneity in lots of the BCG Tokyo-172 substrain without cloning. By bioinformatics analysis, we not only detected the two subpopulations but also detected two intrinsic variations within these populations. The intrinsic variants could be isolated from respective lots as colonies cultured on plate media, suggesting that analyses incorporating deep sequencing techniques are powerful, valid tools to detect mutations in live bacterial vaccine lots. Our data showed that spontaneous mutations in BCG vaccines could be easily monitored by deep sequencing without direct isolation of variants, revealing the complex heterogeneity of BCG Tokyo-172 and its daughter lots currently in use. PMID:26635118

  16. An efficient strategy of screening for pathogens in wild-caught ticks and mosquitoes by reusing small RNA deep sequencing data.

    Directory of Open Access Journals (Sweden)

    Lu Zhuang

    This paper explored our hypothesis that the sRNA (18-30 bp) deep sequencing technique can be used as an efficient strategy to identify microorganisms other than viruses, such as prokaryotic and eukaryotic pathogens. In the study, the clean reads derived from the sRNA deep sequencing data of wild-caught ticks and mosquitoes were compared against the NCBI nucleotide collection (non-redundant nt) database using Blastn. The blast results were then analyzed with in-house Python scripts. An empirical formula was proposed to identify the putative pathogens. Results showed that not only viruses but also prokaryotic and eukaryotic species of interest can be screened out, and these were subsequently confirmed experimentally. In particular, a novel Rickettsia spp. was indicated to exist in Haemaphysalis longicornis ticks collected in Beijing. Our study demonstrated that the reuse of sRNA deep sequencing data has the potential to trace the origin of pathogens or discover novel agents of emerging/re-emerging infectious diseases.
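
    The abstract mentions in-house Python scripts for analyzing the Blastn results but does not describe them, and the empirical formula is not given. As a rough illustration only, the sketch below shows one common way such a tally could be done: it reads BLAST+ tabular output (the default 12-column layout of -outfmt 6), keeps the highest-bitscore hit per read, and counts reads per subject sequence. The function names, the identity and length thresholds, and the toy read and subject identifiers are all hypothetical and are not taken from the study.

```python
import csv
import io
from collections import Counter

# Default column order of BLAST+ tabular output (-outfmt 6).
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def best_hits(handle, min_identity=95.0, min_length=18):
    """Return {read id: subject id}, keeping the highest-bitscore hit per read.

    min_identity and min_length are hypothetical filters, not the authors' values.
    """
    best = {}
    for row in csv.DictReader(handle, fieldnames=COLUMNS, delimiter="\t"):
        if float(row["pident"]) < min_identity or int(row["length"]) < min_length:
            continue
        score = float(row["bitscore"])
        if row["qseqid"] not in best or score > best[row["qseqid"]][1]:
            best[row["qseqid"]] = (row["sseqid"], score)
    return {read: subject for read, (subject, _) in best.items()}


def reads_per_subject(handle, **filters):
    """Count how many reads have each subject sequence as their best hit."""
    return Counter(best_hits(handle, **filters).values())


# Toy input with made-up read and subject identifiers.
example = io.StringIO(
    "read1\tsubjA\t100.0\t22\t0\t0\t1\t22\t101\t122\t1e-05\t44.1\n"
    "read1\tsubjB\t95.5\t22\t1\t0\t1\t22\t7\t28\t2e-03\t36.2\n"
    "read2\tsubjA\t100.0\t20\t0\t0\t1\t20\t300\t319\t1e-04\t40.1\n"
    "read3\tsubjC\t90.0\t20\t2\t0\t1\t20\t5\t24\t5e-01\t28.3\n"
)
print(reads_per_subject(example).most_common())
```

    In practice the subject identifiers would then be mapped to taxa and ranked; how the study weighted those counts is not stated in the abstract.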

  17. deepBase v2.0: identification, expression, evolution and function of small RNAs, LncRNAs and circular RNAs from deep-sequencing data.

    Science.gov (United States)

    Zheng, Ling-Ling; Li, Jun-Hao; Wu, Jie; Sun, Wen-Ju; Liu, Shun; Wang, Ze-Lin; Zhou, Hui; Yang, Jian-Hua; Qu, Liang-Hu

    2016-01-04

    Small non-coding RNAs (e.g. miRNAs) and long non-coding RNAs (e.g. lincRNAs and circRNAs) are emerging as key regulators of various cellular processes. However, only a very small fraction of these enigmatic RNAs have been well functionally characterized. In this study, we describe deepBase v2.0 (http://biocenter.sysu.edu.cn/deepBase/), an updated platform, to decode evolution, expression patterns and functions of diverse ncRNAs across 19 species. deepBase v2.0 has been updated to provide the most comprehensive collection of ncRNA-derived small RNAs generated from 588 sRNA-Seq datasets. Moreover, we developed a pipeline named lncSeeker to identify 176 680 high-confidence lncRNAs from 14 species. Temporal and spatial expression patterns of various ncRNAs were profiled. We identified approximately 24 280 primate-specific, 5193 rodent-specific lncRNAs, and 55 highly conserved lncRNA orthologs between human and zebrafish. We annotated 14 867 human circRNAs, 1260 of which are orthologous to mouse circRNAs. By combining expression profiles and functional genomic annotations, we developed lncFunction web-server to predict the function of lncRNAs based on protein-lncRNA co-expression networks. This study is expected to provide considerable resources to facilitate future experimental studies and to uncover ncRNA functions. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.

  18. Full genome virus detection in fecal samples using sensitive nucleic acid preparation, deep sequencing, and a novel iterative sequence classification algorithm

    NARCIS (Netherlands)

    Cotten, Matthew; Oude Munnink, Bas; Canuti, Marta; Deijs, Martin; Watson, Simon J.; Kellam, Paul; van der Hoek, Lia

    2014-01-01

    We have developed a full genome virus detection process that combines sensitive nucleic acid preparation optimised for virus identification in fecal material with Illumina MiSeq sequencing and a novel post-sequencing virus identification algorithm. Enriched viral nucleic acid was converted to

  19. Molecular indexing enables quantitative targeted RNA sequencing and reveals poor efficiencies in standard library preparations.

    Science.gov (United States)

    Fu, Glenn K; Xu, Weihong; Wilhelmy, Julie; Mindrinos, Michael N; Davis, Ronald W; Xiao, Wenzhong; Fodor, Stephen P A

    2014-02-04

    We present a simple molecular indexing method for quantitative targeted RNA sequencing, in which mRNAs of interest are selectively captured from complex cDNA libraries and sequenced to determine their absolute concentrations. cDNA fragments are individually labeled so that each molecule can be tracked from the original sample through the library preparation and sequencing process. Multiple copies of cDNA fragments of identical sequence become distinct through labeling, and replicate clones created during PCR amplification steps can be identified and assigned to their distinct parent molecules. Selective capture enables efficient use of sequencing for deep sampling and for the absolute quantitation of rare or transient transcripts that would otherwise escape detection by standard sequencing methods. We have also constructed a set of synthetic barcoded RNA molecules, which can be introduced as controls into the sample preparation mix and used to monitor the efficiency of library construction. The quantitative targeted sequencing revealed extremely low efficiency in standard library preparations, which was further confirmed by using synthetic barcoded RNA molecules. This finding shows that standard library preparation methods result in the loss of rare transcripts and highlights the need for monitoring library efficiency and for developing more efficient sample preparation methods.
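
    The counting idea behind molecular indexing can be sketched as follows. This is an illustration under simplifying assumptions, not the authors' pipeline: each read is represented as a hypothetical (target, label) pair, labels are taken as error-free, and two molecules of the same target are assumed never to receive the same label. PCR replicates then collapse because they share both target and label.

```python
from collections import defaultdict

def count_molecules(reads):
    """Collapse PCR replicates by molecular label.

    reads: iterable of (target, label) tuples, one per sequenced read.
    Returns {target: (distinct_labels, total_reads)}; the number of distinct
    labels estimates the number of original molecules under the assumptions
    stated above.
    """
    labels = defaultdict(set)
    n_reads = defaultdict(int)
    for target, label in reads:
        labels[target].add(label)
        n_reads[target] += 1
    return {t: (len(labels[t]), n_reads[t]) for t in labels}
```

    Comparing the distinct-label count with the raw read count for a target gives a rough view of how much of the sequencing depth went to amplification duplicates.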

  20. Construction of Infectious cDNA Clone of a Chrysanthemum stunt viroid Korean Isolate

    Science.gov (United States)

    Yoon, Ju-Yeon; Cho, In-Sook; Choi, Gug-Seoun; Choi, Seung-Kook

    2014-01-01

    Chrysanthemum stunt viroid (CSVd), a noncoding infectious RNA molecule, causes serious economic losses of chrysanthemum for 3 or 4 years after its first infection. Monomeric cDNA clones of CSVd isolate SK1 (CSVd-SK1) were constructed in the plasmids pGEM-T easy vector and pUC19 vector. Linear positive-sense transcripts synthesized in vitro from the full-length monomeric cDNA clones of CSVd-SK1 could systemically infect tomato seedlings and chrysanthemum plants, suggesting that the linear CSVd RNA transcribed from the cDNA clones could be replicated as efficiently as circular CSVd in host species. However, direct inoculation of plasmid cDNA clones containing full-length monomeric cDNA of CSVd-SK1 failed to infect tomato and chrysanthemum, and linear negative-sense transcripts from the plasmid DNAs were not infectious in the two plant species. The cDNA sequences of progeny viroid in systemically infected tomato and chrysanthemum showed a few substitutions at a specific nucleotide position, but there were no deletions or insertions in the sequences of the CSVd progeny from tomato and chrysanthemum plants. PMID:25288987

  1. Complete genome sequence of the aerobic, heterotroph Marinithermus hydrothermalis type strain (T1T) from a deep-sea hydrothermal vent chimney

    Energy Technology Data Exchange (ETDEWEB)

    Copeland, A [U.S. Department of Energy, Joint Genome Institute]; Gu, Wei [U.S. Department of Energy, Joint Genome Institute]; Yasawong, Montri [HZI - Helmholtz Centre for Infection Research, Braunschweig, Germany]; Lapidus, Alla L. [U.S. Department of Energy, Joint Genome Institute]; Lucas, Susan [U.S. Department of Energy, Joint Genome Institute]; Deshpande, Shweta [U.S. Department of Energy, Joint Genome Institute]; Pagani, Ioanna [U.S. Department of Energy, Joint Genome Institute]; Tapia, Roxanne [Los Alamos National Laboratory (LANL)]; Cheng, Jan-Fang [U.S. Department of Energy, Joint Genome Institute]; Goodwin, Lynne A. [Los Alamos National Laboratory (LANL)]; Pitluck, Sam [U.S. Department of Energy, Joint Genome Institute]; Liolios, Konstantinos [U.S. Department of Energy, Joint Genome Institute]; Ivanova, N [U.S. Department of Energy, Joint Genome Institute]; Mavromatis, K [U.S. Department of Energy, Joint Genome Institute]; Mikhailova, Natalia [U.S. Department of Energy, Joint Genome Institute]; Pati, Amrita [U.S. Department of Energy, Joint Genome Institute]; Chen, Amy [U.S. Department of Energy, Joint Genome Institute]; Palaniappan, Krishna [U.S. Department of Energy, Joint Genome Institute]; Land, Miriam L [ORNL]; Pan, Chongle [ORNL]; Brambilla, Evelyne-Marie [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany]; Rohde, Manfred [HZI - Helmholtz Centre for Infection Research, Braunschweig, Germany]; Tindall, Brian [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany]; Sikorski, Johannes [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany]; Goker, Markus [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany]; Detter, J. Chris [U.S. Department of Energy, Joint Genome Institute]; Bristow, James [U.S. Department of Energy, Joint Genome Institute]; Eisen, Jonathan [U.S. Department of Energy, Joint Genome Institute]; Markowitz, Victor [U.S. Department of Energy, Joint Genome Institute]; Hugenholtz, Philip [U.S. Department of Energy, Joint Genome Institute]; Kyrpides, Nikos C [U.S. Department of Energy, Joint Genome Institute]; Klenk, Hans-Peter [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany]; Woyke, Tanja [U.S. Department of Energy, Joint Genome Institute]

    2012-01-01

    Marinithermus hydrothermalis Sako et al. 2003 is the type species of the monotypic genus Marinithermus. M. hydrothermalis T1(T) was the first isolate within the phylum Thermus-Deinococcus to exhibit optimal growth under a salinity equivalent to that of sea water and to have an absolute requirement for NaCl for growth. M. hydrothermalis T1(T) is of interest because it may provide a new insight into the ecological significance of the aerobic, thermophilic decomposers in the circulation of organic compounds in deep-sea hydrothermal vent ecosystems. This is the first completed genome sequence of a member of the genus Marinithermus and the seventh sequence from the family Thermaceae. Here we describe the features of this organism, together with the complete genome sequence and annotation. The 2,269,167 bp long genome with its 2,251 protein-coding and 59 RNA genes is a part of the Genomic Encyclopedia of Bacteria and Archaea project.

  2. Complete genome sequence of the aerobic, heterotroph Marinithermus hydrothermalis type strain (T1(T)) from a deep-sea hydrothermal vent chimney.

    Science.gov (United States)

    Copeland, Alex; Gu, Wei; Yasawong, Montri; Lapidus, Alla; Lucas, Susan; Deshpande, Shweta; Pagani, Ioanna; Tapia, Roxanne; Cheng, Jan-Fang; Goodwin, Lynne A; Pitluck, Sam; Liolios, Konstantinos; Ivanova, Natalia; Mavromatis, Konstantinos; Mikhailova, Natalia; Pati, Amrita; Chen, Amy; Palaniappan, Krishna; Land, Miriam; Pan, Chongle; Brambilla, Evelyne-Marie; Rohde, Manfred; Tindall, Brian J; Sikorski, Johannes; Göker, Markus; Detter, John C; Bristow, James; Eisen, Jonathan A; Markowitz, Victor; Hugenholtz, Philip; Kyrpides, Nikos C; Klenk, Hans-Peter; Woyke, Tanja

    2012-03-19

    Marinithermus hydrothermalis Sako et al. 2003 is the type species of the monotypic genus Marinithermus. M. hydrothermalis T1(T) was the first isolate within the phylum "Thermus-Deinococcus" to exhibit optimal growth under a salinity equivalent to that of sea water and to have an absolute requirement for NaCl for growth. M. hydrothermalis T1(T) is of interest because it may provide a new insight into the ecological significance of the aerobic, thermophilic decomposers in the circulation of organic compounds in deep-sea hydrothermal vent ecosystems. This is the first completed genome sequence of a member of the genus Marinithermus and the seventh sequence from the family Thermaceae. Here we describe the features of this organism, together with the complete genome sequence and annotation. The 2,269,167 bp long genome with its 2,251 protein-coding and 59 RNA genes is a part of the Genomic Encyclopedia of Bacteria and Archaea project.

  3. Construction and analysis of full-length and normalized cDNA libraries from citrus.

    Science.gov (United States)

    Marques, M Carmen; Perez-Amador, Miguel A

    2012-01-01

    We have developed an integrated method to generate a normalized cDNA collection enriched in full-length and rare transcripts from citrus, using different species and multiple tissues and developmental stages. Interpretation of ever-increasing raw sequence information generated by modern genome sequencing technologies faces multiple challenges, such as gene function analysis and genome annotation. In this regard, the availability of full-length cDNA clones facilitates functional analysis of the corresponding genes enabling manipulation of their expression and the generation of a variety of tagged versions of the native protein. The development of full-length cDNA sequences has the power to improve the quality of genome annotation, as well as provide tools for functional characterization of genes.

  4. [Construction of subtractive cDNA libraries of the sporogony stage of Eimeria tenella by suppression subtractive hybridization].

    Science.gov (United States)

    Han, Hong-Yu; Lin, Jiao-Jiao; Zhao, Qi-Ping; Dong, Hui; Jiang, Lian-Lian; Wang, Xin; Han, Jing-Fang; Huang, Bing

    2007-11-01

    To clone and identify differentially expressed genes in the sporogony stage of Eimeria tenella, three subtractive cDNA libraries were constructed by suppression subtractive hybridization (SSH). Two subtractive cDNA libraries of sporozoites were constructed using cDNAs from sporozoites as tester and cDNAs from unsporulated oocysts and sporulated oocysts, respectively, as driver. One subtractive cDNA library of sporulated oocysts was constructed using cDNAs from sporulated oocysts as tester and cDNAs from unsporulated oocysts as driver. PCR amplification revealed that the two subtractive cDNA libraries of sporozoites and the one subtractive cDNA library of sporulated oocysts contained approximately 96%, 96% and 98% recombinant clones, respectively. Fifty positive clones from each of the three subtractive cDNA libraries were sequenced and analyzed against GenBank by BLAST search. Thirteen unique sequences were found in the subtractive cDNA library of sporulated oocysts, of which eight ESTs shared significant identity with previously described sequences. A total of forty unique sequences were obtained from the two subtractive cDNA libraries of sporozoites, of which nine ESTs shared significant identity with previously described sequences; the other sequences represent novel genes of E. tenella with no significant homology to the proteins in GenBank. These results provide a foundation for cloning new genes of E. tenella and for further study of new approaches to control coccidiosis.

  5. Transcriptional Slippage and RNA Editing Increase the Diversity of Transcripts in Chloroplasts: Insight from Deep Sequencing of Vigna radiata Genome and Transcriptome.

    Directory of Open Access Journals (Sweden)

    Ching-Ping Lin

    Full Text Available We performed deep sequencing of the nuclear and organellar genomes of three mungbean genotypes: Vigna radiata ssp. sublobata TC1966, V. radiata var. radiata NM92 and the recombinant inbred line RIL59 derived from a cross between TC1966 and NM92. Moreover, we performed deep sequencing of the RIL59 transcriptome to investigate transcript variability. The mungbean chloroplast genome has a quadripartite structure including a pair of inverted repeats separated by two single copy regions. A total of 213 simple sequence repeats were identified in the chloroplast genomes of NM92 and RIL59; 78 single nucleotide variants and nine indels were discovered in comparing the chloroplast genomes of TC1966 and NM92. Analysis of the mungbean chloroplast transcriptome revealed mRNAs that were affected by transcriptional slippage and RNA editing. Transcriptional slippage frequency was positively correlated with the length of simple sequence repeats of the mungbean chloroplast genome (R² = 0.9911). In total, 41 C-to-U editing sites were found in 23 chloroplast genes and in one intergenic spacer. No editing site that swapped U to C was found. A combination of bioinformatics and experimental methods revealed that the plastid-encoded RNA polymerase-transcribed genes psbF and ndhA are affected by transcriptional slippage in mungbean and in main lineages of land plants, including three dicots (Glycine max, Brassica rapa, and Nicotiana tabacum), two monocots (Oryza sativa and Zea mays), two gymnosperms (Pinus taeda and Ginkgo biloba) and one moss (Physcomitrella patens). Transcript analysis of the rps2 gene showed that transcriptional slippage could affect transcripts at single sequence repeat regions with poly-A runs. It showed that transcriptional slippage together with incomplete RNA editing may cause sequence diversity of transcripts in chloroplasts of land plants.
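
    Because the abstract ties slippage to repeat length and to poly-A runs, a first step in a similar analysis is to locate such runs in a sequence. The sketch below is a generic illustration, not the authors' method: the function name and the minimum run length of 8 are hypothetical choices, and only mononucleotide runs of a single chosen base are reported.

```python
import re

def homopolymer_runs(seq, base="A", min_len=8):
    """Return (start, length) for each run of `base` at least min_len long.

    Positions are 0-based. min_len=8 is an arbitrary illustrative threshold.
    """
    pattern = re.compile(f"{re.escape(base)}{{{min_len},}}")
    return [(m.start(), m.end() - m.start()) for m in pattern.finditer(seq.upper())]
```

    The run lengths returned here are the kind of per-site value that could be compared against an observed slippage frequency at the same sites.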

  6. Arthropod Phylogenetics in Light of Three Novel Millipede (Myriapoda: Diplopoda) Mitochondrial Genomes with Comments on the Appropriateness of Mitochondrial Genome Sequence Data for Inferring Deep Level Relationships

    Science.gov (United States)

    Brewer, Michael S.; Swafford, Lynn; Spruill, Chad L.; Bond, Jason E.

    2013-01-01

    Background Arthropods are the most diverse group of eukaryotic organisms, but their phylogenetic relationships are poorly understood. Herein, we describe three mitochondrial genomes representing orders of millipedes for which complete genomes had not been characterized. Newly sequenced genomes are combined with existing data to characterize the protein coding regions of myriapods and to attempt to reconstruct the evolutionary relationships within the Myriapoda and Arthropoda. Results The newly sequenced genomes are similar to previously characterized millipede sequences in terms of synteny and length. Unique translocations occurred within the newly sequenced taxa, including one half of the Appalachioria falcifera genome, which is inverted with respect to other millipede genomes. Across myriapods, amino acid conservation levels are highly dependent on the gene region. Additionally, individual loci varied in the level of amino acid conservation. Overall, most gene regions showed low levels of conservation at many sites. Attempts to reconstruct the evolutionary relationships suffered from questionable relationships and low support values. Analyses of phylogenetic informativeness show the lack of signal deep in the trees (i.e., genes evolve too quickly). As a result, the myriapod tree resembles previously published results but lacks convincing support, and, within the arthropod tree, well established groups were recovered as polyphyletic. Conclusions The novel genome sequences described herein provide useful genomic information concerning millipede groups that had not been investigated. Taken together with existing sequences, the variety of compositions and evolution of myriapod mitochondrial genomes are shown to be more complex than previously thought. Unfortunately, the use of mitochondrial protein-coding regions in deep arthropod phylogenetics appears problematic, a result consistent with previously published studies. Lack of phylogenetic signal renders the

  7. Arthropod phylogenetics in light of three novel millipede (Myriapoda: Diplopoda) mitochondrial genomes with comments on the appropriateness of mitochondrial genome sequence data for inferring deep level relationships.

    Directory of Open Access Journals (Sweden)

    Michael S Brewer

    Full Text Available BACKGROUND: Arthropods are the most diverse group of eukaryotic organisms, but their phylogenetic relationships are poorly understood. Herein, we describe three mitochondrial genomes representing orders of millipedes for which complete genomes had not been characterized. Newly sequenced genomes are combined with existing data to characterize the protein coding regions of myriapods and to attempt to reconstruct the evolutionary relationships within the Myriapoda and Arthropoda. RESULTS: The newly sequenced genomes are similar to previously characterized millipede sequences in terms of synteny and length. Unique translocations occurred within the newly sequenced taxa, including one half of the Appalachioria falcifera genome, which is inverted with respect to other millipede genomes. Across myriapods, amino acid conservation levels are highly dependent on the gene region. Additionally, individual loci varied in the level of amino acid conservation. Overall, most gene regions showed low levels of conservation at many sites. Attempts to reconstruct the evolutionary relationships suffered from questionable relationships and low support values. Analyses of phylogenetic informativeness show the lack of signal deep in the trees (i.e., genes evolve too quickly). As a result, the myriapod tree resembles previously published results but lacks convincing support, and, within the arthropod tree, well established groups were recovered as polyphyletic. CONCLUSIONS: The novel genome sequences described herein provide useful genomic information concerning millipede groups that had not been investigated. Taken together with existing sequences, the variety of compositions and evolution of myriapod mitochondrial genomes are shown to be more complex than previously thought. Unfortunately, the use of mitochondrial protein-coding regions in deep arthropod phylogenetics appears problematic, a result consistent with previously published studies. Lack of phylogenetic

  8. Toward a cDNA map of the human genome

    Energy Technology Data Exchange (ETDEWEB)

    Korenberg, J.R.; Chen, X.N. [Cedars-Sinai Research Institute, Los Angeles, CA (United States); Adams, M.D.; Venter, J.C. [Institute for Genomic Research, Gaithersburg, MD (United States)

    1995-09-20

    Advances in the Human Genome Project are shaping the strategies for identifying the 50,000-100,000 human genes. High-resolution genetic maps of the human genome combined with sequencing herald an era of rapid regional definition of disease genes. However, only once their chromosome band location is known will the systematic partial sequencing of thousands of random cDNA clones provide the reagents for the rapid assessment of the genes responsible for the inherited disorders. We now present an approach to the rapid determination of map position and therefore to the creation of a transcribed map of the human genome. Sensitive fluorescence in situ hybridization has been combined with high-resolution chromosome banding and random cDNA sequencing to map 41 cDNAs with an average insert size of < 2 kb to single human chromosome bands. The results provide 15 new genes, with database and functional information, as candidates for human disease. These include the large extracellular signal-regulated kinase (HUMERK), the ERK activator kinase (PRKMK1), a new member of the RAS oncogene family, protein phosphatase 2 regulatory subunit B alpha isoform (PPP2R2A), and a novel human gene with very high homology to a plant membrane transport family. Further, an analysis of expressed genes associated with pseudogenes showed that by using these techniques, it is possible to detect accurately the transcribed locus within a multigene or processed pseudogene family in most cases. These findings suggest that direct cDNA mapping using fluorescence in situ hybridization provides an accurate and rapid approach to the definition of a transcribed map of the human genome. This low-cost, high-resolution (2-5 Mb) mapping greatly enhances the speed with which these genes can be subsequently assigned to contigs. This assignment provides a necessary first step in understanding the relationship of the genes to both acquired and inherited human diseases. 16 refs., 1 fig., 3 tabs.

  9. Construction of cDNA libraries: focus on protists and fungi.

    Science.gov (United States)

    Rodríguez-Ezpeleta, Naiara; Teijeiro, Shona; Forget, Lise; Burger, Gertraud; Lang, B Franz

    2009-01-01

    Sequencing of cDNA libraries is an efficient and inexpensive approach to analyze the protein-coding portion of a genome. It is frequently used for surveying the genomes of poorly studied eukaryotes, and is particularly useful for species that are not easily amenable to genome sequencing, because they are nonaxenic and/or difficult to cultivate. In this chapter, we describe protocols that have been applied successfully to construct and normalize a variety of cDNA libraries from many different species of free-living protists and fungi, and that require only small quantities of cell material.

  10. Characterization of a cDNA encoding cottonseed catalase.

    Science.gov (United States)

    Ni, W; Turley, R B; Trelease, R N

    1990-06-21

    A 1.7 kb cDNA clone was isolated from our lambda gt11 library constructed from poly(A) RNA of 24-h-old cotyledons. The cDNA encodes a full-length catalase peptide (492 amino acid residues). The calculated molecular mass is 56,800, similar to that determined for purified enzyme (57,000 SDS-PAGE). Among higher plant catalases, this cotton catalase shows the highest amino acid sequence identity (85%) to the subunit of homotetrameric maize CAT 1, a developmental counterpart to the homotetrameric CAT A isoform of cotton seeds. Comparison of sequences from cotton, sweet potato, maize CAT 1, and yeast with bovine catalase revealed that the amino acid residues and regions that are involved in catalytic activity and/or required to maintain basic catalase structure, are highly conserved. The C-terminus region, which has the lowest nucleotide sequence identity between plant and mammalian catalases, does not terminate with a tripeptide, S-K/R/H-L, a putative targeting signal for peroxisomal proteins.

  11. Cloning and expression of full-length cDNA encoding human vitamin D receptor

    Energy Technology Data Exchange (ETDEWEB)

    Baker, A.R.; McDonnell, D.P.; Hughes, M.; Crisp, T.M.; Mangelsdorf, D.J.; Haussler, M.R.; Pike, J.W.; Shine, J.; O'Malley, B.W. (California Biotechnology Inc., Mountain View (USA))

    1988-05-01

    Complementary DNA clones encoding the human vitamin D receptor have been isolated from human intestine and T47D cell cDNA libraries. The nucleotide sequence of the 4605-base pair (bp) cDNA includes a noncoding leader sequence of 115 bp, a 1281-bp open reading frame, and 3209 bp of 3′ noncoding sequence. Two polyadenylylation signals, AATAAA, are present 25 and 70 bp upstream of the poly(A) tail, respectively. RNA blot hybridization indicates a single mRNA species of ≈4600 bp. Transfection of the cloned sequences into COS-1 cells results in the production of a single receptor species indistinguishable from the native receptor. Sequence comparisons demonstrate that the vitamin D receptor belongs to the steroid-receptor gene family and is closest in size and sequence to another member of this family, the thyroid hormone receptor.

  12. Construction and analysis of full-length cDNA library of Cryptosporidium parvum.

    Science.gov (United States)

    Yamagishi, Junya; Wakaguri, Hiroyuki; Sugano, Sumio; Kawano, Suguru; Fujisaki, Kozo; Sugimoto, Chihiro; Watanabe, Junichi; Suzuki, Yutaka; Kimata, Isao; Xuan, Xuenan

    2011-06-01

    A full-length cDNA library was constructed from the sporozoite of Cryptosporidium parvum. Normalized clones were subjected to Solexa shotgun sequencing, and then complete sequences for 1066 clones were reconfigured. Detailed analyses of the sequences revealed that 13.5% of the transcripts were spliced; the average and median 5' UTR lengths were 213.5 and 122 nucleotides, respectively. There were 148 inconsistencies out of 562 examined genes between the experimentally described cDNA sequence and the predicted sequence from its genome. In addition, we identified 118 sequences that had little homology against annotated genes of C. parvum as prospective candidates for addable genes. These observations should improve the reliability of C. parvum transcriptome and provide a versatile resource for further studies. Copyright © 2011 Elsevier Ireland Ltd. All rights reserved.

  13. Prevalence of Hepatitis C Virus Subgenotypes 1a and 1b in Japanese Patients: Ultra-Deep Sequencing Analysis of HCV NS5B Genotype-Specific Region

    Science.gov (United States)

    Wu, Shuang; Kanda, Tatsuo; Nakamoto, Shingo; Jiang, Xia; Miyamura, Tatsuo; Nakatani, Sueli M.; Ono, Suzane Kioko; Takahashi-Nakaguchi, Azusa; Gonoi, Tohru; Yokosuka, Osamu

    2013-01-01

    Background Hepatitis C virus (HCV) subgenotypes 1a and 1b have different impacts on the treatment response to peginterferon plus ribavirin with direct-acting antivirals (DAAs) against patients infected with HCV genotype 1, as the emergence rates of resistance mutations are different between these two subgenotypes. In Japan, almost all of HCV genotype 1 belongs to subgenotype 1b. Methods and Findings To determine HCV subgenotype 1a or 1b in Japanese patients infected with HCV genotype 1, real-time PCR-based method and Sanger method were used for the HCV NS5B region. HCV subgenotypes were determined in 90% by real-time PCR-based method. We also analyzed the specific probe regions for HCV subgenotypes 1a and 1b using ultra-deep sequencing, and uncovered mutations that could not be revealed using direct-sequencing by Sanger method. We estimated the prevalence of HCV subgenotype 1a as 1.2-2.5% of HCV genotype 1 patients in Japan. Conclusions Although real-time PCR-based HCV subgenotyping method seems fair for differentiating HCV subgenotypes 1a and 1b, it may not be sufficient for clinical practice. Ultra-deep sequencing is useful for revealing the resistant strain(s) of HCV before DAA treatment as well as mixed infection with different genotypes or subgenotypes of HCV. PMID:24069214

  14. Integrative microRNA and mRNA deep-sequencing expression profiling in endemic Burkitt lymphoma.

    Science.gov (United States)

    Oduor, Cliff I; Kaymaz, Yasin; Chelimo, Kiprotich; Otieno, Juliana A; Ong'echa, John Michael; Moormann, Ann M; Bailey, Jeffrey A

    2017-11-13

    Burkitt lymphoma (BL) is characterized by overexpression of the c-myc oncogene, which in the vast majority of cases is a consequence of an IGH/MYC translocation. While myc is the seminal event, BL is a complex amalgam of genetic and epigenetic changes causing dysregulation of both coding and non-coding transcripts. Emerging evidence suggests that abnormal modulation of mRNA transcription via miRNAs might be a significant factor in lymphomagenesis. However, the alterations in these miRNAs and their correlations to their putative mRNA targets have not been extensively studied relative to normal germinal center (GC) B cells. Using more sensitive and specific transcriptome deep sequencing, we compared previously published small miRNA and long mRNA of a set of GC B cells and eBL tumors. MiRWalk2.0 was used to identify the validated target genes for the deregulated miRNAs, which would be important for understanding the regulatory networks associated with eBL development. We found 211 differentially expressed (DE) genes (79 upregulated and 132 downregulated) and 49 DE miRNAs (22 upregulated and 27 downregulated). Gene set enrichment analysis identified the enrichment of a set of MYC-regulated genes. A network propagation-based method and correlated miRNA-mRNA expression analysis identified dysregulated miRNAs, including miR-17~92 cluster members and their target genes, which have diverse oncogenic properties, as critical to eBL lymphomagenesis. Central to all these findings, we observed the downregulation of ATM and NLK genes, which represent important regulators in response to DNA damage in eBL tumor cells. These tumor suppressors were targeted by multiple upregulated miRNAs (miR-19b-3p, miR-26a-5p, miR-30b-5p, miR-92a-5p and miR-27b-3p) which could account for their aberrant expression in eBL. Combined loss of p53 induction and function due to miRNA-mediated regulation of ATM and NLK, together with the upregulation of TFAP4, may be a central role for human miRNAs in e

  15. Deep learning

    CERN Document Server

    Goodfellow, Ian; Courville, Aaron

    2016-01-01

    Deep learning is a form of machine learning that enables computers to learn from experience and understand the world in terms of a hierarchy of concepts. Because the computer gathers knowledge from experience, there is no need for a human computer operator to formally specify all the knowledge that the computer needs. The hierarchy of concepts allows the computer to learn complicated concepts by building them out of simpler ones; a graph of these hierarchies would be many layers deep. This book introduces a broad range of topics in deep learning. The text offers mathematical and conceptual background, covering relevant concepts in linear algebra, probability theory and information theory, numerical computation, and machine learning. It describes deep learning techniques used by practitioners in industry, including deep feedforward networks, regularization, optimization algorithms, convolutional networks, sequence modeling, and practical methodology; and it surveys such applications as natural language proces...

  16. [Construction of cDNA expression library of unfed female Haemaphysalis longicornis and immuno-screening].

    Science.gov (United States)

    Chai, Hui-ping; Liu, Guang-yuan; Zhang, Lin; Gong, Zhen-li; Xie, Jun-ren; Tian, Zhan-cheng; Wang, Lu; Jia, Ning

    2009-02-28

    To construct a cDNA expression library from unfed female tick Haemaphysalis longicornis for screening and cloning potential antigenic genes. Total RNA was isolated from unfed female ticks, mRNA was purified and a library of oligo(dT)-primed cDNA with added directional EcoR I/Hind III linkers was constructed from the purified mRNA. The constructed cDNA was ligated to the EcoR I/Hind III arms of the lambda SCREEN vector. Pure phage stocks were harvested by plaque purification and converted to plasmid subclones by plating phage on host strain BM25.8. Recombinant plasmids that were subcloned to E. coli BM25.8 were isolated and transformed into E. coli JM109. Recombinant plasmids extracted from JM109 were identified by PCR and sequencing. The recombinant phage DNA was packaged by using phage-marker packaging extracts, resulting in a primary cDNA library with a size of 1.8 x 10(6) pfu. Data showed 100% of the library were recombinant and the titer of the amplified library was 2.4 x 10(9) pfu/ml. Forty-two clones encoding immunodominant antigens were obtained from the cDNA library. Sequence analysis revealed 12 unique cDNA sequences and the encoded putative proteins showed similarities to H. longicornis tropomyosin mRNA, Rhipicephalus annulatus unknown larval protein mRNA, chromosome 2R of Drosophila melanogaster, mitochondrial DNA of H. flava, clones HqL09 unknown mRNA and Hq05 mRNA of H. qinghaiensis, and myosin alkali light chain protein mRNA. The cDNA expression library from unfed female H. longicornis was successfully constructed and screening of protective genes may provide candidate antigens of the tick.

  17. Deep sequencing of ESTs from nacreous and prismatic layer producing tissues and a screen for novel shell formation-related genes in the pearl oyster.

    Directory of Open Access Journals (Sweden)

    Shigeharu Kinoshita

    Full Text Available BACKGROUND: Despite its economic importance, we have a limited understanding of the molecular mechanisms underlying shell formation in pearl oysters, wherein the calcium carbonate crystals, nacre and prism, are formed in a highly controlled manner. We constructed comprehensive expressed gene profiles in the shell-forming tissues of the pearl oyster Pinctada fucata and identified novel shell formation-related gene candidates. PRINCIPAL FINDINGS: We employed the GS FLX 454 system and constructed transcriptome data sets from pallial mantle and pearl sac, which form the nacreous layer, and from the mantle edge, which forms the prismatic layer in P. fucata. We sequenced 260477 reads and obtained 29682 unique sequences. We also screened novel nacreous and prismatic gene candidates by a combined analysis of sequence and expression data sets, and identified various genes encoding lectin, protease, protease inhibitors, lysine-rich matrix protein, and secreted calcium-binding proteins. We also examined the expression of known nacreous and prismatic genes in our EST library and identified novel isoforms with tissue-specific expression. CONCLUSIONS: We constructed EST data sets from the nacre- and prism-producing tissues in P. fucata and found 29682 unique sequences containing novel gene candidates for nacreous and prismatic layer formation. This is the first report of deep sequencing of ESTs in the shell-forming tissues of P. fucata and our data provide a powerful tool for a comprehensive understanding of the molecular mechanisms of molluscan biomineralization.

  18. Deep sequencing of HPV16 genomes: A new high-throughput tool for exploring the carcinogenicity and natural history of HPV16 infection

    Directory of Open Access Journals (Sweden)

    Michael Cullen

    2015-12-01

    Full Text Available For unknown reasons, there is huge variability in risk conferred by different HPV types and, remarkably, strong differences even between closely related variant lineages within each type. HPV16 is a uniquely powerful carcinogenic type, causing approximately half of cervical cancer and most other HPV-related cancers. To permit the large-scale study of HPV genome variability and precancer/cancer, starting with HPV16 and cervical cancer, we developed a high-throughput next-generation sequencing (NGS) whole-genome method. We designed a custom HPV16 AmpliSeq™ panel that generated 47 overlapping amplicons covering 99% of the genome sequenced on the Ion Torrent Proton platform. After validating with Sanger, the current “gold standard” of sequencing, in 89 specimens with concordance of 99.9%, we used our NGS method and custom annotation pipeline to sequence 796 HPV16-positive exfoliated cervical cell specimens. The median completion rate per sample was 98.0%. Our method enabled us to discover novel SNPs, large contiguous deletions suggestive of viral integration (OR of 27.3, 95% CI 3.3–222, P=0.002), and the sensitive detection of variant lineage coinfections. This method represents an innovative high-throughput, ultra-deep coverage technique for HPV genomic sequencing, which, in turn, enables the investigation of the role of genetic variation in HPV epidemiology and carcinogenesis. Keywords: HPV16, HPV epidemiology, HPV genomics

  19. Deep sequencing of HPV16 genomes: A new high-throughput tool for exploring the carcinogenicity and natural history of HPV16 infection.

    Science.gov (United States)

    Cullen, Michael; Boland, Joseph F; Schiffman, Mark; Zhang, Xijun; Wentzensen, Nicolas; Yang, Qi; Chen, Zigui; Yu, Kai; Mitchell, Jason; Roberson, David; Bass, Sara; Burdette, Laurie; Machado, Moara; Ravichandran, Sarangan; Luke, Brian; Machiela, Mitchell J; Andersen, Mark; Osentoski, Matt; Laptewicz, Michael; Wacholder, Sholom; Feldman, Ashlie; Raine-Bennett, Tina; Lorey, Thomas; Castle, Philip E; Yeager, Meredith; Burk, Robert D; Mirabello, Lisa

    2015-12-01

    For unknown reasons, there is huge variability in risk conferred by different HPV types and, remarkably, strong differences even between closely related variant lineages within each type. HPV16 is a uniquely powerful carcinogenic type, causing approximately half of cervical cancer and most other HPV-related cancers. To permit the large-scale study of HPV genome variability and precancer/cancer, starting with HPV16 and cervical cancer, we developed a high-throughput next-generation sequencing (NGS) whole-genome method. We designed a custom HPV16 AmpliSeq™ panel that generated 47 overlapping amplicons covering 99% of the genome sequenced on the Ion Torrent Proton platform. After validating with Sanger, the current "gold standard" of sequencing, in 89 specimens with concordance of 99.9%, we used our NGS method and custom annotation pipeline to sequence 796 HPV16-positive exfoliated cervical cell specimens. The median completion rate per sample was 98.0%. Our method enabled us to discover novel SNPs, large contiguous deletions suggestive of viral integration (OR of 27.3, 95% CI 3.3-222, P =0.002), and the sensitive detection of variant lineage coinfections. This method represents an innovative high-throughput, ultra-deep coverage technique for HPV genomic sequencing, which, in turn, enables the investigation of the role of genetic variation in HPV epidemiology and carcinogenesis.

  20. Genome sequence of Haloplasma contractile, an unusual contractile bacterium from a deep-sea anoxic brine lake.

    KAUST Repository

    Antunes, Andre

    2011-09-01

    We present the draft genome of Haloplasma contractile, isolated from a deep-sea brine and representing a new order between Firmicutes and Mollicutes. Its complex morphology with contractile protrusions might be strongly influenced by the presence of seven MreB/Mbl homologs, which appears to be the highest copy number ever reported.

  1. Genome sequence of Halorhabdus tiamatea, the first archaeon isolated from a deep-sea anoxic brine lake.

    KAUST Repository

    Antunes, Andre

    2011-09-01

    We present the draft genome of Halorhabdus tiamatea, the first member of the Archaea ever isolated from a deep-sea anoxic brine. Genome comparison with Halorhabdus utahensis revealed some striking differences, including a marked increase in genes associated with transmembrane transport and putative genes for a trehalose synthase and a lactate dehydrogenase.

  2. Genome Sequence of Aeribacillus pallidus Strain GS3372, an Endospore-Forming Bacterium Isolated in a Deep Geothermal Reservoir

    OpenAIRE

    Sevasti Filippidou; Marion Jaussi; Thomas Junier; Tina Wunderlin; Nicole Jeanneret; Simona Regenspurg; Po-E Li; Chien-Chi Lo; Shannon Johnson; Kim McMurry; Cheryl D. Gleasner; Momchilo Vuyisich; Patrick S. Chain; Pilar Junier

    2015-01-01

    The genome of strain GS3372 is the first publicly available strain of Aeribacillus pallidus. This endospore-forming thermophilic strain was isolated from a deep geothermal reservoir. The availability of this genome can contribute to the clarification of the taxonomy of the closely related Anoxybacillus, Geobacillus, and Aeribacillus genera.

  3. Genome Sequence of Aeribacillus pallidus Strain GS3372, an Endospore-Forming Bacterium Isolated in a Deep Geothermal Reservoir.

    Science.gov (United States)

    Filippidou, Sevasti; Jaussi, Marion; Junier, Thomas; Wunderlin, Tina; Jeanneret, Nicole; Regenspurg, Simona; Li, Po-E; Lo, Chien-Chi; Johnson, Shannon; McMurry, Kim; Gleasner, Cheryl D; Vuyisich, Momchilo; Chain, Patrick S; Junier, Pilar

    2015-08-27

    The genome of strain GS3372 is the first publicly available strain of Aeribacillus pallidus. This endospore-forming thermophilic strain was isolated from a deep geothermal reservoir. The availability of this genome can contribute to the clarification of the taxonomy of the closely related Anoxybacillus, Geobacillus, and Aeribacillus genera. Copyright © 2015 Filippidou et al.

  4. Deep sequencing of the Trypanosoma cruzi GP63 surface proteases reveals diversity and diversifying selection among chronic and congenital Chagas disease patients.

    Directory of Open Access Journals (Sweden)

    Martin S Llewellyn

    2015-04-01

    Full Text Available Chagas disease results from infection with the diploid protozoan parasite Trypanosoma cruzi. T. cruzi is highly genetically diverse, and multiclonal infections in individual hosts are common, but little studied. In this study, we explore T. cruzi infection multiclonality in the context of age, sex and clinical profile among a cohort of chronic patients, as well as paired congenital cases from Cochabamba, Bolivia and Goias, Brazil using amplicon deep sequencing technology. A 450bp fragment of the trypomastigote TcGP63I surface protease gene was amplified and sequenced across 70 chronic and 22 congenital cases on the Illumina MiSeq platform. In addition, a second, mitochondrial target--ND5--was sequenced across the same cohort of cases. Several million reads were generated, and sequencing read depths were normalized within patient cohorts (Goias chronic, n = 43; Goias congenital, n = 2; Bolivia chronic, n = 27; Bolivia congenital, n = 20). Among chronic cases, analyses of variance indicated no clear correlation between intra-host sequence diversity and age, sex or symptoms, while principal coordinate analyses showed no clustering by symptoms between patients. Between congenital pairs, we found evidence for the transmission of multiple sequence types from mother to infant, as well as widespread instances of novel genotypes in infants. Finally, non-synonymous to synonymous (dn:ds) nucleotide substitution ratios among sequences of TcGP63Ia and TcGP63Ib subfamilies within each cohort provided powerful evidence of strong diversifying selection at this locus. Our results shed light on the diversity of parasite DTUs within each patient, as well as the extent to which parasite strains pass between mother and foetus in congenital cases.
Although we were unable to find any evidence that parasite diversity accumulates with age in our study cohorts, putative diversifying selection within members of the TcGP63I gene family suggests a link between genetic diversity

  5. Identification of microRNAs from Amur grape (Vitis amurensis Rupr.) by deep sequencing and analysis of microRNA variations with bioinformatics.

    Science.gov (United States)

    Wang, Chen; Han, Jian; Liu, Chonghuai; Kibet, Korir Nicholas; Kayesh, Emrul; Shangguan, Lingfei; Li, Xiaoying; Fang, Jinggui

    2012-03-29

    MicroRNA (miRNA) is a class of functional non-coding small RNA with 19-25 nucleotides in length while Amur grape (Vitis amurensis Rupr.) is an important wild fruit crop with the strongest cold resistance among the Vitis species, is used as an excellent breeding parent for grapevine, and has elicited growing interest in wine production. To date, there is a relatively large number of grapevine miRNAs (vv-miRNAs) from cultivated grapevine varieties such as Vitis vinifera L. and hybrids of V. vinifera and V. labrusca, but there is no report on miRNAs from Vitis amurensis Rupr, a wild grapevine species. A small RNA library from Amur grape was constructed and Solexa technology used to perform deep sequencing of the library followed by subsequent bioinformatics analysis to identify new miRNAs. In total, 126 conserved miRNAs belonging to 27 miRNA families were identified, and 34 known but non-conserved miRNAs were also found. Significantly, 72 new potential Amur grape-specific miRNAs were discovered. The sequences of these new potential va-miRNAs were further validated through miR-RACE, and accumulation of 18 new va-miRNAs in seven tissues of grapevines confirmed by real time RT-PCR (qRT-PCR) analysis. The expression levels of va-miRNAs in flowers and berries were found to be basically consistent in identity to those from deep sequenced sRNAs libraries of combined corresponding tissues. We also describe the conservation and variation of va-miRNAs using miR-SNPs and miR-LDs during plant evolution based on comparison of orthologous sequences, and further reveal that the number and sites of miR-SNP in diverse miRNA families exhibit distinct divergence. Finally, 346 target genes for the new miRNAs were predicted and they include a number of Amur grape stress tolerance genes and many genes regulating anthocyanin synthesis and sugar metabolism. 
Deep sequencing of short RNAs from Amur grape flowers and berries identified 72 new potential miRNAs and 34 known but non-conserved mi

  6. Identification of microRNAs from Amur grape (Vitis amurensis Rupr.) by deep sequencing and analysis of microRNA variations with bioinformatics

    Directory of Open Access Journals (Sweden)

    Wang Chen

    2012-03-01

    Full Text Available Abstract Background MicroRNA (miRNA) is a class of functional non-coding small RNA with 19-25 nucleotides in length while Amur grape (Vitis amurensis Rupr.) is an important wild fruit crop with the strongest cold resistance among the Vitis species, is used as an excellent breeding parent for grapevine, and has elicited growing interest in wine production. To date, there is a relatively large number of grapevine miRNAs (vv-miRNAs) from cultivated grapevine varieties such as Vitis vinifera L. and hybrids of V. vinifera and V. labrusca, but there is no report on miRNAs from Vitis amurensis Rupr, a wild grapevine species. Results A small RNA library from Amur grape was constructed and Solexa technology used to perform deep sequencing of the library followed by subsequent bioinformatics analysis to identify new miRNAs. In total, 126 conserved miRNAs belonging to 27 miRNA families were identified, and 34 known but non-conserved miRNAs were also found. Significantly, 72 new potential Amur grape-specific miRNAs were discovered. The sequences of these new potential va-miRNAs were further validated through miR-RACE, and accumulation of 18 new va-miRNAs in seven tissues of grapevines confirmed by real time RT-PCR (qRT-PCR) analysis. The expression levels of va-miRNAs in flowers and berries were found to be basically consistent in identity to those from deep sequenced sRNAs libraries of combined corresponding tissues. We also describe the conservation and variation of va-miRNAs using miR-SNPs and miR-LDs during plant evolution based on comparison of orthologous sequences, and further reveal that the number and sites of miR-SNP in diverse miRNA families exhibit distinct divergence. Finally, 346 target genes for the new miRNAs were predicted and they include a number of Amur grape stress tolerance genes and many genes regulating anthocyanin synthesis and sugar metabolism.
Conclusions Deep sequencing of short RNAs from Amur grape flowers and berries identified 72

  7. Construction and characteristics of 3-end enriched cDNA library from individual embryos of cattle.

    Science.gov (United States)

    Long, Jian-Er; He, Li-Qiang; Cai, Xia; Ren, Zhao-Rui; Huang, Shu-Zhen; Zeng, Yi-Tao

    2006-11-01

    To analyze stage-specific gene expression profiles of pre-implantation embryos and evaluate potential viability, techniques were adapted to generate 3-end enriched cDNA libraries from individual embryos of cattle based on RT-PCR methodology. The reproducibility of constructing a cDNA library was tested by five independent PCR experiments with specific primers for the presence of several rare genes such as DNMT1 (DNA methylation transferase 1), DNMT2, DNMT3A, Oct-4/3 (octamer-binding transcription factor), IFN-iota, IGF-2r (insulin like growth factor 2 receptor), and the housekeeping genes, H2A and beta-actin. Results indicated repeatability and that the proportion of expressed genes in the cDNA library from an individual embryo was not affected by limited PCR amplification. From the cDNA library, 134 clones were randomly selected for sequencing, which showed that structure-related elements accounted for 33.5% of transcripts and that energy- and metabolism-related genes were also an important component, making up 11.9% of the cDNA library. Approximately 14% of genes in the library were functionally unknown, including more than 5% of genes that were likely novel because there was no identity in GenBank. The frequency of structure-related genes such as beta-actin and ribosomal proteins in the cDNA library corresponded to other reports and suggested that the cDNA library constructed by RT-PCR might be proportional to the mRNA populations. The cDNA libraries constructed from different stage embryos will provide a powerful tool to explore novel genes relevant to embryogenesis, determine the profiling of stage-specific gene expression, and evaluate the potential viability of embryos.

  8. Multiple viral infections in Agaricus bisporus - Characterisation of 18 unique RNA viruses and 8 ORFans identified by deep sequencing

    OpenAIRE

    Deakin, Gregory; Dobbs, Edward; Bennett, Julie M.; Jones, Ian M.; Grogan, Helen M.; Burton, Kerry S.

    2017-01-01

    Thirty unique non-host RNAs were sequenced in the cultivated fungus, Agaricus bisporus, comprising 18 viruses each encoding an RdRp domain with an additional 8 ORFans (non-host RNAs with no similarity to known sequences). Two viruses were multipartite with component RNAs showing correlative abundances and common 3′ motifs. The viruses, all positive sense single-stranded, were classified into diverse orders/families. Multiple infections of Agaricus may represent a diverse, dynamic and interact...

  9. Trace Fossils as Indicators of Depositional Sequence Boundaries in Lower Carboniferous Deep-Sea Fan Environment Moravice Formation, Czech Republic

    Czech Academy of Sciences Publication Activity Database

    Lehotský, T.; Bábek, O.; Mikuláš, Radek; Zapletal, J.

    2002-01-01

    Roč. 14, - (2002), s. 59-60 ISSN 1210-9606. [Żelazno 2002. Meeting of the Czech Tectonic Studies Group /7./. Żelazno, 09.05.2002-12.05.2002] R&D Projects: GA ČR GA205/00/0118 Keywords : trace fossils * Carboniferous * Deep-Sea Environment Subject RIV: DB - Geology ; Mineralogy http://geolines.gli.cas.cz/fileadmin/volumes/volume14/G14-059.pdf

  10. PCR-based cDNA library construction: general cDNA libraries at the level of a few cells.

    OpenAIRE

    Belyavsky, A; Vinogradova, T; Rajewsky, K

    1989-01-01

    A procedure for the construction of general cDNA libraries is described which is based on the amplification of total cDNA in vitro. The first cDNA strand is synthesized from total RNA using an oligo(dT)-containing primer. After oligo(dG) tailing the total cDNA is amplified by PCR using two primers complementary to oligo(dA) and oligo(dG) ends of the cDNA. For insertion of the cDNA into a vector a controlled trimming of the 3' ends of the cDNA by Klenow enzyme was used. Starting from 10 J558L ...

  11. Isolation of cDNA clones coding for human tissue factor: primary structure of the protein and cDNA

    International Nuclear Information System (INIS)

    Spicer, E.K.; Horton, R.; Bloem, L.

    1987-01-01

    Tissue factor is a membrane-bound procoagulant protein that activates the extrinsic pathway of blood coagulation in the presence of factor VII and calcium. λ Phage containing the tissue factor gene were isolated from a human placental cDNA library. The amino acid sequence deduced from the nucleotide sequence of the cDNAs indicates that tissue factor is synthesized as a higher molecular weight precursor with a leader sequence of 32 amino acids, while the mature protein is a single polypeptide chain composed of 263 residues. The derived primary structure of tissue factor has been confirmed by comparison to protein and peptide sequence data. The sequence of the mature protein suggests that there are three distinct domains: extracellular, residues 1-219; hydrophobic, residues 220-242; and cytoplasmic, residues 243-263. Three potential N-linked carbohydrate attachment sites occur in the extracellular domain. The amino acid sequence of tissue factor shows no significant homology with the vitamin K-dependent serine proteases, coagulation cofactors, or any other protein in the National Biomedical Research Foundation sequence data bank (Washington, DC)
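    The domain boundaries quoted in the abstract above can be checked for internal consistency with a few lines of arithmetic. The following is a minimal sketch, not part of the original record: the variable names are illustrative, and the precursor length assumes that the 32-residue leader is the only segment removed on maturation, which the abstract implies but does not state outright.

```python
# Residue ranges (inclusive, mature-protein numbering) as quoted in the abstract.
DOMAINS = {
    "extracellular": (1, 219),
    "hydrophobic": (220, 242),
    "cytoplasmic": (243, 263),
}
LEADER_LENGTH = 32    # leader sequence, from the abstract
MATURE_LENGTH = 263   # mature single polypeptide chain, from the abstract


def domain_length(start: int, end: int) -> int:
    """Length of an inclusive residue range."""
    return end - start + 1


lengths = {name: domain_length(*bounds) for name, bounds in DOMAINS.items()}

# The three domains should tile the mature protein with no gaps or overlaps.
assert sum(lengths.values()) == MATURE_LENGTH

# Assumption: precursor = leader + mature chain, with nothing else cleaved.
precursor_length = LEADER_LENGTH + MATURE_LENGTH

for name, n in lengths.items():
    print(f"{name}: {n} residues")
print(f"precursor (assumed): {precursor_length} residues")
```

    Under these assumptions the three domains account for all 263 residues of the mature protein.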

  12. Strategy for robust detection of insertions, deletions, and point mutations in CEBPA, a GC-rich content gene, using 454 next-generation deep-sequencing technology.

    Science.gov (United States)

    Grossmann, Vera; Schnittger, Susanne; Schindela, Sonja; Klein, Hans-Ulrich; Eder, Christiane; Dugas, Martin; Kern, Wolfgang; Haferlach, Torsten; Haferlach, Claudia; Kohlmann, Alexander

    2011-03-01

    CEBPA mutations are of prognostic relevance in acute myeloid leukemia (AML) and are currently detected using a combination of denaturing high-performance liquid chromatography (DHPLC), gene scan/fragment length analysis, and direct Sanger sequencing. Next-generation deep pyrosequencing, principally, allows for the highly sensitive detection of molecular mutations. However, standard 454 chemistry laboratory procedures lack efficient amplification of guanine-cytosine (GC)-rich amplicons during the emulsion PCR (emPCR) steps allowing direct massively parallel clonal amplification of PCR products. To solve this problem, we investigated six distinct emPCR conditions. The coding sequence of CEBPA was subdivided into four overlapping amplicons: GC content for amplicon 1, 74%; amplicon 2, 76%; amplicon 3, 77%; and amplicon 4, 69%. A new emPCR condition, improving the standard titanium assay, presents a robust solution to sequence amplicons with a GC content of up to 77%. Moreover, this assay was subsequently tested on a larger independent cohort of 23 AML patients. For each patient, a median of 737 reads was generated (coverage range, 397-fold to 1194-fold) and therefore allowed a robust detection of insertions, deletions, and point mutations. In conclusion, next-generation amplicon sequencing enables the highly sensitive detection of molecular mutations and is a feasible assay for routine assessment of GC-rich content amplicons. Copyright © 2011 American Society for Investigative Pathology and the Association for Molecular Pathology. Published by Elsevier Inc. All rights reserved.
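    The GC content quoted for each amplicon is simply the fraction of G and C bases in the sequence. The sketch below is illustrative only: the function and the toy sequence are hypothetical, and the per-amplicon percentages are copied from the abstract rather than recomputed from the actual CEBPA amplicon sequences, which are not given here.

```python
def gc_content(seq: str) -> float:
    """Return the fraction (0.0 to 1.0) of G and C bases in a DNA sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    gc = sum(1 for base in seq if base in "GC")
    return gc / len(seq)


# GC percentages reported in the abstract, keyed by amplicon number.
REPORTED_GC = {1: 74, 2: 76, 3: 77, 4: 69}

# The amplicon with the highest reported GC content is the hardest case
# for emulsion PCR according to the abstract.
hardest = max(REPORTED_GC, key=REPORTED_GC.get)

# Hypothetical toy sequence, NOT a real CEBPA amplicon.
toy_amplicon = "GCGGCCGCATGCGCCG"
print(f"toy amplicon GC fraction: {gc_content(toy_amplicon):.2f}")
print(f"highest reported GC: amplicon {hardest} ({REPORTED_GC[hardest]}%)")
```

    In practice the same function could be applied to each designed amplicon before choosing an emPCR condition.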

  13. Inspecting Targeted Deep Sequencing of Whole Genome Amplified DNA Versus Fresh DNA for Somatic Mutation Detection: A Genetic Study in Myelodysplastic Syndrome Patients.

    Science.gov (United States)

    Palomo, Laura; Fuster-Tormo, Francisco; Alvira, Daniel; Ademà, Vera; Armengol, María Pilar; Gómez-Marzo, Paula; de Haro, Nuri; Mallo, Mar; Xicoy, Blanca; Zamora, Lurdes; Solé, Francesc

    2017-08-01

    Whole genome amplification (WGA) has become an invaluable method for preserving limited samples of precious stock material and has been used during the past years as an alternative tool to increase the amount of DNA before library preparation for next-generation sequencing. Myelodysplastic syndromes (MDS) are a group of clonal hematopoietic stem cell disorders characterized by presenting somatic mutations in several myeloid-related genes. In this work, targeted deep sequencing has been performed on four paired fresh DNA and WGA DNA samples from bone marrow of MDS patients, to assess the feasibility of using WGA DNA for detecting somatic mutations. The results of this study highlighted that, in general, the sequencing and alignment statistics of fresh DNA and WGA DNA samples were similar. However, after variant calling and when considering variants detected at all frequencies, there was a high level of discordance between fresh DNA and WGA DNA (overall, a higher number of variants was detected in WGA DNA). After proper filtering, a total of three somatic mutations were detected in the cohort. All somatic mutations detected in fresh DNA were also identified in WGA DNA and validated by whole exome sequencing.

  14. Deep Sequencing Analysis of RNAs from Citrus Plants Grown in a Citrus Sudden Death-Affected Area Reveals Diverse Known and Putative Novel Viruses.

    Science.gov (United States)

    Matsumura, Emilyn E; Coletta-Filho, Helvecio D; Nouri, Shahideh; Falk, Bryce W; Nerva, Luca; Oliveira, Tiago S; Dorta, Silvia O; Machado, Marcos A

    2017-04-24

    Citrus sudden death (CSD) has caused the death of approximately four million orange trees in a very important citrus region in Brazil. Although its etiology is still not completely clear, symptoms and distribution of affected plants indicate a viral disease. In a search for viruses associated with CSD, we have performed a comparative high-throughput sequencing analysis of the transcriptome and small RNAs from CSD-symptomatic and -asymptomatic plants using the Illumina platform. The data revealed mixed infections that included Citrus tristeza virus (CTV) as the most predominant virus, followed by the Citrus sudden death-associated virus (CSDaV), Citrus endogenous pararetrovirus (CitPRV) and two putative novel viruses tentatively named Citrus jingmen-like virus (CJLV), and Citrus virga-like virus (CVLV). The deep sequencing analyses were sensitive enough to differentiate two genotypes of both viruses previously associated with CSD-affected plants: CTV and CSDaV. Our data also showed a putative association of the CSD-symptomatic plants with a specific CSDaV genotype and a likely association with CitPRV as well, whereas the two putative novel viruses were more associated with CSD-asymptomatic plants. This is the first high-throughput sequencing-based study of the viral sequences present in CSD-affected citrus plants, and it generated valuable information for further CSD studies.

  15. Construction of primary and subtracted cDNA libraries from early embryos.

    Science.gov (United States)

    Rothstein, J L; Johnson, D; Jessee, J; Skowronski, J; DeLoia, J A; Solter, D; Knowles, B B

    1993-01-01

    By modifying current cDNA cloning and electroporation methods, large and representative murine cDNA libraries were synthesized from 10 to 100 ng mRNA isolated from unfertilized egg and preimplantation mouse embryos. High cloning efficiency is essential for complete representation of genes expressed in egg and preimplantation embryos and for the isolation of stage-specific genes using subtractive hybridization. Because the mouse embryo contains no more than 50 pg of poly(A)+ mRNA at any stage of preimplantation development, approximately 5000-10,000 embryos are required to obtain enough mRNA to synthesize libraries using current methods. To obtain a representative library that also includes rare transcripts, the size of the library should be at least 10(6) clones. The average percent conversion of mRNA to single-stranded cDNA was 20-40%, so that a cloning efficiency of nearly 2 x 10(8) cfu/microgram cDNA is required for such a cDNA library. No previous methods have provided directional cloning of cDNA into plasmids with these high efficiencies. The advent of electroporation methods for the introduction of nucleic acids into bacteria has made possible the use of standard plasmid vectors for high-efficiency cDNA cloning. Plasmid vectors are currently available that can accommodate the directional cloning of cDNA such that T7 and T3 RNA polymerase promoter sequences can be used to generate sense and anti-sense transcripts for subtractive hybridization and riboprobe synthesis. The cDNA libraries we derived using this methodology are a reusable and abundant source of genetic information about the control of preimplantation development. Specialized subtractive cDNA libraries enriched for genes expressed exclusively at a predetermined time in development give access to genes expressed in a stage-specific manner. 
The ability to construct new cDNA libraries from limited amounts of starting material ensures the provision of new and important resources for the identification
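    The figures quoted in the abstract above (a library of at least 10(6) clones, 20-40% conversion of mRNA to cDNA, about 2 x 10(8) cfu per microgram of cDNA, and no more than 50 pg of poly(A)+ mRNA per embryo) can be tied together with a short calculation. This is a sketch of one reading of those numbers, not the authors' own derivation; the function names are illustrative, and because 50 pg is an upper bound per embryo, the embryo counts should be read as lower bounds.

```python
def cdna_needed_ng(target_clones: float, efficiency_cfu_per_ug: float) -> float:
    """cDNA mass (ng) needed to reach a library size at a given cloning efficiency."""
    return target_clones / efficiency_cfu_per_ug * 1000.0


def mrna_needed_ng(cdna_ng: float, conversion_fraction: float) -> float:
    """mRNA mass (ng) needed to yield a given cDNA mass at a given conversion rate."""
    return cdna_ng / conversion_fraction


def embryos_needed(mrna_ng: float, mrna_per_embryo_pg: float = 50.0) -> float:
    """Embryo count needed to supply a given mRNA mass (1 ng = 1000 pg)."""
    return mrna_ng * 1000.0 / mrna_per_embryo_pg


cdna_ng = cdna_needed_ng(target_clones=1e6, efficiency_cfu_per_ug=2e8)
for conversion in (0.20, 0.40):
    mrna_ng = mrna_needed_ng(cdna_ng, conversion)
    print(
        f"conversion {conversion:.0%}: {cdna_ng:.1f} ng cDNA, "
        f"{mrna_ng:.1f} ng mRNA, at least {embryos_needed(mrna_ng):.0f} embryos"
    )
```

    On this reading, about 5 ng of cDNA suffices at the stated efficiency, which corresponds to an mRNA input inside the 10 to 100 ng range the abstract reports and to far fewer embryos than the 5000-10,000 it cites for earlier methods.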

  16. Cloning, sequencing and expression of cDNA encoding growth ...

    Indian Academy of Sciences (India)

    Author Affiliations. Vikas Anathy1 Thayanithy Venugopal1 Ramanathan Koteeswaran1 Thavamani J Pandian1 Sinnakaruppan Mathavan2. Department of Genetics, School of Biological Sciences, Madurai Kamaraj University, Madurai 625 021, India; Genome Institute of Singapore, National University of Singapore, 1, ...

  17. Construction of equalized short hairpin RNA library from human brain cDNA.

    Science.gov (United States)

    Xu, Lei; Li, Jingqi; Liu, Li; Lu, Lixia; Gao, Jingxia; Li, Xueli

    2007-02-20

    Short hairpin RNA (shRNA) library is a powerful new tool for high-throughput loss-of-function genetic screens in mammalian cells. An shRNA library can be constructed from synthetic oligonucleotides or enzymatically cleaved natural cDNA. Here, we describe a new method for constructing equalized shRNA libraries from cDNA. First, enzymatically digested cDNA fragments are equalized by a suppression PCR-based method modified from suppression subtractive hybridization. The efficiency of equalization was confirmed by quantitative real-time PCR. The fragments are then converted into an shRNA library by a series of enzymatic treatments. With this new technology, we constructed a library from human brain cDNA. Sequence analysis showed that most of the randomly selected clones had inverted repeat sequences converted from different cDNA. After transfecting HEK 293T cells and detecting gene expression, three out of eight clones were demonstrated to significantly inhibit their target genes.

  18. Molecular cloning of goat 20alpha-hydroxysteroid dehydrogenase cDNA.

    Science.gov (United States)

    Jayasekara, Walimuni Samantha Nilanthi; Yonezawa, Tomohiro; Ishida, Maho; Yamanouchi, Keitaro; Nishihara, Masugi

    2004-06-01

    20Alpha-hydroxysteroid dehydrogenase (20alpha-HSD), which catalyzes the conversion of progesterone to its inactive form 20alpha-dihydroprogesterone, is expressed in murine placenta and has been suggested to play roles in maintaining pregnancy. To understand the role of 20alpha-HSD during pregnancy in the goat, as a first step, cloning and sequencing of 20alpha-HSD cDNA were performed. The full nucleotide sequence of 20alpha-HSD cDNA was determined on samples obtained from the corpus luteum at the luteal phase of the estrous cycle and the placenta in late pregnancy by RT-PCR and 3' and 5' RACE systems. Cloned 20alpha-HSD cDNA consisted of 1124 bp and belonged to the aldo-keto reductase superfamily. From the start codon to stop codon there were 323 amino acids, the same as in other species. To verify whether the protein derived from goat 20alpha-HSD cDNA had 20alpha-HSD activity, the cDNA was expressed by bacteria. Bacterially expressed goat 20alpha-HSD protein showed 20alpha-HSD enzyme activity. A tissue distribution study demonstrated that 20alpha-HSD was expressed in the placenta, but not in the adrenal gland, liver and spleen during pregnancy. The present study suggests that goat 20alpha-HSD is another member of the aldo-keto reductase superfamily and that it plays a role in the placenta during pregnancy.

  19. Global Transcriptome Analysis of the Tentacle of the Jellyfish Cyanea capillata Using Deep Sequencing and Expressed Sequence Tags: Insight into the Toxin- and Degenerative Disease-Related Transcripts.

    Directory of Open Access Journals (Sweden)

    Guoyan Liu

    Full Text Available Jellyfish contain diverse toxins and other bioactive components. However, large-scale identification of novel toxins and bioactive components from jellyfish has been hampered by the low efficiency of traditional isolation and purification methods. We performed de novo transcriptome sequencing of the tentacle tissue of the jellyfish Cyanea capillata. A total of 51,304,108 reads were obtained and assembled into 50,536 unigenes. Of these, 21,357 unigenes had homologues in public databases, but the remaining unigenes had no significant matches due to the limited sequence information available and species-specific novel sequences. Functional annotation of the unigenes also revealed general gene expression profile characteristics in the tentacle of C. capillata. A primary goal of this study was to identify putative toxin transcripts. As expected, we screened many transcripts encoding proteins similar to several well-known toxin families including phospholipases, metalloproteases, serine proteases and serine protease inhibitors. In addition, some transcripts also resembled molecules with potential toxic activities, including cnidarian CfTX-like toxins with hemolytic activity, plancitoxin-1, venom toxin-like peptide-6, histamine-releasing factor, neprilysin, dipeptidyl peptidase 4, vascular endothelial growth factor A, angiotensin-converting enzyme-like and endothelin-converting enzyme 1-like proteins. Most of these molecules have not been previously reported in jellyfish. Interestingly, we also characterized a number of transcripts with similarities to proteins relevant to several degenerative diseases, including Huntington's, Alzheimer's and Parkinson's diseases. This is the first description of degenerative disease-associated genes in jellyfish. We obtained a well-categorized and annotated transcriptome of C. capillata tentacle that will be an important and valuable resource for further understanding of jellyfish at the molecular level and information

  20. Global Transcriptome Analysis of the Tentacle of the Jellyfish Cyanea capillata Using Deep Sequencing and Expressed Sequence Tags: Insight into the Toxin- and Degenerative Disease-Related Transcripts.

    Science.gov (United States)

    Liu, Guoyan; Zhou, Yonghong; Liu, Dan; Wang, Qianqian; Ruan, Zengliang; He, Qian; Zhang, Liming

    2015-01-01

    Jellyfish contain diverse toxins and other bioactive components. However, large-scale identification of novel toxins and bioactive components from jellyfish has been hampered by the low efficiency of traditional isolation and purification methods. We performed de novo transcriptome sequencing of the tentacle tissue of the jellyfish Cyanea capillata. A total of 51,304,108 reads were obtained and assembled into 50,536 unigenes. Of these, 21,357 unigenes had homologues in public databases, but the remaining unigenes had no significant matches due to the limited sequence information available and species-specific novel sequences. Functional annotation of the unigenes also revealed general gene expression profile characteristics in the tentacle of C. capillata. A primary goal of this study was to identify putative toxin transcripts. As expected, we screened many transcripts encoding proteins similar to several well-known toxin families including phospholipases, metalloproteases, serine proteases and serine protease inhibitors. In addition, some transcripts also resembled molecules with potential toxic activities, including cnidarian CfTX-like toxins with hemolytic activity, plancitoxin-1, venom toxin-like peptide-6, histamine-releasing factor, neprilysin, dipeptidyl peptidase 4, vascular endothelial growth factor A, angiotensin-converting enzyme-like and endothelin-converting enzyme 1-like proteins. Most of these molecules have not been previously reported in jellyfish. Interestingly, we also characterized a number of transcripts with similarities to proteins relevant to several degenerative diseases, including Huntington's, Alzheimer's and Parkinson's diseases. This is the first description of degenerative disease-associated genes in jellyfish. We obtained a well-categorized and annotated transcriptome of C. capillata tentacle that will be an important and valuable resource for further understanding of jellyfish at the molecular level and information on the underlying

  1. Identification of SSRs and differentially expressed genes in two cultivars of celery (Apium graveolens L.) by deep transcriptome sequencing.

    Science.gov (United States)

    Li, Meng-Yao; Wang, Feng; Jiang, Qian; Ma, Jing; Xiong, Ai-Sheng

    2014-01-01

    Celery (Apium graveolens L.) is one of the most important and widely grown vegetables in the Apiaceae family. Due to the lack of comprehensive genomic resources, research on celery has mainly utilized physiological and biochemical approaches, rather than molecular biology, to study this crop. Transcriptome sequencing has become an efficient and economic technology for obtaining information on gene expression that can greatly facilitate molecular and genomic studies of species for which a sequenced genome is not available. In the present study, 15 893 516 and 19 818 161 high-quality sequences were obtained by RNA-seq from two celery varieties, 'Ventura' and 'Jinnan Shiqin', respectively. The obtained reads were assembled into 39 584 and 41 740 unigenes with mean lengths of 683 bp and 690 bp, respectively. A total of 1939 simple sequence repeat (SSR) markers were identified in 'Ventura' and 2004 SSRs in 'Jinnan Shiqin'. Di-nucleotide repeats were the most common repeat motif, accounting for 55.49% and 54.84% of SSRs in 'Ventura' and 'Jinnan Shiqin', respectively. A comparison of expressed genes between the two libraries identified 338 differentially expressed genes (DEGs). Three hundred and three of the DEGs were annotated based on a sequence similarity search utilizing eight public databases. Additionally, the expression profile of eight annotated DEGs was characterized in response to abiotic stresses. The collective data generated in the present research represent a valuable resource for further genetic and molecular studies in celery.
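
    The SSR identification step summarized above can be illustrated with a minimal Python sketch. The abstract does not say which tool or thresholds were used, so the minimum repeat counts in MIN_REPEATS and the function name find_ssrs are assumptions made for illustration only; the sketch detects perfect repeats and ignores compound or interrupted SSRs.

```python
import re

# Assumed thresholds (not from the study): minimum number of repeats
# required for each motif length, from di- to hexa-nucleotide motifs.
MIN_REPEATS = {2: 6, 3: 5, 4: 4, 5: 4, 6: 4}


def find_ssrs(seq):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs."""
    seq = seq.upper()
    hits = []
    for motif_len, min_rep in MIN_REPEATS.items():
        # Group 2 captures the motif; \2{n,} requires n further copies of it.
        pattern = re.compile(r"(([ACGT]{%d})\2{%d,})" % (motif_len, min_rep - 1))
        for m in pattern.finditer(seq):
            motif = m.group(2)
            # Skip motifs that are repeats of a shorter unit ("AA", "ATAT"),
            # so each repeat is reported once under its shortest motif.
            if any(motif == motif[:k] * (motif_len // k)
                   for k in range(1, motif_len) if motif_len % k == 0):
                continue
            hits.append((m.start(), motif, len(m.group(1)) // motif_len))
    return sorted(hits)


if __name__ == "__main__":
    unigene = "GG" + "AT" * 7 + "CC" + "GCA" * 5 + "TT"
    for start, motif, count in find_ssrs(unigene):
        print(start, motif, count)
```

    Tallying the motif lengths of the hits across all unigenes would give the share of di-nucleotide repeats reported in the abstract, under whatever thresholds the authors actually applied.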

  2. Cloning of growth hormone cDNA from gouramy (Osphronemus gouramy)

    Directory of Open Access Journals (Sweden)

    Estu Nugroho

    2016-11-01

    Full Text Available Research on cloning the cDNA encoding the growth hormone of gouramy was carried out. The aim of this research was to obtain the complementary DNA sequence of the growth hormone as an initial step toward developing genetic engineering technology for gouramy. Four gouramy pituitary glands were used as starting material; total RNA was extracted from the pituitary glands, followed by cDNA synthesis, PCR amplification, purification of DNA fragments from the gel, ligation of the PCR product into a cloning vector, bacterial transformation and incubation, selection of white bacterial colonies, plasmid isolation, and sequencing. Sequencing showed that the PCR amplification product was 843 bp long, encoding 204 amino acid residues and containing sequences conserved among growth hormone (GH) genes. Homology analysis showed that the isolated sequence was 52.4%-97.6% similar to the GH genes of other fish, with the highest homology to that of the sepat fish. It can therefore be concluded that the isolated sequence is a GH gene sequence. The sequence analysis indicates that the gouramy GH gene is evolutionarily conserved. Research on cloning the cDNA encoding the gouramy growth hormone was conducted. The aim of the research was to obtain the complementary DNA (cDNA) sequence of the growth hormone as an initial step toward developing genetic engineering of gouramy fish. Four pituitary glands of the gouramy were taken and processed by total RNA extraction, followed by cDNA synthesis, PCR amplification, DNA fragment purification from the gel, ligation of the PCR product with a cloning vector, transformation and incubation of bacteria, selection of white bacterial colonies, plasmid isolation and sequencing analysis. The sequencing result showed that the amplified PCR product was 834 bp long, encoded 204 amino acid residues and contained conserved sequences for the GH (growth hormone) gene. Homology analysis showed sequence similarity of

  3. Targeted deep DNA methylation analysis of circulating cell-free DNA in plasma using massively parallel semiconductor sequencing.

    Science.gov (United States)

    Vaca-Paniagua, Felipe; Oliver, Javier; Nogueira da Costa, Andre; Merle, Philippe; McKay, James; Herceg, Zdenko; Holmila, Reetta

    2015-01-01

    To set up a targeted methylation analysis using semiconductor sequencing and evaluate the potential for studying methylation in circulating cell-free DNA (cfDNA). Methylation of VIM, FBLN1, LTBP2, HINT2, h19 and IGF2 was analyzed in plasma cfDNA and white blood cell DNA obtained from eight hepatocellular carcinoma patients and eight controls using Ion Torrent™ PGM sequencer. h19 and IGF2 showed consistent methylation levels and methylation was detected for VIM and FBLN1, whereas LTBP2 and HINT2 did not show methylation for target regions. VIM gene promoter methylation was higher in HCC cfDNA than in cfDNA of controls or white blood cell DNA. Semiconductor sequencing is a suitable method for analyzing methylation profiles in cfDNA. Furthermore, differences in cfDNA methylation can be detected between controls and hepatocellular carcinoma cases, even though due to the small sample set these results need further validation.

  4. Avoiding cross hybridization by choosing nonredundant targets on cDNA arrays

    DEFF Research Database (Denmark)

    Nielsen, Henrik Bjørn; Knudsen, Steen

    2002-01-01

    PROBEWIZ designs PCR primers for amplifying probes for cDNA arrays. The probes are designed to have minimal homology to other expressed sequences from a given organism. The primer selection is based on user-defined penalties for homology, primer quality, and proximity to the 3' end....
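
    The idea of selecting among candidates by user-defined penalties can be sketched as a weighted sum. This is not PROBEWIZ's actual scoring function or interface, which the abstract does not describe; the Candidate fields, the default weights and the function names below are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """A hypothetical probe/primer-pair candidate (illustrative fields only)."""
    name: str
    homology: float        # similarity to other expressed sequences (0 = unique)
    primer_quality: float  # primer defect score (0 = ideal primer pair)
    dist_to_3prime: int    # distance of the probe from the 3' end, in bases


def total_penalty(c, w_homology=1.0, w_primer=1.0, w_distance=0.001):
    """Weighted sum of the three penalties; the weights are user-defined."""
    return (w_homology * c.homology
            + w_primer * c.primer_quality
            + w_distance * c.dist_to_3prime)


def best_candidate(candidates, **weights):
    """Return the candidate with the lowest total penalty."""
    return min(candidates, key=lambda c: total_penalty(c, **weights))


if __name__ == "__main__":
    cands = [
        Candidate("a", homology=0.1, primer_quality=0.2, dist_to_3prime=100),
        Candidate("b", homology=0.5, primer_quality=0.0, dist_to_3prime=50),
        Candidate("c", homology=0.0, primer_quality=0.3, dist_to_3prime=400),
    ]
    print(best_candidate(cands).name)                  # default weights
    print(best_candidate(cands, w_homology=5.0).name)  # homology weighted up
```

    Raising the homology weight shifts the choice toward the candidate least similar to other expressed sequences, which is the cross-hybridization concern named in the title.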

  5. Isolation of an ATP synthase cDNA from Sinonovacula constricta ...

    African Journals Online (AJOL)

    Yomi

    2012-01-24

    Jan 24, 2012 ... 2Ningbo City College of Vocational Technology, Ningbo, 315100 People's Republic of China. 3National Marine Environmental Monitoring Center. Dalian, 116023, People's ... The SMART cDNA library of S. constricta was constructed by our laboratory. Random sequencing of the library using T3 primer.

  6. Rapid and Deep Proteomes by Faster Sequencing on a Benchtop Quadrupole Ultra-High-Field Orbitrap Mass Spectrometer

    DEFF Research Database (Denmark)

    Kelstrup, Christian D; Jersie-Christensen, Rosa R; Batth, Tanveer Singh

    2014-01-01

    Shotgun proteomics is a powerful technology for global analysis of proteins and their post-translational modifications. Here, we investigate faster sequencing speed of the latest Q Exactive HF mass spectrometer, which features an ultra-high-field Orbitrap mass analyzer. Proteome coverage...

  7. Attenuation and cell culture adaptation of hepatitis A virus (HAV): a genetic analysis with HAV cDNA.

    OpenAIRE

    Cohen, J I; Rosenblum, B; Feinstone, S M; Ticehurst, J; Purcell, R H

    1989-01-01

    RNA transcripts of hepatitis A virus (HAV) HM-175 cDNA from attenuated, cell culture-adapted HAV were infectious in cell culture. A full-length HAV cDNA from wild-type HAV (propagated in marmosets in vivo) was constructed. Chimeric cDNAs that contained portions of both wild-type and attenuated genomes were produced. Oligonucleotide-directed mutagenesis was used to engineer a point mutation into the VP1 gene of attenuated HAV cDNA, so that the sequence of this capsid protein would be identical...

  8. Molecular characterization of a Leishmania donovani cDNA clone with similarity to human 20S proteasome α-type subunit

    DEFF Research Database (Denmark)

    Christensen, C B; Jørgensen, L; Jensen, A T

    2000-01-01

    Using plasma from patients infected or previously infected with Leishmania donovani, we isolated an L. donovani cDNA clone with similarity to the proteasome α-type subunit from humans and other eukaryotes. The cDNA clone, designated LePa, was DNA sequenced and Northern blot analysis of L...

  9. Deep Illumina-based shotgun sequencing reveals dietary effects on the structure and function of the fecal microbiome of growing kittens.

    Directory of Open Access Journals (Sweden)

    Oliver Deusch

    Full Text Available Previously, we demonstrated that dietary protein:carbohydrate ratio dramatically affects the fecal microbial taxonomic structure of kittens using targeted 16S gene sequencing. The present study, using the same fecal samples, applied deep Illumina shotgun sequencing to identify the diet-associated functional potential and analyze taxonomic changes of the feline fecal microbiome. Fecal samples from kittens fed one of two diets differing in protein and carbohydrate content (high-protein, low-carbohydrate, HPLC; and moderate-protein, moderate-carbohydrate, MPMC) were collected at 8, 12 and 16 weeks of age (n = 6 per group). A total of 345.3 gigabases of sequence were generated from 36 samples, with 99.75% of annotated sequences identified as bacterial. At the genus level, 26% and 39% of reads were annotated for HPLC- and MPMC-fed kittens, with HPLC-fed cats showing greater species richness and microbial diversity. Two phyla, ten families and fifteen genera were responsible for more than 80% of the sequences at each taxonomic level for both diet groups, consistent with the previous taxonomic study. Significantly different abundances between diet groups were observed for 324 genera (56% of all genera identified), demonstrating widespread diet-induced changes in microbial taxonomic structure. Diversity was not affected over time. Functional analysis identified 2,013 putative enzyme function groups that differed (p<0.000007) between the two dietary groups and were associated with 194 pathways, which formed five discrete clusters based on average relative abundance. Of those, ten contained more (p<0.022) enzyme functions with significant diet effects than expected by chance. Six pathways were related to amino acid biosynthesis and metabolism, linking changes in dietary protein with functional differences of the gut microbiome. These data indicate that feline feces-derived microbiomes have large structural and functional differences relating to the dietary

  10. cDNA cloning and immunological characterization of the rye grass allergen Lol p I.

    Science.gov (United States)

    Perez, M; Ishioka, G Y; Walker, L E; Chesnut, R W

    1990-09-25

    The complete amino acid sequence of two "isoallergenic" forms of Lol p I, the major rye grass (Lolium perenne) pollen allergen, was deduced from cDNA sequence analysis. cDNA clones isolated from a Lolium perenne pollen library contained an open reading frame coding for a 240-amino acid protein. Comparison of the nucleotide and deduced amino acid sequence of two of these clones revealed four changes at the amino acid level and numerous nucleotide differences. Both clones contained one possible asparagine-linked glycosylation site. Northern blot analysis shows one RNA species of 1.2 kilobases. Based on the complete amino acid sequence of Lol p I, overlapping peptides covering the entire molecule were synthesized. Utilizing these peptides we have identified a determinant within the Lol p I molecule that is recognized by human leukocyte antigen class II-restricted T cells obtained from persons allergic to rye grass pollen.

  11. Transcriptome sequencing of the Microarray Quality Control (MAQC) RNA reference samples using next generation sequencing

    Directory of Open Access Journals (Sweden)

    Thierry-Mieg Danielle

    2009-06-01

    Full Text Available Background: Transcriptome sequencing using next-generation sequencing platforms will soon be competing with DNA microarray technologies for global gene expression analysis. As a preliminary evaluation of these promising technologies, we performed deep sequencing of cDNA synthesized from the Microarray Quality Control (MAQC) reference RNA samples using Roche's 454 Genome Sequencer FLX. Results: We generated more than 3.6 million sequence reads of average length 250 bp for the MAQC A and B samples and introduced a data analysis pipeline for translating cDNA read counts into gene expression levels. Using BLAST, 90% of the reads mapped to the human genome and 64% of the reads mapped to the RefSeq database of well annotated genes with e-values ≤ 10^-20. We measured gene expression levels in the A and B samples by counting the numbers of reads that mapped to individual RefSeq genes in multiple sequencing runs to evaluate the MAQC quality metrics for reproducibility, sensitivity, specificity, and accuracy and compared the results with DNA microarrays and Quantitative RT-PCR (QRTPCR) from the MAQC studies. In addition, 88% of the reads were successfully aligned directly to the human genome using the AceView alignment programs with an average 90% sequence similarity to identify 137,899 unique exon junctions, including 22,193 new exon junctions not yet contained in the RefSeq database. Conclusion: Using the MAQC metrics for evaluating the performance of gene expression platforms, the ExpressSeq results for gene expression levels showed excellent reproducibility, sensitivity, and specificity that improved systematically with increasing shotgun sequencing depth, and quantitative accuracy that was comparable to DNA microarrays and QRTPCR. In addition, a careful mapping of the reads to the genome using the AceView alignment programs shed new light on the complexity of the human transcriptome, including the discovery of thousands of new splice variants.
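
    Turning read counts into expression levels, as described above, can be sketched in a few lines of Python. The abstract does not give the authors' normalization, so scaling to reads per million mapped reads is an assumption here, and the function name and input format (one gene ID per mapped read, None for unmapped reads) are hypothetical.

```python
from collections import Counter


def reads_per_million(read_to_gene):
    """Count reads per gene and scale to reads per million mapped reads.

    read_to_gene: iterable with one gene ID per read; None marks a read
    that did not map to any gene and is excluded from the total.
    """
    counts = Counter(g for g in read_to_gene if g is not None)
    total = sum(counts.values())
    # An empty input yields an empty dict, so no division by zero occurs.
    return {gene: n * 1e6 / total for gene, n in counts.items()}


if __name__ == "__main__":
    # Toy assignments of reads to RefSeq-style gene symbols.
    sample_a = ["GAPDH", "GAPDH", "ACTB", None]
    for gene, rpm in sorted(reads_per_million(sample_a).items()):
        print(gene, round(rpm, 1))
```

    Scaling by the total mapped reads makes counts from sequencing runs of different depth comparable, which matters when samples such as MAQC A and B are sequenced in multiple runs.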

  12. Deep sequencing of RNA from immune cell-derived vesicles uncovers the selective incorporation of small non-coding RNA biotypes with potential regulatory functions.

    Science.gov (United States)

    Nolte-'t Hoen, Esther N M; Buermans, Henk P J; Waasdorp, Maaike; Stoorvogel, Willem; Wauben, Marca H M; 't Hoen, Peter A C

    2012-10-01

    Cells release RNA-carrying vesicles and membrane-free RNA/protein complexes into the extracellular milieu. Horizontal vesicle-mediated transfer of such shuttle RNA between cells allows dissemination of genetically encoded messages, which may modify the function of target cells. Other studies used array analysis to establish the presence of microRNAs and mRNA in cell-derived vesicles from many sources. Here, we used an unbiased approach by deep sequencing of small RNA released by immune cells. We found a large variety of small non-coding RNA species representing pervasive transcripts or RNA cleavage products overlapping with protein coding regions, repeat sequences or structural RNAs. Many of these RNAs were enriched relative to cellular RNA, indicating that cells destine specific RNAs for extracellular release. Among the most abundant small RNAs in shuttle RNA were sequences derived from vault RNA, Y-RNA and specific tRNAs. Many of the highly abundant small non-coding transcripts in shuttle RNA are evolutionarily well conserved and have previously been associated with gene regulatory functions. These findings allude to a wider range of biological effects that could be mediated by shuttle RNA than previously expected. Moreover, the data present leads for unraveling how cells modify the function of other cells via transfer of specific non-coding RNA species.

  13. Effects of polymerase, template dilution and cycle number on PCR based 16S rRNA diversity analysis using the deep sequencing method

    Directory of Open Access Journals (Sweden)

    Zou Fei

    2010-10-01

    Full Text Available Background: The primer and amplicon length have been found to affect PCR-based estimates of microbial diversity by pyrosequencing, while other PCR conditions have not been addressed using any deep sequencing method. The present study determined the effects of polymerase, template dilution and PCR cycle number using the Solexa platform. Results: The PfuUltra II Fusion HS DNA Polymerase (Stratagene), which has higher fidelity, produced fewer PCR artifacts and yielded lower taxa richness than Ex Taq (Takara). More importantly, the two polymerases showed different efficiencies in amplifying some very abundant sequences and yielded significantly different community structures. As expected, dilution of the DNA template reduced the estimated taxa richness, particularly at the 200-fold dilution level, but the community structures were similar at all dilution levels. The 30-cycle group showed more PCR artifacts than the 25-cycle group, but its determined taxa richness was lower than that of the 25-cycle group. The PCR cycle number did not change the microbial community structure significantly. Conclusions: These results highlight that PCR conditions, particularly the polymerase, have a significant effect on the analysis of microbial diversity with next-generation sequencing methods.

  14. MicroRNAs in Amoebozoa: deep sequencing of the small RNA population in the social amoeba Dictyostelium discoideum reveals developmentally regulated microRNAs.

    Science.gov (United States)

    Avesson, Lotta; Reimegård, Johan; Wagner, E Gerhart H; Söderbom, Fredrik

    2012-10-01

    The RNA interference machinery has served as a guardian of eukaryotic genomes since the divergence from prokaryotes. Although the basic components have a shared origin, silencing pathways directed by small RNAs have evolved in diverse directions in different eukaryotic lineages. Micro (mi)RNAs regulate protein-coding genes and play vital roles in plants and animals, but less is known about their functions in other organisms. Here, we report, for the first time, deep sequencing of small RNAs from the social amoeba Dictyostelium discoideum. RNA from growing single-cell amoebae as well as from two multicellular developmental stages was sequenced. Computational analyses combined with experimental data reveal the expression of miRNAs, several of them exhibiting distinct expression patterns during development. To our knowledge, this is the first report of miRNAs in the Amoebozoa supergroup. We also show that overexpressed miRNA precursors generate miRNAs and, in most cases, miRNA* sequences, whose biogenesis is dependent on the Dicer-like protein DrnB, further supporting the presence of miRNAs in D. discoideum. In addition, we find miRNAs processed from hairpin structures originating from an intron as well as from a class of repetitive elements. We believe that these repetitive elements are sources for newly evolved miRNAs.

  15. Construction and sequence sampling of deep-coverage, large-insert BAC libraries for three model lepidopteran species

    Directory of Open Access Journals (Sweden)

    Zhao Shaying

    2009-06-01

    Full Text Available Background: Manduca sexta, Heliothis virescens, and Heliconius erato represent three widely used insect model species for genomic and fundamental studies in Lepidoptera. Large-insert BAC libraries of these insects are critical resources for many molecular studies, including physical mapping and genome sequencing, but have not been available to date. Results: We report the construction and characterization of six large-insert BAC libraries for the three species and sampling sequence analysis of the genomes. The six BAC libraries were constructed with two restriction enzymes, two libraries for each species, and each has an average clone insert size ranging from 152–175 kb. We estimated that the genome coverage of each library ranged from 6–9 ×, with the two combined libraries of each species being equivalent to 13.0–16.3 × haploid genomes. The genome coverage, quality and utility of the libraries were further confirmed by library screening using 6–8 putative single-copy probes. To provide a first glimpse into these genomes, we sequenced and analyzed the BAC ends of ~200 clones randomly selected from the libraries of each species. The data revealed that the genomes are AT-rich, contain relatively small fractions of repeat elements with a majority belonging to the category of low complexity repeats, and are more abundant in retro-elements than DNA transposons. Among the species, the H. erato genome is somewhat more abundant in repeat elements and simple repeats than those of M. sexta and H. virescens. The BLAST analysis of the BAC end sequences suggested that the evolution of the three genomes is widely varied, with the genome of H. virescens being the most conserved as a typical lepidopteran, whereas both genomes of H. erato and M. sexta appear to have evolved significantly, resulting in a higher level of species- or evolutionary lineage-specific sequences. Conclusion: The high-quality and large-insert BAC libraries of the insects, together
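
    The genome coverage figures quoted above follow from simple arithmetic: number of clones times mean insert size, divided by haploid genome size. The Python sketch below shows that calculation together with the standard Clarke-Carbon estimate of the chance that a given locus is present in the library. The clone count and genome size in the example are hypothetical, since the abstract reports only insert sizes and the resulting coverage; the function names are also illustrative.

```python
def library_coverage(n_clones, mean_insert_bp, genome_bp):
    """Genome equivalents represented by a clone library."""
    return n_clones * mean_insert_bp / genome_bp


def prob_locus_present(n_clones, mean_insert_bp, genome_bp):
    """Clarke-Carbon estimate: P = 1 - (1 - insert/genome)^N."""
    fraction = mean_insert_bp / genome_bp
    return 1.0 - (1.0 - fraction) ** n_clones


if __name__ == "__main__":
    # Hypothetical library: 20,000 clones, 160 kb mean insert, 500 Mb genome.
    n, insert, genome = 20_000, 160_000, 500_000_000
    print(library_coverage(n, insert, genome))    # genome equivalents
    print(prob_locus_present(n, insert, genome))  # chance a locus is captured
```

    With these assumed inputs the library represents 6.4 genome equivalents, which falls inside the 6–9 × range reported per library.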

  16. Identification and analysis of miRNAs in human breast cancer and teratoma samples using deep sequencing

    DEFF Research Database (Denmark)

    Nygaard, Sanne; Jacobsen, Anders; Lindow, Morten

    2009-01-01

    METHODS: Here we describe the analysis of 454 pyrosequencing of small RNA from four different tissues: breast cancer, normal adjacent breast, and two teratoma cell lines. We developed a pipeline for identifying new miRNAs, emphasizing extracting and retaining as much data as possible from even noisy... sequencing data. We investigated differential expression of miRNAs in the breast cancer and normal adjacent breast samples, and systematically examined the mature sequence end variability of miRNA compared to non-miRNA loci. RESULTS: We identified five novel miRNAs, as well as two putative alternative... precursors for known miRNAs. Several miRNAs were differentially expressed between the breast cancer and normal breast samples. The end variability was shown to be significantly different between miRNA and non-miRNA loci. CONCLUSIONS: Pyrosequencing of small RNAs, together with a computational pipeline, can...

  17. Genome-wide discovery and differential regulation of conserved and novel microRNAs in chickpea via deep sequencing.

    Science.gov (United States)

    Jain, Mukesh; Chevala, V V S Narayana; Garg, Rohini

    2014-11-01

    MicroRNAs (miRNAs) are essential components of complex gene regulatory networks that orchestrate plant development. Although several genomic resources have been developed for the legume crop chickpea, miRNAs have not been discovered until now. For genome-wide discovery of miRNAs in chickpea (Cicer arietinum), we sequenced the small RNA content from seven major tissues/organs employing Illumina technology. About 154 million reads were generated, which represented more than 20 million distinct small RNA sequences. We identified a total of 440 conserved miRNAs in chickpea based on sequence similarity with known miRNAs in other plants. In addition, 178 novel miRNAs were identified using a miRDeep pipeline with plant-specific scoring. Some of the conserved and novel miRNAs with significant sequence similarity were grouped into families. The chickpea miRNAs targeted a wide range of mRNAs involved in diverse cellular processes, including transcriptional regulation (transcription factors), protein modification and turnover, signal transduction, and metabolism. Our analysis revealed several miRNAs with differential spatial expression. Many of the chickpea miRNAs were expressed in a tissue-specific manner. The conserved and differential expression of members of the same miRNA family in different tissues was also observed. Some of the same family members were predicted to target different chickpea mRNAs, which suggested the specificity and complexity of miRNA-mediated developmental regulation. This study, for the first time, reveals a comprehensive set of conserved and novel miRNAs along with their expression patterns and putative targets in chickpea, and provides a framework for understanding regulation of developmental processes in legumes. © The Author 2014. Published by Oxford University Press on behalf of the Society for Experimental Biology.

  18. Three human alcohol dehydrogenase subunits: cDNA structure and molecular and evolutionary divergence

    International Nuclear Information System (INIS)

    Ikuta, T.; Szeto, S.; Yoshida, A.

    1986-01-01

    Class I human alcohol dehydrogenase (ADH; alcohol:NAD+ oxidoreductase, EC 1.1.1.1) consists of several homo- and heterodimers of α, β, and γ subunits that are governed by the ADH1, ADH2, and ADH3 loci. The authors previously cloned a full-length cDNA for the β subunit, and the complete sequence of 374 amino acid residues was established. cDNAs for the α and γ subunits were cloned and characterized. A human liver cDNA library, constructed in phage λgt11, was screened by using a synthetic oligonucleotide probe that was matched to the γ but not to the β sequence. Clone pUCADHγ21 and clone pUCADHα15L differed from β cDNA with respect to restriction sites and hybridization with the nucleotide probe. Clone pUCADHγ21 contained an insertion of 1.5 kilobase pairs (kbp) and encodes 374 amino acid residues compatible with the reported amino acid sequence of the γ subunit. Clone pUCADHα15L contained an insertion of 2.4 kbp and included nucleotide sequences that encode 374 amino acid residues for another subunit, the α subunit. In addition, this clone contained the sequences that encode the COOH-terminal part of the β subunit at its extended 5' region. The amino acid sequences and coding regions of the cDNAs of the three subunits are very similar. A high degree of resemblance is observed also in their 3' noncoding regions. However, distinctive differences exist in the vicinity of the Zn-binding cysteine residue at position 46. Based on the cDNA sequences and the deduced amino acid sequences of the three subunits, their structural and evolutionary relationships are discussed.

  19. Genome re-sequencing of semi-wild soybean reveals a complex Soja population structure and deep introgression.

    Science.gov (United States)

    Qiu, Jie; Wang, Yu; Wu, Sanling; Wang, Ying-Ying; Ye, Chu-Yu; Bai, Xuefei; Li, Zefeng; Yan, Chenghai; Wang, Weidi; Wang, Ziqiang; Shu, Qingyao; Xie, Jiahua; Lee, Suk-Ha; Fan, Longjiang

    2014-01-01

    Semi-wild soybean is a unique type of soybean that retains both wild and domesticated characteristics, which provides an important intermediate type for understanding the evolution of the subgenus Soja population in the Glycine genus. In this study, a semi-wild soybean line (Maliaodou) and a wild line (Lanxi 1) collected from the lower Yangtze regions were deeply sequenced while nine other semi-wild lines were sequenced to a 3-fold genome coverage. Sequence analysis revealed that (1) no independent phylogenetic branch covering all 10 semi-wild lines was observed in the Soja phylogenetic tree; (2) besides two distinct subpopulations of wild and cultivated soybean in the Soja population structure, all semi-wild lines were mixed with some wild lines into a subpopulation rather than an independent one or an intermediate transition type of soybean domestication; (3) high heterozygous rates (0.19-0.49) were observed in several semi-wild lines; and (4) over 100 putative selective regions were identified by selective sweep analysis, including those related to the development of seed size. Our results suggested a hybridization origin for the semi-wild soybean, which makes a complex Soja population structure.

  20. DASAF: An R Package for Deep Sequencing-Based Detection of Fetal Autosomal Abnormalities from Maternal Cell-Free DNA.

    Science.gov (United States)

    Liu, Baohong; Tang, Xiaoyan; Qiu, Feng; Tao, Chunmei; Gao, Junhui; Ma, Mengmeng; Zhong, Tingyan; Cai, JianPing; Li, Yixue; Ding, Guohui

    2016-01-01

    Background. With the development of massively parallel sequencing (MPS), noninvasive prenatal diagnosis using maternal cell-free DNA is fast becoming the preferred method of fetal chromosomal abnormality detection, due to its inherent high accuracy and low risk. Typically, MPS data is parsed to calculate a risk score, which is used to predict whether a fetal chromosome is normal or not. Although there are several highly sensitive and specific MPS data-parsing algorithms, there are currently no tools that implement these methods. Results. We developed an R package, detection of autosomal abnormalities for fetus (DASAF), that implements the three most popular trisomy detection methods-the standard Z-score (STDZ) method, the GC correction Z-score (GCCZ) method, and the internal reference Z-score (IRZ) method-together with one subchromosome abnormality identification method (SCAZ). Conclusions. With the cost of DNA sequencing declining and with advances in personalized medicine, the demand for noninvasive prenatal testing will undoubtedly increase, which will in turn trigger an increase in the tools available for subsequent analysis. DASAF is a user-friendly tool, implemented in R, that supports identification of whole-chromosome as well as subchromosome abnormalities, based on maternal cell-free DNA sequencing data after genome mapping.
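
The standard Z-score (STDZ) idea named in this abstract can be illustrated with a minimal Python sketch. This is not the DASAF implementation (DASAF is an R package whose API is not shown here); the function names, the read-count dictionaries, and the cutoff of 3 are assumptions made for illustration only. The sketch compares the fraction of reads mapped to one chromosome in a test sample against the mean and standard deviation of that fraction in a euploid reference set.

```python
from statistics import mean, stdev


def chromosome_fraction(read_counts, chrom):
    """Fraction of all mapped reads that fall on `chrom`."""
    total = sum(read_counts.values())
    return read_counts[chrom] / total


def standard_z_score(test_counts, reference_counts, chrom):
    """Z-score of the test sample's chromosome fraction against a reference set.

    `reference_counts` is a list of read-count dicts from samples assumed euploid.
    """
    ref_fractions = [chromosome_fraction(c, chrom) for c in reference_counts]
    mu = mean(ref_fractions)
    sigma = stdev(ref_fractions)  # sample standard deviation
    return (chromosome_fraction(test_counts, chrom) - mu) / sigma


# Hypothetical read counts, invented for this example only.
reference = [
    {"chr21": 1300, "other": 98700},
    {"chr21": 1310, "other": 98690},
    {"chr21": 1290, "other": 98710},
    {"chr21": 1305, "other": 98695},
    {"chr21": 1295, "other": 98705},
]
z_elevated = standard_z_score({"chr21": 1400, "other": 98600}, reference, "chr21")
z_typical = standard_z_score({"chr21": 1302, "other": 98698}, reference, "chr21")

CUTOFF = 3.0  # assumed threshold; real pipelines choose and validate their own
print(z_elevated > CUTOFF, abs(z_typical) < CUTOFF)
```

A GC-corrected or internal-reference variant would change how the fraction is normalised before the same comparison is made; those steps are not sketched here.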

  1. Genome re-sequencing of semi-wild soybean reveals a complex Soja population structure and deep introgression.

    Directory of Open Access Journals (Sweden)

    Jie Qiu

    Full Text Available Semi-wild soybean is a unique type of soybean that retains both wild and domesticated characteristics, which provides an important intermediate type for understanding the evolution of the subgenus Soja population in the Glycine genus. In this study, a semi-wild soybean line (Maliaodou) and a wild line (Lanxi 1) collected from the lower Yangtze regions were deeply sequenced while nine other semi-wild lines were sequenced to a 3-fold genome coverage. Sequence analysis revealed that (1) no independent phylogenetic branch covering all 10 semi-wild lines was observed in the Soja phylogenetic tree; (2) besides two distinct subpopulations of wild and cultivated soybean in the Soja population structure, all semi-wild lines were mixed with some wild lines into a subpopulation rather than an independent one or an intermediate transition type of soybean domestication; (3) high heterozygous rates (0.19-0.49) were observed in several semi-wild lines; and (4) over 100 putative selective regions were identified by selective sweep analysis, including those related to the development of seed size. Our results suggested a hybridization origin for the semi-wild soybean, which makes a complex Soja population structure.

  2. Cloning of the mouse cDNA encoding DNA topoisomerase I and chromosomal location of the gene.

    Science.gov (United States)

    Koiwai, O; Yasui, Y; Sakai, Y; Watanabe, T; Ishii, K; Yanagihara, S; Andoh, T

    1993-03-30

    The mouse cDNA encoding DNA topoisomerase I (TopoI) was cloned and the nucleotide sequence of 3512 bp was determined. The cDNA clone contained an open reading frame encoding a protein of 767 amino acids (aa), which is 2 aa longer than its human counterpart. Overall aa sequence homology between the mouse and human, and between the mouse and yeast (Saccharomyces cerevisiae) sequences was 96% and 42%, respectively. The mouse TopI gene was mapped at position 54.5 on chromosome 2 from linkage analyses of a three-point cross test with Geg, Ada, and a as marker genes.

  3. Digital analysis of cDNA abundance; expression profiling by means of restriction fragment fingerprinting

    Directory of Open Access Journals (Sweden)

    Regenbogen Johannes

    2002-03-01

    Full Text Available Background. Gene expression profiling among different tissues is of paramount interest in various areas of biomedical research. We have developed a novel method (DADA, Digital Analysis of cDNA Abundance) that calculates the relative abundance of genes in cDNA libraries. Results. DADA is based upon multiple restriction fragment length analysis of pools of clones from cDNA libraries and the identification of gene-specific restriction fingerprints in the resulting complex fragment mixtures. A specific cDNA cloning vector had to be constructed that governed missing or incomplete cDNA inserts which would generate misleading fingerprints in standard cloning vectors. Double-stranded cDNA was synthesized using an anchored oligo dT primer, uni-directionally inserted into the DADA vector and cDNA libraries were constructed in E. coli. The cDNA fingerprints were generated in a PCR-free procedure that allows for parallel plasmid preparation, labeling, restriction digest and fragment separation of pools of 96 colonies each. This multiplexing significantly enhanced the throughput in comparison to sequence-based methods (e.g. the EST approach). The data of the fragment mixtures were integrated into a relational database system and queried with fingerprints experimentally produced by analyzing single colonies. Due to the limited predictability of the position of DNA fragments of a given size on the polyacrylamide gels, fingerprints derived solely from cDNA sequences were not accurate enough to be used for the analysis. We applied DADA to the analysis of gene expression profiles in a model for impaired wound healing (treatment of mice with dexamethasone). Conclusions. The method proved to be capable of identifying pharmacologically relevant target genes that had not been identified by other standard methods routinely used to find differentially expressed genes. 
Due to the above mentioned limited predictability of the fingerprints, the method was yet tested only with

  4. Gene expression in the deep biosphere.

    Science.gov (United States)

    Orsi, William D; Edgcomb, Virginia P; Christman, Glenn D; Biddle, Jennifer F

    2013-07-11

    Scientific ocean drilling has revealed a deep biosphere of widespread microbial life in sub-seafloor sediment. Microbial metabolism in the marine subsurface probably has an important role in global biogeochemical cycles, but deep biosphere activities are not well understood. Here we describe and analyse the first sub-seafloor metatranscriptomes from anaerobic Peru Margin sediment up to 159 metres below the sea floor, represented by over 1 billion complementary DNA (cDNA) sequence reads. Anaerobic metabolism of amino acids, carbohydrates and lipids seems to be the dominant metabolic process, and profiles of dissimilatory sulfite reductase (dsr) transcripts are consistent with pore-water sulphate concentration profiles. Moreover, transcripts involved in cell division increase as a function of microbial cell concentration, indicating that increases in sub-seafloor microbial abundance are a function of cell division across all three domains of life. These data support calculations and models of sub-seafloor microbial metabolism and represent the first holistic picture of deep biosphere activities.

  5. Deep sampling of the Palomero maize transcriptome by a high throughput strategy of pyrosequencing.

    Science.gov (United States)

    Vega-Arreguín, Julio C; Ibarra-Laclette, Enrique; Jiménez-Moraila, Beatriz; Martínez, Octavio; Vielle-Calzada, Jean Philippe; Herrera-Estrella, Luis; Herrera-Estrella, Alfredo

    2009-07-06

    In-depth sequencing analysis has not been able to determine the overall complexity of transcriptional activity of a plant organ or tissue sample. In some cases, deep parallel sequencing of Expressed Sequence Tags (ESTs), although not yet optimized for the sequencing of cDNAs, has represented an efficient procedure for validating gene prediction and estimating overall gene coverage. This approach could be very valuable for complex plant genomes. In addition, little emphasis has been given to efforts aiming at an estimation of the overall transcriptional universe found in a multicellular organism at a specific developmental stage. To explore, in depth, the transcriptional diversity in an ancient maize landrace, we developed a protocol to optimize the sequencing of cDNAs and performed 4 consecutive GS20-454 pyrosequencing runs of a cDNA library obtained from 2 week-old Palomero Toluqueño maize plants. The protocol reported here allowed obtaining over 90% of informative sequences. These GS20-454 runs generated over 1.5 Million reads, representing the largest amount of sequences reported from a single plant cDNA library. A collection of 367,391 quality-filtered reads (30.09 Mb) from a single run was sufficient to identify transcripts corresponding to 34% of public maize ESTs databases; total sequences generated after 4 filtered runs increased this coverage to 50%. Comparisons of all 1.5 Million reads to the Maize Assembled Genomic Islands (MAGIs) provided evidence for the transcriptional activity of 11% of MAGIs. We estimate that 5.67% (86,069 sequences) do not align with public ESTs or annotated genes, potentially representing new maize transcripts. Following the assembly of 74.4% of the reads in 65,493 contigs, real-time PCR of selected genes confirmed a predicted correlation between the abundance of GS20-454 sequences and corresponding levels of gene expression. 
A protocol was developed that significantly increases the number, length and quality of cDNA reads using

  6. Deep sampling of the Palomero maize transcriptome by a high throughput strategy of pyrosequencing

    Directory of Open Access Journals (Sweden)

    Herrera-Estrella Luis

    2009-07-01

    Full Text Available Background. In-depth sequencing analysis has not been able to determine the overall complexity of transcriptional activity of a plant organ or tissue sample. In some cases, deep parallel sequencing of Expressed Sequence Tags (ESTs), although not yet optimized for the sequencing of cDNAs, has represented an efficient procedure for validating gene prediction and estimating overall gene coverage. This approach could be very valuable for complex plant genomes. In addition, little emphasis has been given to efforts aiming at an estimation of the overall transcriptional universe found in a multicellular organism at a specific developmental stage. Results. To explore, in depth, the transcriptional diversity in an ancient maize landrace, we developed a protocol to optimize the sequencing of cDNAs and performed 4 consecutive GS20–454 pyrosequencing runs of a cDNA library obtained from 2-week-old Palomero Toluqueño maize plants. The protocol reported here allowed obtaining over 90% of informative sequences. These GS20–454 runs generated over 1.5 million reads, representing the largest amount of sequences reported from a single plant cDNA library. A collection of 367,391 quality-filtered reads (30.09 Mb) from a single run was sufficient to identify transcripts corresponding to 34% of public maize EST databases; total sequences generated after 4 filtered runs increased this coverage to 50%. Comparisons of all 1.5 million reads to the Maize Assembled Genomic Islands (MAGIs) provided evidence for the transcriptional activity of 11% of MAGIs. We estimate that 5.67% (86,069 sequences) do not align with public ESTs or annotated genes, potentially representing new maize transcripts. Following the assembly of 74.4% of the reads in 65,493 contigs, real-time PCR of selected genes confirmed a predicted correlation between the abundance of GS20–454 sequences and corresponding levels of gene expression. Conclusion. A protocol was developed that significantly

  7. Design and Screening of M13 Phage Display cDNA Libraries

    Directory of Open Access Journals (Sweden)

    Yuliya Georgieva

    2011-02-01

    Full Text Available The last decade has seen a steady increase in screening of cDNA expression product libraries displayed on the surface of filamentous bacteriophage. At the same time, the range of applications has extended from the identification of novel allergens through disease markers to protein-protein interaction studies. However, the generation and selection of cDNA phage display libraries is subject to intrinsic biological limitations due to their complex nature and heterogeneity, as well as technical difficulties regarding protein presentation on the phage surface. Here, we review the latest developments in this field and discuss a number of strategies and improvements anticipated to overcome these challenges, making cDNA and open reading frame (ORF) libraries more readily accessible for phage display. Furthermore, future trends combining phage display with next generation sequencing (NGS) will be presented.

  8. The induced earthquake sequence related to the St. Gallen deep geothermal project (Switzerland): Fault reactivation and fluid interactions imaged by microseismicity

    Science.gov (United States)

    Diehl, T.; Kraft, T.; Kissling, E.; Wiemer, S.

    2017-09-01

    In July 2013, a sequence of more than 340 earthquakes was induced by reservoir stimulations and well-control procedures following a gas kick at a deep geothermal drilling project close to the city of St. Gallen, Switzerland. The sequence culminated in an ML 3.5 earthquake, which was felt within 10-15 km from the epicenter. High-quality earthquake locations and 3-D reflection seismic data acquired in the St. Gallen project provide a unique data set, which allows high-resolution studies of earthquake triggering related to the injection of fluids into macroscopic fault zones. In this study, we present a high-precision earthquake catalog of the induced sequence. Absolute locations are constrained by a coupled hypocenter-velocity inversion, and subsequent double-difference relocations image the geometry of the ML 3.5 rupture and resolve the spatiotemporal evolution of seismicity. A joint interpretation of earthquake and seismic data shows that the majority of the seismicity occurred in the pre-Mesozoic basement, hundreds of meters below the borehole and the targeted Mesozoic sequence. We propose a hydraulic connectivity between the reactivated fault and the borehole, likely through faults mapped by seismic data. Despite the excellent quality of the seismic data, the association of seismicity with mapped faults remains ambiguous. In summary, our results document that the actual hydraulic properties of a fault system and hydraulic connections between its fault segments are complex and may not be predictable upfront. Incomplete knowledge of fault structures and stress heterogeneities within highly complex fault systems additionally challenge the degree of predictability of induced seismicity related to underground fluid injections.

  9. Integrative analysis of deep sequencing data identifies estrogen receptor early response genes and links ATAD3B to poor survival in breast cancer.

    Directory of Open Access Journals (Sweden)

    Kristian Ovaska

    Full Text Available Identification of genes responsive to an extracellular cue enables characterization of pathophysiologically crucial biological processes. Deep sequencing technologies provide a powerful means to identify responsive genes, which creates a need for computational methods able to analyze dynamic and multi-level deep sequencing data. To answer this need we introduce here a data-driven algorithm, SPINLONG, which is designed to search for genes that match user-defined hypotheses or models. SPINLONG is applicable to various experimental setups measuring several molecular markers in parallel. To demonstrate the SPINLONG approach, we analyzed ChIP-seq data reporting PolII, estrogen receptor α (ERα), H3K4me3 and H2A.Z occupancy at five time points in the MCF-7 breast cancer cell line after estradiol stimulus. We obtained 777 ERα early responsive genes and compared the biological functions of the genes having ERα binding within 20 kb of the transcription start site (TSS) to genes without such a binding site. Our results show that the non-genomic action of ERα via the MAPK pathway, instead of direct ERα binding, may be responsible for early cell responses to ERα activation. Our results also indicate that the ERα responsive genes triggered by the genomic pathway are transcribed faster than those without ERα binding sites. The survival analysis of the 777 ERα responsive genes with 150 primary breast cancer tumors and in two independent validation cohorts indicated the ATAD3B gene, which does not have an ERα binding site within 20 kb of its TSS, to be significantly associated with poor patient survival.

  10. Deep transcriptome-sequencing and proteome analysis of the hydrothermal vent annelid Alvinella pompejana identifies the CvP-bias as a robust measure of eukaryotic thermostability

    Directory of Open Access Journals (Sweden)

    Holder Thomas

    2013-01-01

    Full Text Available Background. Alvinella pompejana is an annelid worm that inhabits deep-sea hydrothermal vent sites in the Pacific Ocean. Living at a depth of approximately 2500 meters, these worms experience extreme environmental conditions, including high temperature and pressure as well as high levels of sulfide and heavy metals. A. pompejana is one of the most thermotolerant metazoans, making this animal a subject of great interest for studies of eukaryotic thermoadaptation. Results. In order to complement existing EST resources we performed deep sequencing of the A. pompejana transcriptome. We identified several thousand novel protein-coding transcripts, nearly doubling the sequence data for this annelid. We then performed an extensive survey of previously established prokaryotic thermoadaptation measures to search for global signals of thermoadaptation in A. pompejana in comparison with mesophilic eukaryotes. In an orthologous set of 457 proteins, we found that the best indicator of thermoadaptation was the difference in frequency of charged versus polar residues (CvP-bias), which was highest in A. pompejana. CvP-bias robustly distinguished prokaryotic thermophiles from prokaryotic mesophiles, as well as the thermophilic fungus Chaetomium thermophilum from mesophilic eukaryotes. Experimental values for thermophilic proteins supported higher CvP-bias as a measure of thermal stability when compared to their mesophilic orthologs. Proteome-wide mean CvP-bias also correlated with the body temperatures of homeothermic birds and mammals. Conclusions. Our work extends the transcriptome resources for A. pompejana and identifies the CvP-bias as a robust and widely applicable measure of eukaryotic thermoadaptation. Reviewers. This article was reviewed by Sándor Pongor, L. Aravind and Anthony M. Poole.
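
The abstract defines CvP-bias as the difference in frequency of charged versus polar residues. A minimal Python sketch of that calculation follows. The residue sets used here (charged: D, E, K, R; polar: N, Q, S, T) and the function name `cvp_bias` are assumptions based on the commonly used definition of this measure, not details given in the record, and the sketch is not the authors' pipeline.

```python
# Assumed residue classes; the record itself does not list them.
CHARGED = set("DEKR")
POLAR = set("NQST")


def cvp_bias(protein_sequences):
    """Return CvP-bias in percentage points over a set of protein sequences.

    CvP-bias = 100 * (charged residues - polar residues) / all residues.
    """
    charged = polar = total = 0
    for sequence in protein_sequences:
        for residue in sequence.upper():
            if not residue.isalpha():
                continue  # skip gaps, stop symbols and whitespace
            total += 1
            if residue in CHARGED:
                charged += 1
            elif residue in POLAR:
                polar += 1
    if total == 0:
        raise ValueError("no residues supplied")
    return 100.0 * (charged - polar) / total


# Toy sequences, invented for illustration.
print(cvp_bias(["DEKRAAAA"]))   # charged-rich toy peptide
print(cvp_bias(["NQSTAAAA"]))   # polar-rich toy peptide
```

Computed over a whole proteome or an orthologous protein set, a higher value would correspond to the trend the abstract associates with thermoadaptation.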

  11. Human F7 sequence is split into three deep clades that are related to FVII plasma levels.

    Science.gov (United States)

    Sabater-Lleal, Maria; Soria, José Manuel; Bertranpetit, Jaume; Almasy, Laura; Blangero, John; Fontcuberta, Jordi; Calafell, Francesc

    2006-02-01

    It is widely accepted that FVII levels are strongly, consistently, and independently related to cardiovascular risk. These levels are influenced by genetic and environmental factors. Among the genetic factors, only a limited number of polymorphisms in the F7 gene have been reported, and they explain only a small proportion of the genetic variability. Recently, we have accomplished the complete dissection of the F7 quantitative trait locus responsible for all of the genetic variability observed in FVII levels. Now, we present the thorough study of the haplotype organization of F7 DNA sequence variation among individuals and the evolutionary processes that produced this variation, by sequencing 15 kb of genomic DNA sequence from the F7 locus in 40 unrelated individuals (80 chromosomes) from the genetic analysis of idiopathic thrombophilia (GAIT) project as well as four non-human primate species. Our study revealed 49 polymorphisms, of which 39 SNPs were further considered. Genotyping of these DNA variations in the whole family-based GAIT sample helped resolve linkage phases, and a total of 37 distinct haplotypes were identified. Tajima's D was significantly positive in this sample, suggesting balancing selection. This parameter was a reflection of the phylogenetic structure of F7 haplotypes, which was deeply split into three well-supported clades or haplogroups, suggesting that functional differences among F7 variants do not depend on a few single-site variations. Moreover, haplogroup 2 was associated with high FVII levels and haplogroup 3 with low levels. In this study, we have for the first time established a clear relation between genotypic variability structure and phenotypic variability of a particular quantitative trait involved in a complex disease.
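
To make the Tajima's D statistic mentioned in this abstract concrete, the following Python sketch computes it from aligned haplotypes using the standard formula (mean pairwise differences compared with the number of segregating sites scaled by the harmonic number). The function name and the toy haplotypes are hypothetical, the input is assumed to be gap-free sequences of equal length, and this is not the analysis code used in the study.

```python
from itertools import combinations
from math import sqrt


def tajimas_d(haplotypes):
    """Tajima's D for a list of equal-length, gap-free haplotype strings."""
    n = len(haplotypes)
    if n < 4:
        raise ValueError("the variance term is zero for fewer than 4 sequences")
    length = len(haplotypes[0])
    if any(len(h) != length for h in haplotypes):
        raise ValueError("haplotypes must be aligned to the same length")

    # S: number of segregating (polymorphic) sites.
    seg_sites = sum(1 for column in zip(*haplotypes) if len(set(column)) > 1)
    if seg_sites == 0:
        raise ValueError("D is undefined without segregating sites")

    # pi: mean number of pairwise differences.
    pairs = list(combinations(haplotypes, 2))
    pi = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs) / len(pairs)

    # Constants of the standard Tajima's D formula.
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n**2 + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)

    variance = e1 * seg_sites + e2 * seg_sites * (seg_sites - 1)
    return (pi - seg_sites / a1) / sqrt(variance)


# Two deeply split toy clades give an excess of intermediate-frequency variants.
d_split = tajimas_d(["AAAA", "AAAA", "TTTT", "TTTT"])
# Variants carried by single sequences give an excess of rare variants.
d_rare = tajimas_d(["AAAA", "TAAA", "ATAA", "AATA"])
print(d_split > 0, d_rare < 0)
```

The first toy case mirrors the situation the abstract describes: haplotypes split into deep clades push the statistic above zero.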

  12. Characterization of the small RNA transcriptomes of androgen dependent and independent prostate cancer cell line by deep sequencing.

    Directory of Open Access Journals (Sweden)

    Gang Xu

    2010-11-01

    Full Text Available Given the important roles of miRNAs in post-transcriptional regulation and their implications for cancer, characterizing miRNAs helps uncover the molecular mechanisms underlying the progression of androgen-independent prostate cancer (PCa). The emergence of next-generation sequencing technologies has made sequencing rapid and cost-effective, permitting an unbiased, quantitative and in-depth investigation of the small RNA transcriptome. In this study, we used high-throughput Illumina sequencing to comprehensively represent the full complement of individual small RNAs and to characterize miRNA expression profiles in both androgen-dependent and androgen-independent PCa cell lines. At least 83 miRNAs are significantly differentially expressed, of which 41 are up-regulated and 42 are down-regulated, indicating that these miRNAs may be involved in the transition of LNCaP to an androgen-independent phenotype. In addition, we have identified 43 novel miRNAs from the androgen-dependent and androgen-independent PCa libraries, 3 of which are specific to androgen-independent PCa. Functional annotation of target genes indicated that most of these differentially expressed miRNAs tend to target genes involved in signal transduction and cell communication, especially the MAPK signaling pathway. The small RNA transcriptomes obtained in this study provide considerable insights into the expression and function of small RNAs in the development of androgen-independent prostate cancer.

  13. Clinical Application of Targeted Deep Sequencing in Solid-Cancer Patients and Utility for Biomarker-Selected Clinical Trials.

    Science.gov (United States)

    Kim, Seung Tae; Kim, Kyoung-Mee; Kim, Nayoung K D; Park, Joon Oh; Ahn, Soomin; Yun, Jae-Won; Kim, Kyu-Tae; Park, Se Hoon; Park, Peter J; Kim, Hee Cheol; Sohn, Tae Sung; Choi, Dong Il; Cho, Jong Ho; Heo, Jin Seok; Kwon, Wooil; Lee, Hyuk; Min, Byung-Hoon; Hong, Sung No; Park, Young Suk; Lim, Ho Yeong; Kang, Won Ki; Park, Woong-Yang; Lee, Jeeyun

    2017-10-01

    Molecular profiling of actionable mutations in refractory cancer patients has the potential to enable "precision medicine," wherein individualized therapies are guided based on genomic profiling. The molecular-screening program was intended to route participants to different candidate drugs in trials based on clinical-sequencing reports. In this screening program, we used a custom target-enrichment panel consisting of cancer-related genes to interrogate single-nucleotide variants, insertions and deletions, copy number variants, and a subset of gene fusions. From August 2014 through April 2015, 654 patients consented to participate in the program at Samsung Medical Center. Of these patients, 588 passed the quality control process for the 381-gene cancer-panel test, and 418 patients were included in the final analysis as being eligible for any anticancer treatment (127 gastric cancer, 122 colorectal cancer, 62 pancreatic/biliary tract cancer, 67 sarcoma/other cancer, and 40 genitourinary cancer patients). Of the 418 patients, 55 (12%) harbored a biomarker that guided them to a biomarker-selected clinical trial, and 184 (44%) patients harbored at least one genomic alteration that was potentially targetable. This study demonstrated that the panel-based sequencing program resulted in an increased rate of trial enrollment of metastatic cancer patients into biomarker-selected clinical trials. Given the expanding list of biomarker-selected trials, the guidance percentage to matched trials is anticipated to increase. © AlphaMed Press 2017.

  14. Deep Sequencing Reveals Novel Genetic Variants in Children with Acute Liver Failure and Tissue Evidence of Impaired Energy Metabolism.

    Directory of Open Access Journals (Sweden)

    C Alexander Valencia

    Full Text Available The etiology of acute liver failure (ALF) remains elusive in almost half of affected children. We hypothesized that inherited mitochondrial and fatty acid oxidation disorders were occult etiological factors in patients with idiopathic ALF and impaired energy metabolism. Twelve patients with elevated blood molar lactate/pyruvate ratio and indeterminate etiology were selected from a retrospective cohort of 74 subjects with ALF because their fixed and frozen liver samples were available for histological, ultrastructural, molecular and biochemical analysis. A customized next-generation sequencing panel for 26 genes associated with mitochondrial and fatty acid oxidation defects revealed mutations and sequence variants in five subjects. Variants involved the genes ACAD9, POLG, POLG2, DGUOK, and RRM2B; the latter not previously reported in subjects with ALF. The explanted livers of the patients with heterozygous, truncating insertion mutations in RRM2B showed patchy micro- and macrovesicular steatosis, decreased mitochondrial DNA (mtDNA) content <30% of controls, and reduced respiratory chain complex activity; both patients had good post-transplant outcome. One infant with severe lactic acidosis was found to carry two heterozygous variants in ACAD9, which was associated with isolated complex I deficiency and diffuse hypergranular hepatocytes. The two subjects with heterozygous variants of unknown clinical significance in POLG and DGUOK developed ALF following drug exposure. Their hepatocytes displayed abnormal mitochondria by electron microscopy. Targeted next generation sequencing and correlation with histological, ultrastructural and functional studies on liver tissue in children with elevated lactate/pyruvate ratio expand the spectrum of genes associated with pediatric ALF.

  15. Appearances can be deceptive: revealing a hidden viral infection with deep sequencing in a plant quarantine context.

    Science.gov (United States)

    Candresse, Thierry; Filloux, Denis; Muhire, Brejnev; Julian, Charlotte; Galzi, Serge; Fort, Guillaume; Bernardo, Pauline; Daugrois, Jean-Heindrich; Fernandez, Emmanuel; Martin, Darren P; Varsani, Arvind; Roumagnac, Philippe

    2014-01-01

    Comprehensive inventories of plant viral diversity are essential for effective quarantine and sanitation efforts. The safety of regulated plant material exchanges presently relies heavily on techniques such as PCR or nucleic acid hybridisation, which are only suited to the detection and characterisation of specific, well characterised pathogens. Here, we demonstrate the utility of sequence-independent next generation sequencing (NGS) of both virus-derived small interfering RNAs (siRNAs) and virion-associated nucleic acids (VANA) for the detailed identification and characterisation of viruses infecting two quarantined sugarcane plants. Both plants originated from Egypt and were known to be infected with Sugarcane streak Egypt Virus (SSEV; Genus Mastrevirus, Family Geminiviridae), but were revealed by the NGS approaches to also be infected by a second highly divergent mastrevirus, here named Sugarcane white streak Virus (SWSV). This novel virus had escaped detection by all routine quarantine detection assays and was found to also be present in sugarcane plants originating from Sudan. Complete SWSV genomes were cloned and sequenced from six plants and all were found to share >91% genome-wide identity. With the exception of two SWSV variants, which potentially express unusually large RepA proteins, the SWSV isolates display genome characteristics very typical to those of all other previously described mastreviruses. An analysis of virus-derived siRNAs for SWSV and SSEV showed them to be strongly influenced by secondary structures within both genomic single stranded DNA and mRNA transcripts. In addition, the distribution of siRNA size frequencies indicates that these mastreviruses are likely subject to both transcriptional and post-transcriptional gene silencing. Our study stresses the potential advantages of NGS-based virus metagenomic screening in a plant quarantine setting and indicates that such techniques could dramatically reduce the numbers of non

  16. Appearances can be deceptive: revealing a hidden viral infection with deep sequencing in a plant quarantine context.

    Directory of Open Access Journals (Sweden)

    Thierry Candresse

    Full Text Available Comprehensive inventories of plant viral diversity are essential for effective quarantine and sanitation efforts. T